Sample records for multiple statistical models

  1. Multiple commodities in statistical microeconomics: Model and market

    NASA Astrophysics Data System (ADS)

    Baaquie, Belal E.; Yu, Miao; Du, Xin

    2016-11-01

    A statistical generalization of microeconomics has been made in Baaquie (2013). In Baaquie et al. (2015), the market behavior of single commodities was analyzed and it was shown that market data provides strong support for the statistical microeconomic description of commodity prices. Here the case of multiple commodities is studied, and a parsimonious generalization of the single-commodity model is made for the multiple-commodity case. Market data shows that the generalization can accurately model the simultaneous correlation functions of up to four commodities. To accurately model five or more commodities, further terms have to be included in the model. This study shows that the statistical microeconomics approach is a comprehensive and complete formulation of microeconomics, one that is independent of the mainstream formulation of microeconomics.

  2. A comparison of large-scale climate signals and the North American Multi-Model Ensemble (NMME) for drought prediction in China

    NASA Astrophysics Data System (ADS)

    Xu, Lei; Chen, Nengcheng; Zhang, Xiang

    2018-02-01

    Drought is an extreme natural disaster that can lead to huge socioeconomic losses. Drought prediction months ahead is helpful for early drought warning and preparation. In this study, we developed a statistical model, two weighted dynamic models and a statistical-dynamic (hybrid) model for 1-6 month lead drought prediction in China. Specifically, the statistical component weights climate signals using support vector regression (SVR), the dynamic components consist of the ensemble mean (EM) and Bayesian model averaging (BMA) of the North American Multi-Model Ensemble (NMME) climate models, and the hybrid part combines the statistical and dynamic components by assigning weights based on their historical performance. The results indicate that the statistical and hybrid models give better rainfall predictions than the NMME-EM and NMME-BMA models, which have good predictability only in southern China. In the 2011 China winter-spring drought event, the statistical model predicted the spatial extent and severity of drought nationwide well, although the severity was underestimated in the mid-lower reaches of the Yangtze River (MLRYR) region. The NMME-EM and NMME-BMA models largely overestimated rainfall in northern and western China in the 2011 drought. In the 2013 China summer drought, the NMME-EM model forecast the drought extent and severity in eastern China well, while the statistical and hybrid models falsely detected a negative precipitation anomaly (NPA) in some areas. Model ensembles, such as multiple statistical approaches, multiple dynamic models or multiple hybrid models, were highlighted for drought prediction. These conclusions may be helpful for drought prediction and early drought warning in China.
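
    The weighting idea described above can be illustrated with a minimal sketch: an SVR-based statistical forecast and a stand-in dynamic forecast are combined with weights based on their hindcast skill. The synthetic data, the inverse-RMSE weighting rule and all parameter values are assumptions made only for this example, not details taken from the paper.

```python
# Minimal illustrative sketch (not the authors' code): build a statistical
# forecast from climate indices with SVR, take a stand-in "dynamic" forecast,
# and combine the two with weights based on hindcast skill.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n_hind, n_fore = 120, 12

# Synthetic lagged climate indices and the rainfall they partly explain.
X = rng.normal(size=(n_hind + n_fore, 2))
rain = 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.5, size=len(X))
X_hind, X_fore = X[:n_hind], X[n_hind:]
y_hind, y_fore = rain[:n_hind], rain[n_hind:]

# Statistical component: SVR mapping climate signals to rainfall.
stat_model = SVR(kernel="rbf", C=1.0).fit(X_hind, y_hind)
stat_pred = stat_model.predict(X_fore)

# Dynamic component: stand-in for an NMME ensemble-mean rainfall forecast.
dyn_hind = y_hind + rng.normal(scale=1.0, size=n_hind)
dyn_pred = y_fore + rng.normal(scale=1.0, size=n_fore)

def rmse(a, b):
    return np.sqrt(np.mean((a - b) ** 2))

# Performance-based weights from hindcast RMSE (assumed inverse-RMSE scheme).
w_stat = 1.0 / rmse(stat_model.predict(X_hind), y_hind)
w_dyn = 1.0 / rmse(dyn_hind, y_hind)
w_stat, w_dyn = w_stat / (w_stat + w_dyn), w_dyn / (w_stat + w_dyn)

hybrid_pred = w_stat * stat_pred + w_dyn * dyn_pred
print(f"weights: stat={w_stat:.2f}, dyn={w_dyn:.2f}")
print(f"hybrid forecast RMSE: {rmse(hybrid_pred, y_fore):.2f}")
```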

  3. Statistical analysis of water-quality data containing multiple detection limits: S-language software for regression on order statistics

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2005-01-01

    Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. © 2005 Elsevier Ltd. All rights reserved.
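
    As a rough illustration of the ROS idea the described software implements, the sketch below handles the simpler single-detection-limit case: detected values are regressed on normal quantiles of plotting positions and the censored observations are imputed from the fitted line. The detection limit, the Blom plotting positions and the lognormal synthetic data are assumptions for illustration; the published method (and the R/S add-on package) handles multiple detection limits with more elaborate plotting positions.

```python
# Simplified sketch of regression on order statistics (ROS) for left-censored
# data, assuming a single detection limit and roughly lognormal data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true = rng.lognormal(mean=0.0, sigma=1.0, size=60)
dl = 0.5                                   # single detection limit
censored = true < dl
obs = np.where(censored, dl, true)         # "<DL" values reported at the limit

n = len(obs)
order = np.argsort(obs)                    # censored values occupy the lowest ranks
pp = (np.arange(1, n + 1) - 0.375) / (n + 0.25)   # Blom plotting positions
z = stats.norm.ppf(pp)
detected = ~censored[order]

# Regress log of detected values on normal quantiles (the "order statistics").
slope, intercept, *_ = stats.linregress(z[detected], np.log(obs[order][detected]))

# Impute censored observations from the fitted line; keep detected ones as is.
modeled = np.where(detected, obs[order], np.exp(intercept + slope * z))

print(f"ROS mean = {modeled.mean():.3f}, ROS sd = {modeled.std(ddof=1):.3f}")
```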

  4. Ultrasound image filtering using the multiplicative model

    NASA Astrophysics Data System (ADS)

    Navarrete, Hugo; Frery, Alejandro C.; Sanchez, Fermin; Anto, Joan

    2002-04-01

    Ultrasound images, as a special case of coherent images, are normally corrupted with multiplicative noise, i.e., speckle noise. Speckle noise reduction is a difficult task due to its multiplicative nature, but good statistical models of speckle formation are useful for designing adaptive speckle reduction filters. In this article a new statistical model, emerging from the Multiplicative Model framework, is presented and compared to previous models (Rayleigh, Rice and K laws). It is shown that the proposed model gives the best performance when modeling the statistics of ultrasound images. Finally, the parameters of the model can be used to quantify the extent of speckle formation; this quantification is applied to adaptive speckle reduction filter design. The effectiveness of the filter is demonstrated on typical in vivo log-compressed B-scan images obtained by a clinical ultrasound system.

  5. STATISTICAL METHODOLOGY FOR THE SIMULTANEOUS ANALYSIS OF MULTIPLE TYPES OF OUTCOMES IN NONLINEAR THRESHOLD MODELS.

    EPA Science Inventory

    Multiple outcomes are often measured on each experimental unit in toxicology experiments. These multiple observations typically imply the existence of correlation between endpoints, and a statistical analysis that incorporates this correlation may result in improved inference. When both disc...

  6. RooStatsCms: A tool for analysis modelling, combination and statistical studies

    NASA Astrophysics Data System (ADS)

    Piparo, D.; Schott, G.; Quast, G.

    2010-04-01

    RooStatsCms is an object oriented statistical framework based on the RooFit technology. Its scope is to allow the modelling, statistical analysis and combination of multiple search channels for new phenomena in High Energy Physics. It provides a variety of methods described in literature implemented as classes, whose design is oriented to the execution of multiple CPU intensive jobs on batch systems or on the Grid.

  7. MICROARRAY DATA ANALYSIS USING MULTIPLE STATISTICAL MODELS

    EPA Science Inventory

    Microarray Data Analysis Using Multiple Statistical Models

    Wenjun Bao1, Judith E. Schmid1, Amber K. Goetz1, Ming Ouyang2, William J. Welsh2,Andrew I. Brooks3,4, ChiYi Chu3,Mitsunori Ogihara3,4, Yinhe Cheng5, David J. Dix1. 1National Health and Environmental Effects Researc...

  8. Time Series Model Identification by Estimating Information.

    DTIC Science & Technology

    1982-11-01

    principle, Applications of Statistics, P. R. Krishnaiah, ed., North-Holland: Amsterdam, 27-41. Anderson, T. W. (1971). The Statistical Analysis of Time Series...E. (1969). Multiple Time Series Modeling, Multivariate Analysis II, edited by P. Krishnaiah, Academic Press: New York, 389-409. Parzen, E. (1981...Newton, H. J. (1980). Multiple Time Series Modeling, II Multivariate Analysis - V, edited by P. Krishnaiah, North-Holland: Amsterdam, 181-197. Shibata, R

  9. Bayesian models based on test statistics for multiple hypothesis testing problems.

    PubMed

    Ji, Yuan; Lu, Yiling; Mills, Gordon B

    2008-04-01

    We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as the differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check if our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool. Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. In the end, we apply the proposed methodology to an siRNA screening and a gene expression experiment.
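
    The overall strategy can be illustrated with a minimal sketch, which is not the authors' exact model: per-test z statistics are treated as a two-component mixture of a null density and an assumed alternative density, posterior null probabilities are computed, and rejections are chosen so that the average posterior null probability of the rejected set stays below a Bayesian FDR level. The mixture parameters below are fixed by hand rather than estimated, and all data are simulated.

```python
# Illustrative two-component mixture model for test statistics with a
# Bayesian FDR rejection rule (a sketch under assumed, fixed parameters).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
m, m_alt = 2000, 200
z = np.concatenate([rng.normal(0, 1, m - m_alt),     # true nulls
                    rng.normal(3, 1, m_alt)])        # true signals

pi0, mu1, sd1 = 0.9, 3.0, 1.0                        # assumed mixture parameters
f0 = stats.norm.pdf(z, 0, 1)
f1 = stats.norm.pdf(z, mu1, sd1)
post_null = pi0 * f0 / (pi0 * f0 + (1 - pi0) * f1)   # posterior null probability

# Bayesian FDR: sort by posterior null probability and keep rejecting while
# the average posterior null probability of the rejected set is below q.
q = 0.05
order = np.argsort(post_null)
running_fdr = np.cumsum(post_null[order]) / np.arange(1, m + 1)
n_reject = int(np.sum(running_fdr <= q))
print("number of rejected hypotheses:", n_reject)
```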

  10. Monte Carlo based statistical power analysis for mediation models: methods and software.

    PubMed

    Zhang, Zhiyong

    2014-12-01

    The existing literature on statistical power analysis for mediation models often assumes data normality and is based on a less powerful Sobel test instead of the more powerful bootstrap test. This study proposes to estimate statistical power to detect mediation effects on the basis of the bootstrap method through Monte Carlo simulation. Nonnormal data with excessive skewness and kurtosis are allowed in the proposed method. A free R package called bmem is developed to conduct the power analysis discussed in this study. Four examples, including a simple mediation model, a multiple-mediator model with a latent mediator, a multiple-group mediation model, and a longitudinal mediation model, are provided to illustrate the proposed method.
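
    The core loop of such a power analysis can be sketched in Python (the paper itself provides the bmem R package): simulate data from a simple mediation model, test the indirect effect with a percentile bootstrap in each Monte Carlo replication, and report the rejection rate as the power estimate. The sample size, path coefficients and replication counts below are arbitrary illustrative choices.

```python
# Monte Carlo power estimation for the indirect effect in a simple mediation
# model (X -> M -> Y) using a percentile bootstrap test of a*b.
import numpy as np

rng = np.random.default_rng(3)
n, a, b, cp = 100, 0.3, 0.3, 0.1          # sample size and path coefficients
n_mc, n_boot, alpha = 200, 500, 0.05

def ab_estimate(x, m, y):
    """Product-of-coefficients estimate of the indirect effect a*b via OLS."""
    a_hat = np.polyfit(x, m, 1)[0]
    design = np.column_stack([np.ones_like(x), m, x])
    b_hat = np.linalg.lstsq(design, y, rcond=None)[0][1]
    return a_hat * b_hat

hits = 0
for _ in range(n_mc):
    x = rng.normal(size=n)
    m = a * x + rng.normal(size=n)
    y = b * m + cp * x + rng.normal(size=n)
    boot = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)
        boot[i] = ab_estimate(x[idx], m[idx], y[idx])
    lo, hi = np.percentile(boot, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    hits += (lo > 0) or (hi < 0)          # CI excludes zero -> effect detected

print("estimated power:", hits / n_mc)
```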

  11. The MAX Statistic is Less Powerful for Genome Wide Association Studies Under Most Alternative Hypotheses.

    PubMed

    Shifflett, Benjamin; Huang, Rong; Edland, Steven D

    2017-01-01

    Genotypic association studies are prone to inflated type I error rates if multiple hypothesis testing is performed, e.g., sequentially testing for recessive, multiplicative, and dominant risk. Alternatives to multiple hypothesis testing include the model-independent genotypic χ² test, the efficiency robust MAX statistic, which corrects for multiple comparisons but with some loss of power, or a single Armitage test for multiplicative trend, which has optimal power when the multiplicative model holds but with some loss of power when dominant or recessive models underlie the genetic association. We used Monte Carlo simulations to describe the relative performance of these three approaches under a range of scenarios. All three approaches maintained their nominal type I error rates. The genotypic χ² and MAX statistics were more powerful when testing a strictly recessive genetic effect or when testing a dominant effect when the allele frequency was high. The Armitage test for multiplicative trend was most powerful for the broad range of scenarios where heterozygote risk is intermediate between recessive and dominant risk. Moreover, all tests had limited power to detect recessive genetic risk unless the sample size was large, and conversely all tests were relatively well powered to detect dominant risk. Taken together, these results suggest the general utility of the multiplicative trend test when the underlying genetic model is unknown.
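
    For reference, a small sketch of the Armitage (Cochran-Armitage) trend test on a 2 x 3 genotype table, the single-test alternative discussed above. The counts are invented for illustration, and the statistic uses a common asymptotic form without a finite-sample correction.

```python
# Cochran-Armitage trend test for a case-control genotype table.
import numpy as np
from scipy import stats

cases = np.array([60, 90, 50])       # genotype counts for 0, 1, 2 risk alleles
controls = np.array([90, 85, 25])
w = np.array([0, 1, 2])              # additive scores (multiplicative risk)

n = cases + controls                 # per-genotype totals
R, N = cases.sum(), n.sum()

num = (N * np.sum(w * cases) - R * np.sum(w * n)) ** 2
den = R * (N - R) * (N * np.sum(w ** 2 * n) - np.sum(w * n) ** 2)
z2 = N * num / den                   # trend chi-square with 1 df

print(f"trend chi-square = {z2:.2f}, p = {stats.chi2.sf(z2, df=1):.3g}")
```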

  12. Statistical methods and neural network approaches for classification of data from multiple sources

    NASA Technical Reports Server (NTRS)

    Benediktsson, Jon Atli; Swain, Philip H.

    1990-01-01

    Statistical methods for classification of data from multiple data sources are investigated and compared to neural network models. A general problem with using conventional multivariate statistical approaches to classify data of multiple types is that a multivariate distribution cannot be assumed for the classes in the data sources. Another common problem with statistical classification methods is that the data sources are not equally reliable, which means they need to be weighted according to their reliability, yet most statistical classification methods have no mechanism for doing so. This research focuses first on statistical methods which can overcome these problems: a method of statistical multisource analysis and consensus theory. Reliability measures for weighting the data sources in these methods are suggested and investigated. Secondly, the research focuses on neural network models. The neural networks are distribution-free, since no prior knowledge of the statistical distribution of the data is needed; this is an obvious advantage over most statistical classification methods. The neural networks also automatically handle the question of how much weight each data source should have. On the other hand, their training process is iterative and can take a very long time. Methods to speed up the training procedure are introduced and investigated. Experimental results of classification using both neural network models and statistical methods are given, and the approaches are compared based on these results.

  13. Meta-analysis of prediction model performance across multiple studies: Which scale helps ensure between-study normality for the C-statistic and calibration measures?

    PubMed

    Snell, Kym Ie; Ensor, Joie; Debray, Thomas Pa; Moons, Karel Gm; Riley, Richard D

    2017-01-01

    If individual participant data are available from multiple studies or clusters, then a prediction model can be externally validated multiple times. This allows the model's discrimination and calibration performance to be examined across different settings. Random-effects meta-analysis can then be used to quantify overall (average) performance and heterogeneity in performance. This typically assumes a normal distribution of 'true' performance across studies. We conducted a simulation study to examine this normality assumption for various performance measures relating to a logistic regression prediction model. We simulated data across multiple studies with varying degrees of variability in baseline risk or predictor effects and then evaluated the shape of the between-study distribution in the C-statistic, calibration slope, calibration-in-the-large, and E/O statistic, and possible transformations thereof. We found that a normal between-study distribution was usually reasonable for the calibration slope and calibration-in-the-large; however, the distributions of the C-statistic and E/O were often skewed across studies, particularly in settings with large variability in the predictor effects. Normality was vastly improved when using the logit transformation for the C-statistic and the log transformation for E/O, and therefore we recommend these scales to be used for meta-analysis. An illustrated example is given using a random-effects meta-analysis of the performance of QRISK2 across 25 general practices.
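
    The recommended analysis can be sketched as follows: transform each study's C-statistic to the logit scale with a delta-method standard error, pool with a DerSimonian-Laird random-effects model, and back-transform. The study values below are made-up numbers, and the DerSimonian-Laird estimator is one common choice rather than the only option.

```python
# Random-effects meta-analysis of C-statistics on the logit scale.
import numpy as np

c = np.array([0.72, 0.78, 0.69, 0.75, 0.81])    # C-statistic per validation study
se = np.array([0.03, 0.02, 0.04, 0.03, 0.02])   # SE of C on the original scale

y = np.log(c / (1 - c))                         # logit transform
v = (se / (c * (1 - c))) ** 2                   # delta-method variance

w = 1 / v                                       # fixed-effect weights
y_fe = np.sum(w * y) / np.sum(w)
Q = np.sum(w * (y - y_fe) ** 2)
k = len(y)
tau2 = max(0.0, (Q - (k - 1)) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))

w_re = 1 / (v + tau2)                           # random-effects weights
y_re = np.sum(w_re * y) / np.sum(w_re)
se_re = np.sqrt(1 / np.sum(w_re))

pooled = 1 / (1 + np.exp(-y_re))                # back-transform to the C scale
ci = 1 / (1 + np.exp(-(y_re + np.array([-1.96, 1.96]) * se_re)))
print(f"pooled C = {pooled:.3f}, 95% CI ({ci[0]:.3f}, {ci[1]:.3f}), tau^2 = {tau2:.3f}")
```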

  14. Multiplicity Control in Structural Equation Modeling

    ERIC Educational Resources Information Center

    Cribbie, Robert A.

    2007-01-01

    Researchers conducting structural equation modeling analyses rarely, if ever, control for the inflated probability of Type I errors when evaluating the statistical significance of multiple parameters in a model. In this study, the Type I error control, power and true model rates of familywise and false discovery rate controlling procedures were…

  15. Statistical Methods for Generalized Linear Models with Covariates Subject to Detection Limits.

    PubMed

    Bernhardt, Paul W; Wang, Huixia J; Zhang, Daowen

    2015-05-01

    Censored observations are a common occurrence in biomedical data sets. Although a large amount of research has been devoted to estimation and inference for data with censored responses, very little research has focused on proper statistical procedures when predictors are censored. In this paper, we consider statistical methods for dealing with multiple predictors subject to detection limits within the context of generalized linear models. We investigate and adapt several conventional methods and develop a new multiple imputation approach for analyzing data sets with predictors censored due to detection limits. We establish the consistency and asymptotic normality of the proposed multiple imputation estimator and suggest a computationally simple and consistent variance estimator. We also demonstrate that the conditional mean imputation method often leads to inconsistent estimates in generalized linear models, while several other methods are either computationally intensive or lead to parameter estimates that are biased or more variable compared to the proposed multiple imputation estimator. In an extensive simulation study, we assess the bias and variability of different approaches within the context of a logistic regression model and compare variance estimation methods for the proposed multiple imputation estimator. Lastly, we apply several methods to analyze the data set from a recently-conducted GenIMS study.

  16. Spatial scan statistics for detection of multiple clusters with arbitrary shapes.

    PubMed

    Lin, Pei-Sheng; Kung, Yi-Hung; Clayton, Murray

    2016-12-01

    In applying scan statistics for public health research, it would be valuable to develop a detection method for multiple clusters that accommodates spatial correlation and covariate effects in an integrated model. In this article, we connect the concepts of the likelihood ratio (LR) scan statistic and the quasi-likelihood (QL) scan statistic to provide a series of detection procedures sufficiently flexible to apply to clusters of arbitrary shape. First, we use an independent scan model for detection of clusters and then a variogram tool to examine the existence of spatial correlation and regional variation based on residuals of the independent scan model. When the estimate of regional variation is significantly different from zero, a mixed QL estimating equation is developed to estimate coefficients of geographic clusters and covariates. We use the Benjamini-Hochberg procedure (1995) to find a threshold for p-values to address the multiple testing problem. A quasi-deviance criterion is used to regroup the estimated clusters to find geographic clusters with arbitrary shapes. We conduct simulations to compare the performance of the proposed method with other scan statistics. For illustration, the method is applied to enterovirus data from Taiwan. © 2016, The International Biometric Society.
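
    For completeness, a minimal implementation of the Benjamini-Hochberg step-up procedure referred to above, applied here to placeholder p-values.

```python
# Benjamini-Hochberg step-up procedure for a list of p-values.
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Boolean rejection mask controlling the false discovery rate at level q."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m   # p_(i) <= (i/m) * q
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])              # largest i satisfying the bound
        reject[order[:k + 1]] = True
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.360]
print(benjamini_hochberg(pvals, q=0.05))
```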

  17. Multiplicative point process as a model of trading activity

    NASA Astrophysics Data System (ADS)

    Gontis, V.; Kaulakys, B.

    2004-11-01

    Signals consisting of a sequence of pulses show that the inherent origin of the 1/f noise is a Brownian fluctuation of the average interevent time between subsequent pulses of the pulse sequence. In this paper, we generalize the model of interevent time to reproduce a variety of self-affine time series exhibiting power spectral density S(f) scaling as a power of the frequency f. Furthermore, we analyze the relation between the power-law correlations and the origin of the power-law probability distribution of the signal intensity. We introduce a stochastic multiplicative model for the time intervals between point events and analyze the statistical properties of the signal analytically and numerically. Such a model system exhibits power-law spectral density S(f) ∼ 1/f^β for various values of β, including β = 1/2, 1 and 3/2. Explicit expressions for the power spectra in the low-frequency limit and for the distribution density of the interevent time are obtained. The counting statistics of the events is analyzed analytically and numerically as well. The specific interest of our analysis is related to the financial markets, where long-range correlations of price fluctuations largely depend on the number of transactions. We analyze the spectral density and counting statistics of the number of transactions. The model reproduces spectral properties of the real markets and explains the mechanism of the power-law distribution of trading activity. The study provides evidence that the statistical properties of the financial markets are enclosed in the statistics of the time intervals between trades. A multiplicative point process serves as a consistent model generating these statistics.

  18. Exploring Contextual Models in Chemical Patent Search

    NASA Astrophysics Data System (ADS)

    Urbain, Jay; Frieder, Ophir

    We explore the development of probabilistic retrieval models for integrating term statistics with entity search using multiple levels of document context to improve the performance of chemical patent search. A distributed indexing model was developed to enable efficient named entity search and aggregation of term statistics at multiple levels of patent structure including individual words, sentences, claims, descriptions, abstracts, and titles. The system can be scaled to an arbitrary number of compute instances in a cloud computing environment to support concurrent indexing and query processing operations on large patent collections.

  19. Statistical field theory of futures commodity prices

    NASA Astrophysics Data System (ADS)

    Baaquie, Belal E.; Yu, Miao

    2018-02-01

    The statistical theory of commodity prices has been formulated by Baaquie (2013). Further empirical studies of single (Baaquie et al., 2015) and multiple commodity prices (Baaquie et al., 2016) have provided strong evidence in support of the primary assumptions of the statistical formulation. In this paper, the model for spot prices (Baaquie, 2013) is extended to model futures commodity prices using a statistical field theory of futures commodity prices. The futures prices are modeled as a two-dimensional statistical field and a nonlinear Lagrangian is postulated. Empirical studies provide clear evidence in support of the model, with many nontrivial features of the model finding unexpected support from market data.

  20. Does transport time help explain the high trauma mortality rates in rural areas? New and traditional predictors assessed by new and traditional statistical methods

    PubMed Central

    Røislien, Jo; Lossius, Hans Morten; Kristiansen, Thomas

    2015-01-01

    Background Trauma is a leading global cause of death. Trauma mortality rates are higher in rural areas, constituting a challenge for quality and equality in trauma care. The aim of the study was to explore population density and transport time to hospital care as possible predictors of geographical differences in mortality rates, and to what extent choice of statistical method might affect the analytical results and accompanying clinical conclusions. Methods Using data from the Norwegian Cause of Death registry, deaths from external causes 1998–2007 were analysed. Norway consists of 434 municipalities, and municipality population density and travel time to hospital care were entered as predictors of municipality mortality rates in univariate and multiple regression models of increasing model complexity. We fitted linear regression models with continuous and categorised predictors, as well as piecewise linear and generalised additive models (GAMs). Models were compared using Akaike's information criterion (AIC). Results Population density was an independent predictor of trauma mortality rates, while the contribution of transport time to hospital care was highly dependent on choice of statistical model. A multiple GAM or piecewise linear model was superior, and similar, in terms of AIC. However, while transport time was statistically significant in multiple models with piecewise linear or categorised predictors, it was not in GAM or standard linear regression. Conclusions Population density is an independent predictor of trauma mortality rates. The added explanatory value of transport time to hospital care is marginal and model-dependent, highlighting the importance of exploring several statistical models when studying complex associations in observational data. PMID:25972600
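
    The model-comparison step can be illustrated with a short sketch: fit the same outcome against a continuous predictor and against a categorised version of it, then compare the fits by AIC. The synthetic data, bin boundaries and log-shaped relationship below are stand-ins, not the Norwegian registry data; the paper's full comparison also covered piecewise linear models and GAMs.

```python
# AIC comparison of a continuous-predictor model vs a categorised-predictor
# model for the same outcome, using statsmodels.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 400
time = rng.uniform(5, 180, n)                       # transport time, minutes
rate = 20 + 5 * np.log(time) + rng.normal(0, 4, n)  # nonlinear "true" relation

df = pd.DataFrame({"rate": rate, "time": time})
df["time_cat"] = pd.cut(df["time"], bins=[0, 30, 60, 120, 180]).astype(str)

linear = smf.ols("rate ~ time", data=df).fit()
categorised = smf.ols("rate ~ C(time_cat)", data=df).fit()

print("AIC, continuous predictor: ", round(linear.aic, 1))
print("AIC, categorised predictor:", round(categorised.aic, 1))
```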

  1. INTRODUCTION TO A COMBINED MULTIPLE LINEAR REGRESSION AND ARMA MODELING APPROACH FOR BEACH BACTERIA PREDICTION

    EPA Science Inventory

    Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...

  2. Analysis of Multiple Contingency Tables by Exact Conditional Tests for Zero Partial Association.

    ERIC Educational Resources Information Center

    Kreiner, Svend

    The tests for zero partial association in a multiple contingency table have gained new importance with the introduction of graphical models. It is shown how these may be performed as exact conditional tests, using as test criteria either the ordinary likelihood ratio, the standard χ² statistic, or any other appropriate statistics. A…

  3. Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages.

    PubMed

    Kim, Yoonsang; Choi, Young-Ku; Emery, Sherry

    2013-08-01

    Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages, SAS GLIMMIX Laplace and SuperMix Gaussian quadrature, perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes.

  4. Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages

    PubMed Central

    Kim, Yoonsang; Emery, Sherry

    2013-01-01

    Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods’ performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages—SAS GLIMMIX Laplace and SuperMix Gaussian quadrature—perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes. PMID:24288415

  5. SOME STATISTICAL ISSUES RELATED TO MULTIPLE LINEAR REGRESSION MODELING OF BEACH BACTERIA CONCENTRATIONS

    EPA Science Inventory

    As a fast and effective technique, the multiple linear regression (MLR) method has been widely used in modeling and prediction of beach bacteria concentrations. Among previous works on this subject, however, several issues were insufficiently or inconsistently addressed. Those is...

  6. Risk Prediction Models for Other Cancers or Multiple Sites

    Cancer.gov

    Developing statistical models that estimate the probability of developing other multiple cancers over a defined period of time will help clinicians identify individuals at higher risk of specific cancers, allowing for earlier or more frequent screening and counseling of behavioral changes to decrease risk.

  7. The value of model averaging and dynamical climate model predictions for improving statistical seasonal streamflow forecasts over Australia

    NASA Astrophysics Data System (ADS)

    Pokhrel, Prafulla; Wang, Q. J.; Robertson, David E.

    2013-10-01

    Seasonal streamflow forecasts are valuable for planning and allocation of water resources. In Australia, the Bureau of Meteorology employs a statistical method to forecast seasonal streamflows. The method uses predictors that are related to catchment wetness at the start of a forecast period and to climate during the forecast period. For the latter, a predictor is selected among a number of lagged climate indices as candidates to give the "best" model in terms of model performance in cross validation. This study investigates two strategies for further improvement in seasonal streamflow forecasts. The first is to combine, through Bayesian model averaging, multiple candidate models with different lagged climate indices as predictors, to take advantage of different predictive strengths of the multiple models. The second strategy is to introduce additional candidate models, using rainfall and sea surface temperature predictions from a global climate model as predictors. This is to take advantage of the direct simulations of various dynamic processes. The results show that combining forecasts from multiple statistical models generally yields more skillful forecasts than using only the best model and appears to moderate the worst forecast errors. The use of rainfall predictions from the dynamical climate model marginally improves the streamflow forecasts when viewed over all the study catchments and seasons, but the use of sea surface temperature predictions provides little additional benefit.
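
    A simple sketch of the combination strategy: three candidate regressions, each using a different lagged climate index, are weighted and averaged. Akaike weights are used below as an easy stand-in for the Bayesian model averaging applied in the study, and the indices and streamflow series are synthetic.

```python
# Model averaging of candidate single-index regression forecasts.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 80
indices = {name: rng.normal(size=n) for name in ["NINO34", "IOD", "SAM"]}
flow = 0.8 * indices["NINO34"] + 0.3 * indices["IOD"] + rng.normal(size=n)

fits, aics = {}, {}
for name, x in indices.items():
    fits[name] = sm.OLS(flow, sm.add_constant(x)).fit()
    aics[name] = fits[name].aic

# Akaike weights: exp(-0.5 * delta AIC), normalised over candidate models.
delta = np.array([aics[k] - min(aics.values()) for k in indices])
w = np.exp(-0.5 * delta)
w /= w.sum()

# Model-averaged prediction for a new season's index values (made-up numbers).
new = {"NINO34": 1.2, "IOD": -0.4, "SAM": 0.1}
preds = np.array([fits[k].predict(np.array([[1.0, new[k]]]))[0] for k in indices])
print("weights:", dict(zip(indices, np.round(w, 2))))
print("model-averaged forecast:", round(float(np.dot(w, preds)), 2))
```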

  8. Cure Models as a Useful Statistical Tool for Analyzing Survival

    PubMed Central

    Othus, Megan; Barlogie, Bart; LeBlanc, Michael L.; Crowley, John J.

    2013-01-01

    Cure models are a popular topic within statistical literature but are not as widely known in the clinical literature. Many patients with cancer can be long-term survivors of their disease, and cure models can be a useful tool to analyze and describe cancer survival data. The goal of this article is to review what a cure model is, explain when cure models can be used, and use cure models to describe multiple myeloma survival trends. Multiple myeloma is generally considered an incurable disease, and this article shows that by using cure models, rather than the standard Cox proportional hazards model, we can evaluate whether there is evidence that therapies at the University of Arkansas for Medical Sciences induce a proportion of patients to be long-term survivors. PMID:22675175
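
    For readers unfamiliar with the terminology, the standard mixture cure model can be written as below; this is the textbook formulation, and the article's specific model may differ in detail.

```latex
% Standard mixture cure model (one common formulation): a fraction \pi of
% patients is cured and never experiences the event, while the remaining
% patients follow the survival function S_u(t) of the uncured.
\begin{equation}
  S(t) \;=\; \pi \;+\; (1-\pi)\, S_u(t),
  \qquad \lim_{t\to\infty} S(t) \;=\; \pi ,
\end{equation}
% so the population survival curve plateaus at the cure fraction, whereas a
% standard Cox proportional hazards model forces S(t) to approach zero.
```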

  9. Watershed Regressions for Pesticides (WARP) models for predicting stream concentrations of multiple pesticides

    USGS Publications Warehouse

    Stone, Wesley W.; Crawford, Charles G.; Gilliom, Robert J.

    2013-01-01

    Watershed Regressions for Pesticides for multiple pesticides (WARP-MP) are statistical models developed to predict concentration statistics for a wide range of pesticides in unmonitored streams. The WARP-MP models use the national atrazine WARP models in conjunction with an adjustment factor for each additional pesticide. The WARP-MP models perform best for pesticides with application timing and methods similar to those used with atrazine. For other pesticides, WARP-MP models tend to overpredict concentration statistics for the model development sites. For WARP and WARP-MP, the less-than-ideal sampling frequency for the model development sites leads to underestimation of the shorter-duration concentrations; hence, the WARP models tend to underpredict 4- and 21-d maximum moving-average concentrations, with median errors ranging from 9 to 38%. As a result of this sampling bias, pesticides that performed well with the model development sites are expected to have predictions that are biased low for these shorter-duration concentration statistics. The overprediction by WARP-MP apparent for some of the pesticides is variably offset by underestimation of the model development concentration statistics. Of the 112 pesticides used in the WARP-MP application to stream segments nationwide, 25 were predicted to have concentration statistics with a 50% or greater probability of exceeding one or more aquatic life benchmarks in one or more stream segments. Geographically, many of the modeled streams in the Corn Belt Region were predicted to have one or more pesticides that exceeded an aquatic life benchmark during 2009, indicating the potential vulnerability of streams in this region.

  10. Gene-Based Association Analysis for Censored Traits Via Fixed Effect Functional Regressions.

    PubMed

    Fan, Ruzong; Wang, Yifan; Yan, Qi; Ding, Ying; Weeks, Daniel E; Lu, Zhaohui; Ren, Haobo; Cook, Richard J; Xiong, Momiao; Swaroop, Anand; Chew, Emily Y; Chen, Wei

    2016-02-01

    Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression are rarely developed. Motivated by our ongoing real studies, here we develop Cox proportional hazard models using functional regression (FR) to perform gene-based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models where the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox FR LRT statistics have well-controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and sequence kernel association test (SKAT), which is based on mixed effect Cox models. The Cox FR LRT statistics have higher power than, or similar power to, Cox SKAT LRT except when 50%/50% causal variants had negative/positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than Cox BT LRT. The models and related test statistics can be useful in the whole genome and whole exome association studies. An age-related macular degeneration dataset was analyzed as an example. © 2016 WILEY PERIODICALS, INC.

  11. Gene-based Association Analysis for Censored Traits Via Fixed Effect Functional Regressions

    PubMed Central

    Fan, Ruzong; Wang, Yifan; Yan, Qi; Ding, Ying; Weeks, Daniel E.; Lu, Zhaohui; Ren, Haobo; Cook, Richard J; Xiong, Momiao; Swaroop, Anand; Chew, Emily Y.; Chen, Wei

    2015-01-01

    Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression are rarely developed. Motivated by our ongoing real studies, we develop here Cox proportional hazard models using functional regression (FR) to perform gene-based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models where the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox FR LRT statistics have well-controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and sequence kernel association test (SKAT), which is based on mixed effect Cox models. The Cox FR LRT statistics have higher power than, or similar power to, Cox SKAT LRT except when 50%/50% causal variants had negative/positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than Cox BT LRT. The models and related test statistics can be useful in the whole genome and whole exome association studies. An age-related macular degeneration dataset was analyzed as an example. PMID:26782979

  12. MULTIVARIATE STATISTICAL MODELS FOR EFFECTS OF PM AND COPOLLUTANTS IN A DAILY TIME SERIES EPIDEMIOLOGY STUDY

    EPA Science Inventory

    Most analyses of daily time series epidemiology data relate mortality or morbidity counts to PM and other air pollutants by means of single-outcome regression models using multiple predictors, without taking into account the complex statistical structure of the predictor variable...

  13. Conducting Multilevel Analyses in Medical Education

    ERIC Educational Resources Information Center

    Zyphur, Michael J.; Kaplan, Seth A.; Islam, Gazi; Barsky, Adam P.; Franklin, Michael S.

    2008-01-01

    A significant body of education literature has begun using multilevel statistical models to examine data that reside at multiple levels of analysis. In order to provide a primer for medical education researchers, the current work gives a brief overview of some issues associated with multilevel statistical modeling. To provide an example of this…

  14. Segmentation of prostate boundaries from ultrasound images using statistical shape model.

    PubMed

    Shen, Dinggang; Zhan, Yiqiang; Davatzikos, Christos

    2003-04-01

    This paper presents a statistical shape model for the automatic prostate segmentation in transrectal ultrasound images. A Gabor filter bank is first used to characterize the prostate boundaries in ultrasound images at multiple scales and multiple orientations. The Gabor features are further reconstructed to be invariant to the rotation of the ultrasound probe and are incorporated into the prostate model as image attributes for guiding the deformable segmentation. A hierarchical deformation strategy is then employed, in which the model adaptively focuses on the similarity of different Gabor features at different deformation stages using a multiresolution technique, i.e., coarse features first and fine features later. A number of successful experiments validate the algorithm.
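
    The first step described above can be sketched with scikit-image: Gabor responses at several frequencies (scales) and orientations are stacked as per-pixel features. The test image, frequencies and orientations below are arbitrary placeholders, and the rotation-invariance and hierarchical deformation steps of the paper are not shown.

```python
# Gabor filter bank features at multiple scales and orientations.
import numpy as np
from skimage import data, filters

image = data.camera().astype(float)                       # stand-in for an ultrasound frame

frequencies = [0.05, 0.1, 0.2]                            # multiple scales
orientations = np.linspace(0, np.pi, 4, endpoint=False)   # multiple orientations

features = []
for f in frequencies:
    for theta in orientations:
        real, imag = filters.gabor(image, frequency=f, theta=theta)
        features.append(np.hypot(real, imag))             # magnitude response per pixel

feature_stack = np.stack(features, axis=-1)               # H x W x (n_freq * n_orient)
print(feature_stack.shape)
```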

  15. Multiple-solution problems in a statistics classroom: an example

    NASA Astrophysics Data System (ADS)

    Chu, Chi Wing; Chan, Kevin L. T.; Chan, Wai-Sum; Kwong, Koon-Shing

    2017-11-01

    The mathematics education literature shows that encouraging students to develop multiple solutions for given problems has a positive effect on students' understanding and creativity. In this paper, we present an example of multiple-solution problems in statistics involving a set of non-traditional dice. In particular, we consider the exact probability mass distribution for the sum of face values. Four different ways of solving the problem are discussed. The solutions span various basic concepts in different mathematical disciplines (sample space in probability theory, the probability generating function in statistics, integer partition in basic combinatorics and individual risk model in actuarial science) and thus promote upper undergraduate students' awareness of knowledge connections between their courses. All solutions of the example are implemented using the R statistical software package.
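
    The probability-generating-function solution can be sketched as follows, in Python here whereas the paper uses R: each die is represented by the coefficient vector of its pgf, and the pmf of the sum is their convolution. The Sicherman dice below are one assumed example of non-traditional dice, not necessarily the set used in the paper.

```python
# pmf of the sum of two dice via probability generating functions.
import numpy as np

def pgf(faces):
    """Coefficient vector of the pgf: index = face value, entry = probability."""
    coef = np.zeros(max(faces) + 1)
    for f in faces:
        coef[f] += 1 / len(faces)
    return coef

die_a = pgf([1, 2, 2, 3, 3, 4])
die_b = pgf([1, 3, 4, 5, 6, 8])

pmf_sum = np.convolve(die_a, die_b)     # product of pgfs = convolution of pmfs
for s, p in enumerate(pmf_sum):
    if p > 0:
        print(f"P(sum = {s:2d}) = {p:.4f}")
```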

  16. Statistical reconstruction for cosmic ray muon tomography.

    PubMed

    Schultz, Larry J; Blanpied, Gary S; Borozdin, Konstantin N; Fraser, Andrew M; Hengartner, Nicolas W; Klimenko, Alexei V; Morris, Christopher L; Orum, Chris; Sossong, Michael J

    2007-08-01

    Highly penetrating cosmic ray muons constantly shower the earth at a rate of about 1 muon per cm² per minute. We have developed a technique which exploits the multiple Coulomb scattering of these particles to perform nondestructive inspection without the use of artificial radiation. In prior work [1]-[3], we have described heuristic methods for processing muon data to create reconstructed images. In this paper, we present a maximum likelihood/expectation maximization tomographic reconstruction algorithm designed for the technique. This algorithm borrows much from techniques used in medical imaging, particularly emission tomography, but the statistics of muon scattering dictates differences. We describe the statistical model for multiple scattering, derive the reconstruction algorithm, and present simulated examples. We also propose methods to improve the robustness of the algorithm to experimental errors and events departing from the statistical model.
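
    As background for the scattering statistics mentioned above, the widely used Gaussian-core approximation for multiple Coulomb scattering is the Highland parameterization shown below; it is quoted from standard particle-physics references rather than from the paper itself. The angular spread grows with material thickness and shrinks with muon momentum, which is the contrast the reconstruction exploits.

```latex
% Highland (Gaussian-core) approximation for the width of the projected
% multiple-Coulomb-scattering angle of a singly charged particle: x is the
% material thickness, X_0 its radiation length, and p and \beta c the muon
% momentum and velocity. (Textbook form, not taken from the paper.)
\begin{equation}
  \theta_0 \;\simeq\; \frac{13.6~\mathrm{MeV}}{\beta c \, p}
  \sqrt{\frac{x}{X_0}}
  \left[\, 1 + 0.038 \,\ln\!\frac{x}{X_0} \right]
\end{equation}
```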

  17. Multiple Phenotype Association Tests Using Summary Statistics in Genome-Wide Association Studies

    PubMed Central

    Liu, Zhonghua; Lin, Xihong

    2017-01-01

    We study in this paper jointly testing the associations of a genetic variant with correlated multiple phenotypes using the summary statistics of individual phenotype analysis from Genome-Wide Association Studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes by accounting for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple phenotype testing procedures by jointly testing a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and to compare their powers in various settings with the existing methods. We applied the proposed tests to a GWAS Global Lipids Genetics Consortium summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis. PMID:28653391

  18. Multiple phenotype association tests using summary statistics in genome-wide association studies.

    PubMed

    Liu, Zhonghua; Lin, Xihong

    2018-03-01

    We study in this article jointly testing the associations of a genetic variant with correlated multiple phenotypes using the summary statistics of individual phenotype analysis from Genome-Wide Association Studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes by accounting for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple phenotype testing procedures by jointly testing a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and to compare their powers in various settings with the existing methods. We applied the proposed tests to a GWAS Global Lipids Genetics Consortium summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis. © 2017, The International Biometric Society.
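
    An illustrative sketch of the summary-statistics workflow: the between-phenotype correlation is estimated from the matrix of per-variant z-scores, and a joint test is then formed for a variant of interest. For simplicity the sketch uses a standard omnibus chi-square of the form z'R⁻¹z rather than the common-mean and variance-component tests proposed in the paper, and all numbers are simulated.

```python
# Joint multi-phenotype test from GWAS summary z-scores.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
K, n_variants = 4, 5000
R_true = 0.5 * np.ones((K, K)) + 0.5 * np.eye(K)     # phenotype correlation
L = np.linalg.cholesky(R_true)

Z = rng.normal(size=(n_variants, K)) @ L.T           # correlated null z-scores
R_hat = np.corrcoef(Z, rowvar=False)                 # estimated from summaries only

z_var = np.array([2.5, 2.2, 1.8, 2.9])               # z-scores for one variant
T = z_var @ np.linalg.solve(R_hat, z_var)            # omnibus chi-square, K df
print(f"chi-square = {T:.2f}, p = {stats.chi2.sf(T, df=K):.3g}")
```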

  19. Multiple regression and Artificial Neural Network for long-term rainfall forecasting using large scale climate modes

    NASA Astrophysics Data System (ADS)

    Mekanik, F.; Imteaz, M. A.; Gato-Trinidad, S.; Elmahdi, A.

    2013-10-01

    In this study, the application of Artificial Neural Networks (ANN) and multiple regression analysis (MR) to forecast long-term seasonal spring rainfall in Victoria, Australia was investigated using lagged El Nino Southern Oscillation (ENSO) and Indian Ocean Dipole (IOD) as potential predictors. The use of dual (combined lagged ENSO-IOD) input sets for calibrating and validating ANN and MR models is proposed to investigate the simultaneous effect of past values of these two major climate modes on long-term spring rainfall prediction. The MR models that did not violate the limits of statistical significance and multicollinearity were selected for future spring rainfall forecasts. The ANN was developed in the form of a multilayer perceptron using the Levenberg-Marquardt algorithm. Both MR and ANN modelling were assessed statistically using mean square error (MSE), mean absolute error (MAE), Pearson correlation (r) and the Willmott index of agreement (d). The developed MR and ANN models were tested on out-of-sample test sets; the MR models showed very poor generalisation ability for east Victoria, with correlation coefficients of -0.99 to -0.90, compared to ANN with correlation coefficients of 0.42-0.93; the ANN models also showed better generalisation ability for central and west Victoria, with correlation coefficients of 0.68-0.85 and 0.58-0.97, respectively. The ability of the multiple regression models to forecast out-of-sample sets is comparable to that of ANN for Daylesford in central Victoria and Kaniva in west Victoria (r = 0.92 and 0.67, respectively). The errors of the testing sets for the ANN models are generally lower than those of the multiple regression models. The statistical analysis suggests the potential of ANN over MR models for rainfall forecasting using large-scale climate modes.

  20. The Development and Demonstration of Multiple Regression Models for Operant Conditioning Questions.

    ERIC Educational Resources Information Center

    Fanning, Fred; Newman, Isadore

    Based on the assumption that inferential statistics can make the operant conditioner more sensitive to possible significant relationships, regression models were developed to test the statistical significance of differences between the slopes and Y intercepts of the experimental and control group subjects. These results were then compared to the traditional operant…

  1. Assistive Technologies for Second-Year Statistics Students Who Are Blind

    ERIC Educational Resources Information Center

    Erhardt, Robert J.; Shuman, Michael P.

    2015-01-01

    At Wake Forest University, a student who is blind enrolled in a second course in statistics. The course covered simple and multiple regression, model diagnostics, model selection, data visualization, and elementary logistic regression. These topics required that the student both interpret and produce three sets of materials: mathematical writing,…

  2. [Quantitative structure-gas chromatographic retention relationship of polycyclic aromatic sulfur heterocycles using molecular electronegativity-distance vector].

    PubMed

    Li, Zhenghua; Cheng, Fansheng; Xia, Zhining

    2011-01-01

    The chemical structures of 114 polycyclic aromatic sulfur heterocycles (PASHs) have been studied using the molecular electronegativity-distance vector (MEDV). The linear relationships between the gas chromatographic retention index and the MEDV have been established by a multiple linear regression (MLR) model. Variable selection by stepwise multiple regression (SMR) and appraisal of the predictive ability of the optimized model by leave-one-out cross-validation showed that the optimized model, with a correlation coefficient (R) of 0.9947 and a cross-validated correlation coefficient (Rcv) of 0.9940, possessed the best statistical quality. Furthermore, when the 114 PASH compounds were divided into calibration and test sets in a ratio of 2:1, the statistical analysis showed that our models possess almost equal statistical quality, very similar regression coefficients and good robustness. The quantitative structure-retention relationship (QSRR) model established here may provide a convenient and powerful method for predicting the gas chromatographic retention of PASHs.

  3. Global Sensitivity Analysis of Environmental Systems via Multiple Indices based on Statistical Moments of Model Outputs

    NASA Astrophysics Data System (ADS)

    Guadagnini, A.; Riva, M.; Dell'Oca, A.

    2017-12-01

    We propose to ground sensitivity of uncertain parameters of environmental models on a set of indices based on the main (statistical) moments, i.e., mean, variance, skewness and kurtosis, of the probability density function (pdf) of a target model output. This enables us to perform Global Sensitivity Analysis (GSA) of a model in terms of multiple statistical moments and yields a quantification of the impact of model parameters on features driving the shape of the pdf of model output. Our GSA approach includes the possibility of being coupled with the construction of a reduced complexity model that allows approximating the full model response at a reduced computational cost. We demonstrate our approach through a variety of test cases. These include a commonly used analytical benchmark, a simplified model representing pumping in a coastal aquifer, a laboratory-scale tracer experiment, and the migration of fracturing fluid through a naturally fractured reservoir (source) to reach an overlying formation (target). Our strategy allows discriminating the relative importance of model parameters to the four statistical moments considered. We also provide an appraisal of the error associated with the evaluation of our sensitivity metrics by replacing the original system model through the selected surrogate model. Our results suggest that one might need to construct a surrogate model with increasing level of accuracy depending on the statistical moment considered in the GSA. The methodological framework we propose can assist the development of analysis techniques targeted to model calibration, design of experiment, uncertainty quantification and risk assessment.
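
    One simple way to operationalise the idea is sketched below; it is an illustration, not necessarily the authors' exact indices. The parameters are sampled, each parameter is binned, and the sensitivity measure for each moment is how far the conditional mean, variance, skewness and kurtosis of the output depart (on average, and scaled by the unconditional value) from their unconditional counterparts. The toy model, bin count and sample size are arbitrary choices.

```python
# Moment-based global sensitivity measures via conditional binning.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n, n_bins = 20000, 20
x1, x2, x3 = rng.uniform(-1, 1, (3, n))
y = np.exp(x1) + 0.5 * x2 ** 2 + 0.1 * x3 + 0.05 * rng.normal(size=n)  # toy model

def moments(v):
    return np.array([v.mean(), v.var(), stats.skew(v), stats.kurtosis(v)])

uncond = moments(y)

def moment_sensitivity(x):
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.digitize(x, edges[1:-1]), 0, n_bins - 1)
    cond = np.array([moments(y[idx == b]) for b in range(n_bins)])
    # Mean absolute deviation of conditional moments, scaled by the
    # unconditional moment.
    return np.mean(np.abs(cond - uncond), axis=0) / np.abs(uncond)

for name, x in [("x1", x1), ("x2", x2), ("x3", x3)]:
    print(name, "mean/var/skew/kurt sensitivity:", np.round(moment_sensitivity(x), 2))
```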

  4. Testing Mediation Using Multiple Regression and Structural Equation Modeling Analyses in Secondary Data

    ERIC Educational Resources Information Center

    Li, Spencer D.

    2011-01-01

    Mediation analysis in child and adolescent development research is possible using large secondary data sets. This article provides an overview of two statistical methods commonly used to test mediated effects in secondary analysis: multiple regression and structural equation modeling (SEM). Two empirical studies are presented to illustrate the…

  5. A Comparison of Latent Growth Models for Constructs Measured by Multiple Items

    ERIC Educational Resources Information Center

    Leite, Walter L.

    2007-01-01

    Univariate latent growth modeling (LGM) of composites of multiple items (e.g., item means or sums) has been frequently used to analyze the growth of latent constructs. This study evaluated whether LGM of composites yields unbiased parameter estimates, standard errors, chi-square statistics, and adequate fit indexes. Furthermore, LGM was compared…

  6. Conjoint Analysis: A Study of the Effects of Using Person Variables.

    ERIC Educational Resources Information Center

    Fraas, John W.; Newman, Isadore

    Three statistical techniques--conjoint analysis, a multiple linear regression model, and a multiple linear regression model with a surrogate person variable--were used to estimate the relative importance of five university attributes for students in the process of selecting a college. The five attributes include: availability and variety of…

  7. Society of Thoracic Surgeons 2008 cardiac risk models predict in-hospital mortality of heart valve surgery in a Chinese population: a multicenter study.

    PubMed

    Wang, Lv; Lu, Fang-Lin; Wang, Chong; Tan, Meng-Wei; Xu, Zhi-yun

    2014-12-01

    The Society of Thoracic Surgeons 2008 cardiac surgery risk models have been developed for heart valve surgery with and without coronary artery bypass grafting. The aim of our study was to evaluate the performance of the Society of Thoracic Surgeons 2008 cardiac risk models in Chinese patients undergoing single valve surgery and the predicted mortality rates of those undergoing multiple valve surgery derived from the Society of Thoracic Surgeons 2008 risk models. A total of 12,170 patients underwent heart valve surgery from January 2008 to December 2011. Combined congenital heart surgery and aortic surgery cases were excluded. A relatively small number of valve surgery combinations were also excluded. The final research population included the following isolated heart valve surgery types: aortic valve replacement, mitral valve replacement, and mitral valve repair. The following combined valve surgery types were included: mitral valve replacement plus tricuspid valve repair, mitral valve replacement plus aortic valve replacement, and mitral valve replacement plus aortic valve replacement and tricuspid valve repair. Evaluation was performed by using the Hosmer-Lemeshow test and C-statistics. Data from 9846 patients were analyzed. The Society of Thoracic Surgeons 2008 cardiac risk models showed reasonable discrimination and poor calibration (C-statistic, 0.712; P = .00006 in the Hosmer-Lemeshow test). The Society of Thoracic Surgeons 2008 models had better discrimination (C-statistic, 0.734) and calibration (P = .5805) in patients undergoing isolated valve surgery than in patients undergoing multiple valve surgery (C-statistic, 0.694; P = .00002 in the Hosmer-Lemeshow test). Estimates derived from the Society of Thoracic Surgeons 2008 models exceeded the mortality rates of multiple valve surgery (observed/expected ratios of 1.44 for multiple valve surgery and 1.17 for single valve surgery). The Society of Thoracic Surgeons 2008 cardiac surgery risk models performed well when predicting mortality for Chinese patients undergoing valve surgery. The Society of Thoracic Surgeons 2008 models were suitable for single valve surgery in a Chinese population; estimates of mortality for multiple valve surgery derived from the Society of Thoracic Surgeons 2008 models were less accurate. Copyright © 2014 The American Association for Thoracic Surgery. Published by Elsevier Inc. All rights reserved.
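
    The validation metrics named above can be computed with a compact sketch: C-statistic (area under the ROC curve), a Hosmer-Lemeshow goodness-of-fit test on deciles of predicted risk, and the observed/expected ratio. The predicted risks and outcomes below are simulated placeholders, not STS or registry data.

```python
# C-statistic, Hosmer-Lemeshow test and O/E ratio for a risk model.
import numpy as np
from scipy import stats
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)
n = 5000
pred = rng.beta(1, 20, size=n)                      # predicted mortality risks
died = rng.binomial(1, np.clip(pred * 1.2, 0, 1))   # outcomes, slightly miscalibrated

c_stat = roc_auc_score(died, pred)

# Hosmer-Lemeshow: compare observed and expected deaths within risk deciles.
edges = np.quantile(pred, np.linspace(0, 1, 11))
group = np.digitize(pred, edges[1:-1])              # decile index 0..9
hl = 0.0
for g in range(10):
    mask = group == g
    obs, exp_, n_g = died[mask].sum(), pred[mask].sum(), mask.sum()
    hl += (obs - exp_) ** 2 / (exp_ * (1 - exp_ / n_g))
p_hl = stats.chi2.sf(hl, df=8)                      # 10 groups - 2 df

oe_ratio = died.sum() / pred.sum()
print(f"C-statistic = {c_stat:.3f}, HL p = {p_hl:.4f}, O/E = {oe_ratio:.2f}")
```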

  8. The Research of Multiple Attenuation Based on Feedback Iteration and Independent Component Analysis

    NASA Astrophysics Data System (ADS)

    Xu, X.; Tong, S.; Wang, L.

    2017-12-01

    Multiple suppression is a difficult problem in seismic data processing. The traditional technology for multiple attenuation is based on the principle of minimum output energy of the seismic signal; this criterion rests on second-order statistics and cannot achieve multiple attenuation when the primaries and multiples are non-orthogonal. In order to solve this problem, we combine the feedback iteration method based on the wave equation with an improved independent component analysis (ICA) based on higher-order statistics to suppress the multiples. We first use the iterative feedback method to predict the free-surface multiples of each order. Then, in order to match the predicted multiples to the real multiples in amplitude and phase, we design an expanded pseudo-multichannel matching filtering method to obtain a more accurate matching result. Finally, we present an improved FastICA algorithm, based on the maximum non-Gaussianity criterion for the output signal, apply it to the matched multiples, and obtain better separation of the primaries and the multiples. The advantage of our method is that we do not need any a priori information for the prediction of the multiples and can achieve a better separation result. The method has been applied to several synthetic datasets generated by the finite-difference modeling technique and to the Sigsbee2B model multiple data; the primaries and multiples are non-orthogonal in these models. The experiments show that after three to four iterations we can obtain good multiple-prediction results. Using our matching method and FastICA adaptive multiple subtraction, we can not only effectively preserve the energy of the effective (primary) waves in the seismic records, but also effectively suppress the free-surface multiples, especially the multiples related to the middle and deep areas.

  9. Equilibrium statistical-thermal models in high-energy physics

    NASA Astrophysics Data System (ADS)

    Tawfik, Abdel Nasser

    2014-05-01

    We review some recent highlights from the applications of statistical-thermal models to different experimental measurements and to lattice QCD thermodynamics made during the last decade. We start with a short review of the historical milestones on the path to constructing statistical-thermal models for heavy-ion physics. We discovered that Heinz Koppe formulated, in 1948, an almost complete recipe for the statistical-thermal models. In 1950, Enrico Fermi generalized this statistical approach: he started with a general cross-section formula and inserted into it simplifying assumptions about the matrix element of the interaction process, which likely reflect many features of high-energy reactions dominated by the density of final states in phase space. In 1964, Hagedorn systematically analyzed high-energy phenomena using all the tools of statistical physics and introduced the concept of a limiting temperature based on the statistical bootstrap model. It turns out that many-particle systems can quite often be studied with the help of statistical-thermal methods. The analysis of yield multiplicities in high-energy collisions gives overwhelming evidence for chemical equilibrium in the final state. Strange particles might be an exception, as they are suppressed at lower beam energies; however, their relative yields fulfill statistical equilibrium as well. We review the equilibrium statistical-thermal models for particle production, fluctuations and collective flow in heavy-ion experiments. We also review their reproduction of the lattice QCD thermodynamics at vanishing and finite chemical potential. During the last decade, five conditions have been suggested to describe the universal behavior of the chemical freeze-out parameters. The higher-order moments of multiplicity have also been discussed; they offer deep insights into particle production and critical fluctuations. Therefore, we use them to describe the freeze-out parameters and to suggest the location of the QCD critical endpoint. Various extensions have been proposed in order to take into consideration possible deviations from the ideal hadron gas. We highlight various types of interactions, dissipative properties and location dependences (spatial rapidity). Furthermore, we review three models combining hadronic with partonic phases: the quasi-particle model, the linear sigma model with Polyakov potentials, and the compressible bag model.

  10. Multiple-Point statistics for stochastic modeling of aquifers, where do we stand?

    NASA Astrophysics Data System (ADS)

    Renard, P.; Julien, S.

    2017-12-01

    In the last 20 years, multiple-point statistics have been the focus of much research, with both successes and disappointments. The aim of this geostatistical approach was to integrate geological information into stochastic models of aquifer heterogeneity to better represent the connectivity of high- or low-permeability structures in the subsurface. Many different algorithms (ENESIM, SNESIM, SIMPAT, CCSIM, QUILTING, IMPALA, DEESSE, FILTERSIM, HYPPS, etc.) have been, and are still being, proposed. They are all based on the concept of a training data set from which spatial statistics are derived and then used in a further step to generate conditional realizations. Some of these algorithms evaluate the statistics of the spatial patterns for every pixel, while other techniques consider the statistics at the scale of a patch or a tile. While the method has clearly succeeded in enabling modelers to generate realistic models, several issues are still the topic of debate from both a practical and a theoretical point of view, and some issues, such as training data set availability, often hinder the application of the method in practical situations. In this talk, the aim is to present a review of the status of these approaches from both a theoretical and a practical point of view, using several examples at different scales (from pore networks to regional aquifers).

  11. Validation of sea ice models using an uncertainty-based distance metric for multiple model variables: NEW METRIC FOR SEA ICE MODEL VALIDATION

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Urrego-Blanco, Jorge R.; Hunke, Elizabeth C.; Urban, Nathan M.

    Here, we implement a variance-based distance metric (Dn) to objectively assess skill of sea ice models when multiple output variables or uncertainties in both model predictions and observations need to be considered. The metric compares observations and model data pairs on common spatial and temporal grids, improving upon highly aggregated metrics (e.g., total sea ice extent or volume) by capturing the spatial character of model skill. The Dn metric is a gamma-distributed statistic that is more general than the χ2 statistic commonly used to assess model fit, which requires the assumption that the model is unbiased and can only incorporate observational error in the analysis. The Dn statistic does not assume that the model is unbiased, and allows the incorporation of multiple observational data sets for the same variable and simultaneously for different variables, along with different types of variances that can characterize uncertainties in both observations and the model. This approach represents a step to establish a systematic framework for probabilistic validation of sea ice models. The methodology is also useful for model tuning by using the Dn metric as a cost function and incorporating model parametric uncertainty as part of a scheme to optimize model functionality. We apply this approach to evaluate different configurations of the standalone Los Alamos sea ice model (CICE) encompassing the parametric uncertainty in the model, and to find new sets of model configurations that produce better agreement than previous configurations between model and observational estimates of sea ice concentration and thickness.
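
    As a rough illustration only, and not the authors' exact formulation of Dn, a variance-weighted distance between model and observation fields can be accumulated cell by cell; the field shapes and variance values below are synthetic placeholders.

        import numpy as np

        # Sketch of a variance-based model-observation distance: squared differences
        # weighted by the combined (model + observational) variance at each grid cell.
        rng = np.random.default_rng(1)
        obs = rng.uniform(0.0, 1.0, size=(50, 50))            # e.g. observed sea ice concentration
        model = obs + rng.normal(0.0, 0.1, size=obs.shape)     # a model field with some error

        var_obs = np.full(obs.shape, 0.05 ** 2)     # observational error variance
        var_model = np.full(obs.shape, 0.08 ** 2)   # model (e.g. parametric/ensemble) variance

        n = obs.size
        d_n = np.sum((model - obs) ** 2 / (var_obs + var_model)) / n
        print(f"normalized distance D_n ~ {d_n:.2f}")  # smaller means closer agreement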

  12. Validation of sea ice models using an uncertainty-based distance metric for multiple model variables: NEW METRIC FOR SEA ICE MODEL VALIDATION

    DOE PAGES

    Urrego-Blanco, Jorge R.; Hunke, Elizabeth C.; Urban, Nathan M.; ...

    2017-04-01

    Here, we implement a variance-based distance metric (Dn) to objectively assess skill of sea ice models when multiple output variables or uncertainties in both model predictions and observations need to be considered. The metric compares observations and model data pairs on common spatial and temporal grids, improving upon highly aggregated metrics (e.g., total sea ice extent or volume) by capturing the spatial character of model skill. The Dn metric is a gamma-distributed statistic that is more general than the χ2 statistic commonly used to assess model fit, which requires the assumption that the model is unbiased and can only incorporate observational error in the analysis. The Dn statistic does not assume that the model is unbiased, and allows the incorporation of multiple observational data sets for the same variable and simultaneously for different variables, along with different types of variances that can characterize uncertainties in both observations and the model. This approach represents a step to establish a systematic framework for probabilistic validation of sea ice models. The methodology is also useful for model tuning by using the Dn metric as a cost function and incorporating model parametric uncertainty as part of a scheme to optimize model functionality. We apply this approach to evaluate different configurations of the standalone Los Alamos sea ice model (CICE) encompassing the parametric uncertainty in the model, and to find new sets of model configurations that produce better agreement than previous configurations between model and observational estimates of sea ice concentration and thickness.

  13. Transfer Student Success: Educationally Purposeful Activities Predictive of Undergraduate GPA

    ERIC Educational Resources Information Center

    Fauria, Renee M.; Fuller, Matthew B.

    2015-01-01

    Researchers evaluated the effects of Educationally Purposeful Activities (EPAs) on transfer and nontransfer students' cumulative GPAs. Hierarchical, linear, and multiple regression models yielded seven statistically significant educationally purposeful items that influenced undergraduate student GPAs. Statistically significant positive EPAs for…

  14. A Meta-Meta-Analysis: Empirical Review of Statistical Power, Type I Error Rates, Effect Sizes, and Model Selection of Meta-Analyses Published in Psychology

    ERIC Educational Resources Information Center

    Cafri, Guy; Kromrey, Jeffrey D.; Brannick, Michael T.

    2010-01-01

    This article uses meta-analyses published in "Psychological Bulletin" from 1995 to 2005 to describe meta-analyses in psychology, including examination of statistical power, Type I errors resulting from multiple comparisons, and model choice. Retrospective power estimates indicated that univariate categorical and continuous moderators, individual…

  15. A new statistical method for transfer coefficient calculations in the framework of the general multiple-compartment model of transport for radionuclides in biological systems.

    PubMed

    Garcia, F; Arruda-Neto, J D; Manso, M V; Helene, O M; Vanin, V R; Rodriguez, O; Mesa, J; Likhachev, V P; Filho, J W; Deppman, A; Perez, G; Guzman, F; de Camargo, S P

    1999-10-01

    A new and simple statistical procedure (STATFLUX) for the calculation of transfer coefficients of radionuclide transport to animals and plants is proposed. The method is based on the general multiple-compartment model, which uses a system of linear equations involving geometrical volume considerations. Using experimentally available curves of radionuclide concentration versus time for each animal compartment (organ), flow parameters were estimated by a least-squares procedure whose consistency is tested. Some numerical results are presented in order to compare the STATFLUX transfer coefficients with those from other works and with experimental data.
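
    A minimal sketch of the underlying idea, assuming a simple two-compartment linear model with one transfer and one elimination rate; the rate values and "measured" curve are synthetic and this is not the STATFLUX procedure itself.

        import numpy as np
        from scipy.optimize import curve_fit

        # Two-compartment linear transport: the source compartment empties with rate
        # k_out, the target compartment fills from the source and is eliminated with
        # rate k_elim.  Analytical solution for the target-compartment concentration
        # (c0 = initial source concentration).
        def target_conc(t, k_out, k_elim, c0=1.0):
            return c0 * k_out / (k_elim - k_out) * (np.exp(-k_out * t) - np.exp(-k_elim * t))

        # Synthetic "measured" concentration-versus-time curve with noise.
        rng = np.random.default_rng(2)
        t_obs = np.linspace(0.5, 30.0, 25)
        true_k_out, true_k_elim = 0.4, 0.12
        y_obs = target_conc(t_obs, true_k_out, true_k_elim) + rng.normal(0, 0.01, t_obs.size)

        # Least-squares estimation of the transfer coefficients.
        popt, pcov = curve_fit(target_conc, t_obs, y_obs, p0=[0.3, 0.2])
        print("estimated k_out, k_elim:", popt)
        print("parameter standard errors:", np.sqrt(np.diag(pcov)))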

  16. Statistical hadronization with exclusive channels in e+e- annihilation

    DOE PAGES

    Ferroni, L.; Becattini, F.

    2012-01-01

    We present a systematic analysis of exclusive hadronic channels in e+e- collisions at centre-of-mass energies between 2.1 and 2.6 GeV within the statistical hadronization model. Because of the low multiplicities involved, calculations have been carried out in the full microcanonical ensemble, including conservation of energy-momentum, angular momentum, parity, isospin, and all relevant charges. We show that the data are in overall good agreement with the model for an energy density of about 0.5 GeV/fm3 and an extra strangeness suppression parameter γS ≃ 0.7, essentially the same values found with fits to inclusive multiplicities at higher energy.

  17. Modeling the Development of Audiovisual Cue Integration in Speech Perception

    PubMed Central

    Getz, Laura M.; Nordeen, Elke R.; Vrabic, Sarah C.; Toscano, Joseph C.

    2017-01-01

    Adult speech perception is generally enhanced when information is provided from multiple modalities. In contrast, infants do not appear to benefit from combining auditory and visual speech information early in development. This is true despite the fact that both modalities are important to speech comprehension even at early stages of language acquisition. How then do listeners learn how to process auditory and visual information as part of a unified signal? In the auditory domain, statistical learning processes provide an excellent mechanism for acquiring phonological categories. Is this also true for the more complex problem of acquiring audiovisual correspondences, which require the learner to integrate information from multiple modalities? In this paper, we present simulations using Gaussian mixture models (GMMs) that learn cue weights and combine cues on the basis of their distributional statistics. First, we simulate the developmental process of acquiring phonological categories from auditory and visual cues, asking whether simple statistical learning approaches are sufficient for learning multi-modal representations. Second, we use this time course information to explain audiovisual speech perception in adult perceivers, including cases where auditory and visual input are mismatched. Overall, we find that domain-general statistical learning techniques allow us to model the developmental trajectory of audiovisual cue integration in speech, and in turn, allow us to better understand the mechanisms that give rise to unified percepts based on multiple cues. PMID:28335558
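
    A toy version of the distributional-learning idea can be sketched with a Gaussian mixture fit to paired auditory/visual cue values. The cue dimensions, category parameters, and variable names below are invented for illustration and are far simpler than the models in the paper; only the scikit-learn GaussianMixture estimator is a real library component.

        import numpy as np
        from sklearn.mixture import GaussianMixture

        # Toy audiovisual cue data: two phonological categories, each producing a
        # correlated auditory cue (VOT-like) and visual cue (lip-aperture-like).
        rng = np.random.default_rng(3)
        cat_a = rng.multivariate_normal([10.0, 0.2], [[4.0, 0.1], [0.1, 0.01]], size=500)
        cat_b = rng.multivariate_normal([40.0, 0.6], [[9.0, 0.2], [0.2, 0.02]], size=500)
        cues = np.vstack([cat_a, cat_b])

        # Unsupervised GMM "learner": discovers the two categories from the joint
        # distribution of the cues, without any category labels.
        gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(cues)
        print("learned category means:\n", gmm.means_)

        # Posterior category probabilities for a new, mismatched audiovisual token.
        token = np.array([[38.0, 0.25]])   # auditory cue favors one category, visual cue the other
        print("category posteriors:", gmm.predict_proba(token).round(3))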

  18. Modeling the Development of Audiovisual Cue Integration in Speech Perception.

    PubMed

    Getz, Laura M; Nordeen, Elke R; Vrabic, Sarah C; Toscano, Joseph C

    2017-03-21

    Adult speech perception is generally enhanced when information is provided from multiple modalities. In contrast, infants do not appear to benefit from combining auditory and visual speech information early in development. This is true despite the fact that both modalities are important to speech comprehension even at early stages of language acquisition. How then do listeners learn how to process auditory and visual information as part of a unified signal? In the auditory domain, statistical learning processes provide an excellent mechanism for acquiring phonological categories. Is this also true for the more complex problem of acquiring audiovisual correspondences, which require the learner to integrate information from multiple modalities? In this paper, we present simulations using Gaussian mixture models (GMMs) that learn cue weights and combine cues on the basis of their distributional statistics. First, we simulate the developmental process of acquiring phonological categories from auditory and visual cues, asking whether simple statistical learning approaches are sufficient for learning multi-modal representations. Second, we use this time course information to explain audiovisual speech perception in adult perceivers, including cases where auditory and visual input are mismatched. Overall, we find that domain-general statistical learning techniques allow us to model the developmental trajectory of audiovisual cue integration in speech, and in turn, allow us to better understand the mechanisms that give rise to unified percepts based on multiple cues.

  19. Advanced statistics: linear regression, part II: multiple linear regression.

    PubMed

    Marill, Keith A

    2004-01-01

    The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
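
    As a generic illustration of the technique discussed, and not an example taken from the article itself, the sketch below fits a multiple linear regression with an interaction term using statsmodels; the outcome and predictor names are synthetic.

        import numpy as np
        import statsmodels.api as sm

        # Synthetic clinical-style data: the outcome depends on two predictors and
        # their interaction.
        rng = np.random.default_rng(4)
        n = 200
        age = rng.uniform(20, 80, n)
        dose = rng.uniform(0, 10, n)
        outcome = 5 + 0.3 * age + 1.2 * dose + 0.05 * age * dose + rng.normal(0, 5, n)

        # Design matrix with an intercept and an interaction column.
        X = sm.add_constant(np.column_stack([age, dose, age * dose]))

        model = sm.OLS(outcome, X).fit()
        print(model.params)        # estimated regression coefficients
        print(model.conf_int())    # exact confidence intervals for the coefficients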

  20. Efficient exploration of pan-cancer networks by generalized covariance selection and interactive web content

    PubMed Central

    Kling, Teresia; Johansson, Patrik; Sanchez, José; Marinescu, Voichita D.; Jörnsten, Rebecka; Nelander, Sven

    2015-01-01

    Statistical network modeling techniques are increasingly important tools to analyze cancer genomics data. However, current tools and resources are not designed to work across multiple diagnoses and technical platforms, thus limiting their applicability to comprehensive pan-cancer datasets such as The Cancer Genome Atlas (TCGA). To address this, we describe a new data driven modeling method, based on generalized Sparse Inverse Covariance Selection (SICS). The method integrates genetic, epigenetic and transcriptional data from multiple cancers, to define links that are present in multiple cancers, a subset of cancers, or a single cancer. It is shown to be statistically robust and effective at detecting direct pathway links in data from TCGA. To facilitate interpretation of the results, we introduce a publicly accessible tool (cancerlandscapes.org), in which the derived networks are explored as interactive web content, linked to several pathway and pharmacological databases. To evaluate the performance of the method, we constructed a model for eight TCGA cancers, using data from 3900 patients. The model rediscovered known mechanisms and contained interesting predictions. Possible applications include prediction of regulatory relationships, comparison of network modules across multiple forms of cancer and identification of drug targets. PMID:25953855

  1. Targeted versus statistical approaches to selecting parameters for modelling sediment provenance

    NASA Astrophysics Data System (ADS)

    Laceby, J. Patrick

    2017-04-01

    One effective field-based approach to modelling sediment provenance is the source fingerprinting technique. Arguably, one of the most important steps for this approach is selecting the appropriate suite of parameters or fingerprints used to model source contributions. Accordingly, approaches to selecting parameters for sediment source fingerprinting will be reviewed. Thereafter, opportunities and limitations of these approaches and some future research directions will be presented. For properties to be effective tracers of sediment, they must discriminate between sources whilst behaving conservatively. Conservative behavior is characterized by constancy in sediment properties, where the properties of sediment sources remain constant, or at the very least, any variation in these properties should occur in a predictable and measurable way. Therefore, properties selected for sediment source fingerprinting should remain constant through sediment detachment, transportation and deposition processes, or vary in a predictable and measurable way. One approach to selecting conservative properties for sediment source fingerprinting is to identify targeted tracers, such as caesium-137, that provide specific source information (e.g. surface versus subsurface origins). A second approach is to use statistical tests to select an optimal suite of conservative properties capable of modelling sediment provenance. In general, statistical approaches use a combination of discrimination statistics (e.g. Kruskal-Wallis H-test, Mann-Whitney U-test) and parameter selection statistics (e.g. Discriminant Function Analysis or Principal Component Analysis). The challenge is that modelling sediment provenance is often not straightforward and there is increasing debate in the literature surrounding the most appropriate approach to selecting elements for modelling. Moving forward, it would be beneficial if researchers tested their results with multiple modelling approaches, artificial mixtures, and multiple lines of evidence to provide secondary support to their initial modelling results. Indeed, element selection can greatly impact modelling results, and having multiple lines of evidence will help provide confidence when modelling sediment provenance.
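
    A minimal sketch of the statistical screening step, assuming three hypothetical sources and synthetic tracer concentrations; the Kruskal-Wallis screen shown here is a generic version of the discrimination stage, not a specific published workflow, and a discriminant analysis would typically follow.

        import numpy as np
        from scipy.stats import kruskal

        # Synthetic tracer concentrations measured in samples from three sediment sources.
        rng = np.random.default_rng(5)
        sources = {
            "surface":    rng.normal([12.0, 3.0, 50.0], [1.0, 0.5, 8.0], size=(30, 3)),
            "subsurface": rng.normal([12.5, 6.0, 49.0], [1.0, 0.5, 8.0], size=(30, 3)),
            "channel":    rng.normal([12.2, 4.5, 80.0], [1.0, 0.5, 8.0], size=(30, 3)),
        }
        tracer_names = ["tracer_A", "tracer_B", "tracer_C"]

        # Keep only tracers whose concentrations differ significantly between sources
        # (Kruskal-Wallis H-test).
        selected = []
        for j, name in enumerate(tracer_names):
            groups = [samples[:, j] for samples in sources.values()]
            h, p = kruskal(*groups)
            print(f"{name}: H = {h:.1f}, p = {p:.3g}")
            if p < 0.05:
                selected.append(name)
        print("tracers retained for modelling:", selected)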

  2. Meta-analysis of quantitative pleiotropic traits for next-generation sequencing with multivariate functional linear models

    PubMed Central

    Chiu, Chi-yang; Jung, Jeesun; Chen, Wei; Weeks, Daniel E; Ren, Haobo; Boehnke, Michael; Amos, Christopher I; Liu, Aiyi; Mills, James L; Ting Lee, Mei-ling; Xiong, Momiao; Fan, Ruzong

    2017-01-01

    To analyze next-generation sequencing data, multivariate functional linear models are developed for a meta-analysis of multiple studies to connect genetic variant data to multiple quantitative traits adjusting for covariates. The goal is to take advantage of both meta-analysis and pleiotropic analysis in order to improve power and to carry out a unified association analysis of multiple studies and multiple traits of complex disorders. Three types of approximate F-distributions based on Pillai–Bartlett trace, Hotelling–Lawley trace, and Wilks's Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants. Simulation analysis is performed to evaluate false-positive rates and power of the proposed tests. The proposed methods are applied to analyze lipid traits in eight European cohorts. It is shown that it is more advantageous to perform multivariate analysis than univariate analysis in general, and it is more advantageous to perform meta-analysis of multiple studies instead of analyzing the individual studies separately. The proposed models require individual observations. The value of the current paper can be seen for at least two reasons: (a) the proposed methods can be applied to studies that have individual genotype data; (b) the proposed methods can be used as a criterion for future work that uses summary statistics to build test statistics to meta-analyze the data. PMID:28000696
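
    To illustrate the kind of multivariate test statistics mentioned (Pillai's trace, Hotelling-Lawley trace, Wilks' lambda), the sketch below runs a MANOVA of two synthetic lipid-like traits on a single variant genotype with statsmodels. It is a generic illustration under invented data, not the functional linear model of the paper.

        import numpy as np
        import pandas as pd
        from statsmodels.multivariate.manova import MANOVA

        # Synthetic data: one genetic variant (0/1/2 copies) with a small effect on
        # two correlated quantitative traits, plus an age covariate.
        rng = np.random.default_rng(6)
        n = 500
        geno = rng.integers(0, 3, n)
        age = rng.uniform(30, 70, n)
        trait1 = 0.3 * geno + 0.02 * age + rng.normal(0, 1, n)
        trait2 = 0.2 * geno + 0.01 * age + 0.5 * trait1 + rng.normal(0, 1, n)
        df = pd.DataFrame({"geno": geno, "age": age, "t1": trait1, "t2": trait2})

        # MANOVA of both traits on genotype, adjusting for the covariate; the output
        # reports Wilks' lambda, Pillai's trace, Hotelling-Lawley trace, and Roy's root.
        fit = MANOVA.from_formula("t1 + t2 ~ geno + age", data=df)
        print(fit.mv_test())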

  3. Meta-analysis of quantitative pleiotropic traits for next-generation sequencing with multivariate functional linear models.

    PubMed

    Chiu, Chi-Yang; Jung, Jeesun; Chen, Wei; Weeks, Daniel E; Ren, Haobo; Boehnke, Michael; Amos, Christopher I; Liu, Aiyi; Mills, James L; Ting Lee, Mei-Ling; Xiong, Momiao; Fan, Ruzong

    2017-02-01

    To analyze next-generation sequencing data, multivariate functional linear models are developed for a meta-analysis of multiple studies to connect genetic variant data to multiple quantitative traits adjusting for covariates. The goal is to take advantage of both meta-analysis and pleiotropic analysis in order to improve power and to carry out a unified association analysis of multiple studies and multiple traits of complex disorders. Three types of approximate F-distributions based on Pillai-Bartlett trace, Hotelling-Lawley trace, and Wilks's Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants. Simulation analysis is performed to evaluate false-positive rates and power of the proposed tests. The proposed methods are applied to analyze lipid traits in eight European cohorts. It is shown that it is more advantageous to perform multivariate analysis than univariate analysis in general, and it is more advantageous to perform meta-analysis of multiple studies instead of analyzing the individual studies separately. The proposed models require individual observations. The value of the current paper can be seen for at least two reasons: (a) the proposed methods can be applied to studies that have individual genotype data; (b) the proposed methods can be used as a criterion for future work that uses summary statistics to build test statistics to meta-analyze the data.

  4. Association analysis of multiple traits by an approach of combining P values.

    PubMed

    Chen, Lili; Wang, Yong; Zhou, Yajing

    2018-03-01

    Increasing evidence shows that one variant can affect multiple traits, a widespread phenomenon in complex diseases. Joint analysis of multiple traits can increase the statistical power of association analysis and uncover the underlying genetic mechanism. Although there are many statistical methods for analysing multiple traits, most of them are suitable for detecting common variants associated with multiple traits. However, because of the low minor allele frequencies of rare variants, these methods are not optimal for rare variant association analysis. In this paper, we extend an adaptive combination of P values method (termed ADA) for a single trait to test association between multiple traits and rare variants in a given region. For a given region, we use a reverse regression model to test each rare variant for association with multiple traits and obtain the P value of the single-variant test. Further, we take a weighted combination of these P values as the test statistic. Extensive simulation studies show that our approach is more powerful than several other comparison methods in most cases and is robust to the inclusion of a high proportion of neutral variants and to different directions of effect of the causal variants.
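
    A rough sketch of the combination idea, assuming per-variant P values have already been obtained (here from simple per-variant regressions standing in for the reverse-regression test); the weights and truncation threshold are illustrative choices, not the exact ADA statistic.

        import numpy as np
        from scipy import stats

        # Synthetic region: 20 rare variants, two correlated traits, a few causal variants.
        rng = np.random.default_rng(7)
        n, m = 1000, 20
        maf = rng.uniform(0.005, 0.02, m)
        geno = rng.binomial(2, maf, size=(n, m))
        causal = np.zeros(m)
        causal[:3] = 0.8
        trait1 = geno @ causal + rng.normal(0, 1, n)

        # Per-variant P values: a simple regression of each variant on one trait
        # stands in for the multi-trait reverse-regression test of the paper.
        pvals = np.array([stats.linregress(trait1, geno[:, j]).pvalue for j in range(m)])

        # Weighted combination of the smallest P values (truncation at 0.10 is an
        # illustrative choice); a larger statistic suggests a stronger regional signal.
        weights = 1.0 / np.sqrt(maf * (1 - maf))
        mask = pvals < 0.10
        test_stat = np.sum(weights[mask] * -np.log(pvals[mask]))
        print(f"combined test statistic: {test_stat:.2f} using {mask.sum()} variants")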

  5. Multivariate space-time modelling of multiple air pollutants and their health effects accounting for exposure uncertainty.

    PubMed

    Huang, Guowen; Lee, Duncan; Scott, E Marian

    2018-03-30

    The long-term health effects of air pollution are often estimated using a spatio-temporal ecological areal unit study, but this design leads to the following statistical challenges: (1) how to estimate spatially representative pollution concentrations for each areal unit; (2) how to allow for the uncertainty in these estimated concentrations when estimating their health effects; and (3) how to simultaneously estimate the joint effects of multiple correlated pollutants. This article proposes a novel 2-stage Bayesian hierarchical model for addressing these 3 challenges, with inference based on Markov chain Monte Carlo simulation. The first stage is a multivariate spatio-temporal fusion model for predicting areal level average concentrations of multiple pollutants from both monitored and modelled pollution data. The second stage is a spatio-temporal model for estimating the health impact of multiple correlated pollutants simultaneously, which accounts for the uncertainty in the estimated pollution concentrations. The novel methodology is motivated by a new study of the impact of both particulate matter and nitrogen dioxide concentrations on respiratory hospital admissions in Scotland between 2007 and 2011, and the results suggest that both pollutants exhibit substantial and independent health effects. © 2017 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  6. Statistical hadronization and microcanonical ensemble

    DOE PAGES

    Becattini, F.; Ferroni, L.

    2004-01-01

    We present a Monte Carlo calculation of the microcanonical ensemble of the ideal hadron-resonance gas, including all known states up to a mass of 1.8 GeV and taking into account quantum statistics. The computing method is a development of a previous one based on a Metropolis Monte Carlo algorithm, with the grand-canonical limit of the multi-species multiplicity distribution as the proposal matrix. The microcanonical average multiplicities of the various hadron species are found to converge to the canonical ones for moderately low values of the total energy. This algorithm opens the way for event generators based on the statistical hadronization model.

  7. Multiplicative Modeling of Children's Growth and Its Statistical Properties

    NASA Astrophysics Data System (ADS)

    Kuninaka, Hiroto; Matsushita, Mitsugu

    2014-03-01

    We develop a numerical growth model that can predict the statistical properties of the height distribution of Japanese children. Our previous studies have clarified that the height distribution of schoolchildren shows a transition from the lognormal distribution to the normal distribution during puberty. In this study, we demonstrate by simulation that the transition occurs owing to the variability of the onset of puberty.
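
    The qualitative point, that multiplicative growth increments produce a lognormal height distribution while additive increments produce a normal one, can be sketched with a few lines of simulation. The increment distributions below are arbitrary illustrations, not the paper's calibrated growth model.

        import numpy as np
        from scipy import stats

        # Simulate growth of 10,000 "children" over 100 steps.
        rng = np.random.default_rng(8)
        n_children, n_steps = 10_000, 100

        # Multiplicative growth: height is multiplied by a random factor each step.
        mult = 100.0 * np.prod(rng.normal(1.005, 0.01, size=(n_children, n_steps)), axis=1)

        # Additive growth: a random increment is added each step.
        add = 100.0 + np.sum(rng.normal(0.5, 1.0, size=(n_children, n_steps)), axis=1)

        # Normality tests on the raw and log-transformed heights; the multiplicative
        # process is expected to look normal only after taking logs.
        print("multiplicative, raw heights :", stats.normaltest(mult).pvalue)
        print("multiplicative, log heights :", stats.normaltest(np.log(mult)).pvalue)
        print("additive, raw heights       :", stats.normaltest(add).pvalue)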

  8. Spatial Statistical and Modeling Strategy for Inventorying and Monitoring Ecosystem Resources at Multiple Scales and Resolution Levels

    Treesearch

    Robin M. Reich; C. Aguirre-Bravo; M.S. Williams

    2006-01-01

    A statistical strategy for spatial estimation and modeling of natural and environmental resource variables and indicators is presented. This strategy is part of an inventory and monitoring pilot study that is being carried out in the Mexican states of Jalisco and Colima. Fine spatial resolution estimates of key variables and indicators are outputs that will allow the...

  9. Probabilistic inversion with graph cuts: Application to the Boise Hydrogeophysical Research Site

    NASA Astrophysics Data System (ADS)

    Pirot, Guillaume; Linde, Niklas; Mariethoz, Grégoire; Bradford, John H.

    2017-02-01

    Inversion methods that build on multiple-point statistics tools offer the possibility to obtain model realizations that are not only in agreement with field data, but also with conceptual geological models that are represented by training images. A recent inversion approach based on patch-based geostatistical resimulation using graph cuts outperforms state-of-the-art multiple-point statistics methods when applied to synthetic inversion examples featuring continuous and discontinuous property fields. Applications of multiple-point statistics tools to field data are challenging due to inevitable discrepancies between actual subsurface structure and the assumptions made in deriving the training image. We introduce several amendments to the original graph cut inversion algorithm and present a first-ever field application by addressing porosity estimation at the Boise Hydrogeophysical Research Site, Boise, Idaho. We consider both a classical multi-Gaussian and an outcrop-based prior model (training image) that are in agreement with available porosity data. When conditioning to available crosshole ground-penetrating radar data using Markov chain Monte Carlo, we find that the posterior realizations honor overall both the characteristics of the prior models and the geophysical data. The porosity field is inverted jointly with the measurement error and the petrophysical parameters that link dielectric permittivity to porosity. Even though the multi-Gaussian prior model leads to posterior realizations with higher likelihoods, the outcrop-based prior model shows better convergence. In addition, it offers geologically more realistic posterior realizations and it better preserves the full porosity range of the prior.

  10. The FORE-SCE model: a practical approach for projecting land cover change using scenario-based modeling

    USGS Publications Warehouse

    Sohl, Terry L.; Sayler, Kristi L.; Drummond, Mark A.; Loveland, Thomas R.

    2007-01-01

    A wide variety of ecological applications require spatially explicit, historic, current, and projected land use and land cover data. The U.S. Land Cover Trends project is analyzing contemporary (1973–2000) land-cover change in the conterminous United States. The newly developed FORE-SCE model used Land Cover Trends data and theoretical, statistical, and deterministic modeling techniques to project future land cover change through 2020 for multiple plausible scenarios. Projected proportions of future land use were initially developed, and then sited on the lands with the highest potential for supporting that land use and land cover using a statistically based stochastic allocation procedure. Three scenarios of 2020 land cover were mapped for the western Great Plains in the US. The model provided realistic, high-resolution, scenario-based land-cover products suitable for multiple applications, including studies of climate and weather variability, carbon dynamics, and regional hydrology.

  11. Predicting perceptual quality of images in realistic scenario using deep filter banks

    NASA Astrophysics Data System (ADS)

    Zhang, Weixia; Yan, Jia; Hu, Shiyong; Ma, Yang; Deng, Dexiang

    2018-03-01

    Classical image perceptual quality assessment models usually resort to natural scene statistics methods, which are based on the assumption that certain reliable statistical regularities hold for undistorted images and are corrupted by introduced distortions. However, these models usually fail to accurately predict the degradation severity of images in realistic scenarios, since complex, multiple, and interacting authentic distortions usually appear in them. We propose a quality prediction model based on a convolutional neural network. Quality-aware features extracted from the filter banks of multiple convolutional layers are aggregated into the image representation. Furthermore, an easy-to-implement and effective feature selection strategy is used to further refine the image representation, and finally a linear support vector regression model is trained to map image representations to the images' subjective perceptual quality scores. The experimental results on benchmark databases demonstrate the effectiveness and generalizability of the proposed model.

  12. The optimal hormonal replacement modality selection for multiple organ procurement from brain-dead organ donors

    PubMed Central

    Mi, Zhibao; Novitzky, Dimitri; Collins, Joseph F; Cooper, David KC

    2015-01-01

    The management of brain-dead organ donors is complex. The use of inotropic agents and replacement of depleted hormones (hormonal replacement therapy) is crucial for successful multiple organ procurement, yet the optimal hormonal replacement has not been identified, and the statistical adjustment needed to determine the best selection is not trivial. Traditional pair-wise comparisons between every pair of treatments, and multiple comparisons to all (MCA), are statistically conservative. Hsu's multiple comparisons with the best (MCB) – adapted from Dunnett's multiple comparisons with control (MCC) – has been used for selecting the best treatment based on continuous variables. We selected the best hormonal replacement modality for successful multiple organ procurement using a two-step approach. First, we estimated the predicted margins by constructing generalized linear models (GLM) or generalized linear mixed models (GLMM), and then we applied the multiple comparison methods to identify the best hormonal replacement modality, given that the testing of hormonal replacement modalities is independent. Based on 10-year data from the United Network for Organ Sharing (UNOS), among 16 hormonal replacement modalities, and using the 95% simultaneous confidence intervals, we found that the combination of thyroid hormone, a corticosteroid, antidiuretic hormone, and insulin was the best modality for multiple organ procurement for transplantation. PMID:25565890

  13. Congruence analysis of geodetic networks - hypothesis tests versus model selection by information criteria

    NASA Astrophysics Data System (ADS)

    Lehmann, Rüdiger; Lösler, Michael

    2017-12-01

    Geodetic deformation analysis can be interpreted as a model selection problem. The null model indicates that no deformation has occurred; it is opposed to a number of alternative models, which stipulate different deformation patterns. A common way to select the right model is the use of a statistical hypothesis test. However, since we have to test a series of deformation patterns, this must be a multiple test. As an alternative solution to the test problem, we propose the p-value approach. Another approach arises from information theory: here, the Akaike information criterion (AIC) or one of its alternatives is used to select an appropriate model for a given set of observations. Both approaches are discussed and applied to two test scenarios: a synthetic levelling network and the Delft test data set. It is demonstrated that both work but behave differently, sometimes even producing different results. Hypothesis tests are well established in geodesy, but may suffer from an unfavourable choice of the decision error rates. The multiple test also suffers from statistical dependencies between the test statistics, which are neglected. Both problems are overcome by applying information criteria such as the AIC.
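
    A toy sketch of the information-criterion route, using a single network point observed in two epochs: the "no deformation" model (one common height) is compared against a "deformation" model (separate heights per epoch) by AIC. The numbers are synthetic and the Gaussian AIC below is simplified relative to a full network adjustment.

        import numpy as np

        # Synthetic two-epoch height observations for one point: epoch 2 heights are
        # shifted by a small deformation relative to epoch 1.
        rng = np.random.default_rng(9)
        epoch1 = rng.normal(100.000, 0.002, 10)   # 10 repeated height observations [m]
        epoch2 = rng.normal(100.004, 0.002, 10)   # deformed by 4 mm
        obs = np.concatenate([epoch1, epoch2])

        def gaussian_aic(residuals, k):
            """AIC for a Gaussian model with k estimated parameters (variance included)."""
            n = residuals.size
            sigma2 = np.mean(residuals ** 2)
            loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
            return 2 * k - 2 * loglik

        # Null model: one common height for both epochs (no deformation).
        aic_null = gaussian_aic(obs - obs.mean(), k=2)                       # mean + variance

        # Alternative model: separate heights per epoch (deformation allowed).
        res_alt = np.concatenate([epoch1 - epoch1.mean(), epoch2 - epoch2.mean()])
        aic_alt = gaussian_aic(res_alt, k=3)                                 # two means + variance

        print(f"AIC null = {aic_null:.1f}, AIC alternative = {aic_alt:.1f}")
        print("selected model:", "deformation" if aic_alt < aic_null else "no deformation")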

  14. Modelling nitrate pollution pressure using a multivariate statistical approach: the case of Kinshasa groundwater body, Democratic Republic of Congo

    NASA Astrophysics Data System (ADS)

    Mfumu Kihumba, Antoine; Ndembo Longo, Jean; Vanclooster, Marnik

    2016-03-01

    A multivariate statistical modelling approach was applied to explain the anthropogenic pressure of nitrate pollution on the Kinshasa groundwater body (Democratic Republic of Congo). Multiple regression and regression tree models were compared and used to identify major environmental factors that control the groundwater nitrate concentration in this region. The analyses were made in terms of physical attributes related to the topography, land use, geology and hydrogeology in the capture zone of different groundwater sampling stations. For the nitrate data, groundwater datasets from two different surveys were used. The statistical models identified the topography, the residential area, the service land (cemetery), and the surface-water land-use classes as major factors explaining nitrate occurrence in the groundwater. Also, groundwater nitrate pollution depends not on one single factor but on the combined influence of factors representing nitrogen loading sources and aquifer susceptibility characteristics. The groundwater nitrate pressure was better predicted with the regression tree model than with the multiple regression model. Furthermore, the results elucidated the sensitivity of the model performance towards the method of delineation of the capture zones. For pollution modelling at the monitoring points, therefore, it is better to identify capture-zone shapes based on a conceptual hydrogeological model rather than to adopt arbitrary circular capture zones.
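
    As a generic illustration of comparing the two model classes, and not the study's data or predictors, the sketch below fits a multiple linear regression and a regression tree to synthetic nitrate-like data and compares cross-validated scores.

        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.tree import DecisionTreeRegressor
        from sklearn.model_selection import cross_val_score

        # Synthetic capture-zone attributes: slope, fraction of residential land use,
        # fraction of surface water; nitrate responds non-linearly to land use.
        rng = np.random.default_rng(10)
        n = 300
        slope = rng.uniform(0, 15, n)
        residential = rng.uniform(0, 1, n)
        surface_water = rng.uniform(0, 0.3, n)
        nitrate = (10 + 40 * (residential > 0.5) - 0.8 * slope
                   - 20 * surface_water + rng.normal(0, 5, n))
        X = np.column_stack([slope, residential, surface_water])

        # Compare cross-validated R^2 of the two approaches.
        for name, model in [("multiple regression", LinearRegression()),
                            ("regression tree", DecisionTreeRegressor(max_depth=4, random_state=0))]:
            scores = cross_val_score(model, X, nitrate, cv=5, scoring="r2")
            print(f"{name}: mean CV R^2 = {scores.mean():.2f}")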

  15. Using Multilevel Modeling in Language Assessment Research: A Conceptual Introduction

    ERIC Educational Resources Information Center

    Barkaoui, Khaled

    2013-01-01

    This article critiques traditional single-level statistical approaches (e.g., multiple regression analysis) to examining relationships between language test scores and variables in the assessment setting. It highlights the conceptual, methodological, and statistical problems associated with these techniques in dealing with multilevel or nested…

  16. Noise limitations in optical linear algebra processors.

    PubMed

    Batsell, S G; Jong, T L; Walkup, J F; Krile, T F

    1990-05-10

    A general statistical noise model is presented for optical linear algebra processors. A statistical analysis which includes device noise, the multiplication process, and the addition operation is undertaken. We focus on those processes which are architecturally independent. Finally, experimental results which verify the analytical predictions are also presented.

  17. A Constrained Linear Estimator for Multiple Regression

    ERIC Educational Resources Information Center

    Davis-Stober, Clintin P.; Dana, Jason; Budescu, David V.

    2010-01-01

    "Improper linear models" (see Dawes, Am. Psychol. 34:571-582, "1979"), such as equal weighting, have garnered interest as alternatives to standard regression models. We analyze the general circumstances under which these models perform well by recasting a class of "improper" linear models as "proper" statistical models with a single predictor. We…

  18. Can the Site-Frequency Spectrum Distinguish Exponential Population Growth from Multiple-Merger Coalescents?

    PubMed Central

    Eldon, Bjarki; Birkner, Matthias; Blath, Jochen; Freund, Fabian

    2015-01-01

    The ability of the site-frequency spectrum (SFS) to reflect the particularities of gene genealogies exhibiting multiple mergers of ancestral lines as opposed to those obtained in the presence of population growth is our focus. An excess of singletons is a well-known characteristic of both population growth and multiple mergers. Other aspects of the SFS, in particular, the weight of the right tail, are, however, affected in specific ways by the two model classes. Using an approximate likelihood method and minimum-distance statistics, our estimates of statistical power indicate that exponential and algebraic growth can indeed be distinguished from multiple-merger coalescents, even for moderate sample sizes, if the number of segregating sites is high enough. A normalized version of the SFS (nSFS) is also used as a summary statistic in an approximate Bayesian computation (ABC) approach. The results give further positive evidence as to the general eligibility of the SFS to distinguish between the different histories. PMID:25575536

  19. The Effects of Statistical Multiplicity of Infection on Virus Quantification and Infectivity Assays.

    PubMed

    Mistry, Bhaven A; D'Orsogna, Maria R; Chou, Tom

    2018-06-19

    Many biological assays are employed in virology to quantify parameters of interest. Two such classes of assays, virus quantification assays (VQAs) and infectivity assays (IAs), aim to estimate the number of viruses present in a solution and the ability of a viral strain to successfully infect a host cell, respectively. VQAs operate at extremely dilute concentrations, and results can be subject to stochastic variability in virus-cell interactions. At the other extreme, high viral-particle concentrations are used in IAs, resulting in large numbers of viruses infecting each cell, enough for measurable change in total transcription activity. Furthermore, host cells can be infected at any concentration regime by multiple particles, resulting in a statistical multiplicity of infection and yielding potentially significant variability in the assay signal and parameter estimates. We develop probabilistic models for statistical multiplicity of infection at low and high viral-particle-concentration limits and apply them to the plaque (VQA), endpoint dilution (VQA), and luciferase reporter (IA) assays. A web-based tool implementing our models and analysis is also developed and presented. We test our proposed new methods for inferring experimental parameters from data using numerical simulations and show improvement on existing procedures in all limits. Copyright © 2018 Biophysical Society. Published by Elsevier Inc. All rights reserved.
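
    The statistical multiplicity-of-infection idea can be illustrated with a simple Poisson sketch, a standard approximation rather than necessarily the exact model of the paper: given a mean number of infectious particles per cell, the fractions of cells infected by zero, one, or many particles follow directly.

        from scipy.stats import poisson

        # Mean number of infectious particles per cell (MOI-like parameter) in a
        # dilute VQA-style regime and a concentrated IA-style regime (values invented).
        for regime, mean_virions in [("dilute (VQA-like)", 0.1), ("concentrated (IA-like)", 5.0)]:
            p_uninfected = poisson.pmf(0, mean_virions)
            p_single = poisson.pmf(1, mean_virions)
            p_multiple = 1.0 - p_uninfected - p_single
            print(f"{regime}: P(0) = {p_uninfected:.3f}, "
                  f"P(exactly 1) = {p_single:.3f}, P(>=2) = {p_multiple:.3f}")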

  20. DEVELOPMENT OF THE VIRTUAL BEACH MODEL, PHASE 1: AN EMPIRICAL MODEL

    EPA Science Inventory

    With increasing attention focused on the use of multiple linear regression (MLR) modeling of beach fecal bacteria concentration, the validity of the entire statistical process should be carefully evaluated to assure satisfactory predictions. This work aims to identify pitfalls an...

  1. Mathematical and Statistical Software Index.

    DTIC Science & Technology

    1986-08-01

    Excerpted index entries: … (geometric) mean; HMEAN - harmonic mean; MEDIAN - median; MODE - mode; QUANT - quantiles; OGIVE - distribution curve; IQRNG - interpercentile range; RANGE - range; multiphase pivoting algorithm; cross-classification; multiple discriminant analysis; cross-tabulation; multiple-objective model; curve fitting; … Statistics); RANGEX (Correct Correlations for Curtailment of Range); RUMMAGE II (Analysis …

  2. A Review of Meta-Analysis Packages in R

    ERIC Educational Resources Information Center

    Polanin, Joshua R.; Hennessy, Emily A.; Tanner-Smith, Emily E.

    2017-01-01

    Meta-analysis is a statistical technique that allows an analyst to synthesize effect sizes from multiple primary studies. To estimate meta-analysis models, the open-source statistical environment R is quickly becoming a popular choice. The meta-analytic community has contributed to this growth by developing numerous packages specific to…

  3. A Bifactor Approach to Model Multifaceted Constructs in Statistical Mediation Analysis

    ERIC Educational Resources Information Center

    Gonzalez, Oscar; MacKinnon, David P.

    2018-01-01

    Statistical mediation analysis allows researchers to identify the most important mediating constructs in the causal process studied. Identifying specific mediators is especially relevant when the hypothesized mediating construct consists of multiple related facets. The general definition of the construct and its facets might relate differently to…

  4. Using Artificial Neural Networks in Educational Research: Some Comparisons with Linear Statistical Models.

    ERIC Educational Resources Information Center

    Everson, Howard T.; And Others

    This paper explores the feasibility of neural computing methods such as artificial neural networks (ANNs) and abductory induction mechanisms (AIM) for use in educational measurement. ANNs and AIMS methods are contrasted with more traditional statistical techniques, such as multiple regression and discriminant function analyses, for making…

  5. Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses

    PubMed Central

    Bayzid, Md Shamsuzzoha; Mirarab, Siavash; Boussau, Bastien; Warnow, Tandy

    2015-01-01

    Because biological processes can result in different loci having different evolutionary histories, species tree estimation requires multiple loci from across multiple genomes. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause for gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called "summary methods". Because summary methods are generally fast (and much faster than more complicated coalescent-based methods that co-estimate gene trees and species trees), they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate in biologically realistic conditions. Mirarab et al. (Science 2014) presented the "statistical binning" technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple heuristic to evaluate "combinability" and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomic pipeline does not have the desirable property of being statistically consistent. We show that weighting the re-calculated gene trees by the bin sizes makes statistical binning statistically consistent under the multispecies coalescent, and maintains the good empirical performance. Thus, "weighted statistical binning" enables highly accurate genome-scale species tree estimation, and is also statistically consistent under the multi-species coalescent model. New data used in this study are available at DOI: http://dx.doi.org/10.6084/m9.figshare.1411146, and the software is available at https://github.com/smirarab/binning. PMID:26086579

  6. LATENT SPACE MODELS FOR MULTIVIEW NETWORK DATA

    PubMed Central

    Salter-Townshend, Michael; McCormick, Tyler H.

    2018-01-01

    Social relationships consist of interactions along multiple dimensions. In social networks, this means that individuals form multiple types of relationships with the same person (e.g., an individual will not trust all of his/her acquaintances). Statistical models for these data require understanding two related types of dependence structure: (i) structure within each relationship type, or network view, and (ii) the association between views. In this paper, we propose a statistical framework that parsimoniously represents dependence between relationship types while also maintaining enough flexibility to allow individuals to serve different roles in different relationship types. Our approach builds on work on latent space models for networks [see, e.g., J. Amer. Statist. Assoc. 97 (2002) 1090–1098]. These models represent the propensity for two individuals to form edges as conditionally independent given the distance between the individuals in an unobserved social space. Our work departs from previous work in this area by representing dependence structure between network views through a multivariate Bernoulli likelihood, providing a representation of between-view association. This approach infers correlations between views not explained by the latent space model. Using our method, we explore 6 multiview network structures across 75 villages in rural southern Karnataka, India [Banerjee et al. (2013)]. PMID:29721127

  7. LATENT SPACE MODELS FOR MULTIVIEW NETWORK DATA.

    PubMed

    Salter-Townshend, Michael; McCormick, Tyler H

    2017-09-01

    Social relationships consist of interactions along multiple dimensions. In social networks, this means that individuals form multiple types of relationships with the same person (e.g., an individual will not trust all of his/her acquaintances). Statistical models for these data require understanding two related types of dependence structure: (i) structure within each relationship type, or network view, and (ii) the association between views. In this paper, we propose a statistical framework that parsimoniously represents dependence between relationship types while also maintaining enough flexibility to allow individuals to serve different roles in different relationship types. Our approach builds on work on latent space models for networks [see, e.g., J. Amer. Statist. Assoc. 97 (2002) 1090-1098]. These models represent the propensity for two individuals to form edges as conditionally independent given the distance between the individuals in an unobserved social space. Our work departs from previous work in this area by representing dependence structure between network views through a multivariate Bernoulli likelihood, providing a representation of between-view association. This approach infers correlations between views not explained by the latent space model. Using our method, we explore 6 multiview network structures across 75 villages in rural southern Karnataka, India [Banerjee et al. (2013)].

  8. Detecting signals of drug-drug interactions in a spontaneous reports database.

    PubMed

    Thakrar, Bharat T; Grundschober, Sabine Borel; Doessegger, Lucette

    2007-10-01

    The spontaneous reports database is widely used for detecting signals of ADRs. We have extended the methodology to include the detection of signals of ADRs that are associated with drug-drug interactions (DDI). In particular, we have investigated two different statistical assumptions for detecting signals of DDI. Using the FDA's spontaneous reports database, we investigated two models, a multiplicative and an additive model, to detect signals of DDI. We applied the models to four known DDIs (methotrexate-diclofenac and bone marrow depression, simvastatin-ciclosporin and myopathy, ketoconazole-terfenadine and torsades de pointes, and cisapride-erythromycin and torsades de pointes) and to four drug-event combinations where there is currently no evidence of a DDI (fexofenadine-ketoconazole and torsades de pointes, methotrexate-rofecoxib and bone marrow depression, fluvastatin-ciclosporin and myopathy, and cisapride-azithromycin and torsades de pointes) and estimated the measure of interaction on the two scales. The additive model correctly identified all four known DDIs by giving a statistically significant (P < 0.05) positive measure of interaction. The multiplicative model identified the first two of the known DDIs as having a statistically significant or borderline significant (P < 0.1) positive measure of interaction term, gave a nonsignificant positive trend for the third interaction (P = 0.27), and a negative trend for the last interaction. Both models correctly identified the four known non-interactions by estimating a negative measure of interaction. The spontaneous reports database is a valuable resource for detecting signals of DDIs. In particular, the additive model is more sensitive in detecting such signals. The multiplicative model may further help qualify the strength of the signal detected by the additive model.
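
    A schematic of the two interaction scales, not the exact disproportionality statistics used in the paper: given the proportion of reports mentioning the event in each of the four drug-exposure cells, the additive model looks at excess risk differences and the multiplicative model at risk ratios. All numbers below are invented.

        # Hypothetical reporting proportions of an adverse event in a spontaneous
        # reports database, by exposure to drug A and drug B (numbers invented).
        p_neither = 0.010   # neither drug reported
        p_a_only = 0.030    # drug A only
        p_b_only = 0.025    # drug B only
        p_both = 0.080      # both drugs reported together

        # Additive scale: is the joint excess larger than the sum of the single-drug excesses?
        additive_interaction = (p_both - p_neither) - (
            (p_a_only - p_neither) + (p_b_only - p_neither))

        # Multiplicative scale: is the joint risk ratio larger than the product of the
        # single-drug risk ratios?
        multiplicative_interaction = (p_both / p_neither) / (
            (p_a_only / p_neither) * (p_b_only / p_neither))

        print(f"additive interaction term:        {additive_interaction:+.3f}  (> 0 suggests a DDI signal)")
        print(f"multiplicative interaction ratio: {multiplicative_interaction:.2f}  (> 1 suggests a DDI signal)")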

  9. Detecting signals of drug–drug interactions in a spontaneous reports database

    PubMed Central

    Thakrar, Bharat T; Grundschober, Sabine Borel; Doessegger, Lucette

    2007-01-01

    Aims The spontaneous reports database is widely used for detecting signals of ADRs. We have extended the methodology to include the detection of signals of ADRs that are associated with drug–drug interactions (DDI). In particular, we have investigated two different statistical assumptions for detecting signals of DDI. Methods Using the FDA's spontaneous reports database, we investigated two models, a multiplicative and an additive model, to detect signals of DDI. We applied the models to four known DDIs (methotrexate-diclofenac and bone marrow depression, simvastatin-ciclosporin and myopathy, ketoconazole-terfenadine and torsades de pointes, and cisapride-erythromycin and torsades de pointes) and to four drug-event combinations where there is currently no evidence of a DDI (fexofenadine-ketoconazole and torsades de pointes, methotrexate-rofecoxib and bone marrow depression, fluvastatin-ciclosporin and myopathy, and cisapride-azithromycin and torsades de pointes) and estimated the measure of interaction on the two scales. Results The additive model correctly identified all four known DDIs by giving a statistically significant (P < 0.05) positive measure of interaction. The multiplicative model identified the first two of the known DDIs as having a statistically significant or borderline significant (P < 0.1) positive measure of interaction term, gave a nonsignificant positive trend for the third interaction (P = 0.27), and a negative trend for the last interaction. Both models correctly identified the four known non-interactions by estimating a negative measure of interaction. Conclusions The spontaneous reports database is a valuable resource for detecting signals of DDIs. In particular, the additive model is more sensitive in detecting such signals. The multiplicative model may further help qualify the strength of the signal detected by the additive model. PMID:17506784

  10. A Statistical Multimodel Ensemble Approach to Improving Long-Range Forecasting in Pakistan

    DTIC Science & Technology

    2012-03-01

    …are generated by multiple regression models that relate globally distributed oceanic and atmospheric predictors to local predictands. The predictands are

  11. Joint Seasonal ARMA Approach for Modeling of Load Forecast Errors in Planning Studies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hafen, Ryan P.; Samaan, Nader A.; Makarov, Yuri V.

    2014-04-14

    To make informed and robust decisions in the probabilistic power system operation and planning process, it is critical to conduct multiple simulations of the generated combinations of wind and load parameters and their forecast errors to handle the variability and uncertainty of these time series. In order for the simulation results to be trustworthy, the simulated series must preserve the salient statistical characteristics of the real series. In this paper, we analyze day-ahead load forecast error data from multiple balancing authority locations and characterize statistical properties such as mean, standard deviation, autocorrelation, correlation between series, time-of-day bias, and time-of-day autocorrelation. We then construct and validate a seasonal autoregressive moving average (ARMA) model to model these characteristics, and use the model to jointly simulate day-ahead load forecast error series for all BAs.
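
    A minimal sketch of fitting a seasonal ARMA-type model to an hourly forecast-error-like series with statsmodels; the simulated series, the 24-hour seasonal period, and the chosen model orders are illustrative only and are not the balancing-authority data of the paper.

        import numpy as np
        from statsmodels.tsa.statespace.sarimax import SARIMAX

        # Simulate an hourly forecast-error series with AR(1) persistence and a
        # repeating 24-hour (time-of-day) bias pattern.
        rng = np.random.default_rng(11)
        n_hours = 24 * 120
        tod_bias = 50.0 * np.sin(2 * np.pi * np.arange(n_hours) / 24)
        errors = np.zeros(n_hours)
        for t in range(1, n_hours):
            errors[t] = 0.7 * errors[t - 1] + rng.normal(0, 30)
        series = errors + tod_bias

        # Seasonal ARMA: non-seasonal AR(1)/MA(1) plus a seasonal AR(1) at lag 24.
        model = SARIMAX(series, order=(1, 0, 1), seasonal_order=(1, 0, 0, 24))
        fit = model.fit(disp=False)
        print(fit.params)

        # Joint simulation of a new synthetic error series preserving these statistics.
        simulated = fit.simulate(nsimulations=24 * 7)
        print(simulated[:5])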

  12. Comparing multiple statistical methods for inverse prediction in nuclear forensics applications

    DOE PAGES

    Lewis, John R.; Zhang, Adah; Anderson-Cook, Christine Michaela

    2017-10-29

    Forensic science seeks to predict source characteristics using measured observables. Statistically, this objective can be thought of as an inverse problem where interest is in the unknown source characteristics or factors (X) of some underlying causal model producing the observables or responses (Y = g(X) + error). This paper reviews several statistical methods for use in inverse problems and demonstrates that comparing results from multiple methods can be used to assess predictive capability. Motivation for assessing inverse predictions comes from the desired application to historical and future experiments involving nuclear material production for forensics research, in which inverse predictions, along with an assessment of predictive capability, are desired.
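    A hedged sketch of one generic route to inverse prediction under Y = g(X) + error: fit a forward model g on calibration data with known X, then numerically invert it for a new observed response. The quadratic forward model, bounds, and data below are illustrative assumptions, not the paper's methods.

```python
# Sketch of inverse prediction under the model Y = g(X) + error:
# fit a forward model on calibration data, then invert it numerically
# for a new observed response. Data and model form are illustrative only.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
x_cal = np.linspace(0.0, 10.0, 40)                    # known source characteristic
y_cal = 2.0 + 1.5 * x_cal + 0.08 * x_cal**2 + rng.normal(0, 0.5, x_cal.size)

# forward model: quadratic polynomial fit (stand-in for g)
coeffs = np.polyfit(x_cal, y_cal, deg=2)
g = np.poly1d(coeffs)

def inverse_predict(y_obs, bounds=(0.0, 10.0)):
    """Find the x whose predicted response is closest to the observed y."""
    res = minimize_scalar(lambda x: (g(x) - y_obs) ** 2, bounds=bounds, method="bounded")
    return res.x

y_new = 14.0
print("predicted source characteristic:", round(inverse_predict(y_new), 2))
```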

  14. Statistical Exposé of a Multiple-Compartment Anaerobic Reactor Treating Domestic Wastewater.

    PubMed

    Pfluger, Andrew R; Hahn, Martha J; Hering, Amanda S; Munakata-Marr, Junko; Figueroa, Linda

    2018-06-01

      Mainstream anaerobic treatment of domestic wastewater is a promising energy-generating treatment strategy; however, such reactors operated in colder regions are not well characterized. Performance data from a pilot-scale, multiple-compartment anaerobic reactor taken over 786 days were subjected to comprehensive statistical analyses. Results suggest that chemical oxygen demand (COD) was a poor proxy for organics in anaerobic systems, as oxygen demand from dissolved inorganic material, dissolved methane, and colloidal material influences dissolved and particulate COD measurements. Additionally, univariate and functional boxplots were useful in visualizing variability in contaminant concentrations and identifying statistical outliers. Further, significantly different dissolved organics removal and methane production were observed between operational years, suggesting that anaerobic reactor systems may not achieve steady-state performance within one year. Lastly, modeling multiple-compartment reactor systems will require data collected over at least two years to capture seasonal variations of the major anaerobic microbial functions occurring within each reactor compartment.

  15. Function modeling improves the efficiency of spatial modeling using big data from remote sensing

    Treesearch

    John Hogland; Nathaniel Anderson

    2017-01-01

    Spatial modeling is an integral component of most geographic information systems (GISs). However, conventional GIS modeling techniques can require substantial processing time and storage space and have limited statistical and machine learning functionality. To address these limitations, many have parallelized spatial models using multiple coding libraries and have...

  16. Toward statistical modeling of saccadic eye-movement and visual saliency.

    PubMed

    Sun, Xiaoshuai; Yao, Hongxun; Ji, Rongrong; Liu, Xian-Ming

    2014-11-01

    In this paper, we present a unified statistical framework for modeling both saccadic eye movements and visual saliency. By analyzing the statistical properties of human eye fixations on natural images, we found that human attention is sparsely distributed and usually deployed to locations with abundant structural information. These observations inspired us to model saccadic behavior and visual saliency based on super-Gaussian component (SGC) analysis. Our model sequentially obtains SGCs using projection pursuit, and generates eye movements by selecting the location with the maximum SGC response. Besides simulating human saccadic behavior, we also demonstrate the superior effectiveness and robustness of our approach over the state of the art through extensive experiments on synthetic patterns and human eye-fixation benchmarks. Multiple key issues in saliency modeling research, such as individual differences and the effects of scale and blur, are explored in this paper. Based on extensive qualitative and quantitative experimental results, we show the promising potential of statistical approaches for human behavior research.

  17. Gene Level Meta-Analysis of Quantitative Traits by Functional Linear Models.

    PubMed

    Fan, Ruzong; Wang, Yifan; Boehnke, Michael; Chen, Wei; Li, Yun; Ren, Haobo; Lobach, Iryna; Xiong, Momiao

    2015-08-01

    Meta-analysis of genetic data must account for differences among studies, including study designs, markers genotyped, and covariates. The effects of genetic variants may also differ from population to population, i.e., there is heterogeneity. Thus, meta-analysis combining data from multiple studies is difficult, and novel statistical methods for meta-analysis are needed. In this article, functional linear models are developed for meta-analyses that connect genetic data to quantitative traits, adjusting for covariates. The models can be used to analyze rare variants, common variants, or a combination of the two. Both likelihood-ratio test (LRT) and F-distributed statistics are introduced to test association between quantitative traits and multiple variants in one genetic region. Extensive simulations are performed to evaluate empirical type I error rates and power performance of the proposed tests. The proposed LRT and F-distributed statistics control the type I error very well and have higher power than the existing methods of the meta-analysis sequence kernel association test (MetaSKAT). We analyze four blood lipid levels in data from a meta-analysis of eight European studies. The proposed methods detect more significant associations than MetaSKAT, and the P-values of the proposed LRT and F-distributed statistics are usually much smaller than those of MetaSKAT. The functional linear models and related test statistics can be useful in whole-genome and whole-exome association studies. Copyright © 2015 by the Genetics Society of America.
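    As a simplified illustration of the likelihood-ratio machinery (ordinary linear regression on a single study, not the functional linear models or the meta-analysis setting of the article), the sketch below tests association between a quantitative trait and multiple variants in a region after adjusting for covariates.

```python
# Simplified sketch of a likelihood-ratio test for association between a
# quantitative trait and multiple variants in a region (covariate-adjusted).
# This is ordinary linear regression, not the article's functional linear model.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(2)
n, n_variants = 500, 8
covariates = rng.normal(size=(n, 2))                 # e.g. age, sex (toy)
genotypes = rng.binomial(2, 0.2, size=(n, n_variants))
trait = covariates @ np.array([0.3, -0.2]) + genotypes[:, 0] * 0.25 + rng.normal(size=n)

X_null = sm.add_constant(covariates)
X_full = sm.add_constant(np.hstack([covariates, genotypes]))

fit_null = sm.OLS(trait, X_null).fit()
fit_full = sm.OLS(trait, X_full).fit()

lrt = 2 * (fit_full.llf - fit_null.llf)              # likelihood-ratio statistic
p_value = chi2.sf(lrt, df=n_variants)                # df = number of added variant terms
print(f"LRT = {lrt:.2f}, p = {p_value:.3g}")
```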

  18. Accurate Modeling of Galaxy Clustering on Small Scales: Testing the Standard ΛCDM + Halo Model

    NASA Astrophysics Data System (ADS)

    Sinha, Manodeep; Berlind, Andreas A.; McBride, Cameron; Scoccimarro, Roman

    2015-01-01

    The large-scale distribution of galaxies can be explained fairly simply by assuming (i) a cosmological model, which determines the dark matter halo distribution, and (ii) a simple connection between galaxies and the halos they inhabit. This conceptually simple framework, called the halo model, has been remarkably successful at reproducing the clustering of galaxies on all scales, as observed in various galaxy redshift surveys. However, none of these previous studies have carefully modeled the systematics and thus truly tested the halo model in a statistically rigorous sense. We present a new accurate and fully numerical halo model framework and test it against clustering measurements from two luminosity samples of galaxies drawn from the SDSS DR7. We show that the simple ΛCDM cosmology + halo model is not able to simultaneously reproduce the galaxy projected correlation function and the group multiplicity function. In particular, the more luminous sample shows significant tension with theory. We discuss the implications of our findings and how this work paves the way for constraining galaxy formation by accurate simultaneous modeling of multiple galaxy clustering statistics.

  19. A model for multiple-drop-impact erosion of brittle solids

    NASA Technical Reports Server (NTRS)

    Engel, O. G.

    1971-01-01

    A statistical model for the multiple-drop-impact erosion of brittle solids was developed. An equation for calculating the rate of erosion is given. The development is not complete since two quantities that are needed to calculate the rate of erosion with use of the equation must be assessed from experimental data. A partial test of the equation shows that it gives results that are in good agreement with experimental observation.

  20. Automation of Ocean Product Metrics

    DTIC Science & Technology

    2008-09-30

    Presented in: Ocean Sciences 2008 Conf., 5 Mar 2008. Shriver, J., J. D. Dykes, and J. Fabre: Automation of Operational Ocean Product Metrics. Presented at the 2008 EGU General Assembly, 14 April 2008. ...processing (multiple data cuts per day) and multiple-nested models. Routines for generating automated evaluations of model forecast statistics will be developed, and pre-existing tools will be collected to create a generalized tool set, which will include user-interface tools to the metrics data...

  1. A basket two-part model to analyze medical expenditure on interdependent multiple sectors.

    PubMed

    Sugawara, Shinya; Wu, Tianyi; Yamanishi, Kenji

    2018-05-01

    This study proposes a novel statistical methodology to analyze expenditure on multiple medical sectors using consumer data. Conventionally, medical expenditure has been analyzed by two-part models, which separately consider the purchase decision and the amount of expenditure. We extend the traditional two-part models by adding a basket-analysis step for dimension reduction. This new step enables us to analyze complicated interdependence between multiple sectors without an identification problem. As an empirical application of the proposed method, we analyze data on 13 medical sectors from the Medical Expenditure Panel Survey. In comparison with the results of previous studies that analyzed the multiple sectors independently, our method provides more detailed implications of the impacts of individual socioeconomic status on the composition of joint purchases from multiple medical sectors, and it achieves better prediction performance.

  2. Normalization Ridge Regression in Practice II: The Estimation of Multiple Feedback Linkages.

    ERIC Educational Resources Information Center

    Bulcock, J. W.

    The use of the two-stage least squares (2 SLS) procedure for estimating nonrecursive social science models is often impractical when multiple feedback linkages are required. This is because 2 SLS is extremely sensitive to multicollinearity. The standard statistical solution to the multicollinearity problem is a biased, variance reduced procedure…

  3. A Multidimensional Scaling Approach to Dimensionality Assessment for Measurement Instruments Modeled by Multidimensional Item Response Theory

    ERIC Educational Resources Information Center

    Toro, Maritsa

    2011-01-01

    The statistical assessment of dimensionality provides evidence of the underlying constructs measured by a survey or test instrument. This study focuses on educational measurement, specifically tests comprised of items described as multidimensional. That is, items that require examinee proficiency in multiple content areas and/or multiple cognitive…

  4. Forecasting defoliation by the gypsy moth in oak stands

    Treesearch

    Robert W. Campbell; Joseph P. Standaert

    1974-01-01

    A multiple-regression model is presented that reflects statistically significant correlations between defoliation by the gypsy moth, the dependent variable, and a series of biotic and physical independent variables. Both possible uses and shortcomings of this model are discussed.

  5. Finding Groups Using Model-Based Cluster Analysis: Heterogeneous Emotional Self-Regulatory Processes and Heavy Alcohol Use Risk

    ERIC Educational Resources Information Center

    Mun, Eun Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.

    2008-01-01

    Model-based cluster analysis is a new clustering procedure to investigate population heterogeneity utilizing finite mixture multivariate normal densities. It is an inferentially based, statistically principled procedure that allows comparison of nonnested models using the Bayesian information criterion to compare multiple models and identify the…

  6. Modeling Smoke Plume-Rise and Dispersion from Southern United States Prescribed Burns with Daysmoke

    Treesearch

    G L Achtemeier; S L Goodrick; Y Liu; F Garcia-Menendez; Y Hu; M. Odman

    2011-01-01

    We present Daysmoke, an empirical-statistical plume rise and dispersion model for simulating smoke from prescribed burns. Prescribed fires are characterized by complex plume structure including multiple-core updrafts which makes modeling with simple plume models difficult. Daysmoke accounts for plume structure in a three-dimensional veering/sheering atmospheric...

  7. Bayesian models: A statistical primer for ecologists

    USGS Publications Warehouse

    Hobbs, N. Thompson; Hooten, Mevin B.

    2015-01-01

    Bayesian modeling has become an indispensable tool for ecological research because it is uniquely suited to deal with complexity in a statistically coherent way. This textbook provides a comprehensive and accessible introduction to the latest Bayesian methods, in language ecologists can understand. Unlike other books on the subject, this one emphasizes the principles behind the computations, giving ecologists a big-picture understanding of how to implement this powerful statistical approach. Bayesian Models is an essential primer for non-statisticians. It begins with a definition of probability and develops a step-by-step sequence of connected ideas, including basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and inference from single and multiple models. This unique book places less emphasis on computer coding, favoring instead a concise presentation of the mathematical statistics needed to understand how and why Bayesian analysis works. It also explains how to write out properly formulated hierarchical Bayesian models and use them in computing, research papers, and proposals. This primer enables ecologists to understand the statistical principles behind Bayesian modeling and apply them to research, teaching, policy, and management. The book presents the mathematical and statistical foundations of Bayesian modeling in language accessible to non-statisticians; covers basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and more; deemphasizes computer coding in favor of basic principles; and explains how to write out properly factored statistical expressions representing Bayesian models.

  8. Influence of neural adaptation on dynamics and equilibrium state of neural activities in a ring neural network

    NASA Astrophysics Data System (ADS)

    Takiyama, Ken

    2017-12-01

    How neural adaptation affects neural information processing (i.e. the dynamics and equilibrium state of neural activities) is a central question in computational neuroscience. In my previous works, I analytically clarified the dynamics and equilibrium state of neural activities in a ring-type neural network model that is widely used to model the visual cortex, motor cortex, and several other brain regions. The neural dynamics and the equilibrium state in the neural network model corresponded to a Bayesian computation and statistically optimal multiple information integration, respectively, under a biologically inspired condition. These results were revealed in an analytically tractable manner; however, adaptation effects were not considered. Here, I analytically reveal how the dynamics and equilibrium state of neural activities in a ring neural network are influenced by spike-frequency adaptation (SFA). SFA is an adaptation that causes gradual inhibition of neural activity when a sustained stimulus is applied, and the strength of this inhibition depends on neural activities. I reveal that SFA plays three roles: (1) SFA amplifies the influence of external input in neural dynamics; (2) SFA allows the history of the external input to affect neural dynamics; and (3) the equilibrium state corresponds to the statistically optimal multiple information integration independent of the existence of SFA. In addition, the equilibrium state in a ring neural network model corresponds to the statistically optimal integration of multiple information sources under biologically inspired conditions, independent of the existence of SFA.

  9. Assimilating Flow Data into Complex Multiple-Point Statistical Facies Models Using Pilot Points Method

    NASA Astrophysics Data System (ADS)

    Ma, W.; Jafarpour, B.

    2017-12-01

    We develop a new pilot points method for conditioning discrete multiple-point statistical (MPS) facies simulation on dynamic flow data. While conditioning MPS simulation on static hard data is straightforward, its calibration against nonlinear flow data is nontrivial. The proposed method generates conditional models from a conceptual model of geologic connectivity, known as a training image (TI), by strategically placing and estimating pilot points. To place pilot points, a score map is generated based on three sources of information: (i) the uncertainty in the facies distribution, (ii) the model response sensitivity information, and (iii) the observed flow data. Once the pilot points are placed, the facies values at these points are inferred from production data and are used, along with available hard data at well locations, to simulate a new set of conditional facies realizations. While facies estimation at the pilot points can be performed using different inversion algorithms, in this study the ensemble smoother (ES) and its multiple data assimilation variant (ES-MDA) are adopted to update permeability maps from production data, which are then used to statistically infer facies types at the pilot point locations. The developed method combines the information in the flow data and the TI by using the former to infer facies values at select locations away from the wells and the latter to ensure consistent facies structure and connectivity away from measurement locations. Several numerical experiments are used to evaluate the performance of the developed method and to discuss its important properties.

  10. Accounting for heterogeneity in meta-analysis using a multiplicative model-an empirical study.

    PubMed

    Mawdsley, David; Higgins, Julian P T; Sutton, Alex J; Abrams, Keith R

    2017-03-01

    In meta-analysis, the random-effects model is often used to account for heterogeneity. The model assumes that heterogeneity has an additive effect on the variance of effect sizes. An alternative model, which assumes multiplicative heterogeneity, has been little used in the medical statistics community, but is widely used by particle physicists. In this paper, we compare the two models using a random sample of 448 meta-analyses drawn from the Cochrane Database of Systematic Reviews. In general, differences in goodness of fit are modest. The multiplicative model tends to give results that are closer to the null, with a narrower confidence interval. Both approaches make different assumptions about the outcome of the meta-analysis. In our opinion, the selection of the more appropriate model will often be guided by whether the multiplicative model's assumption of a single effect size is plausible. Copyright © 2016 John Wiley & Sons, Ltd.
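    The numerical sketch below contrasts the two assumptions on toy data (not the Cochrane sample): the additive model adds a between-study variance tau^2 (DerSimonian-Laird) to each study's variance, while the multiplicative model keeps the fixed-effect estimate and inflates its variance by a factor phi estimated from Cochran's Q.

```python
# Toy contrast of additive vs multiplicative heterogeneity in meta-analysis.
# Effect sizes and standard errors below are made up for illustration.
import numpy as np

y = np.array([0.10, 0.35, 0.22, 0.05, 0.41])    # study effect estimates
se = np.array([0.12, 0.15, 0.10, 0.20, 0.18])   # their standard errors
w = 1 / se**2                                    # fixed-effect (inverse-variance) weights
k = y.size

pooled_fe = np.sum(w * y) / np.sum(w)
Q = np.sum(w * (y - pooled_fe) ** 2)             # Cochran's Q

# Additive heterogeneity: DerSimonian-Laird between-study variance tau^2
tau2 = max(0.0, (Q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
w_re = 1 / (se**2 + tau2)
pooled_re = np.sum(w_re * y) / np.sum(w_re)
se_re = np.sqrt(1 / np.sum(w_re))

# Multiplicative heterogeneity: inflate the fixed-effect variance by phi = Q/(k-1)
phi = Q / (k - 1)                                # some implementations truncate this at 1
se_mult = np.sqrt(phi / np.sum(w))

print(f"random-effects (additive): {pooled_re:.3f} +/- {se_re:.3f}")
print(f"multiplicative:            {pooled_fe:.3f} +/- {se_mult:.3f}")
```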

  11. Simultaneous statistical bias correction of multiple PM2.5 species from a regional photochemical grid model

    EPA Science Inventory

    In recent years, environmental epidemiologists have begun utilizing regional-scale air quality computer models to predict ambient air pollution concentrations in health studies instead of, or in addition to, monitoring data from central sites. The advantages of using such models i...

  12. Astrostatistical Analysis in Solar and Stellar Physics

    NASA Astrophysics Data System (ADS)

    Stenning, David Craig

    This dissertation focuses on developing statistical models and methods to address data-analytic challenges in astrostatistics, a growing interdisciplinary field fostering collaborations between statisticians and astrophysicists. The astrostatistics projects we tackle can be divided into two main categories: modeling solar activity and Bayesian analysis of stellar evolution. These categories form Part I and Part II of this dissertation, respectively. The first line of research we pursue involves classification and modeling of evolving solar features. Advances in space-based observatories are increasing both the quality and quantity of solar data, primarily in the form of high-resolution images. To analyze massive streams of solar image data, we develop a science-driven dimension reduction methodology to extract scientifically meaningful features from images. This methodology utilizes mathematical morphology to produce a concise numerical summary of the magnetic flux distribution in solar "active regions" that (i) is far easier to work with than the source images, (ii) encapsulates scientifically relevant information in a more informative manner than existing schemes (i.e., manual classification schemes), and (iii) is amenable to sophisticated statistical analyses. In a related line of research, we perform a Bayesian analysis of the solar cycle using multiple proxy variables, such as sunspot numbers. We take advantage of patterns and correlations among the proxy variables to model solar activity using data from proxies that have become available more recently, while also taking advantage of the long history of observations of sunspot numbers. This model is an extension of the Yu et al. (2012) Bayesian hierarchical model for the solar cycle that used the sunspot numbers alone. Since proxies have different temporal coverage, we devise a multiple imputation scheme to account for missing data. We find that incorporating multiple proxies reveals important features of the solar cycle that are missed when the model is fit using only the sunspot numbers. In Part II of this dissertation we focus on two related lines of research involving Bayesian analysis of stellar evolution. We first focus on modeling multiple stellar populations in star clusters. It has long been assumed that all star clusters are comprised of single stellar populations: stars that formed at roughly the same time from a common molecular cloud. However, recent studies have produced evidence that some clusters host multiple populations, which has far-reaching scientific implications. We develop a Bayesian hierarchical model for multiple-population star clusters, extending earlier statistical models of stellar evolution (e.g., van Dyk et al. 2009, Stein et al. 2013). We also devise an adaptive Markov chain Monte Carlo algorithm to explore the complex posterior distribution. We use numerical studies to demonstrate that our method can recover parameters of multiple-population clusters, and also show how model misspecification can be diagnosed. Our model and computational tools are incorporated into an open-source software suite known as BASE-9. We also explore statistical properties of the estimators and determine that the influence of the prior distribution does not diminish with larger sample sizes, leading to non-standard asymptotics. In a final line of research, we present the first-ever attempt to estimate the carbon fraction of white dwarfs.
This quantity has important implications for both astrophysics and fundamental nuclear physics, but is currently unknown. We use a numerical study to demonstrate that assuming an incorrect value for the carbon fraction leads to incorrect white-dwarf ages of star clusters. Finally, we present our attempt to estimate the carbon fraction of the white dwarfs in the well-studied star cluster 47 Tucanae.

  13. MAI statistics estimation and analysis in a DS-CDMA system

    NASA Astrophysics Data System (ADS)

    Alami Hassani, A.; Zouak, M.; Mrabti, M.; Abdi, F.

    2018-05-01

    A primary limitation of Direct Sequence Code Division Multiple Access (DS-CDMA) link performance and system capacity is multiple access interference (MAI). To examine the performance of CDMA systems in the presence of MAI, i.e., in a multiuser environment, several works have assumed that the interference can be approximated by a Gaussian random variable. In this paper, we first develop a new and simple approach to characterize the MAI in a multiuser system. In addition to statistically quantifying the MAI power, the paper also proposes a statistical model for both the mean and variance of the MAI for synchronous and asynchronous CDMA transmission. We show that the MAI probability density function (PDF) is Gaussian for the equal-received-energy case and validate this by computer simulations.
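    A hedged Monte-Carlo sketch of the Gaussian approximation in a simplified setting: synchronous transmission, random binary spreading codes, and equal received energy, for which the MAI variance at one user's matched-filter output is approximately (K-1)/N. This is an illustration, not the paper's derivation.

```python
# Monte-Carlo sketch: MAI at the output of user 0's matched filter in a
# synchronous DS-CDMA link with random +/-1 spreading codes (equal received energy).
# Simplified model for illustration; not the paper's exact analysis.
import numpy as np

rng = np.random.default_rng(3)
N = 31          # spreading factor (chips per symbol)
K = 10          # number of active users
trials = 20000

mai = np.empty(trials)
for i in range(trials):
    codes = rng.choice([-1, 1], size=(K, N))     # one random code per user
    bits = rng.choice([-1, 1], size=K)           # one data symbol per user
    # matched-filter output for user 0, desired term removed -> pure MAI
    rho = codes[1:] @ codes[0] / N               # normalized cross-correlations
    mai[i] = np.sum(bits[1:] * rho)

print("empirical mean:", mai.mean().round(4))
print("empirical variance:", mai.var().round(4), " theory (K-1)/N:", round((K - 1) / N, 4))
```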

  14. PCTO-SIM: Multiple-point geostatistical modeling using parallel conditional texture optimization

    NASA Astrophysics Data System (ADS)

    Pourfard, Mohammadreza; Abdollahifard, Mohammad J.; Faez, Karim; Motamedi, Sayed Ahmad; Hosseinian, Tahmineh

    2017-05-01

    Multiple-point Geostatistics is a well-known general statistical framework by which complex geological phenomena have been modeled efficiently. Pixel-based and patch-based methods are the two major categories of these methods. In this paper, the optimization-based category is used, which has a dual concept in texture synthesis known as texture optimization. Our extended version of texture optimization uses the energy concept to model geological phenomena. While honoring the hard data points, the minimization of our proposed cost function forces simulation grid pixels to be as similar as possible to training images. Our algorithm has a self-enrichment capability and creates a richer training database from a sparser one by mixing the information of all surrounding patches of the simulation nodes. Therefore, it preserves pattern continuity in both continuous and categorical variables very well. It also shows a fuzzy result in every realization, similar to the expected result of multiple realizations of other statistical models. While the main core of most previous Multiple-point Geostatistics methods is sequential, the parallel main core of our algorithm enables it to use the GPU efficiently to reduce computation time. A new validation method for MPS is also proposed in this paper.

  15. Continuum mesoscopic framework for multiple interacting species and processes on multiple site types and/or crystallographic planes.

    PubMed

    Chatterjee, Abhijit; Vlachos, Dionisios G

    2007-07-21

    While recently derived continuum mesoscopic equations successfully bridge the gap between microscopic and macroscopic physics, so far they have been derived only for simple lattice models. In this paper, general deterministic continuum mesoscopic equations are derived rigorously via nonequilibrium statistical mechanics to account for multiple interacting surface species and multiple processes on multiple site types and/or different crystallographic planes. Adsorption, desorption, reaction, and surface diffusion are modeled. It is demonstrated that contrary to conventional phenomenological continuum models, microscopic physics, such as the interaction potential, determines the final form of the mesoscopic equation. Models of single component diffusion and binary diffusion of interacting particles on single-type site lattice and of single component diffusion on complex microporous materials' lattices consisting of two types of sites are derived, as illustrations of the mesoscopic framework. Simplification of the diffusion mesoscopic model illustrates the relation to phenomenological models, such as the Fickian and Maxwell-Stefan transport models. It is demonstrated that the mesoscopic equations are in good agreement with lattice kinetic Monte Carlo simulations for several prototype examples studied.

  16. Multiple Versus Single Set Validation of Multivariate Models to Avoid Mistakes.

    PubMed

    Harrington, Peter de Boves

    2018-01-02

    Validation of multivariate models is of current importance for a wide range of chemical applications. Although important, it is often neglected. The common practice is to use a single external validation set for evaluation. This approach is deficient and may mislead investigators with results that are specific to the single validation set of data. In addition, no statistics are available regarding the precision of a derived figure of merit (FOM). A statistical approach using bootstrapped Latin partitions is advocated. This validation method makes efficient use of the data because each object is used once for validation. It was reviewed a decade earlier, but primarily for the optimization of chemometric models; this review presents the reasons it should be used for generalized statistical validation. Average FOMs with confidence intervals are reported, and powerful matched-sample statistics may be applied for comparing models and methods. Examples demonstrate the problems with single validation sets.
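    A rough sketch of the idea, using scikit-learn's StratifiedKFold with shuffling as a stand-in for true bootstrapped Latin partitions: within each repetition every object is predicted exactly once, and the figure of merit is summarized over repetitions to give an average with an interval.

```python
# Sketch of validation with repeated randomized partitions: every object is used
# exactly once for prediction per repetition, and the figure of merit (accuracy
# here) is summarized over repetitions. StratifiedKFold with shuffling is a
# simple stand-in for bootstrapped Latin partitions.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

n_repetitions, n_partitions = 20, 4
foms = []
for rep in range(n_repetitions):
    cv = StratifiedKFold(n_splits=n_partitions, shuffle=True, random_state=rep)
    # mean accuracy over the partitions of this repetition
    foms.append(cross_val_score(model, X, y, cv=cv).mean())

foms = np.array(foms)
low, high = np.percentile(foms, [2.5, 97.5])
print(f"accuracy: {foms.mean():.3f} (95% interval {low:.3f}-{high:.3f})")
```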

  17. Data-optimized source modeling with the Backwards Liouville Test–Kinetic method

    DOE PAGES

    Woodroffe, J. R.; Brito, T. V.; Jordanova, V. K.; ...

    2017-09-14

    In the standard practice of neutron multiplicity counting, the first three sampled factorial moments of the event-triggered neutron count distribution are used to quantify the three main neutron source terms: the spontaneous fissile material effective mass, the relative (α,n) production, and the induced fission source responsible for multiplication. Our study compares three methods to quantify the statistical uncertainty of the estimated mass: the bootstrap method, propagation of variance through moments, and the statistical analysis of cycle data method. Each of the three methods was implemented on a set of four different NMC measurements, held at the JRC laboratory in Ispra, Italy, sampling four different Pu samples in a standard Plutonium Scrap Multiplicity Counter (PSMC) well counter.

  18. Statistical tools for transgene copy number estimation based on real-time PCR.

    PubMed

    Yuan, Joshua S; Burris, Jason; Stewart, Nathan R; Mentewab, Ayalew; Stewart, C Neal

    2007-11-01

    As compared with traditional transgene copy number detection technologies such as Southern blot analysis, real-time PCR provides a fast, inexpensive and high-throughput alternative. However, real-time PCR-based transgene copy number estimation tends to be ambiguous and subjective, stemming from the lack of proper statistical analysis and data quality control needed to render a reliable copy number estimate with a prediction value. Despite recent progress in statistical analysis of real-time PCR, few publications have integrated these advancements in real-time PCR-based transgene copy number determination. Three experimental designs and four data quality control integrated statistical models are presented. For the first method, external calibration curves are established for the transgene based on serially-diluted templates. The Ct numbers from a control transgenic event and a putative transgenic event are compared to derive the transgene copy number or zygosity estimation. Simple linear regression and two-group t-test procedures were combined to model the data from this design. For the second experimental design, standard curves were generated for both an internal reference gene and the transgene, and the copy number of the transgene was compared with that of the internal reference gene. Multiple regression models and ANOVA models can be employed to analyze the data and perform quality control for this approach. In the third experimental design, transgene copy number is compared with the reference gene without a standard curve, based directly on fluorescence data. Two different multiple regression models were proposed to analyze the data based on two different approaches for integrating amplification efficiency. Our results highlight the importance of proper statistical treatment and quality control integration in real-time PCR-based transgene copy number determination. These statistical methods allow real-time PCR-based transgene copy number estimation to be more reliable and precise. Proper confidence intervals are necessary for unambiguous prediction of transgene copy number. The four different statistical methods are compared for their advantages and disadvantages. Moreover, the statistical methods can also be applied to other real-time PCR-based quantification assays, including transfection efficiency analysis and pathogen quantification.
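    A minimal sketch of the second design (standard curves for both the transgene and an internal reference gene), with invented Ct values and a plain linear regression in place of the article's quality-control-integrated models: the slopes give amplification efficiencies, and an unknown sample is interpolated on both curves to estimate the copy ratio.

```python
# Sketch of the standard-curve approach: regress Ct on log10(template amount) for
# the transgene and a single-copy reference gene, interpolate an unknown sample on
# both curves, and take the ratio as the copy-number estimate. Ct values invented.
import numpy as np
from scipy import stats

log10_dilution = np.log10([1e5, 1e4, 1e3, 1e2, 1e1])      # serial template dilutions

ct_transgene = np.array([17.1, 20.5, 23.8, 27.2, 30.6])   # toy standard-curve Cts
ct_reference = np.array([16.9, 20.3, 23.7, 27.1, 30.4])

fit_t = stats.linregress(log10_dilution, ct_transgene)
fit_r = stats.linregress(log10_dilution, ct_reference)

# amplification efficiency from slope: E = 10^(-1/slope) - 1 (ideal slope ~ -3.32)
for name, fit in (("transgene", fit_t), ("reference", fit_r)):
    eff = 10 ** (-1 / fit.slope) - 1
    print(f"{name}: slope={fit.slope:.2f}, efficiency={eff:.2f}, R^2={fit.rvalue**2:.3f}")

# unknown sample: interpolate template amounts from each curve and take the ratio
ct_sample_t, ct_sample_r = 22.1, 23.5
amount_t = 10 ** ((ct_sample_t - fit_t.intercept) / fit_t.slope)
amount_r = 10 ** ((ct_sample_r - fit_r.intercept) / fit_r.slope)
print("estimated copies of transgene per reference copy:", round(amount_t / amount_r, 2))
```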

  19. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smith, Kandler; Shi, Ying; Santhanagopalan, Shriram

    Predictive models of Li-ion battery lifetime must consider a multiplicity of electrochemical, thermal, and mechanical degradation modes experienced by batteries in application environments. To complicate matters, Li-ion batteries can experience different degradation trajectories that depend on storage and cycling history of the application environment. Rates of degradation are controlled by factors such as temperature history, electrochemical operating window, and charge/discharge rate. We present a generalized battery life prognostic model framework for battery systems design and control. The model framework consists of trial functions that are statistically regressed to Li-ion cell life datasets wherein the cells have been aged under different levels of stress. Degradation mechanisms and rate laws dependent on temperature, storage, and cycling condition are regressed to the data, with multiple model hypotheses evaluated and the best model down-selected based on statistics. The resulting life prognostic model, implemented in state variable form, is extensible to arbitrary real-world scenarios. The model is applicable in real-time control algorithms to maximize battery life and performance. We discuss efforts to reduce lifetime prediction error and accommodate its inevitable impact in controller design.

  20. United States Census 2000 Population with Bridged Race Categories. Vital and Health Statistics. Data Evaluation and Methods Research.

    ERIC Educational Resources Information Center

    Ingram, Deborah D.; Parker, Jennifer D.; Schenker, Nathaniel; Weed, James A.; Hamilton, Brady; Arias, Elizabeth; Madans, Jennifer H.

    This report documents the National Center for Health Statistics' (NCHS) methods for bridging the Census 2000 multiple-race resident population to single-race categories and describing bridged race resident population estimates. Data came from the pooled 1997-2000 National Health Interview Surveys. The bridging models included demographic and…

  1. Role of diversity in ICA and IVA: theory and applications

    NASA Astrophysics Data System (ADS)

    Adalı, Tülay

    2016-05-01

    Independent component analysis (ICA) has been the most popular approach for solving the blind source separation problem. Starting from a simple linear mixing model and the assumption of statistical independence, ICA can recover a set of linearly-mixed sources to within a scaling and permutation ambiguity. It has been successfully applied to numerous data analysis problems in areas as diverse as biomedicine, communications, finance, geophysics, and remote sensing. ICA can be achieved using different types of diversity (statistical properties) and can be posed to simultaneously account for multiple types of diversity such as higher-order statistics, sample dependence, non-circularity, and nonstationarity. A recent generalization of ICA, independent vector analysis (IVA), extends ICA to multiple data sets and adds the use of one more type of diversity, statistical dependence across the data sets, for jointly achieving independent decomposition of multiple data sets. With the addition of each new diversity type, identification of a broader class of signals becomes possible, and in the case of IVA, this includes sources that are independent and identically distributed Gaussians. We review the fundamentals and properties of ICA and IVA when multiple types of diversity are taken into account, and then ask whether diversity plays an important role in practical applications as well. Examples from various domains are presented to demonstrate that in many scenarios it might be worthwhile to jointly account for multiple statistical properties. This paper is submitted in conjunction with the talk delivered for the "Unsupervised Learning and ICA Pioneer Award" at the 2016 SPIE Conference on Sensing and Analysis Technologies for Biomedical and Cognitive Applications.
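    A minimal ICA example (plain FastICA on one data set, not IVA or a multi-set analysis): recover two independent sources from a linear mixture using higher-order statistics; the mixing matrix and sources are invented.

```python
# Minimal ICA example: recover two independent sources from linear mixtures
# using scikit-learn's FastICA (higher-order-statistics based). This illustrates
# plain ICA only, not IVA or joint analysis of multiple data sets.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(4)
t = np.linspace(0, 8, 2000)
s1 = np.sign(np.sin(3 * t))             # square wave
s2 = rng.laplace(size=t.size)           # super-Gaussian noise source
S = np.c_[s1, s2]

A = np.array([[1.0, 0.5], [0.7, 1.2]])  # unknown mixing matrix
X = S @ A.T                             # observed mixtures

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)            # estimated sources (up to scale/permutation)

# correlation between true and estimated sources (near +/-1 for one pairing)
print(np.round(np.corrcoef(S.T, S_hat.T)[:2, 2:], 2))
```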

  2. A Flexible Approach for the Statistical Visualization of Ensemble Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Potter, K.; Wilson, A.; Bremer, P.

    2009-09-29

    Scientists are increasingly moving towards ensemble data sets to explore relationships present in dynamic systems. Ensemble data sets combine spatio-temporal simulation results generated using multiple numerical models, sampled input conditions and perturbed parameters. While ensemble data sets are a powerful tool for mitigating uncertainty, they pose significant visualization and analysis challenges due to their complexity. We present a collection of overview and statistical displays linked through a high level of interactivity to provide a framework for gaining key scientific insight into the distribution of the simulation results as well as the uncertainty associated with the data. In contrast to methods that present large amounts of diverse information in a single display, we argue that combining multiple linked statistical displays yields a clearer presentation of the data and facilitates a greater level of visual data analysis. We demonstrate this approach using driving problems from climate modeling and meteorology and discuss generalizations to other fields.

  3. Application of Linear Mixed-Effects Models in Human Neuroscience Research: A Comparison with Pearson Correlation in Two Auditory Electrophysiology Studies.

    PubMed

    Koerner, Tess K; Zhang, Yang

    2017-02-27

    Neurophysiological studies are often designed to examine relationships between measures from different testing conditions, time points, or analysis techniques within the same group of participants. Appropriate statistical techniques that can take into account repeated measures and multivariate predictor variables are integral and essential to successful data analysis and interpretation. This work implements and compares conventional Pearson correlations and linear mixed-effects (LME) regression models using data from two recently published auditory electrophysiology studies. For the specific research questions in both studies, the Pearson correlation test is inappropriate for determining the strength of the relationships between the behavioral responses for speech-in-noise recognition and the multiple neurophysiological measures, as the neural responses across listening conditions would simply be treated as independent measures. In contrast, the LME models allow a systematic approach to incorporate both fixed-effect and random-effect terms to deal with the categorical grouping factor of listening conditions, between-subject baseline differences in the multiple measures, and the correlational structure among the predictor variables. Together, the comparative data demonstrate the advantages and the necessity of applying mixed-effects models to properly account for the built-in relationships among the multiple predictor variables, which has important implications for proper statistical modeling and interpretation of human behavior in terms of neural correlates and biomarkers.
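    A toy contrast of the two approaches on simulated repeated-measures data (variable names are placeholders, not the studies' measures): a pooled Pearson correlation ignores the grouping by participant, whereas a linear mixed-effects model with a random intercept per participant accounts for it.

```python
# Toy contrast of a pooled Pearson correlation with a linear mixed-effects model
# that adds a random intercept per participant for repeated measures.
# Simulated data; variable names are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import pearsonr

rng = np.random.default_rng(5)
n_subjects, n_conditions = 30, 4
subject = np.repeat(np.arange(n_subjects), n_conditions)
neural = rng.normal(size=n_subjects * n_conditions)
subject_offset = np.repeat(rng.normal(0, 2.0, n_subjects), n_conditions)
behavior = 0.5 * neural + subject_offset + rng.normal(0, 1.0, neural.size)

data = pd.DataFrame({"subject": subject, "neural": neural, "behavior": behavior})

r, p = pearsonr(data["neural"], data["behavior"])    # ignores repeated measures
print(f"pooled Pearson: r={r:.2f}, p={p:.3g}")

lme = smf.mixedlm("behavior ~ neural", data, groups=data["subject"]).fit()
print(lme.summary().tables[1])                        # fixed-effect slope for 'neural'
```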

  4. Statistical mechanical model of coupled transcription from multiple promoters due to transcription factor titration

    PubMed Central

    Rydenfelt, Mattias; Cox, Robert Sidney; Garcia, Hernan; Phillips, Rob

    2014-01-01

    Transcription factors (TFs) with regulatory action at multiple promoter targets are the rule rather than the exception, with examples ranging from the cAMP receptor protein (CRP) in E. coli that regulates hundreds of different genes simultaneously to situations involving multiple copies of the same gene, such as plasmids, retrotransposons, or highly replicated viral DNA. When the number of TFs greatly exceeds the number of binding sites, TF binding to each promoter can be regarded as independent. However, when the number of TF molecules is comparable to the number of binding sites, TF titration will result in correlation (“promoter entanglement”) between transcription of different genes. We develop a statistical mechanical model which takes the TF titration effect into account and use it to predict both the level of gene expression for a general set of promoters and the resulting correlation in transcription rates of different genes. Our results show that the TF titration effect could be important for understanding gene expression in many regulatory settings. PMID:24580252

  5. Predicting Final GPA of Graduate School Students: Comparing Artificial Neural Networking and Simultaneous Multiple Regression

    ERIC Educational Resources Information Center

    Anderson, Joan L.

    2006-01-01

    Data from graduate student applications at a large Western university were used to determine which factors were the best predictors of success in graduate school, as defined by cumulative graduate grade point average. Two statistical models were employed and compared: artificial neural networking and simultaneous multiple regression. Both models…
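    A sketch of the comparison on synthetic admissions-style data, using scikit-learn's LinearRegression and a small MLPRegressor as stand-ins for the study's simultaneous multiple regression and artificial neural network; predictors and coefficients are invented.

```python
# Sketch comparing simultaneous multiple regression with a small neural network
# for predicting a continuous outcome (e.g. graduate GPA). Data are synthetic and
# the predictor names are placeholders, not the study's variables.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(8)
n = 400
X = rng.normal(size=(n, 3))                       # e.g. GRE-V, GRE-Q, undergrad GPA (toy)
gpa = 3.2 + 0.15 * X[:, 0] + 0.2 * X[:, 1] + 0.25 * X[:, 2] + rng.normal(0, 0.3, n)

models = {
    "multiple regression": LinearRegression(),
    "neural network": make_pipeline(StandardScaler(),
                                    MLPRegressor(hidden_layer_sizes=(16,),
                                                 max_iter=5000, random_state=0)),
}
for name, model in models.items():
    r2 = cross_val_score(model, X, gpa, cv=5, scoring="r2").mean()
    print(f"{name}: mean cross-validated R^2 = {r2:.3f}")
```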

  6. Attitude determination using an adaptive multiple model filtering Scheme

    NASA Technical Reports Server (NTRS)

    Lam, Quang; Ray, Surendra N.

    1995-01-01

    Attitude determination has long been a topic of active research and remains of lasting interest to spacecraft system designers. Its role is to provide a reference for controls such as pointing the directional antennas or solar panels, stabilizing the spacecraft, or maneuvering the spacecraft to a new orbit. The Least Squares Estimation (LSE) technique was utilized to provide attitude determination for Nimbus 6 and G. Despite its poor performance (in terms of estimation accuracy), LSE was considered an effective and practical approach to meet the urgent needs and requirements of the 1970s. One reason for this poor performance of the LSE scheme is the lack of dynamic filtering or 'compensation'. In other words, the scheme is based totally on the measurements, and no attempt was made to model the dynamic equations of motion of the spacecraft. We propose an adaptive filtering approach which employs a bank of Kalman filters to perform robust attitude estimation. The proposed approach, whose architecture is depicted, is essentially based on the latest developments in the interacting multiple model design framework for handling unknown system noise characteristics or statistics. The concept fundamentally employs a bank of Kalman filters, or submodels; instead of using fixed values for the system noise statistics of each submodel (per operating condition), as the traditional multiple model approach does, we use an on-line dynamic system noise identifier to 'identify' the system noise level (statistics) and update the filter noise statistics using 'live' information from the sensor model. The advanced noise identifier, whose architecture is also shown, is implemented using an advanced system identifier. To ensure robust performance, the proposed advanced system identifier is further reinforced by a learning system implemented (in the outer loop) using neural networks to identify other unknown quantities such as spacecraft dynamics parameters, gyro biases, dynamic disturbances, or environmental variations.
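    A much-simplified 1-D sketch of the filter-bank idea (a static multiple-model estimator, not the full interacting-multiple-model attitude filter with a neural-network identifier): several Kalman filters assume different process-noise levels, and their estimates are blended with weights proportional to each filter's measurement likelihood.

```python
# Simplified 1-D sketch of a multiple-model filter bank: several Kalman filters
# assume different process-noise levels; their estimates are blended with weights
# proportional to each filter's measurement likelihood. Illustrative only.
import numpy as np

rng = np.random.default_rng(6)
steps, true_q, r_meas = 200, 0.05, 0.5
x_true = np.cumsum(rng.normal(0, np.sqrt(true_q), steps))        # random-walk "attitude"
z = x_true + rng.normal(0, np.sqrt(r_meas), steps)               # noisy measurements

q_bank = np.array([0.001, 0.05, 1.0])        # candidate process-noise variances
x = np.zeros_like(q_bank)                    # per-filter state estimates
P = np.ones_like(q_bank)                     # per-filter variances
w = np.full(q_bank.size, 1.0 / q_bank.size)  # model weights

fused = np.empty(steps)
for k in range(steps):
    # predict (identity dynamics, differing process noise)
    P_pred = P + q_bank
    # measurement likelihood of z[k] under each model
    S = P_pred + r_meas
    lik = np.exp(-0.5 * (z[k] - x) ** 2 / S) / np.sqrt(2 * np.pi * S)
    w = w * lik
    w /= w.sum()
    # update each filter
    K = P_pred / S
    x = x + K * (z[k] - x)
    P = (1 - K) * P_pred
    fused[k] = np.dot(w, x)

print("final model weights:", np.round(w, 3))
print("RMS error of fused estimate:", round(float(np.sqrt(np.mean((fused - x_true) ** 2))), 3))
```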

  8. Towards Accurate Modelling of Galaxy Clustering on Small Scales: Testing the Standard ΛCDM + Halo Model

    NASA Astrophysics Data System (ADS)

    Sinha, Manodeep; Berlind, Andreas A.; McBride, Cameron K.; Scoccimarro, Roman; Piscionere, Jennifer A.; Wibking, Benjamin D.

    2018-04-01

    Interpreting the small-scale clustering of galaxies with halo models can elucidate the connection between galaxies and dark matter halos. Unfortunately, the modelling is typically not sufficiently accurate for ruling out models statistically. It is thus difficult to use the information encoded in small scales to test cosmological models or probe subtle features of the galaxy-halo connection. In this paper, we attempt to push halo modelling into the "accurate" regime with a fully numerical mock-based methodology and careful treatment of statistical and systematic errors. With our forward-modelling approach, we can incorporate clustering statistics beyond the traditional two-point statistics. We use this modelling methodology to test the standard ΛCDM + halo model against the clustering of SDSS DR7 galaxies. Specifically, we use the projected correlation function, group multiplicity function and galaxy number density as constraints. We find that while the model fits each statistic separately, it struggles to fit them simultaneously. Adding group statistics leads to a more stringent test of the model and significantly tighter constraints on model parameters. We explore the impact of varying the adopted halo definition and cosmological model and find that changing the cosmology makes a significant difference. The most successful model we tried (Planck cosmology with Mvir halos) matches the clustering of low luminosity galaxies, but exhibits a 2.3σ tension with the clustering of luminous galaxies, thus providing evidence that the "standard" halo model needs to be extended. This work opens the door to adding interesting freedom to the halo model and including additional clustering statistics as constraints.

  9. Modelling Long Term Disability following Injury: Comparison of Three Approaches for Handling Multiple Injuries

    PubMed Central

    Gabbe, Belinda J.; Harrison, James E.; Lyons, Ronan A.; Jolley, Damien

    2011-01-01

    Background Injury is a leading cause of the global burden of disease (GBD). Estimates of non-fatal injury burden have been limited by a paucity of empirical outcomes data. This study aimed to (i) establish the 12-month disability associated with each GBD 2010 injury health state, and (ii) compare approaches to modelling the impact of multiple injury health states on disability as measured by the Glasgow Outcome Scale – Extended (GOS-E). Methods 12-month functional outcomes for 11,337 survivors to hospital discharge were drawn from the Victorian State Trauma Registry and the Victorian Orthopaedic Trauma Outcomes Registry. ICD-10 diagnosis codes were mapped to the GBD 2010 injury health states. Cases with a GOS-E score >6 were defined as “recovered.” A split dataset approach was used. Cases were randomly assigned to development or test datasets. Probability of recovery for each health state was calculated using the development dataset. Three logistic regression models were evaluated: a) additive, multivariable; b) “worst injury;” and c) multiplicative. Models were adjusted for age and comorbidity and investigated for discrimination and calibration. Findings A single injury health state was recorded for 46% of cases (1–16 health states per case). The additive (C-statistic 0.70, 95% CI: 0.69, 0.71) and “worst injury” (C-statistic 0.70; 95% CI: 0.68, 0.71) models demonstrated higher discrimination than the multiplicative (C-statistic 0.68; 95% CI: 0.67, 0.70) model. The additive and “worst injury” models demonstrated acceptable calibration. Conclusions The majority of patients survived with persisting disability at 12-months, highlighting the importance of improving estimates of non-fatal injury burden. Additive and “worst” injury models performed similarly. GBD 2010 injury states were moderately predictive of recovery 1-year post-injury. Further evaluation using additional measures of health status and functioning and comparison with the GBD 2010 disability weights will be needed to optimise injury states for future GBD studies. PMID:21984951

  10. Functional Regression Models for Epistasis Analysis of Multiple Quantitative Traits.

    PubMed

    Zhang, Futao; Xie, Dan; Liang, Meimei; Xiong, Momiao

    2016-04-01

    To date, most genetic analyses of phenotypes have focused on analyzing single traits or analyzing each phenotype independently. However, joint epistasis analysis of multiple complementary traits will increase statistical power and improve our understanding of the complicated genetic structure of complex diseases. Despite their importance in uncovering the genetic structure of complex traits, the statistical methods for identifying epistasis in multiple phenotypes remain fundamentally unexplored. To fill this gap, we formulate a test for interaction between two genes in multiple quantitative trait analysis as a multiple functional regression (MFRG) in which the genotype functions (genetic variant profiles) are defined as a function of the genomic position of the genetic variants. We use large-scale simulations to calculate Type I error rates for testing interaction between two genes with multiple phenotypes and to compare the power with multivariate pairwise interaction analysis and single trait interaction analysis by a single-variate functional regression model. To further evaluate performance, the MFRG for epistasis analysis is applied to five phenotypes of exome sequence data from the NHLBI's Exome Sequencing Project (ESP) to detect pleiotropic epistasis. A total of 267 pairs of genes that formed a genetic interaction network showed significant evidence of epistasis influencing five traits. The results demonstrate that the joint interaction analysis of multiple phenotypes has a much higher power to detect interaction than the interaction analysis of a single trait and may open a new direction to fully uncovering the genetic structure of multiple phenotypes.

  11. A Multiplicative Cascade Model for High-Resolution Space-Time Downscaling of Rainfall

    NASA Astrophysics Data System (ADS)

    Raut, Bhupendra A.; Seed, Alan W.; Reeder, Michael J.; Jakob, Christian

    2018-02-01

    Distributions of rainfall with the time and space resolutions of minutes and kilometers, respectively, are often needed to drive the hydrological models used in a range of engineering, environmental, and urban design applications. The work described here is the first step in constructing a model capable of downscaling rainfall to scales of minutes and kilometers from time and space resolutions of several hours and a hundred kilometers. A multiplicative random cascade model known as the Short-Term Ensemble Prediction System is run with parameters from the radar observations at Melbourne (Australia). The orographic effects are added through a multiplicative correction factor after the model is run. In the first set of model calculations, 112 significant rain events over Melbourne are simulated 100 times. Because of the stochastic nature of the cascade model, the simulations represent 100 possible realizations of the same rain event. The cascade model produces realistic spatial and temporal patterns of rainfall at 6 min and 1 km resolution (the resolution of the radar data), the statistical properties of which are in close agreement with observations. In the second set of calculations, the cascade model is run continuously for all days from January 2008 to August 2015 and the rainfall accumulations are compared at 12 locations in the greater Melbourne area. The statistical properties of the observations lie within the envelope of the 100 ensemble members. The model successfully reproduces the frequency distribution of the 6 min rainfall intensities, storm durations, interarrival times, and autocorrelation function.
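    A toy one-dimensional multiplicative cascade, illustrating the general mechanism rather than the exact STEPS formulation: a coarse accumulation is split repeatedly with mean-one random weights, preserving mass on average while generating fine-scale variability.

```python
# Toy 1-D multiplicative random cascade: a coarse rainfall accumulation is split
# repeatedly, with mass distributed by random weights whose mean is 1, producing
# fine-scale variability. The lognormal weight generator is a generic choice,
# not the exact STEPS formulation.
import numpy as np

rng = np.random.default_rng(7)

def cascade(coarse_amount, levels=8, branching=2, sigma=0.4):
    field = np.array([coarse_amount], dtype=float)
    for _ in range(levels):
        # each cell splits into `branching` children with mean-one lognormal weights
        weights = rng.lognormal(mean=-0.5 * sigma**2, sigma=sigma,
                                size=(field.size, branching))
        field = (field[:, None] * weights / branching).ravel()
    return field

fine = cascade(coarse_amount=12.0)   # e.g. 12 mm over the coarse interval
print("number of fine cells:", fine.size)
print("mass is preserved on average; this realization sums to:", round(fine.sum(), 2))
print("max/min fine-scale amount:", round(fine.max(), 3), round(fine.min(), 4))
```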

  12. Data Analysis Techniques for Physical Scientists

    NASA Astrophysics Data System (ADS)

    Pruneau, Claude A.

    2017-10-01

    Preface; How to read this book; 1. The scientific method; Part I. Foundation in Probability and Statistics: 2. Probability; 3. Probability models; 4. Classical inference I: estimators; 5. Classical inference II: optimization; 6. Classical inference III: confidence intervals and statistical tests; 7. Bayesian inference; Part II. Measurement Techniques: 8. Basic measurements; 9. Event reconstruction; 10. Correlation functions; 11. The multiple facets of correlation functions; 12. Data correction methods; Part III. Simulation Techniques: 13. Monte Carlo methods; 14. Collision and detector modeling; List of references; Index.

  13. Modified two-sources quantum statistical model and multiplicity fluctuation in the finite rapidity region

    NASA Astrophysics Data System (ADS)

    Ghosh, Dipak; Sarkar, Sharmila; Sen, Sanjib; Roy, Jaya

    1995-06-01

    In this paper the behavior of factorial moments with rapidity window size, which is usually explained in terms of "intermittency," has been interpreted by simple quantum statistical properties of the emitting system using the concept of a "modified two-source model" as recently proposed by Ghosh and Sarkar [Phys. Lett. B 278, 465 (1992)]. The analysis has been performed using our own data on 16Ag/Br and 24Ag/Br interactions in the few tens of GeV energy regime.

  14. A Multiple Group Measurement Model of Children's Reports of Parental Socioeconomic Status. Discussion Papers No. 531-78.

    ERIC Educational Resources Information Center

    Mare, Robert D.; Mason, William M.

    An important class of applications of measurement error or constrained factor analytic models consists of comparing models for several populations. In such cases, it is appropriate to make explicit statistical tests of model similarity across groups and to constrain some parameters of the models to be equal across groups using a priori substantive…

  15. Modeling Success: Using Preenrollment Data to Identify Academically At-Risk Students

    ERIC Educational Resources Information Center

    Gansemer-Topf, Ann M.; Compton, Jonathan; Wohlgemuth, Darin; Forbes, Greg; Ralston, Ekaterina

    2015-01-01

    Improving student success and degree completion is one of the core principles of strategic enrollment management. To address this principle, institutional data were used to develop a statistical model to identify academically at-risk students. The model employs multiple linear regression techniques to predict students at risk of earning below a…

  16. Microgenetic Patterns of Children's Multiplication Learning: Confirming the Overlapping Waves Model by Latent Growth Modeling

    ERIC Educational Resources Information Center

    van der Ven, Sanne H. G.; Boom, Jan; Kroesbergen, Evelyn H.; Leseman, Paul P. M.

    2012-01-01

    Variability in strategy selection is an important characteristic of learning new skills such as mathematical skills. Strategies gradually come and go during this development. In 1996, Siegler described this phenomenon as ''overlapping waves.'' In the current microgenetic study, we attempted to model these overlapping waves statistically. In…

  17. Hidden Markov models of biological primary sequence information.

    PubMed Central

    Baldi, P; Chauvin, Y; Hunkapiller, T; McClure, M A

    1994-01-01

    Hidden Markov model (HMM) techniques are used to model families of biological sequences. A smooth and convergent algorithm is introduced to iteratively adapt the transition and emission parameters of the models from the examples in a given family. The HMM approach is applied to three protein families: globins, immunoglobulins, and kinases. In all cases, the models derived capture the important statistical characteristics of the family and can be used for a number of tasks, including multiple alignments, motif detection, and classification. For K sequences of average length N, this approach yields an effective multiple-alignment algorithm which requires O(KN²) operations, linear in the number of sequences. PMID:8302831
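
    Since the record describes scoring sequences against a trained HMM, a minimal sketch of the scaled forward algorithm for a discrete-emission HMM is given below. The two-state model, alphabet size and all probabilities are invented toy numbers, not parameters from the globin, immunoglobulin or kinase models.

```python
import numpy as np

def hmm_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the numerically stable (scaled) forward algorithm.
    obs : sequence of symbol indices
    pi  : (S,)   initial state probabilities
    A   : (S, S) transition matrix, A[i, j] = P(state j | state i)
    B   : (S, M) emission matrix,   B[i, k] = P(symbol k | state i)
    """
    alpha = pi * B[:, obs[0]]
    loglik = 0.0
    for t in range(1, len(obs)):
        c = alpha.sum()                       # scaling constant = P(o_t-1 | history)
        loglik += np.log(c)
        alpha = (alpha / c) @ A * B[:, obs[t]]
    return loglik + np.log(alpha.sum())

# Toy 2-state model over a 4-letter alphabet (hypothetical numbers).
pi = np.array([0.6, 0.4])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.4, 0.3, 0.2, 0.1], [0.1, 0.2, 0.3, 0.4]])
print(hmm_loglik([0, 1, 3, 2, 0], pi, A, B))
```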

  18. Modeling Rabbit Responses to Single and Multiple Aerosol ...

    EPA Pesticide Factsheets

    Survival models are developed here to predict response and time-to-response for mortality in rabbits following exposures to single or multiple aerosol doses of Bacillus anthracis spores. Hazard function models were developed for a multiple dose dataset to predict the probability of death through specifying dose-response functions and the time between exposure and the time-to-death (TTD). Among the models developed, the best-fitting survival model (baseline model) has an exponential dose-response model with a Weibull TTD distribution. Alternative models assessed employ different underlying dose-response functions and use the assumption that, in a multiple dose scenario, earlier doses affect the hazard functions of each subsequent dose. In addition, published mechanistic models are analyzed and compared with models developed in this paper. None of the alternative models that were assessed provided a statistically significant improvement in fit over the baseline model. The general approach utilizes simple empirical data analysis to develop parsimonious models with limited reliance on mechanistic assumptions. The baseline model predicts TTDs consistent with reported results from three independent high-dose rabbit datasets. More accurate survival models depend upon future development of dose-response datasets specifically designed to assess potential multiple dose effects on response and time-to-response. The process used in this paper to dev
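
    A hedged sketch of the kind of baseline model described above (an exponential dose-response combined with a Weibull time-to-death distribution for responders) is shown below. The rate constant and the Weibull shape and scale are hypothetical placeholders, not the fitted rabbit parameters.

```python
import numpy as np

def p_death(dose, k):
    """Exponential dose-response: probability of eventual death at a given dose."""
    return 1.0 - np.exp(-k * dose)

def survival(t, dose, k, shape, scale):
    """Probability of surviving beyond time t (days) after a single dose,
    assuming responders' times-to-death follow a Weibull distribution."""
    f_ttd = 1.0 - np.exp(-(t / scale) ** shape)   # Weibull CDF of time to death
    return 1.0 - p_death(dose, k) * f_ttd

# Hypothetical parameters: k per spore, Weibull shape/scale in days.
k, shape, scale = 5e-6, 1.8, 4.0
for t in (2, 4, 8, 16):
    print(t, survival(t, dose=2e5, k=k, shape=shape, scale=scale))
```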

  19. Effects of preprocessing Landsat MSS data on derived features

    NASA Technical Reports Server (NTRS)

    Parris, T. M.; Cicone, R. C.

    1983-01-01

    Important to the use of multitemporal Landsat MSS data for earth resources monitoring, such as agricultural inventories, is the ability to minimize the effects of varying atmospheric and satellite viewing conditions while extracting physically meaningful features from the data. In general, approaches to the preprocessing problem have been derived from either physical or statistical models. This paper compares three proposed algorithms: XSTAR haze correction, Color Normalization, and Multiple Acquisition Mean Level Adjustment. These techniques represent physical, statistical, and hybrid physical-statistical models, respectively. The comparisons are made in the context of three feature extraction techniques: the Tasseled Cap, the Cate Color Cube, and the Normalized Difference.

  20. Constructing and Modifying Sequence Statistics for relevent Using informR in R

    PubMed Central

    Marcum, Christopher Steven; Butts, Carter T.

    2015-01-01

    The informR package greatly simplifies the analysis of complex event histories in R by providing user-friendly tools to build sufficient statistics for the relevent package. Historically, building sufficient statistics to model event sequences (of the form a→b) using the egocentric generalization of Butts’ (2008) relational event framework for modeling social action has been cumbersome. The informR package simplifies the construction of the complex list of arrays needed by the rem() model-fitting function for a variety of cases involving egocentric event data, multiple event types, and/or support constraints. This paper introduces these tools using examples from real data extracted from the American Time Use Survey. PMID:26185488

  1. A retrospective likelihood approach for efficient integration of multiple omics factors in case-control association studies.

    PubMed

    Balliu, Brunilda; Tsonaka, Roula; Boehringer, Stefan; Houwing-Duistermaat, Jeanine

    2015-03-01

    Integrative omics, the joint analysis of outcome and multiple types of omics data, such as genomics, epigenomics, and transcriptomics data, constitutes a promising approach for powerful and biologically relevant association studies. These studies often employ a case-control design and often include nonomics covariates, such as age and gender, that may modify the underlying omics risk factors. An open question is how to best integrate multiple omics and nonomics information to maximize statistical power in case-control studies that ascertain individuals based on the phenotype. Recent work on integrative omics has used prospective approaches, modeling case-control status conditional on omics and nonomics risk factors. Compared to univariate approaches, jointly analyzing multiple risk factors with a prospective approach increases power in nonascertained cohorts. However, these prospective approaches often lose power in case-control studies. In this article, we propose a novel statistical method for integrating multiple omics and nonomics factors in case-control association studies. Our method is based on a retrospective likelihood function that models the joint distribution of omics and nonomics factors conditional on case-control status. The new method provides accurate control of the Type I error rate and has increased efficiency over prospective approaches in both simulated and real data. © 2015 Wiley Periodicals, Inc.

  2. Modelling multiple sources of dissemination bias in meta-analysis.

    PubMed

    Bowden, Jack; Jackson, Dan; Thompson, Simon G

    2010-03-30

    Asymmetry in the funnel plot for a meta-analysis suggests the presence of dissemination bias. This may be caused by publication bias through the decisions of journal editors, by selective reporting of research results by authors or by a combination of both. Typically, study results that are statistically significant or have larger estimated effect sizes are more likely to appear in the published literature, hence giving a biased picture of the evidence-base. Previous statistical approaches for addressing dissemination bias have assumed only a single selection mechanism. Here we consider a more realistic scenario in which multiple dissemination processes, involving both the publishing authors and journals, are operating. In practical applications, the methods can be used to provide sensitivity analyses for the potential effects of multiple dissemination biases operating in meta-analysis.

  3. Waste generated in high-rise buildings construction: a quantification model based on statistical multiple regression.

    PubMed

    Parisi Kern, Andrea; Ferreira Dias, Michele; Piva Kulakowski, Marlova; Paulo Gomes, Luciana

    2015-05-01

    Reducing construction waste is becoming a key environmental issue in the construction industry. The quantification of waste generation rates in the construction sector is an invaluable management tool in supporting mitigation actions. However, the quantification of waste can be a difficult process because of the specific characteristics and the wide range of materials used in different construction projects. Large variations are observed in the methods used to predict the amount of waste generated because of the range of variables involved in construction processes and the different contexts in which these methods are employed. This paper proposes a statistical model to determine the amount of waste generated in the construction of high-rise buildings by assessing the influence of the design process and the production system, often mentioned as the major culprits behind the generation of waste in construction. Multiple regression was used to conduct a case study based on multiple sources of data from eighteen residential buildings. The resulting statistical model relates the dependent variable (i.e. the amount of waste generated) to independent variables associated with the design and the production system used. The best regression model obtained from the sample data resulted in an adjusted R² value of 0.694, meaning that it explains approximately 69% of the variance in waste generation for similar construction projects. Most independent variables showed a low determination coefficient when assessed in isolation, which emphasizes the importance of assessing their joint influence on the response (dependent) variable. Copyright © 2015 Elsevier Ltd. All rights reserved.
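
    To make the regression step concrete, the sketch below fits an ordinary least-squares model and reports the adjusted R², analogous to the 0.694 quoted in the abstract. The predictor names and the synthetic data for 18 buildings are assumptions for illustration only, not the study's variables.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 18                                     # mirrors the 18 buildings in the study (data are synthetic)
df = pd.DataFrame({
    "built_area_m2": rng.uniform(5e3, 3e4, n),
    "n_floors":      rng.integers(8, 30, n),
    "masonry_ratio": rng.uniform(0.2, 0.8, n),
})
# Synthetic response: waste volume (m3) as a noisy linear function of the predictors.
df["waste_m3"] = (0.02 * df["built_area_m2"] + 15 * df["n_floors"]
                  + 400 * df["masonry_ratio"] + rng.normal(0, 80, n))

X = sm.add_constant(df[["built_area_m2", "n_floors", "masonry_ratio"]])
fit = sm.OLS(df["waste_m3"], X).fit()
print(fit.params)
print("adjusted R2:", fit.rsquared_adj)    # analogue of the 0.694 reported in the paper
```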

  4. Quantifying geological uncertainty for flow and transport modeling in multi-modal heterogeneous formations

    NASA Astrophysics Data System (ADS)

    Feyen, Luc; Caers, Jef

    2006-06-01

    In this work, we address the problem of characterizing the heterogeneity and uncertainty of hydraulic properties for complex geological settings. Hereby, we distinguish between two scales of heterogeneity, namely the hydrofacies structure and the intrafacies variability of the hydraulic properties. We employ multiple-point geostatistics to characterize the hydrofacies architecture. The multiple-point statistics are borrowed from a training image that is designed to reflect the prior geological conceptualization. The intrafacies variability of the hydraulic properties is represented using conventional two-point correlation methods, more precisely, spatial covariance models under a multi-Gaussian spatial law. We address the different levels and sources of uncertainty in characterizing the subsurface heterogeneity, and explore their effect on groundwater flow and transport predictions. Typically, uncertainty is assessed by way of many images, termed realizations, of a fixed statistical model. However, in many cases, sampling from a fixed stochastic model does not adequately represent the space of uncertainty. It neglects the uncertainty related to the selection of the stochastic model and the estimation of its input parameters. We acknowledge the uncertainty inherent in the definition of the prior conceptual model of aquifer architecture and in the estimation of global statistics, anisotropy, and correlation scales. Spatial bootstrap is used to assess the uncertainty of the unknown statistical parameters. As an illustrative example, we employ a synthetic field that represents a fluvial setting consisting of an interconnected network of channel sands embedded within finer-grained floodplain material. For this highly non-stationary setting we quantify the groundwater flow and transport model prediction uncertainty for various levels of hydrogeological uncertainty. Results indicate the importance of accurately describing the facies geometry, especially for transport predictions.

  5. Economic Impacts of Infrastructure Damages on Industrial Sector

    NASA Astrophysics Data System (ADS)

    Kajitani, Yoshio

    This paper proposes a basic model for evaluating the economic impacts on industrial sectors when multiple infrastructures are simultaneously damaged during earthquake disasters. In particular, focusing on economic data available at the smallest spatial scale in Japan (small area statistics), an economic loss estimation model based on these statistics and its applicability are investigated. In detail, a loss estimation framework utilizing survey results on firms' activities under electricity, water, and gas disruptions, together with route choice models from transportation engineering, is applied to the case of the 2004 Mid-Niigata Earthquake.

  6. A novel statistical method for quantitative comparison of multiple ChIP-seq datasets.

    PubMed

    Chen, Li; Wang, Chi; Qin, Zhaohui S; Wu, Hao

    2015-06-15

    ChIP-seq is a powerful technology to measure protein binding or histone modification strength on a whole-genome scale. Although a number of methods are available for single ChIP-seq data analysis (e.g. 'peak detection'), rigorous statistical methods for quantitative comparison of multiple ChIP-seq datasets that account for data from control experiments, signal-to-noise ratios, biological variation and multiple-factor experimental designs are underdeveloped. In this work, we develop a statistical method to perform quantitative comparison of multiple ChIP-seq datasets and detect genomic regions showing differential protein binding or histone modification. We first detect peaks from all datasets and then take their union to form a single set of candidate regions. The read counts from the IP experiment at the candidate regions are assumed to follow a Poisson distribution. The underlying Poisson rates are modeled as an experiment-specific function of artifacts and biological signals. We then obtain the estimated biological signals and compare them through a hypothesis testing procedure in a linear model framework. Simulations and real data analyses demonstrate that the proposed method provides more accurate and robust results compared with existing ones. An R software package ChIPComp is freely available at http://web1.sph.emory.edu/users/hwu30/software/ChIPComp.html. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
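
    The ChIPComp package itself is an R/Bioconductor tool; as a hedged illustration of the underlying idea only, the sketch below fits a Poisson log-linear model to IP counts at a single candidate region with a library-size offset and tests the condition effect. All counts and depths are invented, and the model omits the control-experiment and artifact terms of the actual method.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Toy counts at one candidate region: 3 IP replicates per condition (hypothetical numbers).
df = pd.DataFrame({
    "count":     [85, 92, 78, 150, 141, 163],                  # IP read counts
    "condition": [0, 0, 0, 1, 1, 1],                           # 0 = reference, 1 = comparison
    "depth":     [1.0e7, 1.2e7, 0.9e7, 1.1e7, 1.0e7, 1.3e7],   # library sizes
})

# Poisson log-linear model with a log library-size offset; the condition
# coefficient estimates the log fold-change in binding at this region.
fit = smf.glm("count ~ condition", data=df,
              family=sm.families.Poisson(),
              offset=np.log(df["depth"])).fit()
print(fit.summary())
print("Wald p-value for differential binding:", fit.pvalues["condition"])
```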

  7. Discovering human germ cell mutagens with whole genome sequencing: Insights from power calculations reveal the importance of controlling for between-family variability.

    PubMed

    Webster, R J; Williams, A; Marchetti, F; Yauk, C L

    2018-07-01

    Mutations in germ cells pose potential genetic risks to offspring. However, de novo mutations are rare events that are spread across the genome and are difficult to detect. Thus, studies in this area have generally been underpowered, and no human germ cell mutagen has been identified. Whole Genome Sequencing (WGS) of human pedigrees has been proposed as an approach to overcome these technical and statistical challenges. WGS enables analysis of a much wider breadth of the genome than traditional approaches. Here, we performed power analyses to determine the feasibility of using WGS in human families to identify germ cell mutagens. Different statistical models were compared in the power analyses (ANOVA and multiple regression for one-child families, and mixed-effects models sampling two to four siblings per family). Assumptions were made based on parameters from the existing literature, such as the mutation-by-paternal-age effect. We explored two scenarios: a constant effect due to an exposure that occurred in the past, and an accumulating effect where the exposure is continuing. Our analysis revealed the importance of modeling inter-family variability of the mutation-by-paternal-age effect. Statistical power was improved by models accounting for the family-to-family variability. Our power analyses suggest that sufficient statistical power can be attained with 4-28 four-sibling families per treatment group, when the increase in mutations ranges from 40% down to 10%, respectively. Modeling family variability using mixed-effects models provided a reduction in sample size compared to a multiple regression approach. Much larger sample sizes were required to detect an interaction effect between environmental exposures and paternal age. These findings inform study design and statistical modeling approaches to improve power and reduce sequencing costs for future studies in this area. Crown Copyright © 2018. Published by Elsevier B.V. All rights reserved.
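
    A minimal simulation-based power calculation in the spirit of the study is sketched below, assuming a Gaussian linear mixed model with a random intercept per family rather than the exact models compared in the paper (which also model family-specific paternal-age slopes). Family counts, effect sizes and variance components are arbitrary placeholders.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def simulate_power(n_families=20, n_sibs=4, effect=12.0, n_sim=100, seed=0):
    """Monte Carlo power for detecting an exposure effect on de novo mutation
    counts, fitting a linear mixed model with a random family intercept
    (a Gaussian simplification of the models compared in the paper)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sim):
        rows = []
        for fam in range(n_families):
            exposed = int(fam < n_families // 2)       # half the families exposed
            fam_eff = rng.normal(0, 6)                 # between-family variability
            for _ in range(n_sibs):
                age = rng.uniform(20, 45)              # paternal age at conception
                dnm = 1.5 * age + fam_eff + effect * exposed + rng.normal(0, 8)
                rows.append({"fam": fam, "exposed": exposed, "age": age, "dnm": dnm})
        df = pd.DataFrame(rows)
        fit = smf.mixedlm("dnm ~ age + exposed", df, groups=df["fam"]).fit()
        hits += int(fit.pvalues["exposed"] < 0.05)
    return hits / n_sim

print(simulate_power())    # proportion of simulations detecting the exposure effect
```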

  8. Experimental design matters for statistical analysis: how to handle blocking.

    PubMed

    Jensen, Signe M; Schaarschmidt, Frank; Onofri, Andrea; Ritz, Christian

    2018-03-01

    Nowadays, evaluation of the effects of pesticides often relies on experimental designs that involve multiple concentrations of the pesticide of interest or multiple pesticides at specific comparable concentrations and, possibly, secondary factors of interest. Unfortunately, the experimental design is often more or less neglected when analysing data. Two data examples were analysed using different modelling strategies. First, in a randomized complete block design, mean heights of maize treated with a herbicide and one of several adjuvants were compared. Second, translocation of an insecticide applied to maize as a seed treatment was evaluated using incomplete data from an unbalanced design with several layers of hierarchical sampling. Extensive simulations were carried out to further substantiate the effects of different modelling strategies. It was shown that results from suboptimal approaches (two-sample t-tests and ordinary ANOVA assuming independent observations) may be both quantitatively and qualitatively different from the results obtained using an appropriate linear mixed model. The simulations demonstrated that the different approaches may lead to differences in coverage percentages of confidence intervals and type 1 error rates, confirming that misleading conclusions can easily happen when an inappropriate statistical approach is chosen. To ensure that experimental data are summarized appropriately, avoiding misleading conclusions, the experimental design should duly be reflected in the choice of statistical approaches and models. We recommend that author guidelines should explicitly point out that authors need to indicate how the statistical analysis reflects the experimental design. © 2017 Society of Chemical Industry.
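
    The contrast between a naive two-sample t-test and a linear mixed model that respects blocking can be reproduced on synthetic data as sketched below. Block and error variances, sample sizes and the treatment effect are assumptions chosen only to make the difference visible; this is not the authors' dataset.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(1)
blocks, reps = 8, 2
block_effect = rng.normal(0, 4, blocks)            # strong block-to-block variation
rows = []
for b in range(blocks):
    for trt in (0, 1):
        for _ in range(reps):
            height = 100 + 5 * trt + block_effect[b] + rng.normal(0, 2)
            rows.append({"block": b, "treatment": trt, "height": height})
df = pd.DataFrame(rows)

# Naive two-sample t-test ignores the blocking structure.
t, p = stats.ttest_ind(df.loc[df.treatment == 1, "height"],
                       df.loc[df.treatment == 0, "height"])
print("naive t-test p-value:", p)

# Linear mixed model with a random intercept per block reflects the design.
mixed = smf.mixedlm("height ~ treatment", df, groups=df["block"]).fit()
print(mixed.summary())
```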

  9. Robustness of statistical tests for multiplicative terms in the additive main effects and multiplicative interaction model for cultivar trials.

    PubMed

    Piepho, H P

    1995-03-01

    The additive main effects and multiplicative interaction model is frequently used in the analysis of multilocation trials. In the analysis of such data it is of interest to decide how many of the multiplicative interaction terms are significant. Several tests for this task are available, all of which assume that errors are normally distributed with a common variance. This paper investigates the robustness of several tests (Gollob, F_GH1, F_GH2, F_R) to departures from these assumptions. It is concluded that, because of its better robustness, the F_R test is preferable. If the other tests are to be used, preliminary tests for the validity of assumptions should be performed.

  10. Accounting for multiple sources of uncertainty in impact assessments: The example of the BRACE study

    NASA Astrophysics Data System (ADS)

    O'Neill, B. C.

    2015-12-01

    Assessing climate change impacts often requires the use of multiple scenarios, types of models, and data sources, leading to a large number of potential sources of uncertainty. For example, a single study might require a choice of a forcing scenario, climate model, bias correction and/or downscaling method, societal development scenario, model (typically several) for quantifying elements of societal development such as economic and population growth, biophysical model (such as for crop yields or hydrology), and societal impact model (e.g. economic or health model). Some sources of uncertainty are reduced or eliminated by the framing of the question. For example, it may be useful to ask what an impact outcome would be conditional on a given societal development pathway, forcing scenario, or policy. However many sources of uncertainty remain, and it is rare for all or even most of these sources to be accounted for. I use the example of a recent integrated project on the Benefits of Reduced Anthropogenic Climate changE (BRACE) to explore useful approaches to uncertainty across multiple components of an impact assessment. BRACE comprises 23 papers that assess the differences in impacts between two alternative climate futures: those associated with Representative Concentration Pathways (RCPs) 4.5 and 8.5. It quantifies difference in impacts in terms of extreme events, health, agriculture, tropical cyclones, and sea level rise. Methodologically, it includes climate modeling, statistical analysis, integrated assessment modeling, and sector-specific impact modeling. It employs alternative scenarios of both radiative forcing and societal development, but generally uses a single climate model (CESM), partially accounting for climate uncertainty by drawing heavily on large initial condition ensembles. Strengths and weaknesses of the approach to uncertainty in BRACE are assessed. Options under consideration for improving the approach include the use of perturbed physics ensembles of CESM, employing results from multiple climate models, and combining the results from single impact models with statistical representations of uncertainty across multiple models. A key consideration is the relationship between the question being addressed and the uncertainty approach.

  11. Statistical methods for incomplete data: Some results on model misspecification.

    PubMed

    McIsaac, Michael; Cook, R J

    2017-02-01

    Inverse probability weighted estimating equations and multiple imputation are two of the most studied frameworks for dealing with incomplete data in clinical and epidemiological research. We examine the limiting behaviour of estimators arising from inverse probability weighted estimating equations, augmented inverse probability weighted estimating equations and multiple imputation when the requisite auxiliary models are misspecified. We compute limiting values for settings involving binary responses and covariates and illustrate the effects of model misspecification using simulations based on data from a breast cancer clinical trial. We demonstrate that, even when both auxiliary models are misspecified, the asymptotic biases of double-robust augmented inverse probability weighted estimators are often smaller than the asymptotic biases of estimators arising from complete-case analyses, inverse probability weighting or multiple imputation. We further demonstrate that use of inverse probability weighting or multiple imputation with slightly misspecified auxiliary models can actually result in greater asymptotic bias than the use of naïve, complete case analyses. These asymptotic results are shown to be consistent with empirical results from simulation studies.

  12. Maternal factors predicting cognitive and behavioral characteristics of children with fetal alcohol spectrum disorders.

    PubMed

    May, Philip A; Tabachnick, Barbara G; Gossage, J Phillip; Kalberg, Wendy O; Marais, Anna-Susan; Robinson, Luther K; Manning, Melanie A; Blankenship, Jason; Buckley, David; Hoyme, H Eugene; Adnams, Colleen M

    2013-06-01

    To provide an analysis of multiple predictors of cognitive and behavioral traits for children with fetal alcohol spectrum disorders (FASDs). Multivariate correlation techniques were used with maternal and child data from epidemiologic studies in a community in South Africa. Data on 561 first-grade children with fetal alcohol syndrome (FAS), partial FAS (PFAS), or no FASD, and their mothers, were analyzed by grouping 19 maternal variables into categories (physical, demographic, childbearing, and drinking) and using them in structural equation models (SEMs) to assess correlates of child intelligence (verbal and nonverbal) and behavior. A first SEM using only 7 maternal alcohol use variables to predict cognitive/behavioral traits was statistically significant (B = 3.10, p < .05) but explained only 17.3% of the variance. The second model incorporated multiple maternal variables and was statistically significant, explaining 55.3% of the variance. Significantly correlated with low intelligence and problem behavior were demographic characteristics (B = 3.83, p < .05) (low maternal education, low socioeconomic status [SES], and rural residence) and maternal physical characteristics (B = 2.70, p < .05) (short stature, small head circumference, and low weight). Childbearing history and alcohol use composites were not statistically significant in the final complex model and were overpowered by SES and maternal physical traits. Although other analytic techniques have amply demonstrated the negative effects of maternal drinking on intelligence and behavior, this highly controlled analysis of multiple maternal influences reveals that maternal demographics and physical traits make a significant enabling or disabling contribution to child functioning in FASD.

  13. Protein Structure Classification and Loop Modeling Using Multiple Ramachandran Distributions.

    PubMed

    Najibi, Seyed Morteza; Maadooliat, Mehdi; Zhou, Lan; Huang, Jianhua Z; Gao, Xin

    2017-01-01

    Recently, the study of protein structures using angular representations has attracted much attention among structural biologists. The main challenge is how to efficiently model the continuous conformational space of protein structures based on the differences and similarities between different Ramachandran plots. Despite the presence of statistical methods for modeling angular data of proteins, there is still a substantial need for more sophisticated and faster statistical tools to model large-scale circular datasets. To address this need, we have developed a nonparametric method for collective estimation of multiple bivariate density functions for a collection of populations of protein backbone angles. The proposed method takes into account the circular nature of the angular data using trigonometric splines, which are more efficient than existing methods. This collective density estimation approach is widely applicable when there is a need to estimate multiple density functions from different populations with common features. Moreover, the coefficients of the adaptive basis expansion for the fitted densities provide a low-dimensional representation that is useful for visualization, clustering, and classification of the densities. The proposed method provides a novel and unique perspective on two important and challenging problems in protein structure research: structure-based protein classification and angular-sampling-based protein loop structure prediction.

  14. Analysis and prediction of flow from local source in a river basin using a Neuro-fuzzy modeling tool.

    PubMed

    Aqil, Muhammad; Kita, Ichiro; Yano, Akira; Nishiyama, Soichi

    2007-10-01

    Traditionally, the multiple linear regression technique has been one of the most widely used models for simulating hydrological time series. However, when the nonlinear phenomenon is significant, multiple linear regression fails to develop an appropriate predictive model. Recently, neuro-fuzzy systems have gained much popularity for calibrating nonlinear relationships. This study evaluated the potential of a neuro-fuzzy system as an alternative to the traditional statistical regression technique for the purpose of predicting flow from a local source in a river basin. The effectiveness of the proposed identification technique was demonstrated through a simulation study of the river flow time series of the Citarum River in Indonesia. Furthermore, in order to quantify the uncertainty associated with the estimation of river flow, a Monte Carlo simulation was performed. As a comparison, a multiple linear regression analysis that was being used by the Citarum River Authority was also examined using various statistical indices. The simulation results using 95% confidence intervals indicated that the neuro-fuzzy model consistently underestimated the magnitude of high flow, while the low and medium flow magnitudes were estimated closer to the observed data. The comparison of the prediction accuracy of the neuro-fuzzy and linear regression methods indicated that the neuro-fuzzy approach was more accurate in predicting river flow dynamics. The neuro-fuzzy model improved the root mean square error (RMSE) and mean absolute percentage error (MAPE) values of the multiple linear regression forecasts by about 13.52% and 10.73%, respectively. Considering its simplicity and efficiency, the neuro-fuzzy model is recommended as an alternative tool for modeling flow dynamics in the study area.

  15. A Statistical Method for Synthesizing Mediation Analyses Using the Product of Coefficient Approach Across Multiple Trials

    PubMed Central

    Huang, Shi; MacKinnon, David P.; Perrino, Tatiana; Gallo, Carlos; Cruden, Gracelyn; Brown, C Hendricks

    2016-01-01

    Mediation analysis often requires larger sample sizes than main effect analysis to achieve the same statistical power. Combining results across similar trials may be the only practical option for increasing statistical power for mediation analysis in some situations. In this paper, we propose a method to estimate: 1) marginal means for mediation path a, the relation of the independent variable to the mediator; 2) marginal means for path b, the relation of the mediator to the outcome, across multiple trials; and 3) the between-trial level variance-covariance matrix based on a bivariate normal distribution. We present the statistical theory and an R computer program to combine regression coefficients from multiple trials to estimate a combined mediated effect and confidence interval under a random effects model. Values of coefficients a and b, along with their standard errors from each trial are the input for the method. This marginal likelihood based approach with Monte Carlo confidence intervals provides more accurate inference than the standard meta-analytic approach. We discuss computational issues, apply the method to two real-data examples and make recommendations for the use of the method in different settings. PMID:28239330
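
    A simplified sketch of combining per-trial a and b coefficients and forming a Monte Carlo confidence interval for the mediated effect a*b is given below. Unlike the random-effects marginal likelihood described in the record, this sketch uses fixed-effect inverse-variance weights, and the trial coefficients are hypothetical.

```python
import numpy as np

def combined_mediated_effect(a, se_a, b, se_b, n_mc=100_000, seed=0):
    """Pool per-trial a and b coefficients with inverse-variance weights and
    form a Monte Carlo confidence interval for the product a*b."""
    a, se_a, b, se_b = map(np.asarray, (a, se_a, b, se_b))
    wa, wb = 1 / se_a**2, 1 / se_b**2
    a_bar, b_bar = np.sum(wa * a) / wa.sum(), np.sum(wb * b) / wb.sum()
    se_abar, se_bbar = np.sqrt(1 / wa.sum()), np.sqrt(1 / wb.sum())
    rng = np.random.default_rng(seed)
    # Monte Carlo CI: sample the two pooled coefficients and take percentiles of the product.
    prod = rng.normal(a_bar, se_abar, n_mc) * rng.normal(b_bar, se_bbar, n_mc)
    lo, hi = np.percentile(prod, [2.5, 97.5])
    return a_bar * b_bar, (lo, hi)

# Hypothetical coefficients and standard errors from three trials.
effect, ci = combined_mediated_effect(a=[0.40, 0.35, 0.50], se_a=[0.10, 0.12, 0.09],
                                      b=[0.25, 0.30, 0.20], se_b=[0.08, 0.07, 0.10])
print(effect, ci)
```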

  16. Towards accurate modelling of galaxy clustering on small scales: testing the standard ΛCDM + halo model

    NASA Astrophysics Data System (ADS)

    Sinha, Manodeep; Berlind, Andreas A.; McBride, Cameron K.; Scoccimarro, Roman; Piscionere, Jennifer A.; Wibking, Benjamin D.

    2018-07-01

    Interpreting the small-scale clustering of galaxies with halo models can elucidate the connection between galaxies and dark matter haloes. Unfortunately, the modelling is typically not sufficiently accurate for ruling out models statistically. It is thus difficult to use the information encoded in small scales to test cosmological models or probe subtle features of the galaxy-halo connection. In this paper, we attempt to push halo modelling into the `accurate' regime with a fully numerical mock-based methodology and careful treatment of statistical and systematic errors. With our forward-modelling approach, we can incorporate clustering statistics beyond the traditional two-point statistics. We use this modelling methodology to test the standard Λ cold dark matter (ΛCDM) + halo model against the clustering of Sloan Digital Sky Survey (SDSS) seventh data release (DR7) galaxies. Specifically, we use the projected correlation function, group multiplicity function, and galaxy number density as constraints. We find that while the model fits each statistic separately, it struggles to fit them simultaneously. Adding group statistics leads to a more stringent test of the model and significantly tighter constraints on model parameters. We explore the impact of varying the adopted halo definition and cosmological model and find that changing the cosmology makes a significant difference. The most successful model we tried (Planck cosmology with Mvir haloes) matches the clustering of low-luminosity galaxies, but exhibits a 2.3σ tension with the clustering of luminous galaxies, thus providing evidence that the `standard' halo model needs to be extended. This work opens the door to adding interesting freedom to the halo model and including additional clustering statistics as constraints.

  17. Analysis of in vitro fertilization data with multiple outcomes using discrete time-to-event analysis

    PubMed Central

    Maity, Arnab; Williams, Paige; Ryan, Louise; Missmer, Stacey; Coull, Brent; Hauser, Russ

    2014-01-01

    In vitro fertilization (IVF) is an increasingly common method of assisted reproductive technology. Because of the careful observation and follow-up required as part of the procedure, IVF studies provide an ideal opportunity to identify and assess clinical and demographic factors, along with environmental exposures, that may impact successful reproduction. A major challenge in analyzing data from IVF studies is handling the complexity and multiplicity of outcomes, resulting both from multiple opportunities for pregnancy loss within a single IVF cycle and from multiple IVF cycles. To date, most evaluations of IVF studies do not make use of the full data because of their complex structure. In this paper, we develop statistical methodology for the analysis of IVF data with multiple cycles and possibly multiple failure types observed for each individual. We develop a general analysis framework based on a generalized linear modeling formulation that allows implementation of various types of models, including shared frailty models, failure-specific frailty models, and transitional models, using standard software. We apply our methodology to data from an IVF study conducted at the Brigham and Women’s Hospital, Massachusetts. We also summarize the performance of our proposed methods based on a simulation study. PMID:24317880
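
    One way to cast the discrete time-to-event idea in a generalized linear modelling form is a person-cycle data layout with a complementary log-log link, sketched below on invented records. This is not the authors' implementation, and it assumes a recent statsmodels release in which the CLogLog link class is available.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Toy person-period data: one row per attempted IVF cycle per woman, with a
# binary failure indicator for that cycle (hypothetical records).
df = pd.DataFrame({
    "woman":   [1, 1, 1, 2, 2, 3, 3, 3, 4],
    "cycle":   [1, 2, 3, 1, 2, 1, 2, 3, 1],
    "age":     [34, 34, 35, 29, 29, 40, 40, 41, 31],
    "failure": [1, 1, 0, 1, 0, 1, 1, 1, 0],   # 0 = success in this cycle
})

# Discrete-time hazard model: probability of failure in a cycle, given it was
# attempted, with a complementary log-log link (grouped-time proportional hazards).
fit = smf.glm("failure ~ C(cycle) + age", data=df,
              family=sm.families.Binomial(link=sm.families.links.CLogLog())).fit()
print(fit.summary())
```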

  18. Identifying the Factors That Influence Change in SEBD Using Logistic Regression Analysis

    ERIC Educational Resources Information Center

    Camilleri, Liberato; Cefai, Carmel

    2013-01-01

    Multiple linear regression and ANOVA models are widely used in applications since they provide effective statistical tools for assessing the relationship between a continuous dependent variable and several predictors. However these models rely heavily on linearity and normality assumptions and they do not accommodate categorical dependent…

  19. Statistical inference for Hardy-Weinberg proportions in the presence of missing genotype information.

    PubMed

    Graffelman, Jan; Sánchez, Milagros; Cook, Samantha; Moreno, Victor

    2013-01-01

    In genetic association studies, tests for Hardy-Weinberg proportions are often employed as a quality control checking procedure. Missing genotypes are typically discarded prior to testing. In this paper we show that inference for Hardy-Weinberg proportions can be biased when missing values are discarded. We propose to use multiple imputation of missing values in order to improve inference for Hardy-Weinberg proportions. For imputation we employ a multinomial logit model that uses information from allele intensities and/or neighbouring markers. Analysis of an empirical data set of single nucleotide polymorphisms possibly related to colon cancer reveals that missing genotypes are not missing completely at random. Deviation from Hardy-Weinberg proportions is mostly due to a lack of heterozygotes. Inbreeding coefficients estimated by multiple imputation of the missings are typically lowered with respect to inbreeding coefficients estimated by discarding the missings. Accounting for missings by multiple imputation qualitatively changed the results of 10 to 17% of the statistical tests performed. Estimates of inbreeding coefficients obtained by multiple imputation showed high correlation with estimates obtained by single imputation using an external reference panel. Our conclusion is that imputation of missing data leads to improved statistical inference for Hardy-Weinberg proportions.
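
    For reference, the complete-case portion of the workflow described above (a chi-square test for Hardy-Weinberg proportions plus the inbreeding coefficient) can be written as below. The multinomial-logit multiple imputation used in the paper is not reproduced here, and the genotype counts are hypothetical.

```python
import numpy as np
from scipy import stats

def hwe_chisq(n_aa, n_ab, n_bb):
    """Chi-square test for Hardy-Weinberg proportions from biallelic genotype
    counts, plus the inbreeding coefficient F = 1 - observed/expected heterozygosity."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)                 # allele frequency of A
    expected = np.array([n * p**2, 2 * n * p * (1 - p), n * (1 - p)**2])
    observed = np.array([n_aa, n_ab, n_bb])
    chi2 = np.sum((observed - expected) ** 2 / expected)
    pval = stats.chi2.sf(chi2, df=1)                # 1 df for a biallelic marker
    f_hat = 1.0 - n_ab / expected[1]                # inbreeding coefficient
    return chi2, pval, f_hat

# Complete-case counts for a hypothetical SNP; a heterozygote deficit inflates F.
print(hwe_chisq(n_aa=420, n_ab=310, n_bb=95))
```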

  20. Effect of genetic polymorphisms on development of gout.

    PubMed

    Urano, Wako; Taniguchi, Atsuo; Inoue, Eisuke; Sekita, Chieko; Ichikawa, Naomi; Koseki, Yumi; Kamatani, Naoyuki; Yamanaka, Hisashi

    2013-08-01

    The aim of this study was to validate the association between genetic polymorphisms and gout in Japanese patients, and to investigate the cumulative effects of multiple genetic factors on the development of gout. Subjects were 153 Japanese male patients with gout and 532 male controls. The genotypes of 11 polymorphisms in 10 genes that have been reported to be associated with serum uric acid levels or gout were determined. The cumulative effects of the genetic polymorphisms were investigated using a weighted genotype risk score (wGRS) based on the number of risk alleles and the OR for gout. A model to discriminate between patients with gout and controls was constructed by incorporating the wGRS and clinical factors. The C statistic was applied to evaluate the capability of the model to discriminate gout patients from controls. Seven polymorphisms were shown to be associated with gout. The mean wGRS was significantly higher in patients with gout (15.2 ± 2.01) compared to controls (13.4 ± 2.10; p < 0.0001). The C statistic for the model using genetic information alone was 0.72, while the C statistic was 0.81 for the full model that incorporated all genetic and clinical factors. Accumulation of multiple genetic factors is associated with the development of gout. A prediction model for gout that incorporates genetic and clinical factors may be useful for identifying individuals who are at risk of gout.
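
    A hedged sketch of the weighted genotype risk score and C statistic calculation is shown below on simulated genotypes. The allele frequencies, per-allele odds ratios and the BMI covariate are invented; only the case/control/SNP counts mirror the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
n_cases, n_controls, n_snps = 153, 532, 7           # counts mirror the study; data are simulated
log_or = rng.uniform(0.1, 0.6, n_snps)              # per-allele log odds ratios (hypothetical)
freq_ctrl = rng.uniform(0.2, 0.5, n_snps)           # control risk-allele frequencies (hypothetical)

def simulate(n, shift):
    p = np.clip(freq_ctrl + shift, 0.01, 0.99)
    return rng.binomial(2, p, size=(n, n_snps))      # risk-allele counts 0/1/2

geno = np.vstack([simulate(n_cases, 0.08), simulate(n_controls, 0.0)])
y = np.r_[np.ones(n_cases), np.zeros(n_controls)]

wgrs = geno @ log_or                                 # weighted genotype risk score
print("C statistic, genetic score only:", roc_auc_score(y, wgrs))

# Full model: wGRS plus a clinical covariate (here a simulated BMI) via logistic regression.
bmi = rng.normal(24, 3, y.size) + 1.5 * y
X = np.column_stack([wgrs, bmi])
clf = LogisticRegression().fit(X, y)
print("C statistic, full model:", roc_auc_score(y, clf.predict_proba(X)[:, 1]))
```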

  1. Quantifying the impact of between-study heterogeneity in multivariate meta-analyses

    PubMed Central

    Jackson, Dan; White, Ian R; Riley, Richard D

    2012-01-01

    Measures that quantify the impact of heterogeneity in univariate meta-analysis, including the very popular I² statistic, are now well established. Multivariate meta-analysis, where studies provide multiple outcomes that are pooled in a single analysis, is also becoming more commonly used. The question of how to quantify heterogeneity in the multivariate setting is therefore raised. It is the univariate R² statistic, the ratio of the variance of the estimated treatment effect under the random and fixed effects models, that generalises most naturally, so this statistic provides our basis and is used to derive a multivariate analogue of I². We also provide a multivariate H² statistic, the ratio of a generalisation of Cochran's heterogeneity statistic and its associated degrees of freedom, with an accompanying generalisation of the usual I² statistic. Our proposed heterogeneity statistics can be used alongside all the usual estimates and inferential procedures used in multivariate meta-analysis. We apply our methods to some real datasets and show how our statistics are equally appropriate in the context of multivariate meta-regression, where study-level covariate effects are included in the model. Our heterogeneity statistics may be used when applying any procedure for fitting the multivariate random effects model. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22763950
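
    As a point of reference for the multivariate generalisations discussed above, the univariate quantities (Cochran's Q, H² and I²) can be computed as below; the study estimates and variances are hypothetical.

```python
import numpy as np

def heterogeneity_stats(estimates, variances):
    """Univariate Cochran's Q, H^2 and I^2 for study estimates with known
    within-study variances (fixed-effect inverse-variance weights)."""
    y, v = np.asarray(estimates, float), np.asarray(variances, float)
    w = 1.0 / v
    mu = np.sum(w * y) / w.sum()                  # fixed-effect pooled estimate
    Q = np.sum(w * (y - mu) ** 2)                 # Cochran's heterogeneity statistic
    df = y.size - 1
    H2 = Q / df                                   # Q relative to its degrees of freedom
    I2 = max(0.0, (Q - df) / Q) if Q > 0 else 0.0 # proportion of variation due to heterogeneity
    return Q, H2, I2

# Hypothetical log odds ratios and within-study variances from five studies.
print(heterogeneity_stats([0.3, 0.5, 0.1, 0.7, 0.4], [0.02, 0.03, 0.05, 0.04, 0.02]))
```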

  2. Statistical Learning is Related to Early Literacy-Related Skills

    PubMed Central

    Spencer, Mercedes; Kaschak, Michael P.; Jones, John L.; Lonigan, Christopher J.

    2015-01-01

    It has been demonstrated that statistical learning, or the ability to use statistical information to learn the structure of one’s environment, plays a role in young children’s acquisition of linguistic knowledge. Although most research on statistical learning has focused on language acquisition processes, such as the segmentation of words from fluent speech and the learning of syntactic structure, some recent studies have explored the extent to which individual differences in statistical learning are related to literacy-relevant knowledge and skills. The present study extends on this literature by investigating the relations between two measures of statistical learning and multiple measures of skills that are critical to the development of literacy—oral language, vocabulary knowledge, and phonological processing—within a single model. Our sample included a total of 553 typically developing children from prekindergarten through second grade. Structural equation modeling revealed that statistical learning accounted for a unique portion of the variance in these literacy-related skills. Practical implications for instruction and assessment are discussed. PMID:26478658

  3. Coupled local facilitation and global hydrologic inhibition drive landscape geometry in a patterned peatland

    NASA Astrophysics Data System (ADS)

    Acharya, S.; Kaplan, D. A.; Casey, S.; Cohen, M. J.; Jawitz, J. W.

    2015-05-01

    Self-organized landscape patterning can arise in response to multiple processes. Discriminating among alternative patterning mechanisms, particularly where experimental manipulations are untenable, requires process-based models. Previous modeling studies have attributed patterning in the Everglades (Florida, USA) to sediment redistribution and anisotropic soil hydraulic properties. In this work, we tested an alternate theory, the self-organizing-canal (SOC) hypothesis, by developing a cellular automata model that simulates pattern evolution via local positive feedbacks (i.e., facilitation) coupled with a global negative feedback based on hydrology. The model is forced by global hydroperiod that drives stochastic transitions between two patch types: ridge (higher elevation) and slough (lower elevation). We evaluated model performance using multiple criteria based on six statistical and geostatistical properties observed in reference portions of the Everglades landscape: patch density, patch anisotropy, semivariogram ranges, power-law scaling of ridge areas, perimeter area fractal dimension, and characteristic pattern wavelength. Model results showed strong statistical agreement with reference landscapes, but only when anisotropically acting local facilitation was coupled with hydrologic global feedback, for which several plausible mechanisms exist. Critically, the model correctly generated fractal landscapes that had no characteristic pattern wavelength, supporting the invocation of global rather than scale-specific negative feedbacks.

  4. Coupled local facilitation and global hydrologic inhibition drive landscape geometry in a patterned peatland

    NASA Astrophysics Data System (ADS)

    Acharya, S.; Kaplan, D. A.; Casey, S.; Cohen, M. J.; Jawitz, J. W.

    2015-01-01

    Self-organized landscape patterning can arise in response to multiple processes. Discriminating among alternative patterning mechanisms, particularly where experimental manipulations are untenable, requires process-based models. Previous modeling studies have attributed patterning in the Everglades (Florida, USA) to sediment redistribution and anisotropic soil hydraulic properties. In this work, we tested an alternate theory, the self-organizing canal (SOC) hypothesis, by developing a cellular automata model that simulates pattern evolution via local positive feedbacks (i.e., facilitation) coupled with a global negative feedback based on hydrology. The model is forced by global hydroperiod that drives stochastic transitions between two patch types: ridge (higher elevation) and slough (lower elevation). We evaluated model performance using multiple criteria based on six statistical and geostatistical properties observed in reference portions of the Everglades landscape: patch density, patch anisotropy, semivariogram ranges, power-law scaling of ridge areas, perimeter area fractal dimension, and characteristic pattern wavelength. Model results showed strong statistical agreement with reference landscapes, but only when anisotropically acting local facilitation was coupled with hydrologic global feedback, for which several plausible mechanisms exist. Critically, the model correctly generated fractal landscapes that had no characteristic pattern wavelength, supporting the invocation of global rather than scale-specific negative feedbacks.

  5. A model-based approach to wildland fire reconstruction using sediment charcoal records

    USGS Publications Warehouse

    Itter, Malcolm S.; Finley, Andrew O.; Hooten, Mevin B.; Higuera, Philip E.; Marlon, Jennifer R.; Kelly, Ryan; McLachlan, Jason S.

    2017-01-01

    Lake sediment charcoal records are used in paleoecological analyses to reconstruct fire history, including the identification of past wildland fires. One challenge of applying sediment charcoal records to infer fire history is the separation of charcoal associated with local fire occurrence and charcoal originating from regional fire activity. Despite a variety of methods to identify local fires from sediment charcoal records, an integrated statistical framework for fire reconstruction is lacking. We develop a Bayesian point process model to estimate the probability of fire associated with charcoal counts from individual-lake sediments and estimate mean fire return intervals. A multivariate extension of the model combines records from multiple lakes to reduce uncertainty in local fire identification and estimate a regional mean fire return interval. The univariate and multivariate models are applied to 13 lakes in the Yukon Flats region of Alaska. Both models resulted in similar mean fire return intervals (100–350 years) with reduced uncertainty under the multivariate model due to improved estimation of regional charcoal deposition. The point process model offers an integrated statistical framework for paleofire reconstruction and extends existing methods to infer regional fire history from multiple lake records with uncertainty following directly from posterior distributions.

  6. Accounting for disease modifying therapy in models of clinical progression in multiple sclerosis.

    PubMed

    Healy, Brian C; Engler, David; Gholipour, Taha; Weiner, Howard; Bakshi, Rohit; Chitnis, Tanuja

    2011-04-15

    Identifying predictors of clinical progression in patients with relapsing-remitting multiple sclerosis (RRMS) is complicated in the era of disease modifying therapy (DMT) because patients follow many different DMT regimens. To investigate predictors of progression in a treated RRMS sample, a cohort of RRMS patients was prospectively followed in the Comprehensive Longitudinal Investigation of Multiple Sclerosis at the Brigham and Women's Hospital (CLIMB). Enrollment criteria were exposure to either interferon-β (IFN-β, n=164) or glatiramer acetate (GA, n=114) for at least 6 months prior to study entry. Baseline demographic and clinical features were used as candidate predictors of longitudinal clinical change on the Expanded Disability Status Scale (EDSS). We compared three approaches to account for DMT effects in statistical modeling. In all approaches, we analyzed all patients together and stratified based on baseline DMT. Model 1 used all available longitudinal EDSS scores, even those after on-study DMT changes. Model 2 used only clinical observations prior to changing DMT. Model 3 used causal statistical models to identify predictors of clinical change. When all patients were considered using Model 1, patients with a motor symptom as the first relapse had significantly larger change in EDSS scores during follow-up (p=0.04); none of the other clinical or demographic variables significantly predicted change. In Models 2 and 3, results were generally unchanged. DMT modeling choice had a modest impact on the variables classified as predictors of EDSS score change. Importantly, however, interpretation of these predictors is dependent upon modeling choice. Copyright © 2011 Elsevier B.V. All rights reserved.

  7. Application of Linear Mixed-Effects Models in Human Neuroscience Research: A Comparison with Pearson Correlation in Two Auditory Electrophysiology Studies

    PubMed Central

    Koerner, Tess K.; Zhang, Yang

    2017-01-01

    Neurophysiological studies are often designed to examine relationships between measures from different testing conditions, time points, or analysis techniques within the same group of participants. Appropriate statistical techniques that can take into account repeated measures and multivariate predictor variables are integral and essential to successful data analysis and interpretation. This work implements and compares conventional Pearson correlations and linear mixed-effects (LME) regression models using data from two recently published auditory electrophysiology studies. For the specific research questions in both studies, the Pearson correlation test is inappropriate for determining the strength of association between the behavioral responses for speech-in-noise recognition and the multiple neurophysiological measures, because the neural responses across listening conditions would simply be treated as independent measures. In contrast, the LME models allow a systematic approach to incorporate both fixed-effect and random-effect terms to deal with the categorical grouping factor of listening conditions, between-subject baseline differences in the multiple measures, and the correlational structure among the predictor variables. Together, the comparative data demonstrate the advantages of, as well as the necessity to apply, mixed-effects models to properly account for the built-in relationships among the multiple predictor variables, which has important implications for proper statistical modeling and interpretation of human behavior in terms of neural correlates and biomarkers. PMID:28264422

  8. The influence of the interactions between anthropogenic activities and multiple ecological factors on land surface temperatures of urban forests

    NASA Astrophysics Data System (ADS)

    Ren, Y.

    2017-12-01

    Context: The spatio-temporal distribution patterns of land surface temperature (LST) in urban forests are influenced by many ecological factors; identifying the interactions between these factors can improve simulations and predictions of the spatial patterns of urban cold islands. Such quantitative research requires an integrated method that combines multi-source data with spatial statistical analysis. Objectives: The purpose of this study was to clarify how interactions between anthropogenic activities and multiple ecological factors influence urban forest LST, using cluster analysis of hot and cold spots and the Geodetector model. We introduced the hypothesis that anthropogenic activity interacts with certain ecological factors and that their combination influences urban forest LST. We also assumed that the spatio-temporal distributions of urban forest LST should be similar to those of the ecological factors and can be represented quantitatively. Methods: We used Jinjiang, a representative city in China, as a case study. Population density was employed to represent anthropogenic activity. We built a multi-source dataset (forest inventory, digital elevation models (DEM), population, and remote sensing imagery) on a unified urban scale to support research on the interactions influencing urban forest LST. Through a combination of spatial statistical analysis results, multi-source spatial data, and the Geodetector model, the interaction mechanisms affecting urban forest LST were revealed. Results: Although different ecological factors have different influences on forest LST, in the two periods, which had different hot and cold spots, the patch area and dominant tree species were the main factors contributing to LST clustering in urban forests. The interaction between anthropogenic activity and multiple ecological factors increased LST in urban forest stands, both linearly and nonlinearly. Strong interactions between elevation and dominant species were generally observed and were prevalent in both hot-spot and cold-spot areas in different years. Conclusions: A combination of spatial statistics and Geodetector models should be effective for quantitatively evaluating the interactive relationships among ecological factors, anthropogenic activity, and LST.

  9. Development and Validation of a Statistical Shape Modeling-Based Finite Element Model of the Cervical Spine Under Low-Level Multiple Direction Loading Conditions

    PubMed Central

    Bredbenner, Todd L.; Eliason, Travis D.; Francis, W. Loren; McFarland, John M.; Merkle, Andrew C.; Nicolella, Daniel P.

    2014-01-01

    Cervical spinal injuries are a significant concern in all trauma injuries. Recent military conflicts have demonstrated the substantial risk of spinal injury for the modern warfighter. Finite element models used to investigate injury mechanisms often fail to examine the effects of variation in geometry or material properties on mechanical behavior. The goals of this study were to model geometric variation for a set of cervical spines, to extend this model to a parametric finite element model, and, as a first step, to validate the parametric model against experimental data for low-loading conditions. Individual finite element models were created using cervical spine (C3–T1) computed tomography data for five male cadavers. Statistical shape modeling (SSM) was used to generate a parametric finite element model incorporating variability of spine geometry, and soft-tissue material property variation was also included. The probabilistic loading response of the parametric model was determined under flexion-extension, axial rotation, and lateral bending and validated by comparison to experimental data. Based on qualitative and quantitative comparison of the experimental loading response and model simulations, we suggest that the model performs adequately under relatively low-level loading conditions in multiple loading directions. In conclusion, SSM methods coupled with finite element analyses within a probabilistic framework, along with the ability to statistically validate the overall model performance, provide innovative and important steps toward describing the differences in vertebral morphology, spinal curvature, and variation in material properties. We suggest that these methods, with additional investigation and validation under injurious loading conditions, will lead to understanding and mitigating the risks of injury in the spine and other musculoskeletal structures. PMID:25506051

  10. ASCS online fault detection and isolation based on an improved MPCA

    NASA Astrophysics Data System (ADS)

    Peng, Jianxin; Liu, Haiou; Hu, Yuhui; Xi, Junqiang; Chen, Huiyan

    2014-09-01

    Multi-way principal component analysis (MPCA) has received considerable attention and been widely used in process monitoring. A traditional MPCA algorithm unfolds multiple batches of historical data into a two-dimensional matrix and cuts the matrix along the time axis to form subspaces. However, low efficiency of the subspaces and difficult fault isolation are common disadvantages of the principal component model. This paper presents a new subspace construction method based on a kernel density estimation function that can effectively reduce the storage requirements of the subspace information. The MPCA model and the knowledge base are built on the new subspace. Then, fault detection and isolation with the squared prediction error (SPE) statistic and the Hotelling T² statistic are realized in process monitoring. When a fault occurs, fault isolation based on the SPE statistic is achieved by residual contribution analysis of the different variables. For fault isolation of a subspace based on the T² statistic, the relationship between the statistical indicator and the state variables is constructed, and constraint conditions are presented to check the validity of the fault isolation. Then, to improve the robustness of fault isolation to unexpected disturbances, a statistical method is adopted to relate single subspaces to multiple subspaces and thereby increase the rate of correct fault isolation. Finally, fault detection and isolation based on the improved MPCA are used to monitor the automatic shift control system (ASCS) and demonstrate the correctness and effectiveness of the algorithm. The research proposes a new subspace construction method that reduces the required storage capacity and improves the robustness of the principal component model, and establishes the relationship between the state variables and the fault detection indicators for fault isolation.
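
    A compact sketch of the two monitoring statistics named in the record, computed from an ordinary (non-multiway) PCA model, is given below. The training data, fault magnitude and number of retained components are arbitrary, and the kernel-density subspace construction of the paper is not reproduced.

```python
import numpy as np

def pca_monitor(X_train, X_new, n_pc=2):
    """Fit a PCA monitoring model on normal operating data and return the
    Hotelling T^2 and squared prediction error (SPE/Q) statistics for new samples."""
    mu, sd = X_train.mean(0), X_train.std(0)
    Z = (X_train - mu) / sd
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_pc].T                                # loadings of retained components
    lam = (s[:n_pc] ** 2) / (Z.shape[0] - 1)       # variances of retained scores

    Zn = (X_new - mu) / sd
    scores = Zn @ P
    T2 = np.sum(scores ** 2 / lam, axis=1)         # Hotelling T^2 in the model subspace
    resid = Zn - scores @ P.T
    SPE = np.sum(resid ** 2, axis=1)               # squared prediction error (residual subspace)
    return T2, SPE

rng = np.random.default_rng(3)
normal = rng.normal(size=(200, 5))                 # normal operating data, 5 sensors
faulty = normal[:5] + np.array([0, 0, 4, 0, 0])    # hypothetical bias on the third sensor
print(pca_monitor(normal, faulty, n_pc=2))
```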

  11. Modified two-sources quantum statistical model and multiplicity fluctuation in the finite rapidity region

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ghosh, D.; Sarkar, S.; Sen, S.

    1995-06-01

    In this paper the behavior of factorial moments with rapidity window size, which is usually explained in terms of "intermittency," has been interpreted through simple quantum statistical properties of the emitting system using the concept of the "modified two-source model" as recently proposed by Ghosh and Sarkar [Phys. Lett. B 278, 465 (1992)]. The analysis has been performed using our own data on ¹⁶O-Ag/Br and ²⁴Mg-Ag/Br interactions in the few tens of GeV energy regime.

  12. Statistical sensor fusion analysis of near-IR polarimetric and thermal imagery for the detection of minelike targets

    NASA Astrophysics Data System (ADS)

    Weisenseel, Robert A.; Karl, William C.; Castanon, David A.; DiMarzio, Charles A.

    1999-02-01

    We present an analysis of statistical model based data-level fusion for near-IR polarimetric and thermal data, particularly for the detection of mines and mine-like targets. Typical detection-level data fusion methods, approaches that fuse detections from individual sensors rather than fusing at the level of the raw data, do not account rationally for the relative reliability of different sensors, nor the redundancy often inherent in multiple sensors. Representative examples of such detection-level techniques include logical AND/OR operations on detections from individual sensors and majority vote methods. In this work, we exploit a statistical data model for the detection of mines and mine-like targets to compare and fuse multiple sensor channels. Our purpose is to quantify the amount of knowledge that each polarimetric or thermal channel supplies to the detection process. With this information, we can make reasonable decisions about the usefulness of each channel. We can use this information to improve the detection process, or we can use it to reduce the number of required channels.

  13. Modelling a real-world buried valley system with vertical non-stationarity using multiple-point statistics

    NASA Astrophysics Data System (ADS)

    He, Xiulan; Sonnenborg, Torben O.; Jørgensen, Flemming; Jensen, Karsten H.

    2017-03-01

    Stationarity has traditionally been a requirement of geostatistical simulations. A common way to deal with non-stationarity is to divide the system into stationary sub-regions and subsequently merge the realizations for each region. Recently, the so-called partition approach that has the flexibility to model non-stationary systems directly was developed for multiple-point statistics simulation (MPS). The objective of this study is to apply the MPS partition method with conventional borehole logs and high-resolution airborne electromagnetic (AEM) data, for simulation of a real-world non-stationary geological system characterized by a network of connected buried valleys that incise deeply into layered Miocene sediments (case study in Denmark). The results show that, based on fragmented information of the formation boundaries, the MPS partition method is able to simulate a non-stationary system including valley structures embedded in a layered Miocene sequence in a single run. Besides, statistical information retrieved from the AEM data improved the simulation of the geology significantly, especially for the deep-seated buried valley sediments where borehole information is sparse.

  14. No-Reference Video Quality Assessment Based on Statistical Analysis in 3D-DCT Domain.

    PubMed

    Li, Xuelong; Guo, Qun; Lu, Xiaoqiang

    2016-05-13

    Designing models for universal no-reference video quality assessment (NR-VQA) is an important task in multiple video processing and computer vision applications. However, most existing NR-VQA metrics are designed for specific distortion types, which are often not known in practical applications. A further deficiency is that the spatial and temporal information of videos is rarely considered simultaneously. In this paper, we propose a new NR-VQA metric based on spatiotemporal natural video statistics (NVS) in the 3D discrete cosine transform (3D-DCT) domain. In the proposed method, a set of features is first extracted from the statistical analysis of 3D-DCT coefficients to characterize the spatiotemporal statistics of videos from different views. These features are then used to predict perceived video quality via an efficient linear support vector regression (SVR) model. The contributions of this paper are: 1) we explore the spatiotemporal statistics of videos in the 3D-DCT domain, which has an inherent spatiotemporal encoding advantage over other widely used 2D transformations; 2) we extract a small set of simple but effective statistical features for video visual quality prediction; 3) the proposed method is universal for multiple types of distortion and robust across different databases. The proposed method is tested on four widely used video databases. Extensive experimental results demonstrate that the proposed method is competitive with state-of-the-art NR-VQA metrics and the top-performing FR-VQA and RR-VQA metrics.
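
    As a rough illustration of the kind of spatiotemporal statistics involved, the sketch below applies a 3D-DCT to small space-time blocks of a toy clip and pools simple coefficient statistics; the block size and summary statistics are placeholders rather than the authors' exact feature set.

    ```python
    import numpy as np
    from scipy.fft import dctn

    def block_3d_dct_features(video, bs=4):
        """Pool simple statistics of 3D-DCT coefficients over bs x bs x bs space-time blocks."""
        t, h, w = (d - d % bs for d in video.shape)          # trim to a multiple of the block size
        stats = []
        for k in range(0, t, bs):
            for i in range(0, h, bs):
                for j in range(0, w, bs):
                    c = dctn(video[k:k+bs, i:i+bs, j:j+bs], norm="ortho").ravel()[1:]  # drop the DC term
                    kurt = np.mean(c**4) / (np.var(c)**2 + 1e-12)   # heavy-tailedness of coefficients
                    stats.append([np.mean(np.abs(c)), np.std(c), kurt])
        return np.asarray(stats).mean(axis=0)                 # pooled NVS-style feature vector

    video = np.random.rand(16, 64, 64)                        # toy clip: frames x height x width
    features = block_3d_dct_features(video)                   # would feed a regressor such as sklearn's SVR
    print(features)
    ```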

  15. Estimating times of surgeries with two component procedures: comparison of the lognormal and normal models.

    PubMed

    Strum, David P; May, Jerrold H; Sampson, Allan R; Vargas, Luis G; Spangler, William E

    2003-01-01

    Variability inherent in the duration of surgical procedures complicates surgical scheduling. Modeling the duration and variability of surgeries might improve time estimates. Accurate time estimates are important operationally to improve utilization, reduce costs, and identify surgeries that might be considered outliers. Surgeries with multiple procedures are difficult to model because they are difficult to segment into homogeneous groups and because they are performed less frequently than single-procedure surgeries. The authors studied, retrospectively, 10,740 surgeries each with exactly two CPTs and 46,322 surgical cases with only one CPT from a large teaching hospital to determine whether the distribution of dual-procedure surgery times more closely fits a lognormal or a normal model. The authors tested model goodness of fit to their data using Shapiro-Wilk tests, studied factors affecting the variability of time estimates, and examined the impact of coding permutations (ordered combinations) on modeling. The Shapiro-Wilk tests indicated that the lognormal model is statistically superior to the normal model for modeling dual-procedure surgeries. Permutations of component codes did not appear to differ significantly with respect to total procedure time and surgical time. To improve individual models for infrequent dual-procedure surgeries, permutations may be reduced and estimates may be based on the longest component procedure and type of anesthesia. The authors recommend use of the lognormal model for estimating surgical times for surgeries with two component procedures. Their results help legitimize the use of log transforms to normalize surgical procedure times prior to hypothesis testing using linear statistical models. Multiple-procedure surgeries may be modeled using the longest (statistically most important) component procedure and type of anesthesia.
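
    The central comparison, whether durations look more Gaussian on the raw or the log scale, can be reproduced on any set of case times with Shapiro-Wilk tests. A minimal sketch on synthetic data (not the authors' dataset or full procedure):

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    durations = rng.lognormal(mean=np.log(120), sigma=0.45, size=300)   # surgery times in minutes, synthetic

    w_norm, p_norm = stats.shapiro(durations)              # normality test on the raw scale
    w_lognorm, p_lognorm = stats.shapiro(np.log(durations))  # normality of log(duration), i.e. lognormality

    print(f"normal model:    W={w_norm:.3f}, p={p_norm:.3g}")
    print(f"lognormal model: W={w_lognorm:.3f}, p={p_lognorm:.3g}")
    # A clearly larger W (and a non-rejecting p) on the log scale supports the lognormal model,
    # which in turn legitimizes log-transforming times before fitting linear models.
    ```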

  16. Does rational selection of training and test sets improve the outcome of QSAR modeling?

    PubMed

    Martin, Todd M; Harten, Paul; Young, Douglas M; Muratov, Eugene N; Golbraikh, Alexander; Zhu, Hao; Tropsha, Alexander

    2012-10-22

    Prior to using a quantitative structure activity relationship (QSAR) model for external predictions, its predictive power should be established and validated. In the absence of a true external data set, the best way to validate the predictive ability of a model is to perform its statistical external validation. In statistical external validation, the overall data set is divided into training and test sets. Commonly, this splitting is performed using random division. Rational splitting methods can divide data sets into training and test sets in an intelligent fashion. The purpose of this study was to determine whether rational division methods lead to more predictive models compared to random division. A special data splitting procedure was used to facilitate the comparison between random and rational division methods. For each toxicity end point, the overall data set was divided into a modeling set (80% of the overall set) and an external evaluation set (20% of the overall set) using random division. The modeling set was then subdivided into a training set (80% of the modeling set) and a test set (20% of the modeling set) using rational division methods and by using random division. The Kennard-Stone, minimal test set dissimilarity, and sphere exclusion algorithms were used as the rational division methods. The hierarchical clustering, random forest, and k-nearest neighbor (kNN) methods were used to develop QSAR models based on the training sets. For kNN QSAR, multiple training and test sets were generated, and multiple QSAR models were built. The results of this study indicate that models based on rational division methods generate better statistical results for the test sets than models based on random division, but the predictive power of both types of models is comparable.
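
    The Kennard-Stone algorithm named among the rational division methods selects training compounds that span descriptor space by repeatedly adding the candidate farthest from the points already chosen. A compact sketch of that selection rule (illustrative only, not the authors' code):

    ```python
    import numpy as np

    def kennard_stone_split(X, n_train):
        """Return indices of a Kennard-Stone training set of size n_train."""
        d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # pairwise descriptor distances
        first = np.unravel_index(np.argmax(d), d.shape)              # start with the two most distant points
        selected = list(first)
        remaining = [i for i in range(len(X)) if i not in selected]
        while len(selected) < n_train:
            # pick the remaining point whose distance to its nearest selected point is largest
            nxt = max(remaining, key=lambda i: d[i, selected].min())
            selected.append(nxt)
            remaining.remove(nxt)
        return np.array(selected)

    X = np.random.rand(50, 5)                        # 50 compounds x 5 descriptors (toy data)
    train_idx = kennard_stone_split(X, n_train=40)
    test_idx = np.setdiff1d(np.arange(len(X)), train_idx)
    ```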

  17. Statistical label fusion with hierarchical performance models

    PubMed Central

    Asman, Andrew J.; Dagley, Alexander S.; Landman, Bennett A.

    2014-01-01

    Label fusion is a critical step in many image segmentation frameworks (e.g., multi-atlas segmentation) as it provides a mechanism for generalizing a collection of labeled examples into a single estimate of the underlying segmentation. In the multi-label case, typical label fusion algorithms treat all labels equally – fully neglecting the known, yet complex, anatomical relationships exhibited in the data. To address this problem, we propose a generalized statistical fusion framework using hierarchical models of rater performance. Building on the seminal work in statistical fusion, we reformulate the traditional rater performance model from a multi-tiered hierarchical perspective. This new approach provides a natural framework for leveraging known anatomical relationships and accurately modeling the types of errors that raters (or atlases) make within a hierarchically consistent formulation. Herein, we describe several contributions. First, we derive a theoretical advancement to the statistical fusion framework that enables the simultaneous estimation of multiple (hierarchical) performance models within the statistical fusion context. Second, we demonstrate that the proposed hierarchical formulation is highly amenable to the state-of-the-art advancements that have been made to the statistical fusion framework. Lastly, in an empirical whole-brain segmentation task we demonstrate substantial qualitative and significant quantitative improvement in overall segmentation accuracy. PMID:24817809

  18. Determination of osteoporosis risk factors using a multiple logistic regression model in postmenopausal Turkish women.

    PubMed

    Akkus, Zeki; Camdeviren, Handan; Celik, Fatma; Gur, Ali; Nas, Kemal

    2005-09-01

    To determine the risk factors of osteoporosis using a multiple binary logistic regression method and to assess the risk variables for osteoporosis, which is a major and growing health problem in many countries. We presented a case-control study consisting of 126 postmenopausal healthy women as the control group and 225 postmenopausal osteoporotic women as the case group. The study was carried out in the Department of Physical Medicine and Rehabilitation, Dicle University, Diyarbakir, Turkey between 1999 and 2002. The data from the 351 participants were collected using a standard questionnaire containing 43 variables. A multiple logistic regression model was then used to evaluate the data and to find the best regression model. We classified 80.1% (281/351) of the participants using the regression model. Furthermore, the specificity of the model was 67% (84/126) in the control group, while the sensitivity was 88% (197/225) in the case group. We found the distribution of standardized residual values for the final model to be exponential using the Kolmogorov-Smirnov test (p=0.193). The receiver operating characteristic curve was successful in predicting patients at risk for osteoporosis. This study suggests that low levels of dietary calcium intake, physical activity, education, and longer duration of menopause are independent predictors of the risk of low bone density in our population. Adequate dietary calcium intake in combination with maintaining daily physical activity, increasing educational level, decreasing birth rate, and duration of breast-feeding may contribute to healthy bones and play a role in practical prevention of osteoporosis in Southeast Anatolia. In addition, the findings of the present study indicate that the use of a multivariate statistical method such as multiple logistic regression in osteoporosis, which may be influenced by many variables, is better than univariate statistical evaluation.
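
    The reported sensitivity and specificity follow directly from a fitted binary logistic model and a classification cutoff. A schematic version with synthetic predictors is shown below; the variable names are hypothetical stand-ins for the questionnaire items, not the study's actual covariates.

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 351
    X = np.column_stack([
        rng.normal(800, 200, n),     # dietary calcium intake (mg/day), hypothetical
        rng.integers(0, 2, n),       # regular physical activity (0/1), hypothetical
        rng.normal(10, 5, n),        # years since menopause, hypothetical
    ])
    logit_p = -1.0 - 0.002 * (X[:, 0] - 800) - 0.8 * X[:, 1] + 0.08 * X[:, 2]
    y = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))          # 1 = osteoporosis case, 0 = control

    model = sm.Logit(y, sm.add_constant(X)).fit(disp=False)
    pred = (model.predict(sm.add_constant(X)) >= 0.5).astype(int)

    sensitivity = np.mean(pred[y == 1] == 1)   # proportion of cases correctly classified
    specificity = np.mean(pred[y == 0] == 0)   # proportion of controls correctly classified
    print(model.params, sensitivity, specificity)
    ```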

  19. Multi-modal data fusion using source separation: Two effective models based on ICA and IVA and their properties

    PubMed Central

    Adali, Tülay; Levin-Schwartz, Yuri; Calhoun, Vince D.

    2015-01-01

    Fusion of information from multiple sets of data in order to extract a set of features that are most useful and relevant for the given task is inherent to many problems we deal with today. Since, usually, very little is known about the actual interaction among the datasets, it is highly desirable to minimize the underlying assumptions. This has been the main reason for the growing importance of data-driven methods, and in particular of independent component analysis (ICA), as it provides useful decompositions with a simple generative model and uses only the assumption of statistical independence. A recent extension of ICA, independent vector analysis (IVA), generalizes ICA to multiple datasets by exploiting the statistical dependence across the datasets, and hence, as we discuss in this paper, provides an attractive solution to fusion of data from multiple datasets along with ICA. In this paper, we focus on two multivariate solutions for multi-modal data fusion that let multiple modalities fully interact for the estimation of underlying features that jointly report on all modalities. One solution is the Joint ICA model that has found wide application in medical imaging, and the second is the Transposed IVA model introduced here as a generalization of an approach based on multi-set canonical correlation analysis. In the discussion, we emphasize the role of diversity in the decompositions achieved by these two models, and present their properties and implementation details to enable the user to make informed decisions on the selection of a model along with its associated parameters. Discussions are supported by simulation results to help highlight the main issues in the implementation of these methods. PMID:26525830
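
    A bare-bones illustration of the Joint ICA idea, concatenating each subject's features from two modalities and decomposing them with a single ICA so that both modalities share one subject-wise mixing matrix, can be written with scikit-learn's FastICA. The shapes and data below are synthetic, and the sketch omits issues such as order selection and scaling that the authors address.

    ```python
    import numpy as np
    from sklearn.decomposition import FastICA

    rng = np.random.default_rng(3)
    n_subj, p1, p2, k = 40, 500, 300, 5
    mod1 = rng.normal(size=(n_subj, p1))          # e.g., fMRI-derived features per subject (toy)
    mod2 = rng.normal(size=(n_subj, p2))          # e.g., EEG-derived features per subject (toy)

    # Joint ICA sketch: normalize each modality, then concatenate along the feature dimension.
    X = np.hstack([m / m.std() for m in (mod1, mod2)])   # subjects x (p1 + p2)

    ica = FastICA(n_components=k, random_state=0)
    S = ica.fit_transform(X.T).T      # k x (p1 + p2): joint source maps spanning both modalities
    A = ica.mixing_                   # subjects x k: shared subject loadings (mixing matrix)

    src_mod1, src_mod2 = S[:, :p1], S[:, p1:]    # modality-specific parts of each joint source
    print(A.shape, src_mod1.shape, src_mod2.shape)
    ```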

  20. Statistical Downscaling and Bias Correction of Climate Model Outputs for Climate Change Impact Assessment in the U.S. Northeast

    NASA Technical Reports Server (NTRS)

    Ahmed, Kazi Farzan; Wang, Guiling; Silander, John; Wilson, Adam M.; Allen, Jenica M.; Horton, Radley; Anyah, Richard

    2013-01-01

    Statistical downscaling can be used to efficiently downscale a large number of General Circulation Model (GCM) outputs to a fine temporal and spatial scale. To facilitate regional impact assessments, this study statistically downscales (to 1/8° spatial resolution) and corrects the bias of daily maximum and minimum temperature and daily precipitation data from six GCMs and four Regional Climate Models (RCMs) for the northeast United States (US) using the Statistical Downscaling and Bias Correction (SDBC) approach. Based on these downscaled data from multiple models, five extreme indices were analyzed for the future climate to quantify future changes of climate extremes. For a subset of models and indices, results based on raw and bias-corrected model outputs for the present-day climate were compared with observations, which demonstrated that bias correction is important not only for GCM outputs, but also for RCM outputs. For future climate, bias correction led to a higher level of agreement among the models in predicting the magnitude and capturing the spatial pattern of the extreme climate indices. We found that the incorporation of dynamical downscaling as an intermediate step does not lead to considerable differences in the results of statistical downscaling for the study domain.
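
    The exact SDBC algorithm is not reproduced here, but empirical quantile mapping is a common way to express this kind of distribution-based bias correction; the sketch below is a generic stand-in on synthetic data, not necessarily the authors' procedure.

    ```python
    import numpy as np

    def quantile_map(model_hist, obs_hist, model_future):
        """Empirical quantile mapping: build a transfer function so that historical-period model
        values match the observed distribution, then apply it to the future series."""
        q = np.linspace(0.01, 0.99, 99)
        mq = np.quantile(model_hist, q)            # model quantiles (historical period)
        oq = np.quantile(obs_hist, q)              # observed quantiles (historical period)
        return np.interp(model_future, mq, oq)     # map each future value through the transfer function

    rng = np.random.default_rng(4)
    obs_hist = rng.gamma(2.0, 4.0, 5000)           # observed daily precipitation (toy)
    model_hist = rng.gamma(2.0, 5.5, 5000)         # biased model output, same period
    model_future = rng.gamma(2.2, 5.5, 5000)       # raw future projection
    corrected = quantile_map(model_hist, obs_hist, model_future)
    print(model_future.mean(), corrected.mean())
    ```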

  1. Application of Bayesian methods to habitat selection modeling of the northern spotted owl in California: new statistical methods for wildlife research

    Treesearch

    Howard B. Stauffer; Cynthia J. Zabel; Jeffrey R. Dunk

    2005-01-01

    We compared a set of competing logistic regression habitat selection models for Northern Spotted Owls (Strix occidentalis caurina) in California. The habitat selection models were estimated, compared, evaluated, and tested using multiple sample datasets collected on federal forestlands in northern California. We used Bayesian methods in interpreting...

  2. Individual Change and the Timing and Onset of Important Life Events: Methods, Models, and Assumptions

    ERIC Educational Resources Information Center

    Grimm, Kevin; Marcoulides, Katerina

    2016-01-01

    Researchers are often interested in studying how the timing of a specific event affects concurrent and future development. When faced with such research questions there are multiple statistical models to consider and those models are the focus of this paper as well as their theoretical underpinnings and assumptions regarding the nature of the…

  3. Groundwater-level prediction using multiple linear regression and artificial neural network techniques: a comparative assessment

    NASA Astrophysics Data System (ADS)

    Sahoo, Sasmita; Jha, Madan K.

    2013-12-01

    The potential of multiple linear regression (MLR) and artificial neural network (ANN) techniques in predicting transient water levels over a groundwater basin was compared. MLR and ANN modeling was carried out at 17 sites in Japan, considering all significant inputs: rainfall, ambient temperature, river stage, 11 seasonal dummy variables, and influential lags of rainfall, ambient temperature, river stage and groundwater level. Seventeen site-specific ANN models were developed, using multi-layer feed-forward neural networks trained with Levenberg-Marquardt backpropagation algorithms. The performance of the models was evaluated using statistical and graphical indicators. Comparison of the goodness-of-fit statistics of the MLR models with those of the ANN models indicated that there is better agreement between the ANN-predicted groundwater levels and the observed groundwater levels at all the sites, compared to the MLR. This finding was supported by the graphical indicators and the residual analysis. Thus, it is concluded that the ANN technique is superior to the MLR technique in predicting the spatio-temporal distribution of groundwater levels in a basin. However, considering the practical advantages of the MLR technique, it is recommended as an alternative and cost-effective groundwater modeling tool.
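
    A side-by-side comparison of the two techniques can be set up in a few lines with scikit-learn; the data, inputs, and network size below are synthetic placeholders rather than the study's 17 site-specific models.

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import r2_score, mean_squared_error

    rng = np.random.default_rng(5)
    n = 600
    rain, temp, stage = rng.gamma(2, 5, n), rng.normal(15, 8, n), rng.normal(3, 1, n)
    gwl = 0.02 * rain - 0.05 * temp + 0.6 * stage + 0.3 * np.sin(rain / 10) + rng.normal(0, 0.2, n)
    X = np.column_stack([rain, temp, stage])      # stand-ins for rainfall, temperature, river stage

    X_tr, X_te, y_tr, y_te = train_test_split(X, gwl, test_size=0.3, random_state=0)
    mlr = LinearRegression().fit(X_tr, y_tr)
    ann = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000, random_state=0).fit(X_tr, y_tr)

    for name, m in [("MLR", mlr), ("ANN", ann)]:
        pred = m.predict(X_te)
        rmse = np.sqrt(mean_squared_error(y_te, pred))
        print(name, "R2=%.3f" % r2_score(y_te, pred), "RMSE=%.3f" % rmse)
    ```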

  4. A model of distributed phase aberration for deblurring phase estimated from scattering.

    PubMed

    Tillett, Jason C; Astheimer, Jeffrey P; Waag, Robert C

    2010-01-01

    Correction of aberration in ultrasound imaging uses the response of a point reflector or its equivalent to characterize the aberration. Because a point reflector is usually unavailable, its equivalent is obtained using statistical methods, such as processing reflections from multiple focal regions in a random medium. However, the validity of methods that use reflections from multiple points is limited to isoplanatic patches for which the aberration is essentially the same. In this study, aberration is modeled by an offset phase screen to relax the isoplanatic restriction. Methods are developed to determine the depth and phase of the screen and to use the model for compensation of aberration as the beam is steered. Use of the model to enhance the performance of the noted statistical estimation procedure is also described. Experimental results obtained with tissue-mimicking phantoms that implement different models and produce different amounts of aberration are presented to show the efficacy of these methods. The improvement in b-scan resolution realized with the model is illustrated. The results show that the isoplanatic patch assumption for estimation of aberration can be relaxed and that propagation-path characteristics and aberration estimation are closely related.

  5. Modeling Stochastic Kinetics of Molecular Machines at Multiple Levels: From Molecules to Modules

    PubMed Central

    Chowdhury, Debashish

    2013-01-01

    A molecular machine is either a single macromolecule or a macromolecular complex. In spite of the striking superficial similarities between these natural nanomachines and their man-made macroscopic counterparts, there are crucial differences. Molecular machines in a living cell operate stochastically in an isothermal environment far from thermodynamic equilibrium. In this mini-review we present a catalog of the molecular machines and an inventory of the essential toolbox for theoretically modeling these machines. The tool kits include 1), nonequilibrium statistical-physics techniques for modeling machines and machine-driven processes; and 2), statistical-inference methods for reverse engineering a functional machine from the empirical data. The cell is often likened to a microfactory in which the machineries are organized in modular fashion; each module consists of strongly coupled multiple machines, but different modules interact weakly with each other. This microfactory has its own automated supply chain and delivery system. Buoyed by the success achieved in modeling individual molecular machines, we advocate integration of these models in the near future to develop models of functional modules. A system-level description of the cell from the perspective of molecular machinery (the mechanome) is likely to emerge from further integrations that we envisage here. PMID:23746505

  6. Linking stressors and ecological responses

    USGS Publications Warehouse

    Gentile, J.H.; Solomon, K.R.; Butcher, J.B.; Harrass, M.; Landis, W.G.; Power, M.; Rattner, B.A.; Warren-Hicks, W.J.; Wenger, R.; Foran, Jeffery A.; Ferenc, Susan A.

    1999-01-01

    To characterize risk, it is necessary to quantify the linkages and interactions between chemical, physical and biological stressors and endpoints in the conceptual framework for ecological risk assessment (ERA). This can present challenges in a multiple stressor analysis, and it will not always be possible to develop a quantitative stressor-response profile. This review commences with a conceptual representation of the problem of developing a linkage analysis for multiple stressors and responses. The remainder of the review surveys a variety of mathematical and statistical methods (e.g., ranking methods, matrix models, multivariate dose-response for mixtures, indices, visualization, simulation modeling and decision-oriented methods) for accomplishing the linkage analysis for multiple stressors. Describing the relationships between multiple stressors and ecological effects is a critical component of 'effects assessment' in the ecological risk assessment framework.

  7. Modeling uncertainty of evapotranspiration measurements from multiple eddy covariance towers over a crop canopy

    USDA-ARS?s Scientific Manuscript database

    All measurements have random error associated with them. With fluxes in an eddy covariance system, measurement error can be modelled in several ways, often involving a statistical description of turbulence at its core. Using a field experiment with four towers, we generated four replicates of meas...

  8. Multiple point statistical simulation using uncertain (soft) conditional data

    NASA Astrophysics Data System (ADS)

    Hansen, Thomas Mejer; Vu, Le Thanh; Mosegaard, Klaus; Cordua, Knud Skou

    2018-05-01

    Geostatistical simulation methods have been used to quantify the spatial variability of reservoir models since the 1980s. In the last two decades, state-of-the-art simulation methods have changed from being based on covariance-based two-point statistics to multiple-point statistics (MPS), which allow simulation of more realistic Earth structures. In addition, increasing amounts of geo-information (geophysical, geological, etc.) from multiple sources are being collected. This poses the problem of integrating these different sources of information, such that decisions related to reservoir models can be made on as informed a basis as possible. In principle, though difficult in practice, this can be achieved using computationally expensive Monte Carlo methods. Here we investigate the use of sequential-simulation-based MPS methods conditioned to uncertain (soft) data as a computationally efficient alternative. First, it is demonstrated that current implementations of sequential simulation based on MPS (e.g. SNESIM, ENESIM and Direct Sampling) do not account properly for uncertain conditional information, due to a combination of using only co-located information and a random simulation path. Then, we suggest two approaches that better account for the available uncertain information. The first makes use of a preferential simulation path, where better informed model parameters are visited before less informed ones. The second approach involves using non-co-located uncertain information. For different types of available data, these approaches are demonstrated to produce simulation results similar to those obtained by the general Monte Carlo based approach. These methods allow MPS simulation to condition properly to uncertain (soft) data, and hence provide a computationally attractive approach for integrating information about a reservoir model.

  9. VoxelStats: A MATLAB Package for Multi-Modal Voxel-Wise Brain Image Analysis.

    PubMed

    Mathotaarachchi, Sulantha; Wang, Seqian; Shin, Monica; Pascoal, Tharick A; Benedet, Andrea L; Kang, Min Su; Beaudry, Thomas; Fonov, Vladimir S; Gauthier, Serge; Labbe, Aurélie; Rosa-Neto, Pedro

    2016-01-01

    In healthy individuals, behavioral outcomes are highly associated with the variability in brain regional structure or neurochemical phenotypes. Similarly, in the context of neurodegenerative conditions, neuroimaging reveals that cognitive decline is linked to the magnitude of atrophy, neurochemical declines, or concentrations of abnormal protein aggregates across brain regions. However, modeling the effects of multiple regional abnormalities as determinants of cognitive decline at the voxel level remains largely unexplored by multimodal imaging research, given the high computational cost of estimating regression models for every single voxel from various imaging modalities. VoxelStats is a voxel-wise computational framework to overcome these computational limitations and to perform statistical operations on multiple scalar variables and imaging modalities at the voxel level. The VoxelStats package has been developed in MATLAB® and supports imaging formats such as Nifti-1, ANALYZE, and MINC v2. Prebuilt functions in VoxelStats enable the user to perform voxel-wise general and generalized linear models and mixed-effects models with multiple volumetric covariates. Importantly, VoxelStats can recognize scalar values or image volumes as response variables and can accommodate volumetric statistical covariates as well as their interaction effects with other variables. Furthermore, this package includes built-in functionality to perform voxel-wise receiver operating characteristic analysis and paired and unpaired group contrast analysis. Validation of VoxelStats was conducted by comparing the linear regression functionality with existing toolboxes such as glim_image and RMINC. The validation results were identical to those of existing methods, and the additional functionality was demonstrated by generating feature case assessments (t-statistics, odds ratio, and true positive rate maps). In summary, VoxelStats expands the current methods for multimodal imaging analysis by allowing the estimation of advanced regional association metrics at the voxel level.
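
    The computational bottleneck that VoxelStats targets, fitting the same linear model at every voxel, can be vectorized by solving all voxel-wise least-squares problems in one matrix operation. The numpy sketch below illustrates the idea on toy dimensions; VoxelStats itself is a MATLAB package with much richer model support.

    ```python
    import numpy as np

    rng = np.random.default_rng(6)
    n_subj, n_vox = 60, 10_000
    age = rng.normal(70, 8, n_subj)
    group = rng.integers(0, 2, n_subj)                 # e.g., diagnosis (0/1), hypothetical covariate
    Y = rng.normal(size=(n_subj, n_vox))               # response image volumes, flattened (subjects x voxels)

    X = np.column_stack([np.ones(n_subj), age, group])   # the same design matrix is used at every voxel
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)          # (3 x n_vox): one least-squares fit per voxel at once

    # t-statistics for the 'group' effect at every voxel
    resid = Y - X @ beta
    dof = n_subj - X.shape[1]
    sigma2 = (resid**2).sum(axis=0) / dof
    covar = np.linalg.inv(X.T @ X)
    t_group = beta[2] / np.sqrt(sigma2 * covar[2, 2])
    print(t_group.shape)                               # a t-map with one value per voxel
    ```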

  10. A location-based multiple point statistics method: modelling the reservoir with non-stationary characteristics

    NASA Astrophysics Data System (ADS)

    Yin, Yanshu; Feng, Wenjie

    2017-12-01

    In this paper, a location-based multiple point statistics method is developed to model a non-stationary reservoir. The proposed method characterizes the relationship between the sedimentary pattern and the deposit location using the relative central position distance function, which alleviates the requirement that the training image and the simulated grids have the same dimension. The weights in every direction of the distance function can be changed to characterize the reservoir heterogeneity in various directions. The local integral replacements of data events, structured random path, distance tolerance and multi-grid strategy are applied to reproduce the sedimentary patterns and obtain a more realistic result. This method is compared with the traditional Snesim method using a synthesized 3-D training image of Poyang Lake and a reservoir model of Shengli Oilfield in China. The results indicate that the new method can reproduce the non-stationary characteristics better than the traditional method and is more suitable for simulation of delta-front deposits. These results show that the new method is a powerful tool for modelling a reservoir with non-stationary characteristics.

  11. A first principles calculation and statistical mechanics modeling of defects in Al-H system

    NASA Astrophysics Data System (ADS)

    Ji, Min; Wang, Cai-Zhuang; Ho, Kai-Ming

    2007-03-01

    The behavior of defects and hydrogen in Al was investigated by first-principles calculations and statistical mechanics modeling. The formation energies of different defects in the Al+H system, such as the Al vacancy, interstitial H, and multiple H in an Al vacancy, were calculated by first-principles methods. Defect concentrations in thermodynamic equilibrium were studied by total free energy calculations including configurational entropy and defect-defect interactions, from the low-concentration limit to the hydride limit. In our grand canonical ensemble model, the hydrogen chemical potential under different environments plays an important role in determining the defect concentrations and properties of the Al-H system.

  12. Sensitivity study of experimental measures for the nuclear liquid-gas phase transition in the statistical multifragmentation model

    NASA Astrophysics Data System (ADS)

    Lin, W.; Ren, P.; Zheng, H.; Liu, X.; Huang, M.; Wada, R.; Qu, G.

    2018-05-01

    The experimental measures of the multiplicity derivatives—the moment parameters, the bimodal parameter, the fluctuation of maximum fragment charge number (normalized variance of Zmax, or NVZ), the Fisher exponent (τ), and the Zipf law parameter (ξ)—are examined to search for the liquid-gas phase transition in nuclear multifragmentation processes within the framework of the statistical multifragmentation model (SMM). The sensitivities of these measures are studied. All these measures predict a critical signature at or near the critical point for both the primary and secondary fragments. Among these measures, the total multiplicity derivative and the NVZ provide accurate measures for the critical point from the final cold fragments as well as the primary fragments. The present study will provide a guide for future experiments and analyses in the study of the nuclear liquid-gas phase transition.

  13. Regression modeling of ground-water flow

    USGS Publications Warehouse

    Cooley, R.L.; Naff, R.L.

    1985-01-01

    Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)

  14. Fine-mapping additive and dominant SNP effects using group-LASSO and Fractional Resample Model Averaging

    PubMed Central

    Sabourin, Jeremy; Nobel, Andrew B.; Valdar, William

    2014-01-01

    Genomewide association studies sometimes identify loci at which both the number and identities of the underlying causal variants are ambiguous. In such cases, statistical methods that model effects of multiple SNPs simultaneously can help disentangle the observed patterns of association and provide information about how those SNPs could be prioritized for follow-up studies. Current multi-SNP methods, however, tend to assume that SNP effects are well captured by additive genetics; yet when genetic dominance is present, this assumption translates to reduced power and faulty prioritizations. We describe a statistical procedure for prioritizing SNPs at GWAS loci that efficiently models both additive and dominance effects. Our method, LLARRMA-dawg, combines a group LASSO procedure for sparse modeling of multiple SNP effects with a resampling procedure based on fractional observation weights; it estimates for each SNP the robustness of association with the phenotype both to sampling variation and to competing explanations from other SNPs. In producing a SNP prioritization that best identifies underlying true signals, we show that: our method easily outperforms a single marker analysis; when additive-only signals are present, our joint model for additive and dominance is equivalent to or only slightly less powerful than modeling additive-only effects; and, when dominance signals are present, even in combination with substantial additive effects, our joint model is unequivocally more powerful than a model assuming additivity. We also describe how performance can be improved through calibrated randomized penalization, and discuss how dominance in ungenotyped SNPs can be incorporated through either heterozygote dosage or multiple imputation. PMID:25417853

  15. QSAR study of curcumine derivatives as HIV-1 integrase inhibitors.

    PubMed

    Gupta, Pawan; Sharma, Anju; Garg, Prabha; Roy, Nilanjan

    2013-03-01

    A QSAR study was performed on curcumine derivatives as HIV-1 integrase inhibitors using multiple linear regression. A statistically significant model was developed with a squared correlation coefficient (r²) of 0.891 and a cross-validated r² (r²cv) of 0.825. The developed model revealed that electronic, shape, size, geometry, substitution information and hydrophilicity were important atomic properties for determining the inhibitory activity of these molecules. The model was also tested successfully for external validation (r²pred = 0.849) as well as Tropsha's test for model predictability. Furthermore, a domain analysis was carried out to evaluate the prediction reliability for external set molecules. The model was statistically robust and had good predictive power, and it can be utilized for screening of new molecules.

  16. An open-access CMIP5 pattern library for temperature and precipitation: Description and methodology

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lynch, Cary D.; Hartin, Corinne A.; Bond-Lamberty, Benjamin

    Pattern scaling is used to efficiently emulate general circulation models and explore uncertainty in climate projections under multiple forcing scenarios. Pattern scaling methods assume that local climate changes scale with a global mean temperature increase, allowing for spatial patterns to be generated for multiple models for any future emission scenario. For uncertainty quantification and probabilistic statistical analysis, a library of patterns with descriptive statistics for each file would be beneficial, but such a library does not presently exist. Of the possible techniques used to generate patterns, the two most prominent are the delta and least squared regression methods. We explore the differences and statistical significance between patterns generated by each method and assess performance of the generated patterns across methods and scenarios. Differences in patterns across seasons between methods and epochs were largest in high latitudes (60-90°N/S). Bias and mean errors between modeled and pattern predicted output from the linear regression method were smaller than patterns generated by the delta method. Across scenarios, differences in the linear regression method patterns were more statistically significant, especially at high latitudes. We found that pattern generation methodologies were able to approximate the forced signal of change to within ≤ 0.5°C, but choice of pattern generation methodology for pattern scaling purposes should be informed by user goals and criteria. As a result, this paper describes our library of least squared regression patterns from all CMIP5 models for temperature and precipitation on an annual and sub-annual basis, along with the code used to generate these patterns.

  17. An open-access CMIP5 pattern library for temperature and precipitation: Description and methodology

    DOE PAGES

    Lynch, Cary D.; Hartin, Corinne A.; Bond-Lamberty, Benjamin; ...

    2017-05-15

    Pattern scaling is used to efficiently emulate general circulation models and explore uncertainty in climate projections under multiple forcing scenarios. Pattern scaling methods assume that local climate changes scale with a global mean temperature increase, allowing for spatial patterns to be generated for multiple models for any future emission scenario. For uncertainty quantification and probabilistic statistical analysis, a library of patterns with descriptive statistics for each file would be beneficial, but such a library does not presently exist. Of the possible techniques used to generate patterns, the two most prominent are the delta and least squared regression methods. We explore the differences and statistical significance between patterns generated by each method and assess performance of the generated patterns across methods and scenarios. Differences in patterns across seasons between methods and epochs were largest in high latitudes (60-90°N/S). Bias and mean errors between modeled and pattern predicted output from the linear regression method were smaller than patterns generated by the delta method. Across scenarios, differences in the linear regression method patterns were more statistically significant, especially at high latitudes. We found that pattern generation methodologies were able to approximate the forced signal of change to within ≤ 0.5°C, but choice of pattern generation methodology for pattern scaling purposes should be informed by user goals and criteria. As a result, this paper describes our library of least squared regression patterns from all CMIP5 models for temperature and precipitation on an annual and sub-annual basis, along with the code used to generate these patterns.
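
    The least-squares patterns in the library come from regressing each grid cell's change on the global mean temperature change; the delta method instead differences epoch means. A toy sketch of both, with synthetic arrays standing in for CMIP5 output:

    ```python
    import numpy as np

    rng = np.random.default_rng(7)
    n_years, n_lat, n_lon = 100, 36, 72
    t_global = np.linspace(0, 3.0, n_years) + rng.normal(0, 0.1, n_years)    # global mean warming (K)
    local = 1.2 * t_global[:, None, None] + rng.normal(0, 0.5, (n_years, n_lat, n_lon))  # local anomalies

    # Least-squares pattern: per-cell slope of local change vs. global mean change.
    y = local.reshape(n_years, -1)
    X = np.column_stack([np.ones(n_years), t_global])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    ls_pattern = coef[1].reshape(n_lat, n_lon)            # K of local change per K of global change

    # Delta-method pattern: epoch-mean difference normalized by the global-mean difference.
    early, late = slice(0, 20), slice(-20, None)
    delta_pattern = (local[late].mean(0) - local[early].mean(0)) / \
                    (t_global[late].mean() - t_global[early].mean())
    print(ls_pattern.mean(), delta_pattern.mean())
    ```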

  18. Predicting trauma patient mortality: ICD [or ICD-10-AM] versus AIS based approaches.

    PubMed

    Willis, Cameron D; Gabbe, Belinda J; Jolley, Damien; Harrison, James E; Cameron, Peter A

    2010-11-01

    The International Classification of Diseases Injury Severity Score (ICISS) has been proposed as an International Classification of Diseases (ICD)-10-based alternative to mortality prediction tools that use Abbreviated Injury Scale (AIS) data, including the Trauma and Injury Severity Score (TRISS). To date, studies have not examined the performance of ICISS using Australian trauma registry data. This study aimed to compare the performance of ICISS with other mortality prediction tools in an Australian trauma registry. This was a retrospective review of prospectively collected data from the Victorian State Trauma Registry. A training dataset was created for model development and a validation dataset for evaluation. The multiplicative ICISS model was compared with a worst injury ICISS approach, Victorian TRISS (V-TRISS, using local coefficients), maximum AIS severity and a multivariable model including ICD-10-AM codes as predictors. Models were investigated for discrimination (C-statistic) and calibration (Hosmer-Lemeshow statistic). The multivariable approach had the highest level of discrimination (C-statistic 0.90) and calibration (H-L 7.65, P= 0.468). Worst injury ICISS, V-TRISS and maximum AIS had similar performance. The multiplicative ICISS produced the lowest level of discrimination (C-statistic 0.80) and poorest calibration (H-L 50.23, P < 0.001). The performance of ICISS may be affected by the data used to develop estimates, the ICD version employed, the methods for deriving estimates and the inclusion of covariates. In this analysis, a multivariable approach using ICD-10-AM codes was the best-performing method. A multivariable ICISS approach may therefore be a useful alternative to AIS-based methods and may have comparable predictive performance to locally derived TRISS models. © 2010 The Authors. ANZ Journal of Surgery © 2010 Royal Australasian College of Surgeons.
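
    ICISS itself is the product of empirically derived survival risk ratios (SRRs) over a patient's ICD-coded injuries, and the worst-injury variant uses the single lowest SRR. A schematic calculation is shown below; the codes and SRR values are invented for illustration.

    ```python
    import numpy as np

    # Hypothetical survival risk ratios: the proportion of registry patients with each
    # ICD-coded injury who survived (values invented for illustration only).
    srr = {"S06.5": 0.80, "S27.0": 0.93, "S72.0": 0.97}

    def iciss(codes, srr_table):
        """Multiplicative ICISS (product of SRRs) and worst-injury ICISS (minimum SRR)."""
        ratios = [srr_table[c] for c in codes]
        return float(np.prod(ratios)), min(ratios)

    patient_codes = ["S06.5", "S27.0", "S72.0"]
    multiplicative, worst_injury = iciss(patient_codes, srr)
    print(multiplicative, worst_injury)        # e.g. 0.72 vs 0.80 for this hypothetical patient
    # Either score can then be entered into a mortality prediction model, optionally with
    # covariates such as age, before assessing discrimination and calibration.
    ```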

  19. Quantifying and Generalizing Hydrologic Responses to Dam Regulation using a Statistical Modeling Approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McManamay, Ryan A

    2014-01-01

    Despite the ubiquitous existence of dams within riverscapes, much of our knowledge about dams and their environmental effects remains context-specific. Hydrology, more than any other environmental variable, has been studied in great detail with regard to dam regulation. While much progress has been made in generalizing the hydrologic effects of regulation by large dams, many aspects of hydrology show site-specific fidelity to dam operations, small dams (including diversions), and regional hydrologic regimes. A statistical modeling framework is presented to quantify and generalize hydrologic responses to varying degrees of dam regulation. Specifically, the objectives were to 1) compare the effects of local versus cumulative dam regulation, 2) determine the importance of different regional hydrologic regimes in influencing hydrologic responses to dams, and 3) evaluate how different regulation contexts lead to error in predicting hydrologic responses to dams. Overall, model performance was poor in quantifying the magnitude of hydrologic responses, but performance was sufficient in classifying hydrologic responses as negative or positive. Responses of some hydrologic indices to dam regulation were highly dependent upon hydrologic class membership and the purpose of the dam. The opposing coefficients between local and cumulative-dam predictors suggested that hydrologic responses to cumulative dam regulation are complex, and predicting the hydrology downstream of individual dams, as opposed to multiple dams, may be more easily accomplished using statistical approaches. Results also suggested that particular contexts, including multipurpose dams, high cumulative regulation by multiple dams, diversions, close proximity to dams, and certain hydrologic classes, are all sources of increased error when predicting hydrologic responses to dams. Statistical models, such as the ones presented herein, show promise in their ability to model the effects of dam regulation at large spatial scales and to generalize the directionality of hydrologic responses.

  20. Markov chains and semi-Markov models in time-to-event analysis.

    PubMed

    Abner, Erin L; Charnigo, Richard J; Kryscio, Richard J

    2013-10-25

    A variety of statistical methods are available to investigators for analysis of time-to-event data, often referred to as survival analysis. Kaplan-Meier estimation and Cox proportional hazards regression are commonly employed tools but are not appropriate for all studies, particularly in the presence of competing risks and when multiple or recurrent outcomes are of interest. Markov chain models can accommodate censored data, competing risks (informative censoring), multiple outcomes, recurrent outcomes, frailty, and non-constant survival probabilities. Markov chain models, though often overlooked by investigators in time-to-event analysis, have long been used in clinical studies and have widespread application in other fields.

  1. Markov chains and semi-Markov models in time-to-event analysis

    PubMed Central

    Abner, Erin L.; Charnigo, Richard J.; Kryscio, Richard J.

    2014-01-01

    A variety of statistical methods are available to investigators for analysis of time-to-event data, often referred to as survival analysis. Kaplan-Meier estimation and Cox proportional hazards regression are commonly employed tools but are not appropriate for all studies, particularly in the presence of competing risks and when multiple or recurrent outcomes are of interest. Markov chain models can accommodate censored data, competing risks (informative censoring), multiple outcomes, recurrent outcomes, frailty, and non-constant survival probabilities. Markov chain models, though often overlooked by investigators in time-to-event analysis, have long been used in clinical studies and have widespread application in other fields. PMID:24818062
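
    A minimal numerical illustration of the kind of multi-state model discussed here is a discrete-time Markov chain with an absorbing death state, in which state-occupancy probabilities over time follow from repeated multiplication by the transition matrix. The states and probabilities below are invented.

    ```python
    import numpy as np

    # States: 0 = healthy, 1 = impaired, 2 = dead (absorbing). Each row sums to 1.
    P = np.array([
        [0.85, 0.10, 0.05],
        [0.05, 0.80, 0.15],
        [0.00, 0.00, 1.00],
    ])

    start = np.array([1.0, 0.0, 0.0])      # everyone starts in the healthy state
    probs = [start]
    for _ in range(10):                    # ten annual cycles
        probs.append(probs[-1] @ P)
    probs = np.vstack(probs)

    print(probs[-1])                       # state-occupancy probabilities after 10 years
    print(1 - probs[:, 2])                 # a "survival curve": P(not yet absorbed) at each cycle
    # Competing risks or recurrent events are handled by adding states and transitions,
    # which is what makes the Markov framework more flexible than Kaplan-Meier estimation alone.
    ```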

  2. An integrated data model to estimate spatiotemporal occupancy, abundance, and colonization dynamics

    USGS Publications Warehouse

    Williams, Perry J.; Hooten, Mevin B.; Womble, Jamie N.; Esslinger, George G.; Bower, Michael R.; Hefley, Trevor J.

    2017-01-01

    Ecological invasions and colonizations occur dynamically through space and time. Estimating the distribution and abundance of colonizing species is critical for efficient management or conservation. We describe a statistical framework for simultaneously estimating spatiotemporal occupancy and abundance dynamics of a colonizing species. Our method accounts for several issues that are common when modeling spatiotemporal ecological data including multiple levels of detection probability, multiple data sources, and computational limitations that occur when making fine-scale inference over a large spatiotemporal domain. We apply the model to estimate the colonization dynamics of sea otters (Enhydra lutris) in Glacier Bay, in southeastern Alaska.

  3. Prediction of crime occurrence from multi-modal data using deep learning

    PubMed Central

    Kang, Hyeon-Woo

    2017-01-01

    In recent years, various studies have been conducted on the prediction of crime occurrences. This predictive capability is intended to assist in crime prevention by facilitating effective implementation of police patrols. Previous studies have used data from multiple domains such as demographics, economics, and education. Their prediction models treat data from different domains equally. These methods have problems in crime occurrence prediction, such as difficulty in discovering highly nonlinear relationships, redundancies, and dependencies between multiple datasets. In order to enhance crime prediction models, we consider environmental context information, such as broken windows theory and crime prevention through environmental design. In this paper, we propose a feature-level data fusion method with environmental context based on a deep neural network (DNN). Our dataset consists of data collected from various online databases of crime statistics, demographic and meteorological data, and images in Chicago, Illinois. Prior to generating training data, we select crime-related data by conducting statistical analyses. Finally, we train our DNN, which consists of the following four kinds of layers: spatial, temporal, environmental context, and joint feature representation layers. Coupled with crucial data extracted from various domains, our fusion DNN is a product of an efficient decision-making process that statistically analyzes data redundancy. Experimental performance results show that our DNN model is more accurate in predicting crime occurrence than other prediction models. PMID:28437486

  4. Prediction of crime occurrence from multi-modal data using deep learning.

    PubMed

    Kang, Hyeon-Woo; Kang, Hang-Bong

    2017-01-01

    In recent years, various studies have been conducted on the prediction of crime occurrences. This predictive capability is intended to assist in crime prevention by facilitating effective implementation of police patrols. Previous studies have used data from multiple domains such as demographics, economics, and education. Their prediction models treat data from different domains equally. These methods have problems in crime occurrence prediction, such as difficulty in discovering highly nonlinear relationships, redundancies, and dependencies between multiple datasets. In order to enhance crime prediction models, we consider environmental context information, such as broken windows theory and crime prevention through environmental design. In this paper, we propose a feature-level data fusion method with environmental context based on a deep neural network (DNN). Our dataset consists of data collected from various online databases of crime statistics, demographic and meteorological data, and images in Chicago, Illinois. Prior to generating training data, we select crime-related data by conducting statistical analyses. Finally, we train our DNN, which consists of the following four kinds of layers: spatial, temporal, environmental context, and joint feature representation layers. Coupled with crucial data extracted from various domains, our fusion DNN is a product of an efficient decision-making process that statistically analyzes data redundancy. Experimental performance results show that our DNN model is more accurate in predicting crime occurrence than other prediction models.
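
    Feature-level fusion of heterogeneous inputs is commonly implemented as separate network branches joined in a shared layer. The PyTorch sketch below is a toy version of that layout, not the authors' architecture or data.

    ```python
    import torch
    import torch.nn as nn

    class FusionDNN(nn.Module):
        """Toy feature-level fusion: one branch per data domain, joined in a shared representation."""
        def __init__(self, n_spatial=10, n_temporal=8, n_env=32):
            super().__init__()
            self.spatial = nn.Sequential(nn.Linear(n_spatial, 16), nn.ReLU())    # e.g. area-level demographics
            self.temporal = nn.Sequential(nn.Linear(n_temporal, 16), nn.ReLU())  # e.g. date / weather features
            self.env = nn.Sequential(nn.Linear(n_env, 16), nn.ReLU())            # e.g. image-derived context
            self.joint = nn.Sequential(nn.Linear(48, 32), nn.ReLU(), nn.Linear(32, 1))  # joint representation

        def forward(self, xs, xt, xe):
            z = torch.cat([self.spatial(xs), self.temporal(xt), self.env(xe)], dim=1)
            return self.joint(z)          # logit of P(crime occurrence)

    model = FusionDNN()
    xs, xt, xe = torch.randn(4, 10), torch.randn(4, 8), torch.randn(4, 32)
    logits = model(xs, xt, xe)
    loss = nn.BCEWithLogitsLoss()(logits, torch.randint(0, 2, (4, 1)).float())
    loss.backward()
    print(logits.shape, loss.item())
    ```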

  5. Maternal Factors Predicting Cognitive and Behavioral Characteristics of Children with Fetal Alcohol Spectrum Disorders

    PubMed Central

    May, Philip A.; Tabachnick, Barbara G.; Gossage, J. Phillip; Kalberg, Wendy O.; Marais, Anna-Susan; Robinson, Luther K.; Manning, Melanie A.; Blankenship, Jason; Buckley, David; Hoyme, H. Eugene; Adnams, Colleen M.

    2013-01-01

    Objective To provide an analysis of multiple predictors of cognitive and behavioral traits for children with fetal alcohol spectrum disorders (FASD). Method Multivariate correlation techniques were employed with maternal and child data from epidemiologic studies in a community in South Africa. Data on 561 first grade children with fetal alcohol syndrome (FAS), partial FAS (PFAS), and not FASD and their mothers were analyzed by grouping 19 maternal variables into categories (physical, demographic, childbearing, and drinking) and employed in structural equation models (SEM) to assess correlates of child intelligence (verbal and non-verbal) and behavior. Results A first SEM utilizing only seven maternal alcohol use variables to predict cognitive/behavioral traits was statistically significant (B = 3.10, p < .05), but explained only 17.3% of the variance. The second model incorporated multiple maternal variables and was statistically significant explaining 55.3% of the variance. Significantly correlated with low intelligence and problem behavior were demographic (B = 3.83, p < .05) (low maternal education, low socioeconomic status (SES), and rural residence) and maternal physical characteristics (B = 2.70, p < .05) (short stature, small head circumference, and low weight). Childbearing history and alcohol use composites were not statistically significant in the final complex model, and were overpowered by SES and maternal physical traits. Conclusions While other analytic techniques have amply demonstrated the negative effects of maternal drinking on intelligence and behavior, this highly-controlled analysis of multiple maternal influences reveals that maternal demographics and physical traits make a significant enabling or disabling contribution to child functioning in FASD. PMID:23751886

  6. A Non-Gaussian Stock Price Model: Options, Credit and a Multi-Timescale Memory

    NASA Astrophysics Data System (ADS)

    Borland, L.

    We review a recently proposed model of stock prices, based on a statistical feedback model that results in a non-Gaussian distribution of price changes. Applications to option pricing and the pricing of debt are discussed. A generalization to account for feedback effects over multiple timescales is also presented. This model reproduces most of the stylized facts (i.e., statistical anomalies) observed in real financial markets.

  7. Statistical Analysis of Atmospheric Forecast Model Accuracy - A Focus on Multiple Atmospheric Variables and Location-Based Analysis

    DTIC Science & Technology

    2014-04-01

    The Weather Research and Forecasting (WRF) model is a numerical weather prediction system designed for operational forecasting and atmospheric research. This report examined WRF model forecast accuracy for multiple atmospheric variables using a location-based analysis.

  8. Uncovering Local Trends in Genetic Effects of Multiple Phenotypes via Functional Linear Models.

    PubMed

    Vsevolozhskaya, Olga A; Zaykin, Dmitri V; Barondess, David A; Tong, Xiaoren; Jadhav, Sneha; Lu, Qing

    2016-04-01

    Recent technological advances equipped researchers with capabilities that go beyond traditional genotyping of loci known to be polymorphic in a general population. Genetic sequences of study participants can now be assessed directly. This capability removed technology-driven bias toward scoring predominantly common polymorphisms and let researchers reveal a wealth of rare and sample-specific variants. Although the relative contributions of rare and common polymorphisms to trait variation are being debated, researchers are faced with the need for new statistical tools for simultaneous evaluation of all variants within a region. Several research groups demonstrated flexibility and good statistical power of the functional linear model approach. In this work we extend previous developments to allow inclusion of multiple traits and adjustment for additional covariates. Our functional approach is unique in that it provides a nuanced depiction of effects and interactions for the variables in the model by representing them as curves varying over a genetic region. We demonstrate flexibility and competitive power of our approach by contrasting its performance with commonly used statistical tools and illustrate its potential for discovery and characterization of genetic architecture of complex traits using sequencing data from the Dallas Heart Study. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.

  9. Models for Scoring Missing Responses to Multiple-Choice Items. Program Statistics Research Technical Report No. 94-1.

    ERIC Educational Resources Information Center

    Longford, Nicholas T.

    This study is a critical evaluation of the roles for coding and scoring of missing responses to multiple-choice items in educational tests. The focus is on tests in which the test-takers have little or no motivation; in such tests omitting and not reaching (as classified by the currently adopted operational rules) is quite frequent. Data from the…

  10. A Survey of Insider Attack Detection Research

    DTIC Science & Technology

    2008-08-25

    modeling of statistical features, such as the frequency of events, the duration of events, the co-occurrence of multiple events combined through… forms of attack that have been reported. For example: unauthorized extraction, duplication, or exfiltration… network level. Schultz pointed out that not one approach will work, but solutions need to be based on multiple sensors to be able to find any combination

  11. An evaluation of talker localization based on direction of arrival estimation and statistical sound source identification

    NASA Astrophysics Data System (ADS)

    Nishiura, Takanobu; Nakamura, Satoshi

    2002-11-01

    It is very important to capture distant-talking speech with high quality for a hands-free speech interface. A microphone array is an ideal candidate for this purpose. However, this approach requires localizing the target talker. Conventional talker localization algorithms in multiple sound source environments not only have difficulty localizing the multiple sound sources accurately, but also have difficulty localizing the target talker among known multiple sound source positions. To cope with these problems, we propose a new talker localization algorithm consisting of two algorithms. One is a DOA (direction of arrival) estimation algorithm for multiple sound source localization based on the CSP (cross-power spectrum phase) coefficient addition method. The other is a statistical sound source identification algorithm based on a GMM (Gaussian mixture model) for localizing the target talker position among the localized multiple sound sources. In this paper, we particularly focus on the talker localization performance based on the combination of these two algorithms with a microphone array. We conducted evaluation experiments in real noisy reverberant environments. As a result, we confirmed that multiple sound signals can be identified accurately as "speech" or "non-speech" by the proposed algorithm. [Work supported by ATR and MEXT of Japan.]
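
    The core of the CSP (cross-power spectrum phase) coefficient is a phase-only cross-correlation between two microphone signals. A minimal NumPy sketch of that single-pair step (often called GCC-PHAT) is given below; the multi-source coefficient addition and the GMM-based speech/non-speech identification stages of the proposed algorithm are not reproduced, and the toy signals and sampling rate are invented for illustration.

        import numpy as np

        def csp_delay(x1, x2, fs):
            # CSP / GCC-PHAT: inverse FFT of the phase of the cross-power spectrum
            n = len(x1) + len(x2)                       # zero-pad to avoid circular wrap-around
            X1 = np.fft.rfft(x1, n)
            X2 = np.fft.rfft(x2, n)
            cross = X1 * np.conj(X2)
            csp = np.fft.irfft(cross / (np.abs(cross) + 1e-12), n)
            csp = np.fft.fftshift(csp)                  # centre lag zero
            lag = np.argmax(csp) - n // 2               # lag (samples) of the first signal relative to the second
            return lag / fs

        # toy check (hypothetical sampling rate): x2 lags x1 by 5 samples
        fs = 16000
        rng = np.random.default_rng(0)
        x1 = rng.standard_normal(2048)
        x2 = np.roll(x1, 5)
        print(csp_delay(x2, x1, fs) * fs)               # approximately +5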

  12. Statistical Evaluation of Time Series Analysis Techniques

    NASA Technical Reports Server (NTRS)

    Benignus, V. A.

    1973-01-01

    The performance of a modified version of NASA's multivariate spectrum analysis program is discussed. A multiple regression model was used to make the revisions. Performance improvements were documented and compared to the standard fast Fourier transform by Monte Carlo techniques.

  13. Evaluating the decision accuracy and speed of clinical data visualizations.

    PubMed

    Pieczkiewicz, David S; Finkelstein, Stanley M

    2010-01-01

    Clinicians face an increasing volume of biomedical data. Assessing the efficacy of systems that enable accurate and timely clinical decision making merits corresponding attention. This paper discusses the multiple-reader multiple-case (MRMC) experimental design and linear mixed models as means of assessing and comparing decision accuracy and latency (time) for decision tasks in which clinician readers must interpret visual displays of data. These experimental and statistical techniques, used extensively in radiology imaging studies, offer a number of practical and analytic advantages over more traditional quantitative methods such as percent-correct measurements and ANOVAs, and are recommended for their statistical efficiency and generalizability. An example analysis using readily available, free, and commercial statistical software is provided as an appendix. While these techniques are not appropriate for all evaluation questions, they can provide a valuable addition to the evaluative toolkit of medical informatics research.

  14. Analysing and correcting the differences between multi-source and multi-scale spatial remote sensing observations.

    PubMed

    Dong, Yingying; Luo, Ruisen; Feng, Haikuan; Wang, Jihua; Zhao, Jinling; Zhu, Yining; Yang, Guijun

    2014-01-01

    Differences exist among the analysis results of agricultural monitoring and crop production based on remote sensing observations that are obtained at different spatial scales from multiple remote sensors in the same time period and processed by the same algorithms, models, or methods. These differences can be quantitatively described mainly from three aspects, i.e., the multiple remote sensing observations, the crop parameter estimation models, and the spatial scale effects of surface parameters. Our research proposed a new method to analyse and correct the differences between multi-source and multi-scale spatial remote sensing surface reflectance datasets, aiming to provide a reference for further studies in agricultural applications with multiple remotely sensed observations from different sources. The new method was constructed on the basis of the physical and mathematical properties of multi-source and multi-scale reflectance datasets. Statistical theory was used to extract the statistical characteristics of the multiple surface reflectance datasets and to quantitatively analyse the spatial variations of these characteristics at multiple spatial scales. Then, taking the surface reflectance at the small spatial scale as the baseline data, Gaussian distribution theory was applied to correct the multiple surface reflectance datasets on the basis of the physical characteristics, mathematical distribution properties, and spatial variations obtained above. The proposed method was verified with two sets of multiple satellite images obtained in two experimental fields, located in Inner Mongolia and Beijing, China, with different degrees of homogeneity of the underlying surfaces. Experimental results indicate that differences among surface reflectance datasets at multiple spatial scales could be effectively corrected over non-homogeneous underlying surfaces, providing a database for further multi-source and multi-scale crop growth monitoring and yield prediction and their corresponding consistency analysis and evaluation.

  15. Analysing and Correcting the Differences between Multi-Source and Multi-Scale Spatial Remote Sensing Observations

    PubMed Central

    Dong, Yingying; Luo, Ruisen; Feng, Haikuan; Wang, Jihua; Zhao, Jinling; Zhu, Yining; Yang, Guijun

    2014-01-01

    Differences exist among the analysis results of agricultural monitoring and crop production based on remote sensing observations that are obtained at different spatial scales from multiple remote sensors in the same time period and processed by the same algorithms, models, or methods. These differences can be quantitatively described mainly from three aspects, i.e., the multiple remote sensing observations, the crop parameter estimation models, and the spatial scale effects of surface parameters. Our research proposed a new method to analyse and correct the differences between multi-source and multi-scale spatial remote sensing surface reflectance datasets, aiming to provide a reference for further studies in agricultural applications with multiple remotely sensed observations from different sources. The new method was constructed on the basis of the physical and mathematical properties of multi-source and multi-scale reflectance datasets. Statistical theory was used to extract the statistical characteristics of the multiple surface reflectance datasets and to quantitatively analyse the spatial variations of these characteristics at multiple spatial scales. Then, taking the surface reflectance at the small spatial scale as the baseline data, Gaussian distribution theory was applied to correct the multiple surface reflectance datasets on the basis of the physical characteristics, mathematical distribution properties, and spatial variations obtained above. The proposed method was verified with two sets of multiple satellite images obtained in two experimental fields, located in Inner Mongolia and Beijing, China, with different degrees of homogeneity of the underlying surfaces. Experimental results indicate that differences among surface reflectance datasets at multiple spatial scales could be effectively corrected over non-homogeneous underlying surfaces, providing a database for further multi-source and multi-scale crop growth monitoring and yield prediction and their corresponding consistency analysis and evaluation. PMID:25405760

  16. [Application of SAS macro to evaluate multiplicative and additive interaction in logistic and Cox regression in clinical practice].

    PubMed

    Nie, Z Q; Ou, Y Q; Zhuang, J; Qu, Y J; Mai, J Z; Chen, J M; Liu, X Q

    2016-05-01

    Conditional and unconditional logistic regression analyses are commonly used in case-control studies, whereas the Cox proportional hazards model is often used in survival data analysis. Most of the literature refers only to main-effect models; however, generalized linear models differ from general linear models, and interaction comprises both multiplicative and additive interaction. The former has only statistical significance, whereas the latter has biological significance. In this paper, macros were written using SAS 9.4 to calculate the contrast ratio, the attributable proportion due to interaction, and the synergy index alongside the logistic and Cox regression interaction terms, and Wald, delta, and profile likelihood confidence intervals were used to evaluate additive interaction, for reference in big data analysis in clinical epidemiology and in the analysis of genetic multiplicative and additive interactions.
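
    The additive-interaction measures named in this record can be computed directly from fitted ratio estimates. A minimal Python sketch (not the authors' SAS macro) is shown below, using the standard formulas for the relative excess risk due to interaction (RERI), attributable proportion due to interaction (AP), and synergy index (S); the confidence-interval methods described in the abstract are omitted and the example ratios are hypothetical.

        def additive_interaction(rr11, rr10, rr01):
            # standard additive-interaction measures from ratio estimates
            # (odds or hazard ratios relative to the doubly unexposed group)
            reri = rr11 - rr10 - rr01 + 1                    # relative excess risk due to interaction
            ap = reri / rr11                                  # attributable proportion due to interaction
            s = (rr11 - 1) / ((rr10 - 1) + (rr01 - 1))        # synergy index
            return reri, ap, s

        # hypothetical ratios taken from a fitted logistic or Cox model
        print(additive_interaction(rr11=4.5, rr10=2.0, rr01=1.8))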

  17. Incremental Implicit Learning of Bundles of Statistical Patterns

    PubMed Central

    Qian, Ting; Jaeger, T. Florian; Aslin, Richard N.

    2016-01-01

    Forming an accurate representation of a task environment often takes place incrementally as the information relevant to learning the representation only unfolds over time. This incremental nature of learning poses an important problem: it is usually unclear whether a sequence of stimuli consists of only a single pattern, or multiple patterns that are spliced together. In the former case, the learner can directly use each observed stimulus to continuously revise its representation of the task environment. In the latter case, however, the learner must first parse the sequence of stimuli into different bundles, so as to not conflate the multiple patterns. We created a video-game statistical learning paradigm and investigated 1) whether learners without prior knowledge of the existence of multiple “stimulus bundles” — subsequences of stimuli that define locally coherent statistical patterns — could detect their presence in the input, and 2) whether learners are capable of constructing a rich representation that encodes the various statistical patterns associated with bundles. By comparing human learning behavior to the predictions of three computational models, we find evidence that learners can handle both tasks successfully. In addition, we discuss the underlying reasons for why the learning of stimulus bundles occurs even when such behavior may seem irrational. PMID:27639552

  18. A method for automatic feature points extraction of human vertebrae three-dimensional model

    NASA Astrophysics Data System (ADS)

    Wu, Zhen; Wu, Junsheng

    2017-05-01

    A method for automatic extraction of the feature points of the human vertebrae three-dimensional model is presented. Firstly, the statistical model of vertebrae feature points is established based on the results of manual vertebrae feature points extraction. Then anatomical axial analysis of the vertebrae model is performed according to the physiological and morphological characteristics of the vertebrae. Using the axial information obtained from the analysis, a projection relationship between the statistical model and the vertebrae model to be extracted is established. According to the projection relationship, the statistical model is matched with the vertebrae model to get the estimated position of the feature point. Finally, by analyzing the curvature in the spherical neighborhood with the estimated position of feature points, the final position of the feature points is obtained. According to the benchmark result on multiple test models, the mean relative errors of feature point positions are less than 5.98%. At more than half of the positions, the error rate is less than 3% and the minimum mean relative error is 0.19%, which verifies the effectiveness of the method.

  19. Statistical Analysis of CFD Solutions From the Fifth AIAA Drag Prediction Workshop

    NASA Technical Reports Server (NTRS)

    Morrison, Joseph H.

    2013-01-01

    A graphical framework is used for statistical analysis of the results from an extensive N-version test of a collection of Reynolds-averaged Navier-Stokes computational fluid dynamics codes. The solutions were obtained by code developers and users from North America, Europe, Asia, and South America using a common grid sequence and multiple turbulence models for the June 2012 fifth Drag Prediction Workshop sponsored by the AIAA Applied Aerodynamics Technical Committee. The aerodynamic configuration for this workshop was the Common Research Model subsonic transport wing-body previously used for the 4th Drag Prediction Workshop. This work continues the statistical analysis begun in the earlier workshops and compares the results from the grid convergence study of the most recent workshop with previous workshops.

  20. Normality of raw data in general linear models: The most widespread myth in statistics

    USGS Publications Warehouse

    Kery, Marc; Hatfield, Jeff S.

    2003-01-01

    In years of statistical consulting for ecologists and wildlife biologists, by far the most common misconception we have come across has been the one about normality in general linear models. These comprise a very large part of the statistical models used in ecology and include t tests, simple and multiple linear regression, polynomial regression, and analysis of variance (ANOVA) and covariance (ANCOVA). There is a widely held belief that the normality assumption pertains to the raw data rather than to the model residuals. We suspect that this error may also occur in countless published studies, whenever the normality assumption is tested prior to analysis. This may lead to the use of nonparametric alternatives (if there are any), when parametric tests would indeed be appropriate, or to use of transformations of raw data, which may introduce hidden assumptions such as multiplicative effects on the natural scale in the case of log-transformed data. Our aim here is to dispel this myth. We very briefly describe relevant theory for two cases of general linear models to show that the residuals need to be normally distributed if tests requiring normality are to be used, such as t and F tests. We then give two examples demonstrating that the distribution of the response variable may be nonnormal, and yet the residuals are well behaved. We do not go into the issue of how to test normality; instead we display the distributions of response variables and residuals graphically.
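
    A small simulation makes the point of this record concrete: the raw response can be strongly non-normal while the residuals of the fitted linear model are well behaved. The sketch below, with invented data, checks both with a Shapiro-Wilk test; it is an illustration of the idea, not the authors' examples.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(1)

        # y depends strongly on a bimodal predictor x, so the raw responses are far
        # from normal even though the errors are Gaussian.
        x = np.concatenate([rng.normal(0, 1, 500), rng.normal(8, 1, 500)])
        y = 2.0 + 1.5 * x + rng.normal(0, 1, 1000)

        slope, intercept = np.polyfit(x, y, 1)            # ordinary least squares fit
        residuals = y - (intercept + slope * x)

        # Shapiro-Wilk: the raw response fails, the residuals do not
        print("raw y     p =", stats.shapiro(y).pvalue)
        print("residuals p =", stats.shapiro(residuals).pvalue)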

  1. deltaGseg: macrostate estimation via molecular dynamics simulations and multiscale time series analysis.

    PubMed

    Low, Diana H P; Motakis, Efthymios

    2013-10-01

    Binding free energy calculations obtained through molecular dynamics simulations reflect intermolecular interaction states through a series of independent snapshots. Typically, the free energies of multiple simulated series (each with slightly different starting conditions) need to be estimated. Previous approaches carry out this task by moving averages at certain decorrelation times, assuming that the system comes from a single conformation description of binding events. Here, we discuss a more general approach that uses statistical modeling, wavelets denoising and hierarchical clustering to estimate the significance of multiple statistically distinct subpopulations, reflecting potential macrostates of the system. We present the deltaGseg R package that performs macrostate estimation from multiple replicated series and allows molecular biologists/chemists to gain physical insight into the molecular details that are not easily accessible by experimental techniques. deltaGseg is a Bioconductor R package available at http://bioconductor.org/packages/release/bioc/html/deltaGseg.html.

  2. Statistical Power in Evaluations That Investigate Effects on Multiple Outcomes: A Guide for Researchers

    ERIC Educational Resources Information Center

    Porter, Kristin E.

    2018-01-01

    Researchers are often interested in testing the effectiveness of an intervention on multiple outcomes, for multiple subgroups, at multiple points in time, or across multiple treatment groups. The resulting multiplicity of statistical hypothesis tests can lead to spurious findings of effects. Multiple testing procedures (MTPs) are statistical…

  3. Application of Multiregressive Linear Models, Dynamic Kriging Models and Neural Network Models to Predictive Maintenance of Hydroelectric Power Systems

    NASA Astrophysics Data System (ADS)

    Lucifredi, A.; Mazzieri, C.; Rossi, M.

    2000-05-01

    Since the operational conditions of a hydroelectric unit can vary within a wide range, the monitoring system must be able to distinguish between variations of the monitored variable caused by variations of the operating conditions and those due to the onset and progression of failures and misoperations. The paper aims to identify the best technique to be adopted for the monitoring system. Three different methods have been implemented and compared. Two of them use statistical techniques: the first, linear multiple regression, expresses the monitored variable as a linear function of the process parameters (independent variables), while the second, the dynamic kriging technique, is a modified multiple linear regression technique representing the monitored variable as a linear combination of the process variables in such a way as to minimize the variance of the estimation error. The third is based on neural networks. Tests have shown that the monitoring system based on the kriging technique is not affected by some problems common to the other two models: for example, the requirement of a large amount of data for their tuning (both for training the neural network and for defining the optimum plane for the multiple regression), not only in the system start-up phase but also after a trivial maintenance operation involving the substitution of machinery components having a direct impact on the observed variable; or the need for different models to describe satisfactorily the different ranges of operation of the plant. The monitoring system based on the kriging statistical technique overcomes these difficulties: it does not require a large amount of data to be tuned and is immediately operational (given two points, the third can be immediately estimated), and the model follows the system without adapting itself to it. The results of the experiments performed seem to indicate that a model based on a neural network or on linear multiple regression is not optimal, and that a different approach is necessary to reduce the amount of work during the learning phase by using, when available, all the information stored during the initial phase of the plant to build the reference baseline, elaborating the raw information available if necessary. A mixed approach using the kriging statistical technique and neural network techniques could optimise the result.
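
    As a rough illustration of the regression-based monitoring idea described above (not the kriging or neural network variants), the sketch below fits the monitored variable as a linear function of the process parameters and flags departures of the residual from its baseline band; the variable names and data are hypothetical.

        import numpy as np

        def fit_baseline(params, monitored):
            # monitored variable as a linear function of the process parameters
            X = np.column_stack([np.ones(len(monitored)), params])
            coef, *_ = np.linalg.lstsq(X, monitored, rcond=None)
            return coef

        def residuals(coef, params, monitored):
            X = np.column_stack([np.ones(len(monitored)), params])
            return monitored - X @ coef

        # hypothetical example: vibration level driven by load and head
        rng = np.random.default_rng(2)
        params = rng.uniform(0.0, 1.0, size=(200, 2))
        vibration = 0.5 + 1.2 * params[:, 0] + 0.3 * params[:, 1] + rng.normal(0, 0.05, 200)

        coef = fit_baseline(params, vibration)
        r = residuals(coef, params, vibration)
        print("alarm band: +/-", 3 * r.std())              # flag departures beyond 3 sigma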

  4. Comparing methods of analysing datasets with small clusters: case studies using four paediatric datasets.

    PubMed

    Marston, Louise; Peacock, Janet L; Yu, Keming; Brocklehurst, Peter; Calvert, Sandra A; Greenough, Anne; Marlow, Neil

    2009-07-01

    Studies of prematurely born infants contain a relatively large percentage of multiple births, so the resulting data have a hierarchical structure with small clusters of size 1, 2 or 3. Ignoring the clustering may lead to incorrect inferences. The aim of this study was to compare statistical methods which can be used to analyse such data: generalised estimating equations, multilevel models, multiple linear regression and logistic regression. Four datasets which differed in total size and in percentage of multiple births (n = 254, multiple 18%; n = 176, multiple 9%; n = 10 098, multiple 3%; n = 1585, multiple 8%) were analysed. With the continuous outcome, two-level models produced similar results in the larger dataset, while generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) produced divergent estimates using the smaller dataset. For the dichotomous outcome, most methods, except generalised least squares multilevel modelling (ML GH 'xtlogit' in Stata) gave similar odds ratios and 95% confidence intervals within datasets. For the continuous outcome, our results suggest using multilevel modelling. We conclude that generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) should be used with caution when the dataset is small. Where the outcome is dichotomous and there is a relatively large percentage of non-independent data, it is recommended that these are accounted for in analyses using logistic regression with adjusted standard errors or multilevel modelling. If, however, the dataset has a small percentage of clusters greater than size 1 (e.g. a population dataset of children where there are few multiples) there appears to be less need to adjust for clustering.
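
    For readers who want to reproduce the random-intercept (two-level) approach recommended for the continuous outcome, a minimal Python sketch with statsmodels is shown below; it uses simulated infants clustered as singletons, twins, and triplets rather than the paediatric datasets analysed in the paper, and the maximum-likelihood multilevel model is only an analogue of the Stata commands cited.

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(3)

        # clusters of size 1-3 (singletons, twins, triplets), as in the paediatric setting
        sizes = rng.choice([1, 2, 3], size=80, p=[0.82, 0.15, 0.03])
        cluster = np.repeat(np.arange(len(sizes)), sizes)
        n = len(cluster)

        cluster_effect = rng.normal(0, 1.0, len(sizes))[cluster]
        gest_age = rng.normal(30, 2, n)
        weight = 500 + 150 * (gest_age - 30) + 100 * cluster_effect + rng.normal(0, 80, n)

        df = pd.DataFrame({"weight": weight, "gest_age": gest_age, "cluster": cluster})

        # random-intercept (two-level) model accounting for the small clusters
        fit = smf.mixedlm("weight ~ gest_age", data=df, groups=df["cluster"]).fit()
        print(fit.summary())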

  5. Combining information from multiple flood projections in a hierarchical Bayesian framework

    NASA Astrophysics Data System (ADS)

    Le Vine, Nataliya

    2016-04-01

    This study demonstrates, in the context of flood frequency analysis, the potential of a recently proposed hierarchical Bayesian approach to combine information from multiple models. The approach explicitly accommodates shared multimodel discrepancy as well as the probabilistic nature of the flood estimates, and treats the available models as a sample from a hypothetical complete (but unobserved) set of models. The methodology is applied to flood estimates from multiple hydrological projections (the Future Flows Hydrology data set) for 135 catchments in the UK. The advantages of the approach are shown to be: (1) to ensure adequate "baseline" with which to compare future changes; (2) to reduce flood estimate uncertainty; (3) to maximize use of statistical information in circumstances where multiple weak predictions individually lack power, but collectively provide meaningful information; (4) to diminish the importance of model consistency when model biases are large; and (5) to explicitly consider the influence of the (model performance) stationarity assumption. Moreover, the analysis indicates that reducing shared model discrepancy is the key to further reduction of uncertainty in the flood frequency analysis. The findings are of value regarding how conclusions about changing exposure to flooding are drawn, and to flood frequency change attribution studies.

  6. Quest for consistent modelling of statistical decay of the compound nucleus

    NASA Astrophysics Data System (ADS)

    Banerjee, Tathagata; Nath, S.; Pal, Santanu

    2018-01-01

    A statistical model description of heavy ion induced fusion-fission reactions is presented where shell effects, collective enhancement of level density, tilting away effect of compound nuclear spin and dissipation are included. It is shown that the inclusion of all these effects provides a consistent picture of fission where fission hindrance is required to explain the experimental values of both pre-scission neutron multiplicities and evaporation residue cross-sections in contrast to some of the earlier works where a fission hindrance is required for pre-scission neutrons but a fission enhancement for evaporation residue cross-sections.

  7. Using the Graded Response Model to Control Spurious Interactions in Moderated Multiple Regression

    ERIC Educational Resources Information Center

    Morse, Brendan J.; Johanson, George A.; Griffeth, Rodger W.

    2012-01-01

    Recent simulation research has demonstrated that using simple raw score to operationalize a latent construct can result in inflated Type I error rates for the interaction term of a moderated statistical model when the interaction (or lack thereof) is proposed at the latent variable level. Rescaling the scores using an appropriate item response…

  8. SAR Speckle Noise Reduction Using Wiener Filter

    NASA Technical Reports Server (NTRS)

    Joo, T. H.; Held, D. N.

    1983-01-01

    Synthetic aperture radar (SAR) images are degraded by speckle. A multiplicative speckle noise model for SAR images is presented. Using this model, a Wiener filter is derived by minimizing the mean-squared error using the known speckle statistics. Implementation of the Wiener filter is discussed and experimental results are presented. Finally, possible improvements to this method are explored.
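
    A rough sense of the approach can be had from the sketch below, which simulates a scene with multiplicative speckle and applies SciPy's adaptive (local-statistics) Wiener filter; this is an illustration under an assumed single-look exponential speckle model, not the authors' filter derived from known speckle statistics.

        import numpy as np
        from scipy.signal import wiener

        rng = np.random.default_rng(4)

        # multiplicative speckle: observed = reflectivity * noise with unit-mean noise
        reflectivity = np.ones((128, 128))
        reflectivity[32:96, 32:96] = 4.0                  # a bright square target
        speckle = rng.exponential(scale=1.0, size=reflectivity.shape)
        observed = reflectivity * speckle

        filtered = wiener(observed, mysize=7)             # adaptive local-statistics Wiener filter

        print("std inside target before:", float(observed[40:88, 40:88].std()))
        print("std inside target after: ", float(filtered[40:88, 40:88].std()))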

  9. The Effect of Attending Tutoring on Course Grades in Calculus I

    ERIC Educational Resources Information Center

    Rickard, Brian; Mills, Melissa

    2018-01-01

    Tutoring centres are common in universities in the United States, but there are few published studies that statistically examine the effects of tutoring on student success. This study utilizes multiple regression analysis to model the effect of tutoring attendance on final course grades in Calculus I. Our model predicted that every three visits to…

  10. Helping Students Assess the Relative Importance of Different Intermolecular Interactions

    ERIC Educational Resources Information Center

    Jasien, Paul G.

    2008-01-01

    A semi-quantitative model has been developed to estimate the relative effects of dispersion, dipole-dipole interactions, and H-bonding on the normal boiling points ("T[subscript b]") for a subset of simple organic systems. The model is based upon a statistical analysis using multiple linear regression on a series of straight-chain organic…

  11. A Statistical Examination of Magnetic Field Model Accuracy for Mapping Geosynchronous Solar Energetic Particle Observations to Lower Earth Orbits

    NASA Astrophysics Data System (ADS)

    Young, S. L.; Kress, B. T.; Rodriguez, J. V.; McCollough, J. P.

    2013-12-01

    Operational specifications of space environmental hazards can be an important input used by decision makers. Ideally the specification would come from on-board sensors, but for satellites where that capability is not available another option is to map data from remote observations to the location of the satellite. This requires a model of the physical environment and an understanding of its accuracy for mapping applications. We present a statistical comparison between magnetic field model mappings of solar energetic particle observations made by NOAA's Geostationary Operational Environmental Satellites (GOES) to the location of the Combined Release and Radiation Effects Satellite (CRRES). Because CRRES followed a geosynchronous transfer orbit which precessed in local time this allows us to examine the model accuracy between LEO and GEO orbits across a range of local times. We examine the accuracy of multiple magnetic field models using a variety of statistics and examine their utility for operational purposes.

  12. Uncertainty Analysis of Inertial Model Attitude Sensor Calibration and Application with a Recommended New Calibration Method

    NASA Technical Reports Server (NTRS)

    Tripp, John S.; Tcheng, Ping

    1999-01-01

    Statistical tools, previously developed for nonlinear least-squares estimation of multivariate sensor calibration parameters and the associated calibration uncertainty analysis, have been applied to single- and multiple-axis inertial model attitude sensors used in wind tunnel testing to measure angle of attack and roll angle. The analysis provides confidence and prediction intervals of calibrated sensor measurement uncertainty as functions of applied input pitch and roll angles. A comparative performance study of various experimental designs for inertial sensor calibration is presented along with corroborating experimental data. The importance of replicated calibrations over extended time periods has been emphasized; replication provides independent estimates of calibration precision and bias uncertainties, statistical tests for calibration or modeling bias uncertainty, and statistical tests for sensor parameter drift over time. A set of recommendations for a new standardized model attitude sensor calibration method and usage procedures is included. The statistical information provided by these procedures is necessary for the uncertainty analysis of aerospace test results now required by users of industrial wind tunnel test facilities.

  13. Order Selection for General Expression of Nonlinear Autoregressive Model Based on Multivariate Stepwise Regression

    NASA Astrophysics Data System (ADS)

    Shi, Jinfei; Zhu, Songqing; Chen, Ruwen

    2017-12-01

    An order selection method based on multiple stepwise regressions is proposed for the General Expression of Nonlinear Autoregressive (GNAR) model, which converts the model order problem into variable selection for a multiple linear regression equation. The partial autocorrelation function is adopted to define the linear terms in the GNAR model. The result is set as the initial model, and the nonlinear terms are then introduced gradually. Statistics are chosen to study the improvements contributed by both the newly introduced and the originally existing variables to the model characteristics, and these are adopted to determine which model variables to retain or eliminate. The optimal model is thus obtained through measurement of the data-fitting effect or significance testing. The simulation and classic time-series data experiment results show that the proposed method is simple, reliable, and applicable to practical engineering.

  14. Statistical analysis of water-quality data containing multiple detection limits II: S-language software for nonparametric distribution modeling and hypothesis testing

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2007-01-01

    Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data-perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply-censored data is the Kaplan-Meier (K-M) method. This method has seen widespread usage in the medical sciences within a general framework termed "survival analysis" where it is employed with right-censored time-to-failure data. However, K-M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K-M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and related confidence limits computation. Additionally, our software contains K-M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K-M methods is that they do not perform extrapolation and interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.
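
    The flipping device that makes right-censored Kaplan-Meier machinery applicable to left-censored (nondetect) data can be sketched in a few lines: subtract every value from a constant larger than the maximum, run the standard estimator, and map the survival steps back to below-limit probabilities. The Python sketch below illustrates that idea with invented concentrations and detection limits; it is not the authors' S-language routines.

        import numpy as np

        def kaplan_meier(values, events):
            # standard right-censored Kaplan-Meier estimator
            # values: observed values; events: 1 if observed (detected), 0 if censored
            values = np.asarray(values, dtype=float)
            events = np.asarray(events, dtype=int)
            order = np.argsort(values)
            values, events = values[order], events[order]
            at_risk = len(values)
            step, surv, s = [], [], 1.0
            for v in np.unique(values):
                here = values == v
                d = events[here].sum()
                if d > 0:
                    s *= 1.0 - d / at_risk
                    step.append(v)
                    surv.append(s)
                at_risk -= here.sum()                     # events and censored leave the risk set
            return np.array(step), np.array(surv)

        # concentrations with nondetects: value = detection limit where detected == 0
        conc     = np.array([0.5, 0.5, 1.2, 0.8, 2.0, 3.1, 0.5, 1.6, 4.2, 0.9])
        detected = np.array([0,   1,   1,   0,   1,   1,   0,   1,   1,   1])

        flip = conc.max() + 1.0                           # any constant above the largest value
        t_flip, s_flip = kaplan_meier(flip - conc, detected)

        # back-transform: S_flip(flip - c) estimates P(concentration < c) at each detected value c
        values = (flip - t_flip)[::-1]
        prob_below = s_flip[::-1]
        print(np.column_stack([values, prob_below]))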

  15. Statistical analysis of lightning electric field measured under Malaysian condition

    NASA Astrophysics Data System (ADS)

    Salimi, Behnam; Mehranzamir, Kamyar; Abdul-Malek, Zulkurnain

    2014-02-01

    Lightning is an electrical discharge during thunderstorms that can occur either within clouds (inter-cloud) or between clouds and ground (cloud-to-ground). Lightning characteristics and their statistical information are the foundation for the design of lightning protection systems as well as for the calculation of lightning radiated fields. Nowadays, there are various techniques to detect lightning signals and to determine the various parameters produced by a lightning flash, each providing its own claimed performance. In this paper, the characteristics of captured broadband electric fields generated by cloud-to-ground lightning discharges in the south of Malaysia are analyzed. A total of 130 cloud-to-ground lightning flashes from 3 separate thunderstorm events (each lasting about 4-5 hours) were examined. Statistical analyses of the following signal parameters are presented: preliminary breakdown pulse train duration, time interval between preliminary breakdown and return stroke, stroke multiplicity, and percentage of single-stroke flashes. The BIL model is also introduced to characterize the lightning signature patterns. The statistical analyses show that about 79% of lightning signals fit well with the BIL model. The maximum and minimum preliminary breakdown durations of the observed lightning signals are 84 ms and 560 µs, respectively. The statistical results show that 7.6% of the flashes were single-stroke flashes, and the maximum number of strokes recorded was 14 strokes per flash. A preliminary breakdown signature could be identified in more than 95% of the flashes.

  16. Interpretation of commonly used statistical regression models.

    PubMed

    Kasza, Jessica; Wolfe, Rory

    2014-01-01

    A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.

  17. The Forbes 400, the Pareto power-law and efficient markets

    NASA Astrophysics Data System (ADS)

    Klass, O. S.; Biham, O.; Levy, M.; Malcai, O.; Solomon, S.

    2007-01-01

    Statistical regularities at the top end of the wealth distribution in the United States are examined using the Forbes 400 lists of richest Americans, published between 1988 and 2003. It is found that the wealths are distributed according to a power-law (Pareto) distribution. This result is explained using a simple stochastic model of multiple investors that incorporates the efficient market hypothesis as well as the multiplicative nature of financial market fluctuations.
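
    The flavour of such multiplicative stochastic models can be illustrated with a Kesten-type simulation: identical multiplicative shocks for all investors, normalisation to the mean, and a lower wealth barrier, which together generate a Pareto upper tail. The sketch below uses invented parameters and a simple Hill-type tail estimate; it is not the authors' calibrated model.

        import numpy as np

        rng = np.random.default_rng(5)
        n_investors, n_steps = 10_000, 2_000
        w = np.ones(n_investors)                          # initial wealth

        for _ in range(n_steps):
            w *= rng.lognormal(mean=0.0, sigma=0.1, size=n_investors)   # multiplicative shock
            w /= w.mean()                                  # express wealth relative to the average
            np.clip(w, 0.2, None, out=w)                   # lower barrier (e.g. subsistence level)

        # Hill-type estimate of the Pareto exponent from the top of the distribution
        tail = np.sort(w)[-400:]
        alpha = 1.0 / np.mean(np.log(tail[1:] / tail[0]))
        print("estimated Pareto exponent:", round(float(alpha), 2))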

  18. The Use of Meta-Analytic Statistical Significance Testing

    ERIC Educational Resources Information Center

    Polanin, Joshua R.; Pigott, Terri D.

    2015-01-01

    Meta-analysis multiplicity, the concept of conducting multiple tests of statistical significance within one review, is an underdeveloped literature. We address this issue by considering how Type I errors can impact meta-analytic results, suggest how statistical power may be affected through the use of multiplicity corrections, and propose how…

  19. Imprints of dynamical interactions on brown dwarf pairing statistics and kinematics

    NASA Astrophysics Data System (ADS)

    Sterzik, M. F.; Durisen, R. H.

    2003-03-01

    We present statistically robust predictions of brown dwarf properties arising from dynamical interactions during their early evolution in small clusters. Our conclusions are based on numerical calculations of the internal cluster dynamics as well as on Monte-Carlo models. Accounting for recent observational constraints on the sub-stellar mass function and initial properties in fragmenting star forming clumps, we derive multiplicity fractions, mass ratios, separation distributions, and velocity dispersions. We compare them with observations of brown dwarfs in the field and in young clusters. Observed brown dwarf companion fractions around 15 +/- 7% for very low-mass stars as reported recently by Close et al. are consistent with certain dynamical decay models. A significantly smaller mean separation distribution for brown dwarf binaries than for binaries of late-type stars can be explained by similar specific energy at the time of cluster formation for all cluster masses. Due to their higher velocity dispersions, brown dwarfs and low-mass single stars will undergo time-dependent spatial segregation from higher-mass stars and multiple systems. This will cause mass functions and binary statistics in star forming regions to vary with the age of the region and the volume sampled.

  20. The role of multiple-scale modelling of epilepsy in seizure forecasting

    PubMed Central

    Kuhlmann, Levin; Grayden, David B.; Wendling, Fabrice; Schiff, Steven J.

    2014-01-01

    Over the past three decades, a number of seizure prediction, or forecasting, methods have been developed. Although major achievements were accomplished regarding the statistical evaluation of proposed algorithms, it is recognized that further progress is still necessary for clinical application in patients. The lack of physiological motivation can partly explain this limitation. Therefore, a natural question is raised: can computational models of epilepsy be used to improve these methods? Here we review the literature on the multiple-scale neural modelling of epilepsy and the use of such models to infer physiological changes underlying epilepsy and epileptic seizures. We argue how these methods can be applied to advance the state-of-the-art in seizure forecasting. PMID:26035674

  1. Reassessment of the relationship between M-protein decrement and survival in multiple myeloma.

    PubMed

    Palmer, M; Belch, A; Hanson, J; Brox, L

    1989-01-01

    The relationship between percentage M-protein decrement and survival is assessed in 134 multiple myeloma patients. The correlation did not achieve statistical significance (P = 0.069). Multivariate analysis using the Cox proportional hazards model, including a number of previously recognised prognostic factors, showed only percentage M-protein decrement, creatinine and haemoglobin to be significantly correlated with survival. However, the R'-statistic for each of these variables was low, indicating that their prognostic power is weak. We conclude that neither the percentage M-protein decrement nor the response derived from it can be used as an accurate means of assessing the efficacy of treatment in myeloma. Mature survival data alone should be used for this purpose.

  2. Reassessment of the relationship between M-protein decrement and survival in multiple myeloma.

    PubMed Central

    Palmer, M.; Belch, A.; Hanson, J.; Brox, L.

    1989-01-01

    The relationship between percentage M-protein decrement and survival is assessed in 134 multiple myeloma patients. The correlation did not achieve statistical significance (P = 0.069). Multivariate analysis using the Cox proportional hazards model, including a number of previously recognised prognostic factors, showed only percentage M-protein decrement, creatinine and haemoglobin to be significantly correlated with survival. However, the R'-statistic for each of these variables was low, indicating that their prognostic power is weak. We conclude that neither the percentage M-protein decrement nor the response derived from it can be used as an accurate means of assessing the efficacy of treatment in myeloma. Mature survival data alone should be used for this purpose. PMID:2757916

  3. Dynamic Quantitative Trait Locus Analysis of Plant Phenomic Data.

    PubMed

    Li, Zitong; Sillanpää, Mikko J

    2015-12-01

    Advanced platforms have recently become available for automatic and systematic quantification of plant growth and development. These new techniques can efficiently produce multiple measurements of phenotypes over time, and introduce time as an extra dimension to quantitative trait locus (QTL) studies. Functional mapping utilizes a class of statistical models for identifying QTLs associated with the growth characteristics of interest. A major benefit of functional mapping is that it integrates information over multiple timepoints, and therefore could increase the statistical power for QTL detection. We review the current development of computationally efficient functional mapping methods which provide invaluable tools for analyzing large-scale timecourse data that are readily available in our post-genome era. Copyright © 2015 Elsevier Ltd. All rights reserved.

  4. Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate

    NASA Astrophysics Data System (ADS)

    Minh, Vu Trieu; Katushin, Dmitri; Antonov, Maksim; Veinthal, Renno

    2017-03-01

    This paper presents statistical analyses of rock engineering properties and the measured penetration rate of a tunnel boring machine (TBM) based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW), and the alpha angle (Alpha) between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP). Four statistical regression models (two linear and two nonlinear) are built to predict the ROP of the TBM. Finally, a fuzzy logic model is developed as an alternative method and compared to the four statistical regression models. Results show that the fuzzy logic model provides better estimations and can be applied to predict TBM performance. The fuzzy logic model achieves the highest R-squared value (R2) of 0.714, compared with 0.667 for the runner-up, the multiple-variable nonlinear regression model.

  5. In defence of model-based inference in phylogeography

    PubMed Central

    Beaumont, Mark A.; Nielsen, Rasmus; Robert, Christian; Hey, Jody; Gaggiotti, Oscar; Knowles, Lacey; Estoup, Arnaud; Panchal, Mahesh; Corander, Jukka; Hickerson, Mike; Sisson, Scott A.; Fagundes, Nelson; Chikhi, Lounès; Beerli, Peter; Vitalis, Renaud; Cornuet, Jean-Marie; Huelsenbeck, John; Foll, Matthieu; Yang, Ziheng; Rousset, Francois; Balding, David; Excoffier, Laurent

    2017-01-01

    Recent papers have promoted the view that model-based methods in general, and those based on Approximate Bayesian Computation (ABC) in particular, are flawed in a number of ways, and are therefore inappropriate for the analysis of phylogeographic data. These papers further argue that Nested Clade Phylogeographic Analysis (NCPA) offers the best approach in statistical phylogeography. In order to remove the confusion and misconceptions introduced by these papers, we justify and explain the reasoning behind model-based inference. We argue that ABC is a statistically valid approach, alongside other computational statistical techniques that have been successfully used to infer parameters and compare models in population genetics. We also examine the NCPA method and highlight numerous deficiencies, whether used with single or multiple loci. We further show that the ages of clades are carelessly used to infer ages of demographic events, that these ages are estimated under a simple model of panmixia and population stationarity but are then used under different and unspecified models to test hypotheses, a usage that invalidates these testing procedures. We conclude by encouraging researchers to study and use model-based inference in population genetics. PMID:29284924

  6. Detecting anomalies in CMB maps: a new method

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Neelakanta, Jayanth T., E-mail: jayanthtn@gmail.com

    2015-10-01

    Ever since WMAP announced its first results, different analyses have shown that there is weak evidence for several large-scale anomalies in the CMB data. While the evidence for each anomaly appears to be weak, the fact that there are multiple seemingly unrelated anomalies makes it difficult to account for them via a single statistical fluke. So, one is led to considering a combination of these anomalies. But, if we "hand-pick" the anomalies (test statistics) to consider, we are making an a posteriori choice. In this article, we propose two statistics that do not suffer from this problem. The statistics are linear and quadratic combinations of the a_{ℓm}'s with random co-efficients, and they test the null hypothesis that the a_{ℓm}'s are independent, normally-distributed, zero-mean random variables with an m-independent variance. The motivation for considering multiple modes is this: because most physical models that lead to large-scale anomalies result in coupling multiple ℓ and m modes, the "coherence" of this coupling should get enhanced if a combination of different modes is considered. In this sense, the statistics are thus much more generic than those that have been hitherto considered in literature. Using fiducial data, we demonstrate that the method works and discuss how it can be used with actual CMB data to make quite general statements about the incompatibility of the data with the null hypothesis.

  7. Common pitfalls in statistical analysis: The perils of multiple testing

    PubMed Central

    Ranganathan, Priya; Pramesh, C. S.; Buyse, Marc

    2016-01-01

    Multiple testing refers to situations where a dataset is subjected to statistical testing multiple times - either at multiple time-points or through multiple subgroups or for multiple end-points. This amplifies the probability of a false-positive finding. In this article, we look at the consequences of multiple testing and explore various methods to deal with this issue. PMID:27141478
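
    Two of the most common corrections discussed in this literature, Bonferroni (family-wise error rate) and Benjamini-Hochberg (false discovery rate), can be written in a few lines; the sketch below uses an invented set of p-values purely for illustration.

        import numpy as np

        def bonferroni(pvals):
            # family-wise error rate: multiply each p-value by the number of tests
            p = np.asarray(pvals, dtype=float)
            return np.minimum(p * len(p), 1.0)

        def benjamini_hochberg(pvals):
            # false discovery rate: step-up adjustment of the sorted p-values
            p = np.asarray(pvals, dtype=float)
            m = len(p)
            order = np.argsort(p)
            adj = p[order] * m / np.arange(1, m + 1)
            adj = np.minimum.accumulate(adj[::-1])[::-1]   # enforce monotonicity
            out = np.empty(m)
            out[order] = np.minimum(adj, 1.0)
            return out

        pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
        print(bonferroni(pvals))
        print(benjamini_hochberg(pvals))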

  8. Empirical comparison study of approximate methods for structure selection in binary graphical models.

    PubMed

    Viallon, Vivian; Banerjee, Onureena; Jougla, Eric; Rey, Grégoire; Coste, Joel

    2014-03-01

    Looking for associations among multiple variables is a topical issue in statistics due to the increasing amount of data encountered in biology, medicine, and many other domains involving statistical applications. Graphical models have recently gained popularity for this purpose in the statistical literature. In the binary case, however, exact inference is generally very slow or even intractable because of the form of the so-called log-partition function. In this paper, we review various approximate methods for structure selection in binary graphical models that have recently been proposed in the literature and compare them through an extensive simulation study. We also propose a modification of one existing method, that is shown to achieve good performance and to be generally very fast. We conclude with an application in which we search for associations among causes of death recorded on French death certificates. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  9. Seismic activity prediction using computational intelligence techniques in northern Pakistan

    NASA Astrophysics Data System (ADS)

    Asim, Khawaja M.; Awais, Muhammad; Martínez-Álvarez, F.; Iqbal, Talat

    2017-10-01

    Earthquake prediction study is carried out for the region of northern Pakistan. The prediction methodology includes interdisciplinary interaction of seismology and computational intelligence. Eight seismic parameters are computed based upon the past earthquakes. Predictive ability of these eight seismic parameters is evaluated in terms of information gain, which leads to the selection of six parameters to be used in prediction. Multiple computationally intelligent models have been developed for earthquake prediction using selected seismic parameters. These models include feed-forward neural network, recurrent neural network, random forest, multi layer perceptron, radial basis neural network, and support vector machine. The performance of every prediction model is evaluated and McNemar's statistical test is applied to observe the statistical significance of computational methodologies. Feed-forward neural network shows statistically significant predictions along with accuracy of 75% and positive predictive value of 78% in context of northern Pakistan.
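
    McNemar's test, used in this record to compare paired predictions of two models on the same events, can be run as follows; the 2x2 table of agreements and disagreements is hypothetical and the exact binomial form is chosen only as an example.

        import numpy as np
        from statsmodels.stats.contingency_tables import mcnemar

        # hypothetical paired outcomes on the same test events:
        # rows = feed-forward network correct / wrong, columns = SVM correct / wrong
        table = np.array([[62, 13],
                          [ 4, 21]])

        result = mcnemar(table, exact=True)               # exact binomial test on the discordant cells
        print("statistic:", result.statistic, "p-value:", round(float(result.pvalue), 4))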

  10. An Investigation of the Fit of Linear Regression Models to Data from an SAT[R] Validity Study. Research Report 2011-3

    ERIC Educational Resources Information Center

    Kobrin, Jennifer L.; Sinharay, Sandip; Haberman, Shelby J.; Chajewski, Michael

    2011-01-01

    This study examined the adequacy of a multiple linear regression model for predicting first-year college grade point average (FYGPA) using SAT[R] scores and high school grade point average (HSGPA). A variety of techniques, both graphical and statistical, were used to examine if it is possible to improve on the linear regression model. The results…

  11. Modeling stochastic kinetics of molecular machines at multiple levels: from molecules to modules.

    PubMed

    Chowdhury, Debashish

    2013-06-04

    A molecular machine is either a single macromolecule or a macromolecular complex. In spite of the striking superficial similarities between these natural nanomachines and their man-made macroscopic counterparts, there are crucial differences. Molecular machines in a living cell operate stochastically in an isothermal environment far from thermodynamic equilibrium. In this mini-review we present a catalog of the molecular machines and an inventory of the essential toolbox for theoretically modeling these machines. The tool kits include 1), nonequilibrium statistical-physics techniques for modeling machines and machine-driven processes; and 2), statistical-inference methods for reverse engineering a functional machine from the empirical data. The cell is often likened to a microfactory in which the machineries are organized in modular fashion; each module consists of strongly coupled multiple machines, but different modules interact weakly with each other. This microfactory has its own automated supply chain and delivery system. Buoyed by the success achieved in modeling individual molecular machines, we advocate integration of these models in the near future to develop models of functional modules. A system-level description of the cell from the perspective of molecular machinery (the mechanome) is likely to emerge from further integrations that we envisage here. Copyright © 2013 Biophysical Society. Published by Elsevier Inc. All rights reserved.

  12. An open-access CMIP5 pattern library for temperature and precipitation: description and methodology

    NASA Astrophysics Data System (ADS)

    Lynch, Cary; Hartin, Corinne; Bond-Lamberty, Ben; Kravitz, Ben

    2017-05-01

    Pattern scaling is used to efficiently emulate general circulation models and explore uncertainty in climate projections under multiple forcing scenarios. Pattern scaling methods assume that local climate changes scale with a global mean temperature increase, allowing for spatial patterns to be generated for multiple models for any future emission scenario. For uncertainty quantification and probabilistic statistical analysis, a library of patterns with descriptive statistics for each file would be beneficial, but such a library does not presently exist. Of the possible techniques used to generate patterns, the two most prominent are the delta and least squares regression methods. We explore the differences and statistical significance between patterns generated by each method and assess performance of the generated patterns across methods and scenarios. Differences in patterns across seasons between methods and epochs were largest in high latitudes (60-90° N/S). Bias and mean errors between modeled and pattern-predicted output from the linear regression method were smaller than patterns generated by the delta method. Across scenarios, differences in the linear regression method patterns were more statistically significant, especially at high latitudes. We found that pattern generation methodologies were able to approximate the forced signal of change to within ≤ 0.5 °C, but the choice of pattern generation methodology for pattern scaling purposes should be informed by user goals and criteria. This paper describes our library of least squares regression patterns from all CMIP5 models for temperature and precipitation on an annual and sub-annual basis, along with the code used to generate these patterns. The dataset and netCDF data generation code are available at doi:10.5281/zenodo.495632.
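
    The least squares regression flavour of pattern scaling amounts to regressing each grid cell's local change on the global mean temperature change and keeping the slope field as the pattern. The sketch below demonstrates this on invented toy fields rather than CMIP5 output.

        import numpy as np

        rng = np.random.default_rng(6)
        n_years, nlat, nlon = 100, 12, 24

        # toy annual-mean fields and the matching global-mean temperature anomaly
        global_mean = np.linspace(0.0, 3.0, n_years) + rng.normal(0, 0.05, n_years)
        true_pattern = 0.5 + 2.0 * rng.random((nlat, nlon))      # local warming per degree of global warming
        local = true_pattern * global_mean[:, None, None] + rng.normal(0, 0.3, (n_years, nlat, nlon))

        # regress each grid cell on the global mean; the slope field is the pattern
        X = np.column_stack([np.ones(n_years), global_mean])
        coef, *_ = np.linalg.lstsq(X, local.reshape(n_years, -1), rcond=None)
        pattern = coef[1].reshape(nlat, nlon)

        # emulate the local field for a scenario reaching +2 degrees of global warming
        emulated = coef[0].reshape(nlat, nlon) + 2.0 * pattern
        print("mean pattern recovery error:", float(abs(pattern - true_pattern).mean()))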

  13. Nonparametric statistical modeling of binary star separations

    NASA Technical Reports Server (NTRS)

    Heacox, William D.; Gathright, John

    1994-01-01

    We develop a comprehensive statistical model for the distribution of observed separations in binary star systems, in terms of distributions of orbital elements, projection effects, and distances to systems. We use this model to derive several diagnostics for estimating the completeness of imaging searches for stellar companions, and the underlying stellar multiplicities. In application to recent imaging searches for low-luminosity companions to nearby M dwarf stars, and for companions to young stars in nearby star-forming regions, our analyses reveal substantial uncertainty in estimates of stellar multiplicity. For binary stars with late-type dwarf companions, semimajor axes appear to be distributed approximately as a^(-1) for values ranging from about one to several thousand astronomical units. About one-quarter of the companions to field F and G dwarf stars have semimajor axes less than 1 AU, and about 15% lie beyond 1000 AU. The geometric efficiency (fraction of companions imaged onto the detector) of imaging searches is nearly independent of distances to program stars and orbital eccentricities, and varies only slowly with detector spatial limitations.

  14. Forecasting the discomfort levels within the greater Athens area, Greece using artificial neural networks and multiple criteria analysis

    NASA Astrophysics Data System (ADS)

    Vouterakos, P. A.; Moustris, K. P.; Bartzokas, A.; Ziomas, I. C.; Nastos, P. T.; Paliatsos, A. G.

    2012-12-01

    In this work, artificial neural networks (ANNs) were developed and applied in order to forecast the discomfort levels due to the combination of high temperature and air humidity, during the hot season of the year, in eight different regions within the Greater Athens area (GAA), Greece. For the selection of the best type and architecture of ANN forecasting models, the multiple criteria analysis (MCA) technique was applied. Three different types of ANNs were developed and tested with the MCA method. Concretely, the multilayer perceptron, the generalized feed forward networks (GFFN), and the time-lag recurrent networks were developed and tested. Results showed that the best performance among the ANN types was achieved by the GFFN model for the prediction of discomfort levels due to high temperature and air humidity within the GAA. For the evaluation of the constructed ANNs, appropriate statistical indices were used. The analysis proved that the forecasting ability of the developed ANN models is very satisfactory at a significant statistical level (p < 0.01).
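
    As a generic stand-in for the neural network models compared in this record (a plain multilayer perceptron rather than the GFFN that performed best), the sketch below trains a scikit-learn regressor on synthetic temperature and humidity data, with one common form of Thom's discomfort index used as an invented target.

        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.neural_network import MLPRegressor
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        rng = np.random.default_rng(7)
        n = 2000
        temp = rng.uniform(22, 38, n)                     # daily mean temperature, deg C
        rh = rng.uniform(30, 90, n)                       # relative humidity, %

        # one common form of Thom's discomfort index, used here as an invented target
        di = temp - 0.55 * (1 - rh / 100.0) * (temp - 14.5) + rng.normal(0, 0.3, n)

        X = np.column_stack([temp, rh])
        X_train, X_test, y_train, y_test = train_test_split(X, di, random_state=0)

        model = make_pipeline(StandardScaler(),
                              MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0))
        model.fit(X_train, y_train)
        print("R^2 on held-out days:", round(model.score(X_test, y_test), 3))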

  15. Visualization of Spatio-Temporal Relations in Movement Event Using Multi-View

    NASA Astrophysics Data System (ADS)

    Zheng, K.; Gu, D.; Fang, F.; Wang, Y.; Liu, H.; Zhao, W.; Zhang, M.; Li, Q.

    2017-09-01

    Spatio-temporal relations among movement events extracted from temporally varying trajectory data can provide useful information about the evolution of individual or collective movers, as well as their interactions with their spatial and temporal contexts. However, the pure statistical tools commonly used by analysts pose many difficulties, due to the large number of attributes embedded in multi-scale and multi-semantic trajectory data. The need for models that operate at multiple scales to search for relations at different locations within time and space, as well as intuitively interpret what these relations mean, also presents challenges. Since analysts do not know where or when these relevant spatio-temporal relations might emerge, these models must compute statistical summaries of multiple attributes at different granularities. In this paper, we propose a multi-view approach to visualize the spatio-temporal relations among movement events. We describe a method for visualizing movement events and spatio-temporal relations that uses multiple displays. A visual interface is presented, and the user can interactively select or filter spatial and temporal extents to guide the knowledge discovery process. We also demonstrate how this approach can help analysts to derive and explain the spatio-temporal relations of movement events from taxi trajectory data.

  16. SD-MSAEs: Promoter recognition in human genome based on deep feature extraction.

    PubMed

    Xu, Wenxuan; Zhang, Li; Lu, Yaping

    2016-06-01

    The prediction and recognition of promoters in the human genome play an important role in DNA sequence analysis. Shannon entropy from information theory has many uses in detailed bioinformatic analysis. Relative entropy estimators based on statistical divergence (SD) are used to extract meaningful features that distinguish different regions of DNA sequences. In this paper, we choose context features and use a set of SD methods to select the n-mers that are most effective in distinguishing promoter regions from other DNA regions in the human genome. From the total set of possible n-mers, four sparse distributions are obtained from promoter and non-promoter training samples. The informative n-mers are selected by optimizing the differentiating extents of these distributions. Specifically, we combine the advantages of statistical divergence and multiple sparse auto-encoders (MSAEs) in deep learning to extract deep features for promoter recognition. We then apply multiple SVMs and a decision model to construct a human promoter recognition method called SD-MSAEs. The framework is flexible in that it can freely integrate new feature extraction or classification models. Experimental results show that our method has high sensitivity and specificity. Copyright © 2016 Elsevier Inc. All rights reserved.
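
    To make the statistical-divergence step concrete, here is a small sketch that ranks 3-mers by their contribution to the relative entropy (Kullback-Leibler divergence) between promoter and background n-mer frequency distributions. The toy sequences, the choice of n = 3, and the add-one smoothing are assumptions for illustration, not details taken from the paper.

```python
from collections import Counter
from itertools import product
import math

def nmer_distribution(seqs, n=3, alphabet="ACGT"):
    """Frequency distribution of n-mers over a set of sequences (add-one smoothing)."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - n + 1):
            counts[s[i:i + n]] += 1
    kmers = ["".join(p) for p in product(alphabet, repeat=n)]
    total = sum(counts[k] + 1 for k in kmers)
    return {k: (counts[k] + 1) / total for k in kmers}

def kl_divergence(p, q):
    return sum(p[k] * math.log(p[k] / q[k]) for k in p)

# Toy promoter / non-promoter sequences (hypothetical; real training data are far larger).
promoters = ["TATAAAGGC", "GCTATAAAT", "TTATAAACG"]
background = ["GCGCGCGCA", "ACGTACGTA", "CCGGATCCG"]

p = nmer_distribution(promoters)
q = nmer_distribution(background)

# Per-n-mer contribution to the divergence ranks informative n-mers.
contrib = sorted(((p[k] * math.log(p[k] / q[k]), k) for k in p), reverse=True)
print("overall KL divergence:", round(kl_divergence(p, q), 3))
print("top discriminative 3-mers:", [k for _, k in contrib[:5]])
```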

  17. An integrated data model to estimate spatiotemporal occupancy, abundance, and colonization dynamics.

    PubMed

    Williams, Perry J; Hooten, Mevin B; Womble, Jamie N; Esslinger, George G; Bower, Michael R; Hefley, Trevor J

    2017-02-01

    Ecological invasions and colonizations occur dynamically through space and time. Estimating the distribution and abundance of colonizing species is critical for efficient management or conservation. We describe a statistical framework for simultaneously estimating spatiotemporal occupancy and abundance dynamics of a colonizing species. Our method accounts for several issues that are common when modeling spatiotemporal ecological data including multiple levels of detection probability, multiple data sources, and computational limitations that occur when making fine-scale inference over a large spatiotemporal domain. We apply the model to estimate the colonization dynamics of sea otters (Enhydra lutris) in Glacier Bay, in southeastern Alaska. © 2016 by the Ecological Society of America.

  18. MODELING A MIXTURE: PBPK/PD APPROACHES FOR PREDICTING CHEMICAL INTERACTIONS.

    EPA Science Inventory

    Since environmental chemical exposures generally involve multiple chemicals, there are both regulatory and scientific drivers to develop methods to predict outcomes of these exposures. Even using efficient statistical and experimental designs, it is not possible to test in vivo a...

  19. Passage relevance models for genomics search.

    PubMed

    Urbain, Jay; Frieder, Ophir; Goharian, Nazli

    2009-03-19

    We present a passage relevance model for integrating syntactic and semantic evidence of biomedical concepts and topics using a probabilistic graphical model. Component models of topics, concepts, terms, and document are represented as potential functions within a Markov Random Field. The probability of a passage being relevant to a biologist's information need is represented as the joint distribution across all potential functions. Relevance model feedback of top ranked passages is used to improve distributional estimates of query concepts and topics in context, and a dimensional indexing strategy is used for efficient aggregation of concept and term statistics. By integrating multiple sources of evidence including dependencies between topics, concepts, and terms, we seek to improve genomics literature passage retrieval precision. Using this model, we are able to demonstrate statistically significant improvements in retrieval precision using a large genomics literature corpus.
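
    A toy version of the Markov Random Field idea is sketched below: the (unnormalized) relevance of a passage is the product of potential functions over query terms and concepts, so scoring can work with the sum of their logarithms. The particular potentials, weights, and example passage are invented for illustration; the paper's component models of topics, concepts, terms, and documents are far richer.

```python
import math

# Illustrative potential functions; the names and weights are assumptions, not the
# paper's actual feature set.
def term_potential(passage, term):
    tf = passage.lower().split().count(term)
    return math.exp(0.5 * tf)

def concept_potential(passage, concept_terms):
    # A concept "fires" when all of its component terms co-occur in the passage.
    return math.e if all(t in passage.lower() for t in concept_terms) else 1.0

def passage_score(passage, terms, concepts):
    # In an MRF, P(relevant, passage) is proportional to the product of potentials,
    # so we accumulate log potentials.
    log_score = sum(math.log(term_potential(passage, t)) for t in terms)
    log_score += sum(math.log(concept_potential(passage, c)) for c in concepts)
    return log_score

passage = "BRCA1 mutations disrupt DNA repair in breast cancer cells"
query_terms = ["brca1", "repair"]
query_concepts = [("dna", "repair")]
print("log relevance score:", round(passage_score(passage, query_terms, query_concepts), 3))
```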

  20. Fully Bayesian tests of neutrality using genealogical summary statistics.

    PubMed

    Drummond, Alexei J; Suchard, Marc A

    2008-10-31

    Many data summary statistics have been developed to detect departures from neutral expectations of evolutionary models. However, questions about the neutrality of the evolution of genetic loci within natural populations remain difficult to assess. One critical cause of this difficulty is that most methods for testing neutrality make simplifying assumptions simultaneously about the mutational model and the population size model. Consequently, rejecting the null hypothesis of neutrality under these methods could result from violations of either or both assumptions, making interpretation troublesome. Here we harness posterior predictive simulation to exploit summary statistics of both the data and model parameters to test the goodness-of-fit of standard models of evolution. We apply the method to test the selective neutrality of molecular evolution in non-recombining gene genealogies, and we demonstrate the utility of our method on four real data sets, identifying significant departures from neutrality in human influenza A virus, even after controlling for variation in population size. Importantly, by employing a full model-based Bayesian analysis, our method separates the effects of demography from the effects of selection. The method also allows multiple summary statistics to be used in concert, thus potentially increasing sensitivity. Furthermore, our method remains useful in situations where analytical expectations and variances of summary statistics are not available. This aspect has great potential for the analysis of temporally spaced data, an expanding area previously neglected because of the limited availability of theory and methods.
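
    The core of posterior predictive simulation can be shown in a few lines. The sketch below assumes we already have posterior draws of a single parameter (a scaled mutation rate) and uses a Poisson stand-in for the summary statistic; the real method simulates full coalescent genealogies and supports multiple summary statistics, so everything here is a simplified assumption.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical posterior draws of a scaled mutation rate theta (in the paper these
# come from a full coalescent model fit to sequence data).
posterior_theta = rng.gamma(shape=20.0, scale=0.5, size=5000)
s_observed = 23  # observed summary statistic, e.g. number of segregating sites

# Posterior predictive simulation: for each posterior draw, simulate a replicate
# data set and recompute the summary statistic (here a simple Poisson stand-in).
s_replicates = rng.poisson(posterior_theta)

# Two-sided posterior predictive p-value for the summary statistic.
p_upper = np.mean(s_replicates >= s_observed)
p_lower = np.mean(s_replicates <= s_observed)
ppp = 2 * min(p_upper, p_lower)
print(f"posterior predictive p-value: {ppp:.3f}")
```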

  1. Statistical Power in Evaluations That Investigate Effects on Multiple Outcomes: A Guide for Researchers

    ERIC Educational Resources Information Center

    Porter, Kristin E.

    2016-01-01

    In education research and in many other fields, researchers are often interested in testing the effectiveness of an intervention on multiple outcomes, for multiple subgroups, at multiple points in time, or across multiple treatment groups. The resulting multiplicity of statistical hypothesis tests can lead to spurious findings of effects. Multiple…

  2. Climate change and the eco-hydrology of fire: Will area burned increase in a warming western USA?

    Treesearch

    Donald McKenzie; Jeremy S. Littell

    2017-01-01

    Wildfire area is predicted to increase with global warming. Empirical statistical models and process-based simulations agree almost universally. The key relationship for this unanimity, observed at multiple spatial and temporal scales, is between drought and fire. Predictive models often focus on ecosystems in which this relationship appears to be particularly strong,...

  3. POOLMS: A computer program for fitting and model selection for two level factorial replication-free experiments

    NASA Technical Reports Server (NTRS)

    Amling, G. E.; Holms, A. G.

    1973-01-01

    A computer program is described that performs a statistical multiple-decision procedure called chain pooling. The number of mean squares assigned to the error variance is conditioned on the relative magnitudes of the mean squares. Model selection is done according to user-specified levels of type 1 or type 2 error probabilities.

  4. An Alternative to the 3PL: Using Asymmetric Item Characteristic Curves to Address Guessing Effects

    ERIC Educational Resources Information Center

    Lee, Sora; Bolt, Daniel M.

    2018-01-01

    Both the statistical and interpretational shortcomings of the three-parameter logistic (3PL) model in accommodating guessing effects on multiple-choice items are well documented. We consider the use of a residual heteroscedasticity (RH) model as an alternative, and compare its performance to the 3PL with real test data sets and through simulation…

  5. Trends in Mortality After Primary Cytoreductive Surgery for Ovarian Cancer: A Systematic Review and Metaregression of Randomized Clinical Trials and Observational Studies.

    PubMed

    Di Donato, Violante; Kontopantelis, Evangelos; Aletti, Giovanni; Casorelli, Assunta; Piacenti, Ilaria; Bogani, Giorgio; Lecce, Francesca; Benedetti Panici, Pierluigi

    2017-06-01

    Primary cytoreductive surgery (PDS) followed by platinum-based chemotherapy is the cornerstone of treatment, and the absence of residual tumor after PDS is universally considered the most important prognostic factor. The aim of the present analysis was to evaluate trends in and predictors of 30-day mortality in patients undergoing primary cytoreduction for ovarian cancer. The literature was searched for records reporting 30-day mortality after PDS. All cohorts were rated for quality. Simple and multiple Poisson regression models were used to quantify the association between 30-day mortality and the following: overall or severe complications, proportion of patients with stage IV disease, median age, year of publication, and weighted surgical complexity index. Using the multiple regression model, we calculated the risk of perioperative mortality at different levels of the statistically significant covariates of interest. Simple regression identified median age and proportion of patients with stage IV disease as statistically significant predictors of 30-day mortality. When included in the multiple Poisson regression model, both remained statistically significant, with an incidence rate ratio of 1.087 for median age and 1.017 for stage IV disease. Disease stage was a strong predictor, with the risk estimated to increase from 2.8% (95% confidence interval 2.02-3.66) for stage III to 16.1% (95% confidence interval 6.18-25.93) for stage IV, for a cohort with a median age of 65 years. Metaregression demonstrated that increased age and advanced clinical stage were independently associated with an increased risk of mortality, and the combined effects of both factors greatly increased the risk.
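
    A cohort-level Poisson metaregression of the kind described above can be sketched in a few lines with statsmodels. The data below are simulated (cohort sizes, ages, and stage IV proportions are invented) purely to show how incidence rate ratios are obtained from a Poisson GLM with a log-exposure offset; the coefficients are not the review's estimates.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
k = 40  # hypothetical number of study cohorts

# Cohort-level covariates (illustrative, not the review's actual data).
median_age = rng.uniform(55, 75, k)            # years
stage_iv_pct = rng.uniform(0, 40, k)           # % of patients with stage IV disease
n_patients = rng.integers(50, 500, k)          # cohort size (exposure)

# Simulate 30-day deaths from a Poisson model with assumed rate ratios.
log_rate = -5.0 + 0.08 * (median_age - 65) + 0.017 * stage_iv_pct
deaths = rng.poisson(np.exp(log_rate) * n_patients)

X = sm.add_constant(np.column_stack([median_age, stage_iv_pct]))
model = sm.GLM(deaths, X, family=sm.families.Poisson(),
               offset=np.log(n_patients)).fit()

# exp(coefficient) is the incidence rate ratio per unit increase in the covariate.
irr = np.exp(model.params[1:])
print("IRR per year of median age:", round(float(irr[0]), 3))
print("IRR per 1% increase in stage IV:", round(float(irr[1]), 3))
```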

  6. Test anxiety and academic performance in chiropractic students.

    PubMed

    Zhang, Niu; Henderson, Charles N R

    2014-01-01

    Objective: We assessed the level of students' test anxiety and the relationship between test anxiety and academic performance. Methods: We recruited 166 third-quarter students. The Test Anxiety Inventory (TAI) was administered to all participants. Total scores from written examinations and objective structured clinical examinations (OSCEs) were used as response variables. Results: Multiple regression analysis showed that there was a modest but statistically significant negative correlation between TAI scores and written exam scores, but not OSCE scores. Worry and emotionality were the best predictive models for written exam scores. Mean total anxiety and emotionality scores for females were significantly higher than those for males, but worry scores were not. Conclusion: Moderate-to-high test anxiety was observed in 85% of the chiropractic students examined. However, total test anxiety, as measured by the TAI score, was a very weak predictive model for written exam performance. Multiple regression analysis demonstrated that replacing total anxiety (TAI) with worry and emotionality (TAI subscales) produces a much more effective predictive model of written exam performance. Sex, age, highest current academic degree, and ethnicity contributed little additional predictive power in either regression model. Moreover, TAI scores were not found to be statistically significant predictors of physical exam skill performance, as measured by OSCEs.

  7. Climate change and health modeling: horses for courses.

    PubMed

    Ebi, Kristie L; Rocklöv, Joacim

    2014-01-01

    Mathematical and statistical models are needed to understand the extent to which weather, climate variability, and climate change are affecting current and may affect future health burdens in the context of other risk factors and a range of possible development pathways, and the temporal and spatial patterns of any changes. Such understanding is needed to guide the design and the implementation of adaptation and mitigation measures. Because each model projection captures only a narrow range of possible futures, and because models serve different purposes, multiple models are needed for each health outcome ('horses for courses'). Multiple modeling results can be used to bracket the ranges of when, where, and with what intensity negative health consequences could arise. This commentary explores some climate change and health modeling issues, particularly modeling exposure-response relationships, developing early warning systems, projecting health risks over coming decades, and modeling to inform decision-making. Research needs are also suggested.

  8. Multiple imputation as one tool to provide longitudinal databases for modelling human height and weight development.

    PubMed

    Aßmann, C

    2016-06-01

    Besides the large effort required for field work, the provision of valid databases requires statistical and informational infrastructure to enable long-term access to longitudinal data sets on height, weight and related issues. To foster use of longitudinal data sets within the scientific community, the provision of valid databases has to address data-protection regulations. It is, therefore, of major importance to prevent identification of individuals from publicly available databases. One possible strategy to reach this goal is to provide a synthetic database to the public, allowing data-analysis strategies to be pretested. The synthetic databases can be established using multiple imputation tools. Once an analysis strategy is approved, verification is based on the original data. Multiple imputation by chained equations is illustrated as a way to facilitate the provision of synthetic databases, since it captures a wide range of statistical interdependencies. Missing values, which typically occur in longitudinal databases because of item non-response, can also be addressed via multiple imputation when databases are provided. The provision of synthetic databases using multiple imputation techniques is one possible strategy to ensure data protection, increase the visibility of longitudinal databases and enhance their analytical potential.
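
    As a minimal sketch of multiple imputation by chained equations, the code below uses scikit-learn's IterativeImputer with posterior sampling on a small synthetic height/weight panel; re-running the imputer with different random seeds yields the multiple completed data sets. The variables, missingness rate, and number of imputations are all assumptions for illustration.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(3)
n = 500

# Hypothetical longitudinal-style data: height (cm) and weight (kg) at two waves.
height_1 = rng.normal(140, 8, n)
weight_1 = 0.5 * height_1 - 35 + rng.normal(0, 3, n)
height_2 = height_1 + rng.normal(6, 2, n)
weight_2 = weight_1 + rng.normal(4, 2, n)
data = np.column_stack([height_1, weight_1, height_2, weight_2])

# Introduce item non-response completely at random.
mask = rng.random(data.shape) < 0.15
data_missing = data.copy()
data_missing[mask] = np.nan

# Chained equations: each variable is imputed in turn from the others; different
# seeds with posterior sampling give the multiple completed data sets.
imputations = [
    IterativeImputer(sample_posterior=True, random_state=seed).fit_transform(data_missing)
    for seed in range(5)
]
print("number of completed data sets:", len(imputations))
print("mean wave-2 weight across imputations:",
      round(float(np.mean([imp[:, 3].mean() for imp in imputations])), 2))
```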

  9. Applied immuno-epidemiological research: an approach for integrating existing knowledge into the statistical analysis of multiple immune markers.

    PubMed

    Genser, Bernd; Fischer, Joachim E; Figueiredo, Camila A; Alcântara-Neves, Neuza; Barreto, Mauricio L; Cooper, Philip J; Amorim, Leila D; Saemann, Marcus D; Weichhart, Thomas; Rodrigues, Laura C

    2016-05-20

    Immunologists often measure several correlated immunological markers, such as concentrations of different cytokines produced by different immune cells and/or measured under different conditions, to draw insights from complex immunological mechanisms. Although there have been recent methodological efforts to improve the statistical analysis of immunological data, a framework is still needed for the simultaneous analysis of multiple, often correlated, immune markers. This framework would allow the immunologists' hypotheses about the underlying biological mechanisms to be integrated. We present an analytical approach for statistical analysis of correlated immune markers, such as those commonly collected in modern immuno-epidemiological studies. We demonstrate i) how to deal with interdependencies among multiple measurements of the same immune marker, ii) how to analyse association patterns among different markers, iii) how to aggregate different measures and/or markers to immunological summary scores, iv) how to model the inter-relationships among these scores, and v) how to use these scores in epidemiological association analyses. We illustrate the application of our approach to multiple cytokine measurements from 818 children enrolled in a large immuno-epidemiological study (SCAALA Salvador), which aimed to quantify the major immunological mechanisms underlying atopic diseases or asthma. We demonstrate how to aggregate systematically the information captured in multiple cytokine measurements to immunological summary scores aimed at reflecting the presumed underlying immunological mechanisms (Th1/Th2 balance and immune regulatory network). We show how these aggregated immune scores can be used as predictors in regression models with outcomes of immunological studies (e.g. specific IgE) and compare the results to those obtained by a traditional multivariate regression approach. The proposed analytical approach may be especially useful to quantify complex immune responses in immuno-epidemiological studies, where investigators examine the relationship among epidemiological patterns, immune response, and disease outcomes.

  10. Test of the statistical model in 96Mo with the BaF2 γ calorimeter DANCE array

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sheets, S. A.; Mitchell, G. E.; Agvaanluvsan, U.

    2009-02-15

    The γ-ray cascades following the 95Mo(n,γ)96Mo reaction were studied with the γ calorimeter DANCE (Detector for Advanced Neutron Capture Experiments), consisting of 160 BaF2 scintillation detectors, at the Los Alamos Neutron Science Center. The γ-ray energy spectra for different multiplicities were measured for s- and p-wave resonances below 2 keV. The shapes of these spectra were found to be in very good agreement with simulations using the DICEBOX statistical model code. The relevant model parameters used for the level density and photon strength functions were identical with those that provided the best fit of the data from a recent measurement of the thermal 95Mo(n,γ)96Mo reaction with the two-step-cascade method. The reported results strongly suggest that the extreme statistical model works very well in the mass region near A=100.

  11. A Method for Modeling Household Occupant Behavior to Simulate Residential Energy Consumption

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Johnson, Brandon J; Starke, Michael R; Abdelaziz, Omar

    2014-01-01

    This paper presents a statistical method for modeling the behavior of household occupants to estimate residential energy consumption. Using data gathered by the U.S. Census Bureau in the American Time Use Survey (ATUS), actions carried out by survey respondents are categorized into ten distinct activities. These activities are defined to correspond to the major energy-consuming loads commonly found within the residential sector. Next, time-varying, minute-resolution, Markov-chain-based statistical models of different occupant types are developed. Using these behavioral models, individual occupants are simulated to show how an occupant interacts with the major residential energy-consuming loads throughout the day. From these simulations, the minimum number of occupants, and consequently the minimum number of multiple-occupant households, needing to be simulated to produce a statistically accurate representation of aggregate residential behavior can be determined. Finally, future work will involve the use of these occupant models alongside residential load models to produce a high-resolution energy consumption profile and estimate the potential for demand response from residential loads.
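
    The mechanics of a time-varying, minute-resolution Markov chain can be illustrated with a toy two-state occupant ("inactive" vs. "using a major appliance"). The transition probabilities, their diurnal shape, and the number of simulated occupants below are all invented; the paper's ATUS-based models use ten activity categories and empirically estimated transition matrices.

```python
import numpy as np

rng = np.random.default_rng(11)

# Two toy states: 0 = inactive, 1 = using a major appliance.
def transition_matrix(minute_of_day):
    hour = minute_of_day / 60.0
    # Assumed diurnal shape: higher chance of starting appliance use around 07:00 and 19:00.
    p_start = 0.02 + 0.04 * (np.exp(-((hour - 7) ** 2) / 2) + np.exp(-((hour - 19) ** 2) / 2))
    p_stop = 0.10
    return np.array([[1 - p_start, p_start],
                     [p_stop, 1 - p_stop]])

def simulate_day():
    state = 0
    trace = np.empty(1440, dtype=int)
    for minute in range(1440):
        trace[minute] = state
        state = rng.choice(2, p=transition_matrix(minute)[state])
    return trace

# Aggregate many simulated occupants to approximate average behavior.
occupants = np.array([simulate_day() for _ in range(200)])
print("average minutes of appliance use per occupant-day:",
      round(float(occupants.sum(axis=1).mean()), 1))
```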

  12. Meta-analysis of diagnostic test data: a bivariate Bayesian modeling approach.

    PubMed

    Verde, Pablo E

    2010-12-30

    In recent decades, the amount of published results on clinical diagnostic tests has expanded very rapidly. The counterpart to this development has been the formal evaluation and synthesis of diagnostic results. However, published results exhibit substantial heterogeneity and can be regarded as so far removed from the classical domain of meta-analysis that they provide a rather severe test of classical statistical methods. Recently, bivariate random-effects meta-analytic methods, which model the pairs of sensitivities and specificities, have been presented from the classical point of view. In this work a bivariate Bayesian modeling approach is presented. This approach substantially extends the scope of classical bivariate methods by allowing the structural distribution of the random effects to depend on multiple sources of variability. Meta-analysis is summarized by the predictive posterior distributions for sensitivity and specificity. This new approach also allows substantial model checking, model diagnostics and model selection. Statistical computations are implemented in public-domain statistical software (WinBUGS and R) and illustrated with real data examples. Copyright © 2010 John Wiley & Sons, Ltd.

  13. Hybrid-Lambda: simulation of multiple merger and Kingman gene genealogies in species networks and species trees.

    PubMed

    Zhu, Sha; Degnan, James H; Goldstien, Sharyn J; Eldon, Bjarki

    2015-09-15

    There has been increasing interest in coalescent models that admit multiple mergers of ancestral lineages, and in modeling hybridization and coalescence simultaneously. Hybrid-Lambda is a software package that simulates gene genealogies under multiple-merger and Kingman's coalescent processes within species networks or species trees. Hybrid-Lambda allows different coalescent processes to be specified for different populations, and allows time to be converted between generations and coalescent units by specifying a population size for each population. In addition, Hybrid-Lambda can generate simulated datasets, assuming the infinitely-many-sites mutation model, and compute the F_ST statistic. As an illustration, we apply Hybrid-Lambda to infer the time of subdivision of certain marine invertebrates under different coalescent processes. Hybrid-Lambda makes it possible to investigate biogeographic concordance among high-fecundity species exhibiting skewed offspring distributions.

  14. On the Stationarity of Multiple Autoregressive Approximants: Theory and Algorithms

    DTIC Science & Technology

    1976-08-01

    [Fragmentary abstract; only portions of this record are recoverable.] Hannan and Terrell (1972) consider problems of a similar nature, and efficient estimates of the autoregressive coefficient matrices A(1), ..., A(p) are discussed. Recoverable citations include: "Autoregressive model fitting for control," Ann. Inst. Statist. Math., 23, 163-180; Hannan, E. J. (1970), Multiple Time Series, New York: John Wiley; Hannan, E. J. and Terrell, R. D. (1972), "Time series regression with linear constraints," International Economic Review, 13, 189-200; and Masani, P.

  15. Nonlinear Modeling of Causal Interrelationships in Neuronal Ensembles

    PubMed Central

    Zanos, Theodoros P.; Courellis, Spiros H.; Berger, Theodore W.; Hampson, Robert E.; Deadwyler, Sam A.; Marmarelis, Vasilis Z.

    2009-01-01

    The increasing availability of multiunit recordings gives new urgency to the need for effective analysis of “multidimensional” time-series data that are derived from the recorded activity of neuronal ensembles in the form of multiple sequences of action potentials—treated mathematically as point-processes and computationally as spike-trains. Whether in conditions of spontaneous activity or under conditions of external stimulation, the objective is the identification and quantification of possible causal links among the neurons generating the observed binary signals. A multiple-input/multiple-output (MIMO) modeling methodology is presented that can be used to quantify the neuronal dynamics of causal interrelationships in neuronal ensembles using spike-train data recorded from individual neurons. These causal interrelationships are modeled as transformations of spike-trains recorded from a set of neurons designated as the “inputs” into spike-trains recorded from another set of neurons designated as the “outputs.” The MIMO model is composed of a set of multi-input/single-output (MISO) modules, one for each output. Each module is the cascade of a MISO Volterra model and a threshold operator generating the output spikes. The Laguerre expansion approach is used to estimate the Volterra kernels of each MISO module from the respective input–output data using the least-squares method. The predictive performance of the model is evaluated with the use of the receiver operating characteristic (ROC) curve, from which the optimum threshold is also selected. The Mann–Whitney statistic is used to select the significant inputs for each output by examining the statistical significance of improvements in the predictive accuracy of the model when the respective input is included. Illustrative examples are presented for a simulated system and for an actual application using multiunit data recordings from the hippocampus of a behaving rat. PMID:18701382

  16. Prediction of Multiple-Trait and Multiple-Environment Genomic Data Using Recommender Systems.

    PubMed

    Montesinos-López, Osval A; Montesinos-López, Abelardo; Crossa, José; Montesinos-López, José C; Mota-Sanchez, David; Estrada-González, Fermín; Gillberg, Jussi; Singh, Ravi; Mondal, Suchismita; Juliana, Philomin

    2018-01-04

    In genomic-enabled prediction, the task of improving the accuracy of the prediction of lines in environments is difficult because the available information is generally sparse and usually has low correlations between traits. In current genomic selection, although researchers have a large amount of information and appropriate statistical models to process it, there is still limited computing efficiency to do so. Although some statistical models are usually mathematically elegant, many of them are also computationally inefficient, and they are impractical for many traits, lines, environments, and years because they need to sample from huge normal multivariate distributions. For these reasons, this study explores two recommender systems: item-based collaborative filtering (IBCF) and the matrix factorization algorithm (MF) in the context of multiple traits and multiple environments. The IBCF and MF methods were compared with two conventional methods on simulated and real data. Results of the simulated and real data sets show that the IBCF technique was slightly better in terms of prediction accuracy than the two conventional methods and the MF method when the correlation was moderately high. The IBCF technique is very attractive because it produces good predictions when there is high correlation between items (environment-trait combinations) and its implementation is computationally feasible, which can be useful for plant breeders who deal with very large data sets. Copyright © 2018 Montesinos-Lopez et al.

  17. Prediction of Multiple-Trait and Multiple-Environment Genomic Data Using Recommender Systems

    PubMed Central

    Montesinos-López, Osval A.; Montesinos-López, Abelardo; Crossa, José; Montesinos-López, José C.; Mota-Sanchez, David; Estrada-González, Fermín; Gillberg, Jussi; Singh, Ravi; Mondal, Suchismita; Juliana, Philomin

    2018-01-01

    In genomic-enabled prediction, the task of improving the accuracy of the prediction of lines in environments is difficult because the available information is generally sparse and usually has low correlations between traits. In current genomic selection, although researchers have a large amount of information and appropriate statistical models to process it, there is still limited computing efficiency to do so. Although some statistical models are usually mathematically elegant, many of them are also computationally inefficient, and they are impractical for many traits, lines, environments, and years because they need to sample from huge normal multivariate distributions. For these reasons, this study explores two recommender systems: item-based collaborative filtering (IBCF) and the matrix factorization algorithm (MF) in the context of multiple traits and multiple environments. The IBCF and MF methods were compared with two conventional methods on simulated and real data. Results of the simulated and real data sets show that the IBCF technique was slightly better in terms of prediction accuracy than the two conventional methods and the MF method when the correlation was moderately high. The IBCF technique is very attractive because it produces good predictions when there is high correlation between items (environment–trait combinations) and its implementation is computationally feasible, which can be useful for plant breeders who deal with very large data sets. PMID:29097376
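
    The item-based collaborative filtering idea can be illustrated with a tiny line-by-item matrix, where each "item" is an environment-trait combination: similarities between item columns are computed on their co-observed entries, and an unobserved cell is predicted as a similarity-weighted average of the line's observed items. The matrix values and the choice of cosine similarity below are assumptions for illustration, not the study's data or exact algorithm.

```python
import numpy as np

# Toy line-by-item matrix; items are environment-trait combinations, rows are lines,
# and NaN marks unobserved combinations (all values invented).
R = np.array([
    [5.1, 4.8, np.nan, 6.0],
    [4.9, np.nan, 5.5, 5.8],
    [np.nan, 4.2, 5.0, 5.1],
    [5.5, 5.0, 6.1, np.nan],
])

def cosine_sim(a, b):
    mask = ~np.isnan(a) & ~np.isnan(b)
    if mask.sum() < 2:
        return 0.0
    a, b = a[mask], b[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

n_items = R.shape[1]
S = np.array([[cosine_sim(R[:, i], R[:, j]) for j in range(n_items)] for i in range(n_items)])

def predict(line, item):
    """IBCF prediction: similarity-weighted average over the line's observed items."""
    observed = [j for j in range(n_items) if j != item and not np.isnan(R[line, j])]
    weights = np.array([S[item, j] for j in observed])
    values = np.array([R[line, j] for j in observed])
    return float(weights @ values / weights.sum())

print("predicted value for line 0, item 2:", round(predict(0, 2), 2))
```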

  18. Statistical strategies for averaging EC50 from multiple dose-response experiments.

    PubMed

    Jiang, Xiaoqi; Kopp-Schneider, Annette

    2015-11-01

    In most dose-response studies, repeated experiments are conducted to determine the EC50 value for a chemical, requiring EC50 estimates from a series of experiments to be averaged. Two statistical strategies, mixed-effects modeling and the meta-analysis approach, can be applied to estimate the average behavior of EC50 values over all experiments by considering the variabilities within and among experiments. We investigated these two strategies in two common cases of multiple dose-response experiments: (a) complete and explicit dose-response relationships are observed in all experiments, and (b) they are observed only in a subset of experiments. In case (a), the meta-analysis strategy is a simple and robust method for averaging EC50 estimates. In case (b), all experimental data sets can first be screened using the dose-response screening plot, which allows visualization and comparison of multiple dose-response experimental results. As long as more than three experiments provide information about complete dose-response relationships, the experiments that cover incomplete relationships can be excluded from the meta-analysis strategy of averaging EC50 estimates. If there are only two experiments containing complete dose-response information, the mixed-effects model approach is suggested. We subsequently provided a web application for non-statisticians to implement the proposed meta-analysis strategy of averaging EC50 estimates from multiple dose-response experiments.
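
    The meta-analysis strategy for averaging EC50 values can be sketched as a DerSimonian-Laird random-effects pooling of per-experiment log-EC50 estimates. The estimates and standard errors below are invented, and the paper's web application may implement the details differently; this is only a minimal illustration of the pooling arithmetic.

```python
import numpy as np

# Hypothetical per-experiment estimates of log10(EC50) and their standard errors,
# e.g. from separate four-parameter logistic fits.
log_ec50 = np.array([-6.10, -5.95, -6.30, -6.05])
se = np.array([0.08, 0.12, 0.10, 0.09])

# DerSimonian-Laird random-effects meta-analysis of the log EC50 values.
w_fixed = 1.0 / se**2
mu_fixed = np.sum(w_fixed * log_ec50) / np.sum(w_fixed)
q = np.sum(w_fixed * (log_ec50 - mu_fixed) ** 2)
k = len(log_ec50)
tau2 = max(0.0, (q - (k - 1)) / (np.sum(w_fixed) - np.sum(w_fixed**2) / np.sum(w_fixed)))

w_random = 1.0 / (se**2 + tau2)
mu_random = np.sum(w_random * log_ec50) / np.sum(w_random)
se_random = np.sqrt(1.0 / np.sum(w_random))

print(f"between-experiment variance tau^2: {tau2:.4f}")
print(f"average EC50: {10**mu_random:.2e}  "
      f"(95% CI {10**(mu_random - 1.96*se_random):.2e} to {10**(mu_random + 1.96*se_random):.2e})")
```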

  19. High-resolution modeling of thermal thresholds and environmental influences on coral bleaching for local and regional reef management.

    PubMed

    Kumagai, Naoki H; Yamano, Hiroya

    2018-01-01

    Coral reefs are one of the world's most threatened ecosystems, with global and local stressors contributing to their decline. Excessive sea-surface temperatures (SSTs) can cause coral bleaching, resulting in coral death and decreases in coral cover. An SST threshold of 1 °C over the climatological maximum is widely used to predict coral bleaching. In this study, we refined thermal indices predicting coral bleaching at high spatial resolution (1 km) by statistically optimizing thermal thresholds, as well as considering other environmental influences on bleaching such as ultraviolet (UV) radiation, water turbidity, and cooling effects. We used a coral bleaching dataset derived from the web-based monitoring system Sango Map Project, at scales appropriate for the local and regional conservation of Japanese coral reefs. We recorded coral bleaching events in the years 2004-2016 in Japan. We revealed the influence of multiple factors on the ability to predict coral bleaching, including selection of thermal indices, statistical optimization of thermal thresholds, quantification of multiple environmental influences, and use of multiple modeling methods (generalized linear models and random forests). After optimization, differences in predictive ability among thermal indices were negligible. Thermal index, UV radiation, water turbidity, and cooling effects were important predictors of the occurrence of coral bleaching. Predictions based on the best model revealed that coral reefs in Japan have experienced recent and widespread bleaching. A practical method to reduce bleaching frequency by screening UV radiation was also demonstrated in this paper.

  20. High-resolution modeling of thermal thresholds and environmental influences on coral bleaching for local and regional reef management

    PubMed Central

    Yamano, Hiroya

    2018-01-01

    Coral reefs are one of the world’s most threatened ecosystems, with global and local stressors contributing to their decline. Excessive sea-surface temperatures (SSTs) can cause coral bleaching, resulting in coral death and decreases in coral cover. A SST threshold of 1 °C over the climatological maximum is widely used to predict coral bleaching. In this study, we refined thermal indices predicting coral bleaching at high-spatial resolution (1 km) by statistically optimizing thermal thresholds, as well as considering other environmental influences on bleaching such as ultraviolet (UV) radiation, water turbidity, and cooling effects. We used a coral bleaching dataset derived from the web-based monitoring system Sango Map Project, at scales appropriate for the local and regional conservation of Japanese coral reefs. We recorded coral bleaching events in the years 2004–2016 in Japan. We revealed the influence of multiple factors on the ability to predict coral bleaching, including selection of thermal indices, statistical optimization of thermal thresholds, quantification of multiple environmental influences, and use of multiple modeling methods (generalized linear models and random forests). After optimization, differences in predictive ability among thermal indices were negligible. Thermal index, UV radiation, water turbidity, and cooling effects were important predictors of the occurrence of coral bleaching. Predictions based on the best model revealed that coral reefs in Japan have experienced recent and widespread bleaching. A practical method to reduce bleaching frequency by screening UV radiation was also demonstrated in this paper. PMID:29473007
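
    A stripped-down version of the modeling comparison can be sketched with scikit-learn: a logistic GLM and a random forest fit to synthetic grid-cell predictors (thermal index, UV radiation, turbidity) for bleaching occurrence. All data, coefficients, and model settings below are invented; the sketch only shows the shape of the GLM-versus-random-forest comparison, not the study's fitted models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n = 1500

# Synthetic 1-km grid cells; predictor names follow the abstract, values are made up.
thermal_index = rng.normal(0.0, 1.0, n)     # e.g. exceedance above an optimized threshold
uv_radiation = rng.normal(0.0, 1.0, n)
turbidity = rng.normal(0.0, 1.0, n)

logit = -1.0 + 1.5 * thermal_index + 0.8 * uv_radiation - 0.6 * turbidity
bleached = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([thermal_index, uv_radiation, turbidity])

# Compare a logistic GLM with a random forest via cross-validated AUC.
for name, model in [("logistic GLM", LogisticRegression()),
                    ("random forest", RandomForestClassifier(n_estimators=200, random_state=0))]:
    auc = cross_val_score(model, X, bleached, cv=5, scoring="roc_auc").mean()
    print(f"{name}: cross-validated AUC = {auc:.3f}")
```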

  1. Stretching single atom contacts at multiple subatomic step-length.

    PubMed

    Wei, Yi-Min; Liang, Jing-Hong; Chen, Zhao-Bin; Zhou, Xiao-Shun; Mao, Bing-Wei; Oviedo, Oscar A; Leiva, Ezequiel P M

    2013-08-14

    This work describes jump-to-contact STM break-junction experiments that lead to a novel statistical distribution of the last-step length associated with the conductance of a single-atom contact. Last-step length histograms show up to five peaks for Fe and three for Cu, at integer multiples of approximately 0.075 nm, a subatomic distance. A model is proposed in terms of gliding from an fcc hollow site to an hcp hollow site of adjacent atomic planes, at 1/3 of the regular layer spacing, along with tip stretching, to account for the multiple subatomic step-length behavior.

  2. Multiresolution multiscale active mask segmentation of fluorescence microscope images

    NASA Astrophysics Data System (ADS)

    Srinivasa, Gowri; Fickus, Matthew; Kovačević, Jelena

    2009-08-01

    We propose an active mask segmentation framework that combines the advantages of statistical modeling, smoothing, speed and flexibility offered by the traditional methods of region-growing, multiscale, multiresolution and active contours respectively. At the crux of this framework is a paradigm shift from evolving contours in the continuous domain to evolving multiple masks in the discrete domain. Thus, the active mask framework is particularly suited to segment digital images. We demonstrate the use of the framework in practice through the segmentation of punctate patterns in fluorescence microscope images. Experiments reveal that statistical modeling helps the multiple masks converge from a random initial configuration to a meaningful one. This obviates the need for an involved initialization procedure germane to most of the traditional methods used to segment fluorescence microscope images. While we provide the mathematical details of the functions used to segment fluorescence microscope images, this is only an instantiation of the active mask framework. We suggest some other instantiations of the framework to segment different types of images.

  3. Testing the Predictive Power of Coulomb Stress on Aftershock Sequences

    NASA Astrophysics Data System (ADS)

    Woessner, J.; Lombardi, A.; Werner, M. J.; Marzocchi, W.

    2009-12-01

    Empirical and statistical models of clustered seismicity are usually strongly stochastic and perceived to be uninformative in their forecasts, since only marginal distributions are used, such as the Omori-Utsu and Gutenberg-Richter laws. In contrast, so-called physics-based aftershock models, based on seismic rate changes calculated from Coulomb stress changes and rate-and-state friction, make more specific predictions: anisotropic stress shadows and multiplicative rate changes. We test the predictive power of models based on Coulomb stress changes against statistical models, including the popular Short Term Earthquake Probabilities and Epidemic-Type Aftershock Sequences models: We score and compare retrospective forecasts on the aftershock sequences of the 1992 Landers, USA, the 1997 Colfiorito, Italy, and the 2008 Selfoss, Iceland, earthquakes. To quantify predictability, we use likelihood-based metrics that test the consistency of the forecasts with the data, including modified and existing tests used in prospective forecast experiments within the Collaboratory for the Study of Earthquake Predictability (CSEP). Our results indicate that a statistical model performs best. Moreover, two Coulomb model classes seem unable to compete: Models based on deterministic Coulomb stress changes calculated from a given fault-slip model, and those based on fixed receiver faults. One model of Coulomb stress changes does perform well and sometimes outperforms the statistical models, but its predictive information is diluted, because of uncertainties included in the fault-slip model. Our results suggest that models based on Coulomb stress changes need to incorporate stochastic features that represent model and data uncertainty.

  4. Statistical technique for analysing functional connectivity of multiple spike trains.

    PubMed

    Masud, Mohammad Shahed; Borisyuk, Roman

    2011-03-15

    A new statistical technique, the Cox method, used for analysing functional connectivity of simultaneously recorded multiple spike trains is presented. This method is based on the theory of modulated renewal processes and it estimates a vector of influence strengths from multiple spike trains (called reference trains) to the selected (target) spike train. Selecting another target spike train and repeating the calculation of the influence strengths from the reference spike trains enables researchers to find all functional connections among multiple spike trains. In order to study functional connectivity an "influence function" is identified. This function recognises the specificity of neuronal interactions and reflects the dynamics of postsynaptic potential. In comparison to existing techniques, the Cox method has the following advantages: it does not use bins (binless method); it is applicable to cases where the sample size is small; it is sufficiently sensitive such that it estimates weak influences; it supports the simultaneous analysis of multiple influences; it is able to identify a correct connectivity scheme in difficult cases of "common source" or "indirect" connectivity. The Cox method has been thoroughly tested using multiple sets of data generated by the neural network model of the leaky integrate and fire neurons with a prescribed architecture of connections. The results suggest that this method is highly successful for analysing functional connectivity of simultaneously recorded multiple spike trains. Copyright © 2011 Elsevier B.V. All rights reserved.

  5. Applying the compound Poisson process model to the reporting of injury-related mortality rates.

    PubMed

    Kegler, Scott R

    2007-02-16

    Injury-related mortality rate estimates are often analyzed under the assumption that case counts follow a Poisson distribution. Certain types of injury incidents occasionally involve multiple fatalities, however, resulting in dependencies between cases that are not reflected in the simple Poisson model and which can affect even basic statistical analyses. This paper explores the compound Poisson process model as an alternative, emphasizing adjustments to some commonly used interval estimators for population-based rates and rate ratios. The adjusted estimators involve relatively simple closed-form computations, which in the absence of multiple-case incidents reduce to familiar estimators based on the simpler Poisson model. Summary data from the National Violent Death Reporting System are referenced in several examples demonstrating application of the proposed methodology.
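
    The effect of multiple-fatality incidents on interval estimates can be shown with a small numeric sketch: under a compound Poisson model the variance of the total death count is estimated by the sum of squared deaths per incident, rather than by the total count as in the simple Poisson model. The incident counts and population size below are invented, and the paper's adjusted estimators are more general than this illustration.

```python
import numpy as np

# Hypothetical incident-level data: deaths per injury incident in a population of
# 1,000,000 person-years (values are illustrative).
deaths_per_incident = np.array([1] * 180 + [2] * 12 + [3] * 4 + [5] * 1)
population = 1_000_000

total_deaths = deaths_per_incident.sum()
rate = total_deaths / population * 100_000          # deaths per 100,000

# Simple Poisson model: Var(total deaths) = total deaths.
var_poisson = total_deaths
# Compound Poisson model: incidents are Poisson and each contributes a random number
# of deaths, so Var(total deaths) is estimated by the sum of squared cluster sizes.
var_compound = np.sum(deaths_per_incident**2)

for label, var in [("Poisson", var_poisson), ("compound Poisson", var_compound)]:
    se_rate = np.sqrt(var) / population * 100_000
    print(f"{label}: rate = {rate:.2f} per 100,000, "
          f"95% CI ({rate - 1.96*se_rate:.2f}, {rate + 1.96*se_rate:.2f})")
```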

  6. A method to estimate the contribution of regional genetic associations to complex traits from summary association statistics.

    PubMed

    Pare, Guillaume; Mao, Shihong; Deng, Wei Q

    2016-06-08

    Despite considerable efforts, known genetic associations only explain a small fraction of predicted heritability. Regional associations combine information from multiple contiguous genetic variants and can improve variance explained at established association loci. However, regional associations are not easily amenable to estimation using summary association statistics because of sensitivity to linkage disequilibrium (LD). We now propose a novel method, LD Adjusted Regional Genetic Variance (LARGV), to estimate phenotypic variance explained by regional associations using summary statistics while accounting for LD. Our method is asymptotically equivalent to a multiple linear regression model when no interaction or haplotype effects are present. It has several applications, such as ranking of genetic regions according to variance explained or comparison of variance explained by two or more regions. Using height and BMI data from the Health Retirement Study (N = 7,776), we show that most genetic variance lies in a small proportion of the genome and that previously identified linkage peaks have higher than expected regional variance.

  7. A method to estimate the contribution of regional genetic associations to complex traits from summary association statistics

    PubMed Central

    Pare, Guillaume; Mao, Shihong; Deng, Wei Q.

    2016-01-01

    Despite considerable efforts, known genetic associations only explain a small fraction of predicted heritability. Regional associations combine information from multiple contiguous genetic variants and can improve variance explained at established association loci. However, regional associations are not easily amenable to estimation using summary association statistics because of sensitivity to linkage disequilibrium (LD). We now propose a novel method, LD Adjusted Regional Genetic Variance (LARGV), to estimate phenotypic variance explained by regional associations using summary statistics while accounting for LD. Our method is asymptotically equivalent to a multiple linear regression model when no interaction or haplotype effects are present. It has several applications, such as ranking of genetic regions according to variance explained or comparison of variance explained by two or more regions. Using height and BMI data from the Health Retirement Study (N = 7,776), we show that most genetic variance lies in a small proportion of the genome and that previously identified linkage peaks have higher than expected regional variance. PMID:27273519

  8. Using Statistical Mechanics and Entropy Principles to Interpret Variability in Power Law Models of the Streamflow Recession

    NASA Astrophysics Data System (ADS)

    Dralle, D.; Karst, N.; Thompson, S. E.

    2015-12-01

    Multiple competing theories suggest that power law behavior governs the observed first-order dynamics of streamflow recessions - the important process by which catchments dry out via the stream network, altering the availability of surface water resources and in-stream habitat. Frequently modeled as dq/dt = -aq^b, recessions typically exhibit a high degree of variability, even within a single catchment, as revealed by significant shifts in the values of "a" and "b" across recession events. One potential source of this variability lies in underlying, hard-to-observe fluctuations in how catchment water storage is partitioned amongst distinct storage elements, each having different discharge behaviors. Testing this and competing hypotheses with widely available streamflow timeseries, however, has been hindered by a power law scaling artifact that obscures meaningful covariation between the recession parameters, "a" and "b". Here we briefly outline a technique that removes this artifact, revealing intriguing new patterns in the joint distribution of recession parameters. Using long-term flow data from catchments in Northern California, we explore temporal variations, and find that the "a" parameter varies strongly with catchment wetness. Then we explore how the "b" parameter changes with "a", and find that measures of its variation are maximized at intermediate "a" values. We propose an interpretation of this pattern based on statistical mechanics, meaning "b" can be viewed as an indicator of the catchment "microstate" - i.e. the partitioning of storage - and "a" as a measure of the catchment macrostate (i.e. the total storage). In statistical mechanics, entropy (i.e. microstate variance, that is the variance of "b") is maximized for intermediate values of extensive variables (i.e. wetness, "a"), as observed in the recession data. This interpretation of "a" and "b" was supported by model runs using a multiple-reservoir catchment toy model, and lends support to the hypothesis that power law streamflow recession dynamics, and their variations, have their origin in the multiple modalities of storage partitioning.
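
    The first-order recession model quoted above (dq/dt = -aq^b) can be fit, in its simplest form, by a log-log regression of the flow decline rate against discharge. The synthetic recession, true parameter values, and noise level below are assumptions for illustration; the paper's artifact-removal technique is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(9)

# Synthetic recession: dq/dt = -a * q^b with assumed a = 0.05, b = 1.5 (daily step).
a_true, b_true = 0.05, 1.5
q = [10.0]
for _ in range(60):
    q.append(q[-1] - a_true * q[-1] ** b_true)
q = np.array(q)

# Estimate a and b by ordinary least squares on log(-dq/dt) versus log(q).
dqdt = np.diff(q)                     # negative during a recession
qm = 0.5 * (q[:-1] + q[1:])           # midpoint discharge
y = np.log(-dqdt) + rng.normal(0, 0.02, len(dqdt))   # small observational noise
b_hat, log_a_hat = np.polyfit(np.log(qm), y, 1)

print(f"estimated b = {b_hat:.2f} (true {b_true}), a = {np.exp(log_a_hat):.3f} (true {a_true})")
```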

  9. Individual risk factors for deep infection and compromised fracture healing after intramedullary nailing of tibial shaft fractures: a single centre experience of 480 patients.

    PubMed

    Metsemakers, W-J; Handojo, K; Reynders, P; Sermon, A; Vanderschot, P; Nijs, S

    2015-04-01

    Despite modern advances in the treatment of tibial shaft fractures, complications including nonunion, malunion, and infection remain relatively frequent. A better understanding of these injuries and their complications could lead to prevention rather than treatment strategies. A retrospective study was performed to identify risk factors for deep infection and compromised fracture healing after intramedullary nailing (IMN) of tibial shaft fractures. Between January 2000 and January 2012, 480 consecutive patients with 486 tibial shaft fractures were enrolled in the study. Statistical analysis was performed to determine predictors of deep infection and compromised fracture healing. Compromised fracture healing was subdivided into delayed union and nonunion. The following independent variables were selected for analysis: age, sex, smoking, obesity, diabetes, American Society of Anaesthesiologists (ASA) classification, polytrauma, fracture type, open fractures, Gustilo type, primary external fixation (EF), time to nailing (TTN) and reaming. As the primary statistical evaluation, we performed a univariate analysis, followed by a multiple logistic regression model. Univariate regression analysis revealed similar risk factors for delayed union and nonunion, including fracture type, open fractures and Gustilo type. Factors affecting the occurrence of deep infection in this model were primary EF, a prolonged TTN, open fractures and Gustilo type. Multiple logistic regression analysis revealed polytrauma as the single risk factor for nonunion. With respect to delayed union, no risk factors could be identified. In the same statistical model, deep infection was correlated with primary EF. The purpose of this study was to evaluate risk factors of poor outcome after IMN of tibial shaft fractures. The univariate regression analysis showed that the nature of complications after tibial shaft nailing could be multifactorial. This was not confirmed in the multiple logistic regression model, which only revealed polytrauma and primary EF as risk factors for nonunion and deep infection, respectively. Future strategies should focus on prevention in high-risk populations such as polytrauma patients treated with EF. Copyright © 2014 Elsevier Ltd. All rights reserved.

  10. ACCELERATED FAILURE TIME MODELS PROVIDE A USEFUL STATISTICAL FRAMEWORK FOR AGING RESEARCH

    PubMed Central

    Swindell, William R.

    2009-01-01

    Survivorship experiments play a central role in aging research and are performed to evaluate whether interventions alter the rate of aging and increase lifespan. The accelerated failure time (AFT) model is seldom used to analyze survivorship data, but offers a potentially useful statistical approach that is based upon the survival curve rather than the hazard function. In this study, AFT models were used to analyze data from 16 survivorship experiments that evaluated the effects of one or more genetic manipulations on mouse lifespan. Most genetic manipulations were found to have a multiplicative effect on survivorship that is independent of age and well-characterized by the AFT model “deceleration factor”. AFT model deceleration factors also provided a more intuitive measure of treatment effect than the hazard ratio, and were robust to departures from modeling assumptions. Age-dependent treatment effects, when present, were investigated using quantile regression modeling. These results provide an informative and quantitative summary of survivorship data associated with currently known long-lived mouse models. In addition, from the standpoint of aging research, these statistical approaches have appealing properties and provide valuable tools for the analysis of survivorship data. PMID:19007875

  11. Accelerated failure time models provide a useful statistical framework for aging research.

    PubMed

    Swindell, William R

    2009-03-01

    Survivorship experiments play a central role in aging research and are performed to evaluate whether interventions alter the rate of aging and increase lifespan. The accelerated failure time (AFT) model is seldom used to analyze survivorship data, but offers a potentially useful statistical approach that is based upon the survival curve rather than the hazard function. In this study, AFT models were used to analyze data from 16 survivorship experiments that evaluated the effects of one or more genetic manipulations on mouse lifespan. Most genetic manipulations were found to have a multiplicative effect on survivorship that is independent of age and well-characterized by the AFT model "deceleration factor". AFT model deceleration factors also provided a more intuitive measure of treatment effect than the hazard ratio, and were robust to departures from modeling assumptions. Age-dependent treatment effects, when present, were investigated using quantile regression modeling. These results provide an informative and quantitative summary of survivorship data associated with currently known long-lived mouse models. In addition, from the standpoint of aging research, these statistical approaches have appealing properties and provide valuable tools for the analysis of survivorship data.
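
    The multiplicative time-scaling at the heart of the AFT model can be illustrated with a toy, uncensored simulation: treated lifespans are the control lifespans stretched by a "deceleration factor", and with log-normal errors and no censoring the factor is recovered by ordinary least squares on log lifespans. Real survivorship data are censored and require a proper AFT likelihood, so everything below is a simplified assumption.

```python
import numpy as np

rng = np.random.default_rng(21)
n = 300

# Simulate lifespans (days) for control and treated mice under a log-normal AFT model:
# the treatment stretches time by an assumed multiplicative "deceleration factor".
deceleration_true = 1.25
treated = np.repeat([0, 1], n)
base = rng.lognormal(mean=np.log(800), sigma=0.25, size=2 * n)
lifespan = base * deceleration_true**treated

# With no censoring, the log-normal AFT model reduces to OLS on log(lifespan);
# exp(slope) estimates the deceleration factor.
slope, intercept = np.polyfit(treated, np.log(lifespan), 1)
print(f"estimated deceleration factor: {np.exp(slope):.3f} (true {deceleration_true})")
```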

  12. A STATISTICAL MODELING METHODOLOGY FOR THE DETECTION, QUANTIFICATION, AND PREDICTION OF ECOLOGICAL THRESHOLDS

    EPA Science Inventory

    This study will provide a general methodology for integrating threshold information from multiple species ecological metrics, allow for prediction of changes of alternative stable states, and provide a risk assessment tool that can be applied to adaptive management. The integr...

  13. Generalized Full-Information Item Bifactor Analysis

    ERIC Educational Resources Information Center

    Cai, Li; Yang, Ji Seung; Hansen, Mark

    2011-01-01

    Full-information item bifactor analysis is an important statistical method in psychological and educational measurement. Current methods are limited to single-group analysis and inflexible in the types of item response models supported. We propose a flexible multiple-group item bifactor analysis framework that supports a variety of…

  14. SOCR Analyses - an Instructional Java Web-based Statistical Analysis Toolkit.

    PubMed

    Chu, Annie; Cui, Jenny; Dinov, Ivo D

    2009-03-01

    The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include models commonly used in undergraduate statistics courses, such as linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as the t-test in the parametric category, and the Wilcoxon rank sum test, Kruskal-Wallis test, and Friedman's test in the non-parametric category. SOCR Analyses also includes several hypothesis test models, such as contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), in the hope of contributing to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with an API (Application Programming Interface) have been implemented for statistical summaries, least squares solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is ongoing and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for the most updated information and newly added models.

  15. Controlling Guessing Bias in the Dichotomous Rasch Model Applied to a Large-Scale, Vertically Scaled Testing Program

    ERIC Educational Resources Information Center

    Andrich, David; Marais, Ida; Humphry, Stephen Mark

    2016-01-01

    Recent research has shown how the statistical bias in Rasch model difficulty estimates induced by guessing in multiple-choice items can be eliminated. Using vertical scaling of a high-profile national reading test, it is shown that the dominant effect of removing such bias is a nonlinear change in the unit of scale across the continuum. The…

  16. A MULTIPLE TESTING OF THE ABC METHOD AND THE DEVELOPMENT OF A SECOND-GENERATION MODEL. PART II, TEST RESULTS AND AN ANALYSIS OF RECALL RATIO.

    ERIC Educational Resources Information Center

    ALTMANN, BERTHOLD

    After a brief summary of the test program (described more fully in LI 000 318), the statistical results, tabulated as overall "ABC (Approach by Concept)-relevance ratios" and "ABC-recall figures," are presented and reviewed. An abstract model developed in accordance with Max Weber's "Idealtypus" ("Die Objektivitaet…

  17. Data Mining Methods Applied to Flight Operations Quality Assurance Data: A Comparison to Standard Statistical Methods

    NASA Technical Reports Server (NTRS)

    Stolzer, Alan J.; Halford, Carl

    2007-01-01

    In a previous study, multiple regression techniques were applied to Flight Operations Quality Assurance-derived data to develop parsimonious model(s) for fuel consumption on the Boeing 757 airplane. The present study examined several data mining algorithms, including neural networks, on the fuel consumption problem and compared them to the multiple regression results obtained earlier. Using regression methods, parsimonious models were obtained that explained approximately 85% of the variation in fuel flow. In general data mining methods were more effective in predicting fuel consumption. Classification and Regression Tree methods reported correlation coefficients of .91 to .92, and General Linear Models and Multilayer Perceptron neural networks reported correlation coefficients of about .99. These data mining models show great promise for use in further examining large FOQA databases for operational and safety improvements.

  18. A new fracture mechanics model for multiple matrix cracks of SiC fiber reinforced brittle-matrix composites

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Okabe, T.; Takeda, N.; Komotori, J.

    1999-11-26

    A new model is proposed for multiple matrix cracking in order to take into account the role of matrix-rich regions in the cross section in initiating crack growth. The model is used to predict the matrix cracking stress and the total number of matrix cracks. The model converts the matrix-rich regions into equivalent penny-shaped crack sizes and predicts the matrix cracking stress with a fracture mechanics crack-bridging model. The estimated distribution of matrix cracking stresses is used as statistical input to predict the number of matrix cracks. The results show good agreement with experimental results from replica observations. It is therefore found that the matrix cracking behavior mainly depends on the distribution of matrix-rich regions in the composite.

  19. Borrowing of strength and study weights in multivariate and network meta-analysis.

    PubMed

    Jackson, Dan; White, Ian R; Price, Malcolm; Copas, John; Riley, Richard D

    2017-12-01

    Multivariate and network meta-analysis have the potential for the estimated mean of one effect to borrow strength from the data on other effects of interest. The extent of this borrowing of strength is usually assessed informally. We present new mathematical definitions of 'borrowing of strength'. Our main proposal is based on a decomposition of the score statistic, which we show can be interpreted as comparing the precision of estimates from the multivariate and univariate models. Our definition of borrowing of strength therefore emulates the usual informal assessment. We also derive a method for calculating study weights, which we embed into the same framework as our borrowing of strength statistics, so that percentage study weights can accompany the results from multivariate and network meta-analyses as they do in conventional univariate meta-analyses. Our proposals are illustrated using three meta-analyses involving correlated effects for multiple outcomes, multiple risk factor associations and multiple treatments (network meta-analysis).

  20. Borrowing of strength and study weights in multivariate and network meta-analysis

    PubMed Central

    Jackson, Dan; White, Ian R; Price, Malcolm; Copas, John; Riley, Richard D

    2016-01-01

    Multivariate and network meta-analysis have the potential for the estimated mean of one effect to borrow strength from the data on other effects of interest. The extent of this borrowing of strength is usually assessed informally. We present new mathematical definitions of ‘borrowing of strength’. Our main proposal is based on a decomposition of the score statistic, which we show can be interpreted as comparing the precision of estimates from the multivariate and univariate models. Our definition of borrowing of strength therefore emulates the usual informal assessment. We also derive a method for calculating study weights, which we embed into the same framework as our borrowing of strength statistics, so that percentage study weights can accompany the results from multivariate and network meta-analyses as they do in conventional univariate meta-analyses. Our proposals are illustrated using three meta-analyses involving correlated effects for multiple outcomes, multiple risk factor associations and multiple treatments (network meta-analysis). PMID:26546254

  1. Unraveling multiple changes in complex climate time series using Bayesian inference

    NASA Astrophysics Data System (ADS)

    Berner, Nadine; Trauth, Martin H.; Holschneider, Matthias

    2016-04-01

    Change points in time series are perceived as heterogeneities in the statistical or dynamical characteristics of observations. Unraveling such transitions yields essential information for the understanding of the observed system. The precise detection and basic characterization of underlying changes is therefore of particular importance in environmental sciences. We present a kernel-based Bayesian inference approach to investigate direct as well as indirect climate observations for multiple generic transition events. In order to develop a diagnostic approach designed to capture a variety of natural processes, the basic statistical features of central tendency and dispersion are used to locally approximate a complex time series by a generic transition model. A Bayesian inversion approach is developed to robustly infer the location and the generic patterns of such a transition. To systematically investigate time series for multiple changes occurring at different temporal scales, the Bayesian inversion is extended to a kernel-based inference approach. By introducing basic kernel measures, the kernel inference results are combined into a proxy for the posterior distribution of multiple transitions. Thus, based on a generic transition model, a probability expression is derived that is capable of indicating multiple changes within a complex time series. We discuss the method's performance by investigating direct and indirect climate observations. The approach is applied to an environmental time series (spanning about 100 years) from the weather station in Tuscaloosa, Alabama, and confirms documented instrumentation changes. Moreover, the approach is used to investigate a set of complex terrigenous dust records from ODP sites 659, 721/722 and 967, interpreted as climate indicators of the African region during the Plio-Pleistocene (about 5 Ma). The detailed inference unravels multiple transitions underlying the indirect climate observations, coinciding with established global climate events.
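
    For readers new to change-point inference, a much simpler cousin of the kernel-based approach described above can be sketched as follows (it assumes a single change in mean with a known noise level and plugs in maximum-likelihood means rather than integrating them out):

        # Posterior over the location of a single change in mean, flat prior on
        # location, Gaussian noise of known standard deviation (illustration only).
        import numpy as np

        rng = np.random.default_rng(2)
        n, sigma = 200, 1.0
        y = np.concatenate([rng.normal(0.0, sigma, 120), rng.normal(1.5, sigma, 80)])

        log_post = np.full(n, -np.inf)
        for k in range(5, n - 5):              # candidate change-point locations
            mu1, mu2 = y[:k].mean(), y[k:].mean()
            resid = np.concatenate([y[:k] - mu1, y[k:] - mu2])
            log_post[k] = -0.5 * np.sum(resid**2) / sigma**2

        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        print("posterior mode at index", post.argmax())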

  2. Statistical Learning Is Constrained to Less Abstract Patterns in Complex Sensory Input (but not the Least)

    PubMed Central

    Emberson, Lauren L.; Rubinstein, Dani

    2016-01-01

    The influence of statistical information on behavior (either through learning or adaptation) is quickly becoming foundational to many domains of cognitive psychology and cognitive neuroscience, from language comprehension to visual development. We investigate a central problem impacting these diverse fields: when encountering input with rich statistical information, are there any constraints on learning? This paper examines learning outcomes when adult learners are given statistical information across multiple levels of abstraction simultaneously: from abstract, semantic categories of everyday objects to individual viewpoints on these objects. After revealing statistical learning of abstract, semantic categories with scrambled individual exemplars (Exp. 1), participants viewed pictures where the categories as well as the individual objects predicted picture order (e.g., bird1—dog1, bird2—dog2). Our findings suggest that participants preferentially encode the relationships between the individual objects, even in the presence of statistical regularities linking semantic categories (Exps. 2 and 3). In a final experiment we investigate whether learners are biased towards learning object-level regularities or simply construct the most detailed model given the data (and therefore best able to predict the specifics of the upcoming stimulus) by investigating whether participants preferentially learn from the statistical regularities linking individual snapshots of objects or the relationship between the objects themselves (e.g., bird_picture1—dog_picture1, bird_picture2—dog_picture2). We find that participants fail to learn the relationships between individual snapshots, suggesting a bias towards object-level statistical regularities as opposed to merely constructing the most complete model of the input. This work moves beyond the previous existence proofs that statistical learning is possible at both very high and very low levels of abstraction (categories vs. individual objects) and suggests that, at least with the current categories and type of learner, there are biases to pick up on statistical regularities between individual objects even when robust statistical information is present at other levels of abstraction. These findings speak directly to emerging theories about how systems supporting statistical learning and prediction operate in our structure-rich environments. Moreover, the theoretical implications of the current work across multiple domains of study are already clear: statistical learning cannot be assumed to be unconstrained even if statistical learning has previously been established at a given level of abstraction when that information is presented in isolation. PMID:27139779

  3. Local multiplicity adjustment for the spatial scan statistic using the Gumbel distribution.

    PubMed

    Gangnon, Ronald E

    2012-03-01

    The spatial scan statistic is an important and widely used tool for cluster detection. It is based on the simultaneous evaluation of the statistical significance of the maximum likelihood ratio test statistic over a large collection of potential clusters. In most cluster detection problems, there is variation in the extent of local multiplicity across the study region. For example, using a fixed maximum geographic radius for clusters, urban areas typically have many overlapping potential clusters, whereas rural areas have relatively few. The spatial scan statistic does not account for local multiplicity variation. We describe a previously proposed local multiplicity adjustment based on a nested Bonferroni correction and propose a novel adjustment based on a Gumbel distribution approximation to the distribution of a local scan statistic. We compare the performance of all three statistics in terms of power and a novel unbiased cluster detection criterion. These methods are then applied to the well-known New York leukemia dataset and a Wisconsin breast cancer incidence dataset. © 2011, The International Biometric Society.
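
    The Gumbel idea can be sketched generically as follows (a Monte Carlo illustration assuming chi-square local statistics, not the paper's exact spatial scan construction; the number of overlapping clusters and the observed value are made up):

        # Fit a Gumbel distribution to Monte Carlo replicates of a local maximum
        # statistic and use it to convert an observed local maximum to a p-value.
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(3)
        n_local = 40          # overlapping potential clusters at one location (assumed)
        n_rep = 2000          # Monte Carlo replicates under the null

        # Null replicates of the local maximum of chi-square(1) statistics.
        null_max = rng.chisquare(1, size=(n_rep, n_local)).max(axis=1)

        loc, scale = stats.gumbel_r.fit(null_max)
        observed_local_max = 14.2                        # hypothetical observed value
        p_gumbel = stats.gumbel_r.sf(observed_local_max, loc, scale)
        p_mc = (null_max >= observed_local_max).mean()   # raw Monte Carlo comparison
        print(f"Gumbel-adjusted p = {p_gumbel:.4f}, Monte Carlo p = {p_mc:.4f}")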

  4. Local multiplicity adjustment for the spatial scan statistic using the Gumbel distribution

    PubMed Central

    Gangnon, Ronald E.

    2011-01-01

    Summary The spatial scan statistic is an important and widely used tool for cluster detection. It is based on the simultaneous evaluation of the statistical significance of the maximum likelihood ratio test statistic over a large collection of potential clusters. In most cluster detection problems, there is variation in the extent of local multiplicity across the study region. For example, using a fixed maximum geographic radius for clusters, urban areas typically have many overlapping potential clusters, while rural areas have relatively few. The spatial scan statistic does not account for local multiplicity variation. We describe a previously proposed local multiplicity adjustment based on a nested Bonferroni correction and propose a novel adjustment based on a Gumbel distribution approximation to the distribution of a local scan statistic. We compare the performance of all three statistics in terms of power and a novel unbiased cluster detection criterion. These methods are then applied to the well-known New York leukemia dataset and a Wisconsin breast cancer incidence dataset. PMID:21762118

  5. Protein structure modeling for CASP10 by multiple layers of global optimization.

    PubMed

    Joo, Keehyoung; Lee, Juyong; Sim, Sangjin; Lee, Sun Young; Lee, Kiho; Heo, Seungryong; Lee, In-Ho; Lee, Sung Jong; Lee, Jooyoung

    2014-02-01

    In the template-based modeling (TBM) category of CASP10 experiment, we introduced a new protocol called protein modeling system (PMS) to generate accurate protein structures in terms of side-chains as well as backbone trace. In the new protocol, a global optimization algorithm, called conformational space annealing (CSA), is applied to the three layers of TBM procedure: multiple sequence-structure alignment, 3D chain building, and side-chain re-modeling. For 3D chain building, we developed a new energy function which includes new distance restraint terms of Lorentzian type (derived from multiple templates), and new energy terms that combine (physical) energy terms such as dynamic fragment assembly (DFA) energy, DFIRE statistical potential energy, hydrogen bonding term, etc. These physical energy terms are expected to guide the structure modeling especially for loop regions where no template structures are available. In addition, we developed a new quality assessment method based on random forest machine learning algorithm to screen templates, multiple alignments, and final models. For TBM targets of CASP10, we find that, due to the combination of three stages of CSA global optimizations and quality assessment, the modeling accuracy of PMS improves at each additional stage of the protocol. It is especially noteworthy that the side-chains of the final PMS models are far more accurate than the models in the intermediate steps. Copyright © 2013 Wiley Periodicals, Inc.

  6. Statistical Analysis of CFD Solutions from the 6th AIAA CFD Drag Prediction Workshop

    NASA Technical Reports Server (NTRS)

    Derlaga, Joseph M.; Morrison, Joseph H.

    2017-01-01

    A graphical framework is used for statistical analysis of the results from an extensive N-version test of a collection of Reynolds-averaged Navier-Stokes computational fluid dynamics codes. The solutions were obtained by code developers and users from North America, Europe, Asia, and South America using both common and custom grid sequences as well as multiple turbulence models for the June 2016 6th AIAA CFD Drag Prediction Workshop sponsored by the AIAA Applied Aerodynamics Technical Committee. The aerodynamic configuration for this workshop was the Common Research Model subsonic transport wing-body previously used for both the 4th and 5th Drag Prediction Workshops. This work continues the statistical analysis begun in the earlier workshops and compares the results from the grid convergence study of the most recent workshop with previous workshops.

  7. Stochastic Analysis and Probabilistic Downscaling of Soil Moisture

    NASA Astrophysics Data System (ADS)

    Deshon, J. P.; Niemann, J. D.; Green, T. R.; Jones, A. S.

    2017-12-01

    Soil moisture is a key variable for rainfall-runoff response estimation, ecological and biogeochemical flux estimation, and biodiversity characterization, each of which is useful for watershed condition assessment. These applications require not only accurate, fine-resolution soil-moisture estimates but also confidence limits on those estimates and soil-moisture patterns that exhibit realistic statistical properties (e.g., variance and spatial correlation structure). The Equilibrium Moisture from Topography, Vegetation, and Soil (EMT+VS) model downscales coarse-resolution (9-40 km) soil moisture from satellite remote sensing or land-surface models to produce fine-resolution (10-30 m) estimates. The model was designed to produce accurate deterministic soil-moisture estimates at multiple points, but the resulting patterns do not reproduce the variance or spatial correlation of observed soil-moisture patterns. The primary objective of this research is to generalize the EMT+VS model to produce a probability density function (pdf) for soil moisture at each fine-resolution location and time. Each pdf has a mean that is equal to the deterministic soil-moisture estimate, and the pdf can be used to quantify the uncertainty in the soil-moisture estimates and to simulate soil-moisture patterns. Different versions of the generalized model are hypothesized based on how uncertainty enters the model, whether the uncertainty is additive or multiplicative, and which distributions describe the uncertainty. These versions are then tested by application to four catchments with detailed soil-moisture observations (Tarrawarra, Satellite Station, Cache la Poudre, and Nerrigundah). The performance of the generalized models is evaluated by comparing the statistical properties of the simulated soil-moisture patterns to those of the observations and the deterministic EMT+VS model. The versions of the generalized EMT+VS model with normally distributed stochastic components produce soil-moisture patterns with more realistic statistical properties than the deterministic model. Additionally, the results suggest that the variance and spatial correlation of the stochastic soil-moisture variations do not vary consistently with the spatial-average soil moisture.

  8. Statistical methods for quantitative mass spectrometry proteomic experiments with labeling.

    PubMed

    Oberg, Ann L; Mahoney, Douglas W

    2012-01-01

    Mass Spectrometry utilizing labeling allows multiple specimens to be subjected to mass spectrometry simultaneously. As a result, between-experiment variability is reduced. Here we describe use of fundamental concepts of statistical experimental design in the labeling framework in order to minimize variability and avoid biases. We demonstrate how to export data in the format that is most efficient for statistical analysis. We demonstrate how to assess the need for normalization, perform normalization, and check whether it worked. We describe how to build a model explaining the observed values and test for differential protein abundance along with descriptive statistics and measures of reliability of the findings. Concepts are illustrated through the use of three case studies utilizing the iTRAQ 4-plex labeling protocol.

  9. The multiple imputation method: a case study involving secondary data analysis.

    PubMed

    Walani, Salimah R; Cleland, Charles M

    2015-05-01

    To illustrate with the example of a secondary data analysis study the use of the multiple imputation method to replace missing data. Most large public datasets have missing data, which need to be handled by researchers conducting secondary data analysis studies. Multiple imputation is a technique widely used to replace missing values while preserving the sample size and sampling variability of the data. The 2004 National Sample Survey of Registered Nurses. The authors created a model to impute missing values using the chained equation method. They used imputation diagnostics procedures and conducted regression analysis of imputed data to determine the differences between the log hourly wages of internationally educated and US-educated registered nurses. The authors used multiple imputation procedures to replace missing values in a large dataset with 29,059 observations. Five multiple imputed datasets were created. Imputation diagnostics using time series and density plots showed that imputation was successful. The authors also present an example of the use of multiple imputed datasets to conduct regression analysis to answer a substantive research question. Multiple imputation is a powerful technique for imputing missing values in large datasets while preserving the sample size and variance of the data. Even though the chained equation method involves complex statistical computations, recent innovations in software and computation have made it possible for researchers to conduct this technique on large datasets. The authors recommend nurse researchers use multiple imputation methods for handling missing data to improve the statistical power and external validity of their studies.
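
    One accessible way to sketch the chained-equation workflow in Python (synthetic data rather than the nurse survey; scikit-learn's IterativeImputer stands in for the imputation software the authors used):

        # Generate several imputed datasets via chained equations, fit the analysis
        # model on each, and pool the point estimates.
        import numpy as np
        from sklearn.experimental import enable_iterative_imputer  # noqa: F401
        from sklearn.impute import IterativeImputer
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(4)
        n = 500
        x = rng.normal(size=n)
        z = 0.5 * x + rng.normal(size=n)
        y = 1.0 + 0.8 * x + 0.3 * z + rng.normal(scale=0.5, size=n)
        data = np.column_stack([y, x, z])
        data[rng.random(n) < 0.2, 2] = np.nan        # 20% of z missing at random

        coefs = []
        for m in range(5):                            # five imputed datasets
            imputer = IterativeImputer(sample_posterior=True, random_state=m)
            completed = imputer.fit_transform(data)
            fit = LinearRegression().fit(completed[:, 1:], completed[:, 0])
            coefs.append(fit.coef_)

        print("pooled coefficients (point estimate across imputations):", np.mean(coefs, axis=0))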

  10. Multiple imputation methods for bivariate outcomes in cluster randomised trials.

    PubMed

    DiazOrdaz, K; Kenward, M G; Gomes, M; Grieve, R

    2016-09-10

    Missing observations are common in cluster randomised trials. The problem is exacerbated when modelling bivariate outcomes jointly, as the proportion of complete cases is often considerably smaller than the proportion having either of the outcomes fully observed. Approaches taken to handling such missing data include the following: complete case analysis, single-level multiple imputation that ignores the clustering, multiple imputation with a fixed effect for each cluster and multilevel multiple imputation. We contrasted the alternative approaches to handling missing data in a cost-effectiveness analysis that uses data from a cluster randomised trial to evaluate an exercise intervention for care home residents. We then conducted a simulation study to assess the performance of these approaches on bivariate continuous outcomes, in terms of confidence interval coverage and empirical bias in the estimated treatment effects. Missing-at-random clustered data scenarios were simulated following a full-factorial design. Across all the missing data mechanisms considered, the multiple imputation methods provided estimators with negligible bias, while complete case analysis resulted in biased treatment effect estimates in scenarios where the randomised treatment arm was associated with missingness. Confidence interval coverage was generally in excess of nominal levels (up to 99.8%) following fixed-effects multiple imputation and too low following single-level multiple imputation. Multilevel multiple imputation led to coverage levels of approximately 95% throughout. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  11. Combining data visualization and statistical approaches for interpreting measurements and meta-data: Integrating heatmaps, variable clustering, and mixed regression models

    EPA Science Inventory

    The advent of new higher throughput analytical instrumentation has put a strain on interpreting and explaining the results from complex studies. Contemporary human, environmental, and biomonitoring data sets are comprised of tens or hundreds of analytes, multiple repeat measures...

  12. Regression Commonality Analysis: A Technique for Quantitative Theory Building

    ERIC Educational Resources Information Center

    Nimon, Kim; Reio, Thomas G., Jr.

    2011-01-01

    When it comes to multiple linear regression analysis (MLR), it is common for social and behavioral science researchers to rely predominately on beta weights when evaluating how predictors contribute to a regression model. Presenting an underutilized statistical technique, this article describes how organizational researchers can use commonality…

  13. The Learning Organization Model across Vocational and Academic Teacher Groups

    ERIC Educational Resources Information Center

    Park, Joo Ho; Rojewski, Jay W.

    2006-01-01

    Multiple-group confirmatory factor analysis was used to investigate factorial invariance between vocational and academic teacher groups on a measure of the learning organization concept. Participants were 488 full-time teachers of public trade industry-technical and business schools located within Seoul, South Korea. Statistically significant…

  14. US EPA'S LANDSCAPE ECOLOGY RESEARCH: ASSESSING TRENDS FOR WETLANDS AND SURFACE WATERS USING REMOTE SENSING, GIS, AND FIELD-BASED TECHNIQUES

    EPA Science Inventory

    The US EPA, Environmental Sciences Division-Las Vegas is using a variety of geospatial and statistical modeling approaches to locate and assess the complex functions of wetland ecosystems. These assessments involve measuring landscape characteristics and change at multiple s...

  15. COMBINING EVIDENCE ON AIR POLLUTION AND DAILY MORTALITY FROM 20 LARGEST U.S. CITIES: A HIERARCHICAL MODELING STRATEGY

    EPA Science Inventory

    Environmental science and management are fed by individual studies of pollution effects, often focused on single locations. Data are encountered data, typically from multiple sources and on different time and spatial scales. Statistical issues including publication bias and m...

  16. Galaxy mergers and gravitational lens statistics

    NASA Technical Reports Server (NTRS)

    Rix, Hans-Walter; Maoz, Dan; Turner, Edwin L.; Fukugita, Masataka

    1994-01-01

    We investigate the impact of hierarchical galaxy merging on the statistics of gravitational lensing of distant sources. Since no definite theoretical predictions for the merging history of luminous galaxies exist, we adopt a parameterized prescription, which allows us to adjust the expected number of pieces comprising a typical present-day galaxy at z approximately 0.65. The existence of global parameter relations for elliptical galaxies and constraints on the evolution of the phase space density in dissipationless mergers allow us to limit the possible evolution of galaxy lens properties under merging. We draw two lessons from implementing this lens evolution into statistical lens calculations: (1) The total optical depth to multiple imaging (e.g., of quasars) is quite insensitive to merging. (2) Merging leads to a smaller mean separation of observed multiple images. Because merging does not drastically reduce the expected lensing frequency, it cannot make lambda-dominated cosmologies compatible with the existing lensing observations. A comparison with the data from the Hubble Space Telescope (HST) Snapshot Survey shows that models with little or no evolution of the lens population are statistically favored over strong merging scenarios. A specific merging scenario proposed by Toomre can be rejected (95% level) by such a comparison. Some versions of the scenario proposed by Broadhurst, Ellis, & Glazebrook are statistically acceptable.

  17. A statistical forecast model using the time-scale decomposition technique to predict rainfall during flood period over the middle and lower reaches of the Yangtze River Valley

    NASA Astrophysics Data System (ADS)

    Hu, Yijia; Zhong, Zhong; Zhu, Yimin; Ha, Yao

    2018-04-01

    In this paper, a statistical forecast model using the time-scale decomposition method is established for seasonal prediction of the rainfall during the flood period (FPR) over the middle and lower reaches of the Yangtze River Valley (MLYRV). This method decomposes the rainfall over the MLYRV into three time-scale components, namely, the interannual component with periods shorter than 8 years, the interdecadal component with periods from 8 to 30 years, and the interdecadal component with periods longer than 30 years. Then, predictors are selected for the three time-scale components of FPR through correlation analysis. Finally, a statistical forecast model is established using the multiple linear regression technique to predict the three time-scale components of the FPR, respectively. The results show that this forecast model can capture the interannual and interdecadal variation of FPR. The hindcast of FPR over the 14 years from 2001 to 2014 shows that the FPR can be predicted successfully in 11 out of the 14 years. This forecast model performs better than a model using the traditional scheme without time-scale decomposition. Therefore, the statistical forecast model using the time-scale decomposition technique has good skill and application value in the operational prediction of FPR over the MLYRV.
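
    A crude stand-in for this decompose-then-regress idea, on synthetic data (a running mean replaces the paper's period-band decomposition, and the predictors are invented):

        # Split a rainfall-like series into a slow and a fast component, fit a
        # separate regression to each, and add the component forecasts.
        import numpy as np
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(5)
        years = np.arange(1951, 2015)
        slow_pred = np.sin(2 * np.pi * (years - 1951) / 40.0)   # slow predictor
        fast_pred = rng.normal(size=years.size)                  # fast predictor
        rain = 500 + 40 * slow_pred + 25 * fast_pred + rng.normal(scale=10, size=years.size)

        window = 9                                                # ~decadal smoother
        kernel = np.ones(window) / window
        slow_component = np.convolve(rain, kernel, mode="same")
        fast_component = rain - slow_component

        slow_model = LinearRegression().fit(slow_pred.reshape(-1, 1), slow_component)
        fast_model = LinearRegression().fit(fast_pred.reshape(-1, 1), fast_component)

        forecast = (slow_model.predict(slow_pred.reshape(-1, 1))
                    + fast_model.predict(fast_pred.reshape(-1, 1)))
        print("correlation of forecast with observed:", np.corrcoef(forecast, rain)[0, 1])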

  18. Minimal agent based model for financial markets II. Statistical properties of the linear and multiplicative dynamics

    NASA Astrophysics Data System (ADS)

    Alfi, V.; Cristelli, M.; Pietronero, L.; Zaccaria, A.

    2009-02-01

    We present a detailed study of the statistical properties of the Agent Based Model introduced in paper I [Eur. Phys. J. B, DOI: 10.1140/epjb/e2009-00028-4] and of its generalization to multiplicative dynamics. The aim of the model is to consider the minimal elements needed to understand the origin of the stylized facts and their self-organization. The key elements are fundamentalist agents, chartist agents, herding dynamics and price behavior. The first two elements correspond to the competition between stability and instability tendencies in the market. The herding behavior governs the possibility of the agents to change strategy and is a crucial element of this class of models. We consider a linear approximation for the price dynamics which permits a simple interpretation of the model dynamics and, for many properties, makes it possible to derive analytical results. The generalized nonlinear dynamics turns out to be far more sensitive to the parameter space and much more difficult to analyze and control. The main results for the nature and self-organization of the stylized facts are, however, very similar in the two cases. The main peculiarity of the nonlinear dynamics is an enhancement of the fluctuations and a more marked evidence of the stylized facts. We also discuss some modifications of the model to introduce more realistic elements with respect to real markets.

  19. Statistical Modeling of Single Target Cell Encapsulation

    PubMed Central

    Moon, SangJun; Ceyhan, Elvan; Gurkan, Umut Atakan; Demirci, Utkan

    2011-01-01

    High throughput drop-on-demand systems for separation and encapsulation of individual target cells from heterogeneous mixtures of multiple cell types are an emerging method in biotechnology that has broad applications in tissue engineering and regenerative medicine, genomics, and cryobiology. However, cell encapsulation in droplets is a random process that is hard to control. Statistical models can provide an understanding of the underlying processes and estimation of the relevant parameters, and enable reliable and repeatable control over the encapsulation of cells in droplets during the isolation process with a high confidence level. We have modeled and experimentally verified a microdroplet-based cell encapsulation process for various combinations of cell loading and target cell concentrations. Here, we explain theoretically and validate experimentally a model to isolate and pattern single target cells from heterogeneous mixtures without using complex peripheral systems. PMID:21814548
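
    Droplet encapsulation of cells is commonly modeled with Poisson statistics; the following sketch works under that assumption (the paper's exact model is not reproduced, and the loading values below are arbitrary):

        # Probability that a droplet contains exactly one target cell and no other
        # cells, treating target and non-target counts as independent Poissons.
        from scipy import stats

        cells_per_droplet = 0.3      # mean total cell loading per droplet (lambda)
        target_fraction = 0.1        # fraction of cells that are target cells

        lam_target = cells_per_droplet * target_fraction
        lam_other = cells_per_droplet * (1 - target_fraction)

        p_single_target = stats.poisson.pmf(1, lam_target) * stats.poisson.pmf(0, lam_other)
        p_empty = stats.poisson.pmf(0, cells_per_droplet)
        print(f"P(single target cell, nothing else) = {p_single_target:.4f}")
        print(f"P(empty droplet) = {p_empty:.4f}")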

  20. Investigation of Relationship between QBO and Ionospheric Neutral Temperature

    NASA Astrophysics Data System (ADS)

    Saǧır, Selçuk; Atıcı, Ramazan; Özcan, Osman

    2016-07-01

    The relationship between the Quasi-Biennial Oscillation (QBO), measured at the 10 hPa level, and the neutral temperature obtained from the NRLMSIS-00 model at 90 km altitude in the ionosphere is statistically investigated. For this study, a multiple-regression model is used. To assess the effect of QBO direction on neutral temperature, dummy variables are added to the model. The analysis shows that the QBO affects the neutral temperature of the ionosphere, and that about 57% of the variation in neutral temperature can be explained by the QBO. According to the established model, statistical significance was obtained at the 1% level. The model also indicates that an increase or decrease of 1 meter per second in the QBO gives rise to an increase or decrease of 0.07 K in neutral temperature.
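
    A generic sketch of a multiple regression with a dummy variable for oscillation direction, on synthetic data (the coefficients and noise level are invented, not taken from the study):

        # Regress a temperature-like response on QBO wind plus a direction dummy.
        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(6)
        n = 300
        qbo_wind = rng.normal(scale=15, size=n)            # QBO zonal wind at 10 hPa (m/s)
        eastward = (qbo_wind > 0).astype(float)            # dummy for QBO direction
        neutral_temp = 190 + 0.07 * qbo_wind + 0.5 * eastward + rng.normal(scale=1.0, size=n)

        X = sm.add_constant(np.column_stack([qbo_wind, eastward]))
        model = sm.OLS(neutral_temp, X).fit()
        print(model.summary())                             # coefficients, R-squared, p-values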

  1. Multicollinearity is a red herring in the search for moderator variables: A guide to interpreting moderated multiple regression models and a critique of Iacobucci, Schneider, Popovich, and Bakamitsos (2016).

    PubMed

    McClelland, Gary H; Irwin, Julie R; Disatnik, David; Sivan, Liron

    2017-02-01

    Multicollinearity is irrelevant to the search for moderator variables, contrary to the implications of Iacobucci, Schneider, Popovich, and Bakamitsos (Behavior Research Methods, 2016, this issue). Multicollinearity is like the red herring in a mystery novel that distracts the statistical detective from the pursuit of a true moderator relationship. We show multicollinearity is completely irrelevant for tests of moderator variables. Furthermore, readers of Iacobucci et al. might be confused by a number of their errors. We note those errors, but more positively, we describe a variety of methods researchers might use to test and interpret their moderated multiple regression models, including two-stage testing, mean-centering, spotlighting, orthogonalizing, and floodlighting without regard to putative issues of multicollinearity. We cite a number of recent studies in the psychological literature in which the researchers used these methods appropriately to test, to interpret, and to report their moderated multiple regression models. We conclude with a set of recommendations for the analysis and reporting of moderated multiple regression that should help researchers better understand their models and facilitate generalizations across studies.
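
    A minimal sketch of a moderated multiple regression with mean-centered predictors and an interaction term, on synthetic data (illustrating one of the techniques the authors discuss, not their worked examples):

        # The moderation test is the test of the interaction coefficient; the
        # collinearity among x, z, and x*z does not affect that test.
        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(7)
        n = 400
        x = rng.normal(size=n)
        z = rng.normal(size=n)                              # candidate moderator
        y = 1 + 0.5 * x + 0.3 * z + 0.4 * x * z + rng.normal(size=n)

        xc, zc = x - x.mean(), z - z.mean()                 # mean-centering
        X = sm.add_constant(np.column_stack([xc, zc, xc * zc]))
        fit = sm.OLS(y, X).fit()
        print("coefficients:", fit.params)
        print("p-value of interaction term:", fit.pvalues[-1])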

  2. Combining super-ensembles and statistical emulation to improve a regional climate and vegetation model

    NASA Astrophysics Data System (ADS)

    Hawkins, L. R.; Rupp, D. E.; Li, S.; Sarah, S.; McNeall, D. J.; Mote, P.; Betts, R. A.; Wallom, D.

    2017-12-01

    Changing regional patterns of surface temperature, precipitation, and humidity may cause ecosystem-scale changes in vegetation, altering the distribution of trees, shrubs, and grasses. A changing vegetation distribution, in turn, alters the albedo, latent heat flux, and carbon exchanged with the atmosphere with resulting feedbacks onto the regional climate. However, a wide range of earth-system processes that affect the carbon, energy, and hydrologic cycles occur at sub grid scales in climate models and must be parameterized. The appropriate parameter values in such parameterizations are often poorly constrained, leading to uncertainty in predictions of how the ecosystem will respond to changes in forcing. To better understand the sensitivity of regional climate to parameter selection and to improve regional climate and vegetation simulations, we used a large perturbed physics ensemble and a suite of statistical emulators. We dynamically downscaled a super-ensemble (multiple parameter sets and multiple initial conditions) of global climate simulations using a 25-km resolution regional climate model HadRM3p with the land-surface scheme MOSES2 and dynamic vegetation module TRIFFID. We simultaneously perturbed land surface parameters relating to the exchange of carbon, water, and energy between the land surface and atmosphere in a large super-ensemble of regional climate simulations over the western US. Statistical emulation was used as a computationally cost-effective tool to explore uncertainties in interactions. Regions of parameter space that did not satisfy observational constraints were eliminated and an ensemble of parameter sets that reduce regional biases and span a range of plausible interactions among earth system processes were selected. This study demonstrated that by combining super-ensemble simulations with statistical emulation, simulations of regional climate could be improved while simultaneously accounting for a range of plausible land-atmosphere feedback strengths.

  3. On the use of multiple-point statistics to improve groundwater flow modeling in karst aquifers: A case study from the Hydrogeological Experimental Site of Poitiers, France

    NASA Astrophysics Data System (ADS)

    Le Coz, Mathieu; Bodin, Jacques; Renard, Philippe

    2017-02-01

    Limestone aquifers often exhibit complex groundwater flow behaviors resulting from depositional heterogeneities and post-lithification fracturing and karstification. In this study, multiple-point statistics (MPS) was applied to reproduce karst features and to improve groundwater flow modeling. For this purpose, MPS realizations were used in a numerical flow model to simulate the responses to pumping test experiments observed at the Hydrogeological Experimental Site of Poitiers, France. The main flow behaviors evident in the field data were simulated, particularly (i) the early-time inflection of the drawdown signal at certain observation wells and (ii) the convex behavior of the drawdown curves at intermediate times. In addition, it was shown that the spatial structure of the karst features at various scales is critical with regard to the propagation of the depletion wave induced by pumping. Indeed, (i) the spatial shape of the cone of depression is significantly affected by the karst proportion in the vicinity of the pumping well, and (ii) early-time inflection of the drawdown signal occurs only at observation wells crossing locally well-developed karst features.

  4. Exploring the Effects of Stellar Multiplicity on Exoplanet Occurrence Rates

    NASA Astrophysics Data System (ADS)

    Barclay, Thomas; Shabram, Megan

    2017-06-01

    Determining the frequency of habitable worlds is a key goal of the Kepler mission. During Kepler's four year investigation it detected thousands of transiting exoplanets with sizes varying from smaller than Mercury to larger than Jupiter. Finding planets was just the first step to determining frequency, and for the past few years the mission team has been modeling the reliability and completeness of the Kepler planet sample. One effect that has not typically been built into occurrence rate statistics is that of stellar multiplicity. If a planet orbits the primary star in a binary or triple star system then the transit depth will be somewhat diluted resulting in a modest underestimation in the planet size. However, if a detected planet orbits a fainter star then the error in measured planet radius can be very significant. We have taken a hypothetical star and planet population and passed that through a Kepler detection model. From this we have derived completeness corrections for a realistic case of a Universe with binary stars and compared that with a model Universe where all stars are single. We report on the impact that binaries have on exoplanet population statistics.

  5. [Factors associated with physical activity among Chinese immigrant women].

    PubMed

    Cho, Sung-Hye; Lee, Hyeonkyeong

    2013-12-01

    This study was done to assess the level of physical activity among Chinese immigrant women and to determine the relationships of physical activity with individual characteristics and behavior-specific cognition. A cross-sectional descriptive study was conducted with 161 Chinese immigrant women living in Busan. A health promotion model of physical activity adapted from Pender's Health Promotion Model was used. Self-administered questionnaires were used to collect data during the period from September 25 to November 20, 2012. Using SPSS 18.0 program, descriptive statistics, t-test, analysis of variance, correlation analysis, and multiple regression analysis were done. The average level of physical activity of the Chinese immigrant women was 1,050.06 ± 686.47 MET-min/week and the minimum activity among types of physical activity was most dominant (59.6%). As a result of multiple regression analysis, it was confirmed that self-efficacy and acculturation were statistically significant variables in the model (p<.001), with an explanatory power of 23.7%. The results indicate that the development and application of intervention strategies to increase acculturation and self-efficacy for immigrant women will aid in increasing the physical activity in Chinese immigrant women.

  6. Multivariate space - time analysis of PRE-STORM precipitation

    NASA Technical Reports Server (NTRS)

    Polyak, Ilya; North, Gerald R.; Valdes, Juan B.

    1994-01-01

    This paper presents the methodologies and results of the multivariate modeling and two-dimensional spectral and correlation analysis of PRE-STORM rainfall gauge data. Estimated parameters of the models for the specific spatial averages clearly indicate the eastward and southeastward wave propagation of rainfall fluctuations. A relationship between the coefficients of the diffusion equation and the parameters of the stochastic model of rainfall fluctuations is derived that leads directly to the exclusive use of rainfall data to estimate advection speed (about 12 m/s) as well as other coefficients of the diffusion equation of the corresponding fields. The statistical methodology developed here can be used for confirmation of physical models by comparison of the corresponding second-moment statistics of the observed and simulated data, for generating multiple samples of any size, for solving the inverse problem of the hydrodynamic equations, and for application in some other areas of meteorological and climatological data analysis and modeling.

  7. Robust Lee local statistic filter for removal of mixed multiplicative and impulse noise

    NASA Astrophysics Data System (ADS)

    Ponomarenko, Nikolay N.; Lukin, Vladimir V.; Egiazarian, Karen O.; Astola, Jaakko T.

    2004-05-01

    A robust version of the Lee local statistic filter able to effectively suppress mixed multiplicative and impulse noise in images is proposed. The performance of the proposed modification is studied for a set of test images, several values of multiplicative noise variance, Gaussian and Rayleigh probability density functions of speckle, and different characteristics of impulse noise. The advantages of the designed filter in comparison to the conventional Lee local statistic filter and some other filters able to cope with mixed multiplicative+impulse noise are demonstrated.
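
    For reference, the conventional (non-robust) Lee local-statistic filter that the proposed filter builds on can be sketched as follows (a simplified version using box-filter window statistics and an assumed speckle coefficient of variation; the robust modification itself is not reproduced here):

        # Classic Lee filter: pull each pixel toward the local mean by a weight
        # derived from the local and noise coefficients of variation.
        import numpy as np
        from scipy.ndimage import uniform_filter

        def lee_filter(image, window=7, noise_cv=0.25):
            """noise_cv is the coefficient of variation of the unit-mean speckle."""
            mean = uniform_filter(image, size=window)
            mean_sq = uniform_filter(image**2, size=window)
            var = np.maximum(mean_sq - mean**2, 0.0)
            signal_cv2 = var / np.maximum(mean**2, 1e-12)
            weight = np.clip(1.0 - noise_cv**2 / np.maximum(signal_cv2, 1e-12), 0.0, 1.0)
            return mean + weight * (image - mean)

        rng = np.random.default_rng(8)
        clean = np.ones((128, 128)); clean[32:96, 32:96] = 2.0          # simple test scene
        speckled = clean * rng.gamma(shape=16.0, scale=1.0 / 16.0, size=clean.shape)
        filtered = lee_filter(speckled)
        print("noisy RMSE:", np.sqrt(np.mean((speckled - clean) ** 2)))
        print("filtered RMSE:", np.sqrt(np.mean((filtered - clean) ** 2)))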

  8. Biological Parametric Mapping: A Statistical Toolbox for Multi-Modality Brain Image Analysis

    PubMed Central

    Casanova, Ramon; Ryali, Srikanth; Baer, Aaron; Laurienti, Paul J.; Burdette, Jonathan H.; Hayasaka, Satoru; Flowers, Lynn; Wood, Frank; Maldjian, Joseph A.

    2006-01-01

    In recent years multiple brain MR imaging modalities have emerged; however, analysis methodologies have mainly remained modality specific. In addition, when comparing across imaging modalities, most researchers have been forced to rely on simple region-of-interest type analyses, which do not allow the voxel-by-voxel comparisons necessary to answer more sophisticated neuroscience questions. To overcome these limitations, we developed a toolbox for multimodal image analysis called biological parametric mapping (BPM), based on a voxel-wise use of the general linear model. The BPM toolbox incorporates information obtained from other modalities as regressors in a voxel-wise analysis, thereby permitting investigation of more sophisticated hypotheses. The BPM toolbox has been developed in MATLAB with a user friendly interface for performing analyses, including voxel-wise multimodal correlation, ANCOVA, and multiple regression. It has a high degree of integration with the SPM (statistical parametric mapping) software relying on it for visualization and statistical inference. Furthermore, statistical inference for a correlation field, rather than a widely-used T-field, has been implemented in the correlation analysis for more accurate results. An example with in-vivo data is presented demonstrating the potential of the BPM methodology as a tool for multimodal image analysis. PMID:17070709

  9. Bayesian modelling of lung function data from multiple-breath washout tests.

    PubMed

    Mahar, Robert K; Carlin, John B; Ranganathan, Sarath; Ponsonby, Anne-Louise; Vuillermin, Peter; Vukcevic, Damjan

    2018-05-30

    Paediatric respiratory researchers have widely adopted the multiple-breath washout (MBW) test because it allows assessment of lung function in unsedated infants and is well suited to longitudinal studies of lung development and disease. However, a substantial proportion of MBW tests in infants fail current acceptability criteria. We hypothesised that a model-based approach to analysing the data, in place of traditional simple empirical summaries, would enable more efficient use of these tests. We therefore developed a novel statistical model for infant MBW data and applied it to 1197 tests from 432 individuals from a large birth cohort study. We focus on Bayesian estimation of the lung clearance index, the most commonly used summary of lung function from MBW tests. Our results show that the model provides an excellent fit to the data and shed further light on statistical properties of the standard empirical approach. Furthermore, the modelling approach enables the lung clearance index to be estimated by using tests with different degrees of completeness, something not possible with the standard approach. Our model therefore allows previously unused data to be used rather than discarded, as well as routine use of shorter tests without significant loss of precision. Beyond our specific application, our work illustrates a number of important aspects of Bayesian modelling in practice, such as the importance of hierarchical specifications to account for repeated measurements and the value of model checking via posterior predictive distributions. Copyright © 2018 John Wiley & Sons, Ltd.

  10. Enhancing the mathematical properties of new haplotype homozygosity statistics for the detection of selective sweeps.

    PubMed

    Garud, Nandita R; Rosenberg, Noah A

    2015-06-01

    Soft selective sweeps represent an important form of adaptation in which multiple haplotypes bearing adaptive alleles rise to high frequency. Most statistical methods for detecting selective sweeps from genetic polymorphism data, however, have focused on identifying hard selective sweeps in which a favored allele appears on a single haplotypic background; these methods might be underpowered to detect soft sweeps. Among exceptions is the set of haplotype homozygosity statistics introduced for the detection of soft sweeps by Garud et al. (2015). These statistics, examining frequencies of multiple haplotypes in relation to each other, include H12, a statistic designed to identify both hard and soft selective sweeps, and H2/H1, a statistic that, conditional on high H12 values, seeks to distinguish between hard and soft sweeps. A challenge in the use of H2/H1 is that its range depends on the associated value of H12, so that equal H2/H1 values might provide different levels of support for a soft sweep model at different values of H12. Here, we enhance the H12 and H2/H1 haplotype homozygosity statistics for selective sweep detection by deriving the upper bound on H2/H1 as a function of H12, thereby generating a statistic that normalizes H2/H1 to lie between 0 and 1. Through a reanalysis of resequencing data from inbred lines of Drosophila, we show that the enhanced statistic both strengthens interpretations obtained with the unnormalized statistic and leads to empirical insights that are less readily apparent without the normalization. Copyright © 2015 Elsevier Inc. All rights reserved.
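
    The underlying statistics can be computed directly from a vector of haplotype frequencies, as in this sketch (definitions as described above; the normalization by the derived upper bound is not reproduced, and the example frequencies are invented):

        # H1, H12, and the ratio H2/H1 from sorted haplotype frequencies.
        import numpy as np

        def haplotype_homozygosity_stats(freqs):
            p = np.sort(np.asarray(freqs, dtype=float))[::-1]   # descending frequencies
            h1 = np.sum(p**2)                                    # standard haplotype homozygosity
            h12 = (p[0] + p[1])**2 + np.sum(p[2:]**2)            # pools the top two haplotypes
            h2 = h1 - p[0]**2                                    # drops the top haplotype
            return h1, h12, h2 / h1

        hard_sweep = [0.80, 0.05, 0.05, 0.05, 0.05]
        soft_sweep = [0.45, 0.40, 0.05, 0.05, 0.05]
        for name, freqs in [("hard-like", hard_sweep), ("soft-like", soft_sweep)]:
            h1, h12, ratio = haplotype_homozygosity_stats(freqs)
            print(f"{name}: H1={h1:.3f}  H12={h12:.3f}  H2/H1={ratio:.3f}")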

  11. GPU-computing in econophysics and statistical physics

    NASA Astrophysics Data System (ADS)

    Preis, T.

    2011-03-01

    A recent trend in computer science and related fields is general purpose computing on graphics processing units (GPUs), which can yield impressive performance. With multiple cores connected by high memory bandwidth, today's GPUs offer resources for non-graphics parallel processing. This article provides a brief introduction into the field of GPU computing and includes examples. In particular computationally expensive analyses employed in financial market context are coded on a graphics card architecture which leads to a significant reduction of computing time. In order to demonstrate the wide range of possible applications, a standard model in statistical physics - the Ising model - is ported to a graphics card architecture as well, resulting in large speedup values.
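
    As a point of reference, a minimal CPU version of the 2D Ising model with Metropolis updates looks like the following (a plain Python sketch of the statistical-physics workload that the article ports to a graphics card; the GPU kernel itself is not shown):

        # Single-spin Metropolis updates on a periodic LxL lattice.
        import numpy as np

        rng = np.random.default_rng(9)
        L, beta, sweeps = 32, 0.44, 200          # lattice size, inverse temperature, sweeps
        spins = rng.choice([-1, 1], size=(L, L))

        for _ in range(sweeps):
            for _ in range(L * L):
                i, j = rng.integers(0, L, size=2)
                neighbors = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
                             + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
                delta_e = 2 * spins[i, j] * neighbors
                if delta_e <= 0 or rng.random() < np.exp(-beta * delta_e):
                    spins[i, j] *= -1

        print("magnetization per spin:", spins.mean())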

  12. Statistical Deviations From the Theoretical Only-SBU Model to Estimate MCU Rates in SRAMs

    NASA Astrophysics Data System (ADS)

    Franco, Francisco J.; Clemente, Juan Antonio; Baylac, Maud; Rey, Solenne; Villa, Francesca; Mecha, Hortensia; Agapito, Juan A.; Puchner, Helmut; Hubert, Guillaume; Velazco, Raoul

    2017-08-01

    This paper addresses a well-known problem that occurs when memories are exposed to radiation: determining whether a bit flip is isolated or belongs to a multiple event. As it is unusual to know the physical layout of the memory, this paper proposes to evaluate the statistical properties of the sets of corrupted addresses and to compare the results with a mathematical prediction model in which all of the events are single bit upsets. A set of rules that is easy to implement in common programming languages can be applied iteratively if anomalies are observed, thus yielding a classification of errors much closer to reality (more than 80% accuracy in our experiments).

  13. Identifying when tagged fishes have been consumed by piscivorous predators: application of multivariate mixture models to movement parameters of telemetered fishes

    USGS Publications Warehouse

    Romine, Jason G.; Perry, Russell W.; Johnston, Samuel V.; Fitzer, Christopher W.; Pagliughi, Stephen W.; Blake, Aaron R.

    2013-01-01

    Mixture models proved valuable as a means to differentiate between salmonid smolts and predators that consumed salmonid smolts. However, successful application of this method requires that telemetered fishes and their predators exhibit measurable differences in movement behavior. Our approach is flexible, allows inclusion of multiple track statistics and improves upon rule-based manual classification methods.

  14. Masquerade Detection Using a Taxonomy-Based Multinomial Modeling Approach in UNIX Systems

    DTIC Science & Technology

    2008-08-25

    The approach relies primarily on the modeling of statistical features, such as the frequency of events, the duration of events, and the co-occurrence of multiple events; once such features are identified, they can be extracted while auditing the user's behavior. The report includes a taxonomy of Linux and Unix commands (Figure 1) and tabulates hit rates and false-positive rates for several methods, among them a one-class SVM (ocSVM) using frequency-based features extracted just from simple commands.

  15. Gene genealogies for genetic association mapping, with application to Crohn's disease

    PubMed Central

    Burkett, Kelly M.; Greenwood, Celia M. T.; McNeney, Brad; Graham, Jinko

    2013-01-01

    A gene genealogy describes relationships among haplotypes sampled from a population. Knowledge of the gene genealogy for a set of haplotypes is useful for estimation of population genetic parameters and it also has potential application in finding disease-predisposing genetic variants. As the true gene genealogy is unknown, Markov chain Monte Carlo (MCMC) approaches have been used to sample genealogies conditional on data at multiple genetic markers. We previously implemented an MCMC algorithm to sample from an approximation to the distribution of the gene genealogy conditional on haplotype data. Our approach samples ancestral trees, recombination and mutation rates at a genomic focal point. In this work, we describe how our sampler can be used to find disease-predisposing genetic variants in samples of cases and controls. We use a tree-based association statistic that quantifies the degree to which case haplotypes are more closely related to each other around the focal point than control haplotypes, without relying on a disease model. As the ancestral tree is a latent variable, so is the tree-based association statistic. We show how the sampler can be used to estimate the posterior distribution of the latent test statistic and corresponding latent p-values, which together comprise a fuzzy p-value. We illustrate the approach on a publicly available dataset from a study of Crohn's disease that consists of genotypes at multiple SNP markers in a small genomic region. We estimate the posterior distribution of the tree-based association statistic and the recombination rate at multiple focal points in the region. Reassuringly, the posterior mean recombination rates estimated at the different focal points are consistent with previously published estimates. The tree-based association approach finds multiple sub-regions where the case haplotypes are more genetically related than the control haplotypes, suggesting that there may be one or multiple disease-predisposing loci. PMID:24348515

  16. Additive hazards regression and partial likelihood estimation for ecological monitoring data across space.

    PubMed

    Lin, Feng-Chang; Zhu, Jun

    2012-01-01

    We develop continuous-time models for the analysis of environmental or ecological monitoring data such that subjects are observed at multiple monitoring time points across space. Of particular interest are additive hazards regression models where the baseline hazard function can take on flexible forms. We consider time-varying covariates and take into account spatial dependence via autoregression in space and time. We develop statistical inference for the regression coefficients via partial likelihood. Asymptotic properties, including consistency and asymptotic normality, are established for parameter estimates under suitable regularity conditions. Feasible algorithms utilizing existing statistical software packages are developed for computation. We also consider a simpler additive hazards model with homogeneous baseline hazard and develop hypothesis testing for homogeneity. A simulation study demonstrates that the statistical inference using partial likelihood has sound finite-sample properties and offers a viable alternative to maximum likelihood estimation. For illustration, we analyze data from an ecological study that monitors bark beetle colonization of red pines in a plantation of Wisconsin.

  17. Statistical Forecasting of Current and Future Circum-Arctic Ground Temperatures and Active Layer Thickness

    NASA Astrophysics Data System (ADS)

    Aalto, J.; Karjalainen, O.; Hjort, J.; Luoto, M.

    2018-05-01

    Mean annual ground temperature (MAGT) and active layer thickness (ALT) are key to understanding the evolution of the ground thermal state across the Arctic under climate change. Here a statistical modeling approach is presented to forecast current and future circum-Arctic MAGT and ALT in relation to climatic and local environmental factors, at spatial scales unreachable with contemporary transient modeling. After deploying an ensemble of multiple statistical techniques, distance-blocked cross validation between observations and predictions suggested excellent and reasonable transferability of the MAGT and ALT models, respectively. The MAGT forecasts indicated currently suitable conditions for permafrost to prevail over an area of 15.1 ± 2.8 × 106 km2. This extent is likely to dramatically contract in the future, as the results showed consistent, but region-specific, changes in ground thermal regime due to climate change. The forecasts provide new opportunities to assess future Arctic changes in ground thermal state and biogeochemical feedback.

  18. Model Independence in Downscaled Climate Projections: a Case Study in the Southeast United States

    NASA Astrophysics Data System (ADS)

    Gray, G. M. E.; Boyles, R.

    2016-12-01

    Downscaled climate projections are used to deduce how the climate will change in future decades at local and regional scales. It is important to use multiple models to characterize part of the future uncertainty, given the impact on adaptation decision making. This is traditionally done through an equally-weighted ensemble of multiple GCMs downscaled using one technique. Newer practices include several downscaling techniques in an effort to increase the ensemble's representation of future uncertainty. However, this practice may be adding statistically dependent models to the ensemble. Previous research has shown a dependence problem in the GCM ensemble across multiple generations, but such dependence has not been shown in the downscaled ensemble. In this case study, seven downscaled climate projections on the daily time scale are considered: CLAREnCE10, SERAP, BCCA (CMIP5 and CMIP3 versions), Hostetler, CCR, and MACA-LIVNEH. These data represent 83 ensemble members, 44 GCMs, and two generations of GCMs. Baseline periods are compared against the University of Idaho's METDATA gridded observation dataset. Hierarchical agglomerative clustering is applied to the correlated errors to determine dependent clusters. Redundant GCMs across different downscaling techniques show the most dependence, while smaller dependence signals are detected within downscaling datasets and across generations of GCMs. These results indicate that using additional downscaled projections to increase the ensemble size must be done with care to avoid redundant GCMs, and that the process of downscaling may increase the dependence of those downscaled GCMs. The two climate model generations do not appear dissimilar enough to be treated as separate statistical populations for ensemble building at the local and regional scales.
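
    The dependence analysis can be sketched generically as follows (synthetic model errors stand in for the downscaled projections; the clustering threshold is arbitrary):

        # Correlate the errors of each ensemble member, convert correlations to
        # distances, and apply hierarchical agglomerative clustering.
        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster
        from scipy.spatial.distance import squareform

        rng = np.random.default_rng(10)
        n_times = 500
        shared_a = rng.normal(size=n_times)              # error shared by "same GCM" members
        shared_b = rng.normal(size=n_times)
        errors = np.column_stack([
            shared_a + 0.3 * rng.normal(size=n_times),   # member 1 (GCM A, method 1)
            shared_a + 0.3 * rng.normal(size=n_times),   # member 2 (GCM A, method 2)
            shared_b + 0.3 * rng.normal(size=n_times),   # member 3 (GCM B, method 1)
            rng.normal(size=n_times),                    # member 4 (independent)
        ])

        corr = np.corrcoef(errors, rowvar=False)
        dist = 1.0 - corr                                # highly correlated errors -> small distance
        np.fill_diagonal(dist, 0.0)
        tree = linkage(squareform(dist, checks=False), method="average")
        print("cluster labels:", fcluster(tree, t=0.5, criterion="distance"))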

  19. Statistical Downscaling of WRF-Chem Model: An Air Quality Analysis over Bogota, Colombia

    NASA Astrophysics Data System (ADS)

    Kumar, Anikender; Rojas, Nestor

    2015-04-01

    Statistical downscaling is a technique that is used to extract high-resolution information from regional-scale variables produced by coarse-resolution models such as Chemical Transport Models (CTMs). The fully coupled WRF-Chem (Weather Research and Forecasting with Chemistry) model is used to simulate air quality over Bogota. Bogota is a tropical Andean megacity located over a high-altitude plateau in the middle of very complex terrain. The WRF-Chem model was adopted for simulating the hourly ozone concentrations. The computational domains consisted of 120x120x32, 121x121x32 and 121x121x32 grid points with horizontal resolutions of 27, 9 and 3 km, respectively. The model was initialized with real boundary conditions using NCAR-NCEP Final Analysis (FNL) data at 1°x1° (~111 km x 111 km) resolution. Boundary conditions were updated every 6 hours using reanalysis data. The emission rates were obtained from global inventories, namely the REanalysis of the TROpospheric (RETRO) chemical composition and the Emission Database for Global Atmospheric Research (EDGAR). Multiple linear regression and artificial neural network techniques are used to downscale the model output at each monitoring station. The results confirm that the statistically downscaled outputs reduce simulated errors by up to 25%. This study provides a general overview of statistical downscaling of chemical transport models and can constitute a reference for future air quality modeling exercises over Bogota and other Colombian cities.
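
    A minimal sketch of the multiple-linear-regression downscaling step described above, with synthetic arrays standing in for WRF-Chem output and station observations; the variable names and coefficients are hypothetical.

        # Downscale simulated ozone to one monitoring station by multiple linear regression.
        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.metrics import mean_squared_error

        rng = np.random.default_rng(2)
        n = 2000                                    # hourly records
        wrf_o3 = rng.gamma(3.0, 10.0, n)            # raw model ozone at the nearest grid cell
        temp = rng.normal(20, 5, n)                 # co-located model temperature
        wind = rng.gamma(2.0, 1.5, n)               # co-located model wind speed
        obs = 0.6 * wrf_o3 + 1.2 * temp - 2.0 * wind + rng.normal(0, 5, n)  # station ozone (synthetic)

        X = np.column_stack([wrf_o3, temp, wind])
        fit = LinearRegression().fit(X[:1500], obs[:1500])     # train on the first part of the record
        raw_rmse = mean_squared_error(obs[1500:], wrf_o3[1500:]) ** 0.5
        dsc_rmse = mean_squared_error(obs[1500:], fit.predict(X[1500:])) ** 0.5
        print(f"raw RMSE {raw_rmse:.1f}  downscaled RMSE {dsc_rmse:.1f}")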

  20. Estimating urban ground-level PM10 using MODIS 3km AOD product and meteorological parameters from WRF model

    NASA Astrophysics Data System (ADS)

    Ghotbi, Saba; Sotoudeheian, Saeed; Arhami, Mohammad

    2016-09-01

    Satellite remote sensing products of AOD from MODIS, along with appropriate meteorological parameters, were used to develop statistical models and estimate ground-level PM10. Most previous studies obtained meteorological data from synoptic weather stations, with rather sparse spatial distribution, and used it along with the 10 km AOD product to develop statistical models applicable to PM variations at the regional scale (resolution of ≥10 km). In the current study, meteorological parameters were simulated at 3 km resolution using the WRF model and used along with the rather new 3 km AOD product (launched in 2014). The resulting PM statistical models were assessed for a polluted and highly variable urban area, Tehran, Iran. Despite the critical particulate pollution problem, very few PM studies have been conducted in this area. Direct PM-AOD associations were rather poor, due to factors such as variations in particle optical properties, in addition to the bright-background issue for satellite data, as the studied area is located in the semi-arid Middle East. The statistical approach of linear mixed effects (LME) was used, and three types of statistical models were examined: a single-variable LME model (using AOD as the independent variable) and multiple-variable LME models using meteorological data from two sources, the WRF model and synoptic stations. Meteorological simulations were performed using a multiscale approach and a physics configuration appropriate for the studied region, and the results showed rather good agreement with recordings of the synoptic stations. The single-variable LME model was able to explain about 61%-73% of daily PM10 variations, reflecting a rather acceptable performance. Statistical model performance improved through using multivariable LME and incorporating meteorological data as auxiliary variables, particularly by using fine-resolution outputs from WRF (R2 = 0.73-0.81). In addition, rather fine-resolution PM estimates were mapped for the studied city, and the resulting concentration maps were consistent with PM recordings at the existing stations.
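
    A small sketch of a linear mixed-effects fit in the spirit of the LME models described above, relating PM10 to AOD with day-specific random intercepts and slopes; the data are synthetic and the boundary-layer-height covariate is an assumed stand-in for the WRF-derived meteorology.

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(3)
        days = np.repeat(np.arange(120), 15)                   # 120 days x 15 stations
        aod = rng.gamma(2.0, 0.2, days.size)                   # MODIS 3 km AOD (synthetic)
        blh = rng.gamma(3.0, 300.0, days.size)                 # assumed WRF boundary-layer height
        day_int = rng.normal(0, 20, 120)[days]                 # day-specific calibration shifts
        day_slope = rng.normal(0, 15, 120)[days]               # day-specific AOD slopes
        pm10 = 40 + (120 + day_slope) * aod - 0.02 * blh + day_int + rng.normal(0, 10, days.size)

        df = pd.DataFrame({"pm10": pm10, "aod": aod, "blh": blh, "day": days})
        # Random intercept and random AOD slope per day, fixed effects for AOD and BLH.
        fit = smf.mixedlm("pm10 ~ aod + blh", df, groups=df["day"], re_formula="~aod").fit()
        print(fit.summary())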

  1. Pilot points method for conditioning multiple-point statistical facies simulation on flow data

    NASA Astrophysics Data System (ADS)

    Ma, Wei; Jafarpour, Behnam

    2018-05-01

    We propose a new pilot points method for conditioning discrete multiple-point statistical (MPS) facies simulation on dynamic flow data. While conditioning MPS simulation on static hard data is straightforward, their calibration against nonlinear flow data is nontrivial. The proposed method generates conditional models from a conceptual model of geologic connectivity, known as a training image (TI), by strategically placing and estimating pilot points. To place pilot points, a score map is generated based on three sources of information: (i) the uncertainty in facies distribution, (ii) the model response sensitivity information, and (iii) the observed flow data. Once the pilot points are placed, the facies values at these points are inferred from production data and then are used, along with available hard data at well locations, to simulate a new set of conditional facies realizations. While facies estimation at the pilot points can be performed using different inversion algorithms, in this study the ensemble smoother (ES) is adopted to update permeability maps from production data, which are then used to statistically infer facies types at the pilot point locations. The developed method combines the information in the flow data and the TI by using the former to infer facies values at selected locations away from the wells and the latter to ensure consistent facies structure and connectivity away from measurement locations. Several numerical experiments are used to evaluate the performance of the developed method and to discuss its important properties.

  2. MultiGeMS: detection of SNVs from multiple samples using model selection on high-throughput sequencing data.

    PubMed

    Murillo, Gabriel H; You, Na; Su, Xiaoquan; Cui, Wei; Reilly, Muredach P; Li, Mingyao; Ning, Kang; Cui, Xinping

    2016-05-15

    Single nucleotide variant (SNV) detection procedures are being utilized as never before to analyze the recent abundance of high-throughput DNA sequencing data, both on single and multiple sample datasets. Building on previously published work with the single sample SNV caller genotype model selection (GeMS), a multiple sample version of GeMS (MultiGeMS) is introduced. Unlike other popular multiple sample SNV callers, the MultiGeMS statistical model accounts for enzymatic substitution sequencing errors. It also addresses the multiple testing problem endemic to multiple sample SNV calling and utilizes high performance computing (HPC) techniques. A simulation study demonstrates that MultiGeMS ranks highest in precision among a selection of popular multiple sample SNV callers, while showing exceptional recall in calling common SNVs. Further, both simulation studies and real data analyses indicate that MultiGeMS is robust to low-quality data. We also demonstrate that accounting for enzymatic substitution sequencing errors not only improves SNV call precision at low mapping quality regions, but also improves recall at reference allele-dominated sites with high mapping quality. The MultiGeMS package can be downloaded from https://github.com/cui-lab/multigems. Contact: xinping.cui@ucr.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  3. Social Ecology, Genomics, and African American Health: A Nonlinear Dynamical Perspective

    PubMed Central

    Madhere, Serge; Harrell, Jules; Royal, Charmaine D. M.

    2009-01-01

    This article offers a model that clarifies the degree of interdependence between social ecology and genomic processes. Drawing on principles from nonlinear dynamics, the model delineates major lines of bifurcation involving people's habitat, their family health history, and collective catastrophes experienced by their community. It shows how mechanisms of resource acquisition, depletion, and preservation can lead to disruptions in basic metabolism and in the activity of cytokines, neurotransmitters, and protein kinases, thus giving impetus to epigenetic changes. The hypotheses generated from the model are discussed throughout the article for their relevance to health problems among African Americans. Where appropriate, they are examined in light of data from the National Vital Statistics System. Multiple health outcomes are considered. For any one of them, the model makes clear the unique and converging contributions of multiple antecedent factors. PMID:19672481

  4. Statistical modelling of networked human-automation performance using working memory capacity.

    PubMed

    Ahmed, Nisar; de Visser, Ewart; Shaw, Tyler; Mohamed-Ameen, Amira; Campbell, Mark; Parasuraman, Raja

    2014-01-01

    This study examines the challenging problem of modelling the interaction between individual attentional limitations and decision-making performance in networked human-automation system tasks. Analysis of real experimental data from a task involving networked supervision of multiple unmanned aerial vehicles by human participants shows that both task load and network message quality affect performance, but that these effects are modulated by individual differences in working memory (WM) capacity. These insights were used to assess three statistical approaches for modelling and making predictions with real experimental networked supervisory performance data: classical linear regression, non-parametric Gaussian processes and probabilistic Bayesian networks. It is shown that each of these approaches can help designers of networked human-automated systems cope with various uncertainties in order to accommodate future users by linking expected operating conditions and performance from real experimental data to observable cognitive traits like WM capacity. Practitioner Summary: Working memory (WM) capacity helps account for inter-individual variability in operator performance in networked unmanned aerial vehicle supervisory tasks. This is useful for reliable performance prediction near experimental conditions via linear models; robust statistical prediction beyond experimental conditions via Gaussian process models and probabilistic inference about unknown task conditions/WM capacities via Bayesian network models.

  5. Identifying Node Role in Social Network Based on Multiple Indicators

    PubMed Central

    Huang, Shaobin; Lv, Tianyang; Zhang, Xizhe; Yang, Yange; Zheng, Weimin; Wen, Chao

    2014-01-01

    It is a classic topic of social network analysis to evaluate the importance of nodes and identify the node that takes on the role of core or bridge in a network. Because a single indicator is not sufficient to analyze multiple characteristics of a node, it is a natural solution to apply multiple indicators that should be selected carefully. An intuitive idea is to select some indicators with weak correlations to efficiently assess different characteristics of a node. However, this paper shows that it is much better to select the indicators with strong correlations. Because indicator correlation is based on the statistical analysis of a large number of nodes, the particularity of an important node will be outlined if its indicator relationship doesn't comply with the statistical correlation. Therefore, the paper selects the multiple indicators of degree, ego-betweenness centrality and eigenvector centrality to evaluate the importance and the role of a node. The importance of a node is equal to the normalized sum of its three indicators. A candidate for core or bridge is selected from the high-degree nodes or the nodes with high ego-betweenness centrality, respectively. Then, the role of a candidate is determined according to how the relationship among its indicators deviates from the statistical correlation of the overall network. Based on 18 real networks and 3 kinds of model networks, the experimental results show that the proposed methods perform quite well in evaluating the importance of nodes and in identifying the node role. PMID:25089823
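
    A short sketch of the indicator combination described above: degree, ego-betweenness and eigenvector centrality are normalized and summed into an importance score. The example network and the min-max normalization are illustrative choices, not necessarily those of the paper.

        import numpy as np
        import networkx as nx

        G = nx.karate_club_graph()                      # stand-in for a real social network

        def ego_betweenness(G, v):
            # Betweenness centrality of v computed inside its own ego network.
            return nx.betweenness_centrality(nx.ego_graph(G, v))[v]

        def minmax(d):
            vals = np.array(list(d.values()))
            span = vals.max() - vals.min()
            return {k: (v - vals.min()) / span for k, v in d.items()}

        deg = minmax(nx.degree_centrality(G))
        eig = minmax(nx.eigenvector_centrality(G, max_iter=1000))
        egb = minmax({v: ego_betweenness(G, v) for v in G})

        importance = {v: deg[v] + eig[v] + egb[v] for v in G}   # normalized sum of three indicators
        print(sorted(importance, key=importance.get, reverse=True)[:5])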

  6. A spatial scan statistic for compound Poisson data.

    PubMed

    Rosychuk, Rhonda J; Chang, Hsing-Ming

    2013-12-20

    The topic of spatial cluster detection gained attention in statistics during the late 1980s and early 1990s. Effort has been devoted to the development of methods for detecting spatial clustering of cases and events in the biological sciences, astronomy and epidemiology. More recently, research has examined detecting clusters of correlated count data associated with health conditions of individuals. Such a method allows researchers to examine spatial relationships of disease-related events rather than just incident or prevalent cases. We introduce a spatial scan test that identifies clusters of events in a study region. Because an individual case may have multiple (repeated) events, we base the test on a compound Poisson model. We illustrate our method for cluster detection on emergency department visits, where individuals may make multiple disease-related visits. Copyright © 2013 John Wiley & Sons, Ltd.

  7. Detector-Response Correction of Two-Dimensional γ -Ray Spectra from Neutron Capture

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rusev, G.; Jandel, M.; Arnold, C. W.

    2015-05-28

    The neutron-capture reaction produces a large variety of γ-ray cascades with different γ-ray multiplicities. A measured spectral distribution of these cascades for each γ-ray multiplicity is of importance to applications and studies of γ-ray statistical properties. The DANCE array, a 4π ball of 160 BaF2 detectors, is an ideal tool for measurement of neutron-capture γ-rays. The high granularity of DANCE enables measurements of high-multiplicity γ-ray cascades. The measured two-dimensional spectra (γ-ray energy, γ-ray multiplicity) have to be corrected for the DANCE detector response in order to compare them with predictions of the statistical model or use them in applications. The detector-response correction problem becomes more difficult for a 4π detection system than for a single detector. A trial-and-error approach and an iterative decomposition of γ-ray multiplets have been successfully applied to the detector-response correction. As a result, applications of the decomposition methods are discussed for two-dimensional γ-ray spectra measured at DANCE from γ-ray sources and from the ¹⁰B(n, γ) and ¹¹³Cd(n, γ) reactions.

  8. Confronting weather and climate models with observational data from soil moisture networks over the United States

    PubMed Central

    Dirmeyer, Paul A.; Wu, Jiexia; Norton, Holly E.; Dorigo, Wouter A.; Quiring, Steven M.; Ford, Trenton W.; Santanello, Joseph A.; Bosilovich, Michael G.; Ek, Michael B.; Koster, Randal D.; Balsamo, Gianpaolo; Lawrence, David M.

    2018-01-01

    Four land surface models in uncoupled and coupled configurations are compared to observations of daily soil moisture from 19 networks in the conterminous United States to determine the viability of such comparisons and explore the characteristics of model and observational data. First, observations are analyzed for error characteristics and representation of spatial and temporal variability. Some networks have multiple stations within an area comparable to model grid boxes; for those we find that aggregation of stations before calculation of statistics has little effect on estimates of variance, but soil moisture memory is sensitive to aggregation. Statistics for some networks stand out as unlike those of their neighbors, likely due to differences in instrumentation, calibration and maintenance. Buried sensors appear to have less random error than near-field remote sensing techniques, and heat dissipation sensors show less temporal variability than other types. Model soil moistures are evaluated using three metrics: standard deviation in time, temporal correlation (memory) and spatial correlation (length scale). Models do relatively well in capturing large-scale variability of metrics across climate regimes, but poorly reproduce observed patterns at scales of hundreds of kilometers and smaller. Uncoupled land models do no better than coupled model configurations, nor do reanalyses outperform free-running models. Spatial decorrelation scales are found to be difficult to diagnose. Using data for model validation, calibration or data assimilation from multiple soil moisture networks with different types of sensors and measurement techniques requires great caution. Data from models and observations should be put on the same spatial and temporal scales before comparison. PMID:29645013

  9. Confronting Weather and Climate Models with Observational Data from Soil Moisture Networks over the United States

    NASA Technical Reports Server (NTRS)

    Dirmeyer, Paul A.; Wu, Jiexia; Norton, Holly E.; Dorigo, Wouter A.; Quiring, Steven M.; Ford, Trenton W.; Santanello, Joseph A., Jr.; Bosilovich, Michael G.; Ek, Michael B.; Koster, Randal Dean

    2016-01-01

    Four land surface models in uncoupled and coupled configurations are compared to observations of daily soil moisture from 19 networks in the conterminous United States to determine the viability of such comparisons and explore the characteristics of model and observational data. First, observations are analyzed for error characteristics and representation of spatial and temporal variability. Some networks have multiple stations within an area comparable to model grid boxes; for those we find that aggregation of stations before calculation of statistics has little effect on estimates of variance, but soil moisture memory is sensitive to aggregation. Statistics for some networks stand out as unlike those of their neighbors, likely due to differences in instrumentation, calibration and maintenance. Buried sensors appear to have less random error than near-field remote sensing techniques, and heat dissipation sensors show less temporal variability than other types. Model soil moistures are evaluated using three metrics: standard deviation in time, temporal correlation (memory) and spatial correlation (length scale). Models do relatively well in capturing large-scale variability of metrics across climate regimes, but poorly reproduce observed patterns at scales of hundreds of kilometers and smaller. Uncoupled land models do no better than coupled model configurations, nor do reanalyses outperform free-running models. Spatial decorrelation scales are found to be difficult to diagnose. Using data for model validation, calibration or data assimilation from multiple soil moisture networks with different types of sensors and measurement techniques requires great caution. Data from models and observations should be put on the same spatial and temporal scales before comparison.

  10. Confronting weather and climate models with observational data from soil moisture networks over the United States.

    PubMed

    Dirmeyer, Paul A; Wu, Jiexia; Norton, Holly E; Dorigo, Wouter A; Quiring, Steven M; Ford, Trenton W; Santanello, Joseph A; Bosilovich, Michael G; Ek, Michael B; Koster, Randal D; Balsamo, Gianpaolo; Lawrence, David M

    2016-04-01

    Four land surface models in uncoupled and coupled configurations are compared to observations of daily soil moisture from 19 networks in the conterminous United States to determine the viability of such comparisons and explore the characteristics of model and observational data. First, observations are analyzed for error characteristics and representation of spatial and temporal variability. Some networks have multiple stations within an area comparable to model grid boxes; for those we find that aggregation of stations before calculation of statistics has little effect on estimates of variance, but soil moisture memory is sensitive to aggregation. Statistics for some networks stand out as unlike those of their neighbors, likely due to differences in instrumentation, calibration and maintenance. Buried sensors appear to have less random error than near-field remote sensing techniques, and heat dissipation sensors show less temporal variability than other types. Model soil moistures are evaluated using three metrics: standard deviation in time, temporal correlation (memory) and spatial correlation (length scale). Models do relatively well in capturing large-scale variability of metrics across climate regimes, but poorly reproduce observed patterns at scales of hundreds of kilometers and smaller. Uncoupled land models do no better than coupled model configurations, nor do reanalyses outperform free-running models. Spatial decorrelation scales are found to be difficult to diagnose. Using data for model validation, calibration or data assimilation from multiple soil moisture networks with different types of sensors and measurement techniques requires great caution. Data from models and observations should be put on the same spatial and temporal scales before comparison.

  11. A Semiparametric Approach for Composite Functional Mapping of Dynamic Quantitative Traits

    PubMed Central

    Yang, Runqing; Gao, Huijiang; Wang, Xin; Zhang, Ji; Zeng, Zhao-Bang; Wu, Rongling

    2007-01-01

    Functional mapping has emerged as a powerful tool for mapping quantitative trait loci (QTL) that control developmental patterns of complex dynamic traits. Original functional mapping has been constructed within the context of simple interval mapping, without consideration of separate multiple linked QTL for a dynamic trait. In this article, we present a statistical framework for mapping QTL that affect dynamic traits by capitalizing on the strengths of functional mapping and composite interval mapping. Within this so-called composite functional-mapping framework, functional mapping models the time-dependent genetic effects of a QTL tested within a marker interval using a biologically meaningful parametric function, whereas composite interval mapping models the time-dependent genetic effects of the markers outside the test interval to control the genome background using a flexible nonparametric approach based on Legendre polynomials. Such a semiparametric framework was formulated by a maximum-likelihood model and implemented with the EM algorithm, allowing for the estimation and the test of the mathematical parameters that define the QTL effects and the regression coefficients of the Legendre polynomials that describe the marker effects. Simulation studies were performed to investigate the statistical behavior of composite functional mapping and compare its advantage in separating multiple linked QTL as compared to functional mapping. We used the new mapping approach to analyze a genetic mapping example in rice, leading to the identification of multiple QTL, some of which are linked on the same chromosome, that control the developmental trajectory of leaf age. PMID:17947431

  12. A novel variational Bayes multiple locus Z-statistic for genome-wide association studies with Bayesian model averaging

    PubMed Central

    Logsdon, Benjamin A.; Carty, Cara L.; Reiner, Alexander P.; Dai, James Y.; Kooperberg, Charles

    2012-01-01

    Motivation: For many complex traits, including height, the majority of variants identified by genome-wide association studies (GWAS) have small effects, leaving a significant proportion of the heritable variation unexplained. Although many penalized multiple regression methodologies have been proposed to increase the power to detect associations for complex genetic architectures, they generally lack mechanisms for false-positive control and diagnostics for model over-fitting. Our methodology is the first penalized multiple regression approach that explicitly controls Type I error rates and provides model over-fitting diagnostics through a novel normally distributed statistic defined for every marker within the GWAS, based on results from a variational Bayes spike regression algorithm. Results: We compare the performance of our method to the lasso and single marker analysis on simulated data and demonstrate that our approach has superior performance in terms of power and Type I error control. In addition, using the Women's Health Initiative (WHI) SNP Health Association Resource (SHARe) GWAS of African-Americans, we show that our method has power to detect additional novel associations with body height. These findings replicate, reaching a stringent cutoff of marginal association in a larger cohort. Availability: An R-package, including an implementation of our variational Bayes spike regression (vBsr) algorithm, is available at http://kooperberg.fhcrc.org/soft.html. Contact: blogsdon@fhcrc.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22563072

  13. multiDE: a dimension reduced model based statistical method for differential expression analysis using RNA-sequencing data with multiple treatment conditions.

    PubMed

    Kang, Guangliang; Du, Li; Zhang, Hong

    2016-06-22

    The growing complexity of biological experiment design based on high-throughput RNA sequencing (RNA-seq) is calling for more accommodative statistical tools. We focus on differential expression (DE) analysis using RNA-seq data in the presence of multiple treatment conditions. We propose a novel method, multiDE, for facilitating DE analysis using RNA-seq read count data with multiple treatment conditions. The read count is assumed to follow a log-linear model incorporating two factors (i.e., condition and gene), where an interaction term is used to quantify the association between gene and condition. The number of degrees of freedom is reduced to one through the first-order decomposition of the interaction, leading to a dramatic power improvement in testing DE genes when the number of conditions is greater than two. In our simulations, multiDE outperformed the benchmark methods (i.e., edgeR and DESeq2) even if the underlying model was severely misspecified, and the power gain increased with the number of conditions. In the application to two real datasets, multiDE identified more biologically meaningful DE genes than the benchmark methods. An R package implementing multiDE is available publicly at http://homepage.fudan.edu.cn/zhangh/softwares/multiDE . When the number of conditions is two, multiDE performs comparably with the benchmark methods. When the number of conditions is greater than two, multiDE outperforms the benchmark methods.

  14. Heterogeneous Structure of Stem Cells Dynamics: Statistical Models and Quantitative Predictions

    PubMed Central

    Bogdan, Paul; Deasy, Bridget M.; Gharaibeh, Burhan; Roehrs, Timo; Marculescu, Radu

    2014-01-01

    Understanding stem cell (SC) population dynamics is essential for developing models that can be used in basic science and medicine, to aid in predicting cell fate. These models can be used as tools, e.g., in studying patho-physiological events at the cellular and tissue level, predicting (mal)functions along the developmental course, and in personalized regenerative medicine. Using time-lapsed imaging and statistical tools, we show that the dynamics of SC populations involve a heterogeneous structure consisting of multiple sub-population behaviors. Using non-Gaussian statistical approaches, we identify the co-existence of fast and slow dividing subpopulations, and quiescent cells, in stem cells from three species. The mathematical analysis also shows that, instead of developing independently, SCs exhibit a time-dependent fractal behavior as they interact with each other through molecular and tactile signals. These findings suggest that more sophisticated models of SC dynamics should view SC populations as a collective and avoid the simplifying homogeneity assumption by accounting for the presence of more than one dividing sub-population, and their multi-fractal characteristics. PMID:24769917

  15. Factor analysis and multiple regression between topography and precipitation on Jeju Island, Korea

    NASA Astrophysics Data System (ADS)

    Um, Myoung-Jin; Yun, Hyeseon; Jeong, Chang-Sam; Heo, Jun-Haeng

    2011-11-01

    In this study, new factors that influence precipitation were extracted from geographic variables using factor analysis, allowing for an accurate estimation of orographic precipitation. Correlation analysis was also used to examine the relationship between nine topographic variables from digital elevation models (DEMs) and the precipitation on Jeju Island. In addition, a spatial analysis was performed in order to verify the validity of the regression model. From the results of the correlation analysis, it was found that all of the topographic variables had a positive correlation with the precipitation. The relations between the variables also changed in accordance with a change in the precipitation duration. However, upon examining the correlation matrix, no significant relationship between the latitude and the aspect was found. According to the factor analysis, eight topographic variables (latitude being the exception) were found to have a direct influence on the precipitation. Three factors were then extracted from the eight topographic variables. By directly comparing the multiple regression model with the factors (model 1) to the multiple regression model with the topographic variables (model 3), it was found that model 1 did not violate the limits of statistical significance and multicollinearity. As such, model 1 was considered to be appropriate for estimating the precipitation when taking into account the topography. In the study of model 1, the multiple regression model using factor analysis was found to be the best method for estimating the orographic precipitation on Jeju Island.
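
    A compact sketch of the factor-analysis-then-regression workflow described above, with synthetic stand-ins for the correlated DEM-derived variables and the observed precipitation; the number of factors and the data-generating choices are illustrative.

        import numpy as np
        from sklearn.decomposition import FactorAnalysis
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(4)
        n = 300
        latent = rng.normal(size=(n, 3))                        # three underlying terrain factors
        topo = latent @ rng.normal(size=(3, 8)) + rng.normal(0, 0.3, (n, 8))  # eight correlated variables
        precip = latent @ np.array([30.0, -10.0, 5.0]) + 1200 + rng.normal(0, 20, n)

        fa = FactorAnalysis(n_components=3, random_state=0).fit(topo)
        scores = fa.transform(topo)                             # factor scores replace the raw variables
        reg = LinearRegression().fit(scores, precip)            # regression on factors, cf. "model 1"
        print("R2 on factor scores:", round(reg.score(scores, precip), 2))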

  16. A statistical approach for generating synthetic tip stress data from limited CPT soundings

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Basalams, M.K.

    CPT tip stress data obtained from a uranium mill tailings impoundment are treated as time series. A statistical class of models that was developed to model time series is explored to investigate its applicability in modeling the tip stress series. These models were developed by Box and Jenkins (1970) and are known as Autoregressive Moving Average (ARMA) models. This research demonstrates how to apply the ARMA models to tip stress series. Generation of synthetic tip stress series that preserve the main statistical characteristics of the measured series is also investigated. Multiple regression analysis is used to model the regional variation of the ARMA model parameters as well as the regional variation of the mean and the standard deviation of the measured tip stress series. The reliability of the generated series is investigated from a geotechnical point of view as well as from a statistical point of view. Estimation of the total settlement using the measured and the generated series subjected to the same loading condition is performed. The variation of friction angle with depth of the impoundment materials is also investigated. This research shows that these series can be modeled by the Box and Jenkins ARMA models. A third degree Autoregressive model AR(3) is selected to represent these series. A theoretical double exponential density function is fitted to the AR(3) model residuals. Synthetic tip stress series are generated at nearby locations. The generated series are shown to be reliable in estimating the total settlement and the friction angle variation with depth for this particular site.
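
    A brief sketch of fitting an AR(3) model to a series and generating a synthetic realization by bootstrapping the fitted residuals, mirroring the workflow described above; the series here is simulated, and residual resampling stands in for the fitted double-exponential density used in the study.

        import numpy as np
        from statsmodels.tsa.ar_model import AutoReg

        rng = np.random.default_rng(5)
        n = 400
        series = np.zeros(n)
        for t in range(3, n):                              # simulated stand-in for a tip-stress profile
            series[t] = 0.5 * series[t-1] + 0.2 * series[t-2] + 0.1 * series[t-3] + rng.normal()

        fit = AutoReg(series, lags=3).fit()
        params = np.asarray(fit.params)                    # [const, ar1, ar2, ar3]
        resid = np.asarray(fit.resid)

        synthetic = list(series[:3])                       # seed with the first three values
        for t in range(3, n):
            ar_part = params[0] + np.dot(params[1:4], synthetic[-1:-4:-1])
            synthetic.append(ar_part + rng.choice(resid))  # bootstrap a fitted residual
        print(params.round(2), round(float(np.std(synthetic)), 2))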

  17. Predicting recreational water quality advisories: A comparison of statistical methods

    USGS Publications Warehouse

    Brooks, Wesley R.; Corsi, Steven R.; Fienen, Michael N.; Carvin, Rebecca B.

    2016-01-01

    Epidemiological studies indicate that fecal indicator bacteria (FIB) in beach water are associated with illnesses among people having contact with the water. In order to mitigate public health impacts, many beaches are posted with an advisory when the concentration of FIB exceeds a beach action value. The most commonly used method of measuring FIB concentration takes 18–24 h before returning a result. In order to avoid the 24 h lag, it has become common to "nowcast" the FIB concentration using statistical regressions on environmental surrogate variables. Most commonly, nowcast models are estimated using ordinary least squares regression, but other regression methods from the statistical and machine learning literature are sometimes used. This study compares 14 regression methods across 7 Wisconsin beaches to identify which consistently produces the most accurate predictions. A random forest model is identified as the most accurate, followed by multiple regression fit using the adaptive LASSO.
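
    A small sketch of the kind of cross-validated comparison described above, contrasting a random forest with a LASSO fit on synthetic stand-ins for log FIB concentration and environmental surrogates; note that sklearn's LassoCV is the ordinary, not the adaptive, LASSO.

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.linear_model import LassoCV
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(6)
        n = 500
        X = rng.normal(size=(n, 10))                       # turbidity, rainfall, wind, etc. (synthetic)
        log_fib = 2 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + 0.5 * X[:, 2] ** 2 + rng.normal(0, 0.7, n)

        for name, model in [("random forest", RandomForestRegressor(n_estimators=200, random_state=0)),
                            ("LASSO", LassoCV(cv=5))]:
            score = cross_val_score(model, X, log_fib, cv=5, scoring="r2").mean()
            print(f"{name}: mean cross-validated R2 = {score:.2f}")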

  18. A Wave Chaotic Study of Quantum Graphs with Microwave Networks

    NASA Astrophysics Data System (ADS)

    Fu, Ziyuan

    Quantum graphs provide a setting to test the hypothesis that all ray-chaotic systems show universal wave chaotic properties. I study the quantum graphs with a wave chaotic approach. Here, an experimental setup consisting of a microwave coaxial cable network is used to simulate quantum graphs. Some basic features and the distributions of impedance statistics are analyzed from experimental data on an ensemble of tetrahedral networks. The random coupling model (RCM) is applied in an attempt to uncover the universal statistical properties of the system. Deviations from RCM predictions have been observed in that the statistics of diagonal and off-diagonal impedance elements are different. Waves trapped due to multiple reflections on bonds between nodes in the graph most likely cause the deviations from universal behavior in the finite-size realization of a quantum graph. In addition, I have done some investigations on the Random Coupling Model, which are useful for further research.

  19. Statistical post-processing of seasonal multi-model forecasts: Why is it so hard to beat the multi-model mean?

    NASA Astrophysics Data System (ADS)

    Siegert, Stefan

    2017-04-01

    Initialised climate forecasts on seasonal time scales, run several months or even years ahead, are now an integral part of the battery of products offered by climate services world-wide. The availability of seasonal climate forecasts from various modeling centres gives rise to multi-model ensemble forecasts. Post-processing such seasonal-to-decadal multi-model forecasts is challenging 1) because the cross-correlation structure between multiple models and observations can be complicated, 2) because the amount of training data to fit the post-processing parameters is very limited, and 3) because the forecast skill of numerical models tends to be low on seasonal time scales. In this talk I will review new statistical post-processing frameworks for multi-model ensembles. I will focus particularly on Bayesian hierarchical modelling approaches, which are flexible enough to capture commonly made assumptions about collective and model-specific biases of multi-model ensembles. Despite the advances in statistical methodology, it turns out to be very difficult to out-perform the simplest post-processing method, which just recalibrates the multi-model ensemble mean by linear regression. I will discuss reasons for this, which are closely linked to the specific characteristics of seasonal multi-model forecasts. I explore possible directions for improvements, for example using informative priors on the post-processing parameters, and jointly modelling forecasts and observations.
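
    The hard-to-beat baseline mentioned above can be written in a few lines: recalibrate the multi-model ensemble mean against observations with simple linear regression over a training period. The hindcasts below are synthetic and the 20-year training split is an arbitrary choice.

        import numpy as np

        rng = np.random.default_rng(7)
        n_years, n_models = 30, 5
        truth = rng.normal(0, 1, n_years)                                # observed seasonal anomalies
        members = 0.4 * truth[:, None] + rng.normal(0, 1, (n_years, n_models))  # weakly skilful models
        mme_mean = members.mean(axis=1)

        # Fit obs = a + b * (multi-model mean) on a training period, then apply it out of sample.
        slope, intercept = np.polyfit(mme_mean[:20], truth[:20], 1)
        calibrated = intercept + slope * mme_mean[20:]
        print("raw RMSE        ", round(float(np.sqrt(np.mean((mme_mean[20:] - truth[20:]) ** 2))), 2))
        print("calibrated RMSE ", round(float(np.sqrt(np.mean((calibrated - truth[20:]) ** 2))), 2))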

  20. A Bayesian joint probability modeling approach for seasonal forecasting of streamflows at multiple sites

    NASA Astrophysics Data System (ADS)

    Wang, Q. J.; Robertson, D. E.; Chiew, F. H. S.

    2009-05-01

    Seasonal forecasting of streamflows can be highly valuable for water resources management. In this paper, a Bayesian joint probability (BJP) modeling approach for seasonal forecasting of streamflows at multiple sites is presented. A Box-Cox transformed multivariate normal distribution is proposed to model the joint distribution of future streamflows and their predictors such as antecedent streamflows and El Niño-Southern Oscillation indices and other climate indicators. Bayesian inference of model parameters and uncertainties is implemented using Markov chain Monte Carlo sampling, leading to joint probabilistic forecasts of streamflows at multiple sites. The model provides a parametric structure for quantifying relationships between variables, including intersite correlations. The Box-Cox transformed multivariate normal distribution has considerable flexibility for modeling a wide range of predictors and predictands. The Bayesian inference formulated allows the use of data that contain nonconcurrent and missing records. The model flexibility and data-handling ability means that the BJP modeling approach is potentially of wide practical application. The paper also presents a number of statistical measures and graphical methods for verification of probabilistic forecasts of continuous variables. Results for streamflows at three river gauges in the Murrumbidgee River catchment in southeast Australia show that the BJP modeling approach has good forecast quality and that the fitted model is consistent with observed data.
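
    A plug-in sketch of the Box-Cox/multivariate-normal idea behind the BJP approach: transform streamflow and one predictor, fit a bivariate normal, and read off the conditional forecast distribution. The data are synthetic, only one predictor and one site are used, and the full BJP infers parameters by Bayesian MCMC rather than the moment estimates shown here.

        import numpy as np
        from scipy import stats, special

        rng = np.random.default_rng(8)
        predictor = rng.gamma(4.0, 25.0, 60)                  # e.g. antecedent streamflow (skewed)
        flow = 0.8 * predictor + rng.gamma(2.0, 20.0, 60)     # seasonal streamflow total

        t_flow, lam_q = stats.boxcox(flow)                    # Box-Cox to approximate normality
        t_pred, lam_p = stats.boxcox(predictor)
        mu = np.array([t_flow.mean(), t_pred.mean()])
        cov = np.cov(np.vstack([t_flow, t_pred]))

        # Conditional normal for transformed flow given a new predictor value x0.
        x0 = special.boxcox(120.0, lam_p)
        cond_mean = mu[0] + cov[0, 1] / cov[1, 1] * (x0 - mu[1])
        cond_sd = np.sqrt(cov[0, 0] - cov[0, 1] ** 2 / cov[1, 1])
        q = stats.norm.ppf([0.1, 0.5, 0.9], loc=cond_mean, scale=cond_sd)
        print("forecast quantiles:", special.inv_boxcox(q, lam_q).round(1))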

  1. Dose-rate effects of ethylene oxide exposure on developmental toxicity.

    PubMed

    Weller, E; Long, N; Smith, A; Williams, P; Ravi, S; Gill, J; Henessey, R; Skornik, W; Brain, J; Kimmel, C; Kimmel, G; Holmes, L; Ryan, L

    1999-08-01

    In risk assessment, evaluating a health effect at a duration of exposure that is untested involves assuming that equivalent multiples of concentration (C) and duration (T) of exposure have the same effect. The limitations of this approach (attributed to F. Haber, Zur Geschichte des Gaskrieges [On the history of gas warfare], in Funf Vortrage aus den Jahren 1920-1923 [Five lectures from the years 1920-1923], 1924, Springer, Berlin, pp. 76-92) have been noted in several studies. The study presented in this paper was designed to specifically look at dose-rate (C x T) effects, and it forms an ideal case study to implement statistical models and to examine the statistical issues in risk assessment. Pregnant female C57BL/6J mice were exposed, on gestational day 7, to ethylene oxide (EtO) via inhalation for 1.5, 3, or 6 h at exposures that result in C x T multiples of 2100 or 2700 ppm-h. EtO was selected because of its short half-life, documented developmental toxicity, and relevance to exposures that occur in occupational settings. Concurrent experiments were run with animals exposed to air for similar periods. Statistical analysis using models developed to assess dose-rate effects revealed significant effects with respect to fetal death and resorptions, malformations, crown-to-rump length, and fetal weight. Animals exposed to short, high exposures of EtO on day 7 of gestation were found to have more adverse effects than animals exposed to the same C x T multiple but at longer, lower exposures. The implication for risk assessment is that applying Haber's Law could potentially lead to an underestimation of risk at a shorter duration of exposure and an overestimation of risk at a longer duration of exposure. Further research, toxicological and statistical, is required to understand the mechanism of the dose-rate effects and how to incorporate the mechanistic information into the risk assessment decision process.

  2. LP-search and its use in analysis of the accuracy of control systems with acoustical models

    NASA Technical Reports Server (NTRS)

    Sergeyev, V. I.; Sobol, I. M.; Statnikov, R. B.; Statnikov, I. N.

    1973-01-01

    The LP-search is proposed as an analog of the Monte Carlo method for finding values in nonlinear statistical systems. It is concluded that: To attain the required accuracy in solution to the problem of control for a statistical system in the LP-search, a considerably smaller number of tests is required than in the Monte Carlo method. The LP-search allows the possibility of multiple repetitions of tests under identical conditions and observability of the output variables of the system.

  3. iASeq: integrative analysis of allele-specificity of protein-DNA interactions in multiple ChIP-seq datasets

    PubMed Central

    2012-01-01

    Background: ChIP-seq provides new opportunities to study allele-specific protein-DNA binding (ASB). However, detecting allelic imbalance from a single ChIP-seq dataset often has low statistical power since only sequence reads mapped to heterozygote SNPs are informative for discriminating two alleles. Results: We develop a new method iASeq to address this issue by jointly analyzing multiple ChIP-seq datasets. iASeq uses a Bayesian hierarchical mixture model to learn correlation patterns of allele-specificity among multiple proteins. Using the discovered correlation patterns, the model allows one to borrow information across datasets to improve detection of allelic imbalance. Application of iASeq to 77 ChIP-seq samples from 40 ENCODE datasets and 1 genomic DNA sample in GM12878 cells reveals that the allele-specificities of multiple proteins are highly correlated, and demonstrates the ability of iASeq to improve allelic inference compared to analyzing each individual dataset separately. Conclusions: iASeq illustrates the value of integrating multiple datasets in the allele-specificity inference and offers a new tool to better analyze ASB. PMID:23194258

  4. Using Algal Metrics and Biomass to Evaluate Multiple Ways of Defining Concentration-Based Nutrient Criteria in Streams and their Ecological Relevance

    EPA Science Inventory

    We examined the utility of nutrient criteria derived solely from total phosphorus (TP) concentrations in streams (regression models and percentile distributions) and evaluated their ecological relevance to diatom and algal biomass responses. We used a variety of statistics to cha...

  5. Leveraging non-targeted metabolite profiling via statistical genomics

    USDA-ARS?s Scientific Manuscript database

    One of the challenges of systems biology is to integrate multiple sources of data in order to build a cohesive view of the system of study. Here we describe the mass spectrometry based profiling of maize kernels, a model system for genomic studies and a cornerstone of the agroeconomy. Using a networ...

  6. Assessing the Chances of Success: Naive Statistics versus Kind Experience

    ERIC Educational Resources Information Center

    Hogarth, Robin M.; Mukherjee, Kanchan; Soyer, Emre

    2013-01-01

    Additive integration of information is ubiquitous in judgment and has been shown to be effective even when multiplicative rules of probability theory are prescribed. We explore the generality of these findings in the context of estimating probabilities of success in contests. We first define a normative model of these probabilities that takes…

  7. A Statistical Model for Misreported Binary Outcomes in Clustered RCTs of Education Interventions

    ERIC Educational Resources Information Center

    Schochet, Peter Z.

    2013-01-01

    In randomized control trials (RCTs) of educational interventions, there is a growing literature on impact estimation methods to adjust for missing student outcome data using such methods as multiple imputation, the construction of nonresponse weights, casewise deletion, and maximum likelihood methods (see, for example, Allison, 2002; Graham, 2009;…

  8. Adaptive variation in Pinus ponderosa from Intermountain regions. II. Middle Columbia River system

    Treesearch

    Gerald Rehfeldt

    1986-01-01

    Seedling populations were grown and compared in common environments. Statistical analyses detected genetic differences between populations for numerous traits reflecting growth potential and periodicity of shoot elongation. Multiple regression models described an adaptive landscape in which populations from low elevations have a high growth potential while those from...

  9. Electronic Resource Expenditure and the Decline in Reference Transaction Statistics in Academic Libraries

    ERIC Educational Resources Information Center

    Dubnjakovic, Ana

    2012-01-01

    The current study investigates factors influencing increase in reference transactions in a typical week in academic libraries across the United States of America. Employing multiple regression analysis and general linear modeling, variables of interest from the "Academic Library Survey (ALS) 2006" survey (sample size 3960 academic libraries) were…

  10. The value of decision models: Using ecologically based invasive plant management as an example

    USDA-ARS?s Scientific Manuscript database

    Humans have both fast and slow thought processes which influence our judgment and decision-making. The fast system is intuitive and valuable for decisions which do not require multiple steps or the application of logic or statistics. However, many decisions in natural resources are complex and req...

  11. Changes of crop rotation in Iowa determined from the USDA-NASS cropland data layer product

    USDA-ARS?s Scientific Manuscript database

    Crop rotation is one of the important decisions made independently by numerous farm managers, and is a critical variable in models of crop growth and soil carbon. By combining multiple years (2001-2009) of the USDA National Agricultural Statistics Service (NASS) cropland data layer (CDL), it is pos...

  12. Single-variant and multi-variant trend tests for genetic association with next-generation sequencing that are robust to sequencing error.

    PubMed

    Kim, Wonkuk; Londono, Douglas; Zhou, Lisheng; Xing, Jinchuan; Nato, Alejandro Q; Musolf, Anthony; Matise, Tara C; Finch, Stephen J; Gordon, Derek

    2012-01-01

    As with any new technology, next-generation sequencing (NGS) has potential advantages and potential challenges. One advantage is the identification of multiple causal variants for disease that might otherwise be missed by SNP-chip technology. One potential challenge is misclassification error (as with any emerging technology) and the issue of power loss due to multiple testing. Here, we develop an extension of the linear trend test for association that incorporates differential misclassification error and may be applied to any number of SNPs. We call the statistic the linear trend test allowing for error, applied to NGS, or LTTae,NGS. This statistic allows for differential misclassification. The observed data are phenotypes for unrelated cases and controls, coverage, and the number of putative causal variants for every individual at all SNPs. We simulate data considering multiple factors (disease mode of inheritance, genotype relative risk, causal variant frequency, sequence error rate in cases, sequence error rate in controls, number of loci, and others) and evaluate type I error rate and power for each vector of factor settings. We compare our results with two recently published NGS statistics. Also, we create a fictitious disease model based on downloaded 1000 Genomes data for 5 SNPs and 388 individuals, and apply our statistic to those data. We find that the LTTae,NGS maintains the correct type I error rate in all simulations (differential and non-differential error), while the other statistics show large inflation in type I error for lower coverage. Power for all three methods is approximately the same for all three statistics in the presence of non-differential error. Application of our statistic to the 1000 Genomes data suggests that, for the data downloaded, there is a 1.5% sequence misclassification rate over all SNPs. Finally, application of the multi-variant form of LTTae,NGS shows high power for a number of simulation settings, although it can have lower power than the corresponding single-variant simulation results, most probably due to our specification of multi-variant SNP correlation values. In conclusion, our LTTae,NGS addresses two key challenges with NGS disease studies; first, it allows for differential misclassification when computing the statistic; and second, it addresses the multiple-testing issue in that there is a multi-variant form of the statistic that has only one degree of freedom, and provides a single p value, no matter how many loci. Copyright © 2013 S. Karger AG, Basel.

  13. Single variant and multi-variant trend tests for genetic association with next generation sequencing that are robust to sequencing error

    PubMed Central

    Kim, Wonkuk; Londono, Douglas; Zhou, Lisheng; Xing, Jinchuan; Nato, Andrew; Musolf, Anthony; Matise, Tara C.; Finch, Stephen J.; Gordon, Derek

    2013-01-01

    As with any new technology, next generation sequencing (NGS) has potential advantages and potential challenges. One advantage is the identification of multiple causal variants for disease that might otherwise be missed by SNP-chip technology. One potential challenge is misclassification error (as with any emerging technology) and the issue of power loss due to multiple testing. Here, we develop an extension of the linear trend test for association that incorporates differential misclassification error and may be applied to any number of SNPs. We call the statistic the linear trend test allowing for error, applied to NGS, or LTTae,NGS. This statistic allows for differential misclassification. The observed data are phenotypes for unrelated cases and controls, coverage, and the number of putative causal variants for every individual at all SNPs. We simulate data considering multiple factors (disease mode of inheritance, genotype relative risk, causal variant frequency, sequence error rate in cases, sequence error rate in controls, number of loci, and others) and evaluate type I error rate and power for each vector of factor settings. We compare our results with two recently published NGS statistics. Also, we create a fictitious disease model, based on downloaded 1000 Genomes data for 5 SNPs and 388 individuals, and apply our statistic to that data. We find that the LTTae,NGS maintains the correct type I error rate in all simulations (differential and non-differential error), while the other statistics show large inflation in type I error for lower coverage. Power for all three methods is approximately the same for all three statistics in the presence of non-differential error. Application of our statistic to the 1000 Genomes data suggests that, for the data downloaded, there is a 1.5% sequence misclassification rate over all SNPs. Finally, application of the multi-variant form of LTTae,NGS shows high power for a number of simulation settings, although it can have lower power than the corresponding single variant simulation results, most probably due to our specification of multi-variant SNP correlation values. In conclusion, our LTTae,NGS addresses two key challenges with NGS disease studies; first, it allows for differential misclassification when computing the statistic; and second, it addresses the multiple-testing issue in that there is a multi-variant form of the statistic that has only one degree of freedom, and provides a single p-value, no matter how many loci. PMID:23594495

  14. Post-processing method for wind speed ensemble forecast using wind speed and direction

    NASA Astrophysics Data System (ADS)

    Sofie Eide, Siri; Bjørnar Bremnes, John; Steinsland, Ingelin

    2017-04-01

    Statistical methods are widely applied to enhance the quality of both deterministic and ensemble NWP forecasts. In many situations, like wind speed forecasting, most of the predictive information is contained in one variable in the NWP models. However, in statistical calibration of deterministic forecasts it is often seen that including more variables can further improve forecast skill. For ensembles this is rarely taken advantage of, mainly because it is generally not straightforward to include multiple variables. In this study, it is demonstrated how multiple variables can be included in Bayesian model averaging (BMA) by using a flexible regression method for estimating the conditional means. The method is applied to wind speed forecasting at 204 Norwegian stations based on wind speed and direction forecasts from the ECMWF ensemble system. At about 85% of the sites the ensemble forecasts were improved in terms of CRPS by adding wind direction as a predictor compared to only using wind speed. On average the improvements were about 5%, but mainly for moderate to strong wind situations. For weak wind speeds, adding wind direction had a more or less neutral impact.
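
    A much-simplified illustration of why wind direction can help: encode direction as sine and cosine terms in a calibration regression for the conditional mean. This shows only the conditional-mean ingredient on synthetic data, not the full BMA post-processing used in the study.

        import numpy as np
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(9)
        n = 3000
        fc_speed = rng.gamma(3.0, 2.5, n)                    # ensemble-mean forecast speed
        fc_dir = rng.uniform(0, 360, n)                      # forecast wind direction (degrees)
        # Synthetic "observed" speed with a direction-dependent bias (e.g. terrain channelling).
        obs = 0.8 * fc_speed + 1.5 * np.cos(np.deg2rad(fc_dir - 220)) + rng.normal(0, 1, n)

        X_speed = fc_speed.reshape(-1, 1)
        X_both = np.column_stack([fc_speed,
                                  np.sin(np.deg2rad(fc_dir)),
                                  np.cos(np.deg2rad(fc_dir))])

        for name, X in [("speed only", X_speed), ("speed + direction", X_both)]:
            fit = LinearRegression().fit(X[:2000], obs[:2000])
            rmse = np.sqrt(np.mean((fit.predict(X[2000:]) - obs[2000:]) ** 2))
            print(f"{name}: test RMSE = {rmse:.2f}")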

  15. Comparing and combining biomarkers as principal surrogates for time-to-event clinical endpoints.

    PubMed

    Gabriel, Erin E; Sachs, Michael C; Gilbert, Peter B

    2015-02-10

    Principal surrogate endpoints are useful as targets for phase I and II trials. In many recent trials, multiple post-randomization biomarkers are measured. However, few statistical methods exist for comparison of or combination of biomarkers as principal surrogates, and none of these methods to our knowledge utilize time-to-event clinical endpoint information. We propose a Weibull model extension of the semi-parametric estimated maximum likelihood method that allows for the inclusion of multiple biomarkers in the same risk model as multivariate candidate principal surrogates. We propose several methods for comparing candidate principal surrogates and evaluating multivariate principal surrogates. These include the time-dependent and surrogate-dependent true and false positive fraction, the time-dependent and the integrated standardized total gain, and the cumulative distribution function of the risk difference. We illustrate the operating characteristics of our proposed methods in simulations and outline how these statistics can be used to evaluate and compare candidate principal surrogates. We use these methods to investigate candidate surrogates in the Diabetes Control and Complications Trial. Copyright © 2014 John Wiley & Sons, Ltd.

  16. Mad cows and computer models: the U.S. response to BSE.

    PubMed

    Ackerman, Frank; Johnecheck, Wendy A

    2008-01-01

    The proportion of slaughtered cattle tested for BSE is much smaller in the U.S. than in Europe and Japan, leaving the U.S. heavily dependent on statistical models to estimate both the current prevalence and the spread of BSE. We examine the models relied on by USDA, finding that the prevalence model provides only a rough estimate, due to limited data availability. Reassuring forecasts from the model of the spread of BSE depend on the arbitrary constraint that worst-case values are assumed by only one of 17 key parameters at a time. In three of the six published scenarios with multiple worst-case parameter values, there is at least a 25% probability that BSE will spread rapidly. In public policy terms, reliance on potentially flawed models can be seen as a gamble that no serious BSE outbreak will occur. Statistical modeling at this level of abstraction, with its myriad, compound uncertainties, is no substitute for precautionary policies to protect public health against the threat of epidemics such as BSE.

  17. The joint space-time statistics of macroweather precipitation, space-time statistical factorization and macroweather models.

    PubMed

    Lovejoy, S; de Lima, M I P

    2015-07-01

    Over the range of time scales from about 10 days to 30-100 years, in addition to the familiar weather and climate regimes, there is an intermediate "macroweather" regime characterized by negative temporal fluctuation exponents, implying that fluctuations tend to cancel each other out so that averages tend to converge. We show theoretically and numerically that macroweather precipitation can be modeled by a stochastic weather-climate model (the Climate Extended Fractionally Integrated Flux model, CEFIF), first proposed for macroweather temperatures, and we show numerically that a four-parameter space-time CEFIF model can approximately reproduce eight or so empirical space-time exponents. In spite of this success, CEFIF is theoretically and numerically difficult to manage. We therefore propose a simplified stochastic model in which the temporal behavior is modeled as a fractional Gaussian noise but the spatial behavior as a multifractal (climate) cascade: a spatial extension of the recently introduced ScaLIng Macroweather Model, SLIMM. Both the CEFIF and this spatial SLIMM model have a property often implicitly assumed by climatologists: that climate statistics can be "homogenized" by normalizing them with the standard deviation of the anomalies. Physically, it means that the spatial macroweather variability corresponds to different climate zones that multiplicatively modulate the local, temporal statistics. This simplified macroweather model provides a framework for macroweather forecasting that exploits the system's long range memory and spatial correlations; for it, the forecasting problem has been solved. We test this factorization property and the model with the help of three centennial, global scale precipitation products that we analyze jointly in space and in time.

  18. Exact Asymptotics of the Freezing Transition of a Logarithmically Correlated Random Energy Model

    NASA Astrophysics Data System (ADS)

    Webb, Christian

    2011-12-01

    We consider a logarithmically correlated random energy model, namely a model for directed polymers on a Cayley tree, which was introduced by Derrida and Spohn. We prove asymptotic properties of a generating function of the partition function of the model by studying a discrete-time analogue of the KPP-equation—thus translating Bramson's work on the KPP-equation into the discrete-time case. We also discuss connections to extreme value statistics of a branching random walk and a rescaled multiplicative cascade measure beyond the critical point.

  19. Damage modeling and statistical analysis of optics damage performance in MJ-class laser systems.

    PubMed

    Liao, Zhi M; Raymond, B; Gaylord, J; Fallejo, R; Bude, J; Wegner, P

    2014-11-17

    Modeling the lifetime of a fused silica optic is described for a multiple beam, MJ-class laser system. This entails combining optic processing data with laser shot data to account for the complete history of optic processing and shot exposure. Integrating with online inspection data allows for the construction of a performance metric to describe how an optic performs with respect to the model. This methodology helps to validate the damage model, enables strategic planning, and supports the identification of potential hidden parameters that affect the optic's performance.

  20. Statistics of multi-look AIRSAR imagery: A comparison of theory with measurements

    NASA Technical Reports Server (NTRS)

    Lee, J. S.; Hoppel, K. W.; Mango, S. A.

    1993-01-01

    The intensity and amplitude statistics of SAR images, such as L-Band HH for SEASAT and SIR-B, and C-Band VV for ERS-1, have been extensively investigated for various terrain, ground cover and ocean surfaces. Less well-known are the statistics between multiple channels of polarimetric or interferometric SARs, especially for the multi-look processed data. In this paper, we investigate the probability density functions (PDFs) of phase differences, the magnitude of complex products and the amplitude ratios, between polarization channels (i.e. HH, HV, and VV) using 1-look and 4-look AIRSAR polarimetric data. Measured histograms are compared with theoretical PDFs which were recently derived based on a complex Gaussian model.

  1. Legitimate Techniques for Improving the R-Square and Related Statistics of a Multiple Regression Model

    DTIC Science & Technology

    1981-01-01

    explanatory variable has been omitted. Ramsey (1974) has developed a rather interesting test for detecting specification errors using estimates of the...Peter. (1979) A Guide to Econometrics, Cambridge, MA: The MIT Press. Ramsey, J.B. (1974), "Classical Model Selection Through Specification Error... Tests," in P. Zarembka, Ed. Frontiers in Econometrics, New York: Academic Press. Theil, Henri. (1971), Principles of Econometrics, New York: John Wiley

  2. Time Series Model Identification by Estimating Information, Memory, and Quantiles.

    DTIC Science & Technology

    1983-07-01

    Standards, Sect. D, 68D, 937-951. Parzen, Emanuel (1969) "Multiple time series modeling" Multivariate Analysis - II, edited by P. Krishnaiah, Academic... Krishnaiah, North Holland: Amsterdam, 283-295. Parzen, Emanuel (1979) "Forecasting and Whitening Filter Estimation" TIMS Studies in the Management...principle. Applications of Statistics, P. R. Krishnaiah, ed. North Holland: Amsterdam, 27-41. Box, G. E. P. and Jenkins, G. M. (1970) Time Series Analysis

  3. ICD-11 and DSM-5 personality trait domains capture categorical personality disorders: Finding a common ground.

    PubMed

    Bach, Bo; Sellbom, Martin; Skjernov, Mathias; Simonsen, Erik

    2018-05-01

    The five personality disorder trait domains in the proposed International Classification of Diseases, 11th edition and the Diagnostic and Statistical Manual of Mental Disorders, 5th edition are comparable in terms of Negative Affectivity, Detachment, Antagonism/Dissociality and Disinhibition. However, the International Classification of Diseases, 11th edition model includes a separate domain of Anankastia, whereas the Diagnostic and Statistical Manual of Mental Disorders, 5th edition model includes an additional domain of Psychoticism. This study examined associations of International Classification of Diseases, 11th edition and Diagnostic and Statistical Manual of Mental Disorders, 5th edition trait domains, simultaneously, with categorical personality disorders. Psychiatric outpatients ( N = 226) were administered the Structured Clinical Interview for DSM-IV Axis II Personality Disorders Interview and the Personality Inventory for DSM-5. International Classification of Diseases, 11th edition and Diagnostic and Statistical Manual of Mental Disorders, 5th edition trait domain scores were obtained using pertinent scoring algorithms for the Personality Inventory for DSM-5. Associations between categorical personality disorders and trait domains were examined using correlation and multiple regression analyses. Both the International Classification of Diseases, 11th edition and the Diagnostic and Statistical Manual of Mental Disorders, 5th edition domain models showed relevant continuity with categorical personality disorders and captured a substantial amount of their information. As expected, the International Classification of Diseases, 11th edition model was superior in capturing obsessive-compulsive personality disorder, whereas the Diagnostic and Statistical Manual of Mental Disorders, 5th edition model was superior in capturing schizotypal personality disorder. These preliminary findings suggest that little information is 'lost' in a transition to trait domain models and potentially adds to narrowing the gap between Diagnostic and Statistical Manual of Mental Disorders, 5th edition and the proposed International Classification of Diseases, 11th edition model. Accordingly, the International Classification of Diseases, 11th edition and Diagnostic and Statistical Manual of Mental Disorders, 5th edition domain models may be used to delineate one another as well as features of familiar categorical personality disorder types. A preliminary category-to-domain 'cross walk' is provided in the article.

  4. Merging information from multi-model flood projections in a hierarchical Bayesian framework

    NASA Astrophysics Data System (ADS)

    Le Vine, Nataliya

    2016-04-01

    Multi-model ensembles are becoming widely accepted for flood frequency change analysis. The use of multiple models results in large uncertainty around estimates of flood magnitudes, due to both uncertainty in model selection and natural variability of river flow. The challenge is therefore to extract the most meaningful signal from the multi-model predictions, accounting for both model quality and uncertainties in individual model estimates. The study demonstrates the potential of a recently proposed hierarchical Bayesian approach to combine information from multiple models. The approach facilitates explicit treatment of shared multi-model discrepancy as well as the probabilistic nature of the flood estimates, by treating the available models as a sample from a hypothetical complete (but unobserved) set of models. The advantages of the approach are: 1) to ensure adequate 'baseline' conditions with which to compare future changes; 2) to reduce flood estimate uncertainty; 3) to maximize use of statistical information in circumstances where multiple weak predictions individually lack power, but collectively provide meaningful information; 4) to adjust multi-model consistency criteria when model biases are large; and 5) to explicitly consider the influence of the (model performance) stationarity assumption. Moreover, the analysis indicates that reducing shared model discrepancy is the key to further reduction of uncertainty in the flood frequency analysis. The findings are of value regarding how conclusions about changing exposure to flooding are drawn, and to flood frequency change attribution studies.
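
    As a rough illustration of the general idea of pooling multi-model flood estimates while allowing for a shared model discrepancy (a simplified stand-in, not the authors' hierarchical Bayesian implementation), the following Python sketch evaluates a random-effects style posterior on a grid. The flood estimates, standard errors, grid limits and flat priors are all invented for the example.

      import numpy as np

      # Hypothetical 100-year flood estimates (m^3/s) and standard errors from five models.
      est = np.array([820., 910., 760., 1005., 870.])
      se = np.array([90., 120., 100., 140., 95.])

      # Random-effects style hierarchy: est_i ~ N(mu, se_i^2 + tau^2),
      # where tau represents shared multi-model discrepancy; flat priors on mu and tau.
      mu_grid = np.linspace(500., 1300., 401)
      tau_grid = np.linspace(0., 400., 201)

      log_post = np.empty((tau_grid.size, mu_grid.size))
      for i, tau in enumerate(tau_grid):
          var = se**2 + tau**2
          # Gaussian log-likelihood summed over models, for every mu on the grid.
          log_post[i] = -0.5 * np.sum(
              np.log(2 * np.pi * var)[:, None]
              + (est[:, None] - mu_grid)**2 / var[:, None], axis=0)

      post = np.exp(log_post - log_post.max())
      post /= post.sum()
      mu_marginal = post.sum(axis=0)                       # marginal posterior of mu
      print("posterior mean flood estimate:",
            round(float(np.sum(mu_grid * mu_marginal)), 1), "m^3/s")

    The grid evaluation keeps the sketch dependency-free; a sampler or analytic normal-normal updates would serve the same purpose.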

  5. Backscattering from a randomly rough dielectric surface

    NASA Technical Reports Server (NTRS)

    Fung, Adrian K.; Li, Zongqian; Chen, K. S.

    1992-01-01

    A backscattering model for scattering from a randomly rough dielectric surface is developed based on an approximate solution of a pair of integral equations for the tangential surface fields. Both like and cross-polarized scattering coefficients are obtained. It is found that the like polarized scattering coefficients contain two types of terms: single scattering terms and multiple scattering terms. The single scattering terms in like polarized scattering are shown to reduce to the first-order solutions derived from the small perturbation method when the roughness parameters satisfy the slightly rough conditions. When surface roughnesses are large but the surface slope is small, only a single scattering term corresponding to the standard Kirchhoff model is significant. If the surface slope is large, the multiple scattering term will also be significant. The cross-polarized backscattering coefficients satisfy reciprocity and contain only multiple scattering terms. The difference between vertical and horizontal scattering coefficients is found to increase with the dielectric constant and is generally smaller than that predicted by the first-order small perturbation model. Good agreement is obtained between this model and measurements from statistically known surfaces.

  6. Assessing differential gene expression with small sample sizes in oligonucleotide arrays using a mean-variance model.

    PubMed

    Hu, Jianhua; Wright, Fred A

    2007-03-01

    The identification of the genes that are differentially expressed in two-sample microarray experiments remains a difficult problem when the number of arrays is very small. We discuss the implications of using ordinary t-statistics and examine other commonly used variants. For oligonucleotide arrays with multiple probes per gene, we introduce a simple model relating the mean and variance of expression, possibly with gene-specific random effects. Parameter estimates from the model have natural shrinkage properties that guard against inappropriately small variance estimates, and the model is used to obtain a differential expression statistic. A limiting value to the positive false discovery rate (pFDR) for ordinary t-tests provides motivation for our use of the data structure to improve variance estimates. Our approach performs well compared to other proposed approaches in terms of the false discovery rate.
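
    The following Python sketch mimics, in simplified form, the idea of shrinking per-gene variance estimates toward a fitted mean-variance trend before forming a differential-expression statistic. The simulated data, the quadratic trend and the fixed shrinkage weight are assumptions for illustration, not the authors' model.

      import numpy as np

      rng = np.random.default_rng(0)
      # Hypothetical log-expression matrix: 1000 genes, 3 arrays per group.
      g1 = rng.normal(5.0, 1.0, size=(1000, 3))
      g2 = rng.normal(5.0, 1.0, size=(1000, 3))

      m1, m2 = g1.mean(axis=1), g2.mean(axis=1)
      pooled_var = (g1.var(axis=1, ddof=1) + g2.var(axis=1, ddof=1)) / 2.0
      mean_expr = (m1 + m2) / 2.0

      # Fit a crude mean-variance trend (log variance vs mean expression) and shrink
      # each gene's variance toward the trend, guarding against tiny variance estimates.
      coef = np.polyfit(mean_expr, np.log(pooled_var + 1e-8), deg=2)
      trend_var = np.exp(np.polyval(coef, mean_expr))
      w = 0.5                                    # assumed shrinkage weight
      shrunk_var = w * trend_var + (1 - w) * pooled_var

      n = 3                                      # arrays per group
      t_mod = (m1 - m2) / np.sqrt(shrunk_var * (1.0 / n + 1.0 / n))
      print("largest |moderated t|:", round(float(np.abs(t_mod).max()), 2))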

  7. Scaling in geology: landforms and earthquakes.

    PubMed Central

    Turcotte, D L

    1995-01-01

    Landforms and earthquakes appear to be extremely complex; yet, there is order in the complexity. Both satisfy fractal statistics in a variety of ways. A basic question is whether the fractal behavior is due to scale invariance or is the signature of a broadly applicable class of physical processes. Both landscape evolution and regional seismicity appear to be examples of self-organized critical phenomena. A variety of statistical models have been proposed to model landforms, including diffusion-limited aggregation, self-avoiding percolation, and cellular automata. Many authors have studied the behavior of multiple slider-block models, both in terms of the rupture of a fault to generate an earthquake and in terms of the interactions between faults associated with regional seismicity. The slider-block models exhibit a remarkably rich spectrum of behavior; two slider blocks can exhibit low-order chaotic behavior. Large numbers of slider blocks clearly exhibit self-organized critical behavior. Images Fig. 6 PMID:11607562
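
    A minimal sandpile cellular automaton of the kind often used to illustrate self-organized criticality (related to, but much simpler than, the slider-block models discussed above) can be sketched in Python as follows; the lattice size, toppling threshold and number of grain drops are arbitrary choices for the example.

      import numpy as np

      rng = np.random.default_rng(1)
      N, threshold, steps = 50, 4, 20000
      grid = np.zeros((N, N), dtype=int)
      sizes = []

      for _ in range(steps):
          i, j = rng.integers(0, N, size=2)
          grid[i, j] += 1                              # drop one grain at a random site
          size = 0
          unstable = [(i, j)] if grid[i, j] >= threshold else []
          while unstable:
              x, y = unstable.pop()
              if grid[x, y] < threshold:
                  continue
              grid[x, y] -= threshold                  # topple: redistribute to neighbours
              size += 1
              for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                  nx, ny = x + dx, y + dy
                  if 0 <= nx < N and 0 <= ny < N:      # grains falling off the edge are lost
                      grid[nx, ny] += 1
                      if grid[nx, ny] >= threshold:
                          unstable.append((nx, ny))
          if size:
              sizes.append(size)

      sizes = np.array(sizes)
      print("avalanches:", sizes.size, " mean size:", round(float(sizes.mean()), 1))

    The heavy-tailed avalanche-size distribution produced by such toy models is the hallmark of the self-organized critical behavior described in the abstract.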

  8. A 3D object-based model to simulate highly-heterogeneous, coarse, braided river deposits

    NASA Astrophysics Data System (ADS)

    Huber, E.; Huggenberger, P.; Caers, J.

    2016-12-01

    There is a critical need in hydrogeological modeling for geologically more realistic representation of the subsurface. Indeed, widely-used representations of the subsurface heterogeneity based on smooth basis functions such as cokriging or the pilot-point approach fail at reproducing the connectivity of high permeable geological structures that control subsurface solute transport. To realistically model the connectivity of high permeable structures of coarse, braided river deposits, multiple-point statistics and object-based models are promising alternatives. We therefore propose a new object-based model that, according to a sedimentological model, mimics the dominant processes of floodplain dynamics. Contrarily to existing models, this object-based model possesses the following properties: (1) it is consistent with field observations (outcrops, ground-penetrating radar data, etc.), (2) it allows different sedimentological dynamics to be modeled that result in different subsurface heterogeneity patterns, (3) it is light in memory and computationally fast, and (4) it can be conditioned to geophysical data. In this model, the main sedimentological elements (scour fills with open-framework-bimodal gravel cross-beds, gravel sheet deposits, open-framework and sand lenses) and their internal structures are described by geometrical objects. Several spatial distributions are proposed that allow to simulate the horizontal position of the objects on the floodplain as well as the net rate of sediment deposition. The model is grid-independent and any vertical section can be computed algebraically. Furthermore, model realizations can serve as training images for multiple-point statistics. The significance of this model is shown by its impact on the subsurface flow distribution that strongly depends on the sedimentological dynamics modeled. The code will be provided as a free and open-source R-package.

  9. Multiple Point Statistics algorithm based on direct sampling and multi-resolution images

    NASA Astrophysics Data System (ADS)

    Julien, S.; Renard, P.; Chugunova, T.

    2017-12-01

    Multiple Point Statistics (MPS) has been popular for more than a decade in Earth Sciences, because these methods can generate random fields that reproduce highly complex spatial features given in a conceptual model, the training image, while classical geostatistics techniques based on bi-point statistics (covariance or variogram) fail to generate realistic models. Among MPS methods, direct sampling consists of borrowing patterns from the training image to populate a simulation grid. The grid is filled sequentially by visiting its nodes in random order; the patterns, whose number of nodes is fixed, become narrower during the simulation process as the simulation grid becomes more densely informed. Hence, large-scale structures are caught at the beginning of the simulation and small-scale ones at the end. However, MPS may mix spatial characteristics distinguishable at different scales in the training image, and thus lose the spatial arrangement of different structures. To overcome this limitation, we propose to perform MPS simulation using a decomposition of the training image into a set of images at multiple resolutions. Applying a Gaussian kernel onto the training image (convolution) results in a lower resolution image, and by iterating this process a pyramid of images depicting fewer details at each level is built, as is done in image processing, for example, to reduce the storage size of a photograph. Direct sampling is then employed to simulate the lowest resolution level, and then to simulate each level, up to the finest resolution, conditioned on the level one rank coarser. This scheme helps reproduce the spatial structures at any scale of the training image and thus generate more realistic models. We illustrate the method with aerial photographs (satellite images) and natural textures. Indeed, these kinds of images often display typical structures at different scales and are well-suited for MPS simulation techniques.
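
    Only the multi-resolution decomposition step (not the direct-sampling simulation itself) is easy to sketch compactly; a hedged Python illustration is below, with the training image, smoothing parameter and number of levels as placeholders.

      import numpy as np
      from scipy.ndimage import gaussian_filter

      def pyramid(training_image, levels=3, sigma=1.0):
          """Build a list of progressively smoothed and decimated versions of the image."""
          out = [training_image.astype(float)]
          for _ in range(levels):
              smoothed = gaussian_filter(out[-1], sigma=sigma)   # Gaussian kernel convolution
              out.append(smoothed[::2, ::2])                     # decimate by 2 in each direction
          return out                                             # out[0] finest, out[-1] coarsest

      # Hypothetical binary channel / no-channel training image.
      rng = np.random.default_rng(0)
      ti = (rng.random((128, 128)) < 0.3).astype(float)
      levels = pyramid(ti, levels=3)
      print([lvl.shape for lvl in levels])

    In the method described above, simulation would start on the coarsest level and each finer level would be simulated conditioned on the one just simulated.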

  10. Multiple-Solution Problems in a Statistics Classroom: An Example

    ERIC Educational Resources Information Center

    Chu, Chi Wing; Chan, Kevin L. T.; Chan, Wai-Sum; Kwong, Koon-Shing

    2017-01-01

    The mathematics education literature shows that encouraging students to develop multiple solutions for given problems has a positive effect on students' understanding and creativity. In this paper, we present an example of multiple-solution problems in statistics involving a set of non-traditional dice. In particular, we consider the exact…
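
    As an illustration of the kind of probability calculation such non-traditional dice invite, the short Python sketch below enumerates outcomes for a classic non-transitive set; the dice faces are a textbook example, not necessarily the set used in the article.

      from itertools import product
      from fractions import Fraction

      # A classic set of non-transitive dice (illustrative example only).
      dice = {
          "A": [2, 2, 4, 4, 9, 9],
          "B": [1, 1, 6, 6, 8, 8],
          "C": [3, 3, 5, 5, 7, 7],
      }

      def p_beats(d1, d2):
          """Exact probability that a roll of d1 beats a roll of d2."""
          wins = sum(a > b for a, b in product(d1, d2))
          return Fraction(wins, len(d1) * len(d2))

      for x, y in [("A", "B"), ("B", "C"), ("C", "A")]:
          print(f"P({x} beats {y}) = {p_beats(dice[x], dice[y])}")

    Each printed probability is 5/9, so A beats B, B beats C, and yet C beats A, which is exactly the sort of counter-intuitive result that supports multiple solution strategies in class.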

  11. ACIRF user's guide: Theory and examples

    NASA Astrophysics Data System (ADS)

    Dana, Roger A.

    1989-12-01

    Design and evaluation of radio frequency systems that must operate through ionospheric disturbances resulting from high altitude nuclear detonations requires an accurate channel model. This model must include the effects of high gain antennas that may be used to receive the signals. Such a model can then be used to construct realizations of the received signal for use in digital simulations of trans-ionospheric links or for use in hardware channel simulators. The FORTRAN channel model ACIRF (Antenna Channel Impulse Response Function) generates random realizations of the impulse response function at the outputs of multiple antennas. This user's guide describes the FORTRAN program ACIRF (version 2.0) that generates realizations of channel impulse response functions at the outputs of multiple antennas with arbitrary beamwidths, pointing angles, and relative positions. This channel model is valid under strong scattering conditions when Rayleigh fading statistics apply. Both frozen-in and turbulent models for the temporal fluctuations are included in this version of ACIRF. The theory of the channel model is described and several examples are given.
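
    A simplified Python sketch of generating Rayleigh-fading impulse-response realizations (ignoring ACIRF's antenna patterns, scattering spectra and temporal-fluctuation models) is shown below; the exponential power-delay profile and array dimensions are assumptions made for the example.

      import numpy as np

      rng = np.random.default_rng(0)
      n_antennas, n_delays, n_samples = 4, 16, 1000

      # Assumed exponential power-delay profile, normalized to unit total power.
      delay_power = np.exp(-np.arange(n_delays) / 4.0)
      delay_power /= delay_power.sum()

      # Complex circular Gaussian taps => Rayleigh-distributed amplitudes.
      h = (rng.normal(size=(n_samples, n_antennas, n_delays))
           + 1j * rng.normal(size=(n_samples, n_antennas, n_delays))) / np.sqrt(2.0)
      h *= np.sqrt(delay_power)                 # shape the average tap powers

      print("average tap powers (first 4, should follow the delay profile):",
            np.round((np.abs(h) ** 2).mean(axis=(0, 1))[:4], 3))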

  12. Predicting future protection of respirator users: Statistical approaches and practical implications.

    PubMed

    Hu, Chengcheng; Harber, Philip; Su, Jing

    2016-01-01

    The purpose of this article is to describe a statistical approach for predicting a respirator user's fit factor in the future based upon results from initial tests. A statistical prediction model was developed based upon joint distribution of multiple fit factor measurements over time obtained from linear mixed effect models. The model accounts for within-subject correlation as well as short-term (within one day) and longer-term variability. As an example of applying this approach, model parameters were estimated from a research study in which volunteers were trained by three different modalities to use one of two types of respirators. They underwent two quantitative fit tests at the initial session and two on the same day approximately six months later. The fitted models demonstrated correlation and gave the estimated distribution of future fit test results conditional on past results for an individual worker. This approach can be applied to establishing a criterion value for passing an initial fit test to provide reasonable likelihood that a worker will be adequately protected in the future; and to optimizing the repeat fit factor test intervals individually for each user for cost-effective testing.
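
    For illustration, the Python sketch below shows how a simple random-intercept model with known variance components yields the conditional distribution of a future (log) fit factor given initial tests; the variance components, population mean and pass threshold are invented, and the authors' fitted mixed-effects models are more general (short- and longer-term variability).

      import numpy as np
      from scipy import stats

      # Assumed variance components on the log10 fit-factor scale (illustrative only).
      mu = 2.2        # population mean log10 fit factor
      tau2 = 0.04     # between-worker variance
      sigma2 = 0.02   # within-worker variance

      def future_fit_distribution(initial_tests):
          """Conditional distribution of a future log10 fit factor given initial tests,
          under a random-intercept model with known variance components."""
          m = len(initial_tests)
          xbar = float(np.mean(initial_tests))
          shrink = tau2 / (tau2 + sigma2 / m)                 # shrinkage toward the population mean
          mean_future = mu + shrink * (xbar - mu)
          var_future = (tau2 * sigma2 / (m * tau2 + sigma2)) + sigma2
          return mean_future, np.sqrt(var_future)

      m_f, s_f = future_fit_distribution([2.5, 2.4])
      # Probability that the future fit factor exceeds 100, i.e. log10 value exceeds 2.
      print("P(future fit factor > 100) =", round(1 - stats.norm.cdf(2.0, m_f, s_f), 3))

    A criterion value for the initial test could then be chosen so that this predicted probability stays above a desired level, which is the spirit of the application described above.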

  13. Robust Combining of Disparate Classifiers Through Order Statistics

    NASA Technical Reports Server (NTRS)

    Tumer, Kagan; Ghosh, Joydeep

    2001-01-01

    Integrating the outputs of multiple classifiers via combiners or meta-learners has led to substantial improvements in several difficult pattern recognition problems. In this article we investigate a family of combiners based on order statistics, for robust handling of situations where there are large discrepancies in performance of individual classifiers. Based on a mathematical modeling of how the decision boundaries are affected by order statistic combiners, we derive expressions for the reductions in error expected when simple output combination methods based on the median, the maximum and, in general, the ith order statistic are used. Furthermore, we analyze the trim and spread combiners, both based on linear combinations of the ordered classifier outputs, and show that in the presence of uneven classifier performance, they often provide substantial gains over both linear and simple order statistics combiners. Experimental results on both real world data and standard public domain data sets corroborate these findings.
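
    A brief Python sketch of order-statistic combination of classifier outputs (median, maximum, and a trimmed mean of the ordered outputs as a simple trim-style combiner) follows; the classifier outputs are simulated placeholders rather than real model scores.

      import numpy as np

      rng = np.random.default_rng(0)
      # Hypothetical posterior outputs from 5 classifiers for 8 samples and 3 classes.
      outputs = rng.dirichlet(np.ones(3), size=(8, 5))     # shape (samples, classifiers, classes)

      def combine(outputs, rule="median", trim=1):
          ordered = np.sort(outputs, axis=1)               # order statistics across classifiers
          if rule == "median":
              return np.median(outputs, axis=1)
          if rule == "max":
              return ordered[:, -1, :]
          if rule == "trim":                               # trimmed mean of the ordered outputs
              return ordered[:, trim:-trim, :].mean(axis=1)
          raise ValueError(rule)

      for rule in ("median", "max", "trim"):
          preds = combine(outputs, rule).argmax(axis=1)
          print(rule, preds)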

  14. Asymptotic Linear Spectral Statistics for Spiked Hermitian Random Matrices

    NASA Astrophysics Data System (ADS)

    Passemier, Damien; McKay, Matthew R.; Chen, Yang

    2015-07-01

    Using the Coulomb Fluid method, this paper derives central limit theorems (CLTs) for linear spectral statistics of three "spiked" Hermitian random matrix ensembles. These include Johnstone's spiked model (i.e., central Wishart with spiked correlation), non-central Wishart with rank-one non-centrality, and a related class of non-central matrices. For a generic linear statistic, we derive simple and explicit CLT expressions as the matrix dimensions grow large. For all three ensembles under consideration, we find that the primary effect of the spike is to introduce a correction term to the asymptotic mean of the linear spectral statistic, which we characterize with simple formulas. The utility of our proposed framework is demonstrated through application to three different linear statistics problems: the classical likelihood ratio test for a population covariance, the capacity analysis of multi-antenna wireless communication systems with a line-of-sight transmission path, and a classical multiple sample significance testing problem.

  15. Statistical Selection of Biological Models for Genome-Wide Association Analyses.

    PubMed

    Bi, Wenjian; Kang, Guolian; Pounds, Stanley B

    2018-05-24

    Genome-wide association studies have discovered many biologically important associations of genes with phenotypes. Typically, genome-wide association analyses formally test the association of each genetic feature (SNP, CNV, etc) with the phenotype of interest and summarize the results with multiplicity-adjusted p-values. However, very small p-values only provide evidence against the null hypothesis of no association without indicating which biological model best explains the observed data. Correctly identifying a specific biological model may improve the scientific interpretation and can be used to more effectively select and design a follow-up validation study. Thus, statistical methodology to identify the correct biological model for a particular genotype-phenotype association can be very useful to investigators. Here, we propose a general statistical method to summarize how accurately each of five biological models (null, additive, dominant, recessive, co-dominant) represents the data observed for each variant in a GWAS study. We show that the new method stringently controls the false discovery rate and asymptotically selects the correct biological model. Simulations of two-stage discovery-validation studies show that the new method has these properties and that its validation power is similar to or exceeds that of simple methods that use the same statistical model for all SNPs. Example analyses of three data sets also highlight these advantages of the new method. An R package is freely available at www.stjuderesearch.org/site/depts/biostats/maew. Copyright © 2018. Published by Elsevier Inc.
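
    As a rough stand-in for the idea of scoring competing genetic models at a variant (a BIC comparison on simulated data, not the authors' false-discovery-rate-controlling procedure), the Python sketch below fits logistic regressions under null, additive, dominant, recessive and co-dominant codings; the simulated genotype frequencies and effect size are assumptions.

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(0)
      n = 2000
      g = rng.choice([0, 1, 2], size=n, p=[0.49, 0.42, 0.09])   # genotype (minor-allele count)
      # Simulate a recessive effect on a binary phenotype (illustrative only).
      logit = -1.0 + 1.2 * (g == 2)
      y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

      designs = {
          "null":       np.ones((n, 1)),
          "additive":   np.column_stack([np.ones(n), g]),
          "dominant":   np.column_stack([np.ones(n), g >= 1]),
          "recessive":  np.column_stack([np.ones(n), g == 2]),
          "codominant": np.column_stack([np.ones(n), g == 1, g == 2]),
      }

      for name, X in designs.items():
          res = sm.Logit(y, X.astype(float)).fit(disp=0)
          bic = X.shape[1] * np.log(n) - 2 * res.llf          # BIC computed from the log-likelihood
          print(f"{name:10s} BIC = {bic:.1f}")

    With the recessive simulation above, the recessive coding should usually attain the smallest BIC, which is the flavour of "selecting the correct biological model" that the article formalizes with proper error control.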

  16. graph-GPA: A graphical model for prioritizing GWAS results and investigating pleiotropic architecture.

    PubMed

    Chung, Dongjun; Kim, Hang J; Zhao, Hongyu

    2017-02-01

    Genome-wide association studies (GWAS) have identified tens of thousands of genetic variants associated with hundreds of phenotypes and diseases, which have provided clinical and medical benefits to patients with novel biomarkers and therapeutic targets. However, identification of risk variants associated with complex diseases remains challenging as they are often affected by many genetic variants with small or moderate effects. There has been accumulating evidence suggesting that different complex traits share common risk basis, namely pleiotropy. Recently, several statistical methods have been developed to improve statistical power to identify risk variants for complex traits through a joint analysis of multiple GWAS datasets by leveraging pleiotropy. While these methods were shown to improve statistical power for association mapping compared to separate analyses, they are still limited in the number of phenotypes that can be integrated. In order to address this challenge, in this paper, we propose a novel statistical framework, graph-GPA, to integrate a large number of GWAS datasets for multiple phenotypes using a hidden Markov random field approach. Application of graph-GPA to a joint analysis of GWAS datasets for 12 phenotypes shows that graph-GPA improves statistical power to identify risk variants compared to statistical methods based on smaller number of GWAS datasets. In addition, graph-GPA also promotes better understanding of genetic mechanisms shared among phenotypes, which can potentially be useful for the development of improved diagnosis and therapeutics. The R implementation of graph-GPA is currently available at https://dongjunchung.github.io/GGPA/.

  17. A Comparison of Approximation Modeling Techniques: Polynomial Versus Interpolating Models

    NASA Technical Reports Server (NTRS)

    Giunta, Anthony A.; Watson, Layne T.

    1998-01-01

    Two methods of creating approximation models are compared through the calculation of the modeling accuracy on test problems involving one, five, and ten independent variables. Here, the test problems are representative of the modeling challenges typically encountered in realistic engineering optimization problems. The first approximation model is a quadratic polynomial created using the method of least squares. This type of polynomial model has seen considerable use in recent engineering optimization studies due to its computational simplicity and ease of use. However, quadratic polynomial models may be of limited accuracy when the response data to be modeled have multiple local extrema. The second approximation model employs an interpolation scheme known as kriging developed in the fields of spatial statistics and geostatistics. This class of interpolating model has the flexibility to model response data with multiple local extrema. However, this flexibility is obtained at an increase in computational expense and a decrease in ease of use. The intent of this study is to provide an initial exploration of the accuracy and modeling capabilities of these two approximation methods.
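
    The contrast between the two approximation types can be sketched in Python as below, using a one-dimensional test function with multiple local extrema, a least-squares quadratic, and a simple noise-free kriging/Gaussian-process interpolator with a squared-exponential kernel; the test function and kernel length scale are assumptions, not the paper's test problems.

      import numpy as np

      def f(x):                                  # test response with multiple local extrema
          return np.sin(3 * x) + 0.5 * x

      x_train = np.linspace(0, 4, 9)
      y_train = f(x_train)
      x_test = np.linspace(0, 4, 200)

      # 1) Quadratic polynomial by least squares.
      poly = np.polyfit(x_train, y_train, deg=2)
      y_poly = np.polyval(poly, x_test)

      # 2) Simple kriging / GP interpolation with a squared-exponential kernel.
      def k(a, b, ell=0.6):
          return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

      K = k(x_train, x_train) + 1e-10 * np.eye(x_train.size)   # jitter for numerical stability
      weights = np.linalg.solve(K, y_train)
      y_gp = k(x_test, x_train) @ weights

      for name, pred in [("quadratic", y_poly), ("kriging", y_gp)]:
          rmse = np.sqrt(np.mean((pred - f(x_test))**2))
          print(f"{name:10s} RMSE = {rmse:.3f}")

    On this kind of multimodal response the interpolating model typically achieves a lower error than the quadratic, at the cost of building and solving the kernel system.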

  18. Dose response explorer: an integrated open-source tool for exploring and modelling radiotherapy dose volume outcome relationships

    NASA Astrophysics Data System (ADS)

    El Naqa, I.; Suneja, G.; Lindsay, P. E.; Hope, A. J.; Alaly, J. R.; Vicic, M.; Bradley, J. D.; Apte, A.; Deasy, J. O.

    2006-11-01

    Radiotherapy treatment outcome models are a complicated function of treatment, clinical and biological factors. Our objective is to provide clinicians and scientists with an accurate, flexible and user-friendly software tool to explore radiotherapy outcomes data and build statistical tumour control or normal tissue complications models. The software tool, called the dose response explorer system (DREES), is based on Matlab, and uses a named-field structure array data type. DREES/Matlab in combination with another open-source tool (CERR) provides an environment for analysing treatment outcomes. DREES provides many radiotherapy outcome modelling features, including (1) fitting of analytical normal tissue complication probability (NTCP) and tumour control probability (TCP) models, (2) combined modelling of multiple dose-volume variables (e.g., mean dose, max dose, etc) and clinical factors (age, gender, stage, etc) using multi-term regression modelling, (3) manual or automated selection of logistic or actuarial model variables using bootstrap statistical resampling, (4) estimation of uncertainty in model parameters, (5) performance assessment of univariate and multivariate analyses using Spearman's rank correlation and chi-square statistics, boxplots, nomograms, Kaplan-Meier survival plots, and receiver operating characteristics curves, and (6) graphical capabilities to visualize NTCP or TCP prediction versus selected variable models using various plots. DREES provides clinical researchers with a tool customized for radiotherapy outcome modelling. DREES is freely distributed. We expect to continue developing DREES based on user feedback.

  19. Disconcordance in Statistical Models of Bisphenol A and Chronic Disease Outcomes in NHANES 2003-08

    PubMed Central

    Casey, Martin F.; Neidell, Matthew

    2013-01-01

    Background Bisphenol A (BPA), a high production chemical commonly found in plastics, has drawn great attention from researchers due to the substance’s potential toxicity. Using data from three National Health and Nutrition Examination Survey (NHANES) cycles, we explored the consistency and robustness of BPA’s reported effects on coronary heart disease and diabetes. Methods And Findings We report the use of three different statistical models in the analysis of BPA: (1) logistic regression, (2) log-linear regression, and (3) dose-response logistic regression. In each variation, confounders were added in six blocks to account for demographics, urinary creatinine, source of BPA exposure, healthy behaviours, and phthalate exposure. Results were sensitive to the variations in functional form of our statistical models, but no single model yielded consistent results across NHANES cycles. Reported ORs were also found to be sensitive to inclusion/exclusion criteria. Further, observed effects, which were most pronounced in NHANES 2003-04, could not be explained away by confounding. Conclusions Limitations in the NHANES data and a poor understanding of the mode of action of BPA have made it difficult to develop informative statistical models. Given the sensitivity of effect estimates to functional form, researchers should report results using multiple specifications with different assumptions about BPA measurement, thus allowing for the identification of potential discrepancies in the data. PMID:24223205

  20. Statistical experiments using the multiple regression research for prediction of proper hardness in areas of phosphorus cast-iron brake shoes manufacturing

    NASA Astrophysics Data System (ADS)

    Kiss, I.; Cioată, V. G.; Ratiu, S. A.; Rackov, M.; Penčić, M.

    2018-01-01

    Multivariate research is important in cast-iron brake shoe manufacturing because many variables interact with each other simultaneously. This article focuses on a multiple linear regression model relating the hardness of phosphorous cast irons destined for brake shoes to their chemical composition, the regression coefficients illustrating the separate contribution of each independent variable to the prediction of the dependent variable. Several regression equations are proposed to establish the multiple correlations between the hardness of the cast-iron brake shoes and their chemical compositions, and a mathematical solution is sought that can determine the optimum chemical composition for the desired hardness values. Starting from these considerations, two new statistical experiments are carried out on the values of Phosphorus [P], Manganese [Mn] and Silicon [Si], and the regression equations describing the mathematical dependence between these elements and the hardness are determined. As a result, several correlation charts are presented.
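
    A minimal Python sketch of the kind of multiple linear regression described (hardness regressed on P, Mn and Si) follows; the composition and hardness values are fabricated for illustration and are not the study's measurements.

      import numpy as np

      # Hypothetical composition (wt.%) and Brinell hardness data.
      P  = np.array([0.45, 0.52, 0.61, 0.70, 0.55, 0.66, 0.48, 0.72])
      Mn = np.array([0.60, 0.72, 0.55, 0.80, 0.65, 0.70, 0.58, 0.75])
      Si = np.array([1.80, 1.95, 2.10, 2.30, 1.90, 2.20, 1.85, 2.25])
      HB = np.array([197., 205., 214., 226., 206., 220., 200., 228.])

      X = np.column_stack([np.ones_like(P), P, Mn, Si])
      coef, *_ = np.linalg.lstsq(X, HB, rcond=None)        # ordinary least squares
      b0, bP, bMn, bSi = coef
      print(f"HB = {b0:.1f} + {bP:.1f}*P + {bMn:.1f}*Mn + {bSi:.1f}*Si")

      # Predicted hardness for one candidate composition.
      print("predicted HB:", round(float(b0 + bP * 0.6 + bMn * 0.7 + bSi * 2.0), 1))

    Inverting such an equation for a target hardness is then a matter of constrained optimization over the admissible composition ranges.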

  1. A quantitative model of application slow-down in multi-resource shared systems

    DOE PAGES

    Lim, Seung-Hwan; Kim, Youngjae

    2016-12-26

    Scheduling multiple jobs onto a platform enhances system utilization by sharing resources. The benefits from higher resource utilization include reduced cost to construct, operate, and maintain a system, which often includes energy consumption. Maximizing these benefits comes at a price: resource contention among jobs increases job completion time. In this study, we analyze slow-downs of jobs due to contention for multiple resources in a system, referred to as the dilation factor. We observe that multiple-resource contention creates non-linear dilation factors of jobs. From this observation, we establish a general quantitative model for dilation factors of jobs in multi-resource systems. A job is characterized by vector-valued loading statistics, and the dilation factors of a job set are given by a quadratic function of their loading vectors. We demonstrate how to systematically characterize a job, maintain the data structure to calculate the dilation factor (loading matrix), and calculate the dilation factor of each job. We validate the accuracy of the model with multiple processes running on a native Linux server, virtualized servers, and with multiple MapReduce workloads co-scheduled in a cluster. Evaluation with measured data shows that the D-factor model has an error margin of less than 16%. We extended the D-factor model to capture the slow-down of applications when multiple identical resources exist, such as multi-core and multi-disk environments. Finally, validation results of the extended D-factor model with HPC checkpoint applications on parallel file systems show that D-factor accurately captures the slow-down of concurrent applications in such environments.

  2. A quantitative model of application slow-down in multi-resource shared systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lim, Seung-Hwan; Kim, Youngjae

    Scheduling multiple jobs onto a platform enhances system utilization by sharing resources. The benefits from higher resource utilization include reduced cost to construct, operate, and maintain a system, which often includes energy consumption. Maximizing these benefits comes at a price: resource contention among jobs increases job completion time. In this study, we analyze slow-downs of jobs due to contention for multiple resources in a system, referred to as the dilation factor. We observe that multiple-resource contention creates non-linear dilation factors of jobs. From this observation, we establish a general quantitative model for dilation factors of jobs in multi-resource systems. A job is characterized by vector-valued loading statistics, and the dilation factors of a job set are given by a quadratic function of their loading vectors. We demonstrate how to systematically characterize a job, maintain the data structure to calculate the dilation factor (loading matrix), and calculate the dilation factor of each job. We validate the accuracy of the model with multiple processes running on a native Linux server, virtualized servers, and with multiple MapReduce workloads co-scheduled in a cluster. Evaluation with measured data shows that the D-factor model has an error margin of less than 16%. We extended the D-factor model to capture the slow-down of applications when multiple identical resources exist, such as multi-core and multi-disk environments. Finally, validation results of the extended D-factor model with HPC checkpoint applications on parallel file systems show that D-factor accurately captures the slow-down of concurrent applications in such environments.

  3. Simplified estimation of age-specific reference intervals for skewed data.

    PubMed

    Wright, E M; Royston, P

    1997-12-30

    Age-specific reference intervals are commonly used in medical screening and clinical practice, where interest lies in the detection of extreme values. Many different statistical approaches have been published on this topic. The advantages of a parametric method are that they necessarily produce smooth centile curves, the entire density is estimated and an explicit formula is available for the centiles. The method proposed here is a simplified version of a recent approach proposed by Royston and Wright. Basic transformations of the data and multiple regression techniques are combined to model the mean, standard deviation and skewness. Using these simple tools, which are implemented in almost all statistical computer packages, age-specific reference intervals may be obtained. The scope of the method is illustrated by fitting models to several real data sets and assessing each model using goodness-of-fit techniques.
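
    For illustration only, the Python sketch below captures the spirit of the approach (modeling the mean and the spread as simple functions of age and reading off centiles), while omitting the data transformation and skewness modeling of the actual method; the simulated data and model forms are assumptions.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(0)
      # Hypothetical measurements whose mean and spread both grow with age.
      age = rng.uniform(20, 40, size=500)
      y = rng.normal(loc=2.0 + 0.15 * age, scale=0.05 * age)

      # Model the mean as a linear function of age.
      mean_coef = np.polyfit(age, y, deg=1)
      resid = y - np.polyval(mean_coef, age)

      # Estimate the SD within age bins, then regress log(SD) on age for a smooth trend.
      edges = np.linspace(20, 40, 9)
      centers = 0.5 * (edges[:-1] + edges[1:])
      idx = np.digitize(age, edges) - 1
      bin_sd = np.array([resid[idx == b].std(ddof=1) for b in range(centers.size)])
      logsd_coef = np.polyfit(centers, np.log(bin_sd), deg=1)

      def reference_interval(a, coverage=0.95):
          z = stats.norm.ppf(0.5 + coverage / 2.0)
          mu = np.polyval(mean_coef, a)
          sd = np.exp(np.polyval(logsd_coef, a))
          return mu - z * sd, mu + z * sd

      lo, hi = reference_interval(30.0)
      print(f"95% reference interval at age 30: ({lo:.2f}, {hi:.2f})")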

  4. Statistical interpretation of transient current power-law decay in colloidal quantum dot arrays

    NASA Astrophysics Data System (ADS)

    Sibatov, R. T.

    2011-08-01

    A new statistical model of the charge transport in colloidal quantum dot arrays is proposed. It takes into account Coulomb blockade forbidding multiple occupancy of nanocrystals and the influence of energetic disorder of interdot space. The model explains power-law current transients and the presence of the memory effect. The fractional differential analogue of the Ohm law is found phenomenologically for nanocrystal arrays. The model combines ideas that were considered as conflicting by other authors: the Scher-Montroll idea about the power-law distribution of waiting times in localized states for disordered semiconductors is applied taking into account Coulomb blockade; Novikov's condition about the asymptotic power-law distribution of time intervals between successful current pulses in conduction channels is fulfilled; and the carrier injection blocking predicted by Ginger and Greenham (2000 J. Appl. Phys. 87 1361) takes place.

  5. An Integrative Account of Constraints on Cross-Situational Learning

    PubMed Central

    Yurovsky, Daniel; Frank, Michael C.

    2015-01-01

    Word-object co-occurrence statistics are a powerful information source for vocabulary learning, but there is considerable debate about how learners actually use them. While some theories hold that learners accumulate graded, statistical evidence about multiple referents for each word, others suggest that they track only a single candidate referent. In two large-scale experiments, we show that neither account is sufficient: Cross-situational learning involves elements of both. Further, the empirical data are captured by a computational model that formalizes how memory and attention interact with co-occurrence tracking. Together, the data and model unify opposing positions in a complex debate and underscore the value of understanding the interaction between computational and algorithmic levels of explanation. PMID:26302052

  6. Desensitized Optimal Filtering and Sensor Fusion Toolkit

    NASA Technical Reports Server (NTRS)

    Karlgaard, Christopher D.

    2015-01-01

    Analytical Mechanics Associates, Inc., has developed a software toolkit that filters and processes navigational data from multiple sensor sources. A key component of the toolkit is a trajectory optimization technique that reduces the sensitivity of Kalman filters with respect to model parameter uncertainties. The sensor fusion toolkit also integrates recent advances in adaptive Kalman and sigma-point filters for non-Gaussian problems with error statistics. This Phase II effort provides new filtering and sensor fusion techniques in a convenient package that can be used as a stand-alone application for ground support and/or onboard use. Its modular architecture enables ready integration with existing tools. A suite of sensor models and noise distribution as well as Monte Carlo analysis capability are included to enable statistical performance evaluations.

  7. Detecting epistasis with the marginal epistasis test in genetic mapping studies of quantitative traits

    PubMed Central

    Zeng, Ping; Mukherjee, Sayan; Zhou, Xiang

    2017-01-01

    Epistasis, commonly defined as the interaction between multiple genes, is an important genetic component underlying phenotypic variation. Many statistical methods have been developed to model and identify epistatic interactions between genetic variants. However, because of the large combinatorial search space of interactions, most epistasis mapping methods face enormous computational challenges and often suffer from low statistical power due to multiple test correction. Here, we present a novel, alternative strategy for mapping epistasis: instead of directly identifying individual pairwise or higher-order interactions, we focus on mapping variants that have non-zero marginal epistatic effects—the combined pairwise interaction effects between a given variant and all other variants. By testing marginal epistatic effects, we can identify candidate variants that are involved in epistasis without the need to identify the exact partners with which the variants interact, thus potentially alleviating much of the statistical and computational burden associated with standard epistatic mapping procedures. Our method is based on a variance component model, and relies on a recently developed variance component estimation method for efficient parameter inference and p-value computation. We refer to our method as the “MArginal ePIstasis Test”, or MAPIT. With simulations, we show how MAPIT can be used to estimate and test marginal epistatic effects, produce calibrated test statistics under the null, and facilitate the detection of pairwise epistatic interactions. We further illustrate the benefits of MAPIT in a QTL mapping study by analyzing the gene expression data of over 400 individuals from the GEUVADIS consortium. PMID:28746338

  8. Two approaches to incorporate clinical data uncertainty into multiple criteria decision analysis for benefit-risk assessment of medicinal products.

    PubMed

    Wen, Shihua; Zhang, Lanju; Yang, Bo

    2014-07-01

    The Problem formulation, Objectives, Alternatives, Consequences, Trade-offs, Uncertainties, Risk attitude, and Linked decisions (PrOACT-URL) framework and multiple criteria decision analysis (MCDA) have been recommended by the European Medicines Agency for structured benefit-risk assessment of medicinal products undergoing regulatory review. The objective of this article was to provide solutions to incorporate the uncertainty from clinical data into the MCDA model when evaluating the overall benefit-risk profiles among different treatment options. Two statistical approaches, the δ-method approach and the Monte-Carlo approach, were proposed to construct the confidence interval of the overall benefit-risk score from the MCDA model as well as other probabilistic measures for comparing the benefit-risk profiles between treatment options. Both approaches can incorporate the correlation structure between clinical parameters (criteria) in the MCDA model and are straightforward to implement. The two proposed approaches were applied to a case study to evaluate the benefit-risk profile of an add-on therapy for rheumatoid arthritis (drug X) relative to placebo. It demonstrated a straightforward way to quantify the impact of the uncertainty from clinical data to the benefit-risk assessment and enabled statistical inference on evaluating the overall benefit-risk profiles among different treatment options. The δ-method approach provides a closed form to quantify the variability of the overall benefit-risk score in the MCDA model, whereas the Monte-Carlo approach is more computationally intensive but can yield its true sampling distribution for statistical inference. The obtained confidence intervals and other probabilistic measures from the two approaches enhance the benefit-risk decision making of medicinal products. Copyright © 2014 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
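
    A hedged Python sketch of the Monte-Carlo approach (propagating clinical-parameter uncertainty into the difference in overall MCDA scores) is given below; the weights, value-scaled estimates and standard errors are invented, and correlations between criteria, which the article's methods accommodate, are ignored here for brevity.

      import numpy as np

      rng = np.random.default_rng(0)
      n_draws = 10000

      # MCDA weights for three criteria (one benefit, two risks); assumed for illustration.
      weights = np.array([0.5, 0.3, 0.2])

      # Clinical estimates already mapped to 0-1 value scales, with standard errors
      # (hypothetical numbers for "drug X" and placebo).
      mean_x, se_x = np.array([0.70, 0.80, 0.90]), np.array([0.05, 0.04, 0.03])
      mean_pl, se_pl = np.array([0.40, 0.95, 0.97]), np.array([0.05, 0.02, 0.02])

      # Monte-Carlo propagation of clinical-data uncertainty into the overall score.
      draws_x = rng.normal(mean_x, se_x, size=(n_draws, 3))
      draws_pl = rng.normal(mean_pl, se_pl, size=(n_draws, 3))
      diff = (draws_x - draws_pl) @ weights        # difference in overall benefit-risk score

      lo, hi = np.percentile(diff, [2.5, 97.5])
      print(f"score difference: {diff.mean():.3f}  (95% CI {lo:.3f} to {hi:.3f})")
      print("P(drug X scores higher than placebo):", round(float((diff > 0).mean()), 3))

    Replacing the independent normal draws with draws from a multivariate normal would incorporate the correlation structure between criteria mentioned in the abstract.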

  9. SOCR Analyses – an Instructional Java Web-based Statistical Analysis Toolkit

    PubMed Central

    Chu, Annie; Cui, Jenny; Dinov, Ivo D.

    2011-01-01

    The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most updated information and newly added models. PMID:21546994

  10. Parallel computing for automated model calibration

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Burke, John S.; Danielson, Gary R.; Schulz, Douglas A.

    2002-07-29

    Natural resources model calibration is a significant burden on computing and staff resources in modeling efforts. Most assessments must consider multiple calibration objectives (for example, magnitude and timing of stream flow peak). An automated calibration process that allows real-time updating of data/models, freeing scientists to focus effort on improving models, is needed. We are in the process of building a fully featured multi-objective calibration tool capable of processing multiple models cheaply and efficiently using null cycle computing. Our parallel processing and calibration software routines have been written generically, but our focus has been on natural resources model calibration. So far, the natural resources models have been friendly to parallel calibration efforts in that they require no inter-process communication, only need a small amount of input data and only output a small amount of statistical information for each calibration run. A typical auto calibration run might involve running a model 10,000 times with a variety of input parameters and summary statistical output. In the past, model calibration has been done against individual models for each data set. The individual model runs are relatively fast, ranging from seconds to minutes. The process was run on a single computer using a simple iterative process. We have completed two Auto Calibration prototypes and are currently designing a more feature-rich tool. Our prototypes have focused on running the calibration in a distributed, cross-platform computing environment. They allow incorporation of 'smart' calibration parameter generation (using artificial intelligence processing techniques). Null cycle computing similar to SETI@Home has also been a focus of our efforts. This paper details the design of the latest prototype and discusses our plans for the next revision of the software.

  11. Identification and Correction of Additive and Multiplicative Spatial Biases in Experimental High-Throughput Screening.

    PubMed

    Mazoure, Bogdan; Caraus, Iurie; Nadon, Robert; Makarenkov, Vladimir

    2018-06-01

    Data generated by high-throughput screening (HTS) technologies are prone to spatial bias. Traditionally, bias correction methods used in HTS assume either a simple additive or, more recently, a simple multiplicative spatial bias model. These models do not, however, always provide an accurate correction of measurements in wells located at the intersection of rows and columns affected by spatial bias. The measurements in these wells depend on the nature of interaction between the involved biases. Here, we propose two novel additive and two novel multiplicative spatial bias models accounting for different types of bias interactions. We describe a statistical procedure that allows for detecting and removing different types of additive and multiplicative spatial biases from multiwell plates. We show how this procedure can be applied by analyzing data generated by the four HTS technologies (homogeneous, microorganism, cell-based, and gene expression HTS), the three high-content screening (HCS) technologies (area, intensity, and cell-count HCS), and the only small-molecule microarray technology available in the ChemBank small-molecule screening database. The proposed methods are included in the AssayCorrector program, implemented in R, and available on CRAN.
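
    As a simple illustration of removing an additive row/column bias from a plate (a classic median-polish style correction, not the interaction-aware additive and multiplicative models proposed in the article), consider the following Python sketch with simulated plate data.

      import numpy as np

      def median_polish(plate, n_iter=10):
          """Iteratively remove additive row and column (spatial) biases from a plate."""
          resid = plate.astype(float).copy()
          for _ in range(n_iter):
              resid -= np.median(resid, axis=1, keepdims=True)   # row effects
              resid -= np.median(resid, axis=0, keepdims=True)   # column effects
          return resid

      rng = np.random.default_rng(0)
      true = rng.normal(1.0, 0.1, size=(16, 24))                 # 384-well plate measurements
      biased = true + np.linspace(0.0, 0.5, 16)[:, None]         # additive row-wise bias
      corrected = median_polish(biased)
      print("row-mean spread before / after:",
            round(float(biased.mean(axis=1).std()), 3),
            round(float(corrected.mean(axis=1).std()), 3))

    Handling multiplicative biases, and the interaction of row and column biases highlighted in the abstract, requires the richer models implemented in AssayCorrector.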

  12. MixGF: spectral probabilities for mixture spectra from more than one peptide.

    PubMed

    Wang, Jian; Bourne, Philip E; Bandeira, Nuno

    2014-12-01

    In large-scale proteomic experiments, multiple peptide precursors are often cofragmented simultaneously in the same mixture tandem mass (MS/MS) spectrum. These spectra tend to elude current computational tools because of the ubiquitous assumption that each spectrum is generated from only one peptide. Therefore, tools that consider multiple peptide matches to each MS/MS spectrum can potentially improve the relatively low spectrum identification rate often observed in proteomics experiments. More importantly, data independent acquisition protocols promoting the cofragmentation of multiple precursors are emerging as alternative methods that can greatly improve the throughput of peptide identifications but their success also depends on the availability of algorithms to identify multiple peptides from each MS/MS spectrum. Here we address a fundamental question in the identification of mixture MS/MS spectra: determining the statistical significance of multiple peptides matched to a given MS/MS spectrum. We propose the MixGF generating function model to rigorously compute the statistical significance of peptide identifications for mixture spectra and show that this approach improves the sensitivity of current mixture spectra database search tools by ≈30-390%. Analysis of multiple data sets with MixGF reveals that in complex biological samples the number of identified mixture spectra can be as high as 20% of all the identified spectra and the number of unique peptides identified only in mixture spectra can be up to 35.4% of those identified in single-peptide spectra. © 2014 by The American Society for Biochemistry and Molecular Biology, Inc.

  13. MixGF: Spectral Probabilities for Mixture Spectra from more than One Peptide*

    PubMed Central

    Wang, Jian; Bourne, Philip E.; Bandeira, Nuno

    2014-01-01

    In large-scale proteomic experiments, multiple peptide precursors are often cofragmented simultaneously in the same mixture tandem mass (MS/MS) spectrum. These spectra tend to elude current computational tools because of the ubiquitous assumption that each spectrum is generated from only one peptide. Therefore, tools that consider multiple peptide matches to each MS/MS spectrum can potentially improve the relatively low spectrum identification rate often observed in proteomics experiments. More importantly, data independent acquisition protocols promoting the cofragmentation of multiple precursors are emerging as alternative methods that can greatly improve the throughput of peptide identifications but their success also depends on the availability of algorithms to identify multiple peptides from each MS/MS spectrum. Here we address a fundamental question in the identification of mixture MS/MS spectra: determining the statistical significance of multiple peptides matched to a given MS/MS spectrum. We propose the MixGF generating function model to rigorously compute the statistical significance of peptide identifications for mixture spectra and show that this approach improves the sensitivity of current mixture spectra database search tools by ≈30–390%. Analysis of multiple data sets with MixGF reveals that in complex biological samples the number of identified mixture spectra can be as high as 20% of all the identified spectra and the number of unique peptides identified only in mixture spectra can be up to 35.4% of those identified in single-peptide spectra. PMID:25225354

  14. A multiplicative process for generating a beta-like survival function with application to the UK 2016 EU referendum results

    NASA Astrophysics Data System (ADS)

    Fenner, Trevor; Kaufmann, Eric; Levene, Mark; Loizou, George

    Human dynamics and sociophysics suggest statistical models that may explain and provide us with better insight into social phenomena. Contextual and selection effects tend to produce extreme values in the tails of rank-ordered distributions of both census data and district-level election outcomes. Models that account for this nonlinearity generally outperform linear models. Fitting nonlinear functions based on rank-ordering census and election data therefore improves the fit of aggregate voting models. This may help improve ecological inference, as well as election forecasting in majoritarian systems. We propose a generative multiplicative decrease model that gives rise to a rank-order distribution and facilitates the analysis of the recent UK EU referendum results. We supply empirical evidence that the beta-like survival function, which can be generated directly from our model, is a close fit to the referendum results, and also may have predictive value when covariate data are available.

  15. Models for predicting the mass of lime fruits by some engineering properties.

    PubMed

    Miraei Ashtiani, Seyed-Hassan; Baradaran Motie, Jalal; Emadi, Bagher; Aghkhani, Mohammad-Hosein

    2014-11-01

    Grading fruits by mass is important in packaging; it reduces waste and increases the marketing value of agricultural produce. The aim of this study was mass modeling of two major cultivars of Iranian limes based on engineering attributes. Models were classified into three groups: (1) single and multiple variable regressions of lime mass on dimensional characteristics; (2) single and multiple variable regressions of lime mass on projected areas; (3) single regressions of lime mass on its actual volume and on volumes calculated by assuming ellipsoid and prolate spheroid shapes. All properties considered in the current study were found to be statistically significant (ρ < 0.01). The results indicated that mass models based on minor diameter and on first projected area are the most appropriate in the first and second classifications, respectively. In the third classification, the best model was obtained on the basis of the prolate spheroid volume. It was concluded that a suitable mass-grading system for limes is based on the prolate spheroid volume.
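
    A short Python sketch of the single-variable regression models described (mass on minor diameter, and mass on prolate-spheroid volume) follows; the measurements are invented for illustration and are not the study's data.

      import numpy as np

      # Hypothetical lime measurements: minor diameter (mm), prolate-spheroid volume (cm^3), mass (g).
      minor_d = np.array([38.0, 41.5, 44.0, 46.5, 49.0, 51.0, 53.5, 55.0])
      volume  = np.array([28.0, 35.5, 42.0, 50.0, 58.5, 66.0, 76.5, 84.0])
      mass    = np.array([29.5, 37.0, 44.0, 52.5, 61.0, 69.0, 79.5, 87.5])

      def fit(y, x, label):
          slope, intercept = np.polyfit(x, y, 1)            # single-variable linear regression
          r2 = np.corrcoef(x, y)[0, 1] ** 2
          print(f"{label}: mass = {intercept:.2f} + {slope:.3f} * x  (R^2 = {r2:.3f})")

      fit(mass, minor_d, "minor diameter model")
      fit(mass, volume,  "prolate-spheroid volume model")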

  16. Accounting for Multiple Births in Neonatal and Perinatal Trials: Systematic Review and Case Study

    PubMed Central

    Hibbs, Anna Maria; Black, Dennis; Palermo, Lisa; Cnaan, Avital; Luan, Xianqun; Truog, William E; Walsh, Michele C; Ballard, Roberta A

    2010-01-01

    Objectives To determine the prevalence in the neonatal literature of statistical approaches accounting for the unique clustering patterns of multiple births. To explore the sensitivity of an actual trial to several analytic approaches to multiples. Methods A systematic review of recent perinatal trials assessed the prevalence of studies accounting for clustering of multiples. The NO CLD trial served as a case study of the sensitivity of the outcome to several statistical strategies. We calculated odds ratios using non-clustered (logistic regression) and clustered (generalized estimating equations, multiple outputation) analyses. Results In the systematic review, most studies did not describe the randomization of twins and did not account for clustering. Of those studies that did, exclusion of multiples and generalized estimating equations were the most common strategies. The NO CLD study included 84 infants with a sibling enrolled in the study. Multiples were more likely than singletons to be white and were born to older mothers (p<0.01). Analyses that accounted for clustering were statistically significant; analyses assuming independence were not. Conclusions The statistical approach to multiples can influence the odds ratio and width of confidence intervals, thereby affecting the interpretation of a study outcome. A minority of perinatal studies address this issue. PMID:19969305

  17. Accounting for multiple births in neonatal and perinatal trials: systematic review and case study.

    PubMed

    Hibbs, Anna Maria; Black, Dennis; Palermo, Lisa; Cnaan, Avital; Luan, Xianqun; Truog, William E; Walsh, Michele C; Ballard, Roberta A

    2010-02-01

    To determine the prevalence in the neonatal literature of statistical approaches accounting for the unique clustering patterns of multiple births and to explore the sensitivity of an actual trial to several analytic approaches to multiples. A systematic review of recent perinatal trials assessed the prevalence of studies accounting for clustering of multiples. The Nitric Oxide to Prevent Chronic Lung Disease (NO CLD) trial served as a case study of the sensitivity of the outcome to several statistical strategies. We calculated odds ratios using nonclustered (logistic regression) and clustered (generalized estimating equations, multiple outputation) analyses. In the systematic review, most studies did not describe the random assignment of twins and did not account for clustering. Of those studies that did, exclusion of multiples and generalized estimating equations were the most common strategies. The NO CLD study included 84 infants with a sibling enrolled in the study. Multiples were more likely than singletons to be white and were born to older mothers (P < .01). Analyses that accounted for clustering were statistically significant; analyses assuming independence were not. The statistical approach to multiples can influence the odds ratio and width of confidence intervals, thereby affecting the interpretation of a study outcome. A minority of perinatal studies address this issue. Copyright 2010 Mosby, Inc. All rights reserved.
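
    A minimal sketch of the contrast discussed above, comparing a naive logistic regression with a generalized estimating equations (GEE) analysis that treats infants from the same birth as a cluster. The simulated trial, effect sizes, and statsmodels calls are illustrative assumptions, not the NO CLD data or analysis code.

        import numpy as np
        import pandas as pd
        import statsmodels.api as sm

        rng = np.random.default_rng(0)

        # Hypothetical trial: 200 mothers, ~20% with twins, a binary outcome per infant,
        # treatment randomized at the mother level, outcomes correlated within a birth.
        rows = []
        for mother in range(200):
            n_infants = rng.choice([1, 2], p=[0.8, 0.2])
            treated = rng.integers(0, 2)
            shared = rng.normal(0.0, 1.0)          # birth-level random effect
            for _ in range(n_infants):
                logit = -0.5 + 0.6 * treated + shared
                rows.append({"mother": mother, "treated": treated,
                             "outcome": rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))})
        df = pd.DataFrame(rows)
        X = sm.add_constant(df[["treated"]])

        # Non-clustered analysis (ordinary logistic regression).
        naive = sm.Logit(df["outcome"], X).fit(disp=False)

        # Clustered analysis: GEE with an exchangeable working correlation within mothers.
        gee = sm.GEE(df["outcome"], X, groups=df["mother"],
                     family=sm.families.Binomial(),
                     cov_struct=sm.cov_struct.Exchangeable()).fit()

        print("treatment SE, logistic:", round(naive.bse["treated"], 3))
        print("treatment SE, GEE     :", round(gee.bse["treated"], 3))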

  18. Turning the Potential Liability of Large Enrollment Laboratory Science Courses into an Asset

    ERIC Educational Resources Information Center

    Johnson, Dan; Levy, Foster; Karsai, Istvan; Stroud, Kimberly

    2006-01-01

    Data sharing among multiple lab sections increases statistical power of data analyses and informs student-generated hypotheses. We describe how to collect, organize, and manage data to support replicate and rolling inquiry models, with three illustrative examples of activities from a population-level biology course for science majors. (Contains 1…

  19. Extreme Quantile Estimation in Binary Response Models

    DTIC Science & Technology

    1990-03-01

    Abstract not indexed; the record text consists of reference-list fragments from the report, e.g., Hsi, B.P. [1969], "The Multiple Sample Up-and-Down Method in Bioassay," Journal of the American…; Wetherill, G.B. [1976], Sequential Methods in Statistics, London: Chapman and Hall.

  20. Fusion And Inference From Multiple And Massive Disparate Distributed Dynamic Data Sets

    DTIC Science & Technology

    2017-07-01

    principled methodology for two-sample graph testing; designed a provably almost-surely perfect vertex clustering algorithm for block model graphs; proved… Semi-Supervised Clustering Methodology… Robust Hypothesis Testing… dimensional Euclidean space – allows the full arsenal of statistical and machine learning methodology for multivariate Euclidean data to be deployed for…

  1. Survivability Versus Time

    NASA Technical Reports Server (NTRS)

    Joyner, James J., Sr.

    2014-01-01

    Develop a Survivability vs. Time model as a decision-evaluation tool to assess various emergency egress methods used at Launch Complex 39B (LC 39B) and in the Vehicle Assembly Building (VAB) at NASA's Kennedy Space Center. For each hazard scenario, develop probability distributions to address statistical uncertainty, resulting in survivability plots over time and composite survivability plots encompassing multiple hazard scenarios.

  2. The Integration of Environmental Education and Communicative English Based on Multiple Intelligence Theory for Students in Extended Schools

    ERIC Educational Resources Information Center

    Sangsongfa, Chalothorn; Rawang, Wee

    2016-01-01

    Research and Development (R&D) was used with 364 students, 44 teachers and 3 school directors before designing the innovation, and the model's efficiency was evaluated with 30 voluntary students by Action Research (AR). The research used a questionnaire, an interview form and an innovation efficiency evaluation form, and data were statistically analyzed by percentage,…

  3. A Comparison of Statistical Models for Calculating Reliability of the Hoffmann Reflex

    ERIC Educational Resources Information Center

    Christie, A.; Kamen, G.; Boucher, Jean P.; Inglis, J. Greig; Gabriel, David A.

    2010-01-01

    The Hoffmann reflex is obtained through surface electromyographic recordings, and it is one of the most common neurophysiological techniques in exercise science. Measurement and evaluation of the peak-to-peak amplitude of the Hoffmann reflex has been guided by the observation that it is a variable response that requires multiple trials to obtain a…

  4. Mapping fuels at multiple scales: landscape application of the fuel characteristic classification system.

    Treesearch

    D. McKenzie; C.L. Raymond; L.-K.B. Kellogg; R.A. Norheim; A.G. Andreu; A.C. Bayard; K.E. Kopper; E. Elman

    2007-01-01

    Fuel mapping is a complex and often multidisciplinary process, involving remote sensing, ground-based validation, statistical modeling, and knowledge-based systems. The scale and resolution of fuel mapping depend both on objectives and availability of spatial data layers. We demonstrate use of the Fuel Characteristic Classification System (FCCS) for fuel mapping at two...

  5. Commentary: Gene by Environment Interplay and Psychopathology--In Search of a Paradigm

    ERIC Educational Resources Information Center

    Nigg, Joel T.

    2013-01-01

    The articles in this Special Issue (SI) extend research on G×E in multiple ways, showing the growing importance of specifying kinds of G×E models (e.g., bioecological, susceptibility, stress-diathesis), incorporation of sophisticated ways of measuring types of G×E correlations (rGE), checking effects of statistical artifact, exemplifying an…

  6. Pathways to Identity: Aiding Law Enforcement in Identification Tasks With Visual Analytics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bruce, Joseph R.; Scholtz, Jean; Hodges, Duncan

    The nature of identity has changed dramatically in recent years, and has grown in complexity. Identities are defined in multiple domains: biological and psychological elements strongly contribute, but biographical and cyber elements are also necessary to complete the picture. Law enforcement is beginning to adjust to these changes, recognizing identity's importance in criminal justice. The SuperIdentity project seeks to aid law enforcement officials in their identification tasks through research of techniques for discovering identity traits, generation of statistical models of identity and analysis of identity traits through visualization. We present use cases compiled through user interviews in multiple fields, including law enforcement, as well as the modeling and visualization tools designed to aid in those use cases.

  7. Pathways to Identity. Using Visualization to Aid Law Enforcement in Identification Tasks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bruce, Joseph R.; Scholtz, Jean; Hodges, Duncan

    The nature of identity has changed dramatically in recent years and has grown in complexity. Identities are defined in multiple domains: biological and psychological elements strongly contribute, but biographical and cyber elements also are necessary to complete the picture. Law enforcement is beginning to adjust to these changes, recognizing identity's importance in criminal justice. The SuperIdentity project seeks to aid law enforcement officials in their identification tasks through research of techniques for discovering identity traits, generation of statistical models of identity and analysis of identity traits through visualization. We present use cases compiled through user interviews in multiple fields, including law enforcement, and describe the modeling and visualization tools designed to aid in those use cases.

  8. Gravitational lensing by a smoothly variable surface mass density

    NASA Technical Reports Server (NTRS)

    Paczynski, Bohdan; Wambsganss, Joachim

    1989-01-01

    The statistical properties of gravitational lensing due to smooth but nonuniform distributions of matter are considered. It is found that a majority of triple images had a parity characteristic for 'shear-induced' lensing. Almost all cases of triple or multiple imaging were associated with large surface density enhancements, and lensing objects were present between the images. Thus, the observed gravitational lens candidates for which no lensing object has been detected between the images are unlikely to be a result of asymmetric distribution of mass external to the image circle. In a model with smoothly variable surface mass density, moderately and highly amplified images tended to be single rather than multiple. An opposite trend was found in models which had singularities in the surface mass distribution.

  9. Exact goodness-of-fit tests for Markov chains.

    PubMed

    Besag, J; Mondal, D

    2013-06-01

    Goodness-of-fit tests are useful in assessing whether a statistical model is consistent with available data. However, the usual χ² asymptotics often fail, either because of the paucity of the data or because a nonstandard test statistic is of interest. In this article, we describe exact goodness-of-fit tests for first- and higher order Markov chains, with particular attention given to time-reversible ones. The tests are obtained by conditioning on the sufficient statistics for the transition probabilities and are implemented by simple Monte Carlo sampling or by Markov chain Monte Carlo. They apply both to single and to multiple sequences and allow a free choice of test statistic. Three examples are given. The first concerns multiple sequences of dry and wet January days for the years 1948-1983 at Snoqualmie Falls, Washington State, and suggests that standard analysis may be misleading. The second one is for a four-state DNA sequence and lends support to the original conclusion that a second-order Markov chain provides an adequate fit to the data. The last one is six-state atomistic data arising in molecular conformational dynamics simulation of solvated alanine dipeptide and points to strong evidence against a first-order reversible Markov chain at 6 picosecond time steps. © 2013, The International Biometric Society.
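
    The sketch below illustrates the general Monte Carlo idea with a simplified variant: a Pearson-type statistic for a fully specified first-order transition matrix, calibrated by simulating chains under the null. It does not reproduce the paper's exact conditional tests, which condition on the sufficient statistics (transition counts) rather than on a specified matrix; the states, matrices, and sample sizes are illustrative assumptions.

        import numpy as np

        rng = np.random.default_rng(1)

        def transition_counts(seq, k):
            C = np.zeros((k, k))
            for a, b in zip(seq[:-1], seq[1:]):
                C[a, b] += 1
            return C

        def pearson_stat(seq, k, P0):
            # Compare observed transition counts with those expected under P0.
            C = transition_counts(seq, k)
            E = C.sum(axis=1, keepdims=True) * P0
            mask = E > 0
            return np.sum((C[mask] - E[mask]) ** 2 / E[mask])

        def simulate_chain(P, n, rng):
            k = P.shape[0]
            seq = [int(rng.integers(k))]
            for _ in range(n - 1):
                seq.append(int(rng.choice(k, p=P[seq[-1]])))
            return np.array(seq)

        # Null hypothesis: a fully specified two-state transition matrix (hypothetical).
        P0 = np.array([[0.7, 0.3],
                       [0.4, 0.6]])

        # "Observed" data actually generated from a slightly different matrix.
        observed = simulate_chain(np.array([[0.6, 0.4], [0.5, 0.5]]), 300, rng)
        t_obs = pearson_stat(observed, 2, P0)

        # Monte Carlo reference distribution of the statistic under the null.
        t_null = np.array([pearson_stat(simulate_chain(P0, len(observed), rng), 2, P0)
                           for _ in range(2000)])
        p_value = (1 + np.sum(t_null >= t_obs)) / (1 + len(t_null))
        print(f"Monte Carlo p-value: {p_value:.3f}")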

  10. "Suicide shall cease to be a crime": suicide and undetermined death trends 1970-2000 before and after the decriminalization of suicide in Ireland 1993.

    PubMed

    Osman, Mugtaba; Parnell, Andrew C; Haley, Clifford

    2017-02-01

    Suicide is criminalized in more than 100 countries around the world. A dearth of research exists into the effect of suicide legislation on suicide rates, and available statistics are mixed. This study investigates 10,353 suicide deaths in Ireland that took place between 1970 and 2000. Irish 1970-2000 annual suicide data were obtained from the Central Statistics Office and modelled via a negative binomial regression approach. We examined the effect of suicide legislation on different age groups and on both sexes. We used Bonferroni correction for multiple modelling. Statistical analysis was performed using the R statistical package version 3.1.2. The coefficient for the effect of the suicide act on overall suicide deaths was -9.094 (95 % confidence interval (CI) -34.086 to 15.899), statistically non-significant (p = 0.476). The coefficient for the effect of the suicide act on undetermined deaths was statistically significant (p < 0.001) and was estimated to be -644.4 (95 % CI -818.6 to -469.9). The results of our study indicate that decriminalization of suicide is not associated with a significant increase in subsequent suicide deaths. However, undetermined death verdict rates have dropped significantly following decriminalization of suicide.
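
    A minimal sketch of the modelling approach described above: a negative binomial regression of annual death counts on a time trend and a post-1993 indicator. The simulated counts, the fixed dispersion, and the coefficient values are assumptions for illustration; they are not the Central Statistics Office data.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(2)

        # Hypothetical annual counts for 1970-2000 with an indicator for the 1993 act.
        years = np.arange(1970, 2001)
        post_act = (years >= 1993).astype(float)
        trend = (years - years.mean()) / 10.0
        mu = np.exp(5.5 + 0.05 * trend + 0.02 * post_act)           # assumed mean structure
        counts = rng.negative_binomial(n=20, p=20.0 / (20.0 + mu))  # overdispersed counts

        X = sm.add_constant(np.column_stack([trend, post_act]))

        # Negative binomial GLM of counts on trend and the legislation indicator
        # (dispersion alpha fixed here for simplicity).
        fit = sm.GLM(counts, X, family=sm.families.NegativeBinomial(alpha=0.05)).fit()
        print(fit.params)    # [intercept, trend, post-act] on the log scale
        print(fit.pvalues)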

  11. Reforming the Military Health Care System

    DTIC Science & Technology

    1988-01-01

    Record text is fragmentary, consisting of footnote and appendix fragments. Recoverable content includes a citation to "Population Model and its Application," International Journal of Health Services, vol. 10, no. 4 (1980); a multiple regression equation for outpatient use, OP/NOR = 0.51 + 0.35(POP/NOR) − 6.84(CIV/NOR × POP); and appendices on the Military Beneficiary Health Care Survey, Actual and Expected Admission Rates, The Statistical Model of Family Use, and The Capitation Budgeting…

  12. Correlation and simple linear regression.

    PubMed

    Eberly, Lynn E

    2007-01-01

    This chapter highlights important steps in using correlation and simple linear regression to address scientific questions about the association of two continuous variables with each other. These steps include estimation and inference, assessing model fit, the connection between regression and ANOVA, and study design. Examples in microbiology are used throughout. This chapter provides a framework that is helpful in understanding more complex statistical techniques, such as multiple linear regression, linear mixed effects models, logistic regression, and proportional hazards regression.
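
    A minimal sketch of the two steps the chapter covers, estimation of a Pearson correlation and of a simple linear regression line, using scipy; the microbiology-style variables and values are hypothetical.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(3)

        # Hypothetical data: incubation temperature (x) vs. log10 colony count (y).
        x = rng.uniform(20.0, 40.0, size=30)
        y = 2.0 + 0.15 * x + rng.normal(0.0, 0.5, size=30)

        # Pearson correlation with a p-value for H0: rho = 0.
        r, p = stats.pearsonr(x, y)

        # Simple linear regression: slope, intercept and R^2.
        res = stats.linregress(x, y)
        print(f"r = {r:.3f} (p = {p:.3g})")
        print(f"y ≈ {res.slope:.3f} x + {res.intercept:.3f},  R² = {res.rvalue ** 2:.3f}")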

  13. Correction of the significance level when attempting multiple transformations of an explanatory variable in generalized linear models

    PubMed Central

    2013-01-01

    Background: In statistical modeling, finding the most favorable coding for an explanatory quantitative variable involves many tests. This process involves multiple testing problems and requires the correction of the significance level. Methods: For each coding, a test on the nullity of the coefficient associated with the new coded variable is computed. The selected coding corresponds to that associated with the largest test statistic (or, equivalently, the smallest p-value). In the context of the Generalized Linear Model, Liquet and Commenges (Stat Probability Lett, 71:33–38, 2005) proposed an asymptotic correction of the significance level. This procedure, based on the score test, has been developed for dichotomous and Box-Cox transformations. In this paper, we suggest the use of resampling methods to estimate the significance level for categorical transformations with more than two levels and, by definition, those that involve more than one parameter in the model. The categorical transformation is a more flexible way to explore the unknown shape of the effect of an explanatory variable on a dependent variable. Results: The simulations we ran in this study showed good performance of the proposed methods. These methods were illustrated using the data from a study of the relationship between cholesterol and dementia. Conclusion: The algorithms were implemented using R, and the associated CPMCGLM R package is available on the CRAN. PMID:23758852

  14. Empirical Reference Distributions for Networks of Different Size

    PubMed Central

    Smith, Anna; Calder, Catherine A.; Browning, Christopher R.

    2016-01-01

    Network analysis has become an increasingly prevalent research tool across a vast range of scientific fields. Here, we focus on the particular issue of comparing network statistics, i.e. graph-level measures of network structural features, across multiple networks that differ in size. Although “normalized” versions of some network statistics exist, we demonstrate via simulation why direct comparison is often inappropriate. We consider normalizing network statistics relative to a simple fully parameterized reference distribution and demonstrate via simulation how this is an improvement over direct comparison, but still sometimes problematic. We propose a new adjustment method based on a reference distribution constructed as a mixture model of random graphs which reflect the dependence structure exhibited in the observed networks. We show that using simple Bernoulli models as mixture components in this reference distribution can provide adjusted network statistics that are relatively comparable across different network sizes but still describe interesting features of networks, and that this can be accomplished at relatively low computational expense. Finally, we apply this methodology to a collection of ecological networks derived from the Los Angeles Family and Neighborhood Survey activity location data. PMID:27721556
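
    A minimal sketch of the simpler of the two adjustments discussed above: normalizing a graph statistic against a fully parameterized Bernoulli (Erdős–Rényi) reference distribution of the same size and density, rather than against the proposed mixture model. The networks, statistic choice, and reference size are illustrative assumptions.

        import numpy as np
        import networkx as nx

        def adjusted_statistic(G, stat=nx.transitivity, n_ref=300, seed=0):
            # Z-score of a graph-level statistic relative to Bernoulli random graphs
            # with the same number of nodes and the same density.
            n, p = G.number_of_nodes(), nx.density(G)
            obs = stat(G)
            ref = [stat(nx.gnp_random_graph(n, p, seed=seed + i)) for i in range(n_ref)]
            return (obs - np.mean(ref)) / np.std(ref)

        # Two hypothetical networks of different size but similar generative structure.
        G_small = nx.watts_strogatz_graph(60, 6, 0.1, seed=1)
        G_large = nx.watts_strogatz_graph(300, 6, 0.1, seed=2)

        print("raw transitivity    :", nx.transitivity(G_small), nx.transitivity(G_large))
        print("reference-adjusted z:", adjusted_statistic(G_small), adjusted_statistic(G_large))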

  15. Comparison of climate envelope models developed using expert-selected variables versus statistical selection

    USGS Publications Warehouse

    Brandt, Laura A.; Benscoter, Allison; Harvey, Rebecca G.; Speroterra, Carolina; Bucklin, David N.; Romañach, Stephanie; Watling, James I.; Mazzotti, Frank J.

    2017-01-01

    Climate envelope models are widely used to describe potential future distribution of species under different climate change scenarios. It is broadly recognized that there are both strengths and limitations to using climate envelope models and that outcomes are sensitive to initial assumptions, inputs, and modeling methods. Selection of predictor variables, a central step in modeling, is one of the areas where different techniques can yield varying results. Selection of climate variables to use as predictors is often done using statistical approaches that develop correlations between occurrences and climate data. These approaches have received criticism in that they rely on the statistical properties of the data rather than directly incorporating biological information about species responses to temperature and precipitation. We evaluated and compared models and prediction maps for 15 threatened or endangered species in Florida based on two variable selection techniques: expert opinion and a statistical method. We compared model performance between these two approaches for contemporary predictions, and the spatial correlation, spatial overlap and area predicted for contemporary and future climate predictions. In general, experts identified more variables as being important than the statistical method, and there was low overlap in the variable sets (<40%) between the two methods. Despite these differences in variable sets (expert versus statistical), models had high performance metrics (>0.9 for area under the curve (AUC) and >0.7 for true skill statistic (TSS)). Spatial overlap, which compares the spatial configuration between maps constructed using the different variable selection techniques, was only moderate overall (about 60%), with a great deal of variability across species. Difference in spatial overlap was even greater under future climate projections, indicating additional divergence of model outputs from different variable selection techniques. Our work is in agreement with other studies which have found that for broad-scale species distribution modeling, using statistical methods of variable selection is a useful first step, especially when there is a need to model a large number of species or expert knowledge of the species is limited. Expert input can then be used to refine models that seem unrealistic or for species that experts believe are particularly sensitive to change. It also emphasizes the importance of using multiple models to reduce uncertainty and improve map outputs for conservation planning. Where outputs overlap or show the same direction of change there is greater certainty in the predictions. Areas of disagreement can be used for learning by asking why the models do not agree, and may highlight areas where additional on-the-ground data collection could improve the models.

  16. Tracing the source of numerical climate model uncertainties in precipitation simulations using a feature-oriented statistical model

    NASA Astrophysics Data System (ADS)

    Xu, Y.; Jones, A. D.; Rhoades, A.

    2017-12-01

    Precipitation is a key component in hydrologic cycles, and changing precipitation regimes contribute to more intense and frequent drought and flood events around the world. Numerical climate modeling is a powerful tool to study climatology and to predict future changes. Despite the continuous improvement in numerical models, long-term precipitation prediction remains a challenge, especially at regional scales. To improve numerical simulations of precipitation, it is important to find out where the uncertainty in precipitation simulations comes from. There are two types of uncertainty in numerical model predictions. One is related to uncertainty in the input data, such as the model's boundary and initial conditions. These uncertainties would propagate to the final model outcomes even if the numerical model exactly replicated the true world. But a numerical model cannot exactly replicate the true world. Therefore, the other type of model uncertainty is related to errors in the model physics, such as the parameterization of sub-grid scale processes, i.e., given precise input conditions, how much error could be generated by the imprecise model. Here, we build two statistical models based on a neural network algorithm to predict long-term variation of precipitation over California: one uses "true world" information derived from observations, and the other uses "modeled world" information using model inputs and outputs from the North America Coordinated Regional Downscaling Project (NA CORDEX). We derive multiple climate feature metrics as the predictors for the statistical model to represent the impact of global climate on local hydrology, and include topography as a predictor to represent the local control. We first compare the predictors between the true world and the modeled world to determine the errors contained in the input data. By perturbing the predictors in the statistical model, we estimate how much uncertainty in the model's final outcomes is accounted for by each predictor. By comparing the statistical models derived from true world information and modeled world information, we assess the errors lying in the physics of the numerical models. This work provides unique insight into the performance of numerical climate models, and can be used to guide improvement of precipitation prediction.

  17. Rainfall Downscaling Conditional on Upper-air Atmospheric Predictors: Improved Assessment of Rainfall Statistics in a Changing Climate

    NASA Astrophysics Data System (ADS)

    Langousis, Andreas; Mamalakis, Antonis; Deidda, Roberto; Marrocu, Marino

    2015-04-01

    To improve the skill of Global Climate Models (GCMs) and Regional Climate Models (RCMs) in reproducing the statistics of rainfall at a basin level and at hydrologically relevant temporal scales (e.g. daily), two types of statistical approaches have been suggested. One is the statistical correction of climate model rainfall outputs using historical series of precipitation. The other is the use of stochastic models of rainfall to conditionally simulate precipitation series, based on large-scale atmospheric predictors produced by climate models (e.g. geopotential height, relative vorticity, divergence, mean sea level pressure). The latter approach, usually referred to as statistical rainfall downscaling, aims at reproducing the statistical character of rainfall, while accounting for the effects of large-scale atmospheric circulation (and, therefore, climate forcing) on rainfall statistics. While promising, statistical rainfall downscaling has not attracted much attention in recent years, since the suggested approaches involved complex (i.e. subjective or computationally intense) identification procedures of the local weather, in addition to demonstrating limited success in reproducing several statistical features of rainfall, such as seasonal variations, the distributions of dry and wet spell lengths, the distribution of the mean rainfall intensity inside wet periods, and the distribution of rainfall extremes. In an effort to remedy those shortcomings, Langousis and Kaleris (2014) developed a statistical framework for simulation of daily rainfall intensities conditional on upper air variables, which accurately reproduces the statistical character of rainfall at multiple time-scales. Here, we study the relative performance of: a) quantile-quantile (Q-Q) correction of climate model rainfall products, and b) the statistical downscaling scheme of Langousis and Kaleris (2014), in reproducing the statistical structure of rainfall, as well as rainfall extremes, at a regional level. This is done for an intermediate-sized catchment in Italy, i.e. the Flumendosa catchment, using climate model rainfall and atmospheric data from the ENSEMBLES project (http://ensembleseu.metoffice.com). In doing so, we split the historical rainfall record of mean areal precipitation (MAP) into 15-year calibration and 45-year validation periods, and compare the historical rainfall statistics to those obtained from: a) Q-Q corrected climate model rainfall products, and b) synthetic rainfall series generated by the suggested downscaling scheme. To our knowledge, this is the first time that climate model rainfall and statistically downscaled precipitation are compared to catchment-averaged MAP at a daily resolution. The obtained results are promising, since the proposed downscaling scheme is more accurate and robust in reproducing a number of historical rainfall statistics, independent of the climate model used and the length of the calibration period. This is particularly the case for the yearly rainfall maxima, where direct statistical correction of climate model rainfall outputs shows increased sensitivity to the length of the calibration period and the climate model used. The robustness of the suggested downscaling scheme in modeling rainfall extremes at a daily resolution is a notable feature that can effectively be used to assess hydrologic risk at a regional level under changing climatic conditions.
Acknowledgments The research project is implemented within the framework of the Action «Supporting Postdoctoral Researchers» of the Operational Program "Education and Lifelong Learning" (Action's Beneficiary: General Secretariat for Research and Technology), and is co-financed by the European Social Fund (ESF) and the Greek State. CRS4 highly acknowledges the contribution of the Sardinian regional authorities.
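
    A minimal sketch of the empirical quantile-quantile (Q-Q) correction used as the benchmark above: each future model value is mapped to the observed value at the same empirical quantile of the historical model series. Operational schemes are usually applied season by season and treat wet-day frequency separately; the gamma-distributed series here are hypothetical stand-ins for daily MAP.

        import numpy as np

        def qq_correct(model_hist, obs_hist, model_future):
            # Empirical Q-Q mapping: future model values -> historical model quantile
            # -> observed value at that quantile.
            model_sorted = np.sort(model_hist)
            obs_sorted = np.sort(obs_hist)
            q = np.searchsorted(model_sorted, model_future, side="right") / len(model_sorted)
            return np.quantile(obs_sorted, np.clip(q, 0.0, 1.0))

        rng = np.random.default_rng(5)
        obs_hist = rng.gamma(shape=0.8, scale=6.0, size=5000)       # observed daily rainfall (stand-in)
        model_hist = rng.gamma(shape=0.6, scale=9.0, size=5000)     # biased model rainfall, same period
        model_future = rng.gamma(shape=0.6, scale=10.0, size=2000)  # model rainfall, future period

        corrected = qq_correct(model_hist, obs_hist, model_future)
        print("raw future mean      :", round(model_future.mean(), 2))
        print("corrected future mean:", round(corrected.mean(), 2))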

  18. The joint space-time statistics of macroweather precipitation, space-time statistical factorization and macroweather models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lovejoy, S., E-mail: lovejoy@physics.mcgill.ca; Lima, M. I. P. de; Department of Civil Engineering, University of Coimbra, 3030-788 Coimbra

    2015-07-15

    Over the range of time scales from about 10 days to 30–100 years, in addition to the familiar weather and climate regimes, there is an intermediate "macroweather" regime characterized by negative temporal fluctuation exponents: implying that fluctuations tend to cancel each other out so that averages tend to converge. We show theoretically and numerically that macroweather precipitation can be modeled by a stochastic weather-climate model (the Climate Extended Fractionally Integrated Flux model, CEFIF) first proposed for macroweather temperatures, and we show numerically that a four-parameter space-time CEFIF model can approximately reproduce eight or so empirical space-time exponents. In spite of this success, CEFIF is theoretically and numerically difficult to manage. We therefore propose a simplified stochastic model in which the temporal behaviour is modeled as a fractional Gaussian noise but the spatial behaviour as a multifractal (climate) cascade: a spatial extension of the recently introduced ScaLIng Macroweather Model, SLIMM. Both the CEFIF and this spatial SLIMM model have a property often implicitly assumed by climatologists: that climate statistics can be "homogenized" by normalizing them with the standard deviation of the anomalies. Physically, it means that the spatial macroweather variability corresponds to different climate zones that multiplicatively modulate the local, temporal statistics. This simplified macroweather model provides a framework for macroweather forecasting that exploits the system's long range memory and spatial correlations; for it, the forecasting problem has been solved. We test this factorization property and the model with the help of three centennial, global scale precipitation products that we analyze jointly in space and in time.

  19. Robust statistical reconstruction for charged particle tomography

    DOEpatents

    Schultz, Larry Joe; Klimenko, Alexei Vasilievich; Fraser, Andrew Mcleod; Morris, Christopher; Orum, John Christopher; Borozdin, Konstantin N; Sossong, Michael James; Hengartner, Nicolas W

    2013-10-08

    Systems and methods for charged particle detection including statistical reconstruction of object volume scattering density profiles from charged particle tomographic data to determine the probability distribution of charged particle scattering using a statistical multiple scattering model and determine a substantially maximum likelihood estimate of object volume scattering density using expectation maximization (ML/EM) algorithm to reconstruct the object volume scattering density. The presence of and/or type of object occupying the volume of interest can be identified from the reconstructed volume scattering density profile. The charged particle tomographic data can be cosmic ray muon tomographic data from a muon tracker for scanning packages, containers, vehicles or cargo. The method can be implemented using a computer program which is executable on a computer.

  20. Multi-scale Characterization and Modeling of Surface Slope Probability Distribution for ~20-km Diameter Lunar Craters

    NASA Astrophysics Data System (ADS)

    Mahanti, P.; Robinson, M. S.; Boyd, A. K.

    2013-12-01

    Craters ~20 km in diameter and above significantly shaped the lunar landscape. The statistical nature of the slope distribution on their walls and floors dominates the overall slope distribution statistics for the lunar surface. Slope statistics are inherently useful for characterizing the current topography of the surface, determining accurate photometric and surface scattering properties, and defining lunar surface trafficability [1-4]. Earlier experimental studies on the statistical nature of lunar surface slopes were restricted either by resolution limits (Apollo era photogrammetric studies) or by model error considerations (photoclinometric and radar scattering studies), where the true nature of the slope probability distribution was not discernible at baselines smaller than a kilometer [2,3,5]. Accordingly, historical modeling of lunar surface slope probability distributions for applications such as scattering theory development or rover traversability assessment is more general in nature (use of simple statistical models such as the Gaussian distribution [1,2,5,6]). With the advent of high resolution, high precision topographic models of the Moon [7,8], slopes in lunar craters can now be obtained at baselines as low as 6 meters, allowing unprecedented multi-scale (multiple baselines) modeling possibilities for slope probability distributions. Topographic analysis (Lunar Reconnaissance Orbiter Camera (LROC) Narrow Angle Camera (NAC) 2-m digital elevation models (DEM)) of ~20-km diameter Copernican lunar craters revealed generally steep slopes on interior walls (30° to 36°, locally exceeding 40°) over 15-meter baselines [9]. In this work, we extend the analysis from a probability distribution modeling point of view with NAC DEMs to characterize the slope statistics for the floors and walls of the same ~20-km Copernican lunar craters. The difference in slope standard deviations between the Gaussian approximation and the actual distribution (2-meter sampling) was computed over multiple scales. This slope analysis showed that local slope distributions are non-Gaussian for both crater walls and floors. Over larger baselines (~100 meters), crater wall slope probability distributions do approximate Gaussian distributions better, but have long distribution tails. Crater floor probability distributions, however, were always asymmetric (for the baseline scales analyzed) and less affected by baseline scale variations. Accordingly, our results suggest that use of long-tailed probability distributions (like the Cauchy) and a baseline-dependent multi-scale model can be more effective in describing the slope statistics for lunar topography. References: [1] Moore, H. (1971), JGR, 75(11). [2] Marcus, A. H. (1969), JGR, 74(22). [3] Pike, R. J. (1970), U.S. Geological Survey Working Paper. [4] Costes, N. C., Farmer, J. E., and George, E. B. (1972), NASA Technical Report TR R-401. [5] Parker, M. N. and Tyler, G. L. (1973), Radio Science, 8(3), 177-184. [6] Alekseev, V. A. et al. (1968), Soviet Astronomy, Vol. 11, p. 860. [7] Burns et al. (2012), Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., XXXIX-B4, 483-488. [8] Smith et al. (2010), GRL, 37, L18204, DOI: 10.1029/2010GL043751. [9] Wagner, R., Robinson, M., Speyerer, E., Mahanti, P. (2013), LPSC 2013, #2924.
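
    A minimal sketch of the distributional comparison suggested above: fit a Gaussian and a long-tailed Cauchy model to a slope sample and compare in-sample log-likelihoods. The simulated slope values are hypothetical and are not derived from the NAC DEMs.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(6)

        # Hypothetical 2-m baseline slope sample (degrees) with a heavy right tail.
        slopes = np.concatenate([rng.normal(12.0, 4.0, size=4500),
                                 12.0 + 15.0 * np.abs(rng.standard_cauchy(size=500))])
        slopes = slopes[(slopes > 0.0) & (slopes < 60.0)]

        # Maximum-likelihood fits of the two candidate models.
        mu, sd = stats.norm.fit(slopes)
        loc, scale = stats.cauchy.fit(slopes)

        print("Gaussian log-likelihood:", round(stats.norm.logpdf(slopes, mu, sd).sum(), 1))
        print("Cauchy   log-likelihood:", round(stats.cauchy.logpdf(slopes, loc, scale).sum(), 1))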

  1. Statistical self-similarity of width function maxima with implications to floods

    USGS Publications Warehouse

    Veitzer, S.A.; Gupta, V.K.

    2001-01-01

    Recently a new theory of random self-similar river networks, called the RSN model, was introduced to explain empirical observations regarding the scaling properties of distributions of various topologic and geometric variables in natural basins. The RSN model predicts that such variables exhibit statistical simple scaling, when indexed by Horton-Strahler order. The average side tributary structure of RSN networks also exhibits Tokunaga-type self-similarity which is widely observed in nature. We examine the scaling structure of distributions of the maximum of the width function for RSNs for nested, complete Strahler basins by performing ensemble simulations. The maximum of the width function exhibits distributional simple scaling, when indexed by Horton-Strahler order, for both RSNs and natural river networks extracted from digital elevation models (DEMs). We also test a powerlaw relationship between Horton ratios for the maximum of the width function and drainage areas. These results represent first steps in formulating a comprehensive physical statistical theory of floods at multiple space-time scales for RSNs as discrete hierarchical branching structures. ?? 2001 Published by Elsevier Science Ltd.

  2. A statistical approach to quasi-extinction forecasting.

    PubMed

    Holmes, Elizabeth Eli; Sabo, John L; Viscido, Steven Vincent; Fagan, William Fredric

    2007-12-01

    Forecasting population decline to a certain critical threshold (the quasi-extinction risk) is one of the central objectives of population viability analysis (PVA), and such predictions figure prominently in the decisions of major conservation organizations. In this paper, we argue that accurate forecasting of a population's quasi-extinction risk does not necessarily require knowledge of the underlying biological mechanisms. Because of the stochastic and multiplicative nature of population growth, the ensemble behaviour of population trajectories converges to common statistical forms across a wide variety of stochastic population processes. This paper provides a theoretical basis for this argument. We show that the quasi-extinction surfaces of a variety of complex stochastic population processes (including age-structured, density-dependent and spatially structured populations) can be modelled by a simple stochastic approximation: the stochastic exponential growth process overlaid with Gaussian errors. Using simulated and real data, we show that this model can be estimated with 20-30 years of data and can provide relatively unbiased quasi-extinction risk with confidence intervals considerably smaller than (0,1). This was found to be true even for simulated data derived from some of the noisiest population processes (density-dependent feedback, species interactions and strong age-structure cycling). A key advantage of statistical models is that their parameters and the uncertainty of those parameters can be estimated from time series data using standard statistical methods. In contrast for most species of conservation concern, biologically realistic models must often be specified rather than estimated because of the limited data available for all the various parameters. Biologically realistic models will always have a prominent place in PVA for evaluating specific management options which affect a single segment of a population, a single demographic rate, or different geographic areas. However, for forecasting quasi-extinction risk, statistical models that are based on the convergent statistical properties of population processes offer many advantages over biologically realistic models.
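
    A minimal sketch of the diffusion approximation described above: estimate the drift and variance of log abundance from a count time series and evaluate the probability of declining to a quasi-extinction threshold within a time horizon, using the standard first-passage formula for Brownian motion with drift. The census counts are simulated, and the formula choice is an assumption consistent with, but not copied from, the paper.

        import numpy as np
        from scipy.stats import norm

        def quasi_extinction_prob(counts, threshold, horizon):
            # Stochastic exponential growth (diffusion) approximation on the log scale.
            x = np.log(np.asarray(counts, dtype=float))
            diffs = np.diff(x)
            mu = diffs.mean()                # drift of log abundance per time step
            s2 = diffs.var(ddof=1)           # process variance per time step
            d = x[-1] - np.log(threshold)    # current log distance to the threshold
            if d <= 0:
                return 1.0
            s = np.sqrt(s2 * horizon)
            return (norm.cdf((-d - mu * horizon) / s)
                    + np.exp(-2.0 * mu * d / s2) * norm.cdf((-d + mu * horizon) / s))

        # Hypothetical 25-year census of a slowly declining population.
        rng = np.random.default_rng(7)
        counts = 1000.0 * np.exp(np.cumsum(rng.normal(-0.02, 0.15, size=25)))
        print("P(decline to 100 within 50 steps) =",
              round(quasi_extinction_prob(counts, threshold=100.0, horizon=50.0), 3))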

  3. The power to detect linkage in complex disease by means of simple LOD-score analyses.

    PubMed Central

    Greenberg, D A; Abreu, P; Hodge, S E

    1998-01-01

    Maximum-likelihood analysis (via LOD score) provides the most powerful method for finding linkage when the mode of inheritance (MOI) is known. However, because one must assume an MOI, the application of LOD-score analysis to complex disease has been questioned. Although it is known that one can legitimately maximize the maximum LOD score with respect to genetic parameters, this approach raises three concerns: (1) multiple testing, (2) effect on power to detect linkage, and (3) adequacy of the approximate MOI for the true MOI. We evaluated the power of LOD scores to detect linkage when the true MOI was complex but a LOD score analysis assumed simple models. We simulated data from 14 different genetic models, including dominant and recessive at high (80%) and low (20%) penetrances, intermediate models, and several additive two-locus models. We calculated LOD scores by assuming two simple models, dominant and recessive, each with 50% penetrance, then took the higher of the two LOD scores as the raw test statistic and corrected for multiple tests. We call this test statistic "MMLS-C." We found that the ELODs for MMLS-C are >=80% of the ELOD under the true model when the ELOD for the true model is >=3. Similarly, the power to reach a given LOD score was usually >=80% that of the true model, when the power under the true model was >=60%. These results underscore that a critical factor in LOD-score analysis is the MOI at the linked locus, not that of the disease or trait per se. Thus, a limited set of simple genetic models in LOD-score analysis can work well in testing for linkage. PMID:9718328

  4. The power to detect linkage in complex disease by means of simple LOD-score analyses.

    PubMed

    Greenberg, D A; Abreu, P; Hodge, S E

    1998-09-01

    Maximum-likelihood analysis (via LOD score) provides the most powerful method for finding linkage when the mode of inheritance (MOI) is known. However, because one must assume an MOI, the application of LOD-score analysis to complex disease has been questioned. Although it is known that one can legitimately maximize the maximum LOD score with respect to genetic parameters, this approach raises three concerns: (1) multiple testing, (2) effect on power to detect linkage, and (3) adequacy of the approximate MOI for the true MOI. We evaluated the power of LOD scores to detect linkage when the true MOI was complex but a LOD score analysis assumed simple models. We simulated data from 14 different genetic models, including dominant and recessive at high (80%) and low (20%) penetrances, intermediate models, and several additive two-locus models. We calculated LOD scores by assuming two simple models, dominant and recessive, each with 50% penetrance, then took the higher of the two LOD scores as the raw test statistic and corrected for multiple tests. We call this test statistic "MMLS-C." We found that the ELODs for MMLS-C are >=80% of the ELOD under the true model when the ELOD for the true model is >=3. Similarly, the power to reach a given LOD score was usually >=80% that of the true model, when the power under the true model was >=60%. These results underscore that a critical factor in LOD-score analysis is the MOI at the linked locus, not that of the disease or trait per se. Thus, a limited set of simple genetic models in LOD-score analysis can work well in testing for linkage.

  5. Unscaled Bayes factors for multiple hypothesis testing in microarray experiments.

    PubMed

    Bertolino, Francesco; Cabras, Stefano; Castellanos, Maria Eugenia; Racugno, Walter

    2015-12-01

    Multiple hypothesis testing collects a series of techniques, usually based on p-values, as a summary of the available evidence from many statistical tests. In hypothesis testing, under a Bayesian perspective, the evidence for a specified hypothesis against an alternative, conditionally on data, is given by the Bayes factor. In this study, we approach multiple hypothesis testing based on both Bayes factors and p-values, regarding it as a multiple model selection problem. To obtain the Bayes factors we assume default priors that are typically improper. In this case, the Bayes factor is usually undetermined due to the ratio of prior pseudo-constants. We show that ignoring prior pseudo-constants leads to unscaled Bayes factors, which do not invalidate the inferential procedure in multiple hypothesis testing, because they are used within a comparative scheme. In fact, using partial information from the p-values, we are able to approximate the sampling null distribution of the unscaled Bayes factor and use it within Efron's multiple testing procedure. The simulation study suggests that under a normal sampling model, and even with small sample sizes, our approach provides false positive and false negative proportions that are lower than those of other common multiple hypothesis testing approaches based only on p-values. The proposed procedure is illustrated in two simulation studies, and the advantages of its use are shown in the analysis of two microarray experiments. © The Author(s) 2011.

  6. The Effects of Clinically Relevant Multiple-Choice Items on the Statistical Discrimination of Physician Clinical Competence.

    ERIC Educational Resources Information Center

    Downing, Steven M.; Maatsch, Jack L.

    To test the effect of clinically relevant multiple-choice item content on the validity of statistical discriminations of physicians' clinical competence, data were collected from a field test of the Emergency Medicine Examination, test items for the certification of specialists in emergency medicine. Two 91-item multiple-choice subscales were…

  7. Crude oil price forecasting based on hybridizing wavelet multiple linear regression model, particle swarm optimization techniques, and principal component analysis.

    PubMed

    Shabri, Ani; Samsudin, Ruhaidah

    2014-01-01

    Crude oil prices play a significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelets and multiple linear regression (MLR) is proposed for crude oil price forecasting. In this model, the Mallat wavelet transform is first selected to decompose an original time series into several subseries with different scales. Then, principal component analysis (PCA) is used to process the subseries data in the MLR for crude oil price forecasting. Particle swarm optimization (PSO) is used to select the optimal parameters of the MLR model. To assess the effectiveness of this model, the daily West Texas Intermediate (WTI) crude oil market has been used as the case study. The time-series prediction performance of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistical measures. The experimental results show that the proposed model outperforms the individual models in forecasting the crude oil price series.
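
    A minimal sketch of the wavelet-plus-regression idea: decompose the price series into additive wavelet subseries and regress the next value on the current subseries values. The PCA preprocessing and PSO parameter tuning used in the paper are omitted, the random-walk "prices" are hypothetical, and decomposing the full series before splitting (done here for brevity) leaks future information that a careful implementation would avoid.

        import numpy as np
        import pywt
        from sklearn.linear_model import LinearRegression

        def wavelet_subseries(series, wavelet="db4", level=3):
            # Split the series into level+1 additive subseries by zeroing all but one
            # block of wavelet coefficients and reconstructing.
            coeffs = pywt.wavedec(series, wavelet, level=level)
            cols = []
            for i in range(len(coeffs)):
                kept = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
                cols.append(pywt.waverec(kept, wavelet)[:len(series)])
            return np.column_stack(cols)

        rng = np.random.default_rng(8)
        price = 70.0 + np.cumsum(rng.normal(0.0, 1.0, size=600))   # stand-in for WTI prices

        X = wavelet_subseries(price)[:-1]   # today's subseries values
        y = price[1:]                       # tomorrow's price

        split = 500
        model = LinearRegression().fit(X[:split], y[:split])
        rmse = np.sqrt(np.mean((model.predict(X[split:]) - y[split:]) ** 2))
        print("out-of-sample RMSE:", round(float(rmse), 3))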

  8. Software engineering the mixed model for genome-wide association studies on large samples.

    PubMed

    Zhang, Zhiwu; Buckler, Edward S; Casstevens, Terry M; Bradbury, Peter J

    2009-11-01

    Mixed models improve the ability to detect phenotype-genotype associations in the presence of population stratification and multiple levels of relatedness in genome-wide association studies (GWAS), but for large data sets the resource consumption becomes impractical. At the same time, the sample size and number of markers used for GWAS is increasing dramatically, resulting in greater statistical power to detect those associations. The use of mixed models with increasingly large data sets depends on the availability of software for analyzing those models. While multiple software packages implement the mixed model method, no single package provides the best combination of fast computation, ability to handle large samples, flexible modeling and ease of use. Key elements of association analysis with mixed models are reviewed, including modeling phenotype-genotype associations using mixed models, population stratification, kinship and its estimation, variance component estimation, use of best linear unbiased predictors or residuals in place of raw phenotype, improving efficiency and software-user interaction. The available software packages are evaluated, and suggestions made for future software development.

  9. Crude Oil Price Forecasting Based on Hybridizing Wavelet Multiple Linear Regression Model, Particle Swarm Optimization Techniques, and Principal Component Analysis

    PubMed Central

    Shabri, Ani; Samsudin, Ruhaidah

    2014-01-01

    Crude oil prices play a significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelets and multiple linear regression (MLR) is proposed for crude oil price forecasting. In this model, the Mallat wavelet transform is first selected to decompose an original time series into several subseries with different scales. Then, principal component analysis (PCA) is used to process the subseries data in the MLR for crude oil price forecasting. Particle swarm optimization (PSO) is used to select the optimal parameters of the MLR model. To assess the effectiveness of this model, the daily West Texas Intermediate (WTI) crude oil market has been used as the case study. The time-series prediction performance of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistical measures. The experimental results show that the proposed model outperforms the individual models in forecasting the crude oil price series. PMID:24895666

  10. Editorial: Introduction to the Special Section on Causal Inference in Cross Sectional and Longitudinal Mediational Models

    PubMed Central

    West, Stephen G.

    2016-01-01

    Psychologists have long had interest in the processes through which antecedent variables produce their effects on the outcomes of ultimate interest (e.g., Woodworth's Stimulus-Organism-Response model). Models involving such mediational processes have characterized many of the important psychological theories of the 20th century and continue to the present day. However, it was not until Judd and Kenny (1981) and Baron and Kenny (1986) combined ideas from experimental design and structural equation modeling that statistical methods for directly testing such models, now known as mediation analysis, began to be developed. Methodologists have improved these statistical methods, developing new, more efficient estimators for mediated effects. They have also extended mediation analysis to multilevel data structures, models involving multiple mediators, models in which interactions occur, and an array of noncontinuous outcome measures (see MacKinnon, 2008). This work nicely maps on to key questions of applied researchers and has led to an outpouring of research testing mediational models (as of August 2011, Baron and Kenny's article had over 24,000 citations according to Google Scholar). PMID:26736046
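
    A minimal sketch of the basic product-of-coefficients mediation analysis in the Judd and Kenny / Baron and Kenny tradition: regress the mediator on the predictor, regress the outcome on both, and combine the two paths with a first-order (Sobel) standard error. The data and effect sizes are simulated assumptions; modern practice would typically use bootstrap confidence intervals instead of the Sobel test.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(9)

        # Hypothetical data: X affects Y partly through the mediator M.
        n = 500
        X = rng.normal(size=n)
        M = 0.5 * X + rng.normal(size=n)                 # a-path
        Y = 0.4 * M + 0.2 * X + rng.normal(size=n)       # b-path plus direct effect

        a_fit = sm.OLS(M, sm.add_constant(X)).fit()
        b_fit = sm.OLS(Y, sm.add_constant(np.column_stack([X, M]))).fit()

        a, sa = a_fit.params[1], a_fit.bse[1]            # X -> M
        b, sb = b_fit.params[2], b_fit.bse[2]            # M -> Y, controlling for X

        indirect = a * b                                 # mediated (indirect) effect
        sobel_se = np.sqrt(a ** 2 * sb ** 2 + b ** 2 * sa ** 2)
        print(f"indirect effect = {indirect:.3f}, Sobel z = {indirect / sobel_se:.2f}")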

  11. A statistical simulation model for field testing of non-target organisms in environmental risk assessment of genetically modified plants.

    PubMed

    Goedhart, Paul W; van der Voet, Hilko; Baldacchino, Ferdinando; Arpaia, Salvatore

    2014-04-01

    Genetic modification of plants may result in unintended effects causing potentially adverse effects on the environment. A comparative safety assessment is therefore required by authorities, such as the European Food Safety Authority, in which the genetically modified plant is compared with its conventional counterpart. Part of the environmental risk assessment is a comparative field experiment in which the effect on non-target organisms is compared. Statistical analysis of such trials comes in two flavors: difference testing and equivalence testing. It is important to know the statistical properties of these, for example, the power to detect environmental change of a given magnitude, before the start of an experiment. Such prospective power analysis can best be studied by means of a statistical simulation model. This paper describes a general framework for simulating data typically encountered in environmental risk assessment of genetically modified plants. The simulation model, available as Supplementary Material, can be used to generate count data having different statistical distributions, possibly with excess zeros. In addition, the model employs completely randomized or randomized block experiments, can be used to simulate single or multiple trials across environments, enables genotype by environment interaction by adding random variety effects, and finally includes repeated measures in time following a constant, linear or quadratic pattern, possibly with some form of autocorrelation. The model also allows a set of reference varieties to be added to the GM plant and its comparator to assess the natural variation, which can then be used to set limits of concern for equivalence testing. The different count distributions are described in some detail, and some examples of how to use the simulation model to study various aspects, including a prospective power analysis, are provided.

  12. A statistical simulation model for field testing of non-target organisms in environmental risk assessment of genetically modified plants

    PubMed Central

    Goedhart, Paul W; van der Voet, Hilko; Baldacchino, Ferdinando; Arpaia, Salvatore

    2014-01-01

    Genetic modification of plants may result in unintended effects causing potentially adverse effects on the environment. A comparative safety assessment is therefore required by authorities, such as the European Food Safety Authority, in which the genetically modified plant is compared with its conventional counterpart. Part of the environmental risk assessment is a comparative field experiment in which the effect on non-target organisms is compared. Statistical analysis of such trials comes in two flavors: difference testing and equivalence testing. It is important to know the statistical properties of these, for example, the power to detect environmental change of a given magnitude, before the start of an experiment. Such prospective power analysis can best be studied by means of a statistical simulation model. This paper describes a general framework for simulating data typically encountered in environmental risk assessment of genetically modified plants. The simulation model, available as Supplementary Material, can be used to generate count data having different statistical distributions, possibly with excess zeros. In addition, the model employs completely randomized or randomized block experiments, can be used to simulate single or multiple trials across environments, enables genotype by environment interaction by adding random variety effects, and finally includes repeated measures in time following a constant, linear or quadratic pattern, possibly with some form of autocorrelation. The model also allows a set of reference varieties to be added to the GM plant and its comparator to assess the natural variation, which can then be used to set limits of concern for equivalence testing. The different count distributions are described in some detail, and some examples of how to use the simulation model to study various aspects, including a prospective power analysis, are provided. PMID:24834325
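
    A minimal sketch of the prospective power analysis this kind of simulation framework supports: repeatedly simulate a randomized-block trial with Poisson counts and a multiplicative treatment effect, run a difference test via a Poisson GLM, and report the rejection rate. Excess zeros, repeated measures, and multi-environment trials handled by the actual framework are omitted, and all parameter values are assumptions.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(10)

        def simulate_trial(n_blocks=8, effect=0.25, base_rate=20.0, block_sd=0.3):
            # One randomized-block trial: GM and comparator plots in each block.
            rows = []
            for b in range(n_blocks):
                block_effect = rng.normal(0.0, block_sd)
                for treat in (0, 1):
                    mu = base_rate * np.exp(block_effect + effect * treat)
                    rows.append((treat, b, rng.poisson(mu)))
            treat, block, y = map(np.array, zip(*rows))
            X = np.column_stack([np.ones_like(treat, dtype=float), treat.astype(float)]
                                + [(block == k).astype(float) for k in range(1, n_blocks)])
            fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
            return fit.pvalues[1] < 0.05        # did the difference test reject?

        # Prospective power: fraction of simulated trials detecting the assumed effect.
        power = np.mean([simulate_trial() for _ in range(500)])
        print("estimated power:", round(float(power), 2))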

  13. Statistical downscaling of general-circulation-model- simulated average monthly air temperature to the beginning of flowering of the dandelion (Taraxacum officinale) in Slovenia

    NASA Astrophysics Data System (ADS)

    Bergant, Klemen; Kajfež-Bogataj, Lučka; Črepinšek, Zalika

    2002-02-01

    Phenological observations are a valuable source of information for investigating the relationship between climate variation and plant development. Potential climate change in the future will shift the occurrence of phenological phases. Information about future climate conditions is needed in order to estimate this shift. General circulation models (GCMs) provide the best information about future climate change. They are able to simulate reliably the most important mean features on a large scale, but they fail on a regional scale because of their low spatial resolution. A common approach to bridging the scale gap is statistical downscaling, which was used to relate the beginning of flowering of Taraxacum officinale in Slovenia to the monthly mean near-surface air temperature for January, February and March in Central Europe. Statistical models were developed and tested with NCAR/NCEP Reanalysis predictor data and EARS predictand data for the period 1960-1999. Prior to developing the statistical models, empirical orthogonal function (EOF) analysis was employed on the predictor data. Multiple linear regression was used to relate the beginning of flowering to the expansion coefficients of the first three EOFs for the January, February and March air temperatures, and a strong correlation was found between them. The developed statistical models were applied to the results of two GCMs (HadCM3 and ECHAM4/OPYC3) to estimate the potential shifts in the beginning of flowering for the periods 1990-2019 and 2020-2049 in comparison with the period 1960-1989. The HadCM3 model predicts, on average, a 4-day earlier occurrence and ECHAM4/OPYC3 a 5-day earlier occurrence of flowering in the period 1990-2019. The analogous results for the period 2020-2049 are a 10- and 11-day earlier occurrence.
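
    A minimal sketch of the downscaling chain described above: an EOF (principal component) analysis of a gridded temperature field, followed by multiple linear regression of the phenophase onset date on the leading expansion coefficients, evaluated by cross-validation. The gridded field, the flowering dates, and the single dominant mode are simulated assumptions, not the NCEP/NCAR or phenological data.

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.linear_model import LinearRegression
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(11)

        # Hypothetical predictor field: 40 years x 150 grid points, one dominant mode.
        n_years, n_grid = 40, 150
        mode = rng.normal(size=n_grid)
        signal = rng.normal(size=n_years)
        field = np.outer(signal, mode) + 0.5 * rng.normal(size=(n_years, n_grid))

        # Hypothetical flowering day of year, tied to the large-scale signal.
        flowering = 110.0 - 4.0 * signal + rng.normal(0.0, 2.0, size=n_years)

        # EOF analysis: keep the first three expansion coefficients as predictors.
        pca = PCA(n_components=3)
        scores = pca.fit_transform(field)

        # Multiple linear regression of the phenophase onset on the EOF coefficients.
        reg = LinearRegression()
        cv_r2 = cross_val_score(reg, scores, flowering, cv=5, scoring="r2").mean()
        reg.fit(scores, flowering)
        print("EOF explained variance:", np.round(pca.explained_variance_ratio_, 2))
        print("cross-validated R^2   :", round(float(cv_r2), 2))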

  14. Time interval between successive trading in foreign currency market: from microscopic to macroscopic

    NASA Astrophysics Data System (ADS)

    Sato, Aki-Hiro

    2004-12-01

    Recently, it has been shown that the inter-transaction interval (ITI) distribution of foreign currency rates has a fat tail. In order to understand the statistical properties of the ITI, a dealer model with N interacting agents is proposed. Numerical simulations confirm that the ITI distribution of the dealer model has a power-law tail. A random multiplicative process (RMP) can be approximately derived from the ITI of the dealer model. Consequently, we conclude that the power-law tail of the ITI distribution of the dealer model is a result of the RMP.

  15. Empirical-statistical downscaling of reanalysis data to high-resolution air temperature and specific humidity above a glacier surface (Cordillera Blanca, Peru)

    NASA Astrophysics Data System (ADS)

    Hofer, Marlis; MöLg, Thomas; Marzeion, Ben; Kaser, Georg

    2010-06-01

    Recently initiated observation networks in the Cordillera Blanca (Peru) provide temporally high-resolution, yet short-term, atmospheric data. The aim of this study is to extend the existing time series into the past. We present an empirical-statistical downscaling (ESD) model that links 6-hourly National Centers for Environmental Prediction (NCEP)/National Center for Atmospheric Research (NCAR) reanalysis data to air temperature and specific humidity, measured at the tropical glacier Artesonraju (northern Cordillera Blanca). The ESD modeling procedure includes combined empirical orthogonal function and multiple regression analyses and a double cross-validation scheme for model evaluation. Apart from the selection of predictor fields, the modeling procedure is automated and does not include subjective choices. We assess the ESD model sensitivity to the predictor choice using both single-field and mixed-field predictors. Statistical transfer functions are derived individually for different months and times of day. The forecast skill largely depends on month and time of day, ranging from 0 to 0.8. The mixed-field predictors perform better than the single-field predictors. The ESD model shows added value, at all time scales, against simpler reference models (e.g., the direct use of reanalysis grid point values). The ESD model forecast 1960-2008 clearly reflects interannual variability related to the El Niño/Southern Oscillation but is sensitive to the chosen predictor type.

  16. Inherited genetic variants associated with occurrence of multiple primary melanoma.

    PubMed

    Gibbs, David C; Orlow, Irene; Kanetsky, Peter A; Luo, Li; Kricker, Anne; Armstrong, Bruce K; Anton-Culver, Hoda; Gruber, Stephen B; Marrett, Loraine D; Gallagher, Richard P; Zanetti, Roberto; Rosso, Stefano; Dwyer, Terence; Sharma, Ajay; La Pilla, Emily; From, Lynn; Busam, Klaus J; Cust, Anne E; Ollila, David W; Begg, Colin B; Berwick, Marianne; Thomas, Nancy E

    2015-06-01

    Recent studies, including genome-wide association studies, have identified several putative low-penetrance susceptibility loci for melanoma. We sought to determine their generalizability to genetic predisposition for multiple primary melanoma in the international population-based Genes, Environment, and Melanoma (GEM) Study. GEM is a case-control study of 1,206 incident cases of multiple primary melanoma and 2,469 incident first primary melanoma participants as the control group. We investigated the odds of developing multiple primary melanoma for 47 SNPs from 21 distinct genetic regions previously reported to be associated with melanoma. ORs and 95% confidence intervals were determined using logistic regression models adjusted for baseline features (age, sex, age by sex interaction, and study center). We investigated univariable models and built multivariable models to assess independent effects of SNPs. Eleven SNPs in 6 gene neighborhoods (TERT/CLPTM1L, TYRP1, MTAP, TYR, NCOA6, and MX2) and a PARP1 haplotype were associated with multiple primary melanoma. In a multivariable model that included only the most statistically significant findings from univariable modeling and adjusted for pigmentary phenotype, back nevi, and baseline features, we found TERT/CLPTM1L rs401681 (P = 0.004), TYRP1 rs2733832 (P = 0.006), MTAP rs1335510 (P = 0.0005), TYR rs10830253 (P = 0.003), and MX2 rs45430 (P = 0.008) to be significantly associated with multiple primary melanoma, while NCOA6 rs4911442 approached significance (P = 0.06). The GEM Study provides additional evidence for the relevance of these genetic regions to melanoma risk and estimates the magnitude of the observed genetic effect on development of subsequent primary melanoma. ©2015 American Association for Cancer Research.
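    The per-SNP analysis described above is a covariate-adjusted logistic regression. The following sketch shows the general form of such a model with simulated genotypes and invented covariate names and effect sizes; it is an assumption-laden illustration, not the GEM Study analysis.

```python
# Hedged sketch: odds of multiple primary vs. first primary melanoma as a
# function of minor-allele count, adjusted for baseline covariates.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 2000
df = pd.DataFrame({
    "snp":    rng.integers(0, 3, n),            # 0/1/2 minor-allele count
    "age":    rng.normal(55, 12, n),
    "sex":    rng.integers(0, 2, n),
    "center": rng.integers(0, 4, n),
})
logit_p = -1.0 + 0.25 * df.snp + 0.01 * (df.age - 55)
df["case"] = rng.binomial(1, (1 / (1 + np.exp(-logit_p))).to_numpy())

X = pd.get_dummies(df[["snp", "age", "sex", "center"]], columns=["center"], drop_first=True)
X["age_x_sex"] = df.age * df.sex                # age-by-sex interaction, as in the abstract
X = sm.add_constant(X.astype(float))

fit = sm.Logit(df["case"], X).fit(disp=False)
or_snp = np.exp(fit.params["snp"])
ci = np.exp(fit.conf_int().loc["snp"])
print(f"per-allele OR = {or_snp:.2f}, 95% CI ({ci[0]:.2f}, {ci[1]:.2f}), p = {fit.pvalues['snp']:.3g}")
```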

  17. Inherited genetic variants associated with occurrence of multiple primary melanoma

    PubMed Central

    Gibbs, David C.; Orlow, Irene; Kanetsky, Peter A.; Luo, Li; Kricker, Anne; Armstrong, Bruce K.; Anton-Culver, Hoda; Gruber, Stephen B.; Marrett, Loraine D.; Gallagher, Richard P.; Zanetti, Roberto; Rosso, Stefano; Dwyer, Terence; Sharma, Ajay; La Pilla, Emily; From, Lynn; Busam, Klaus J.; Cust, Anne E.; Ollila, David W.; Begg, Colin B.; Berwick, Marianne; Thomas, Nancy E.

    2015-01-01

    Recent studies including genome-wide association studies have identified several putative low-penetrance susceptibility loci for melanoma. We sought to determine their generalizability to genetic predisposition for multiple primary melanoma in the international population-based Genes, Environment, and Melanoma (GEM) Study. GEM is a case-control study of 1,206 incident cases of multiple primary melanoma and 2,469 incident first primary melanoma participants as the control group. We investigated the odds of developing multiple primary melanoma for 47 single nucleotide polymorphisms (SNP) from 21 distinct genetic regions previously reported to be associated with melanoma. ORs and 95% CIs were determined using logistic regression models adjusted for baseline features (age, sex, age by sex interaction, and study center). We investigated univariable models and built multivariable models to assess independent effects of SNPs. Eleven SNPs in 6 gene neighborhoods (TERT/CLPTM1L, TYRP1, MTAP, TYR, NCOA6, and MX2) and a PARP1 haplotype were associated with multiple primary melanoma. In a multivariable model that included only the most statistically significant findings from univariable modeling and adjusted for pigmentary phenotype, back nevi, and baseline features, we found TERT/CLPTM1L rs401681 (P = 0.004), TYRP1 rs2733832 (P = 0.006), MTAP rs1335510 (P = 0.0005), TYR rs10830253 (P = 0.003), and MX2 rs45430 (P = 0.008) to be significantly associated with multiple primary melanoma while NCOA6 rs4911442 approached significance (P = 0.06). The GEM study provides additional evidence for the relevance of these genetic regions to melanoma risk and estimates the magnitude of the observed genetic effect on development of subsequent primary melanoma. PMID:25837821

  18. On temporal stochastic modeling of precipitation, nesting models across scales

    NASA Astrophysics Data System (ADS)

    Paschalis, Athanasios; Molnar, Peter; Fatichi, Simone; Burlando, Paolo

    2014-01-01

    We analyze the performance of composite stochastic models of temporal precipitation which can satisfactorily reproduce precipitation properties across a wide range of temporal scales. The rationale is that a combination of stochastic precipitation models which are most appropriate for specific limited temporal scales leads to better overall performance across a wider range of scales than single models alone. We investigate different model combinations. For the coarse (daily) scale these are models based on alternating renewal processes, Markov chains, and Poisson cluster models, which are then combined with a microcanonical Multiplicative Random Cascade model to disaggregate precipitation to finer (minute) scales. The composite models were tested on data at four sites in different climates. The results show that model combinations improve the performance in key statistics such as probability distributions of precipitation depth, autocorrelation structure, intermittency, and reproduction of extremes, compared with single models. At the same time they remain reasonably parsimonious. No model combination was found to outperform the others at all sites and for all statistics; however, we provide insight into the capabilities of specific model combinations. The results for the four different climates are similar, which suggests a degree of generality and wider applicability of the approach.
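    The disaggregation step can be illustrated with a generic microcanonical multiplicative random cascade: a coarse total is repeatedly split into two children whose weights sum to one, so mass is conserved exactly at every level. The weight distribution and parameters below are assumptions for illustration, not the values fitted in the study.

```python
# Illustrative microcanonical multiplicative random cascade for disaggregating
# a daily precipitation total down to finer time steps while conserving mass.
import numpy as np

def disaggregate(total, levels, rng, a=1.4):
    """Split `total` over 2**levels intervals with Beta-distributed weights."""
    series = np.array([total], dtype=float)
    for _ in range(levels):
        w = rng.beta(a, a, size=series.size)               # weight of the left child
        series = np.column_stack([series * w, series * (1 - w)]).ravel()
    return series

rng = np.random.default_rng(3)
daily_total = 24.0                                         # mm of rain in one day
fine = disaggregate(daily_total, levels=8, rng=rng)        # 256 intervals (~5.6 min each)
print(fine.size, "intervals, mass conserved:", np.isclose(fine.sum(), daily_total))
```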

  19. A Simple Illustration for the Need of Multiple Comparison Procedures

    ERIC Educational Resources Information Center

    Carter, Rickey E.

    2010-01-01

    Statistical adjustments to accommodate multiple comparisons are routinely covered in introductory statistical courses. The fundamental rationale for such adjustments, however, may not be readily understood. This article presents a simple illustration to help remedy this.
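    A quick simulation makes the rationale concrete: with many tests of true null hypotheses, the chance of at least one spuriously small p-value grows well beyond the nominal level, and a simple Bonferroni adjustment restores it. The numbers below are invented for illustration and are not taken from the article.

```python
# Family-wise error rate with and without a Bonferroni adjustment,
# estimated by simulating 20 independent t-tests where every null is true.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n_sim, n_tests, n_obs = 2000, 20, 30
any_unadjusted, any_bonferroni = 0, 0
for _ in range(n_sim):
    pvals = np.array([
        stats.ttest_ind(rng.normal(size=n_obs), rng.normal(size=n_obs)).pvalue
        for _ in range(n_tests)
    ])
    any_unadjusted += (pvals < 0.05).any()
    any_bonferroni += (pvals < 0.05 / n_tests).any()

print("family-wise error, unadjusted:", any_unadjusted / n_sim)   # roughly 0.6
print("family-wise error, Bonferroni:", any_bonferroni / n_sim)   # roughly 0.05
```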

  20. Effects of Ensemble Configuration on Estimates of Regional Climate Uncertainties

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Goldenson, N.; Mauger, G.; Leung, L. R.

    Internal variability in the climate system can contribute substantial uncertainty in climate projections, particularly at regional scales. Internal variability can be quantified using large ensembles of simulations that are identical but for perturbed initial conditions. Here we compare methods for quantifying internal variability. Our study region spans the west coast of North America, which is strongly influenced by El Niño and other large-scale dynamics through their contribution to large-scale internal variability. Using a statistical framework to simultaneously account for multiple sources of uncertainty, we find that internal variability can be quantified consistently using a large ensemble or an ensemble of opportunity that includes small ensembles from multiple models and climate scenarios. The latter also produces estimates of uncertainty due to model differences. We conclude that projection uncertainties are best assessed using small single-model ensembles from as many model-scenario pairings as computationally feasible, which has implications for ensemble design in large modeling efforts.

  1. Artificial neural networks and multiple linear regression model using principal components to estimate rainfall over South America

    NASA Astrophysics Data System (ADS)

    Soares dos Santos, T.; Mendes, D.; Rodrigues Torres, R.

    2016-01-01

    Several studies have been devoted to dynamic and statistical downscaling for analysis of both climate variability and climate change. This paper introduces an application of artificial neural networks (ANNs) and multiple linear regression (MLR) by principal components to estimate rainfall in South America. This method is proposed for downscaling monthly precipitation time series over South America for three regions: the Amazon; northeastern Brazil; and the La Plata Basin, which is one of the regions of the planet that will be most affected by the climate change projected for the end of the 21st century. The downscaling models were developed and validated using CMIP5 model output and observed monthly precipitation. We used general circulation model (GCM) experiments for the 20th century (RCP historical; 1970-1999) and two scenarios (RCP 2.6 and 8.5; 2070-2100). The model test results indicate that the ANNs significantly outperform the MLR downscaling of monthly precipitation variability.
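    The comparison described above boils down to regressing a local variable on principal components of large-scale predictors, once with a linear model and once with a small neural network. The sketch below uses synthetic low-rank predictor fields and an invented nonlinear target; architecture and sizes are assumptions, not the paper's configuration.

```python
# Sketch: multiple linear regression vs. a small feed-forward network,
# both driven by principal components of the large-scale predictors.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
n_months, n_grid = 360, 200
latent = rng.normal(size=(n_months, 5))                   # large-scale modes
loadings = rng.normal(size=(5, n_grid))
predictors = latent @ loadings + 0.5 * rng.normal(size=(n_months, n_grid))
rain = np.exp(0.8 * latent[:, 0]) + latent[:, 1] ** 2 + rng.normal(scale=0.3, size=n_months)

pcs = PCA(n_components=10).fit_transform(predictors)
X_tr, X_te, y_tr, y_te = train_test_split(pcs, rain, test_size=0.3, random_state=0)

mlr = LinearRegression().fit(X_tr, y_tr)
ann = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0).fit(X_tr, y_tr)
print("MLR R^2:", round(mlr.score(X_te, y_te), 3))
print("ANN R^2:", round(ann.score(X_te, y_te), 3))
```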

  2. Collaborative filtering on a family of biological targets.

    PubMed

    Erhan, Dumitru; L'heureux, Pierre-Jean; Yue, Shi Yi; Bengio, Yoshua

    2006-01-01

    Building a QSAR model of a new biological target for which few screening data are available is a statistical challenge. However, the new target may be part of a bigger family, for which we have more screening data. Collaborative filtering or, more generally, multi-task learning, is a machine learning approach that improves the generalization performance of an algorithm by using information from related tasks as an inductive bias. We use collaborative filtering techniques for building predictive models that link multiple targets to multiple examples. The more commonalities between the targets, the better the multi-target model that can be built. We show an example of a multi-target neural network that can use family information to produce a predictive model of an undersampled target. We evaluate JRank, a kernel-based method designed for collaborative filtering. We show their performance on compound prioritization for an HTS campaign and the underlying shared representation between targets. JRank outperformed the neural network both in the single- and multi-target models.

  3. Genetic Programming Transforms in Linear Regression Situations

    NASA Astrophysics Data System (ADS)

    Castillo, Flor; Kordon, Arthur; Villa, Carlos

    The chapter summarizes the use of Genetic Programming (GP) in Multiple Linear Regression (MLR) to address multicollinearity and Lack of Fit (LOF). The basis of the proposed method is applying appropriate input transforms (model respecification) that deal with these issues while preserving the information content of the original variables. The transforms are selected from symbolic regression models with optimal trade-off between accuracy of prediction and expressional complexity, generated by multiobjective Pareto-front GP. The chapter includes a comparative study of the GP-generated transforms with Ridge Regression, a variant of ordinary Multiple Linear Regression, which has been a useful and commonly employed approach for reducing multicollinearity. The advantages of GP-generated model respecification are clearly defined and demonstrated. Some recommendations for transforms selection are given as well. The application benefits of the proposed approach are illustrated with a real industrial application in one of the broadest empirical modeling areas in manufacturing - robust inferential sensors. The chapter contributes to increasing the awareness of the potential of GP in statistical model building by MLR.

  4. Kernel canonical-correlation Granger causality for multiple time series

    NASA Astrophysics Data System (ADS)

    Wu, Guorong; Duan, Xujun; Liao, Wei; Gao, Qing; Chen, Huafu

    2011-04-01

    Canonical-correlation analysis as a multivariate statistical technique has been applied to multivariate Granger causality analysis to infer information flow in complex systems. It shows unique appeal and great superiority over the traditional vector autoregressive method, due to the simplified procedure that detects causal interaction between multiple time series, and the avoidance of potential model estimation problems. However, it is limited to the linear case. Here, we extend the framework of canonical correlation to include the estimation of multivariate nonlinear Granger causality for drawing inference about directed interaction. Its feasibility and effectiveness are verified on simulated data.

  5. Forecasting influenza in Hong Kong with Google search queries and statistical model fusion.

    PubMed

    Xu, Qinneng; Gel, Yulia R; Ramirez Ramirez, L Leticia; Nezafati, Kusha; Zhang, Qingpeng; Tsui, Kwok-Leung

    2017-01-01

    The objective of this study is to investigate predictive utility of online social media and web search queries, particularly, Google search data, to forecast new cases of influenza-like-illness (ILI) in general outpatient clinics (GOPC) in Hong Kong. To mitigate the impact of sensitivity to self-excitement (i.e., fickle media interest) and other artifacts of online social media data, in our approach we fuse multiple offline and online data sources. Four individual models: generalized linear model (GLM), least absolute shrinkage and selection operator (LASSO), autoregressive integrated moving average (ARIMA), and deep learning (DL) with Feedforward Neural Networks (FNN) are employed to forecast ILI-GOPC both one week and two weeks in advance. The covariates include Google search queries, meteorological data, and previously recorded offline ILI. To our knowledge, this is the first study that introduces deep learning methodology into surveillance of infectious diseases and investigates its predictive utility. Furthermore, to exploit the strength from each individual forecasting model, we use statistical model fusion, using Bayesian model averaging (BMA), which allows a systematic integration of multiple forecast scenarios. For each model, an adaptive approach is used to capture the recent relationship between ILI and covariates. DL with FNN appears to deliver the most competitive predictive performance among the four considered individual models. Combining all four models in a comprehensive BMA framework further improves predictive evaluation metrics such as root mean squared error (RMSE) and mean absolute predictive error (MAPE). Nevertheless, DL with FNN remains the preferred method for predicting locations of influenza peaks. The proposed approach can be viewed as a feasible alternative to forecast ILI in Hong Kong or other countries where ILI has no constant seasonal trend and influenza data resources are limited. The proposed methodology is easily tractable and computationally efficient.
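    The intuition behind model fusion can be shown with a much simpler scheme than the paper's BMA: forecasts are combined with weights derived from their past errors. The sketch below is only a stand-in illustration of why fusing heterogeneous forecasters can beat each one alone; the forecast series, error levels, and weighting rule are all invented.

```python
# Simplified inverse-error weighting of four hypothetical forecasters
# (a stand-in for Bayesian model averaging, not the paper's BMA).
import numpy as np

rng = np.random.default_rng(6)
weeks = 52
truth = 100 + 20 * np.sin(np.arange(weeks) / 6) + rng.normal(0, 3, weeks)

# Pretend these are held-out one-week-ahead forecasts from GLM, LASSO, ARIMA, DL
forecasts = {
    "GLM":   truth + rng.normal(0, 8, weeks),
    "LASSO": truth + rng.normal(0, 6, weeks),
    "ARIMA": truth + rng.normal(3, 5, weeks),
    "DL":    truth + rng.normal(0, 4, weeks),
}

# In practice the weights would be computed on held-out data, not reused in-sample
mse = {name: np.mean((f - truth) ** 2) for name, f in forecasts.items()}
total = sum(1 / m for m in mse.values())
weights = {name: (1 / m) / total for name, m in mse.items()}

fused = sum(w * forecasts[name] for name, w in weights.items())
print("weights:", {k: round(v, 2) for k, v in weights.items()})
print("fused RMSE:      ", round(np.sqrt(np.mean((fused - truth) ** 2)), 2))
print("best single RMSE:", round(np.sqrt(min(mse.values())), 2))
```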

  6. Decision-making for foot-and-mouth disease control: Objectives matter

    USGS Publications Warehouse

    Probert, William J. M.; Shea, Katriona; Fonnesbeck, Christopher J.; Runge, Michael C.; Carpenter, Tim E.; Durr, Salome; Garner, M. Graeme; Harvey, Neil; Stevenson, Mark A.; Webb, Colleen T.; Werkman, Marleen; Tildesley, Michael J.; Ferrari, Matthew J.

    2016-01-01

    Formal decision-analytic methods can be used to frame disease control problems, the first step of which is to define a clear and specific objective. We demonstrate the imperative of framing clearly-defined management objectives in finding optimal control actions for control of disease outbreaks. We illustrate an analysis that can be applied rapidly at the start of an outbreak when there are multiple stakeholders involved with potentially multiple objectives, and when there are also multiple disease models upon which to compare control actions. The output of our analysis frames subsequent discourse between policy-makers, modellers and other stakeholders, by highlighting areas of discord among different management objectives and also among different models used in the analysis. We illustrate this approach in the context of a hypothetical foot-and-mouth disease (FMD) outbreak in Cumbria, UK using outputs from five rigorously-studied simulation models of FMD spread. We present both relative rankings and relative performance of controls within each model and across a range of objectives. Results illustrate how control actions change across both the base metric used to measure management success and across the statistic used to rank control actions according to said metric. This work represents a first step towards reconciling the extensive modelling work on disease control problems with frameworks for structured decision making.

  7. Occupational exposure to methylene chloride and risk of cancer: a meta-analysis.

    PubMed

    Liu, Tao; Xu, Qin-er; Zhang, Chuan-hui; Zhang, Peng

    2013-12-01

    We searched MEDLINE and EMBASE for epidemiologic studies on occupational exposure to methylene chloride and risk of cancer. Estimates of study-specific odds ratios (ORs) were calculated using inverse-variance-weighted fixed-effects models and random-effects models. Statistical tests for heterogeneity were applied. We summarized data from five cohort studies and 13 case-control studies. The pooled OR for multiple myeloma in relation to occupational exposure to methylene chloride was 2.04 (95% CI 1.31-3.17), but not for non-Hodgkin's lymphoma, leukemia, breast, bronchus, trachea and lung, brain and other CNS, biliary passages and liver, prostate, pancreas, and rectum. Furthermore, we focused on specific outcomes for non-Hodgkin's lymphoma and multiple myeloma because of exposure misclassification. The pooled OR for non-Hodgkin's lymphoma and multiple myeloma was 1.42 (95% CI 1.10-1.83), with a moderate degree of heterogeneity among the studies (I² = 26.9%, p = 0.205). We found an excess risk of multiple myeloma. Non-Hodgkin's lymphoma and leukemia, which showed weak effects, should be investigated further.
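    The fixed-effects pooling used in such meta-analyses is short enough to write out. The study-specific ORs and confidence limits below are made up purely for illustration; only the formulas (inverse-variance weighting, Cochran's Q, I²) are standard.

```python
# Minimal inverse-variance-weighted fixed-effects pooling of log odds ratios.
import numpy as np
from scipy import stats

ors      = np.array([1.8, 2.3, 1.5])          # hypothetical study-specific ORs
ci_upper = np.array([3.1, 4.6, 2.9])          # their upper 95% confidence limits

log_or = np.log(ors)
se = (np.log(ci_upper) - log_or) / 1.96       # back out standard errors on the log scale
w = 1 / se ** 2                               # inverse-variance weights

pooled_log = np.sum(w * log_or) / np.sum(w)
pooled_se = np.sqrt(1 / np.sum(w))
lo, hi = np.exp(pooled_log - 1.96 * pooled_se), np.exp(pooled_log + 1.96 * pooled_se)
print(f"pooled OR = {np.exp(pooled_log):.2f} (95% CI {lo:.2f}-{hi:.2f})")

# Cochran's Q and I^2 give the heterogeneity figures quoted in such reports
Q = np.sum(w * (log_or - pooled_log) ** 2)
I2 = max(0.0, (Q - (len(ors) - 1)) / Q) * 100
print(f"Q = {Q:.2f} (p = {stats.chi2.sf(Q, len(ors) - 1):.3f}), I^2 = {I2:.1f}%")
```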

  8. A simple rapid approach using coupled multivariate statistical methods, GIS and trajectory models to delineate areas of common oil spill risk

    NASA Astrophysics Data System (ADS)

    Guillen, George; Rainey, Gail; Morin, Michelle

    2004-04-01

    Currently, the Minerals Management Service uses the Oil Spill Risk Analysis model (OSRAM) to predict the movement of potential oil spills greater than 1000 bbl originating from offshore oil and gas facilities. OSRAM generates oil spill trajectories using meteorological and hydrological data input from either actual physical measurements or estimates generated from other hydrological models. OSRAM and many other models produce output matrices of average, maximum and minimum contact probabilities to specific landfall or target segments (columns) from oil spills at specific points (rows). Analysts and managers are often interested in identifying geographic areas or groups of facilities that pose similar risks to specific targets or groups of targets if a spill occurred. Unfortunately, due to the potentially large matrix generated by many spill models, this question is difficult to answer without the use of data reduction and visualization methods. In our study we utilized a multivariate statistical method called cluster analysis to group areas of similar risk based on potential distribution of landfall target trajectory probabilities. We also utilized ArcView™ GIS to display spill launch point groupings. The combination of GIS and multivariate statistical techniques in the post-processing of trajectory model output is a powerful tool for identifying and delineating areas of similar risk from multiple spill sources. We strongly encourage modelers, statistical and GIS software programmers to closely collaborate to produce a more seamless integration of these technologies and approaches to analyzing data. They are complementary methods that strengthen the overall assessment of spill risks.
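    The clustering step can be sketched generically: rows of a launch-point by landfall-target probability matrix are grouped so that launch points with similar risk profiles fall together. The matrix below is random and the choice of Ward linkage and four clusters is an assumption for illustration only.

```python
# Hierarchical clustering of launch points by their landfall-probability profiles.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(7)
n_launch_points, n_targets = 30, 12
prob = rng.dirichlet(np.ones(n_targets), size=n_launch_points)   # each row sums to 1

Z = linkage(pdist(prob, metric="euclidean"), method="ward")      # Ward linkage tree
groups = fcluster(Z, t=4, criterion="maxclust")                  # cut into 4 risk groups
for g in np.unique(groups):
    print(f"group {g}: launch points {np.where(groups == g)[0].tolist()}")
```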

  9. Statistical and dynamical modeling of heavy-ion fusion-fission reactions

    NASA Astrophysics Data System (ADS)

    Eslamizadeh, H.; Razazzadeh, H.

    2018-02-01

    A modified statistical model and a four-dimensional dynamical model based on Langevin equations have been used to simulate the fission process of the excited compound nuclei 207At and 216Ra produced in the 19F + 188Os and 19F + 197Au fusion reactions. The evaporation residue cross section, the fission cross section, the pre-scission neutron, proton and alpha multiplicities and the anisotropy of the fission fragment angular distribution have been calculated for the excited compound nuclei 207At and 216Ra. In the modified statistical model the effects of spin K about the symmetry axis and temperature have been considered in calculations of the fission widths and the potential energy surfaces. It was shown that the modified statistical model can reproduce the above-mentioned experimental data by using appropriate values of the temperature coefficient of the effective potential equal to λ = 0.0180 ± 0.0055, 0.0080 ± 0.0030 MeV-2 and the scaling factor of the fission barrier height equal to rs = 1.0015 ± 0.0025, 1.0040 ± 0.0020 for the compound nuclei 207At and 216Ra, respectively. Three collective shape coordinates plus the projection of total spin of the compound nucleus on the symmetry axis, K, were considered in the four-dimensional dynamical model. In the dynamical calculations, dissipation was generated through the chaos-weighted wall and window friction formula. Comparison of the theoretical results with the experimental data showed that both models make it possible to satisfactorily reproduce the above-mentioned experimental data for the excited compound nuclei 207At and 216Ra.

  10. Estimation of the residual bromine concentration after disinfection of cooling water by statistical evaluation.

    PubMed

    Megalopoulos, Fivos A; Ochsenkuehn-Petropoulou, Maria T

    2015-01-01

    A statistical model based on multiple linear regression is developed, to estimate the bromine residual that can be expected after the bromination of cooling water. Make-up water sampled from a power plant in the Greek territory was used for the creation of the various cooling water matrices under investigation. The amount of bromine fed to the circuit, as well as other important operational parameters such as concentration at the cooling tower, temperature, organic load and contact time are taken as the independent variables. It is found that the highest contribution to the model's predictive ability comes from cooling water's organic load concentration, followed by the amount of bromine fed to the circuit, the water's mean temperature, the duration of the bromination period and finally its conductivity. Comparison of the model results with the experimental data confirms its ability to predict residual bromine given specific bromination conditions.

  11. Multiple Statistical Models Based Analysis of Causative Factors and Loess Landslides in Tianshui City, China

    NASA Astrophysics Data System (ADS)

    Su, Xing; Meng, Xingmin; Ye, Weilin; Wu, Weijiang; Liu, Xingrong; Wei, Wanhong

    2018-03-01

    Tianshui City is one of the mountainous cities that are threatened by severe geo-hazards in Gansu Province, China. Statistical probability models have been widely used in analyzing and evaluating geo-hazards such as landslides. In this research, three approaches (Certainty Factor Method, Weight of Evidence Method and Information Quantity Method) were adopted to quantitatively analyze the relationship between the causative factors and the landslides. The source data used in this study include the SRTM DEM and local geological maps at a scale of 1:200,000. Twelve causative factors (i.e., altitude, slope, aspect, curvature, plan curvature, profile curvature, roughness, relief amplitude, distance to rivers, distance to faults, distance to roads, and stratum lithology) were selected for correlation analysis after a thorough investigation of geological conditions and historical landslides. The results indicate that the outcomes of the three models are fairly consistent.
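    Of the three approaches named above, the Information Quantity (information value) method is the simplest to write down: for each class of a causative factor, take the log ratio of the landslide density in that class to the overall landslide density. The class counts below are invented for illustration.

```python
# Information value of the classes of one causative factor (e.g. slope angle).
import numpy as np

class_area      = np.array([400.0, 300.0, 200.0, 100.0])   # km^2 per class
class_landslide = np.array([  8.0,  15.0,  20.0,  12.0])   # landslide cells per class

total_area, total_ls = class_area.sum(), class_landslide.sum()
info_value = np.log((class_landslide / class_area) / (total_ls / total_area))
for i, iv in enumerate(info_value):
    print(f"slope class {i}: information value = {iv:+.3f}")
# Positive values flag classes where landslides are over-represented; summing the
# values of all factor classes present in a cell gives its susceptibility score.
```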

  12. Fission time scale from pre-scission neutron and α multiplicities in the 16O + 194Pt reaction

    NASA Astrophysics Data System (ADS)

    Kapoor, K.; Verma, S.; Sharma, P.; Mahajan, R.; Kaur, N.; Kaur, G.; Behera, B. R.; Singh, K. P.; Kumar, A.; Singh, H.; Dubey, R.; Saneesh, N.; Jhingan, A.; Sugathan, P.; Mohanto, G.; Nayak, B. K.; Saxena, A.; Sharma, H. P.; Chamoli, S. K.; Mukul, I.; Singh, V.

    2017-11-01

    Pre- and post-scission α-particle multiplicities have been measured for the reaction 16O + 194Pt at 98.4 MeV, forming the 210Rn compound nucleus. α particles were measured at various angles in coincidence with the fission fragments. The moving-source technique was used to extract the pre- and post-scission contributions to the particle multiplicity. Study of the fission mechanism using different probes is helpful in understanding the detailed reaction dynamics. The neutron multiplicities for this reaction have been reported earlier. The multiplicities of neutrons and α particles were reproduced using the standard statistical model code joanne2 by varying the transient (τtr) and saddle-to-scission (τssc) times. This code includes deformation-dependent particle transmission coefficients, binding energies and level densities. Fission time scales of the order of 50-65 × 10-21 s are required to reproduce the neutron and α-particle multiplicities.

  13. Statistical Comparison of Spike Responses to Natural Stimuli in Monkey Area V1 With Simulated Responses of a Detailed Laminar Network Model for a Patch of V1

    PubMed Central

    Schuch, Klaus; Logothetis, Nikos K.; Maass, Wolfgang

    2011-01-01

    A major goal of computational neuroscience is the creation of computer models for cortical areas whose response to sensory stimuli resembles that of cortical areas in vivo in important aspects. It is seldom considered whether the simulated spiking activity is realistic (in a statistical sense) in response to natural stimuli. Because certain statistical properties of spike responses were suggested to facilitate computations in the cortex, acquiring a realistic firing regimen in cortical network models might be a prerequisite for analyzing their computational functions. We present a characterization and comparison of the statistical response properties of the primary visual cortex (V1) in vivo and in silico in response to natural stimuli. We recorded from multiple electrodes in area V1 of 4 macaque monkeys and developed a large state-of-the-art network model for a 5 × 5-mm patch of V1 composed of 35,000 neurons and 3.9 million synapses that integrates previously published anatomical and physiological details. By quantitative comparison of the model response to the “statistical fingerprint” of responses in vivo, we find that our model for a patch of V1 responds to the same movie in a way which matches the statistical structure of the recorded data surprisingly well. The deviation between the firing regimen of the model and the in vivo data is on the same level as deviations among monkeys and sessions. This suggests that, despite strong simplifications and abstractions of cortical network models, they are nevertheless capable of generating realistic spiking activity. To reach a realistic firing state, it was not only necessary to include both N-methyl-d-aspartate and GABAB synaptic conductances in our model, but also to markedly increase the strength of excitatory synapses onto inhibitory neurons (>2-fold) in comparison to literature values, hinting at the importance of carefully adjusting the effect of inhibition for achieving realistic dynamics in current network models. PMID:21106898

  14. Sensitivity Analysis of Multiple Informant Models When Data are Not Missing at Random

    PubMed Central

    Blozis, Shelley A.; Ge, Xiaojia; Xu, Shu; Natsuaki, Misaki N.; Shaw, Daniel S.; Neiderhiser, Jenae; Scaramella, Laura; Leve, Leslie; Reiss, David

    2014-01-01

    Missing data are common in studies that rely on multiple informant data to evaluate relationships among variables for distinguishable individuals clustered within groups. Estimation of structural equation models using raw data allows for incomplete data, and so all groups may be retained even if only one member of a group contributes data. Statistical inference is based on the assumption that data are missing completely at random or missing at random. Importantly, whether or not data are missing is assumed to be independent of the missing data. A saturated correlates model that incorporates correlates of the missingness or the missing data into an analysis and multiple imputation that may also use such correlates offer advantages over the standard implementation of SEM when data are not missing at random because these approaches may result in a data analysis problem for which the missingness is ignorable. This paper considers these approaches in an analysis of family data to assess the sensitivity of parameter estimates to assumptions about missing data, a strategy that may be easily implemented using SEM software. PMID:25221420

  15. Health Service Access across Racial/Ethnic Groups of Children in the Child Welfare System

    ERIC Educational Resources Information Center

    Wells, Rebecca; Hillemeier, Marianne M.; Bai, Yu; Belue, Rhonda

    2009-01-01

    Objective: This study examined health service access among children of different racial/ethnic groups in the child welfare system in an attempt to identify and explain disparities. Methods: Data were from the National Survey of Child and Adolescent Well-Being (NSCAW). N for descriptive statistics = 2,505. N for multiple regression model = 537.…

  16. catcher: A Software Program to Detect Answer Copying in Multiple-Choice Tests Based on Nominal Response Model

    ERIC Educational Resources Information Center

    Kalender, Ilker

    2012-01-01

    catcher is a software program designed to compute the ω index, a common statistical index for the identification of collusions (cheating) among examinees taking an educational or psychological test. It requires (a) responses and (b) ability estimations of individuals, and (c) item parameters to make computations and outputs the results of…

  17. Influence of Family Communication Structure and Vanity Trait on Consumption Behavior: A Case Study of Adolescent Students in Taiwan

    ERIC Educational Resources Information Center

    Chang, Wei-Lung; Liu, Hsiang-Te; Lin, Tai-An; Wen, Yung-Sung

    2008-01-01

    The purpose of this research was to study the relationship between family communication structure, vanity trait, and related consumption behavior. The study used an empirical method with adolescent students from the northern part of Taiwan as the subjects. Multiple statistical methods and the SEM model were used for testing the hypotheses. The…

  18. Statistical methods and regression analysis of stratospheric ozone and meteorological variables in Isfahan

    NASA Astrophysics Data System (ADS)

    Hassanzadeh, S.; Hosseinibalam, F.; Omidvari, M.

    2008-04-01

    Data of seven meteorological variables (relative humidity, wet temperature, dry temperature, maximum temperature, minimum temperature, ground temperature and sun radiation time) and ozone values have been used for statistical analysis. Meteorological variables and ozone values were analyzed using both multiple linear regression and principal component methods. Data for the period 1999-2004 are analyzed jointly using both methods. For all periods, temperature dependent variables were highly correlated, but were all negatively correlated with relative humidity. Multiple regression analysis was used to fit the meteorological variables using the meteorological variables as predictors. A variable selection method based on high loading of varimax rotated principal components was used to obtain subsets of the predictor variables to be included in the linear regression model of the meteorological variables. In 1999, 2001 and 2002 one of the meteorological variables was weakly influenced predominantly by the ozone concentrations. However, the model did not predict that the meteorological variables for the year 2000 were not influenced predominantly by the ozone concentrations that point to variation in sun radiation. This could be due to other factors that were not explicitly considered in this study.

  19. Meteorological Contribution to Variability in Particulate Matter Concentrations

    NASA Astrophysics Data System (ADS)

    Woods, H. L.; Spak, S. N.; Holloway, T.

    2006-12-01

    Local concentrations of fine particulate matter (PM) are driven by a number of processes, including emissions of aerosols and gaseous precursors, atmospheric chemistry, and meteorology at local, regional, and global scales. We apply statistical downscaling methods, typically used for regional climate analysis, to estimate the contribution of regional scale meteorology to PM mass concentration variability at a range of sites in the Upper Midwestern U.S. Multiple years of daily PM10 and PM2.5 data, reported by the U.S. Environmental Protection Agency (EPA), are correlated with large-scale meteorology over the region from the National Centers for Environmental Prediction (NCEP) reanalysis data. We use two statistical downscaling methods (multiple linear regression, MLR, and analog) to identify which processes have the greatest impact on aerosol concentration variability. Empirical Orthogonal Functions of the NCEP meteorological data are correlated with PM time series at measurement sites. We examine which meteorological variables exert the greatest influence on PM variability, and which sites exhibit the greatest response to regional meteorology. To evaluate model performance, measurement data are withheld for limited periods, and compared with model results. Preliminary results suggest that regional meteorological processes account for over 50% of aerosol concentration variability at study sites.

  20. Robust biological parametric mapping: an improved technique for multimodal brain image analysis

    NASA Astrophysics Data System (ADS)

    Yang, Xue; Beason-Held, Lori; Resnick, Susan M.; Landman, Bennett A.

    2011-03-01

    Mapping the quantitative relationship between structure and function in the human brain is an important and challenging problem. Numerous volumetric, surface, region of interest and voxelwise image processing techniques have been developed to statistically assess potential correlations between imaging and non-imaging metrics. Recently, biological parametric mapping has extended the widely popular statistical parametric approach to enable application of the general linear model to multiple image modalities (both for regressors and regressands) along with scalar valued observations. This approach offers great promise for direct, voxelwise assessment of structural and functional relationships with multiple imaging modalities. However, as presented, the biological parametric mapping approach is not robust to outliers and may lead to invalid inferences (e.g., artifactual low p-values) due to slight mis-registration or variation in anatomy between subjects. To enable widespread application of this approach, we introduce robust regression and robust inference in the neuroimaging context of application of the general linear model. Through simulation and empirical studies, we demonstrate that our robust approach reduces sensitivity to outliers without substantial degradation in power. The robust approach and associated software package provides a reliable way to quantitatively assess voxelwise correlations between structural and functional neuroimaging modalities.
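    The motivation for robust regression in this setting can be shown with a toy example: a handful of outliers (analogous to mis-registered voxels) can drag an ordinary least-squares fit away from the true relationship, while a Huber-type robust fit stays close. The data and the use of scikit-learn's HuberRegressor below are assumptions for illustration, not the authors' implementation.

```python
# Ordinary least squares vs. Huber robust regression on data with a few
# high-leverage outliers.
import numpy as np
from sklearn.linear_model import LinearRegression, HuberRegressor

rng = np.random.default_rng(8)
n = 200
x = rng.normal(size=(n, 1))
y = 2.0 * x[:, 0] + rng.normal(scale=0.5, size=n)
mask = x[:, 0] > 1.5                 # a few high-leverage points
y[mask] += 25.0                      # corrupt them (e.g. mis-registration artifacts)

ols = LinearRegression().fit(x, y)
huber = HuberRegressor().fit(x, y)
print("true slope:  2.0")
print("OLS slope:  ", round(ols.coef_[0], 2))
print("Huber slope:", round(huber.coef_[0], 2))
```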

  1. Burglar Target Selection

    PubMed Central

    Townsley, Michael; Bernasco, Wim; Ruiter, Stijn; Johnson, Shane D.; White, Gentry; Baum, Scott

    2015-01-01

    Objectives: This study builds on research undertaken by Bernasco and Nieuwbeerta and explores the generalizability of a theoretically derived offender target selection model in three cross-national study regions. Methods: Taking a discrete spatial choice approach, we estimate the impact of both environment- and offender-level factors on residential burglary placement in the Netherlands, the United Kingdom, and Australia. Combining cleared burglary data from all study regions in a single statistical model, we make statistical comparisons between environments. Results: In all three study regions, the likelihood an offender selects an area for burglary is positively influenced by proximity to their home, the proportion of easily accessible targets, and the total number of targets available. Furthermore, in two of the three study regions, juvenile offenders under the legal driving age are significantly more influenced by target proximity than adult offenders. Post hoc tests indicate the magnitudes of these impacts vary significantly between study regions. Conclusions: While burglary target selection strategies are consistent with opportunity-based explanations of offending, the impact of environmental context is significant. As such, the approach undertaken in combining observations from multiple study regions may aid criminology scholars in assessing the generalizability of observed findings across multiple environments. PMID:25866418

  2. Synchronization from Second Order Network Connectivity Statistics

    PubMed Central

    Zhao, Liqiong; Beverlin, Bryce; Netoff, Theoden; Nykamp, Duane Q.

    2011-01-01

    We investigate how network structure can influence the tendency for a neuronal network to synchronize, or its synchronizability, independent of the dynamical model for each neuron. The synchrony analysis takes advantage of the framework of second order networks, which defines four second order connectivity statistics based on the relative frequency of two-connection network motifs. The analysis identifies two of these statistics, convergent connections and chain connections, as highly influencing the synchrony. Simulations verify that synchrony decreases with the frequency of convergent connections and increases with the frequency of chain connections. These trends persist with simulations of multiple models for the neuron dynamics and for different types of networks. Surprisingly, divergent connections, which determine the fraction of shared inputs, do not strongly influence the synchrony. The critical role of chains, rather than divergent connections, in influencing synchrony can be explained by their increasing the effective coupling strength. The decrease of synchrony with convergent connections is primarily due to the resulting heterogeneity in firing rates. PMID:21779239

  3. The Thomas–Fermi quark model: Non-relativistic aspects

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Quan, E-mail: quan_liu@baylor.edu; Wilcox, Walter, E-mail: walter_wilcox@baylor.edu

    The first numerical investigation of non-relativistic aspects of the Thomas–Fermi (TF) statistical multi-quark model is given. We begin with a review of the traditional TF model without an explicit spin interaction and find that the spin splittings are too small in this approach. An explicit spin interaction is then introduced which entails the definition of a generalized spin “flavor”. We investigate baryonic states in this approach which can be described with two inequivalent wave functions; such states can however apply to multiple degenerate flavors. We find that the model requires a spatial separation of quark flavors, even if completely degenerate. Although the TF model is designed to investigate the possibility of many-quark states, we find surprisingly that it may be used to fit the low energy spectrum of almost all ground state octet and decuplet baryons. The charge radii of such states are determined and compared with lattice calculations and other models. The low energy fit obtained allows us to extrapolate to the six-quark doubly strange H-dibaryon state, flavor symmetric strange states of higher quark content and possible six quark nucleon–nucleon resonances. The emphasis here is on the systematics revealed in this approach. We view our model as a versatile and convenient tool for quickly assessing the characteristics of new, possibly bound, particle states of higher quark number content. -- Highlights: • First application of the statistical Thomas–Fermi quark model to baryonic systems. • Novel aspects: spin as generalized flavor; spatial separation of quark flavor phases. • The model is statistical, but the low energy baryonic spectrum is successfully fit. • Numerical applications include the H-dibaryon, strange states and nucleon resonances. • The statistical point of view does not encourage the idea of bound many-quark baryons.

  4. P values are only an index to evidence: 20th- vs. 21st-century statistical science.

    PubMed

    Burnham, K P; Anderson, D R

    2014-03-01

    Early statistical methods focused on pre-data probability statements (i.e., data as random variables) such as P values; these are not really inferences nor are P values evidential. Statistical science clung to these principles throughout much of the 20th century as a wide variety of methods were developed for special cases. Looking back, it is clear that the underlying paradigm (i.e., testing and P values) was weak. As Kuhn (1970) suggests, new paradigms have taken the place of earlier ones: this is a goal of good science. New methods have been developed and older methods extended, and these allow proper measures of strength of evidence and multimodel inference. It is time to move forward with sound theory and practice for the difficult practical problems that lie ahead. Given data, the useful foundation shifts to post-data probability statements such as model probabilities (Akaike weights) or related quantities such as odds ratios and likelihood intervals. These new methods allow formal inference from multiple models in the a priori set. These quantities are properly evidential. The past century was aimed at finding the "best" model and making inferences from it. The goal in the 21st century is to base inference on all the models weighted by their model probabilities (model averaging). Estimates of precision can include model selection uncertainty leading to variances conditional on the model set. The 21st century will be about the quantification of information, proper measures of evidence, and multi-model inference. Nelder (1999:261) concludes, "The most important task before us in developing statistical science is to demolish the P-value culture, which has taken root to a frightening extent in many areas of both pure and applied science and technology".
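    The Akaike weights referred to above are straightforward to compute from the AIC values of a candidate model set. The AIC values in the sketch below are invented for illustration; only the weighting formula is standard.

```python
# Akaike weights (model probabilities) from the AIC values of a model set.
import numpy as np

aic = np.array([210.4, 212.1, 215.8, 221.0])        # AIC of candidate models M1..M4
delta = aic - aic.min()                             # AIC differences
weights = np.exp(-0.5 * delta) / np.exp(-0.5 * delta).sum()

for name, d, w in zip(["M1", "M2", "M3", "M4"], delta, weights):
    print(f"{name}: dAIC = {d:5.1f}, Akaike weight = {w:.3f}")
# Model-averaged estimates are weighted sums of per-model estimates using these
# weights, and evidence ratios are simply ratios of weights.
```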

  5. Disaster response team FAST skills training with a portable ultrasound simulator compared to traditional training: pilot study.

    PubMed

    Paddock, Michael T; Bailitz, John; Horowitz, Russ; Khishfe, Basem; Cosby, Karen; Sergel, Michelle J

    2015-03-01

    Pre-hospital focused assessment with sonography in trauma (FAST) has been effectively used to improve patient care in multiple mass casualty events throughout the world. Although requisite FAST knowledge may now be learned remotely by disaster response team members, traditional live instructor and model hands-on FAST skills training remains logistically challenging. The objective of this pilot study was to compare the effectiveness of a novel portable ultrasound (US) simulator with traditional FAST skills training for a deployed mixed provider disaster response team. We randomized participants into one of three training groups stratified by provider role: Group A. Traditional Skills Training, Group B. US Simulator Skills Training, and Group C. Traditional Skills Training Plus US Simulator Skills Training. After skills training, we measured participants' FAST image acquisition and interpretation skills using a standardized direct observation tool (SDOT) with healthy models and review of FAST patient images. Pre- and post-course US and FAST knowledge were also assessed using a previously validated multiple-choice evaluation. We used the ANOVA procedure to determine the statistical significance of differences between the means of each group's skills scores. Paired sample t-tests were used to determine the statistical significance of pre- and post-course mean knowledge scores within groups. We enrolled 36 participants, 12 randomized to each training group. Randomization resulted in similar distribution of participants between training groups with respect to provider role, age, sex, and prior US training. For the FAST SDOT image acquisition and interpretation mean skills scores, there was no statistically significant difference between training groups. For US and FAST mean knowledge scores, there was a statistically significant improvement between pre- and post-course scores within each group, but again there was not a statistically significant difference between training groups. This pilot study of a deployed mixed-provider disaster response team suggests that a novel portable US simulator may provide equivalent skills training in comparison to traditional live instructor and model training. Further studies with a larger sample size and other measures of short- and long-term clinical performance are warranted.

  6. Filtering Meteoroid Flights Using Multiple Unscented Kalman Filters

    NASA Astrophysics Data System (ADS)

    Sansom, E. K.; Bland, P. A.; Rutten, M. G.; Paxman, J.; Towner, M. C.

    2016-11-01

    Estimator algorithms are immensely versatile and powerful tools that can be applied to any problem where a dynamic system can be modeled by a set of equations and where observations are available. A well designed estimator enables system states to be optimally predicted and errors to be rigorously quantified. Unscented Kalman filters (UKFs) and interactive multiple models can be found in methods from satellite tracking to self-driving cars. The luminous trajectory of the Bunburra Rockhole fireball was observed by the Desert Fireball Network in mid-2007. The recorded data set is used in this paper to examine the application of these two techniques as a viable approach to characterizing fireball dynamics. The nonlinear, single-body system of equations, used to model meteoroid entry through the atmosphere, is challenged by gross fragmentation events that may occur. The incorporation of the UKF within an interactive multiple model smoother provides a likely solution for when fragmentation events may occur as well as providing a statistical analysis of the state uncertainties. In addition to these benefits, another advantage of this approach is its automatability for use within an image processing pipeline to facilitate large fireball data analyses and meteorite recoveries.

  7. Multiple Component Event-Related Potential (mcERP) Estimation

    NASA Technical Reports Server (NTRS)

    Knuth, K. H.; Clanton, S. T.; Shah, A. S.; Truccolo, W. A.; Ding, M.; Bressler, S. L.; Trejo, L. J.; Schroeder, C. E.; Clancy, Daniel (Technical Monitor)

    2002-01-01

    We show how model-based estimation of the neural sources responsible for transient neuroelectric signals can be improved by the analysis of single-trial data. Previously, we showed that a multiple component event-related potential (mcERP) algorithm can extract the responses of individual sources from recordings of a mixture of multiple, possibly interacting, neural ensembles. The mcERP algorithm also estimated single-trial amplitudes and onset latencies, thus allowing more accurate estimation of ongoing neural activity during an experimental trial. The mcERP algorithm is related to infomax independent component analysis (ICA); however, the underlying signal model is more physiologically realistic in that a component is modeled as a stereotypic waveshape varying both in amplitude and onset latency from trial to trial. The result is a model that reflects quantities of interest to the neuroscientist. Here we demonstrate that the mcERP algorithm provides more accurate results than more traditional methods such as factor analysis and the more recent ICA. Whereas factor analysis assumes the sources are orthogonal and ICA assumes the sources are statistically independent, the mcERP algorithm makes no such assumptions, thus allowing investigators to examine interactions among components by estimating the properties of single-trial responses.

  8. MAFsnp: A Multi-Sample Accurate and Flexible SNP Caller Using Next-Generation Sequencing Data

    PubMed Central

    Hu, Jiyuan; Li, Tengfei; Xiu, Zidi; Zhang, Hong

    2015-01-01

    Most existing statistical methods developed for calling single nucleotide polymorphisms (SNPs) using next-generation sequencing (NGS) data are based on Bayesian frameworks, and there does not exist any SNP caller that produces p-values for calling SNPs in a frequentist framework. To fill in this gap, we develop a new method MAFsnp, a Multiple-sample based Accurate and Flexible algorithm for calling SNPs with NGS data. MAFsnp is based on an estimated likelihood ratio test (eLRT) statistic. In practical situations, the involved parameter is very close to the boundary of the parameter space, so standard large-sample theory is not suitable for evaluating the finite-sample distribution of the eLRT statistic. Observing that the distribution of the test statistic is a mixture of zero and a continuous part, we propose to model the test statistic with a novel two-parameter mixture distribution. Once the parameters in the mixture distribution are estimated, p-values can be easily calculated for detecting SNPs, and the multiple-testing corrected p-values can be used to control false discovery rate (FDR) at any pre-specified level. With simulated data, MAFsnp is shown to have much better control of FDR than the existing SNP callers. Through the application to two real datasets, MAFsnp is also shown to outperform the existing SNP callers in terms of calling accuracy. An R package “MAFsnp” implementing the new SNP caller is freely available at http://homepage.fudan.edu.cn/zhangh/softwares/. PMID:26309201

  9. HIV dynamics with multiple infections of target cells.

    PubMed

    Dixit, Narendra M; Perelson, Alan S

    2005-06-07

    The high incidence of multiple infections of cells by HIV sets the stage for rapid HIV evolution by means of recombination. Yet how HIV dynamics proceeds with multiple infections remains poorly understood. Here, we present a mathematical model that describes the dynamics of viral, target cell, and multiply infected cell subpopulations during HIV infection. Model calculations reproduce several experimental observations and provide key insights into the influence of multiple infections on HIV dynamics. We find that the experimentally observed scaling law, that the number of cells coinfected with two distinctly labeled viruses is proportional to the square of the total number of infected cells, can be generalized so that the number of triply infected cells is proportional to the cube of the number of infected cells, etc. Despite the expectation from Poisson statistics, we find that this scaling relationship only holds under certain conditions, which we predict. We also find that multiple infections do not influence viral dynamics when the rate of viral production from infected cells is independent of the number of times the cells are infected, a regime expected when viral production is limited by cellular rather than viral factors. This result may explain why extant models, which ignore multiple infections, successfully describe viral dynamics in HIV patients. Inhibiting CD4 down-modulation increases the average number of infections per cell. Consequently, altering CD4 down-modulation may allow for an experimental determination of whether viral or cellular factors limit viral production.
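    The Poisson-statistics scaling law mentioned above is easy to check by simulation: if infection events hit cells at random, the number of doubly infected cells grows roughly as the square, and triply infected as the cube, of the total number of infected cells. All cell counts and infection rates below are invented for illustration.

```python
# Simulation of random (Poisson) infection of cells, checking that the ratios
# double/infected^2 and triple/infected^3 stay roughly constant across rows.
import numpy as np

rng = np.random.default_rng(9)
n_cells = 1_000_000
for mean_infections in (0.05, 0.1, 0.2):
    hits = rng.poisson(mean_infections, size=n_cells)
    infected = (hits >= 1).sum()
    double   = (hits >= 2).sum()
    triple   = (hits >= 3).sum()
    print(f"m={mean_infections:5.2f}  infected={infected:7d}  "
          f"double/infected^2={double / infected**2:.2e}  "
          f"triple/infected^3={triple / infected**3:.2e}")
# Under pure Poisson statistics the two ratios are approximately constant,
# which is the proportionality the abstract describes.
```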

  10. HIV dynamics with multiple infections of target cells

    PubMed Central

    Dixit, Narendra M.; Perelson, Alan S.

    2005-01-01

    The high incidence of multiple infections of cells by HIV sets the stage for rapid HIV evolution by means of recombination. Yet how HIV dynamics proceeds with multiple infections remains poorly understood. Here, we present a mathematical model that describes the dynamics of viral, target cell, and multiply infected cell subpopulations during HIV infection. Model calculations reproduce several experimental observations and provide key insights into the influence of multiple infections on HIV dynamics. We find that the experimentally observed scaling law, that the number of cells coinfected with two distinctly labeled viruses is proportional to the square of the total number of infected cells, can be generalized so that the number of triply infected cells is proportional to the cube of the number of infected cells, etc. Despite the expectation from Poisson statistics, we find that this scaling relationship only holds under certain conditions, which we predict. We also find that multiple infections do not influence viral dynamics when the rate of viral production from infected cells is independent of the number of times the cells are infected, a regime expected when viral production is limited by cellular rather than viral factors. This result may explain why extant models, which ignore multiple infections, successfully describe viral dynamics in HIV patients. Inhibiting CD4 down-modulation increases the average number of infections per cell. Consequently, altering CD4 down-modulation may allow for an experimental determination of whether viral or cellular factors limit viral production. PMID:15928092

  11. Multiple flood vulnerability assessment approach based on fuzzy comprehensive evaluation method and coordinated development degree model.

    PubMed

    Yang, Weichao; Xu, Kui; Lian, Jijian; Bin, Lingling; Ma, Chao

    2018-05-01

    Flood is a serious challenge that increasingly affects the residents as well as policymakers. Flood vulnerability assessment is becoming gradually relevant in the world. The purpose of this study is to develop an approach to reveal the relationship between exposure, sensitivity and adaptive capacity for better flood vulnerability assessment, based on the fuzzy comprehensive evaluation method (FCEM) and coordinated development degree model (CDDM). The approach is organized into three parts: establishment of index system, assessment of exposure, sensitivity and adaptive capacity, and multiple flood vulnerability assessment. Hydrodynamic model and statistical data are employed for the establishment of index system; FCEM is used to evaluate exposure, sensitivity and adaptive capacity; and CDDM is applied to express the relationship of the three components of vulnerability. Six multiple flood vulnerability types and four levels are proposed to assess flood vulnerability from multiple perspectives. Then the approach is applied to assess the spatiality of flood vulnerability in Hainan's eastern area, China. Based on the results of multiple flood vulnerability, a decision-making process for rational allocation of limited resources is proposed and applied to the study area. The study shows that multiple flood vulnerability assessment can evaluate vulnerability more completely, and help decision makers obtain more comprehensive information for decision-making. In summary, this study provides a new way for flood vulnerability assessment and disaster-prevention decision-making. Copyright © 2018 Elsevier Ltd. All rights reserved.

  12. Estimation of aboveground biomass in Mediterranean forests by statistical modelling of ASTER fraction images

    NASA Astrophysics Data System (ADS)

    Fernández-Manso, O.; Fernández-Manso, A.; Quintano, C.

    2014-09-01

    Aboveground biomass (AGB) estimation from optical satellite data is usually based on regression models of original or synthetic bands. To overcome the poor relation between AGB and spectral bands due to mixed-pixels when a medium spatial resolution sensor is considered, we propose to base the AGB estimation on fraction images from Linear Spectral Mixture Analysis (LSMA). Our study area is a managed Mediterranean pine woodland (Pinus pinaster Ait.) in central Spain. A total of 1033 circular field plots were used to estimate AGB from Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) optical data. We applied Pearson correlation statistics and stepwise multiple regression to identify suitable predictors from the set of variables of original bands, fraction imagery, Normalized Difference Vegetation Index and Tasselled Cap components. Four linear models and one nonlinear model were tested. A linear combination of ASTER band 2 (red, 0.630-0.690 μm), band 8 (short wave infrared 5, 2.295-2.365 μm) and green vegetation fraction (from LSMA) was the best AGB predictor (adjusted R2 = 0.632; the cross-validated root-mean-squared error of estimated AGB was 13.3 Mg ha-1, or 37.7%), rather than other combinations of the above-cited independent variables. Results indicated that using ASTER fraction images in regression models improves the AGB estimation in Mediterranean pine forests. The spatial distribution of the estimated AGB, based on a multiple linear regression model, may be used as baseline information for forest managers in future studies, such as quantifying the regional carbon budget, fuel accumulation or monitoring of management practices.

  13. GLISTRboost: Combining Multimodal MRI Segmentation, Registration, and Biophysical Tumor Growth Modeling with Gradient Boosting Machines for Glioma Segmentation.

    PubMed

    Bakas, Spyridon; Zeng, Ke; Sotiras, Aristeidis; Rathore, Saima; Akbari, Hamed; Gaonkar, Bilwaj; Rozycki, Martin; Pati, Sarthak; Davatzikos, Christos

    2016-01-01

    We present an approach for segmenting low- and high-grade gliomas in multimodal magnetic resonance imaging volumes. The proposed approach is based on a hybrid generative-discriminative model. Firstly, a generative approach based on an Expectation-Maximization framework that incorporates a glioma growth model is used to segment the brain scans into tumor, as well as healthy tissue labels. Secondly, a gradient boosting multi-class classification scheme is used to refine tumor labels based on information from multiple patients. Lastly, a probabilistic Bayesian strategy is employed to further refine and finalize the tumor segmentation based on patient-specific intensity statistics from the multiple modalities. We evaluated our approach in 186 cases during the training phase of the BRAin Tumor Segmentation (BRATS) 2015 challenge and report promising results. During the testing phase, the algorithm was additionally evaluated in 53 unseen cases, achieving the best performance among the competing methods.
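
    The discriminative refinement stage (the second step above) is essentially a multi-class gradient boosting classifier over per-voxel features. A minimal sketch with synthetic features and labels, not the authors' BRATS pipeline, is:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Illustrative stand-in for the refinement step: per-voxel feature vectors
# (e.g. multimodal intensities plus the generative model's posteriors) and
# tumor sub-region labels pooled from multiple patients.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(500, 6))
y_train = rng.integers(0, 4, size=500)       # four hypothetical tissue classes

clf = GradientBoostingClassifier(n_estimators=100, max_depth=3)
clf.fit(X_train, y_train)

X_new = rng.normal(size=(10, 6))             # voxels from a new scan
refined_labels = clf.predict(X_new)          # refined per-voxel labels
print(refined_labels)
```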

  14. The need and approach for characterization - U.S. air force perspectives on materials state awareness

    NASA Astrophysics Data System (ADS)

    Aldrin, John C.; Lindgren, Eric A.

    2018-04-01

    This paper expands on the objective and motivation for NDE-based characterization and includes a discussion of the current approach using model-assisted inversion being pursued within the Air Force Research Laboratory (AFRL). This includes a discussion of the multiple model-based methods that can be used, including physics-based models, deep machine learning, and heuristic approaches. The benefits and drawbacks of each method are reviewed and the potential to integrate multiple methods is discussed. Initial successes are included to highlight the ability to obtain quantitative values of damage. Additional steps remaining to realize this capability with statistical metrics of accuracy are discussed, and how these results can be used to enable probabilistic life management is addressed. The outcome of this initiative will realize the long-term desired capability of NDE methods to provide quantitative characterization to accelerate certification of new materials and enhance life management of engineered systems.

  15. Replicates in high dimensions, with applications to latent variable graphical models.

    PubMed

    Tan, Kean Ming; Ning, Yang; Witten, Daniela M; Liu, Han

    2016-12-01

    In classical statistics, much thought has been put into experimental design and data collection. In the high-dimensional setting, however, experimental design has been less of a focus. In this paper, we stress the importance of collecting multiple replicates for each subject in this setting. We consider learning the structure of a graphical model with latent variables, under the assumption that these variables take a constant value across replicates within each subject. By collecting multiple replicates for each subject, we are able to estimate the conditional dependence relationships among the observed variables given the latent variables. To test the null hypothesis of conditional independence between two observed variables, we propose a pairwise decorrelated score test. Theoretical guarantees are established for parameter estimation and for this test. We show that our proposal is able to estimate latent variable graphical models more accurately than some existing proposals, and apply the proposed method to a brain imaging dataset.
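
    The paper's pairwise decorrelated score test is not reproduced here, but the value of replicates can be sketched under a simple additive model: the latent effect is constant within a subject, so replicate differences cancel it, and a standard sparse precision estimator then recovers the conditional dependence graph among the observed variables. All data below are simulated.

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV

# Two replicates per subject; a subject-level latent effect is shared by both,
# so the within-subject difference removes it under an additive model.
rng = np.random.default_rng(2)
n_subjects, p = 100, 10
latent = rng.normal(size=(n_subjects, 1)) @ rng.normal(size=(1, p))
rep1 = latent + rng.normal(size=(n_subjects, p))
rep2 = latent + rng.normal(size=(n_subjects, p))

diffs = rep1 - rep2                       # latent component cancels out
model = GraphicalLassoCV().fit(diffs)     # sparse inverse-covariance estimate
precision = model.precision_              # zero pattern ~ conditional independence
print(np.round(precision, 2))
```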

  16. Bayesian demography 250 years after Bayes

    PubMed Central

    Bijak, Jakub; Bryant, John

    2016-01-01

    Bayesian statistics offers an alternative to classical (frequentist) statistics. It is distinguished by its use of probability distributions to describe uncertain quantities, which leads to elegant solutions to many difficult statistical problems. Although Bayesian demography, like Bayesian statistics more generally, is around 250 years old, only recently has it begun to flourish. The aim of this paper is to review the achievements of Bayesian demography, address some misconceptions, and make the case for wider use of Bayesian methods in population studies. We focus on three applications: demographic forecasts, limited data, and highly structured or complex models. The key advantages of Bayesian methods are the ability to integrate information from multiple sources and to describe uncertainty coherently. Bayesian methods also allow for including additional (prior) information next to the data sample. As such, Bayesian approaches are complementary to many traditional methods, which can be productively re-expressed in Bayesian terms. PMID:26902889

  17. Advanced statistics: linear regression, part I: simple linear regression.

    PubMed

    Marill, Keith A

    2004-01-01

    Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
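
    As a worked example of the method of least squares on a tiny dataset (all values invented for demonstration), the slope and intercept follow directly from the sample means:

```python
import numpy as np

# Method of least squares for simple linear regression, y = a + b * x:
#   b = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
#   a = mean(y) - b * mean(x)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])          # illustrative predictor
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])          # illustrative outcome

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
print(f"fitted line: y = {a:.2f} + {b:.2f} * x")
```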

  18. A review of statistical updating methods for clinical prediction models.

    PubMed

    Su, Ting-Li; Jaki, Thomas; Hickey, Graeme L; Buchan, Iain; Sperrin, Matthew

    2018-01-01

    A clinical prediction model is a tool for predicting healthcare outcomes, usually within a specific population and context. A common approach is to develop a new clinical prediction model for each population and context; however, this wastes potentially useful historical information. A better approach is to update or incorporate the existing clinical prediction models already developed for use in similar contexts or populations. In addition, clinical prediction models commonly become miscalibrated over time and need replacing or updating. In this article, we review a range of approaches for re-using and updating clinical prediction models; these fall into three main categories: simple coefficient updating, combining multiple previous clinical prediction models in a meta-model, and dynamic updating of models. We evaluated the performance (discrimination and calibration) of the different strategies using data on mortality following cardiac surgery in the United Kingdom. We found that no single strategy performed sufficiently well to be used to the exclusion of the others. In conclusion, useful tools exist for updating existing clinical prediction models to a new population or context, and these should be implemented, rather than developing a new clinical prediction model from scratch, using a breadth of complementary statistical methods.
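
    Of the updating strategies mentioned, simple coefficient updating can be sketched as logistic recalibration: keep the existing model's linear predictor and re-estimate only an intercept and slope in the new population. The data below are synthetic placeholders, not the cardiac-surgery dataset used in the review.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic new-population data: the existing model's linear predictor for
# each patient, plus observed binary outcomes generated with a mild miscalibration.
rng = np.random.default_rng(3)
lp_old = rng.normal(size=(500, 1))                       # existing model's linear predictor
y_new = rng.binomial(1, 1 / (1 + np.exp(-(0.3 + 0.8 * lp_old[:, 0]))))

# Logistic recalibration: refit intercept and calibration slope only.
recal = LogisticRegression().fit(lp_old, y_new)
print("calibration intercept:", recal.intercept_[0])
print("calibration slope:", recal.coef_[0, 0])
```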

  19. Robust Real-Time Music Transcription with a Compositional Hierarchical Model.

    PubMed

    Pesek, Matevž; Leonardis, Aleš; Marolt, Matija

    2017-01-01

    The paper presents a new compositional hierarchical model for robust music transcription. Its main features are unsupervised learning of a hierarchical representation of input data; transparency, which enables insights into the learned representation; and robustness and speed, which make it suitable for real-world and real-time use. The model consists of multiple layers, each composed of a number of parts. The hierarchical nature of the model corresponds well to hierarchical structures in music. The parts in lower layers correspond to low-level concepts (e.g. tone partials), while the parts in higher layers combine lower-level representations into more complex concepts (tones, chords). The layers are learned in an unsupervised manner from music signals. Parts in each layer are compositions of parts from previous layers, with statistical co-occurrences as the driving force of the learning process. In the paper, we present the model's structure and compare it to other hierarchical approaches in the field of music information retrieval. We evaluate the model's performance on multiple fundamental frequency estimation. Finally, we elaborate on extensions of the model towards other music information retrieval tasks.

  20. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Boutilier, Justin J., E-mail: j.boutilier@mail.utoronto.ca; Lee, Taewoo; Craig, Tim

    Purpose: To develop and evaluate the clinical applicability of advanced machine learning models that simultaneously predict multiple optimization objective function weights from patient geometry for intensity-modulated radiation therapy of prostate cancer. Methods: A previously developed inverse optimization method was applied retrospectively to determine optimal objective function weights for 315 treated patients. The authors used an overlap volume ratio (OV) of bladder and rectum for different PTV expansions and overlap volume histogram slopes (OVSR and OVSB for the rectum and bladder, respectively) as explanatory variables that quantify patient geometry. Using the optimal weights as ground truth, the authors trained and applied three prediction models: logistic regression (LR), multinomial logistic regression (MLR), and weighted K-nearest neighbor (KNN). The population average of the optimal objective function weights was also calculated. Results: The OV at 0.4 cm and OVSR at 0.1 cm features were found to be the most predictive of the weights. The authors observed comparable performance (i.e., no statistically significant difference) between LR, MLR, and KNN methodologies, with LR appearing to perform the best. All three machine learning models outperformed the population average by a statistically significant amount over a range of clinical metrics including bladder/rectum V53Gy, bladder/rectum V70Gy, and dose to the bladder, rectum, CTV, and PTV. When comparing the weights directly, the LR model predicted bladder and rectum weights that had, on average, a 73% and 74% relative improvement over the population average weights, respectively. The treatment plans resulting from the LR weights had, on average, a rectum V70Gy that was 35% closer to the clinical plan and a bladder V70Gy that was 29% closer, compared to the population average weights. Similar results were observed for all other clinical metrics. Conclusions: The authors demonstrated that the KNN and MLR weight prediction methodologies perform comparably to the LR model and can produce clinical quality treatment plans by simultaneously predicting multiple weights that capture trade-offs associated with sparing multiple OARs.
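
    A rough sketch of the weight-prediction setting follows, using a weighted KNN regressor against the population-average baseline; the geometry features and weight vectors are simulated, and this is not the authors' exact LR/MLR/KNN pipeline.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Simulated geometry features [OV, OVSR, OVSB] and two objective-function
# weights (bladder, rectum) per patient; the mapping below is invented.
rng = np.random.default_rng(4)
X = rng.uniform(size=(315, 3))
W = np.column_stack([0.2 + 0.5 * X[:, 0], 0.8 - 0.5 * X[:, 1]])
W += rng.normal(0, 0.05, size=W.shape)

# Weighted KNN predicting both weights at once, versus the population average.
knn = KNeighborsRegressor(n_neighbors=5, weights="distance").fit(X[:250], W[:250])
pred = knn.predict(X[250:])
baseline = np.tile(W[:250].mean(axis=0), (len(pred), 1))   # population-average weights
print("KNN MAE:", np.abs(pred - W[250:]).mean())
print("population-average MAE:", np.abs(baseline - W[250:]).mean())
```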
