A Statistical Test for Comparing Nonnested Covariance Structure Models.
ERIC Educational Resources Information Center
Levy, Roy; Hancock, Gregory R.
While statistical procedures are well known for comparing hierarchically related (nested) covariance structure models, statistical tests for comparing nonhierarchically related (nonnested) models have proven more elusive. Although isolated attempts have been made, none exists within the commonly used maximum likelihood estimation framework, thereby…
Can Counter-Gang Models be Applied to Counter ISIS’s Internet Recruitment Campaign
2016-06-10
limitation that exists is the lack of reliable statistics from social media companies regarding the quantity of ISIS-affiliated sites, which exist on... statistics, they have approximately 320 million monthly active users with thirty-five-plus languages supported and 77 percent of accounts located... Justice and Delinquency Prevention program. For deterrence-based models, the primary point of research is focused deterrence models with emphasis placed
Load Model Verification, Validation and Calibration Framework by Statistical Analysis on Field Data
NASA Astrophysics Data System (ADS)
Jiao, Xiangqing; Liao, Yuan; Nguyen, Thai
2017-11-01
Accurate load models are critical for power system analysis and operation. A large amount of research work has been done on load modeling. Most of the existing research focuses on developing load models, while little has been done on developing formal load model verification and validation (V&V) methodologies or procedures. Most existing load model validation is based on qualitative rather than quantitative analysis. In addition, not all aspects of the model V&V problem have been addressed by existing approaches. To complement the existing methods, this paper proposes a novel load model verification and validation framework that can systematically and more comprehensively examine a load model's effectiveness and accuracy. Statistical analysis, instead of visual checks, quantifies the load model's accuracy and provides model users with a confidence level for the developed load model. The analysis results can also be used to calibrate load models. The proposed framework can serve as guidance for utility engineers and researchers in systematically examining load models. The proposed method is demonstrated through analysis of field measurements collected from a utility system.
A nonparametric spatial scan statistic for continuous data.
Jung, Inkyung; Cho, Ho Jin
2015-10-20
Spatial scan statistics are widely used for spatial cluster detection, and several parametric models exist. For continuous data, a normal-based scan statistic can be used. However, the performance of the model has not been fully evaluated for non-normal data. We propose a nonparametric spatial scan statistic based on the Wilcoxon rank-sum test statistic and compare its performance with parametric models via a simulation study under various scenarios. The nonparametric method outperforms the normal-based scan statistic in terms of power and accuracy in almost all cases considered in the simulation study. The proposed nonparametric spatial scan statistic is therefore an excellent alternative to the normal model for continuous data and is especially useful for data following skewed or heavy-tailed distributions.
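As an illustration of the rank-based scan idea described above, the following Python sketch scans circular windows over point locations, scores each window with the Wilcoxon rank-sum statistic, and assesses the most extreme window by Monte Carlo permutation. This is a minimal, assumption-laden sketch (circular windows, a small toy dataset), not the authors' implementation.

```python
# Rank-based spatial scan sketch: for each circular window, compare values inside
# vs. outside with the Wilcoxon rank-sum statistic, then assess the most extreme
# window by Monte Carlo permutation of the values over locations.
import numpy as np
from scipy.stats import ranksums

def scan_ranksum(coords, values, radii):
    """Return the maximum |rank-sum z| over all centre/radius windows."""
    best = 0.0
    for centre in coords:
        d = np.linalg.norm(coords - centre, axis=1)
        for r in radii:
            inside = d <= r
            if 0 < inside.sum() < len(values):
                z, _ = ranksums(values[inside], values[~inside])
                best = max(best, abs(z))
    return best

def monte_carlo_p(coords, values, radii, n_perm=99, seed=0):
    rng = np.random.default_rng(seed)
    observed = scan_ranksum(coords, values, radii)
    null = [scan_ranksum(coords, rng.permutation(values), radii) for _ in range(n_perm)]
    return observed, (1 + sum(t >= observed for t in null)) / (n_perm + 1)

# toy example: 50 random locations, one clustered region with elevated values
rng = np.random.default_rng(1)
coords = rng.uniform(0, 10, size=(50, 2))
values = rng.normal(0, 1, 50)
values[np.linalg.norm(coords - [2, 2], axis=1) < 2] += 2.0
print(monte_carlo_p(coords, values, radii=[1.0, 2.0, 3.0]))
```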
Spatial Dynamics and Determinants of County-Level Education Expenditure in China
ERIC Educational Resources Information Center
Gu, Jiafeng
2012-01-01
In this paper, a multivariate spatial autoregressive model of local public education expenditure determination with autoregressive disturbance is developed and estimated. The existence of spatial interdependence is tested using Moran's I statistic and Lagrange multiplier test statistics for both the spatial error and spatial lag models. The full…
Statistical Modeling for Radiation Hardness Assurance
NASA Technical Reports Server (NTRS)
Ladbury, Raymond L.
2014-01-01
We cover the models and statistics associated with single event effects (and total ionizing dose), why we need them, and how to use them: we discuss what models are used, what errors exist in real test data, and what the model allows us to say about the DUT. In addition, we cover how to use other sources of data, such as historical, heritage, and similar-part data, and how to apply experience, physics, and expert opinion to the analysis. Also included are concepts of Bayesian statistics, data fitting, and bounding rates.
Role of sufficient statistics in stochastic thermodynamics and its implication to sensory adaptation
NASA Astrophysics Data System (ADS)
Matsumoto, Takumi; Sagawa, Takahiro
2018-04-01
A sufficient statistic is a significant concept in statistics: a random variable that carries all the information required for an inference task. We investigate the roles of sufficient statistics and related quantities in stochastic thermodynamics. Specifically, we prove that for general continuous-time bipartite networks, the existence of a sufficient statistic implies that an informational quantity called the sensory capacity attains its maximum. Since the maximal sensory capacity imposes a constraint that the energetic efficiency cannot exceed one-half, our result implies that the existence of a sufficient statistic is inevitably accompanied by energetic dissipation. We also show that, in a particular parameter region of linear Langevin systems, there exists an optimal noise intensity at which the sensory capacity, the information-thermodynamic efficiency, and the total entropy production are optimized at the same time. We apply our general result to a model of sensory adaptation of E. coli and find that the sensory capacity is nearly maximal with experimentally realistic parameters.
Euler equation existence, non-uniqueness and mesh converged statistics
Glimm, James; Sharp, David H.; Lim, Hyunkyung; Kaufman, Ryan; Hu, Wenlin
2015-01-01
We review existence and non-uniqueness results for the Euler equation of fluid flow. These results are placed in the context of physical models and their solutions. Non-uniqueness is in direct conflict with the purpose of practical simulations, so that a mitigating strategy, outlined here, is important. We illustrate these issues in an examination of mesh converged turbulent statistics, with comparison to laboratory experiments. PMID:26261361
A review of statistical updating methods for clinical prediction models.
Su, Ting-Li; Jaki, Thomas; Hickey, Graeme L; Buchan, Iain; Sperrin, Matthew
2018-01-01
A clinical prediction model is a tool for predicting healthcare outcomes, usually within a specific population and context. A common approach is to develop a new clinical prediction model for each population and context; however, this wastes potentially useful historical information. A better approach is to update or incorporate the existing clinical prediction models already developed for use in similar contexts or populations. In addition, clinical prediction models commonly become miscalibrated over time and need replacing or updating. In this article, we review a range of approaches for re-using and updating clinical prediction models; these fall into three main categories: simple coefficient updating, combining multiple previous clinical prediction models in a meta-model, and dynamic updating of models. We evaluated the performance (discrimination and calibration) of the different strategies using data on mortality following cardiac surgery in the United Kingdom. We found that no single strategy performed sufficiently well to be used to the exclusion of the others. In conclusion, useful tools exist for updating existing clinical prediction models to a new population or context, and these should be implemented, rather than developing a new clinical prediction model from scratch, using a breadth of complementary statistical methods.
Lam, Lun Tak; Sun, Yi; Davey, Neil; Adams, Rod; Prapopoulou, Maria; Brown, Marc B; Moss, Gary P
2010-06-01
The aim was to employ Gaussian processes to assess mathematically the nature of a skin permeability dataset and to use these methods, particularly feature selection, to determine the key physicochemical descriptors which exert the most significant influence on percutaneous absorption, and to compare such models with established existing models. Gaussian processes, including automatic relevance determination (GPRARD) methods, were employed to develop models of percutaneous absorption that identified key physicochemical descriptors of percutaneous absorption. Using MatLab software, the statistical performance of these models was compared with single linear networks (SLN) and quantitative structure-permeability relationships (QSPRs). Feature selection methods were used to examine in more detail the physicochemical parameters used in this study. A range of statistical measures was used to determine model quality. The inherently nonlinear nature of the skin dataset was confirmed. The Gaussian process regression (GPR) methods yielded predictive models that offered statistically significant improvements over SLN and QSPR models with regard to predictivity (where the rank order was: GPR > SLN > QSPR). Feature selection analysis determined that the best GPR models were those that contained log P, melting point and the number of hydrogen bond donor groups as significant descriptors. Further statistical analysis also found that great synergy existed between certain parameters. It suggested that a number of the descriptors employed were effectively interchangeable, thus questioning the use of models where discrete variables are output, usually in the form of an equation. The use of a nonlinear GPR method produced models with significantly improved predictivity, compared with SLN or QSPR models. Feature selection methods were able to provide important mechanistic information. However, it was also shown that significant synergy existed between certain parameters, and as such it was possible to interchange certain descriptors (i.e. molecular weight and melting point) without incurring a loss of model quality. Such synergy suggested that a model constructed from discrete terms in an equation may not be the most appropriate way of representing mechanistic understanding of skin absorption.
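For readers who want a concrete picture of Gaussian process regression with automatic relevance determination, the sketch below uses scikit-learn's anisotropic RBF kernel so that one length-scale is learned per descriptor; short learned length-scales flag influential features. The descriptors and data are synthetic stand-ins, not the skin-permeability dataset, and scikit-learn is used in place of the authors' MatLab workflow.

```python
# GPR with ARD-style feature relevance: an anisotropic RBF kernel learns one
# length-scale per descriptor; smaller length-scale -> more influential feature.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
n, d = 120, 5                                   # compounds x descriptors
X = rng.normal(size=(n, d))                     # e.g. log P, MW, MP, HBD, HBA (stand-ins)
log_kp = 1.5 * X[:, 0] - 0.8 * X[:, 2] + 0.3 * rng.normal(size=n)

kernel = 1.0 * RBF(length_scale=np.ones(d)) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0).fit(X, log_kp)

# smaller learned length-scale -> descriptor matters more for prediction
print(np.round(gpr.kernel_.k1.k2.length_scale, 2))
```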
Model Uncertainty Quantification Methods In Data Assimilation
NASA Astrophysics Data System (ADS)
Pathiraja, S. D.; Marshall, L. A.; Sharma, A.; Moradkhani, H.
2017-12-01
Data Assimilation involves utilising observations to improve model predictions in a seamless and statistically optimal fashion. Its applications are wide-ranging, from improving weather forecasts to tracking targets such as in the Apollo 11 mission. The use of Data Assimilation methods in high dimensional complex geophysical systems is an active area of research, where there exist many opportunities to enhance existing methodologies. One of the central challenges is model uncertainty quantification; the outcome of any Data Assimilation study is strongly dependent on the uncertainties assigned to both observations and models. I focus on developing improved model uncertainty quantification methods that are applicable to challenging real world scenarios. These include developing methods for cases where the system states are only partially observed, where there is little prior knowledge of the model errors, and where the model error statistics are likely to be highly non-Gaussian.
Monte Carlo based statistical power analysis for mediation models: methods and software.
Zhang, Zhiyong
2014-12-01
The existing literature on statistical power analysis for mediation models often assumes data normality and is based on a less powerful Sobel test instead of the more powerful bootstrap test. This study proposes to estimate statistical power to detect mediation effects on the basis of the bootstrap method through Monte Carlo simulation. Nonnormal data with excessive skewness and kurtosis are allowed in the proposed method. A free R package called bmem is developed to conduct the power analysis discussed in this study. Four examples, including a simple mediation model, a multiple-mediator model with a latent mediator, a multiple-group mediation model, and a longitudinal mediation model, are provided to illustrate the proposed method.
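The general bootstrap-within-Monte-Carlo recipe can be sketched compactly: simulate data from a simple mediation model, test the indirect effect a*b on each replication with a percentile bootstrap confidence interval, and report the rejection rate as power. The Python code below is a plain illustration under normal errors; it is not the bmem R package.

```python
# Monte Carlo power estimate for a simple mediation model (X -> M -> Y):
# on each replication, test the indirect effect a*b with a percentile bootstrap CI.
import numpy as np

def bootstrap_power(n=100, a=0.3, b=0.3, n_rep=100, n_boot=200, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_rep):
        x = rng.normal(size=n)
        m = a * x + rng.normal(size=n)
        y = b * m + rng.normal(size=n)
        ab_boot = np.empty(n_boot)
        for i in range(n_boot):
            idx = rng.integers(0, n, n)
            xb, mb, yb = x[idx], m[idx], y[idx]
            a_hat = np.polyfit(xb, mb, 1)[0]                      # slope of M on X
            b_hat = np.linalg.lstsq(np.column_stack([mb, xb, np.ones(n)]),
                                    yb, rcond=None)[0][0]         # slope of Y on M, controlling X
            ab_boot[i] = a_hat * b_hat
        lo, hi = np.percentile(ab_boot, [100 * alpha / 2, 100 * (1 - alpha / 2)])
        rejections += (lo > 0) or (hi < 0)                        # CI excludes zero -> detect mediation
    return rejections / n_rep

print(bootstrap_power())  # estimated power to detect the indirect effect
```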
A statistical approach to estimate O3 uptake of ponderosa pine in a mediterranean climate
N.E. Grulke; H.K. Preisler; C.C. Fan; W.A. Retzlaff
2002-01-01
In highly polluted sites, stomatal behavior is sluggish with respect to light, vapor pressure deficit, and internal CO2 concentration (Ci) and poorly described by existing models. Statistical models were developed to estimate stomatal conductance (gs) of 40-year-old ponderosa pine at three sites differing in pollutant exposure for the purpose of...
A hierarchical fire frequency model to simulate temporal patterns of fire regimes in LANDIS
Jian Yang; Hong S. He; Eric J. Gustafson
2004-01-01
Fire disturbance has important ecological effects in many forest landscapes. Existing statistically based approaches can be used to examine the effects of a fire regime on forest landscape dynamics. Most examples of statistically based fire models divide a fire occurrence into two stages--fire ignition and fire initiation. However, the exponential and Weibull fire-...
ERIC Educational Resources Information Center
Savalei, Victoria
2010-01-01
Incomplete nonnormal data are common occurrences in applied research. Although these two problems are often dealt with separately by methodologists, they often co-occur. Very little has been written about statistics appropriate for evaluating models with such data. This article extends several existing statistics for complete nonnormal data to…
Binomial test statistics using Psi functions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bowman, Kimiko o
2007-01-01
For the negative binomial model (probability generating function (p + 1 - pt)^(-k)), a logarithmic derivative is the Psi function difference ψ(k + x) - ψ(k); this and its derivatives lead to a test statistic to decide on the validity of a specified model. The test statistic uses a database, so a comparison between theory and application is available. Note that the test function is not dominated by outliers. Applications to (i) Fisher's tick data, (ii) accidents data, (iii) Weldon's dice data are included.
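One way to turn the digamma identity into a quick diagnostic is sketched below: for the negative binomial with probability generating function (p + 1 - pt)^(-k), the score identity E[ψ(k + X) - ψ(k)] = ln(1 + p) holds at the true parameters, so the gap between the sample average of ψ(k + x) - ψ(k) and ln(1 + p), evaluated at estimates, signals lack of fit. The moment estimates and data below are illustrative; this is not the exact statistic of the cited report.

```python
# Digamma-based diagnostic for the negative binomial model: compare the sample
# average of psi(k + x) - psi(k) with its model-implied value ln(1 + p).
# Parameters are moment estimates (requires sample variance > mean).
import numpy as np
from scipy.special import psi  # digamma

def nb_digamma_gap(x):
    x = np.asarray(x, dtype=float)
    mean, var = x.mean(), x.var(ddof=1)
    p = (var - mean) / mean            # moment estimate of p
    k = mean / p                       # moment estimate of k
    observed = np.mean(psi(k + x) - psi(k))
    expected = np.log1p(p)
    return observed, expected

counts = np.array([0, 1, 1, 2, 0, 3, 5, 2, 1, 0, 4, 2, 1, 6, 0, 2])
print(nb_digamma_gap(counts))
```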
Multiple outcomes are often measured on each experimental unit in toxicology experiments. These multiple observations typically imply the existence of correlation between endpoints, and a statistical analysis that incorporates it may result in improved inference. When both disc...
NASA Astrophysics Data System (ADS)
Ghotbi, Saba; Sotoudeheian, Saeed; Arhami, Mohammad
2016-09-01
Satellite remote sensing products of AOD from MODIS, along with appropriate meteorological parameters, were used to develop statistical models and estimate ground-level PM10. Most previous studies obtained meteorological data from synoptic weather stations, which have a rather sparse spatial distribution, and used it along with the 10 km AOD product to develop statistical models applicable to PM variations at the regional scale (resolution of ≥10 km). In the current study, meteorological parameters were simulated at 3 km resolution using the WRF model and used along with the rather new 3 km AOD product (launched in 2014). The resulting PM statistical models were assessed for a polluted and highly variable urban area, Tehran, Iran. Despite the critical particulate pollution problem, very few PM studies have been conducted in this area. Direct PM-AOD associations were rather poor, owing to factors such as variations in particle optical properties and the bright-background problem for satellite retrievals, since the study area lies in the semi-arid Middle East. A linear mixed-effects (LME) approach was used, and three types of statistical models were examined: a single-variable LME model (using AOD as the independent variable) and multivariable LME models using meteorological data from two sources, the WRF model and the synoptic stations. Meteorological simulations were performed using a multiscale approach with physics options appropriate for the studied region, and the results showed rather good agreement with recordings of the synoptic stations. The single-variable LME model explained about 61%-73% of daily PM10 variation, reflecting a rather acceptable performance. Model performance improved when multivariable LME models incorporated meteorological data as auxiliary variables, particularly the fine-resolution WRF outputs (R2 = 0.73-0.81). In addition, PM estimates were mapped at a rather fine resolution for the studied city, and the resulting concentration maps were consistent with PM recordings at the existing stations.
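A minimal sketch of the single-variable LME step is given below: day-specific random intercepts and AOD slopes absorb day-to-day changes in the PM10-AOD relationship. The data and column names are synthetic placeholders, and statsmodels is assumed rather than the authors' software.

```python
# Single-variable linear mixed-effects model relating ground PM10 to satellite AOD,
# with day-specific random intercepts and slopes. Synthetic data for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_days, n_sites = 60, 8
day = np.repeat(np.arange(n_days), n_sites)
aod = rng.gamma(2.0, 0.2, size=day.size)
day_intercept = rng.normal(0, 10, n_days)[day]          # day-to-day offset
day_slope = rng.normal(80, 20, n_days)[day]             # day-to-day PM-AOD slope
pm10 = 30 + day_intercept + day_slope * aod + rng.normal(0, 8, day.size)
df = pd.DataFrame({"pm10": pm10, "aod": aod, "day": day})

model = smf.mixedlm("pm10 ~ aod", df, groups=df["day"], re_formula="~aod")
result = model.fit()
print(result.summary())
```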
The epistemological status of general circulation models
NASA Astrophysics Data System (ADS)
Loehle, Craig
2018-03-01
Forecasts of both likely anthropogenic effects on climate and consequent effects on nature and society are based on large, complex software tools called general circulation models (GCMs). Forecasts generated by GCMs have been used extensively in policy decisions related to climate change. However, the relation between underlying physical theories and results produced by GCMs is unclear. In the case of GCMs, many discretizations and approximations are made, and simulating Earth system processes is far from simple and currently leads to some results with unknown energy balance implications. Statistical testing of GCM forecasts for degree of agreement with data would facilitate assessment of fitness for use. If model results need to be put on an anomaly basis due to model bias, then both visual and quantitative measures of model fit depend strongly on the reference period used for normalization, making testing problematic. Epistemology is here applied to problems of statistical inference during testing, the relationship between the underlying physics and the models, the epistemic meaning of ensemble statistics, problems of spatial and temporal scale, the existence or not of an unforced null for climate fluctuations, the meaning of existing uncertainty estimates, and other issues. Rigorous reasoning entails carefully quantifying levels of uncertainty.
NASA Astrophysics Data System (ADS)
Cheng, Meng; Tantivasadakarn, Nathanan; Wang, Chenjie
2018-01-01
We study Abelian braiding statistics of loop excitations in three-dimensional gauge theories with fermionic particles and the closely related problem of classifying 3D fermionic symmetry-protected topological (FSPT) phases with unitary symmetries. It is known that the two problems are related by turning FSPT phases into gauge theories through gauging the global symmetry of the former. We show that there exist certain types of Abelian loop braiding statistics that are allowed only in the presence of fermionic particles, which correspond to 3D "intrinsic" FSPT phases, i.e., those that do not stem from bosonic SPT phases. While such intrinsic FSPT phases are ubiquitous in 2D systems and in 3D systems with antiunitary symmetries, their existence in 3D systems with unitary symmetries was not confirmed previously due to the fact that strong interaction is necessary to realize them. We show that the simplest unitary symmetry to support 3D intrinsic FSPT phases is Z2×Z4. To establish the results, we first derive a complete set of physical constraints on Abelian loop braiding statistics. Solving the constraints, we obtain all possible Abelian loop braiding statistics in 3D gauge theories, including those that correspond to intrinsic FSPT phases. Then, we construct exactly soluble state-sum models to realize the loop braiding statistics. These state-sum models generalize the well-known Crane-Yetter and Dijkgraaf-Witten models.
NASA Technical Reports Server (NTRS)
Bremner, Paul G.; Vazquez, Gabriel; Christiano, Daniel J.; Trout, Dawn H.
2016-01-01
Prediction of the maximum expected electromagnetic pick-up of conductors inside a realistic shielding enclosure is an important canonical problem for system-level EMC design of spacecraft, launch vehicles, aircraft and automobiles. This paper introduces a simple statistical power balance model for prediction of the maximum expected current in a wire conductor inside an aperture enclosure. It calculates both the statistical mean and variance of the immission from the physical design parameters of the problem. Familiar probability density functions can then be used to predict the maximum expected immission for design purposes. The statistical power balance model requires minimal EMC design information and solves orders of magnitude faster than existing numerical models, making it ultimately viable for scaled-up, full system-level modeling. Both experimental test results and full wave simulation results are used to validate the foundational model.
Autoregressive statistical pattern recognition algorithms for damage detection in civil structures
NASA Astrophysics Data System (ADS)
Yao, Ruigen; Pakzad, Shamim N.
2012-08-01
Statistical pattern recognition has recently emerged as a promising set of complementary methods to system identification for automatic structural damage assessment. Its essence is to use well-known concepts in statistics for boundary definition of different pattern classes, such as those for damaged and undamaged structures. In this paper, several statistical pattern recognition algorithms using autoregressive models, including statistical control charts and hypothesis testing, are reviewed as potentially competitive damage detection techniques. To enhance the performance of statistical methods, new feature extraction techniques using model spectra and residual autocorrelation, together with resampling-based threshold construction methods, are proposed. Subsequently, simulated acceleration data from a multi-degree-of-freedom system are generated to test and compare the efficiency of the existing and proposed algorithms. Data from laboratory experiments conducted on a truss and a large-scale bridge slab model are then used to further validate the damage detection methods and demonstrate the superior performance of the proposed algorithms.
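The following sketch illustrates one AR-based damage feature of the kind reviewed above: fit an AR model to baseline data, apply the same coefficients to new data, and compare the residual standard deviation against a resampling-based threshold built from baseline segments. The signals, model order, and threshold rule are illustrative assumptions, not the authors' exact features.

```python
# AR-residual damage feature with a resampling-based threshold (illustrative only).
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

def ar_residuals(signal, params, p):
    # one-step-ahead residuals of an AR(p) model with fixed coefficients
    X = np.column_stack([signal[p - i - 1:-i - 1] for i in range(p)])
    pred = params[0] + X @ params[1:]
    return signal[p:] - pred

rng = np.random.default_rng(0)
baseline = rng.normal(size=2000)
for t in range(2, 2000):                                # impose simple AR(2) "structure"
    baseline[t] += 1.2 * baseline[t - 1] - 0.5 * baseline[t - 2]
damaged = baseline.copy() + rng.normal(0, 0.8, 2000)    # stand-in for a structural change

p = 4
fit = AutoReg(baseline, lags=p).fit()

def feature(x):
    return np.std(ar_residuals(x, fit.params, p))

# threshold from baseline segments (resampling-style control limit)
segments = np.array_split(baseline, 20)
threshold = np.percentile([feature(s) for s in segments], 99)
print(feature(damaged[-500:]), ">", threshold, "->", feature(damaged[-500:]) > threshold)
```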
Tiedeman, Claire; Ely, D. Matthew; Hill, Mary C.; O'Brien, Grady M.
2004-01-01
We develop a new observation‐prediction (OPR) statistic for evaluating the importance of system state observations to model predictions. The OPR statistic measures the change in prediction uncertainty produced when an observation is added to or removed from an existing monitoring network, and it can be used to guide refinement and enhancement of the network. Prediction uncertainty is approximated using a first‐order second‐moment method. We apply the OPR statistic to a model of the Death Valley regional groundwater flow system (DVRFS) to evaluate the importance of existing and potential hydraulic head observations to predicted advective transport paths in the saturated zone underlying Yucca Mountain and underground testing areas on the Nevada Test Site. Important existing observations tend to be far from the predicted paths, and many unimportant observations are in areas of high observation density. These results can be used to select locations at which increased observation accuracy would be beneficial and locations that could be removed from the network. Important potential observations are mostly in areas of high hydraulic gradient far from the paths. Results for both existing and potential observations are related to the flow system dynamics and coarse parameter zonation in the DVRFS model. If system properties in different locations are as similar as the zonation assumes, then the OPR results illustrate a data collection opportunity whereby observations in distant, high‐gradient areas can provide information about properties in flatter‐gradient areas near the paths. If this similarity is suspect, then the analysis produces a different type of data collection opportunity involving testing of model assumptions critical to the OPR results.
NASA Astrophysics Data System (ADS)
McCray, Wilmon Wil L., Jr.
The research was prompted by a need to assess the process improvement, quality management, and analytical techniques taught to students in U.S. undergraduate and graduate degree programs in systems engineering and the computing sciences (e.g., software engineering, computer science, and information technology) that can be applied to quantitatively manage processes for performance. Everyone involved in executing repeatable processes in the software and systems development lifecycle needs to become familiar with the concepts of quantitative management, statistical thinking, process improvement methods and how they relate to process performance. Organizations are starting to embrace the de facto Software Engineering Institute (SEI) Capability Maturity Model Integration (CMMI®) models as process improvement frameworks to improve business process performance. High maturity process areas in the CMMI model imply the use of analytical, statistical, and quantitative management techniques, and process performance modeling, to identify and eliminate sources of variation, continually improve process performance, reduce cost, and predict future outcomes. The research study identifies and discusses in detail the gap-analysis findings on process improvement and quantitative analysis techniques taught in U.S. university systems engineering and computing science degree programs, gaps that exist in the literature, and a comparative analysis of the gaps between the SEI's "healthy ingredients" of a process performance model and the courses taught in U.S. university degree programs. The research also heightens awareness that academicians have conducted little research on applicable statistics and quantitative techniques that can be used to demonstrate high maturity as implied in the CMMI models. The research also includes a Monte Carlo simulation optimization model and dashboard that demonstrates the use of statistical methods, statistical process control, sensitivity analysis, quantitative and optimization techniques to establish a baseline and predict future customer satisfaction index scores (outcomes). The American Customer Satisfaction Index (ACSI) model and industry benchmarks were used as a framework for the simulation model.
75 FR 16202 - Notice of Issuance of Regulatory Guide
Federal Register 2010, 2011, 2012, 2013, 2014
2010-03-31
..., Revision 2, "An Acceptable Model and Related Statistical Methods for the Analysis of Fuel Densification"... Introduction The U.S. Nuclear Regulatory Commission (NRC) is issuing a revision to an existing guide in the... nuclear power reactors. To meet these objectives, the guide describes statistical methods related to...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Loubenets, Elena R.
We prove the existence for each Hilbert space of the two new quasi hidden variable (qHV) models, statistically noncontextual and context-invariant, reproducing all the von Neumann joint probabilities via non-negative values of real-valued measures and all the quantum product expectations via the qHV (classical-like) average of the product of the corresponding random variables. In a context-invariant model, a quantum observable X can be represented by a variety of random variables satisfying the functional condition required in quantum foundations, but each of these random variables equivalently models X under all joint von Neumann measurements, regardless of their contexts. The proved existence of this model negates the general opinion that, in terms of random variables, the Hilbert space description of all the joint von Neumann measurements for dim H ≥ 3 can be reproduced only contextually. The existence of a statistically noncontextual qHV model, in particular, implies that every N-partite quantum state admits a local quasi hidden variable model introduced in Loubenets [J. Math. Phys. 53, 022201 (2012)]. The new results of the present paper point also to the generality of the quasi-classical probability model proposed in Loubenets [J. Phys. A: Math. Theor. 45, 185306 (2012)].
ERIC Educational Resources Information Center
Lin, Tony; Erfan, Sasan
2016-01-01
Mathematical modeling is an open-ended research subject where no definite answers exist for any problem. Math modeling enables thinking outside the box to connect different fields of studies together including statistics, algebra, calculus, matrices, programming and scientific writing. As an integral part of society, it is the foundation for many…
A Survey of Statistical Models for Reverse Engineering Gene Regulatory Networks
Huang, Yufei; Tienda-Luna, Isabel M.; Wang, Yufeng
2009-01-01
Statistical models for reverse engineering gene regulatory networks are surveyed in this article. To provide readers with a system-level view of the modeling issues in this research, a graphical modeling framework is proposed. This framework serves as the scaffolding on which the review of different models can be systematically assembled. Based on the framework, we review many existing models for many aspects of gene regulation; the pros and cons of each model are discussed. In addition, network inference algorithms are also surveyed under the graphical modeling framework by the categories of point solutions and probabilistic solutions and the connections and differences among the algorithms are provided. This survey has the potential to elucidate the development and future of reverse engineering GRNs and bring statistical signal processing closer to the core of this research. PMID:20046885
Zhao, Xing; Zhou, Xiao-Hua; Feng, Zijian; Guo, Pengfei; He, Hongyan; Zhang, Tao; Duan, Lei; Li, Xiaosong
2013-01-01
As a useful tool for geographical cluster detection of events, the spatial scan statistic is widely applied in many fields and plays an increasingly important role. The classic version of the spatial scan statistic for a binary outcome was developed by Kulldorff, based on the Bernoulli or the Poisson probability model. In this paper, we apply the hypergeometric probability model to construct the likelihood function under the null hypothesis. Compared with existing methods, the likelihood function under the null hypothesis is an alternative and indirect way to identify the potential cluster, and the test statistic is the extreme value of the likelihood function. As in Kulldorff's methods, we adopt a Monte Carlo test of significance. Both methods are applied to detecting spatial clusters of Japanese encephalitis in Sichuan province, China, in 2009, and the detected clusters are identical. A simulation with independent benchmark data indicates that the test statistic based on the hypergeometric model outperforms Kulldorff's statistics for clusters of high population density or large size; otherwise, Kulldorff's statistics are superior.
2002-06-01
fits our actual data. To determine the goodness of fit, statisticians typically use the following four measures: R2 statistic. The R2 statistic... A mathematical model is developed to better estimate cleanup costs using historical cost data that could be used by the Defense Department prior to placing...
Tonkin, Matthew J.; Tiedeman, Claire; Ely, D. Matthew; Hill, Mary C.
2007-01-01
The OPR-PPR program calculates the Observation-Prediction (OPR) and Parameter-Prediction (PPR) statistics that can be used to evaluate the relative importance of various kinds of data to simulated predictions. The data considered fall into three categories: (1) existing observations, (2) potential observations, and (3) potential information about parameters. The first two are addressed by the OPR statistic; the third is addressed by the PPR statistic. The statistics are based on linear theory and measure the leverage of the data, which depends on the location, the type, and possibly the time of the data being considered. For example, in a ground-water system the type of data might be a head measurement at a particular location and time. As a measure of leverage, the statistics do not take into account the value of the measurement. As linear measures, the OPR and PPR statistics require minimal computational effort once sensitivities have been calculated. Sensitivities need to be calculated for only one set of parameter values; commonly these are the values estimated through model calibration. OPR-PPR can calculate the OPR and PPR statistics for any mathematical model that produces the necessary OPR-PPR input files. In this report, OPR-PPR capabilities are presented in the context of using the ground-water model MODFLOW-2000 and the universal inverse program UCODE_2005. The method used to calculate the OPR and PPR statistics is based on the linear equation for prediction standard deviation. Using sensitivities and other information, OPR-PPR calculates (a) the percent increase in the prediction standard deviation that results when one or more existing observations are omitted from the calibration data set; (b) the percent decrease in the prediction standard deviation that results when one or more potential observations are added to the calibration data set; or (c) the percent decrease in the prediction standard deviation that results when potential information on one or more parameters is added.
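A linear-theory sketch of the OPR idea follows: with observation sensitivities J (rows are observations, columns are parameters), observation weights w, and prediction sensitivities z, the prediction standard deviation is proportional to sqrt(z'(J'WJ)^(-1)z), and the OPR value for omitting observation i is the percent increase in that quantity when row i is removed. The matrices below are random stand-ins rather than output from MODFLOW-2000 or UCODE_2005, and the program's additional variance terms are omitted.

```python
# Percent change in prediction standard deviation when an observation is omitted
# (linear first-order second-moment approximation; toy sensitivities).
import numpy as np

def pred_std(J, w, z):
    JtWJ = J.T @ (w[:, None] * J)
    return float(np.sqrt(z @ np.linalg.solve(JtWJ, z)))

def opr_omit(J, w, z):
    base = pred_std(J, w, z)
    keep = np.arange(len(w))
    return np.array([100.0 * (pred_std(J[keep != i], w[keep != i], z) - base) / base
                     for i in keep])

rng = np.random.default_rng(0)
J = rng.normal(size=(12, 4))       # 12 head observations, 4 parameters (stand-ins)
w = np.ones(12)                    # equal observation weights
z = rng.normal(size=4)             # sensitivity of an advective-path prediction
print(np.round(opr_omit(J, w, z), 2))   # largest values flag the most important observations
```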
Data-driven non-Markovian closure models
NASA Astrophysics Data System (ADS)
Kondrashov, Dmitri; Chekroun, Mickaël D.; Ghil, Michael
2015-03-01
This paper has two interrelated foci: (i) obtaining stable and efficient data-driven closure models by using a multivariate time series of partial observations from a large-dimensional system; and (ii) comparing these closure models with the optimal closures predicted by the Mori-Zwanzig (MZ) formalism of statistical physics. Multilayer stochastic models (MSMs) are introduced as both a generalization and a time-continuous limit of existing multilevel, regression-based approaches to closure in a data-driven setting; these approaches include empirical model reduction (EMR), as well as more recent multi-layer modeling. It is shown that the multilayer structure of MSMs can provide a natural Markov approximation to the generalized Langevin equation (GLE) of the MZ formalism. A simple correlation-based stopping criterion for an EMR-MSM model is derived to assess how well it approximates the GLE solution. Sufficient conditions are derived on the structure of the nonlinear cross-interactions between the constitutive layers of a given MSM to guarantee the existence of a global random attractor. This existence ensures that no blow-up can occur for a broad class of MSM applications, a class that includes non-polynomial predictors and nonlinearities that do not necessarily preserve quadratic energy invariants. The EMR-MSM methodology is first applied to a conceptual, nonlinear, stochastic climate model of coupled slow and fast variables, in which only slow variables are observed. It is shown that the resulting closure model with energy-conserving nonlinearities efficiently captures the main statistical features of the slow variables, even when there is no formal scale separation and the fast variables are quite energetic. Second, an MSM is shown to successfully reproduce the statistics of a partially observed, generalized Lotka-Volterra model of population dynamics in its chaotic regime. The challenges here include the rarity of strange attractors in the model's parameter space and the existence of multiple attractor basins with fractal boundaries. The positivity constraint on the solutions' components replaces here the quadratic-energy-preserving constraint of fluid-flow problems and it successfully prevents blow-up.
Statistics and Style. Mathematical Linguistics and Automatic Language Processing No. 6.
ERIC Educational Resources Information Center
Dolezel, Lubomir, Ed.; Bailey, Richard W., Ed.
This collection of 17 articles concerning the application of mathematical models and techniques to the study of literary style is an attempt to overcome the communication barriers that exist between scholars in the various fields that find their meeting ground in statistical stylistics. The articles selected were chosen to represent the best…
Assessment of credit risk based on fuzzy relations
NASA Astrophysics Data System (ADS)
Tsabadze, Teimuraz
2017-06-01
The purpose of this paper is to develop a new approach for the assessment of the credit risk of corporate borrowers. There are different models for borrowers' risk assessment, divided into two groups: statistical and theoretical. When assessing the credit risk of corporate borrowers, a statistical model is unsuitable because a sufficiently large history of defaults is lacking. At the same time, some theoretical models cannot be used in the absence of a stock exchange. In such cases, when a particular borrower is studied without an existing statistical base, the decision-making process is always expert-driven. The paper describes a new approach that may be used in group decision-making. An example of the application of the proposed approach is given.
Statistical estimation of femur micro-architecture using optimal shape and density predictors.
Lekadir, Karim; Hazrati-Marangalou, Javad; Hoogendoorn, Corné; Taylor, Zeike; van Rietbergen, Bert; Frangi, Alejandro F
2015-02-26
The personalization of trabecular micro-architecture has been recently shown to be important in patient-specific biomechanical models of the femur. However, high-resolution in vivo imaging of bone micro-architecture using existing modalities is still infeasible in practice due to the associated acquisition times, costs, and X-ray radiation exposure. In this study, we describe a statistical approach for the prediction of the femur micro-architecture based on the more easily extracted subject-specific bone shape and mineral density information. To this end, a training sample of ex vivo micro-CT images is used to learn the existing statistical relationships within the low and high resolution image data. More specifically, optimal bone shape and mineral density features are selected based on their predictive power and used within a partial least square regression model to estimate the unknown trabecular micro-architecture within the anatomical models of new subjects. The experimental results demonstrate the accuracy of the proposed approach, with average errors of 0.07 for both the degree of anisotropy and tensor norms. Copyright © 2015 Elsevier Ltd. All rights reserved.
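The regression step can be sketched as follows: predict micro-architectural descriptors (for example, degree of anisotropy and a fabric-tensor norm) from shape and density features with partial least squares. The feature selection and imaging pipeline are omitted, and the data are synthetic placeholders rather than the ex vivo micro-CT training sample.

```python
# Partial least squares regression from shape/density features to two
# micro-architectural targets, evaluated on a held-out split. Synthetic data.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_subjects, n_features = 60, 40
X = rng.normal(size=(n_subjects, n_features))        # shape + density predictors
B = rng.normal(size=(n_features, 2)) * (rng.random((n_features, 2)) < 0.2)
Y = X @ B + 0.1 * rng.normal(size=(n_subjects, 2))   # [anisotropy, tensor norm] stand-ins

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.25, random_state=0)
pls = PLSRegression(n_components=5).fit(X_tr, Y_tr)
errors = np.abs(pls.predict(X_te) - Y_te).mean(axis=0)
print("mean absolute errors:", np.round(errors, 3))
```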
ERIC Educational Resources Information Center
Enders, Craig K.
2005-01-01
The Bollen-Stine bootstrap can be used to correct for standard error and fit statistic bias that occurs in structural equation modeling (SEM) applications due to nonnormal data. The purpose of this article is to demonstrate the use of a custom SAS macro program that can be used to implement the Bollen-Stine bootstrap with existing SEM software.…
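Although the article describes a SAS macro, the core of the Bollen-Stine bootstrap is a data transformation that makes the model-implied covariance hold exactly in the sample, Z = Y S^(-1/2) Sigma^(1/2), after which bootstrap samples are drawn from Z. The numpy sketch below shows only that transformation, with the fitted covariance supplied by the user; it is not the macro itself.

```python
# Bollen-Stine data transformation: rotate the centred data so that its sample
# covariance equals the model-implied covariance sigma_model exactly.
import numpy as np

def matrix_power_sym(A, power):
    vals, vecs = np.linalg.eigh(A)
    return (vecs * vals**power) @ vecs.T

def bollen_stine_transform(Y, sigma_model):
    Yc = Y - Y.mean(axis=0)
    S = np.cov(Yc, rowvar=False)
    return Yc @ matrix_power_sym(S, -0.5) @ matrix_power_sym(sigma_model, 0.5)

rng = np.random.default_rng(0)
Y = rng.multivariate_normal([0, 0, 0], [[1, .5, .2], [.5, 1, .3], [.2, .3, 1]], size=200)
sigma_hat = np.eye(3)                          # placeholder fitted model covariance
Z = bollen_stine_transform(Y, sigma_hat)
print(np.round(np.cov(Z, rowvar=False), 3))    # reproduces sigma_hat
```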
Research participant compensation: A matter of statistical inference as well as ethics.
Swanson, David M; Betensky, Rebecca A
2015-11-01
The ethics of compensation of research subjects for participation in clinical trials has been debated for years. One ethical issue of concern is variation among subjects in the level of compensation for identical treatments. Surprisingly, the impact of variation on the statistical inferences made from trial results has not been examined. We seek to identify how variation in compensation may influence any existing dependent censoring in clinical trials, thereby also influencing inference about the survival curve, hazard ratio, or other measures of treatment efficacy. In simulation studies, we consider a model for how compensation structure may influence the censoring model. Under existing dependent censoring, we estimate survival curves under different compensation structures and observe how these structures induce variability in the estimates. We show through this model that if the compensation structure affects the censoring model and dependent censoring is present, then variation in that structure induces variation in the estimates and affects the accuracy of estimation and inference on treatment efficacy. From the perspectives of both ethics and statistical inference, standardization and transparency in the compensation of participants in clinical trials are warranted. Copyright © 2015 Elsevier Inc. All rights reserved.
Modeling Group Interactions via Open Data Sources
2011-08-30
data. State-of-the-art search engines are designed to support general query-specific search and are not suitable for finding disconnected online groups. The... groups, (2) developing innovative mathematical and statistical models and efficient algorithms that leverage existing search engines and employ
Ambler, Graeme K; Gohel, Manjit S; Mitchell, David C; Loftus, Ian M; Boyle, Jonathan R
2015-01-01
Accurate adjustment of surgical outcome data for risk is vital in an era of surgeon-level reporting. Current risk prediction models for abdominal aortic aneurysm (AAA) repair are suboptimal. We aimed to develop a reliable risk model for in-hospital mortality after intervention for AAA, using rigorous contemporary statistical techniques to handle missing data. Using data collected during a 15-month period in the United Kingdom National Vascular Database, we applied multiple imputation methodology together with stepwise model selection to generate preoperative and perioperative models of in-hospital mortality after AAA repair, using two thirds of the available data. Model performance was then assessed on the remaining third of the data by receiver operating characteristic curve analysis and compared with existing risk prediction models. Model calibration was assessed by Hosmer-Lemeshow analysis. A total of 8088 AAA repair operations were recorded in the National Vascular Database during the study period, of which 5870 (72.6%) were elective procedures. Both preoperative and perioperative models showed excellent discrimination, with areas under the receiver operating characteristic curve of .89 and .92, respectively. This was significantly better than any of the existing models (area under the receiver operating characteristic curve for best comparator model, .84 and .88; P < .001 and P = .001, respectively). Discrimination remained excellent when only elective procedures were considered. There was no evidence of miscalibration by Hosmer-Lemeshow analysis. We have developed accurate models to assess risk of in-hospital mortality after AAA repair. These models were carefully developed with rigorous statistical methodology and significantly outperform existing methods for both elective cases and overall AAA mortality. These models will be invaluable for both preoperative patient counseling and accurate risk adjustment of published outcome data. Copyright © 2015 Society for Vascular Surgery. Published by Elsevier Inc. All rights reserved.
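A compressed Python sketch of the modelling recipe is shown below: impute missing preoperative fields, fit a logistic regression for in-hospital mortality on a training split, and assess discrimination on the held-out third with the area under the ROC curve. A single iterative imputation stands in for the multiple-imputation step, and the variables and data are synthetic placeholders, not the National Vascular Database.

```python
# Imputation + logistic regression risk model with held-out AUC (synthetic data).
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 3000
X = rng.normal(size=(n, 6))                      # age, creatinine, Hb, etc. (stand-ins)
y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1] - 2.5))))
X[rng.random(X.shape) < 0.1] = np.nan            # ~10% values missing at random

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1 / 3, random_state=0)
model = make_pipeline(IterativeImputer(random_state=0), StandardScaler(),
                      LogisticRegression(max_iter=1000))
model.fit(X_tr, y_tr)
print("AUC:", round(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]), 3))
```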
Modeling Human-Computer Decision Making with Covariance Structure Analysis.
ERIC Educational Resources Information Center
Coovert, Michael D.; And Others
Arguing that sufficient theory exists about the interplay between human information processing, computer systems, and the demands of various tasks to construct useful theories of human-computer interaction, this study presents a structural model of human-computer interaction and reports the results of various statistical analyses of this model.…
An Algebraic Implicitization and Specialization of Minimum KL-Divergence Models
NASA Astrophysics Data System (ADS)
Dukkipati, Ambedkar; Manathara, Joel George
In this paper we study the representation of KL-divergence minimization, in cases where integer sufficient statistics exist, using tools from polynomial algebra. We show that the estimation of parametric statistical models in this case can be transformed into solving a system of polynomial equations. In particular, we also study the case of the Kullback-Csiszár iteration scheme. We present implicit descriptions of these models and show that implicitization preserves specialization of the prior distribution. This result leads us to a Gröbner bases method to compute an implicit representation of minimum KL-divergence models.
Alanazi, Hamdan O; Abdullah, Abdul Hanan; Qureshi, Kashif Naseer
2017-04-01
Recently, Artificial Intelligence (AI) has been used widely in medicine and the health care sector. In machine learning, classification and prediction are major areas of AI. Today, the study of existing predictive models based on machine learning methods is extremely active. Doctors need accurate predictions of the outcomes of their patients' diseases. In addition, for accurate predictions, timing is another significant factor that influences treatment decisions. In this paper, existing predictive models in medicine and health care are critically reviewed. Furthermore, the most well-known machine learning methods are explained, and the confusion between a statistical approach and machine learning is clarified. A review of related literature reveals that the predictions of existing predictive models differ even when the same dataset is used. Therefore, existing predictive models are essential, and current methods must be improved.
Wu, Hao
2018-05-01
In structural equation modelling (SEM), a robust adjustment to the test statistic or to its reference distribution is needed when its null distribution deviates from a χ2 distribution, which usually arises when data do not follow a multivariate normal distribution. Unfortunately, existing studies on this issue typically focus on only a few methods and neglect the majority of alternative methods in statistics. Existing simulation studies typically consider only non-normal distributions of data that either satisfy asymptotic robustness or lead to an asymptotic scaled χ2 distribution. In this work we conduct a comprehensive study that involves both typical methods in SEM and less well-known methods from the statistics literature. We also propose the use of several novel non-normal data distributions that are qualitatively different from the non-normal distributions widely used in existing studies. We found that several under-studied methods give the best performance under specific conditions, but the Satorra-Bentler method remains the most viable method for most situations. © 2017 The British Psychological Society.
Study of subgrid-scale velocity models for reacting and nonreacting flows
NASA Astrophysics Data System (ADS)
Langella, I.; Doan, N. A. K.; Swaminathan, N.; Pope, S. B.
2018-05-01
A study is conducted to identify advantages and limitations of existing large-eddy simulation (LES) closures for the subgrid-scale (SGS) kinetic energy using a database of direct numerical simulations (DNS). The analysis is conducted for both reacting and nonreacting flows, different turbulence conditions, and various filter sizes. A model based on dissipation and diffusion of momentum (LD-D model) is proposed in this paper, informed by the observed behavior of four existing models. Our model shows the best overall agreement with DNS statistics. Two main investigations are conducted for both reacting and nonreacting flows: (i) an investigation of the robustness of the model constants, showing that commonly used constants lead to a severe underestimation of the SGS kinetic energy and highlighting their dependence on Reynolds number and filter size; and (ii) an investigation of the statistical behavior of the SGS closures, which suggests that the dissipation of momentum is the key parameter to be considered in such closures and that the dilatation effect is important and must be captured correctly in reacting flows. Additional properties of SGS kinetic energy modeling are identified and discussed.
Improving the Statistical Modeling of the TRMM Extreme Precipitation Monitoring System
NASA Astrophysics Data System (ADS)
Demirdjian, L.; Zhou, Y.; Huffman, G. J.
2016-12-01
This project improves upon an existing extreme precipitation monitoring system based on the Tropical Rainfall Measuring Mission (TRMM) daily product (3B42) using new statistical models. The proposed system utilizes a regional modeling approach, where data from similar grid locations are pooled to increase the quality and stability of the resulting model parameter estimates to compensate for the short data record. The regional frequency analysis is divided into two stages. In the first stage, the region defined by the TRMM measurements is partitioned into approximately 27,000 non-overlapping clusters using a recursive k-means clustering scheme. In the second stage, a statistical model is used to characterize the extreme precipitation events occurring in each cluster. Instead of utilizing the block-maxima approach used in the existing system, where annual maxima are fit to the Generalized Extreme Value (GEV) probability distribution at each cluster separately, the present work adopts the peak-over-threshold (POT) method of classifying points as extreme if they exceed a pre-specified threshold. Theoretical considerations motivate the use of the Generalized-Pareto (GP) distribution for fitting threshold exceedances. The fitted parameters can be used to construct simple and intuitive average recurrence interval (ARI) maps which reveal how rare a particular precipitation event is given its spatial location. The new methodology eliminates much of the random noise that was produced by the existing models due to a short data record, producing more reasonable ARI maps when compared with NOAA's long-term Climate Prediction Center (CPC) ground based observations. The resulting ARI maps can be useful for disaster preparation, warning, and management, as well as increased public awareness of the severity of precipitation events. Furthermore, the proposed methodology can be applied to various other extreme climate records.
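The peak-over-threshold step can be sketched briefly: choose a high threshold for a cluster's daily series, fit a Generalized Pareto distribution to the exceedances, and convert a rainfall amount into an average recurrence interval. The daily series, threshold choice, and parameter values below are synthetic and illustrative, not TRMM 3B42 data.

```python
# Peak-over-threshold fit with a Generalized Pareto distribution and conversion
# of daily rainfall amounts to average recurrence intervals (ARI). Synthetic data.
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)
daily_precip = rng.gamma(0.4, 8.0, size=18 * 365)         # ~18 years of daily totals (mm)

threshold = np.quantile(daily_precip, 0.95)
excess = daily_precip[daily_precip > threshold] - threshold
rate = excess.size / (daily_precip.size / 365.25)         # exceedances per year

shape, loc, scale = genpareto.fit(excess, floc=0)         # location fixed at zero

def ari_years(amount_mm):
    """Average recurrence interval of a daily total, in years."""
    p_exceed = genpareto.sf(amount_mm - threshold, shape, loc=0, scale=scale)
    return 1.0 / (rate * p_exceed)

for x in (30, 50, 70):
    print(f"{x} mm/day -> ARI ~ {ari_years(x):.1f} years")
```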
NASA Astrophysics Data System (ADS)
Mahmood, Ehab A.; Rana, Sohel; Hussin, Abdul Ghapor; Midi, Habshah
2016-06-01
The circular regression model may contain one or more data points that appear to be peculiar or inconsistent with the main part of the model. This may occur due to recording errors, sudden short events, sampling under abnormal conditions, etc. The existence of these data points, or "outliers", in the data set causes many problems in research results and conclusions. Therefore, we should identify them before applying statistical analysis. In this article, we aim to propose a statistic to identify outliers in both the response and explanatory variables of the simple circular regression model. Our proposed statistic is the robust circular distance RCDxy, and it is justified by three robust measures: the proportion of detected outliers, and the masking and swamping rates.
Hart, Carl R; Reznicek, Nathan J; Wilson, D Keith; Pettit, Chris L; Nykaza, Edward T
2016-05-01
Many outdoor sound propagation models exist, ranging from highly complex physics-based simulations to simplified engineering calculations, and more recently, highly flexible statistical learning methods. Several engineering and statistical learning models are evaluated by using a particular physics-based model, namely, a Crank-Nicholson parabolic equation (CNPE), as a benchmark. Narrowband transmission loss values predicted with the CNPE, based upon a simulated data set of meteorological, boundary, and source conditions, act as simulated observations. In the simulated data set sound propagation conditions span from downward refracting to upward refracting, for acoustically hard and soft boundaries, and low frequencies. Engineering models used in the comparisons include the ISO 9613-2 method, Harmonoise, and Nord2000 propagation models. Statistical learning methods used in the comparisons include bagged decision tree regression, random forest regression, boosting regression, and artificial neural network models. Computed skill scores are relative to sound propagation in a homogeneous atmosphere over a rigid ground. Overall skill scores for the engineering noise models are 0.6%, -7.1%, and 83.8% for the ISO 9613-2, Harmonoise, and Nord2000 models, respectively. Overall skill scores for the statistical learning models are 99.5%, 99.5%, 99.6%, and 99.6% for bagged decision tree, random forest, boosting, and artificial neural network regression models, respectively.
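The comparison framework can be illustrated with a small sketch: train a statistical learning model on simulated transmission-loss "observations" and compute an MSE-based skill score against a simple reference prediction. The reference here is a constant baseline and the input features are invented stand-ins; the paper's reference case (homogeneous atmosphere over rigid ground) and exact skill-score definition may differ.

```python
# Random forest regression of transmission loss plus an MSE-based skill score
# relative to a trivial baseline. Features and data are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([rng.uniform(-0.2, 0.2, n),     # refraction parameter (stand-in)
                     rng.uniform(0, 1, n),          # ground impedance proxy
                     rng.uniform(50, 500, n)])      # frequency, Hz
tl = 20 * np.log10(X[:, 2]) - 30 * X[:, 0] + 5 * X[:, 1] + rng.normal(0, 1, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, tl, test_size=0.3, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

mse_model = np.mean((rf.predict(X_te) - y_te) ** 2)
mse_reference = np.mean((y_tr.mean() - y_te) ** 2)   # skill relative to a constant baseline
skill = 100 * (1 - mse_model / mse_reference)
print(f"skill score: {skill:.1f}%")
```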
Serang, Oliver; Noble, William Stafford
2012-01-01
The problem of identifying the proteins in a complex mixture using tandem mass spectrometry can be framed as an inference problem on a graph that connects peptides to proteins. Several existing protein identification methods make use of statistical inference methods for graphical models, including expectation maximization, Markov chain Monte Carlo, and full marginalization coupled with approximation heuristics. We show that, for this problem, the majority of the cost of inference usually comes from a few highly connected subgraphs. Furthermore, we evaluate three different statistical inference methods using a common graphical model, and we demonstrate that junction tree inference substantially improves rates of convergence compared to existing methods. The python code used for this paper is available at http://noble.gs.washington.edu/proj/fido. PMID:22331862
Data-based Non-Markovian Model Inference
NASA Astrophysics Data System (ADS)
Ghil, Michael
2015-04-01
This talk concentrates on obtaining stable and efficient data-based models for simulation and prediction in the geosciences and life sciences. The proposed model derivation relies on using a multivariate time series of partial observations from a large-dimensional system, and the resulting low-order models are compared with the optimal closures predicted by the non-Markovian Mori-Zwanzig formalism of statistical physics. Multilayer stochastic models (MSMs) are introduced as both a very broad generalization and a time-continuous limit of existing multilevel, regression-based approaches to data-based closure, in particular of empirical model reduction (EMR). We show that the multilayer structure of MSMs can provide a natural Markov approximation to the generalized Langevin equation (GLE) of the Mori-Zwanzig formalism. A simple correlation-based stopping criterion for an EMR-MSM model is derived to assess how well it approximates the GLE solution. Sufficient conditions are given for the nonlinear cross-interactions between the constitutive layers of a given MSM to guarantee the existence of a global random attractor. This existence ensures that no blow-up can occur for a very broad class of MSM applications. The EMR-MSM methodology is first applied to a conceptual, nonlinear, stochastic climate model of coupled slow and fast variables, in which only slow variables are observed. The resulting reduced model with energy-conserving nonlinearities captures the main statistical features of the slow variables, even when there is no formal scale separation and the fast variables are quite energetic. Second, an MSM is shown to successfully reproduce the statistics of a partially observed, generalized Lotka-Volterra model of population dynamics in its chaotic regime. The positivity constraint on the solutions' components replaces here the quadratic-energy-preserving constraint of fluid-flow problems and it successfully prevents blow-up. This work is based on a close collaboration with M.D. Chekroun, D. Kondrashov, S. Kravtsov and A.W. Robertson.
Miao, Hui; Hartman, Mikael; Bhoo-Pathy, Nirmala; Lee, Soo-Chin; Taib, Nur Aishah; Tan, Ern-Yu; Chan, Patrick; Moons, Karel G M; Wong, Hoong-Seam; Goh, Jeremy; Rahim, Siti Mastura; Yip, Cheng-Har; Verkooijen, Helena M
2014-01-01
In Asia, up to 25% of breast cancer patients present with distant metastases at diagnosis. Given the heterogeneous survival probabilities of de novo metastatic breast cancer, individual outcome prediction is challenging. The aim of the study is to identify existing prognostic models for patients with de novo metastatic breast cancer and validate them in Asia. We performed a systematic review to identify prediction models for metastatic breast cancer. Models were validated in 642 women with de novo metastatic breast cancer registered between 2000 and 2010 in the Singapore Malaysia Hospital Based Breast Cancer Registry. Survival curves for low, intermediate and high-risk groups according to each prognostic score were compared by log-rank test and discrimination of the models was assessed by concordance statistic (C-statistic). We identified 16 prediction models, seven of which were for patients with brain metastases only. Performance status, estrogen receptor status, metastatic site(s) and disease-free interval were the most common predictors. We were able to validate nine prediction models. The capacity of the models to discriminate between poor and good survivors varied from poor to fair with C-statistics ranging from 0.50 (95% CI, 0.48-0.53) to 0.63 (95% CI, 0.60-0.66). The discriminatory performance of existing prediction models for de novo metastatic breast cancer in Asia is modest. Development of an Asian-specific prediction model is needed to improve prognostication and guide decision making.
Noninformative prior in the quantum statistical model of pure states
NASA Astrophysics Data System (ADS)
Tanaka, Fuyuhiko
2012-06-01
In the present paper, we consider a suitable definition of a noninformative prior on the quantum statistical model of pure states. While the full pure-states model is invariant under unitary rotation and admits the Haar measure, restricted models, which we often see in quantum channel estimation and quantum process tomography, have less symmetry and no compelling rationale for any choice. We adopt a game-theoretic approach that is applicable to classical Bayesian statistics and yields a noninformative prior for a general class of probability distributions. We define the quantum detection game and show that noninformative priors exist for a general class of pure-states models. Theoretically, this gives one way of representing ignorance about a given quantum system with partial information. Practically, our method provides a default distribution on the model for applying Bayesian techniques to quantum-state tomography with small samples.
Risk prediction models of breast cancer: a systematic review of model performances.
Anothaisintawee, Thunyarat; Teerawattananon, Yot; Wiratkapun, Chollathip; Kasamesup, Vijj; Thakkinstian, Ammarin
2012-05-01
An increasing number of risk prediction models have been developed to estimate breast cancer risk in individual women. However, the performance of these models is questionable. We therefore conducted a study to systematically review previous risk prediction models. The results of this review help to identify the most reliable model and indicate the strengths and weaknesses of each model, to guide future model development. We searched MEDLINE (PubMed) from 1949 and EMBASE (Ovid) from 1974 until October 2010. Observational studies which constructed models using regression methods were selected. Information about model development and performance was extracted. Twenty-five out of 453 studies were eligible. Of these, 18 developed prediction models and 7 validated existing prediction models. Up to 13 variables were included in the models, and sample sizes for each study ranged from 550 to 2,404,636. Internal validation was performed in four models, while five models had external validation. The Gail and the Rosner and Colditz models were the influential models that were subsequently modified by other scholars. Calibration performance of most models was fair to good (expected/observed ratio: 0.87-1.12), but discriminatory accuracy was poor to fair both in internal validation (concordance statistics: 0.53-0.66) and in external validation (concordance statistics: 0.56-0.63). Most models yielded relatively poor discrimination in both internal and external validation. This poor discriminatory accuracy of existing models might be due to a lack of knowledge about risk factors, heterogeneous subtypes of breast cancer, and different distributions of risk factors across populations. In addition, the concordance statistic itself is insensitive for measuring improvements in discrimination. Therefore, newer methods such as the net reclassification index should be considered when evaluating the performance of newly developed models.
ERIC Educational Resources Information Center
Lachaud, Christian Michel; Renaud, Olivier
2011-01-01
This tutorial for the statistical processing of reaction times collected through a repeated-measure design is addressed to researchers in psychology. It aims at making explicit some important methodological issues, at orienting researchers to the existing solutions, and at providing them some evaluation tools for choosing the most robust and…
Applications of statistical physics and information theory to the analysis of DNA sequences
NASA Astrophysics Data System (ADS)
Grosse, Ivo
2000-10-01
DNA carries the genetic information of most living organisms, and the goal of genome projects is to uncover that genetic information. One basic task in the analysis of DNA sequences is the recognition of protein-coding genes. Powerful computer programs for gene recognition have been developed, but most of them are based on statistical patterns that vary from species to species. In this thesis I address the question of whether there exist universal statistical patterns that are different in coding and noncoding DNA of all living species, regardless of their phylogenetic origin. In search of such species-independent patterns, I study the mutual information function of genomic DNA sequences and find that it shows persistent period-three oscillations. To understand the biological origin of the observed period-three oscillations, I compare the mutual information function of genomic DNA sequences to the mutual information function of stochastic model sequences. I find that the pseudo-exon model is able to reproduce the mutual information function of genomic DNA sequences. Moreover, I find that a generalization of the pseudo-exon model can connect the existence and the functional form of long-range correlations to the presence and the length distributions of coding and noncoding regions. Based on these theoretical studies, I am able to find an information-theoretical quantity, the average mutual information (AMI), whose probability distributions are significantly different in coding and noncoding DNA, while they are almost identical in all studied species. These findings show that there exist universal statistical patterns that are different in coding and noncoding DNA of all studied species, and they suggest that the AMI may be used to identify genes in different living species, irrespective of their taxonomic origin.
Colegrave, Nick
2017-01-01
A common approach to the analysis of experimental data across much of the biological sciences is test-qualified pooling. Here non-significant terms are dropped from a statistical model, effectively pooling the variation associated with each removed term with the error term used to test hypotheses (or estimate effect sizes). This pooling is only carried out if statistical testing on the basis of applying that data to a previous more complicated model provides motivation for this model simplification; hence the pooling is test-qualified. In pooling, the researcher increases the degrees of freedom of the error term with the aim of increasing statistical power to test their hypotheses of interest. Despite this approach being widely adopted and explicitly recommended by some of the most widely cited statistical textbooks aimed at biologists, here we argue that (except in highly specialized circumstances that we can identify) the hoped-for improvement in statistical power will be small or non-existent, and there is likely to be much reduced reliability of the statistical procedures through deviation of type I error rates from nominal levels. We thus call for greatly reduced use of test-qualified pooling across experimental biology, more careful justification of any use that continues, and a different philosophy for initial selection of statistical models in the light of this change in procedure. PMID:28330912
MAFsnp: A Multi-Sample Accurate and Flexible SNP Caller Using Next-Generation Sequencing Data
Hu, Jiyuan; Li, Tengfei; Xiu, Zidi; Zhang, Hong
2015-01-01
Most existing statistical methods developed for calling single nucleotide polymorphisms (SNPs) using next-generation sequencing (NGS) data are based on Bayesian frameworks, and no SNP caller exists that produces p-values for calling SNPs in a frequentist framework. To fill this gap, we develop a new method, MAFsnp, a Multiple-sample based Accurate and Flexible algorithm for calling SNPs with NGS data. MAFsnp is based on an estimated likelihood ratio test (eLRT) statistic. In practical situations, the involved parameter is very close to the boundary of the parameter space, so standard large-sample theory is not suitable for evaluating the finite-sample distribution of the eLRT statistic. Observing that the distribution of the test statistic is a mixture of zero and a continuous part, we propose to model the test statistic with a novel two-parameter mixture distribution. Once the parameters in the mixture distribution are estimated, p-values can be easily calculated for detecting SNPs, and the multiple-testing corrected p-values can be used to control the false discovery rate (FDR) at any pre-specified level. With simulated data, MAFsnp is shown to have much better control of FDR than the existing SNP callers. Through the application to two real datasets, MAFsnp is also shown to outperform the existing SNP callers in terms of calling accuracy. An R package “MAFsnp” implementing the new SNP caller is freely available at http://homepage.fudan.edu.cn/zhangh/softwares/. PMID:26309201
Galaxy mergers and gravitational lens statistics
NASA Technical Reports Server (NTRS)
Rix, Hans-Walter; Maoz, Dan; Turner, Edwin L.; Fukugita, Masataka
1994-01-01
We investigate the impact of hierarchical galaxy merging on the statistics of gravitational lensing of distant sources. Since no definite theoretical predictions for the merging history of luminous galaxies exist, we adopt a parameterized prescription, which allows us to adjust the expected number of pieces comprising a typical present galaxy at z approximately 0.65. The existence of global parameter relations for elliptical galaxies and constraints on the evolution of the phase space density in dissipationless mergers allow us to limit the possible evolution of galaxy lens properties under merging. We draw two lessons from implementing this lens evolution into statistical lens calculations: (1) The total optical depth to multiple imaging (e.g., of quasars) is quite insensitive to merging. (2) Merging leads to a smaller mean separation of observed multiple images. Because merging does not reduce drastically the expected lensing frequency, it cannot make lambda-dominated cosmologies compatible with the existing lensing observations. A comparison with the data from the Hubble Space Telescope (HST) Snapshot Survey shows that models with little or no evolution of the lens population are statistically favored over strong merging scenarios. A specific merging scenario proposed by Toomre can be rejected (95% level) by such a comparison. Some versions of the scenario proposed by Broadhurst, Ellis, & Glazebrook are statistically acceptable.
Simulation and assimilation of satellite altimeter data at the oceanic mesoscale
NASA Technical Reports Server (NTRS)
Demay, P.; Robinson, A. R.
1984-01-01
An improved "objective analysis' technique is used along with an altimeter signal statistical model, an altimeter noise statistical model, an orbital model, and synoptic surface current maps in the POLYMODE-SDE area, to evaluate the performance of various observational strategies in catching the mesoscale variability at mid-latitudes. In particular, simulated repetitive nominal orbits of ERS-1, TOPEX, and SPOT/POSEIDON are examined. Results show the critical importance of existence of a subcycle, scanning in either direction. Moreover, long repeat cycles ( 20 days) and short cross-track distances ( 300 km) seem preferable, since they match mesoscale statistics. Another goal of the study is to prepare and discuss sea-surface height (SSH) assimilation in quasigeostrophic models. Restored SSH maps are shown to meet that purpose, if an efficient extrapolation method or deep in-situ data (floats) are used on the vertical to start and update the model.
NASA Technical Reports Server (NTRS)
Forbes, G. S.; Pielke, R. A.
1985-01-01
Various empirical and statistical weather-forecasting studies which utilize stratification by weather regime are described. Objective classification was used to determine weather regime in some studies. In other cases the weather pattern was determined on the basis of a parameter representing the physical and dynamical processes relevant to the anticipated mesoscale phenomena, such as low-level moisture convergence and convective precipitation, or the Froude number and the occurrence of cold-air damming. For mesoscale phenomena already in existence, new forecasting techniques were developed. The use of cloud models in operational forecasting is discussed. Models to calculate the spatial scales of forcings and resultant response for mesoscale systems are presented. The use of these models to represent the climatologically most prevalent systems, and to perform case-by-case simulations, is reviewed. Operational implementation of mesoscale data into weather forecasts, using both actual simulation output and model-output statistics, is discussed.
Incorporating signal-dependent noise for hyperspectral target detection
NASA Astrophysics Data System (ADS)
Morman, Christopher J.; Meola, Joseph
2015-05-01
The majority of hyperspectral target detection algorithms are developed from statistical data models employing stationary background statistics or white Gaussian noise models. Stationary background models are inaccurate as a result of two separate physical processes. First, varying background classes often exist in the imagery that possess different clutter statistics. Many algorithms can account for this variability through the use of subspaces or clustering techniques. The second physical process, which is often ignored, is a signal-dependent sensor noise term. For photon counting sensors that are often used in hyperspectral imaging systems, sensor noise increases as the measured signal level increases as a result of Poisson random processes. This work investigates the impact of this sensor noise on target detection performance. A linear noise model is developed describing sensor noise variance as a linear function of signal level. The linear noise model is then incorporated for detection of targets using data collected at Wright Patterson Air Force Base.
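The linear noise model described above lends itself to a compact sketch. The snippet below is an illustration rather than the authors' algorithm: it simulates background pixels, adds noise whose variance grows linearly with signal level (var = a + b*signal), and folds the mean signal-dependent variance into the clutter covariance of a spectral matched filter. The coefficients a and b, the target signature and the toy data are all assumptions.

    # Illustrative signal-dependent noise model feeding a matched filter
    import numpy as np

    rng = np.random.default_rng(1)
    n_pix, n_bands = 500, 50
    background = rng.gamma(shape=5.0, scale=100.0, size=(n_pix, n_bands))

    a, b = 25.0, 0.5                               # assumed noise coefficients
    noise_var = a + b * background                 # linear signal-dependent variance
    measured = background + rng.normal(scale=np.sqrt(noise_var))

    target = np.ones(n_bands)                      # assumed (flat) target signature
    mu = measured.mean(axis=0)
    cov = np.cov(measured, rowvar=False)

    # Add the mean signal-dependent noise variance to the clutter covariance
    # before whitening, so bright (noisier) bands are down-weighted.
    cov_sd = cov + np.diag(noise_var.mean(axis=0))
    d = target - mu
    w = np.linalg.solve(cov_sd, d)                 # matched-filter weights
    scores = (measured - mu) @ w / (d @ w)
    print("top detection score:", scores.max())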
2012-06-01
generalized behavioral model characterized after the fictional Seldon equations (the one elaborated upon by Isaac Asimov in the 1951 novel, The...Foundation). Asimov described the Seldon equations as essentially statistical models with historical data of a sufficient size and variability that they
Schaid, Daniel J
2010-01-01
Measures of genomic similarity are the basis of many statistical analytic methods. We review the mathematical and statistical basis of similarity methods, particularly based on kernel methods. A kernel function converts information for a pair of subjects to a quantitative value representing either similarity (larger values meaning more similar) or distance (smaller values meaning more similar), with the requirement that it must create a positive semidefinite matrix when applied to all pairs of subjects. This review emphasizes the wide range of statistical methods and software that can be used when similarity is based on kernel methods, such as nonparametric regression, linear mixed models and generalized linear mixed models, hierarchical models, score statistics, and support vector machines. The mathematical rigor for these methods is summarized, as is the mathematical framework for making kernels. This review provides a framework to move from intuitive and heuristic approaches to define genomic similarities to more rigorous methods that can take advantage of powerful statistical modeling and existing software. A companion paper reviews novel approaches to creating kernels that might be useful for genomic analyses, providing insights with examples [1]. Copyright © 2010 S. Karger AG, Basel.
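As a concrete illustration of the kernel idea (not code from the review), the sketch below builds a linear, GRM-style genomic similarity matrix from simulated genotype counts and checks the positive semidefiniteness required of a valid kernel; the simulated genotypes are an assumption.

    # Linear kernel of standardized genotype counts, with a PSD check
    import numpy as np

    rng = np.random.default_rng(2)
    genotypes = rng.integers(0, 3, size=(100, 500)).astype(float)  # subjects x SNPs

    Z = (genotypes - genotypes.mean(axis=0)) / (genotypes.std(axis=0) + 1e-12)
    K = Z @ Z.T / Z.shape[1]                      # linear (GRM-style) kernel matrix

    eigvals = np.linalg.eigvalsh(K)
    print("smallest eigenvalue:", eigvals.min())  # ~>= 0 up to rounding error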
FGWAS: Functional genome wide association analysis.
Huang, Chao; Thompson, Paul; Wang, Yalin; Yu, Yang; Zhang, Jingwen; Kong, Dehan; Colen, Rivka R; Knickmeyer, Rebecca C; Zhu, Hongtu
2017-10-01
Functional phenotypes (e.g., subcortical surface representation), which commonly arise in imaging genetic studies, have been used to detect putative genes for complexly inherited neuropsychiatric and neurodegenerative disorders. However, existing statistical methods largely ignore the functional features (e.g., functional smoothness and correlation). The aim of this paper is to develop a functional genome-wide association analysis (FGWAS) framework to efficiently carry out whole-genome analyses of functional phenotypes. FGWAS consists of three components: a multivariate varying coefficient model, a global sure independence screening procedure, and a test procedure. Compared with the standard multivariate regression model, the multivariate varying coefficient model explicitly models the functional features of functional phenotypes through the integration of smooth coefficient functions and functional principal component analysis. Statistically, compared with existing methods for genome-wide association studies (GWAS), FGWAS can substantially boost the detection power for discovering important genetic variants influencing brain structure and function. Simulation studies show that FGWAS outperforms existing GWAS methods for searching sparse signals in an extremely large search space, while controlling for the family-wise error rate. We have successfully applied FGWAS to large-scale analysis of data from the Alzheimer's Disease Neuroimaging Initiative for 708 subjects, 30,000 vertices on the left and right hippocampal surfaces, and 501,584 SNPs. Copyright © 2017 Elsevier Inc. All rights reserved.
Applications of spatial statistical network models to stream data
Isaak, Daniel J.; Peterson, Erin E.; Ver Hoef, Jay M.; Wenger, Seth J.; Falke, Jeffrey A.; Torgersen, Christian E.; Sowder, Colin; Steel, E. Ashley; Fortin, Marie-Josée; Jordan, Chris E.; Ruesch, Aaron S.; Som, Nicholas; Monestiez, Pascal
2014-01-01
Streams and rivers host a significant portion of Earth's biodiversity and provide important ecosystem services for human populations. Accurate information regarding the status and trends of stream resources is vital for their effective conservation and management. Most statistical techniques applied to data measured on stream networks were developed for terrestrial applications and are not optimized for streams. A new class of spatial statistical model, based on valid covariance structures for stream networks, can be used with many common types of stream data (e.g., water quality attributes, habitat conditions, biological surveys) through application of appropriate distributions (e.g., Gaussian, binomial, Poisson). The spatial statistical network models account for spatial autocorrelation (i.e., nonindependence) among measurements, which allows their application to databases with clustered measurement locations. Large amounts of stream data exist in many areas where spatial statistical analyses could be used to develop novel insights, improve predictions at unsampled sites, and aid in the design of efficient monitoring strategies at relatively low cost. We review the topic of spatial autocorrelation and its effects on statistical inference, demonstrate the use of spatial statistics with stream datasets relevant to common research and management questions, and discuss additional applications and development potential for spatial statistics on stream networks. Free software for implementing the spatial statistical network models has been developed that enables custom applications with many stream databases.
Statistical significance of the rich-club phenomenon in complex networks
NASA Astrophysics Data System (ADS)
Jiang, Zhi-Qiang; Zhou, Wei-Xing
2008-04-01
We propose that the rich-club phenomenon in complex networks should be defined in the spirit of bootstrapping, in which a null model is adopted to assess the statistical significance of the rich-club detected. Our method can serve as a definition of the rich-club phenomenon and is applied to analyze three real networks and three model networks. The results show significant improvement compared with previously reported results. We report a dilemma with an exceptional example, showing that there does not exist an omnipotent definition for the rich-club phenomenon.
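A minimal sketch of this kind of null-model assessment is given below, using networkx: the observed rich-club coefficient is compared against a degree-preserving randomization obtained by double edge swaps. The Barabasi-Albert test graph, the single randomized replicate and the printed degree grid are illustrative assumptions; in practice one would average over many randomized networks to assess significance.

    # Rich-club coefficient against a degree-preserving null model
    import networkx as nx

    G = nx.barabasi_albert_graph(500, 3, seed=42)
    rc_obs = nx.rich_club_coefficient(G, normalized=False)

    # Null model: randomize the network while preserving the degree sequence.
    R = G.copy()
    nx.double_edge_swap(R, nswap=10 * G.number_of_edges(), max_tries=10**6, seed=42)
    rc_null = nx.rich_club_coefficient(R, normalized=False)

    # Ratios well above 1 at high degree would indicate a significant rich club.
    for k in sorted(rc_obs):
        if k % 10 == 0 and rc_null.get(k, 0) > 0:
            print(k, round(rc_obs[k] / rc_null[k], 2))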
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dolly, S; Chen, H; Mutic, S
Purpose: A persistent challenge for the quality assessment of radiation therapy treatments (e.g. contouring accuracy) is the absence of a known ground truth for patient data. Moreover, assessment results are often patient-dependent. Computer simulation studies utilizing numerical phantoms can be performed for quality assessment with a known ground truth. However, previously reported numerical phantoms do not include the statistical properties of inter-patient variations, as their models are based on only one patient. In addition, these models do not incorporate tumor data. In this study, a methodology was developed for generating numerical phantoms which encapsulate the statistical variations of patients within radiation therapy, including tumors. Methods: Based on previous work in contouring assessment, geometric attribute distribution (GAD) models were employed to model both the deterministic and stochastic properties of individual organs via principal component analysis. Using pre-existing radiation therapy contour data, the GAD models are trained to model the shape and centroid distributions of each organ. Then, organs with different shapes and positions can be generated by assigning statistically sound weights to the GAD model parameters. Organ contour data from 20 retrospective prostate patient cases were manually extracted and utilized to train the GAD models. As a demonstration, computer-simulated CT images of generated numerical phantoms were calculated and assessed subjectively and objectively for realism. Results: A cohort of numerical phantoms of the male human pelvis was generated. CT images were deemed realistic both subjectively and objectively in terms of image noise power spectrum. Conclusion: A methodology has been developed to generate realistic numerical anthropomorphic phantoms using pre-existing radiation therapy data. The GAD models guarantee that generated organs span the statistical distribution of observed radiation therapy patients, according to the training dataset. The methodology enables radiation therapy treatment assessment with multi-modality imaging and a known ground truth, and without patient-dependent bias.
Jasińska-Stroschein, Magdalena; Kurczewska, Urszula; Orszulak-Michalak, Daria
2017-05-01
When performing in vitro dissolution testing, especially in the area of biowaivers, it is necessary to follow regulatory guidelines to minimize the risk of an unsafe or ineffective product being approved. The present study examines model-independent and model-dependent methods of comparing dissolution profiles, based on various international guidelines that are compared and contrasted. Dissolution profiles for immediate-release solid oral dosage forms were generated. The test material comprised tablets containing several substances, with at least 85% of the labeled amount dissolved within 15 min, 20-30 min, or 45 min. Dissolution profile similarity can vary with regard to the following criteria: time point selection (including the last time point), coefficient of variation, and statistical method selection. Variation between regulatory guidance and statistical methods can raise methodological questions and potentially result in different outcomes when reporting dissolution profile testing. The harmonization of existing guidelines would address existing problems concerning the interpretation of regulatory recommendations and research findings. Copyright © 2017 American Pharmacists Association®. Published by Elsevier Inc. All rights reserved.
Deep space network software cost estimation model
NASA Technical Reports Server (NTRS)
Tausworthe, R. C.
1981-01-01
A parametric software cost estimation model prepared for Jet Propulsion Laboratory (JPL) Deep Space Network (DSN) Data System implementation tasks is described. The resource estimation model modifies and combines a number of existing models. The model calibrates the task magnitude and difficulty, development environment, and software technology effects through prompted responses to a set of approximately 50 questions. Parameters in the model are adjusted to fit JPL software life-cycle statistics.
Approximate Model Checking of PCTL Involving Unbounded Path Properties
NASA Astrophysics Data System (ADS)
Basu, Samik; Ghosh, Arka P.; He, Ru
We study the problem of applying statistical methods for approximate model checking of probabilistic systems against properties encoded as
Rodríguez-Entrena, Macario; Schuberth, Florian; Gelhard, Carsten
2018-01-01
Structural equation modeling using partial least squares (PLS-SEM) has become a mainstream modeling approach in various disciplines. Nevertheless, prior literature still lacks practical guidance on how to properly test for differences between parameter estimates. Whereas existing techniques such as parametric and non-parametric approaches in PLS multi-group analysis only allow assessing differences between parameters that are estimated for different subpopulations, the study at hand introduces a technique that also allows assessing whether two parameter estimates derived from the same sample are statistically different. To illustrate this advancement to PLS-SEM, we refer in particular to a reduced version of the well-established technology acceptance model.
Statistical Methods for Rapid Aerothermal Analysis and Design Technology: Validation
NASA Technical Reports Server (NTRS)
DePriest, Douglas; Morgan, Carolyn
2003-01-01
The cost and safety goals for NASA's next generation of reusable launch vehicles (RLVs) will require that rapid high-fidelity aerothermodynamic design tools be used early in the design cycle. To meet these requirements, it is desirable to identify adequate statistical models that quantify and improve the accuracy, extend the applicability, and enable combined analyses using existing prediction tools. The initial research work focused on establishing suitable candidate models for these purposes. The second phase is focused on assessing the performance of these models in accurately predicting the heat rate for a given candidate data set. This validation work compared models and methods that may be useful in predicting the heat rate.
NASA Astrophysics Data System (ADS)
Sundberg, R.; Moberg, A.; Hind, A.
2012-08-01
A statistical framework for comparing the output of ensemble simulations from global climate models with networks of climate proxy and instrumental records has been developed, focusing on near-surface temperatures for the last millennium. This framework includes the formulation of a joint statistical model for proxy data, instrumental data and simulation data, which is used to optimize a quadratic distance measure for ranking climate model simulations. An essential underlying assumption is that the simulations and the proxy/instrumental series have a shared component of variability that is due to temporal changes in external forcing, such as volcanic aerosol load, solar irradiance or greenhouse gas concentrations. Two statistical tests have been formulated. Firstly, a preliminary test establishes whether a significant temporal correlation exists between instrumental/proxy and simulation data. Secondly, the distance measure is expressed in the form of a test statistic of whether a forced simulation is closer to the instrumental/proxy series than unforced simulations. The proposed framework allows any number of proxy locations to be used jointly, with different seasons, record lengths and statistical precision. The goal is to objectively rank several competing climate model simulations (e.g. with alternative model parameterizations or alternative forcing histories) by means of their goodness of fit to the unobservable true past climate variations, as estimated from noisy proxy data and instrumental observations.
Statistical modeling of an integrated boiler for coal fired thermal power plant.
Chandrasekharan, Sreepradha; Panda, Rames Chandra; Swaminathan, Bhuvaneswari Natrajan
2017-06-01
Coal-fired thermal power plants play a major role in power production worldwide, as coal is available in abundance. Many of the existing power plants are based on subcritical technology, which can produce power with an efficiency of around 33%. Newer plants, however, are built on either supercritical or ultra-supercritical technology, whose efficiency can be up to 50%. The main objective of this work is to enhance the efficiency of the existing subcritical power plants to compensate for the increasing demand. For achieving this objective, statistical modeling of the boiler units such as the economizer, drum and superheater is initially carried out. The effectiveness of the developed models is tested using analysis methods like R² analysis and ANOVA (Analysis of Variance). The dependence of the process variable (temperature) on different manipulated variables is analyzed in the paper. Model validation is provided together with error analysis. Response surface methodology (RSM), supported by design of experiments (DOE), is implemented to optimize the operating parameters. Individual models along with the integrated model are used to study and design the predictive control of the coal-fired thermal power plant.
Hypothesis testing of a change point during cognitive decline among Alzheimer's disease patients.
Ji, Ming; Xiong, Chengjie; Grundman, Michael
2003-10-01
In this paper, we present a statistical hypothesis test for detecting a change point over the course of cognitive decline among Alzheimer's disease patients. The model under the null hypothesis assumes a constant rate of cognitive decline over time, and the model under the alternative hypothesis is a general bilinear model with an unknown change point. When the change point is unknown, however, the null distribution of the test statistic is not analytically tractable and has to be simulated by parametric bootstrap. When the alternative hypothesis that a change point exists is accepted, we propose an estimate of its location based on Akaike's Information Criterion. We applied our method to a data set from the Neuropsychological Database Initiative by implementing our hypothesis testing method to analyze Mini-Mental State Examination (MMSE) scores based on a random-slope and random-intercept model with a bilinear fixed effect. Our results show that, despite a large amount of missing data, accelerated decline did occur for the MMSE among AD patients. Our finding supports the clinical belief of the existence of a change point during cognitive decline among AD patients and suggests the use of change point models for the longitudinal modeling of cognitive decline in AD research.
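A stripped-down version of this null-versus-bilinear comparison can be sketched for a single series (ignoring the paper's random effects and missing data): fit a straight line under the null, profile a continuous broken-stick fit over a grid of candidate change points under the alternative, and simulate the null distribution of the statistic by parametric bootstrap because the change point vanishes under the null. The simulated decline data and the grid are illustrative assumptions.

    # Illustrative change-point test with a parametric bootstrap null
    import numpy as np

    rng = np.random.default_rng(3)
    t = np.arange(0, 10, 0.25)
    y = 28 - 0.5 * t - 1.5 * np.clip(t - 6, 0, None) + rng.normal(0, 1, t.size)

    def rss_linear(t, y):
        X = np.column_stack([np.ones_like(t), t])
        res = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        return res @ res

    def rss_bilinear(t, y, grid):
        best = np.inf
        for tau in grid:                            # profile the change point
            X = np.column_stack([np.ones_like(t), t, np.clip(t - tau, 0, None)])
            res = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
            best = min(best, res @ res)
        return best

    grid = np.linspace(2, 8, 25)
    stat_obs = rss_linear(t, y) - rss_bilinear(t, y, grid)

    # Parametric bootstrap of the constant-slope null model, since the usual
    # chi-square limit does not apply when the change point vanishes under H0.
    X0 = np.column_stack([np.ones_like(t), t])
    beta0 = np.linalg.lstsq(X0, y, rcond=None)[0]
    sigma0 = np.sqrt(rss_linear(t, y) / (t.size - 2))
    boot = np.array([
        rss_linear(t, yb) - rss_bilinear(t, yb, grid)
        for yb in X0 @ beta0 + rng.normal(0, sigma0, size=(500, t.size))
    ])
    print("bootstrap p-value:", (boot >= stat_obs).mean())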
Hybrid statistics-simulations based method for atom-counting from ADF STEM images.
De Wael, Annelies; De Backer, Annick; Jones, Lewys; Nellist, Peter D; Van Aert, Sandra
2017-06-01
A hybrid statistics-simulations based method for atom-counting from annular dark field scanning transmission electron microscopy (ADF STEM) images of monotype crystalline nanostructures is presented. Different atom-counting methods already exist for model-like systems. However, the increasing relevance of radiation damage in the study of nanostructures demands a method that allows atom-counting from low dose images with a low signal-to-noise ratio. Therefore, the hybrid method directly includes prior knowledge from image simulations into the existing statistics-based method for atom-counting, and accounts in this manner for possible discrepancies between actual and simulated experimental conditions. It is shown by means of simulations and experiments that this hybrid method outperforms the statistics-based method, especially for low electron doses and small nanoparticles. The analysis of a simulated low dose image of a small nanoparticle suggests that this method allows for far more reliable quantitative analysis of beam-sensitive materials. Copyright © 2017 Elsevier B.V. All rights reserved.
Predicting fire spread in Arizona's oak chaparral
A. W. Lindenmuth; James R. Davis
1973-01-01
Five existing fire models, both experimental and theoretical, did not adequately predict rate-of-spread (ROS) when tested on single- and multiclump fires in oak chaparral in Arizona. A statistical model developed using essentially the same input variables, but weighted differently, accounted for 81 percent of the variation in ROS. A chemical coefficient that accounts for...
An Investigation of Sample Size Splitting on ATFIND and DIMTEST
ERIC Educational Resources Information Center
Socha, Alan; DeMars, Christine E.
2013-01-01
Modeling multidimensional test data with a unidimensional model can result in serious statistical errors, such as bias in item parameter estimates. Many methods exist for assessing the dimensionality of a test. The current study focused on DIMTEST. Using simulated data, the effects of sample size splitting for use with the ATFIND procedure for…
ERIC Educational Resources Information Center
Goldstein, Harvey; Bonnet, Gerard; Rocher, Thierry
2007-01-01
The Programme for International Student Assessment comparative study of reading performance among 15-year-olds is reanalyzed using statistical procedures that allow the full complexity of the data structures to be explored. The article extends existing multilevel factor analysis and structural equation models and shows how this can extract richer…
Vieira, Rute; McDonald, Suzanne; Araújo-Soares, Vera; Sniehotta, Falko F; Henderson, Robin
2017-09-01
N-of-1 studies are based on repeated observations within an individual or unit over time and are acknowledged as an important research method for generating scientific evidence about the health or behaviour of an individual. Statistical analyses of n-of-1 data require accurate modelling of the outcome while accounting for its distribution, time-related trend and error structures (e.g., autocorrelation) as well as reporting readily usable contextualised effect sizes for decision-making. A number of statistical approaches have been documented but no consensus exists on which method is most appropriate for which type of n-of-1 design. We discuss the statistical considerations for analysing n-of-1 studies and briefly review some currently used methodologies. We describe dynamic regression modelling as a flexible and powerful approach, adaptable to different types of outcomes and capable of dealing with the different challenges inherent to n-of-1 statistical modelling. Dynamic modelling borrows ideas from longitudinal and event history methodologies which explicitly incorporate the role of time and the influence of past on future. We also present an illustrative example of the use of dynamic regression on monitoring physical activity during the retirement transition. Dynamic modelling has the potential to expand researchers' access to robust and user-friendly statistical methods for individualised studies.
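As one simple instance of such modelling (a stand-in for the fuller dynamic regression framework discussed above), the sketch below fits an n-of-1 style regression of a self-monitored outcome on a time trend and a transition indicator with AR(1) errors, using GLSAR from statsmodels; the simulated step counts, the day-60 retirement transition and the AR(1) error structure are illustrative assumptions.

    # n-of-1 style regression with AR(1) errors (illustrative sketch)
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    n = 120                                      # days of self-monitoring
    day = np.arange(n)
    retired = (day >= 60).astype(float)          # hypothetical transition at day 60

    e = np.zeros(n)
    for i in range(1, n):                        # AR(1) errors
        e[i] = 0.6 * e[i - 1] + rng.normal(0, 800)
    steps = 9000 - 5 * day - 1500 * retired + e

    X = sm.add_constant(np.column_stack([day, retired]))
    model = sm.GLSAR(steps, X, rho=1)            # AR(1) error structure
    result = model.iterative_fit(maxiter=10)
    print(result.params)                         # intercept, time trend, transition effect
    print("estimated rho:", model.rho)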
Evaluating bacterial gene-finding HMM structures as probabilistic logic programs.
Mørk, Søren; Holmes, Ian
2012-03-01
Probabilistic logic programming offers a powerful way to describe and evaluate structured statistical models. To investigate the practicality of probabilistic logic programming for structure learning in bioinformatics, we undertook a simplified bacterial gene-finding benchmark in PRISM, a probabilistic dialect of Prolog. We evaluate Hidden Markov Model structures for bacterial protein-coding gene potential, including a simple null model structure, three structures based on existing bacterial gene finders and two novel model structures. We test standard versions as well as ADPH length modeling and three-state versions of the five model structures. The models are all represented as probabilistic logic programs and evaluated using the PRISM machine learning system in terms of statistical information criteria and gene-finding prediction accuracy, in two bacterial genomes. Neither of our implementations of the two currently most used model structures is best performing in terms of statistical information criteria or prediction performance, suggesting that better-fitting models might be achievable. The source code of all PRISM models, data and additional scripts are freely available for download at: http://github.com/somork/codonhmm. Supplementary data are available at Bioinformatics online.
This paper provides an overview of existing statistical methodologies for the estimation of site-specific and regional trends in wet deposition. The interaction of atmospheric processes and emissions tend to produce wet deposition data patterns that show large spatial and tempora...
Evaluation and Applications of Cloud Climatologies from CALIOP
NASA Technical Reports Server (NTRS)
Winker, David; Getzewitch, Brian; Vaughan, Mark
2008-01-01
Clouds have a major impact on the Earth radiation budget and differences in the representation of clouds in global climate models are responsible for much of the spread in predicted climate sensitivity. Existing cloud climatologies, against which these models can be tested, have many limitations. The CALIOP lidar, carried on the CALIPSO satellite, has now acquired over two years of nearly continuous cloud and aerosol observations. This dataset provides an improved basis for the characterization of 3-D global cloudiness. Global average cloud cover measured by CALIOP is about 75%, significantly higher than for existing cloud climatologies due to the sensitivity of CALIOP to optically thin cloud. Day/night biases in cloud detection appear to be small. This presentation will discuss detection sensitivity and other issues associated with producing a cloud climatology, characteristics of cloud cover statistics derived from CALIOP data, and applications of those statistics.
McElreath, Richard; Bell, Adrian V; Efferson, Charles; Lubell, Mark; Richerson, Peter J; Waring, Timothy
2008-11-12
The existence of social learning has been confirmed in diverse taxa, from apes to guppies. In order to advance our understanding of the consequences of social transmission and evolution of behaviour, however, we require statistical tools that can distinguish among diverse social learning strategies. In this paper, we advance two main ideas. First, social learning is diverse, in the sense that individuals can take advantage of different kinds of information and combine them in different ways. Examining learning strategies for different information conditions illuminates the more detailed design of social learning. We construct and analyse an evolutionary model of diverse social learning heuristics, in order to generate predictions and illustrate the impact of design differences on an organism's fitness. Second, in order to eventually escape the laboratory and apply social learning models to natural behaviour, we require statistical methods that do not depend upon tight experimental control. Therefore, we examine strategic social learning in an experimental setting in which the social information itself is endogenous to the experimental group, as it is in natural settings. We develop statistical models for distinguishing among different strategic uses of social information. The experimental data strongly suggest that most participants employ a hierarchical strategy that uses both average observed pay-offs of options as well as frequency information, the same model predicted by our evolutionary analysis to dominate a wide range of conditions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sathaye, Jayant A.
2000-04-01
Integrated assessment (IA) modeling of climate policy is increasingly global in nature, with models incorporating regional disaggregation. The existing empirical basis for IA modeling, however, largely arises from research on industrialized economies. Given the growing importance of developing countries in determining long-term global energy and carbon emissions trends, filling this gap with improved statistical information on developing countries' energy and carbon-emissions characteristics is an important priority for enhancing IA modeling. Earlier research at LBNL on this topic has focused on assembling and analyzing statistical data on productivity trends and technological change in the energy-intensive manufacturing sectors of five developing countries: India, Brazil, Mexico, Indonesia, and South Korea. The proposed work will extend this analysis to the agriculture and electric power sectors in India, South Korea, and two other developing countries. They will also examine the impact of alternative model specifications on estimates of productivity growth and technological change for each of the three sectors, and estimate the contribution of various capital inputs (imported vs. indigenous, rigid vs. malleable) to productivity growth and technological change. The project has already produced a data resource on the manufacturing sector which is being shared with IA modelers. This will be extended to the agriculture and electric power sectors, which would also be made accessible to IA modeling groups seeking to enhance the empirical descriptions of developing country characteristics. The project will entail basic statistical and econometric analysis of productivity and energy trends in these developing country sectors, with parameter estimates also made available to modeling groups. The parameter estimates will be developed using alternative model specifications that could be directly utilized by the existing IAMs for the manufacturing, agriculture, and electric power sectors.
General Blending Models for Data From Mixture Experiments
Brown, L.; Donev, A. N.; Bissett, A. C.
2015-01-01
We propose a new class of models providing a powerful unification and extension of existing statistical methodology for analysis of data obtained in mixture experiments. These models, which integrate models proposed by Scheffé and Becker, extend considerably the range of mixture component effects that may be described. They become complex when the studied phenomenon requires it, but remain simple whenever possible. This article has supplementary material online. PMID:26681812
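For reference, the Scheffé quadratic canonical polynomial that these general blending models extend can be fitted by ordinary least squares without an intercept, as in the sketch below; the three-component simplex design and the responses are illustrative assumptions.

    # Fitting a Scheffé quadratic mixture model by least squares (illustrative)
    import numpy as np

    # Simplex-centroid style design in three components (proportions sum to 1)
    X = np.array([
        [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0],
        [0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.5],
        [1/3, 1/3, 1/3],
    ])
    y = np.array([4.1, 6.3, 3.2, 6.9, 3.8, 5.4, 5.6])   # assumed responses

    x1, x2, x3 = X.T
    # Scheffé quadratic model: no intercept, linear blending terms plus
    # pairwise nonlinear blending (synergism/antagonism) terms.
    Z = np.column_stack([x1, x2, x3, x1 * x2, x1 * x3, x2 * x3])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    print(dict(zip(["b1", "b2", "b3", "b12", "b13", "b23"], beta.round(2))))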
Statistical wind analysis for near-space applications
NASA Astrophysics Data System (ADS)
Roney, Jason A.
2007-09-01
Statistical wind models were developed based on the existing observational wind data for near-space altitudes between 60 000 and 100 000 ft (18-30 km) above ground level (AGL) at two locations, Akron, OH, USA, and White Sands, NM, USA. These two sites are envisioned as playing a crucial role in the first flights of high-altitude airships. The analysis shown in this paper has not been previously applied to this region of the stratosphere for such an application. Standard statistics were compiled for these data, such as mean, median, maximum wind speed, and standard deviation, and the data were modeled with Weibull distributions. These statistics indicated that, on a yearly average, there is a lull or a “knee” in the wind between 65 000 and 72 000 ft AGL (20-22 km). From the standard statistics, trends at both locations indicated substantial seasonal variation in the mean wind speed at these heights. The yearly and monthly statistical modeling indicated that Weibull distributions were a reasonable model for the data. Forecasts and hindcasts were done by using a Weibull model based on 2004 data and comparing the model with the 2003 and 2005 data. The 2004 distribution was also a reasonable model for these years. Lastly, the Weibull distribution and cumulative function were used to predict the 50%, 95%, and 99% winds, which are directly related to the expected power requirements of a near-space station-keeping airship. These values indicated that using only the standard deviation of the mean may underestimate the operational conditions.
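The Weibull modelling and exceedance percentiles described above can be sketched as follows: fit a two-parameter Weibull (location fixed at zero) to wind speeds for a given altitude band and read off the 50%, 95% and 99% winds that drive station-keeping power estimates. The simulated wind speeds and parameters are illustrative assumptions, not the Akron or White Sands data.

    # Two-parameter Weibull fit and exceedance percentiles (illustrative)
    import numpy as np
    from scipy import stats

    wind = stats.weibull_min.rvs(c=1.8, scale=12.0, size=2000, random_state=42)

    shape, loc, scale = stats.weibull_min.fit(wind, floc=0)   # location fixed at zero
    for q in (0.50, 0.95, 0.99):
        speed = stats.weibull_min.ppf(q, shape, loc=loc, scale=scale)
        print(f"{int(q * 100)}% wind: {speed:.1f} m/s")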
Statistical Model of Dynamic Markers of the Alzheimer's Pathological Cascade.
Balsis, Steve; Geraci, Lisa; Benge, Jared; Lowe, Deborah A; Choudhury, Tabina K; Tirso, Robert; Doody, Rachelle S
2018-05-05
Alzheimer's disease (AD) is a progressive disease reflected in markers across assessment modalities, including neuroimaging, cognitive testing, and evaluation of adaptive function. Identifying a single continuum of decline across assessment modalities in a single sample is statistically challenging because of the multivariate nature of the data. To address this challenge, we implemented advanced statistical analyses designed specifically to model complex data across a single continuum. We analyzed data from the Alzheimer's Disease Neuroimaging Initiative (ADNI; N = 1,056), focusing on indicators from the assessments of magnetic resonance imaging (MRI) volume, fluorodeoxyglucose positron emission tomography (FDG-PET) metabolic activity, cognitive performance, and adaptive function. Item response theory was used to identify the continuum of decline. Then, through a process of statistical scaling, indicators across all modalities were linked to that continuum and analyzed. Findings revealed that measures of MRI volume, FDG-PET metabolic activity, and adaptive function added measurement precision beyond that provided by cognitive measures, particularly in the relatively mild range of disease severity. More specifically, MRI volume, and FDG-PET metabolic activity become compromised in the very mild range of severity, followed by cognitive performance and finally adaptive function. Our statistically derived models of the AD pathological cascade are consistent with existing theoretical models.
Nosedal-Sanchez, Alvaro; Jackson, Charles S.; Huerta, Gabriel
2016-07-20
A new test statistic for climate model evaluation has been developed that potentially mitigates some of the limitations that exist for observing and representing field and space dependencies of climate phenomena. Traditionally such dependencies have been ignored when climate models have been evaluated against observational data, which makes it difficult to assess whether any given model is simulating observed climate for the right reasons. The new statistic uses Gaussian Markov random fields for estimating field and space dependencies within a first-order grid point neighborhood structure. We illustrate the ability of Gaussian Markov random fields to represent empirical estimates of field and space covariances using "witch hat" graphs. We further use the new statistic to evaluate the tropical response of a climate model (CAM3.1) to changes in two parameters important to its representation of cloud and precipitation physics. Overall, the inclusion of dependency information did not alter significantly the recognition of those regions of parameter space that best approximated observations. However, there were some qualitative differences in the shape of the response surface that suggest how such a measure could affect estimates of model uncertainty.
A biological compression model and its applications.
Cao, Minh Duc; Dix, Trevor I; Allison, Lloyd
2011-01-01
A biological compression model, the expert model, is presented which is superior to existing compression algorithms in both compression performance and speed. The model is able to compress whole eukaryotic genomes. Most importantly, the model provides a framework for knowledge discovery from biological data. It can be used for repeat element discovery, sequence alignment and phylogenetic analysis. We demonstrate that the model can handle statistically biased sequences and distantly related sequences where conventional knowledge discovery tools often fail.
NASA Technical Reports Server (NTRS)
Urquhart, Erin A.; Zaitchik, Benjamin F.; Waugh, Darryn W.; Guikema, Seth D.; Del Castillo, Carlos E.
2014-01-01
The effect that climate change and variability will have on waterborne bacteria is a topic of increasing concern for coastal ecosystems, including the Chesapeake Bay. Surface water temperature trends in the Bay indicate a warming pattern of roughly 0.3-0.4 °C per decade over the past 30 years. It is unclear what impact future warming will have on pathogens currently found in the Bay, including Vibrio spp. Using historical environmental data, combined with three different statistical models of Vibrio vulnificus probability, we explore the relationship between environmental change and predicted Vibrio vulnificus presence in the upper Chesapeake Bay. We find that the predicted response of V. vulnificus probability to high temperatures in the Bay differs systematically between models of differing structure. As existing publicly available datasets are inadequate to determine which model structure is most appropriate, the impact of climatic change on the probability of V. vulnificus presence in the Chesapeake Bay remains uncertain. This result points to the challenge of characterizing climate sensitivity of ecological systems in which data are sparse and only statistical models of ecological sensitivity exist.
BCM: toolkit for Bayesian analysis of Computational Models using samplers.
Thijssen, Bram; Dijkstra, Tjeerd M H; Heskes, Tom; Wessels, Lodewyk F A
2016-10-21
Computational models in biology are characterized by a large degree of uncertainty. This uncertainty can be analyzed with Bayesian statistics; however, the sampling algorithms that are frequently used for calculating Bayesian statistical estimates are computationally demanding, and each algorithm has unique advantages and disadvantages. It is typically unclear, before starting an analysis, which algorithm will perform well on a given computational model. We present BCM, a toolkit for the Bayesian analysis of Computational Models using samplers. It provides efficient, multithreaded implementations of eleven algorithms for sampling from posterior probability distributions and for calculating marginal likelihoods. BCM includes tools to simplify the process of model specification and scripts for visualizing the results. The flexible architecture allows it to be used on diverse types of biological computational models. In an example inference task using a model of the cell cycle based on ordinary differential equations, BCM is significantly more efficient than existing software packages, allowing more challenging inference problems to be solved. BCM represents an efficient one-stop-shop for computational modelers wishing to use sampler-based Bayesian statistics.
Score tests for independence in semiparametric competing risks models.
Saïd, Mériem; Ghazzali, Nadia; Rivest, Louis-Paul
2009-12-01
A popular model for competing risks postulates the existence of a latent unobserved failure time for each risk. Assuming that these underlying failure times are independent is attractive since it allows standard statistical tools for right-censored lifetime data to be used in the analysis. This paper proposes simple independence score tests for the validity of this assumption when the individual risks are modeled using semiparametric proportional hazards regressions. It assumes that covariates are available, making the model identifiable. The score tests are derived for alternatives that specify that copulas are responsible for a possible dependency between the competing risks. The test statistics are constructed by adding to the partial likelihoods for the individual risks an explanatory variable for the dependency between the risks. A variance estimator is derived by writing the score function and the Fisher information matrix for the marginal models as stochastic integrals. Pitman efficiencies are used to compare test statistics. A simulation study and a numerical example illustrate the methodology proposed in this paper.
Tuition at PhD-Granting Institutions: A Supply and Demand Model.
ERIC Educational Resources Information Center
Koshal, Rajindar K.; And Others
1994-01-01
Builds and estimates a model that explains educational supply and demand behavior at PhD-granting institutions in the United States. The statistical analysis based on 1988-89 data suggests that student quantity, educational costs, average SAT score, class size, percentage of faculty with a PhD, graduation rate, ranking, and existence of a medical…
Fairchild, Amanda J.; Abara, Winston E.; Gottschall, Amanda C.; Tein, Jenn-Yun; Prinz, Ronald J.
2015-01-01
The purpose of this article is to introduce and describe a statistical model that researchers can use to evaluate underlying mechanisms of behavioral onset and other event occurrence outcomes. Specifically, the article develops a framework for estimating mediation effects with outcomes measured in discrete-time epochs by integrating the statistical mediation model with discrete-time survival analysis. The methodology has the potential to help strengthen health research by targeting prevention and intervention work more effectively as well as by improving our understanding of discretized periods of risk. The model is applied to an existing longitudinal data set to demonstrate its use, and programming code is provided to facilitate its implementation. PMID:24296470
Treated cabin acoustic prediction using statistical energy analysis
NASA Technical Reports Server (NTRS)
Yoerkie, Charles A.; Ingraham, Steven T.; Moore, James A.
1987-01-01
The application of statistical energy analysis (SEA) to the modeling and design of helicopter cabin interior noise control treatment is demonstrated. The information presented here is obtained from work sponsored at NASA Langley for the development of analytic modeling techniques and the basic understanding of cabin noise. Utility and executive interior models are developed directly from existing S-76 aircraft designs. The relative importance of panel transmission loss (TL), acoustic leakage, and absorption to the control of cabin noise is shown using the SEA modeling parameters. It is shown that the major cabin noise improvement below 1000 Hz comes from increased panel TL, while above 1000 Hz it comes from reduced acoustic leakage and increased absorption in the cabin and overhead cavities.
The Rise and Fall of Pentaquarks in Experiments
NASA Astrophysics Data System (ADS)
Schumacher, Reinhard A.
2006-07-01
Experimental evidence for and against the existence of pentaquarks has accumulated rapidly in the last three years. If they exist, they would be dramatic examples of hadronic states beyond our well-tested and successful particle models. The positive evidence suggests existence of baryonic objects with widths of at most a few MeV, some displaying exotic quantum numbers, such as baryons with strangeness S = +1. The non-observations of these states have often come from reaction channels very different from the positive evidence channels, making comparisons difficult. The situation has now been largely clarified, however, by high-statistics repetitions of the positive sightings, with the result that none of the positive sightings have been convincingly reproduced. The most recent unconfirmed positive sightings suffer again from low statistics and large backgrounds. It seems that a kind of "bandwagon" effect led to the overly-optimistic interpretation of numerous experiments in the earlier reports of exotic pentaquarks.
Cömert, Itır Tarı; Özyeşil, Zümra Atalay; Burcu Özgülük, S
2016-02-01
The aim of the current study was to investigate the contributions of sad childhood experiences, depression, anxiety, and stress, existence of a sense of meaning, and pursuit of meaning in explaining life satisfaction of young adults in Turkey. The sample comprised 400 undergraduate students (M age = 20.2 yr.) selected via random cluster sampling. There were no statistically significant differences between men and women in depression, existence of meaning, pursuit of meaning, or life satisfaction scores. However, there were statistically significant differences between men and women in sad childhood experiences, anxiety, and stress. In hierarchical regression analysis, the model as a whole was significant. Depression and existence of meaning in life made unique significant contributions to the variance in life satisfaction. Students with lower depression and with a sense of meaning in life tended to be more satisfied with life.
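As an illustration of the hierarchical (blockwise) regression strategy described above, the following sketch enters predictor blocks sequentially and reports the R-squared change at each step; the data and variable names are simulated stand-ins, not the study's measures.

```python
# Hypothetical sketch of a hierarchical (blockwise) regression of life
# satisfaction on sets of predictors, reporting the R-squared change as
# each block is entered. Data and variable names are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "sad_childhood": rng.normal(size=n),
    "depression": rng.normal(size=n),
    "anxiety": rng.normal(size=n),
    "stress": rng.normal(size=n),
    "meaning_presence": rng.normal(size=n),
    "meaning_search": rng.normal(size=n),
})
df["life_satisfaction"] = (-0.5 * df["depression"]
                           + 0.4 * df["meaning_presence"]
                           + rng.normal(size=n))

blocks = [
    ["sad_childhood"],
    ["depression", "anxiety", "stress"],
    ["meaning_presence", "meaning_search"],
]

predictors, prev_r2 = [], 0.0
for i, block in enumerate(blocks, start=1):
    predictors += block
    fit = sm.OLS(df["life_satisfaction"], sm.add_constant(df[predictors])).fit()
    print(f"block {i}: R2 = {fit.rsquared:.3f} (change = {fit.rsquared - prev_r2:.3f})")
    prev_r2 = fit.rsquared
```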
Xia, Yinglin; Morrison-Beedy, Dianne; Ma, Jingming; Feng, Changyong; Cross, Wendi; Tu, Xin
2012-01-01
Modeling count data from sexual behavioral outcomes involves many challenges, especially when the data exhibit a preponderance of zeros and overdispersion. In particular, the popular Poisson log-linear model is not appropriate for modeling such outcomes. Although alternatives exist for addressing both issues, they are not widely and effectively used in sex health research, especially in HIV prevention intervention and related studies. In this paper, we discuss how to analyze count outcomes distributed with excess of zeros and overdispersion and introduce appropriate model-fit indices for comparing the performance of competing models, using data from a real study on HIV prevention intervention. The in-depth look at these common issues arising from studies involving behavioral outcomes will promote sound statistical analyses and facilitate research in this and other related areas. PMID:22536496
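A sketch of the kind of comparison the paper advocates, under the assumption of simulated data: fit Poisson, negative binomial, and zero-inflated Poisson models to a zero-heavy, overdispersed count outcome and compare them with a model-fit index such as AIC.

```python
# Illustrative sketch (not the paper's data): comparing Poisson, negative
# binomial, and zero-inflated Poisson fits by AIC for an overdispersed,
# zero-heavy count outcome such as a sexual-behavior count.
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(2)
n = 1000
x = rng.normal(size=n)
X = sm.add_constant(x)

# Simulate an overdispersed outcome with extra structural zeros.
mu = np.exp(0.5 + 0.4 * x)
y = rng.negative_binomial(n=1, p=1 / (1 + mu))        # overdispersed counts
y[rng.uniform(size=n) < 0.4] = 0                      # structural zeros

fits = {
    "Poisson": sm.Poisson(y, X).fit(disp=0),
    "NegBin": sm.NegativeBinomial(y, X).fit(disp=0),
    "ZIP": ZeroInflatedPoisson(y, X, exog_infl=np.ones((n, 1))).fit(disp=0),
}
for name, fit in fits.items():
    print(f"{name:8s} AIC = {fit.aic:.1f}")
```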
ASYMPTOTIC DISTRIBUTION OF ΔAUC, NRIs, AND IDI BASED ON THEORY OF U-STATISTICS
Demler, Olga V.; Pencina, Michael J.; Cook, Nancy R.; D’Agostino, Ralph B.
2017-01-01
The change in AUC (ΔAUC), the IDI, and NRI are commonly used measures of risk prediction model performance. Some authors have reported good validity of associated methods of estimating their standard errors (SE) and construction of confidence intervals, whereas others have questioned their performance. To address these issues we unite the ΔAUC, IDI, and three versions of the NRI under the umbrella of the U-statistics family. We rigorously show that the asymptotic behavior of ΔAUC, NRIs, and IDI fits the asymptotic distribution theory developed for U-statistics. We prove that the ΔAUC, NRIs, and IDI are asymptotically normal, unless they compare nested models under the null hypothesis. In the latter case, asymptotic normality and existing SE estimates cannot be applied to ΔAUC, NRIs, or IDI. In the former case SE formulas proposed in the literature are equivalent to SE formulas obtained from U-statistics theory if we ignore adjustment for estimated parameters. We use Sukhatme-Randles-deWet condition to determine when adjustment for estimated parameters is necessary. We show that adjustment is not necessary for SEs of the ΔAUC and two versions of the NRI when added predictor variables are significant and normally distributed. The SEs of the IDI and three-category NRI should always be adjusted for estimated parameters. These results allow us to define when existing formulas for SE estimates can be used and when resampling methods such as the bootstrap should be used instead when comparing nested models. We also use the U-statistic theory to develop a new SE estimate of ΔAUC. PMID:28627112
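A minimal, in-sample illustration of the quantity being studied (not the authors' U-statistic variance derivation): compute the change in AUC between a base and an expanded risk model and attach a bootstrap standard error, the resampling fallback the abstract recommends when comparing nested models under the null. The simulated predictors are hypothetical.

```python
# Minimal sketch: change in AUC between a base and an expanded risk model,
# with a bootstrap standard error. Data are simulated; this is not the
# authors' implementation of the U-statistic-based variance.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
n = 2000
x1 = rng.normal(size=n)                       # established predictor
x2 = rng.normal(size=n)                       # candidate new predictor
p = 1 / (1 + np.exp(-(-1.0 + 0.8 * x1 + 0.5 * x2)))
y = rng.binomial(1, p)

def delta_auc(idx):
    Xb = x1[idx].reshape(-1, 1)
    Xe = np.column_stack([x1[idx], x2[idx]])
    yb = y[idx]
    base = LogisticRegression(max_iter=1000).fit(Xb, yb).predict_proba(Xb)[:, 1]
    ext = LogisticRegression(max_iter=1000).fit(Xe, yb).predict_proba(Xe)[:, 1]
    return roc_auc_score(yb, ext) - roc_auc_score(yb, base)

full = delta_auc(np.arange(n))
boot = [delta_auc(rng.integers(0, n, n)) for _ in range(200)]
print(f"dAUC = {full:.4f}, bootstrap SE = {np.std(boot):.4f}")
```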
Asymptotic distribution of ∆AUC, NRIs, and IDI based on theory of U-statistics.
Demler, Olga V; Pencina, Michael J; Cook, Nancy R; D'Agostino, Ralph B
2017-09-20
The change in area under the curve (∆AUC), the integrated discrimination improvement (IDI), and net reclassification index (NRI) are commonly used measures of risk prediction model performance. Some authors have reported good validity of associated methods of estimating their standard errors (SE) and construction of confidence intervals, whereas others have questioned their performance. To address these issues, we unite the ∆AUC, IDI, and three versions of the NRI under the umbrella of the U-statistics family. We rigorously show that the asymptotic behavior of ∆AUC, NRIs, and IDI fits the asymptotic distribution theory developed for U-statistics. We prove that the ∆AUC, NRIs, and IDI are asymptotically normal, unless they compare nested models under the null hypothesis. In the latter case, asymptotic normality and existing SE estimates cannot be applied to ∆AUC, NRIs, or IDI. In the former case, SE formulas proposed in the literature are equivalent to SE formulas obtained from U-statistics theory if we ignore adjustment for estimated parameters. We use the Sukhatme-Randles-deWet condition to determine when adjustment for estimated parameters is necessary. We show that adjustment is not necessary for SEs of the ∆AUC and two versions of the NRI when added predictor variables are significant and normally distributed. The SEs of the IDI and three-category NRI should always be adjusted for estimated parameters. These results allow us to define when existing formulas for SE estimates can be used and when resampling methods such as the bootstrap should be used instead when comparing nested models. We also use the U-statistic theory to develop a new SE estimate of ∆AUC. Copyright © 2017 John Wiley & Sons, Ltd.
Statistical mechanics of soft-boson phase transitions
NASA Technical Reports Server (NTRS)
Gupta, Arun K.; Hill, Christopher T.; Holman, Richard; Kolb, Edward W.
1991-01-01
The existence of structure on large (100 Mpc) scales, and limits to anisotropies in the cosmic microwave background radiation (CMBR), have imperiled models of structure formation based solely upon the standard cold dark matter scenario. Novel scenarios, which may be compatible with large scale structure and small CMBR anisotropies, invoke nonlinear fluctuations in the density appearing after recombination, accomplished via the use of late time phase transitions involving ultralow mass scalar bosons. Herein, the statistical mechanics of such phase transitions are studied in several models involving naturally ultralow mass pseudo-Nambu-Goldstone bosons (pNGB's). These models can exhibit several interesting effects at high temperature and are believed to represent the most general possibilities for pNGB's.
Structure of wind-shear turbulence
NASA Technical Reports Server (NTRS)
Trevino, G.; Laituri, T. R.
1989-01-01
The statistical characteristics of wind shear turbulence are modelled. Isotropic turbulence serves as the basis of comparison for the anisotropic turbulence which exists in wind shear. The question of turbulence scales in wind shear is addressed from the perspective of power spectral density.
NASA Astrophysics Data System (ADS)
Adamaki, A.; Roberts, R.
2016-12-01
For many years an important aim in seismological studies has been forecasting the occurrence of large earthquakes. Despite some well-established statistical behavior of earthquake sequences, expressed by, e.g., the Omori law for aftershock sequences and the Gutenberg-Richter distribution of event magnitudes, purely statistical approaches to short-term earthquake prediction have in general not been successful. It seems that better understanding of the processes leading to critical stress build-up prior to larger events is necessary to identify useful precursory activity, if this exists, and statistical analyses are an important tool in this context. There has been considerable debate on the usefulness or otherwise of foreshock studies for short-term earthquake prediction. We investigate generic patterns of foreshock activity using aggregated data and by studying not only strong but also moderate magnitude events. Aggregating empirical local seismicity time series prior to larger events observed in and around Greece reveals a statistically significant increasing rate of seismicity over 20 days prior to M>3.5 earthquakes. This increase cannot be explained by spatio-temporal clustering models such as ETAS, implying genuine changes in the mechanical situation just prior to larger events and thus the possible existence of useful precursory information. Because of spatio-temporal clustering, including the aftershocks of foreshocks, even if such generic behavior exists it does not necessarily follow that foreshocks have the potential to provide useful precursory information for individual larger events. Using synthetic catalogs produced with different clustering models and different presumed system sensitivities, we are now investigating whether the apparently established generic foreshock rate acceleration implies that foreshocks have practical potential for routine forecasting of larger events. Preliminary results suggest that this is the case, but that it is likely that physically-based models of foreshock clustering will be a necessary, but not necessarily sufficient, basis for successful forecasting.
Armour, Cherie; O'Connor, Maja; Elklit, Ask; Elhai, Jon D
2013-10-01
The three-factor structure of posttraumatic stress disorder (PTSD) specified by the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, is not supported in the empirical literature. Two alternative four-factor models have received a wealth of empirical support. However, a consensus regarding which is superior has not been reached. A recent five-factor model has been shown to provide superior fit over the existing four-factor models. The present study investigated the fit of the five-factor model against the existing four-factor models and assessed the resultant factors' association with depression in a bereaved European trauma sample (N = 325). The participants were assessed for PTSD via the Harvard Trauma Questionnaire and depression via the Beck Depression Inventory. The five-factor model provided superior fit to the data compared with the existing four-factor models. In the dysphoric arousal model, depression was equally related to both dysphoric arousal and emotional numbing, whereas depression was more related to dysphoric arousal than to anxious arousal.
Avalappampatty Sivasamy, Aneetha; Sundan, Bose
2015-01-01
The ever expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secured communication and data transfer. Providing effective security protocols for any network environment, therefore, assumes paramount importance. Attempts are made continuously for designing more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T2 method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T2 statistical model and necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified either as normal or attack types. Performance of the model, as evaluated through validation and testing using KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. Accuracy of the model presented in this work, in comparison with the existing models, has been found to be much better. PMID:26357668
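A hypothetical sketch of the core T-squared scoring step (the published model also includes preprocessing and threshold selection via the central limit theorem, which are not reproduced here): profile baseline traffic features, then flag new observations whose Mahalanobis-type T2 distance from the baseline mean exceeds a chi-square cutoff.

```python
# Hypothetical sketch of a Hotelling's T-squared anomaly score for network
# intrusion detection: profile a baseline of "normal" traffic feature
# vectors, then flag observations whose T2 distance exceeds a threshold.
# Feature dimensions and the threshold rule are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
p = 5                                         # number of traffic features
baseline = rng.normal(size=(5000, p))         # normal-traffic profile
new_obs = rng.normal(size=(100, p))
new_obs[:10] += 3.0                           # injected anomalies

mean = baseline.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(baseline, rowvar=False))

diff = new_obs - mean
t2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)   # T2 for each observation

# With a large baseline, T2 for an in-control observation is approximately
# chi-square with p degrees of freedom; use its upper quantile as threshold.
threshold = stats.chi2.ppf(0.999, df=p)
print("flagged as attack:", np.where(t2 > threshold)[0])
```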
Sivasamy, Aneetha Avalappampatty; Sundan, Bose
2015-01-01
The ever expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secured communication and data transfer. Providing effective security protocols for any network environment, therefore, assumes paramount importance. Attempts are made continuously for designing more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T(2) method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T(2) statistical model and necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified either as normal or attack types. Performance of the model, as evaluated through validation and testing using KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. Accuracy of the model presented in this work, in comparison with the existing models, has been found to be much better.
Viallon, Vivian; Banerjee, Onureena; Jougla, Eric; Rey, Grégoire; Coste, Joel
2014-03-01
Looking for associations among multiple variables is a topical issue in statistics due to the increasing amount of data encountered in biology, medicine, and many other domains involving statistical applications. Graphical models have recently gained popularity for this purpose in the statistical literature. In the binary case, however, exact inference is generally very slow or even intractable because of the form of the so-called log-partition function. In this paper, we review various approximate methods for structure selection in binary graphical models that have recently been proposed in the literature and compare them through an extensive simulation study. We also propose a modification of one existing method, that is shown to achieve good performance and to be generally very fast. We conclude with an application in which we search for associations among causes of death recorded on French death certificates. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
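One widely used approximate strategy for this problem, shown here only as a generic illustration (the specific methods compared in the paper are not reproduced), is node-wise L1-penalized logistic regression: regress each binary variable on all the others and keep an edge whenever an incident coefficient is nonzero.

```python
# Sketch of approximate structure selection for a binary (Ising-type)
# graphical model via node-wise L1-penalized logistic regressions, with an
# edge kept when either incident coefficient is nonzero. Generic
# illustration; the penalty level C is an arbitrary assumption.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n, d = 2000, 6
X = rng.binomial(1, 0.5, size=(n, d))
# Induce dependence between variables 0-1 and 2-3.
X[:, 1] = np.where(rng.uniform(size=n) < 0.8, X[:, 0], 1 - X[:, 0])
X[:, 3] = np.where(rng.uniform(size=n) < 0.8, X[:, 2], 1 - X[:, 2])

adj = np.zeros((d, d), dtype=bool)
for j in range(d):
    others = [k for k in range(d) if k != j]
    lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.3)
    lr.fit(X[:, others], X[:, j])
    for coef, k in zip(lr.coef_[0], others):
        if abs(coef) > 1e-6:
            adj[j, k] = True

edges = [(j, k) for j in range(d) for k in range(j + 1, d) if adj[j, k] or adj[k, j]]
print("selected edges:", edges)
```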
NASA Technical Reports Server (NTRS)
Moin, Parviz; Reynolds, William C.
1988-01-01
Lagrangian techniques have found widespread application to the prediction and understanding of turbulent transport phenomena and have yielded satisfactory results for different cases of shear flow problems. However, it must be kept in mind that in most experiments what is really available are Eulerian statistics, and it is far from obvious how to extract from them the information relevant to the Lagrangian behavior of the flow; in consequence, Lagrangian models still include some hypothesis for which no adequate supporting evidence was until now available. Direct numerical simulation of turbulence offers a new way to obtain Lagrangian statistics and so verify the validity of the current predictive models and the accuracy of their results. After the pioneering work of Riley (Riley and Patterson, 1974) in the 70's, some such results have just appeared in the literature (Lee et al, Yeung and Pope). The present contribution follows in part similar lines, but focuses on two particle statistics and comparison with existing models.
Modelling the monetary value of a QALY: a new approach based on UK data.
Mason, Helen; Jones-Lee, Michael; Donaldson, Cam
2009-08-01
Debate about the monetary value of a quality-adjusted life year (QALY) has existed in the health economics literature for some time. More recently, concern about such a value has arisen in UK health policy. This paper reports on an attempt to 'model' a willingness-to-pay-based value of a QALY from the existing value of preventing a statistical fatality (VPF) currently used in UK public sector decision making. Two methods of deriving the value of a QALY from the existing UK VPF are outlined: one conventional and one new. The advantages and disadvantages of each of the approaches are discussed as well as the implications of the results for policy and health economic evaluation methodology.
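The conventional route mentioned above can be illustrated with simple arithmetic: divide the VPF by the discounted number of QALYs lost in a statistical fatality. All figures below are assumptions for illustration, not the paper's estimates.

```python
# Illustrative arithmetic for the conventional route from a value of
# preventing a statistical fatality (VPF) to a willingness-to-pay value per
# QALY: divide the VPF by the expected discounted QALYs lost per fatality.
# The VPF, life expectancy, quality weight, and discount rate below are
# assumptions for illustration only, not the paper's figures.
vpf = 1_500_000          # value of preventing a statistical fatality (GBP)
years_lost = 40          # expected remaining life-years of the average victim
quality_weight = 0.8     # average health-related quality of life
discount_rate = 0.035

# Discounted QALYs lost = quality weight * annuity factor over years_lost.
annuity = (1 - (1 + discount_rate) ** -years_lost) / discount_rate
qalys_lost = quality_weight * annuity
value_per_qaly = vpf / qalys_lost
print(f"discounted QALYs lost per fatality: {qalys_lost:.1f}")
print(f"implied value per QALY: {value_per_qaly:,.0f} GBP")
```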
Li, Jian-Long; Wang, Peng; Fung, Wing Kam; Zhou, Ji-Yuan
2017-10-16
For dichotomous traits, the generalized disequilibrium test with the moment estimate of the variance (GDT-ME) is a powerful family-based association method. Genomic imprinting is an important epigenetic phenomenon and currently, there has been increasing interest of incorporating imprinting to improve the test power of association analysis. However, GDT-ME does not take imprinting effects into account, and it has not been investigated whether it can be used for association analysis when the effects indeed exist. In this article, based on a novel decomposition of the genotype score according to the paternal or maternal source of the allele, we propose the generalized disequilibrium test with imprinting (GDTI) for complete pedigrees without any missing genotypes. Then, we extend GDTI and GDT-ME to accommodate incomplete pedigrees with some pedigrees having missing genotypes, by using a Monte Carlo (MC) sampling and estimation scheme to infer missing genotypes given available genotypes in each pedigree, denoted by MCGDTI and MCGDT-ME, respectively. The proposed GDTI and MCGDTI methods evaluate the differences of the paternal as well as maternal allele scores for all discordant relative pairs in a pedigree, including beyond first-degree relative pairs. Advantages of the proposed GDTI and MCGDTI test statistics over existing methods are demonstrated by simulation studies under various simulation settings and by application to the rheumatoid arthritis dataset. Simulation results show that the proposed tests control the size well under the null hypothesis of no association, and outperform the existing methods under various imprinting effect models. The existing GDT-ME and the proposed MCGDT-ME can be used to test for association even when imprinting effects exist. For the application to the rheumatoid arthritis data, compared to the existing methods, MCGDTI identifies more loci statistically significantly associated with the disease. Under complete and incomplete imprinting effect models, our proposed GDTI and MCGDTI methods, by considering the information on imprinting effects and all discordant relative pairs within each pedigree, outperform all the existing test statistics and MCGDTI can recapture much of the missing information. Therefore, MCGDTI is recommended in practice.
Chaotic oscillations and noise transformations in a simple dissipative system with delayed feedback
NASA Astrophysics Data System (ADS)
Zverev, V. V.; Rubinstein, B. Ya.
1991-04-01
We analyze the statistical behavior of signals in nonlinear circuits with delayed feedback in the presence of external Markovian noise. For the special class of circuits with intense phase mixing we develop an approach for the computation of the probability distributions and multitime correlation functions based on the random phase approximation. Both Gaussian and Kubo-Andersen models of external noise statistics are analyzed and the existence of the stationary (asymptotic) random process in the long-time limit is shown. We demonstrate that a nonlinear system with chaotic behavior becomes a noise amplifier with specific statistical transformation properties.
ProbOnto: ontology and knowledge base of probability distributions.
Swat, Maciej J; Grenon, Pierre; Wimalaratne, Sarala
2016-09-01
Probability distributions play a central role in mathematical and statistical modelling. The encoding, annotation and exchange of such models could be greatly simplified by a resource providing a common reference for the definition of probability distributions. Although some resources exist, no suitably detailed and complex ontology exists, nor any database allowing programmatic access. ProbOnto is an ontology-based knowledge base of probability distributions, featuring more than 80 uni- and multivariate distributions with their defining functions, characteristics, relationships and re-parameterization formulas. It can be used for model annotation and facilitates the encoding of distribution-based models, related functions and quantities. Availability: http://probonto.org. Contact: mjswat@ebi.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Cormack Research Project: Glasgow University
NASA Technical Reports Server (NTRS)
Skinner, Susan; Ryan, James M.
1998-01-01
The aim of this project was to investigate and improve upon existing methods of analysing data from COMPTEL on the Gamma Ray Observatory for neutrons emitted during solar flares. In particular, a strategy for placing confidence intervals on neutron energy distributions, given uncertainties in the response matrix, has been developed. We have also been able to demonstrate the superior performance of one of a range of possible statistical regularization strategies. A method of generating likely models of neutron energy distributions has also been developed as a tool to this end. The project involved solving an inverse problem with noise added to the data in various ways. To achieve this, pre-existing C code was used to run Fortran subroutines that performed statistical regularization on the data.
Comparison of the predictive validity of diagnosis-based risk adjusters for clinical outcomes.
Petersen, Laura A; Pietz, Kenneth; Woodard, LeChauncy D; Byrne, Margaret
2005-01-01
Many possible methods of risk adjustment exist, but there is a dearth of comparative data on their performance. We compared the predictive validity of 2 widely used methods (Diagnostic Cost Groups [DCGs] and Adjusted Clinical Groups [ACGs]) for 2 clinical outcomes using a large national sample of patients. We studied all patients who used Veterans Health Administration (VA) medical services in fiscal year (FY) 2001 (n = 3,069,168) and assigned both a DCG and an ACG to each. We used logistic regression analyses to compare predictive ability for death or long-term care (LTC) hospitalization for age/gender models, DCG models, and ACG models. We also assessed the effect of adding age to the DCG and ACG models. Patients in the highest DCG categories, indicating higher severity of illness, were more likely to die or to require LTC hospitalization. Surprisingly, the age/gender model predicted death slightly more accurately than the ACG model (c-statistic of 0.710 versus 0.700, respectively). The addition of age to the ACG model improved the c-statistic to 0.768. The highest c-statistic for prediction of death was obtained with a DCG/age model (0.830). The lowest c-statistics were obtained for age/gender models for LTC hospitalization (c-statistic 0.593). The c-statistic for use of ACGs to predict LTC hospitalization was 0.783, and improved to 0.792 with the addition of age. The c-statistics for use of DCGs and DCG/age to predict LTC hospitalization were 0.885 and 0.890, respectively, indicating the best prediction. We found that risk adjusters based upon diagnoses predicted an increased likelihood of death or LTC hospitalization, exhibiting good predictive validity. In this comparative analysis using VA data, DCG models were generally superior to ACG models in predicting clinical outcomes, although ACG model performance was enhanced by the addition of age.
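A generic sketch of this kind of comparison, using simulated data rather than VA records: fit logistic models with different risk adjusters (a categorical risk group standing in for a DCG/ACG assignment, with and without age) and compare their c-statistics.

```python
# Generic sketch of comparing risk adjusters by the c-statistic (ROC AUC)
# for a binary outcome, including the effect of adding age to a model.
# Simulated data; the risk-group variable stands in for a DCG/ACG category.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(6)
n = 20000
age = rng.uniform(30, 90, n)
risk_group = rng.integers(0, 10, n)           # stand-in for a diagnosis-based group
logit = -6 + 0.05 * age + 0.3 * risk_group
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

groups = np.eye(10)[risk_group]               # one-hot encode the risk group
designs = {
    "age only": age.reshape(-1, 1),
    "risk group only": groups,
    "risk group + age": np.column_stack([groups, age]),
}
for name, X in designs.items():
    pred = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
    print(f"{name:18s} c-statistic = {roc_auc_score(y, pred):.3f}")
```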
Semenov, Alexander V; Elsas, Jan Dirk; Glandorf, Debora C M; Schilthuizen, Menno; Boer, Willem F
2013-01-01
To fulfill existing guidelines, applicants that aim to place their genetically modified (GM) insect-resistant crop plants on the market are required to provide data from field experiments that address the potential impacts of the GM plants on nontarget organisms (NTO's). Such data may be based on varied experimental designs. The recent EFSA guidance document for environmental risk assessment (2010) does not provide clear and structured suggestions that address the statistics of field trials on effects on NTO's. This review examines existing practices in GM plant field testing such as the way of randomization, replication, and pseudoreplication. Emphasis is placed on the importance of design features used for the field trials in which effects on NTO's are assessed. The importance of statistical power and the positive and negative aspects of various statistical models are discussed. Equivalence and difference testing are compared, and the importance of checking the distribution of experimental data is stressed to decide on the selection of the proper statistical model. While for continuous data (e.g., pH and temperature) classical statistical approaches – for example, analysis of variance (ANOVA) – are appropriate, for discontinuous data (counts) only generalized linear models (GLM) are shown to be efficient. There is no golden rule as to which statistical test is the most appropriate for any experimental situation. In particular, in experiments in which block designs are used and covariates play a role GLMs should be used. Generic advice is offered that will help in both the setting up of field testing and the interpretation and data analysis of the data obtained in this testing. The combination of decision trees and a checklist for field trials, which are provided, will help in the interpretation of the statistical analyses of field trials and to assess whether such analyses were correctly applied. We offer generic advice to risk assessors and applicants that will help in both the setting up of field testing and the interpretation and data analysis of the data obtained in field testing. PMID:24567836
Semenov, Alexander V; Elsas, Jan Dirk; Glandorf, Debora C M; Schilthuizen, Menno; Boer, Willem F
2013-08-01
To fulfill existing guidelines, applicants that aim to place their genetically modified (GM) insect-resistant crop plants on the market are required to provide data from field experiments that address the potential impacts of the GM plants on nontarget organisms (NTO's). Such data may be based on varied experimental designs. The recent EFSA guidance document for environmental risk assessment (2010) does not provide clear and structured suggestions that address the statistics of field trials on effects on NTO's. This review examines existing practices in GM plant field testing such as the way of randomization, replication, and pseudoreplication. Emphasis is placed on the importance of design features used for the field trials in which effects on NTO's are assessed. The importance of statistical power and the positive and negative aspects of various statistical models are discussed. Equivalence and difference testing are compared, and the importance of checking the distribution of experimental data is stressed to decide on the selection of the proper statistical model. While for continuous data (e.g., pH and temperature) classical statistical approaches - for example, analysis of variance (ANOVA) - are appropriate, for discontinuous data (counts) only generalized linear models (GLM) are shown to be efficient. There is no golden rule as to which statistical test is the most appropriate for any experimental situation. In particular, in experiments in which block designs are used and covariates play a role GLMs should be used. Generic advice is offered that will help in both the setting up of field testing and the interpretation and data analysis of the data obtained in this testing. The combination of decision trees and a checklist for field trials, which are provided, will help in the interpretation of the statistical analyses of field trials and to assess whether such analyses were correctly applied. We offer generic advice to risk assessors and applicants that will help in both the setting up of field testing and the interpretation and data analysis of the data obtained in field testing.
NASA Technical Reports Server (NTRS)
Weger, R. C.; Lee, J.; Zhu, Tianri; Welch, R. M.
1992-01-01
The current controversy over regularity versus clustering in cloud fields is examined by means of analysis and simulation studies based upon nearest-neighbor cumulative distribution statistics. It is shown that the Poisson representation of random point processes is superior to pseudorandom-number-generated models and that the latter bias the observed nearest-neighbor statistics towards regularity. Interpretation of these nearest-neighbor statistics is discussed for many cases of superpositions of clustering, randomness, and regularity. A detailed analysis of cumulus cloud field spatial distributions based upon Landsat, AVHRR, and Skylab data shows that, when both large and small clouds are included in the cloud field distributions, the cloud field always has a strong clustering signal.
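A small sketch of the diagnostic underlying this analysis, on simulated rather than satellite-derived cloud positions: compare the empirical nearest-neighbor cumulative distribution G(r) with its closed form under complete spatial randomness, G(r) = 1 - exp(-lambda*pi*r^2).

```python
# Sketch of the nearest-neighbor cumulative distribution diagnostic: compare
# the empirical G(r) of a point pattern with the expectation under complete
# spatial randomness. Cloud positions here are simulated, not the
# Landsat/AVHRR/Skylab data.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(7)
n, side = 500, 100.0
pts = rng.uniform(0, side, size=(n, 2))       # a Poisson-like (random) pattern

tree = cKDTree(pts)
d, _ = tree.query(pts, k=2)                   # k=1 is the point itself
nn = d[:, 1]                                  # nearest-neighbor distances

lam = n / side**2
r = np.linspace(0, np.percentile(nn, 99), 50)
g_emp = np.array([(nn <= ri).mean() for ri in r])
g_csr = 1 - np.exp(-lam * np.pi * r**2)

# Values of g_emp systematically above g_csr indicate clustering,
# values below indicate regularity.
print("max |G_emp - G_CSR| =", np.abs(g_emp - g_csr).max().round(3))
```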
Generalized Appended Product Indicator Procedure for Nonlinear Structural Equation Analysis.
ERIC Educational Resources Information Center
Wall, Melanie M.; Amemiya, Yasuo
2001-01-01
Considers the estimation of polynomial structural models and shows a limitation of an existing method. Introduces a new procedure, the generalized appended product indicator procedure, for nonlinear structural equation analysis. Addresses statistical issues associated with the procedure through simulation. (SLD)
Structure of wind-shear turbulence
NASA Technical Reports Server (NTRS)
Trevino, G.; Laituri, T. R.
1988-01-01
The statistical characteristics of wind-shear turbulence are modelled. Isotropic turbulence serves as the basis of comparison for the anisotropic turbulence which exists in wind shear. The question of how turbulence scales in a wind shear is addressed from the perspective of power spectral density.
Techniques in teaching statistics : linking research production and research use.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Martinez-Moyano, I.; Smith, A. (Univ. of Massachusetts at Boston)
In the spirit of closing the 'research-practice gap,' the authors extend evidence-based principles to statistics instruction in social science graduate education. The authors employ a Delphi method to survey experienced statistics instructors to identify teaching techniques to overcome the challenges inherent in teaching statistics to students enrolled in practitioner-oriented master's degree programs. Among the teaching techniques identified as essential are using real-life examples, requiring data collection exercises, and emphasizing interpretation rather than results. Building on existing research, preliminary interviews, and the findings from the study, the authors develop a model describing antecedents to the strength of the link between research and practice.
Zhao, Xi; Dellandréa, Emmanuel; Chen, Liming; Kakadiaris, Ioannis A
2011-10-01
Three-dimensional face landmarking aims at automatically localizing facial landmarks and has a wide range of applications (e.g., face recognition, face tracking, and facial expression analysis). Existing methods assume neutral facial expressions and unoccluded faces. In this paper, we propose a general learning-based framework for reliable landmark localization on 3-D facial data under challenging conditions (i.e., facial expressions and occlusions). Our approach relies on a statistical model, called 3-D statistical facial feature model, which learns both the global variations in configurational relationships between landmarks and the local variations of texture and geometry around each landmark. Based on this model, we further propose an occlusion classifier and a fitting algorithm. Results from experiments on three publicly available 3-D face databases (FRGC, BU-3-DFE, and Bosphorus) demonstrate the effectiveness of our approach, in terms of landmarking accuracy and robustness, in the presence of expressions and occlusions.
Seasonal Drought Prediction: Advances, Challenges, and Future Prospects
NASA Astrophysics Data System (ADS)
Hao, Zengchao; Singh, Vijay P.; Xia, Youlong
2018-03-01
Drought prediction is of critical importance to early warning for drought management. This review provides a synthesis of drought prediction based on statistical, dynamical, and hybrid methods. Statistical drought prediction is achieved by modeling the relationship between drought indices of interest and a suite of potential predictors, including large-scale climate indices, local climate variables, and land initial conditions. Dynamical meteorological drought prediction relies on seasonal climate forecasts from general circulation models (GCMs), which can be employed to drive hydrological models for agricultural and hydrological drought prediction, with the predictability determined by both climate forcings and initial conditions. Challenges still exist in drought prediction at long lead times and under a changing environment resulting from natural and anthropogenic factors. Future research prospects to improve drought prediction include, but are not limited to, high-quality data assimilation, improved model development with key processes related to drought occurrence, optimal ensemble forecasting to select or weight ensembles, and hybrid drought prediction to merge statistical and dynamical forecasts.
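A minimal sketch of the statistical route described above, with hypothetical predictors (a large-scale climate index and an initial soil-moisture anomaly) standing in for the operational predictor suite.

```python
# Minimal sketch of statistical seasonal drought prediction: regress a
# drought index at lead time on a climate index and a land initial
# condition. Predictor names, data, and the SPI threshold are hypothetical.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(8)
n = 480                                       # e.g., 40 years of monthly data
nino34 = rng.normal(size=n)                   # large-scale climate index
soil_moisture = rng.normal(size=n)            # land initial condition
spi_next_season = 0.4 * nino34 + 0.5 * soil_moisture + rng.normal(scale=0.8, size=n)

X = sm.add_constant(np.column_stack([nino34, soil_moisture]))
fit = sm.OLS(spi_next_season, X).fit()
print("coefficients:", fit.params.round(2), " R2 =", round(fit.rsquared, 2))

# Plug-in probability of drought (here defined as SPI < -0.8) per forecast.
p_drought = norm.cdf((-0.8 - fit.predict(X)) / np.sqrt(fit.scale))
print("mean forecast drought probability:", p_drought.mean().round(2))
```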
Marshall, F.E.; Wingard, G.L.
2012-01-01
The upgraded method of coupled paleosalinity and hydrologic models was applied to the analysis of the circa-1900 CE segments of five estuarine sediment cores collected in Florida Bay. Comparisons of the observed mean stage (water level) data to the paleoecology-based model's averaged output show that the estimated stage in the Everglades wetlands was 0.3 to 1.6 feet higher at different locations. Observed mean flow data compared to the paleoecology-based model output show an estimated flow into Shark River Slough at Tamiami Trail of 401 to 2,539 cubic feet per second (cfs) higher than existing flows, and at Taylor Slough Bridge an estimated flow of 48 to 218 cfs above existing flows. For salinity in Florida Bay, the difference between paleoecology-based and observed mean salinity varies across the bay, from an aggregated average salinity of 14.7 less than existing in the northeastern basin to 1.0 less than existing in the western basin near the transition into the Gulf of Mexico. When the salinity differences are compared by region, the difference between paleoecology-based conditions and existing conditions are spatially consistent.
NASA Astrophysics Data System (ADS)
Takiyama, Ken
2017-12-01
How neural adaptation affects neural information processing (i.e. the dynamics and equilibrium state of neural activities) is a central question in computational neuroscience. In my previous works, I analytically clarified the dynamics and equilibrium state of neural activities in a ring-type neural network model that is widely used to model the visual cortex, motor cortex, and several other brain regions. The neural dynamics and the equilibrium state in the neural network model corresponded to a Bayesian computation and statistically optimal multiple information integration, respectively, under a biologically inspired condition. These results were revealed in an analytically tractable manner; however, adaptation effects were not considered. Here, I analytically reveal how the dynamics and equilibrium state of neural activities in a ring neural network are influenced by spike-frequency adaptation (SFA). SFA is an adaptation that causes gradual inhibition of neural activity when a sustained stimulus is applied, and the strength of this inhibition depends on neural activities. I reveal that SFA plays three roles: (1) SFA amplifies the influence of external input in neural dynamics; (2) SFA allows the history of the external input to affect neural dynamics; and (3) the equilibrium state corresponds to the statistically optimal multiple information integration independent of the existence of SFA. In addition, the equilibrium state in a ring neural network model corresponds to the statistically optimal integration of multiple information sources under biologically inspired conditions, independent of the existence of SFA.
Testing the Predictive Power of Coulomb Stress on Aftershock Sequences
NASA Astrophysics Data System (ADS)
Woessner, J.; Lombardi, A.; Werner, M. J.; Marzocchi, W.
2009-12-01
Empirical and statistical models of clustered seismicity are usually strongly stochastic and perceived to be uninformative in their forecasts, since only marginal distributions are used, such as the Omori-Utsu and Gutenberg-Richter laws. In contrast, so-called physics-based aftershock models, based on seismic rate changes calculated from Coulomb stress changes and rate-and-state friction, make more specific predictions: anisotropic stress shadows and multiplicative rate changes. We test the predictive power of models based on Coulomb stress changes against statistical models, including the popular Short Term Earthquake Probabilities and Epidemic-Type Aftershock Sequences models: We score and compare retrospective forecasts on the aftershock sequences of the 1992 Landers, USA, the 1997 Colfiorito, Italy, and the 2008 Selfoss, Iceland, earthquakes. To quantify predictability, we use likelihood-based metrics that test the consistency of the forecasts with the data, including modified and existing tests used in prospective forecast experiments within the Collaboratory for the Study of Earthquake Predictability (CSEP). Our results indicate that a statistical model performs best. Moreover, two Coulomb model classes seem unable to compete: Models based on deterministic Coulomb stress changes calculated from a given fault-slip model, and those based on fixed receiver faults. One model of Coulomb stress changes does perform well and sometimes outperforms the statistical models, but its predictive information is diluted, because of uncertainties included in the fault-slip model. Our results suggest that models based on Coulomb stress changes need to incorporate stochastic features that represent model and data uncertainty.
Detection of Erroneous Payments Utilizing Supervised And Unsupervised Data Mining Techniques
2004-09-01
will look at which statistical analysis technique will work best in developing and enhancing existing erroneous payment models. Chapter I and II... payment models that are used for selection of records to be audited. The models are set up such that if two or more records have the same payment...Identification Number, Invoice Number and Delivery Order Number are not compared. The DM0102 Duplicate Payment Model will be analyzed in this thesis.
Comparing estimates of climate change impacts from process-based and statistical crop models
NASA Astrophysics Data System (ADS)
Lobell, David B.; Asseng, Senthold
2017-01-01
The potential impacts of climate change on crop productivity are of widespread interest to those concerned with addressing climate change and improving global food security. Two common approaches to assess these impacts are process-based simulation models, which attempt to represent key dynamic processes affecting crop yields, and statistical models, which estimate functional relationships between historical observations of weather and yields. Examples of both approaches are increasingly found in the scientific literature, although often published in different disciplinary journals. Here we compare published sensitivities to changes in temperature, precipitation, carbon dioxide (CO2), and ozone from each approach for the subset of crops, locations, and climate scenarios for which both have been applied. Despite a common perception that statistical models are more pessimistic, we find no systematic differences between the predicted sensitivities to warming from process-based and statistical models up to +2 °C, with limited evidence at higher levels of warming. For precipitation, there are many reasons why estimates could be expected to differ, but few estimates exist to develop robust comparisons, and precipitation changes are rarely the dominant factor for predicting impacts given the prominent role of temperature, CO2, and ozone changes. A common difference between process-based and statistical studies is that the former tend to include the effects of CO2 increases that accompany warming, whereas statistical models typically do not. Major needs moving forward include incorporating CO2 effects into statistical studies, improving both approaches’ treatment of ozone, and increasing the use of both methods within the same study. At the same time, those who fund or use crop model projections should understand that in the short-term, both approaches when done well are likely to provide similar estimates of warming impacts, with statistical models generally requiring fewer resources to produce robust estimates, especially when applied to crops beyond the major grains.
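A sketch of the statistical-model approach characterized above, on simulated data: regress log yield on growing-season temperature, precipitation, and a time trend, then read off the implied impact of +2 °C of warming. The coefficients and data are illustrative only.

```python
# Sketch of a statistical crop-yield model: log yield regressed on
# growing-season temperature, precipitation, and a time trend, with the
# warming sensitivity read from the temperature coefficient. Simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
years = np.arange(1980, 2016)
temp = 22 + 0.03 * (years - 1980) + rng.normal(scale=0.8, size=years.size)
precip = rng.normal(500, 80, size=years.size)
log_yield = (0.01 * (years - 1980) - 0.06 * temp + 0.0004 * precip
             + rng.normal(scale=0.05, size=years.size))

X = sm.add_constant(np.column_stack([temp, precip, years - 1980]))
fit = sm.OLS(log_yield, X).fit()

warming_impact = fit.params[1] * 2.0          # effect of +2 C on log yield
print(f"estimated yield change for +2 C: {100 * (np.exp(warming_impact) - 1):.1f}%")
```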
The importance of topographically corrected null models for analyzing ecological point processes.
McDowall, Philip; Lynch, Heather J
2017-07-01
Analyses of point process patterns and related techniques (e.g., MaxEnt) make use of the expected number of occurrences per unit area and second-order statistics based on the distance between occurrences. Ecologists working with point process data often assume that points exist on a two-dimensional x-y plane or within a three-dimensional volume, when in fact many observed point patterns are generated on a two-dimensional surface existing within three-dimensional space. For many surfaces, however, such as the topography of landscapes, the projection from the surface to the x-y plane preserves neither area nor distance. As such, when these point patterns are implicitly projected to and analyzed in the x-y plane, our expectations of the point pattern's statistical properties may not be met. When used in hypothesis testing, we find that the failure to account for the topography of the generating surface may bias statistical tests that incorrectly identify clustering and, furthermore, may bias coefficients in inhomogeneous point process models that incorporate slope as a covariate. We demonstrate the circumstances under which this bias is significant, and present simple methods that allow point processes to be simulated with corrections for topography. These point patterns can then be used to generate "topographically corrected" null models against which observed point processes can be compared. © 2017 by the Ecological Society of America.
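A sketch of one way to build such a topographically corrected null model, under the simplifying assumption of a gridded slope map: make the expected count in each planimetric cell proportional to its true surface area (planimetric area divided by cos(slope)) before placing points.

```python
# Sketch of a "topographically corrected" null model: simulate a homogeneous
# point process on a sloped surface by making the expected number of points
# in each planimetric grid cell proportional to its true surface area,
# i.e., planimetric area / cos(slope). Grid size and slopes are illustrative.
import numpy as np

rng = np.random.default_rng(10)
ncell, cell_area, total_points = 50, 1.0, 2000

slope = np.deg2rad(rng.uniform(0, 45, size=(ncell, ncell)))   # slope per cell
surface_area = cell_area / np.cos(slope)                      # true area per cell
weights = surface_area / surface_area.sum()

# Multinomial allocation of points to cells, uniform placement within a cell.
counts = rng.multinomial(total_points, weights.ravel()).reshape(ncell, ncell)
rows, cols = np.nonzero(counts)
pts = np.concatenate([
    np.column_stack([c + rng.uniform(size=k), r + rng.uniform(size=k)])
    for r, c, k in zip(rows, cols, counts[rows, cols])
])
# 'pts' is one realization of the corrected null; repeating this many times
# gives a reference distribution for clustering statistics on the terrain.
print(pts.shape)
```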
Human-modified temperatures induce species changes: Joint attribution.
Root, Terry L; MacMynowski, Dena P; Mastrandrea, Michael D; Schneider, Stephen H
2005-05-24
Average global surface-air temperature is increasing. Contention exists over relative contributions by natural and anthropogenic forcings. Ecological studies attribute plant and animal changes to observed warming. Until now, temperature-species connections have not been statistically attributed directly to anthropogenic climatic change. Using modeled climatic variables and observed species data, which are independent of thermometer records and paleoclimatic proxies, we demonstrate statistically significant "joint attribution," a two-step linkage: human activities contribute significantly to temperature changes and human-changed temperatures are associated with discernible changes in plant and animal traits. Additionally, our analyses provide independent testing of grid-box-scale temperature projections from a general circulation model (HadCM3).
Spatial scan statistics for detection of multiple clusters with arbitrary shapes.
Lin, Pei-Sheng; Kung, Yi-Hung; Clayton, Murray
2016-12-01
In applying scan statistics for public health research, it would be valuable to develop a detection method for multiple clusters that accommodates spatial correlation and covariate effects in an integrated model. In this article, we connect the concepts of the likelihood ratio (LR) scan statistic and the quasi-likelihood (QL) scan statistic to provide a series of detection procedures sufficiently flexible to apply to clusters of arbitrary shape. First, we use an independent scan model for detection of clusters and then a variogram tool to examine the existence of spatial correlation and regional variation based on residuals of the independent scan model. When the estimate of regional variation is significantly different from zero, a mixed QL estimating equation is developed to estimate coefficients of geographic clusters and covariates. We use the Benjamini-Hochberg procedure (1995) to find a threshold for p-values to address the multiple testing problem. A quasi-deviance criterion is used to regroup the estimated clusters to find geographic clusters with arbitrary shapes. We conduct simulations to compare the performance of the proposed method with other scan statistics. For illustration, the method is applied to enterovirus data from Taiwan. © 2016, The International Biometric Society.
Michael, Andrew J.
2012-01-01
Estimates of the probability that an ML 4.8 earthquake, which occurred near the southern end of the San Andreas fault on 24 March 2009, would be followed by an M 7 mainshock over the following three days vary from 0.0009 using a Gutenberg–Richter model of aftershock statistics (Reasenberg and Jones, 1989) to 0.04 using a statistical model of foreshock behavior and long‐term estimates of large earthquake probabilities, including characteristic earthquakes (Agnew and Jones, 1991). I demonstrate that the disparity between the existing approaches depends on whether or not they conform to Gutenberg–Richter behavior. While Gutenberg–Richter behavior is well established over large regions, it could be violated on individual faults if they have characteristic earthquakes or over small areas if the spatial distribution of large‐event nucleations is disproportional to the rate of smaller events. I develop a new form of the aftershock model that includes characteristic behavior and combines the features of both models. This new model and the older foreshock model yield the same results when given the same inputs, but the new model has the advantage of producing probabilities for events of all magnitudes, rather than just for events larger than the initial one. Compared with the aftershock model, the new model has the advantage of taking into account long‐term earthquake probability models. Using consistent parameters, the probability of an M 7 mainshock on the southernmost San Andreas fault is 0.0001 for three days from long‐term models and the clustering probabilities following the ML 4.8 event are 0.00035 for a Gutenberg–Richter distribution and 0.013 for a characteristic‐earthquake magnitude–frequency distribution. Our decisions about the existence of characteristic earthquakes and how large earthquakes nucleate have a first‐order effect on the probabilities obtained from short‐term clustering models for these large events.
A Robust Adaptive Autonomous Approach to Optimal Experimental Design
NASA Astrophysics Data System (ADS)
Gu, Hairong
Experimentation is the fundamental tool of scientific inquiries to understand the laws governing the nature and human behaviors. Many complex real-world experimental scenarios, particularly in quest of prediction accuracy, often encounter difficulties to conduct experiments using an existing experimental procedure for the following two reasons. First, the existing experimental procedures require a parametric model to serve as the proxy of the latent data structure or data-generating mechanism at the beginning of an experiment. However, for those experimental scenarios of concern, a sound model is often unavailable before an experiment. Second, those experimental scenarios usually contain a large number of design variables, which potentially leads to a lengthy and costly data collection cycle. Incompetently, the existing experimental procedures are unable to optimize large-scale experiments so as to minimize the experimental length and cost. Facing the two challenges in those experimental scenarios, the aim of the present study is to develop a new experimental procedure that allows an experiment to be conducted without the assumption of a parametric model while still achieving satisfactory prediction, and performs optimization of experimental designs to improve the efficiency of an experiment. The new experimental procedure developed in the present study is named robust adaptive autonomous system (RAAS). RAAS is a procedure for sequential experiments composed of multiple experimental trials, which performs function estimation, variable selection, reverse prediction and design optimization on each trial. Directly addressing the challenges in those experimental scenarios of concern, function estimation and variable selection are performed by data-driven modeling methods to generate a predictive model from data collected during the course of an experiment, thus exempting the requirement of a parametric model at the beginning of an experiment; design optimization is performed to select experimental designs on the fly of an experiment based on their usefulness so that fewest designs are needed to reach useful inferential conclusions. Technically, function estimation is realized by Bayesian P-splines, variable selection is realized by Bayesian spike-and-slab prior, reverse prediction is realized by grid-search and design optimization is realized by the concepts of active learning. The present study demonstrated that RAAS achieves statistical robustness by making accurate predictions without the assumption of a parametric model serving as the proxy of latent data structure while the existing procedures can draw poor statistical inferences if a misspecified model is assumed; RAAS also achieves inferential efficiency by taking fewer designs to acquire useful statistical inferences than non-optimal procedures. Thus, RAAS is expected to be a principled solution to real-world experimental scenarios pursuing robust prediction and efficient experimentation.
Atrazine concentrations in near-surface aquifers: A censored regression approach
Liu, S.; Yen, S.T.; Kolpin, D.W.
1996-01-01
In 1991, the U.S. Geological Survey (USGS) conducted a study to investigate the occurrence of atrazine (2-chloro-4-ethylamino-6-isopropylamino-s-triazine) and other agricultural chemicals in near-surface aquifers in the midcontinental USA. Because about 83% of the atrazine concentrations from the USGS study were censored, standard statistical estimation procedures could not be used. To determine factors that affect atrazine concentrations in groundwater while accommodating the high degree of data censoring, Tobit models were used (normal homoscedastic, normal heteroscedastic, lognormal homoscedastic, and lognormal heteroscedastic). Empirical results suggest that the lognormal heteroscedastic Tobit model is the model of choice for this type of study. This model determined the following factors to have the strongest effect on atrazine concentrations in groundwater: percent of pasture within 3.2 km, percent of forest within 3.2 km (2 mi), mean open interval of the well, primary water use of a well, aquifer class (unconsolidated or bedrock), aquifer type (unconfined or confined), existence of a stream within 30 m (100 ft), existence of a stream within 30 m to 0.4 km (0.25 mi), and existence of a stream within 0.4 to 3.2 km. Examining the elasticities of the continuous explanatory factors provides further insight into their effects on atrazine concentrations in groundwater. This study documents a viable statistical method that can be used to accommodate the complicating presence of censored data, a feature that commonly occurs in environmental data.
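A minimal sketch of a left-censored (Tobit-type) regression fitted by maximum likelihood, with a single illustrative covariate and detection limit rather than the USGS variables.

```python
# Minimal sketch of a left-censored (Tobit-type) regression fitted by maximum
# likelihood with scipy, as an alternative to standard estimators when many
# concentrations fall below the detection limit. Data are simulated; the
# detection limit and covariate are illustrative assumptions.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(11)
n, limit = 400, 0.05                          # detection limit (censoring point)
x = rng.normal(size=n)                        # e.g., percent pasture nearby
latent = np.exp(-3.5 + 0.8 * x + rng.normal(scale=1.0, size=n))
y = np.maximum(latent, limit)                 # observed concentration
censored = latent < limit                     # large fraction below the limit

def negloglik(theta):
    b0, b1, log_sigma = theta
    sigma = np.exp(log_sigma)
    mu = b0 + b1 * x                          # model for log concentration
    ll_obs = stats.norm.logpdf(np.log(y), mu, sigma)[~censored]
    ll_cens = stats.norm.logcdf((np.log(limit) - mu) / sigma)[censored]
    return -(ll_obs.sum() + ll_cens.sum())

res = optimize.minimize(negloglik, x0=np.array([0.0, 0.0, 0.0]), method="BFGS")
print(f"intercept={res.x[0]:.2f}, slope={res.x[1]:.2f}, sigma={np.exp(res.x[2]):.2f}")
```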
NASA Astrophysics Data System (ADS)
Martucci, G.; Carniel, S.; Chiggiato, J.; Sclavo, M.; Lionello, P.; Galati, M. B.
2010-06-01
The study is a statistical analysis of sea-state time series derived using the wave model WAM forced by the ERA-40 dataset in selected areas near the Italian coasts. For the period 1 January 1958 to 31 December 1999 the analysis yields: (i) the existence of a negative trend in the annual- and winter-averaged sea state heights; (ii) the existence of a turning point in the late 1980s in the annual-averaged trend of sea state heights at a site in the Northern Adriatic Sea; (iii) the overall absence of a significant trend in the annual-averaged mean durations of sea states over thresholds; (iv) the assessment of the extreme values on a time scale of a thousand years. The analysis uses two methods to obtain samples of extremes from the independent sea states: the r-largest annual maxima and the peak-over-threshold. The two methods show statistical differences in retrieving the return values and, more generally, in describing the significant wave field. The r-largest annual maxima method provides more reliable predictions of the extreme values, especially for small return periods (<100 years). Finally, the study statistically proves the existence of decadal negative trends in the significant wave heights and thereby conveys useful information on the wave climatology of the Italian seas during the second half of the 20th century.
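A sketch of the two extreme-value routes compared in the study, applied to a simulated wave-height series rather than the WAM/ERA-40 hindcast: fit a GEV distribution to annual maxima (the r = 1 case of the r-largest method) and a generalized Pareto distribution to peaks over a threshold, then compare the implied 100-year levels. Declustering of exceedances, which real hindcast data would require, is omitted.

```python
# Sketch of extreme-value estimation for significant wave height: GEV fitted
# to annual maxima and generalized Pareto fitted to threshold exceedances.
# The wave-height series is simulated, not the WAM/ERA-40 hindcast.
import numpy as np
from scipy import stats

rng = np.random.default_rng(12)
years, per_year = 42, 2920                    # ~3-hourly sea states per year
hs = rng.gumbel(loc=2.0, scale=0.6, size=(years, per_year))  # wave heights (m)

# Route 1: annual maxima -> GEV (r = 1 largest value per year).
annual_max = hs.max(axis=1)
c, loc, scale = stats.genextreme.fit(annual_max)
rl_gev = stats.genextreme.ppf(1 - 1 / 100, c, loc, scale)    # 100-yr level

# Route 2: peaks over threshold -> generalized Pareto.
u = np.quantile(hs, 0.995)                    # threshold
exc = hs[hs > u] - u
xi, _, beta = stats.genpareto.fit(exc, floc=0)
rate = exc.size / years                       # exceedances per year
rl_pot = u + stats.genpareto.ppf(1 - 1 / (100 * rate), xi, 0, beta)

print(f"100-year Hs: GEV {rl_gev:.2f} m, POT {rl_pot:.2f} m")
```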
Vahedi, Shahram; Farrokhi, Farahman
2011-01-01
Objective: The aim of this study is to explore the confirmatory factor analysis results of the Persian adaptation of the Statistics Anxiety Measure (SAM), proposed by Earp. Method: The validity and reliability assessments of the scale were performed on 298 college students chosen randomly from Tabriz University in Iran. Confirmatory factor analysis (CFA) was carried out to determine the factor structure of the Persian adaptation of the SAM. Results: As expected, the second-order model provided a better fit to the data than the three alternative models. Conclusions: Hence, the SAM provides an equally valid measure for use among college students. The study both expands and adds support to the existing body of math anxiety literature. PMID:22952530
NASA Astrophysics Data System (ADS)
Lai, Jiawei; Alwazzan, Dana; Chakraborty, Nilanjan
2017-11-01
The statistical behaviour and the modelling of turbulent scalar flux transport have been analysed using a direct numerical simulation (DNS) database of head-on quenching of statistically planar turbulent premixed flames by an isothermal wall. A range of different values of Damköhler, Karlovitz numbers and Lewis numbers has been considered for this analysis. The magnitudes of the turbulent transport and mean velocity gradient terms in the turbulent scalar flux transport equation remain small in comparison to the pressure gradient, molecular dissipation and reaction-velocity fluctuation correlation terms in the turbulent scalar flux transport equation when the flame is away from the wall but the magnitudes of all these terms diminish and assume comparable values during flame quenching before vanishing altogether. It has been found that the existing models for the turbulent transport, pressure gradient, molecular dissipation and reaction-velocity fluctuation correlation terms in the turbulent scalar flux transport equation do not adequately address the respective behaviours extracted from DNS data in the near-wall region during flame quenching. Existing models for transport equation-based closures of turbulent scalar flux have been modified in such a manner that these models provide satisfactory prediction both near to and away from the wall.
Allstadt, Kate E.; Thompson, Eric M.; Hearne, Mike; Nowicki Jessee, M. Anna; Zhu, J.; Wald, David J.; Tanyas, Hakan
2017-01-01
The U.S. Geological Survey (USGS) has made significant progress toward the rapid estimation of shaking and shaking-related losses through their Did You Feel It? (DYFI), ShakeMap, ShakeCast, and PAGER products. However, quantitative estimates of the extent and severity of secondary hazards (e.g., landsliding, liquefaction) are not currently included in scenarios and real-time post-earthquake products despite their significant contributions to hazard and losses for many events worldwide. We are currently running parallel global statistical models for landslides and liquefaction developed with our collaborators in testing mode, but much work remains in order to operationalize these systems. We are expanding our efforts in this area by not only improving the existing statistical models, but also by (1) exploring more sophisticated, physics-based models where feasible; (2) incorporating uncertainties; and (3) identifying and undertaking research and product development to provide useful landslide and liquefaction estimates and their uncertainties. Although our existing models use standard predictor variables that are accessible globally or regionally, including peak ground motions, topographic slope, and distance to water bodies, we continue to explore readily available proxies for rock and soil strength as well as other susceptibility terms. This work is based on the foundation of an expanding, openly available, case-history database we are compiling along with historical ShakeMaps for each event. The expected outcome of our efforts is a robust set of real-time secondary hazards products that meet the needs of a wide variety of earthquake information users. We describe the available datasets and models, developments currently underway, and anticipated products.
Statistics and classification of the microwave zebra patterns associated with solar flares
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tan, Baolin; Tan, Chengming; Zhang, Yin
2014-01-10
The microwave zebra pattern (ZP) is the most interesting, intriguing, and complex spectral structure frequently observed in solar flares. A comprehensive statistical study will certainly help us to understand its formation mechanism, which is not yet fully understood. This work presents a comprehensive statistical analysis of a large sample of 202 ZP events collected from observations with the Chinese Solar Broadband Radio Spectrometer at Huairou and the Ondřejov Radiospectrograph in the Czech Republic at frequencies of 1.00-7.60 GHz from 2000 to 2013. After investigating the parameter properties of ZPs, such as the occurrence in flare phase, frequency range, polarization degree, duration, etc., we find that the variation of zebra stripe frequency separation with respect to frequency is the best indicator for a physical classification of ZPs. Microwave ZPs can be classified into three types: equidistant ZPs, variable-distant ZPs, and growing-distant ZPs, possibly corresponding to the Bernstein wave model, the whistler wave model, and the double plasma resonance model, respectively. This statistical classification may help us to clarify the controversies among the various existing theoretical models and to understand the physical processes in the source regions.
Effects of Non-Normal Outlier-Prone Error Distribution on Kalman Filter Track
1991-09-01
other possibilities exist. For example, the GST (Generic Statistical Tracker) uses four motion models [Ref. 4]. The GST keeps track of both the target... Although this procedure is not easily statistically interpretable, it was used for the sake of comparison with the other...
Syndromic surveillance models using Web data: the case of scarlet fever in the UK.
Samaras, Loukas; García-Barriocanal, Elena; Sicilia, Miguel-Angel
2012-03-01
Recent research has shown the potential of Web queries as a source for syndromic surveillance, and existing studies show that these queries can be used as a basis for estimating and predicting the development of a syndromic disease, such as influenza, using log-linear (logit) statistical models. Two alternative models are applied to the relationship between cases and Web queries in this paper. We examine the applicability of statistical methods for relating search engine queries to scarlet fever cases in the UK, taking advantage of tools to acquire the appropriate data from Google, and using an alternative statistical method based on gamma distributions. The results show that, with logit models, the Pearson correlation coefficient between Web queries and the data obtained from the official agencies must be over 0.90; otherwise the predicted peak and spread of the distributions deviate significantly. In this paper, we describe the gamma distribution model and show that we can obtain better results in all cases using gamma transformations, especially in those with a smaller correlation coefficient.
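The abstract does not give the exact gamma transformation used, so the following is only a minimal sketch of the general idea of relating case counts to query volumes through a gamma-family regression, assuming hypothetical arrays `queries` and `cases`; it uses a Gamma GLM with a log link from statsmodels rather than the authors' specific procedure.

```python
# Hypothetical illustration: relate weekly Web query volumes to weekly case counts
# with a Gamma GLM (log link). Data and model form are illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
queries = rng.gamma(shape=2.0, scale=50.0, size=104)        # weekly query volume (hypothetical)
cases = rng.gamma(shape=2.0, scale=0.05 * queries + 1.0)    # weekly case counts (hypothetical)

X = sm.add_constant(queries)                                # intercept + query volume
gamma_glm = sm.GLM(cases, X, family=sm.families.Gamma(link=sm.families.links.Log()))
fit = gamma_glm.fit()

predicted_cases = fit.predict(X)                            # epidemic curve implied by queries alone
print(fit.params)
print("Pearson correlation (queries vs. cases):", round(np.corrcoef(queries, cases)[0, 1], 2))
```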
Understanding Statistics and Statistics Education: A Chinese Perspective
ERIC Educational Resources Information Center
Shi, Ning-Zhong; He, Xuming; Tao, Jian
2009-01-01
In recent years, statistics education in China has made great strides. However, there still exists a fairly large gap with the advanced levels of statistics education in more developed countries. In this paper, we identify some existing problems in statistics education in Chinese schools and make some proposals as to how they may be overcome. We…
Potential of hydraulically induced fractures to communicate with existing wellbores
NASA Astrophysics Data System (ADS)
Montague, James A.; Pinder, George F.
2015-10-01
The probability that new hydraulically fractured wells drilled within the area of New York underlain by the Marcellus Shale will intersect an existing wellbore is calculated using a statistical model, which incorporates: the depth of a new fracturing well, the vertical growth of induced fractures, and the depths and locations of existing nearby wells. The model first calculates the probability of encountering an existing well in plan view and combines this with the probability of an existing well being at sufficient depth to intersect the fractured region. Average probability estimates for the entire region of New York underlain by the Marcellus Shale range from 0.00% to 3.45% based upon the input parameters used. The parameter with the largest influence on the calculated probability is the local density of wells, meaning that due diligence by oil and gas companies in identifying all nearby wells during construction will have the greatest effect in reducing the probability of interwellbore communication.
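As a rough illustration of how the two probabilities described above combine, the following Monte Carlo sketch multiplies a plan-view encounter probability by the probability that an existing well reaches the fractured interval; every parameter value is hypothetical and not taken from the study.

```python
# Hypothetical Monte Carlo sketch: an induced fracture communicates with an existing
# wellbore only if (i) at least one existing well lies within the fractured footprint
# in plan view and (ii) that well is deep enough to reach the fractured interval.
import numpy as np

rng = np.random.default_rng(1)
n_sim = 100_000

well_density = 0.5                 # existing wells per km^2 (hypothetical)
frac_half_length_km = 0.3          # horizontal reach of induced fractures (hypothetical)
area_km2 = np.pi * frac_half_length_km**2

# (i) Plan view: Poisson chance of at least one existing well inside the fractured area
n_nearby = rng.poisson(well_density * area_km2, size=n_sim)

# (ii) Depth: existing well depth vs. top of the fractured interval (hypothetical distributions)
existing_depth_m = rng.normal(1500, 400, size=n_sim)
fracture_top_m = rng.normal(2000, 150, size=n_sim) - rng.gamma(2.0, 100.0, size=n_sim)

intersect = (n_nearby > 0) & (existing_depth_m >= fracture_top_m)
print(f"Estimated interwellbore communication probability: {intersect.mean():.4%}")
```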
NASA Astrophysics Data System (ADS)
Roşca, S.; Bilaşco, Ş.; Petrea, D.; Fodorean, I.; Vescan, I.; Filip, S.; Măguţ, F.-L.
2015-11-01
The existence of a large number of GIS models for identifying landslide occurrence probability makes the selection of a specific one difficult. The present study focuses on the application of two quantitative models: the logistic and the BSA models. The comparative analysis of the results aims at identifying the most suitable model. The territory corresponding to the Niraj Mic Basin (87 km²) is an area characterised by a wide variety of landforms, with their morphometric, morphographical and geological characteristics, as well as by a high complexity of land use types where active landslides exist. This is the reason why it represents the test area for applying the two models and for comparing the results. The large complexity of input variables is illustrated by 16 factors which were represented as 72 dummy variables, analysed on the basis of their importance within the model structures. Testing the statistical significance of each variable reduced the number of dummy variables to 12 considered significant for the test area within the logistic model, whereas for the BSA model all the variables were employed. The predictive power of the models was tested through the area under the ROC curve, which indicated good accuracy of the logistic model in the test area (AUROC = 0.86) and lower predictability in the validation area (AUROC = 0.63).
Forgetfulness can help you win games.
Burridge, James; Gao, Yu; Mao, Yong
2015-09-01
We present a simple game model where agents with different memory lengths compete for finite resources. We show by simulation and analytically that an instability exists at a critical memory length, and as a result, different memory lengths can compete and coexist in a dynamical equilibrium. Our analytical formulation makes a connection to statistical urn models, and we show that temperature is mirrored by the agent's memory. Our simple model of memory may be incorporated into other game models with implications that we briefly discuss.
NASA Astrophysics Data System (ADS)
Olugboji, T. M.; Lekic, V.; McDonough, W.
2017-07-01
We present a new approach for evaluating existing crustal models using ambient noise data sets and their associated uncertainties. We use a transdimensional hierarchical Bayesian inversion approach to invert ambient noise surface wave phase dispersion maps for Love and Rayleigh waves using measurements obtained from Ekström (2014). Spatiospectral analysis shows that our results are comparable to a linear least squares inverse approach (except at higher harmonic degrees), but the procedure has additional advantages: (1) it yields an autoadaptive parameterization that follows Earth structure without making restricting assumptions on model resolution (regularization or damping) and data errors; (2) it can recover non-Gaussian phase velocity probability distributions while quantifying the sources of uncertainties in the data measurements and modeling procedure; and (3) it enables statistical assessments of different crustal models (e.g., CRUST1.0, LITHO1.0, and NACr14) using variable resolution residual and standard deviation maps estimated from the ensemble. These assessments show that in the stable old crust of the Archean, the misfits are statistically negligible, requiring no significant update to crustal models from the ambient noise data set. In other regions of the U.S., significant updates to regionalization and crustal structure are expected, especially in the shallow sedimentary basins and the tectonically active regions, where the differences between model predictions and data are statistically significant.
A statistical parts-based appearance model of inter-subject variability.
Toews, Matthew; Collins, D Louis; Arbel, Tal
2006-01-01
In this article, we present a general statistical parts-based model for representing the appearance of an image set, applied to the problem of inter-subject MR brain image matching. In contrast with global image representations such as active appearance models, the parts-based model consists of a collection of localized image parts whose appearance, geometry and occurrence frequency are quantified statistically. The parts-based approach explicitly addresses the case where one-to-one correspondence does not exist between subjects due to anatomical differences, as parts are not expected to occur in all subjects. The model can be learned automatically, discovering structures that appear with statistical regularity in a large set of subject images, and can be robustly fit to new images, all in the presence of significant inter-subject variability. As parts are derived from generic scale-invariant features, the framework can be applied in a wide variety of image contexts, in order to study the commonality of anatomical parts or to group subjects according to the parts they share. Experimentation shows that a parts-based model can be learned from a large set of MR brain images, and used to determine parts that are common within the group of subjects. Preliminary results indicate that the model can be used to automatically identify distinctive features for inter-subject image registration despite large changes in appearance.
2017-01-01
Producing predictions of the probabilistic risks of operating materials for given lengths of time at stated operating conditions requires the assimilation of existing deterministic creep life prediction models (that only predict the average failure time) with statistical models that capture the random component of creep. To date, these approaches have rarely been combined to achieve this objective. The first half of this paper therefore provides a summary review of some statistical models to help bridge the gap between these two approaches. The second half of the paper illustrates one possible assimilation using 1Cr1Mo-0.25V steel. The Wilshire equation for creep life prediction is integrated into a discrete hazard-based statistical model, the former chosen because of its novelty and proven capability in accurately predicting average failure times and the latter because of its flexibility in modelling the failure time distribution. Using this model it was found that, for example, if this material had been in operation for around 15 years at 823 K and 130 MPa, the chance of failure in the next year is around 35%. However, if this material had been in operation for around 25 years, the chance of failure in the next year rises dramatically to around 80%. PMID:29039773
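The conditional failure probabilities quoted above follow directly from a discrete hazard function: given survival to year t, the chance of failing in the next year is simply the hazard for that year. The sketch below illustrates this bookkeeping with a hypothetical hazard curve tuned to give magnitudes similar to those quoted; it is not the paper's Wilshire-based model for 1Cr1Mo-0.25V steel.

```python
# Illustrative discrete hazard calculation. The hazard curve is a hypothetical
# increasing function of service years chosen to mimic the magnitudes quoted above,
# not the fitted Wilshire-based model.
import numpy as np

years = np.arange(1, 41)
hazard = 1.0 - np.exp(-(years / 21.8) ** 2.7)     # hypothetical yearly hazard h(t)

# P(survive the first t years) = product of (1 - h) over years 1..t
survival = np.concatenate(([1.0], np.cumprod(1.0 - hazard)))

def next_year_failure(t):
    """P(fail during year t+1 | survived the first t years) = h(t+1)."""
    return hazard[t]                              # 0-based index t corresponds to year t+1

for t in (15, 25):
    print(f"after {t} years: P(survive so far) = {survival[t]:.2f}, "
          f"P(failure in next year) = {next_year_failure(t):.2f}")
```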
NASA Astrophysics Data System (ADS)
Andersson, C. David; Hillgren, J. Mikael; Lindgren, Cecilia; Qian, Weixing; Akfur, Christine; Berg, Lotta; Ekström, Fredrik; Linusson, Anna
2015-03-01
Scientific disciplines such as medicinal and environmental chemistry, pharmacology, and toxicology deal with questions related to the effects small organic compounds exert on biological targets and the compounds' physicochemical properties responsible for these effects. A common strategy in this endeavor is to establish structure-activity relationships (SARs). The aim of this work was to illustrate the benefits of performing a statistical molecular design (SMD) and proper statistical analysis of the molecules' properties before SAR and quantitative structure-activity relationship (QSAR) analysis. Our SMD followed by synthesis yielded a set of inhibitors of the enzyme acetylcholinesterase (AChE) that had very few inherent dependencies between the substructures in the molecules. If such dependencies exist, they cause severe errors in SAR interpretation and predictions by QSAR models, and leave a set of molecules less suitable for future decision-making. In our study, the SAR and QSAR models could show which molecular substructures and physicochemical features were advantageous for AChE inhibition. Finally, the QSAR model was used for the prediction of the inhibition of AChE by an external prediction set of molecules. The accuracy of these predictions was asserted by statistical significance tests and by comparisons to simple but relevant reference models.
Uchino, Bert N.; Bowen, Kimberly; Carlisle, McKenzie; Birmingham, Wendy
2012-01-01
Contemporary models postulate the importance of psychological mechanisms linking perceived and received social support to physical health outcomes. In this review, we examine studies that directly tested the potential psychological mechanisms responsible for links between social support and health-relevant physiological processes (1980s to 2010). Inconsistent with existing theoretical models, no evidence was found that psychological mechanisms such as depression, perceived stress, and other affective processes are directly responsible for links between support and health. We discuss the importance of considering statistical/design issues, emerging conceptual perspectives, and limitations of our existing models for future research aimed at elucidating the psychological mechanisms responsible for links between social support and physical health outcomes. PMID:22326104
A Diffusion Model for Two-sided Service Systems
NASA Astrophysics Data System (ADS)
Homma, Koichi; Yano, Koujin; Funabashi, Motohisa
A diffusion model is proposed for two-sided service systems. ‘Two-sided’ refers to the existence of an economic network effect between two different and interrelated groups, e.g., card holders and merchants in an electronic money service. The service benefit for a member of one side depends on the number and quality of the members on the other side. A mathematical model by J. H. Rohlfs explains the network (or bandwagon) effect of communications services. In Rohlfs' model, only the users' group exists and the model is one-sided. This paper extends Rohlfs' model to a two-sided model. We propose, first, a micro model that explains individual behavior in regard to service subscription of both sides and a computational method that drives the proposed model. Second, we develop macro models with two diffusion-rate variables by simplifying the micro model. As a case study, we apply the models to an electronic money service and discuss the simulation results and actual statistics.
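A minimal quantitative sketch of the two-sided idea is a pair of coupled diffusion equations in which each side's adoption rate depends on the other side's current penetration. The system below is a loose, hypothetical simplification of the macro model described above (parameters and functional form are illustrative), integrated with SciPy.

```python
# Two-sided diffusion sketch: adoption on each side (e.g., card holders and merchants)
# grows with the other side's penetration and shrinks with churn. All parameters are
# hypothetical; this only illustrates the cross-side network effect.
import numpy as np
from scipy.integrate import solve_ivp

a1, a2 = 0.8, 0.6        # responsiveness of each side to the other side's penetration
d1, d2 = 0.05, 0.05      # churn (abandonment) rates

def two_sided(t, y):
    x1, x2 = y           # diffusion rates: adopted fractions of each potential population
    dx1 = a1 * x2 * (1.0 - x1) - d1 * x1
    dx2 = a2 * x1 * (1.0 - x2) - d2 * x2
    return [dx1, dx2]

sol = solve_ivp(two_sided, (0.0, 60.0), y0=[0.02, 0.01], dense_output=True)
for ti in np.linspace(0.0, 60.0, 7):
    x1, x2 = sol.sol(ti)
    print(f"t={ti:4.0f}  side 1 penetration={x1:.2f}  side 2 penetration={x2:.2f}")
```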
An integrated logit model for contamination event detection in water distribution systems.
Housh, Mashor; Ostfeld, Avi
2015-05-15
The problem of contamination event detection in water distribution systems has become one of the most challenging research topics in water distribution systems analysis. Current approaches to event detection utilize a variety of methods including statistical, heuristic, machine learning, and optimization techniques. Several existing event detection systems share a common feature in which alarms are obtained separately for each of the water quality indicators. Unifying those single alarms from different indicators is usually performed by means of simple heuristics. A salient feature of the approach developed here is the use of a statistically oriented model for discrete choice prediction, estimated using the maximum likelihood method, for integrating the single alarms. The discrete choice model is jointly calibrated with the other components of the event detection system framework on a training data set using genetic algorithms. The process of fusing the individual indicator probabilities, which is overlooked in many existing event detection system models, is confirmed to be a crucial part of the system and can be modelled with a discrete choice model to improve performance. The developed methodology is tested on real water quality data, showing improved performance in decreasing the number of false positive alarms and in its ability to detect events with higher probabilities, compared to previous studies. Copyright © 2015 Elsevier Ltd. All rights reserved.
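The fusion step described above can be pictured with a small sketch: per-indicator alarm probabilities are combined into a single event probability by a logistic (discrete choice) model. The data, indicator behaviour and threshold below are hypothetical, and the model is fit on its own by maximum likelihood rather than jointly calibrated with the rest of the detection system via genetic algorithms as in the paper.

```python
# Hypothetical sketch of fusing single-indicator alarm probabilities (e.g., chlorine,
# turbidity, pH detectors) into one event probability with a logistic model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 2000
event = rng.binomial(1, 0.1, size=n)                      # true contamination events (hypothetical)
# Per-indicator alarm probabilities: informative but noisy (hypothetical)
p_indicators = np.clip(0.2 + 0.6 * event[:, None] + rng.normal(0, 0.25, size=(n, 3)), 0, 1)

fusion = LogisticRegression().fit(p_indicators, event)
fused_prob = fusion.predict_proba(p_indicators)[:, 1]

alarms = fused_prob > 0.5
print(f"false positive rate = {alarms[event == 0].mean():.3f}, "
      f"detection rate = {alarms[event == 1].mean():.3f}")
```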
Chen, Yong
2017-01-01
The expansion of shell disease is an emerging threat to the inshore lobster fisheries in the northeastern United States. The development of models to improve the efficiency and precision of existing monitoring programs is advocated as an important step in mitigating its harmful effects. The objective of this study is to construct a statistical model that could enhance the existing monitoring effort through (1) identification of potential disease-associated abiotic and biotic factors, and (2) estimation of spatial variation in disease prevalence in the lobster fishery. A delta-generalized additive modeling (GAM) approach was applied using bottom trawl survey data collected from 2001–2013 in Long Island Sound, a tidal estuary between New York and Connecticut states. Spatial distribution of shell disease prevalence was found to be strongly influenced by the interactive effects of latitude and longitude, possibly indicative of a geographic origin of shell disease. Bottom temperature, bottom salinity, and depth were also important factors affecting the spatial variability in shell disease prevalence. The delta-GAM projected high disease prevalence in non-surveyed locations. Additionally, a potential spatial discrepancy was found between modeled disease hotspots and survey-based gravity centers of disease prevalence. This study provides a modeling framework to enhance research, monitoring and management of emerging and continuing marine disease threats. PMID:28196150
Statistical physics of the symmetric group.
Williams, Mobolaji
2017-04-01
Ordered chains (such as chains of amino acids) are ubiquitous in biological cells, and these chains perform specific functions contingent on the sequence of their components. Using the existence and general properties of such sequences as a theoretical motivation, we study the statistical physics of systems whose state space is defined by the possible permutations of an ordered list, i.e., the symmetric group, and whose energy is a function of how certain permutations deviate from some chosen correct ordering. Such a nonfactorizable state space is quite different from the state spaces typically considered in statistical physics systems and consequently has novel behavior in systems with interacting and even noninteracting Hamiltonians. Various parameter choices of a mean-field model reveal the system to contain five different physical regimes defined by two transition temperatures, a triple point, and a quadruple point. Finally, we conclude by discussing how the general analysis can be extended to state spaces with more complex combinatorial properties and to other standard questions of statistical mechanics models.
Ezard, Thomas H.G.; Jørgensen, Peter S.; Zimmerman, Naupaka; Chamberlain, Scott; Salguero-Gómez, Roberto; Curran, Timothy J.; Poisot, Timothée
2014-01-01
Proficiency in mathematics and statistics is essential to modern ecological science, yet few studies have assessed the level of quantitative training received by ecologists. To do so, we conducted an online survey. The 937 respondents were mostly early-career scientists who studied biology as undergraduates. We found a clear self-perceived lack of quantitative training: 75% were not satisfied with their understanding of mathematical models; 75% felt that the level of mathematics was “too low” in their ecology classes; 90% wanted more mathematics classes for ecologists; and 95% more statistics classes. Respondents thought that 30% of classes in ecology-related degrees should be focused on quantitative disciplines, which is likely higher than for most existing programs. The main suggestion to improve quantitative training was to relate theoretical and statistical modeling to applied ecological problems. Improving quantitative training will require dedicated, quantitative classes for ecology-related degrees that contain good mathematical and statistical practice. PMID:24688862
Anyonic braiding in optical lattices
Zhang, Chuanwei; Scarola, V. W.; Tewari, Sumanta; Das Sarma, S.
2007-01-01
Topological quantum states of matter, both Abelian and non-Abelian, are characterized by excitations whose wavefunctions undergo nontrivial statistical transformations as one excitation is moved (braided) around another. Topological quantum computation proposes to use the topological protection and the braiding statistics of a non-Abelian topological state to perform quantum computation. The enormous technological prospect of topological quantum computation provides new motivation for experimentally observing a topological state. Here, we explicitly work out a realistic experimental scheme to create and braid the Abelian topological excitations in the Kitaev model built on a tunable robust system, a cold atom optical lattice. We also demonstrate how to detect the key feature of these excitations: their braiding statistics. Observation of this statistics would directly establish the existence of anyons, quantum particles that are neither fermions nor bosons. In addition to establishing topological matter, the experimental scheme we develop here can also be adapted to a non-Abelian topological state, supported by the same Kitaev model but in a different parameter regime, to eventually build topologically protected quantum gates. PMID:18000038
Nonlinear multi-analysis of agent-based financial market dynamics by epidemic system
NASA Astrophysics Data System (ADS)
Lu, Yunfan; Wang, Jun; Niu, Hongli
2015-10-01
Based on the epidemic dynamical system, we construct a new agent-based financial time series model. In order to check and verify its rationality, we compare the statistical properties of the time series model with the real stock market indices, Shanghai Stock Exchange Composite Index and Shenzhen Stock Exchange Component Index. For analyzing the statistical properties, we combine the multi-parameter analysis with the tail distribution analysis, the modified rescaled range analysis, and the multifractal detrended fluctuation analysis. For a better perspective, the three-dimensional diagrams are used to present the analysis results. The empirical research in this paper indicates that the long-range dependence property and the multifractal phenomenon exist in the real returns and the proposed model. Therefore, the new agent-based financial model can reproduce some important features of real stock markets.
On an Additive Semigraphoid Model for Statistical Networks With Application to Pathway Analysis.
Li, Bing; Chun, Hyonho; Zhao, Hongyu
2014-09-01
We introduce a nonparametric method for estimating non-Gaussian graphical models based on a new statistical relation called additive conditional independence, which is a three-way relation among random vectors that resembles the logical structure of conditional independence. Additive conditional independence allows us to use a one-dimensional kernel regardless of the dimension of the graph, which not only avoids the curse of dimensionality but also simplifies computation. It also gives rise to a parallel structure to the Gaussian graphical model that replaces the precision matrix by an additive precision operator. The estimators derived from additive conditional independence cover the recently introduced nonparanormal graphical model as a special case, but outperform it when the Gaussian copula assumption is violated. We compare the new method with existing ones by simulations and in genetic pathway analysis.
An Interinstitutional Analysis of Faculty Teaching Load.
ERIC Educational Resources Information Center
Ahrens, Stephen W.
A two-year interinstitutional study among 15 cooperating universities was conducted to determine whether significant differences exist in teaching loads among the selected universities as measured by student credit hours produced by full-time equivalent faculty. The statistical model was a multivariate analysis of variance with fixed effects and…
ERIC Educational Resources Information Center
Price, Thomas S.; Jaffee, Sara R.
2008-01-01
The classical twin study provides a useful resource for testing hypotheses about how the family environment influences children's development, including how genes can influence sensitivity to environmental effects. However, existing statistical models do not account for the possibility that children can inherit exposure to family environments…
Principles and Practice of Scaled Difference Chi-Square Testing
ERIC Educational Resources Information Center
Bryant, Fred B.; Satorra, Albert
2012-01-01
We highlight critical conceptual and statistical issues and how to resolve them in conducting Satorra-Bentler (SB) scaled difference chi-square tests. Concerning the original (Satorra & Bentler, 2001) and new (Satorra & Bentler, 2010) scaled difference tests, a fundamental difference exists in how to compute properly a model's scaling correction…
Canary, Jana D; Blizzard, Leigh; Barry, Ronald P; Hosmer, David W; Quinn, Stephen J
2016-05-01
Generalized linear models (GLM) with a canonical logit link function are the primary modeling technique used to relate a binary outcome to predictor variables. However, noncanonical links can offer more flexibility, producing convenient analytical quantities (e.g., probit GLMs in toxicology) and desired measures of effect (e.g., relative risk from log GLMs). Many summary goodness-of-fit (GOF) statistics exist for the logistic GLM. Their properties make the development of GOF statistics relatively straightforward, but it can be more difficult under noncanonical links. Although GOF tests for logistic GLMs with continuous covariates (GLMCC) have been applied to GLMCCs with log links, we know of no GOF tests in the literature specifically developed for GLMCCs that can be applied regardless of the link function chosen. We generalize the Tsiatis GOF statistic (TG), originally developed for logistic GLMCCs, so that it can be applied under any link function. Further, we show that the algebraically related Hosmer-Lemeshow (HL) and Pigeon-Heyse (J(2)) statistics can be applied directly. In a simulation study, TG, HL, and J(2) were used to evaluate the fit of probit, log-log, complementary log-log, and log models, all calculated with a common grouping method. The TG statistic consistently maintained Type I error rates, while those of HL and J(2) were often lower than expected if terms with little influence were included. Generally, the statistics had similar power to detect an incorrect model. An exception occurred when a log GLMCC was incorrectly fit to data generated from a logistic GLMCC. In this case, TG had more power than HL or J(2). © 2015 John Wiley & Sons Ltd/London School of Economics.
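To make the grouping idea concrete, the sketch below computes a Hosmer-Lemeshow-style grouped chi-square statistic for a probit (noncanonical link) model on simulated data. It illustrates the common decile-of-risk grouping only; it is not the generalized Tsiatis (TG) statistic developed in the paper, and the data are purely synthetic.

```python
# Grouped goodness-of-fit check (Hosmer-Lemeshow style) for a binary model with a
# noncanonical probit link, on synthetic data. Illustrative only; not the TG statistic.
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=1000)
y = rng.binomial(1, stats.norm.cdf(-0.3 + 0.8 * x))      # data generated from a probit model

X = sm.add_constant(x)
probit_fit = sm.Probit(y, X).fit(disp=False)
p_hat = probit_fit.predict(X)

# Group observations into deciles of fitted probability
edges = np.quantile(p_hat, np.linspace(0.1, 0.9, 9))
groups = np.digitize(p_hat, edges)
obs = np.array([y[groups == g].sum() for g in range(10)])
exp = np.array([p_hat[groups == g].sum() for g in range(10)])
n_g = np.array([(groups == g).sum() for g in range(10)])

hl_stat = np.sum((obs - exp) ** 2 / (exp * (1.0 - exp / n_g)))
p_value = stats.chi2.sf(hl_stat, df=10 - 2)               # conventional g - 2 degrees of freedom
print(f"HL-type statistic = {hl_stat:.2f}, p = {p_value:.3f}")
```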
A powerful score-based test statistic for detecting gene-gene co-association.
Xu, Jing; Yuan, Zhongshang; Ji, Jiadong; Zhang, Xiaoshuai; Li, Hongkai; Wu, Xuesen; Xue, Fuzhong; Liu, Yanxun
2016-01-29
The genetic variants identified by Genome-wide association study (GWAS) can only account for a small proportion of the total heritability for complex disease. The existence of gene-gene joint effects, which contain the main effects and their co-association, is one of the possible explanations for the "missing heritability" problem. Gene-gene co-association refers to the extent to which the joint effects of two genes differ from the main effects, not only due to the traditional interaction under nearly independent conditions but also due to the correlation between genes. Generally, genes tend to work collaboratively within specific pathways or networks contributing to the disease, and the specific disease-associated loci will often be highly correlated (e.g. single nucleotide polymorphisms (SNPs) in linkage disequilibrium). Therefore, we proposed a novel score-based statistic (SBS) as a gene-based method for detecting gene-gene co-association. Various simulations illustrate that, under different sample sizes, marginal effects of causal SNPs and co-association levels, the proposed SBS has better performance than other existing methods, including single-SNP-based and principal component analysis (PCA)-based logistic regression models, the statistics based on canonical correlations (CCU), kernel canonical correlation analysis (KCCU), partial least squares path modeling (PLSPM) and the delta-square (δ²) statistic. The real data analysis of rheumatoid arthritis (RA) further confirmed its advantages in practice. SBS is a powerful and efficient gene-based method for detecting gene-gene co-association.
DIMM-SC: a Dirichlet mixture model for clustering droplet-based single cell transcriptomic data.
Sun, Zhe; Wang, Ting; Deng, Ke; Wang, Xiao-Feng; Lafyatis, Robert; Ding, Ying; Hu, Ming; Chen, Wei
2018-01-01
Single cell transcriptome sequencing (scRNA-Seq) has become a revolutionary tool to study cellular and molecular processes at single cell resolution. Among existing technologies, the recently developed droplet-based platform enables efficient parallel processing of thousands of single cells with direct counting of transcript copies using Unique Molecular Identifier (UMI). Despite the technology advances, statistical methods and computational tools are still lacking for analyzing droplet-based scRNA-Seq data. Particularly, model-based approaches for clustering large-scale single cell transcriptomic data are still under-explored. We developed DIMM-SC, a Dirichlet Mixture Model for clustering droplet-based Single Cell transcriptomic data. This approach explicitly models UMI count data from scRNA-Seq experiments and characterizes variations across different cell clusters via a Dirichlet mixture prior. We performed comprehensive simulations to evaluate DIMM-SC and compared it with existing clustering methods such as K-means, CellTree and Seurat. In addition, we analyzed public scRNA-Seq datasets with known cluster labels and in-house scRNA-Seq datasets from a study of systemic sclerosis with prior biological knowledge to benchmark and validate DIMM-SC. Both simulation studies and real data applications demonstrated that overall, DIMM-SC achieves substantially improved clustering accuracy and much lower clustering variability compared to other existing clustering methods. More importantly, as a model-based approach, DIMM-SC is able to quantify the clustering uncertainty for each single cell, facilitating rigorous statistical inference and biological interpretations, which are typically unavailable from existing clustering methods. DIMM-SC has been implemented in a user-friendly R package with a detailed tutorial available on www.pitt.edu/∼wec47/singlecell.html. wei.chen@chp.edu or hum@ccf.org. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
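The core of a Dirichlet-multinomial clustering model can be sketched in a few lines: a cell's UMI count vector is scored against cluster-specific Dirichlet parameters and assigned to the highest-likelihood cluster. The parameters and data below are hypothetical, and this hard assignment step only hints at what DIMM-SC does (which also estimates the parameters and produces soft, uncertainty-quantified assignments).

```python
# Sketch of the Dirichlet-multinomial likelihood used for model-based clustering of
# UMI counts. Cluster parameters (alphas) are hypothetical; DIMM-SC estimates them.
import numpy as np
from scipy.special import gammaln

def dirichlet_multinomial_loglik(counts, alpha):
    """log P(counts | alpha) under the Dirichlet-multinomial distribution,
    omitting the multinomial coefficient (constant across clusters for a given cell)."""
    n, a0 = counts.sum(), alpha.sum()
    return gammaln(a0) - gammaln(a0 + n) + np.sum(gammaln(alpha + counts) - gammaln(alpha))

rng = np.random.default_rng(4)
n_genes = 50
alphas = np.stack([rng.uniform(0.1, 2.0, n_genes),        # two hypothetical cluster profiles
                   rng.uniform(0.1, 2.0, n_genes)])

cell = rng.multinomial(1500, rng.dirichlet(alphas[1]))    # a cell drawn from cluster 1
scores = [dirichlet_multinomial_loglik(cell, a) for a in alphas]
print("log-likelihood per cluster:", np.round(scores, 1), "-> assign to cluster", int(np.argmax(scores)))
```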
Statistical distributions of avalanche size and waiting times in an inter-sandpile cascade model
NASA Astrophysics Data System (ADS)
Batac, Rene; Longjas, Anthony; Monterola, Christopher
2012-02-01
Sandpile-based models have successfully shed light on key features of nonlinear relaxational processes in nature, particularly the occurrence of fat-tailed magnitude distributions and exponential return times, from simple local stress redistributions. In this work, we extend the existing sandpile paradigm into an inter-sandpile cascade, wherein the avalanches emanating from a uniformly-driven sandpile (first layer) are used to trigger the next (second layer), and so on, in a successive fashion. Statistical characterizations reveal that avalanche size distributions evolve from a power-law p(S) ≈ S^(-1.3) for the first layer to gamma distributions p(S) ≈ S^α exp(-S/S_0) for layers far away from the uniformly driven sandpile. The resulting avalanche size statistics is found to be associated with the corresponding waiting time distribution, as explained in an accompanying analytic formulation. Interestingly, both the numerical and analytic models show good agreement with actual inventories of non-uniformly driven events in nature.
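A compact simulation conveys the cascade construction: a uniformly driven sandpile relaxes, and the activity it sheds drives a second sandpile. In the sketch below the lattice size, the feeding rule (grains shed off layer 1 are dropped onto random sites of layer 2) and the run length are illustrative choices, not the exact construction of the paper.

```python
# Two-layer inter-sandpile cascade sketch with BTW-type toppling (threshold 4).
import numpy as np

rng = np.random.default_rng(5)
L, THRESH = 20, 4

def relax(grid):
    """Topple until stable; return avalanche size and number of grains shed off-lattice."""
    size, shed = 0, 0
    while True:
        over = np.argwhere(grid >= THRESH)
        if len(over) == 0:
            return size, shed
        for i, j in over:
            grid[i, j] -= 4
            size += 1
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < L and 0 <= nj < L:
                    grid[ni, nj] += 1
                else:
                    shed += 1            # grain leaves the lattice

layer1, layer2 = np.zeros((L, L), int), np.zeros((L, L), int)
sizes1, sizes2 = [], []
for _ in range(20000):
    i, j = rng.integers(L, size=2)
    layer1[i, j] += 1                    # uniform driving of the first layer
    s1, shed = relax(layer1)
    for _ in range(shed):                # cascade: shed grains drive the second layer
        i, j = rng.integers(L, size=2)
        layer2[i, j] += 1
    s2, _ = relax(layer2)
    sizes1.append(s1)
    sizes2.append(s2)

print("mean avalanche size, layer 1:", round(float(np.mean(sizes1)), 2))
print("mean avalanche size, layer 2:", round(float(np.mean(sizes2)), 2))
```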
Statistical inference to advance network models in epidemiology.
Welch, David; Bansal, Shweta; Hunter, David R
2011-03-01
Contact networks are playing an increasingly important role in the study of epidemiology. Most of the existing work in this area has focused on considering the effect of underlying network structure on epidemic dynamics by using tools from probability theory and computer simulation. This work has provided much insight on the role that heterogeneity in host contact patterns plays on infectious disease dynamics. Despite the important understanding afforded by the probability and simulation paradigm, this approach does not directly address important questions about the structure of contact networks such as what is the best network model for a particular mode of disease transmission, how parameter values of a given model should be estimated, or how precisely the data allow us to estimate these parameter values. We argue that these questions are best answered within a statistical framework and discuss the role of statistical inference in estimating contact networks from epidemiological data. Copyright © 2011 Elsevier B.V. All rights reserved.
Lin, Feng-Chang; Zhu, Jun
2012-01-01
We develop continuous-time models for the analysis of environmental or ecological monitoring data such that subjects are observed at multiple monitoring time points across space. Of particular interest are additive hazards regression models where the baseline hazard function can take on flexible forms. We consider time-varying covariates and take into account spatial dependence via autoregression in space and time. We develop statistical inference for the regression coefficients via partial likelihood. Asymptotic properties, including consistency and asymptotic normality, are established for parameter estimates under suitable regularity conditions. Feasible algorithms utilizing existing statistical software packages are developed for computation. We also consider a simpler additive hazards model with homogeneous baseline hazard and develop hypothesis testing for homogeneity. A simulation study demonstrates that the statistical inference using partial likelihood has sound finite-sample properties and offers a viable alternative to maximum likelihood estimation. For illustration, we analyze data from an ecological study that monitors bark beetle colonization of red pines in a plantation of Wisconsin.
Langevin modelling of high-frequency Hang-Seng index data
NASA Astrophysics Data System (ADS)
Tang, Lei-Han
2003-06-01
Accurate statistical characterization of financial time series, such as compound stock indices, foreign currency exchange rates, etc., is fundamental to investment risk management, pricing of derivative products and financial decision making. Traditionally, such data were analyzed and modeled from a purely statistical point of view, with little concern for the specifics of financial markets. Increasingly, however, attention has been paid to the underlying economic forces and the collective behavior of investors. Here we summarize a novel approach to the statistical modeling of a major stock index (the Hang Seng index). Based on mathematical results previously derived in the fluid turbulence literature, we show that a Langevin equation with a variable noise amplitude correctly reproduces the ubiquitous fat tails in the probability distribution of intra-day price moves. The form of the Langevin equation suggests that, despite the extremely complex nature of financial concerns and investment strategies at the individual's level, there exist simple universal rules governing the high-frequency price move in a stock market.
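The qualitative point, that a state-dependent noise amplitude alone can generate fat-tailed price moves, can be reproduced with a few lines of Euler-Maruyama integration. The drift and noise functions below are hypothetical choices for illustration, not the fitted Hang Seng model.

```python
# Euler-Maruyama sketch of a Langevin equation with a variable noise amplitude.
# The coefficients are illustrative; the point is the excess kurtosis (fat tails).
import numpy as np

rng = np.random.default_rng(6)
dt, n_steps = 1e-3, 500_000
x = np.zeros(n_steps)                                   # de-trended log-index variable

for t in range(1, n_steps):
    drift = -2.0 * x[t - 1]                             # restoring force toward the mean
    amp = 0.5 * np.sqrt(1.0 + 4.0 * x[t - 1] ** 2)      # noise amplitude grows with |x|
    x[t] = x[t - 1] + drift * dt + amp * np.sqrt(dt) * rng.standard_normal()

moves = np.diff(x[::100])                               # "price moves" at a coarser sampling interval
excess_kurtosis = np.mean((moves - moves.mean()) ** 4) / np.var(moves) ** 2 - 3.0
print(f"excess kurtosis of simulated moves: {excess_kurtosis:.2f} (0 for a Gaussian)")
```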
Solar granulation and statistical crystallography: A modeling approach using size-shape relations
NASA Technical Reports Server (NTRS)
Noever, D. A.
1994-01-01
The irregular polygonal pattern of solar granulation is analyzed for size-shape relations using statistical crystallography. In contrast to previous work which has assumed perfectly hexagonal patterns for granulation, more realistic accounting of cell (granule) shapes reveals a broader basis for quantitative analysis. Several features emerge as noteworthy: (1) a linear correlation between the number of cell-sides and neighboring shapes (called Aboav-Weaire's law); (2) a linear correlation between both average cell area and perimeter and the number of cell-sides (called Lewis's law and a perimeter law, respectively); and (3) a linear correlation between cell area and squared perimeter (called convolution index). This statistical picture of granulation is consistent with a finding of no correlation in cell shapes beyond nearest neighbors. A comparative calculation between existing model predictions taken from luminosity data and the present analysis shows substantial agreement for cell-size distributions. A model for understanding grain lifetimes is proposed which links convective times to cell shape using crystallographic results.
Statistical surrogate models for prediction of high-consequence climate change.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Constantine, Paul; Field, Richard V., Jr.; Boslough, Mark Bruce Elrick
2011-09-01
In safety engineering, performance metrics are defined using probabilistic risk assessments focused on the low-probability, high-consequence tail of the distribution of possible events, as opposed to best estimates based on central tendencies. We frame the climate change problem and its associated risks in a similar manner. To properly explore the tails of the distribution requires extensive sampling, which is not possible with existing coupled atmospheric models due to the high computational cost of each simulation. We therefore propose the use of specialized statistical surrogate models (SSMs) for the purpose of exploring the probability law of various climate variables of interest. An SSM is different from a deterministic surrogate model in that it represents each climate variable of interest as a space/time random field. The SSM can be calibrated to available spatial and temporal data from existing climate databases, e.g., the Program for Climate Model Diagnosis and Intercomparison (PCMDI), or to a collection of outputs from a General Circulation Model (GCM), e.g., the Community Earth System Model (CESM) and its predecessors. Because of its reduced size and complexity, the realization of a large number of independent model outputs from an SSM becomes computationally straightforward, so that quantifying the risk associated with low-probability, high-consequence climate events becomes feasible. A Bayesian framework is developed to provide quantitative measures of confidence, via Bayesian credible intervals, in the use of the proposed approach to assess these risks.
Three-state combinatorial switch models as applied to the binding of oxygen by human hemoglobin.
Straume, M; Johnson, M L
1988-02-23
We have generated a series of all 6561 unique, discrete three-state combinatorial switch models to describe the partitioning of the cooperative oxygen-binding free energy change among the 10 variously ligated forms of human hemoglobin tetramers. These models were inspired by the experimental observation of Smith and Ackers that the cooperative free energy of the intersubunit contact regions of the 10 possible ligated forms of human hemoglobin tetramers can be represented by a particular distribution of three distinct energy levels [Smith, F. R., & Ackers, G. K. (1985) Proc. Natl. Acad. Sci. U.S.A. 82, 5347-5351]. A statistical thermodynamic formulation accounting for both dimer-tetramer equilibria and ligand binding properties of hemoglobin solutions as a function of oxygen and protein concentrations was utilized to exhaustively test these thermodynamic models. In this series of models each of the 10 ligated forms of the hemoglobin tetramer can exist in one, and only one, of three possible energy levels; i.e., each ligated form was assumed to be associated with a discrete energy state. This series of models includes all possible ways that the 10 ligation states of hemoglobin can be distributed into three distinct cooperative energy levels. The mathematical models, as presented here, do not permit equilibria between energy states to exist for any of the 10 unique ligated forms of hemoglobin tetramers. These models were analyzed by nonlinear least-squares estimation of the free energy parameters characteristic of this statistical thermodynamic development. (ABSTRACT TRUNCATED AT 250 WORDS)
Vehicle track segmentation using higher order random fields
Quach, Tu-Thach
2017-01-09
Here, we present an approach to segment vehicle tracks in coherent change detection images, a product of combining two synthetic aperture radar images taken at different times. The approach uses multiscale higher order random field models to capture track statistics, such as curvatures and their parallel nature, that are not currently utilized in existing methods. These statistics are encoded as 3-by-3 patterns at different scales. The model can complete disconnected tracks often caused by sensor noise and various environmental effects. Coupling the model with a simple classifier, our approach is effective at segmenting salient tracks. We improve the F-measure on a standard vehicle track data set to 0.963, up from 0.897 obtained by the current state-of-the-art method.
Statistical Methods Applied to Gamma-ray Spectroscopy Algorithms in Nuclear Security Missions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fagan, Deborah K.; Robinson, Sean M.; Runkle, Robert C.
2012-10-01
In a wide range of nuclear security missions, gamma-ray spectroscopy is a critical research and development priority. One particularly relevant challenge is the interdiction of special nuclear material, for which gamma-ray spectroscopy supports the goals of detecting and identifying gamma-ray sources. This manuscript examines the existing set of spectroscopy methods, attempts to categorize them by the statistical methods on which they rely, and identifies methods that have yet to be considered. Our examination shows that current methods effectively estimate the effect of counting uncertainty but in many cases do not address larger sources of decision uncertainty, ones that are significantly more complex. We thus explore the premise that significantly improving algorithm performance requires greater coupling between the problem physics that drives data acquisition and the statistical methods that analyze such data. Untapped statistical methods, such as Bayesian model averaging and hierarchical and empirical Bayes methods, have the potential to reduce decision uncertainty by more rigorously and comprehensively incorporating all sources of uncertainty. We expect that application of such methods will demonstrate progress in meeting the needs of nuclear security missions by improving on the existing numerical infrastructure for which these analyses have not been conducted.
Benefit-cost estimation for alternative drinking water maximum contaminant levels
NASA Astrophysics Data System (ADS)
Gurian, Patrick L.; Small, Mitchell J.; Lockwood, John R.; Schervish, Mark J.
2001-08-01
A simulation model for estimating compliance behavior and resulting costs at U.S. Community Water Suppliers is developed and applied to the evaluation of a more stringent maximum contaminant level (MCL) for arsenic. Probability distributions of source water arsenic concentrations are simulated using a statistical model conditioned on system location (state) and source water type (surface water or groundwater). This model is fit to two recent national surveys of source waters, then applied with the model explanatory variables for the population of U.S. Community Water Suppliers. Existing treatment types and arsenic removal efficiencies are also simulated. Utilities with finished water arsenic concentrations above the proposed MCL are assumed to select the least cost option compatible with their existing treatment from among 21 available compliance strategies and processes for meeting the standard. Estimated costs and arsenic exposure reductions at individual suppliers are aggregated to estimate the national compliance cost, arsenic exposure reduction, and resulting bladder cancer risk reduction. Uncertainties in the estimates are characterized based on uncertainties in the occurrence model parameters, existing treatment types, treatment removal efficiencies, costs, and the bladder cancer dose-response function for arsenic.
Development of Accommodation Models for Soldiers in Vehicles: Squad
2014-09-01
Data from a previous study of Soldier posture and position were analyzed to develop statistical... range of seat height and seat back angle. All of the models include the effects of body armor and body borne gear.
A statistical method to estimate low-energy hadronic cross sections
NASA Astrophysics Data System (ADS)
Balassa, Gábor; Kovács, Péter; Wolf, György
2018-02-01
In this article we propose a model based on the Statistical Bootstrap approach to estimate the cross sections of different hadronic reactions up to a few GeV in c.m.s. energy. The method is based on the idea that, when two particles collide, a so-called fireball is formed, which after a short time decays statistically into a specific final state. To calculate the probabilities we use a phase space description extended with quark combinatorial factors and the possibility of more than one fireball formation. In a few simple cases the probability of a specific final state can be calculated analytically, and we show that the model is able to reproduce the ratios of the considered cross sections. We also show that the model is able to describe proton-antiproton annihilation at rest. In the latter case we used a numerical method to calculate the more complicated final state probabilities. Additionally, we examined the formation of strange and charmed mesons as well, where we used existing data to fit the relevant model parameters.
Heterogeneous Structure of Stem Cells Dynamics: Statistical Models and Quantitative Predictions
Bogdan, Paul; Deasy, Bridget M.; Gharaibeh, Burhan; Roehrs, Timo; Marculescu, Radu
2014-01-01
Understanding stem cell (SC) population dynamics is essential for developing models that can be used in basic science and medicine, to aid in predicting cell fate. These models can be used as tools, e.g., in studying patho-physiological events at the cellular and tissue level, predicting (mal)functions along the developmental course, and personalized regenerative medicine. Using time-lapsed imaging and statistical tools, we show that the dynamics of SC populations involve a heterogeneous structure consisting of multiple sub-population behaviors. Using non-Gaussian statistical approaches, we identify the co-existence of fast- and slow-dividing subpopulations, and quiescent cells, in stem cells from three species. The mathematical analysis also shows that, instead of developing independently, SCs exhibit a time-dependent fractal behavior as they interact with each other through molecular and tactile signals. These findings suggest that more sophisticated models of SC dynamics should view SC populations as a collective and avoid the simplifying homogeneity assumption by accounting for the presence of more than one dividing sub-population, and their multi-fractal characteristics. PMID:24769917
Stochastic Models for Precipitable Water in Convection
NASA Astrophysics Data System (ADS)
Leung, Kimberly
Atmospheric precipitable water vapor (PWV) is the amount of water vapor in the atmosphere within a vertical column of unit cross-sectional area and is a critically important parameter of precipitation processes. However, accurate high-frequency and long-term observations of PWV in the sky were impossible until the availability of modern instruments such as radar. The United States Department of Energy (DOE)'s Atmospheric Radiation Measurement (ARM) Program facility made the first systematic and high-resolution observations of PWV at Darwin, Australia since 2002. At a resolution of 20 seconds, this time series allowed us to examine the volatility of PWV, including fractal behavior with dimension equal to 1.9, higher than the Brownian motion dimension of 1.5. Such strong fractal behavior calls for stochastic differential equation modeling in an attempt to address some of the difficulties of convective parameterization in various kinds of climate models, ranging from general circulation models (GCM) to weather research forecasting (WRF) models. This important observed data at high resolution can capture the fractal behavior of PWV and enables stochastic exploration into the next generation of climate models which considers scales from micrometers to thousands of kilometers. As a first step, this thesis explores a simple stochastic differential equation model of water mass balance for PWV and assesses accuracy, robustness, and sensitivity of the stochastic model. A 1000-day simulation allows for the determination of the best-fitting 25-day period as compared to data from the TWP-ICE field campaign conducted out of Darwin, Australia in early 2006. The observed data and this portion of the simulation had a correlation coefficient of 0.6513 and followed similar statistics and low-resolution temporal trends. Building on the point model foundation, a similar algorithm was applied to the National Center for Atmospheric Research (NCAR)'s existing single-column model as a test-of-concept for eventual inclusion in a general circulation model. The stochastic scheme was designed to be coupled with the deterministic single-column simulation by modifying results of the existing convective scheme (Zhang-McFarlane) and was able to produce a 20-second resolution time series that effectively simulated observed PWV, as measured by correlation coefficient (0.5510), fractal dimension (1.9), statistics, and visual examination of temporal trends. Results indicate that simulation of a highly volatile time series of observed PWV is certainly achievable and has potential to improve prediction capabilities in climate modeling. Further, this study demonstrates the feasibility of adding a mathematics- and statistics-based stochastic scheme to an existing deterministic parameterization to simulate observed fractal behavior.
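As a toy version of the point-model step described above, the sketch below integrates a stochastic water-balance equation for PWV with Euler-Maruyama: relaxation toward a background moisture value plus a noise term standing in for unresolved convective moistening and drying. The time step matches the 20-second ARM resolution, but the relaxation time, background value and noise level are hypothetical rather than the calibrated TWP-ICE values.

```python
# Minimal stochastic water-balance sketch for precipitable water vapor (PWV).
# Parameters are hypothetical; only the 20-second time step follows the ARM record.
import numpy as np

rng = np.random.default_rng(7)
dt_s = 20.0                              # 20-second resolution
n = int(25 * 24 * 3600 / dt_s)           # a 25-day simulated period
tau_s = 2.0 * 24 * 3600                  # relaxation time scale (hypothetical)
w_bg, sigma = 50.0, 0.02                 # background PWV in mm and noise level (hypothetical)

pwv = np.empty(n)
pwv[0] = w_bg
for t in range(1, n):
    drift = (w_bg - pwv[t - 1]) / tau_s
    pwv[t] = pwv[t - 1] + drift * dt_s + sigma * np.sqrt(dt_s) * rng.standard_normal()

print(f"simulated PWV: mean = {pwv.mean():.1f} mm, std = {pwv.std():.1f} mm, "
      f"range = ({pwv.min():.1f}, {pwv.max():.1f}) mm")
```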
Deep space network software cost estimation model
NASA Technical Reports Server (NTRS)
Tausworthe, R. C.
1981-01-01
A parametric software cost estimation model prepared for Deep Space Network (DSN) Data Systems implementation tasks is presented. The resource estimation model incorporates principles and data from a number of existing models. The model calibrates task magnitude and difficulty, development environment, and software technology effects through prompted responses to a set of approximately 50 questions. Parameters in the model are adjusted to fit DSN software life cycle statistics. The estimation model output scales a standard DSN Work Breakdown Structure skeleton, which is then input into a PERT/CPM system, producing a detailed schedule and resource budget for the project being planned.
NASA Astrophysics Data System (ADS)
Martucci, G.; Carniel, S.; Chiggiato, J.; Sclavo, M.; Lionello, P.; Galati, M. B.
2009-09-01
The study is a statistical analysis of sea state timeseries derived using the wave model WAM forced by the ERA-40 dataset in selected areas near the Italian coasts. For the period 1 January 1958 to 31 December 1999 the analysis yields: (i) the existence of a negative trend in the annual- and winter-averaged sea state heights; (ii) the existence of a turning point in the late 1970s in the annual-averaged trend of sea state heights at a site in the Northern Adriatic Sea; (iii) the overall absence of a significant trend in the annual-averaged mean durations of sea states over thresholds; (iv) the assessment of the extreme values on a time scale of a thousand years. The analysis uses two methods to obtain samples of extremes from the independent sea states: the r-largest annual maxima and the peak-over-threshold. The two methods show statistical differences in retrieving the return values and, more generally, in describing the significant wave field. The study shows the existence of decadal negative trends in the significant wave heights and thereby conveys useful information on the wave climatology of the Italian seas during the second half of the 20th century.
Prediction of Patient-Controlled Analgesic Consumption: A Multimodel Regression Tree Approach.
Hu, Yuh-Jyh; Ku, Tien-Hsiung; Yang, Yu-Hung; Shen, Jia-Ying
2018-01-01
Several factors contribute to individual variability in postoperative pain; therefore, individuals consume postoperative analgesics at different rates. Although many statistical studies have analyzed postoperative pain and analgesic consumption, most have identified only the correlation and have not subjected the statistical model to further tests in order to evaluate its predictive accuracy. In this study involving 3052 patients, a multistrategy computational approach was developed for analgesic consumption prediction. This approach uses data on patient-controlled analgesia demand behavior over time and combines clustering, classification, and regression to mitigate the limitations of current statistical models. Cross-validation results indicated that the proposed approach significantly outperforms various existing regression methods. Moreover, a comparison between the predictions by anesthesiologists and medical specialists and those of the computational approach for an independent test data set of 60 patients further evidenced the superiority of the computational approach in predicting analgesic consumption, because it produced markedly lower root mean squared errors.
The statistical analysis of global climate change studies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hardin, J.W.
1992-01-01
The focus of this work is to contribute to the enhancement of the relationship between climatologists and statisticians. The analysis of global change data has been underway for many years by atmospheric scientists. Much of this analysis includes a heavy reliance on statistics and statistical inference. Some specific climatological analyses are presented and the dependence on statistics is documented before the analysis is undertaken. The first problem presented involves the fluctuation-dissipation theorem and its application to global climate models. This problem has a sound theoretical niche in the literature of both climate modeling and physics, but a statistical analysis in which the data are obtained from the model to show graphically the relationship has not been undertaken. It is under this motivation that the author presents this problem. A second problem concerning the standard errors in estimating global temperatures is purely statistical in nature, although very little material exists for sampling on such a frame. This problem not only has climatological and statistical ramifications, but political ones as well. It is planned to use these results in a further analysis of global warming using actual data collected on the earth. In order to simplify the analysis of these problems, the development of a computer program, MISHA, is presented. This interactive program contains many of the routines, functions, graphics, and map projections needed by the climatologist in order to effectively enter the arena of data visualization.
A κ-generalized statistical mechanics approach to income analysis
NASA Astrophysics Data System (ADS)
Clementi, F.; Gallegati, M.; Kaniadakis, G.
2009-02-01
This paper proposes a statistical mechanics approach to the analysis of income distribution and inequality. A new distribution function, having its roots in the framework of κ-generalized statistics, is derived that is particularly suitable for describing the whole spectrum of incomes, from the low-middle income region up to the high income Pareto power-law regime. Analytical expressions for the shape, moments and some other basic statistical properties are given. Furthermore, several well-known econometric tools for measuring inequality, which all exist in a closed form, are considered. A method for parameter estimation is also discussed. The model is shown to fit remarkably well the data on personal income for the United States, and the analysis of inequality performed in terms of its parameters is revealed as very powerful.
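For readers unfamiliar with κ-generalized statistics, a reconstructed sketch of the distribution's functional form is given below; it is written from memory of the related literature, so the exact notation and parameterization should be checked against the paper itself.

```latex
% Reconstructed sketch of the kappa-generalized framework (from memory of the
% related literature; exact notation should be checked against the paper).
% The kappa-exponential interpolates between exponential and power-law decay:
\[
  \exp_\kappa(x) = \left(\sqrt{1+\kappa^2 x^2} + \kappa x\right)^{1/\kappa},
  \qquad 0 \le \kappa < 1 .
\]
% Income distribution built on it, with shape/scale parameters alpha, beta > 0:
\[
  F(x) = 1 - \exp_\kappa\!\left(-\beta x^{\alpha}\right), \qquad x > 0 .
\]
% Near the origin this behaves like a stretched exponential (Weibull-like),
% while for large x it decays as a Pareto power law with exponent alpha/kappa.
```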
A Review of the Statistical and Quantitative Methods Used to Study Alcohol-Attributable Crime.
Fitterer, Jessica L; Nelson, Trisalyn A
2015-01-01
Modelling the relationship between alcohol consumption and crime generates new knowledge for crime prevention strategies. Advances in data, particularly data with spatial and temporal attributes, have led to a growing suite of applied methods for modelling. In support of alcohol and crime researchers we synthesized and critiqued existing methods of spatially and quantitatively modelling the effects of alcohol exposure on crime to aid method selection, and identify new opportunities for analysis strategies. We searched the alcohol-crime literature from 1950 to January 2014. Analyses that statistically evaluated or mapped the association between alcohol and crime were included. For modelling purposes, crime data were most often derived from generalized police reports, aggregated to large spatial units such as census tracts or postal codes, and standardized by residential population data. Sixty-eight of the 90 selected studies included geospatial data of which 48 used cross-sectional datasets. Regression was the predominant modelling choice (n = 78), though many variations existed depending on the data. There are opportunities to improve information for alcohol-attributable crime prevention by using alternative population data to standardize crime rates, sourcing crime information from non-traditional platforms (social media), increasing the number of panel studies, and conducting analysis at the local level (neighbourhood, block, or point). Due to the spatio-temporal advances in crime data, we expect a continued uptake of flexible Bayesian hierarchical modelling, a greater inclusion of spatial-temporal point pattern analysis, and a shift toward prospective (forecast) modelling over small areas (e.g., blocks).
Statistical physics of interacting neural networks
NASA Astrophysics Data System (ADS)
Kinzel, Wolfgang; Metzler, Richard; Kanter, Ido
2001-12-01
Recent results on the statistical physics of time series generation and prediction are presented. A neural network is trained on quasi-periodic and chaotic sequences, and the overlaps with the sequence generator as well as the prediction errors are calculated numerically. For each network there exists a sequence for which it completely fails to make predictions. Two interacting networks show a transition to perfect synchronization. A pool of interacting networks shows good coordination in the minority game, a model of competition in a closed market. Finally, as a demonstration, a perceptron predicts bit sequences produced by human beings.
Hensman, James; Lawrence, Neil D; Rattray, Magnus
2013-08-20
Time course data from microarrays and high-throughput sequencing experiments require simple, computationally efficient and powerful statistical models to extract meaningful biological signal, and for tasks such as data fusion and clustering. Existing methodologies fail to capture either the temporal or replicated nature of the experiments, and often impose constraints on the data collection process, such as regularly spaced samples, or similar sampling schema across replications. We propose hierarchical Gaussian processes as a general model of gene expression time-series, with application to a variety of problems. In particular, we illustrate the method's capacity for missing data imputation, data fusion and clustering. The method can impute data which is missing both systematically and at random: in a hold-out test on real data, performance is significantly better than commonly used imputation methods. The method's ability to model inter- and intra-cluster variance leads to more biologically meaningful clusters. The approach removes the necessity for evenly spaced samples, an advantage illustrated on a developmental Drosophila dataset with irregular replications. The hierarchical Gaussian process model provides an excellent statistical basis for several gene-expression time-series tasks. It has only a few additional parameters over a regular GP, has negligible additional complexity, is easily implemented and can be integrated into several existing algorithms. Our experiments were implemented in Python, and are available from the authors' website: http://staffwww.dcs.shef.ac.uk/people/J.Hensman/.
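The hierarchical idea can be sketched directly as a sum of covariances: a gene-level GP shared by all replicates plus a replicate-level GP, so irregularly and differently sampled replicates are handled naturally. The kernel choices and hyperparameters below are illustrative assumptions, not those of the paper.

```python
import numpy as np

def rbf(t1, t2, variance, lengthscale):
    """Squared-exponential covariance between two sets of time points."""
    d = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

# Hierarchical structure: y_r(t) = g(t) + f_r(t) + noise, with g ~ GP shared by
# all replicates and f_r ~ GP independent per replicate. The implied covariance is
#   cov(y_r(t), y_s(t')) = k_g(t, t') + [r == s] * k_f(t, t') (+ noise on the diagonal).
def hierarchical_cov(times, reps, var_g=1.0, len_g=2.0, var_f=0.3, len_f=1.0, noise=0.05):
    K = rbf(times, times, var_g, len_g)
    same_rep = (reps[:, None] == reps[None, :])
    K = K + same_rep * rbf(times, times, var_f, len_f)
    return K + noise * np.eye(times.size)

# Irregularly sampled replicates are handled naturally: just list (time, replicate) pairs.
times = np.array([0.0, 1.0, 2.5, 4.0, 0.5, 1.5, 3.0])
reps  = np.array([0,   0,   0,   0,   1,   1,   1])
K = hierarchical_cov(times, reps)
sample = np.random.default_rng(2).multivariate_normal(np.zeros(times.size), K)
print(sample.round(2))
```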
NASA Astrophysics Data System (ADS)
Thomas, E. G.; Shepherd, S. G.
2017-12-01
Global patterns of ionospheric convection have been widely studied in terms of the interplanetary magnetic field (IMF) magnitude and orientation in both the Northern and Southern Hemispheres using observations from the Super Dual Auroral Radar Network (SuperDARN). The dynamic range of driving conditions under which existing SuperDARN statistical models are valid is currently limited to periods when the high-latitude convection pattern remains above about 60° geomagnetic latitude. Cousins and Shepherd [2010] found this to correspond to intervals when the solar wind electric field Esw < 4.1 mV/m and IMF Bz is negative. Conversely, under northward IMF conditions (Bz > 0) the high-latitude radars often experience difficulties in measuring convection above about 85° geomagnetic latitude. In this presentation, we introduce a new statistical model of ionospheric convection which is valid for much more dominant IMF Bz conditions than was previously possible by including velocity measurements from the newly constructed tiers of radars in the Northern Hemisphere at midlatitudes and in the polar cap. This new model (TS17) is compared to previous statistical models derived from high-latitude SuperDARN observations (RG96, PSR10, CS10) and its impact on instantaneous Map Potential solutions is examined.
The Reliability of Setting Grade Boundaries Using Comparative Judgement
ERIC Educational Resources Information Center
Benton, Tom; Elliott, Gill
2016-01-01
In recent years the use of expert judgement to set and maintain examination standards has been increasingly criticised in favour of approaches based on statistical modelling. This paper reviews existing research on this controversy and attempts to unify the evidence within a framework where expertise is utilised in the form of comparative…
Participatory Equity and Student Outcomes in Living-Learning Programs of Differing Thematic Types
ERIC Educational Resources Information Center
Soldner, Matthew Edward
2011-01-01
This study evaluated participatory equity in varying thematic types of living-learning programs and, for a subset of student group x program type combinations found to be below equity, used latent mean modeling to determine whether statistically significant mean differences existed between the outcome scores of living-learning participants and…
Statistical Power for a Simultaneous Test of Factorial and Predictive Invariance
ERIC Educational Resources Information Center
Olivera-Aguilar, Margarita; Millsap, Roger E.
2013-01-01
A common finding in studies of differential prediction across groups is that although regression slopes are the same or similar across groups, group differences exist in regression intercepts. Building on earlier work by Birnbaum (1979), Millsap (1998) presented an invariant factor model that would explain such intercept differences as arising due…
USDA-ARS?s Scientific Manuscript database
The generation of realistic future precipitation scenarios is crucial for assessing their impacts on a range of environmental and socio-economic impact sectors. A scale mismatch exists, however, between the coarse spatial resolution at which global climate models (GCMs) output future climate scenari...
NASA Astrophysics Data System (ADS)
Steger, Stefan; Brenning, Alexander; Bell, Rainer; Glade, Thomas
2016-12-01
There is unanimous agreement that a precise spatial representation of past landslide occurrences is a prerequisite to produce high quality statistical landslide susceptibility models. Even though perfectly accurate landslide inventories rarely exist, investigations of how landslide inventory-based errors propagate into subsequent statistical landslide susceptibility models are scarce. The main objective of this research was to systematically examine whether and how inventory-based positional inaccuracies of different magnitudes influence modelled relationships, validation results, variable importance and the visual appearance of landslide susceptibility maps. The study was conducted for a landslide-prone site located in the districts of Amstetten and Waidhofen an der Ybbs, eastern Austria, where an earth-slide point inventory was available. The methodological approach comprised an artificial introduction of inventory-based positional errors into the present landslide data set and an in-depth evaluation of subsequent modelling results. Positional errors were introduced by artificially changing the original landslide position by a mean distance of 5, 10, 20, 50 and 120 m. The resulting differently precise response variables were separately used to train logistic regression models. Odds ratios of predictor variables provided insights into modelled relationships. Cross-validation and spatial cross-validation enabled an assessment of predictive performances and permutation-based variable importance. All analyses were additionally carried out with synthetically generated data sets to further verify the findings under rather controlled conditions. The results revealed that an increasing positional inventory-based error was generally related to increasing distortions of modelling and validation results. However, the findings also highlighted that interdependencies between inventory-based spatial inaccuracies and statistical landslide susceptibility models are complex. The systematic comparisons of 12 models provided valuable evidence that the respective error-propagation was not only determined by the degree of positional inaccuracy inherent in the landslide data, but also by the spatial representation of landslides and the environment, landslide magnitude, the characteristics of the study area, the selected classification method and an interplay of predictors within multiple variable models. Based on the results, we deduced that a direct propagation of minor to moderate inventory-based positional errors into modelling results can be partly counteracted by adapting the modelling design (e.g. generalization of input data, opting for strongly generalizing classifiers). Since positional errors within landslide inventories are common and subsequent modelling and validation results are likely to be distorted, the potential existence of inventory-based positional inaccuracies should always be considered when assessing landslide susceptibility by means of empirical models.
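The error-injection design can be mimicked on synthetic data: displace the recorded landslide positions by increasing mean distances, re-extract the predictor at the displaced locations, refit a logistic regression, and track the cross-validated AUC. Everything below (terrain function, sample sizes, error model) is an assumption for illustration, not the study's data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)

# Synthetic "terrain": a smooth susceptibility driver defined over coordinates (m).
def predictor(x, y):
    return np.sin(x / 300.0) + 0.5 * np.cos(y / 200.0)

n = 2000
xy = rng.uniform(0, 3000, size=(n, 2))
p_true = 1 / (1 + np.exp(-(2 * predictor(xy[:, 0], xy[:, 1]) - 0.5)))
label = rng.binomial(1, p_true)                           # 1 = landslide, 0 = stable

for err in [0, 5, 10, 20, 50, 120]:                       # approximate positional error (m)
    shift = rng.normal(0, err, size=xy.shape)             # displace the presence points only
    xy_err = xy + shift * label[:, None]
    X = predictor(xy_err[:, 0], xy_err[:, 1]).reshape(-1, 1)
    auc = cross_val_score(LogisticRegression(), X, label, cv=5, scoring="roc_auc").mean()
    print(f"positional error ~{err:3d} m -> cross-validated AUC {auc:.3f}")
```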
An Overview of R in Health Decision Sciences.
Jalal, Hawre; Pechlivanoglou, Petros; Krijkamp, Eline; Alarid-Escudero, Fernando; Enns, Eva; Hunink, M G Myriam
2017-10-01
As the complexity of health decision science applications increases, high-level programming languages are increasingly adopted for statistical analyses and numerical computations. These programming languages facilitate sophisticated modeling, model documentation, and analysis reproducibility. Among the high-level programming languages, the statistical programming framework R is gaining increased recognition. R is freely available, cross-platform compatible, and open source. A large community of users who have generated an extensive collection of well-documented packages and functions supports it. These functions facilitate applications of health decision science methodology as well as the visualization and communication of results. Although R's popularity is increasing among health decision scientists, methodological extensions of R in the field of decision analysis remain isolated. The purpose of this article is to provide an overview of existing R functionality that is applicable to the various stages of decision analysis, including model design, input parameter estimation, and analysis of model outputs.
NASA Astrophysics Data System (ADS)
Gimenez, M. Cecilia; Paz García, Ana Pamela; Burgos Paci, Maxi A.; Reinaudi, Luis
2016-04-01
The evolution of public opinion using tools and concepts borrowed from Statistical Physics is an emerging area within the field of Sociophysics. In the present paper, a Statistical Physics model was developed to study the evolution of the ideological self-positioning of an ensemble of agents. The model consists of an array of L components, each one of which represents the ideology of an agent. The proposed mechanism is based on the "voter model", in which one agent can adopt the opinion of another one if the difference of their opinions lies within a certain range. The existence of "undecided" agents (i.e. agents with no definite opinion) was implemented in the model. The possibility of radicalization of an agent's opinion upon interaction with another one was also implemented. The results of our simulations are compared to statistical data taken from the Latinobarómetro databank for the cases of Argentina, Chile, Brazil and Uruguay in the last decade. Among other results, the effect of taking into account the undecided agents is the formation of a single peak at the middle of the ideological spectrum (which corresponds to a centrist ideological position), in agreement with the real cases studied.
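A minimal sketch of a voter-type update with a confidence range and undecided agents is given below; the update rule, threshold, and initial conditions are assumptions for illustration and omit the radicalization mechanism described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(4)

L = 1000                     # number of agents
UNDECIDED = np.nan           # agents without a definite opinion
threshold = 2.0              # maximum ideological distance for influence (assumption)

# Ideological self-positioning on a 1..10 scale; roughly 20% start undecided.
opinion = rng.uniform(1, 10, size=L)
opinion[rng.random(L) < 0.2] = UNDECIDED

for _ in range(200_000):
    i, j = rng.integers(0, L, size=2)
    if np.isnan(opinion[j]):
        continue                                  # nothing to copy from an undecided agent
    if np.isnan(opinion[i]):
        opinion[i] = opinion[j]                   # undecided agents adopt any expressed opinion
    elif abs(opinion[i] - opinion[j]) <= threshold:
        opinion[i] = opinion[j]                   # voter-style adoption within the confidence range

decided = opinion[~np.isnan(opinion)]
hist, _ = np.histogram(decided, bins=np.arange(1, 11.5, 1))
print("opinion histogram (1=left ... 10=right):", hist)
```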
Statistical downscaling of precipitation using long short-term memory recurrent neural networks
NASA Astrophysics Data System (ADS)
Misra, Saptarshi; Sarkar, Sudeshna; Mitra, Pabitra
2017-11-01
Hydrological impacts of global climate change on regional scale are generally assessed by downscaling large-scale climatic variables, simulated by General Circulation Models (GCMs), to regional, small-scale hydrometeorological variables like precipitation, temperature, etc. In this study, we propose a new statistical downscaling model based on Recurrent Neural Network with Long Short-Term Memory which captures the spatio-temporal dependencies in local rainfall. The previous studies have used several other methods such as linear regression, quantile regression, kernel regression, beta regression, and artificial neural networks. Deep neural networks and recurrent neural networks have been shown to be highly promising in modeling complex and highly non-linear relationships between input and output variables in different domains and hence we investigated their performance in the task of statistical downscaling. We have tested this model on two datasets—one on precipitation in Mahanadi basin in India and the second on precipitation in Campbell River basin in Canada. Our autoencoder coupled long short-term memory recurrent neural network model performs the best compared to other existing methods on both the datasets with respect to temporal cross-correlation, mean squared error, and capturing the extremes.
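A minimal sketch of the core building block, an LSTM regressor mapping a window of large-scale predictors to a local precipitation value, is shown below in PyTorch; the architecture, dimensions, and synthetic data are assumptions, and the autoencoder coupling described in the abstract is omitted.

```python
import torch
import torch.nn as nn

class LSTMDownscaler(nn.Module):
    """Map a sequence of large-scale predictors to a local precipitation value."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, time, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # predict from the last time step

# Synthetic stand-in data: 256 samples, 30-day windows, 8 reanalysis predictors.
x = torch.randn(256, 30, 8)
y = torch.rand(256, 1)

model = LSTMDownscaler(n_features=8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for epoch in range(50):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
print("final training MSE:", float(loss))
```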
Is there a metric for mineral deposit occurrence probabilities?
Drew, L.J.; Menzie, W.D.
1993-01-01
Traditionally, mineral resource assessments have been used to estimate the physical inventory of critical and strategic mineral commodities that occur in pieces of land and to assess the consequences of supply disruptions of these commodities. More recently, these assessments have been used to estimate the undiscovered mineral wealth in such pieces of land to assess the opportunity cost of using the land for purposes other than mineral production. The field of mineral resource assessment is an interdisciplinary field that draws elements from the disciplines of geology, economic geology (descriptive models), statistics and management science (grade and tonnage models), mineral economics, and operations research (computer simulation models). The purpose of this study is to assert that an occurrence-probability metric exists that is useful in "filling out" an assessment both for areas in which only a trivial probability exists that a new mining district could be present and for areas where nontrivial probabilities exist for such districts. © 1993 Oxford University Press.
[Application of Stata software to test heterogeneity in meta-analysis method].
Wang, Dan; Mou, Zhen-yun; Zhai, Jun-xia; Zong, Hong-xia; Zhao, Xiao-dong
2008-07-01
To introduce the application of Stata software to heterogeneity test in meta-analysis. A data set was set up according to the example in the study, and the corresponding commands of the methods in Stata 9 software were applied to test the example. The methods used were Q-test and I2 statistic attached to the fixed effect model forest plot, H statistic and Galbraith plot. The existence of the heterogeneity among studies could be detected by Q-test and H statistic and the degree of the heterogeneity could be detected by I2 statistic. The outliers which were the sources of the heterogeneity could be spotted from the Galbraith plot. Heterogeneity test in meta-analysis can be completed by the four methods in Stata software simply and quickly. H and I2 statistics are more robust, and the outliers of the heterogeneity can be clearly seen in the Galbraith plot among the four methods.
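The same heterogeneity quantities reported by the Stata commands can be computed directly from study-level estimates and standard errors; the sketch below uses synthetic example values.

```python
import numpy as np
from scipy import stats

# Effect estimates and standard errors from k studies (synthetic example values).
theta = np.array([0.30, 0.12, 0.45, 0.26, 0.60, 0.05])
se    = np.array([0.12, 0.10, 0.15, 0.09, 0.20, 0.11])

w = 1 / se**2                                   # fixed-effect (inverse-variance) weights
theta_fe = np.sum(w * theta) / np.sum(w)        # pooled fixed-effect estimate

Q = np.sum(w * (theta - theta_fe) ** 2)         # Cochran's Q
df = theta.size - 1
p_Q = stats.chi2.sf(Q, df)                      # Q-test p-value
H = np.sqrt(Q / df)                             # H statistic
I2 = max(0.0, (Q - df) / Q) * 100               # I^2: % of variability due to heterogeneity

print(f"Q={Q:.2f} (p={p_Q:.3f}), H={H:.2f}, I2={I2:.1f}%")
```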
Automation of Ocean Product Metrics
2008-09-30
Presented in: Ocean Sciences 2008 Conf., 5 Mar 2008. Shriver, J., J. D. Dykes, and J. Fabre: Automation of Operational Ocean Product Metrics. Presented in 2008 EGU General Assembly, 14 April 2008. ... processing (multiple data cuts per day) and multiple-nested models. Routines for generating automated evaluations of model forecast statistics will be ... developed and pre-existing tools will be collected to create a generalized tool set, which will include user-interface tools to the metrics data
Wiedermann, Wolfgang; Li, Xintong
2018-04-16
In nonexperimental data, at least three possible explanations exist for the association of two variables x and y: (1) x is the cause of y, (2) y is the cause of x, or (3) an unmeasured confounder is present. Statistical tests that identify which of the three explanatory models fits best would be a useful adjunct to the use of theory alone. The present article introduces one such statistical method, direction dependence analysis (DDA), which assesses the relative plausibility of the three explanatory models on the basis of higher-moment information about the variables (i.e., skewness and kurtosis). DDA involves the evaluation of three properties of the data: (1) the observed distributions of the variables, (2) the residual distributions of the competing models, and (3) the independence properties of the predictors and residuals of the competing models. When the observed variables are nonnormally distributed, we show that DDA components can be used to uniquely identify each explanatory model. Statistical inference methods for model selection are presented, and macros to implement DDA in SPSS are provided. An empirical example is given to illustrate the approach. Conceptual and empirical considerations are discussed for best-practice applications in psychological data, and sample size recommendations based on previous simulation studies are provided.
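A simplified illustration of one ingredient of DDA is sketched below: with a skewed cause and a normal error, the causally mis-specified regression leaves more skewness in its residuals than the correctly specified one. The full DDA procedure (and its SPSS macros) involves additional components and formal inference.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Generate data from the model x -> y with a skewed (nonnormal) cause.
x = rng.exponential(scale=1.0, size=5000)
y = 0.6 * x + rng.normal(0, 1.0, size=5000)

def residuals(outcome, predictor):
    slope, intercept = np.polyfit(predictor, outcome, 1)
    return outcome - (slope * predictor + intercept)

res_xy = residuals(y, x)          # residuals of the correctly specified model y ~ x
res_yx = residuals(x, y)          # residuals of the mis-specified model x ~ y

print("skewness of observed x, y:", stats.skew(x).round(2), stats.skew(y).round(2))
print("|skewness| of residuals, y~x vs x~y:",
      abs(stats.skew(res_xy)).round(2), abs(stats.skew(res_yx)).round(2))
# The candidate cause and the mis-specified residuals carry the larger skewness,
# which is the kind of higher-moment signal DDA exploits.
```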
Wang, Ming; Long, Qi
2016-09-01
Prediction models for disease risk and prognosis play an important role in biomedical research, and evaluating their predictive accuracy in the presence of censored data is of substantial interest. The standard concordance (c) statistic has been extended to provide a summary measure of predictive accuracy for survival models. Motivated by a prostate cancer study, we address several issues associated with evaluating survival prediction models based on c-statistic with a focus on estimators using the technique of inverse probability of censoring weighting (IPCW). Compared to the existing work, we provide complete results on the asymptotic properties of the IPCW estimators under the assumption of coarsening at random (CAR), and propose a sensitivity analysis under the mechanism of noncoarsening at random (NCAR). In addition, we extend the IPCW approach as well as the sensitivity analysis to high-dimensional settings. The predictive accuracy of prediction models for cancer recurrence after prostatectomy is assessed by applying the proposed approaches. We find that the estimated predictive accuracy for the models in consideration is sensitive to NCAR assumption, and thus identify the best predictive model. Finally, we further evaluate the performance of the proposed methods in both settings of low-dimensional and high-dimensional data under CAR and NCAR through simulations. © 2016, The International Biometric Society.
Protein and gene model inference based on statistical modeling in k-partite graphs.
Gerster, Sarah; Qeli, Ermir; Ahrens, Christian H; Bühlmann, Peter
2010-07-06
One of the major goals of proteomics is the comprehensive and accurate description of a proteome. Shotgun proteomics, the method of choice for the analysis of complex protein mixtures, requires that experimentally observed peptides are mapped back to the proteins they were derived from. This process is also known as protein inference. We present Markovian Inference of Proteins and Gene Models (MIPGEM), a statistical model based on clearly stated assumptions to address the problem of protein and gene model inference for shotgun proteomics data. In particular, we are dealing with dependencies among peptides and proteins using a Markovian assumption on k-partite graphs. We are also addressing the problems of shared peptides and ambiguous proteins by scoring the encoding gene models. Empirical results on two control datasets with synthetic mixtures of proteins and on complex protein samples of Saccharomyces cerevisiae, Drosophila melanogaster, and Arabidopsis thaliana suggest that the results with MIPGEM are competitive with existing tools for protein inference.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kruger, Albert A.; Muller, I.; Gilbo, K.
2013-11-13
The objectives of this work are aimed at the development of enhanced LAW property-composition models that expand the composition region covered by the models. The models of interest include PCT, VHT, viscosity and electrical conductivity. This is planned as a multi-year effort that will be performed in phases with the objectives listed below for the current phase. Incorporate property-composition data from the new glasses into the database. Assess the database and identify composition spaces in the database that need augmentation. Develop statistically-designed composition matrices to cover the composition regions identified in the above analysis. Prepare crucible melts of glass compositions from the statistically-designed composition matrix and measure the properties of interest. Incorporate the above property-composition data into the database. Assess existing models against the complete dataset and, as necessary, start development of new models.
NASA Astrophysics Data System (ADS)
Wang, Dong
2016-03-01
Gears are the most commonly used components in mechanical transmission systems. Their failures may cause transmission system breakdown and result in economic loss. Identification of different gear crack levels is important to prevent any unexpected gear failure because gear cracks lead to gear tooth breakage. Signal-processing-based methods mainly require expertise to interpret gear fault signatures, which ordinary users usually cannot easily provide. In order to automatically identify different gear crack levels, intelligent gear crack identification methods should be developed. Previous case studies experimentally demonstrated that K-nearest neighbors based methods exhibit high prediction accuracies for identification of 3 different gear crack levels under different motor speeds and loads. In this short communication, to further enhance prediction accuracies of existing K-nearest neighbors based methods and extend identification of 3 different gear crack levels to identification of 5 different gear crack levels, redundant statistical features are constructed by using the Daubechies 44 (db44) binary wavelet packet transform at different wavelet decomposition levels, prior to the use of a K-nearest neighbors method. The dimensionality of the redundant statistical features is 620, which provides richer gear fault signatures. Since many of these statistical features are redundant and highly correlated with each other, dimensionality reduction of the redundant statistical features is conducted to obtain new significant statistical features. Finally, the K-nearest neighbors method is used to identify 5 different gear crack levels under different motor speeds and loads. A case study including 3 experiments is investigated to demonstrate that the developed method provides higher prediction accuracies than the existing K-nearest neighbors based methods for recognizing different gear crack levels under different motor speeds and loads. Based on the new significant statistical features, some other popular statistical models, including linear discriminant analysis, quadratic discriminant analysis, classification and regression trees and the naive Bayes classifier, are compared with the developed method. The results show that the developed method has the highest prediction accuracies among these statistical models. Additionally, selection of the number of new significant features and parameter selection of K-nearest neighbors are thoroughly investigated.
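A compressed sketch of the pipeline (wavelet-packet decomposition, per-node statistical features, dimensionality reduction, and a K-nearest neighbors classifier) is given below. The db4 wavelet is used as a stand-in because the very high-order db44 filter may not be available in every PyWavelets build, and the signals and labels are synthetic.

```python
import numpy as np
import pywt
from scipy import stats
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)

def wp_features(signal, wavelet="db4", level=3):
    """Statistical features from every wavelet-packet node of a vibration signal."""
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
    feats = []
    for node in wp.get_level(level, order="natural"):
        d = node.data
        feats += [d.mean(), d.std(), stats.skew(d), stats.kurtosis(d), np.sum(d**2)]
    return np.array(feats)

# Synthetic stand-in for gear vibration signals from 5 crack levels.
labels = np.repeat(np.arange(5), 40)
signals = [np.sin(2 * np.pi * (50 + 5 * c) * np.linspace(0, 1, 2048))
           + 0.3 * (1 + c) * rng.standard_normal(2048) for c in labels]
X = np.vstack([wp_features(s) for s in signals])

clf = make_pipeline(StandardScaler(), PCA(n_components=10), KNeighborsClassifier(n_neighbors=5))
print("CV accuracy:", cross_val_score(clf, X, labels, cv=5).mean().round(3))
```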
1988-09-01
... and selection of test waves. Measured prototype wave data, on which a comprehensive statistical analysis of wave conditions could be based, were ... Tests of existing conditions: prior to testing of the various improvement plans, comprehensive tests were conducted for existing conditions (Plate 1
A 3D model retrieval approach based on Bayesian networks lightfield descriptor
NASA Astrophysics Data System (ADS)
Xiao, Qinhan; Li, Yanjun
2009-12-01
A new 3D model retrieval methodology is proposed that exploits a novel Bayesian network lightfield descriptor (BNLD). There are two key novelties in our approach: (1) a BN-based method for building the lightfield descriptor; and (2) a 3D model retrieval scheme based on the proposed BNLD. To overcome the disadvantages of existing 3D model retrieval methods, we explore BNs for building a new lightfield descriptor. First, the 3D model is placed in a lightfield and about 300 binary views are obtained along a sphere; Fourier descriptors and Zernike moment descriptors are then calculated from these binary views, and the shape feature sequence is learned into a BN model using a BN learning algorithm. Second, we propose a new 3D model retrieval method that calculates the Kullback-Leibler divergence (KLD) between BNLDs. Benefiting from statistical learning, our BNLD is more robust to noise than existing methods. A comparison between our method and the lightfield descriptor-based approach demonstrates the effectiveness of the proposed methodology.
Current algebra, statistical mechanics and quantum models
NASA Astrophysics Data System (ADS)
Vilela Mendes, R.
2017-11-01
Results obtained in the past for free boson systems at zero and nonzero temperatures are revisited to clarify the physical meaning of current algebra reducible functionals which are associated to systems with density fluctuations, leading to observable effects on phase transitions. To use current algebra as a tool for the formulation of quantum statistical mechanics amounts to the construction of unitary representations of diffeomorphism groups. Two mathematical equivalent procedures exist for this purpose. One searches for quasi-invariant measures on configuration spaces, the other for a cyclic vector in Hilbert space. Here, one argues that the second approach is closer to the physical intuition when modelling complex systems. An example of application of the current algebra methodology to the pairing phenomenon in two-dimensional fermion systems is discussed.
Stationarity: Wanted dead or alive?
Lins, Larry F.; Cohn, Timothy A.
2011-01-01
Aligning engineering practice with natural process behavior would appear, on its face, to be a prudent and reasonable course of action. However, if we do not understand the long-term characteristics of hydroclimatic processes, how does one find the prudent and reasonable course needed for water management? We consider this question in light of three aspects of existing and unresolved issues affecting hydroclimatic variability and statistical inference: Hurst-Kolmogorov phenomena; the complications long-term persistence introduces with respect to statistical understanding; and the dependence of process understanding on arbitrary sampling choices. These problems are not easily addressed. In such circumstances, humility may be more important than physics; a simple model with well-understood flaws may be preferable to a sophisticated model whose correspondence to reality is uncertain.
Desensitized Optimal Filtering and Sensor Fusion Toolkit
NASA Technical Reports Server (NTRS)
Karlgaard, Christopher D.
2015-01-01
Analytical Mechanics Associates, Inc., has developed a software toolkit that filters and processes navigational data from multiple sensor sources. A key component of the toolkit is a trajectory optimization technique that reduces the sensitivity of Kalman filters with respect to model parameter uncertainties. The sensor fusion toolkit also integrates recent advances in adaptive Kalman and sigma-point filters for problems with non-Gaussian error statistics. This Phase II effort provides new filtering and sensor fusion techniques in a convenient package that can be used as a stand-alone application for ground support and/or onboard use. Its modular architecture enables ready integration with existing tools. A suite of sensor models and noise distributions as well as Monte Carlo analysis capability are included to enable statistical performance evaluations.
Statistical analysis of fNIRS data: a comprehensive review.
Tak, Sungho; Ye, Jong Chul
2014-01-15
Functional near-infrared spectroscopy (fNIRS) is a non-invasive method to measure brain activities using the changes of optical absorption in the brain through the intact skull. fNIRS has many advantages over other neuroimaging modalities such as positron emission tomography (PET), functional magnetic resonance imaging (fMRI), or magnetoencephalography (MEG), since it can directly measure blood oxygenation level changes related to neural activation with high temporal resolution. However, fNIRS signals are highly corrupted by measurement noises and physiology-based systemic interference. Careful statistical analyses are therefore required to extract neuronal activity-related signals from fNIRS data. In this paper, we provide an extensive review of historical developments of statistical analyses of fNIRS signal, which include motion artifact correction, short source-detector separation correction, principal component analysis (PCA)/independent component analysis (ICA), false discovery rate (FDR), serially-correlated errors, as well as inference techniques such as the standard t-test, F-test, analysis of variance (ANOVA), and statistical parameter mapping (SPM) framework. In addition, to provide a unified view of various existing inference techniques, we explain a linear mixed effect model with restricted maximum likelihood (ReML) variance estimation, and show that most of the existing inference methods for fNIRS analysis can be derived as special cases. Some of the open issues in statistical analysis are also described. Copyright © 2013 Elsevier Inc. All rights reserved.
Goodness-Of-Fit Test for Nonparametric Regression Models: Smoothing Spline ANOVA Models as Example.
Teran Hidalgo, Sebastian J; Wu, Michael C; Engel, Stephanie M; Kosorok, Michael R
2018-06-01
Nonparametric regression models do not require the specification of the functional form between the outcome and the covariates. Despite their popularity, the number of available diagnostic statistics is small in comparison to their parametric counterparts. We propose a goodness-of-fit test for nonparametric regression models with linear smoother form. In particular, we apply this testing framework to smoothing spline ANOVA models. The test can consider two sources of lack-of-fit: whether covariates that are not currently in the model need to be included, and whether the current model fits the data well. The proposed method derives estimated residuals from the model. Then, statistical dependence is assessed between the estimated residuals and the covariates using the Hilbert-Schmidt independence criterion (HSIC). If dependence exists, the model does not capture all the variability in the outcome associated with the covariates; otherwise the model fits the data well. The bootstrap is used to obtain p-values. Application of the method is demonstrated with a neonatal mental development data analysis. We demonstrate correct type I error as well as power performance through simulations.
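A minimal version of the residual-covariate dependence check can be written with a biased HSIC estimate and a permutation-style resampling p-value; the bandwidths and resampling scheme below are simplified relative to the paper.

```python
import numpy as np

def rbf_gram(z, bandwidth=1.0):
    d2 = (z[:, None] - z[None, :]) ** 2
    return np.exp(-d2 / (2 * bandwidth**2))

def hsic(a, b):
    """Biased HSIC estimate between two 1-D samples."""
    n = a.size
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(rbf_gram(a) @ H @ rbf_gram(b) @ H) / n**2

rng = np.random.default_rng(7)
x = rng.uniform(-2, 2, 300)
y_hat = np.sin(x)                       # fitted values from some (possibly wrong) model
residual = np.sin(x) + 0.3 * x**2 + rng.normal(0, 0.1, 300) - y_hat

observed = hsic(residual, x)
null = np.array([hsic(rng.permutation(residual), x) for _ in range(200)])
p_value = np.mean(null >= observed)
print(f"HSIC={observed:.4f}, permutation p-value={p_value:.3f}")
# A small p-value indicates residual-covariate dependence, i.e. lack of fit.
```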
VoxelStats: A MATLAB Package for Multi-Modal Voxel-Wise Brain Image Analysis.
Mathotaarachchi, Sulantha; Wang, Seqian; Shin, Monica; Pascoal, Tharick A; Benedet, Andrea L; Kang, Min Su; Beaudry, Thomas; Fonov, Vladimir S; Gauthier, Serge; Labbe, Aurélie; Rosa-Neto, Pedro
2016-01-01
In healthy individuals, behavioral outcomes are highly associated with the variability on brain regional structure or neurochemical phenotypes. Similarly, in the context of neurodegenerative conditions, neuroimaging reveals that cognitive decline is linked to the magnitude of atrophy, neurochemical declines, or concentrations of abnormal protein aggregates across brain regions. However, modeling the effects of multiple regional abnormalities as determinants of cognitive decline at the voxel level remains largely unexplored by multimodal imaging research, given the high computational cost of estimating regression models for every single voxel from various imaging modalities. VoxelStats is a voxel-wise computational framework to overcome these computational limitations and to perform statistical operations on multiple scalar variables and imaging modalities at the voxel level. VoxelStats package has been developed in Matlab(®) and supports imaging formats such as Nifti-1, ANALYZE, and MINC v2. Prebuilt functions in VoxelStats enable the user to perform voxel-wise general and generalized linear models and mixed effect models with multiple volumetric covariates. Importantly, VoxelStats can recognize scalar values or image volumes as response variables and can accommodate volumetric statistical covariates as well as their interaction effects with other variables. Furthermore, this package includes built-in functionality to perform voxel-wise receiver operating characteristic analysis and paired and unpaired group contrast analysis. Validation of VoxelStats was conducted by comparing the linear regression functionality with existing toolboxes such as glim_image and RMINC. The validation results were identical to existing methods and the additional functionality was demonstrated by generating feature case assessments (t-statistics, odds ratio, and true positive rate maps). In summary, VoxelStats expands the current methods for multimodal imaging analysis by allowing the estimation of advanced regional association metrics at the voxel level.
Allele-sharing models: LOD scores and accurate linkage tests.
Kong, A; Cox, N J
1997-11-01
Starting with a test statistic for linkage analysis based on allele sharing, we propose an associated one-parameter model. Under general missing-data patterns, this model allows exact calculation of likelihood ratios and LOD scores and has been implemented by a simple modification of existing software. Most important, accurate linkage tests can be performed. Using an example, we show that some previously suggested approaches to handling less than perfectly informative data can be unacceptably conservative. Situations in which this model may not perform well are discussed, and an alternative model that requires additional computations is suggested.
Variety and volatility in financial markets
NASA Astrophysics Data System (ADS)
Lillo, Fabrizio; Mantegna, Rosario N.
2000-11-01
We study the price dynamics of stocks traded in a financial market by considering the statistical properties of both a single time series and an ensemble of stocks traded simultaneously. We use the n stocks traded on the New York Stock Exchange to form a statistical ensemble of daily stock returns. For each trading day of our database, we study the ensemble return distribution. We find that a typical ensemble return distribution exists in most of the trading days with the exception of crash and rally days and of the days following these extreme events. We analyze each ensemble return distribution by extracting its first two central moments. We observe that these moments fluctuate in time and are stochastic processes, themselves. We characterize the statistical properties of ensemble return distribution central moments by investigating their probability density functions and temporal correlation properties. In general, time-averaged and portfolio-averaged price returns have different statistical properties. We infer from these differences information about the relative strength of correlation between stocks and between different trading days. Last, we compare our empirical results with those predicted by the single-index model and we conclude that this simple model cannot explain the statistical properties of the second moment of the ensemble return distribution.
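Constructing the ensemble (cross-sectional) return distribution and its first two central moments per trading day is straightforward; the sketch below uses synthetic single-index returns rather than the NYSE database.

```python
import numpy as np

rng = np.random.default_rng(8)

n_days, n_stocks = 1000, 500
market = 0.01 * rng.standard_normal(n_days)                # common (single-index) factor
returns = market[:, None] + 0.02 * rng.standard_normal((n_days, n_stocks))

# Ensemble (cross-sectional) moments for each trading day.
mean_t = returns.mean(axis=1)          # first central moment of the daily ensemble
var_t = returns.var(axis=1)            # second central moment ("variety")

def autocorr(x, lag=1):
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

print("lag-1 autocorrelation of ensemble mean:", round(autocorr(mean_t), 3))
print("lag-1 autocorrelation of variety:      ", round(autocorr(var_t), 3))
```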
Murchie, Brent; Tandon, Kanwarpreet; Hakim, Seifeldin; Shah, Kinchit; O'Rourke, Colin; Castro, Fernando J
2017-04-01
Colorectal cancer (CRC) screening guidelines likely over-generalize CRC risk; 35% of Americans are not up to date with screening, and the incidence of CRC in younger patients is growing. We developed a practical prediction model for high-risk colon adenomas in an average-risk population, including an expanded definition of high-risk polyps (≥3 nonadvanced adenomas), exposing patients at higher-than-average risk. We also compared results with previously created calculators. Patients aged 40 to 59 years, undergoing first-time average-risk screening or diagnostic colonoscopies were evaluated. Risk calculators for advanced adenomas and high-risk adenomas were created based on age, body mass index, sex, race, and smoking history. Previously established calculators with similar risk factors were selected for comparison of concordance statistic (c-statistic) and external validation. A total of 5063 patients were included. Advanced adenomas and high-risk adenomas were seen in 5.7% and 7.4% of the patient population, respectively. The c-statistic for our calculator was 0.639 for the prediction of advanced adenomas, and 0.650 for high-risk adenomas. When applied to our population, all previous models had lower c-statistic results, although one performed similarly. Our model compares favorably to previously established prediction models. Age and body mass index were used as continuous variables, likely improving the c-statistic. It also reports absolute predictive probabilities of advanced and high-risk polyps, allowing for more individualized risk assessment of CRC.
Gene Level Meta-Analysis of Quantitative Traits by Functional Linear Models.
Fan, Ruzong; Wang, Yifan; Boehnke, Michael; Chen, Wei; Li, Yun; Ren, Haobo; Lobach, Iryna; Xiong, Momiao
2015-08-01
Meta-analysis of genetic data must account for differences among studies including study designs, markers genotyped, and covariates. The effects of genetic variants may differ from population to population, i.e., heterogeneity. Thus, meta-analysis of combining data of multiple studies is difficult. Novel statistical methods for meta-analysis are needed. In this article, functional linear models are developed for meta-analyses that connect genetic data to quantitative traits, adjusting for covariates. The models can be used to analyze rare variants, common variants, or a combination of the two. Both likelihood-ratio test (LRT) and F-distributed statistics are introduced to test association between quantitative traits and multiple variants in one genetic region. Extensive simulations are performed to evaluate empirical type I error rates and power performance of the proposed tests. The proposed LRT and F-distributed statistics control the type I error very well and have higher power than the existing methods of the meta-analysis sequence kernel association test (MetaSKAT). We analyze four blood lipid levels in data from a meta-analysis of eight European studies. The proposed methods detect more significant associations than MetaSKAT and the P-values of the proposed LRT and F-distributed statistics are usually much smaller than those of MetaSKAT. The functional linear models and related test statistics can be useful in whole-genome and whole-exome association studies. Copyright © 2015 by the Genetics Society of America.
A Developed Meta-model for Selection of Cotton Fabrics Using Design of Experiments and TOPSIS Method
NASA Astrophysics Data System (ADS)
Chakraborty, Shankar; Chatterjee, Prasenjit
2017-12-01
Selection of cotton fabrics for providing optimal clothing comfort is often considered a multi-criteria decision making problem consisting of an array of candidate alternatives to be evaluated based on several conflicting properties. In this paper, design of experiments and the technique for order preference by similarity to ideal solution (TOPSIS) are integrated so as to develop regression meta-models for identifying the most suitable cotton fabrics with respect to the computed TOPSIS scores. The applicability of the adopted method is demonstrated using two real-time examples. These developed models can also identify the statistically significant fabric properties and their interactions affecting the measured TOPSIS scores and the final selection decisions. There exists a good degree of congruence between the ranking patterns derived using these meta-models and the existing methods for cotton fabric ranking and subsequent selection.
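The TOPSIS score that serves as the response for the regression meta-models can be sketched as follows; the fabric properties, weights, and benefit/cost designations are hypothetical.

```python
import numpy as np

def topsis(matrix, weights, benefit):
    """Closeness-to-ideal scores for alternatives (rows) over criteria (columns)."""
    norm = matrix / np.sqrt((matrix**2).sum(axis=0))        # vector normalization
    v = norm * weights                                      # weighted normalized matrix
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    anti  = np.where(benefit, v.min(axis=0), v.max(axis=0))
    d_plus = np.sqrt(((v - ideal) ** 2).sum(axis=1))
    d_minus = np.sqrt(((v - anti) ** 2).sum(axis=1))
    return d_minus / (d_plus + d_minus)

# Hypothetical cotton-fabric data: columns = air permeability, thermal resistance, weight.
fabrics = np.array([[220.0, 0.031, 145.0],
                    [180.0, 0.027, 160.0],
                    [250.0, 0.035, 120.0]])
weights = np.array([0.4, 0.4, 0.2])
benefit = np.array([True, True, False])     # lower fabric weight assumed preferable

scores = topsis(fabrics, weights, benefit)
print("TOPSIS scores:", scores.round(3), "best fabric:", int(scores.argmax()) + 1)
```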
Population models for passerine birds: structure, parameterization, and analysis
Noon, B.R.; Sauer, J.R.; McCullough, D.R.; Barrett, R.H.
1992-01-01
Population models have great potential as management tools, as they use information about the life history of a species to summarize estimates of fecundity and survival into a description of population change. Models provide a framework for projecting future populations, determining the effects of management decisions on future population dynamics, evaluating extinction probabilities, and addressing a variety of questions of ecological and evolutionary interest. Even when insufficient information exists to allow complete identification of the model, the modelling procedure is useful because it forces the investigator to consider the life history of the species when determining what parameters should be estimated from field studies and provides a context for evaluating the relative importance of demographic parameters. Models have been little used in the study of the population dynamics of passerine birds because of: (1) widespread misunderstandings of the model structures and parameterizations, (2) a lack of knowledge of life histories of many species, (3) difficulties in obtaining statistically reliable estimates of demographic parameters for most passerine species, and (4) confusion about functional relationships among demographic parameters. As a result, studies of passerine demography are often designed inappropriately and fail to provide essential data. We review appropriate models for passerine bird populations and illustrate their possible uses in evaluating the effects of management or other environmental influences on population dynamics. We identify parameters that must be estimated from field data, briefly review existing statistical methods for obtaining valid estimates, and evaluate the present status of knowledge of these parameters.
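As a minimal example of the kind of matrix model the review discusses, a two-stage (first-year/adult) female-only model is shown below; its dominant eigenvalue gives the annual growth rate. The demographic parameter values and the post-breeding-census formulation are assumptions for illustration.

```python
import numpy as np

# Two-stage female-only matrix model for a passerine (post-breeding census):
# young produced next year come from birds that survive the year and then breed.
fecundity = 2.4          # female fledglings per breeding female per year (assumed)
s_juv = 0.30             # first-year survival (assumed)
s_adult = 0.55           # adult annual survival (assumed)

A = np.array([[fecundity * s_juv, fecundity * s_adult],
              [s_juv,             s_adult            ]])

eigvals, eigvecs = np.linalg.eig(A)
lam = eigvals.real.max()                          # dominant eigenvalue = annual growth rate
stable_stage = np.abs(eigvecs[:, eigvals.real.argmax()])
stable_stage /= stable_stage.sum()

print(f"annual growth rate lambda = {lam:.3f}")
print("stable stage distribution (first-year, adult):", stable_stage.round(3))
```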
Osman, Mugtaba; Parnell, Andrew C; Haley, Clifford
2017-02-01
Suicide is criminalized in more than 100 countries around the world. A dearth of research exists into the effect of suicide legislation on suicide rates, and available statistics are mixed. This study investigates 10,353 suicide deaths in Ireland that took place between 1970 and 2000. Irish 1970-2000 annual suicide data were obtained from the Central Statistics Office and modelled via a negative binomial regression approach. We examined the effect of suicide legislation on different age groups and on both sexes. We used Bonferroni correction for multiple modelling. Statistical analysis was performed using the R statistical package version 3.1.2. The coefficient for the effect of the suicide act on overall suicide deaths was -9.094 (95 % confidence interval (CI) -34.086 to 15.899), statistically non-significant (p = 0.476). The coefficient for the effect of the suicide act on undetermined deaths was statistically significant (p < 0.001) and was estimated to be -644.4 (95 % CI -818.6 to -469.9). The results of our study indicate that legalization of suicide is not associated with a significant increase in subsequent suicide deaths. However, undetermined death verdict rates have significantly dropped following legalization of suicide.
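A generic version of the negative binomial regression described in the abstract can be sketched with statsmodels, using a post-legislation indicator and a log-population offset; the counts, population series, and legislation year below are simulated placeholders, not the CSO data.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)

years = np.arange(1970, 2001)
post_act = (years >= 1993).astype(float)         # hypothetical legislation-change indicator
population = np.linspace(2.95e6, 3.8e6, years.size)

# Simulated annual suicide counts with a mild trend (illustration only).
true_rate = np.exp(-9.0 + 0.015 * (years - 1970) + 0.05 * post_act)
deaths = rng.poisson(true_rate * population)

X = sm.add_constant(np.column_stack([years - 1970, post_act]))
model = sm.GLM(deaths, X, family=sm.families.NegativeBinomial(alpha=0.01),
               offset=np.log(population))
result = model.fit()
print(result.summary().tables[1])                # coefficient on post_act ~ legislation effect
```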
Takahara, Mitsuyoshi; Katakami, Naoto; Kaneto, Hideaki; Noguchi, Midori; Shimomura, Iichiro
2014-01-01
The aim of the current study was to develop a predictive model of insulin resistance using general health checkup data in Japanese employees with one or more metabolic risk factors. We used a database of 846 Japanese employees with one or more metabolic risk factors who underwent general health checkup and a 75-g oral glucose tolerance test (OGTT). Logistic regression models were developed to predict existing insulin resistance evaluated using the Matsuda index. The predictive performance of these models was assessed using the C statistic. The C statistics of body mass index (BMI), waist circumference and their combined use were 0.743, 0.732 and 0.749, with no significant differences. The multivariate backward selection model, in which BMI, the levels of plasma glucose, high-density lipoprotein (HDL) cholesterol, log-transformed triglycerides and log-transformed alanine aminotransferase and hypertension under treatment remained, had a C statistic of 0.816, with a significant difference compared to the combined use of BMI and waist circumference (p<0.01). The C statistic was not significantly reduced when the levels of log-transformed triglycerides and log-transformed alanine aminotransferase and hypertension under treatment were simultaneously excluded from the multivariate model (p=0.14). On the other hand, further exclusion of any of the remaining three variables significantly reduced the C statistic (all p<0.01). When predicting the presence of insulin resistance using general health checkup data in Japanese employees with metabolic risk factors, it is important to take into consideration the BMI and fasting plasma glucose and HDL cholesterol levels.
Exploring Explanations of Subglacial Bedform Sizes Using Statistical Models.
Hillier, John K; Kougioumtzoglou, Ioannis A; Stokes, Chris R; Smith, Michael J; Clark, Chris D; Spagnolo, Matteo S
2016-01-01
Sediments beneath modern ice sheets exert a key control on their flow, but are largely inaccessible except through geophysics or boreholes. In contrast, palaeo-ice sheet beds are accessible, and typically characterised by numerous bedforms. However, the interaction between bedforms and ice flow is poorly constrained and it is not clear how bedform sizes might reflect ice flow conditions. To better understand this link we present a first exploration of a variety of statistical models to explain the size distribution of some common subglacial bedforms (i.e., drumlins, ribbed moraine, MSGL). By considering a range of models, constructed to reflect key aspects of the physical processes, it is possible to infer that the size distributions are most effectively explained when the dynamics of ice-water-sediment interaction associated with bedform growth is fundamentally random. A 'stochastic instability' (SI) model, which integrates random bedform growth and shrinking through time with exponential growth, is preferred and is consistent with other observations of palaeo-bedforms and geophysical surveys of active ice sheets. Furthermore, we give a proof-of-concept demonstration that our statistical approach can bridge the gap between geomorphological observations and physical models, directly linking measurable size-frequency parameters to properties of ice sheet flow (e.g., ice velocity). Moreover, statistically developing existing models as proposed allows quantitative predictions to be made about sizes, making the models testable; a first illustration of this is given for a hypothesised repeat geophysical survey of bedforms under active ice. Thus, we further demonstrate the potential of size-frequency distributions of subglacial bedforms to assist the elucidation of subglacial processes and better constrain ice sheet models.
Empirical Reference Distributions for Networks of Different Size
Smith, Anna; Calder, Catherine A.; Browning, Christopher R.
2016-01-01
Network analysis has become an increasingly prevalent research tool across a vast range of scientific fields. Here, we focus on the particular issue of comparing network statistics, i.e. graph-level measures of network structural features, across multiple networks that differ in size. Although “normalized” versions of some network statistics exist, we demonstrate via simulation why direct comparison is often inappropriate. We consider normalizing network statistics relative to a simple fully parameterized reference distribution and demonstrate via simulation how this is an improvement over direct comparison, but still sometimes problematic. We propose a new adjustment method based on a reference distribution constructed as a mixture model of random graphs which reflect the dependence structure exhibited in the observed networks. We show that using simple Bernoulli models as mixture components in this reference distribution can provide adjusted network statistics that are relatively comparable across different network sizes but still describe interesting features of networks, and that this can be accomplished at relatively low computational expense. Finally, we apply this methodology to a collection of ecological networks derived from the Los Angeles Family and Neighborhood Survey activity location data. PMID:27721556
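The simplest form of the adjustment idea is to standardize an observed graph statistic against a reference distribution of random graphs matched on size and density; the sketch below uses a plain Bernoulli (Erdős-Rényi) reference, whereas the paper's mixture reference is richer.

```python
import numpy as np
import networkx as nx

def adjusted_statistic(G, stat=nx.transitivity, n_ref=200, seed=0):
    """Z-score of a graph-level statistic against a density-matched Bernoulli reference."""
    n, p = G.number_of_nodes(), nx.density(G)
    ref = [stat(nx.gnp_random_graph(n, p, seed=seed + k)) for k in range(n_ref)]
    return (stat(G) - np.mean(ref)) / np.std(ref)

# Two networks of very different size but a similar generating process.
G_small = nx.watts_strogatz_graph(60, 6, 0.1, seed=1)
G_large = nx.watts_strogatz_graph(600, 6, 0.1, seed=2)

for name, G in [("small", G_small), ("large", G_large)]:
    print(name, "raw transitivity:", round(nx.transitivity(G), 3),
          "adjusted z-score:", round(adjusted_statistic(G, n_ref=100), 2))
```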
Zhao, Tuo; Liu, Han
2016-01-01
We propose an accelerated path-following iterative shrinkage thresholding algorithm (APISTA) for solving high dimensional sparse nonconvex learning problems. The main difference between APISTA and the path-following iterative shrinkage thresholding algorithm (PISTA) is that APISTA exploits an additional coordinate descent subroutine to boost the computational performance. Such a modification, though simple, has profound impact: APISTA not only enjoys the same theoretical guarantee as that of PISTA, i.e., APISTA attains a linear rate of convergence to a unique sparse local optimum with good statistical properties, but also significantly outperforms PISTA in empirical benchmarks. As an application, we apply APISTA to solve a family of nonconvex optimization problems motivated by estimating sparse semiparametric graphical models. APISTA allows us to obtain new statistical recovery results which do not exist in the existing literature. Thorough numerical results are provided to back up our theory. PMID:28133430
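The shrinkage-thresholding step at the heart of PISTA-type algorithms is a proximal gradient update with a soft-threshold operator. The sketch below shows a plain ISTA iteration for the convex lasso case as a simplified stand-in; it omits the path-following scheme, nonconvex regularizers, and the coordinate-descent subroutine that distinguish APISTA.

```python
# Sketch: basic iterative shrinkage-thresholding (ISTA) for a lasso-type problem,
# minimizing (1/2n)||y - X beta||^2 + lam * ||beta||_1.
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(X, y, lam, n_iter=500):
    n, p = X.shape
    beta = np.zeros(p)
    step = n / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the smooth part
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
true_beta = np.zeros(50)
true_beta[:5] = 2.0
y = X @ true_beta + rng.normal(scale=0.5, size=200)
print(np.round(ista(X, y, lam=0.1)[:8], 2))
```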
Zero-state Markov switching count-data models: an empirical assessment.
Malyshkina, Nataliya V; Mannering, Fred L
2010-01-01
In this study, a two-state Markov switching count-data model is proposed as an alternative to zero-inflated models to account for the preponderance of zeros sometimes observed in transportation count data, such as the number of accidents occurring on a roadway segment over some period of time. For this accident-frequency case, zero-inflated models assume the existence of two states: one of the states is a zero-accident count state, which has accident probabilities that are so low that they cannot be statistically distinguished from zero, and the other state is a normal-count state, in which counts can be non-negative integers that are generated by some counting process, for example, a Poisson or negative binomial. While zero-inflated models have come under some criticism with regard to accident-frequency applications - one fact is undeniable - in many applications they provide a statistically superior fit to the data. The Markov switching approach we propose seeks to overcome some of the criticism associated with the zero-accident state of the zero-inflated model by allowing individual roadway segments to switch between zero and normal-count states over time. An important advantage of this Markov switching approach is that it allows for the direct statistical estimation of the specific roadway-segment state (i.e., zero-accident or normal-count state) whereas traditional zero-inflated models do not. To demonstrate the applicability of this approach, a two-state Markov switching negative binomial model (estimated with Bayesian inference) and standard zero-inflated negative binomial models are estimated using five-year accident frequencies on Indiana interstate highway segments. It is shown that the Markov switching model is a viable alternative and results in a superior statistical fit relative to the zero-inflated models.
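To illustrate the data-generating idea - roadway segments switching over time between a zero-accident state and a normal-count state - here is a minimal simulation of a two-state Markov switching negative binomial count process. The transition probabilities and count parameters are arbitrary, and estimation (which the study performs with Bayesian inference) is not shown.

```python
# Sketch: simulate counts from a two-state Markov switching process
# (state 0 = zero-count state, state 1 = negative binomial counts).
import numpy as np

rng = np.random.default_rng(42)
n_segments, n_periods = 100, 20
P = np.array([[0.9, 0.1],    # transition matrix: rows = current state,
              [0.2, 0.8]])   # columns = next state
r, p = 2, 0.4                # negative binomial parameters for the normal-count state

counts = np.zeros((n_segments, n_periods), dtype=int)
for i in range(n_segments):
    state = rng.integers(2)
    for t in range(n_periods):
        counts[i, t] = 0 if state == 0 else rng.negative_binomial(r, p)
        state = rng.choice(2, p=P[state])

print("share of zero counts:", np.mean(counts == 0))
```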
Jenkinson, Garrett; Abante, Jordi; Feinberg, Andrew P; Goutsias, John
2018-03-07
DNA methylation is a stable form of epigenetic memory used by cells to control gene expression. Whole genome bisulfite sequencing (WGBS) has emerged as a gold-standard experimental technique for studying DNA methylation by producing high resolution genome-wide methylation profiles. Statistical modeling and analysis is employed to computationally extract and quantify information from these profiles in an effort to identify regions of the genome that demonstrate crucial or aberrant epigenetic behavior. However, the performance of most currently available methods for methylation analysis is hampered by their inability to directly account for statistical dependencies between neighboring methylation sites, thus ignoring significant information available in WGBS reads. We present a powerful information-theoretic approach for genome-wide modeling and analysis of WGBS data based on the 1D Ising model of statistical physics. This approach takes into account correlations in methylation by utilizing a joint probability model that encapsulates all information available in WGBS methylation reads and produces accurate results even when applied on single WGBS samples with low coverage. Using the Shannon entropy, our approach provides a rigorous quantification of methylation stochasticity in individual WGBS samples genome-wide. Furthermore, it utilizes the Jensen-Shannon distance to evaluate differences in methylation distributions between a test and a reference sample. Differential performance assessment using simulated and real human lung normal/cancer data demonstrate a clear superiority of our approach over DSS, a recently proposed method for WGBS data analysis. Critically, these results demonstrate that marginal methods become statistically invalid when correlations are present in the data. This contribution demonstrates clear benefits and the necessity of modeling joint probability distributions of methylation using the 1D Ising model of statistical physics and of quantifying methylation stochasticity using concepts from information theory. By employing this methodology, substantial improvement of DNA methylation analysis can be achieved by effectively taking into account the massive amount of statistical information available in WGBS data, which is largely ignored by existing methods.
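A small sketch of the information-theoretic quantities mentioned - the Shannon entropy of a methylation-level distribution within a sample and the Jensen-Shannon distance between a test and a reference sample - using scipy. The probability vectors here are toy inputs; the full method additionally fits a 1D Ising joint probability model to the WGBS reads.

```python
# Sketch: Shannon entropy and Jensen-Shannon distance for methylation-level distributions.
import numpy as np
from scipy.stats import entropy
from scipy.spatial.distance import jensenshannon

# Toy distributions over discretized methylation levels within a genomic region.
reference = np.array([0.70, 0.15, 0.05, 0.05, 0.05])   # mostly unmethylated, low stochasticity
test      = np.array([0.20, 0.20, 0.20, 0.20, 0.20])   # highly stochastic methylation

print("entropy (reference):", entropy(reference, base=2))
print("entropy (test):     ", entropy(test, base=2))
print("Jensen-Shannon distance:", jensenshannon(reference, test, base=2))
```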
Havens, Timothy C; Roggemann, Michael C; Schulz, Timothy J; Brown, Wade W; Beyer, Jeff T; Otten, L John
2002-05-20
We discuss a method of data reduction and analysis that has been developed for a novel experiment to detect anisotropic turbulence in the tropopause and to measure the spatial statistics of these flows. The experimental concept is to make measurements of temperature at 15 points on a hexagonal grid for altitudes from 12,000 to 18,000 m while suspended from a balloon performing a controlled descent. From the temperature data, we estimate the index of refraction and study the spatial statistics of the turbulence-induced index of refraction fluctuations. We present and evaluate the performance of a processing approach to estimate the parameters of an anisotropic model for the spatial power spectrum of the turbulence-induced index of refraction fluctuations. A Gaussian correlation model and a least-squares optimization routine are used to estimate the parameters of the model from the measurements. In addition, we implemented a quick-look algorithm to have a computationally nonintensive way of viewing the autocorrelation function of the index fluctuations. The autocorrelation of the index of refraction fluctuations is binned and interpolated onto a uniform grid from the sparse points that exist in our experiment. This allows the autocorrelation to be viewed with a three-dimensional plot to determine whether anisotropy exists in a specific data slab. Simulation results presented here show that, in the presence of the anticipated levels of measurement noise, the least-squares estimation technique allows turbulence parameters to be estimated with low rms error.
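The parameter-estimation step described - fitting a Gaussian correlation model to index-of-refraction fluctuation statistics by least squares - can be sketched with scipy's curve_fit as below. The anisotropic model form, lag grid and noise level are illustrative assumptions rather than the experiment's actual configuration.

```python
# Sketch: least-squares fit of an anisotropic Gaussian correlation model
# C(dx, dy) = sigma2 * exp(-(dx/Lx)**2 - (dy/Ly)**2) to noisy correlation samples.
import numpy as np
from scipy.optimize import curve_fit

def gaussian_corr(xy, sigma2, Lx, Ly):
    dx, dy = xy
    return sigma2 * np.exp(-(dx / Lx) ** 2 - (dy / Ly) ** 2)

rng = np.random.default_rng(1)
dx, dy = np.meshgrid(np.linspace(-10, 10, 15), np.linspace(-10, 10, 15))
dx, dy = dx.ravel(), dy.ravel()
true = gaussian_corr((dx, dy), sigma2=1.0, Lx=3.0, Ly=6.0)   # anisotropic: Ly > Lx
measured = true + rng.normal(scale=0.05, size=true.size)     # simulated measurement noise

popt, _ = curve_fit(gaussian_corr, (dx, dy), measured, p0=[0.5, 1.0, 1.0])
print("estimated sigma2, Lx, Ly:", np.round(popt, 2))
```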
Rhodes, Lindsay A; Huisingh, Carrie E; Quinn, Adam E; McGwin, Gerald; LaRussa, Frank; Box, Daniel; Owsley, Cynthia; Girkin, Christopher A
2017-02-01
To examine if racial differences in Bruch's membrane opening minimum rim width (BMO-MRW) in spectral-domain optical coherence tomography (SDOCT) exist, specifically between people of African descent (AD) and European descent (ED) in normal ocular health. Cross-sectional study. Patients presenting for a comprehensive eye examination at retail-based primary eye clinics were enrolled based on ≥1 of the following at-risk criteria for glaucoma: AD aged ≥40 years, ED aged ≥50 years, diabetes, family history of glaucoma, and/or pre-existing diagnosis of glaucoma. Participants with normal optic nerves on examination received SDOCT of the optic nerve head (24 radial scans). Global and regional (temporal, superotemporal, inferotemporal, nasal, superonasal, and inferonasal) BMO-MRW were measured and compared by race using generalized estimating equations. Models were adjusted for age, sex, and BMO area. SDOCT scans from 269 eyes (148 participants) were included in the analysis. Mean global BMO-MRW declined as age increased. After adjusting for age, sex, and BMO area, there was not a statistically significant difference in mean global BMO-MRW by race (P = .60). Regionally, the mean BMO-MRW was lower in the crude model among AD eyes in the temporal, superotemporal, and nasal regions and higher in the inferotemporal, superonasal, and inferonasal regions. However, in the adjusted model, these differences were not statistically significant. BMO-MRW was not statistically different between those of AD and ED. Race-specific normative data may not be necessary for the deployment of BMO-MRW in AD patients. Copyright © 2016 Elsevier Inc. All rights reserved.
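A hedged sketch of the comparison described - modeling BMO-MRW as a function of race adjusted for age, sex and BMO area, with generalized estimating equations accounting for two eyes clustered within a participant - using statsmodels. The data frame, column names and synthetic values are hypothetical placeholders, not the study data.

```python
# Sketch: GEE model of BMO-MRW by race, adjusted for age, sex and BMO area,
# with an exchangeable correlation structure for eyes clustered within participants.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_participants = 148
df = pd.DataFrame({
    "participant_id": np.repeat(np.arange(n_participants), 2),   # two eyes per person
    "race": np.repeat(rng.choice(["AD", "ED"], n_participants), 2),
    "age": np.repeat(rng.normal(60, 8, n_participants), 2),
    "sex": np.repeat(rng.choice(["F", "M"], n_participants), 2),
    "bmo_area": rng.normal(1.9, 0.3, 2 * n_participants),
})
df["bmo_mrw"] = 400 - 1.5 * df["age"] + 20 * df["bmo_area"] + rng.normal(0, 25, len(df))

model = smf.gee("bmo_mrw ~ C(race) + age + C(sex) + bmo_area",
                groups="participant_id", data=df,
                cov_struct=sm.cov_struct.Exchangeable(),
                family=sm.families.Gaussian())
print(model.fit().summary())
```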
A Class of Population Covariance Matrices in the Bootstrap Approach to Covariance Structure Analysis
ERIC Educational Resources Information Center
Yuan, Ke-Hai; Hayashi, Kentaro; Yanagihara, Hirokazu
2007-01-01
Model evaluation in covariance structure analysis is critical before the results can be trusted. Due to finite sample sizes and unknown distributions of real data, existing conclusions regarding a particular statistic may not be applicable in practice. The bootstrap procedure automatically takes care of the unknown distribution and, for a given…
ERIC Educational Resources Information Center
Korkofingas, Con; Macri, Joseph
2013-01-01
This paper examines, using regression modelling, whether a statistically significant relationship exists between the time spent by a student using the course website and the student's assessment performance for a large third year university business forecasting course. We utilise the online tracking system in Blackboard, a web-based software…
Metrics, The Measure of Your Future: Evaluation Report, 1977.
ERIC Educational Resources Information Center
North Carolina State Dept. of Public Instruction, Raleigh. Div. of Development.
The primary goal of the Metric Education Project was the systematic development of a replicable educational model to facilitate the system-wide conversion to the metric system during the next five to ten years. This document is an evaluation of that project. Three sets of statistical evidence exist to support the fact that the project has been…
The National Evaluation of School Nutrition Programs. Final Report - Executive Summary.
ERIC Educational Resources Information Center
Radzikowski, Jack
This is a summary of the final report of a study (begun in 1979) of the National School Lunch, School Breakfast, and Special Milk Programs. The major objectives of the evaluation were to (1) identify existing information on the school nutrition programs; (2) identify determinants of participation in the programs and develop statistical models for…
Student Financial Aid and Choice of Undergraduate Major. ASHE Annual Meeting Paper.
ERIC Educational Resources Information Center
Zito, Eileen H.
This study evaluated whether the use of educational loans has an impact upon student choice of majors. In addition, the study demonstrated that the statistical technique of two-stage least squares can be appropriately used with educational data when reciprocal causation exists in the theoretical model. It was hypothesized that, since a majority of…
Path statistics, memory, and coarse-graining of continuous-time random walks on networks
Kion-Crosby, Willow; Morozov, Alexandre V.
2015-01-01
Continuous-time random walks (CTRWs) on discrete state spaces, ranging from regular lattices to complex networks, are ubiquitous across physics, chemistry, and biology. Models with coarse-grained states (for example, those employed in studies of molecular kinetics) or spatial disorder can give rise to memory and non-exponential distributions of waiting times and first-passage statistics. However, existing methods for analyzing CTRWs on complex energy landscapes do not address these effects. Here we use statistical mechanics of the nonequilibrium path ensemble to characterize first-passage CTRWs on networks with arbitrary connectivity, energy landscape, and waiting time distributions. Our approach can be applied to calculating higher moments (beyond the mean) of path length, time, and action, as well as statistics of any conservative or non-conservative force along a path. For homogeneous networks, we derive exact relations between length and time moments, quantifying the validity of approximating a continuous-time process with its discrete-time projection. For more general models, we obtain recursion relations, reminiscent of transfer matrix and exact enumeration techniques, to efficiently calculate path statistics numerically. We have implemented our algorithm in PathMAN (Path Matrix Algorithm for Networks), a Python script that users can apply to their model of choice. We demonstrate the algorithm on a few representative examples which underscore the importance of non-exponential distributions, memory, and coarse-graining in CTRWs. PMID:26646868
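As a minimal illustration of first-passage statistics for a continuous-time random walk on a network - the kind of quantity PathMAN computes recursively rather than by brute force - the sketch below simulates a CTRW with exponential (memoryless) waiting times on a small ring network and collects first-passage time moments. The network and rates are arbitrary toy choices.

```python
# Sketch: Monte Carlo first-passage statistics of a CTRW on a small ring network
# with exponential waiting times (the simplest, memoryless case).
import numpy as np

rng = np.random.default_rng(7)
n_nodes, target, n_walks = 10, 5, 5000
rate = 1.0                      # total exit rate from each node
neighbors = {i: [(i - 1) % n_nodes, (i + 1) % n_nodes] for i in range(n_nodes)}

fpt = np.empty(n_walks)
for k in range(n_walks):
    node, t = 0, 0.0
    while node != target:
        t += rng.exponential(1.0 / rate)          # waiting time at the current node
        node = neighbors[node][rng.integers(2)]   # unbiased jump to a neighbor
    fpt[k] = t

print("mean first-passage time:", fpt.mean(), " second moment:", np.mean(fpt ** 2))
```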
Statistical microeconomics and commodity prices: theory and empirical results.
Baaquie, Belal E
2016-01-13
A review is made of the statistical generalization of microeconomics by Baaquie (Baaquie 2013 Phys. A 392, 4400-4416. (doi:10.1016/j.physa.2013.05.008)), where the market price of every traded commodity, at each instant of time, is considered to be an independent random variable. The dynamics of commodity market prices is given by the unequal time correlation function and is modelled by the Feynman path integral based on an action functional. The correlation functions of the model are defined using the path integral. The existence of the action functional for commodity prices that was postulated to exist in Baaquie (Baaquie 2013 Phys. A 392, 4400-4416. (doi:10.1016/j.physa.2013.05.008)) has been empirically ascertained in Baaquie et al. (Baaquie et al. 2015 Phys. A 428, 19-37. (doi:10.1016/j.physa.2015.02.030)). The model's action functionals for different commodities have been empirically determined and calibrated using the unequal time correlation functions of the market commodity prices using a perturbation expansion (Baaquie et al. 2015 Phys. A 428, 19-37. (doi:10.1016/j.physa.2015.02.030)). Nine commodities drawn from the energy, metal and grain sectors are empirically studied and their auto-correlation for up to 300 days is described by the model to an accuracy of R² > 0.90, using only six parameters. © 2015 The Author(s).
Federal Register 2010, 2011, 2012, 2013, 2014
2011-05-26
... Sentenced Population Movement--National Prisoner Statistics, Extension and Revision of Existing Collection...) Title of the Form/Collection: Summary of Sentenced Population Movement--National Prisoner Statistics (3...
Bashir, Mubasher A; Radke, Wolfgang
2012-02-17
The retention behavior of a range of statistical poly(styrene/ethylacrylate) copolymers is investigated in order to determine whether the retention volumes of these copolymers can be predicted from a suitable chromatographic retention model. It was found that the eluent composition at which the copolymers elute in gradient chromatography is closely related to the eluent composition at which, in isocratic chromatography, the transition from elution in adsorption mode to exclusion mode occurs. For homopolymers this transition takes place at a critical eluent composition at which the molar mass dependence of elution volume vanishes. Thus, similar critical eluent compositions can be defined for statistical copolymers. The existence of a critical eluent composition is further supported by the narrower peak width, indicating that the broad molar mass distribution of the samples does not contribute to the retention volume. It is shown that the existing retention model for homopolymers allows for correct quantitative predictions of retention volumes based on only three appropriate initial experiments. The selection of these initial experiments involves a gradient run and two isocratic experiments, one at the elution composition calculated from the first gradient run and the second at a slightly higher eluent strength. Copyright © 2011 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Rudoy, Yu. G.; Kotelnikova, O. A.
2012-10-01
The problem of the existence of long-range order in the isotropic quantum Heisenberg model on the D=1 lattice is reconsidered in view of the possibility of a sufficiently slowly decaying exchange interaction with infinite effective radius. It is shown that the macroscopic arguments given by Landau and Lifshitz, and then supported microscopically by Mermin and Wagner, fail for this case, so that non-zero spontaneous magnetization may yet exist. This result was anticipated by Thouless on the grounds of phenomenological analysis, and we give its microscopic foundation, which amounts to a generalization of the Mermin-Wagner theorem to the case of an infinite second moment of the exchange interaction. Two models well known in lattice statistics - Kac-I and Kac-II - illustrate our results.
Making Spatial Statistics Service Accessible On Cloud Platform
NASA Astrophysics Data System (ADS)
Mu, X.; Wu, J.; Li, T.; Zhong, Y.; Gao, X.
2014-04-01
Web services can bring together applications running on diverse platforms, allowing users to access and share data, information and models more effectively and conveniently from a web service platform. Cloud computing has emerged as a paradigm of Internet computing in which dynamic, scalable and often virtualized resources are provided as services. With the rapid growth of massive data and the limitations of network capacity, traditional web service platforms face prominent problems such as computational efficiency, maintenance cost and data security. In this paper, we offer a spatial statistics service based on the Microsoft cloud. An experiment was carried out to evaluate the availability and efficiency of this service. The results show that this spatial statistics service is conveniently accessible to the public and offers high processing efficiency.
Generating survival times to simulate Cox proportional hazards models with time-varying covariates.
Austin, Peter C
2012-12-20
Simulations and Monte Carlo methods serve an important role in modern statistical research. They allow for an examination of the performance of statistical procedures in settings in which analytic and mathematical derivations may not be feasible. A key element in any statistical simulation is the existence of an appropriate data-generating process: one must be able to simulate data from a specified statistical model. We describe data-generating processes for the Cox proportional hazards model with time-varying covariates when event times follow an exponential, Weibull, or Gompertz distribution. We consider three types of time-varying covariates: first, a dichotomous time-varying covariate that can change at most once from untreated to treated (e.g., organ transplant); second, a continuous time-varying covariate such as cumulative exposure at a constant dose to radiation or to a pharmaceutical agent used for a chronic condition; third, a dichotomous time-varying covariate with a subject being able to move repeatedly between treatment states (e.g., current compliance or use of a medication). In each setting, we derive closed-form expressions that allow one to simulate survival times so that survival times are related to a vector of fixed or time-invariant covariates and to a single time-varying covariate. We illustrate the utility of our closed-form expressions for simulating event times by using Monte Carlo simulations to estimate the statistical power to detect as statistically significant the effect of different types of binary time-varying covariates. This is compared with the statistical power to detect as statistically significant a binary time-invariant covariate. Copyright © 2012 John Wiley & Sons, Ltd.
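For the simplest of the three settings - an exponential baseline hazard and a dichotomous covariate that switches once from untreated to treated at a known time t0 - the inversion of the cumulative hazard can be sketched as follows. The parameter values are illustrative, and the paper's expressions also cover Weibull and Gompertz baselines and the other covariate types.

```python
# Sketch: simulate survival times under an exponential baseline hazard with a
# binary covariate switching from 0 to 1 at time t0 (hazard = lam * exp(beta * x(t))).
import numpy as np

def simulate_switch_times(n, lam, beta, t0, rng):
    u = rng.uniform(size=n)
    h = -np.log(u)                                       # target cumulative hazard
    pre = h / lam                                        # solution if the event occurs before t0
    post = t0 + (h - lam * t0) / (lam * np.exp(beta))    # solution after the treatment switch
    return np.where(pre < t0, pre, post)

rng = np.random.default_rng(0)
times = simulate_switch_times(n=10_000, lam=0.1, beta=0.7, t0=5.0, rng=rng)
print("mean simulated survival time:", times.mean())
```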
System and Software Reliability (C103)
NASA Technical Reports Server (NTRS)
Wallace, Dolores
2003-01-01
Within the last decade, better reliability models (hardware, software, system) than those currently used have been theorized and developed but not implemented in practice. Previous research on software reliability has shown that while some existing software reliability models are practical, they are not accurate enough. New paradigms of development (e.g., OO) have appeared and associated reliability models have been proposed but not investigated. Hardware models have been extensively investigated but not integrated into a system framework. System reliability modeling is the weakest of the three. NASA engineers need better methods and tools to demonstrate that the products meet NASA requirements for reliability measurement. For the new software models of the last decade, there is a great need to bring them into a form in which they can be used on software-intensive systems. The Statistical Modeling and Estimation of Reliability Functions for Systems (SMERFS'3) tool is an existing vehicle that may be used to incorporate these new modeling advances. Adapting some existing software reliability modeling changes to accommodate major changes in software development technology may also show substantial improvement in prediction accuracy. With some additional research, the next step is to identify and investigate system reliability. System reliability models could then be incorporated in a tool such as SMERFS'3. This tool, with better models, would greatly add value in assessing GSFC projects.
Edla, Shwetha; Kovvali, Narayan; Papandreou-Suppappola, Antonia
2012-01-01
Constructing statistical models of electrocardiogram (ECG) signals, whose parameters can be used for automated disease classification, is of great importance in precluding manual annotation and providing prompt diagnosis of cardiac diseases. ECG signals consist of several segments with different morphologies (namely the P wave, QRS complex and the T wave) in a single heart beat, which can vary across individuals and diseases. Also, existing statistical ECG models exhibit a reliance upon obtaining a priori information from the ECG data by using preprocessing algorithms to initialize the filter parameters, or to define the user-specified model parameters. In this paper, we propose an ECG modeling technique using the sequential Markov chain Monte Carlo (SMCMC) filter that can perform simultaneous model selection, by adaptively choosing from different representations depending upon the nature of the data. Our results demonstrate the ability of the algorithm to track various types of ECG morphologies, including intermittently occurring ECG beats. In addition, we use the estimated model parameters as the feature set to classify between ECG signals with normal sinus rhythm and four different types of arrhythmia.
Lord, Dominique; Washington, Simon P; Ivan, John N
2005-01-01
There has been considerable research conducted over the last 20 years focused on predicting motor vehicle crashes on transportation facilities. The range of statistical models commonly applied includes binomial, Poisson, Poisson-gamma (or negative binomial), zero-inflated Poisson and negative binomial models (ZIP and ZINB), and multinomial probability models. Given the range of possible modeling approaches and the host of assumptions with each modeling approach, making an intelligent choice for modeling motor vehicle crash data is difficult. There is little discussion in the literature comparing different statistical modeling approaches, identifying which statistical models are most appropriate for modeling crash data, and providing a strong justification from basic crash principles. In the recent literature, it has been suggested that the motor vehicle crash process can successfully be modeled by assuming a dual-state data-generating process, which implies that entities (e.g., intersections, road segments, pedestrian crossings, etc.) exist in one of two states - perfectly safe and unsafe. As a result, the ZIP and ZINB are two models that have been applied to account for the preponderance of "excess" zeros frequently observed in crash count data. The objective of this study is to provide defensible guidance on how to appropriately model crash data. We first examine the motor vehicle crash process using theoretical principles and a basic understanding of the crash process. It is shown that the fundamental crash process follows Bernoulli trials of independent events with unequal probabilities, also known as Poisson trials. We examine the evolution of statistical models as they apply to the motor vehicle crash process, and indicate how well they statistically approximate the crash process. We also present the theory behind dual-state process count models, and note why they have become popular for modeling crash data. A simulation experiment is then conducted to demonstrate how crash data give rise to "excess" zeros frequently observed in crash data. It is shown that the Poisson and other mixed probabilistic structures are approximations assumed for modeling the motor vehicle crash process. Furthermore, it is demonstrated that under certain (fairly common) circumstances excess zeros are observed - and that these circumstances arise from low exposure and/or inappropriate selection of time/space scales and not an underlying dual-state process. In conclusion, carefully selecting the time/space scales for analysis, including an improved set of explanatory variables and/or unobserved heterogeneity effects in count regression models, or applying small-area statistical methods (observations with low exposure) represent the most defensible modeling approaches for datasets with a preponderance of zeros.
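The simulation argument - that Bernoulli trials with unequal, typically small probabilities (Poisson trials) over short time or space scales naturally produce many zero-count observations without any separate "safe" state - can be sketched as follows. The exposure and probability ranges are arbitrary illustrations.

```python
# Sketch: heterogeneous, low-exposure Bernoulli (Poisson) trials produce an apparent
# "excess" of zero-count segments relative to a single Poisson rate fit to all segments.
import numpy as np

rng = np.random.default_rng(3)
n_segments = 1000
exposure = rng.integers(10, 10_000, n_segments)     # e.g., vehicles traversing each segment
crash_prob = rng.uniform(1e-6, 1e-3, n_segments)    # small, unequal per-vehicle crash probability
counts = rng.binomial(exposure, crash_prob)         # independent Bernoulli trials per segment

lam_hat = counts.mean()                             # single Poisson rate fitted to all segments
expected_zeros = n_segments * np.exp(-lam_hat)
print("observed zero-count segments:", int(np.sum(counts == 0)))
print("zeros expected under one Poisson rate:", round(expected_zeros))
```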
NASA Technical Reports Server (NTRS)
Crutcher, H. L.; Falls, L. W.
1976-01-01
Sets of experimentally determined or routinely observed data provide information about the past, present and, hopefully, future sets of similarly produced data. An infinite set of statistical models exists which may be used to describe the data sets. The normal distribution is one model. If it serves at all, it serves well. If a data set, or a transformation of the set, representative of a larger population can be described by the normal distribution, then valid statistical inferences can be drawn. There are several tests which may be applied to a data set to determine whether the univariate normal model adequately describes the set. The chi-square test based on Pearson's work in the late nineteenth and early twentieth centuries is often used. Like all tests, it has some weaknesses which are discussed in elementary texts. Extension of the chi-square test to the multivariate normal model is provided. Tables and graphs permit easier application of the test in the higher dimensions. Several examples, using recorded data, illustrate the procedures. Tests of maximum absolute differences, mean sum of squares of residuals, runs and changes of sign are included in these tests. Dimensions one through five with selected sample sizes 11 to 101 are used to illustrate the statistical tests developed.
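A minimal univariate version of the chi-square goodness-of-fit test for normality - binning the data, computing expected bin counts under a fitted normal, and reducing the degrees of freedom for the two estimated parameters - can be sketched with scipy as follows. The bin choices and sample size are illustrative, and the multivariate extension discussed above is not shown.

```python
# Sketch: Pearson chi-square test of univariate normality with estimated mean and sd.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=101)

mu, sigma = x.mean(), x.std(ddof=1)
edges = np.quantile(x, np.linspace(0, 1, 9))      # 8 bins with roughly equal counts
edges[0], edges[-1] = -np.inf, np.inf
observed, _ = np.histogram(x, bins=edges)
expected = len(x) * np.diff(stats.norm.cdf(edges, mu, sigma))

# ddof=2 accounts for the two parameters (mean, sd) estimated from the data.
chi2, p_value = stats.chisquare(observed, expected, ddof=2)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```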
NASA Astrophysics Data System (ADS)
Hofer, Marlis; Mölg, Thomas; Marzeion, Ben; Kaser, Georg
2010-06-01
Recently initiated observation networks in the Cordillera Blanca (Peru) provide temporally high-resolution, yet short-term, atmospheric data. The aim of this study is to extend the existing time series into the past. We present an empirical-statistical downscaling (ESD) model that links 6-hourly National Centers for Environmental Prediction (NCEP)/National Center for Atmospheric Research (NCAR) reanalysis data to air temperature and specific humidity, measured at the tropical glacier Artesonraju (northern Cordillera Blanca). The ESD modeling procedure includes combined empirical orthogonal function and multiple regression analyses and a double cross-validation scheme for model evaluation. Apart from the selection of predictor fields, the modeling procedure is automated and does not include subjective choices. We assess the ESD model sensitivity to the predictor choice using both single-field and mixed-field predictors. Statistical transfer functions are derived individually for different months and times of day. The forecast skill largely depends on month and time of day, ranging from 0 to 0.8. The mixed-field predictors perform better than the single-field predictors. The ESD model shows added value, at all time scales, against simpler reference models (e.g., the direct use of reanalysis grid point values). The ESD model forecast 1960-2008 clearly reflects interannual variability related to the El Niño/Southern Oscillation but is sensitive to the chosen predictor type.
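The statistical core of such an ESD model - compressing a reanalysis predictor field with an EOF (principal component) analysis and regressing the local observation on the leading components, with cross-validated skill - can be sketched with scikit-learn. The synthetic predictor field and predictand below are placeholders for NCEP/NCAR grids and station data.

```python
# Sketch: empirical-statistical downscaling via EOF (PCA) compression of a predictor
# field followed by multiple linear regression, with cross-validated skill.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_times, n_gridpoints = 600, 400          # 6-hourly samples x reanalysis grid points
field = rng.normal(size=(n_times, n_gridpoints))
# Synthetic local predictand linked to a large-scale pattern in the field, plus noise.
pattern = rng.normal(size=n_gridpoints)
station_temp = field @ pattern / np.sqrt(n_gridpoints) + rng.normal(scale=0.5, size=n_times)

esd_model = make_pipeline(PCA(n_components=10), LinearRegression())
skill = cross_val_score(esd_model, field, station_temp, cv=5, scoring="r2")
print("cross-validated R^2 per fold:", np.round(skill, 2))
```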
Approximate Single-Diode Photovoltaic Model for Efficient I-V Characteristics Estimation
Ting, T. O.; Zhang, Nan; Guan, Sheng-Uei; Wong, Prudence W. H.
2013-01-01
Precise photovoltaic (PV) behavior models are normally described by nonlinear analytical equations. To solve such equations, it is necessary to use iterative procedures. Aiming to make the computation easier, this paper proposes an approximate single-diode PV model that enables high-speed predictions for the electrical characteristics of commercial PV modules. Based on the experimental data, statistical analysis is conducted to validate the approximate model. Simulation results show that the calculated current-voltage (I-V) characteristics fit the measured data with high accuracy. Furthermore, compared with the existing modeling methods, the proposed model reduces the simulation time by approximately 30% in this work. PMID:24298205
A Synergy Cropland of China by Fusing Multiple Existing Maps and Statistics.
Lu, Miao; Wu, Wenbin; You, Liangzhi; Chen, Di; Zhang, Li; Yang, Peng; Tang, Huajun
2017-07-12
Accurate information on cropland extent is critical for scientific research and resource management. Several cropland products from remotely sensed datasets are available. Nevertheless, significant inconsistency exists among these products and the cropland areas estimated from these products differ considerably from statistics. In this study, we propose a hierarchical optimization synergy approach (HOSA) to develop a hybrid cropland map of China, circa 2010, by fusing five existing cropland products, i.e., GlobeLand30, Climate Change Initiative Land Cover (CCI-LC), GlobCover 2009, MODIS Collection 5 (MODIS C5), and MODIS Cropland, and sub-national statistics of cropland area. HOSA simplifies the widely used method of score assignment into two steps, including determination of optimal agreement level and identification of the best product combination. The accuracy assessment indicates that the synergy map has higher accuracy of spatial locations and better consistency with statistics than the five existing datasets individually. This suggests that the synergy approach can improve the accuracy of cropland mapping and enhance consistency with statistics.
Power-up: A Reanalysis of 'Power Failure' in Neuroscience Using Mixture Modeling
Wood, John
2017-01-01
Recently, evidence for endemically low statistical power has cast neuroscience findings into doubt. If low statistical power plagues neuroscience, then this reduces confidence in the reported effects. However, if statistical power is not uniformly low, then such blanket mistrust might not be warranted. Here, we provide a different perspective on this issue, analyzing data from an influential study reporting a median power of 21% across 49 meta-analyses (Button et al., 2013). We demonstrate, using Gaussian mixture modeling, that the sample of 730 studies included in that analysis comprises several subcomponents so the use of a single summary statistic is insufficient to characterize the nature of the distribution. We find that statistical power is extremely low for studies included in meta-analyses that reported a null result and that it varies substantially across subfields of neuroscience, with particularly low power in candidate gene association studies. Therefore, whereas power in neuroscience remains a critical issue, the notion that studies are systematically underpowered is not the full story: low power is far from a universal problem. SIGNIFICANCE STATEMENT Recently, researchers across the biomedical and psychological sciences have become concerned with the reliability of results. One marker for reliability is statistical power: the probability of finding a statistically significant result given that the effect exists. Previous evidence suggests that statistical power is low across the field of neuroscience. Our results present a more comprehensive picture of statistical power in neuroscience: on average, studies are indeed underpowered—some very seriously so—but many studies show acceptable or even exemplary statistical power. We show that this heterogeneity in statistical power is common across most subfields in neuroscience. This new, more nuanced picture of statistical power in neuroscience could affect not only scientific understanding, but potentially policy and funding decisions for neuroscience research. PMID:28706080
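The mixture-modeling step - describing the distribution of study-level power estimates as a Gaussian mixture rather than a single summary statistic - can be sketched with scikit-learn as below. The synthetic power values are a stand-in for the 730 studies analyzed, and the component structure is arbitrary.

```python
# Sketch: fit Gaussian mixtures to study-level statistical power values and pick
# the number of components by BIC, instead of summarizing with a single median.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
power = np.concatenate([               # synthetic stand-in for study power estimates
    rng.beta(2, 18, 400),              # a low-power subcomponent
    rng.beta(12, 4, 230),              # a well-powered subcomponent
    rng.beta(5, 5, 100),               # an intermediate group
]).reshape(-1, 1)

fits = {k: GaussianMixture(n_components=k, random_state=0).fit(power) for k in range(1, 6)}
bic = {k: m.bic(power) for k, m in fits.items()}
best_k = min(bic, key=bic.get)
print("BIC by number of components:", {k: round(v) for k, v in bic.items()})
print("selected components:", best_k, " means:", np.round(fits[best_k].means_.ravel(), 2))
```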
Disjunctive Normal Shape and Appearance Priors with Applications to Image Segmentation.
Mesadi, Fitsum; Cetin, Mujdat; Tasdizen, Tolga
2015-10-01
The use of appearance and shape priors in image segmentation is known to improve accuracy; however, existing techniques have several drawbacks. Active shape and appearance models require landmark points and assume unimodal shape and appearance distributions. Level set based shape priors are limited to global shape similarity. In this paper, we present a novel shape and appearance priors for image segmentation based on an implicit parametric shape representation called disjunctive normal shape model (DNSM). DNSM is formed by disjunction of conjunctions of half-spaces defined by discriminants. We learn shape and appearance statistics at varying spatial scales using nonparametric density estimation. Our method can generate a rich set of shape variations by locally combining training shapes. Additionally, by studying the intensity and texture statistics around each discriminant of our shape model, we construct a local appearance probability map. Experiments carried out on both medical and natural image datasets show the potential of the proposed method.
Lee, Thomas; Bocquet, Lydéric; Coasne, Benoit
2016-06-21
Hydrocarbon recovery from unconventional reservoirs (shale gas) is debated due to its environmental impact and uncertainties on its predictability. But a lack of scientific knowledge impedes the proposal of reliable alternatives. The requirement of hydrofracking, fast recovery decay and ultra-low permeability-inherent to their nanoporosity-are specificities of these reservoirs, which challenge existing frameworks. Here we use molecular simulation and statistical models to show that recovery is hampered by interfacial effects at the wet kerogen surface. Recovery is shown to be thermally activated with an energy barrier modelled from the interface wetting properties. We build a statistical model of the recovery kinetics with a two-regime decline that is consistent with published data: a short time decay, consistent with Darcy description, followed by a fast algebraic decay resulting from increasingly unreachable energy barriers. Replacing water by CO2 or propane eliminates the barriers, therefore raising hopes for clean/efficient recovery.
On an additive partial correlation operator and nonparametric estimation of graphical models.
Lee, Kuang-Yao; Li, Bing; Zhao, Hongyu
2016-09-01
We introduce an additive partial correlation operator as an extension of partial correlation to the nonlinear setting, and use it to develop a new estimator for nonparametric graphical models. Our graphical models are based on additive conditional independence, a statistical relation that captures the spirit of conditional independence without having to resort to high-dimensional kernels for its estimation. The additive partial correlation operator completely characterizes additive conditional independence, and has the additional advantage of putting marginal variation on appropriate scales when evaluating interdependence, which leads to more accurate statistical inference. We establish the consistency of the proposed estimator. Through simulation experiments and analysis of the DREAM4 Challenge dataset, we demonstrate that our method performs better than existing methods in cases where the Gaussian or copula Gaussian assumption does not hold, and that a more appropriate scaling for our method further enhances its performance.
An information hidden model holding cover distributions
NASA Astrophysics Data System (ADS)
Fu, Min; Cai, Chao; Dai, Zuxu
2018-03-01
The goal of steganography is to embed secret data into a cover so that no one apart from the sender and intended recipients can find the secret data. Usually, the way the cover is changed is decided by a hiding function, and no existing model could be used to find an optimal function that greatly reduces the distortion suffered by the cover. This paper considers the cover carrying the secret message as a random Markov chain, takes advantage of the deterministic relation between the initial distribution and the transition matrix of the Markov chain, and uses the transition matrix as a constraint to decrease the statistical distortion suffered by the cover in the process of information hiding. Furthermore, a hiding function is designed, and the transition matrix from the original cover to the stego cover is also presented. Experimental results show that the new model preserves consistent statistical characterizations of the original and stego covers.
Statistical Inference in Hidden Markov Models Using k-Segment Constraints
Titsias, Michalis K.; Holmes, Christopher C.; Yau, Christopher
2016-01-01
Hidden Markov models (HMMs) are one of the most widely used statistical methods for analyzing sequence data. However, the reporting of output from HMMs has largely been restricted to the presentation of the most-probable (MAP) hidden state sequence, found via the Viterbi algorithm, or the sequence of most probable marginals using the forward–backward algorithm. In this article, we expand the amount of information we could obtain from the posterior distribution of an HMM by introducing linear-time dynamic programming recursions that, conditional on a user-specified constraint in the number of segments, allow us to (i) find MAP sequences, (ii) compute posterior probabilities, and (iii) simulate sample paths. We collectively call these recursions k-segment algorithms and illustrate their utility using simulated and real examples. We also highlight the prospective and retrospective use of k-segment constraints for fitting HMMs or exploring existing model fits. Supplementary materials for this article are available online. PMID:27226674
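The k-segment recursions themselves are not available in standard toolkits, but the baseline quantities they constrain and extend - the Viterbi MAP state sequence and the forward-backward posterior marginals of an HMM - can be obtained with hmmlearn as in the sketch below. The use of hmmlearn and the toy two-state Gaussian HMM are assumptions of this sketch, not part of the article.

```python
# Sketch: Viterbi MAP sequence and posterior marginals for a toy Gaussian HMM,
# the standard HMM outputs that the k-segment algorithms build on.
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)
# Toy observations: two regimes with different means.
X = np.concatenate([rng.normal(0, 1, 100), rng.normal(4, 1, 50),
                    rng.normal(0, 1, 100)]).reshape(-1, 1)

model = hmm.GaussianHMM(n_components=2, covariance_type="diag", random_state=0)
model.fit(X)

log_prob, map_states = model.decode(X, algorithm="viterbi")   # MAP hidden state sequence
posteriors = model.predict_proba(X)                           # forward-backward marginals
n_segments = 1 + np.sum(np.diff(map_states) != 0)             # segments implied by the MAP path
print("MAP path segments:", n_segments, " log-likelihood:", round(log_prob, 1))
```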
Quantum error-correction failure distributions: Comparison of coherent and stochastic error models
NASA Astrophysics Data System (ADS)
Barnes, Jeff P.; Trout, Colin J.; Lucarelli, Dennis; Clader, B. D.
2017-06-01
We compare failure distributions of quantum error correction circuits for stochastic errors and coherent errors. We utilize a fully coherent simulation of a fault-tolerant quantum error correcting circuit for a d =3 Steane and surface code. We find that the output distributions are markedly different for the two error models, showing that no simple mapping between the two error models exists. Coherent errors create very broad and heavy-tailed failure distributions. This suggests that they are susceptible to outlier events and that mean statistics, such as pseudothreshold estimates, may not provide the key figure of merit. This provides further statistical insight into why coherent errors can be so harmful for quantum error correction. These output probability distributions may also provide a useful metric that can be utilized when optimizing quantum error correcting codes and decoding procedures for purely coherent errors.
Narayan, Manjari; Allen, Genevera I.
2016-01-01
Many complex brain disorders, such as autism spectrum disorders, exhibit a wide range of symptoms and disability. To understand how brain communication is impaired in such conditions, functional connectivity studies seek to understand individual differences in brain network structure in terms of covariates that measure symptom severity. In practice, however, functional connectivity is not observed but estimated from complex and noisy neural activity measurements. Imperfect subject network estimates can compromise subsequent efforts to detect covariate effects on network structure. We address this problem in the case of Gaussian graphical models of functional connectivity, by proposing novel two-level models that treat both subject level networks and population level covariate effects as unknown parameters. To account for imperfectly estimated subject level networks when fitting these models, we propose two related approaches—R2 based on resampling and random effects test statistics, and R3 that additionally employs random adaptive penalization. Simulation studies using realistic graph structures reveal that R2 and R3 have superior statistical power to detect covariate effects compared to existing approaches, particularly when the number of within subject observations is comparable to the size of subject networks. Using our novel models and methods to study parts of the ABIDE dataset, we find evidence of hypoconnectivity associated with symptom severity in autism spectrum disorders, in frontoparietal and limbic systems as well as in anterior and posterior cingulate cortices. PMID:27147940
NASA Astrophysics Data System (ADS)
Goldsworthy, M. J.
2012-10-01
One of the most useful tools for modelling rarefied hypersonic flows is the Direct Simulation Monte Carlo (DSMC) method. Simulator particle movement and collision calculations are combined with statistical procedures to model thermal non-equilibrium flow-fields described by the Boltzmann equation. The Macroscopic Chemistry Method for DSMC simulations was developed to simplify the inclusion of complex thermal non-equilibrium chemistry. The macroscopic approach uses statistical information which is calculated during the DSMC solution process in the modelling procedures. Here it is shown how inclusion of macroscopic information in models of chemical kinetics, electronic excitation, ionization, and radiation can enhance the capabilities of DSMC to model flow-fields where a range of physical processes occur. The approach is applied to the modelling of a 6.4 km/s nitrogen shock wave and results are compared with those from existing shock-tube experiments and continuum calculations. Reasonable agreement between the methods is obtained. The quality of the comparison is highly dependent on the set of vibrational relaxation and chemical kinetic parameters employed.
Graph theory applied to noise and vibration control in statistical energy analysis models.
Guasch, Oriol; Cortés, Lluís
2009-06-01
A fundamental aspect of noise and vibration control in statistical energy analysis (SEA) models consists in first identifying and then reducing the energy flow paths between subsystems. In this work, it is proposed to make use of some results from graph theory to address both issues. On the one hand, linear and path algebras applied to adjacency matrices of SEA graphs are used to determine the existence of any order paths between subsystems, counting and labeling them, finding extremal paths, or determining the power flow contributions from groups of paths. On the other hand, a strategy is presented that makes use of graph cut algorithms to reduce the energy flow from a source subsystem to a receiver one, modifying as few internal and coupling loss factors as possible.
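The path-counting idea - using powers of the adjacency matrix of the SEA graph to establish the existence and number of energy flow paths of any order between subsystems - can be sketched in a few lines. The adjacency matrix below is an arbitrary toy SEA connectivity, not a real model.

```python
# Sketch: count energy flow paths of order k between SEA subsystems from the adjacency
# matrix A: (A^k)[i, j] counts the walks of length k (paths that may revisit subsystems).
import numpy as np

A = np.array([  # toy SEA connectivity graph: 1 = direct coupling between subsystems
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
])

source, receiver = 0, 3
for k in range(1, 5):
    n_paths = np.linalg.matrix_power(A, k)[source, receiver]
    print(f"paths of order {k} from subsystem {source} to {receiver}: {n_paths}")
```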
DOE Office of Scientific and Technical Information (OSTI.GOV)
More, R.M.
A new statistical model (the quantum-statistical model (QSM)) was recently introduced by Kalitkin and Kuzmina for the calculation of thermodynamic properties of compressed matter. This paper examines the QSM and gives (i) a numerical QSM calculation of pressure and energy for aluminum and comparison to existing augmented-plane-wave data; (ii) display of separate kinetic, exchange, and quantum pressure terms; (iii) a study of electron density at the nucleus; (iv) a study of the effects of the Kirzhnitz-Weizsacker parameter controlling the gradient terms; (v) an analytic expansion for very high densities; and (vi) rigorous pressure theorems including a general version of the virial theorem which applies to an arbitrary microscopic volume. It is concluded that the QSM represents the most accurate and consistent theory of the Thomas-Fermi type.
A statistical model for interpreting computerized dynamic posturography data
NASA Technical Reports Server (NTRS)
Feiveson, Alan H.; Metter, E. Jeffrey; Paloski, William H.
2002-01-01
Computerized dynamic posturography (CDP) is widely used for assessment of altered balance control. CDP trials are quantified using the equilibrium score (ES), which ranges from zero to 100, as a decreasing function of peak sway angle. The problem of how best to model and analyze ESs from a controlled study is considered. The ES often exhibits a skewed distribution in repeated trials, which can lead to incorrect inference when applying standard regression or analysis of variance models. Furthermore, CDP trials are terminated when a patient loses balance. In these situations, the ES is not observable, but is assigned the lowest possible score--zero. As a result, the response variable has a mixed discrete-continuous distribution, further compromising inference obtained by standard statistical methods. Here, we develop alternative methodology for analyzing ESs under a stochastic model extending the ES to a continuous latent random variable that always exists, but is unobserved in the event of a fall. Loss of balance occurs conditionally, with probability depending on the realized latent ES. After fitting the model by a form of quasi-maximum-likelihood, one may perform statistical inference to assess the effects of explanatory variables. An example is provided, using data from the NIH/NIA Baltimore Longitudinal Study on Aging.
Huizinga, Richard J.
2008-01-01
In cooperation with the Missouri Department of Transportation, the U.S. Geological Survey determined hydrologic and hydraulic parameters for the Gasconade River at the site of a proposed bridge replacement and highway realignment of State Highway 17 near Waynesville, Missouri. Information from a discontinued streamflow-gaging station on the Gasconade River near Waynesville was used to determine streamflow statistics for analysis of the 25-, 50-, 100-, and 500-year floods at the site. Analysis of the streamflow-gaging stations on the Gasconade River upstream and downstream from Waynesville indicate that flood peaks attenuate between the upstream gaging station near Hazelgreen and the Waynesville gaging station, such that the peak discharge observed on the Gasconade River near Waynesville will be equal to or only slightly greater (7 percent or less) than that observed near Hazelgreen. A flood event occurred on the Gasconade River in March 2008, and a flood measurement was obtained near the peak at State Highway 17. The elevation of high-water marks from that event indicated it was the highest measured flood on record with a measured discharge of 95,400 cubic feet per second, and a water-surface elevation of 766.18 feet near the location of the Waynesville gaging station. The measurements obtained for the March flood resulted in a shift of the original stage-discharge relation for the Waynesville gaging station, and the streamflow statistics were modified based on the new data. A two-dimensional hydrodynamic flow model was used to simulate flow conditions on the Gasconade River in the vicinity of State Highway 17. A model was developed that represents existing (2008) conditions on State Highway 17 (the 'model of existing conditions'), and was calibrated to the floods of March 20, 2008, December 4, 1982, and April 14, 1945. Modifications were made to the model of existing conditions to create a model that represents conditions along the same reach of the Gasconade River with preliminary proposed replacement bridges and realignment of State Highway 17 (the 'model of proposed conditions'). The models of existing and proposed conditions were used to simulate the 25-, 50-, 100-, and 500-year recurrence floods, as well as the March 20, 2008 flood. Results from the model of proposed conditions show that the proposed replacement structures and realignment of State Highway 17 will result in additional backwater upstream from State Highway 17 ranging from approximately 0.18 foot for the 25-year flood to 0.32 foot for the 500-year flood. Velocity magnitudes in the proposed overflow structures were greater than in the existing structures [by as much as 4.9 feet per second in the left (west) overflow structure for the 500-year flood], and shallow, high-velocity flow occurs at the upstream edges of the abutments of the proposed overflow structures in the 100- and 500-year floods where flow overtops parts of the existing road embankment that will be left in place in the proposed scenario. Velocity magnitude in the main channel of the model of proposed conditions increased by a maximum of 1.2 feet per second over the model of existing conditions, with the maximum occurring approximately 1,500 feet downstream from existing main channel structure J-802.
Yobbi, D.K.
2000-01-01
A nonlinear least-squares regression technique for estimation of ground-water flow model parameters was applied to an existing model of the regional aquifer system underlying west-central Florida. The regression technique minimizes the differences between measured and simulated water levels. Regression statistics, including parameter sensitivities and correlations, were calculated for reported parameter values in the existing model. Optimal parameter values for selected hydrologic variables of interest are estimated by nonlinear regression. Optimal estimates of parameter values are about 140 times greater than and about 0.01 times less than reported values. Independently estimating all parameters by nonlinear regression was impossible, given the existing zonation structure and number of observations, because of parameter insensitivity and correlation. Although the model yields parameter values similar to those estimated by other methods and reproduces the measured water levels reasonably accurately, a simpler parameter structure should be considered. Some possible ways of improving model calibration are to: (1) modify the defined parameter-zonation structure by omitting and/or combining parameters to be estimated; (2) carefully eliminate observation data based on evidence that they are likely to be biased; (3) collect additional water-level data; (4) assign values to insensitive parameters, and (5) estimate the most sensitive parameters first, then, using the optimized values for these parameters, estimate the entire data set.
ERIC Educational Resources Information Center
Katsioloudis, Petros J.; Jones, Mildred V.
2018-01-01
A number of studies indicate that the use of holographic displays can influence spatial visualization ability; however, research provides inconsistent results. Considering this, a quasi-experimental study was conducted to identify the existence of statistically significant effects on sectional view drawing ability due to the impacts of holographic…
Bose condensation of nuclei in heavy ion collisions
NASA Technical Reports Server (NTRS)
Tripathi, Ram K.; Townsend, Lawrence W.
1994-01-01
Using a fully self-consistent quantum statistical model, we demonstrate the possibility of Bose condensation of nuclei in heavy ion collisions. The most favorable conditions of high densities and low temperatures are usually associated with astrophysical processes and may be difficult to achieve in heavy ion collisions. Nonetheless, some suggestions for the possible experimental verification of the existence of this phenomenon are made.
On the limitations of statistical absorption studies with the Sloan Digital Sky Surveys I-III
NASA Astrophysics Data System (ADS)
Lan, Ting-Wen; Ménard, Brice; Baron, Dalya; Johnson, Sean; Poznanski, Dovi; Prochaska, J. Xavier; O'Meara, John M.
2018-07-01
We investigate the limitations of statistical absorption measurements with the Sloan Digital Sky Survey (SDSS) optical spectroscopic surveys. We show that changes in the data reduction strategy throughout different data releases have led to a better accuracy at long wavelengths, in particular for sky line subtraction, but a degradation at short wavelengths with the emergence of systematic spectral features with an amplitude of about 1 per cent. We show that these features originate from inaccuracy in the fitting of modelled F-star spectra used for flux calibration. The best-fitting models for those stars are found to systematically overestimate the strength of metal lines and underestimate that of Lithium. We also identify the existence of artefacts due to masking and interpolation procedures at the wavelengths of the hydrogen Balmer series leading to the existence of artificial Balmer α absorption in all SDSS optical spectra. All these effects occur in the rest frame of the standard stars and therefore present Galactic longitude variations due to the rotation of the Galaxy. We demonstrate that the detection of certain weak absorption lines reported in the literature is solely due to calibration effects. Finally, we discuss new strategies to mitigate these issues.
Using existing case-mix methods to fund trauma cases.
Monakova, Julia; Blais, Irene; Botz, Charles; Chechulin, Yuriy; Picciano, Gino; Basinski, Antoni
2010-01-01
Policymakers frequently face the need to increase funding in isolated and frequently heterogeneous (clinically and in terms of resource consumption) patient subpopulations. This article presents a methodologic solution for testing the appropriateness of using existing grouping and weighting methodologies for funding subsets of patients in the scenario where a case-mix approach is preferable to a flat-rate based payment system. Using as an example the subpopulation of trauma cases of Ontario lead trauma hospitals, the statistical techniques of linear and nonlinear regression models, regression trees, and spline models were applied to examine the fit of the existing case-mix groups and reference weights for the trauma cases. The analyses demonstrated that for funding Ontario trauma cases, the existing case-mix systems can form the basis for rational and equitable hospital funding, decreasing the need to develop a different grouper for this subset of patients. This study confirmed that Injury Severity Score is a poor predictor of costs for trauma patients. Although our analysis used the Canadian case-mix classification system and cost weights, the demonstrated concept of using existing case-mix systems to develop funding rates for specific subsets of patient populations may be applicable internationally.
Modelling the participation decision and duration of sporting activity in Scotland
Eberth, Barbara; Smith, Murray D.
2010-01-01
Motivating individuals to actively engage in physical activity due to its beneficial health effects has been an integral part of Scotland's health policy agenda. The current Scottish guidelines recommend individuals participate in physical activity of moderate vigour for 30 min at least five times per week. For an individual contemplating the recommendation, decisions have to be made with regard to participation, intensity, duration and multiplicity. For the policy maker, understanding the determinants of each decision will assist in designing an intervention to effect the recommended policy. With secondary data sourced from the 2003 Scottish Health Survey (SHeS) we statistically model the combined decision process, employing a copula approach to model specification. In taking this approach the model flexibly accounts for any statistical associations that may exist between the component decisions. Thus, we model the endogenous relationship between the decision of individuals to participate in sporting activities and, amongst those who participate, the duration of time spent undertaking their chosen activities. The main focus is to establish whether dependence exists between the two random variables, assuming the vigour with which sporting activity is performed to be independent of the participation and duration decisions. We allow for a variety of controls, including demographic factors such as age and gender, economic factors such as income and educational attainment, and lifestyle factors such as smoking, alcohol consumption, healthy eating and medical history. We use the model to compare the effect of interventions designed to increase the vigour with which individuals undertake their sport, relating it to obesity as a health outcome. PMID:20640033
Shi, Jie; Collignon, Olivier; Xu, Liang; Wang, Gang; Kang, Yue; Leporé, Franco; Lao, Yi; Joshi, Anand A; Leporé, Natasha; Wang, Yalin
2015-07-01
Blindness represents a unique model to study how visual experience may shape the development of brain organization. Exploring how the structure of the corpus callosum (CC) reorganizes ensuing visual deprivation is of particular interest due to its important functional implication in vision (e.g., via the splenium of the CC). Moreover, comparing early versus late visually deprived individuals has the potential to unravel the existence of a sensitive period for reshaping the CC structure. Here, we develop a novel framework to capture a complete set of shape differences in the CC between congenitally blind (CB), late blind (LB) and sighted control (SC) groups. The CCs were manually segmented from T1-weighted brain MRI and modeled by 3D tetrahedral meshes. We statistically compared the combination of local area and thickness at each point between subject groups. Differences in area are found using surface tensor-based morphometry; thickness is estimated by tracing the streamlines in the volumetric harmonic field. Group differences were assessed on this combined measure using Hotelling's T(2) test. Interestingly, we observed that the total callosal volume did not differ between the groups. However, our fine-grained analysis reveals significant differences mostly localized around the splenium areas between both blind groups and the sighted group (general effects of blindness) and, importantly, specific dissimilarities between the LB and CB groups, illustrating the existence of a sensitive period for reorganization. The new multivariate statistics also gave better effect sizes for detecting morphometric differences, relative to other statistics. They may boost statistical power for CC morphometric analyses.
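As an illustration of the group comparison step described above, the following Python sketch implements a standard two-sample Hotelling's T² test on a bivariate (area, thickness) measure. It uses synthetic data and is not the authors' surface-based pipeline; group sizes and effect sizes are invented for the example.

```python
import numpy as np
from scipy import stats

def hotelling_t2_two_sample(x, y):
    """Two-sample Hotelling's T^2 test for equal mean vectors.

    x, y : arrays of shape (n_subjects, p) holding, e.g., the (area, thickness)
    pair measured at one surface point per subject.
    Returns the T^2 statistic, the equivalent F statistic and its p-value.
    """
    n1, p = x.shape
    n2, _ = y.shape
    diff = x.mean(axis=0) - y.mean(axis=0)
    # Pooled covariance of the two groups
    s_pooled = ((n1 - 1) * np.cov(x, rowvar=False) +
                (n2 - 1) * np.cov(y, rowvar=False)) / (n1 + n2 - 2)
    t2 = (n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(s_pooled, diff)
    # Convert T^2 to an F statistic
    f_stat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    p_value = stats.f.sf(f_stat, p, n1 + n2 - p - 1)
    return t2, f_stat, p_value

# Toy example: (area, thickness) at one vertex for two hypothetical groups
rng = np.random.default_rng(0)
blind = rng.normal([1.0, 2.5], [0.2, 0.3], size=(20, 2))
sighted = rng.normal([1.1, 2.7], [0.2, 0.3], size=(25, 2))
print(hotelling_t2_two_sample(blind, sighted))
```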
NASA Astrophysics Data System (ADS)
Ionita, M.; Grosfeld, K.; Scholz, P.; Lohmann, G.
2016-12-01
Sea ice in both Polar Regions is an important indicator for the expression of global climate change and its polar amplification. Consequently, broad interest exists in information on sea ice, its coverage, variability and long-term change. Knowledge of sea ice requires high quality data on ice extent, thickness and its dynamics. However, its predictability depends on various climate parameters and conditions. In order to provide insights into the potential development of a monthly/seasonal signal, we developed a robust statistical model based on ocean heat content, sea surface temperature and atmospheric variables to calculate an estimate of the September minimum sea ice extent for every year. Although previous statistical attempts at monthly/seasonal forecasts of the September sea ice minimum show a relatively reduced skill, here it is shown that more than 97% (r = 0.98) of the September sea ice extent can be predicted three months in advance by using the previous months' conditions via a multiple linear regression model based on global sea surface temperature (SST), mean sea level pressure (SLP), air temperature at 850 hPa (TT850), surface winds and sea ice extent persistence. The statistical model is based on the identification of regions with stable teleconnections between the predictors (climatological parameters) and the predictand (here sea ice extent). The results based on our statistical model contribute to the sea ice prediction network for the sea ice outlook report (https://www.arcus.org/sipn) and could provide a tool for identifying relevant regions and climate parameters that are important for the sea ice development in the Arctic and for detecting sensitive and critical regions in global coupled climate models with a focus on sea ice formation.
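The multiple linear regression underlying such a seasonal forecast can be sketched as follows. This is a hedged illustration on synthetic data: the predictor indices, record length and coefficients are placeholders, not the teleconnection regions or skill of the study above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n_years = 38  # length of the record is illustrative only

# Hypothetical June predictor indices (in practice, averages over regions with
# stable teleconnections would be used); here they are random placeholders.
X = np.column_stack([
    rng.normal(size=n_years),   # SST index
    rng.normal(size=n_years),   # sea level pressure index
    rng.normal(size=n_years),   # T850 index
    rng.normal(size=n_years),   # surface wind index
    rng.normal(size=n_years),   # previous sea ice extent (persistence)
])
# Synthetic September extent built from the predictors plus noise
beta = np.array([-0.6, 0.3, -0.4, 0.2, 0.7])
y = X @ beta + 0.3 * rng.normal(size=n_years)

model = LinearRegression().fit(X, y)
r = np.corrcoef(model.predict(X), y)[0, 1]
print(f"in-sample correlation r = {r:.2f}")  # cross-validation is needed in practice
```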
Chen, Gang; Glen, Daniel R.; Saad, Ziad S.; Hamilton, J. Paul; Thomason, Moriah E.; Gotlib, Ian H.; Cox, Robert W.
2011-01-01
Vector autoregression (VAR) and structural equation modeling (SEM) are two popular brain-network modeling tools. VAR, which is a data-driven approach, assumes that connected regions exert time-lagged influences on one another. In contrast, the hypothesis-driven SEM is used to validate an existing connectivity model where connected regions have contemporaneous interactions among them. We present the two models in detail and discuss their applicability to FMRI data, and interpretational limits. We also propose a unified approach that models both lagged and contemporaneous effects. The unifying model, structural vector autoregression (SVAR), may improve statistical and explanatory power, and avoids some prevalent pitfalls that can occur when VAR and SEM are utilized separately. PMID:21975109
Exploring Explanations of Subglacial Bedform Sizes Using Statistical Models
Kougioumtzoglou, Ioannis A.; Stokes, Chris R.; Smith, Michael J.; Clark, Chris D.; Spagnolo, Matteo S.
2016-01-01
Sediments beneath modern ice sheets exert a key control on their flow, but are largely inaccessible except through geophysics or boreholes. In contrast, palaeo-ice sheet beds are accessible, and typically characterised by numerous bedforms. However, the interaction between bedforms and ice flow is poorly constrained and it is not clear how bedform sizes might reflect ice flow conditions. To better understand this link we present a first exploration of a variety of statistical models to explain the size distribution of some common subglacial bedforms (i.e., drumlins, ribbed moraine, MSGL). By considering a range of models, constructed to reflect key aspects of the physical processes, it is possible to infer that the size distributions are most effectively explained when the dynamics of ice-water-sediment interaction associated with bedform growth is fundamentally random. A ‘stochastic instability’ (SI) model, which integrates random bedform growth and shrinking through time with exponential growth, is preferred and is consistent with other observations of palaeo-bedforms and geophysical surveys of active ice sheets. Furthermore, we give a proof-of-concept demonstration that our statistical approach can bridge the gap between geomorphological observations and physical models, directly linking measurable size-frequency parameters to properties of ice sheet flow (e.g., ice velocity). Moreover, statistically developing existing models as proposed allows quantitative predictions to be made about sizes, making the models testable; a first illustration of this is given for a hypothesised repeat geophysical survey of bedforms under active ice. Thus, we further demonstrate the potential of size-frequency distributions of subglacial bedforms to assist the elucidation of subglacial processes and better constrain ice sheet models. PMID:27458921
Inferring general relations between network characteristics from specific network ensembles.
Cardanobile, Stefano; Pernice, Volker; Deger, Moritz; Rotter, Stefan
2012-01-01
Different network models have been suggested for the topology underlying complex interactions in natural systems. These models are aimed at replicating specific statistical features encountered in real-world networks. However, it is rarely considered to which degree the results obtained for one particular network class can be extrapolated to real-world networks. We address this issue by comparing different classical and more recently developed network models with respect to their ability to generate networks with large structural variability. In particular, we consider the statistical constraints which the respective construction scheme imposes on the generated networks. After having identified the most variable networks, we address the issue of which constraints are common to all network classes and are thus suitable candidates for being generic statistical laws of complex networks. In fact, we find that generic, not model-related dependencies between different network characteristics do exist. This makes it possible to infer global features from local ones using regression models trained on networks with high generalization power. Our results confirm and extend previous findings regarding the synchronization properties of neural networks. Our method seems especially relevant for large networks, which are difficult to map completely, like the neural networks in the brain. The structure of such large networks cannot be fully sampled with the present technology. Our approach provides a method to estimate global properties of under-sampled networks in good approximation. Finally, we demonstrate on three different data sets (C. elegans neuronal network, R. prowazekii metabolic network, and a network of synonyms extracted from Roget's Thesaurus) that real-world networks have statistical relations compatible with those obtained using regression models.
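A minimal sketch of the idea of inferring global network properties from local characteristics, assuming the networkx and scikit-learn libraries: ensembles of classical network models are generated, local features (clustering, degree statistics) serve as predictors, and a global feature (average shortest path length) is regressed on them. The feature set and model choices are illustrative, not those of the study.

```python
import numpy as np
import networkx as nx
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
rows, targets = [], []

# Build a mixed ensemble of classical network models
for _ in range(60):
    kind = rng.integers(3)
    if kind == 0:
        g = nx.erdos_renyi_graph(100, 0.06, seed=int(rng.integers(1_000_000)))
    elif kind == 1:
        g = nx.barabasi_albert_graph(100, 3, seed=int(rng.integers(1_000_000)))
    else:
        g = nx.watts_strogatz_graph(100, 6, 0.2, seed=int(rng.integers(1_000_000)))
    if not nx.is_connected(g):
        g = g.subgraph(max(nx.connected_components(g), key=len)).copy()
    degrees = np.array([d for _, d in g.degree()])
    # Local features that could be estimated from a partial sample of the network
    rows.append([nx.average_clustering(g), degrees.mean(), degrees.std()])
    # Global feature that is hard to measure in under-sampled networks
    targets.append(nx.average_shortest_path_length(g))

X, y = np.array(rows), np.array(targets)
reg = LinearRegression().fit(X, y)
print("R^2 on the training ensemble:", round(reg.score(X, y), 3))
```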
Khana, Diba; Rossen, Lauren M; Hedegaard, Holly; Warner, Margaret
2018-01-01
Hierarchical Bayes models have been used in disease mapping to examine small-scale geographic variation. State-level geographic variation for less common mortality outcomes has been reported; however, county-level variation is rarely examined. Due to concerns about statistical reliability and confidentiality, county-level mortality rates based on fewer than 20 deaths are suppressed under Division of Vital Statistics, National Center for Health Statistics (NCHS) statistical reliability criteria, precluding an examination of spatio-temporal variation in less common causes of mortality, such as suicide rates (SRs), at the county level using direct estimates. Existing Bayesian spatio-temporal modeling strategies can be applied via Integrated Nested Laplace Approximation (INLA) in R to a large number of rare mortality outcomes to enable examination of spatio-temporal variations on smaller geographic scales such as counties. This method allows examination of spatio-temporal variation across the entire U.S., even where the data are sparse. We used mortality data from 2005-2015 to explore spatio-temporal variation in SRs, as one particular application of the Bayesian spatio-temporal modeling strategy in R-INLA to predict year- and county-specific SRs. Specifically, hierarchical Bayesian spatio-temporal models were implemented with spatially structured and unstructured random effects, correlated time effects, time-varying confounders and space-time interaction terms in the software R-INLA, borrowing strength across both counties and years to produce smoothed county-level SRs. Model-based estimates of SRs were mapped to explore geographic variation.
NASA Astrophysics Data System (ADS)
Fripp, Jurgen; Crozier, Stuart; Warfield, Simon K.; Ourselin, Sébastien
2006-03-01
Subdivision surfaces and parameterization are desirable for many algorithms that are commonly used in Medical Image Analysis. However, extracting an accurate surface and parameterization can be difficult for many anatomical objects of interest, due to noisy segmentations and the inherent variability of the object. The thin cartilages of the knee are an example of this, especially after damage is incurred from injuries or conditions like osteoarthritis. As a result, the cartilages can have different topologies or exist in multiple pieces. In this paper we present a topology preserving (genus 0) subdivision-based parametric deformable model that is used to extract the surfaces of the patella and tibial cartilages in the knee. These surfaces have minimal thickness in areas without cartilage. The algorithm inherently incorporates several desirable properties, including: shape based interpolation, sub-division remeshing and parameterization. To illustrate the usefulness of this approach, the surfaces and parameterizations of the patella cartilage are used to generate a 3D statistical shape model.
Dendritic growth model of multilevel marketing
NASA Astrophysics Data System (ADS)
Pang, James Christopher S.; Monterola, Christopher P.
2017-02-01
Biologically inspired dendritic network growth is utilized to model the evolving connections of a multilevel marketing (MLM) enterprise. Starting from agents at random spatial locations, a network is formed by minimizing a distance cost function controlled by a parameter, termed the balancing factor bf, that weighs the wiring and the path length costs of connection. The paradigm is compared to actual MLM membership data and is shown to be successful in statistically capturing the membership distribution, better than the previously reported agent-based preferential attachment or analytic branching process models. Moreover, it recovers the known empirical statistics of previously studied MLM, specifically: (i) a membership distribution characterized by the existence of peak levels indicating limited growth, and (ii) an income distribution obeying the 80-20 Pareto principle. A wide range of income distributions, from uniform to Pareto to a "winner-take-all" kind, are also modeled by varying bf. Finally, the robustness of our dendritic growth paradigm to random agent removals is explored and its implications for MLM income distributions are discussed.
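A simplified sketch of a cost-balancing growth rule in the spirit of the model described above: each new agent attaches to the existing node that minimizes a weighted sum of wiring cost and path length to the root, controlled by a balancing factor bf. The exact cost function and parameters of the published model may differ; this is an assumption-laden illustration.

```python
import numpy as np

def grow_network(n_agents=200, bf=0.4, seed=0):
    """Greedy spatial tree growth: each new agent attaches to the existing
    node that minimizes  bf * (Euclidean wiring cost) + (path length to root).
    This is a simplified sketch of a cost-balancing dendritic growth rule;
    the published model's cost function may differ in detail."""
    rng = np.random.default_rng(seed)
    pos = rng.random((n_agents, 2))          # random spatial locations
    parent = np.full(n_agents, -1)           # tree structure
    path_len = np.zeros(n_agents)            # graph distance to the root (node 0)
    for i in range(1, n_agents):
        d = np.linalg.norm(pos[:i] - pos[i], axis=1)   # wiring costs to existing nodes
        cost = bf * d + path_len[:i]
        j = int(np.argmin(cost))
        parent[i] = j
        path_len[i] = path_len[j] + 1
    return parent, path_len

parent, path_len = grow_network(bf=0.4)
# The "level" of a member is its depth in the tree; its distribution mimics an
# MLM membership distribution for a given balancing factor.
levels, counts = np.unique(path_len, return_counts=True)
print(dict(zip(levels.astype(int), counts)))
```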
Morales, Daniel R; Flynn, Rob; Zhang, Jianguo; Trucco, Emmanuel; Quint, Jennifer K; Zutis, Kris
2018-05-01
Several models for predicting the risk of death in people with chronic obstructive pulmonary disease (COPD) exist but have not undergone large scale validation in primary care. The objective of this study was to externally validate these models using statistical and machine learning approaches. We used a primary care COPD cohort identified using data from the UK Clinical Practice Research Datalink. Age-standardised mortality rates were calculated for the population by gender and discrimination of ADO (age, dyspnoea, airflow obstruction), COTE (COPD-specific comorbidity test), DOSE (dyspnoea, airflow obstruction, smoking, exacerbations) and CODEX (comorbidity, dyspnoea, airflow obstruction, exacerbations) at predicting death over 1-3 years measured using logistic regression and a support vector machine learning (SVM) method of analysis. The age-standardised mortality rate was 32.8 (95%CI 32.5-33.1) and 25.2 (95%CI 25.4-25.7) per 1000 person years for men and women respectively. Complete data were available for 54879 patients to predict 1-year mortality. ADO performed the best (c-statistic of 0.730) compared with DOSE (c-statistic 0.645), COTE (c-statistic 0.655) and CODEX (c-statistic 0.649) at predicting 1-year mortality. Discrimination of ADO and DOSE improved at predicting 1-year mortality when combined with COTE comorbidities (c-statistic 0.780 ADO + COTE; c-statistic 0.727 DOSE + COTE). Discrimination did not change significantly over 1-3 years. Comparable results were observed using SVM. In primary care, ADO appears superior at predicting death in COPD. Performance of ADO and DOSE improved when combined with COTE comorbidities suggesting better models may be generated with additional data facilitated using novel approaches. Copyright © 2018. Published by Elsevier Ltd.
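The discrimination comparison reported above boils down to computing c-statistics (areas under the ROC curve) for competing classifiers. The following sketch does this on synthetic data with scikit-learn; the features, cohort and models are stand-ins, not the ADO/DOSE/COTE indices or the study's SVM configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.svm import SVC

# Synthetic stand-in for a cohort: columns play the role of index components
# (age, dyspnoea score, airflow obstruction, comorbidities, ...)
X, y = make_classification(n_samples=5000, n_features=6, n_informative=4,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

logit = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
svm = SVC(kernel="rbf", probability=True, random_state=0).fit(X_tr, y_tr)

# The c-statistic for a binary outcome is the area under the ROC curve
print("logistic c-statistic:", round(roc_auc_score(y_te, logit.predict_proba(X_te)[:, 1]), 3))
print("SVM c-statistic:     ", round(roc_auc_score(y_te, svm.predict_proba(X_te)[:, 1]), 3))
```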
Empirical flow parameters : a tool for hydraulic model validity
Asquith, William H.; Burley, Thomas E.; Cleveland, Theodore G.
2013-01-01
The objectives of this project were (1) To determine and present from existing data in Texas, relations between observed stream flow, topographic slope, mean section velocity, and other hydraulic factors, to produce charts such as Figure 1 and to produce empirical distributions of the various flow parameters to provide a methodology to "check if model results are way off!"; (2) To produce a statistical regional tool to estimate mean velocity or other selected parameters for storm flows or other conditional discharges at ungauged locations (most bridge crossings) in Texas to provide a secondary way to compare such values to a conventional hydraulic modeling approach; and (3) To present ancillary values such as Froude number, stream power, Rosgen channel classification, sinuosity, and other selected characteristics (readily determinable from existing data) to provide additional information to engineers concerned with the hydraulic-soil-foundation component of transportation infrastructure.
A Statistical Approach for Testing Cross-Phenotype Effects of Rare Variants
Broadaway, K. Alaine; Cutler, David J.; Duncan, Richard; Moore, Jacob L.; Ware, Erin B.; Jhun, Min A.; Bielak, Lawrence F.; Zhao, Wei; Smith, Jennifer A.; Peyser, Patricia A.; Kardia, Sharon L.R.; Ghosh, Debashis; Epstein, Michael P.
2016-01-01
Increasing empirical evidence suggests that many genetic variants influence multiple distinct phenotypes. When cross-phenotype effects exist, multivariate association methods that consider pleiotropy are often more powerful than univariate methods that model each phenotype separately. Although several statistical approaches exist for testing cross-phenotype effects for common variants, there is a lack of similar tests for gene-based analysis of rare variants. In order to fill this important gap, we introduce a statistical method for cross-phenotype analysis of rare variants using a nonparametric distance-covariance approach that compares similarity in multivariate phenotypes to similarity in rare-variant genotypes across a gene. The approach can accommodate both binary and continuous phenotypes and further can adjust for covariates. Our approach yields a closed-form test whose significance can be evaluated analytically, thereby improving computational efficiency and permitting application on a genome-wide scale. We use simulated data to demonstrate that our method, which we refer to as the Gene Association with Multiple Traits (GAMuT) test, provides increased power over competing approaches. We also illustrate our approach using exome-chip data from the Genetic Epidemiology Network of Arteriopathy. PMID:26942286
Pang, Kun; Sun, Xiao-Wen; Liu, Shi-Bo; Li, Wei-Guo; Shao, Yi; Zhuo, Jian; Wei, Hai-Bin; Xia, Shu-Jie
2012-11-13
To explore the application of the thulium laser (2 µm laser) in managing the bladder cuff during nephroureterectomy for upper urinary tract urothelial carcinoma (UUT-UC). The medical records of 56 patients undergoing nephroureterectomy at our hospital were reviewed retrospectively. The operative indicators, oncologic outcomes and clinicopathologic data were compared among the groups of open surgery (Group A), electric coagulation (Group B) and the thulium laser technique (Group C). Furthermore, a burst pressure measurement model was built to measure the different burst pressures of the sealed distal ureter. Follow-up results: when operative duration, intraoperative blood loss volume, removal time of the drainage tube, removal time of the catheter and hospital stay were compared among the three groups, Group A showed no statistically significant differences from Groups B and C in the removal times of the drainage tube and catheter, but significant differences existed in operative duration, intraoperative blood loss volume and hospital stay ((232 ± 52) vs (148 ± 47) and (130 ± 49) min, (358 ± 81) vs (136 ± 74) and (145 ± 70) ml, (13 ± 3) vs (11 ± 4) and (10 ± 3) d, all P < 0.05). No statistically significant differences existed between Groups B and C in any of the above indicators. Burst pressure measurement results: no statistically significant difference existed between Groups C and B ((116 ± 21) vs (139 ± 32) cm H2O, P > 0.05). For the surgical treatment of UUT-UC, the thulium laser technique does not differ from open surgery in operative indicators and oncologic outcomes; in addition, it offers improved spatial beam quality and more precise tissue incision.
Sculpting bespoke mountains: Determining free energies with basis expansions
NASA Astrophysics Data System (ADS)
Whitmer, Jonathan K.; Fluitt, Aaron M.; Antony, Lucas; Qin, Jian; McGovern, Michael; de Pablo, Juan J.
2015-07-01
The intriguing behavior of a wide variety of physical systems, ranging from amorphous solids or glasses to proteins, is a direct manifestation of underlying free energy landscapes riddled with local minima separated by large barriers. Exploring such landscapes has arguably become one of statistical physics's great challenges. A new method is proposed here for uniform sampling of rugged free energy surfaces. The method, which relies on special Green's functions to approximate the Dirac delta function, improves significantly on existing simulation techniques by providing a boundary-agnostic approach that is capable of mapping complex features in multidimensional free energy surfaces. The usefulness of the proposed approach is established in the context of a simple model glass former and model proteins, demonstrating improved convergence and accuracy over existing methods.
Aksamija, Goran; Mulabdic, Adi; Rasic, Ismar; Muhovic, Samir; Gavric, Igor
2011-01-01
Polytrauma is defined as an injury affecting at least two different organ systems or body regions, with at least one injury being life-threatening. Given the multilevel model of care for polytrauma patients within KCUS, weaknesses in the management of this category of patients are inevitable. The aim was to determine the dynamics of the existing procedures in the treatment of polytrauma patients on admission to KCUS and, based on statistical analysis of the applied variables, to define the factors that influence the final outcome of treatment and their mutual relationships, which may help eliminate the flaws in the approach to the problem. The study was based on 263 polytrauma patients. Parametric and non-parametric statistical methods were used. Basic statistics were calculated; based on the calculated parameters, multicorrelation analysis, image analysis, discriminant analysis and multifactorial analysis were used to achieve the research objectives. From the universe of variables for this study we selected a sample of n = 25 variables, of which the first two are modular; the others belong to the common measurement space (n = 23) and are defined in this paper as system variables for the methods, procedures and assessments of polytrauma patients. After the multicorrelation analysis, and since the image analysis gave reliable measurement results, we proceeded to the analysis of eigenvalues, that is, to defining the factors that provide information about the system, in order to address the shortcomings of the existing model and its correlation with treatment outcome. The study singled out the essential factors that determine the current organizational model of care, which may affect treatment and improve the outcome of polytrauma patients. This analysis showed the maximum correlative relationships between these practices and contributed to the development of guidelines defined by the isolated factors.
NASA Astrophysics Data System (ADS)
Uddameri, V.
2007-01-01
Reliable forecasts of monthly and quarterly fluctuations in groundwater levels are necessary for short- and medium-term planning and management of aquifers to ensure proper service of seasonal demands within a region. Development of physically based transient mathematical models at this time scale poses considerable challenges due to lack of suitable data and other uncertainties. Artificial neural networks (ANN) possess flexible mathematical structures and are capable of mapping highly nonlinear relationships. Feed-forward neural network models were constructed and trained using the back-percolation algorithm to forecast monthly and quarterly time-series water levels at a well that taps into the deeper Evangeline formation of the Gulf Coast aquifer in Victoria, TX. Unlike unconfined formations, no causal relationships exist between water levels and hydro-meteorological variables measured in the vicinity of the well. As such, an endogenous forecasting model using dummy variables to capture short-term seasonal fluctuations and longer-term (decadal) trends was constructed. The root mean square error, mean absolute deviation and correlation coefficient (R) were noted to be 1.40 m, 0.33 m and 0.77, respectively, for an evaluation dataset of quarterly measurements and 1.17 m, 0.46 m, and 0.88 for an evaluative monthly dataset not used to train or test the model. These statistics were better for the ANN model than for models developed using statistical regression techniques.
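A hedged sketch of an endogenous neural-network forecast with seasonal dummy variables and a trend term, using scikit-learn's MLPRegressor on a synthetic quarterly water-level series (the original study's network architecture, training algorithm and data are not reproduced here):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error

rng = np.random.default_rng(3)
n_quarters = 120
t = np.arange(n_quarters)
quarter = t % 4

# Endogenous design matrix: quarterly dummies plus a slow (decadal) trend term
X = np.column_stack([np.eye(4)[quarter], t / n_quarters])
# Synthetic water level: seasonal cycle + downward trend + noise
y = 30 + 1.5 * np.sin(2 * np.pi * t / 4) - 3.0 * t / n_quarters + rng.normal(0, 0.5, n_quarters)

train, test = slice(0, 100), slice(100, None)
ann = MLPRegressor(hidden_layer_sizes=(8,), solver="lbfgs", max_iter=5000, random_state=0)
ann.fit(X[train], y[train])
pred = ann.predict(X[test])

rmse = mean_squared_error(y[test], pred) ** 0.5
mad = mean_absolute_error(y[test], pred)
r = np.corrcoef(pred, y[test])[0, 1]
print(f"RMSE={rmse:.2f} m  MAD={mad:.2f} m  R={r:.2f}")
```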
The basis function approach for modeling autocorrelation in ecological data
Hefley, Trevor J.; Broms, Kristin M.; Brost, Brian M.; Buderman, Frances E.; Kay, Shannon L.; Scharf, Henry; Tipton, John; Williams, Perry J.; Hooten, Mevin B.
2017-01-01
Analyzing ecological data often requires modeling the autocorrelation created by spatial and temporal processes. Many seemingly disparate statistical methods used to account for autocorrelation can be expressed as regression models that include basis functions. Basis functions also enable ecologists to modify a wide range of existing ecological models in order to account for autocorrelation, which can improve inference and predictive accuracy. Furthermore, understanding the properties of basis functions is essential for evaluating the fit of spatial or time-series models, detecting a hidden form of collinearity, and analyzing large data sets. We present important concepts and properties related to basis functions and illustrate several tools and techniques ecologists can use when modeling autocorrelation in ecological data.
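A minimal illustration of the basis-function idea: adding Gaussian radial basis functions of time to a regression absorbs smooth temporal autocorrelation that would otherwise remain in the residuals. The basis type, number of knots and data below are illustrative assumptions, not the paper's examples.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def gaussian_basis(t, n_knots=10, scale=None):
    """Evaluate Gaussian radial basis functions of time at points t."""
    knots = np.linspace(t.min(), t.max(), n_knots)
    if scale is None:
        scale = knots[1] - knots[0]
    return np.exp(-0.5 * ((t[:, None] - knots[None, :]) / scale) ** 2)

rng = np.random.default_rng(7)
t = np.arange(200.0)
x = rng.normal(size=200)                          # an ecological covariate
trend = np.sin(t / 25.0)                          # smooth autocorrelated process
y = 1.0 + 0.8 * x + trend + rng.normal(0, 0.3, 200)

# Naive model (covariate only) vs. basis-function model (covariate + time basis)
X_naive = x[:, None]
X_basis = np.column_stack([x, gaussian_basis(t)])
for name, X in [("covariate only", X_naive), ("with basis functions", X_basis)]:
    fit = LinearRegression().fit(X, y)
    resid = y - fit.predict(X)
    lag1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
    print(f"{name:22s} slope={fit.coef_[0]:.2f}  lag-1 residual autocorr={lag1:.2f}")
```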
Sturm, Marc; Quinten, Sascha; Huber, Christian G.; Kohlbacher, Oliver
2007-01-01
We propose a new model for predicting the retention time of oligonucleotides. The model is based on ν support vector regression using features derived from base sequence and predicted secondary structure of oligonucleotides. Because of the secondary structure information, the model is applicable even at relatively low temperatures where the secondary structure is not suppressed by thermal denaturing. This makes the prediction of oligonucleotide retention time for arbitrary temperatures possible, provided that the target temperature lies within the temperature range of the training data. We describe different possibilities of feature calculation from base sequence and secondary structure, present the results and compare our model to existing models. PMID:17567619
Managing heteroscedasticity in general linear models.
Rosopa, Patrick J; Schaffer, Meline M; Schroeder, Amber N
2013-09-01
Heteroscedasticity refers to a phenomenon where data violate a statistical assumption. This assumption is known as homoscedasticity. When the homoscedasticity assumption is violated, this can lead to increased Type I error rates or decreased statistical power. Because this can adversely affect substantive conclusions, the failure to detect and manage heteroscedasticity could have serious implications for theory, research, and practice. In addition, heteroscedasticity is not uncommon in the behavioral and social sciences. Thus, in the current article, we synthesize extant literature in applied psychology, econometrics, quantitative psychology, and statistics, and we offer recommendations for researchers and practitioners regarding available procedures for detecting heteroscedasticity and mitigating its effects. In addition to discussing the strengths and weaknesses of various procedures and comparing them in terms of existing simulation results, we describe a 3-step data-analytic process for detecting and managing heteroscedasticity: (a) fitting a model based on theory and saving residuals, (b) the analysis of residuals, and (c) statistical inferences (e.g., hypothesis tests and confidence intervals) involving parameter estimates. We also demonstrate this data-analytic process using an illustrative example. Overall, detecting violations of the homoscedasticity assumption and mitigating its biasing effects can strengthen the validity of inferences from behavioral and social science data.
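The three-step process can be sketched with statsmodels as follows, under the assumption of a simple simulated regression whose error variance increases with the predictor; the Breusch-Pagan test and HC3 robust standard errors stand in for the broader menu of procedures discussed in the article.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(11)
n = 500
x = rng.uniform(0, 10, n)
# Error variance grows with x, i.e. the data are heteroscedastic by construction
y = 2.0 + 0.5 * x + rng.normal(0, 0.2 + 0.15 * x, n)
X = sm.add_constant(x)

# Step (a): fit the theory-based model and save residuals
ols = sm.OLS(y, X).fit()
resid = ols.resid

# Step (b): analyze the residuals, e.g. with the Breusch-Pagan test
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(resid, X)
print(f"Breusch-Pagan LM p-value: {lm_pvalue:.4f}")

# Step (c): inference with heteroscedasticity-consistent (HC3) standard errors
robust = sm.OLS(y, X).fit(cov_type="HC3")
print("OLS SE:", np.round(ols.bse, 4))
print("HC3 SE:", np.round(robust.bse, 4))
print("HC3 95% CI:\n", robust.conf_int())
```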
Power Enhancement in High Dimensional Cross-Sectional Tests
Fan, Jianqing; Liao, Yuan; Yao, Jiawei
2016-01-01
We propose a novel technique to boost the power of testing a high-dimensional vector H_0 : θ = 0 against sparse alternatives where the null hypothesis is violated only by a couple of components. Existing tests based on quadratic forms such as the Wald statistic often suffer from low powers due to the accumulation of errors in estimating high-dimensional parameters. More powerful tests for sparse alternatives such as thresholding and extreme-value tests, on the other hand, require either stringent conditions or bootstrap to derive the null distribution and often suffer from size distortions due to the slow convergence. Based on a screening technique, we introduce a “power enhancement component”, which is zero under the null hypothesis with high probability, but diverges quickly under sparse alternatives. The proposed test statistic combines the power enhancement component with an asymptotically pivotal statistic, and strengthens the power under sparse alternatives. The null distribution does not require stringent regularity conditions, and is completely determined by that of the pivotal statistic. As specific applications, the proposed methods are applied to testing the factor pricing models and validating the cross-sectional independence in panel data models. PMID:26778846
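A deliberately simplified numeric sketch of the power-enhancement idea: a screening-based component that is close to zero under the null but diverges under sparse alternatives is added to a standardized quadratic (asymptotically pivotal) statistic. The threshold constants below are illustrative assumptions and do not reproduce the paper's exact construction or theory.

```python
import numpy as np

def power_enhanced_test(theta_hat, se, delta=None):
    """Simplified sketch of a power-enhancement construction.

    J1: standardized quadratic (Wald-type) statistic, treated as pivotal.
    J0: screening component, near zero under H_0 but diverging when a few
        components of theta are large. The threshold follows a sqrt(log p)
        style scaling; the constants are illustrative only.
    """
    p = theta_hat.size
    z = theta_hat / se
    # Pivotal part: standardized sum of squared z-scores
    j1 = (np.sum(z ** 2) - p) / np.sqrt(2.0 * p)
    # Screening threshold and power-enhancement component
    if delta is None:
        delta = np.sqrt(np.log(np.log(p))) * np.sqrt(np.log(p))
    selected = np.abs(z) > delta
    j0 = np.sqrt(p) * np.sum(z[selected] ** 2)
    return j0 + j1, j1, int(selected.sum())

rng = np.random.default_rng(5)
p = 1000
se = np.full(p, 0.1)
theta_null = rng.normal(0, 0.1, p)              # H_0 true
theta_sparse = theta_null.copy()
theta_sparse[:3] += 0.8                         # sparse alternative: 3 signals

for name, th in [("null", theta_null), ("sparse alt", theta_sparse)]:
    j, j1, k = power_enhanced_test(th, se)
    print(f"{name:10s}  J={j:8.2f}  J1={j1:6.2f}  #screened={k}")
```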
Scaling Laws in Canopy Flows: A Wind-Tunnel Analysis
NASA Astrophysics Data System (ADS)
Segalini, Antonio; Fransson, Jens H. M.; Alfredsson, P. Henrik
2013-08-01
An analysis of velocity statistics and spectra measured above a wind-tunnel forest model is reported. Several measurement stations downstream of the forest edge have been investigated and it is observed that, while the mean velocity profile adjusts quickly to the new canopy boundary condition, the turbulence lags behind and shows a continuous penetration towards the free stream along the canopy model. The statistical profiles illustrate this growth and do not collapse when plotted as a function of the vertical coordinate. However, when the statistics are plotted as a function of the local mean velocity (normalized with a characteristic velocity scale), they do collapse, independently of the streamwise position and free-stream velocity. A new scaling for the spectra of all three velocity components is proposed based on the velocity variance and integral time scale. This normalization improves the collapse of the spectra compared to existing scalings adopted in atmospheric measurements, and allows the determination of a universal function that provides the velocity spectrum. Furthermore, a comparison of the proposed scaling laws for two different canopy densities is shown, demonstrating that the vertical velocity variance is the statistical quantity most sensitive to the characteristics of the canopy roughness.
NASA Astrophysics Data System (ADS)
Kato, Takeyoshi; Sugimoto, Hiroyuki; Suzuoki, Yasuo
We established a procedure for estimating regional electricity demand and the regional potential capacity of distributed generators (DGs) by using a grid-square statistics data set. A photovoltaic power system (PV system) for residential use and a co-generation system (CGS) for both residential and commercial use were taken into account. As an example, results for Aichi prefecture are presented in this paper. Statistical data on the number of households by family type and the number of employees by business category for about 4000 grid squares of 1 km × 1 km were used to estimate the floor space or the electricity demand distribution. The rooftop area available for installing PV systems was also estimated with the grid-square statistics data set. Considering the relation between the capacity of an existing CGS and a scale index of the building where it is installed, the potential capacity of CGS was estimated for three business categories, i.e., hotels, hospitals and stores. In some regions, the potential capacity of PV systems was estimated to be about 10,000 kW/km2, which corresponds to the density of existing areas with intensive installation of PV systems. Finally, we discussed the ratio of the regional potential capacity of DGs to the regional maximum electricity demand in order to deduce the appropriate capacity of DGs in a model of a future electricity distribution system.
NASA Astrophysics Data System (ADS)
Raulier, Jonathan; Dansereau, Véronique; Fichefet, Thierry; Legat, Vincent; Weiss, Jérôme
2017-04-01
Sea ice is a highly dynamical environment characterized by a dense mesh of fractures or leads, constantly opening and closing over short time scales. This characteristic geomorphology is linked to the existence of linear kinematic features, which consist of quasi-linear patterns emerging from the observed strain rate field of sea ice. Standard rheologies used in most state-of-the-art sea ice models, like the well-known elastic-viscous-plastic rheology, are thought to misrepresent those linear kinematic features and the observed statistical distribution of deformation rates. Dedicated rheologies built to capture the processes known to be at the origin of the formation of leads have been developed but still need evaluation on the global scale. One of them, based on a Maxwell elasto-brittle formulation, is being integrated into the NEMO-LIM3 global ocean-sea ice model (www.nemo-ocean.eu; www.elic.ucl.ac.be/lim). In the present study, we compare the results of the sea ice model LIM3 obtained with two different rheologies: the elastic-viscous-plastic rheology commonly used in LIM3 and a Maxwell elasto-brittle rheology. This comparison is focused on the statistical characteristics of the simulated deformation rates and on the ability of the model to reproduce the existence of leads within the ice pack. The impact of the lead representation on fluxes between ice, atmosphere and ocean is also assessed.
Webster, R J; Williams, A; Marchetti, F; Yauk, C L
2018-07-01
Mutations in germ cells pose potential genetic risks to offspring. However, de novo mutations are rare events that are spread across the genome and are difficult to detect. Thus, studies in this area have generally been under-powered, and no human germ cell mutagen has been identified. Whole Genome Sequencing (WGS) of human pedigrees has been proposed as an approach to overcome these technical and statistical challenges. WGS enables analysis of a much wider breadth of the genome than traditional approaches. Here, we performed power analyses to determine the feasibility of using WGS in human families to identify germ cell mutagens. Different statistical models were compared in the power analyses (ANOVA and multiple regression for one-child families, and mixed effect model sampling between two to four siblings per family). Assumptions were made based on parameters from the existing literature, such as the mutation-by-paternal age effect. We explored two scenarios: a constant effect due to an exposure that occurred in the past, and an accumulating effect where the exposure is continuing. Our analysis revealed the importance of modeling inter-family variability of the mutation-by-paternal age effect. Statistical power was improved by models accounting for the family-to-family variability. Our power analyses suggest that sufficient statistical power can be attained with 4-28 four-sibling families per treatment group, when the increase in mutations ranges from 40 to 10% respectively. Modeling family variability using mixed effect models provided a reduction in sample size compared to a multiple regression approach. Much larger sample sizes were required to detect an interaction effect between environmental exposures and paternal age. These findings inform study design and statistical modeling approaches to improve power and reduce sequencing costs for future studies in this area. Crown Copyright © 2018. Published by Elsevier B.V. All rights reserved.
Yamagata, Koichi; Yamanishi, Ayako; Kokubu, Chikara; Takeda, Junji; Sese, Jun
2016-01-01
An important challenge in cancer genomics is precise detection of structural variations (SVs) by high-throughput short-read sequencing, which is hampered by the high false discovery rates of existing analysis tools. Here, we propose an accurate SV detection method named COSMOS, which compares the statistics of the mapped read pairs in tumor samples with isogenic normal control samples in a distinct asymmetric manner. COSMOS also prioritizes the candidate SVs using strand-specific read-depth information. Performance tests on modeled tumor genomes revealed that COSMOS outperformed existing methods in terms of F-measure. We also applied COSMOS to an experimental mouse cell-based model, in which SVs were induced by genome engineering and gamma-ray irradiation, followed by polymerase chain reaction-based confirmation. The precision of COSMOS was 84.5%, while the next best existing method was 70.4%. Moreover, the sensitivity of COSMOS was the highest, indicating that COSMOS has great potential for cancer genome analysis. PMID:26833260
Analyzing Responses of Chemical Sensor Arrays
NASA Technical Reports Server (NTRS)
Zhou, Hanying
2007-01-01
NASA is developing a third-generation electronic nose (ENose) capable of continuous monitoring of the International Space Station's cabin atmosphere for specific, harmful airborne contaminants. Previous generations of the ENose have been described in prior NASA Tech Briefs issues. Sensor selection is critical in both (prefabrication) sensor material selection and (post-fabrication) data analysis of the ENose, which detects several analytes that are difficult to detect, or that are at very low concentration ranges. Existing sensor selection approaches usually include limited statistical measures, where selectivity is more important but reliability and sensitivity are not of concern. When reliability and sensitivity can be major limiting factors in detecting target compounds reliably, the existing approach is not able to provide meaningful selection that will actually improve data analysis results. The approach and software reported here consider more statistical measures (factors) than existing approaches for a similar purpose. The result is a more balanced and robust sensor selection from a less than ideal sensor array. The software offers quick, flexible, optimal sensor selection and weighting for a variety of purposes without a time-consuming, iterative search by performing sensor calibrations to a known linear or nonlinear model, evaluating the individual sensor's statistics, scoring the individual sensor's overall performance, finding the best sensor array size to maximize class separation, finding optimal weights for the remaining sensor array, estimating limits of detection for the target compounds, evaluating fingerprint distance between group pairs, and finding the best event-detecting sensors.
Interactions and triggering in a 3D rate and state asperity model
NASA Astrophysics Data System (ADS)
Dublanchet, P.; Bernard, P.
2012-12-01
Precise relocation of micro-seismicity and careful analysis of seismic source parameters have progressively imposed the concept of seismic asperities embedded in a creeping fault segment as being one of the most important aspects that should appear in a realistic representation of micro-seismic sources. Another important issue concerning micro-seismic activity is the existence of robust empirical laws describing the temporal and magnitude distribution of earthquakes, such as the Omori law, the distribution of inter-event times and the Gutenberg-Richter law. In this framework, this study aims at understanding the statistical properties of earthquakes by generating synthetic catalogs with a 3D, quasi-dynamic continuous rate-and-state asperity model that takes into account a realistic geometry of asperities. Our approach contrasts with ETAS models (Kagan and Knopoff, 1981), usually implemented to produce earthquake catalogs, in the sense that the non-linearity observed in rock friction experiments (Dieterich, 1979) is fully taken into account by the use of the rate-and-state friction law. Furthermore, our model differs from discrete models of faults (Ziv and Cochard, 2006) because the continuity allows us to define realistic geometries and distributions of asperities by assembling sub-critical computational cells that always fail in a single event. Moreover, this model allows us to address the question of the influence of barriers and the distribution of asperities on the event statistics. After recalling the main observations of asperities in the specific case of the Parkfield segment of the San Andreas Fault, we analyse the earthquake statistical properties computed for this area. Then, we present synthetic statistics obtained by our model that allow us to discuss the role of barriers in clustering and triggering phenomena among a population of sources. It appears that an effective barrier size, which depends on its frictional strength, controls the presence or absence, in the synthetic catalog, of statistical laws that are similar to what is observed for real earthquakes. As an application, we attempt to draw a comparison between the synthetic statistics and the observed statistics of Parkfield in order to characterize what could be a realistic frictional model of the Parkfield area. More generally, we obtained synthetic statistical properties that are in agreement with power-law decays characterized by exponents that match the observations at a global scale, showing that our mechanical model is able to provide new insights into the understanding of earthquake interaction processes in general.
Curran, Patrick J.; Howard, Andrea L.; Bainter, Sierra; Lane, Stephanie T.; McGinley, James S.
2014-01-01
Objective: Although recent statistical and computational developments allow for the empirical testing of psychological theories in ways not previously possible, one particularly vexing challenge remains: how to optimally model the prospective, reciprocal relations between two constructs as they developmentally unfold over time. Several analytic methods currently exist that attempt to model these types of relations, and each approach is successful to varying degrees. However, none provide the unambiguous separation of between-person and within-person components of stability and change over time, components that are often hypothesized to exist in the psychological sciences. The goal of our paper is to propose and demonstrate a novel extension of the multivariate latent curve model to allow for the disaggregation of these effects. Method: We begin with a review of the standard latent curve models and describe how these primarily capture between-person differences in change. We then extend this model to allow for regression structures among the time-specific residuals to capture within-person differences in change. Results: We demonstrate this model using an artificial data set generated to mimic the developmental relation between alcohol use and depressive symptomatology spanning five repeated measures. Conclusions: We obtain a specificity of results from the proposed analytic strategy that is not available from other existing methodologies. We conclude with potential limitations of our approach and directions for future research. PMID:24364798
Quasi-decadal Oscillation in the CMIP5 and CMIP3 Climate Model Simulations: California Case
NASA Astrophysics Data System (ADS)
Wang, J.; Yin, H.; Reyes, E.; Chung, F. I.
2014-12-01
The three ongoing drought years in California are reminiscent of two other long historical drought periods: 1987-1992 and 1928-1934. This kind of interannual variability corresponds to the dominant 7-15 yr quasi-decadal oscillation in precipitation and streamflow in California. When using global climate model projections to assess the climate change impact on water resources planning in California, it is natural to ask whether global climate models are able to reproduce observed interannual variability such as the 7-15 yr quasi-decadal oscillation. Further spectral analysis of tree-ring-derived precipitation and the historical precipitation record confirms the existence of the 7-15 yr quasi-decadal oscillation in California. However, when spectral analysis was applied to all the CMIP5 and CMIP3 global climate model historical simulations using a wavelet analysis approach, it was found that only two models in CMIP3, CGCM2.3.2a of MRI and NCAR PCM1.0, and only two models in CMIP5, MIROC5 and CESM1-WACCM, have statistically significant 7-15 yr quasi-decadal oscillations in California. More interestingly, the existence of the 7-15 yr quasi-decadal oscillation in the global climate model simulations is also sensitive to initial conditions. A 12-13 yr quasi-decadal oscillation occurs in one ensemble run of CGCM2.3.2a of MRI but does not exist in the other four ensemble runs.
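A rough way to check a climate-model (or observed) annual series for a prominent 7-15 yr band is sketched below using a simple periodogram; the study itself used wavelet analysis, and a proper significance test against a red-noise (e.g., AR(1)) background is omitted here.

```python
import numpy as np
from scipy.signal import periodogram

rng = np.random.default_rng(9)
years = np.arange(1900, 2014)
# Synthetic annual precipitation with an embedded ~11-yr oscillation
precip = (600
          + 60 * np.sin(2 * np.pi * years / 11.0)
          + rng.normal(0, 80, years.size))

freq, power = periodogram(precip - precip.mean(), fs=1.0)  # fs = 1 sample/yr
period = np.divide(1.0, freq, out=np.full_like(freq, np.inf), where=freq > 0)

band = (period >= 7) & (period <= 15)
print("fraction of spectral power in the 7-15 yr band:",
      round(power[band].sum() / power.sum(), 2))
print("peak period (yr):", round(period[np.argmax(power)], 1))
```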
α -induced reactions on 115In: Cross section measurements and statistical model analysis
NASA Astrophysics Data System (ADS)
Kiss, G. G.; Szücs, T.; Mohr, P.; Török, Zs.; Huszánk, R.; Gyürky, Gy.; Fülöp, Zs.
2018-05-01
Background: α-nucleus optical potentials are basic ingredients of statistical model calculations used in nucleosynthesis simulations. While the nucleon+nucleus optical potential is fairly well known, for the α+nucleus optical potential several different parameter sets exist and large deviations, reaching sometimes even an order of magnitude, are found between the cross section predictions calculated using different parameter sets. Purpose: A measurement of the radiative α-capture and the α-induced reaction cross sections on the nucleus 115In at low energies allows a stringent test of statistical model predictions. Since experimental data are scarce in this mass region, this measurement can be an important input to test the global applicability of α+nucleus optical model potentials and further ingredients of the statistical model. Methods: The reaction cross sections were measured by means of the activation method. The produced activities were determined by off-line detection of the γ rays and characteristic x rays emitted during the electron capture decay of the produced Sb isotopes. The 115In(α,γ)119Sb and 115In(α,n)118mSb reaction cross sections were measured between Ec.m. = 8.83 and 15.58 MeV, and the 115In(α,n)118gSb reaction was studied between Ec.m. = 11.10 and 15.58 MeV. The theoretical analysis was performed within the statistical model. Results: The simultaneous measurement of the (α,γ) and (α,n) cross sections allowed us to determine a best-fit combination of all parameters for the statistical model. The α+nucleus optical potential is identified as the most important input for the statistical model. The best fit is obtained for the new Atomki-V1 potential, and good reproduction of the experimental data is also achieved for the first version of the Demetriou potentials and the simple McFadden-Satchler potential. The nucleon optical potential, the γ-ray strength function, and the level density parametrization are also constrained by the data, although there is no unique best-fit combination. Conclusions: The best-fit calculations allow us to extrapolate the low-energy (α,γ) cross section of 115In to the astrophysical Gamow window with reasonable uncertainties. However, still further improvements of the α-nucleus potential are required for a global description of elastic (α,α) scattering and α-induced reactions in a wide range of masses and energies.
Empirical intrinsic geometry for nonlinear modeling and time series filtering.
Talmon, Ronen; Coifman, Ronald R
2013-07-30
In this paper, we present a method for time series analysis based on empirical intrinsic geometry (EIG). EIG enables one to reveal the low-dimensional parametric manifold as well as to infer the underlying dynamics of high-dimensional time series. By incorporating concepts of information geometry, this method extends existing geometric analysis tools to support stochastic settings and parametrizes the geometry of empirical distributions. However, the statistical models are not required as priors; hence, EIG may be applied to a wide range of real signals without existing definitive models. We show that the inferred model is noise-resilient and invariant under different observation and instrumental modalities. In addition, we show that it can be extended efficiently to newly acquired measurements in a sequential manner. These two advantages enable us to revisit the Bayesian approach and incorporate empirical dynamics and intrinsic geometry into a nonlinear filtering framework. We show applications to nonlinear and non-Gaussian tracking problems as well as to acoustic signal localization.
Efficient Bayesian mixed model analysis increases association power in large cohorts
Loh, Po-Ru; Tucker, George; Bulik-Sullivan, Brendan K; Vilhjálmsson, Bjarni J; Finucane, Hilary K; Salem, Rany M; Chasman, Daniel I; Ridker, Paul M; Neale, Benjamin M; Berger, Bonnie; Patterson, Nick; Price, Alkes L
2014-01-01
Linear mixed models are a powerful statistical tool for identifying genetic associations and avoiding confounding. However, existing methods are computationally intractable in large cohorts, and may not optimize power. All existing methods require time cost O(MN²) (where N = #samples and M = #SNPs) and implicitly assume an infinitesimal genetic architecture in which effect sizes are normally distributed, which can limit power. Here, we present a far more efficient mixed model association method, BOLT-LMM, which requires only a small number of O(MN)-time iterations and increases power by modeling more realistic, non-infinitesimal genetic architectures via a Bayesian mixture prior on marker effect sizes. We applied BOLT-LMM to nine quantitative traits in 23,294 samples from the Women’s Genome Health Study (WGHS) and observed significant increases in power, consistent with simulations. Theory and simulations show that the boost in power increases with cohort size, making BOLT-LMM appealing for GWAS in large cohorts. PMID:25642633
Empirical microeconomics action functionals
NASA Astrophysics Data System (ADS)
Baaquie, Belal E.; Du, Xin; Tanputraman, Winson
2015-06-01
A statistical generalization of microeconomics has been made in Baaquie (2013), where the market price of every traded commodity, at each instant of time, is considered to be an independent random variable. The dynamics of commodity market prices is modeled by an action functional, and the focus of this paper is to empirically determine the action functionals for different commodities. The correlation functions of the model are defined using a Feynman path integral. The model is calibrated using the unequal time correlation of the market commodity prices as well as their cubic and quartic moments using a perturbation expansion. The consistency of the perturbation expansion is verified by a numerical evaluation of the path integral. Nine commodities drawn from the energy, metal and grain sectors are studied and their market behavior is described by the model to an accuracy of over 90% using only six parameters. The paper empirically establishes the existence of the action functional for commodity prices that was postulated to exist in Baaquie (2013).
NASA Astrophysics Data System (ADS)
Hoffman, A.; Forest, C. E.; Kemanian, A.
2016-12-01
A significant number of food-insecure nations exist in regions of the world where dust plays a large role in the climate system. While the impacts of common climate variables (e.g. temperature, precipitation, ozone, and carbon dioxide) on crop yields are relatively well understood, the impact of mineral aerosols on yields has not yet been thoroughly investigated. This research aims to develop the data and tools to advance our understanding of mineral aerosol impacts on crop yields. Suspended dust affects crop yields by altering the amount and type of radiation reaching the plant and by modifying local temperature and precipitation, whereas dust events (i.e. dust storms) affect crop yields by depleting the soil of nutrients or by defoliation via particle abrasion. The impact of dust on yields is modeled statistically because we are uncertain which impacts will dominate the response on the national and regional scales considered in this study. Multiple linear regression is used in a number of large-scale statistical crop modeling studies to estimate yield responses to various climate variables. In alignment with previous work, we develop linear crop models, but build upon this simple method of regression with machine-learning techniques (e.g. random forests) to identify important statistical predictors and isolate how dust affects yields on the scales of interest. To perform this analysis, we develop a crop-climate dataset for maize, soybean, groundnut, sorghum, rice, and wheat for the regions of West Africa, East Africa, South Africa, and the Sahel. Random forest regression models consistently model historic crop yields better than the linear models. In several instances, the random forest models accurately capture the temperature and precipitation threshold behavior in crops. Additionally, improving agricultural technology has caused a well-documented positive trend that dominates time series of global and regional yields. This trend is often removed before regression with traditional crop models, but likely at the cost of removing climate information. Our random forest models consistently discover the positive trend without removing any additional data. The application of random forests as a statistical crop model provides insight into understanding the impact of dust on yields in marginal food-producing regions.
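A hedged sketch of the linear-versus-random-forest comparison on synthetic yield data with a technology trend, a temperature threshold and a hypothetical dust index (the variable names, units and coefficients are invented for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(21)
n = 600
year = rng.integers(1980, 2015, n)
temp = rng.normal(27, 3, n)              # growing-season temperature (deg C)
precip = rng.normal(500, 120, n)         # seasonal precipitation (mm)
dust = rng.gamma(2.0, 0.5, n)            # hypothetical dust-loading index

# Synthetic yield: technology trend + temperature threshold + dust penalty
yield_t = (1.5 + 0.04 * (year - 1980)            # improving technology
           - 0.15 * np.clip(temp - 30, 0, None)  # losses above ~30 deg C
           + 0.002 * precip
           - 0.2 * dust
           + rng.normal(0, 0.3, n))

X = np.column_stack([year, temp, precip, dust])
linear = LinearRegression()
forest = RandomForestRegressor(n_estimators=300, random_state=0)

for name, model in [("linear", linear), ("random forest", forest)]:
    r2 = cross_val_score(model, X, yield_t, cv=5, scoring="r2")
    print(f"{name:13s} mean CV R^2 = {r2.mean():.2f}")
```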
Evaluation of cancer mortality in a cohort of workers exposed to low-level radiation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lea, C.S.
1995-12-01
The purpose of this dissertation was to re-analyze existing data to explore methodologic approaches that may determine whether excess cancer mortality in the ORNL cohort can be explained by time-related factors not previously considered; grouping of cancer outcomes; selection bias due to the choice of method used to incorporate an empirical induction period; or the type of statistical model chosen.
Symmetry and Degeneracy in Quantum Mechanics. Self-Duality in Finite Spin Systems
ERIC Educational Resources Information Center
Osacar, C.; Pacheco, A. F.
2009-01-01
The symmetry of self-duality (Savit 1980 "Rev. Mod. Phys. 52" 453) of some models of statistical mechanics and quantum field theory is discussed for finite spin blocks of the Ising chain in a transverse magnetic field. The existence of this symmetry in a specific type of these blocks, and not in others, is manifest by the degeneracy of their…
Silver, Matt; Montana, Giovanni
2012-01-01
Where causal SNPs (single nucleotide polymorphisms) tend to accumulate within biological pathways, the incorporation of prior pathways information into a statistical model is expected to increase the power to detect true associations in a genetic association study. Most existing pathways-based methods rely on marginal SNP statistics and do not fully exploit the dependence patterns among SNPs within pathways. We use a sparse regression model, with SNPs grouped into pathways, to identify causal pathways associated with a quantitative trait. Notable features of our “pathways group lasso with adaptive weights” (P-GLAW) algorithm include the incorporation of all pathways in a single regression model, an adaptive pathway weighting procedure that accounts for factors biasing pathway selection, and the use of a bootstrap sampling procedure for the ranking of important pathways. P-GLAW takes account of the presence of overlapping pathways and uses a novel combination of techniques to optimise model estimation, making it fast to run, even on whole genome datasets. In a comparison study with an alternative pathways method based on univariate SNP statistics, our method demonstrates high sensitivity and specificity for the detection of important pathways, showing the greatest relative gains in performance where marginal SNP effect sizes are small. PMID:22499682
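The authors' P-GLAW algorithm adds adaptive weights, overlap handling, and bootstrap ranking; as a minimal sketch of only the underlying group-lasso estimator, proximal gradient descent with block soft-thresholding could be written as follows (toy data, non-overlapping groups).

```python
import numpy as np

def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal gradient descent for (1/2n)||y - Xb||^2 + lam * sum_g ||b_g||_2.

    groups: list of index arrays, one per (non-overlapping) group of columns.
    Generic sketch only; not the P-GLAW algorithm described in the abstract.
    """
    n, p = X.shape
    b = np.zeros(p)
    step = n / (np.linalg.norm(X, 2) ** 2)      # 1/L, L = largest eigenvalue of X'X/n
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        z = b - step * grad
        for g in groups:                        # block soft-thresholding per group
            norm_g = np.linalg.norm(z[g])
            shrink = max(0.0, 1.0 - step * lam / norm_g) if norm_g > 0 else 0.0
            b[g] = shrink * z[g]
    return b

# Toy usage: 3 "pathways" of 5 SNPs each, only the first pathway is causal.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 15))
y = X[:, :5] @ rng.standard_normal(5) + 0.5 * rng.standard_normal(200)
groups = [np.arange(0, 5), np.arange(5, 10), np.arange(10, 15)]
print(group_lasso(X, y, groups, lam=0.1).round(2))
```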
Representation of the contextual statistical model by hyperbolic amplitudes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Khrennikov, Andrei
We continue the development of a so-called contextual statistical model (here context has the meaning of a complex of physical conditions). It is shown that, besides contexts producing the conventional trigonometric cos-interference, there exist contexts producing the hyperbolic cos-interference. Starting with the corresponding interference formula of total probability we represent such contexts by hyperbolic probabilistic amplitudes or in the abstract formalism by normalized vectors of a hyperbolic analogue of the Hilbert space. There is obtained a hyperbolic Born's rule. Incompatible observables are represented by noncommutative operators. This paper can be considered as the first step towards hyperbolic quantum probability. We also discuss possibilities of experimental verification of hyperbolic quantum mechanics: in physics of elementary particles, string theory as well as in experiments with nonphysical systems, e.g., in psychology, cognitive sciences, and economy.
Representation of the contextual statistical model by hyperbolic amplitudes
NASA Astrophysics Data System (ADS)
Khrennikov, Andrei
2005-06-01
We continue the development of a so-called contextual statistical model (here context has the meaning of a complex of physical conditions). It is shown that, besides contexts producing the conventional trigonometric cos-interference, there exist contexts producing the hyperbolic cos-interference. Starting with the corresponding interference formula of total probability we represent such contexts by hyperbolic probabilistic amplitudes or in the abstract formalism by normalized vectors of a hyperbolic analogue of the Hilbert space. There is obtained a hyperbolic Born's rule. Incompatible observables are represented by noncommutative operators. This paper can be considered as the first step towards hyperbolic quantum probability. We also discuss possibilities of experimental verification of hyperbolic quantum mechanics: in physics of elementary particles, string theory as well as in experiments with nonphysical systems, e.g., in psychology, cognitive sciences, and economy.
Optimization models for degrouping population data.
Bermúdez, Silvia; Blanquero, Rafael
2016-07-01
In certain countries population data are available in grouped form only, usually as quinquennial age groups plus a large open-ended range for the elderly. However, official statistics call for data by individual age since many statistical operations, such as the calculation of demographic indicators, require the use of ungrouped population data. In this paper a number of mathematical models are proposed which, starting from population data given in age groups, enable these ranges to be degrouped into age-specific population values without leaving a fractional part. Unlike other existing procedures for disaggregating demographic data, ours makes it possible to process several years' data simultaneously in a coherent way, and provides accurate results longitudinally as well as transversally. This procedure is also shown to be helpful in dealing with degrouped population data affected by noise, such as those affected by the age-heaping phenomenon.
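A far simpler illustration of the degrouping idea (not the paper's optimization models) is a largest-remainder allocation, which splits a quinquennial group total into integer single-age counts whose sum is preserved; the uniform within-group shape is an assumption.

```python
import numpy as np

def degroup_uniform(total, n_ages=5):
    """Split an age-group total into integer single-age counts (uniform shape).

    Largest-remainder rounding keeps the group total exact; the paper's
    optimization models would additionally enforce longitudinal and
    transversal coherence across years.
    """
    raw = np.full(n_ages, total / n_ages)
    base = np.floor(raw).astype(int)
    remainder = int(total - base.sum())
    # give the leftover units to the ages with the largest fractional parts
    order = np.argsort(-(raw - base))
    base[order[:remainder]] += 1
    return base

print(degroup_uniform(10432))   # e.g. [2087 2087 2086 2086 2086], sums to 10432
```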
Nonlinear scalar forcing based on a reaction analogy
NASA Astrophysics Data System (ADS)
Daniel, Don; Livescu, Daniel
2017-11-01
We present a novel reaction analogy (RA) based forcing method for generating stationary passive scalar fields in incompressible turbulence. The new method can produce more general scalar PDFs (e.g. double-delta) than current methods, while ensuring that scalar fields remain bounded, unlike existing forcing methodologies that can potentially violate naturally existing bounds. Such features are useful for generating initial fields in non-premixed combustion or for studying non-Gaussian scalar turbulence. The RA method mathematically models hypothetical chemical reactions that convert reactants in a mixed state back into their pure unmixed components. Various types of chemical reactions are formulated and the corresponding mathematical expressions derived. For large values of the scalar dissipation rate, the method produces statistically steady double-delta scalar PDFs. Gaussian scalar statistics are recovered for small values of the scalar dissipation rate. In contrast, classical forcing methods consistently produce unimodal Gaussian scalar fields. The ability of the new method to produce fully developed scalar fields is discussed using 256³, 512³, and 1024³ periodic box simulations.
Zhang, Pan; Moore, Cristopher
2014-01-01
Modularity is a popular measure of community structure. However, maximizing the modularity can lead to many competing partitions, with almost the same modularity, that are poorly correlated with each other. It can also produce illusory ‘‘communities’’ in random graphs where none exist. We address this problem by using the modularity as a Hamiltonian at finite temperature and using an efficient belief propagation algorithm to obtain the consensus of many partitions with high modularity, rather than looking for a single partition that maximizes it. We show analytically and numerically that the proposed algorithm works all of the way down to the detectability transition in networks generated by the stochastic block model. It also performs well on real-world networks, revealing large communities in some networks where previous work has claimed no communities exist. Finally we show that by applying our algorithm recursively, subdividing communities until no statistically significant subcommunities can be found, we can detect hierarchical structure in real-world networks more efficiently than previous methods. PMID:25489096
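The abstract's point that modularity maximization reports "communities" even in random graphs is easy to reproduce with standard tools; the sketch below uses greedy modularity optimization from networkx rather than the authors' belief-propagation algorithm.

```python
import networkx as nx
from networkx.algorithms import community

# An Erdos-Renyi random graph has no planted community structure...
G = nx.gnp_random_graph(200, 0.05, seed=1)

# ...yet greedy modularity maximization still returns a confident-looking partition
parts = community.greedy_modularity_communities(G)
Q = community.modularity(G, parts)
print(f"{len(parts)} 'communities' found, modularity Q = {Q:.2f}")
```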
SCOUT: A Fast Monte-Carlo Modeling Tool of Scintillation Camera Output
Hunter, William C. J.; Barrett, Harrison H.; Lewellen, Thomas K.; Miyaoka, Robert S.; Muzi, John P.; Li, Xiaoli; McDougald, Wendy; MacDonald, Lawrence R.
2011-01-01
We have developed a Monte-Carlo photon-tracking and readout simulator called SCOUT to study the stochastic behavior of signals output from a simplified rectangular scintillation-camera design. SCOUT models the salient processes affecting signal generation, transport, and readout. Presently, we compare output signal statistics from SCOUT to experimental results for both a discrete and a monolithic camera. We also benchmark the speed of this simulation tool and compare it to existing simulation tools. We find this modeling tool to be relatively fast and predictive of experimental results. Depending on the modeled camera geometry, we found SCOUT to be 4 to 140 times faster than other modeling tools. PMID:22072297
SCOUT: a fast Monte-Carlo modeling tool of scintillation camera output
Hunter, William C J; Barrett, Harrison H.; Muzi, John P.; McDougald, Wendy; MacDonald, Lawrence R.; Miyaoka, Robert S.; Lewellen, Thomas K.
2013-01-01
We have developed a Monte-Carlo photon-tracking and readout simulator called SCOUT to study the stochastic behavior of signals output from a simplified rectangular scintillation-camera design. SCOUT models the salient processes affecting signal generation, transport, and readout of a scintillation camera. Presently, we compare output signal statistics from SCOUT to experimental results for both a discrete and a monolithic camera. We also benchmark the speed of this simulation tool and compare it to existing simulation tools. We find this modeling tool to be relatively fast and predictive of experimental results. Depending on the modeled camera geometry, we found SCOUT to be 4 to 140 times faster than other modeling tools. PMID:23640136
NASA Astrophysics Data System (ADS)
Hinckley, Sarah; Parada, Carolina; Horne, John K.; Mazur, Michael; Woillez, Mathieu
2016-10-01
Biophysical individual-based models (IBMs) have been used to study aspects of early life history of marine fishes such as recruitment, connectivity of spawning and nursery areas, and marine reserve design. However, there is no consistent approach to validating the spatial outputs of these models. In this study, we hope to rectify this gap. We document additions to an existing individual-based biophysical model for Alaska walleye pollock (Gadus chalcogrammus), some simulations made with this model, and methods that were used to describe and compare spatial output of the model versus field data derived from ichthyoplankton surveys in the Gulf of Alaska. We used visual methods (e.g. distributional centroids with directional ellipses), several indices (such as a Normalized Difference Index (NDI) and an Overlap Coefficient (OC)), and several statistical methods: the Syrjala method, the Getis-Ord Gi* statistic, and a geostatistical method for comparing spatial indices. We assess the utility of these different methods in analyzing spatial output and comparing model output to data, and give recommendations for their appropriate use. Visual methods are useful for initial comparisons of model and data distributions. Metrics such as the NDI and OC give useful measures of co-location and overlap, but care must be taken in discretizing the fields into bins. The Getis-Ord Gi* statistic is useful to determine the patchiness of the fields. The Syrjala method is an easily implemented statistical measure of the difference between the fields, but does not give information on the details of the distributions. Finally, the geostatistical comparison of spatial indices gives good information on the details of the distributions and whether they differ significantly between the model and the data. We conclude that each technique gives quite different information about the model-data distribution comparison, and that some are easy to apply and some more complex. We also give recommendations for a multistep process to validate spatial output from IBMs.
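The paper's exact NDI and OC definitions are not given in the abstract; the sketch below uses common generic forms of an overlap coefficient and a normalized difference between two gridded fields, purely to illustrate the kind of co-location metric involved.

```python
import numpy as np

def overlap_coefficient(model, data):
    """Overlap of two nonnegative gridded fields after normalizing each to sum to 1.

    Generic form only; the paper's OC definition may differ.
    """
    p = model / model.sum()
    q = data / data.sum()
    return np.minimum(p, q).sum()          # 1 = identical distributions, 0 = disjoint

def normalized_difference_index(model, data):
    """A simple normalized difference between the two fields (generic form only)."""
    p = model / model.sum()
    q = data / data.sum()
    return 0.5 * np.abs(p - q).sum()       # 0 = identical, 1 = disjoint

rng = np.random.default_rng(0)
m = rng.random((20, 30))                   # hypothetical model and survey grids
d = rng.random((20, 30))
print(overlap_coefficient(m, d), normalized_difference_index(m, d))
```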
Dynamic prediction in functional concurrent regression with an application to child growth.
Leroux, Andrew; Xiao, Luo; Crainiceanu, Ciprian; Checkley, William
2018-04-15
In many studies, it is of interest to predict the future trajectory of subjects based on their historical data, referred to as dynamic prediction. Mixed effects models have traditionally been used for dynamic prediction. However, the commonly used random intercept and slope model is often not sufficiently flexible for modeling subject-specific trajectories. In addition, there may be useful exposures/predictors of interest that are measured concurrently with the outcome, complicating dynamic prediction. To address these problems, we propose a dynamic functional concurrent regression model to handle the case where both the functional response and the functional predictors are irregularly measured. Currently, such a model cannot be fit by existing software. We apply the model to dynamically predict children's length conditional on prior length, weight, and baseline covariates. Inference on model parameters and subject-specific trajectories is conducted using the mixed effects representation of the proposed model. An extensive simulation study shows that the dynamic functional regression model provides more accurate estimation and inference than existing methods. Methods are supported by fast, flexible, open source software that uses heavily tested smoothing techniques. © 2017 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
NASA Astrophysics Data System (ADS)
Faruk, Alfensi
2018-03-01
Survival analysis is a branch of statistics focussed on the analysis of time-to-event data. In multivariate survival analysis, the proportional hazards (PH) model is the most popular model for analyzing the effects of several covariates on survival time. However, the proportional hazards assumption of the PH model is not always satisfied by the data. Violation of the PH assumption leads to misinterpretation of the estimation results and decreases the power of the related statistical tests. The accelerated failure time (AFT) models, on the other hand, do not rely on the proportional hazards assumption and can be used as an alternative to the PH model when that assumption is violated. The objective of this research was to compare the performance of the PH model and the AFT models in analyzing the significant factors affecting the first birth interval (FBI) data in Indonesia. In this work, the discussion was limited to three AFT models, based on the Weibull, exponential, and log-normal distributions. Analysis using a graphical approach and a statistical test showed that non-proportional hazards exist in the FBI data set. Based on the Akaike information criterion (AIC), the log-normal AFT model was the most appropriate among the considered models. Results of the best fitted model (log-normal AFT model) showed that covariates such as women’s educational level, husband’s educational level, contraceptive knowledge, access to mass media, wealth index, and employment status were among the factors affecting the FBI in Indonesia.
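A minimal sketch of comparing AFT specifications by AIC with the lifelines library is shown below; it runs on the bundled Rossi recidivism dataset as a stand-in, since the FBI survey data are not reproduced here, and note that the exponential AFT model is simply the Weibull AFT with its shape parameter fixed at one.

```python
# Sketch: comparing AFT specifications by AIC (stand-in data, not the FBI study).
from lifelines import WeibullAFTFitter, LogNormalAFTFitter
from lifelines.datasets import load_rossi

df = load_rossi()                       # stand-in dataset: duration 'week', event 'arrest'

results = {}
for name, fitter in [("Weibull AFT", WeibullAFTFitter()),
                     ("Log-normal AFT", LogNormalAFTFitter())]:
    fitter.fit(df, duration_col="week", event_col="arrest")
    k = len(fitter.params_)                                  # number of estimated parameters
    results[name] = -2.0 * fitter.log_likelihood_ + 2.0 * k  # AIC

for name, aic in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: AIC = {aic:.1f}")                        # smaller AIC = preferred model
```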
Wind Energy Facilities and Residential Properties: The Effect of Proximity and View on Sales Prices
DOE Office of Scientific and Technical Information (OSTI.GOV)
San Diego State University; Bard Center for Environmental Policy at Bard College; Hoen, Ben
2011-06-23
With increasing numbers of communities considering wind power developments, empirical investigations regarding related community concerns are needed. One such concern is that proximate property values may be adversely affected, yet relatively little research exists on the subject. The present research investigates roughly 7,500 sales of single-family homes surrounding 24 existing U.S. wind facilities. Across four different hedonic models, and a variety of robustness tests, the results are consistent: neither the view of the wind facilities nor the distance of the home to those facilities is found to have a statistically significant effect on sales prices, yet further research is warranted.
Wind Energy Facilities and Residential Properties: The Effect of Proximity and View on Sales Prices
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hoen, Ben; Wiser, Ryan; Cappers, Peter
2010-04-01
With an increasing number of communities considering nearby wind power developments, there is a need to empirically investigate community concerns about wind project development. One such concern is that property values may be adversely affected by wind energy facilities, and relatively little research exists on the subject. The present research investigates roughly 7,500 sales of single-family homes surrounding 24 existing U.S. wind facilities. Across four different hedonic models the results are consistent: neither the view of the wind facilities nor the distance of the home to those facilities is found to have a statistically significant effect on home sales prices.
Design of a Ka-Band Propagation Terminal for Atmospheric Measurements in Polar Regions
NASA Technical Reports Server (NTRS)
Houts, Jacquelynne R.; Nessel, James A.; Zemba, Michael J.
2016-01-01
This paper describes the design and performance of a Ka-Band beacon receiver developed at NASA Glenn Research Center (GRC) that will be installed alongside an existing Ka-Band Radiometer [2] located at the east end of the Svalbard Near Earth Network (NEN) complex. The goal of this experiment is to characterize rain fade attenuation to improve the performance of existing statistical rain attenuation models. The ground terminal developed by NASA GRC utilizes an FFT-based frequency estimation [3] receiver capable of characterizing total path attenuation effects due to gaseous absorption, clouds, rain, and scintillation by directly measuring the propagated signal from the satellite Thor 7.
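As a generic illustration of FFT-based frequency estimation of a beacon tone (not the GRC receiver design), the carrier offset and received power of a synthetic tone can be estimated from the peak of a windowed spectrum; the sample rate and tone parameters below are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 50e3                                   # sample rate of the digitized IF (assumed)
t = np.arange(int(0.1 * fs)) / fs
f_true, noise_amp = 1234.5, 0.05            # synthetic beacon tone and noise level
x = np.cos(2 * np.pi * f_true * t) + noise_amp * rng.standard_normal(t.size)

X = np.fft.rfft(x * np.hanning(x.size))     # windowed FFT of the received block
freqs = np.fft.rfftfreq(x.size, 1 / fs)
peak = np.argmax(np.abs(X))
print(f"estimated tone frequency: {freqs[peak]:.1f} Hz")
print(f"received power estimate : {20 * np.log10(np.abs(X[peak])):.1f} dB (arbitrary reference)")
```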
Design of a Ka-band Propagation Terminal for Atmospheric Measurements in Polar Regions
NASA Technical Reports Server (NTRS)
Houts, Jacquelynne R.; Nessel, James A.; Zemba, Michael J.
2016-01-01
This paper describes the design and performance of a Ka-Band beacon receiver developed at NASA Glenn Research Center (GRC) that will be installed alongside an existing Ka-Band Radiometer located at the east end of the Svalbard Near Earth Network (NEN) complex. The goal of this experiment is to characterize rain fade attenuation to improve the performance of existing statistical rain attenuation models. The ground terminal developed by NASA GRC utilizes an FFT-based frequency estimation receiver capable of characterizing total path attenuation effects due to gaseous absorption, clouds, rain, and scintillation by directly measuring the propagated signal from the satellite Thor 7.
Micro-mechanics of hydro-mechanical coupled processes during hydraulic fracturing in sandstone
NASA Astrophysics Data System (ADS)
Caulk, R.; Tomac, I.
2017-12-01
This contribution presents a micro-mechanical study of hydraulic fracture initiation and propagation in sandstone. The Discrete Element Method (DEM) Yade software is used as a tool to model the fully coupled hydro-mechanical behavior of saturated sandstone under pressures typical of deep geo-reservoirs. Heterogeneity of the sandstone tensile and shear strength parameters is introduced using a statistical representation of cathodoluminescence (CL) images of the rock. A Weibull distribution of the parameter values was found to best match the CL scans of sandstone grains and the cement between grains. Results of hydraulic fracturing stimulation from the well bore indicate a significant difference between models with bond strengths informed by the CL scans and models with a uniform, homogeneous representation of the sandstone parameters. Micro-mechanical insight reveals a hydraulic fracture typical of mode I, or tensile, cracking in both cases. However, shear micro-cracks are abundant in the CL-informed model while they are absent in the standard model with a uniform strength distribution. Most of the mode II cracks, or shear micro-cracks, are not part of the main hydraulic fracture and occur in the near-tip and near-fracture areas. The position and occurrence of the shear micro-cracks are characterized as a secondary effect that dissipates hydraulic fracturing energy. Additionally, the shear micro-crack locations qualitatively resemble the acoustic emission clouds of shear cracks frequently observed in hydraulic fracturing, which are sometimes interpreted as re-activation of existing fractures. Our model, by contrast, contains no pre-existing cracks and is continuous prior to fracturing. This observation is novel and is quantified in the paper. The field of shear particle contact forces reveals significant relaxation compared to the model with a uniform strength distribution.
NASA Astrophysics Data System (ADS)
Acharya, S.; Kaplan, D. A.; Casey, S.; Cohen, M. J.; Jawitz, J. W.
2015-05-01
Self-organized landscape patterning can arise in response to multiple processes. Discriminating among alternative patterning mechanisms, particularly where experimental manipulations are untenable, requires process-based models. Previous modeling studies have attributed patterning in the Everglades (Florida, USA) to sediment redistribution and anisotropic soil hydraulic properties. In this work, we tested an alternate theory, the self-organizing-canal (SOC) hypothesis, by developing a cellular automata model that simulates pattern evolution via local positive feedbacks (i.e., facilitation) coupled with a global negative feedback based on hydrology. The model is forced by global hydroperiod that drives stochastic transitions between two patch types: ridge (higher elevation) and slough (lower elevation). We evaluated model performance using multiple criteria based on six statistical and geostatistical properties observed in reference portions of the Everglades landscape: patch density, patch anisotropy, semivariogram ranges, power-law scaling of ridge areas, perimeter area fractal dimension, and characteristic pattern wavelength. Model results showed strong statistical agreement with reference landscapes, but only when anisotropically acting local facilitation was coupled with hydrologic global feedback, for which several plausible mechanisms exist. Critically, the model correctly generated fractal landscapes that had no characteristic pattern wavelength, supporting the invocation of global rather than scale-specific negative feedbacks.
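The toy cellular automaton below sketches only the general mechanism described above, local facilitation plus a global hydrologic negative feedback; the neighborhood rule, transition probabilities, and target ridge fraction are assumptions, and the anisotropy of the real model is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = (rng.random((100, 100)) < 0.5).astype(int)    # 1 = ridge, 0 = slough
target_ridge_fraction = 0.5                          # stand-in for the hydroperiod forcing

def ridge_neighbors(g):
    # count ridge cells among the 4 nearest neighbors (periodic boundaries)
    return (np.roll(g, 1, 0) + np.roll(g, -1, 0) +
            np.roll(g, 1, 1) + np.roll(g, -1, 1))

for step in range(500):
    nbrs = ridge_neighbors(grid)
    # global negative feedback: too much ridge suppresses further ridge expansion
    feedback = target_ridge_fraction - grid.mean()
    p_to_ridge = np.clip(0.02 + 0.02 * nbrs + 0.5 * feedback, 0, 1)        # local facilitation
    p_to_slough = np.clip(0.02 + 0.02 * (4 - nbrs) - 0.5 * feedback, 0, 1)
    u = rng.random(grid.shape)
    grid = np.where(grid == 0,
                    (u < p_to_ridge).astype(int),      # slough may become ridge
                    1 - (u < p_to_slough).astype(int)) # ridge may revert to slough

print("final ridge fraction:", grid.mean())
```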
NASA Astrophysics Data System (ADS)
Acharya, S.; Kaplan, D. A.; Casey, S.; Cohen, M. J.; Jawitz, J. W.
2015-01-01
Self-organized landscape patterning can arise in response to multiple processes. Discriminating among alternative patterning mechanisms, particularly where experimental manipulations are untenable, requires process-based models. Previous modeling studies have attributed patterning in the Everglades (Florida, USA) to sediment redistribution and anisotropic soil hydraulic properties. In this work, we tested an alternate theory, the self-organizing canal (SOC) hypothesis, by developing a cellular automata model that simulates pattern evolution via local positive feedbacks (i.e., facilitation) coupled with a global negative feedback based on hydrology. The model is forced by global hydroperiod that drives stochastic transitions between two patch types: ridge (higher elevation) and slough (lower elevation). We evaluated model performance using multiple criteria based on six statistical and geostatistical properties observed in reference portions of the Everglades landscape: patch density, patch anisotropy, semivariogram ranges, power-law scaling of ridge areas, perimeter area fractal dimension, and characteristic pattern wavelength. Model results showed strong statistical agreement with reference landscapes, but only when anisotropically acting local facilitation was coupled with hydrologic global feedback, for which several plausible mechanisms exist. Critically, the model correctly generated fractal landscapes that had no characteristic pattern wavelength, supporting the invocation of global rather than scale-specific negative feedbacks.
Statistical validity of using ratio variables in human kinetics research.
Liu, Yuanlong; Schutz, Robert W
2003-09-01
The purposes of this study were to investigate the validity of the simple ratio and three alternative deflation models and examine how the variation of the numerator and denominator variables affects the reliability of a ratio variable. A simple ratio and three alternative deflation models were fitted to four empirical data sets, and common criteria were applied to determine the best model for deflation. Intraclass correlation was used to examine the component effect on the reliability of a ratio variable. The results indicate that the validity of a deflation model depends on the statistical characteristics of the particular component variables used, and an optimal deflation model for all ratio variables may not exist. Therefore, it is recommended that different models be fitted to each empirical data set to determine the best deflation model. It was found that the reliability of a simple ratio is affected by the coefficients of variation and the within- and between-trial correlations between the numerator and denominator variables. It was recommended that researchers should compute the reliability of the derived ratio scores and not assume that strong reliabilities in the numerator and denominator measures automatically lead to high reliability in the ratio measures.
A model-based approach to wildland fire reconstruction using sediment charcoal records
Itter, Malcolm S.; Finley, Andrew O.; Hooten, Mevin B.; Higuera, Philip E.; Marlon, Jennifer R.; Kelly, Ryan; McLachlan, Jason S.
2017-01-01
Lake sediment charcoal records are used in paleoecological analyses to reconstruct fire history, including the identification of past wildland fires. One challenge of applying sediment charcoal records to infer fire history is the separation of charcoal associated with local fire occurrence and charcoal originating from regional fire activity. Despite a variety of methods to identify local fires from sediment charcoal records, an integrated statistical framework for fire reconstruction is lacking. We develop a Bayesian point process model to estimate the probability of fire associated with charcoal counts from individual-lake sediments and estimate mean fire return intervals. A multivariate extension of the model combines records from multiple lakes to reduce uncertainty in local fire identification and estimate a regional mean fire return interval. The univariate and multivariate models are applied to 13 lakes in the Yukon Flats region of Alaska. Both models resulted in similar mean fire return intervals (100–350 years) with reduced uncertainty under the multivariate model due to improved estimation of regional charcoal deposition. The point process model offers an integrated statistical framework for paleofire reconstruction and extends existing methods to infer regional fire history from multiple lake records with uncertainty following directly from posterior distributions.
Park, Jungkap; Saitou, Kazuhiro
2014-09-18
Multibody potentials accounting for cooperative effects of molecular interactions have shown better accuracy than typical pairwise potentials. The main challenge in the development of such potentials is to find relevant structural features that characterize the tightly folded proteins. Also, the side-chains of residues adopt several specific, staggered conformations, known as rotamers within protein structures. Different molecular conformations result in different dipole moments and induce charge reorientations. However, until now modeling of the rotameric state of residues had not been incorporated into the development of multibody potentials for modeling non-bonded interactions in protein structures. In this study, we develop a new multibody statistical potential which can account for the influence of rotameric states on the specificity of atomic interactions. In this potential, named "rotamer-dependent atomic statistical potential" (ROTAS), the interaction between two atoms is specified by not only the distance and relative orientation but also by two state parameters concerning the rotameric state of the residues to which the interacting atoms belong. It was clearly found that the rotameric state is correlated to the specificity of atomic interactions. Such rotamer-dependencies are not limited to specific type or certain range of interactions. The performance of ROTAS was tested using 13 sets of decoys and was compared to those of existing atomic-level statistical potentials which incorporate orientation-dependent energy terms. The results show that ROTAS performs better than other competing potentials not only in native structure recognition, but also in best model selection and correlation coefficients between energy and model quality. A new multibody statistical potential, ROTAS accounting for the influence of rotameric states on the specificity of atomic interactions was developed and tested on decoy sets. The results show that ROTAS has improved ability to recognize native structure from decoy models compared to other potentials. The effectiveness of ROTAS may provide insightful information for the development of many applications which require accurate side-chain modeling such as protein design, mutation analysis, and docking simulation.
Localized Smart-Interpretation
NASA Astrophysics Data System (ADS)
Lundh Gulbrandsen, Mats; Mejer Hansen, Thomas; Bach, Torben; Pallesen, Tom
2014-05-01
The complex task of setting up a geological model consists not only of combining available geological information into a conceptually plausible model, but also requires consistency with available data, e.g. geophysical data. However, in many cases the direct geological information, e.g. borehole samples, is very sparse, so in order to create a geological model the geologist needs to rely on the geophysical data. The problem, however, is that the amount of geophysical data is in many cases so vast that it is practically impossible to integrate all of it in the manual interpretation process. This means that much of the information available from the geophysical surveys goes unexploited, which is a problem, because the resulting geological model does not fulfill its full potential and hence is less trustworthy. We suggest an approach to geological modeling that 1. allows all geophysical data to be considered when building the geological model, 2. is fast, and 3. allows quantification of the geological modeling. The method is constructed to build a statistical model, f(d,m), describing the relation between what the geologist interprets, d, and what the geologist knows, m. The parameter m reflects any available information that can be quantified, such as geophysical data, the result of a geophysical inversion, elevation maps, etc. The parameter d reflects an actual interpretation, such as, for example, the depth to the base of a ground water reservoir. First we infer a statistical model f(d,m) by examining sets of actual interpretations made by a geological expert, [d1, d2, ...], and the information used to perform the interpretation, [m1, m2, ...]. This makes it possible to quantify how the geological expert performs interpolation through f(d,m). As the geological expert proceeds with interpreting, the number of interpreted data points from which the statistical model is inferred increases, and therefore the accuracy of the statistical model increases. When a model f(d,m) has successfully been inferred, we are able to simulate how the geological expert would perform an interpretation given some external information m, through f(d|m). We will demonstrate this method applied to geological interpretation and densely sampled airborne electromagnetic data. In short, our goal is to build a statistical model describing how a geological expert performs geological interpretation given some geophysical data. We then wish to use this statistical model to perform semi-automatic interpretation, everywhere such geophysical data exist, in a manner consistent with the choices made by a geological expert. The benefits of such a statistical model are that 1. it provides a quantification of how a geological expert performs interpretation based on the available diverse data, 2. all available geophysical information can be used, and 3. it allows much faster interpretation of large data sets.
[Mechanism study on leptin resistance in lung cancer cachexia rats treated by Xiaoyan Decoction].
Zhang, Yun-Chao; Jia, Ying-Jie; Yang, Pei-Ying; Zhang, Xing; Li, Xiao-Jiang; Zhang, Ying; Zhu, Jin-Li; Sun, Yi-Yu; Chen, Jun; Duan, Hao-Guo; Guo, Hua; Li, Chao
2014-12-01
To study the leptin resistance mechanism of Xiaoyan Decoction (XD) in lung cancer cachexia (LCC) rats. An LCC rat model was established. A total of 40 rats were randomly divided into the normal control group, the LCC model group, the XD group, and the positive control group, 10 in each group. After the LCC model was set up, rats in the LCC model group were administered normal saline, 2 mL each time. Rats in the XD group were administered XD at a daily dose of 2 mL. Those in the positive control group were administered Medroxyprogesterone Acetate suspension (20 mg/kg) by gastrogavage at a daily dose of 2 mL. All medication lasted for 14 days. The general condition and tumor growth were observed. Serum leptin levels and leptin receptor levels in the hypothalamus were detected using enzyme-linked immunosorbent assay. Hypothalamic contents of neuropeptide Y (NPY) and the anorexigenic gene POMC were detected using real-time PCR. Serum leptin levels were lower in the LCC model group than in the normal control group with statistical significance (P < 0.05). Compared with the LCC model group, serum leptin levels significantly increased in the XD group (P < 0.01). Leptin receptor levels in the hypothalamus increased significantly in the LCC model group (P < 0.01). Relative to these increased receptor levels, either XD or Medroxyprogesterone Acetate effectively reduced leptin receptor levels with statistical significance (P < 0.01), and there was also a statistically significant difference between the XD group and the positive control group (P < 0.05). The NPY content was higher in the LCC model group than in the other groups (P < 0.05). There was no statistical difference in NPY between the normal control group and the two treatment groups (P > 0.05). There was a statistically significant difference in POMC between the normal control group and the LCC model group (P < 0.05). POMC was decreased in the XD group and the positive control group with statistical significance (P < 0.05), and the decrease was more pronounced in the XD group (P < 0.05). Leptin resistance existed in LCC rats. XD could increase serum leptin levels and reduce leptin receptor levels in the hypothalamus. LCC could be improved by elevating NPY contents in the hypothalamus and reducing POMC contents, promoting appetite, and increasing food intake through both the peripheral and the central pathways.
Bromaghin, Jeffrey F.; McDonald, Trent L.; Amstrup, Steven C.
2013-01-01
Mark-recapture models are extensively used in quantitative population ecology, providing estimates of population vital rates, such as survival, that are difficult to obtain using other methods. Vital rates are commonly modeled as functions of explanatory covariates, adding considerable flexibility to mark-recapture models, but also increasing the subjectivity and complexity of the modeling process. Consequently, model selection and the evaluation of covariate structure remain critical aspects of mark-recapture modeling. The difficulties involved in model selection are compounded in Cormack-Jolly-Seber models because they are composed of separate sub-models for survival and recapture probabilities, which are conceptualized independently even though their parameters are not statistically independent. The construction of models as combinations of sub-models, together with multiple potential covariates, can lead to a large model set. Although desirable, estimation of the parameters of all models may not be feasible. Strategies to search a model space and base inference on a subset of all models exist and enjoy widespread use. However, even though the methods used to search a model space can be expected to influence parameter estimation, the assessment of covariate importance, and therefore the ecological interpretation of the modeling results, the performance of these strategies has received limited investigation. We present a new strategy for searching the space of a candidate set of Cormack-Jolly-Seber models and explore its performance relative to existing strategies using computer simulation. The new strategy provides an improved assessment of the importance of covariates and covariate combinations used to model survival and recapture probabilities, while requiring only a modest increase in the number of models on which inference is based in comparison to existing techniques.
Power-up: A Reanalysis of 'Power Failure' in Neuroscience Using Mixture Modeling.
Nord, Camilla L; Valton, Vincent; Wood, John; Roiser, Jonathan P
2017-08-23
Recently, evidence for endemically low statistical power has cast neuroscience findings into doubt. If low statistical power plagues neuroscience, then this reduces confidence in the reported effects. However, if statistical power is not uniformly low, then such blanket mistrust might not be warranted. Here, we provide a different perspective on this issue, analyzing data from an influential study reporting a median power of 21% across 49 meta-analyses (Button et al., 2013). We demonstrate, using Gaussian mixture modeling, that the sample of 730 studies included in that analysis comprises several subcomponents, so the use of a single summary statistic is insufficient to characterize the nature of the distribution. We find that statistical power is extremely low for studies included in meta-analyses that reported a null result and that it varies substantially across subfields of neuroscience, with particularly low power in candidate gene association studies. Therefore, whereas power in neuroscience remains a critical issue, the notion that studies are systematically underpowered is not the full story: low power is far from a universal problem. SIGNIFICANCE STATEMENT Recently, researchers across the biomedical and psychological sciences have become concerned with the reliability of results. One marker for reliability is statistical power: the probability of finding a statistically significant result given that the effect exists. Previous evidence suggests that statistical power is low across the field of neuroscience. Our results present a more comprehensive picture of statistical power in neuroscience: on average, studies are indeed underpowered, some very seriously so, but many studies show acceptable or even exemplary statistical power. We show that this heterogeneity in statistical power is common across most subfields in neuroscience. This new, more nuanced picture of statistical power in neuroscience could affect not only scientific understanding, but potentially policy and funding decisions for neuroscience research. Copyright © 2017 Nord, Valton et al.
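A minimal sketch of the mixture-modeling step: Gaussian mixtures are fitted to a distribution of study-level power estimates and BIC selects the number of subcomponents; the data below are synthetic stand-ins, not the 730 studies reanalyzed in the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# synthetic stand-in for study-level power estimates drawn from two subpopulations
power = np.concatenate([rng.beta(2, 8, 500), rng.beta(8, 2, 230)]).reshape(-1, 1)

# fit 1- to 4-component mixtures and let BIC pick the number of subcomponents
models = [GaussianMixture(n_components=k, random_state=0).fit(power) for k in range(1, 5)]
best = min(models, key=lambda m: m.bic(power))
print("components chosen by BIC:", best.n_components)
print("component means:", best.means_.ravel().round(2))
print("component weights:", best.weights_.round(2))
```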
No-Reference Video Quality Assessment Based on Statistical Analysis in 3D-DCT Domain.
Li, Xuelong; Guo, Qun; Lu, Xiaoqiang
2016-05-13
It is an important task to design models for universal no-reference video quality assessment (NR-VQA) in multiple video processing and computer vision applications. However, most existing NR-VQA metrics are designed for specific distortion types, which are often unknown in practical applications. A further deficiency is that the spatial and temporal information of videos is hardly considered simultaneously. In this paper, we propose a new NR-VQA metric based on the spatiotemporal natural video statistics (NVS) in the 3D discrete cosine transform (3D-DCT) domain. In the proposed method, a set of features is first extracted based on the statistical analysis of 3D-DCT coefficients to characterize the spatiotemporal statistics of videos in different views. These features are then used to predict the perceived video quality via an efficient linear support vector regression (SVR) model. The contributions of this paper are: 1) we explore the spatiotemporal statistics of videos in the 3D-DCT domain, which has an inherent spatiotemporal encoding advantage over other widely used 2D transformations; 2) we extract a small set of simple but effective statistical features for video visual quality prediction; 3) the proposed method is universal for multiple types of distortions and robust to different databases. The proposed method is tested on four widely used video databases. Extensive experimental results demonstrate that the proposed method is competitive with the state-of-the-art NR-VQA metrics and the top-performing FR-VQA and RR-VQA metrics.
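A rough sketch of the kind of pipeline described above, 3D-DCT of spatiotemporal blocks followed by a linear SVR, is given below; the block size and the coefficient statistics are illustrative assumptions rather than the paper's feature set, and the videos and quality scores are synthetic.

```python
import numpy as np
from scipy.fft import dctn
from sklearn.svm import LinearSVR

def block_features(video, block=(8, 8, 8)):
    """Very coarse NVS-style features from 3D-DCT coefficients of a gray video
    (frames x height x width). The specific statistics are illustrative only."""
    bt, bh, bw = block
    feats = []
    for t0 in range(0, video.shape[0] - bt + 1, bt):
        for y0 in range(0, video.shape[1] - bh + 1, bh):
            for x0 in range(0, video.shape[2] - bw + 1, bw):
                c = dctn(video[t0:t0+bt, y0:y0+bh, x0:x0+bw], norm="ortho")
                ac = np.abs(c.ravel()[1:])              # drop the DC coefficient
                feats.append([ac.mean(), ac.std(), (ac > ac.mean()).mean()])
    return np.asarray(feats).mean(axis=0)               # pool statistics over blocks

# toy usage with synthetic "videos" and synthetic quality scores
rng = np.random.default_rng(0)
X = np.array([block_features(rng.random((16, 32, 32))) for _ in range(40)])
y = rng.random(40)                                       # stand-in quality (MOS) values
print(LinearSVR(max_iter=10000).fit(X, y).score(X, y))
```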
Statistical testing of association between menstruation and migraine.
Barra, Mathias; Dahl, Fredrik A; Vetvik, Kjersti G
2015-02-01
To repair and refine a previously proposed method for statistical analysis of association between migraine and menstruation. Menstrually related migraine (MRM) affects about 20% of female migraineurs in the general population. The exact pathophysiological link from menstruation to migraine is hypothesized to be through fluctuations in female reproductive hormones, but the exact mechanisms remain unknown. Therefore, the main diagnostic criterion today is concurrency of migraine attacks with menstruation. Methods aiming to exclude spurious associations are wanted, so that further research into these mechanisms can be performed on a population with a true association. The statistical method is based on a simple two-parameter null model of MRM (which allows for simulation modeling), and Fisher's exact test (with mid-p correction) applied to standard 2 × 2 contingency tables derived from the patients' headache diaries. Our method is a corrected version of a previously published flawed framework. To our best knowledge, no other published methods for establishing a menstruation-migraine association by statistical means exist today. The probabilistic methodology shows good performance when subjected to receiver operator characteristic curve analysis. Quick reference cutoff values for the clinical setting were tabulated for assessing association given a patient's headache history. In this paper, we correct a proposed method for establishing association between menstruation and migraine by statistical methods. We conclude that the proposed standard of 3-cycle observations prior to setting an MRM diagnosis should be extended with at least one perimenstrual window to obtain sufficient information for statistical processing. © 2014 American Headache Society.
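A minimal sketch of the 2 × 2 mid-p test on a single patient's diary counts (attack versus no-attack days inside and outside perimenstrual windows) is shown below; the counts are invented, and the paper's tabulated cutoff values should be consulted for clinical use.

```python
import numpy as np
from scipy.stats import hypergeom

def fisher_midp(a, b, c, d):
    """Two-sided Fisher exact test with mid-p correction for the 2x2 table
    [[a, b], [c, d]], e.g. attack/no-attack days in and out of perimenstrual windows."""
    M, n, N = a + b + c + d, a + b, a + c        # scipy's hypergeom parametrization
    support = np.arange(max(0, N - (M - n)), min(n, N) + 1)
    pmf = hypergeom.pmf(support, M, n, N)
    p_obs = hypergeom.pmf(a, M, n, N)
    two_sided = pmf[pmf <= p_obs * (1 + 1e-9)].sum()
    return two_sided - 0.5 * p_obs               # mid-p: count observed table at half weight

# invented diary counts: 6 attacks on 12 perimenstrual days, 4 attacks on 48 other days
print(round(fisher_midp(6, 6, 4, 44), 4))
```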
Zhang, Qin; Yao, Quanying
2018-05-01
The dynamic uncertain causality graph (DUCG) is a newly presented framework for uncertain causality representation and probabilistic reasoning. It has been successfully applied to online fault diagnoses of large, complex industrial systems, and to disease diagnoses. This paper extends the DUCG to model more complex cases than could previously be modeled, e.g., the case in which statistical data are in different groups with or without overlap, and some domain knowledge and actions (new variables with uncertain causalities) are introduced. In other words, this paper proposes to use three modes of the DUCG to model such complex cases and then transform them into one of the two standard modes. In the former situation, if no directed cyclic graph is involved, the transformed result is simply a Bayesian network (BN), and existing inference methods for BNs can be applied. In the latter situation, an inference method based on the DUCG is proposed. Examples are provided to illustrate the methodology.
Ramírez-Rivera, Emmanuel de Jesús; Lopez-Collado, Jose; Díaz-Rivera, Pablo; Ortega-Jiménez, Eusebio; Torres-Hernández, Glafiro; Jacinto-Padilla, Jazmín; Herman-Lara, Erasmo
2017-04-01
This research identifies favorable areas for goat production systems in the state of Veracruz, Mexico. Through the use of the analytic hierarchy process, layers of biophysical and soil information were combined to generate a model of favorability. Model validation was performed by calculating the area under the curve, the true skill statistic, and a qualitative comparison with census records. The results showed the existence of regions with high (4494.3 km²) and moderate (2985.8 km²) favorability, and these areas correspond to 6.25 and 4.15%, respectively, of the state territory and are located in the regions of Sierra de Huayacocotla, Perote, and Orizaba. These regions are characterized as mountainous and having predominantly temperate-wet or cold climates, and having montane mesophilic forests, containing pine, fir, and desert scrub. The reliability of the distribution model was supported by the area under the curve value (0.96), the true skill statistic (0.86), and consistency with census records.
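As a generic illustration of the analytic hierarchy process step (the pairwise comparison matrix below is invented, not the study's expert judgments), criterion weights are obtained from the principal eigenvector of the comparison matrix and the consistency ratio checks the coherence of the judgments.

```python
import numpy as np

# Invented pairwise comparisons for three criteria (e.g. climate, soil, vegetation)
A = np.array([[1.0,   3.0, 5.0],
              [1/3.0, 1.0, 2.0],
              [1/5.0, 1/2.0, 1.0]])

eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)
weights = np.abs(eigvecs[:, k].real)
weights /= weights.sum()                      # AHP priority weights

n = A.shape[0]
ci = (eigvals.real[k] - n) / (n - 1)          # consistency index
cr = ci / 0.58                                # Saaty's random index for n = 3 is 0.58
print("weights:", weights.round(3), "consistency ratio:", round(cr, 3))
```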
Bayesian Bigot? Statistical Discrimination, Stereotypes, and Employer Decision Making
Pager, Devah; Karafin, Diana
2010-01-01
Much of the debate over the underlying causes of discrimination centers on the rationality of employer decision making. Economic models of statistical discrimination emphasize the cognitive utility of group estimates as a means of dealing with the problems of uncertainty. Sociological and social-psychological models, by contrast, question the accuracy of group-level attributions. Although mean differences may exist between groups on productivity-related characteristics, these differences are often inflated in their application, leading to much larger differences in individual evaluations than would be warranted by actual group-level trait distributions. In this study, the authors examine the nature of employer attitudes about black and white workers and the extent to which these views are calibrated against their direct experiences with workers from each group. They use data from fifty-five in-depth interviews with hiring managers to explore employers’ group-level attributions and their direct observations to develop a model of attitude formation and employer learning. PMID:20686633
Lee, Thomas; Bocquet, Lydéric; Coasne, Benoit
2016-01-01
Hydrocarbon recovery from unconventional reservoirs (shale gas) is debated due to its environmental impact and uncertainties on its predictability. But a lack of scientific knowledge impedes the proposal of reliable alternatives. The requirement of hydrofracking, fast recovery decay and ultra-low permeability—inherent to their nanoporosity—are specificities of these reservoirs, which challenge existing frameworks. Here we use molecular simulation and statistical models to show that recovery is hampered by interfacial effects at the wet kerogen surface. Recovery is shown to be thermally activated with an energy barrier modelled from the interface wetting properties. We build a statistical model of the recovery kinetics with a two-regime decline that is consistent with published data: a short time decay, consistent with Darcy description, followed by a fast algebraic decay resulting from increasingly unreachable energy barriers. Replacing water by CO2 or propane eliminates the barriers, therefore raising hopes for clean/efficient recovery. PMID:27327254
Federal Register 2010, 2011, 2012, 2013, 2014
2013-01-07
... DEPARTMENT OF JUSTICE [OMB Number 1121-0094] Agency Information Collection Activities: Existing...: 60-day notice. The Department of Justice (DOJ), Bureau of Justice Statistics, will be submitting the... information, please contact Todd D. Minton, Bureau of Justice Statistics, 810 Seventh Street NW., Washington...
The basis function approach for modeling autocorrelation in ecological data.
Hefley, Trevor J; Broms, Kristin M; Brost, Brian M; Buderman, Frances E; Kay, Shannon L; Scharf, Henry R; Tipton, John R; Williams, Perry J; Hooten, Mevin B
2017-03-01
Analyzing ecological data often requires modeling the autocorrelation created by spatial and temporal processes. Many seemingly disparate statistical methods used to account for autocorrelation can be expressed as regression models that include basis functions. Basis functions also enable ecologists to modify a wide range of existing ecological models in order to account for autocorrelation, which can improve inference and predictive accuracy. Furthermore, understanding the properties of basis functions is essential for evaluating the fit of spatial or time-series models, detecting a hidden form of collinearity, and analyzing large data sets. We present important concepts and properties related to basis functions and illustrate several tools and techniques ecologists can use when modeling autocorrelation in ecological data. © 2016 by the Ecological Society of America.
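A small sketch of the basis-function idea: a smooth temporal signal is represented with a handful of Gaussian basis functions inside an ordinary regression, so the autocorrelation is absorbed by the basis columns; the basis type and knot spacing here are arbitrary choices, not a recommendation from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 10, 200)
y = np.sin(t) + 0.3 * rng.standard_normal(t.size)    # autocorrelated signal plus noise

# Gaussian radial basis functions on an evenly spaced grid of knots
knots = np.linspace(0, 10, 12)
width = knots[1] - knots[0]
B = np.exp(-0.5 * ((t[:, None] - knots[None, :]) / width) ** 2)
X = np.column_stack([np.ones_like(t), B])            # intercept + basis columns

beta, *_ = np.linalg.lstsq(X, y, rcond=None)         # ordinary regression on the basis
fitted = X @ beta
print("residual std:", np.std(y - fitted).round(3))  # smooth structure absorbed by the basis
```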
Certification of medical librarians, 1949--1977 statistical analysis.
Schmidt, D
1979-01-01
The Medical Library Association's Code for Training and Certification of Medical Librarians was in effect from 1949 to August 1977, a period during which 3,216 individuals were certified. Statistics on each type of certificate granted each year are provided. Because 54.5% of those granted certification were awarded it in the last three-year, two-month period of the code's existence, these applications are reviewed in greater detail. Statistics on MLA membership, sex, residence, library school, and method of meeting requirements are detailed. Questions relating to certification under the code now in existence are raised.
Certification of medical librarians, 1949--1977 statistical analysis.
Schmidt, D
1979-01-01
The Medical Library Association's Code for Training and Certification of Medical Librarians was in effect from 1949 to August 1977, a period during which 3,216 individuals were certified. Statistics on each type of certificate granted each year are provided. Because 54.5% of those granted certification were awarded it in the last three-year, two-month period of the code's existence, these applications are reviewed in greater detail. Statistics on MLA membership, sex, residence, library school, and method of meeting requirements are detailed. Questions relating to certification under the code now in existence are raised. PMID:427287
Huggins, P; Johnson, CK; Schoergendorfer, A; Putta, S; Bathke, AC; Stromberg, AJ; Voss, SR
2011-01-01
The Mexican axolotl (Ambystoma mexicanum) presents an excellent model to investigate mechanisms of brain development that are conserved among vertebrates. In particular, metamorphic changes of the brain can be induced in free-living aquatic juveniles and adults by simply adding thyroid hormone (T4) to rearing water. Whole brains were sampled from juvenile A. mexicanum that were exposed to 0, 8, and 18 days of 50 nM T4, and these were used to isolate RNA and make normalized cDNA libraries for 454 DNA sequencing. A total of 1,875,732 high quality cDNA reads were assembled with existing ESTs to obtain 5,884 new contigs for human RefSeq protein models, and to develop a custom Affymetrix gene expression array (Amby_002) with approximately 20,000 probe sets. The Amby_002 array was used to identify 303 transcripts that differed statistically (p < 0.05, fold change > 1.5) as a function of days of T4 treatment. Further statistical analyses showed that Amby_002 performed concordantly in comparison to an existing, small format expression array. This study introduces a new A. mexicanum microarray resource for the community and the first lists of T4-responsive genes from the brain of a salamander amphibian. PMID:21457787
Huggins, P; Johnson, C K; Schoergendorfer, A; Putta, S; Bathke, A C; Stromberg, A J; Voss, S R
2012-01-01
The Mexican axolotl (Ambystoma mexicanum) presents an excellent model to investigate mechanisms of brain development that are conserved among vertebrates. In particular, metamorphic changes of the brain can be induced in free-living aquatic juveniles and adults by simply adding thyroid hormone (T4) to rearing water. Whole brains were sampled from juvenile A. mexicanum that were exposed to 0, 8, and 18 days of 50 nM T4, and these were used to isolate RNA and make normalized cDNA libraries for 454 DNA sequencing. A total of 1,875,732 high quality cDNA reads were assembled with existing ESTs to obtain 5884 new contigs for human RefSeq protein models, and to develop a custom Affymetrix gene expression array (Amby_002) with approximately 20,000 probe sets. The Amby_002 array was used to identify 303 transcripts that differed statistically (p<0.05, fold change >1.5) as a function of days of T4 treatment. Further statistical analyses showed that Amby_002 performed concordantly in comparison to an existing, small format expression array. This study introduces a new A. mexicanum microarray resource for the community and the first lists of T4-responsive genes from the brain of a salamander amphibian. Copyright © 2011 Elsevier Inc. All rights reserved.
A sup-score test for the cure fraction in mixture models for long-term survivors.
Hsu, Wei-Wen; Todem, David; Kim, KyungMann
2016-12-01
The evaluation of cure fractions in oncology research under the well known cure rate model has attracted considerable attention in the literature, but most of the existing testing procedures have relied on restrictive assumptions. A common assumption has been to restrict the cure fraction to a constant under alternatives to homogeneity, thereby neglecting any information from covariates. This article extends the literature by developing a score-based statistic that incorporates covariate information to detect cure fractions, with the existing testing procedure serving as a special case. A complication of this extension, however, is that the implied hypotheses are not typical and standard regularity conditions to conduct the test may not even hold. Using empirical processes arguments, we construct a sup-score test statistic for cure fractions and establish its limiting null distribution as a functional of mixtures of chi-square processes. In practice, we suggest a simple resampling procedure to approximate this limiting distribution. Our simulation results show that the proposed test can greatly improve efficiency over tests that neglect the heterogeneity of the cure fraction under the alternative. The practical utility of the methodology is illustrated using ovarian cancer survival data with long-term follow-up from the surveillance, epidemiology, and end results registry. © 2016, The International Biometric Society.
New Methodology for Estimating Fuel Economy by Vehicle Class
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chin, Shih-Miao; Dabbs, Kathryn; Hwang, Ho-Ling
2011-01-01
Office of Highway Policy Information to develop a new methodology to generate annual estimates of average fuel efficiency and number of motor vehicles registered by vehicle class for Table VM-1 of the Highway Statistics annual publication. This paper describes the new methodology developed under this effort and compares the results of the existing manual method and the new systematic approach. The methodology developed under this study takes a two-step approach. First, the preliminary fuel efficiency rates are estimated based on vehicle stock models for different classes of vehicles. Then, a reconciliation model is used to adjust the initial fuel consumption rates from the vehicle stock models and match the VMT information for each vehicle class and the reported total fuel consumption. This reconciliation model utilizes a systematic approach that produces documentable and reproducible results. The basic framework utilizes a mathematical programming formulation to minimize the deviations between the fuel economy estimates published in the previous year's Highway Statistics and the results from the vehicle stock models, subject to the constraint that fuel consumptions for different vehicle classes must sum to the total fuel consumption estimate published in Table MF-21 of the current year's Highway Statistics. The results generated from this new approach provide a smoother time series for the fuel economies by vehicle class. It also utilizes the most up-to-date and best available data with sound econometric models to generate MPG estimates by vehicle class.
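A toy version of the reconciliation step described above: adjust preliminary MPG estimates by vehicle class as little as possible while matching a reported total fuel figure, given fixed VMT by class; all numbers below are invented and the objective is a simple squared relative deviation, not the published formulation.

```python
import numpy as np
from scipy.optimize import minimize

vmt = np.array([1500e9, 300e9, 180e9])          # invented VMT by class (miles)
mpg0 = np.array([23.0, 17.5, 6.5])              # preliminary MPG from vehicle stock models
total_fuel = 1.02 * (vmt / mpg0).sum()          # invented reported total gallons

def objective(mpg):                             # stay close to the preliminary estimates
    return np.sum(((mpg - mpg0) / mpg0) ** 2)

# constraint: implied fuel use by class must sum to the reported total (normalized)
cons = {"type": "eq", "fun": lambda mpg: (vmt / mpg).sum() / total_fuel - 1.0}
res = minimize(objective, mpg0, constraints=[cons], bounds=[(1, None)] * 3,
               method="SLSQP")

print("reconciled MPG by class:", res.x.round(2))
print("fuel balance check (should be ~0):", (vmt / res.x).sum() / total_fuel - 1.0)
```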
ZERODUR strength modeling with Weibull statistical distributions
NASA Astrophysics Data System (ADS)
Hartmann, Peter
2016-07-01
The decisive influence on the breakage strength of brittle materials such as the low-expansion glass ceramic ZERODUR is the surface condition. For polished or etched surfaces, what matters is whether micro cracks are present and how deep they are. Ground surfaces have many micro cracks caused by the generation process; here only the depths of the micro cracks are relevant. In any case, the presence and depths of micro cracks are statistical in nature. The Weibull distribution, based on the weakest-link ansatz, is the model traditionally used to represent such data sets. Whether the two- or three-parameter Weibull distribution should be used for data representation and reliability prediction depends on the underlying crack generation mechanisms. Before choosing the model for a specific evaluation, some checks should be done. Is there only one mechanism present, or is it to be expected that an additional mechanism might contribute deviating results? For ground surfaces the main mechanism is the action of the diamond grains on the surface. However, grains breaking from their bonding might be dragged by the tool across the surface, introducing a slightly deeper crack. These scratches cannot be expected to follow the same statistical distribution as the grinding process, so describing them with the same distribution parameters is not adequate; including them requires a dedicated discussion. Additional information that influences the selection of the model, for example the existence of a maximum crack depth, should also be taken into account. Micro cracks introduced by small diamond grains on tools working with limited forces cannot be arbitrarily deep, so for data obtained with such surfaces the existence of a threshold breakage stress should be part of the hypothesis. This leads to the use of the three-parameter Weibull distribution. A differentiation based on the data set alone, without pre-existing information, is possible but requires a large data set: with only 20 specimens per sample such differentiation is not possible; it requires on the order of 100 specimens per set, the more the better. The validity of the statistical evaluation methods is discussed with several examples. These considerations are of special importance because of their consequences for prognosis methods and results. In particular, the use of the two-parameter Weibull distribution for high-strength surfaces has led to unrealistic results: extrapolation down to a low acceptable probability of failure covers a wide range without any data points and is mainly influenced by the slope determined by the high-strength specimens. In the past this misconception has prevented the use of brittle materials for stress loads that they could have endured easily.
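As a hedged illustration of the two- versus three-parameter choice discussed above, the sketch below fits both forms to a synthetic set of breakage stresses with scipy; the third parameter enters as a location (threshold stress) term. The sample values are invented.

```python
# Fit two- and three-parameter Weibull distributions to breakage stresses.
# Synthetic data: in the two-parameter fit the threshold (location) is fixed
# at 0; in the three-parameter fit it is estimated as a threshold stress.
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(1)
stresses = 50.0 + weibull_min.rvs(c=2.5, scale=80.0, size=100, random_state=rng)  # MPa

shape2, loc2, scale2 = weibull_min.fit(stresses, floc=0.0)   # 2-parameter
shape3, loc3, scale3 = weibull_min.fit(stresses)             # 3-parameter (free threshold)

print(f"2-par: shape={shape2:.2f}, scale={scale2:.1f} MPa")
print(f"3-par: shape={shape3:.2f}, threshold={loc3:.1f} MPa, scale={scale3:.1f} MPa")
```

With only ~20 specimens the threshold estimate would be poorly constrained, consistent with the author's point that on the order of 100 specimens are needed to distinguish the two forms from data alone.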
Humidity-corrected Arrhenius equation: The reference condition approach.
Naveršnik, Klemen; Jurečič, Rok
2016-03-16
Accelerated and stress stability data are often used to predict the shelf life of pharmaceuticals. Temperature, combined with humidity, accelerates chemical decomposition, and the Arrhenius equation is used to extrapolate accelerated stability results to long-term stability. Statistical estimation of the humidity-corrected Arrhenius equation is not straightforward due to its non-linearity. A two-stage nonlinear fitting approach is used in practice, followed by a prediction stage. We developed a single-stage statistical procedure, called the reference condition approach, which has better statistical properties (less collinearity, direct estimation of uncertainty, narrower prediction interval) and is significantly easier to use, compared to the existing approaches. Our statistical model was populated with data from a 35-day stress stability study on a laboratory batch of vitamin tablets and required a mere 30 laboratory assay determinations. The stability prediction agreed well with the actual 24-month long-term stability of the product. The approach has high potential to assist product formulation, specification setting and stability statements. Copyright © 2016 Elsevier B.V. All rights reserved.
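The combined temperature-humidity dependence is often written as k = A·exp(−Ea/(R·T) + B·RH). The sketch below fits that generic form (which becomes linear in the parameters after taking logs) and then extrapolates to a storage condition; it is not the paper's reference-condition parameterization, and the data are invented.

```python
# Generic humidity-corrected Arrhenius model, k = A*exp(-Ea/(R*T) + B*RH).
# Taking logs makes it linear in (ln A, Ea, B), so ordinary least squares
# suffices for this sketch. Data are invented; this is NOT the paper's
# reference-condition approach.
import numpy as np

R = 8.314  # J mol-1 K-1
T = np.array([313.0, 323.0, 333.0, 333.0, 343.0])      # K
RH = np.array([0.60, 0.60, 0.25, 0.75, 0.60])          # relative humidity (fraction)
k_obs = np.array([0.011, 0.025, 0.030, 0.080, 0.110])  # degradation rate, %/day (invented)

X = np.column_stack([np.ones_like(T), -1.0 / (R * T), RH])
coef, *_ = np.linalg.lstsq(X, np.log(k_obs), rcond=None)
lnA, Ea, B = coef
print(f"Ea ≈ {Ea/1000:.1f} kJ/mol, humidity coefficient B ≈ {B:.2f}")

# Extrapolate to long-term storage at 25 °C / 60% RH:
k_25 = np.exp(lnA - Ea / (R * 298.15) + B * 0.60)
print(f"predicted rate at 25 °C/60% RH: {k_25:.4f} %/day")
```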
Scharfenberger, Christian; Wong, Alexander; Clausi, David A
2015-01-01
We propose a simple yet effective structure-guided statistical textural distinctiveness approach to salient region detection. Our method uses a multilayer approach to analyze the structural and textural characteristics of natural images as important features for salient region detection from a scale point of view. To represent the structural characteristics, we abstract the image using structured image elements and extract rotational-invariant neighborhood-based textural representations to characterize each element by an individual texture pattern. We then learn a set of representative texture atoms for sparse texture modeling and construct a statistical textural distinctiveness matrix to determine the distinctiveness between all representative texture atom pairs in each layer. Finally, we determine saliency maps for each layer based on the occurrence probability of the texture atoms and their respective statistical textural distinctiveness and fuse them to compute a final saliency map. Experimental results using four public data sets and a variety of performance evaluation metrics show that our approach provides promising results when compared with existing salient region detection approaches.
A BRDF statistical model applying to space target materials modeling
NASA Astrophysics Data System (ADS)
Liu, Chenghao; Li, Zhi; Xu, Can; Tian, Qichen
2017-10-01
To address the poor performance of the five-parameter semi-empirical model in fitting high-density measured BRDF data, a refined statistical BRDF model suitable for modeling multiple classes of space target materials is proposed. The refined model improves on the Torrance-Sparrow model while retaining the modeling advantages of the five-parameter model. Compared with the existing empirical model, it contains six simple parameters that can approximate the roughness distribution of the material surface, the intensity of the Fresnel reflectance phenomenon, and the attenuation of the reflected light's brightness as the azimuth angle changes. The model achieves fast parameter inversion with no extra loss of accuracy. A genetic algorithm was used to invert the parameters of 11 samples of materials commonly used on space targets, and the fitting errors for all materials were below 6%, much lower than those of the five-parameter model. The refined model is further verified by comparing the fitting results of three samples at different incident zenith angles at 0° azimuth angle. Finally, three-dimensional visualizations of these samples over the upper hemisphere are given, in which the strength of the optical scattering of different materials can be clearly seen, demonstrating the refined model's ability to characterize materials.
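As a hedged stand-in for the genetic-algorithm inversion described above, the sketch below fits a simplified microfacet-style BRDF (not the authors' six-parameter model) to noisy samples with scipy's differential evolution, which plays the same global-search role as a GA. All parameter names and data are illustrative.

```python
# Global-search inversion of a simplified microfacet-style BRDF.
# The model form and data are illustrative stand-ins for the paper's
# six-parameter refined model and its GA-based inversion.
import numpy as np
from scipy.optimize import differential_evolution

def brdf(theta_i, theta_r, kd, ks, m):
    # Lambertian term plus a simple Gaussian lobe around the specular direction
    half = 0.5 * abs(theta_i - theta_r)
    return kd / np.pi + ks * np.exp(-(half / m) ** 2) / max(np.cos(theta_r), 1e-3)

theta_i = np.deg2rad(30.0)
theta_r = np.deg2rad(np.linspace(-70, 70, 29))
true = np.array([brdf(theta_i, t, 0.15, 0.60, 0.20) for t in theta_r])
meas = true * (1.0 + 0.03 * np.random.default_rng(2).normal(size=true.size))

def fit_error(p):
    kd, ks, m = p
    model = np.array([brdf(theta_i, t, kd, ks, m) for t in theta_r])
    return np.mean((model - meas) ** 2)

res = differential_evolution(fit_error, bounds=[(0, 1), (0, 2), (0.01, 1)], seed=2)
print("recovered (kd, ks, m):", np.round(res.x, 3))
```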
So, Rita; Teakles, Andrew; Baik, Jonathan; Vingarzan, Roxanne; Jones, Keith
2018-05-01
Visibility degradation, one of the most noticeable indicators of poor air quality, can occur despite relatively low levels of particulate matter when the risk to human health is low. The availability of timely and reliable visibility forecasts can provide a more comprehensive understanding of the anticipated air quality conditions to better inform local jurisdictions and the public. This paper describes the development of a visibility forecasting modeling framework, which leverages the existing air quality and meteorological forecasts from Canada's operational Regional Air Quality Deterministic Prediction System (RAQDPS) for the Lower Fraser Valley of British Columbia. A baseline model (GM-IMPROVE) was constructed using the revised IMPROVE algorithm based on unprocessed forecasts from the RAQDPS. Three additional prototypes (UMOS-HYB, GM-MLR, GM-RF) were also developed and assessed for forecast performance of up to 48 hr lead time during various air quality and meteorological conditions. Forecast performance was assessed by examining their ability to provide both numerical and categorical forecasts in the form of 1-hr total extinction and Visual Air Quality Ratings (VAQR), respectively. While GM-IMPROVE generally overestimated extinction more than twofold, it had skill in forecasting the relative species contribution to visibility impairment, including ammonium sulfate and ammonium nitrate. Both statistical prototypes, GM-MLR and GM-RF, performed well in forecasting 1-hr extinction during daylight hours, with correlation coefficients (R) ranging from 0.59 to 0.77. UMOS-HYB, a prototype based on postprocessed air quality forecasts without additional statistical modeling, provided reasonable forecasts during most daylight hours. In terms of categorical forecasts, the best prototype was approximately 75 to 87% correct, when forecasting for a condensed three-category VAQR. A case study, focusing on a poor visual air quality yet low Air Quality Health Index episode, illustrated that the statistical prototypes were able to provide timely and skillful visibility forecasts with lead time up to 48 hr. This study describes the development of a visibility forecasting modeling framework, which leverages the existing air quality and meteorological forecasts from Canada's operational Regional Air Quality Deterministic Prediction System. The main applications include tourism and recreation planning, input into air quality management programs, and educational outreach. Visibility forecasts, when supplemented with the existing air quality and health based forecasts, can assist jurisdictions to anticipate the visual air quality impacts as perceived by the public, which can potentially assist in formulating the appropriate air quality bulletins and recommendations.
Landslide Susceptibility Statistical Methods: A Critical and Systematic Literature Review
NASA Astrophysics Data System (ADS)
Mihir, Monika; Malamud, Bruce; Rossi, Mauro; Reichenbach, Paola; Ardizzone, Francesca
2014-05-01
Landslide susceptibility assessment, the subject of this systematic review, is aimed at understanding the spatial probability of slope failures under a set of geomorphological and environmental conditions. It is estimated that about 375 landslides that occur globally each year are fatal, with around 4600 people killed per year. Past studies have brought out the increasing cost of landslide damage, which can primarily be attributed to human occupation and increased human activities in vulnerable environments. Many scientists, to evaluate and reduce landslide risk, have made an effort to efficiently map landslide susceptibility using different statistical methods. In this paper, we carry out a critical and systematic literature review of landslide susceptibility in terms of the different statistical methods used. For each of a broad set of studies reviewed we note: (i) study geography region and areal extent, (ii) landslide types, (iii) inventory type and temporal period covered, (iv) mapping technique, (v) thematic variables used, (vi) statistical models, (vii) assessment of model skill, (viii) uncertainty assessment methods, and (ix) validation methods. We then pull out broad trends within our review of landslide susceptibility, particularly regarding the statistical methods. We found that the most common statistical methods used in the study of landslide susceptibility include logistic regression, artificial neural networks, discriminant analysis and weight of evidence. Although most of the studies we reviewed assessed model skill, very few assessed model uncertainty. In terms of geographic extent, the largest number of landslide susceptibility zonations was in Turkey, Korea, Spain, Italy and Malaysia. However, there are also many landslides and fatalities in other localities, particularly India, China, the Philippines, Nepal, Indonesia, Guatemala, and Pakistan, where there are far fewer landslide susceptibility studies available in the peer-reviewed literature. This raises some concern that existing studies do not always cover all the regions globally that currently experience landslides and landslide fatalities.
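Since logistic regression is the most common method found in the review, a minimal hedged sketch is shown below: landslide presence/absence regressed on a few thematic variables. The variable names, data, and coefficients are hypothetical.

```python
# Minimal landslide-susceptibility sketch: logistic regression of landslide
# presence/absence on hypothetical thematic variables (slope, rainfall,
# lithology indicator). Purely illustrative of the review's most common method.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
n = 2000
slope = rng.uniform(0, 45, n)                    # degrees
rainfall = rng.uniform(500, 3000, n)             # mm/yr
soft_lithology = rng.integers(0, 2, n)           # binary indicator
logit = -6 + 0.12 * slope + 0.001 * rainfall + 0.8 * soft_lithology
y = rng.random(n) < 1 / (1 + np.exp(-logit))     # synthetic landslide inventory

X = np.column_stack([slope, rainfall, soft_lithology])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
susceptibility = model.predict_proba(X_te)[:, 1]  # spatial probability of failure
print("AUC (model skill):", round(roc_auc_score(y_te, susceptibility), 3))
```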
Software cost/resource modeling: Deep space network software cost estimation model
NASA Technical Reports Server (NTRS)
Tausworthe, R. J.
1980-01-01
A parametric software cost estimation model prepared for JPL deep space network (DSN) data systems implementation tasks is presented. The resource estimation model incorporates principles and data from a number of existing models, such as those of the General Research Corporation, Doty Associates, IBM (Walston-Felix), Rome Air Force Development Center, University of Maryland, and Rayleigh-Norden-Putnam. The model calibrates task magnitude and difficulty, development environment, and software technology effects through prompted responses to a set of approximately 50 questions. Parameters in the model are adjusted to fit JPL software lifecycle statistics. The estimation model output scales a standard DSN work breakdown structure skeleton, which is then input to a PERT/CPM system, producing a detailed schedule and resource budget for the project being planned.
The analysis of morphometric data on rocky mountain wolves and arctic wolves using statistical methods
NASA Astrophysics Data System (ADS)
Ammar Shafi, Muhammad; Saifullah Rusiman, Mohd; Hamzah, Nor Shamsidah Amir; Nor, Maria Elena; Ahmad, Noor’ani; Azia Hazida Mohamad Azmi, Nur; Latip, Muhammad Faez Ab; Hilmi Azman, Ahmad
2018-04-01
Morphometrics is a quantitative analysis based on the shape and size of specimens. Morphometric quantitative analyses are commonly used to analyse the fossil record, the shape and size of specimens, and other features. The aim of the study is to find the differences between rocky mountain wolves and arctic wolves based on gender. The sample utilised secondary data which included seven variables as independent variables and two dependent variables. Statistical modelling was used in the analysis, such as the analysis of variance (ANOVA) and multivariate analysis of variance (MANOVA). The results showed that significant differences exist between arctic wolves and rocky mountain wolves based on the independent factors and gender.
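A hedged sketch of the MANOVA step with statsmodels is shown below; the measurement names and values are invented placeholders for the morphometric variables.

```python
# MANOVA sketch: test whether multivariate skull measurements differ by
# subspecies and sex. Column names and data are invented placeholders for
# the morphometric variables used in the study.
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(4)
n = 60
df = pd.DataFrame({
    "subspecies": rng.choice(["rocky_mountain", "arctic"], n),
    "sex": rng.choice(["male", "female"], n),
})
shift = (df["subspecies"] == "arctic").to_numpy() * 1.5
df["skull_length"] = 25 + shift + rng.normal(0, 1.0, n)       # cm
df["skull_width"] = 14 + 0.5 * shift + rng.normal(0, 0.8, n)  # cm

fit = MANOVA.from_formula("skull_length + skull_width ~ subspecies * sex", data=df)
print(fit.mv_test())
```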
DOE Office of Scientific and Technical Information (OSTI.GOV)
Marekova, Elisaveta
Series of relatively large earthquakes in different regions of the Earth are studied. The regions chosen have high seismic activity and good contemporary networks for recording the seismic events that occur along them. The main purpose of this investigation is to describe the seismic process analytically in space and time. We consider the statistical distributions of the distances and times between consecutive earthquakes (so-called pair analysis). Studies conducted on approximating the statistical distributions of the parameters of consecutive seismic events indicate the existence of characteristic functions that describe them best. Such a mathematical description allows the distributions of the examined parameters to be compared to other model distributions.
A Method for Retrieving Ground Flash Fraction from Satellite Lightning Imager Data
NASA Technical Reports Server (NTRS)
Koshak, William J.
2009-01-01
A general theory for retrieving the fraction of ground flashes in N lightning observed by a satellite-based lightning imager is provided. An "exponential model" is applied as a physically reasonable constraint to describe the measured optical parameter distributions, and population statistics (i.e., mean, variance) are invoked to add additional constraints to the retrieval process. The retrieval itself is expressed in terms of a Bayesian inference, and the Maximum A Posteriori (MAP) solution is obtained. The approach is tested by performing simulated retrievals, and retrieval error statistics are provided. The ability to retrieve ground flash fraction has important benefits to the atmospheric chemistry community. For example, using the method to partition the existing satellite global lightning climatology into separate ground and cloud flash climatologies will improve estimates of lightning nitrogen oxides (NOx) production; this in turn will improve both regional air quality and global chemistry/climate model predictions.
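One way to picture the retrieval is as estimating the mixing fraction of two optical-parameter populations by maximum a posteriori. The toy below uses two exponential populations with invented means and a uniform prior; it mirrors the spirit of the exponential-model constraint and MAP solution but is not the paper's retrieval algorithm.

```python
# Toy illustration: estimate a ground-flash fraction as the mixing weight of
# two exponential populations (ground vs. cloud flashes) by MAP with a uniform
# prior. Population means and data are invented; not the paper's algorithm.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(5)
mu_ground, mu_cloud = 2.0, 5.0          # hypothetical mean optical parameter values
true_frac = 0.3
n = 500
is_ground = rng.random(n) < true_frac
x = np.where(is_ground, rng.exponential(mu_ground, n), rng.exponential(mu_cloud, n))

def neg_log_posterior(f):
    like = (f * np.exp(-x / mu_ground) / mu_ground
            + (1 - f) * np.exp(-x / mu_cloud) / mu_cloud)
    return -np.sum(np.log(like))        # uniform prior on f in (0, 1)

res = minimize_scalar(neg_log_posterior, bounds=(1e-3, 1 - 1e-3), method="bounded")
print(f"MAP ground flash fraction: {res.x:.2f} (true {true_frac})")
```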
Multilayer Statistical Intrusion Detection in Wireless Networks
NASA Astrophysics Data System (ADS)
Hamdi, Mohamed; Meddeb-Makhlouf, Amel; Boudriga, Noureddine
2008-12-01
The rapid proliferation of mobile applications and services has introduced new vulnerabilities that do not exist in fixed wired networks. Traditional security mechanisms, such as access control and encryption, turn out to be inefficient in modern wireless networks. Given the shortcomings of these protection mechanisms, an important research focus is intrusion detection systems (IDSs). This paper proposes a multilayer statistical intrusion detection framework for wireless networks. The architecture is well suited to wireless networks because the underlying detection models rely on radio parameters and traffic models. Accurate correlation between radio and traffic anomalies enhances the efficiency of the IDS. A radio signal fingerprinting technique based on the maximal overlap discrete wavelet transform (MODWT) is developed. Moreover, a geometric clustering algorithm is presented. Depending on the characteristics of the fingerprinting technique, the clustering algorithm allows the false positive and false negative rates to be controlled. Finally, simulation experiments have been carried out to validate the proposed IDS.
Stationarity: Wanted dead or alive?
Lins, H.F.; Cohn, T.A.
2011-01-01
Aligning engineering practice with natural process behavior would appear, on its face, to be a prudent and reasonable course of action. However, if we do not understand the long-term characteristics of hydroclimatic processes, how does one find the prudent and reasonable course needed for water management? We consider this question in light of three aspects of existing and unresolved issues affecting hydroclimatic variability and statistical inference: Hurst-Kolmogorov phenomena; the complications long-term persistence introduces with respect to statistical understanding; and the dependence of process understanding on arbitrary sampling choices. These problems are not easily addressed. In such circumstances, humility may be more important than physics; a simple model with well-understood flaws may be preferable to a sophisticated model whose correspondence to reality is uncertain. ?? 2011 American Water Resources Association. This article is a U.S. Government work and is in the public domain in the USA.
The Apollo 16 regolith - A petrographically-constrained chemical mixing model
NASA Technical Reports Server (NTRS)
Kempa, M. J.; Papike, J. J.; White, C.
1980-01-01
A mixing model for Apollo 16 regolith samples has been developed, which differs from other A-16 mixing models in that it is both petrographically constrained and statistically sound. The model was developed using three components representative of rock types present at the A-16 site, plus a representative mare basalt. A linear least-squares fitting program employing the chi-squared test and sum of components was used to determine goodness of fit. Results for surface soils indicate that either there are no significant differences between Cayley and Descartes material at the A-16 site or, if differences do exist, they have been obscured by meteoritic reworking and mixing of the lithologies.
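A minimal sketch of such a constrained mixing calculation: solve for non-negative component fractions that reproduce a soil composition in a least-squares sense, with a sum-of-components and residual check afterwards. The endmember and soil compositions below are invented placeholders, not actual Apollo 16 analyses.

```python
# Least-squares mixing model sketch: estimate fractions of endmember components
# that best reproduce a soil's oxide composition, with non-negativity bounds
# and a sum-of-components check. Compositions are invented placeholders.
import numpy as np
from scipy.optimize import lsq_linear

# rows: oxides (wt%); columns: endmembers (anorthosite, impact melt, mare basalt)
A = np.array([
    [44.5, 45.0, 45.5],   # SiO2
    [35.0, 26.0, 10.0],   # Al2O3
    [ 0.5,  6.0, 18.5],   # FeO
    [19.0, 15.0,  9.0],   # CaO
])
soil = np.array([45.0, 27.5, 5.5, 15.8])   # hypothetical regolith composition

res = lsq_linear(A, soil, bounds=(0.0, 1.0))
fractions = res.x
rss = np.sum((A @ fractions - soil) ** 2)  # simple goodness-of-fit measure
print("component fractions:", np.round(fractions, 3), "sum:", round(fractions.sum(), 3))
print("residual sum of squares:", round(rss, 3))
```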
Cheng, Chui Ling
2016-08-03
Statistical models were developed to estimate natural streamflow under low-flow conditions for streams with existing streamflow data at measurement sites on the Islands of Kauaʻi, Oʻahu, Molokaʻi, Maui, and Hawaiʻi. Streamflow statistics used to describe the low-flow characteristics are flow-duration discharges that are equaled or exceeded between 50 and 95 percent of the time during the 30-year base period 1984–2013. Record-augmentation techniques were applied to develop statistical models relating concurrent streamflow data at the measurement sites and long-term data from nearby continuous-record streamflow-gaging stations that were in operation during the base period and were selected as index stations. Existing data and subsequent low-flow analyses of the available data help to identify streams in under-represented geographic areas and hydrogeologic settings where additional data collection is suggested. Low-flow duration discharges were estimated for 107 measurement sites (including long-term and short-term continuous-record streamflow-gaging stations, and partial-record stations) and 27 index stations. The adequacy of statistical models was evaluated with correlation coefficients and modified Nash-Sutcliffe coefficients of efficiency, and a majority of the low-flow duration-discharge estimates are satisfactory based on these regression statistics. Molokaʻi and Hawaiʻi have the fewest number of measurement sites (that are not located on ephemeral stream reaches) at which flow-duration discharges were estimated, which can be partially explained by the limited number of index stations available on these islands that could be used for record augmentation. At measurement sites on some tributary streams, low-flow duration discharges could not be estimated because no adequate correlations could be developed with the index stations. These measurement sites are located on streams where duration-discharge estimates are available at long-term stations at other locations on the main stream channel to provide at least some definition of low-flow characteristics on that stream. In terms of general natural streamflow data availability, data are scarce in the leeward areas for all five islands as many leeward streams are dry or have minimal flow. Other under-represented areas include central Oʻahu, central Maui, and southeastern Maui.
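The core record-augmentation idea can be sketched as a regression, in log space, between concurrent flows at a partial-record site and a long-term index station, with the fitted relation then applied to the index station's full record. The flow values below are invented and the sketch is not the USGS procedure in detail.

```python
# Record-augmentation sketch: relate concurrent low flows at a partial-record
# site to an index station via linear regression of log-flows, then estimate a
# flow-duration discharge at the site from the index station's long record.
import numpy as np

rng = np.random.default_rng(6)
index_long = np.exp(rng.normal(3.0, 0.6, 30 * 365))          # ft3/s, 30-yr index record
concurrent_idx = index_long[:40]                              # concurrent measurements
concurrent_site = np.exp(0.8 * np.log(concurrent_idx) - 0.5 + rng.normal(0, 0.1, 40))

slope, intercept = np.polyfit(np.log(concurrent_idx), np.log(concurrent_site), 1)

q95_index = np.quantile(index_long, 0.05)                     # flow exceeded 95% of the time
q95_site = np.exp(intercept + slope * np.log(q95_index))
print(f"estimated Q95 at partial-record site: {q95_site:.1f} ft3/s")
```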
ΛCDM model with dissipative nonextensive viscous dark matter
NASA Astrophysics Data System (ADS)
Gimenes, H. S.; Viswanathan, G. M.; Silva, R.
2018-03-01
Many models in cosmology typically assume the standard bulk viscosity. We study an alternative interpretation for the origin of the bulk viscosity. Using nonadditive statistics proposed by Tsallis, we propose a bulk viscosity component that can only exist by a nonextensive effect through the nonextensive/dissipative correspondence (NexDC). In this paper, we consider a ΛCDM model for a flat universe with a dissipative nonextensive viscous dark matter component, following the Eckart theory of bulk viscosity, without any perturbative approach. In order to analyze cosmological constraints, we use one of the most recent observations of Type Ia Supernova, baryon acoustic oscillations and cosmic microwave background data.
A Framework to Learn Physics from Atomically Resolved Images
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vlcek, L.; Maksov, A.; Pan, M.
Here, we present a generalized framework for physics extraction, i.e., knowledge, from atomically resolved images, and show its utility by applying it to a model system of segregation of chalcogen atoms in an FeSe0.45Te0.55 superconductor system. We emphasize that the framework can be used for any imaging data for which a generative physical model exists. Consider that a generative physical model can produce a very large number of configurations, not all of which are observable. By applying a microscope function to a sub-set of this generated data, we form a simulated dataset on which statistics can be computed.
From intuition to statistics in building subsurface structural models
Brandenburg, J.P.; Alpak, F.O.; Naruk, S.; Solum, J.
2011-01-01
Experts associated with the oil and gas exploration industry suggest that combining forward trishear models with stochastic global optimization algorithms allows a quantitative assessment of the uncertainty associated with a given structural model. The methodology is applied to incompletely imaged structures related to deepwater hydrocarbon reservoirs and results are compared to prior manual palinspastic restorations and borehole data. This methodology is also useful for extending structural interpretations into other areas of limited resolution, such as subsalt in addition to extrapolating existing data into seismic data gaps. This technique can be used for rapid reservoir appraisal and potentially have other applications for seismic processing, well planning, and borehole stability analysis.
The Galactic Isotropic γ-ray Background and Implications for Dark Matter
NASA Astrophysics Data System (ADS)
Campbell, Sheldon S.; Kwa, Anna; Kaplinghat, Manoj
2018-06-01
We present an analysis of the radial angular profile of the galacto-isotropic (GI) γ-ray flux, the statistically uniform flux in angular annuli centred on the Galactic centre. Two different approaches are used to measure the GI flux profile in 85 months of Fermi-LAT data: the BDS statistical method, which identifies spatial correlations, and a new Poisson ordered-pixel method, which identifies non-Poisson contributions. Both methods produce similar GI flux profiles. The GI flux profile is well described by an existing model of bremsstrahlung, π0 production, inverse Compton scattering, and the isotropic background. Discrepancies with data in our full-sky model are not present in the GI component, and are therefore due to mis-modelling of the non-GI emission. Dark matter annihilation constraints based solely on the observed GI profile are close to the thermal WIMP cross section below 100 GeV, for fixed models of the dark matter density profile and astrophysical γ-ray foregrounds. Refined measurements of the GI profile are expected to improve these constraints by a factor of a few.
Preliminary constraints on variable w dark energy cosmologies from the SNLS
NASA Astrophysics Data System (ADS)
Carlberg, R. G.; Conley, A.; Howell, D. A.; Neill, J. D.; Perrett, K.; Pritchet, C. J.; Sullivan, M.
2005-12-01
The first 71 confirmed Type Ia supernovae from the Supernova Legacy Survey, being conducted with CFHT imaging and Gemini, VLT and Keck spectroscopy, set limits on variable dark energy cosmological models. For a generalized Chaplygin gas, in which the dark energy content varies as (1-Ω_M)/ρ^a, we find that a is statistically consistent with zero, with a best fit a = -0.2 ± 0.3 (68% confidence); addressing the systematic errors requires a further refinement of the photometric calibration and of potential model biases. A variable dark energy equation of state with w = w_0 + w_1 z shows the expected degeneracy between increasingly positive w_0 and negative w_1. The existing data rule out the parameters of the Weller & Linder (2002) supergravity-inspired model cosmology, (w_0, w_1) = (-0.81, 0.31). The full ~700 SNe Ia of the completed survey will provide a statistical error limit on w_1 of about 0.2 and significant constraints on variable w models. The Canadian NSERC provided funding for the scientific analysis. These results are based on observations obtained at the CFHT, Gemini, VLT and Keck observatories.
Shilling Attacks Detection in Recommender Systems Based on Target Item Analysis
Zhou, Wei; Wen, Junhao; Koh, Yun Sing; Xiong, Qingyu; Gao, Min; Dobbie, Gillian; Alam, Shafiq
2015-01-01
Recommender systems are highly vulnerable to shilling attacks, both by individuals and groups. Attackers who introduce biased ratings in order to affect recommendations have been shown to negatively affect collaborative filtering (CF) algorithms. Previous research focuses only on the differences between genuine profiles and attack profiles, ignoring the group characteristics in attack profiles. In this paper, we study the use of statistical metrics to detect rating patterns of attackers and group characteristics in attack profiles. A further issue is that most existing detection methods are model-specific. Two metrics, Rating Deviation from Mean Agreement (RDMA) and Degree of Similarity with Top Neighbors (DegSim), are used for analyzing rating patterns between malicious profiles and genuine profiles in attack models. Building upon this, we also propose and evaluate a detection structure called RD-TIA for detecting shilling attacks in recommender systems using a statistical approach. In order to detect more complicated attack models, we propose a novel metric called DegSim’ based on DegSim. The experimental results show that our detection model based on target item analysis is an effective approach for detecting shilling attacks. PMID:26222882
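One common formulation of the RDMA metric is sketched below: for a user, the mean over their rated items of the absolute deviation from each item's average rating, scaled by the item's rating count. Treat the exact scaling as an assumption rather than the paper's definition; the rating matrix is invented.

```python
# Sketch of Rating Deviation from Mean Agreement (RDMA) for one user:
# mean over the user's rated items of |r_ui - mean rating of item i| divided
# by the number of ratings item i has. One common formulation; the exact
# scaling may differ from the paper's. Ratings are invented (0 = unrated).
import numpy as np

ratings = np.array([
    [5, 4, 0, 3],
    [4, 4, 2, 0],
    [5, 5, 5, 5],             # a suspiciously uniform (possibly shilling) profile
])

def rdma(user_row, ratings):
    rated = user_row > 0
    item_means = np.array([ratings[ratings[:, j] > 0, j].mean()
                           for j in range(ratings.shape[1])])
    item_counts = (ratings > 0).sum(axis=0)
    dev = np.abs(user_row[rated] - item_means[rated]) / item_counts[rated]
    return dev.mean()

for u in range(ratings.shape[0]):
    print(f"user {u}: RDMA = {rdma(ratings[u], ratings):.3f}")
```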
Numerical solutions of the semiclassical Boltzmann ellipsoidal-statistical kinetic model equation
Yang, Jaw-Yen; Yan, Chin-Yuan; Huang, Juan-Chen; Li, Zhihui
2014-01-01
Computations of rarefied gas dynamical flows governed by the semiclassical Boltzmann ellipsoidal-statistical (ES) kinetic model equation using an accurate numerical method are presented. The semiclassical ES model was derived through the maximum entropy principle and conserves not only the mass, momentum and energy, but also contains additional higher-order moments that differ from the standard quantum distributions. A different decoding procedure to obtain the necessary parameters for determining the ES distribution is also devised. The numerical method in phase space combines the discrete-ordinate method in momentum space and the high-resolution shock-capturing method in physical space. Numerical solutions of two-dimensional Riemann problems for two configurations covering various degrees of rarefaction are presented, and various contours of the quantities unique to this new model are illustrated. When the relaxation time becomes very small, the main flow features display behavior similar to that of ideal quantum gas dynamics, and the present solutions are found to be consistent with existing calculations for a classical gas. The effect of a parameter that permits an adjustable Prandtl number in the flow is also studied. PMID:25104904
TACKETT, JENNIFER L.; BALSIS, STEVE; OLTMANNS, THOMAS F.; KRUEGER, ROBERT F.
2010-01-01
Proposed changes in the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-V) include replacing current personality disorder (PD) categories on Axis II with a taxonomy of dimensional maladaptive personality traits. Most of the work on dimensional models of personality pathology, and on personality disorders per se, has been conducted on young and middle-aged adult populations. Numerous questions remain regarding the applicability and limitations of applying various PD models to early and later life. In the present paper, we provide an overview of such dimensional models and review current proposals for conceptualizing PDs in DSM-V. Next, we extensively review existing evidence on the development, measurement, and manifestation of personality pathology in early and later life focusing on those issues deemed most relevant for informing DSM-V. Finally, we present overall conclusions regarding the need to incorporate developmental issues in conceptualizing PDs in DSM-V and highlight the advantages of a dimensional model in unifying PD perspectives across the life span. PMID:19583880
PyEvolve: a toolkit for statistical modelling of molecular evolution.
Butterfield, Andrew; Vedagiri, Vivek; Lang, Edward; Lawrence, Cath; Wakefield, Matthew J; Isaev, Alexander; Huttley, Gavin A
2004-01-05
Examining the distribution of variation has proven an extremely profitable technique in the effort to identify sequences of biological significance. Most approaches in the field, however, evaluate only the conserved portions of sequences, ignoring the biological significance of sequence differences. A suite of sophisticated likelihood-based statistical models from the field of molecular evolution provides the basis for extracting the information from the full distribution of sequence variation. The number of different problems to which phylogeny-based maximum likelihood calculations can be applied is extensive. Available software packages that can perform likelihood calculations suffer from a lack of flexibility and scalability, or employ error-prone approaches to model parameterisation. Here we describe the implementation of PyEvolve, a toolkit for the application of existing, and development of new, statistical methods for molecular evolution. We present the object architecture and design schema of PyEvolve, which includes an adaptable multi-level parallelisation schema. The approach for defining new methods is illustrated by implementing a novel dinucleotide model of substitution that includes a parameter for mutation of methylated CpGs, which required 8 lines of standard Python code to define. Benchmarking was performed using either a dinucleotide or codon substitution model applied to an alignment of BRCA1 sequences from 20 mammals, or a 10-species subset. Up to five-fold parallel performance gains over serial were recorded. Compared to leading alternative software, PyEvolve exhibited significantly better real-world performance for parameter-rich models with a large data set, reducing the time required for optimisation from approximately 10 days to approximately 6 hours. PyEvolve provides flexible functionality that can be used either for statistical modelling of molecular evolution, or the development of new methods in the field. The toolkit can be used interactively or by writing and executing scripts. The toolkit uses efficient processes for specifying the parameterisation of statistical models, and implements numerous optimisations that make highly parameter-rich likelihood functions solvable within hours on multi-CPU hardware. PyEvolve can be readily adapted in response to changing computational demands and hardware configurations to maximise performance. PyEvolve is released under the GPL and can be downloaded from http://cbis.anu.edu.au/software.
NASA Astrophysics Data System (ADS)
Potirakis, Stelios M.; Zitis, Pavlos I.; Eftaxias, Konstantinos
2013-07-01
The field of study of complex systems considers that the dynamics of complex systems are founded on universal principles that may be used to describe a great variety of scientific and technological approaches of different types of natural, artificial, and social systems. Several authors have suggested that earthquake dynamics and the dynamics of economic (financial) systems can be analyzed within similar mathematical frameworks. We apply concepts of the nonextensive statistical physics, on time-series data of observable manifestations of the underlying complex processes ending up with these different extreme events, in order to support the suggestion that a dynamical analogy exists between a financial crisis (in the form of share or index price collapse) and a single earthquake. We also investigate the existence of such an analogy by means of scale-free statistics (the Gutenberg-Richter distribution of event sizes). We show that the populations of: (i) fracto-electromagnetic events rooted in the activation of a single fault, emerging prior to a significant earthquake, (ii) the trade volume events of different shares/economic indices, prior to a collapse, and (iii) the price fluctuation (considered as the difference of maximum minus minimum price within a day) events of different shares/economic indices, prior to a collapse, follow both the traditional Gutenberg-Richter law as well as a nonextensive model for earthquake dynamics, with similar parameter values. The obtained results imply the existence of a dynamic analogy between earthquakes and economic crises, which moreover follow the dynamics of seizures, magnetic storms and solar flares.
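Fitting the Gutenberg-Richter law to event sizes is often done with the Aki maximum-likelihood estimator for the b-value, b = log10(e)/(mean(M) − Mc); the sketch below applies it to synthetic magnitudes above a completeness threshold Mc. The data are invented, and the event "sizes" in the paper (electromagnetic events, trade volumes, price fluctuations) would be treated analogously.

```python
# Gutenberg-Richter b-value via the Aki maximum-likelihood estimator:
# b = log10(e) / (mean(M) - Mc), for magnitudes M above completeness Mc.
# Magnitudes are synthetic.
import numpy as np

rng = np.random.default_rng(7)
Mc = 2.0
b_true = 1.0
# G-R magnitudes above Mc are exponentially distributed with rate b*ln(10)
mags = Mc + rng.exponential(1.0 / (b_true * np.log(10)), size=5000)

b_hat = np.log10(np.e) / (mags.mean() - Mc)
b_err = b_hat / np.sqrt(mags.size)        # standard first-order uncertainty estimate
print(f"estimated b-value: {b_hat:.2f} ± {b_err:.2f}")
```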
Plis, Sergey M; George, J S; Jun, S C; Paré-Blagoev, J; Ranken, D M; Wood, C C; Schmidt, D M
2007-01-01
We propose a new model to approximate spatiotemporal noise covariance for use in neural electromagnetic source analysis, which better captures temporal variability in background activity. As with other existing formalisms, our model employs a Kronecker product of matrices representing temporal and spatial covariance. In our model, spatial components are allowed to have differing temporal covariances. Variability is represented as a series of Kronecker products of spatial component covariances and corresponding temporal covariances. Unlike previous attempts to model covariance through a sum of Kronecker products, our model is designed to have a computationally manageable inverse. Despite increased descriptive power, inversion of the model is fast, making it useful in source analysis. We have explored two versions of the model. One is estimated based on the assumption that spatial components of background noise have uncorrelated time courses. Another version, which gives closer approximation, is based on the assumption that time courses are statistically independent. The accuracy of the structural approximation is compared to an existing model, based on a single Kronecker product, using both Frobenius norm of the difference between spatiotemporal sample covariance and a model, and scatter plots. Performance of ours and previous models is compared in source analysis of a large number of single dipole problems with simulated time courses and with background from authentic magnetoencephalography data.
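The computational appeal of Kronecker-structured covariance comes from identities like (S ⊗ T)⁻¹ = S⁻¹ ⊗ T⁻¹; the sketch below verifies this for a single spatial/temporal pair. The paper's model generalizes this to a series of such products, which is not reproduced here.

```python
# Kronecker-structured spatiotemporal covariance: for a single product
# C = S (spatial) ⊗ T (temporal), the inverse factorizes as S^-1 ⊗ T^-1,
# which is what makes such models cheap to invert. Only the single-product
# identity is shown; the paper's model is a series of such products.
import numpy as np

rng = np.random.default_rng(8)

def random_spd(n):
    a = rng.normal(size=(n, n))
    return a @ a.T + n * np.eye(n)

S = random_spd(5)     # spatial covariance (e.g., across sensors)
T = random_spd(20)    # temporal covariance (e.g., across time samples)

C = np.kron(S, T)
C_inv_direct = np.linalg.inv(C)                            # invert a 100 x 100 matrix
C_inv_kron = np.kron(np.linalg.inv(S), np.linalg.inv(T))   # invert 5 x 5 and 20 x 20

print("max abs difference:", np.abs(C_inv_direct - C_inv_kron).max())
```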
NASA Astrophysics Data System (ADS)
Kariniotakis, G.; Anemos Team
2003-04-01
Objectives: Accurate forecasting of the wind energy production up to two days ahead is recognized as a major contribution to reliable large-scale wind power integration. Especially in a liberalized electricity market, prediction tools enhance the position of wind energy compared to other forms of dispatchable generation. ANEMOS is a new 3.5-year R&D project supported by the European Commission that brings together research organizations and end-users with substantial experience in the domain. The project aims to develop advanced forecasting models that will substantially outperform current methods. Emphasis is given to situations such as complex terrain and extreme weather conditions, as well as to offshore prediction, for which no specific tools currently exist. The prediction models will be implemented in a software platform and installed for online operation at onshore and offshore wind farms by the end-users participating in the project. Approach: The paper presents the methodology of the project. Initially, the prediction requirements are identified according to the profiles of the end-users. The project develops prediction models based on both a physical and an alternative statistical approach. Research on physical models emphasizes techniques for use in complex terrain and the development of prediction tools based on CFD techniques, advanced model output statistics, or high-resolution meteorological information. Statistical models (i.e. based on artificial intelligence) are developed for downscaling, power curve representation, upscaling for prediction at regional or national level, etc. A benchmarking process is set up to evaluate the performance of the developed models and to compare them with existing ones using a number of case studies. The synergy between statistical and physical approaches is examined to identify promising areas for further improvement of forecasting accuracy. Appropriate physical and statistical prediction models are also developed for offshore wind farms, taking into account advances in marine meteorology (interaction between wind and waves, coastal effects). The benefits of using satellite radar images for modeling local weather patterns are investigated. A next-generation forecasting software platform, ANEMOS, will be developed to integrate the various models. The tool is enhanced by advanced Information and Communication Technology (ICT) functionality and can operate in stand-alone or remote mode, or be interfaced with standard Energy or Distribution Management Systems (EMS/DMS). Contribution: The project provides an advanced technology for wind resource forecasting applicable at large scale: at a single wind farm, regional or national level, and for both interconnected and island systems. A major milestone is the online operation of the developed software by the participating utilities for onshore and offshore wind farms and the demonstration of the economic benefits. The outcome of the ANEMOS project will support the increase of wind integration on two levels: operationally, through better management of wind farms, and by contributing to an increase in the installed capacity of wind farms. This is because accurate prediction of the resource reduces the risk for wind farm developers, who are then more willing to undertake new wind farm installations, especially in a liberalized electricity market environment.
An open-access CMIP5 pattern library for temperature and precipitation: Description and methodology
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lynch, Cary D.; Hartin, Corinne A.; Bond-Lamberty, Benjamin
Pattern scaling is used to efficiently emulate general circulation models and explore uncertainty in climate projections under multiple forcing scenarios. Pattern scaling methods assume that local climate changes scale with a global mean temperature increase, allowing for spatial patterns to be generated for multiple models for any future emission scenario. For uncertainty quantification and probabilistic statistical analysis, a library of patterns with descriptive statistics for each file would be beneficial, but such a library does not presently exist. Of the possible techniques used to generate patterns, the two most prominent are the delta and least squared regression methods. We explore the differences and statistical significance between patterns generated by each method and assess performance of the generated patterns across methods and scenarios. Differences in patterns across seasons between methods and epochs were largest in high latitudes (60-90°N/S). Bias and mean errors between modeled and pattern predicted output from the linear regression method were smaller than patterns generated by the delta method. Across scenarios, differences in the linear regression method patterns were more statistically significant, especially at high latitudes. We found that pattern generation methodologies were able to approximate the forced signal of change to within ≤ 0.5°C, but choice of pattern generation methodology for pattern scaling purposes should be informed by user goals and criteria. As a result, this paper describes our library of least squared regression patterns from all CMIP5 models for temperature and precipitation on an annual and sub-annual basis, along with the code used to generate these patterns.
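The least-squares pattern for a grid cell is simply the regression slope of the local change on global mean temperature change across time, while the delta method differences two epochs and divides by the global mean change; a compact vectorized sketch contrasting the two is below. The array shapes and data are invented.

```python
# Pattern scaling sketch: per-grid-cell least-squares regression slope of the
# local variable on global mean temperature change (the "regression" pattern),
# compared with a simple delta-method pattern. Data and shapes are invented.
import numpy as np

rng = np.random.default_rng(9)
n_years, n_lat, n_lon = 100, 36, 72
t_global = np.linspace(0.0, 3.0, n_years) + rng.normal(0, 0.05, n_years)   # K
true_pattern = 1.0 + rng.normal(0, 0.3, (n_lat, n_lon))                    # K per K
local = (true_pattern[None] * t_global[:, None, None]
         + rng.normal(0, 0.2, (n_years, n_lat, n_lon)))

# regression pattern: cov(local, t_global) / var(t_global), per grid cell
t_anom = t_global - t_global.mean()
reg_pattern = np.tensordot(t_anom, local - local.mean(axis=0), axes=(0, 0)) / np.sum(t_anom ** 2)

# delta pattern: (late-epoch mean - early-epoch mean) / global mean change
delta_pattern = (local[-20:].mean(axis=0) - local[:20].mean(axis=0)) / \
                (t_global[-20:].mean() - t_global[:20].mean())

print("mean |regression - delta| pattern difference:",
      float(np.abs(reg_pattern - delta_pattern).mean()))
```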
A Modified Mechanical Threshold Stress Constitutive Model for Austenitic Stainless Steels
NASA Astrophysics Data System (ADS)
Prasad, K. Sajun; Gupta, Amit Kumar; Singh, Yashjeet; Singh, Swadesh Kumar
2016-12-01
This paper presents a modified mechanical threshold stress (m-MTS) constitutive model. The m-MTS model incorporates variable athermal and dynamic strain aging (DSA) components to accurately predict the flow stress behavior of austenitic stainless steels (ASS) 316 and 304. At strain rates between 0.0001 and 0.01 s-1, uniaxial tensile tests were conducted at temperatures ranging from 50 to 650 °C to evaluate the material constants of the constitutive models. The test results revealed the strong dependence of flow stress on strain, strain rate and temperature. In addition, it was observed that DSA occurred at elevated temperatures and very low strain rates, causing an increase in flow stress. While the original MTS model is capable of predicting the flow stress behavior of ASS, statistical parameters point out the inefficiency of the model when compared to other models such as the Johnson-Cook model, the modified Zerilli-Armstrong (m-ZA) model, and modified Arrhenius-type equations (m-Arr). Therefore, in order to accurately model both the DSA and non-DSA regimes, the original MTS model was modified by incorporating variable athermal and DSA components. The suitability of the m-MTS model was assessed by comparing the statistical parameters. It was observed that the m-MTS model was highly accurate for the DSA regime when compared to the existing models. However, models like m-ZA and m-Arr showed better results for the non-DSA regime.
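The statistical parameters typically used for such model comparisons, the correlation coefficient R and the average absolute relative error (AARE), are straightforward to compute; a sketch with invented measured and predicted flow stresses is below.

```python
# Goodness-of-fit metrics commonly used to compare constitutive models:
# correlation coefficient R and average absolute relative error (AARE).
# The measured/predicted stresses are invented placeholders.
import numpy as np

measured = np.array([310.0, 345.0, 402.0, 455.0, 512.0])   # MPa
predicted = np.array([305.0, 350.0, 395.0, 462.0, 505.0])  # MPa, from some model

R = np.corrcoef(measured, predicted)[0, 1]
aare = np.mean(np.abs((measured - predicted) / measured)) * 100.0

print(f"R = {R:.4f}, AARE = {aare:.2f}%")
```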
Statistical Analysis of the Impacts of Regional Transportation on the Air Quality in Beijing
NASA Astrophysics Data System (ADS)
Huang, Zhongwen; Zhang, Huiling; Tong, Lei; Xiao, Hang
2016-04-01
From October to December 2015, the Beijing-Tianjin-Hebei (BTH) region experienced several severe haze events. In order to assess the effects of regional transportation on the air quality in Beijing, the air monitoring data (PM2.5, SO2, NO2 and CO) from that period published by the Chinese National Environmental Monitoring Center (CNEMC) were collected and analyzed with various statistical models. The cities within the BTH area were clustered into three groups according to geographical conditions, with the air pollutant concentrations of cities within a group sharing similar variation trends. The Granger causality test results indicate that significant causal relationships exist between the air pollutant data of Beijing and its surrounding cities (Baoding, Chengde, Tianjin and Zhangjiakou) for the reference period. Linear regression models were then constructed to capture the interdependency among the multiple time series. The observed air pollutant concentrations in Beijing were well consistent with the model-fitted results. More importantly, further analysis suggests that the air pollutants in Beijing were strongly affected by regional transportation, as local sources contributed only 17.88%, 27.12%, 14.63% and 31.36% of PM2.5, SO2, NO2 and CO concentrations, respectively. The major foreign source for Beijing was from the southwest (Baoding) direction, accounting for more than 42% of all these air pollutants. Thus, by combining various statistical models, it may be possible not only to quickly predict the air quality of any city on a regional scale, but also to evaluate the local and regional source contributions for a particular city. Key words: regional transportation, air pollution, Granger causality test, statistical models
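A hedged sketch of the Granger-causality step with statsmodels is shown below; the series names and values are synthetic placeholders for the city-level pollutant time series.

```python
# Granger causality sketch: does a surrounding city's PM2.5 series help
# predict Beijing's PM2.5 beyond Beijing's own past values? Series are
# synthetic placeholders for the monitoring data.
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(10)
n = 500
baoding = rng.normal(size=n).cumsum() + 100          # upwind city PM2.5 (synthetic)
beijing = np.roll(baoding, 3) + rng.normal(0, 2, n)  # lagged influence plus noise
df = pd.DataFrame({"beijing": beijing[5:], "baoding": baoding[5:]})

# column order: test whether the 2nd column Granger-causes the 1st
results = grangercausalitytests(df[["beijing", "baoding"]], maxlag=6, verbose=False)
p_at_lag3 = results[3][0]["ssr_ftest"][1]
print(f"p-value (lag 3, Baoding -> Beijing): {p_at_lag3:.4f}")
```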
DOE Office of Scientific and Technical Information (OSTI.GOV)
McManamay, Ryan A
2014-01-01
Despite the ubiquitous existence of dams within riverscapes, much of our knowledge about dams and their environmental effects remains context-specific. Hydrology, more than any other environmental variable, has been studied in great detail with regard to dam regulation. While much progress has been made in generalizing the hydrologic effects of regulation by large dams, many aspects of hydrology show site-specific fidelity to dam operations, small dams (including diversions), and regional hydrologic regimes. A statistical modeling framework is presented to quantify and generalize hydrologic responses to varying degrees of dam regulation. Specifically, the objectives were to 1) compare the effects of local versus cumulative dam regulation, 2) determine the importance of different regional hydrologic regimes in influencing hydrologic responses to dams, and 3) evaluate how different regulation contexts lead to error in predicting hydrologic responses to dams. Overall, model performance was poor in quantifying the magnitude of hydrologic responses, but performance was sufficient in classifying hydrologic responses as negative or positive. Responses of some hydrologic indices to dam regulation were highly dependent upon hydrologic class membership and the purpose of the dam. The opposing coefficients between local and cumulative-dam predictors suggested that hydrologic responses to cumulative dam regulation are complex, and that predicting the hydrology downstream of individual dams, as opposed to multiple dams, may be more easily accomplished using statistical approaches. Results also suggested that particular contexts, including multipurpose dams, high cumulative regulation by multiple dams, diversions, close proximity to dams, and certain hydrologic classes, are all sources of increased error when predicting hydrologic responses to dams. Statistical models, such as the ones presented herein, show promise in their ability to model the effects of dam regulation at large spatial scales so as to generalize the directionality of hydrologic responses.
September Arctic Sea Ice minimum prediction - a new skillful statistical approach
NASA Astrophysics Data System (ADS)
Ionita-Scholz, Monica; Grosfeld, Klaus; Scholz, Patrick; Treffeisen, Renate; Lohmann, Gerrit
2017-04-01
Sea ice in both polar regions is an important indicator of global climate change and its polar amplification. Consequently, broad interest exists in sea ice, its coverage, variability and long-term change. Knowledge of sea ice requires high-quality data on ice extent, thickness and dynamics. However, its predictability is complex and depends on various climatic and oceanic parameters and conditions. In order to provide insights into the potential development of a monthly/seasonal signal of sea ice evolution, we developed a robust statistical model based on ocean heat content, sea surface temperature and different atmospheric variables to estimate the September sea ice extent (SSIE) on a monthly time scale. Although previous statistical attempts at monthly/seasonal forecasts of SSIE show relatively reduced skill, we show here that more than 92% of the variance of the September sea ice extent (r = 0.96) can be predicted at the end of May using the previous months' climatic and oceanic conditions. The skill of the model increases as the lead time of the forecast decreases. At the end of August, our predictions are able to explain 99% of the SSIE variance. Our statistical model captures both the general trend and the interannual variability of the SSIE. Moreover, it is able to properly forecast years with extremely high or low SSIE (e.g. 1996, 2007, 2012, 2013). Besides its forecast skill for SSIE, the model could provide a valuable tool for identifying relevant regions and climate parameters that are important for sea ice development in the Arctic and for detecting sensitive and critical regions in global coupled climate models with a focus on sea ice formation.
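A hedged sketch of such a lagged statistical forecast is below: regress SSIE on May values of a few predictors over a training period and verify on held-out years. The predictor names, coefficients, and values are invented.

```python
# Lagged statistical forecast sketch: regress September sea-ice extent (SSIE)
# on May predictor values (e.g., ocean heat content, SST, an atmospheric index)
# and verify on held-out years. All predictor/target values are synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(11)
n_years = 38
ohc = rng.normal(size=n_years)               # May ocean heat content anomaly
sst = rng.normal(size=n_years)               # May SST anomaly
circ = rng.normal(size=n_years)              # May atmospheric circulation index
ssie = 6.5 - 0.8 * ohc - 0.5 * sst - 0.2 * circ + rng.normal(0, 0.3, n_years)  # 10^6 km2

X = np.column_stack([ohc, sst, circ])
train, test = slice(0, 30), slice(30, None)
model = LinearRegression().fit(X[train], ssie[train])
pred = model.predict(X[test])
r = np.corrcoef(pred, ssie[test])[0, 1]
print(f"held-out correlation: r = {r:.2f}")
```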
NASA Astrophysics Data System (ADS)
Baek, Seung Ki; Um, Jaegon; Yi, Su Do; Kim, Beom Jun
2011-11-01
In a number of classical statistical-physical models, there exists a characteristic dimensionality called the upper critical dimension above which one observes the mean-field critical behavior. Instead of constructing high-dimensional lattices, however, one can also consider infinite-dimensional structures, and the question is whether this mean-field character extends to quantum-mechanical cases as well. We therefore investigate the transverse-field quantum Ising model on the globally coupled network and on the Watts-Strogatz small-world network by means of quantum Monte Carlo simulations and the finite-size scaling analysis. We confirm that both of the structures exhibit critical behavior consistent with the mean-field description. In particular, we show that the existing cumulant method has difficulty in estimating the correct dynamic critical exponent and suggest that an order parameter based on the quantum-mechanical expectation value can be a practically useful numerical observable to determine critical behavior when there is no well-defined dimensionality.
Analysis of Magnitude Correlations in a Self-Similar model of Seismicity
NASA Astrophysics Data System (ADS)
Zambrano, A.; Joern, D.
2017-12-01
A recent model of seismicity that incorporates a self-similar Omori-Utsu relation, which is used to describe the temporal evolution of earthquake triggering, has been shown to provide a more accurate description of seismicity in Southern California when compared to epidemic type aftershock sequence models. Forecasting of earthquakes is an active research area where one of the debated points is whether magnitude correlations of earthquakes exist within real world seismic data. Prior to this work, the analysis of magnitude correlations of the aforementioned self-similar model had not been addressed. Here we present statistical properties of the magnitude correlations for the self-similar model along with an analytical analysis of the branching ratio and criticality parameters.
Aalto, Juha; Harrison, Stephan; Luoto, Miska
2017-09-11
The periglacial realm is a major part of the cryosphere, covering a quarter of Earth's land surface. Cryogenic land surface processes (LSPs) control landscape development, ecosystem functioning and climate through biogeochemical feedbacks, but their response to contemporary climate change is unclear. Here, by statistically modelling the current and future distributions of four major LSPs unique to periglacial regions at fine scale, we show fundamental changes in the periglacial climate realm are inevitable with future climate change. Even with the most optimistic CO2 emissions scenario (Representative Concentration Pathway (RCP) 2.6) we predict a 72% reduction in the current periglacial climate realm by 2050 in our climatically sensitive northern Europe study area. These impacts are projected to be especially severe in high-latitude continental interiors. We further predict that by the end of the twenty-first century active periglacial LSPs will exist only at high elevations. These results forecast a future tipping point in the operation of cold-region LSPs, and predict fundamental landscape-level modifications in ground conditions and related atmospheric feedbacks. Cryogenic land surface processes characterise the periglacial realm and control landscape development and ecosystem functioning. Here, via statistical modelling, the authors predict a 72% reduction of the periglacial realm in Northern Europe by 2050, and almost complete disappearance by 2100.
Likelihoods for fixed rank nomination networks
HOFF, PETER; FOSDICK, BAILEY; VOLFOVSKY, ALEX; STOVEL, KATHERINE
2014-01-01
Many studies that gather social network data use survey methods that lead to censored, missing, or otherwise incomplete information. For example, the popular fixed rank nomination (FRN) scheme, often used in studies of schools and businesses, asks study participants to nominate and rank at most a small number of contacts or friends, leaving the existence of other relations uncertain. However, most statistical models are formulated in terms of completely observed binary networks. Statistical analyses of FRN data with such models ignore the censored and ranked nature of the data and could potentially result in misleading statistical inference. To investigate this possibility, we compare Bayesian parameter estimates obtained from a likelihood for complete binary networks with those obtained from likelihoods that are derived from the FRN scheme, and therefore accommodate the ranked and censored nature of the data. We show analytically and via simulation that the binary likelihood can provide misleading inference, particularly for certain model parameters that relate network ties to characteristics of individuals and pairs of individuals. We also compare these different likelihoods in a data analysis of several adolescent social networks. For some of these networks, the parameter estimates from the binary and FRN likelihoods lead to different conclusions, indicating the importance of analyzing FRN data with a method that accounts for the FRN survey design. PMID:25110586
[Micro-simulation of firms' heterogeneity on pollution intensity and regional characteristics].
Zhao, Nan; Liu, Yi; Chen, Ji-Ning
2009-11-01
Within the same industrial sector, pollution intensity is heterogeneous among firms. Errors arise if a sector's average pollution intensity, calculated from the limited number of firms in the environmental statistics database, is used to represent the sector's regional economic-environmental status. Based on a production function that includes environmental depletion as an input, a micro-simulation model of firms' operational decision making is proposed, so that the heterogeneity of firms' pollution intensity can be described mechanistically. Taking the mechanical manufacturing sector in Deyang city in 2005 as the case, the model's parameters were estimated, and the actual COD emission intensities of the firms in the environmental statistics database were properly matched by the simulation. The model's results also show that the regional average COD emission intensity calculated from the environmental statistics firms (0.0026 t per 10,000 yuan of fixed assets, 0.0015 t per 10,000 yuan of production value) is lower than the regional average intensity calculated from all firms in the region (0.0030 t per 10,000 yuan of fixed assets, 0.0023 t per 10,000 yuan of production value). The differences among the average intensities of the six counties are significant as well. These regional characteristics of pollution intensity are attributable to the sector's inner structure (distribution of firm size and technology) and its spatial variation.
A statistical pixel intensity model for segmentation of confocal laser scanning microscopy images.
Calapez, Alexandre; Rosa, Agostinho
2010-09-01
Confocal laser scanning microscopy (CLSM) has been widely used in the life sciences for the characterization of cell processes because it allows the recording of the distribution of fluorescence-tagged macromolecules on a section of the living cell. It is in fact the cornerstone of many molecular transport and interaction quantification techniques where the identification of regions of interest through image segmentation is usually a required step. In many situations, because of the complexity of the recorded cellular structures or because of the amounts of data involved, image segmentation either is too difficult or inefficient to be done by hand and automated segmentation procedures have to be considered. Given the nature of CLSM images, statistical segmentation methodologies appear as natural candidates. In this work we propose a model to be used for statistical unsupervised CLSM image segmentation. The model is derived from the CLSM image formation mechanics and its performance is compared to the existing alternatives. Results show that it provides a much better description of the data on classes characterized by their mean intensity, making it suitable not only for segmentation methodologies with known number of classes but also for use with schemes aiming at the estimation of the number of classes through the application of cluster selection criteria.
[The metrology of uncertainty: a study of vital statistics from Chile and Brazil].
Carvajal, Yuri; Kottow, Miguel
2012-11-01
This paper addresses the issue of uncertainty in the measurements used in public health analysis and decision-making. The Shannon-Wiener entropy measure was adapted to express the uncertainty contained in counting causes of death in official vital statistics from Chile. Based on the findings, the authors conclude that metrological requirements in public health are as important as the measurements themselves. The study also considers and argues for the existence of uncertainty associated with the statistics' performative properties, both by the way the data are structured as a sort of syntax of reality and by exclusion of what remains beyond the quantitative modeling used in each case. Following the legacy of pragmatic thinking and using conceptual tools from the sociology of translation, the authors emphasize that by taking uncertainty into account, public health can contribute to a discussion on the relationship between technology, democracy, and formation of a participatory public.
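As a concrete illustration of the entropy adaptation described above, the short Python sketch below computes the Shannon-Wiener entropy of a cause-of-death tally. The counts are hypothetical and merely stand in for the official vital statistics used in the study.

```python
import math

def shannon_entropy(counts):
    """Shannon-Wiener entropy (in bits) of a set of category counts."""
    total = sum(counts.values())
    entropy = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            entropy -= p * math.log2(p)
    return entropy

# Hypothetical cause-of-death tally for one reporting year.
deaths_by_cause = {"circulatory": 24500, "neoplasms": 21800,
                   "respiratory": 8900, "external": 7600,
                   "ill-defined": 3100}

h = shannon_entropy(deaths_by_cause)
h_max = math.log2(len(deaths_by_cause))   # entropy of a uniform distribution
print(f"H = {h:.3f} bits (maximum {h_max:.3f} bits)")
```

The gap between H and its maximum gives a simple sense of how concentrated, and therefore how informative, the cause-of-death coding is.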
A high-fidelity weather time series generator using the Markov Chain process on a piecewise level
NASA Astrophysics Data System (ADS)
Hersvik, K.; Endrerud, O.-E. V.
2017-12-01
A method is developed for generating a set of unique weather time-series based on an existing weather series. The method allows statistically valid weather variations to take place within repeated simulations of offshore operations. The numerous generated time series need to share the same statistical qualities as the original time series. Statistical qualities here refer mainly to the distribution of weather windows available for work, including durations and frequencies of such weather windows, and seasonal characteristics. The method is based on the Markov chain process. The core new development lies in how the Markov Process is used, specifically by joining small pieces of random length time series together rather than joining individual weather states, each from a single time step, which is a common solution found in the literature. This new Markov model shows favorable characteristics with respect to the requirements set forth and all aspects of the validation performed.
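The following Python sketch illustrates the piecewise idea under simplifying assumptions: blocks of random length are resampled from an observed series, and each new block must start in the same discretized weather state as the last generated value. The wave-height record, the state bins and the block-length range are hypothetical, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

def generate_series(observed, n_out, bins, min_len=6, max_len=48):
    """Piecewise Markov-chain-style generator: repeatedly append blocks of
    random length taken from the observed series, where each new block must
    start in the same discrete weather state as the last generated value."""
    states = np.digitize(observed, bins)          # discretize into weather states
    out = [observed[0]]
    while len(out) < n_out:
        current_state = np.digitize(out[-1], bins)
        # candidate start indices whose state matches the current state
        candidates = np.where(states[:-max_len] == current_state)[0]
        start = rng.choice(candidates)
        length = rng.integers(min_len, max_len + 1)
        out.extend(observed[start:start + length])
    return np.array(out[:n_out])

# Hypothetical hourly significant wave height record (metres).
t = np.arange(24 * 365)
observed_hs = 1.5 + 1.0 * np.sin(2 * np.pi * t / (24 * 365)) + rng.gamma(2.0, 0.3, t.size)

synthetic = generate_series(observed_hs, n_out=24 * 365, bins=np.arange(0.5, 6.0, 0.5))
print(synthetic[:10].round(2))
```

Because whole blocks of the observed record are reused, durations of workable weather windows are inherited from the original series rather than reconstructed state by state.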
Effects of Nongray Opacity on Radiatively Driven Wolf-Rayet Winds
NASA Astrophysics Data System (ADS)
Onifer, A. J.; Gayley, K. G.
2002-05-01
Wolf-Rayet winds are characterized by their large momentum fluxes, and simulations of radiation driving have been increasingly successful in modeling these winds. Simple analytic approaches that help understand the most critical processes for copious momentum deposition already exist in the effectively gray approximation, but these have not been extended to more realistic nongray opacities. With this in mind, we have developed a simplified theory for describing the interaction of the stellar flux with nongray wind opacity. We replace the detailed line list with a set of statistical parameters that are sensitive not only to the strength but also the wavelength distribution of lines, incorporating as a free parameter the rate of photon frequency redistribution. We label the resulting flux-weighted opacity the statistical Sobolev-Rosseland (SSR) mean, and explore how changing these various statistical parameters affects the flux/opacity interaction. We wish to acknowledge NSF grant AST-0098155.
An issue of literacy on pediatric arterial hypertension
NASA Astrophysics Data System (ADS)
Teodoro, M. Filomena; Romana, Andreia; Simão, Carla
2017-11-01
Arterial hypertension in pediatric age is a public health problem whose prevalence has increased significantly over time. Pediatric arterial hypertension (PAH) is a highly prevalent disease that is under-diagnosed in most cases; it appears without warning and has multiple consequences for children's health and for the adults they will become. Caregivers and close family must be aware of the existence of PAH, the negative consequences associated with it and its risk factors, and must engage in prevention. In [12, 13] a statistical data analysis can be found using a simpler questionnaire introduced in [4], aimed at a preliminary study of caregivers' awareness of PAH. A continuation of that analysis is detailed in [14]. An extension of the questionnaire was built, applied to a distinct population and completed online. The statistical approach is partially reproduced in the present work. Several statistical models were estimated using approaches such as multivariate analysis (factor analysis) and other methods adequate for the kind of data under study.
Yamagata, Koichi; Yamanishi, Ayako; Kokubu, Chikara; Takeda, Junji; Sese, Jun
2016-05-05
An important challenge in cancer genomics is precise detection of structural variations (SVs) by high-throughput short-read sequencing, which is hampered by the high false discovery rates of existing analysis tools. Here, we propose an accurate SV detection method named COSMOS, which compares the statistics of the mapped read pairs in tumor samples with isogenic normal control samples in a distinct asymmetric manner. COSMOS also prioritizes the candidate SVs using strand-specific read-depth information. Performance tests on modeled tumor genomes revealed that COSMOS outperformed existing methods in terms of F-measure. We also applied COSMOS to an experimental mouse cell-based model, in which SVs were induced by genome engineering and gamma-ray irradiation, followed by polymerase chain reaction-based confirmation. The precision of COSMOS was 84.5%, while the next best existing method was 70.4%. Moreover, the sensitivity of COSMOS was the highest, indicating that COSMOS has great potential for cancer genome analysis. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Photon Strength Function at Low Energies in 95Mo
Wiedeking, M.; Bernstein, L. A.; Allmond, J. M.; ...
2014-05-01
A new and model-independent experimental method has been developed to determine the energy dependence of the photon strength function. It is designed to study statistical feeding from the quasi continuum to individual low-lying discrete levels. This new technique is presented and results for 95Mo are compared to data from the University of Oslo. In particular, questions regarding the existence of the low-energy enhancement in the photon strength function are addressed.
A Study of Selected Issues in Military Construction and Base Operating Support
1986-12-01
and potential, of data reported annually on the readiness of Navy shore base facilities * Development of a statistical model for forecasting the...of replacing or modernizing existing facilities * Study of the set of activities--including retail supply operations, bachelor housing, automated data ...until data from FY 1986 can be obtained ... SECTION 2 CAPITAL FACILITIES: AGING, REPLACEMENT, AND CHANGE OVER TIME. During development of the POM, OP
Statistical inference and Aristotle's Rhetoric.
Macdonald, Ranald R
2004-11-01
Formal logic operates in a closed system where all the information relevant to any conclusion is present, whereas this is not the case when one reasons about events and states of the world. Pollard and Richardson drew attention to the fact that the reasoning behind statistical tests does not lead to logically justifiable conclusions. In this paper statistical inferences are defended not by logic but by the standards of everyday reasoning. Aristotle invented formal logic, but argued that people mostly get at the truth with the aid of enthymemes--incomplete syllogisms which include arguing from examples, analogies and signs. It is proposed that statistical tests work in the same way--in that they are based on examples, invoke the analogy of a model and use the size of the effect under test as a sign that the chance hypothesis is unlikely. Of existing theories of statistical inference only a weak version of Fisher's takes this into account. Aristotle anticipated Fisher by producing an argument of the form that there were too many cases in which an outcome went in a particular direction for that direction to be plausibly attributed to chance. We can therefore conclude that Aristotle would have approved of statistical inference and there is a good reason for calling this form of statistical inference classical.
Wang, Yikai; Kang, Jian; Kemmer, Phebe B.; Guo, Ying
2016-01-01
Currently, network-oriented analysis of fMRI data has become an important tool for understanding brain organization and brain networks. Among the range of network modeling methods, partial correlation has shown great promise in accurately detecting true brain network connections. However, the application of partial correlation in investigating brain connectivity, especially in large-scale brain networks, has been limited so far due to the technical challenges in its estimation. In this paper, we propose an efficient and reliable statistical method for estimating partial correlation in large-scale brain network modeling. Our method derives partial correlation based on the precision matrix estimated via Constrained L1-minimization Approach (CLIME), which is a recently developed statistical method that is more efficient and demonstrates better performance than the existing methods. To help select an appropriate tuning parameter for sparsity control in the network estimation, we propose a new Dens-based selection method that provides a more informative and flexible tool to allow the users to select the tuning parameter based on the desired sparsity level. Another appealing feature of the Dens-based method is that it is much faster than the existing methods, which provides an important advantage in neuroimaging applications. Simulation studies show that the Dens-based method demonstrates comparable or better performance with respect to the existing methods in network estimation. We applied the proposed partial correlation method to investigate resting state functional connectivity using rs-fMRI data from the Philadelphia Neurodevelopmental Cohort (PNC) study. Our results show that partial correlation analysis removed considerable between-module marginal connections identified by full correlation analysis, suggesting these connections were likely caused by global effects or common connection to other nodes. Based on partial correlation, we find that the most significant direct connections are between homologous brain locations in the left and right hemisphere. When comparing partial correlation derived under different sparse tuning parameters, an important finding is that the sparse regularization has more shrinkage effects on negative functional connections than on positive connections, which supports previous findings that many of the negative brain connections are due to non-neurophysiological effects. An R package “DensParcorr” can be downloaded from CRAN for implementing the proposed statistical methods. PMID:27242395
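A minimal Python sketch of the precision-matrix route to partial correlations is given below. It substitutes scikit-learn's graphical lasso for the CLIME estimator used in the paper (the authors' own implementation is the R package DensParcorr), so it illustrates the general workflow rather than the proposed method itself; the data and the sparsity parameter are hypothetical.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# Hypothetical resting-state data: 200 time points x 10 nodes (brain regions).
X = rng.standard_normal((200, 10))
X[:, 1] += 0.6 * X[:, 0]            # plant one direct connection

# Sparse precision-matrix estimate (graphical lasso stands in for CLIME here;
# alpha plays the role of the sparsity tuning parameter discussed above).
model = GraphicalLasso(alpha=0.05).fit(X)
precision = model.precision_

# Convert the precision matrix to partial correlations:
# pcor_ij = -omega_ij / sqrt(omega_ii * omega_jj)
d = np.sqrt(np.diag(precision))
partial_corr = -precision / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)

print(np.round(partial_corr[:3, :3], 2))
```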
An Accident Precursor Analysis Process Tailored for NASA Space Systems
NASA Technical Reports Server (NTRS)
Groen, Frank; Stamatelatos, Michael; Dezfuli, Homayoon; Maggio, Gaspare
2010-01-01
Accident Precursor Analysis (APA) serves as the bridge between existing risk modeling activities, which are often based on historical or generic failure statistics, and system anomalies, which provide crucial information about the failure mechanisms that are actually operative in the system and which may differ in frequency or type from those in the various models. These discrepancies between the models (perceived risk) and the system (actual risk) provide the leading indication of an underappreciated risk. This paper presents an APA process developed specifically for NASA Earth-to-Orbit space systems. The purpose of the process is to identify and characterize potential sources of system risk as evidenced by anomalous events which, although not necessarily presenting an immediate safety impact, may indicate that an unknown or insufficiently understood risk-significant condition exists in the system. Such anomalous events are considered accident precursors because they signal the potential for severe consequences that may occur in the future, due to causes that are discernible from their occurrence today. Their early identification allows them to be integrated into the overall system risk model used to inform decisions relating to safety.
Optimizing fixed observational assets in a coastal observatory
NASA Astrophysics Data System (ADS)
Frolov, Sergey; Baptista, António; Wilkin, Michael
2008-11-01
Proliferation of coastal observatories necessitates an objective approach to managing observational assets. In this article, we used our experience in the coastal observatory for the Columbia River estuary and plume to identify and address common problems in managing fixed observational assets, such as salinity, temperature, and water level sensors attached to pilings and moorings. Specifically, we addressed the following problems: assessing the quality of an existing array, adding stations to an existing array, removing stations from an existing array, validating an array design, and targeting of an array toward data assimilation or monitoring. Our analysis was based on a combination of methods from the oceanographic and statistical literature, mainly on the statistical machinery of the best linear unbiased estimator. The key information required for our analysis was the covariance structure for a field of interest, which was computed from the output of assimilated and non-assimilated models of the Columbia River estuary and plume. The network optimization experiments in the Columbia River estuary and plume proved to be successful, largely withstanding the scrutiny of sensitivity and validation studies, and hence providing valuable insight into optimization and operation of the existing observational network. Our success in the Columbia River estuary and plume suggests that algorithms for optimal placement of sensors are reaching maturity and are likely to play a significant role in the design of emerging ocean observatories, such as the United States' Ocean Observatories Initiative (OOI) and Integrated Ocean Observing System (IOOS) observatories, and smaller regional observatories.
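The sketch below illustrates, under strong simplifying assumptions, the covariance-based reasoning described above: given a model-derived covariance among candidate sites, stations are added greedily so as to minimize the mean conditional (best-linear-unbiased-estimator) variance of the field. The exponential covariance, noise level and site geometry are hypothetical placeholders for the assimilated-model covariances used in the article.

```python
import numpy as np

def posterior_variance(C, obs_idx, noise_var=1e-2):
    """Average conditional variance of the field given observations at obs_idx,
    using Gaussian conditioning (the BLUE / kriging machinery)."""
    if not obs_idx:
        return np.trace(C) / C.shape[0]
    o = np.array(obs_idx)
    Coo = C[np.ix_(o, o)] + noise_var * np.eye(len(o))
    Cfo = C[:, o]
    C_post = C - Cfo @ np.linalg.solve(Coo, Cfo.T)
    return np.trace(C_post) / C.shape[0]

def greedy_add(C, n_stations):
    """Greedily add stations that give the largest reduction in mean variance."""
    chosen = []
    for _ in range(n_stations):
        remaining = [i for i in range(C.shape[0]) if i not in chosen]
        best = min(remaining, key=lambda i: posterior_variance(C, chosen + [i]))
        chosen.append(best)
    return chosen

# Hypothetical covariance of salinity at 30 candidate sites along an estuary,
# standing in for covariances computed from model output.
x = np.linspace(0.0, 50.0, 30)                       # distance along channel (km)
C = np.exp(-np.abs(x[:, None] - x[None, :]) / 10.0)  # exponential covariance

print("stations chosen:", greedy_add(C, 4))
```

Removing or re-targeting stations can be treated with the same machinery by scoring the variance increase when a station is dropped or by restricting the trace to the locations of interest.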
Potentiation Following Ballistic and Nonballistic Complexes: The Effect of Strength Level.
Suchomel, Timothy J; Sato, Kimitake; DeWeese, Brad H; Ebben, William P; Stone, Michael H
2016-07-01
Suchomel, TJ, Sato, K, DeWeese, BH, Ebben, WP, and Stone, MH. Potentiation following ballistic and nonballistic complexes: the effect of strength level. J Strength Cond Res 30(7): 1825-1833, 2016-The purpose of this study was to compare the temporal profile of strong and weak subjects during ballistic and nonballistic potentiation complexes. Eight strong (relative back squat = 2.1 ± 0.1 times body mass) and 8 weak (relative back squat = 1.6 ± 0.2 times body mass) males performed squat jumps immediately and every minute up to 10 minutes following potentiation complexes that included ballistic or nonballistic concentric-only half-squats (COHS) performed at 90% of their 1 repetition maximum COHS. Jump height (JH) and allometrically scaled peak power (PPa) were compared using a series of 2 × 12 repeated measures analyses of variance. No statistically significant strength level main effects for JH (p = 0.442) or PPa (p = 0.078) existed during the ballistic condition. In contrast, statistically significant main effects for time existed for both JH (p = 0.014) and PPa (p < 0.001); however, no statistically significant pairwise comparisons were present (p > 0.05). Statistically significant strength level main effects existed for PPa (p = 0.039) but not for JH (p = 0.137) during the nonballistic condition. Post hoc analysis revealed that the strong subjects produced statistically greater PPa than the weaker subjects (p = 0.039). Statistically significant time main effects existed for PPa (p = 0.015), but not for JH (p = 0.178). No statistically significant strength level × time interaction effects for JH (p = 0.319) or PPa (p = 0.203) were present for the ballistic or nonballistic conditions. Practical significance indicated by effect sizes and the relationships between maximum potentiation and relative strength suggest that stronger subjects potentiate earlier and to a greater extent than weaker subjects during ballistic and nonballistic potentiation complexes.
Micromechanical investigation of sand migration in gas hydrate-bearing sediments
NASA Astrophysics Data System (ADS)
Uchida, S.; Klar, A.; Cohen, E.
2017-12-01
Past field gas production tests from hydrate-bearing sediments have indicated that sand migration is an important phenomenon that needs to be considered for successful long-term gas production. The authors previously developed a continuum-based analytical thermo-hydro-mechanical sand migration model that can be applied to predict wellbore responses during gas production. However, the parameters involved in the model still need to be calibrated and studied thoroughly, and it remains a challenge to conduct well-defined laboratory experiments on sand migration, especially in hydrate-bearing sediments. Taking advantage of the capabilities of a micromechanical modelling approach based on the discrete element method (DEM), this work presents a first step towards quantifying one of the model parameters, which governs the stress reduction due to grain detachment. Grains represented by DEM particles are randomly removed from an isotropically loaded DEM specimen, and statistical analyses reveal that a linear proportionality exists between the normalized volume of detached solids and the normalized stress reduction. DEM specimens with different porosities (different packing densities) are also considered, and statistical analyses show that there is a clear transition between loose and dense sand behavior, characterized by the relative density.
Model-Based Linkage Analysis of a Quantitative Trait.
Song, Yeunjoo E; Song, Sunah; Schnell, Audrey H
2017-01-01
Linkage Analysis is a family-based method of analysis to examine whether any typed genetic markers cosegregate with a given trait, in this case a quantitative trait. If linkage exists, this is taken as evidence in support of a genetic basis for the trait. Historically, linkage analysis was performed using a binary disease trait, but has been extended to include quantitative disease measures. Quantitative traits are desirable as they provide more information than binary traits. Linkage analysis can be performed using single-marker methods (one marker at a time) or multipoint (using multiple markers simultaneously). In model-based linkage analysis the genetic model for the trait of interest is specified. There are many software options for performing linkage analysis. Here, we use the program package Statistical Analysis for Genetic Epidemiology (S.A.G.E.). S.A.G.E. was chosen because it also includes programs to perform data cleaning procedures and to generate and test genetic models for a quantitative trait, in addition to performing linkage analysis. We demonstrate in detail the process of running the program LODLINK to perform single-marker analysis, and MLOD to perform multipoint analysis using output from SEGREG, where SEGREG was used to determine the best fitting statistical model for the trait.
Jover-Esplá, Ana Gabriela; Palazón-Bru, Antonio; Folgado-de la Rosa, David Manuel; Severá-Ferrándiz, Guillermo; Sancho-Mestre, Manuela; de Juan-Herrero, Joaquín; Gil-Guillén, Vicente Francisco
2018-05-01
The existing predictive models of laryngeal cancer recurrence present limitations for clinical practice. Therefore, we constructed, internally validated and implemented in a mobile application (Android) a new model based on a points system taking into account the internationally recommended statistical methodology. This longitudinal prospective study included 189 patients with glottic cancer in 2004-2016 in a Spanish region. The main variable was time-to-recurrence, and its potential predictors were: age, gender, TNM classification, stage, smoking, alcohol consumption, and histology. A points system was developed to predict five-year risk of recurrence based on a Cox model. This was validated internally by bootstrapping, determining discrimination (C-statistics) and calibration (smooth curves). A total of 77 patients presented recurrence (40.7%) in a mean follow-up period of 3.4 ± 3.0 years. The factors in the model were: age, lymph node stage, alcohol consumption and stage. Discrimination and calibration were satisfactory. A points system was developed to obtain the probability of recurrence of laryngeal glottic cancer in five years, using five clinical variables. Our system should be validated externally in other geographical areas. Copyright © 2018 Elsevier Ltd. All rights reserved.
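As a hedged illustration of how a Cox model can be turned into a points system, the Python sketch below applies a Sullivan-style scaling to a set of hypothetical coefficients. The variable names echo the predictors listed above, but the numbers are invented, and the risk conversion ignores the covariate-mean offset a full implementation would include.

```python
import math

# Hypothetical Cox regression coefficients (log hazard ratios).  The paper's
# model uses age, lymph node stage, alcohol consumption and stage; the numbers
# below are invented for illustration only.
coefficients = {
    "age_per_10y": 0.18,
    "node_positive": 0.65,
    "alcohol": 0.40,
    "advanced_stage": 0.75,
}
baseline_5y_survival = 0.70   # assumed baseline survival S0(5 years)

# Sullivan-style points: scale every coefficient by a reference coefficient so
# that one point corresponds to a fixed increment of the linear predictor.
reference = min(coefficients.values())
points = {name: round(beta / reference) for name, beta in coefficients.items()}

def five_year_recurrence_risk(total_points):
    """Approximate 5-year risk for a patient with `total_points`, relative to a
    zero-point reference patient (a simplification of the full method)."""
    linear_predictor = total_points * reference
    return 1.0 - baseline_5y_survival ** math.exp(linear_predictor)

total = points["age_per_10y"] * 2 + points["node_positive"] + points["alcohol"]
print(points, "-> 5-year risk:", round(five_year_recurrence_risk(total), 2))
```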
Systematic review of prediction models for delirium in the older adult inpatient.
Lindroth, Heidi; Bratzke, Lisa; Purvis, Suzanne; Brown, Roger; Coburn, Mark; Mrkobrada, Marko; Chan, Matthew T V; Davis, Daniel H J; Pandharipande, Pratik; Carlsson, Cynthia M; Sanders, Robert D
2018-04-28
The objective was to identify existing prognostic delirium prediction models and evaluate their validity and statistical methodology in the older adult (≥60 years) acute hospital population. The design was a systematic review. PubMed, CINAHL, PsychINFO, SocINFO, Cochrane, Web of Science and Embase were searched from 1 January 1990 to 31 December 2016. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses and the CHARMS Statement guided protocol development. Inclusion criteria were age >60 years, inpatient status, and development/validation of a prognostic delirium prediction model; exclusion criteria were alcohol-related delirium and sample size ≤50. The primary performance measures were calibration and discrimination statistics. Two authors independently conducted the search and extracted data. The synthesis of data was done by the first author. Disagreement was resolved by the mentoring author. The initial search resulted in 7,502 studies. Following full-text review of 192 studies, 33 were excluded based on age criteria (<60 years) and 27 met the defined criteria. Twenty-three delirium prediction models were identified, 14 were externally validated and 3 were internally validated. The following populations were represented: 11 medical, 3 medical/surgical and 13 surgical. The assessment of delirium was often non-systematic, resulting in varied incidence. Fourteen models were externally validated with an area under the receiver operating curve ranging from 0.52 to 0.94. Limitations in design, data collection methods and model metric reporting statistics were identified. Delirium prediction models for older adults show variable and typically inadequate predictive capabilities. Our review highlights the need for development of robust models to predict delirium in older inpatients. We provide recommendations for the development of such models. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Estimating extreme river discharges in Europe through a Bayesian network
NASA Astrophysics Data System (ADS)
Paprotny, Dominik; Morales-Nápoles, Oswaldo
2017-06-01
Large-scale hydrological modelling of flood hazards requires adequate extreme discharge data. In practice, models based on physics are applied alongside those utilizing only statistical analysis. The former require enormous computational power, while the latter are mostly limited in accuracy and spatial coverage. In this paper we introduce an alternative, statistical approach based on Bayesian networks (BNs), a graphical model for dependent random variables. We use a non-parametric BN to describe the joint distribution of extreme discharges in European rivers and variables representing the geographical characteristics of their catchments. Annual maxima of daily discharges from more than 1800 river gauges (stations with catchment areas ranging from 1.4 to 807 000 km2) were collected, together with information on terrain, land use and local climate. The (conditional) correlations between the variables are modelled through copulas, with the dependency structure defined in the network. The results show that using this method, mean annual maxima and return periods of discharges could be estimated with an accuracy similar to existing studies using physical models for Europe and better than a comparable global statistical model. Performance of the model varies slightly between regions of Europe, but is consistent between different time periods, and remains the same in a split-sample validation. Though discharge prediction under climate change is not the main scope of this paper, the BN was applied to a large domain covering all sizes of rivers in the continent both for present and future climate, as an example. Results show substantial variation in the influence of climate change on river discharges. The model can be used to provide quick estimates of extreme discharges at any location for the purpose of obtaining input information for hydraulic modelling.
Update on SU(2) gauge theory with NF = 2 fundamental flavours.
NASA Astrophysics Data System (ADS)
Drach, Vincent; Janowski, Tadeusz; Pica, Claudio
2018-03-01
We present a non-perturbative study of SU(2) gauge theory with two fundamental Dirac flavours. This theory provides a minimal template which is ideal for a wide class of Standard Model extensions featuring novel strong dynamics, such as a minimal realization of composite Higgs models. We present an update on the status of the meson spectrum and decay constants based on increased statistics on our existing ensembles and the inclusion of new ensembles with lighter pion masses, resulting in a more reliable chiral extrapolation. Preprint: CP3-Origins-2017-048 DNRF90
NASA Technical Reports Server (NTRS)
daSilva, Arlindo
2004-01-01
The first set of interoperability experiments illustrates the role ESMF can play in integrating the national Earth science resources. Using existing data assimilation technology from NCEP and the National Weather Service, the Community Atmosphere Model (CAM) was able to ingest conventional and remotely sensed observations, a capability that could open the door to using CAM for weather as well as climate prediction. CAM, which includes land surface capabilities, was developed by NCAR, with key components from GSFC. In this talk we will describe the steps necessary for achieving the coupling of these two systems.
Neighbor effect in complexation of a conjugated polymer.
Sosorev, Andrey; Zapunidi, Sergey
2013-09-19
Charge-transfer complex (CTC) formation between a conjugated polymer and a low-molecular-weight organic acceptor is proposed to be driven by the neighbor effect. Formation of a CTC on the polymer chain results in an increased probability of new CTC formation near the existing one. We present an analytical model for the CTC distribution considering the neighbor effect, based on the principles of statistical mechanics. This model explains the experimentally observed threshold-like dependence of the CTC concentration on the acceptor content in a polymer:acceptor blend. It also allows us to evaluate binding energies of the complexes.
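A toy Monte Carlo version of the neighbor effect can be written as a one-dimensional lattice-gas model, as sketched below: each monomer either carries a CTC or not, an effective chemical potential mimics the acceptor content, and a positive coupling J makes complexation next to an existing CTC more favourable, producing a threshold-like rise in the CTC fraction. The energies, coupling and system size are illustrative assumptions, not parameters of the paper's analytical model.

```python
import numpy as np

rng = np.random.default_rng(1)

def ctc_fraction(mu, J=2.5, n_sites=1000, sweeps=200):
    """Metropolis sampling of a toy 1-D lattice gas: site i carries a CTC (s_i = 1)
    or not (s_i = 0), with energy E = -mu*sum(s_i) - J*sum(s_i*s_{i+1}).
    mu mimics the acceptor content (a chemical potential) and J > 0 is the
    neighbor effect that favours new CTCs next to existing ones."""
    s = np.zeros(n_sites, dtype=int)
    for _ in range(sweeps):
        for i in rng.integers(0, n_sites, n_sites):
            neighbors = s[(i - 1) % n_sites] + s[(i + 1) % n_sites]
            delta_e = -(mu + J * neighbors) * (1 - 2 * s[i])  # energy change of flipping s_i
            if delta_e <= 0 or rng.random() < np.exp(-delta_e):
                s[i] = 1 - s[i]
    return s.mean()

# With a strong neighbor effect (J > 0) the CTC fraction rises in a
# threshold-like fashion as the effective acceptor chemical potential grows.
for mu in (-4.0, -3.0, -2.5, -2.0, -1.0):
    print(mu, round(ctc_fraction(mu), 3))
```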
NASA Astrophysics Data System (ADS)
Plekhov, Oleg; Naimark, Oleg; Narykova, Maria; Kadomtsev, Andrey; Betekhtin, Vladimir
2015-10-01
The work is devoted to the study of metal structure evolution under the gigacycle (very-high-cycle fatigue, VHCF) regime. The mechanical properties of Armco iron samples at different stages of their fatigue life were studied using the acoustic resonance method. Damage accumulation (the porosity of the samples) was studied by the hydrostatic weighing method. A statistical model was proposed to describe the damage accumulation process. The model describes the influence of the sample surface on the location of fatigue crack initiation.
Pathway analysis with next-generation sequencing data.
Zhao, Jinying; Zhu, Yun; Boerwinkle, Eric; Xiong, Momiao
2015-04-01
Although pathway analysis methods have been developed and successfully applied to association studies of common variants, the statistical methods for pathway-based association analysis of rare variants have not been well developed. Many investigators observed highly inflated false-positive rates and low power in pathway-based tests of association of rare variants. The inflated false-positive rates and low true-positive rates of the current methods are mainly due to their lack of ability to account for gametic phase disequilibrium. To overcome these serious limitations, we develop a novel statistic that is based on the smoothed functional principal component analysis (SFPCA) for pathway association tests with next-generation sequencing data. The developed statistic has the ability to capture position-level variant information and account for gametic phase disequilibrium. By intensive simulations, we demonstrate that the SFPCA-based statistic for testing pathway association with either rare or common or both rare and common variants has the correct type 1 error rates. Also the power of the SFPCA-based statistic and 22 additional existing statistics are evaluated. We found that the SFPCA-based statistic has a much higher power than other existing statistics in all the scenarios considered. To further evaluate its performance, the SFPCA-based statistic is applied to pathway analysis of exome sequencing data in the early-onset myocardial infarction (EOMI) project. We identify three pathways significantly associated with EOMI after the Bonferroni correction. In addition, our preliminary results show that the SFPCA-based statistic has much smaller P-values to identify pathway association than other existing methods.
NASA Astrophysics Data System (ADS)
Erfanifard, Y.; Rezayan, F.
2014-10-01
Vegetation heterogeneity biases second-order summary statistics, e.g., Ripley's K-function, applied for spatial pattern analysis in ecology. Second-order investigation based on Ripley's K-function and related statistics (i.e., the L- and pair correlation function g) is widely used in ecology to develop hypotheses on underlying processes by characterizing spatial patterns of vegetation. The aim of this study was to demonstrate the effects of the underlying heterogeneity of wild pistachio (Pistacia atlantica Desf.) trees on the second-order summary statistics of point pattern analysis in a part of the Zagros woodlands, Iran. The spatial distribution of 431 wild pistachio trees was accurately mapped in a 40 ha stand in the Wild Pistachio & Almond Research Site, Fars province, Iran. Three commonly used second-order summary statistics (i.e., the K-, L-, and g-functions) were applied to analyse their spatial pattern. The two-sample Kolmogorov-Smirnov goodness-of-fit test showed that the observed pattern significantly followed an inhomogeneous Poisson process null model in the study region. The results also showed that the heterogeneous pattern of the wild pistachio trees biased the homogeneous forms of the K-, L-, and g-functions, indicating a stronger aggregation of the trees at scales of 0-50 m than actually existed and an apparent aggregation at scales of 150-200 m where the trees were in fact regularly distributed. Consequently, we showed that heterogeneity of point patterns may bias the results of homogeneous second-order summary statistics, and we suggest applying inhomogeneous summary statistics with related null models for spatial pattern analysis of heterogeneous vegetation.
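The bias discussed above can be reproduced with a few lines of Python: the naive (homogeneous, non-edge-corrected) Ripley K estimator below, applied to a simulated pattern with a simple density gradient, returns L(r) - r > 0 at short distances even though the points do not interact. The window size and the gradient are illustrative, not the Zagros data.

```python
import numpy as np

def ripley_k(points, r_values, area):
    """Naive homogeneous Ripley K estimate (no edge correction) for a point
    pattern observed in a rectangular window of the given area."""
    n = len(points)
    lam = n / area                                  # intensity (points per unit area)
    d = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(d, np.inf)
    return np.array([(d < r).sum() / (lam * n) for r in r_values])

rng = np.random.default_rng(3)

# Hypothetical stand: 431 stems in a 40 ha square window, simulated as an
# inhomogeneous pattern with a density gradient across the plot.
side = np.sqrt(400000.0)                # 40 ha expressed in metres
x = side * rng.power(2.0, 431)          # more trees towards one edge
y = side * rng.random(431)
pts = np.column_stack([x, y])

r = np.array([10.0, 25.0, 50.0, 100.0])
K = ripley_k(pts, r, area=side * side)
L = np.sqrt(K / np.pi) - r              # L(r) - r > 0 suggests aggregation
print(np.round(L, 1))                   # apparent clustering driven purely by the gradient
```

Fitting an intensity surface and using the inhomogeneous versions of K, L and g, as the study recommends, removes this spurious signal.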
GaussianCpG: a Gaussian model for detection of CpG island in human genome sequences.
Yu, Ning; Guo, Xuan; Zelikovsky, Alexander; Pan, Yi
2017-05-24
As crucial markers in identifying biological elements and processes in mammalian genomes, CpG islands (CGI) play important roles in DNA methylation, gene regulation, epigenetic inheritance, gene mutation, chromosome inactivation and nucleosome retention. The generally accepted criteria for a CGI rely on: (a) %G+C content ≥ 50%, (b) a ratio of observed to expected CpG content ≥ 0.6, and (c) a length greater than 200 nucleotides. Most existing computational methods for the prediction of CpG islands are programmed around these rules. However, many experimentally verified CpG islands deviate from these artificial criteria. Experiments indicate that in many cases %G+C is < 50%, CpGobs/CpGexp varies, and the length of a CGI ranges from eight nucleotides to a few thousand nucleotides. This implies that CGI detection is not just a straightforward statistical task and that some unrevealed rules are probably hidden. A novel Gaussian model, GaussianCpG, is developed for the detection of CpG islands in the human genome. We analyze the energy distribution over the genomic primary structure for each CpG site and adopt the parameters from statistics of the human genome. The evaluation results show that the new model can predict CpG islands efficiently by balancing both sensitivity and specificity over known human CGI data sets. Compared with other models, GaussianCpG achieves better performance in CGI detection. Our Gaussian model aims to simplify the complex interaction between nucleotides. The model is computed not by a linear statistical method but by Gaussian energy distribution and accumulation. The parameters of the Gaussian function are not arbitrarily designated but deliberately chosen by optimizing the biological statistics. Using pseudopotential analysis on CpG islands, the novel model is validated on both real and artificial data sets.
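For reference, the classical criteria quoted above can be checked with a few lines of Python. The sketch below implements the %G+C, observed/expected CpG and length rules (i.e. the rule set GaussianCpG is contrasted with, not the Gaussian model itself), using toy sequences.

```python
def cgi_criteria(seq):
    """Evaluate the three classical CpG-island criteria on a DNA sequence:
    GC content >= 50%, observed/expected CpG >= 0.6, length > 200 nt."""
    seq = seq.upper()
    n = len(seq)
    g, c = seq.count("G"), seq.count("C")
    gc_content = (g + c) / n
    cpg_obs = seq.count("CG")
    cpg_exp = (c * g) / n if c and g else 0.0      # expected CpG count
    obs_exp = cpg_obs / cpg_exp if cpg_exp else 0.0
    return {"gc_content": round(gc_content, 3),
            "obs_exp": round(obs_exp, 3),
            "length_ok": n > 200,
            "classical_cgi": gc_content >= 0.5 and obs_exp >= 0.6 and n > 200}

# Toy example: a CpG-rich stretch versus an AT-rich background.
island = "CGCGGCGCTACGCGGCGCGCCGCGTACGCG" * 10
background = "ATTATAATTTATATATTAAT" * 10
print(cgi_criteria(island))
print(cgi_criteria(background))
```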
Do maladaptive behaviors exist at one or both ends of personality traits?
Pettersson, Erik; Mendle, Jane; Turkheimer, Eric; Horn, Erin E; Ford, Derek C; Simms, Leonard J; Clark, Lee Anna
2014-06-01
In the Diagnostic and Statistical Manual of Mental Disorders (5th ed.; American Psychiatric Association, 2013) personality disorder trait model, maladaptive behavior is located at one end of continuous scales. Widiger and colleagues, however, have argued that maladaptive behavior exists at both ends of trait continua. We propose that the role of evaluative variance differentiates these two perspectives and that once evaluation is isolated, maladaptive behaviors emerge at both ends of nonevaluative trait dimensions. In Study 1, we argue that evaluative variance is worthwhile to measure separately from descriptive content because it clusters items by valence regardless of content (e.g., lazy and workaholic; apathetic and anxious; gullible and paranoid; timid and hostile, etc.), which is unlikely to describe a consistent behavioral style. We isolate evaluation statistically (Study 2) and at the time of measurement (Study 3) to show that factors unrelated to valence evidence maladaptive behavior at both ends. We argue that nonevaluative factors, which display maladaptive behavior at both ends of continua, may better approximate ways in which individuals actually behave.
Variable Selection in the Presence of Missing Data: Imputation-based Methods.
Zhao, Yize; Long, Qi
2017-01-01
Variable selection plays an essential role in regression analysis as it identifies important variables that are associated with outcomes and is known to improve the predictive accuracy of resulting models. Variable selection methods have been widely investigated for fully observed data. However, in the presence of missing data, methods for variable selection need to be carefully designed to account for missing data mechanisms and the statistical techniques used for handling missing data. Since imputation is arguably the most popular method for handling missing data due to its ease of use, statistical methods for variable selection that are combined with imputation are of particular interest. These methods, which are valid under the assumptions of missing at random (MAR) and missing completely at random (MCAR), largely fall into three general strategies. The first strategy applies existing variable selection methods to each imputed dataset and then combines variable selection results across all imputed datasets. The second strategy applies existing variable selection methods to stacked imputed datasets. The third strategy combines resampling techniques such as the bootstrap with imputation. Despite recent advances, this area remains under-developed and offers fertile ground for further research.
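The first strategy can be sketched in a few lines of Python, shown below for illustration: each of m imputed datasets is analysed with a lasso selector and the results are combined by selection frequency. The simulated data, the use of scikit-learn's IterativeImputer and LassoCV, and the 0.5 frequency threshold are all assumptions made for this example, not prescriptions from the review.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)

# Simulated data: 8 candidate predictors, only the first two matter.
n, p = 200, 8
X_full = rng.standard_normal((n, p))
y = 2.0 * X_full[:, 0] - 1.5 * X_full[:, 1] + rng.standard_normal(n)
X = X_full.copy()
X[rng.random((n, p)) < 0.15] = np.nan      # ~15% of values missing (assumed MAR)

# Strategy 1: run the selector on each imputed dataset, then combine results
# across imputations via selection frequency.
m = 10
selected = np.zeros(p)
for seed in range(m):
    imputed = IterativeImputer(random_state=seed, sample_posterior=True).fit_transform(X)
    lasso = LassoCV(cv=5, random_state=seed).fit(imputed, y)
    selected += (np.abs(lasso.coef_) > 1e-8)

print("selection frequency per variable:", selected / m)
print("kept (frequency >= 0.5):", np.where(selected / m >= 0.5)[0])
```

The stacked-data and bootstrap-plus-imputation strategies differ only in where the imputations enter: before pooling into one design matrix, or inside each resample.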
Digital morphogenesis via Schelling segregation
NASA Astrophysics Data System (ADS)
Barmpalias, George; Elwes, Richard; Lewis-Pye, Andrew
2018-04-01
Schelling’s model of segregation looks to explain the way in which particles or agents of two types may come to arrange themselves spatially into configurations consisting of large homogeneous clusters, i.e. connected regions consisting of only one type. As one of the earliest agent-based models studied by economists and perhaps the most famous model of self-organising behaviour, it also has direct links to areas at the interface between computer science and statistical mechanics, such as the Ising model and the study of contagion and cascading phenomena in networks. While the model has been extensively studied, it has largely resisted rigorous analysis, prior results from the literature generally pertaining to variants of the model which are tweaked so as to be amenable to standard techniques from statistical mechanics or stochastic evolutionary game theory. Brandt et al (2012 Proc. 44th Annual ACM Symp. on Theory of Computing) provided the first rigorous analysis of the unperturbed model, for a specific set of input parameters. Here we provide a rigorous analysis of the model’s behaviour much more generally and establish some surprising forms of threshold behaviour, notably the existence of situations where an increased level of intolerance for neighbouring agents of opposite type leads almost certainly to decreased segregation.
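For readers unfamiliar with the model, the short Python sketch below simulates a basic variant of Schelling's dynamics (unhappy agents jump to a random empty cell) and reports a simple segregation index. It is a toy illustration of the mechanism, not of the rigorous threshold results established in the paper; the grid size, tolerance values and move rule are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(7)

def schelling(size=50, empty_frac=0.1, tolerance=0.5, sweeps=40):
    """Toy Schelling model on a grid: an agent moves to a random empty cell
    whenever the fraction of unlike neighbours exceeds `tolerance`."""
    grid = rng.choice([0, 1, 2], size=(size, size),
                      p=[empty_frac, (1 - empty_frac) / 2, (1 - empty_frac) / 2])
    for _ in range(sweeps):
        for (i, j) in zip(*np.nonzero(grid)):
            if grid[i, j] == 0:                      # agent already moved this sweep
                continue
            nbrs = grid[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
            same = (nbrs == grid[i, j]).sum() - 1    # subtract the agent itself
            occupied = (nbrs != 0).sum() - 1
            unlike = occupied - same
            if unlike > 0 and unlike / occupied > tolerance:
                empties = np.argwhere(grid == 0)
                k, l = empties[rng.integers(len(empties))]
                grid[k, l] = grid[i, j]
                grid[i, j] = 0
    return grid

def mean_same_type_share(grid):
    """Average share of same-type neighbours over all agents."""
    shares = []
    for (i, j) in zip(*np.nonzero(grid)):
        nbrs = grid[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
        same = (nbrs == grid[i, j]).sum() - 1
        occupied = (nbrs != 0).sum() - 1
        if occupied:
            shares.append(same / occupied)
    return np.mean(shares)

for tol in (0.3, 0.5, 0.7):
    print(tol, round(mean_same_type_share(schelling(tolerance=tol)), 2))
```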
The discounting model selector: Statistical software for delay discounting applications.
Gilroy, Shawn P; Franck, Christopher T; Hantula, Donald A
2017-05-01
Original, open-source computer software was developed and validated against established delay discounting methods in the literature. The software executed approximate Bayesian model selection methods from user-supplied temporal discounting data and computed the effective delay 50 (ED50) from the best performing model. Software was custom-designed to enable behavior analysts to conveniently apply recent statistical methods to temporal discounting data with the aid of a graphical user interface (GUI). The results of independent validation of the approximate Bayesian model selection methods indicated that the program provided results identical to those of the original source paper and its methods. Monte Carlo simulation (n = 50,000) confirmed that the true model was selected most often in each setting. Simulation code and data for this study were posted to an online repository for use by other researchers. The model selection approach was applied to three existing delay discounting data sets from the literature in addition to the data from the source paper. Comparisons of the model-selected ED50 were consistent with traditional indices of discounting. Conceptual issues related to the development and use of computer software by behavior analysts and the opportunities afforded by free and open-source software are discussed, and a review of possible expansions of this software is provided. © 2017 Society for the Experimental Analysis of Behavior.
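As a small worked example of the quantity the tool reports, the sketch below fits Mazur's hyperbolic discounting model to hypothetical indifference points and derives the effective delay 50, which for that model is simply 1/k. The data and the use of scipy's curve_fit are assumptions for illustration; they are not the approximate Bayesian model selection implemented in the software.

```python
import numpy as np
from scipy.optimize import curve_fit

def hyperbolic(delay, k):
    """Mazur's hyperbolic discounting model: V = A / (1 + k * delay), with A = 1."""
    return 1.0 / (1.0 + k * delay)

# Hypothetical indifference points (fraction of the delayed amount) by delay (days).
delays = np.array([1, 7, 30, 90, 180, 365], dtype=float)
values = np.array([0.95, 0.85, 0.60, 0.40, 0.30, 0.20])

(k_hat,), _ = curve_fit(hyperbolic, delays, values, p0=[0.01])

# For the hyperbolic model the effective delay 50 (the delay at which the
# delayed reward loses half its value) is simply 1 / k.
ed50 = 1.0 / k_hat
print(f"k = {k_hat:.4f} per day, ED50 = {ed50:.1f} days")
```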
Rational approximations to rational models: alternative algorithms for category learning.
Sanborn, Adam N; Griffiths, Thomas L; Navarro, Daniel J
2010-10-01
Rational models of cognition typically consider the abstract computational problems posed by the environment, assuming that people are capable of optimally solving those problems. This differs from more traditional formal models of cognition, which focus on the psychological processes responsible for behavior. A basic challenge for rational models is thus explaining how optimal solutions can be approximated by psychological processes. We outline a general strategy for answering this question, namely to explore the psychological plausibility of approximation algorithms developed in computer science and statistics. In particular, we argue that Monte Carlo methods provide a source of rational process models that connect optimal solutions to psychological processes. We support this argument through a detailed example, applying this approach to Anderson's (1990, 1991) rational model of categorization (RMC), which involves a particularly challenging computational problem. Drawing on a connection between the RMC and ideas from nonparametric Bayesian statistics, we propose 2 alternative algorithms for approximate inference in this model. The algorithms we consider include Gibbs sampling, a procedure appropriate when all stimuli are presented simultaneously, and particle filters, which sequentially approximate the posterior distribution with a small number of samples that are updated as new data become available. Applying these algorithms to several existing datasets shows that a particle filter with a single particle provides a good description of human inferences.
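A single-particle approximation of the kind discussed above can be sketched compactly: the code below carries one partition hypothesis and assigns each new stimulus by sampling from the posterior over clusters of a Dirichlet-process mixture of Bernoulli features. The prior and likelihood settings are illustrative stand-ins, not the authors' exact formulation of the RMC.

```python
import numpy as np

rng = np.random.default_rng(0)

def single_particle_rmc(stimuli, alpha=1.0, beta=(1.0, 1.0)):
    """One-particle approximation to inference in a Dirichlet-process mixture of
    Bernoulli features: each stimulus is assigned to a cluster by sampling from
    the posterior over clusters given the single partition carried so far."""
    clusters = []                        # per cluster: [count, feature-one counts]
    assignments = []
    for x in stimuli:
        x = np.asarray(x)
        weights = []
        for count, ones in clusters:
            prior = count                                   # CRP prior weight
            theta = (ones + beta[0]) / (count + beta[0] + beta[1])
            like = np.prod(np.where(x == 1, theta, 1 - theta))
            weights.append(prior * like)
        weights.append(alpha * 0.5 ** x.size)               # brand-new cluster
        weights = np.array(weights) / np.sum(weights)
        z = rng.choice(len(weights), p=weights)
        if z == len(clusters):
            clusters.append([0, np.zeros(x.size)])
        clusters[z][0] += 1
        clusters[z][1] += x
        assignments.append(z)
    return assignments

stimuli = [[1, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 1, 1]]
print(single_particle_rmc(stimuli))
```

Replacing the sampling step with an argmax recovers Anderson's original local-MAP algorithm, and carrying several weighted copies of `clusters` turns the sketch into a multi-particle filter.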
NASA Astrophysics Data System (ADS)
Shi, Jinfei; Zhu, Songqing; Chen, Ruwen
2017-12-01
An order selection method based on multiple stepwise regressions is proposed for the General Expression of Nonlinear Autoregressive (GNAR) model, which converts the model order problem into variable selection for a multiple linear regression equation. The partial autocorrelation function is adopted to define the linear terms in the GNAR model. The result is set as the initial model, and then the nonlinear terms are introduced gradually. Test statistics are chosen to assess the improvements contributed by both the newly introduced and the originally included variables, and these are used to determine which model variables to retain or eliminate. The optimal model is then obtained through goodness-of-fit measurement or significance testing. The simulation and classic time-series data experiments show that the proposed method is simple, reliable and can be applied in practical engineering.
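A minimal version of the forward-stepwise idea is sketched below: candidate lagged terms and their products are added one at a time whenever they lower the BIC of a linear least-squares fit. The simulated series, the choice of BIC as the selection statistic and the lag-3 candidate set are assumptions made for this example, not the specific statistics used in the paper.

```python
import numpy as np
from itertools import combinations_with_replacement

rng = np.random.default_rng(5)

# Simulated series: a linear AR(1) part plus a weak nonlinear y[t-1]*y[t-2] term.
n = 500
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 1] * y[t - 2] + 0.3 * rng.standard_normal()

def bic(X, target):
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    rss = np.sum((target - X @ beta) ** 2)
    return len(target) * np.log(rss / len(target)) + X.shape[1] * np.log(len(target))

# Candidate terms: lagged values and their pairwise products up to lag 3.
max_lag = 3
target = y[max_lag:]
lags = {f"y[t-{l}]": y[max_lag - l:n - l] for l in range(1, max_lag + 1)}
candidates = dict(lags)
for (a, b) in combinations_with_replacement(lags, 2):
    candidates[f"{a}*{b}"] = lags[a] * lags[b]

# Forward stepwise selection: keep adding the term that lowers BIC the most.
chosen, X = ["const"], np.ones((len(target), 1))
current_bic = bic(X, target)
improved = True
while improved:
    improved = False
    for name, col in candidates.items():
        if name in chosen:
            continue
        trial = np.column_stack([X, col])
        b = bic(trial, target)
        if b < current_bic - 1e-9:
            best_name, best_X, current_bic, improved = name, trial, b, True
    if improved:
        chosen.append(best_name)
        X = best_X
print("selected terms:", chosen)
```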
Diagnostic methods for atmospheric inversions of long-lived greenhouse gases
NASA Astrophysics Data System (ADS)
Michalak, Anna M.; Randazzo, Nina A.; Chevallier, Frédéric
2017-06-01
The ability to predict the trajectory of climate change requires a clear understanding of the emissions and uptake (i.e., surface fluxes) of long-lived greenhouse gases (GHGs). Furthermore, the development of climate policies is driving a need to constrain the budgets of anthropogenic GHG emissions. Inverse problems that couple atmospheric observations of GHG concentrations with an atmospheric chemistry and transport model have increasingly been used to gain insights into surface fluxes. Given the inherent technical challenges associated with their solution, it is imperative that objective approaches exist for the evaluation of such inverse problems. Because direct observation of fluxes at compatible spatiotemporal scales is rarely possible, diagnostics tools must rely on indirect measures. Here we review diagnostics that have been implemented in recent studies and discuss their use in informing adjustments to model setup. We group the diagnostics along a continuum starting with those that are most closely related to the scientific question being targeted, and ending with those most closely tied to the statistical and computational setup of the inversion. We thus begin with diagnostics based on assessments against independent information (e.g., unused atmospheric observations, large-scale scientific constraints), followed by statistical diagnostics of inversion results, diagnostics based on sensitivity tests, and analyses of robustness (e.g., tests focusing on the chemistry and transport model, the atmospheric observations, or the statistical and computational framework), and close with the use of synthetic data experiments (i.e., observing system simulation experiments, OSSEs). We find that existing diagnostics provide a crucial toolbox for evaluating and improving flux estimates but, not surprisingly, cannot overcome the fundamental challenges associated with limited atmospheric observations or the lack of direct flux measurements at compatible scales. As atmospheric inversions are increasingly expected to contribute to national reporting of GHG emissions, the need for developing and implementing robust and transparent evaluation approaches will only grow.
NASA Astrophysics Data System (ADS)
West, Damien; West, Bruce J.
2012-07-01
There are a substantial number of empirical relations that began with the identification of a pattern in data; were shown to have a terse power-law description; were interpreted using existing theory; reached the level of "law" and were given a name; only to subsequently fade away when it proved impossible to connect the "law" with a larger body of theory and/or data. Various forms of allometry relations (ARs) have followed this path. The ARs in biology are nearly two hundred years old and those in ecology, geophysics, physiology and other areas of investigation are not that much younger. In general, if X is a measure of the size of a complex host network and Y is a property of a complex subnetwork embedded within the host network, a theoretical AR exists between the two when Y = aX^b. We emphasize that the reductionistic models of AR interpret X and Y as dynamic variables, albeit the ARs themselves are explicitly time independent, even though in some cases the parameter values change over time. On the other hand, the phenomenological models of AR are based on the statistical analysis of data and interpret X and Y as averages, yielding the empirical AR of the same power-law form.
Eruption patterns of the chilean volcanoes Villarrica, Llaima, and Tupungatito
NASA Astrophysics Data System (ADS)
Muñoz, Miguel
1983-09-01
The historical eruption records of three Chilean volcanoes have been subjected to many statistical tests, and none have been found to differ significantly from random, or Poissonian, behaviour. The statistical analysis shows rough conformity with the descriptions determined from the eruption rate functions. It is possible that a constant eruption rate describes the activity of Villarrica; Llaima and Tupungatito present complex eruption rate patterns that appear, however, to have no statistical significance. Questions related to loading and extinction processes and to the existence of shallow secondary magma chambers to which magma is supplied from a deeper system are also addressed. The analysis and the computation of the serial correlation coefficients indicate that the three series may be regarded as stationary renewal processes. None of the test statistics indicates rejection of the Poisson hypothesis at a level less than 5%, but the coefficient of variation for the eruption series at Llaima is significantly different from the value expected for a Poisson process. Also, the estimates of the normalized spectrum of the counting process for the three series suggest a departure from the random model, but the deviations are not found to be significant at the 5% level. Kolmogorov-Smirnov and chi-squared test statistics, applied directly to ascertain the probability P with which the random Poisson model fits the data, indicate that there is significant agreement in the case of Villarrica (P = 0.59) and Tupungatito (P = 0.3). Even though the P-value for Llaima is a marginally significant 0.1 (which is equivalent to rejecting the Poisson model at the 90% confidence level), the series suggests that nonrandom features are possibly present in the eruptive activity of this volcano.
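The Poisson reasoning above amounts to testing whether repose times are exponentially distributed; a minimal Python sketch is given below using a Kolmogorov-Smirnov test and the coefficient of variation. The eruption years are invented for illustration, and because the exponential scale is estimated from the same data the KS p-value is only approximate.

```python
import numpy as np
from scipy import stats

# Hypothetical eruption years for one volcano (illustrative only).
eruption_years = np.array([1906, 1913, 1920, 1929, 1937, 1948, 1949, 1956,
                           1963, 1964, 1971, 1980, 1984, 1991, 1998, 2008])
repose = np.diff(eruption_years).astype(float)

# Under a Poisson (random) eruption model the repose times are exponential.
scale = repose.mean()
ks_stat, p_value = stats.kstest(repose, "expon", args=(0, scale))

# Coefficient of variation: equal to 1 for an exponential distribution.
cv = repose.std(ddof=1) / repose.mean()

print(f"KS statistic = {ks_stat:.3f}, p = {p_value:.3f}, CV = {cv:.2f}")
```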
A process-based standard for the Solar Energetic Particle Event Environment
NASA Astrophysics Data System (ADS)
Gabriel, Stephen
For 10 years or more, there has been a lack of consensus on what the ISO standard model for the Solar Energetic Particle Event (SEPE) environment should be. Despite many technical discussions between the world experts in this field, it has been impossible to agree on which of the several models available should be selected as the standard. Most of these discussions at the ISO WG4 meetings, conferences, etc., have centred on the differences in modelling approach between the MSU model and the several remaining models from elsewhere worldwide (mainly the USA and Europe). The topic is considered timely given the inclusion of a session on reference data sets at the Space Weather Workshop in Boulder in April 2014. The original idea of a ‘process-based’ standard was conceived by Dr Kent Tobiska as a way of getting round the problems associated with the presence of different models, which not only could have quite distinct modelling approaches but could also be based on different data sets. In essence, a process-based standard approach overcomes these issues by allowing there to be more than one model and not necessarily a single standard model; however, any such model has to be completely transparent, in that the data set and the modelling techniques used have to be not only clearly and unambiguously defined but also subject to peer review. If the model meets all of these requirements then it should be acceptable as a standard model. So how does this process-based approach resolve the differences between the existing modelling approaches for the SEPE environment and remove the impasse? In a sense, it does not remove all of the differences but only some of them; however, most importantly it will allow something which so far has been impossible without ambiguities and disagreement, and that is a comparison of the results of the various models. To date, one of the problems (if not the major one) in comparing the results of the various SEPE statistical models has been caused by two things: 1) the data set and 2) the definition of an event. Because unravelling the dependencies of the outputs of different statistical models on these two parameters is extremely difficult if not impossible, comparison of the results from the different models is currently also extremely difficult and can lead to controversies, especially over which model is the correct one; hence, when it comes to using these models for engineering purposes to calculate, for example, the radiation dose for a particular mission, the user, who is in all likelihood not an expert in this field, could be given two (or even more) very different environments and find it impossible to know how to select one (or even how to compare them). What is proposed, then, is a process-based standard, which in common with nearly all of the current models is composed of three elements: a standard data set, a standard event definition and a resulting standard event list. The standard event list is the output of this standard and can then be used with any of the existing (or indeed future) models that are based on events. This standard event list is completely traceable and transparent and represents a reference event list for the whole community. When coupled with a statistical model, the results, when compared, will depend only on the statistical model and not on the data set or event definition.
Statistical analysis of regulatory ecotoxicity tests.
Isnard, P; Flammarion, P; Roman, G; Babut, M; Bastien, P; Bintein, S; Esserméant, L; Férard, J F; Gallotti-Schmitt, S; Saouter, E; Saroli, M; Thiébaud, H; Tomassone, R; Vindimian, E
2001-11-01
ANOVA-type data analysis, i.e., determination of lowest-observed-effect concentrations (LOECs) and no-observed-effect concentrations (NOECs), has been widely used for statistical analysis of chronic ecotoxicity data. However, it is increasingly criticised for several reasons, the most important probably being that the NOEC depends on the choice of test concentrations and the number of replications, and rewards poor experiments, i.e., high variability, with high NOEC values. Thus, a recent OECD workshop concluded that the use of the NOEC should be phased out and that a regression-based estimation procedure should be used. Following this workshop, a working group was established at the French level between government, academia and industry representatives. Twenty-seven sets of chronic data (algae, daphnia, fish) were collected and analysed by ANOVA and regression procedures. Several regression models were compared and relations between NOECs and ECx, for different values of x, were established in order to find an alternative summary parameter to the NOEC. Biological arguments are scarce to help in defining a negligible level of effect x for the ECx. With regard to their use in risk assessment procedures, a convenient methodology would be to choose x so that ECx are on average similar to the present NOEC. This would lead to no major change in the risk assessment procedure. However, experimental data show that the ECx depend on the regression models and that their accuracy decreases in the low-effect zone. This disadvantage could probably be reduced by adapting existing experimental protocols, but it could mean more experimental effort and higher cost. ECx (derived with existing test guidelines, e.g., regarding the number of replicates) whose lowest bounds of the confidence interval are on average similar to the present NOEC would improve this approach by a priori encouraging more precise experiments. However, narrow confidence intervals are not only linked to good experimental practices, but also depend on the distance between the best model fit and the experimental data. Finally, these approaches still use the NOEC as a reference although this reference is statistically not correct. On the contrary, EC50 are the most precise values to estimate on a concentration-response curve, but they are clearly different from the NOEC and their use would require a modification of existing assessment factors.
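The regression-based ECx approach discussed above can be illustrated with a two-parameter log-logistic concentration-response fit. The sketch below uses synthetic responses and a generic model; it is not the specific regression models compared by the working group.

```python
# Sketch: estimating an ECx from a two-parameter log-logistic
# concentration-response model, as an alternative to the NOEC.
# Concentrations and responses are synthetic; ec50 and slope are fitted.
import numpy as np
from scipy.optimize import curve_fit

conc = np.array([0.1, 0.3, 1.0, 3.0, 10.0, 30.0])          # test concentrations
response = np.array([0.98, 0.95, 0.84, 0.52, 0.18, 0.05])  # fraction of control response

def log_logistic(c, ec50, slope):
    """Response relative to control, decreasing with concentration."""
    return 1.0 / (1.0 + (c / ec50) ** slope)

popt, _ = curve_fit(log_logistic, conc, response, p0=[3.0, 1.0])
ec50, slope = popt

def ecx(x, ec50, slope):
    """Concentration causing an x% effect (0 < x < 100)."""
    p = x / 100.0
    return ec50 * (p / (1.0 - p)) ** (1.0 / slope)

print(f"EC50 = {ec50:.2f}, EC10 = {ecx(10, ec50, slope):.2f}")
```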
NASA Astrophysics Data System (ADS)
Härer, Stefan; Bernhardt, Matthias; Gutmann, Ethan; Bauer, Hans-Stefan; Schulz, Karsten
2017-04-01
Until recently, a large gap existed in atmospheric downscaling strategies. On the one hand, computationally efficient statistical approaches are widely used; on the other hand, dynamic but CPU-intensive numeric atmospheric models such as the Weather Research and Forecasting (WRF) model exist. The Intermediate Complexity Atmospheric Research (ICAR) model developed at NCAR (Boulder, Colorado, USA) addresses this gap by combining the strengths of both approaches: the process-based structure of a dynamic model and its applicability in a changing climate, as well as the speed of a parsimonious modelling approach, which facilitates the modelling of ensembles and offers a straightforward way to test new parametrization schemes and various input data sources. However, the ICAR model has not yet been tested in Europe or on slightly undulating terrain. This study therefore evaluates, for the first time, the ICAR model against WRF model runs in Central Europe, comparing a complete year of model results in the mesoscale Attert catchment (Luxembourg). In addition to these modelling results, we also describe the first implementation of ICAR on an Intel Phi architecture and perform speed tests between the Vienna cluster, a standard workstation and an Intel Phi coprocessor. Finally, the study gives an outlook on sensitivity studies using slightly different input data sources.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Webb-Robertson, Bobbie-Jo M.; Matzke, Melissa M.; Datta, Susmita
As the capability of mass spectrometry-based proteomics has matured, tens of thousands of peptides can be measured simultaneously, which has the benefit of offering a systems view of protein expression. However, a major challenge is that with an increase in throughput, protein quantification estimation from the native measured peptides has become a computational task. A limitation of existing computationally driven protein quantification methods is that most ignore protein variation, such as alternative splicing of the RNA transcript and post-translational modifications or other possible proteoforms, which will affect a significant fraction of the proteome. The consequence of this assumption is that statistical inference at the protein level, and consequently downstream analyses such as network and pathway modeling, have only limited power for biomarker discovery. Here, we describe a Bayesian model (BP-Quant) that uses statistically derived peptide signatures to identify peptides that are outside the dominant pattern, or the existence of multiple over-expressed patterns, to improve relative protein abundance estimates. It is a research-driven approach that utilizes the objectives of the experiment, defined in the context of a standard statistical hypothesis, to identify a set of peptides exhibiting similar statistical behavior relating to a protein. This approach infers that changes in relative protein abundance can be used as a surrogate for changes in function, without necessarily taking into account the effect of differential post-translational modifications, processing, or splicing in altering protein function. We verify the approach using a dilution study from mouse plasma samples and demonstrate that BP-Quant achieves similar accuracy as the current state-of-the-art methods at proteoform identification, with significantly better specificity. BP-Quant is available as MATLAB and R packages at https://github.com/PNNL-Comp-Mass-Spec/BP-Quant.
The space of ultrametric phylogenetic trees.
Gavryushkin, Alex; Drummond, Alexei J
2016-08-21
The reliability of a phylogenetic inference method from genomic sequence data is ensured by its statistical consistency. Bayesian inference methods produce a sample of phylogenetic trees from the posterior distribution given sequence data. Hence the question of statistical consistency of such methods is equivalent to the consistency of the summary of the sample. More generally, statistical consistency is ensured by the tree space used to analyse the sample. In this paper, we consider two standard parameterisations of phylogenetic time-trees used in evolutionary models: inter-coalescent interval lengths and absolute times of divergence events. For each of these parameterisations we introduce a natural metric space on ultrametric phylogenetic trees. We compare the introduced spaces with existing models of tree space and formulate several formal requirements that a metric space on phylogenetic trees must possess in order to be a satisfactory space for statistical analysis, and justify them. We show that only a few known constructions of the space of phylogenetic trees satisfy these requirements. However, our results suggest that these basic requirements are not enough to distinguish between the two metric spaces we introduce, and that the choice between metric spaces requires additional properties to be considered. In particular, the summary tree minimising the square distance to the trees from the sample might be different for different parameterisations. This suggests that further fundamental insight is needed into the problem of statistical consistency of phylogenetic inference methods. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
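The dependence on parameterisation can be seen with a toy calculation: the same pair of ultrametric trees gives different Euclidean distances depending on whether divergence times or inter-coalescent intervals are used as coordinates. The sketch below assumes, for simplicity, two trees sharing a ranked topology, with made-up node times; it illustrates the coordinate change only, not the metric spaces constructed in the paper.

```python
# Sketch: the same pair of ultrametric trees in two parameterisations
# (divergence times vs. inter-coalescent interval lengths) and the Euclidean
# distance computed in each coordinate system. Node times are illustrative.
import numpy as np

# absolute times of divergence events, ordered from most recent to the root
times_a = np.array([0.2, 0.5, 1.1])
times_b = np.array([0.3, 0.4, 1.0])

# inter-coalescent interval lengths are successive differences of those times
intervals_a = np.diff(np.concatenate(([0.0], times_a)))
intervals_b = np.diff(np.concatenate(([0.0], times_b)))

print(np.linalg.norm(times_a - times_b))          # distance in the time parameterisation
print(np.linalg.norm(intervals_a - intervals_b))  # distance in the interval parameterisation
```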
Ensemble-based prediction of RNA secondary structures.
Aghaeepour, Nima; Hoos, Holger H
2013-04-24
Accurate structure prediction methods play an important role in the understanding of RNA function. Energy-based, pseudoknot-free secondary structure prediction is one of the most widely used and versatile approaches, and improved methods for this task have received much attention over the past five years. Despite the impressive progress that has been achieved in this area, existing evaluations of the prediction accuracy achieved by various algorithms do not provide a comprehensive, statistically sound assessment. Furthermore, while there is increasing evidence that no prediction algorithm consistently outperforms all others, no work has been done to exploit the complementary strengths of multiple approaches. In this work, we present two contributions to the area of RNA secondary structure prediction. Firstly, we use state-of-the-art, resampling-based statistical methods together with a previously published and increasingly widely used dataset of high-quality RNA structures to conduct a comprehensive evaluation of existing RNA secondary structure prediction procedures. The results from this evaluation clarify the performance relationship between ten well-known existing energy-based pseudoknot-free RNA secondary structure prediction methods and clearly demonstrate the progress that has been achieved in recent years. Secondly, we introduce AveRNA, a generic and powerful method for combining a set of existing secondary structure prediction procedures into an ensemble-based method that achieves significantly higher prediction accuracies than obtained from any of its component procedures. Our new, ensemble-based method, AveRNA, improves the state of the art for energy-based, pseudoknot-free RNA secondary structure prediction by exploiting the complementary strengths of multiple existing prediction procedures, as demonstrated using a state-of-the-art statistical resampling approach. In addition, AveRNA allows an intuitive and effective control of the trade-off between false negative and false positive base pair predictions. Finally, AveRNA can make use of arbitrary sets of secondary structure prediction procedures and can therefore be used to leverage improvements in prediction accuracy offered by algorithms and energy models developed in the future. Our data, MATLAB software and a web-based version of AveRNA are publicly available at http://www.cs.ubc.ca/labs/beta/Software/AveRNA.
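The ensemble idea can be sketched as weighted voting over predicted base pairs, with a threshold that trades false positives against false negatives. The predictor names, weights and combination rule below are illustrative placeholders and do not reproduce AveRNA's actual algorithm.

```python
# Sketch of an ensemble of secondary-structure predictors combined by weighted
# voting over base pairs; the threshold controls the FP/FN trade-off.
# Predictions and weights are placeholders, not AveRNA's combination rule.
from collections import Counter

# each predictor returns a set of predicted base pairs (i, j)
predictions = {
    "method_A": {(1, 20), (2, 19), (5, 15)},
    "method_B": {(1, 20), (2, 19), (6, 14)},
    "method_C": {(1, 20), (3, 18), (5, 15)},
}
weights = {"method_A": 1.0, "method_B": 0.8, "method_C": 0.6}

votes = Counter()
for name, pairs in predictions.items():
    for bp in pairs:
        votes[bp] += weights[name]

threshold = 0.5 * sum(weights.values())     # raise to reduce false positives
consensus = {bp for bp, v in votes.items() if v >= threshold}
print(sorted(consensus))
```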
NASA Astrophysics Data System (ADS)
Payraudeau, S.; Tournoud, M. G.; Cernesson, F.
Distributed modelling in hydrology assesses catchment subdivision in order to take physical characteristics into account. In this paper, we test the effect of the land use aggregation scheme on the catchment hydrological response. The evolution of intra-subcatchment land use is studied using statistical and entropy methods. The SCS-CN method is used to calculate effective rainfall, which is here assimilated to the hydrological response. Our purpose is to determine whether a critical threshold area exists that is appropriate for the application of hydrological modelling. Land use aggregation effects on effective rainfall are assessed on a small Mediterranean catchment. The results show that land use aggregation and the land use classification type have significant effects on hydrological modelling and in particular on effective rainfall modelling.
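The SCS-CN calculation and the effect of land use aggregation can be sketched with the standard curve-number formula; the curve numbers, land use fractions and storm depth below are illustrative, and the usual initial abstraction Ia = 0.2·S is assumed.

```python
# Sketch: SCS-CN effective rainfall (direct runoff) for a storm event, with an
# area-weighted (aggregated) curve number for a subcatchment. Values are
# illustrative; Ia = 0.2*S is the conventional assumption.
def effective_rainfall(p_mm: float, cn: float) -> float:
    """Effective rainfall (mm) for storm depth p_mm and curve number cn."""
    s = 25400.0 / cn - 254.0     # potential maximum retention (mm)
    ia = 0.2 * s                 # initial abstraction (mm)
    if p_mm <= ia:
        return 0.0
    return (p_mm - ia) ** 2 / (p_mm - ia + s)

# land use class -> (area fraction, curve number); fractions sum to 1
land_use = {"forest": (0.40, 60), "pasture": (0.35, 74), "urban": (0.25, 90)}
cn_aggregated = sum(frac * cn for frac, cn in land_use.values())
print(f"aggregated CN = {cn_aggregated:.1f}, "
      f"effective rainfall = {effective_rainfall(50.0, cn_aggregated):.1f} mm")
```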
Identification of Chinese plague foci from long-term epidemiological data
Ben-Ari, Tamara; Neerinckx, Simon; Agier, Lydiane; Cazelles, Bernard; Xu, Lei; Zhang, Zhibin; Fang, Xiye; Wang, Shuchun; Liu, Qiyong; Stenseth, Nils C.
2012-01-01
Carrying out statistical analysis over an extensive dataset of human plague reports in Chinese villages from 1772 to 1964, we identified plague-endemic territories in China (i.e., plague foci). Analyses rely on (i) a clustering method that groups time series based on their time-frequency resemblances and (ii) an ecological niche model that helps identify plague-suitable territories characterized by value ranges for a set of predefined environmental variables. Results from both statistical tools indicate the existence of two disconnected plague territories corresponding to Northern and Southern China. Altogether, at least four well-defined independent foci are identified. Their contours compare favorably with field observations. The potential and limitations of inferring plague foci and dynamics from epidemiological data are discussed. PMID:22570501
Potential pitfalls when denoising resting state fMRI data using nuisance regression.
Bright, Molly G; Tench, Christopher R; Murphy, Kevin
2017-07-01
In resting state fMRI, it is necessary to remove signal variance associated with noise sources, leaving cleaned fMRI time-series that more accurately reflect the underlying intrinsic brain fluctuations of interest. This is commonly achieved through nuisance regression, in which a noise model of head motion and physiological processes is fitted to the fMRI data in a General Linear Model, and the "cleaned" residuals of this fit are used in further analysis. We examine the statistical assumptions and requirements of the General Linear Model, and whether these are met during nuisance regression of resting state fMRI data. Using toy examples and real data we show how pre-whitening, temporal filtering and temporal shifting of regressors impact model fit. Based on our own observations, existing literature, and statistical theory, we make the following recommendations when employing nuisance regression: pre-whitening should be applied to achieve valid statistical inference of the noise model fit parameters; temporal filtering should be incorporated into the noise model to best account for changes in degrees of freedom; temporal shifting of regressors, although merited, should be achieved via optimisation and validation of a single temporal shift. We encourage all readers to make simple, practical changes to their fMRI denoising pipeline, and to regularly assess the appropriateness of the noise model used. By negotiating the potential pitfalls described in this paper, and by clearly reporting the details of nuisance regression in future manuscripts, we hope that the field will achieve more accurate and precise noise models for cleaning the resting state fMRI time-series. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
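A bare-bones version of the nuisance regression step looks as follows; the motion and drift regressors are random placeholders for real traces, and the sketch deliberately omits the pre-whitening and filtering refinements recommended above.

```python
# Sketch: GLM nuisance regression of a resting-state voxel time-series against
# motion and drift regressors; the residuals are the "cleaned" data. For valid
# inference on the fitted betas, pre-whitening of autocorrelated noise would
# also be needed, and any temporal filter should be included in the model.
import numpy as np

rng = np.random.default_rng(0)
n_vols = 200
y = rng.standard_normal(n_vols)                        # one voxel's time-series (placeholder)
motion = rng.standard_normal((n_vols, 6))              # 6 head-motion parameters (placeholder)
drift = np.vander(np.linspace(-1, 1, n_vols), 4, increasing=True)  # constant + polynomial drift

X = np.column_stack([drift, motion])                   # nuisance design matrix
beta, *_ = np.linalg.lstsq(X, y, rcond=None)           # ordinary least-squares fit
cleaned = y - X @ beta                                 # residual ("cleaned") time-series
print(cleaned.shape)
```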
Computational Motion Phantoms and Statistical Models of Respiratory Motion
NASA Astrophysics Data System (ADS)
Ehrhardt, Jan; Klinder, Tobias; Lorenz, Cristian
Breathing motion is not a robust and 100 % reproducible process, and inter- and intra-fractional motion variations form an important problem in radiotherapy of the thorax and upper abdomen. A widespread consensus nowadays exists that it would be useful to use prior knowledge about respiratory organ motion and its variability to improve radiotherapy planning and treatment delivery. This chapter discusses two different approaches to model the variability of respiratory motion. In the first part, we review computational motion phantoms, i.e. computerized anatomical and physiological models. Computational phantoms are excellent tools to simulate and investigate the effects of organ motion in radiation therapy and to gain insight into methods for motion management. The second part of this chapter discusses statistical modeling techniques to describe the breathing motion and its variability in a population of 4D images. Population-based models can be generated from repeatedly acquired 4D images of the same patient (intra-patient models) and from 4D images of different patients (inter-patient models). The generation of those models is explained and possible applications of those models for motion prediction in radiotherapy are exemplified. Computational models of respiratory motion and motion variability have numerous applications in radiation therapy, e.g. to understand motion effects in simulation studies, to develop and evaluate treatment strategies or to introduce prior knowledge into the patient-specific treatment planning.
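A common way to build the population-based statistical motion models described above is a principal component analysis of displacement fields extracted from 4D images; the sketch below illustrates that generic construction with random placeholder data and is not the specific modelling techniques reviewed in the chapter.

```python
# Sketch: a PCA-based statistical model of respiratory motion variability.
# Each row is one subject's (flattened) displacement field; the arrays are
# random placeholders for registration results from 4D images.
import numpy as np

rng = np.random.default_rng(6)
n_subjects, n_points = 20, 3000
motion_fields = rng.standard_normal((n_subjects, n_points))

mean_motion = motion_fields.mean(axis=0)
centered = motion_fields - mean_motion
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
modes = Vt[:3]                               # first 3 variability modes
coeffs = centered @ modes.T                  # per-subject mode weights

# a new plausible motion field: mean plus a random weighting of the modes
new_field = mean_motion + (coeffs.std(axis=0) * rng.standard_normal(3)) @ modes
print(new_field.shape)
```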
Probabilistic neural networks modeling of the 48-h LC50 acute toxicity endpoint to Daphnia magna.
Niculescu, S P; Lewis, M A; Tigner, J
2008-01-01
Two modeling experiments based on the maximum likelihood estimation paradigm and targeting prediction of the Daphnia magna 48-h LC50 acute toxicity endpoint for both organic and inorganic compounds are reported. The resulting models' computational algorithms are implemented as basic probabilistic neural networks with Gaussian kernel (statistical corrections included). The first experiment uses strictly D. magna information for 971 structures as training/learning data, and the resulting model targets practical applications. The second experiment uses the same training/learning information plus additional data on another 29 compounds whose endpoint information originates from D. pulex and Ceriodaphnia dubia. It only targets investigation of the effect of mixing strictly D. magna 48-h LC50 modeling information with small amounts of similar information estimated from related species, and this is done as part of the validation process. A complementary dataset of 81 compounds (involving only strictly D. magna information) is used to perform external testing. On this external test set, the Gaussian character of the distribution of the residuals is confirmed for both models. This allows the use of traditional statistical methodology to implement computation of confidence intervals for the unknown measured values based on the models' predictions. Examples are provided for the model targeting practical applications. For the same model, a comparison with other existing models targeting the same endpoint is performed.
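The core of a Gaussian-kernel network for a continuous endpoint is a kernel-weighted average of the training endpoints; the sketch below shows that general idea (a GRNN-style estimator) with random placeholder descriptors and endpoints, and is not the authors' exact algorithm or statistical corrections.

```python
# Sketch of a Gaussian-kernel regression network (GRNN-style) for predicting a
# continuous endpoint such as log LC50; descriptors and endpoints below are
# random placeholders, and sigma is an illustrative smoothing parameter.
import numpy as np

def grnn_predict(x_query, x_train, y_train, sigma=1.0):
    """Kernel-weighted average of training endpoints (Gaussian kernel)."""
    d2 = ((x_train - x_query) ** 2).sum(axis=1)      # squared distances to training points
    w = np.exp(-d2 / (2.0 * sigma ** 2))             # Gaussian weights
    return (w @ y_train) / w.sum()

rng = np.random.default_rng(1)
x_train = rng.standard_normal((971, 5))              # molecular descriptors (placeholder)
y_train = rng.standard_normal(971)                   # log LC50 values (placeholder)
print(grnn_predict(rng.standard_normal(5), x_train, y_train, sigma=0.8))
```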
Microstructure development in Kolmogorov, Johnson-Mehl, and Avrami nucleation and growth kinetics
NASA Astrophysics Data System (ADS)
Pineda, Eloi; Crespo, Daniel
1999-08-01
A statistical model with the ability to evaluate the microstructure developed in nucleation and growth kinetics is built in the framework of the Kolmogorov, Johnson-Mehl, and Avrami theory. A populational approach is used to compute the observed grain-size distribution. The impingement process which delays grain growth is analyzed, and the effective growth rate of each population is estimated considering the previous grain history. The proposed model is integrated for a wide range of nucleation and growth protocols, including constant nucleation, pre-existing nuclei, and intermittent nucleation with interface or diffusion-controlled grain growth. The results are compared with Monte Carlo simulations, giving quantitative agreement even in cases where previous models fail.
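The populational grain-size approach goes beyond the closed-form KJMA expressions, but the baseline kinetics for two of the nucleation protocols named above can be written down directly; the rates in the sketch below are illustrative and interface-controlled growth at constant rate in 3D is assumed.

```python
# Sketch: classical KJMA (Avrami) transformed fraction for two nucleation
# protocols: constant nucleation rate I and pre-existing nuclei of density N0,
# with interface-controlled growth at constant rate G in three dimensions.
import numpy as np

def avrami_constant_nucleation(t, I, G):
    """X(t) = 1 - exp(-(pi/3) * I * G^3 * t^4) for constant nucleation rate I."""
    return 1.0 - np.exp(-(np.pi / 3.0) * I * G**3 * t**4)

def avrami_preexisting_nuclei(t, N0, G):
    """X(t) = 1 - exp(-(4*pi/3) * N0 * G^3 * t^3) for pre-existing nuclei density N0."""
    return 1.0 - np.exp(-(4.0 * np.pi / 3.0) * N0 * G**3 * t**3)

t = np.linspace(0.0, 10.0, 6)                 # illustrative times
print(avrami_constant_nucleation(t, I=1e-3, G=0.1))
print(avrami_preexisting_nuclei(t, N0=1e-2, G=0.1))
```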
Measuring health and disability: supporting policy development. The European MHADIE project.
Leonardi, Matilde
2010-01-01
Disability is a multi-dimensional phenomenon arising out of an interaction between the individual's health status and his environment: disability data must reflect this bio-psychosocial model. WHO's International Classification of Functioning, Disability and Health (ICF) provides the framework for documenting the interaction between health status and environmental features. MHADIE, a 3-year project supported by an EC 6th Framework Programme grant, aimed at demonstrating the feasibility and utility of the ICF model in the measurement and description of disability. The ICF model was used as the structure for analysing existing population health surveys and education statistics data. ICF-based tools were used to describe disability in selected health conditions. MHADIE researchers showed that the ICF model is adequate for describing and measuring patterns of disability in clinical samples from different countries cross-sectionally and over time, as well as feasible and useful in educational sectors. Valid and reliable information is essential to design, implement or evaluate policies to combat discrimination, promote integration and enhance opportunities. Results made it possible to produce a definition of disability as well as policy recommendations concerning how, in Europe and internationally, the existing sources of data can be harmonized with the ICF model.
Brown, Jeremiah R; MacKenzie, Todd A; Maddox, Thomas M; Fly, James; Tsai, Thomas T; Plomondon, Mary E; Nielson, Christopher D; Siew, Edward D; Resnic, Frederic S; Baker, Clifton R; Rumsfeld, John S; Matheny, Michael E
2015-12-11
Acute kidney injury (AKI) occurs frequently after cardiac catheterization and percutaneous coronary intervention. Although a clinical risk model exists for percutaneous coronary intervention, no models exist for both procedures, nor do existing models account for risk factors prior to the index admission. We aimed to develop such a model for use in prospective automated surveillance programs in the Veterans Health Administration. We collected data on all patients undergoing cardiac catheterization or percutaneous coronary intervention in the Veterans Health Administration from January 01, 2009 to September 30, 2013, excluding patients with chronic dialysis, end-stage renal disease, renal transplant, and missing pre- and postprocedural creatinine measurement. We used 4 AKI definitions in model development and included risk factors from up to 1 year prior to the procedure and at presentation. We developed our prediction models for postprocedural AKI using the least absolute shrinkage and selection operator (LASSO) and internally validated using bootstrapping. We developed models using 115 633 angiogram procedures and externally validated using 27 905 procedures from a New England cohort. Models had cross-validated C-statistics of 0.74 (95% CI: 0.74-0.75) for AKI, 0.83 (95% CI: 0.82-0.84) for AKIN2, 0.74 (95% CI: 0.74-0.75) for contrast-induced nephropathy, and 0.89 (95% CI: 0.87-0.90) for dialysis. We developed a robust, externally validated clinical prediction model for AKI following cardiac catheterization or percutaneous coronary intervention to automatically identify high-risk patients before and immediately after a procedure in the Veterans Health Administration. Work is ongoing to incorporate these models into routine clinical practice. © 2015 The Authors. Published on behalf of the American Heart Association, Inc., by Wiley Blackwell.
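A LASSO-penalised logistic regression with a cross-validated C-statistic, as used for the risk models above, can be sketched with synthetic data; the scikit-learn calls below are one possible implementation for illustration, not the study's software or the VA cohort.

```python
# Sketch: LASSO-penalised logistic regression for a binary outcome such as
# post-procedural AKI, with a cross-validated C-statistic (ROC AUC).
# Data are synthetic and class-imbalanced to mimic an uncommon outcome.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=5000, n_features=40, n_informative=8,
                           weights=[0.93, 0.07], random_state=0)

lasso_logit = LogisticRegressionCV(penalty="l1", solver="liblinear",
                                   Cs=10, cv=5, scoring="roc_auc")
auc = cross_val_score(lasso_logit, X, y, cv=5, scoring="roc_auc")
print(f"cross-validated C-statistic: {auc.mean():.2f}")
```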
Milic, Natasa M.; Trajkovic, Goran Z.; Bukumiric, Zoran M.; Cirkovic, Andja; Nikolic, Ivan M.; Milin, Jelena S.; Milic, Nikola V.; Savic, Marko D.; Corac, Aleksandar M.; Marinkovic, Jelena M.; Stanisavljevic, Dejana M.
2016-01-01
Background Although recent studies report on the benefits of blended learning in improving medical student education, there is still no empirical evidence on the relative effectiveness of blended over traditional learning approaches in medical statistics. We implemented blended along with on-site (i.e. face-to-face) learning to further assess the potential value of web-based learning in medical statistics. Methods This was a prospective study conducted with third year medical undergraduate students attending the Faculty of Medicine, University of Belgrade, who passed (440 of 545) the final exam of the obligatory introductory statistics course during 2013–14. Student statistics achievements were stratified based on the two methods of education delivery: blended learning and on-site learning. Blended learning included a combination of face-to-face and distance learning methodologies integrated into a single course. Results Mean exam scores for the blended learning student group were higher than for the on-site student group for both final statistics score (89.36±6.60 vs. 86.06±8.48; p = 0.001) and knowledge test score (7.88±1.30 vs. 7.51±1.36; p = 0.023) with a medium effect size. There were no differences in sex or study duration between the groups. Current grade point average (GPA) was higher in the blended group. In a multivariable regression model, current GPA and knowledge test scores were associated with the final statistics score after adjusting for study duration and learning modality (p<0.001). Conclusion This study provides empirical evidence to support educator decisions to implement different learning environments for teaching medical statistics to undergraduate medical students. Blended and on-site training formats led to similar knowledge acquisition; however, students with higher GPA preferred the technology assisted learning format. Implementation of blended learning approaches can be considered an attractive, cost-effective, and efficient alternative to traditional classroom training in medical statistics. PMID:26859832
Milic, Natasa M; Trajkovic, Goran Z; Bukumiric, Zoran M; Cirkovic, Andja; Nikolic, Ivan M; Milin, Jelena S; Milic, Nikola V; Savic, Marko D; Corac, Aleksandar M; Marinkovic, Jelena M; Stanisavljevic, Dejana M
2016-01-01
Although recent studies report on the benefits of blended learning in improving medical student education, there is still no empirical evidence on the relative effectiveness of blended over traditional learning approaches in medical statistics. We implemented blended along with on-site (i.e. face-to-face) learning to further assess the potential value of web-based learning in medical statistics. This was a prospective study conducted with third year medical undergraduate students attending the Faculty of Medicine, University of Belgrade, who passed (440 of 545) the final exam of the obligatory introductory statistics course during 2013-14. Student statistics achievements were stratified based on the two methods of education delivery: blended learning and on-site learning. Blended learning included a combination of face-to-face and distance learning methodologies integrated into a single course. Mean exam scores for the blended learning student group were higher than for the on-site student group for both final statistics score (89.36±6.60 vs. 86.06±8.48; p = 0.001) and knowledge test score (7.88±1.30 vs. 7.51±1.36; p = 0.023) with a medium effect size. There were no differences in sex or study duration between the groups. Current grade point average (GPA) was higher in the blended group. In a multivariable regression model, current GPA and knowledge test scores were associated with the final statistics score after adjusting for study duration and learning modality (p<0.001). This study provides empirical evidence to support educator decisions to implement different learning environments for teaching medical statistics to undergraduate medical students. Blended and on-site training formats led to similar knowledge acquisition; however, students with higher GPA preferred the technology assisted learning format. Implementation of blended learning approaches can be considered an attractive, cost-effective, and efficient alternative to traditional classroom training in medical statistics.
Adaptive Error Estimation in Linearized Ocean General Circulation Models
NASA Technical Reports Server (NTRS)
Chechelnitsky, Michael Y.
1999-01-01
Data assimilation methods are routinely used in oceanography. The statistics of the model and measurement errors need to be specified a priori. This study addresses the problem of estimating model and measurement error statistics from observations. We start by testing innovation-based methods of adaptive error estimation, applying low-dimensional models in the North Pacific (5-60 deg N, 132-252 deg E) to TOPEX/POSEIDON (T/P) sea level anomaly data, acoustic tomography data from the ATOC project, and the MIT General Circulation Model (GCM). A reduced state linear model that describes large scale internal (baroclinic) error dynamics is used. The methods are shown to be sensitive to the initial guess for the error statistics and the type of observations. A new off-line approach is developed, the covariance matching approach (CMA), where covariance matrices of model-data residuals are "matched" to their theoretical expectations using familiar least squares methods. This method uses observations directly instead of the innovations sequence and is shown to be related to the MT method and the method of Fu et al. (1993). Twin experiments using the same linearized MIT GCM suggest that altimetric data are ill-suited to the estimation of internal GCM errors, but that such estimates can in theory be obtained using acoustic data. The CMA is then applied to T/P sea level anomaly data and a linearization of a global GFDL GCM which uses two vertical modes. We show that the CMA method can be used with a global model and a global data set, and that the estimates of the error statistics are robust. We show that the fraction of the GCM-T/P residual variance explained by the model error is larger than that derived in Fukumori et al. (1999) with the method of Fu et al. (1993). Most of the model error is explained by the barotropic mode. However, we find that the impact of the change in the error statistics on the data assimilation estimates is very small. This is explained by the large representation error, i.e. the dominance of the mesoscale eddies in the T/P signal, which are not part of the 2° by 1° GCM. Therefore, the impact of the observations on the assimilation is very small even after the adjustment of the error statistics. This work demonstrates that simultaneous estimation of the model and measurement error statistics for data assimilation with global ocean data sets and linearized GCMs is possible. However, the error covariance estimation problem is in general highly underdetermined, much more so than the state estimation problem. In other words, there exist a very large number of statistical models that can be made consistent with the available data. Therefore, methods for obtaining quantitative error estimates, powerful though they may be, cannot replace physical insight. Used in the right context, as a tool for guiding the choice of a small number of model error parameters, covariance matching can be a useful addition to the repertory of tools available to oceanographers.
2011-09-30
by Rosalind M. Rolland, Susan E. Parks, Kathleen E. Hunt, Manuel Castellote, Peter J. Corkeron, Douglas P. Nowacek, Samuel K. Wasser and Scott D...Partitioning. Journal of Computational and Graphical Statistics. 15(3): 651-674. Hunt KE, Rolland RM, Kraus SD, Wasser SK. 2006. Analysis of fecal...KE, Kraus SD, Wasser SK. 2005. Assessing reproductive status of right whales (Eubalaena glacialis) using fecal hormone metabolites. General and
Computing Pathways for Urban Decarbonization.
NASA Astrophysics Data System (ADS)
Cremades, R.; Sommer, P.
2016-12-01
Urban areas emit roughly three quarters of global carbon emissions. Cities are crucial elements for a decarbonized society. Urban expansion and related transportation needs lead to increased energy use and to carbon-intensive lock-ins that create barriers for climate change mitigation globally. The authors present the Integrated Urban Complexity (IUC) model, based on self-organizing Cellular Automata (CA), and use it to produce a new kind of spatially explicit Transformation Pathways for Urban Decarbonization (TPUD). IUC is based on statistical evidence relating the energy needed for transportation to the spatial distribution of population; specifically, IUC incorporates variables from complexity science related to urban form, such as the slope of the rank-size rule or spatial entropy, which brings IUC a step beyond existing models. The CA starts its evolution with real-world urban land use and population distribution data from the Global Human Settlement Layer. Thus, the IUC model runs over existing urban settlements, transforming the spatial distribution of population so that the energy consumption for transportation is minimized. The statistical evidence that governs the evolution of the CA is drawn from the database of the International Association of Public Transport. A selected case is presented using Stuttgart (Germany) as an example. The results show how IUC varies urban density in those places where it improves the performance of crucial parameters related to urban form, producing a TPUD that shows where the spatial distribution of population should be modified at a cell-size resolution of 250 meters. The TPUD shows how the urban complex system evolves over time to minimize energy consumption for transportation. The resulting dynamics of urban decarbonization show decreased energy per capita, although total energy increases for increasing population. The results provide innovative insights: by checking current urban planning against a TPUD, urban planners could understand where existing plans contradict the Agenda 2030, primarily the Sustainable Development Goals (SDGs) Climate Action (SDG 13) and Sustainable Cities and Communities (SDG 11). For the first time, evidence-based transformation pathways are produced to decarbonize cities.
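Two of the urban-form indicators mentioned above can be computed directly from a gridded population layer; the sketch below uses a random placeholder grid rather than Global Human Settlement Layer data, and is illustrative only.

```python
# Sketch: two urban-form indicators from a gridded population layer:
# normalised spatial (Shannon) entropy of population shares and the slope of
# the rank-size rule. The population grid is a random placeholder.
import numpy as np

rng = np.random.default_rng(5)
pop = rng.pareto(1.5, size=(100, 100)) + 1.0                 # gridded population (placeholder)

p = pop / pop.sum()
spatial_entropy = -(p * np.log(p)).sum() / np.log(p.size)    # normalised to [0, 1]

ranked = np.sort(pop.ravel())[::-1]                          # cells ranked by population
ranks = np.arange(1, ranked.size + 1)
slope = np.polyfit(np.log(ranks), np.log(ranked), 1)[0]      # rank-size exponent

print(f"spatial entropy = {spatial_entropy:.3f}, rank-size slope = {slope:.2f}")
```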
Teachers and Textbooks: On Statistical Definitions in Senior Secondary Mathematics
ERIC Educational Resources Information Center
Dunn, Peter K.; Marshman, Margaret; McDougall, Robert; Wiegand, Aaron
2015-01-01
The new "Australian Senior Secondary Curriculum: Mathematics" contains more statistics than the existing Australian Curricula. This case study examines how a group of Queensland mathematics teachers define the word "statistics" and five statistical terms from the new curricula. These definitions are compared to those used in…
Six Sigma and Introductory Statistics Education
ERIC Educational Resources Information Center
Maleyeff, John; Kaminsky, Frank C.
2002-01-01
A conflict exists between the way statistics is practiced in contemporary business environments and the way statistics is taught in schools of management. While businesses are embracing programs, such as six sigma and TQM, that bring statistical methods to the forefront of management decision making, students do not graduate with the skills to…
ERIC Educational Resources Information Center
Nolan, Meaghan M.; Beran, Tanya; Hecker, Kent G.
2012-01-01
Students with positive attitudes toward statistics are likely to show strong academic performance in statistics courses. Multiple surveys measuring students' attitudes toward statistics exist; however, a comparison of the validity and reliability of interpretations based on their scores is needed. A systematic review of relevant electronic…
A Tablet-PC Software Application for Statistics Classes
ERIC Educational Resources Information Center
Probst, Alexandre C.
2014-01-01
A significant deficiency in the area of introductory statistics education exists: Student performance on standardized assessments after a full semester statistics course is poor and students report a very low desire to learn statistics. Research on the current generation of students indicates an affinity for technology and for multitasking.…
Primary prevention of dental erosion by calcium and fluoride: a systematic review.
Zini, A; Krivoroutski, Y; Vered, Y
2014-02-01
Overviews of the current literature only provide summaries of existing relevant preventive strategies for dental erosion. To perform a systematic review according to the quantitative meta-analysis method of the scientific literature on prevention of dental erosion. The focused question will address primary prevention of dental erosion by calcium and fluoride. Randomized clinical trials (RCTs) regarding dental erosion prevention. The search included five databases: Embase, Cochrane database of systematic reviews, PubMed (MEDLINE), FDA publication and Berman medical library of the Hebrew University. The search included data in the English language, with effect on preventing dental erosion always presented as mean enamel loss and measured by profilometer. Statistical meta-analysis was performed by StatsDirect program and PEPI statistical software. Fixed- and random-effect models were used to analyse the data. Heterogeneity tests were employed to validate the fixed-effect model assumption. A total of 475 articles on dental erosion prevention were located. A four-stage selection process was employed, and 10 RCT articles were found to be suitable for meta-analysis. The number of studies on prevention of dental erosion maintaining standards of evidence-based dentistry remains insufficient to reach any definite conclusions. The focused questions of this review cannot be addressed according to the existing literature. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
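The fixed-effect pooling step of such a meta-analysis is inverse-variance weighting, with Cochran's Q used to check the fixed-effect assumption; the sketch below uses synthetic effects rather than the review's data, and plain Python/SciPy rather than the StatsDirect and PEPI software named above.

```python
# Sketch: inverse-variance fixed-effect pooling of study effects (e.g.,
# differences in mean enamel loss), with Cochran's Q heterogeneity test.
# Effect estimates and standard errors are synthetic.
import numpy as np
from scipy import stats

effects = np.array([-0.8, -0.5, -1.1, -0.6])   # per-study effect estimates
se = np.array([0.30, 0.25, 0.40, 0.35])        # their standard errors

w = 1.0 / se**2                                # inverse-variance weights
pooled = (w * effects).sum() / w.sum()
pooled_se = np.sqrt(1.0 / w.sum())

q = (w * (effects - pooled) ** 2).sum()        # Cochran's Q
p_het = stats.chi2.sf(q, df=len(effects) - 1)  # heterogeneity p-value
print(f"pooled effect = {pooled:.2f} +/- {1.96 * pooled_se:.2f}, Q p = {p_het:.2f}")
```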
Wind Energy Facilities and Residential Properties: The Effect of Proximity and View on Sales Prices
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hoen, Ben; Wiser, Ryan; Cappers, Peter
2010-04-01
With wind energy expanding rapidly in the U.S. and abroad, and with an increasing number of communities considering nearby wind power developments, there is a need to empirically investigate community concerns about wind project development. One such concern is that property values may be adversely affected by wind energy facilities, and relatively little research exists on the subject. The present research is based on almost 7,500 sales of single-family homes situated within ten miles of 24 existing wind facilities in nine different U.S. states. The conclusions of the study are drawn from four different hedonic pricing models. The model results are consistent in that neither the view of the wind facilities nor the distance of the home to those facilities is found to have a statistically significant effect on home sales prices.
NASA Astrophysics Data System (ADS)
Figueroa-Morales, N.; Rivera, A.; Altshuler, E.; Darnige, T.; Douarche, C.; Soto, R.; Lindner, A.; Clément, E.
The motility of E. coli bacteria is described as a run-and-tumble process. Changes of direction correspond to a switch in the flagellar motor rotation. The run time distribution is classically described as an exponential decay with a characteristic time close to 1 s. Remarkably, it has been demonstrated that the generic distribution of run times is not exponential but a heavy-tailed power-law decay, which is at odds with the classical motility description. We investigate the consequences of the motor statistics for macroscopic bacterial transport. During upstream contamination processes in very confined channels, we have identified very long contamination tongues. Using a stochastic model in which bacterial dwelling times on the surfaces are related to the run times, we are able to reproduce qualitatively and quantitatively the evolution of the contamination profiles when the power-law run time distribution is considered. However, the model fails to reproduce the qualitative dynamics when the classical exponential run-and-tumble distribution is considered. Moreover, we have corroborated the existence of a power-law run time distribution by means of 3D Lagrangian tracking. We then argue that the macroscopic transport of bacteria is essentially determined by the motor rotation statistics.
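The difference between the two run-time statistics can be illustrated by sampling both distributions and comparing their tails; the power-law exponent and cut-off below are illustrative choices, not fitted values from the tracking experiments.

```python
# Sketch: exponential run times (mean ~1 s) versus a heavy-tailed power-law
# alternative p(t) ~ t^-(1+alpha) for t > t_min, sampled by inverse-CDF.
# alpha and t_min are illustrative, chosen so the means are comparable.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

runs_exp = rng.exponential(scale=1.0, size=n)               # classical run-and-tumble
alpha, t_min = 1.8, 0.5
runs_pow = t_min * (1.0 - rng.random(n)) ** (-1.0 / alpha)  # Pareto-tailed run times

for name, runs in [("exponential", runs_exp), ("power law", runs_pow)]:
    print(f"{name}: mean = {runs.mean():.2f} s, "
          f"fraction of runs > 10 s = {(runs > 10).mean():.4f}")
```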
No-Impact Threshold Values for NRAP's Reduced Order Models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Last, George V.; Murray, Christopher J.; Brown, Christopher F.
2013-02-01
The purpose of this study was to develop methodologies for establishing baseline datasets and statistical protocols for determining statistically significant changes between background concentrations and predicted concentrations that would be used to represent a contamination plume in the Gen II models being developed by NRAP's Groundwater Protection team. The initial effort examined selected portions of two aquifer systems: the urban shallow unconfined aquifer system of the Edwards-Trinity Aquifer System (being used to develop the ROM for carbonate-rock aquifers) and a portion of the High Plains Aquifer (an unconsolidated and semi-consolidated sand and gravel aquifer, being used to develop the ROM for sandstone aquifers). Threshold values were determined for Cd, Pb, As, pH, and TDS that could be used to identify contamination due to predicted impacts from carbon sequestration storage reservoirs, based on recommendations found in the EPA's "Unified Guidance for Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities" (US Environmental Protection Agency 2009). Results from this effort can be used to inform a "no change" scenario with respect to groundwater impacts, rather than the use of an MCL that could be significantly higher than existing concentrations in the aquifer.
Probability model for atmospheric sulfur dioxide concentrations in the area of Venice
DOE Office of Scientific and Technical Information (OSTI.GOV)
Buttazzoni, C.; Lavagnini, I.; Marani, A.
1986-09-01
This paper deals with a comparative screening of existing air quality models based on their ability to simulate the distribution of sulfur dioxide data in the Venetian area. Investigations have been carried out on sulfur dioxide dispersion in the atmosphere of the Venetian area. The studies have mainly focused on transport models (Gaussian plume and K-models) aiming at meaningful correlations of sources and receptors. Among the results, a noteworthy disagreement between simulated and experimental data has been shown, due to the lack of thorough knowledge of source field conditions and of the local meteorology of the sea-land transition area. Investigations with receptor-oriented models (based, e.g., on time series analysis, Fourier analysis, or statistical distributions) have also been performed.
Strelioff, Christopher C; Crutchfield, James P; Hübler, Alfred W
2007-07-01
Markov chains are a natural and well-understood tool for describing one-dimensional patterns in time or space. We show how to infer kth-order Markov chains, for arbitrary k, from finite data by applying Bayesian methods to both parameter estimation and model-order selection. Extending existing results for multinomial models of discrete data, we connect inference to statistical mechanics through information-theoretic (type theory) techniques. We establish a direct relationship between Bayesian evidence and the partition function which allows for straightforward calculation of the expectation and variance of the conditional relative entropy and the source entropy rate. Finally, we introduce a method that uses finite data-size scaling with model-order comparison to infer the structure of out-of-class processes.
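The Bayesian evidence used for model-order selection has a closed form for a Dirichlet prior on each context's transition probabilities; the sketch below shows that calculation for a binary alphabet and a symmetric Dirichlet(1) prior, as a simplified version of the type of computation described rather than the authors' code.

```python
# Sketch: log-evidence of a kth-order Markov chain over a binary alphabet
# under a symmetric Dirichlet(beta) prior, used for model-order comparison.
from collections import defaultdict
from math import lgamma

def log_evidence(seq, k, alphabet=("0", "1"), beta=1.0):
    """Log marginal likelihood of a kth-order Markov model (Dirichlet(beta) prior)."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(k, len(seq)):
        counts[seq[i - k:i]][seq[i]] += 1          # context -> next-symbol counts
    a = len(alphabet)
    logz = 0.0
    for ctx, nxt in counts.items():
        n = sum(nxt.values())
        logz += lgamma(a * beta) - lgamma(a * beta + n)
        for s in alphabet:
            logz += lgamma(beta + nxt.get(s, 0)) - lgamma(beta)
    return logz

seq = "0110101101101011010110110101101011011010" * 10   # toy binary sequence
for k in (1, 2, 3):
    print(k, round(log_evidence(seq, k), 2))
```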
An open-access CMIP5 pattern library for temperature and precipitation: description and methodology
NASA Astrophysics Data System (ADS)
Lynch, Cary; Hartin, Corinne; Bond-Lamberty, Ben; Kravitz, Ben
2017-05-01
Pattern scaling is used to efficiently emulate general circulation models and explore uncertainty in climate projections under multiple forcing scenarios. Pattern scaling methods assume that local climate changes scale with a global mean temperature increase, allowing for spatial patterns to be generated for multiple models for any future emission scenario. For uncertainty quantification and probabilistic statistical analysis, a library of patterns with descriptive statistics for each file would be beneficial, but such a library does not presently exist. Of the possible techniques used to generate patterns, the two most prominent are the delta and least squares regression methods. We explore the differences and statistical significance between patterns generated by each method and assess performance of the generated patterns across methods and scenarios. Differences in patterns across seasons between methods and epochs were largest in high latitudes (60-90° N/S). Bias and mean errors between modeled and pattern-predicted output from the linear regression method were smaller than patterns generated by the delta method. Across scenarios, differences in the linear regression method patterns were more statistically significant, especially at high latitudes. We found that pattern generation methodologies were able to approximate the forced signal of change to within ≤ 0.5 °C, but the choice of pattern generation methodology for pattern scaling purposes should be informed by user goals and criteria. This paper describes our library of least squares regression patterns from all CMIP5 models for temperature and precipitation on an annual and sub-annual basis, along with the code used to generate these patterns. The dataset and netCDF data generation code are available at doi:10.5281/zenodo.495632.
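The least squares regression method above amounts to regressing each grid cell's anomaly on the global-mean temperature anomaly across time; the sketch below shows that per-cell slope calculation with random placeholder arrays standing in for CMIP5 output (and an unweighted global mean for brevity).

```python
# Sketch: a least-squares regression "pattern" obtained by regressing each
# grid cell's anomaly on the global-mean temperature anomaly across years.
# Arrays are random placeholders for CMIP5 model output.
import numpy as np

rng = np.random.default_rng(3)
n_years, n_lat, n_lon = 100, 36, 72
local = rng.standard_normal((n_years, n_lat, n_lon)).cumsum(axis=0) * 0.02
global_mean = local.mean(axis=(1, 2))                  # crude, unweighted global mean

# per-cell slope: cov(local, global) / var(global)
g = global_mean - global_mean.mean()
l = local - local.mean(axis=0)
pattern = np.tensordot(g, l, axes=(0, 0)) / (g ** 2).sum()   # degC per degC of global warming
print(pattern.shape)                                   # (36, 72) scaling pattern
```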
Uncertainties of statistical downscaling from predictor selection: Equifinality and transferability
NASA Astrophysics Data System (ADS)
Fu, Guobin; Charles, Stephen P.; Chiew, Francis H. S.; Ekström, Marie; Potter, Nick J.
2018-05-01
The nonhomogeneous hidden Markov model (NHMM) statistical downscaling model, 38 catchments in southeast Australia, and 19 general circulation models (GCMs) were used in this study to demonstrate statistical downscaling uncertainties caused by equifinality and transferability. That is to say, there could be multiple sets of predictors that give similar daily rainfall simulation results for both calibration and validation periods, but project different magnitudes (or even directions) of rainfall change in the future. Results indicated that two sets of predictors (Set 1, with predictors of sea level pressure north-south gradient, u-wind at 700 hPa, v-wind at 700 hPa, and specific humidity at 700 hPa, and Set 2, with predictors of sea level pressure north-south gradient, u-wind at 700 hPa, v-wind at 700 hPa, and dewpoint temperature depression at 850 hPa) as inputs to the NHMM produced satisfactory simulations of seasonal rainfall in comparison with observations. For example, during the model calibration period, the relative errors across the 38 catchments ranged from 0.48 to 1.76% with a mean value of 1.09% for predictor Set 1, and from 0.22 to 2.24% with a mean value of 1.16% for predictor Set 2. However, the projected changes of future rainfall from the NHMM based on 19 GCMs had different signs for these two sets of predictors: Set 1 predictors project an increase of future rainfall, with magnitudes depending on the future time period and emission scenario, whereas Set 2 predictors project a decline of future rainfall. Such divergent projections may present a significant challenge for applications of statistical downscaling as well as for climate change impact studies, and could potentially imply caveats in many existing studies in the literature.
Sheng, Ben; Marsh, Kimberly; Slavkovic, Aleksandra B; Gregson, Simon; Eaton, Jeffrey W; Bao, Le
2017-04-01
HIV prevalence data collected from routine HIV testing of pregnant women at antenatal clinics (ANC-RT) are potentially available from all facilities that offer testing services to pregnant women and can be used to improve estimates of national and subnational HIV prevalence trends. We develop methods to incorporate these new data source into the Joint United Nations Programme on AIDS Estimation and Projection Package in Spectrum 2017. We develop a new statistical model for incorporating ANC-RT HIV prevalence data, aggregated either to the health facility level (site-level) or regionally (census-level), to estimate HIV prevalence alongside existing sources of HIV prevalence data from ANC unlinked anonymous testing (ANC-UAT) and household-based national population surveys. Synthetic data are generated to understand how the availability of ANC-RT data affects the accuracy of various parameter estimates. We estimate HIV prevalence and additional parameters using both ANC-RT and other existing data. Fitting HIV prevalence using synthetic data generally gives precise estimates of the underlying trend and other parameters. More years of ANC-RT data should improve prevalence estimates. More ANC-RT sites and continuation with existing ANC-UAT sites may improve the estimate of calibration between ANC-UAT and ANC-RT sites. We have proposed methods to incorporate ANC-RT data into Spectrum to obtain more precise estimates of prevalence and other measures of the epidemic. Many assumptions about the accuracy, consistency, and representativeness of ANC-RT prevalence underlie the use of these data for monitoring HIV epidemic trends and should be tested as more data become available from national ANC-RT programs.
Sheng, Ben; Marsh, Kimberly; Slavkovic, Aleksandra B.; Gregson, Simon; Eaton, Jeffrey W.; Bao, Le
2017-01-01
Objective HIV prevalence data collected from routine HIV testing of pregnant women at antenatal clinics (ANC-RT) are potentially available from all facilities that offer testing services to pregnant women, and can be used to improve estimates of national and sub-national HIV prevalence trends. We develop methods to incorporate this new data source into the UNAIDS Estimation and Projection Package (EPP) in Spectrum 2017. Methods We develop a new statistical model for incorporating ANC-RT HIV prevalence data, aggregated either to the health facility level (‘site-level’) or regionally (‘census-level’), to estimate HIV prevalence alongside existing sources of HIV prevalence data from ANC unlinked anonymous testing (ANC-UAT) and household-based national population surveys. Synthetic data are generated to understand how the availability of ANC-RT data affects the accuracy of various parameter estimates. Results We estimate HIV prevalence and additional parameters using both ANC-RT and other existing data. Fitting HIV prevalence using synthetic data generally gives precise estimates of the underlying trend and other parameters. More years of ANC-RT data should improve prevalence estimates. More ANC-RT sites and continuation with existing ANC-UAT sites may improve the estimate of calibration between ANC-UAT and ANC-RT sites. Conclusion We have proposed methods to incorporate ANC-RT data into Spectrum to obtain more precise estimates of prevalence and other measures of the epidemic. Many assumptions about the accuracy, consistency, and representativeness of ANC-RT prevalence underlie the use of these data for monitoring HIV epidemic trends, and should be tested as more data become available from national ANC-RT programs. PMID:28296804
The Society of Thoracic Surgeons Congenital Heart Surgery Database Public Reporting Initiative.
Jacobs, Jeffrey P
2017-01-01
Three basic principles provide the rationale for the Society of Thoracic Surgeons (STS) Congenital Heart Surgery Database (CHSD) public reporting initiative: (1) variation in congenital and pediatric cardiac surgical outcomes exists; (2) patients and their families have the right to know the outcomes of the treatments that they will receive; and (3) it is our professional responsibility to share this information with them in a format they can understand. The STS CHSD public reporting initiative facilitates the voluntary, transparent public reporting of congenital and pediatric cardiac surgical outcomes using the STS CHSD Mortality Risk Model. The STS CHSD Mortality Risk Model is used to calculate risk-adjusted operative mortality and adjusts for the following variables: age, primary procedure, weight (neonates and infants), prior cardiothoracic operations, non-cardiac congenital anatomic abnormalities, chromosomal abnormalities or syndromes, prematurity (neonates and infants), and preoperative factors (including preoperative/preprocedural mechanical circulatory support [intraaortic balloon pump, ventricular assist device, extracorporeal membrane oxygenation, or cardiopulmonary support], shock [persistent at time of surgery], mechanical ventilation to treat cardiorespiratory failure, renal failure requiring dialysis and/or renal dysfunction, preoperative neurological deficit, and other preoperative factors). Operative mortality is defined in all STS databases as (1) all deaths, regardless of cause, occurring during the hospitalization in which the operation was performed, even if after 30 days (including patients transferred to other acute care facilities); and (2) all deaths, regardless of cause, occurring after discharge from the hospital, but before the end of the 30th postoperative day. The STS CHSD Mortality Risk Model has good model fit and discrimination, with overall C statistics of 0.875 and 0.858 in the development sample and the validation sample, respectively. These are the highest C statistics ever seen in a pediatric cardiac surgical risk model. Therefore, the STS CHSD Mortality Risk Model provides excellent adjustment for case mix and should mitigate against risk-averse behavior. The STS CHSD Mortality Risk Model is the best available model to date for measuring outcomes after pediatric cardiac surgery. As of March 2016, 60% of participants in the STS CHSD have agreed to publicly report their outcomes through the STS Public Reporting Online website (http://www.sts.org/quality-research-patient-safety/sts-public-reporting-online). Although several opportunities exist to improve our risk models, the current STS CHSD public reporting initiative provides the tools to report publicly, and with meaning and accuracy, the outcomes of congenital and pediatric cardiac surgery. Copyright © 2017 Elsevier Inc. All rights reserved.
On the Use of Statistics in Design and the Implications for Deterministic Computer Experiments
NASA Technical Reports Server (NTRS)
Simpson, Timothy W.; Peplinski, Jesse; Koch, Patrick N.; Allen, Janet K.
1997-01-01
Perhaps the most prevalent use of statistics in engineering design is through Taguchi's parameter and robust design -- using orthogonal arrays to compute signal-to-noise ratios in a process of design improvement. In our view, however, there is an equally exciting use of statistics in design that could become just as prevalent: it is the concept of metamodeling whereby statistical models are built to approximate detailed computer analysis codes. Although computers continue to get faster, analysis codes always seem to keep pace so that their computational time remains non-trivial. Through metamodeling, approximations of these codes are built that are orders of magnitude cheaper to run. These metamodels can then be linked to optimization routines for fast analysis, or they can serve as a bridge for integrating analysis codes across different domains. In this paper we first review metamodeling techniques that encompass design of experiments, response surface methodology, Taguchi methods, neural networks, inductive learning, and kriging. We discuss their existing applications in engineering design and then address the dangers of applying traditional statistical techniques to approximate deterministic computer analysis codes. We conclude with recommendations for the appropriate use of metamodeling techniques in given situations and how common pitfalls can be avoided.
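A kriging-style metamodel of an expensive analysis code can be sketched as follows; the "analysis code" is a stand-in toy function, and the scikit-learn Gaussian process is one possible implementation of the kriging technique reviewed above rather than the tools discussed in the paper.

```python
# Sketch: a kriging (Gaussian process) metamodel fitted to a small design of
# experiments on an expensive code, then queried cheaply with uncertainty.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def expensive_code(x):                       # placeholder for a detailed simulation code
    return np.sin(3 * x[:, 0]) + 0.5 * x[:, 1] ** 2

rng = np.random.default_rng(4)
X_doe = rng.uniform(-1, 1, size=(30, 2))     # small design of experiments
y_doe = expensive_code(X_doe)

kriging = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
kriging.fit(X_doe, y_doe)

X_new = rng.uniform(-1, 1, size=(5, 2))
y_hat, y_std = kriging.predict(X_new, return_std=True)   # cheap surrogate predictions
print(np.round(y_hat, 3), np.round(y_std, 3))
```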
Multiple Phenotype Association Tests Using Summary Statistics in Genome-Wide Association Studies
Liu, Zhonghua; Lin, Xihong
2017-01-01
Summary We study in this paper jointly testing the associations of a genetic variant with correlated multiple phenotypes using the summary statistics of individual phenotype analysis from Genome-Wide Association Studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes by accounting for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple phenotype testing procedures by jointly testing a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and to compare their powers in various settings with the existing methods. We applied the proposed tests to a GWAS Global Lipids Genetics Consortium summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis. PMID:28653391
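A simpler, related joint test can be built directly from the per-phenotype Z-scores and the estimated between-phenotype correlation matrix; the sketch below shows such an omnibus chi-square test with illustrative numbers, and is not the mixed-model common-mean and variance-component tests proposed in the paper.

```python
# Sketch: a basic omnibus test of a variant's association with K correlated
# phenotypes from summary Z-scores, using the between-phenotype correlation
# matrix R: T = z' R^{-1} z ~ chi-square(K) under the null.
import numpy as np
from scipy import stats

z = np.array([1.9, 2.4, -0.6])                  # per-phenotype Z-scores (illustrative)
R = np.array([[1.0, 0.5, 0.2],                  # correlation estimated from
              [0.5, 1.0, 0.3],                  # genome-wide summary statistics
              [0.2, 0.3, 1.0]])

t_stat = z @ np.linalg.solve(R, z)
p_value = stats.chi2.sf(t_stat, df=len(z))
print(f"T = {t_stat:.2f}, p = {p_value:.3g}")
```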
Multiple phenotype association tests using summary statistics in genome-wide association studies.
Liu, Zhonghua; Lin, Xihong
2018-03-01
We study in this article jointly testing the associations of a genetic variant with correlated multiple phenotypes using the summary statistics of individual phenotype analysis from Genome-Wide Association Studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes by accounting for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple phenotype testing procedures by jointly testing a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and to compare their powers in various settings with the existing methods. We applied the proposed tests to a GWAS Global Lipids Genetics Consortium summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis. © 2017, The International Biometric Society.
diffHic: a Bioconductor package to detect differential genomic interactions in Hi-C data.
Lun, Aaron T L; Smyth, Gordon K
2015-08-19
Chromatin conformation capture with high-throughput sequencing (Hi-C) is a technique that measures the in vivo intensity of interactions between all pairs of loci in the genome. Most conventional analyses of Hi-C data focus on the detection of statistically significant interactions. However, an alternative strategy involves identifying significant changes in the interaction intensity (i.e., differential interactions) between two or more biological conditions. This is more statistically rigorous and may provide more biologically relevant results. Here, we present the diffHic software package for the detection of differential interactions from Hi-C data. diffHic provides methods for read pair alignment and processing, counting into bin pairs, filtering out low-abundance events and normalization of trended or CNV-driven biases. It uses the statistical framework of the edgeR package to model biological variability and to test for significant differences between conditions. Several options for the visualization of results are also included. The use of diffHic is demonstrated with real Hi-C data sets. Performance against existing methods is also evaluated with simulated data. On real data, diffHic is able to successfully detect interactions with significant differences in intensity between biological conditions. It also compares favourably to existing software tools on simulated data sets. These results suggest that diffHic is a viable approach for differential analyses of Hi-C data.
NASA Astrophysics Data System (ADS)
Kadhem, Hasan; Amagasa, Toshiyuki; Kitagawa, Hiroyuki
Encryption can provide strong security for sensitive data against inside and outside attacks. This is especially true in the “Database as Service” model, where confidentiality and privacy are important issues for the client. In fact, existing encryption approaches are vulnerable to statistical attacks because each value is encrypted to another fixed value. This paper presents a novel database encryption scheme called MV-OPES (Multivalued — Order Preserving Encryption Scheme), which allows privacy-preserving queries over encrypted databases with an improved security level. Our idea is to encrypt a value to multiple different values to prevent statistical attacks. At the same time, MV-OPES preserves the order of the integer values to allow comparison operations to be applied directly on encrypted data. Using a calculated distance (range), we propose a novel method that allows a join query between relations based on inequality over encrypted values. We also present techniques to offload query execution load to the database server as much as possible, thereby making better use of server resources in a database outsourcing environment. Our scheme can easily be integrated with current database systems as it is designed to work with existing indexing structures. It is robust against statistical attacks and the estimation of true values. Experiments show that MV-OPES achieves security for sensitive data with reasonable overhead, establishing the practicability of the scheme.
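A toy sketch of the core idea as described above (each plaintext value is encrypted to a random point within its own disjoint, increasing interval, so repeated encryptions differ while order comparisons still work); the interval layout and width are assumptions for illustration and not the actual MV-OPES construction.

```python
# Toy multivalued, order-preserving encoding (illustrative, not MV-OPES itself).
import secrets

WIDTH = 1_000  # hypothetical interval width reserved for each plaintext value

def encrypt(value: int) -> int:
    # plaintext v owns the interval [v*WIDTH, (v+1)*WIDTH); pick a random point in it,
    # so equal plaintexts yield different ciphertexts (resists statistical attacks)
    return value * WIDTH + secrets.randbelow(WIDTH)

# order is preserved across distinct plaintexts, so range predicates can be
# evaluated directly on ciphertexts (equality of plaintexts needs extra handling)
c1, c2, c3 = encrypt(42), encrypt(42), encrypt(43)
print(c1 != c2, c1 < c3, c2 < c3)   # typically: True True True
```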
Zygmunt, Austin; Asada, Yukiko; Burge, Frederick
2017-10-01
As in many jurisdictions, the delivery of primary care in Canada is being transformed from solo practice to team-based care. In Canada, team-based primary care involves general practitioners working with nurses or other health care providers, and it is expected to improve equity in access to care. This study examined whether team-based care is associated with fewer access problems and less unmet need and whether socioeconomic gradients in access problems and unmet need are smaller in team-based care than in non-team-based care. Data came from the 2008 Canadian Survey of Experiences with Primary Health Care (sample size: 10,858). We measured primary care type as team-based or non-team-based and socioeconomic status by income and education. We created four access problem variables and four unmet need variables (overall and three specific components). For each, we ran separate logistic regression models to examine their associations with primary care type. We examined socioeconomic gradients in access problems and unmet need stratified by primary care type. Primary care type had no statistically significant, independent associations with access problems or unmet need. Among those with non-team-based care, a statistically significant education gradient for overall access problems existed, whereas among those with team-based care, no statistically significant socioeconomic gradients existed.
Pattern Adaptation and Normalization Reweighting.
Westrick, Zachary M; Heeger, David J; Landy, Michael S
2016-09-21
Adaptation to an oriented stimulus changes both the gain and preferred orientation of neural responses in V1. Neurons tuned near the adapted orientation are suppressed, and their preferred orientations shift away from the adapter. We propose a model in which weights of divisive normalization are dynamically adjusted to homeostatically maintain response products between pairs of neurons. We demonstrate that this adjustment can be performed by a very simple learning rule. Simulations of this model closely match existing data from visual adaptation experiments. We consider several alternative models, including variants based on homeostatic maintenance of response correlations or covariance, as well as feedforward gain-control models with multiple layers, and we demonstrate that homeostatic maintenance of response products provides the best account of the physiological data. Adaptation is a phenomenon throughout the nervous system in which neural tuning properties change in response to changes in environmental statistics. We developed a model of adaptation that combines normalization (in which a neuron's gain is reduced by the summed responses of its neighbors) and Hebbian learning (in which synaptic strength, in this case divisive normalization, is increased by correlated firing). The model is shown to account for several properties of adaptation in primary visual cortex in response to changes in the statistics of contour orientation. Copyright © 2016 the authors 0270-6474/16/369805-12$15.00/0.
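A minimal numerical sketch of our reading of the proposed mechanism: divisive normalization whose weights are nudged to keep pairwise response products near a pre-adaptation target. The tuning curves, constants, and learning rate below are illustrative assumptions, not the authors' implementation.

```python
# Homeostatic reweighting of divisive normalization (illustrative sketch).
import numpy as np

rng = np.random.default_rng(1)
N, sigma, eta = 8, 0.1, 0.01
prefs = np.linspace(0, np.pi, N, endpoint=False)

def drive(theta):
    return np.exp(np.cos(2 * (theta - prefs)) / 0.3)   # orientation-tuned drive

def respond(theta, w):
    d = drive(theta)
    return d / (sigma + w @ d)                          # divisive normalization

w0 = np.ones((N, N)) / N
# homeostatic targets: average pairwise response products before adaptation
targets = np.mean([np.outer(respond(t, w0), respond(t, w0))
                   for t in rng.uniform(0, np.pi, 500)], axis=0)

adapter, w = np.pi / 2, w0.copy()
before = respond(adapter, w0)
for _ in range(3000):                                   # prolonged adapter exposure
    r = respond(adapter, w)
    w = np.clip(w + eta * (np.outer(r, r) - targets), 0.0, None)  # weight update

after = respond(adapter, w)
print("response to adapter, before vs after adaptation:")
print(before.round(2))
print(after.round(2))   # gain near the adapted orientation is suppressed
```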
Visual shape perception as Bayesian inference of 3D object-centered shape representations.
Erdogan, Goker; Jacobs, Robert A
2017-11-01
Despite decades of research, little is known about how people visually perceive object shape. We hypothesize that a promising approach to shape perception is provided by a "visual perception as Bayesian inference" framework which augments an emphasis on visual representation with an emphasis on the idea that shape perception is a form of statistical inference. Our hypothesis claims that shape perception of unfamiliar objects can be characterized as statistical inference of 3D shape in an object-centered coordinate system. We describe a computational model based on our theoretical framework, and provide evidence for the model along two lines. First, we show that, counterintuitively, the model accounts for viewpoint-dependency of object recognition, traditionally regarded as evidence against people's use of 3D object-centered shape representations. Second, we report the results of an experiment using a shape similarity task, and present an extensive evaluation of existing models' abilities to account for the experimental data. We find that our shape inference model captures subjects' behaviors better than competing models. Taken as a whole, our experimental and computational results illustrate the promise of our approach and suggest that people's shape representations of unfamiliar objects are probabilistic, 3D, and object-centered. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Simulation of Healing Threshold in Strain-Induced Inflammation Through a Discrete Informatics Model.
Ibrahim, Israr Bin M; Sarma O V, Sanjay; Pidaparti, Ramana M
2018-05-01
Respiratory diseases such as asthma and acute respiratory distress syndrome, as well as acute lung injury, involve inflammation at the cellular level. The inflammation process is very complex and is characterized by the emergence of cytokines along with other changes in cellular processes. Due to the complexity of the various constituents that make up the inflammation dynamics, it is necessary to develop models that can complement experiments to fully understand inflammatory diseases. In this study, we developed a discrete informatics model based on a cellular automata (CA) approach to investigate the influence of an elastic field (stretch/strain) on the dynamics of inflammation and to account for probabilistic adaptation based on statistical interpretation of existing experimental data. Our simulation model investigated the effects of low, medium, and high strain conditions on inflammation dynamics. Results suggest that the model is able to indicate the threshold of innate healing of tissue as a response to strain experienced by the tissue. When strain is under the threshold, the tissue is still capable of adapting its structure to heal the damaged part. However, there exists a strain threshold where healing capability breaks down. The results obtained demonstrate that the developed discrete informatics-based CA model is capable of modeling and giving insights into inflammation dynamics parameters under various mechanical strain/stretch environments.
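The following toy cellular-automaton sketch reproduces only the qualitative behaviour described (a strain threshold beyond which innate healing breaks down); the update rules and probabilities are illustrative assumptions, not the paper's calibrated model.

```python
# Toy CA: healthy/damaged cells with strain-dependent damage and healing rates.
import numpy as np

rng = np.random.default_rng(2)

def simulate(strain, steps=200, size=50, p_damage0=0.02, p_heal0=0.25):
    tissue = np.ones((size, size), dtype=bool)           # True = healthy cell
    p_damage = p_damage0 * (1.0 + 4.0 * strain)          # damage grows with strain
    p_heal = p_heal0 * max(0.0, 1.0 - strain)            # healing drops with strain
    for _ in range(steps):
        damage = rng.random(tissue.shape) < p_damage
        heal = rng.random(tissue.shape) < p_heal
        tissue = (tissue & ~damage) | (~tissue & heal)   # stochastic cell updates
    return tissue.mean()                                 # fraction healthy at the end

for s in (0.1, 0.4, 0.7, 0.9):                           # low to high strain
    print(f"strain={s:.1f}  healthy fraction={simulate(s):.2f}")
```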
Li, Libo; Lan, Xiaolin
2016-12-01
To assess the relationship between hepatitis B virus (HBV), hepatitis C virus (HCV), and HBV/HCV double infection and hepatocellular carcinoma risk in the Chinese population. The PubMed and CNKI databases were electronically searched by reviewers using the search terms HBV, HCV, and hepatocellular carcinoma. Related case-control or cohort studies were included. The association between virus infection and hepatocellular carcinoma risk was expressed as an odds ratio (OR) with a 95% confidence interval (95% CI). The data were pooled by a fixed or random effects model according to the statistical heterogeneity. Publication bias was assessed by Begg's funnel plot and Egger's linear regression test. Finally, 13 publications were included in this meta-analysis. Given the significant statistical heterogeneity (I² = 99.8%, P = 0.00), the OR was pooled by a random effects model. The pooled results showed that HBV infection can significantly increase the risk of developing hepatocellular carcinoma (OR = 58.01, 95% CI: 44.27-71.75). Heterogeneity analysis showed that significant heterogeneity existed in the evaluation of HCV infection and hepatocellular carcinoma risk across the included 13 studies (I² = 77.78%, P = 0.00), so the OR was pooled by a random effects model. The pooled results showed that HCV infection can significantly increase the risk of developing hepatocellular carcinoma (OR = 2.34, 95% CI: 1.20-3.47). Significant heterogeneity did not exist in the evaluation of HBV/HCV double infection and hepatocellular carcinoma risk for the included 13 studies (I² = 0.00%, P = 0.80), so the OR was pooled by a fixed effects model. The pooled results showed that HBV/HCV double infection can significantly increase the risk of developing hepatocellular carcinoma (OR = 11.39, 95% CI: 4.58-18.20). No publication bias was found for HBV, HCV, or HBV/HCV double infection and hepatocellular carcinoma. For the Chinese population, HBV, HCV, or HBV/HCV double infection can significantly increase the risk of developing hepatocellular carcinoma.
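For readers unfamiliar with the pooling step, this is a generic sketch of fixed- versus random-effects pooling of odds ratios on the log scale (DerSimonian-Laird), with heterogeneity measured by Cochran's Q and I². The study ORs and standard errors below are made up for illustration and are not the 13 studies analysed in the abstract.

```python
# Fixed- and random-effects pooling of log odds ratios (DerSimonian-Laird).
import numpy as np

log_or = np.log(np.array([2.1, 1.7, 3.0, 2.4]))   # hypothetical study ORs
se = np.array([0.30, 0.25, 0.40, 0.35])           # hypothetical standard errors

w_fixed = 1.0 / se**2
pooled_fixed = np.sum(w_fixed * log_or) / np.sum(w_fixed)

# heterogeneity statistics and between-study variance tau^2
Q = np.sum(w_fixed * (log_or - pooled_fixed) ** 2)
df = len(log_or) - 1
I2 = max(0.0, (Q - df) / Q) * 100
tau2 = max(0.0, (Q - df) / (np.sum(w_fixed) - np.sum(w_fixed**2) / np.sum(w_fixed)))

w_rand = 1.0 / (se**2 + tau2)
pooled_rand = np.sum(w_rand * log_or) / np.sum(w_rand)
se_rand = np.sqrt(1.0 / np.sum(w_rand))

lo, hi = np.exp(pooled_rand - 1.96 * se_rand), np.exp(pooled_rand + 1.96 * se_rand)
print(f"I^2={I2:.1f}%  random-effects OR={np.exp(pooled_rand):.2f} (95% CI {lo:.2f}-{hi:.2f})")
```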
Pressure balance inconsistency exhibited in a statistical model of magnetospheric plasma
NASA Astrophysics Data System (ADS)
Garner, T. W.; Wolf, R. A.; Spiro, R. W.; Thomsen, M. F.; Korth, H.
2003-08-01
While quantitative theories of plasma flow from the magnetotail to the inner magnetosphere typically assume adiabatic convection, it has long been understood that these convection models tend to overestimate the plasma pressure in the inner magnetosphere. This phenomenon is called the pressure crisis or the pressure balance inconsistency. In order to analyze it in a new and more detailed manner we utilize an empirical model of the proton and electron distribution functions in the near-Earth plasma sheet (-50 RE < X < -10 RE), which uses the [1989] magnetic field model and a plasma sheet representation based upon several previously published statistical studies. We compare our results to a statistically derived particle distribution function at geosynchronous orbit. In this analysis the particle distribution function is characterized by the isotropic energy invariant λ = EV^{2/3}, where E is the particle's kinetic energy and V is the magnetic flux tube volume. The energy invariant is conserved in guiding center drift under the assumption of strong, elastic pitch angle scattering. If, in addition, loss is negligible, the phase space density f(λ) is also conserved along the same path. The statistical model indicates that f(λ, X) is approximately independent of X for X ≤ -35 RE but decreases with increasing X for X ≥ -35 RE. The tailward gradient of f(λ, X) might be attributed to gradient/curvature drift for large isotropic energy invariants but not for small invariants. The tailward gradient of the distribution function indicates a violation of the adiabatic drift condition in the plasma sheet. It also confirms the existence of a "number crisis" in addition to the pressure crisis. In addition, plasma sheet pressure gradients, when crossed with the gradient of flux tube volume computed from the [1989] magnetic field model, indicate Region 1 currents on the dawn and dusk sides of the outer plasma sheet.
Freezing transition of the random bond RNA model: Statistical properties of the pairing weights
NASA Astrophysics Data System (ADS)
Monthus, Cécile; Garel, Thomas
2007-03-01
To characterize the pairing specificity of RNA secondary structures as a function of temperature, we analyze the statistics of the pairing weights as follows: for each base (i) of the sequence of length N, we consider the (N-1) pairing weights w_i(j) with the other bases (j ≠ i) of the sequence. We numerically compute the probability distribution P_1(w) of the maximal weight w_i^max = max_j[w_i(j)], the probability distribution Π(Y_2) of the parameter Y_2(i) = Σ_j w_i^2(j), as well as the average values of the moments Y_k(i) = Σ_j w_i^k(j). We find that there are two important temperatures Tc
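Given a pairing-weight matrix, the statistics defined above reduce to simple array operations; the following sketch uses a random, row-normalized weight matrix purely as a stand-in for weights produced by an RNA secondary-structure model.

```python
# Compute w_i^max and the moments Y_k(i) = sum_j w_i(j)^k from a weight matrix.
import numpy as np

rng = np.random.default_rng(3)
N = 100
w = rng.random((N, N))
np.fill_diagonal(w, 0.0)                   # no pairing of a base with itself
w /= w.sum(axis=1, keepdims=True)          # weights of base i over partners j

w_max = w.max(axis=1)                      # w_i^max = max_j w_i(j)
Y2 = np.sum(w**2, axis=1)                  # Y_2(i) = sum_j w_i(j)^2
Y4 = np.sum(w**4, axis=1)                  # a higher moment, Y_4(i)
print("mean w_max:", w_max.mean(), " mean Y2:", Y2.mean(), " mean Y4:", Y4.mean())
```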
Nonlinear Structured Growth Mixture Models in Mplus and OpenMx
Grimm, Kevin J.; Ram, Nilam; Estabrook, Ryne
2014-01-01
Growth mixture models (GMMs; Muthén & Muthén, 2000; Muthén & Shedden, 1999) are a combination of latent curve models (LCMs) and finite mixture models to examine the existence of latent classes that follow distinct developmental patterns. GMMs are often fit with linear, latent basis, multiphase, or polynomial change models because of their common use, flexibility in modeling many types of change patterns, the availability of statistical programs to fit such models, and the ease of programming. In this paper, we present additional ways of modeling nonlinear change patterns with GMMs. Specifically, we show how LCMs that follow specific nonlinear functions can be extended to examine the presence of multiple latent classes using the Mplus and OpenMx computer programs. These models are fit to longitudinal reading data from the Early Childhood Longitudinal Study-Kindergarten Cohort to illustrate their use. PMID:25419006
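As a sketch of the data structure such models target (and not of the Mplus or OpenMx syntax presented in the paper), the following Python snippet simulates two latent classes that follow different nonlinear (exponential) growth curves with person-specific asymptotes; all parameter values are arbitrary.

```python
# Simulate longitudinal data from a two-class nonlinear growth mixture.
import numpy as np

rng = np.random.default_rng(4)
n_per_class, times = 150, np.arange(0, 6)

def exponential_curve(t, asymptote, rate):
    return asymptote * (1.0 - np.exp(-rate * t))

data, labels = [], []
for k, (asym, rate) in enumerate([(50.0, 0.35), (80.0, 0.15)]):   # two latent classes
    for _ in range(n_per_class):
        a = asym + rng.normal(0, 5)                    # person-specific asymptote
        y = exponential_curve(times, a, rate) + rng.normal(0, 3, times.size)
        data.append(y)
        labels.append(k)

data, labels = np.asarray(data), np.asarray(labels)
print(data.shape, "class means at last wave:",
      data[labels == 0, -1].mean().round(1), data[labels == 1, -1].mean().round(1))
```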
LD-SPatt: large deviations statistics for patterns on Markov chains.
Nuel, G
2004-01-01
Statistics on Markov chains are widely used for the study of patterns in biological sequences. Statistics on these models can be computed through several approaches; central limit theorem (CLT) methods producing Gaussian approximations are among the most popular. Unfortunately, in order to find a pattern of interest, these methods have to deal with tail distribution events where the CLT approximation is especially poor. In this paper, we propose a new approach based on large deviations theory to assess pattern statistics. We first recall theoretical results for empirical mean (level 1) as well as empirical distribution (level 2) large deviations on Markov chains. Then, we present the applications of these results focusing on numerical issues. LD-SPatt is the name of GPL software implementing these algorithms. We compare this approach to several existing ones in terms of complexity and reliability and show that the large deviations are more reliable than the Gaussian approximations in absolute values as well as in terms of ranking and are at least as reliable as compound Poisson approximations. We finally discuss some further possible improvements and applications of this new method.
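The small Monte Carlo sketch below illustrates why the tail matters for pattern statistics on Markov chains (it is not the LD-SPatt algorithm): it simulates a first-order chain over {A,C,G,T} with an assumed transition matrix, counts occurrences of a short pattern, and compares the empirical tail with a Gaussian approximation.

```python
# Empirical vs Gaussian tail for pattern counts on a Markov chain (illustration).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
alphabet = np.array(list("ACGT"))
P = np.full((4, 4), 0.25)          # hypothetical transition matrix
P[0] = [0.4, 0.2, 0.2, 0.2]        # mild bias after 'A'
cum = P.cumsum(axis=1)

def simulate_counts(pattern="ACGA", length=1000, reps=1000):
    states = np.empty((reps, length), dtype=int)
    states[:, 0] = rng.integers(4, size=reps)
    for i in range(1, length):                       # advance all chains in parallel
        u = rng.random(reps)
        states[:, i] = (u[:, None] > cum[states[:, i - 1]]).sum(axis=1)
    counts = np.empty(reps, dtype=int)
    for r in range(reps):                            # overlapping occurrence count
        seq = "".join(alphabet[states[r]])
        counts[r] = sum(seq.startswith(pattern, i) for i in range(length))
    return counts

counts = simulate_counts()
x = counts.mean() + 3 * counts.std()                 # a tail threshold
print("empirical P(count >= x):", np.mean(counts >= x))
print("Gaussian  P(count >= x):", norm.sf(x, counts.mean(), counts.std()))
```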
A New Framework for Cumulus Parametrization - A CPT in action
NASA Astrophysics Data System (ADS)
Jakob, C.; Peters, K.; Protat, A.; Kumar, V.
2016-12-01
The representation of convection in climate models remains a major Achilles heel in our pursuit of better predictions of global and regional climate. The basic principle underpinning the parametrisation of tropical convection in global weather and climate models is that there exist discernible interactions between the resolved model scale and the parametrised cumulus scale. Furthermore, there must be at least some predictive power in the larger scales for the statistical behaviour on small scales for us to be able to formally close the parametrised equations. The presentation will discuss a new framework for cumulus parametrisation based on the idea of separating the prediction of cloud area from that of velocity. This idea is put into practice by combining an existing multi-scale stochastic cloud model with observations to arrive at the prediction of the area fraction for deep precipitating convection. Using mid-tropospheric humidity and vertical motion as predictors, the model is shown to reproduce the observed behaviour of both mean and variability of deep convective area fraction well. The framework allows for the inclusion of convective organisation and can - in principle - be made resolution-aware or resolution-independent. When combined with simple assumptions about cloud-base vertical motion the model can be used as a closure assumption in any existing cumulus parametrisation. Results of applying this idea in the ECHAM model indicate significant improvements in the simulation of tropical variability, including but not limited to the MJO. This presentation will highlight how the close collaboration of the observational, theoretical and model development community in the spirit of the climate process teams can lead to significant progress in long-standing issues in climate modelling while preserving the freedom of individual groups in pursuing their specific implementation of an agreed framework.
Cox, Tony; Popken, Douglas; Ricci, Paolo F
2013-01-01
Exposures to fine particulate matter (PM2.5) in air (C) have been suspected of contributing causally to increased acute (e.g., same-day or next-day) human mortality rates (R). We tested this causal hypothesis in 100 United States cities using the publicly available NMMAPS database. Although a significant, approximately linear, statistical C-R association exists in simple statistical models, closer analysis suggests that it is not causal. Surprisingly, conditioning on other variables that have been extensively considered in previous analyses (usually using splines or other smoothers to approximate their effects), such as month of the year and mean daily temperature, suggests that they create strong, nonlinear confounding that explains the statistical association between PM2.5 and mortality rates in this data set. As this finding disagrees with conventional wisdom, we apply several different techniques to examine it. Conditional independence tests for potential causation, non-parametric classification tree analysis, Bayesian Model Averaging (BMA), and Granger-Sims causality testing, show no evidence that PM2.5 concentrations have any causal impact on increasing mortality rates. This apparent absence of a causal C-R relation, despite their statistical association, has potentially important implications for managing and communicating the uncertain health risks associated with, but not necessarily caused by, PM2.5 exposures. PMID:23983662
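One of the tools named above, Granger-Sims causality testing, can be run directly with statsmodels; the sketch below applies it to synthetic series with no built-in causal effect and is only an illustration of the mechanics, not a reanalysis of the NMMAPS data.

```python
# Granger causality test on synthetic exposure/mortality series (illustration).
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(6)
n = 500
pm25 = rng.normal(12, 4, n)                           # synthetic daily exposure
mortality = 50 + rng.normal(0, 5, n)                  # no true causal effect here

data = pd.DataFrame({"mortality": mortality, "pm25": pm25})
# tests whether lags of pm25 help predict mortality beyond mortality's own lags
res = grangercausalitytests(data[["mortality", "pm25"]], maxlag=3, verbose=False)
print({lag: round(r[0]["ssr_ftest"][1], 3) for lag, r in res.items()})  # p-values by lag
```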
Eaton, John E; Vesterhus, Mette; McCauley, Bryan M; Atkinson, Elizabeth J; Schlicht, Erik M; Juran, Brian D; Gossard, Andrea A; LaRusso, Nicholas F; Gores, Gregory J; Karlsen, Tom H; Lazaridis, Konstantinos N
2018-05-09
Improved methods are needed to risk stratify and predict outcomes in patients with primary sclerosing cholangitis (PSC). Therefore, we sought to derive and validate a new prediction model and compare its performance to existing surrogate markers. The model was derived using 509 subjects from a multicenter North American cohort and validated in an international multicenter cohort (n=278). Gradient boosting, a machine-learning technique, was used to create the model. The endpoint was hepatic decompensation (ascites, variceal hemorrhage or encephalopathy). Subjects with advanced PSC or cholangiocarcinoma at baseline were excluded. The PSC risk estimate tool (PREsTo) consists of 9 variables: bilirubin, albumin, serum alkaline phosphatase (SAP) times the upper limit of normal (ULN), platelets, AST, hemoglobin, sodium, patient age and the number of years since PSC was diagnosed. Validation in an independent cohort confirmed that PREsTo accurately predicts decompensation (C statistic 0.90, 95% confidence interval (CI) 0.84-0.95) and that it performed well compared to the MELD score (C statistic 0.72, 95% CI 0.57-0.84), the Mayo PSC risk score (C statistic 0.85, 95% CI 0.77-0.92) and SAP < 1.5x ULN (C statistic 0.65, 95% CI 0.55-0.73). PREsTo continued to be accurate among individuals with a bilirubin < 2.0 mg/dL (C statistic 0.90, 95% CI 0.82-0.96) and when the score was re-applied at a later course in the disease (C statistic 0.82, 95% CI 0.64-0.95). PREsTo accurately predicts hepatic decompensation in PSC and exceeds the performance of other widely available, noninvasive prognostic scoring systems. This article is protected by copyright. All rights reserved. © 2018 by the American Association for the Study of Liver Diseases.
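To make the general approach concrete (a gradient-boosted classifier scored by a C statistic on held-out data), here is a hedged sketch on entirely synthetic data with scikit-learn; the "markers", coefficients, and outcome model are assumptions and bear no relation to the PREsTo cohorts or code.

```python
# Gradient boosting + C statistic (AUC) on synthetic data (illustration only).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 800
X = np.column_stack([
    rng.lognormal(0.0, 0.6, n),    # bilirubin-like marker (arbitrary units)
    rng.normal(4.0, 0.5, n),       # albumin-like marker
    rng.normal(250, 80, n),        # platelet-like marker
])
risk = 1.2 * np.log(X[:, 0]) - 0.8 * (X[:, 1] - 4.0) - 0.004 * (X[:, 2] - 250)
y = (rng.random(n) < 1 / (1 + np.exp(-(risk - 0.5)))).astype(int)  # synthetic endpoint

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
print("validation C statistic:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```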
NASA Astrophysics Data System (ADS)
Posner, A. J.
2017-12-01
The Middle Rio Grande River (MRG) traverses New Mexico from Cochiti to Elephant Butte reservoirs. Since the 1100s, cultivating and inhabiting the valley of this alluvial river has required various river training works. The mid-20th century saw a concerted effort to tame the river through channelization, Jetty Jacks, and dam construction. A challenge for river managers is to better understand the interactions between river training works, dam construction, and the geomorphic adjustments of a desert river driven by spring snowmelt and summer thunderstorms carrying water and large sediment inputs from upstream and ephemeral tributaries. Due to its importance to the region, a vast wealth of data exists for conditions along the MRG. The investigation presented herein builds upon previous efforts by combining hydraulic model results, digitized planforms, and stream gage records in various statistical and conceptual models in order to test our understanding of this complex system. Spatially continuous variables were clipped by a set of river cross section data that has been collected at decadal intervals since the early 1960s, creating a spatially homogeneous database upon which various statistical tests were implemented. Conceptual models relate forcing variables and response variables to estimate river planform changes. The developed database represents a unique opportunity to quantify and test geomorphic conceptual models under the unique characteristics of the MRG. The results of this investigation provide a spatially distributed characterization of planform variable changes, permitting managers to predict planform at a much higher resolution than previously available, and a better understanding of the relationship between flow regime and planform changes such as changes to longitudinal slope, sinuosity, and width. Lastly, data analysis and model interpretation led to the development of a new conceptual model for the impact of ephemeral tributaries in alluvial rivers.
scoringRules - A software package for probabilistic model evaluation
NASA Astrophysics Data System (ADS)
Lerch, Sebastian; Jordan, Alexander; Krüger, Fabian
2016-04-01
Models in the geosciences are generally surrounded by uncertainty, and being able to quantify this uncertainty is key to good decision making. Accordingly, probabilistic forecasts in the form of predictive distributions have become popular over the last decades. With the proliferation of probabilistic models arises the need for decision-theoretically principled tools to evaluate the appropriateness of models and forecasts in a generalized way. Various scoring rules have been developed over the past decades to address this demand. Proper scoring rules are functions S(F,y) which evaluate the accuracy of a forecast distribution F, given that an outcome y was observed. As such, they allow the comparison of alternative models, a crucial ability given the variety of theories, data sources and statistical specifications that are available in many situations. This poster presents the software package scoringRules for the statistical programming language R, which contains functions to compute popular scoring rules such as the continuous ranked probability score for a variety of distributions F that come up in applied work. Two main classes are parametric distributions like normal, t, or gamma distributions, and distributions that are not known analytically, but are indirectly described through a sample of simulation draws. For example, Bayesian forecasts produced via Markov Chain Monte Carlo take this form. In this way, the scoringRules package provides a framework for generalized model evaluation that includes both Bayesian and classical parametric models. The scoringRules package aims to be a convenient dictionary-like reference for computing scoring rules. We offer state-of-the-art implementations of several known (but not routinely applied) formulas, and implement closed-form expressions that were previously unavailable. Whenever more than one implementation variant exists, we offer statistically principled default choices.
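To show the kind of closed-form and sample-based expressions such packages implement, here is a Python transcription of the continuous ranked probability score for a Gaussian forecast, CRPS(N(mu, sigma^2), y) = sigma * [ z(2Phi(z) - 1) + 2phi(z) - 1/sqrt(pi) ] with z = (y - mu)/sigma, together with the standard sample-based estimator; scoringRules itself is an R package, so this is only an illustration of the formulas, not its API.

```python
# Continuous ranked probability score: closed form (Gaussian) and sample estimator.
import numpy as np
from scipy.stats import norm

def crps_normal(y, mu, sigma):
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

def crps_sample(y, draws):
    # E|X - y| - 0.5 * E|X - X'| estimated from simulation draws (e.g. MCMC output)
    draws = np.asarray(draws)
    return np.mean(np.abs(draws - y)) - 0.5 * np.mean(
        np.abs(draws[:, None] - draws[None, :]))

draws = np.random.default_rng(8).normal(0.3, 1.0, 4000)
print(crps_normal(0.0, 0.3, 1.0), crps_sample(0.0, draws))   # should nearly agree
```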
NASA Astrophysics Data System (ADS)
Collow, Thomas W.; Wang, Wanqiu; Kumar, Arun; Zhang, Jinlun
2017-09-01
The capability of a numerical model to simulate the statistical characteristics of the summer sea ice date of retreat (DOR) and the winter date of advance (DOA) is investigated using sea ice concentration output from the Climate Forecast System Version 2 model (CFSv2). Two model configurations are tested, the operational setting (CFSv2CFSR) which uses initial data from the Climate Forecast System Reanalysis, and a modified version (CFSv2PIOMp) which ingests sea ice thickness initialization data from the Pan-Arctic Ice Ocean Modeling and Assimilation System (PIOMAS) and includes physics modifications for a more realistic representation of heat fluxes at the sea ice top and bottom. First, a method to define DOR and DOA is presented. Then, DOR and DOA are determined from the model simulations and observational sea ice concentration from the National Aeronautics and Space Administration (NASA). Means, trends, and detrended standard deviations of DOR and DOA are compared, along with DOR/DOA rates in the Arctic Ocean. It is found that the statistics are generally similar between the model and observations, although some regional biases exist. In addition, regions of new ice retreat in recent years are represented well in CFSv2PIOMp over the Arctic Ocean, in terms of both spatial extent and timing. Overall, CFSv2PIOMp shows a reduction in error throughout the Arctic. Based on results, it is concluded that the model produces a reasonable representation of the climatology and variability statistics of DOR and DOA in most regions. This assessment serves as a prerequisite for future predictability experiments.
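For concreteness, a threshold-based definition of the date of retreat can be written in a few lines; the 15% concentration threshold and the "stays below for the rest of the season" rule below are common conventions assumed for illustration, not necessarily the definition adopted in the study.

```python
# Illustrative (assumed) definition of the sea ice date of retreat for one grid cell.
import numpy as np

def date_of_retreat(sic, threshold=0.15):
    """sic: daily sea ice concentration (0-1) over one melt season."""
    below = sic < threshold
    for day in range(len(sic)):
        if below[day] and below[day:].all():   # drops below and stays below
            return day
    return None                                # no retreat this season

days = np.arange(365)
sic = np.clip(0.9 - 0.004 * days + 0.05 * np.sin(days / 9.0), 0, 1)  # toy series
print("retreat day:", date_of_retreat(sic))
```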
External validation of preexisting first trimester preeclampsia prediction models.
Allen, Rebecca E; Zamora, Javier; Arroyo-Manzano, David; Velauthar, Luxmilar; Allotey, John; Thangaratinam, Shakila; Aquilina, Joseph
2017-10-01
To validate the increasing number of prognostic models being developed for preeclampsia using our own prospective study. A systematic review of literature that assessed biomarkers, uterine artery Doppler and maternal characteristics in the first trimester for the prediction of preeclampsia was performed and models were selected based on predefined criteria. Validation was performed by applying the regression coefficients that were published in the different derivation studies to our cohort. We assessed the models' discrimination ability and calibration. Twenty models were identified for validation. The discrimination ability observed in the derivation studies (area under the curve, AUC) ranged from 0.70 to 0.96; when these models were validated against the validation cohort, the AUCs varied considerably, ranging from 0.504 to 0.833. Comparing the AUCs obtained in the derivation studies with those in the validation cohort, we found statistically significant differences for several models. There is currently no definitive prediction model for preeclampsia with adequate ability to discriminate, which performs as well when applied to a different population and can differentiate well between the highest- and lowest-risk groups within the tested population. The large number of pre-existing models limits the value of further model development; future research should be focussed on further attempts to validate existing models and on assessing whether implementation of these improves patient care. Crown Copyright © 2017. Published by Elsevier B.V. All rights reserved.
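The mechanics of external validation described above (apply published coefficients to a new cohort, then assess discrimination and calibration) can be sketched as follows on synthetic data; the coefficients and the deliberate miscalibration are assumptions for illustration, not any of the twenty models reviewed.

```python
# External validation of a published logistic model: AUC and calibration slope.
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(9)
n = 2000
X_new = rng.normal(size=(n, 3))                      # predictors in the new cohort
published_beta = np.array([-3.0, 0.8, 0.5, -0.4])    # hypothetical published coefficients

lp = published_beta[0] + X_new @ published_beta[1:]  # linear predictor from the old model
y = (rng.random(n) < 1 / (1 + np.exp(-0.7 * lp))).astype(float)  # miscalibrated outcome

print("AUC:", roc_auc_score(y, lp))
# calibration slope = coefficient of the linear predictor in a refitted logistic model
fit = sm.GLM(y, sm.add_constant(lp), family=sm.families.Binomial()).fit()
print("calibration slope:", round(float(fit.params[1]), 2))
```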
Tracking signal test to monitor an intelligent time series forecasting model
NASA Astrophysics Data System (ADS)
Deng, Yan; Jaraiedi, Majid; Iskander, Wafik H.
2004-03-01
Extensive research has been conducted on the subject of Intelligent Time Series forecasting, including many variations on the use of neural networks. However, investigation of model adequacy over time, after the training process is completed, remains to be fully explored. In this paper we demonstrate how a smoothed-error tracking signal test can be incorporated into a neuro-fuzzy model to monitor the forecasting process and serve as a statistical measure for keeping the forecasting model up to date. The proposed monitoring procedure is effective in the detection of nonrandom changes due to model inadequacy, lack of unbiasedness in the estimation of model parameters, or deviations from the existing patterns. This powerful detection device will result in improved forecast accuracy in the long run. An example data set has been used to demonstrate the application of the proposed method.
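A smoothed-error tracking signal of the kind referred to above (Trigg's signal) is T_t = E_t / M_t, where E_t and M_t are exponentially smoothed forecast error and absolute error; values of |T| close to 1 for several periods indicate a biased, out-of-date model. The sketch below uses synthetic forecast errors and an assumed smoothing constant.

```python
# Trigg's smoothed-error tracking signal for monitoring a forecasting model.
import numpy as np

def tracking_signal(errors, beta=0.2):
    E = M = 0.0
    out = []
    for e in errors:
        E = beta * e + (1 - beta) * E           # smoothed error
        M = beta * abs(e) + (1 - beta) * M      # smoothed absolute error
        out.append(E / M if M > 0 else 0.0)
    return np.array(out)

rng = np.random.default_rng(10)
errors = np.concatenate([rng.normal(0, 1, 60),      # model adequate
                         rng.normal(2, 1, 40)])     # pattern change -> biased forecasts
ts = tracking_signal(errors)
print("max |signal| before shift:", np.abs(ts[:60]).max().round(2),
      " after shift:", np.abs(ts[60:]).max().round(2))
```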
Efficient statistical mapping of avian count data
Royle, J. Andrew; Wikle, C.K.
2005-01-01
We develop a spatial modeling framework for count data that is efficient to implement in high-dimensional prediction problems. We consider spectral parameterizations for the spatially varying mean of a Poisson model. The spectral parameterization of the spatial process is very computationally efficient, enabling effective estimation and prediction in large problems using Markov chain Monte Carlo techniques. We apply this model to creating avian relative abundance maps from North American Breeding Bird Survey (BBS) data. Variation in the ability of observers to count birds is modeled as spatially independent noise, resulting in over-dispersion relative to the Poisson assumption. This approach represents an improvement over existing approaches used for spatial modeling of BBS data which are either inefficient for continental scale modeling and prediction or fail to accommodate important distributional features of count data thus leading to inaccurate accounting of prediction uncertainty.
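The following sketch conveys only the modeling idea (a spatially varying Poisson mean represented in a low-order spectral, i.e. Fourier, basis), using a penalized Poisson GLM in scikit-learn rather than the paper's Bayesian MCMC implementation; locations, basis order, and the true surface are made up.

```python
# Poisson counts with a spatially varying mean expressed in a 2-D Fourier basis.
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(11)
n = 1500
lon, lat = rng.uniform(0, 1, n), rng.uniform(0, 1, n)   # survey locations (unit square)

def fourier_basis(lon, lat, K=3):
    cols = []
    for kx in range(K + 1):
        for ky in range(K + 1):
            arg = 2 * np.pi * (kx * lon + ky * lat)
            cols += [np.cos(arg), np.sin(arg)]
    return np.column_stack(cols)

true_log_mean = 1.0 + 0.8 * np.cos(2 * np.pi * lon) - 0.5 * np.sin(2 * np.pi * lat)
counts = rng.poisson(np.exp(true_log_mean))              # observed route counts

model = PoissonRegressor(alpha=1e-3, max_iter=1000).fit(fourier_basis(lon, lat), counts)

glon, glat = np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50))
surface = model.predict(fourier_basis(glon.ravel(), glat.ravel())).reshape(50, 50)
print("predicted relative-abundance range:", surface.min().round(2), surface.max().round(2))
```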
Adaptation of clinical prediction models for application in local settings.
Kappen, Teus H; Vergouwe, Yvonne; van Klei, Wilton A; van Wolfswinkel, Leo; Kalkman, Cor J; Moons, Karel G M
2012-01-01
When planning to use a validated prediction model in new patients, adequate performance is not guaranteed. For example, changes in clinical practice over time or a different case mix than the original validation population may result in inaccurate risk predictions. Our aim was to demonstrate how clinical information can direct the updating of a prediction model and the development of a strategy for handling missing predictor values in clinical practice. A previously derived and validated prediction model for postoperative nausea and vomiting was updated using a data set of 1847 patients. The update consisted of 1) changing the definition of an existing predictor, 2) reestimating the regression coefficient of a predictor, and 3) adding a new predictor to the model. The updated model was then validated in a new series of 3822 patients. Furthermore, several imputation models were considered to handle real-time missing values, so that possible missing predictor values could be anticipated during actual model use. Differences in clinical practice between our local population and the original derivation population guided the update strategy of the prediction model. The predictive accuracy of the updated model was better (c statistic, 0.68; calibration slope, 1.0) than that of the original model (c statistic, 0.62; calibration slope, 0.57). Inclusion of logistical variables in the imputation models, besides observed patient characteristics, contributed to a strategy to deal with missing predictor values at the time of risk calculation. Extensive knowledge of local clinical processes provides crucial information to guide the process of adapting a prediction model to new clinical practices.
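Two of the update steps described (recalibrating the existing linear predictor and re-estimating one coefficient on top of it) can be sketched as follows on synthetic data; the original coefficients, the local shift in one predictor's effect, and the sample size are assumptions, and the postoperative nausea and vomiting model itself is not reproduced.

```python
# Updating an existing logistic prediction model on local data (illustration).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(12)
n = 1847
X = rng.normal(size=(n, 3))
old_beta = np.array([-1.0, 0.9, 0.6, 0.3])            # hypothetical original model
lp_old = old_beta[0] + X @ old_beta[1:]

# local practice differs: the true effect of predictor 3 is stronger here
true_lp = -1.0 + 0.9 * X[:, 0] + 0.6 * X[:, 1] + 0.9 * X[:, 2]
y = (rng.random(n) < 1 / (1 + np.exp(-true_lp))).astype(float)

# step 1: recalibrate intercept and slope of the old linear predictor
recal = sm.GLM(y, sm.add_constant(lp_old), family=sm.families.Binomial()).fit()
# step 2: re-estimate the coefficient of predictor 3 on top of the old linear predictor
update = sm.GLM(y, sm.add_constant(np.column_stack([lp_old, X[:, 2]])),
                family=sm.families.Binomial()).fit()
print("recalibration (intercept, slope):", recal.params.round(2))
print("extra coefficient for predictor 3:", round(float(update.params[2]), 2))
```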
Accelerated Seismic Release and Related Aspects of Seismicity Patterns on Earthquake Faults
NASA Astrophysics Data System (ADS)
Ben-Zion, Y.; Lyakhovsky, V.
Observational studies indicate that large earthquakes are sometimes preceded by phases of accelerated seismic release (ASR) characterized by cumulative Benioff strain following a power law time-to-failure relation with a term (tf − t)^m, where tf is the failure time of the large event and observed values of m are close to 0.3. We discuss properties of ASR and related aspects of seismicity patterns associated with several theoretical frameworks. The subcritical crack growth approach developed to describe deformation on a crack prior to the occurrence of dynamic rupture predicts great variability and low asymptotic values of the exponent m that are not compatible with observed ASR phases. Statistical physics studies assuming that system-size failures in a deforming region correspond to critical phase transitions predict establishment of long-range correlations of dynamic variables and power-law statistics before large events. Using stress and earthquake histories simulated by the model of Ben-Zion (1996) for a discrete fault with quenched heterogeneities in a 3-D elastic half space, we show that large model earthquakes are associated with nonrepeating cyclical establishment and destruction of long-range stress correlations, accompanied by nonstationary cumulative Benioff strain release. We then analyze results associated with a regional lithospheric model consisting of a seismogenic upper crust governed by the damage rheology of Lyakhovsky et al. (1997) over a viscoelastic substrate. We demonstrate analytically for a simplified 1-D case that the employed damage rheology leads to a singular power-law equation for strain proportional to (tf − t)^(-1/3), and a nonsingular power-law relation for cumulative Benioff strain proportional to (tf − t)^(1/3). A simple approximate generalization of the latter for regional cumulative Benioff strain is obtained by adding to the result a linear function of time representing a stationary background release. To go beyond the analytical expectations, we examine results generated by various realizations of the regional lithospheric model producing seismicity following the characteristic frequency-size statistics, Gutenberg-Richter power-law distribution, and mode switching activity. We find that phases of ASR exist only when the seismicity preceding a given large event has broad frequency-size statistics. In such cases the simulated ASR phases can be fitted well by the singular analytical relation with m = -1/3, the nonsingular equation with m = 0.2, and the generalized version of the latter including a linear term with m = 1/3. The obtained good fits with all three relations highlight the difficulty of deriving reliable information on functional forms and parameter values from such data sets. The activation process in the simulated ASR phases is found to be accommodated both by increasing rates of moderate events and increasing average event size, with the former starting a few years earlier than the latter. The lack of ASR in portions of the seismicity not having broad frequency-size statistics may explain why some large earthquakes are preceded by ASR and others are not. The results suggest that observations of moderate and large events contain two complementary end-member predictive signals on the time of future large earthquakes. In portions of seismicity following the characteristic earthquake distribution, such information exists directly in the associated quasi-periodic temporal distribution of large events.
In portions of seismicity having broad frequency-size statistics with random or clustered temporal distribution of large events, the ASR phases have predictive information. The extent to which natural seismicity may be understood in terms of these end-member cases remains to be clarified. Continuing studies of evolving stress and other dynamic variables in model calculations combined with advanced analyses of simulated and observed seismicity patterns may lead to improvements in existing forecasting strategies.
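The fitting problem discussed above, estimating tf and m in a nonsingular time-to-failure relation for cumulative Benioff strain, can be sketched with a nonlinear least-squares fit; the synthetic data, functional form epsilon(t) = A - B(tf - t)^m, and starting values below are illustrative assumptions, not the paper's simulations or code.

```python
# Fit a power-law time-to-failure relation to synthetic cumulative Benioff strain.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(13)
tf_true, m_true = 10.0, 1 / 3
t = np.linspace(0.0, 9.8, 120)
benioff = 5.0 - 2.0 * (tf_true - t) ** m_true + rng.normal(0, 0.05, t.size)

def model(t, A, B, tf, m):
    # clip keeps (tf - t) positive while the optimizer explores parameter space
    return A - B * np.clip(tf - t, 1e-6, None) ** m

p0 = [5.0, 2.0, 10.5, 0.5]                      # rough initial guess
popt, _ = curve_fit(model, t, benioff, p0=p0, maxfev=20000)
print("fitted tf = %.2f, m = %.2f" % (popt[2], popt[3]))
```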
Steiner, John F.; Ho, P. Michael; Beaty, Brenda L.; Dickinson, L. Miriam; Hanratty, Rebecca; Zeng, Chan; Tavel, Heather M.; Havranek, Edward P.; Davidson, Arthur J.; Magid, David J.; Estacio, Raymond O.
2009-01-01
Background Although many studies have identified patient characteristics or chronic diseases associated with medication adherence, the clinical utility of such predictors has rarely been assessed. We attempted to develop clinical prediction rules for adherence with antihypertensive medications in two health care delivery systems. Methods and Results Retrospective cohort studies of hypertension registries in an inner-city health care delivery system (N = 17176) and a health maintenance organization (N = 94297) in Denver, Colorado. Adherence was defined by acquisition of 80% or more of antihypertensive medications. A multivariable model in the inner-city system found that adherent patients (36.3% of the total) were more likely than non-adherent patients to be older, white, married, and acculturated in US society, to have diabetes or cerebrovascular disease, not to abuse alcohol or controlled substances, and to be prescribed less than three antihypertensive medications. Although statistically significant, all multivariate odds ratios were 1.7 or less, and the model did not accurately discriminate adherent from non-adherent patients (C-statistic = 0.606). In the health maintenance organization, where 72.1% of patients were adherent, significant but weak associations existed between adherence and older age, white race, the lack of alcohol abuse, and fewer antihypertensive medications. The multivariate model again failed to accurately discriminate adherent from non-adherent individuals (C-statistic = 0.576). Conclusions Although certain socio-demographic characteristics or clinical diagnoses are statistically associated with adherence to refills of antihypertensive medications, a combination of these characteristics is not sufficiently accurate to allow clinicians to predict whether their patients will be adherent with treatment. PMID:20031876
NASA Astrophysics Data System (ADS)
Dallmann, N. A.; Carlsten, B. E.; Stonehill, L. C.
2017-12-01
Orbiting nuclear spectrometers have contributed significantly to our understanding of the composition of solar system bodies. Gamma rays and neutrons are produced within the surfaces of bodies by impacting galactic cosmic rays (GCR) and by intrinsic radionuclide decay. Measuring the flux and energy spectrum of these products at one point in an orbit elucidates the elemental content of the area in view. Deconvolution of measurements from many spatially registered orbit points can produce detailed maps of elemental abundances. In applying these well-established techniques to small and irregularly shaped bodies like Phobos, one encounters unique challenges beyond those of a large spheroid. Polar mapping orbits are not possible for Phobos, and quasistatic orbits will realize only modest inclinations, unavoidably limiting surface coverage and creating North-South ambiguities in deconvolution. The irregular shape causes self-shadowing, both of the body to the spectrometer and of the body to the incoming GCR. The view angle to the surface normal, as well as the distance between the surface and the spectrometer, is highly irregular. These characteristics can be synthesized into a complicated and continuously changing measurement system point spread function. We have begun to explore different model-based, statistically rigorous, iterative deconvolution methods to produce elemental abundance maps for a proposed future investigation of Phobos. By incorporating the satellite orbit, the existing high-accuracy shape models of Phobos, and the spectrometer response function, a detailed and accurate system model can be constructed. Many aspects of this model formation are particularly well suited to modern graphics processing techniques and parallel processing. We will present the current status and preliminary visualizations of the Phobos measurement system model. We will also discuss different deconvolution strategies and their relative merits in statistical rigor, stability, achievable resolution, and exploitation of the irregular shape to partially resolve ambiguities. The general applicability of these new approaches to existing data sets from Mars, Mercury, and Lunar investigations will be noted.
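As one member of the class of statistically rigorous, model-based iterative deconvolution methods mentioned, the maximum-likelihood expectation-maximization (MLEM, Richardson-Lucy) update for Poisson counts with a known system matrix can be sketched as follows; the 1-D toy geometry and footprint widths are assumptions, not the Phobos system model.

```python
# MLEM deconvolution of Poisson counts through a smooth system (footprint) matrix.
import numpy as np

rng = np.random.default_rng(14)
n_pixels, n_obs = 30, 60
# hypothetical system matrix: each orbit point sees a smooth footprint of pixels,
# scaled so that expected counts are of order tens
centers = rng.uniform(0, n_pixels, n_obs)
A = np.exp(-0.5 * ((np.arange(n_pixels)[None, :] - centers[:, None]) / 3.0) ** 2)
A *= 50.0 / A.sum(axis=1, keepdims=True)

truth = 5.0 + 4.0 * (np.arange(n_pixels) > 20)     # 1-D stand-in abundance "map"
counts = rng.poisson(A @ truth)                    # Poisson-distributed measurements

x = np.ones(n_pixels)                              # MLEM iterations
sens = A.sum(axis=0)
for _ in range(300):
    x *= (A.T @ (counts / np.clip(A @ x, 1e-12, None))) / sens
print("max abs reconstruction error:", np.abs(x - truth).max().round(2))
```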
41 CFR 105-50.202-1 - Copies of statistical or other studies.
Code of Federal Regulations, 2011 CFR
2011-01-01
§ 105-50.202-1 Copies of statistical or other studies. This material includes a copy of any existing statistical or other studies and compilations, results of technical tests and...
Toward "Constructing" the Concept of Statistical Power: An Optical Analogy.
ERIC Educational Resources Information Center
Rogers, Bruce G.
This paper presents a visual analogy that may be used by instructors to teach the concept of statistical power in statistical courses. Statistical power is mathematically defined as the probability of rejecting a null hypothesis when that null is false, or, equivalently, the probability of detecting a relationship when it exists. The analogy…
41 CFR 105-50.202-1 - Copies of statistical or other studies.
Code of Federal Regulations, 2010 CFR
2010-07-01
§ 105-50.202-1 Copies of statistical or other studies. This material includes a copy of any existing statistical or other studies and compilations, results of technical tests and...
Correcting for Optimistic Prediction in Small Data Sets
Smith, Gordon C. S.; Seaman, Shaun R.; Wood, Angela M.; Royston, Patrick; White, Ian R.
2014-01-01
The C statistic is a commonly reported measure of screening test performance. Optimistic estimation of the C statistic is a frequent problem because of overfitting of statistical models in small data sets, and methods exist to correct for this issue. However, many studies do not use such methods, and those that do correct for optimism use diverse methods, some of which are known to be biased. We used clinical data sets (United Kingdom Down syndrome screening data from Glasgow (1991–2003), Edinburgh (1999–2003), and Cambridge (1990–2006), as well as Scottish national pregnancy discharge data (2004–2007)) to evaluate different approaches to adjustment for optimism. We found that sample splitting, cross-validation without replication, and leave-1-out cross-validation produced optimism-adjusted estimates of the C statistic that were biased and/or associated with greater absolute error than other available methods. Cross-validation with replication, bootstrapping, and a new method (leave-pair-out cross-validation) all generated unbiased optimism-adjusted estimates of the C statistic and had similar absolute errors in the clinical data set. Larger simulation studies confirmed that all 3 methods performed similarly with 10 or more events per variable, or when the C statistic was 0.9 or greater. However, with lower events per variable or lower C statistics, bootstrapping tended to be optimistic but with lower absolute and mean squared errors than both methods of cross-validation. PMID:24966219
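A hedged sketch of one of the unbiased approaches named above, the bootstrap optimism correction for the C statistic (refit on each bootstrap sample, measure the drop in AUC when the refitted model is applied back to the original data, and subtract the average drop from the apparent AUC); the synthetic data and model below are assumptions, not the screening data sets used in the paper.

```python
# Bootstrap optimism correction of the C statistic on a small synthetic data set.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(15)
n, p = 150, 8                                      # small data set, many predictors
X = rng.normal(size=(n, p))
y = (rng.random(n) < 1 / (1 + np.exp(-0.8 * X[:, 0]))).astype(int)

def fit_auc(Xtr, ytr, Xev, yev):
    m = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
    return roc_auc_score(yev, m.predict_proba(Xev)[:, 1])

apparent = fit_auc(X, y, X, y)                     # optimistic apparent C statistic
optimism = []
for _ in range(200):                               # bootstrap replicates
    idx = rng.integers(0, n, n)
    if len(np.unique(y[idx])) < 2:
        continue                                   # skip degenerate resamples
    optimism.append(fit_auc(X[idx], y[idx], X[idx], y[idx])
                    - fit_auc(X[idx], y[idx], X, y))
print("apparent C:", round(apparent, 3),
      " optimism-corrected C:", round(apparent - float(np.mean(optimism)), 3))
```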
NASA Astrophysics Data System (ADS)
Zhou, J.; Li, G.; Liu, S.; Zhan, W.; Zhang, X.
2015-12-01
At present, land surface temperatures (LSTs) can be generated from thermal infrared remote sensing with spatial resolutions from ~100 m to tens of kilometers. However, LSTs with high spatial resolution, e.g. tens of meters, are still lacking. The purpose of LST downscaling is to generate LSTs with finer spatial resolutions than their native ones. Statistical linear or nonlinear regression models are most frequently used for LST downscaling. The basic assumption of these models is a scale-invariant relationship between LST and its descriptors, which has been questioned but rarely investigated. In addition, few studies have downscaled satellite LST or TIR data to a high spatial resolution, i.e. 100 m or finer. The lack of LST with high spatial resolution cannot satisfy the requirements of applications such as evapotranspiration mapping at the field scale. Selecting a dynamically developing agricultural oasis as the study area, the aim of this study is to downscale Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) LSTs to 15 m, to satisfy the requirements of evapotranspiration mapping at the field scale. Twelve ASTER images from May to September 2012, covering the entire growth stage of maize, were selected. Four statistical models were evaluated, including one global model, one piecewise model, and two local models. The influence of the scale effect on downscaling LST was quantified. The downscaled LSTs were evaluated for accuracy and image quality. Results demonstrate that the influence of the scale effect varies with the model and the maize growth stage. A significant influence of about -4 K to 6 K existed at the early stage, and a weaker influence existed in the middle stage. When compared with the ground-measured LSTs, the LSTs downscaled with the global and piecewise models yielded higher accuracies and better image quality than those from the local models. In addition to vegetation indices, surface albedo is an important descriptor for downscaling LST, as it explains spatial variation in LST induced by soil moisture.
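The regression-based downscaling idea can be sketched as follows (similar in spirit to TsHARP-type approaches, not the paper's exact global, piecewise, or local models): fit LST against descriptors aggregated to the coarse scale, apply the fitted relation at the fine scale, and add back the coarse-scale residual. All grids and coefficients below are synthetic.

```python
# Statistical LST downscaling: coarse-scale regression applied at the fine scale.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(16)
coarse, factor = 40, 6                       # 40x40 coarse cells, each split 6x6
fine = coarse * factor

def block_mean(img):                         # aggregate a fine grid to the coarse grid
    return img.reshape(coarse, factor, coarse, factor).mean(axis=(1, 3))

ndvi = np.clip(rng.normal(0.4, 0.15, (fine, fine)), 0, 1)
albedo = np.clip(rng.normal(0.2, 0.05, (fine, fine)), 0, 1)
# synthetic coarse LST driven by the aggregated descriptors plus noise
lst_coarse = 310 - 12 * block_mean(ndvi) + 20 * block_mean(albedo) \
             + rng.normal(0, 0.5, (coarse, coarse))

X_coarse = np.column_stack([block_mean(ndvi).ravel(), block_mean(albedo).ravel()])
reg = LinearRegression().fit(X_coarse, lst_coarse.ravel())
residual = lst_coarse - reg.predict(X_coarse).reshape(coarse, coarse)

X_fine = np.column_stack([ndvi.ravel(), albedo.ravel()])
lst_fine = reg.predict(X_fine).reshape(fine, fine) \
           + np.kron(residual, np.ones((factor, factor)))   # add back coarse residual
print("downscaled grid:", lst_fine.shape, " mean LST:", lst_fine.mean().round(1), "K")
```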
Strong gravitational lensing statistics as a test of cosmogonic scenarios
NASA Technical Reports Server (NTRS)
Cen, Renyue; Gott, J. Richard, III; Ostriker, Jeremiah P.; Turner, Edwin L.
1994-01-01
Gravitational lensing statistics can provide a direct and powerful test of cosmic structure formation theories. Since lensing directly tests the magnitude of the nonlinear mass density fluctuations on lines of sight to distant objects, no issues of 'bias' (of mass fluctuations with respect to galaxy density fluctuations) exist here, although lensing observations provide their own ambiguities of interpretation. We develop numerical techniques for generating model density distributions with the very large spatial dynamic range required by lensing considerations and for identifying regions of the simulations capable of multiple image lensing in a conservative and computationally efficient way that should be accurate for splittings significantly larger than 3 seconds. Applying these techniques to existing standard cold dark matter (CDM) (Ω = 1) and Primeval Baryon Isocurvature (PBI) (Ω = 0.2) simulations (normalized to the Cosmic Background Explorer Satellite (COBE) amplitude), we find that the CDM model predicts large splitting (greater than 8 seconds) lensing events roughly an order of magnitude more frequently than the PBI model. Under the reasonable but idealized assumption that lensing structures can be modeled as singular isothermal spheres (SIS), the predictions can be directly compared to observations of lensing events in quasar samples. Several large splitting (Δθ greater than 8 seconds) cases are predicted in the standard CDM model (the exact number being dependent on the treatment of amplification bias), whereas none is observed. In a formal sense, the comparison excludes the CDM model at high confidence (essentially for the same reason that CDM predicts excessive small-scale cosmic velocity dispersions). A very rough assessment of a low-density but flat CDM model (Ω = 0.3, Λ/(3H₀²) = 0.7) indicates a far lower and probably acceptable level of lensing. The PBI model is consistent with, but not strongly tested by, the available lensing data, and other open models would presumably do as well as PBI. These preliminary conclusions and the assumptions on which they are based can be tested, and the analysis can be applied to other cosmogonic models by straightforward extension of the work presented here.
Exploring Model Error through Post-processing and an Ensemble Kalman Filter on Fire Weather Days
NASA Astrophysics Data System (ADS)
Erickson, Michael J.
The proliferation of coupling atmospheric ensemble data to models in other related fields requires a priori knowledge of atmospheric ensemble biases specific to the desired application. In that spirit, this dissertation focuses on elucidating atmospheric ensemble model bias and error through a variety of different methods specific to fire weather days (FWDs) over the Northeast United States (NEUS). Other than a handful of studies that use models to predict fire indices for single fire seasons (Molders 2008, Simpson et al. 2014), an extensive exploration of model performance specific to FWDs has not been attempted. Two unique definitions for FWDs are proposed; one that uses pre-existing fire indices (FWD1) and another from a new statistical fire weather index (FWD2) relating fire occurrence and near-surface meteorological observations. Ensemble model verification reveals FWDs to have warmer (> 1 K), moister (~ 0.4 g kg-1) and less windy (~ 1 m s-1) biases than the climatological average for both FWD1 and FWD2. These biases are not restricted to the near surface but exist through the entirety of the planetary boundary layer (PBL). Furthermore, post-processing methods are more effective when previous FWDs are incorporated into the statistical training, suggesting that model bias could be related to the synoptic flow pattern. An Ensemble Kalman Filter (EnKF) is used to explore the effectiveness of data assimilation during a period of extensive FWDs in April 2012. Model biases develop rapidly on FWDs, consistent with the FWD1 and FWD2 verification. However, the EnKF is effective at removing most biases for temperature, wind speed and specific humidity. Potential sources of error in the parameterized physics of the PBL are explored by rerunning the EnKF with simultaneous state and parameter estimation (SSPE) for two relevant parameters within the ACM2 PBL scheme. SSPE helps to reduce the cool temperature bias near the surface on FWDs, with the variability in parameter estimates exhibiting some relationship to model bias for temperature. This suggests the potential for structural model error within the ACM2 PBL scheme and could lead toward the future development of improved PBL parameterizations.
Entraining IDyOT: Timing in the Information Dynamics of Thinking
Forth, Jamie; Agres, Kat; Purver, Matthew; Wiggins, Geraint A.
2016-01-01
We present a novel hypothetical account of entrainment in music and language, in context of the Information Dynamics of Thinking model, IDyOT. The extended model affords an alternative view of entrainment, and its companion term, pulse, from earlier accounts. The model is based on hierarchical, statistical prediction, modeling expectations of both what an event will be and when it will happen. As such, it constitutes a kind of predictive coding, with a particular novel hypothetical implementation. Here, we focus on the model's mechanism for predicting when a perceptual event will happen, given an existing sequence of past events, which may be musical or linguistic. We propose a range of tests to validate or falsify the model, at various different levels of abstraction, and argue that computational modeling in general, and this model in particular, can offer a means of providing limited but useful evidence for evolutionary hypotheses. PMID:27803682
2013-01-01
Background As a result of changes in climatic conditions and greater resistance to insecticides, many regions across the globe, including Colombia, have been facing a resurgence of vector-borne diseases, and dengue fever in particular. Timely information on both (1) the spatial distribution of the disease, and (2) prevailing vulnerabilities of the population are needed to adequately plan targeted preventive intervention. We propose a methodology for the spatial assessment of current socioeconomic vulnerabilities to dengue fever in Cali, a tropical urban environment of Colombia. Methods Based on a set of socioeconomic and demographic indicators derived from census data and ancillary geospatial datasets, we develop a spatial approach for both expert-based and purely statistical-based modeling of current vulnerability levels across 340 neighborhoods of the city using a Geographic Information System (GIS). The results of both approaches are comparatively evaluated by means of spatial statistics. A web-based approach is proposed to facilitate the visualization and the dissemination of the output vulnerability index to the community. Results The statistical and the expert-based modeling approach exhibit a high concordance, globally, and spatially. The expert-based approach indicates a slightly higher vulnerability mean (0.53) and vulnerability median (0.56) across all neighborhoods, compared to the purely statistical approach (mean = 0.48; median = 0.49). Both approaches reveal that high values of vulnerability tend to cluster in the eastern, north-eastern, and western part of the city. These are poor neighborhoods with high percentages of young (i.e., < 15 years) and illiterate residents, as well as a high proportion of individuals being either unemployed or doing housework. Conclusions Both modeling approaches reveal similar outputs, indicating that in the absence of local expertise, statistical approaches could be used, with caution. By decomposing identified vulnerability “hotspots” into their underlying factors, our approach provides valuable information on both (1) the location of neighborhoods, and (2) vulnerability factors that should be given priority in the context of targeted intervention strategies. The results support decision makers to allocate resources in a manner that may reduce existing susceptibilities and strengthen resilience, and thus help to reduce the burden of vector-borne diseases. PMID:23945265
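The contrast between the two approaches described (an expert-weighted composite index versus a purely statistical one) can be sketched as follows; the indicators, expert weights, and latent structure are made-up stand-ins, and the PCA-based index is only one of several possible statistical constructions.

```python
# Expert-weighted vs PCA-based composite vulnerability index (illustration).
import numpy as np
from scipy.stats import spearmanr
from sklearn.decomposition import PCA
from sklearn.preprocessing import minmax_scale

rng = np.random.default_rng(17)
n_neigh = 340
# synthetic indicators loading on one latent vulnerability factor
# (stand-ins for % young, % illiterate, % unemployed, % doing housework)
latent = rng.normal(size=n_neigh)
indicators = minmax_scale(latent[:, None] * np.array([0.8, 0.7, 0.6, 0.5])
                          + rng.normal(scale=0.5, size=(n_neigh, 4)))

expert_weights = np.array([0.3, 0.3, 0.2, 0.2])          # assumed expert judgement
expert_index = minmax_scale(indicators @ expert_weights)

pc1 = PCA(n_components=1).fit_transform(indicators).ravel()
statistical_index = minmax_scale(pc1)

rho, _ = spearmanr(expert_index, statistical_index)
print("rank agreement of the two indices (PC1 sign is arbitrary):", round(abs(rho), 2))
```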
NASA Astrophysics Data System (ADS)
Del Rio Amador, Lenin; Lovejoy, Shaun
2017-04-01
Over the past ten years, a key advance in our understanding of atmospheric variability has been the discovery that between the weather and climate regimes lies an intermediate "macroweather" regime, spanning the range of scales from ≈10 days to ≈30 years. Macroweather statistics are characterized by two fundamental symmetries: scaling and the factorization of the joint space-time statistics. In the time domain, the scaling has low intermittency, with the additional property that successive fluctuations tend to cancel. In space, on the contrary, the scaling has high (multifractal) intermittency corresponding to the existence of different climate zones. These properties have fundamental implications for macroweather forecasting: a) the temporal scaling implies that the system has a long-range memory that can be exploited for forecasting; b) the low temporal intermittency implies that mathematically well-established (Gaussian) forecasting techniques can be used; and c) the statistical factorization property implies that although spatial correlations (including teleconnections) may be large, if long enough time series are available they are not necessarily useful in improving forecasts. Theoretically, these conditions imply the existence of stochastic predictability limits; in this talk, we show that these limits apply to GCMs. Based on these statistical implications, we developed the Stochastic Seasonal and Interannual Prediction System (StocSIPS) for the prediction of temperature from regional to global scales and for horizons from one month to many years. One of the main components of StocSIPS is the separation and prediction of both the internal and the externally forced variability. In order to test the theoretical assumptions and their consequences for predictability and predictions, we use outputs from 41 different CMIP5 models' preindustrial control runs, which have fixed external forcings so that their variability is purely internally generated. We first show that these statistical assumptions hold with relatively good accuracy, and then we perform hindcasts at global and regional scales from monthly to annual time resolutions using StocSIPS. We obtain excellent agreement between the hindcast Mean Square Skill Score (MSSS) and the theoretical stochastic limits. We also show the application of StocSIPS to the prediction of average global temperature and compare our results with those obtained using multi-model ensemble approaches. StocSIPS has numerous advantages, including a) higher MSSS for long time horizons, b) convergence to the real (not model) climate, c) much higher computational speed, d) no need for data assimilation, e) no ad hoc post-processing, and f) no need for downscaling.
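The core statistical idea, exploiting long-range temporal memory with Gaussian (linear) prediction, can be sketched as follows using the autocovariance of fractional Gaussian noise; the Hurst exponent, memory length, and input anomalies are illustrative assumptions, and this is not the StocSIPS implementation.

```python
# Optimal linear one-step predictor for a long-memory (fGn) process.
import numpy as np
from scipy.linalg import solve_toeplitz

def fgn_autocov(n_lags, H, sigma2=1.0):
    # autocovariance of fractional Gaussian noise at lags 0..n_lags-1
    k = np.arange(n_lags)
    return 0.5 * sigma2 * (np.abs(k + 1) ** (2 * H) - 2 * np.abs(k) ** (2 * H)
                           + np.abs(k - 1) ** (2 * H))

H, memory = 0.8, 50                                # H > 0.5: persistent long-range memory
gamma = fgn_autocov(memory + 1, H)
# predictor weights w solve the Yule-Walker-type Toeplitz system Gamma w = gamma[1:]
w = solve_toeplitz(gamma[:memory], gamma[1:memory + 1])

past = np.random.default_rng(18).standard_normal(memory)   # stand-in temperature anomalies
forecast = w @ past[::-1]                          # most recent value weighted first
print("sum of predictor weights:", w.sum().round(2), " one-step forecast:", forecast.round(2))
```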
Fonseca, Carissa G; Backhaus, Michael; Bluemke, David A; Britten, Randall D; Chung, Jae Do; Cowan, Brett R; Dinov, Ivo D; Finn, J Paul; Hunter, Peter J; Kadish, Alan H; Lee, Daniel C; Lima, Joao A C; Medrano-Gracia, Pau; Shivkumar, Kalyanam; Suinesiaputra, Avan; Tao, Wenchao; Young, Alistair A
2011-08-15
Integrative mathematical and statistical models of cardiac anatomy and physiology can play a vital role in understanding cardiac disease phenotype and planning therapeutic strategies. However, the accuracy and predictive power of such models are dependent upon the breadth and depth of noninvasive imaging datasets. The Cardiac Atlas Project (CAP) has established a large-scale database of cardiac imaging examinations and associated clinical data in order to develop a shareable, web-accessible, structural and functional atlas of the normal and pathological heart for clinical, research and educational purposes. A goal of CAP is to facilitate collaborative statistical analysis of regional heart shape and wall motion and to characterize cardiac function among and within population groups. Three main open-source software components were developed: (i) a database with web interface; (ii) a modeling client for 3D + time visualization and parametric description of shape and motion; and (iii) open data formats for semantic characterization of models and annotations. The database was implemented using a three-tier architecture utilizing MySQL, JBoss and Dcm4chee, in compliance with the DICOM standard to provide compatibility with existing clinical networks and devices. Parts of Dcm4chee were extended to access image-specific attributes as search parameters. To date, approximately 3000 de-identified cardiac imaging examinations are available in the database. All software components developed by the CAP are open source and are freely available under the Mozilla Public License Version 1.1 (http://www.mozilla.org/MPL/MPL-1.1.txt). Availability: http://www.cardiacatlas.org. Contact: a.young@auckland.ac.nz. Supplementary data are available at Bioinformatics online.
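For readers unfamiliar with DICOM attribute access, the short sketch below shows how image-specific attributes of the kind mentioned above could be read and used as search fields. It relies on the pydicom library and its bundled test file; it is only an illustration of attribute access, not code from the CAP database.

```python
import pydicom
from pydicom.data import get_testdata_file

# Read a sample DICOM file bundled with pydicom and pull out image-specific
# attributes that could be exposed as search parameters (illustration only).
path = get_testdata_file("CT_small.dcm")
ds = pydicom.dcmread(path)

search_fields = {
    "Modality": ds.get("Modality"),
    "SeriesDescription": ds.get("SeriesDescription"),
    "SliceThickness": ds.get("SliceThickness"),
    "StudyDate": ds.get("StudyDate"),
}
print(search_fields)
```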
NASA Astrophysics Data System (ADS)
Walz, M. A.; Donat, M.; Leckebusch, G. C.
2017-12-01
As extreme wind speeds are responsible for large socio-economic losses in Europe, a skillful prediction would be of great benefit for disaster prevention as well as for the actuarial community. Here we evaluate patterns of large-scale atmospheric variability and the seasonal predictability of extreme wind speeds (e.g., >95th percentile) in the European domain in the dynamical seasonal forecast system ECMWF System 4, and compare it to the predictability obtained from a statistical prediction model. The dominant patterns of atmospheric variability show distinct differences between reanalysis and ECMWF System 4, with most patterns in System 4 extended downstream in comparison to ERA-Interim. The dissimilar manifestations of the patterns within the two models lead to substantially different drivers associated with the occurrence of extreme winds in the respective model. While ECMWF System 4 is shown to provide some predictive power over Scandinavia and the eastern Atlantic, only very few grid cells in the European domain have significant correlations for extreme wind speeds in System 4 compared to ERA-Interim. In contrast, a statistical model predicts extreme wind speeds during boreal winter in better agreement with the observations. Our results suggest that System 4 does not capture the potential predictability of extreme winds that exists in the real world, and therefore fails to provide reliable seasonal predictions for lead months 2-4. This is likely related to the unrealistic representation of large-scale patterns of atmospheric variability. Hence our study points to potential improvements of dynamical prediction skill through an improved simulation of large-scale atmospheric dynamics.
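As a schematic of how seasonal skill for such extremes can be quantified, the sketch below counts exceedances of the 95th percentile in synthetic "observed" and "forecast" winter wind speeds and correlates the per-season counts; the data and the verification setup are placeholders, not those of the study.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)

# Synthetic daily winter wind speeds at one grid cell over 30 seasons,
# standing in for ERA-Interim "observations" and a seasonal forecast.
n_seasons, n_days = 30, 90
obs = rng.weibull(2.0, size=(n_seasons, n_days)) * 10.0
fcst = obs + rng.normal(0.0, 3.0, size=(n_seasons, n_days))   # noisy forecast

# Extreme wind events: exceedances of the 95th percentile of the observed
# climatology (mirroring the ">95th percentile" definition above).
thr = np.percentile(obs, 95)
obs_counts = (obs > thr).sum(axis=1)
fcst_counts = (fcst > thr).sum(axis=1)

# Seasonal skill for extremes: correlation of per-season exceedance counts.
r, p = pearsonr(fcst_counts, obs_counts)
print(f"correlation of seasonal exceedance counts: r={r:.2f} (p={p:.3f})")
```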
SparRec: An effective matrix completion framework of missing data imputation for GWAS
NASA Astrophysics Data System (ADS)
Jiang, Bo; Ma, Shiqian; Causey, Jason; Qiao, Linbo; Hardin, Matthew Price; Bitts, Ian; Johnson, Daniel; Zhang, Shuzhong; Huang, Xiuzhen
2016-10-01
Genome-wide association studies present computational challenges for missing data imputation, as advances in genotyping technologies generate datasets with large sample sizes in which sample sets are genotyped on multiple SNP chips. We present a new framework, SparRec (Sparse Recovery), for imputation, with the following properties: (1) The optimization models of SparRec, based on low rank and a low number of co-clusters of matrices, differ from current statistical methods. While our low-rank matrix completion (LRMC) model is similar to Mendel-Impute, our matrix co-clustering factorization (MCCF) model is completely new. (2) SparRec, like other matrix completion methods, can be flexibly applied to missing data imputation for large meta-analyses with different cohorts genotyped on different sets of SNPs, even when there is no reference panel. This kind of meta-analysis is very challenging for current statistics-based methods. (3) SparRec has consistent performance and achieves high recovery accuracy even when the missing data rate is as high as 90%. Compared with Mendel-Impute, our low-rank based method achieves similar accuracy and efficiency, while the co-clustering based method has advantages in running time. The testing results show that SparRec has significant advantages and competitive performance over other state-of-the-art statistical methods, including Beagle and fastPhase.
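The low-rank matrix completion idea can be illustrated with a generic iterative SVD-projection imputer (sometimes called hard-impute). This is not SparRec's LRMC or MCCF optimization model; the toy genotype data, rank choice, and function name below are assumptions.

```python
import numpy as np

def lrmc_impute(G, rank=2, n_iter=100):
    """Minimal low-rank matrix completion by iterative rank-r SVD projection
    ("hard-impute"). G holds genotype dosages 0/1/2 with np.nan for missing
    entries. This is a generic illustration, not the SparRec algorithm.
    """
    mask = ~np.isnan(G)
    col_mean = np.nanmean(G, axis=0)
    X = np.where(mask, G, col_mean)          # start from column means
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X = np.where(mask, G, low_rank)      # keep observed entries fixed
    return np.clip(np.rint(X), 0, 2)

# Toy genotype matrix generated from two ancestral clusters (approximately
# low rank), with 30% of entries masked as missing at random.
rng = np.random.default_rng(3)
freqs = rng.random((2, 40))                  # allele frequencies per cluster
labels = rng.integers(0, 2, size=50)         # cluster label per sample
true = rng.binomial(2, freqs[labels]).astype(float)
G = true.copy()
G[rng.random(G.shape) < 0.3] = np.nan
imputed = lrmc_impute(G, rank=2)
missing = np.isnan(G)
print(f"accuracy on masked entries: {(imputed[missing] == true[missing]).mean():.2f}")
```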
Grand canonical validation of the bipartite international trade network.
Straka, Mika J; Caldarelli, Guido; Saracco, Fabio
2017-08-01
Devising strategies for economic development in a globally competitive landscape requires a solid and unbiased understanding of countries' technological advancements and similarities among export products. Both can be addressed through the bipartite representation of the International Trade Network. In this paper, we apply the recently proposed grand canonical projection algorithm to uncover country and product communities. Contrary to past endeavors, our methodology, based on information theory, creates monopartite projections in an unbiased and analytically tractable way. Single links between countries or products represent statistically significant signals, which are not accounted for by null models such as the bipartite configuration model. We find stable country communities reflecting the socioeconomic distinction in developed, newly industrialized, and developing countries. Furthermore, we observe product clusters based on the aforementioned country groups. Our analysis reveals the existence of a complicated structure in the bipartite International Trade Network: apart from the diversification of export baskets from the most basic to the most exclusive products, we observe a statistically significant signal of an export specialization mechanism towards more sophisticated products.
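A much simplified stand-in for this kind of validated projection is sketched below: it links two countries only when the number of products they co-export is unlikely under a basic hypergeometric null. The actual method uses the bipartite configuration model within a grand canonical framework, so the null model, significance threshold, and synthetic trade matrix here are illustrative assumptions only.

```python
import numpy as np
from scipy.stats import hypergeom

rng = np.random.default_rng(4)

# Synthetic binary country x product export matrix (1 = country exports product).
n_countries, n_products = 20, 60
M = (rng.random((n_countries, n_products)) < 0.25).astype(int)

n_prod_per_country = M.sum(axis=1)
alpha = 0.01   # significance level (before any multiple-testing correction)

# Validated country-country projection: keep a link only if the observed number
# of co-exported products is unlikely under a hypergeometric null in which one
# country's export basket is drawn at random from all products.
validated = np.zeros((n_countries, n_countries), dtype=bool)
for i in range(n_countries):
    for j in range(i + 1, n_countries):
        co = int((M[i] & M[j]).sum())
        p = hypergeom.sf(co - 1, n_products,
                         n_prod_per_country[i], n_prod_per_country[j])
        validated[i, j] = validated[j, i] = p < alpha

print(f"validated links: {validated.sum() // 2} "
      f"of {n_countries * (n_countries - 1) // 2}")
```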
The intersection of aggregate-level lead exposure and crime.
Boutwell, Brian B; Nelson, Erik J; Emo, Brett; Vaughn, Michael G; Schootman, Mario; Rosenfeld, Richard; Lewis, Roger
2016-07-01
Childhood lead exposure has been associated with criminal behavior later in life. The current study aimed to analyze the association between elevated blood lead levels (n=59,645) and crime occurrence (n=90,433) across census tracts within St. Louis, Missouri. The study used a longitudinal ecological design set in Saint Louis, Missouri, with blood lead levels as the exposure and violent, non-violent, and total crime at the census-tract level as the outcomes. Spatial statistical models were used to account for the spatial autocorrelation of the data. Greater lead exposure at the census-tract level was associated with increased violent, non-violent, and total crime. In addition, we examined whether non-additive effects existed in the data by testing for an interaction between lead exposure and concentrated disadvantage. Some evidence of a negative interaction emerged; however, it failed to reach traditional levels of statistical significance (supplementary models, though, revealed a similar negative interaction that was significant). More precise measurements of lead exposure in the aggregate produced additional evidence that lead is a potent predictor of criminal outcomes. Copyright © 2016 Elsevier Inc. All rights reserved.
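A standard first diagnostic for the spatial autocorrelation such models must account for is Moran's I; the sketch below computes it for synthetic tract-level crime counts on a toy contiguity grid. The weights matrix, data, and the use of Moran's I itself are illustrative assumptions, not the study's spatial models.

```python
import numpy as np

def morans_i(values, W):
    """Moran's I for a variable observed over areal units.

    W is a spatial weights matrix (here row-standardized); values holds one
    observation per census tract.
    """
    x = np.asarray(values, dtype=float)
    z = x - x.mean()
    n = len(x)
    num = (W * np.outer(z, z)).sum()
    return (n / W.sum()) * (num / (z @ z))

# Toy example: 25 tracts on a 5x5 grid with rook-contiguity weights.
side = 5
n = side * side
W = np.zeros((n, n))
for r in range(side):
    for c in range(side):
        i = r * side + c
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < side and 0 <= cc < side:
                W[i, rr * side + cc] = 1
W = W / W.sum(axis=1, keepdims=True)            # row-standardize

rng = np.random.default_rng(5)
crime = rng.poisson(20, size=n).astype(float)   # synthetic tract-level counts
print(f"Moran's I: {morans_i(crime, W):.3f}")
```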
Algorithms for constructing optimal paths and statistical analysis of passenger traffic
NASA Astrophysics Data System (ADS)
Trofimov, S. P.; Druzhinina, N. G.; Trofimova, O. G.
2018-01-01
Several existing information systems for urban passenger transport (UPT) are considered, and the authors' UPT network model is presented. A new service is offered to passengers: the best path from one stop to another at a specified time. The algorithm and its software implementation for finding the optimal path are presented; the algorithm uses the current UPT schedule. The article also describes an algorithm for the statistical analysis of trip payments made with electronic E-cards, which yields the density of passenger traffic over the course of the day. This density is independent of the network topology and UPT schedules. The resulting traffic-flow density can be used to solve a number of practical problems, in particular forecasting the overcrowding of passenger transport during rush hours, quantitatively comparing transport networks with different topologies, and constructing the best UPT timetable. The efficiency of the proposed integrated approach is demonstrated with the example of a model town of arbitrary dimensions.
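A generic earliest-arrival query on a timetable, in the spirit of the service described above, can be sketched as a Dijkstra-style search over scheduled connections; the data structure, function name, and toy timetable below are assumptions rather than the authors' implementation.

```python
import heapq
from collections import defaultdict

def earliest_arrival(timetable, origin, destination, start_time):
    """Timetable-based earliest-arrival search (Dijkstra-style).

    timetable: list of (dep_stop, dep_time, arr_stop, arr_time, route) tuples,
    with times in minutes. This is a generic sketch of an optimal-path query
    on a schedule, not the authors' exact algorithm.
    """
    by_stop = defaultdict(list)
    for dep_stop, dep_time, arr_stop, arr_time, route in timetable:
        by_stop[dep_stop].append((dep_time, arr_stop, arr_time, route))

    best = {origin: start_time}
    parent = {}
    heap = [(start_time, origin)]
    while heap:
        t, stop = heapq.heappop(heap)
        if stop == destination:
            break
        if t > best.get(stop, float("inf")):
            continue
        for dep_time, arr_stop, arr_time, route in by_stop[stop]:
            if dep_time >= t and arr_time < best.get(arr_stop, float("inf")):
                best[arr_stop] = arr_time
                parent[arr_stop] = (stop, route, dep_time, arr_time)
                heapq.heappush(heap, (arr_time, arr_stop))

    if destination not in best:
        return None
    legs, node = [], destination          # reconstruct the legs of the path
    while node != origin:
        prev, route, dep, arr = parent[node]
        legs.append((prev, node, route, dep, arr))
        node = prev
    return best[destination], list(reversed(legs))

# Toy timetable: three stops, three runs, times in minutes after midnight.
timetable = [
    ("A", 480, "B", 495, "bus 1"),
    ("B", 500, "C", 520, "bus 2"),
    ("A", 485, "C", 540, "bus 3"),   # direct but slower
]
print(earliest_arrival(timetable, "A", "C", 475))
```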
Statistical analysis of Hasegawa-Wakatani turbulence
NASA Astrophysics Data System (ADS)
Anderson, Johan; Hnat, Bogdan
2017-06-01
Resistive drift wave turbulence is a multipurpose paradigm that can be used to understand transport at the edge of fusion devices. The Hasegawa-Wakatani model captures the essential physics of drift turbulence while retaining the simplicity needed to gain a qualitative understanding of this process. We provide a theoretical interpretation of numerically generated probability density functions (PDFs) of intermittent events in Hasegawa-Wakatani turbulence with enforced equipartition of energy between large-scale zonal flows and small-scale drift turbulence. We find that for a wide range of adiabatic index values, the stochastic component representing the small-scale turbulent eddies of the flow, obtained from the autoregressive integrated moving average model, exhibits super-diffusive statistics, consistent with intermittent transport. The PDFs of large events (above one standard deviation) are well approximated by the Laplace distribution, while small events often exhibit a Gaussian character. Furthermore, there is a strong influence of zonal flows, for example via shearing and subsequent viscous dissipation, which maintains a sub-diffusive character of the fluxes.
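The tail diagnostic described above, Laplace-like large events versus Gaussian small events, can be illustrated on a synthetic heavy-tailed series as follows; the data are not Hasegawa-Wakatani output, and the likelihood comparison is only one simple way to contrast the two candidate distributions.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for an intermittent flux time series (not output of a
# Hasegawa-Wakatani simulation): heavy-tailed noise mimicking bursty transport.
rng = np.random.default_rng(6)
flux = stats.laplace.rvs(scale=1.0, size=20000, random_state=rng)

# Standardize and split events by size: "large" events exceed one standard
# deviation, as in the diagnostic described above.
z = (flux - flux.mean()) / flux.std()
large = z[np.abs(z) > 1.0]

# Contrast Laplace vs Gaussian descriptions: fit each distribution to the full
# series, then compare log-likelihoods evaluated on the large events only.
for name, dist in (("Laplace", stats.laplace), ("Gaussian", stats.norm)):
    params = dist.fit(z)
    ll = dist.logpdf(large, *params).sum()
    print(f"{name:8s} log-likelihood on events beyond 1 std: {ll:.1f}")
```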
NASA Astrophysics Data System (ADS)
Chakraborty, Arup
No medical procedure has saved more lives than vaccination. But today some pathogens have evolved that defy successful vaccination using the empirical paradigms pioneered by Pasteur and Jenner. One characteristic of many pathogens for which successful vaccines do not exist is that they present themselves in various guises. HIV is an extreme example because of its high mutability. This highly mutable virus can evade natural or vaccine-induced immune responses, often by mutating at multiple sites linked by compensatory interactions. I will first describe how, by bringing to bear ideas from statistical physics (e.g., maximum entropy models, Hopfield models, Feynman variational theory) together with in vitro experiments and clinical data, the fitness landscape of HIV is beginning to be defined, with explicit account of collective mutational pathways. I will then describe how this knowledge can be harnessed for vaccine design. Finally, I will describe how ideas at the intersection of evolutionary biology, immunology, and statistical physics can help guide the design of strategies that may be able to induce broadly neutralizing antibodies.
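A minimal sketch of the maximum-entropy (Ising-like) energies that underlie such fitness-landscape inference is shown below; the fields and couplings are random placeholders rather than parameters inferred from HIV sequence data, and lower energy is read as higher inferred fitness only by convention.

```python
import numpy as np

def sequence_energy(s, h, J):
    """Ising-like maximum-entropy 'energy' of a binary sequence s
    (0 = wild type, 1 = mutant at each site):
        E(s) = -sum_i h_i s_i - sum_{i<j} J_ij s_i s_j
    In real applications h and J are inferred from sequence data; here they
    are random placeholders used purely for illustration.
    """
    s = np.asarray(s, dtype=float)
    return -(h @ s) - 0.5 * (s @ J @ s)   # J symmetric with zero diagonal

rng = np.random.default_rng(7)
L = 10                                    # toy sequence length
h = rng.normal(0.0, 1.0, L)
J = np.triu(rng.normal(0.0, 0.3, (L, L)), 1)
J = J + J.T                               # symmetric couplings, zero diagonal

wild_type = np.zeros(L, dtype=int)
single_mut = wild_type.copy(); single_mut[3] = 1
double_mut = single_mut.copy(); double_mut[7] = 1   # a possibly compensatory pair

for name, seq in (("wild type", wild_type),
                  ("single mutant", single_mut),
                  ("double mutant", double_mut)):
    print(f"{name:13s} E = {sequence_energy(seq, h, J):+.2f}")
```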
Lamain-de Ruiter, Marije; Kwee, Anneke; Naaktgeboren, Christiana A; de Groot, Inge; Evers, Inge M; Groenendaal, Floris; Hering, Yolanda R; Huisjes, Anjoke J M; Kirpestein, Cornel; Monincx, Wilma M; Siljee, Jacqueline E; Van 't Zelfde, Annewil; van Oirschot, Charlotte M; Vankan-Buitelaar, Simone A; Vonk, Mariska A A W; Wiegers, Therese A; Zwart, Joost J; Franx, Arie; Moons, Karel G M; Koster, Maria P H
2016-08-30
Objective: To perform an external validation and direct comparison of published prognostic models for early prediction of the risk of gestational diabetes mellitus, including predictors applicable in the first trimester of pregnancy.
Design: External validation of all published prognostic models in a large-scale, prospective, multicentre cohort study.
Setting: 31 independent midwifery practices and six hospitals in the Netherlands.
Participants: Women recruited in their first trimester (<14 weeks) of pregnancy between December 2012 and January 2014, at their initial prenatal visit. Women with pre-existing diabetes mellitus of any type were excluded.
Main outcome measures: Discrimination of the prognostic models was assessed by the C statistic, and calibration was assessed by calibration plots.
Results: 3723 women were included for analysis, of whom 181 (4.9%) developed gestational diabetes mellitus in pregnancy. 12 prognostic models for the disorder could be validated in the cohort. C statistics ranged from 0.67 to 0.78. Calibration plots showed that eight of the 12 models were well calibrated. The four models with the highest C statistics included almost all of the following predictors: maternal age, maternal body mass index, history of gestational diabetes mellitus, ethnicity, and family history of diabetes. The prognostic models had similar performance in a subgroup of nulliparous women only. Decision curve analysis showed that the use of these four models always had a positive net benefit.
Conclusions: In this external validation study, most of the published prognostic models for gestational diabetes mellitus show acceptable discrimination and calibration. The four models with the highest discriminative abilities in this study cohort, which also perform well in a subgroup of nulliparous women, are easy to apply in clinical practice and therefore deserve further evaluation regarding their clinical impact. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
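For concreteness, the sketch below computes the two validation quantities used here, the C statistic (area under the ROC curve) and decile-wise calibration, on synthetic predicted risks and outcomes; the data, risk distribution, and binning are assumptions, not the study's cohort.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(8)

# Synthetic external-validation set: predicted first-trimester risks from some
# prognostic model and observed outcomes (placeholders, not the study data).
n = 3723
pred_risk = np.clip(rng.beta(1.2, 20, size=n), 1e-4, 1 - 1e-4)
outcome = rng.binomial(1, pred_risk)      # outcomes roughly consistent with risk

# Discrimination: the C statistic equals the area under the ROC curve.
c_stat = roc_auc_score(outcome, pred_risk)
print(f"C statistic: {c_stat:.2f}")

# Calibration: compare mean predicted vs observed risk within risk deciles
# (these are the points one would draw on a calibration plot).
deciles = np.quantile(pred_risk, np.linspace(0, 1, 11))
bins = np.clip(np.digitize(pred_risk, deciles[1:-1]), 0, 9)
for b in range(10):
    in_bin = bins == b
    print(f"decile {b + 1}: predicted {pred_risk[in_bin].mean():.3f}, "
          f"observed {outcome[in_bin].mean():.3f}")
```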
Universality in chaos: Lyapunov spectrum and random matrix theory.
Hanada, Masanori; Shimada, Hidehiko; Tezuka, Masaki
2018-02-01
We propose the existence of a new universality in classical chaotic systems when the number of degrees of freedom is large: the statistical properties of the Lyapunov spectrum are described by random matrix theory. We demonstrate this by studying the finite-time Lyapunov exponents of the matrix model of a stringy black hole and of mass-deformed models. The massless limit, which has a dual string theory interpretation, is special in that the universal behavior can be seen already at t=0, while in the other cases it sets in at late times. The same pattern is demonstrated also in the product of random matrices.
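A compact way to explore this kind of analysis numerically is to obtain finite-time Lyapunov exponents from successive QR decompositions and then examine nearest-neighbour spacing ratios, which avoid spectral unfolding. The sketch below does this for a product of random matrices; the matrix size, time window, and quoted benchmark values are assumptions for illustration, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(9)

# Finite-time Lyapunov exponents of a product of random matrices, computed
# stably via successive QR decompositions.
N, T = 30, 20
log_diag = np.zeros(N)
Q = np.eye(N)
for _ in range(T):
    A = rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))   # random "Jacobian" step
    Q, R = np.linalg.qr(A @ Q)
    log_diag += np.log(np.abs(np.diag(R)))
lyap = np.sort(log_diag / T)[::-1]                  # finite-time spectrum

# Nearest-neighbour spacing ratios avoid the need for unfolding. As rough
# benchmarks, the mean ratio is about 0.39 for uncorrelated (Poisson) levels
# and about 0.53 for GOE random-matrix statistics.
spacings = np.diff(np.sort(lyap))
ratios = (np.minimum(spacings[:-1], spacings[1:])
          / np.maximum(spacings[:-1], spacings[1:]))
print(f"mean spacing ratio: {ratios.mean():.3f}")
```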