Evaluation of Regression Models of Balance Calibration Data Using an Empirical Criterion
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert; Volden, Thomas R.
2012-01-01
An empirical criterion for assessing the significance of individual terms of regression models of wind tunnel strain gage balance outputs is evaluated. The criterion is based on the percent contribution of a regression model term. It considers a term to be significant if its percent contribution exceeds the empirical threshold of 0.05%. The criterion has the advantage that it can easily be computed using the regression coefficients of the gage outputs and the load capacities of the balance. First, a definition of the empirical criterion is provided. Then, it is compared with an alternate statistical criterion that is widely used in regression analysis. Finally, calibration data sets from a variety of balances are used to illustrate the connection between the empirical and the statistical criterion. A review of these results indicated that the empirical criterion seems to be suitable for a crude assessment of the significance of a regression model term as the boundary between a significant and an insignificant term cannot be defined very well. Therefore, regression model term reduction should only be performed by using the more universally applicable statistical criterion.
Statistical field theory of futures commodity prices
NASA Astrophysics Data System (ADS)
Baaquie, Belal E.; Yu, Miao
2018-02-01
The statistical theory of commodity prices has been formulated by Baaquie (2013). Further empirical studies of single (Baaquie et al., 2015) and multiple commodity prices (Baaquie et al., 2016) have provided strong evidence in support the primary assumptions of the statistical formulation. In this paper, the model for spot prices (Baaquie, 2013) is extended to model futures commodity prices using a statistical field theory of futures commodity prices. The futures prices are modeled as a two dimensional statistical field and a nonlinear Lagrangian is postulated. Empirical studies provide clear evidence in support of the model, with many nontrivial features of the model finding unexpected support from market data.
Yuan, Ke-Hai; Tian, Yubin; Yanagihara, Hirokazu
2015-06-01
Survey data typically contain many variables. Structural equation modeling (SEM) is commonly used in analyzing such data. The most widely used statistic for evaluating the adequacy of a SEM model is T ML, a slight modification to the likelihood ratio statistic. Under normality assumption, T ML approximately follows a chi-square distribution when the number of observations (N) is large and the number of items or variables (p) is small. However, in practice, p can be rather large while N is always limited due to not having enough participants. Even with a relatively large N, empirical results show that T ML rejects the correct model too often when p is not too small. Various corrections to T ML have been proposed, but they are mostly heuristic. Following the principle of the Bartlett correction, this paper proposes an empirical approach to correct T ML so that the mean of the resulting statistic approximately equals the degrees of freedom of the nominal chi-square distribution. Results show that empirically corrected statistics follow the nominal chi-square distribution much more closely than previously proposed corrections to T ML, and they control type I errors reasonably well whenever N ≥ max(50,2p). The formulations of the empirically corrected statistics are further used to predict type I errors of T ML as reported in the literature, and they perform well.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Erickson, Paul A.; Liao, Chang-hsien
2007-11-15
A passive flow disturbance has been proven to enhance the conversion of fuel in a methanol-steam reformer. This study presents a statistical validation of the experiment based on a standard 2{sup k} factorial experiment design and the resulting empirical model of the enhanced hydrogen producing process. A factorial experiment design was used to statistically analyze the effects and interactions of various input factors in the experiment. Three input factors, including the number of flow disturbers, catalyst size, and reactant flow rate were investigated for their effects on the fuel conversion in the steam-reformation process. Based on the experimental results, anmore » empirical model was developed and further evaluated with an uncertainty analysis and interior point data. (author)« less
NASA Astrophysics Data System (ADS)
Cincotti, Silvano; Ponta, Linda; Raberto, Marco; Scalas, Enrico
2005-05-01
In this paper, empirical analyses and computational experiments are presented on high-frequency data for a double-auction (book) market. Main objective of the paper is to generalize the order waiting time process in order to properly model such empirical evidences. The empirical study is performed on the best bid and best ask data of 7 U.S. financial markets, for 30-stock time series. In particular, statistical properties of trading waiting times have been analyzed and quality of fits is evaluated by suitable statistical tests, i.e., comparing empirical distributions with theoretical models. Starting from the statistical studies on real data, attention has been focused on the reproducibility of such results in an artificial market. The computational experiments have been performed within the Genoa Artificial Stock Market. In the market model, heterogeneous agents trade one risky asset in exchange for cash. Agents have zero intelligence and issue random limit or market orders depending on their budget constraints. The price is cleared by means of a limit order book. The order generation is modelled with a renewal process. Based on empirical trading estimation, the distribution of waiting times between two consecutive orders is modelled by a mixture of exponential processes. Results show that the empirical waiting-time distribution can be considered as a generalization of a Poisson process. Moreover, the renewal process can approximate real data and implementation on the artificial stocks market can reproduce the trading activity in a realistic way.
The beta distribution: A statistical model for world cloud cover
NASA Technical Reports Server (NTRS)
Falls, L. W.
1973-01-01
Much work has been performed in developing empirical global cloud cover models. This investigation was made to determine an underlying theoretical statistical distribution to represent worldwide cloud cover. The beta distribution with probability density function is given to represent the variability of this random variable. It is shown that the beta distribution possesses the versatile statistical characteristics necessary to assume the wide variety of shapes exhibited by cloud cover. A total of 160 representative empirical cloud cover distributions were investigated and the conclusion was reached that this study provides sufficient statical evidence to accept the beta probability distribution as the underlying model for world cloud cover.
Testing Transitivity of Preferences on Two-Alternative Forced Choice Data
Regenwetter, Michel; Dana, Jason; Davis-Stober, Clintin P.
2010-01-01
As Duncan Luce and other prominent scholars have pointed out on several occasions, testing algebraic models against empirical data raises difficult conceptual, mathematical, and statistical challenges. Empirical data often result from statistical sampling processes, whereas algebraic theories are nonprobabilistic. Many probabilistic specifications lead to statistical boundary problems and are subject to nontrivial order constrained statistical inference. The present paper discusses Luce's challenge for a particularly prominent axiom: Transitivity. The axiom of transitivity is a central component in many algebraic theories of preference and choice. We offer the currently most complete solution to the challenge in the case of transitivity of binary preference on the theory side and two-alternative forced choice on the empirical side, explicitly for up to five, and implicitly for up to seven, choice alternatives. We also discuss the relationship between our proposed solution and weak stochastic transitivity. We recommend to abandon the latter as a model of transitive individual preferences. PMID:21833217
NASA Astrophysics Data System (ADS)
Chen, Yue; Cunningham, Gregory; Henderson, Michael
2016-09-01
This study aims to statistically estimate the errors in local magnetic field directions that are derived from electron directional distributions measured by Los Alamos National Laboratory geosynchronous (LANL GEO) satellites. First, by comparing derived and measured magnetic field directions along the GEO orbit to those calculated from three selected empirical global magnetic field models (including a static Olson and Pfitzer 1977 quiet magnetic field model, a simple dynamic Tsyganenko 1989 model, and a sophisticated dynamic Tsyganenko 2001 storm model), it is shown that the errors in both derived and modeled directions are at least comparable. Second, using a newly developed proxy method as well as comparing results from empirical models, we are able to provide for the first time circumstantial evidence showing that derived magnetic field directions should statistically match the real magnetic directions better, with averaged errors < ˜ 2°, than those from the three empirical models with averaged errors > ˜ 5°. In addition, our results suggest that the errors in derived magnetic field directions do not depend much on magnetospheric activity, in contrast to the empirical field models. Finally, as applications of the above conclusions, we show examples of electron pitch angle distributions observed by LANL GEO and also take the derived magnetic field directions as the real ones so as to test the performance of empirical field models along the GEO orbits, with results suggesting dependence on solar cycles as well as satellite locations. This study demonstrates the validity and value of the method that infers local magnetic field directions from particle spin-resolved distributions.
Chen, Yue; Cunningham, Gregory; Henderson, Michael
2016-09-21
Our study aims to statistically estimate the errors in local magnetic field directions that are derived from electron directional distributions measured by Los Alamos National Laboratory geosynchronous (LANL GEO) satellites. First, by comparing derived and measured magnetic field directions along the GEO orbit to those calculated from three selected empirical global magnetic field models (including a static Olson and Pfitzer 1977 quiet magnetic field model, a simple dynamic Tsyganenko 1989 model, and a sophisticated dynamic Tsyganenko 2001 storm model), it is shown that the errors in both derived and modeled directions are at least comparable. Furthermore, using a newly developedmore » proxy method as well as comparing results from empirical models, we are able to provide for the first time circumstantial evidence showing that derived magnetic field directions should statistically match the real magnetic directions better, with averaged errors < ~2°, than those from the three empirical models with averaged errors > ~5°. In addition, our results suggest that the errors in derived magnetic field directions do not depend much on magnetospheric activity, in contrast to the empirical field models. Finally, as applications of the above conclusions, we show examples of electron pitch angle distributions observed by LANL GEO and also take the derived magnetic field directions as the real ones so as to test the performance of empirical field models along the GEO orbits, with results suggesting dependence on solar cycles as well as satellite locations. Finally, this study demonstrates the validity and value of the method that infers local magnetic field directions from particle spin-resolved distributions.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Yue; Cunningham, Gregory; Henderson, Michael
Our study aims to statistically estimate the errors in local magnetic field directions that are derived from electron directional distributions measured by Los Alamos National Laboratory geosynchronous (LANL GEO) satellites. First, by comparing derived and measured magnetic field directions along the GEO orbit to those calculated from three selected empirical global magnetic field models (including a static Olson and Pfitzer 1977 quiet magnetic field model, a simple dynamic Tsyganenko 1989 model, and a sophisticated dynamic Tsyganenko 2001 storm model), it is shown that the errors in both derived and modeled directions are at least comparable. Furthermore, using a newly developedmore » proxy method as well as comparing results from empirical models, we are able to provide for the first time circumstantial evidence showing that derived magnetic field directions should statistically match the real magnetic directions better, with averaged errors < ~2°, than those from the three empirical models with averaged errors > ~5°. In addition, our results suggest that the errors in derived magnetic field directions do not depend much on magnetospheric activity, in contrast to the empirical field models. Finally, as applications of the above conclusions, we show examples of electron pitch angle distributions observed by LANL GEO and also take the derived magnetic field directions as the real ones so as to test the performance of empirical field models along the GEO orbits, with results suggesting dependence on solar cycles as well as satellite locations. Finally, this study demonstrates the validity and value of the method that infers local magnetic field directions from particle spin-resolved distributions.« less
Agent-Based Models in Empirical Social Research
ERIC Educational Resources Information Center
Bruch, Elizabeth; Atwell, Jon
2015-01-01
Agent-based modeling has become increasingly popular in recent years, but there is still no codified set of recommendations or practices for how to use these models within a program of empirical research. This article provides ideas and practical guidelines drawn from sociology, biology, computer science, epidemiology, and statistics. We first…
Bootstrap data methodology for sequential hybrid model building
NASA Technical Reports Server (NTRS)
Volponi, Allan J. (Inventor); Brotherton, Thomas (Inventor)
2007-01-01
A method for modeling engine operation comprising the steps of: 1. collecting a first plurality of sensory data, 2. partitioning a flight envelope into a plurality of sub-regions, 3. assigning the first plurality of sensory data into the plurality of sub-regions, 4. generating an empirical model of at least one of the plurality of sub-regions, 5. generating a statistical summary model for at least one of the plurality of sub-regions, 6. collecting an additional plurality of sensory data, 7. partitioning the second plurality of sensory data into the plurality of sub-regions, 8. generating a plurality of pseudo-data using the empirical model, and 9. concatenating the plurality of pseudo-data and the additional plurality of sensory data to generate an updated empirical model and an updated statistical summary model for at least one of the plurality of sub-regions.
An Empirical Investigation of Methods for Assessing Item Fit for Mixed Format Tests
ERIC Educational Resources Information Center
Chon, Kyong Hee; Lee, Won-Chan; Ansley, Timothy N.
2013-01-01
Empirical information regarding performance of model-fit procedures has been a persistent need in measurement practice. Statistical procedures for evaluating item fit were applied to real test examples that consist of both dichotomously and polytomously scored items. The item fit statistics used in this study included the PARSCALE's G[squared],…
Representing Micro-Macro Linkages by Actor-Based Dynamic Network Models
Snijders, Tom A.B.; Steglich, Christian E.G.
2014-01-01
Stochastic actor-based models for network dynamics have the primary aim of statistical inference about processes of network change, but may be regarded as a kind of agent-based models. Similar to many other agent-based models, they are based on local rules for actor behavior. Different from many other agent-based models, by including elements of generalized linear statistical models they aim to be realistic detailed representations of network dynamics in empirical data sets. Statistical parallels to micro-macro considerations can be found in the estimation of parameters determining local actor behavior from empirical data, and the assessment of goodness of fit from the correspondence with network-level descriptives. This article studies several network-level consequences of dynamic actor-based models applied to represent cross-sectional network data. Two examples illustrate how network-level characteristics can be obtained as emergent features implied by micro-specifications of actor-based models. PMID:25960578
New robust statistical procedures for the polytomous logistic regression models.
Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro
2018-05-17
This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real life examples are presented to justify the requirement of suitable robust statistical procedures in place of the likelihood based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article are further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.
Sanov and central limit theorems for output statistics of quantum Markov chains
DOE Office of Scientific and Technical Information (OSTI.GOV)
Horssen, Merlijn van, E-mail: merlijn.vanhorssen@nottingham.ac.uk; Guţă, Mădălin, E-mail: madalin.guta@nottingham.ac.uk
2015-02-15
In this paper, we consider the statistics of repeated measurements on the output of a quantum Markov chain. We establish a large deviations result analogous to Sanov’s theorem for the multi-site empirical measure associated to finite sequences of consecutive outcomes of a classical stochastic process. Our result relies on the construction of an extended quantum transition operator (which keeps track of previous outcomes) in terms of which we compute moment generating functions, and whose spectral radius is related to the large deviations rate function. As a corollary to this, we obtain a central limit theorem for the empirical measure. Suchmore » higher level statistics may be used to uncover critical behaviour such as dynamical phase transitions, which are not captured by lower level statistics such as the sample mean. As a step in this direction, we give an example of a finite system whose level-1 (empirical mean) rate function is independent of a model parameter while the level-2 (empirical measure) rate is not.« less
Jeffrey P. Prestemon
2009-01-01
Timber product markets are subject to large shocks deriving from natural disturbances and policy shifts. Statistical modeling of shocks is often done to assess their economic importance. In this article, I simulate the statistical power of univariate and bivariate methods of shock detection using time series intervention models. Simulations show that bivariate methods...
Worrying trends in econophysics
NASA Astrophysics Data System (ADS)
Gallegati, Mauro; Keen, Steve; Lux, Thomas; Ormerod, Paul
2006-10-01
Econophysics has already made a number of important empirical contributions to our understanding of the social and economic world. These fall mainly into the areas of finance and industrial economics, where in each case there is a large amount of reasonably well-defined data. More recently, Econophysics has also begun to tackle other areas of economics where data is much more sparse and much less reliable. In addition, econophysicists have attempted to apply the theoretical approach of statistical physics to try to understand empirical findings. Our concerns are fourfold. First, a lack of awareness of work that has been done within economics itself. Second, resistance to more rigorous and robust statistical methodology. Third, the belief that universal empirical regularities can be found in many areas of economic activity. Fourth, the theoretical models which are being used to explain empirical phenomena. The latter point is of particular concern. Essentially, the models are based upon models of statistical physics in which energy is conserved in exchange processes. There are examples in economics where the principle of conservation may be a reasonable approximation to reality, such as primitive hunter-gatherer societies. But in the industrialised capitalist economies, income is most definitely not conserved. The process of production and not exchange is responsible for this. Models which focus purely on exchange and not on production cannot by definition offer a realistic description of the generation of income in the capitalist, industrialised economies.
NASA Astrophysics Data System (ADS)
Xie, Yanan; Zhou, Mingliang; Pan, Dengke
2017-10-01
The forward-scattering model is introduced to describe the response of normalized radar cross section (NRCS) of precipitation with synthetic aperture radar (SAR). Since the distribution of near-surface rainfall is related to the rate of near-surface rainfall and horizontal distribution factor, a retrieval algorithm called modified regression empirical and model-oriented statistical (M-M) based on the volterra integration theory is proposed. Compared with the model-oriented statistical and volterra integration (MOSVI) algorithm, the biggest difference is that the M-M algorithm is based on the modified regression empirical algorithm rather than the linear regression formula to retrieve the value of near-surface rainfall rate. Half of the empirical parameters are reduced in the weighted integral work and a smaller average relative error is received while the rainfall rate is less than 100 mm/h. Therefore, the algorithm proposed in this paper can obtain high-precision rainfall information.
Pearce, Marcus T
2018-05-11
Music perception depends on internal psychological models derived through exposure to a musical culture. It is hypothesized that this musical enculturation depends on two cognitive processes: (1) statistical learning, in which listeners acquire internal cognitive models of statistical regularities present in the music to which they are exposed; and (2) probabilistic prediction based on these learned models that enables listeners to organize and process their mental representations of music. To corroborate these hypotheses, I review research that uses a computational model of probabilistic prediction based on statistical learning (the information dynamics of music (IDyOM) model) to simulate data from empirical studies of human listeners. The results show that a broad range of psychological processes involved in music perception-expectation, emotion, memory, similarity, segmentation, and meter-can be understood in terms of a single, underlying process of probabilistic prediction using learned statistical models. Furthermore, IDyOM simulations of listeners from different musical cultures demonstrate that statistical learning can plausibly predict causal effects of differential cultural exposure to musical styles, providing a quantitative model of cultural distance. Understanding the neural basis of musical enculturation will benefit from close coordination between empirical neuroimaging and computational modeling of underlying mechanisms, as outlined here. © 2018 The Authors. Annals of the New York Academy of Sciences published by Wiley Periodicals, Inc. on behalf of New York Academy of Sciences.
Bryant, Fred B
2016-12-01
This paper introduces a special section of the current issue of the Journal of Evaluation in Clinical Practice that includes a set of 6 empirical articles showcasing a versatile, new machine-learning statistical method, known as optimal data (or discriminant) analysis (ODA), specifically designed to produce statistical models that maximize predictive accuracy. As this set of papers clearly illustrates, ODA offers numerous important advantages over traditional statistical methods-advantages that enhance the validity and reproducibility of statistical conclusions in empirical research. This issue of the journal also includes a review of a recently published book that provides a comprehensive introduction to the logic, theory, and application of ODA in empirical research. It is argued that researchers have much to gain by using ODA to analyze their data. © 2016 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Hofer, Marlis; MöLg, Thomas; Marzeion, Ben; Kaser, Georg
2010-06-01
Recently initiated observation networks in the Cordillera Blanca (Peru) provide temporally high-resolution, yet short-term, atmospheric data. The aim of this study is to extend the existing time series into the past. We present an empirical-statistical downscaling (ESD) model that links 6-hourly National Centers for Environmental Prediction (NCEP)/National Center for Atmospheric Research (NCAR) reanalysis data to air temperature and specific humidity, measured at the tropical glacier Artesonraju (northern Cordillera Blanca). The ESD modeling procedure includes combined empirical orthogonal function and multiple regression analyses and a double cross-validation scheme for model evaluation. Apart from the selection of predictor fields, the modeling procedure is automated and does not include subjective choices. We assess the ESD model sensitivity to the predictor choice using both single-field and mixed-field predictors. Statistical transfer functions are derived individually for different months and times of day. The forecast skill largely depends on month and time of day, ranging from 0 to 0.8. The mixed-field predictors perform better than the single-field predictors. The ESD model shows added value, at all time scales, against simpler reference models (e.g., the direct use of reanalysis grid point values). The ESD model forecast 1960-2008 clearly reflects interannual variability related to the El Niño/Southern Oscillation but is sensitive to the chosen predictor type.
Quantifying uncertainty in climate change science through empirical information theory.
Majda, Andrew J; Gershgorin, Boris
2010-08-24
Quantifying the uncertainty for the present climate and the predictions of climate change in the suite of imperfect Atmosphere Ocean Science (AOS) computer models is a central issue in climate change science. Here, a systematic approach to these issues with firm mathematical underpinning is developed through empirical information theory. An information metric to quantify AOS model errors in the climate is proposed here which incorporates both coarse-grained mean model errors as well as covariance ratios in a transformation invariant fashion. The subtle behavior of model errors with this information metric is quantified in an instructive statistically exactly solvable test model with direct relevance to climate change science including the prototype behavior of tracer gases such as CO(2). Formulas for identifying the most sensitive climate change directions using statistics of the present climate or an AOS model approximation are developed here; these formulas just involve finding the eigenvector associated with the largest eigenvalue of a quadratic form computed through suitable unperturbed climate statistics. These climate change concepts are illustrated on a statistically exactly solvable one-dimensional stochastic model with relevance for low frequency variability of the atmosphere. Viable algorithms for implementation of these concepts are discussed throughout the paper.
Implication of correlations among some common stability statistics - a Monte Carlo simulations.
Piepho, H P
1995-03-01
Stability analysis of multilocation trials is often based on a mixed two-way model. Two stability measures in frequent use are the environmental variance (S i (2) )and the ecovalence (W i). Under the two-way model the rank orders of the expected values of these two statistics are identical for a given set of genotypes. By contrast, empirical rank correlations among these measures are consistently low. This suggests that the two-way mixed model may not be appropriate for describing real data. To check this hypothesis, a Monte Carlo simulation was conducted. It revealed that the low empirical rank correlation amongS i (2) and W i is most likely due to sampling errors. It is concluded that the observed low rank correlation does not invalidate the two-way model. The paper also discusses tests for homogeneity of S i (2) as well as implications of the two-way model for the classification of stability statistics.
Numerical and Qualitative Contrasts of Two Statistical Models ...
Two statistical approaches, weighted regression on time, discharge, and season and generalized additive models, have recently been used to evaluate water quality trends in estuaries. Both models have been used in similar contexts despite differences in statistical foundations and products. This study provided an empirical and qualitative comparison of both models using 29 years of data for two discrete time series of chlorophyll-a (chl-a) in the Patuxent River estuary. Empirical descriptions of each model were based on predictive performance against the observed data, ability to reproduce flow-normalized trends with simulated data, and comparisons of performance with validation datasets. Between-model differences were apparent but minor and both models had comparable abilities to remove flow effects from simulated time series. Both models similarly predicted observations for missing data with different characteristics. Trends from each model revealed distinct mainstem influences of the Chesapeake Bay with both models predicting a roughly 65% increase in chl-a over time in the lower estuary, whereas flow-normalized predictions for the upper estuary showed a more dynamic pattern, with a nearly 100% increase in chl-a in the last 10 years. Qualitative comparisons highlighted important differences in the statistical structure, available products, and characteristics of the data and desired analysis. This manuscript describes a quantitative comparison of two recently-
NASA Astrophysics Data System (ADS)
Wu, Qing; Luu, Quang-Hung; Tkalich, Pavel; Chen, Ge
2018-04-01
Having great impacts on human lives, global warming and associated sea level rise are believed to be strongly linked to anthropogenic causes. Statistical approach offers a simple and yet conceptually verifiable combination of remotely connected climate variables and indices, including sea level and surface temperature. We propose an improved statistical reconstruction model based on the empirical dynamic control system by taking into account the climate variability and deriving parameters from Monte Carlo cross-validation random experiments. For the historic data from 1880 to 2001, we yielded higher correlation results compared to those from other dynamic empirical models. The averaged root mean square errors are reduced in both reconstructed fields, namely, the global mean surface temperature (by 24-37%) and the global mean sea level (by 5-25%). Our model is also more robust as it notably diminished the unstable problem associated with varying initial values. Such results suggest that the model not only enhances significantly the global mean reconstructions of temperature and sea level but also may have a potential to improve future projections.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nasrabadi, M. N., E-mail: mnnasrabadi@ast.ui.ac.ir; Sepiani, M.
2015-03-30
Production of medical radioisotopes is one of the most important tasks in the field of nuclear technology. These radioactive isotopes are mainly produced through variety nuclear process. In this research, excitation functions and nuclear reaction mechanisms are studied for simulation of production of these radioisotopes in the TALYS, EMPIRE and LISE++ reaction codes, then parameters and different models of nuclear level density as one of the most important components in statistical reaction models are adjusted for optimum production of desired radioactive yields.
NASA Astrophysics Data System (ADS)
Nasrabadi, M. N.; Sepiani, M.
2015-03-01
Production of medical radioisotopes is one of the most important tasks in the field of nuclear technology. These radioactive isotopes are mainly produced through variety nuclear process. In this research, excitation functions and nuclear reaction mechanisms are studied for simulation of production of these radioisotopes in the TALYS, EMPIRE & LISE++ reaction codes, then parameters and different models of nuclear level density as one of the most important components in statistical reaction models are adjusted for optimum production of desired radioactive yields.
1982-02-01
TANAAD S t(AB A , THE IMPACT OF SOCIAL DESIRABILITY ON ORGANIZATIONAL BEHAVIOR RESEARCH RESULTS: AN EMPIRICAL INVESTIGATION OF ALTERNATIVE MODELS Daniel C...Investigation of Alternative Models 4- PERFORMING G. RE.PORTNUMBER 7. AU ?wnOPe) a. CONTRACf OR GR1ANT NUBAe Daniel C. Ganster, Harry W. Hennessey, and Fred...Fred Luthans University of Nebraska-Lincoln ABSTRACT Three conceptual and statistical models are developed for the effects of social desirability (SD
The Small World of Psychopathology
Borsboom, Denny; Cramer, Angélique O. J.; Schmittmann, Verena D.; Epskamp, Sacha; Waldorp, Lourens J.
2011-01-01
Background Mental disorders are highly comorbid: people having one disorder are likely to have another as well. We explain empirical comorbidity patterns based on a network model of psychiatric symptoms, derived from an analysis of symptom overlap in the Diagnostic and Statistical Manual of Mental Disorders-IV (DSM-IV). Principal Findings We show that a) half of the symptoms in the DSM-IV network are connected, b) the architecture of these connections conforms to a small world structure, featuring a high degree of clustering but a short average path length, and c) distances between disorders in this structure predict empirical comorbidity rates. Network simulations of Major Depressive Episode and Generalized Anxiety Disorder show that the model faithfully reproduces empirical population statistics for these disorders. Conclusions In the network model, mental disorders are inherently complex. This explains the limited successes of genetic, neuroscientific, and etiological approaches to unravel their causes. We outline a psychosystems approach to investigate the structure and dynamics of mental disorders. PMID:22114671
Record statistics of financial time series and geometric random walks
NASA Astrophysics Data System (ADS)
Sabir, Behlool; Santhanam, M. S.
2014-09-01
The study of record statistics of correlated series in physics, such as random walks, is gaining momentum, and several analytical results have been obtained in the past few years. In this work, we study the record statistics of correlated empirical data for which random walk models have relevance. We obtain results for the records statistics of select stock market data and the geometric random walk, primarily through simulations. We show that the distribution of the age of records is a power law with the exponent α lying in the range 1.5≤α≤1.8. Further, the longest record ages follow the Fréchet distribution of extreme value theory. The records statistics of geometric random walk series is in good agreement with that obtained from empirical stock data.
Statistical methods for the beta-binomial model in teratology.
Yamamoto, E; Yanagimoto, T
1994-01-01
The beta-binomial model is widely used for analyzing teratological data involving littermates. Recent developments in statistical analyses of teratological data are briefly reviewed with emphasis on the model. For statistical inference of the parameters in the beta-binomial distribution, separation of the likelihood introduces an likelihood inference. This leads to reducing biases of estimators and also to improving accuracy of empirical significance levels of tests. Separate inference of the parameters can be conducted in a unified way. PMID:8187716
Vehicular headways on signalized intersections: theory, models, and reality
NASA Astrophysics Data System (ADS)
Krbálek, Milan; Šleis, Jiří
2015-01-01
We discuss statistical properties of vehicular headways measured on signalized crossroads. On the basis of mathematical approaches, we formulate theoretical and empirically inspired criteria for the acceptability of theoretical headway distributions. Sequentially, the multifarious families of statistical distributions (commonly used to fit real-road headway statistics) are confronted with these criteria, and with original empirical time clearances gauged among neighboring vehicles leaving signal-controlled crossroads after a green signal appears. Using three different numerical schemes, we demonstrate that an arrangement of vehicles on an intersection is a consequence of the general stochastic nature of queueing systems, rather than a consequence of traffic rules, driver estimation processes, or decision-making procedures.
Empirical analysis of storm-time energetic electron enhancements
NASA Astrophysics Data System (ADS)
O'Brien, Thomas Paul, III
This Ph.D. thesis documents a program for studying the appearance of energetic electrons in the Earth's outer radiation belts that is associated with many geomagnetic storms. The dynamic evolution of the electron radiation belts is an outstanding empirical problem in both theoretical space physics and its applied sibling, space weather. The project emphasizes the development of empirical tools and their use in testing several theoretical models of the energization of the electron belts. First, I develop the Statistical Asynchronous Regression technique to provide proxy electron fluxes throughout the parts of the radiation belts explored by geosynchronous and GPS spacecraft. Next, I show that a theoretical adiabatic model can relate the local time asymmetry of the proxy geosynchronous fluxes to the asymmetry of the geomagnetic field. Then, I perform a superposed epoch analysis on the proxy fluxes at local noon to identify magnetospheric and interplanetary precursors of relativistic electron enhancements. Finally, I use statistical and neural network phase space analyses to determine the hourly evolution of flux at a virtual stationary monitor. The dynamic equation quantitatively identifies the importance of different drivers of the electron belts. This project provides empirical constraints on theoretical models of electron acceleration.
Watanabe, Hayafumi; Sano, Yukie; Takayasu, Hideki; Takayasu, Misako
2016-11-01
To elucidate the nontrivial empirical statistical properties of fluctuations of a typical nonsteady time series representing the appearance of words in blogs, we investigated approximately 3×10^{9} Japanese blog articles over a period of six years and analyze some corresponding mathematical models. First, we introduce a solvable nonsteady extension of the random diffusion model, which can be deduced by modeling the behavior of heterogeneous random bloggers. Next, we deduce theoretical expressions for both the temporal and ensemble fluctuation scalings of this model, and demonstrate that these expressions can reproduce all empirical scalings over eight orders of magnitude. Furthermore, we show that the model can reproduce other statistical properties of time series representing the appearance of words in blogs, such as functional forms of the probability density and correlations in the total number of blogs. As an application, we quantify the abnormality of special nationwide events by measuring the fluctuation scalings of 1771 basic adjectives.
Modeling noisy resonant system response
NASA Astrophysics Data System (ADS)
Weber, Patrick Thomas; Walrath, David Edwin
2017-02-01
In this paper, a theory-based model replicating empirical acoustic resonant signals is presented and studied to understand sources of noise present in acoustic signals. Statistical properties of empirical signals are quantified and a noise amplitude parameter, which models frequency and amplitude-based noise, is created, defined, and presented. This theory-driven model isolates each phenomenon and allows for parameters to be independently studied. Using seven independent degrees of freedom, this model will accurately reproduce qualitative and quantitative properties measured from laboratory data. Results are presented and demonstrate success in replicating qualitative and quantitative properties of experimental data.
The consentaneous model of the financial markets exhibiting spurious nature of long-range memory
NASA Astrophysics Data System (ADS)
Gontis, V.; Kononovicius, A.
2018-09-01
It is widely accepted that there is strong persistence in the volatility of financial time series. The origin of the observed persistence, or long-range memory, is still an open problem as the observed phenomenon could be a spurious effect. Earlier we have proposed the consentaneous model of the financial markets based on the non-linear stochastic differential equations. The consentaneous model successfully reproduces empirical probability and power spectral densities of volatility. This approach is qualitatively different from models built using fractional Brownian motion. In this contribution we investigate burst and inter-burst duration statistics of volatility in the financial markets employing the consentaneous model. Our analysis provides an evidence that empirical statistical properties of burst and inter-burst duration can be explained by non-linear stochastic differential equations driving the volatility in the financial markets. This serves as an strong argument that long-range memory in finance can have spurious nature.
ERIC Educational Resources Information Center
Cafri, Guy; Kromrey, Jeffrey D.; Brannick, Michael T.
2010-01-01
This article uses meta-analyses published in "Psychological Bulletin" from 1995 to 2005 to describe meta-analyses in psychology, including examination of statistical power, Type I errors resulting from multiple comparisons, and model choice. Retrospective power estimates indicated that univariate categorical and continuous moderators, individual…
Control Theory and Statistical Generalizations.
ERIC Educational Resources Information Center
Powers, William T.
1990-01-01
Contrasts modeling methods in control theory to the methods of statistical generalizations in empirical studies of human or animal behavior. Presents a computer simulation that predicts behavior based on variables (effort and rewards) determined by the invariable (desired reward). Argues that control theory methods better reflect relationships to…
Benchmarking test of empirical root water uptake models
NASA Astrophysics Data System (ADS)
dos Santos, Marcos Alex; de Jong van Lier, Quirijn; van Dam, Jos C.; Freire Bezerra, Andre Herman
2017-01-01
Detailed physical models describing root water uptake (RWU) are an important tool for the prediction of RWU and crop transpiration, but the hydraulic parameters involved are hardly ever available, making them less attractive for many studies. Empirical models are more readily used because of their simplicity and the associated lower data requirements. The purpose of this study is to evaluate the capability of some empirical models to mimic the RWU distribution under varying environmental conditions predicted from numerical simulations with a detailed physical model. A review of some empirical models used as sub-models in ecohydrological models is presented, and alternative empirical RWU models are proposed. All these empirical models are analogous to the standard Feddes model, but differ in how RWU is partitioned over depth or how the transpiration reduction function is defined. The parameters of the empirical models are determined by inverse modelling of simulated depth-dependent RWU. The performance of the empirical models and their optimized empirical parameters depends on the scenario. The standard empirical Feddes model only performs well in scenarios with low root length density R, i.e. for scenarios with low RWU compensation
. For medium and high R, the Feddes RWU model cannot mimic properly the root uptake dynamics as predicted by the physical model. The Jarvis RWU model in combination with the Feddes reduction function (JMf) only provides good predictions for low and medium R scenarios. For high R, it cannot mimic the uptake patterns predicted by the physical model. Incorporating a newly proposed reduction function into the Jarvis model improved RWU predictions. Regarding the ability of the models to predict plant transpiration, all models accounting for compensation show good performance. The Akaike information criterion (AIC) indicates that the Jarvis (2010) model (JMII), with no empirical parameters to be estimated, is the best model
. The proposed models are better in predicting RWU patterns similar to the physical model. The statistical indices point to them as the best alternatives for mimicking RWU predictions of the physical model.
Adopting adequate leaching requirement for practical response models of basil to salinity
NASA Astrophysics Data System (ADS)
Babazadeh, Hossein; Tabrizi, Mahdi Sarai; Darvishi, Hossein Hassanpour
2016-07-01
Several mathematical models are being used for assessing plant response to salinity of the root zone. Objectives of this study included quantifying the yield salinity threshold value of basil plants to irrigation water salinity and investigating the possibilities of using irrigation water salinity instead of saturated extract salinity in the available mathematical models for estimating yield. To achieve the above objectives, an extensive greenhouse experiment was conducted with 13 irrigation water salinity levels, namely 1.175 dS m-1 (control treatment) and 1.8 to 10 dS m-1. The result indicated that, among these models, the modified discount model (one of the most famous root water uptake model which is based on statistics) produced more accurate results in simulating the basil yield reduction function using irrigation water salinities. Overall the statistical model of Steppuhn et al. on the modified discount model and the math-empirical model of van Genuchten and Hoffman provided the best results. In general, all of the statistical models produced very similar results and their results were better than math-empirical models. It was also concluded that if enough leaching was present, there was no significant difference between the soil salinity saturated extract models and the models using irrigation water salinity.
Comments of statistical issue in numerical modeling for underground nuclear test monitoring
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nicholson, W.L.; Anderson, K.K.
1993-03-01
The Symposium concluded with prepared summaries by four experts in the involved disciplines. These experts made no mention of statistics and/or the statistical content of issues. The first author contributed an extemporaneous statement at the Symposium because there are important issues associated with conducting and evaluating numerical modeling that are familiar to statisticians and often treated successfully by them. This note expands upon these extemporaneous remarks. Statistical ideas may be helpful in resolving some numerical modeling issues. Specifically, we comment first on the role of statistical design/analysis in the quantification process to answer the question ``what do we know aboutmore » the numerical modeling of underground nuclear tests?`` and second on the peculiar nature of uncertainty analysis for situations involving numerical modeling. The simulations described in the workshop, though associated with topic areas, were basically sets of examples. Each simulation was tuned towards agreeing with either empirical evidence or an expert`s opinion of what empirical evidence would be. While the discussions were reasonable, whether the embellishments were correct or a forced fitting of reality is unclear and illustrates that ``simulation is easy.`` We also suggest that these examples of simulation are typical and the questions concerning the legitimacy and the role of knowing the reality are fair, in general, with respect to simulation. The answers will help us understand why ``prediction is difficult.``« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sathaye, Jayant A.
2000-04-01
Integrated assessment (IA) modeling of climate policy is increasingly global in nature, with models incorporating regional disaggregation. The existing empirical basis for IA modeling, however, largely arises from research on industrialized economies. Given the growing importance of developing countries in determining long-term global energy and carbon emissions trends, filling this gap with improved statistical information on developing countries' energy and carbon-emissions characteristics is an important priority for enhancing IA modeling. Earlier research at LBNL on this topic has focused on assembling and analyzing statistical data on productivity trends and technological change in the energy-intensive manufacturing sectors of five developing countries,more » India, Brazil, Mexico, Indonesia, and South Korea. The proposed work will extend this analysis to the agriculture and electric power sectors in India, South Korea, and two other developing countries. They will also examine the impact of alternative model specifications on estimates of productivity growth and technological change for each of the three sectors, and estimate the contribution of various capital inputs--imported vs. indigenous, rigid vs. malleable-- in contributing to productivity growth and technological change. The project has already produced a data resource on the manufacturing sector which is being shared with IA modelers. This will be extended to the agriculture and electric power sectors, which would also be made accessible to IA modeling groups seeking to enhance the empirical descriptions of developing country characteristics. The project will entail basic statistical and econometric analysis of productivity and energy trends in these developing country sectors, with parameter estimates also made available to modeling groups. The parameter estimates will be developed using alternative model specifications that could be directly utilized by the existing IAMs for the manufacturing, agriculture, and electric power sectors.« less
Statistical microeconomics and commodity prices: theory and empirical results.
Baaquie, Belal E
2016-01-13
A review is made of the statistical generalization of microeconomics by Baaquie (Baaquie 2013 Phys. A 392, 4400-4416. (doi:10.1016/j.physa.2013.05.008)), where the market price of every traded commodity, at each instant of time, is considered to be an independent random variable. The dynamics of commodity market prices is given by the unequal time correlation function and is modelled by the Feynman path integral based on an action functional. The correlation functions of the model are defined using the path integral. The existence of the action functional for commodity prices that was postulated to exist in Baaquie (Baaquie 2013 Phys. A 392, 4400-4416. (doi:10.1016/j.physa.2013.05.008)) has been empirically ascertained in Baaquie et al. (Baaquie et al. 2015 Phys. A 428, 19-37. (doi:10.1016/j.physa.2015.02.030)). The model's action functionals for different commodities has been empirically determined and calibrated using the unequal time correlation functions of the market commodity prices using a perturbation expansion (Baaquie et al. 2015 Phys. A 428, 19-37. (doi:10.1016/j.physa.2015.02.030)). Nine commodities drawn from the energy, metal and grain sectors are empirically studied and their auto-correlation for up to 300 days is described by the model to an accuracy of R(2)>0.90-using only six parameters. © 2015 The Author(s).
DEVELOPMENT OF THE VIRTUAL BEACH MODEL, PHASE 1: AN EMPIRICAL MODEL
With increasing attention focused on the use of multiple linear regression (MLR) modeling of beach fecal bacteria concentration, the validity of the entire statistical process should be carefully evaluated to assure satisfactory predictions. This work aims to identify pitfalls an...
Using statistical equivalence testing logic and mixed model theory an approach has been developed, that extends the work of Stork et al (JABES,2008), to define sufficient similarity in dose-response for chemical mixtures containing the same chemicals with different ratios ...
AGENT-BASED MODELS IN EMPIRICAL SOCIAL RESEARCH*
Bruch, Elizabeth; Atwell, Jon
2014-01-01
Agent-based modeling has become increasingly popular in recent years, but there is still no codified set of recommendations or practices for how to use these models within a program of empirical research. This article provides ideas and practical guidelines drawn from sociology, biology, computer science, epidemiology, and statistics. We first discuss the motivations for using agent-based models in both basic science and policy-oriented social research. Next, we provide an overview of methods and strategies for incorporating data on behavior and populations into agent-based models, and review techniques for validating and testing the sensitivity of agent-based models. We close with suggested directions for future research. PMID:25983351
Modeling Traffic on the Web Graph
NASA Astrophysics Data System (ADS)
Meiss, Mark R.; Gonçalves, Bruno; Ramasco, José J.; Flammini, Alessandro; Menczer, Filippo
Analysis of aggregate and individual Web requests shows that PageRank is a poor predictor of traffic. We use empirical data to characterize properties of Web traffic not reproduced by Markovian models, including both aggregate statistics such as page and link traffic, and individual statistics such as entropy and session size. As no current model reconciles all of these observations, we present an agent-based model that explains them through realistic browsing behaviors: (1) revisiting bookmarked pages; (2) backtracking; and (3) seeking out novel pages of topical interest. The resulting model can reproduce the behaviors we observe in empirical data, especially heterogeneous session lengths, reconciling the narrowly focused browsing patterns of individual users with the extreme variance in aggregate traffic measurements. We can thereby identify a few salient features that are necessary and sufficient to interpret Web traffic data. Beyond the descriptive and explanatory power of our model, these results may lead to improvements in Web applications such as search and crawling.
Complex dynamics and empirical evidence (Invited Paper)
NASA Astrophysics Data System (ADS)
Delli Gatti, Domenico; Gaffeo, Edoardo; Giulioni, Gianfranco; Gallegati, Mauro; Kirman, Alan; Palestrini, Antonio; Russo, Alberto
2005-05-01
Standard macroeconomics, based on a reductionist approach centered on the representative agent, is badly equipped to explain the empirical evidence where heterogeneity and industrial dynamics are the rule. In this paper we show that a simple agent-based model of heterogeneous financially fragile agents is able to replicate a large number of scaling type stylized facts with a remarkable degree of statistical precision.
Evaluation of Models of the Reading Process.
ERIC Educational Resources Information Center
Balajthy, Ernest
A variety of reading process models have been proposed and evaluated in reading research. Traditional approaches to model evaluation specify the workings of a system in a simplified fashion to enable organized, systematic study of the system's components. Following are several statistical methods of model evaluation: (1) empirical research on…
Covariations in ecological scaling laws fostered by community dynamics.
Zaoli, Silvia; Giometto, Andrea; Maritan, Amos; Rinaldo, Andrea
2017-10-03
Scaling laws in ecology, intended both as functional relationships among ecologically relevant quantities and the probability distributions that characterize their occurrence, have long attracted the interest of empiricists and theoreticians. Empirical evidence exists of power laws associated with the number of species inhabiting an ecosystem, their abundances, and traits. Although their functional form appears to be ubiquitous, empirical scaling exponents vary with ecosystem type and resource supply rate. The idea that ecological scaling laws are linked has been entertained before, but the full extent of macroecological pattern covariations, the role of the constraints imposed by finite resource supply, and a comprehensive empirical verification are still unexplored. Here, we propose a theoretical scaling framework that predicts the linkages of several macroecological patterns related to species' abundances and body sizes. We show that such a framework is consistent with the stationary-state statistics of a broad class of resource-limited community dynamics models, regardless of parameterization and model assumptions. We verify predicted theoretical covariations by contrasting empirical data and provide testable hypotheses for yet unexplored patterns. We thus place the observed variability of ecological scaling exponents into a coherent statistical framework where patterns in ecology embed constrained fluctuations.
Towards Validation of an Adaptive Flight Control Simulation Using Statistical Emulation
NASA Technical Reports Server (NTRS)
He, Yuning; Lee, Herbert K. H.; Davies, Misty D.
2012-01-01
Traditional validation of flight control systems is based primarily upon empirical testing. Empirical testing is sufficient for simple systems in which a.) the behavior is approximately linear and b.) humans are in-the-loop and responsible for off-nominal flight regimes. A different possible concept of operation is to use adaptive flight control systems with online learning neural networks (OLNNs) in combination with a human pilot for off-nominal flight behavior (such as when a plane has been damaged). Validating these systems is difficult because the controller is changing during the flight in a nonlinear way, and because the pilot and the control system have the potential to co-adapt in adverse ways traditional empirical methods are unlikely to provide any guarantees in this case. Additionally, the time it takes to find unsafe regions within the flight envelope using empirical testing means that the time between adaptive controller design iterations is large. This paper describes a new concept for validating adaptive control systems using methods based on Bayesian statistics. This validation framework allows the analyst to build nonlinear models with modal behavior, and to have an uncertainty estimate for the difference between the behaviors of the model and system under test.
Estimation and confidence intervals for empirical mixing distributions
Link, W.A.; Sauer, J.R.
1995-01-01
Questions regarding collections of parameter estimates can frequently be expressed in terms of an empirical mixing distribution (EMD). This report discusses empirical Bayes estimation of an EMD, with emphasis on the construction of interval estimates. Estimation of the EMD is accomplished by substitution of estimates of prior parameters in the posterior mean of the EMD. This procedure is examined in a parametric model (the normal-normal mixture) and in a semi-parametric model. In both cases, the empirical Bayes bootstrap of Laird and Louis (1987, Journal of the American Statistical Association 82, 739-757) is used to assess the variability of the estimated EMD arising from the estimation of prior parameters. The proposed methods are applied to a meta-analysis of population trend estimates for groups of birds.
Menzerath-Altmann Law: Statistical Mechanical Interpretation as Applied to a Linguistic Organization
NASA Astrophysics Data System (ADS)
Eroglu, Sertac
2014-10-01
The distribution behavior described by the empirical Menzerath-Altmann law is frequently encountered during the self-organization of linguistic and non-linguistic natural organizations at various structural levels. This study presents a statistical mechanical derivation of the law based on the analogy between the classical particles of a statistical mechanical organization and the distinct words of a textual organization. The derived model, a transformed (generalized) form of the Menzerath-Altmann model, was termed as the statistical mechanical Menzerath-Altmann model. The derived model allows interpreting the model parameters in terms of physical concepts. We also propose that many organizations presenting the Menzerath-Altmann law behavior, whether linguistic or not, can be methodically examined by the transformed distribution model through the properly defined structure-dependent parameter and the energy associated states.
Estimating trends in the global mean temperature record
NASA Astrophysics Data System (ADS)
Poppick, Andrew; Moyer, Elisabeth J.; Stein, Michael L.
2017-06-01
Given uncertainties in physical theory and numerical climate simulations, the historical temperature record is often used as a source of empirical information about climate change. Many historical trend analyses appear to de-emphasize physical and statistical assumptions: examples include regression models that treat time rather than radiative forcing as the relevant covariate, and time series methods that account for internal variability in nonparametric rather than parametric ways. However, given a limited data record and the presence of internal variability, estimating radiatively forced temperature trends in the historical record necessarily requires some assumptions. Ostensibly empirical methods can also involve an inherent conflict in assumptions: they require data records that are short enough for naive trend models to be applicable, but long enough for long-timescale internal variability to be accounted for. In the context of global mean temperatures, empirical methods that appear to de-emphasize assumptions can therefore produce misleading inferences, because the trend over the twentieth century is complex and the scale of temporal correlation is long relative to the length of the data record. We illustrate here how a simple but physically motivated trend model can provide better-fitting and more broadly applicable trend estimates and can allow for a wider array of questions to be addressed. In particular, the model allows one to distinguish, within a single statistical framework, between uncertainties in the shorter-term vs. longer-term response to radiative forcing, with implications not only on historical trends but also on uncertainties in future projections. We also investigate the consequence on inferred uncertainties of the choice of a statistical description of internal variability. While nonparametric methods may seem to avoid making explicit assumptions, we demonstrate how even misspecified parametric statistical methods, if attuned to the important characteristics of internal variability, can result in more accurate uncertainty statements about trends.
Empirical intrinsic geometry for nonlinear modeling and time series filtering.
Talmon, Ronen; Coifman, Ronald R
2013-07-30
In this paper, we present a method for time series analysis based on empirical intrinsic geometry (EIG). EIG enables one to reveal the low-dimensional parametric manifold as well as to infer the underlying dynamics of high-dimensional time series. By incorporating concepts of information geometry, this method extends existing geometric analysis tools to support stochastic settings and parametrizes the geometry of empirical distributions. However, the statistical models are not required as priors; hence, EIG may be applied to a wide range of real signals without existing definitive models. We show that the inferred model is noise-resilient and invariant under different observation and instrumental modalities. In addition, we show that it can be extended efficiently to newly acquired measurements in a sequential manner. These two advantages enable us to revisit the Bayesian approach and incorporate empirical dynamics and intrinsic geometry into a nonlinear filtering framework. We show applications to nonlinear and non-Gaussian tracking problems as well as to acoustic signal localization.
Wiedermann, Wolfgang; Li, Xintong
2018-04-16
In nonexperimental data, at least three possible explanations exist for the association of two variables x and y: (1) x is the cause of y, (2) y is the cause of x, or (3) an unmeasured confounder is present. Statistical tests that identify which of the three explanatory models fits best would be a useful adjunct to the use of theory alone. The present article introduces one such statistical method, direction dependence analysis (DDA), which assesses the relative plausibility of the three explanatory models on the basis of higher-moment information about the variables (i.e., skewness and kurtosis). DDA involves the evaluation of three properties of the data: (1) the observed distributions of the variables, (2) the residual distributions of the competing models, and (3) the independence properties of the predictors and residuals of the competing models. When the observed variables are nonnormally distributed, we show that DDA components can be used to uniquely identify each explanatory model. Statistical inference methods for model selection are presented, and macros to implement DDA in SPSS are provided. An empirical example is given to illustrate the approach. Conceptual and empirical considerations are discussed for best-practice applications in psychological data, and sample size recommendations based on previous simulation studies are provided.
Correcting for population structure and kinship using the linear mixed model: theory and extensions.
Hoffman, Gabriel E
2013-01-01
Population structure and kinship are widespread confounding factors in genome-wide association studies (GWAS). It has been standard practice to include principal components of the genotypes in a regression model in order to account for population structure. More recently, the linear mixed model (LMM) has emerged as a powerful method for simultaneously accounting for population structure and kinship. The statistical theory underlying the differences in empirical performance between modeling principal components as fixed versus random effects has not been thoroughly examined. We undertake an analysis to formalize the relationship between these widely used methods and elucidate the statistical properties of each. Moreover, we introduce a new statistic, effective degrees of freedom, that serves as a metric of model complexity and a novel low rank linear mixed model (LRLMM) to learn the dimensionality of the correction for population structure and kinship, and we assess its performance through simulations. A comparison of the results of LRLMM and a standard LMM analysis applied to GWAS data from the Multi-Ethnic Study of Atherosclerosis (MESA) illustrates how our theoretical results translate into empirical properties of the mixed model. Finally, the analysis demonstrates the ability of the LRLMM to substantially boost the strength of an association for HDL cholesterol in Europeans.
NASA Astrophysics Data System (ADS)
West, Damien; West, Bruce J.
2012-07-01
There are a substantial number of empirical relations that began with the identification of a pattern in data; were shown to have a terse power-law description; were interpreted using existing theory; reached the level of "law" and given a name; only to be subsequently fade away when it proved impossible to connect the "law" with a larger body of theory and/or data. Various forms of allometry relations (ARs) have followed this path. The ARs in biology are nearly two hundred years old and those in ecology, geophysics, physiology and other areas of investigation are not that much younger. In general if X is a measure of the size of a complex host network and Y is a property of a complex subnetwork embedded within the host network a theoretical AR exists between the two when Y = aXb. We emphasize that the reductionistic models of AR interpret X and Y as dynamic variables, albeit the ARs themselves are explicitly time independent even though in some cases the parameter values change over time. On the other hand, the phenomenological models of AR are based on the statistical analysis of data and interpret X and Y as averages to yield the empirical AR:
Modeling Smoke Plume-Rise and Dispersion from Southern United States Prescribed Burns with Daysmoke
G L Achtemeier; S L Goodrick; Y Liu; F Garcia-Menendez; Y Hu; M. Odman
2011-01-01
We present Daysmoke, an empirical-statistical plume rise and dispersion model for simulating smoke from prescribed burns. Prescribed fires are characterized by complex plume structure including multiple-core updrafts which makes modeling with simple plume models difficult. Daysmoke accounts for plume structure in a three-dimensional veering/sheering atmospheric...
Model Uncertainty and Robustness: A Computational Framework for Multimodel Analysis
ERIC Educational Resources Information Center
Young, Cristobal; Holsteen, Katherine
2017-01-01
Model uncertainty is pervasive in social science. A key question is how robust empirical results are to sensible changes in model specification. We present a new approach and applied statistical software for computational multimodel analysis. Our approach proceeds in two steps: First, we estimate the modeling distribution of estimates across all…
NASA Astrophysics Data System (ADS)
Jaber, Abobaker M.
2014-12-01
Two nonparametric methods for prediction and modeling of financial time series signals are proposed. The proposed techniques are designed to handle non-stationary and non-linearity behave and to extract meaningful signals for reliable prediction. Due to Fourier Transform (FT), the methods select significant decomposed signals that will be employed for signal prediction. The proposed techniques developed by coupling Holt-winter method with Empirical Mode Decomposition (EMD) and it is Extending the scope of empirical mode decomposition by smoothing (SEMD). To show performance of proposed techniques, we analyze daily closed price of Kuala Lumpur stock market index.
Bonnet-Lebrun, Anne-Sophie; Manica, Andrea; Eriksson, Anders; Rodrigues, Ana S L
2017-05-01
Community characteristics reflect past ecological and evolutionary dynamics. Here, we investigate whether it is possible to obtain realistically shaped modeled communities-that is with phylogenetic trees and species abundance distributions shaped similarly to typical empirical bird and mammal communities-from neutral community models. To test the effect of gene flow, we contrasted two spatially explicit individual-based neutral models: one with protracted speciation, delayed by gene flow, and one with point mutation speciation, unaffected by gene flow. The former produced more realistic communities (shape of phylogenetic tree and species-abundance distribution), consistent with gene flow being a key process in macro-evolutionary dynamics. Earlier models struggled to capture the empirically observed branching tempo in phylogenetic trees, as measured by the gamma statistic. We show that the low gamma values typical of empirical trees can be obtained in models with protracted speciation, in preequilibrium communities developing from an initially abundant and widespread species. This was even more so in communities sampled incompletely, particularly if the unknown species are the youngest. Overall, our results demonstrate that the characteristics of empirical communities that we have studied can, to a large extent, be explained through a purely neutral model under preequilibrium conditions. © 2017 The Author(s). Evolution © 2017 The Society for the Study of Evolution.
ASSESSMENT OF SPATIAL AUTOCORRELATION IN EMPIRICAL MODELS IN ECOLOGY
Statistically assessing ecological models is inherently difficult because data are autocorrelated and this autocorrelation varies in an unknown fashion. At a simple level, the linking of a single species to a habitat type is a straightforward analysis. With some investigation int...
Space evolution model and empirical analysis of an urban public transport network
NASA Astrophysics Data System (ADS)
Sui, Yi; Shao, Feng-jing; Sun, Ren-cheng; Li, Shu-jing
2012-07-01
This study explores the space evolution of an urban public transport network, using empirical evidence and a simulation model validated on that data. Public transport patterns primarily depend on traffic spatial-distribution, demands of passengers and expected utility of investors. Evolution is an iterative process of satisfying the needs of passengers and investors based on a given traffic spatial-distribution. The temporal change of urban public transport network is evaluated both using topological measures and spatial ones. The simulation model is validated using empirical data from nine big cities in China. Statistical analyses on topological and spatial attributes suggest that an evolution network with traffic demands characterized by power-law numerical values which distribute in a mode of concentric circles tallies well with these nine cities.
Comparison between volatility return intervals of the S&P 500 index and two common models
NASA Astrophysics Data System (ADS)
Vodenska-Chitkushev, I.; Wang, F. Z.; Weber, P.; Yamasaki, K.; Havlin, S.; Stanley, H. E.
2008-01-01
We analyze the S&P 500 index data for the 13-year period, from January 1, 1984 to December 31, 1996, with one data point every 10 min. For this database, we study the distribution and clustering of volatility return intervals, which are defined as the time intervals between successive volatilities above a certain threshold q. We find that the long memory in the volatility leads to a clustering of above-median as well as below-median return intervals. In addition, it turns out that the short return intervals form larger clusters compared to the long return intervals. When comparing the empirical results to the ARMA-FIGARCH and fBm models for volatility, we find that the fBm model predicts scaling better than the ARMA-FIGARCH model, which is consistent with the argument that both ARMA-FIGARCH and fBm capture the long-term dependence in return intervals to a certain extent, but only fBm accounts for the scaling. We perform the Student's t-test to compare the empirical data with the shuffled records, ARMA-FIGARCH and fBm. We analyze separately the clusters of above-median return intervals and the clusters of below-median return intervals for different thresholds q. We find that the empirical data are statistically different from the shuffled data for all thresholds q. Our results also suggest that the ARMA-FIGARCH model is statistically different from the S&P 500 for intermediate q for both above-median and below-median clusters, while fBm is statistically different from S&P 500 for small and large q for above-median clusters and for small q for below-median clusters. Neither model can fully explain the entire regime of q studied.
Long-term Trends and Variability of Eddy Activities in the South China Sea
NASA Astrophysics Data System (ADS)
Zhang, M.; von Storch, H.
2017-12-01
For constructing empirical downscaling models and projecting possible future states of eddy activities in the South China Sea (SCS), long-term statistical characteristics of the SCS eddy are needed. We use a daily global eddy-resolving model product named STORM covering the period of 1950-2010. This simulation has employed the MPI-OM model with a mean horizontal resolution of 10km and been driven by the NCEP reanalysis-1 data set. An eddy detection and tracking algorithm operating on the gridded sea surface height anomaly (SSHA) fields was developed. A set of parameters for the criteria in the SCS are determined through sensitivity tests. Our method detected more than 6000 eddy tracks in the South China Sea. For all of them, eddy diameters, track length, eddy intensity, eddy lifetime and eddy frequency were determined. The long-term trends and variability of those properties also has been derived. Most of the eddies propagate westward. Nearly 100 eddies travel longer than 1000km, and over 800 eddies have a lifespan of more than 2 months. Furthermore, for building the statistical empirical model, the relationship between the SCS eddy statistics and the large-scale atmospheric and oceanic phenomena has been investigated.
NASA Astrophysics Data System (ADS)
Pernot, Pascal; Savin, Andreas
2018-06-01
Benchmarking studies in computational chemistry use reference datasets to assess the accuracy of a method through error statistics. The commonly used error statistics, such as the mean signed and mean unsigned errors, do not inform end-users on the expected amplitude of prediction errors attached to these methods. We show that, the distributions of model errors being neither normal nor zero-centered, these error statistics cannot be used to infer prediction error probabilities. To overcome this limitation, we advocate for the use of more informative statistics, based on the empirical cumulative distribution function of unsigned errors, namely, (1) the probability for a new calculation to have an absolute error below a chosen threshold and (2) the maximal amplitude of errors one can expect with a chosen high confidence level. Those statistics are also shown to be well suited for benchmarking and ranking studies. Moreover, the standard error on all benchmarking statistics depends on the size of the reference dataset. Systematic publication of these standard errors would be very helpful to assess the statistical reliability of benchmarking conclusions.
Statistical ecology comes of age.
Gimenez, Olivier; Buckland, Stephen T; Morgan, Byron J T; Bez, Nicolas; Bertrand, Sophie; Choquet, Rémi; Dray, Stéphane; Etienne, Marie-Pierre; Fewster, Rachel; Gosselin, Frédéric; Mérigot, Bastien; Monestiez, Pascal; Morales, Juan M; Mortier, Frédéric; Munoz, François; Ovaskainen, Otso; Pavoine, Sandrine; Pradel, Roger; Schurr, Frank M; Thomas, Len; Thuiller, Wilfried; Trenkel, Verena; de Valpine, Perry; Rexstad, Eric
2014-12-01
The desire to predict the consequences of global environmental change has been the driver towards more realistic models embracing the variability and uncertainties inherent in ecology. Statistical ecology has gelled over the past decade as a discipline that moves away from describing patterns towards modelling the ecological processes that generate these patterns. Following the fourth International Statistical Ecology Conference (1-4 July 2014) in Montpellier, France, we analyse current trends in statistical ecology. Important advances in the analysis of individual movement, and in the modelling of population dynamics and species distributions, are made possible by the increasing use of hierarchical and hidden process models. Exciting research perspectives include the development of methods to interpret citizen science data and of efficient, flexible computational algorithms for model fitting. Statistical ecology has come of age: it now provides a general and mathematically rigorous framework linking ecological theory and empirical data.
Statistical ecology comes of age
Gimenez, Olivier; Buckland, Stephen T.; Morgan, Byron J. T.; Bez, Nicolas; Bertrand, Sophie; Choquet, Rémi; Dray, Stéphane; Etienne, Marie-Pierre; Fewster, Rachel; Gosselin, Frédéric; Mérigot, Bastien; Monestiez, Pascal; Morales, Juan M.; Mortier, Frédéric; Munoz, François; Ovaskainen, Otso; Pavoine, Sandrine; Pradel, Roger; Schurr, Frank M.; Thomas, Len; Thuiller, Wilfried; Trenkel, Verena; de Valpine, Perry; Rexstad, Eric
2014-01-01
The desire to predict the consequences of global environmental change has been the driver towards more realistic models embracing the variability and uncertainties inherent in ecology. Statistical ecology has gelled over the past decade as a discipline that moves away from describing patterns towards modelling the ecological processes that generate these patterns. Following the fourth International Statistical Ecology Conference (1–4 July 2014) in Montpellier, France, we analyse current trends in statistical ecology. Important advances in the analysis of individual movement, and in the modelling of population dynamics and species distributions, are made possible by the increasing use of hierarchical and hidden process models. Exciting research perspectives include the development of methods to interpret citizen science data and of efficient, flexible computational algorithms for model fitting. Statistical ecology has come of age: it now provides a general and mathematically rigorous framework linking ecological theory and empirical data. PMID:25540151
Linking agent-based models and stochastic models of financial markets
Feng, Ling; Li, Baowen; Podobnik, Boris; Preis, Tobias; Stanley, H. Eugene
2012-01-01
It is well-known that financial asset returns exhibit fat-tailed distributions and long-term memory. These empirical features are the main objectives of modeling efforts using (i) stochastic processes to quantitatively reproduce these features and (ii) agent-based simulations to understand the underlying microscopic interactions. After reviewing selected empirical and theoretical evidence documenting the behavior of traders, we construct an agent-based model to quantitatively demonstrate that “fat” tails in return distributions arise when traders share similar technical trading strategies and decisions. Extending our behavioral model to a stochastic model, we derive and explain a set of quantitative scaling relations of long-term memory from the empirical behavior of individual market participants. Our analysis provides a behavioral interpretation of the long-term memory of absolute and squared price returns: They are directly linked to the way investors evaluate their investments by applying technical strategies at different investment horizons, and this quantitative relationship is in agreement with empirical findings. Our approach provides a possible behavioral explanation for stochastic models for financial systems in general and provides a method to parameterize such models from market data rather than from statistical fitting. PMID:22586086
Linking agent-based models and stochastic models of financial markets.
Feng, Ling; Li, Baowen; Podobnik, Boris; Preis, Tobias; Stanley, H Eugene
2012-05-29
It is well-known that financial asset returns exhibit fat-tailed distributions and long-term memory. These empirical features are the main objectives of modeling efforts using (i) stochastic processes to quantitatively reproduce these features and (ii) agent-based simulations to understand the underlying microscopic interactions. After reviewing selected empirical and theoretical evidence documenting the behavior of traders, we construct an agent-based model to quantitatively demonstrate that "fat" tails in return distributions arise when traders share similar technical trading strategies and decisions. Extending our behavioral model to a stochastic model, we derive and explain a set of quantitative scaling relations of long-term memory from the empirical behavior of individual market participants. Our analysis provides a behavioral interpretation of the long-term memory of absolute and squared price returns: They are directly linked to the way investors evaluate their investments by applying technical strategies at different investment horizons, and this quantitative relationship is in agreement with empirical findings. Our approach provides a possible behavioral explanation for stochastic models for financial systems in general and provides a method to parameterize such models from market data rather than from statistical fitting.
NASA Astrophysics Data System (ADS)
Mechlem, Korbinian; Ehn, Sebastian; Sellerer, Thorsten; Pfeiffer, Franz; Noël, Peter B.
2017-03-01
In spectral computed tomography (spectral CT), the additional information about the energy dependence of attenuation coefficients can be exploited to generate material selective images. These images have found applications in various areas such as artifact reduction, quantitative imaging or clinical diagnosis. However, significant noise amplification on material decomposed images remains a fundamental problem of spectral CT. Most spectral CT algorithms separate the process of material decomposition and image reconstruction. Separating these steps is suboptimal because the full statistical information contained in the spectral tomographic measurements cannot be exploited. Statistical iterative reconstruction (SIR) techniques provide an alternative, mathematically elegant approach to obtaining material selective images with improved tradeoffs between noise and resolution. Furthermore, image reconstruction and material decomposition can be performed jointly. This is accomplished by a forward model which directly connects the (expected) spectral projection measurements and the material selective images. To obtain this forward model, detailed knowledge of the different photon energy spectra and the detector response was assumed in previous work. However, accurately determining the spectrum is often difficult in practice. In this work, a new algorithm for statistical iterative material decomposition is presented. It uses a semi-empirical forward model which relies on simple calibration measurements. Furthermore, an efficient optimization algorithm based on separable surrogate functions is employed. This partially negates one of the major shortcomings of SIR, namely high computational cost and long reconstruction times. Numerical simulations and real experiments show strongly improved image quality and reduced statistical bias compared to projection-based material decomposition.
Estimating procedure times for surgeries by determining location parameters for the lognormal model.
Spangler, William E; Strum, David P; Vargas, Luis G; May, Jerrold H
2004-05-01
We present an empirical study of methods for estimating the location parameter of the lognormal distribution. Our results identify the best order statistic to use, and indicate that using the best order statistic instead of the median may lead to less frequent incorrect rejection of the lognormal model, more accurate critical value estimates, and higher goodness-of-fit. Using simulation data, we constructed and compared two models for identifying the best order statistic, one based on conventional nonlinear regression and the other using a data mining/machine learning technique. Better surgical procedure time estimates may lead to improved surgical operations.
How to Quantify Deterministic and Random Influences on the Statistics of the Foreign Exchange Market
NASA Astrophysics Data System (ADS)
Friedrich, R.; Peinke, J.; Renner, Ch.
2000-05-01
It is shown that price changes of the U.S. dollar-German mark exchange rates upon different delay times can be regarded as a stochastic Marcovian process. Furthermore, we show how Kramers-Moyal coefficients can be estimated from the empirical data. Finally, we present an explicit Fokker-Planck equation which models very precisely the empirical probability distributions, in particular, their non-Gaussian heavy tails.
Modeling Zero-Inflated and Overdispersed Count Data: An Empirical Study of School Suspensions
ERIC Educational Resources Information Center
Desjardins, Christopher David
2016-01-01
The purpose of this article is to develop a statistical model that best explains variability in the number of school days suspended. Number of school days suspended is a count variable that may be zero-inflated and overdispersed relative to a Poisson model. Four models were examined: Poisson, negative binomial, Poisson hurdle, and negative…
Bootstrap Estimation of Sample Statistic Bias in Structural Equation Modeling.
ERIC Educational Resources Information Center
Thompson, Bruce; Fan, Xitao
This study empirically investigated bootstrap bias estimation in the area of structural equation modeling (SEM). Three correctly specified SEM models were used under four different sample size conditions. Monte Carlo experiments were carried out to generate the criteria against which bootstrap bias estimation should be judged. For SEM fit indices,…
Model Identification in Time-Series Analysis: Some Empirical Results.
ERIC Educational Resources Information Center
Padia, William L.
Model identification of time-series data is essential to valid statistical tests of intervention effects. Model identification is, at best, inexact in the social and behavioral sciences where one is often confronted with small numbers of observations. These problems are discussed, and the results of independent identifications of 130 social and…
Modeling stock price dynamics by continuum percolation system and relevant complex systems analysis
NASA Astrophysics Data System (ADS)
Xiao, Di; Wang, Jun
2012-10-01
The continuum percolation system is developed to model a random stock price process in this work. Recent empirical research has demonstrated various statistical features of stock price changes, the financial model aiming at understanding price fluctuations needs to define a mechanism for the formation of the price, in an attempt to reproduce and explain this set of empirical facts. The continuum percolation model is usually referred to as a random coverage process or a Boolean model, the local interaction or influence among traders is constructed by the continuum percolation, and a cluster of continuum percolation is applied to define the cluster of traders sharing the same opinion about the market. We investigate and analyze the statistical behaviors of normalized returns of the price model by some analysis methods, including power-law tail distribution analysis, chaotic behavior analysis and Zipf analysis. Moreover, we consider the daily returns of Shanghai Stock Exchange Composite Index from January 1997 to July 2011, and the comparisons of return behaviors between the actual data and the simulation data are exhibited.
Development and Implementation of an Empirical Ionosphere Variability Model
NASA Technical Reports Server (NTRS)
Minow, Joesph I.; Almond, Deborah (Technical Monitor)
2002-01-01
Spacecraft designers and operations support personnel involved in space environment analysis for low Earth orbit missions require ionospheric specification and forecast models that provide not only average ionospheric plasma parameters for a given set of geophysical conditions but the statistical variations about the mean as well. This presentation describes the development of a prototype empirical model intended for use with the International Reference Ionosphere (IRI) to provide ionospheric Ne and Te variability. We first describe the database of on-orbit observations from a variety of spacecraft and ground based radars over a wide range of latitudes and altitudes used to obtain estimates of the environment variability. Next, comparison of the observations with the IRI model provide estimates of the deviations from the average model as well as the range of possible values that may correspond to a given IRI output. Options for implementation of the statistical variations in software that can be run with the IRI model are described. Finally, we provide example applications including thrust estimates for tethered satellites and specification of sunrise Ne, Te conditions required to support spacecraft charging issues for satellites with high voltage solar arrays.
Predicting Operator Execution Times Using CogTool
NASA Technical Reports Server (NTRS)
Santiago-Espada, Yamira; Latorella, Kara A.
2013-01-01
Researchers and developers of NextGen systems can use predictive human performance modeling tools as an initial approach to obtain skilled user performance times analytically, before system testing with users. This paper describes the CogTool models for a two pilot crew executing two different types of a datalink clearance acceptance tasks, and on two different simulation platforms. The CogTool time estimates for accepting and executing Required Time of Arrival and Interval Management clearances were compared to empirical data observed in video tapes and registered in simulation files. Results indicate no statistically significant difference between empirical data and the CogTool predictions. A population comparison test found no significant differences between the CogTool estimates and the empirical execution times for any of the four test conditions. We discuss modeling caveats and considerations for applying CogTool to crew performance modeling in advanced cockpit environments.
NASA Astrophysics Data System (ADS)
Song, Seok Goo; Kwak, Sangmin; Lee, Kyungbook; Park, Donghee
2017-04-01
It is a critical element to predict the intensity and variability of strong ground motions in seismic hazard assessment. The characteristics and variability of earthquake rupture process may be a dominant factor in determining the intensity and variability of near-source strong ground motions. Song et al. (2014) demonstrated that the variability of earthquake rupture scenarios could be effectively quantified in the framework of 1-point and 2-point statistics of earthquake source parameters, constrained by rupture dynamics and past events. The developed pseudo-dynamic source modeling schemes were also validated against the recorded ground motion data of past events and empirical ground motion prediction equations (GMPEs) at the broadband platform (BBP) developed by the Southern California Earthquake Center (SCEC). Recently we improved the computational efficiency of the developed pseudo-dynamic source-modeling scheme by adopting the nonparametric co-regionalization algorithm, introduced and applied in geostatistics initially. We also investigated the effect of earthquake rupture process on near-source ground motion characteristics in the framework of 1-point and 2-point statistics, particularly focusing on the forward directivity region. Finally we will discuss whether the pseudo-dynamic source modeling can reproduce the variability (standard deviation) of empirical GMPEs and the efficiency of 1-point and 2-point statistics to address the variability of ground motions.
NASA Astrophysics Data System (ADS)
Kamiyama, M.; Orourke, M. J.; Flores-Berrones, R.
1992-09-01
A new type of semi-empirical expression for scaling strong-motion peaks in terms of seismic source, propagation path, and local site conditions is derived. Peak acceleration, peak velocity, and peak displacement are analyzed in a similar fashion because they are interrelated. However, emphasis is placed on the peak velocity which is a key ground motion parameter for lifeline earthquake engineering studies. With the help of seismic source theories, the semi-empirical model is derived using strong motions obtained in Japan. In the derivation, statistical considerations are used in the selection of the model itself and the model parameters. Earthquake magnitude M and hypocentral distance r are selected as independent variables and the dummy variables are introduced to identify the amplification factor due to individual local site conditions. The resulting semi-empirical expressions for the peak acceleration, velocity, and displacement are then compared with strong-motion data observed during three earthquakes in the U.S. and Mexico.
A new concept in seismic landslide hazard analysis for practical application
NASA Astrophysics Data System (ADS)
Lee, Chyi-Tyi
2017-04-01
A seismic landslide hazard model could be constructed using deterministic approach (Jibson et al., 2000) or statistical approach (Lee, 2014). Both approaches got landslide spatial probability under a certain return-period earthquake. In the statistical approach, our recent study found that there are common patterns among different landslide susceptibility models of the same region. The common susceptibility could reflect relative stability of slopes at a region; higher susceptibility indicates lower stability. Using the common susceptibility together with an earthquake event landslide inventory and a map of topographically corrected Arias intensity, we can build the relationship among probability of failure, Arias intensity and the susceptibility. This relationship can immediately be used to construct a seismic landslide hazard map for the region that the empirical relationship built. If the common susceptibility model is further normalized and the empirical relationship built with normalized susceptibility, then the empirical relationship may be practically applied to different region with similar tectonic environments and climate conditions. This could be feasible, when a region has no existing earthquake-induce landslide data to train the susceptibility model and to build the relationship. It is worth mentioning that a rain-induced landslide susceptibility model has common pattern similar to earthquake-induced landslide susceptibility in the same region, and is usable to build the relationship with an earthquake event landslide inventory and a map of Arias intensity. These will be introduced with examples in the meeting.
Role of Statistical Random-Effects Linear Models in Personalized Medicine.
Diaz, Francisco J; Yeh, Hung-Wen; de Leon, Jose
2012-03-01
Some empirical studies and recent developments in pharmacokinetic theory suggest that statistical random-effects linear models are valuable tools that allow describing simultaneously patient populations as a whole and patients as individuals. This remarkable characteristic indicates that these models may be useful in the development of personalized medicine, which aims at finding treatment regimes that are appropriate for particular patients, not just appropriate for the average patient. In fact, published developments show that random-effects linear models may provide a solid theoretical framework for drug dosage individualization in chronic diseases. In particular, individualized dosages computed with these models by means of an empirical Bayesian approach may produce better results than dosages computed with some methods routinely used in therapeutic drug monitoring. This is further supported by published empirical and theoretical findings that show that random effects linear models may provide accurate representations of phase III and IV steady-state pharmacokinetic data, and may be useful for dosage computations. These models have applications in the design of clinical algorithms for drug dosage individualization in chronic diseases; in the computation of dose correction factors; computation of the minimum number of blood samples from a patient that are necessary for calculating an optimal individualized drug dosage in therapeutic drug monitoring; measure of the clinical importance of clinical, demographic, environmental or genetic covariates; study of drug-drug interactions in clinical settings; the implementation of computational tools for web-site-based evidence farming; design of pharmacogenomic studies; and in the development of a pharmacological theory of dosage individualization.
Parameter estimation and order selection for an empirical model of VO2 on-kinetics.
Alata, O; Bernard, O
2007-04-27
In humans, VO2 on-kinetics are noisy numerical signals that reflect the pulmonary oxygen exchange kinetics at the onset of exercise. They are empirically modelled as a sum of an offset and delayed exponentials. The number of delayed exponentials; i.e. the order of the model, is commonly supposed to be 1 for low-intensity exercises and 2 for high-intensity exercises. As no ground truth has ever been provided to validate these postulates, physiologists still need statistical methods to verify their hypothesis about the number of exponentials of the VO2 on-kinetics especially in the case of high-intensity exercises. Our objectives are first to develop accurate methods for estimating the parameters of the model at a fixed order, and then, to propose statistical tests for selecting the appropriate order. In this paper, we provide, on simulated Data, performances of Simulated Annealing for estimating model parameters and performances of Information Criteria for selecting the order. These simulated Data are generated with both single-exponential and double-exponential models, and noised by white and Gaussian noise. The performances are given at various Signal to Noise Ratio (SNR). Considering parameter estimation, results show that the confidences of estimated parameters are improved by increasing the SNR of the response to be fitted. Considering model selection, results show that Information Criteria are adapted statistical criteria to select the number of exponentials.
VBA: A Probabilistic Treatment of Nonlinear Models for Neurobiological and Behavioural Data
Daunizeau, Jean; Adam, Vincent; Rigoux, Lionel
2014-01-01
This work is in line with an on-going effort tending toward a computational (quantitative and refutable) understanding of human neuro-cognitive processes. Many sophisticated models for behavioural and neurobiological data have flourished during the past decade. Most of these models are partly unspecified (i.e. they have unknown parameters) and nonlinear. This makes them difficult to peer with a formal statistical data analysis framework. In turn, this compromises the reproducibility of model-based empirical studies. This work exposes a software toolbox that provides generic, efficient and robust probabilistic solutions to the three problems of model-based analysis of empirical data: (i) data simulation, (ii) parameter estimation/model selection, and (iii) experimental design optimization. PMID:24465198
NASA Technical Reports Server (NTRS)
Torres-Pomales, Wilfredo
2014-01-01
This report describes a modeling and simulation approach for disturbance patterns representative of the environment experienced by a digital system in an electromagnetic reverberation chamber. The disturbance is modeled by a multi-variate statistical distribution based on empirical observations. Extended versions of the Rejection Samping and Inverse Transform Sampling techniques are developed to generate multi-variate random samples of the disturbance. The results show that Inverse Transform Sampling returns samples with higher fidelity relative to the empirical distribution. This work is part of an ongoing effort to develop a resilience assessment methodology for complex safety-critical distributed systems.
Component Models for Fuzzy Data
ERIC Educational Resources Information Center
Coppi, Renato; Giordani, Paolo; D'Urso, Pierpaolo
2006-01-01
The fuzzy perspective in statistical analysis is first illustrated with reference to the "Informational Paradigm" allowing us to deal with different types of uncertainties related to the various informational ingredients (data, model, assumptions). The fuzzy empirical data are then introduced, referring to "J" LR fuzzy variables as observed on "I"…
Code of Federal Regulations, 2010 CFR
2010-04-01
... charges. An OTC derivatives dealer shall provide a description of all statistical models used for pricing... controls over those models, and a statement regarding whether the firm has developed its own internal VAR models. If the OTC derivatives dealer's VAR model incorporates empirical correlations across risk...
Minică, Camelia C; Dolan, Conor V; Hottenga, Jouke-Jan; Willemsen, Gonneke; Vink, Jacqueline M; Boomsma, Dorret I
2013-05-01
When phenotypic, but no genotypic data are available for relatives of participants in genetic association studies, previous research has shown that family-based imputed genotypes can boost the statistical power when included in such studies. Here, using simulations, we compared the performance of two statistical approaches suitable to model imputed genotype data: the mixture approach, which involves the full distribution of the imputed genotypes and the dosage approach, where the mean of the conditional distribution features as the imputed genotype. Simulations were run by varying sibship size, size of the phenotypic correlations among siblings, imputation accuracy and minor allele frequency of the causal SNP. Furthermore, as imputing sibling data and extending the model to include sibships of size two or greater requires modeling the familial covariance matrix, we inquired whether model misspecification affects power. Finally, the results obtained via simulations were empirically verified in two datasets with continuous phenotype data (height) and with a dichotomous phenotype (smoking initiation). Across the settings considered, the mixture and the dosage approach are equally powerful and both produce unbiased parameter estimates. In addition, the likelihood-ratio test in the linear mixed model appears to be robust to the considered misspecification in the background covariance structure, given low to moderate phenotypic correlations among siblings. Empirical results show that the inclusion in association analysis of imputed sibling genotypes does not always result in larger test statistic. The actual test statistic may drop in value due to small effect sizes. That is, if the power benefit is small, that the change in distribution of the test statistic under the alternative is relatively small, the probability is greater of obtaining a smaller test statistic. As the genetic effects are typically hypothesized to be small, in practice, the decision on whether family-based imputation could be used as a means to increase power should be informed by prior power calculations and by the consideration of the background correlation.
Stochastic Price Models and Optimal Tree Cutting: Results for Loblolly Pine
Robert G. Haight; Thomas P. Holmes
1991-01-01
An empirical investigation of stumpage price models and optimal harvest policies is conducted for loblolly pine plantations in the southeastern United States. The stationarity of monthly and quarterly series of sawtimber prices is analyzed using a unit root test. The statistical evidence supports stationary autoregressive models for the monthly series and for the...
Investigation of a Nonparametric Procedure for Assessing Goodness-of-Fit in Item Response Theory
ERIC Educational Resources Information Center
Wells, Craig S.; Bolt, Daniel M.
2008-01-01
Tests of model misfit are often performed to validate the use of a particular model in item response theory. Douglas and Cohen (2001) introduced a general nonparametric approach for detecting misfit under the two-parameter logistic model. However, the statistical properties of their approach, and empirical comparisons to other methods, have not…
NASA Astrophysics Data System (ADS)
Xu, M., III; Liu, X.
2017-12-01
In the past 60 years, both the runoff and sediment load in the Yellow River Basin showed significant decreasing trends owing to the influences of human activities and climate change. Quantifying the impact of each factor (e.g. precipitation, sediment trapping dams, pasture, terrace, etc.) on the runoff and sediment load is among the key issues to guide the implement of water and soil conservation measures, and to predict the variation trends in the future. Hundreds of methods have been developed for studying the runoff and sediment load in the Yellow River Basin. Generally, these methods can be classified into empirical methods and physical-based models. The empirical methods, including hydrological method, soil and water conservation method, etc., are widely used in the Yellow River management engineering. These methods generally apply the statistical analyses like the regression analysis to build the empirical relationships between the main characteristic variables in a river basin. The elasticity method extensively used in the hydrological research can be classified into empirical method as it is mathematically deduced to be equivalent with the hydrological method. Physical-based models mainly include conceptual models and distributed models. The conceptual models are usually lumped models (e.g. SYMHD model, etc.) and can be regarded as transition of empirical models and distributed models. Seen from the publications that less studies have been conducted applying distributed models than empirical models as the simulation results of runoff and sediment load based on distributed models (e.g. the Digital Yellow Integrated Model, the Geomorphology-Based Hydrological Model, etc.) were usually not so satisfied owing to the intensive human activities in the Yellow River Basin. Therefore, this study primarily summarizes the empirical models applied in the Yellow River Basin and theoretically analyzes the main causes for the significantly different results using different empirical researching methods. Besides, we put forward an assessment frame for the researching methods of the runoff and sediment load variations in the Yellow River Basin from the point of view of inputting data, model structure and result output. And the assessment frame was then applied in the Huangfuchuan River.
Single photon counting linear mode avalanche photodiode technologies
NASA Astrophysics Data System (ADS)
Williams, George M.; Huntington, Andrew S.
2011-10-01
The false count rate of a single-photon-sensitive photoreceiver consisting of a high-gain, low-excess-noise linear-mode InGaAs avalanche photodiode (APD) and a high-bandwidth transimpedance amplifier (TIA) is fit to a statistical model. The peak height distribution of the APD's multiplied dark current is approximated by the weighted sum of McIntyre distributions, each characterizing dark current generated at a different location within the APD's junction. The peak height distribution approximated in this way is convolved with a Gaussian distribution representing the input-referred noise of the TIA to generate the statistical distribution of the uncorrelated sum. The cumulative distribution function (CDF) representing count probability as a function of detection threshold is computed, and the CDF model fit to empirical false count data. It is found that only k=0 McIntyre distributions fit the empirically measured CDF at high detection threshold, and that false count rate drops faster than photon count rate as detection threshold is raised. Once fit to empirical false count data, the model predicts the improvement of the false count rate to be expected from reductions in TIA noise and APD dark current. Improvement by at least three orders of magnitude is thought feasible with further manufacturing development and a capacitive-feedback TIA (CTIA).
A wavelet-based statistical analysis of FMRI data: I. motivation and data distribution modeling.
Dinov, Ivo D; Boscardin, John W; Mega, Michael S; Sowell, Elizabeth L; Toga, Arthur W
2005-01-01
We propose a new method for statistical analysis of functional magnetic resonance imaging (fMRI) data. The discrete wavelet transformation is employed as a tool for efficient and robust signal representation. We use structural magnetic resonance imaging (MRI) and fMRI to empirically estimate the distribution of the wavelet coefficients of the data both across individuals and spatial locations. An anatomical subvolume probabilistic atlas is used to tessellate the structural and functional signals into smaller regions each of which is processed separately. A frequency-adaptive wavelet shrinkage scheme is employed to obtain essentially optimal estimations of the signals in the wavelet space. The empirical distributions of the signals on all the regions are computed in a compressed wavelet space. These are modeled by heavy-tail distributions because their histograms exhibit slower tail decay than the Gaussian. We discovered that the Cauchy, Bessel K Forms, and Pareto distributions provide the most accurate asymptotic models for the distribution of the wavelet coefficients of the data. Finally, we propose a new model for statistical analysis of functional MRI data using this atlas-based wavelet space representation. In the second part of our investigation, we will apply this technique to analyze a large fMRI dataset involving repeated presentation of sensory-motor response stimuli in young, elderly, and demented subjects.
NASA Astrophysics Data System (ADS)
Cianciara, Aleksander
2016-09-01
The paper presents the results of research aimed at verifying the hypothesis that the Weibull distribution is an appropriate statistical distribution model of microseismicity emission characteristics, namely: energy of phenomena and inter-event time. It is understood that the emission under consideration is induced by the natural rock mass fracturing. Because the recorded emission contain noise, therefore, it is subjected to an appropriate filtering. The study has been conducted using the method of statistical verification of null hypothesis that the Weibull distribution fits the empirical cumulative distribution function. As the model describing the cumulative distribution function is given in an analytical form, its verification may be performed using the Kolmogorov-Smirnov goodness-of-fit test. Interpretations by means of probabilistic methods require specifying the correct model describing the statistical distribution of data. Because in these methods measurement data are not used directly, but their statistical distributions, e.g., in the method based on the hazard analysis, or in that that uses maximum value statistics.
NASA Astrophysics Data System (ADS)
Tsutsumi, Morito; Seya, Hajime
2009-12-01
This study discusses the theoretical foundation of the application of spatial hedonic approaches—the hedonic approach employing spatial econometrics or/and spatial statistics—to benefits evaluation. The study highlights the limitations of the spatial econometrics approach since it uses a spatial weight matrix that is not employed by the spatial statistics approach. Further, the study presents empirical analyses by applying the Spatial Autoregressive Error Model (SAEM), which is based on the spatial econometrics approach, and the Spatial Process Model (SPM), which is based on the spatial statistics approach. SPMs are conducted based on both isotropy and anisotropy and applied to different mesh sizes. The empirical analysis reveals that the estimated benefits are quite different, especially between isotropic and anisotropic SPM and between isotropic SPM and SAEM; the estimated benefits are similar for SAEM and anisotropic SPM. The study demonstrates that the mesh size does not affect the estimated amount of benefits. Finally, the study provides a confidence interval for the estimated benefits and raises an issue with regard to benefit evaluation.
Statistical validity of using ratio variables in human kinetics research.
Liu, Yuanlong; Schutz, Robert W
2003-09-01
The purposes of this study were to investigate the validity of the simple ratio and three alternative deflation models and examine how the variation of the numerator and denominator variables affects the reliability of a ratio variable. A simple ratio and three alternative deflation models were fitted to four empirical data sets, and common criteria were applied to determine the best model for deflation. Intraclass correlation was used to examine the component effect on the reliability of a ratio variable. The results indicate that the validity, of a deflation model depends on the statistical characteristics of the particular component variables used, and an optimal deflation model for all ratio variables may not exist. Therefore, it is recommended that different models be fitted to each empirical data set to determine the best deflation model. It was found that the reliability of a simple ratio is affected by the coefficients of variation and the within- and between-trial correlations between the numerator and denominator variables. It was recommended that researchers should compute the reliability of the derived ratio scores and not assume that strong reliabilities in the numerator and denominator measures automatically lead to high reliability in the ratio measures.
Modified Distribution-Free Goodness-of-Fit Test Statistic.
Chun, So Yeon; Browne, Michael W; Shapiro, Alexander
2018-03-01
Covariance structure analysis and its structural equation modeling extensions have become one of the most widely used methodologies in social sciences such as psychology, education, and economics. An important issue in such analysis is to assess the goodness of fit of a model under analysis. One of the most popular test statistics used in covariance structure analysis is the asymptotically distribution-free (ADF) test statistic introduced by Browne (Br J Math Stat Psychol 37:62-83, 1984). The ADF statistic can be used to test models without any specific distribution assumption (e.g., multivariate normal distribution) of the observed data. Despite its advantage, it has been shown in various empirical studies that unless sample sizes are extremely large, this ADF statistic could perform very poorly in practice. In this paper, we provide a theoretical explanation for this phenomenon and further propose a modified test statistic that improves the performance in samples of realistic size. The proposed statistic deals with the possible ill-conditioning of the involved large-scale covariance matrices.
An Application of Structural Equation Modeling for Developing Good Teaching Characteristics Ontology
ERIC Educational Resources Information Center
Phiakoksong, Somjin; Niwattanakul, Suphakit; Angskun, Thara
2013-01-01
Ontology is a knowledge representation technique which aims to make knowledge explicit by defining the core concepts and their relationships. The Structural Equation Modeling (SEM) is a statistical technique which aims to explore the core factors from empirical data and estimates the relationship between these factors. This article presents an…
ERIC Educational Resources Information Center
Li, Spencer D.
2011-01-01
Mediation analysis in child and adolescent development research is possible using large secondary data sets. This article provides an overview of two statistical methods commonly used to test mediated effects in secondary analysis: multiple regression and structural equation modeling (SEM). Two empirical studies are presented to illustrate the…
Understanding macroscale invasion patterns and processes with FIA data
Songlin Fei; Basil V. Iannone III; Christopher M. Oswalt; Qinfeng Guo; Kevin M. Potter; Sonja N. Oswalt; Bryan C. Pijanowski; Gabriela C. Nunez-Mir
2015-01-01
Using empirical data from FIA, we modeled invasion richness and invasion prevalence as functions of 22 factors reflective of propagule pressure and/or habitat invasibility across the continental US. Our statistical models suggest that both propagule pressure and habitat invasibility contribute to macroscale patterns of forest plant invasions. Our investigation provides...
Rates of profit as correlated sums of random variables
NASA Astrophysics Data System (ADS)
Greenblatt, R. E.
2013-10-01
Profit realization is the dominant feature of market-based economic systems, determining their dynamics to a large extent. Rather than attaining an equilibrium, profit rates vary widely across firms, and the variation persists over time. Differing definitions of profit result in differing empirical distributions. To study the statistical properties of profit rates, I used data from a publicly available database for the US Economy for 2009-2010 (Risk Management Association). For each of three profit rate measures, the sample space consists of 771 points. Each point represents aggregate data from a small number of US manufacturing firms of similar size and type (NAICS code of principal product). When comparing the empirical distributions of profit rates, significant ‘heavy tails’ were observed, corresponding principally to a number of firms with larger profit rates than would be expected from simple models. An apparently novel correlated sum of random variables statistical model was used to model the data. In the case of operating and net profit rates, a number of firms show negative profits (losses), ruling out simple gamma or lognormal distributions as complete models for these data.
Empirical likelihood method for non-ignorable missing data problems.
Guan, Zhong; Qin, Jing
2017-01-01
Missing response problem is ubiquitous in survey sampling, medical, social science and epidemiology studies. It is well known that non-ignorable missing is the most difficult missing data problem where the missing of a response depends on its own value. In statistical literature, unlike the ignorable missing data problem, not many papers on non-ignorable missing data are available except for the full parametric model based approach. In this paper we study a semiparametric model for non-ignorable missing data in which the missing probability is known up to some parameters, but the underlying distributions are not specified. By employing Owen (1988)'s empirical likelihood method we can obtain the constrained maximum empirical likelihood estimators of the parameters in the missing probability and the mean response which are shown to be asymptotically normal. Moreover the likelihood ratio statistic can be used to test whether the missing of the responses is non-ignorable or completely at random. The theoretical results are confirmed by a simulation study. As an illustration, the analysis of a real AIDS trial data shows that the missing of CD4 counts around two years are non-ignorable and the sample mean based on observed data only is biased.
Dominant role of many-body effects on the carrier distribution function of quantum dot lasers
NASA Astrophysics Data System (ADS)
Peyvast, Negin; Zhou, Kejia; Hogg, Richard A.; Childs, David T. D.
2016-03-01
The effects of free-carrier-induced shift and broadening on the carrier distribution function are studied considering different extreme cases for carrier statistics (Fermi-Dirac and random carrier distributions) as well as quantum dot (QD) ensemble inhomogeneity and state separation using a Monte Carlo model. Using this model, we show that the dominant factor determining the carrier distribution function is the free carrier effects and not the choice of carrier statistics. By using empirical values of the free-carrier-induced shift and broadening, good agreement is obtained with experimental data of QD materials obtained under electrical injection for both extreme cases of carrier statistics.
ERIC Educational Resources Information Center
Chen, Greg; Weikart, Lynne A.
2008-01-01
This study develops and tests a school disorder and student achievement model based upon the school climate framework. The model was fitted to 212 New York City middle schools using the Structural Equations Modeling Analysis method. The analysis shows that the model fits the data well based upon test statistics and goodness of fit indices. The…
Louis R. Iverson; Anantha M. Prasad; Stephen N. Matthews; Matthew P. Peters
2011-01-01
We present an approach to modeling potential climate-driven changes in habitat for tree and bird species in the eastern United States. First, we took an empirical-statistical modeling approach, using randomForest, with species abundance data from national inventories combined with soil, climate, and landscape variables, to build abundance-based habitat models for 134...
Toward a Model-Based Approach to the Clinical Assessment of Personality Psychopathology
Eaton, Nicholas R.; Krueger, Robert F.; Docherty, Anna R.; Sponheim, Scott R.
2015-01-01
Recent years have witnessed tremendous growth in the scope and sophistication of statistical methods available to explore the latent structure of psychopathology, involving continuous, discrete, and hybrid latent variables. The availability of such methods has fostered optimism that they can facilitate movement from classification primarily crafted through expert consensus to classification derived from empirically-based models of psychopathological variation. The explication of diagnostic constructs with empirically supported structures can then facilitate the development of assessment tools that appropriately characterize these constructs. Our goal in this paper is to illustrate how new statistical methods can inform conceptualization of personality psychopathology and therefore its assessment. We use magical thinking as example, because both theory and earlier empirical work suggested the possibility of discrete aspects to the latent structure of personality psychopathology, particularly forms of psychopathology involving distortions of reality testing, yet other data suggest that personality psychopathology is generally continuous in nature. We directly compared the fit of a variety of latent variable models to magical thinking data from a sample enriched with clinically significant variation in psychotic symptomatology for explanatory purposes. Findings generally suggested a continuous latent variable model best represented magical thinking, but results varied somewhat depending on different indices of model fit. We discuss the implications of the findings for classification and applied personality assessment. We also highlight some limitations of this type of approach that are illustrated by these data, including the importance of substantive interpretation, in addition to use of model fit indices, when evaluating competing structural models. PMID:24007309
Role of Statistical Random-Effects Linear Models in Personalized Medicine
Diaz, Francisco J; Yeh, Hung-Wen; de Leon, Jose
2012-01-01
Some empirical studies and recent developments in pharmacokinetic theory suggest that statistical random-effects linear models are valuable tools that allow describing simultaneously patient populations as a whole and patients as individuals. This remarkable characteristic indicates that these models may be useful in the development of personalized medicine, which aims at finding treatment regimes that are appropriate for particular patients, not just appropriate for the average patient. In fact, published developments show that random-effects linear models may provide a solid theoretical framework for drug dosage individualization in chronic diseases. In particular, individualized dosages computed with these models by means of an empirical Bayesian approach may produce better results than dosages computed with some methods routinely used in therapeutic drug monitoring. This is further supported by published empirical and theoretical findings that show that random effects linear models may provide accurate representations of phase III and IV steady-state pharmacokinetic data, and may be useful for dosage computations. These models have applications in the design of clinical algorithms for drug dosage individualization in chronic diseases; in the computation of dose correction factors; computation of the minimum number of blood samples from a patient that are necessary for calculating an optimal individualized drug dosage in therapeutic drug monitoring; measure of the clinical importance of clinical, demographic, environmental or genetic covariates; study of drug-drug interactions in clinical settings; the implementation of computational tools for web-site-based evidence farming; design of pharmacogenomic studies; and in the development of a pharmacological theory of dosage individualization. PMID:23467392
ERIC Educational Resources Information Center
Pavel, D. Michael
This paper on postsecondary outcomes illustrates a technique to determine whether or not mainstream models are appropriate for predicting educational outcomes of American Indians (AIs) and Alaskan Native (ANs). It introduces a prominent statistical procedure to assess models with empirical data and shows how the results can have implications for…
1989-03-15
essence of the idea ycessible mtho forunrtandig eth- Tis tand thP ra) rm guh ide propet oaes nd d of e aessie meh bsd fooesadng asymptoti- isthe for s...network? This of Such empirical parametric model fitting is of course depends heavily on the class of net- course the essence of much of applied...smaller problems is the essence of graphical modeling. A model hy- attributes. Let e be the discrete joint outcome space for those N pergraph, g
Efficient estimation of Pareto model: Some modified percentile estimators.
Bhatti, Sajjad Haider; Hussain, Shahzad; Ahmad, Tanvir; Aslam, Muhammad; Aftab, Muhammad; Raza, Muhammad Ali
2018-01-01
The article proposes three modified percentile estimators for parameter estimation of the Pareto distribution. These modifications are based on median, geometric mean and expectation of empirical cumulative distribution function of first-order statistic. The proposed modified estimators are compared with traditional percentile estimators through a Monte Carlo simulation for different parameter combinations with varying sample sizes. Performance of different estimators is assessed in terms of total mean square error and total relative deviation. It is determined that modified percentile estimator based on expectation of empirical cumulative distribution function of first-order statistic provides efficient and precise parameter estimates compared to other estimators considered. The simulation results were further confirmed using two real life examples where maximum likelihood and moment estimators were also considered.
Increasing the relevance of GCM simulations for Climate Services
NASA Astrophysics Data System (ADS)
Smith, L. A.; Suckling, E.
2012-12-01
The design and interpretation of model simulations for climate services differ significantly from experimental design for the advancement of the fundamental research on predictability that underpins it. Climate services consider the sources of best information available today; this calls for a frank evaluation of model skill in the face of statistical benchmarks defined by empirical models. The fact that Physical simulation models are thought to provide the only reliable method for extrapolating into conditions not previously observed has no bearing on whether or not today's simulation models outperform empirical models. Evidence on the length scales on which today's simulation models fail to outperform empirical benchmarks is presented; it is illustrated that this occurs even on global scales in decadal prediction. At all timescales considered thus far (as of July 2012), predictions based on simulation models are improved by blending with the output of statistical models. Blending is shown to be more interesting in the climate context than it is in the weather context, where blending with a history-based climatology is straightforward. As GCMs improve and as the Earth's climate moves further from that of the last century, the skill from simulation models and their relevance to climate services is expected to increase. Examples from both seasonal and decadal forecasting will be used to discuss a third approach that may increase the role of current GCMs more quickly. Specifically, aspects of the experimental design in previous hind cast experiments are shown to hinder the use of GCM simulations for climate services. Alternative designs are proposed. The value in revisiting Thompson's classic approach to improving weather forecasting in the fifties in the context of climate services is discussed.
Empirical microeconomics action functionals
NASA Astrophysics Data System (ADS)
Baaquie, Belal E.; Du, Xin; Tanputraman, Winson
2015-06-01
A statistical generalization of microeconomics has been made in Baaquie (2013), where the market price of every traded commodity, at each instant of time, is considered to be an independent random variable. The dynamics of commodity market prices is modeled by an action functional-and the focus of this paper is to empirically determine the action functionals for different commodities. The correlation functions of the model are defined using a Feynman path integral. The model is calibrated using the unequal time correlation of the market commodity prices as well as their cubic and quartic moments using a perturbation expansion. The consistency of the perturbation expansion is verified by a numerical evaluation of the path integral. Nine commodities drawn from the energy, metal and grain sectors are studied and their market behavior is described by the model to an accuracy of over 90% using only six parameters. The paper empirically establishes the existence of the action functional for commodity prices that was postulated to exist in Baaquie (2013).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Busuioc, A.; Storch, H. von; Schnur, R.
Empirical downscaling procedures relate large-scale atmospheric features with local features such as station rainfall in order to facilitate local scenarios of climate change. The purpose of the present paper is twofold: first, a downscaling technique is used as a diagnostic tool to verify the performance of climate models on the regional scale; second, a technique is proposed for verifying the validity of empirical downscaling procedures in climate change applications. The case considered is regional seasonal precipitation in Romania. The downscaling model is a regression based on canonical correlation analysis between observed station precipitation and European-scale sea level pressure (SLP). Themore » climate models considered here are the T21 and T42 versions of the Hamburg ECHAM3 atmospheric GCM run in time-slice mode. The climate change scenario refers to the expected time of doubled carbon dioxide concentrations around the year 2050. Generally, applications of statistical downscaling to climate change scenarios have been based on the assumption that the empirical link between the large-scale and regional parameters remains valid under a changed climate. In this study, a rationale is proposed for this assumption by showing the consistency of the 2 x CO{sub 2} GCM scenarios in winter, derived directly from the gridpoint data, with the regional scenarios obtained through empirical downscaling. Since the skill of the GCMs in regional terms is already established, it is concluded that the downscaling technique is adequate for describing climatically changing regional and local conditions, at least for precipitation in Romania during winter.« less
Hoyle, R H
1991-02-01
Indirect measures of psychological constructs are vital to clinical research. On occasion, however, the meaning of indirect measures of psychological constructs is obfuscated by statistical procedures that do not account for the complex relations between items and latent variables and among latent variables. Covariance structure analysis (CSA) is a statistical procedure for testing hypotheses about the relations among items that indirectly measure a psychological construct and relations among psychological constructs. This article introduces clinical researchers to the strengths and limitations of CSA as a statistical procedure for conceiving and testing structural hypotheses that are not tested adequately with other statistical procedures. The article is organized around two empirical examples that illustrate the use of CSA for evaluating measurement models with correlated error terms, higher-order factors, and measured and latent variables.
Improving an Empirical Formula for the Absorption of Sound in the Sea
2008-05-01
Onderwater propagatie Advanced acoustic modelling Auteur (s) ir. C.A.M van Moll Programmanummer Projectnummer dr. M.A. Ainslie V512 032.11648 ing. J...versus calculated absorption .................................................................... 9 3 Inverse theory ...45 7.1 Statistical theory
LANDSCAPE ASSESSMENT TOOLS FOR WATERSHED CHARACTERIZATION
A combination of process-based, empirical and statistical models has been developed to assist states in their efforts to assess water quality, locate impairments over large areas, and calculate TMDL allocations. By synthesizing outputs from a number of these tools, LIPS demonstr...
Comparisons of thermospheric density data sets and models
NASA Astrophysics Data System (ADS)
Doornbos, Eelco; van Helleputte, Tom; Emmert, John; Drob, Douglas; Bowman, Bruce R.; Pilinski, Marcin
During the past decade, continuous long-term data sets of thermospheric density have become available to researchers. These data sets have been derived from accelerometer measurements made by the CHAMP and GRACE satellites and from Space Surveillance Network (SSN) tracking data and related Two-Line Element (TLE) sets. These data have already resulted in a large number of publications on physical interpretation and improvement of empirical density modelling. This study compares four different density data sets and two empirical density models, for the period 2002-2009. These data sources are the CHAMP (1) and GRACE (2) accelerometer measurements, the long-term database of densities derived from TLE data (3), the High Accuracy Satellite Drag Model (4) run by Air Force Space Command, calibrated using SSN data, and the NRLMSISE-00 (5) and Jacchia-Bowman 2008 (6) empirical models. In describing these data sets and models, specific attention is given to differences in the geo-metrical and aerodynamic satellite modelling, applied in the conversion from drag to density measurements, which are main sources of density biases. The differences in temporal and spa-tial resolution of the density data sources are also described and taken into account. With these aspects in mind, statistics of density comparisons have been computed, both as a function of solar and geomagnetic activity levels, and as a function of latitude and local solar time. These statistics give a detailed view of the relative accuracy of the different data sets and of the biases between them. The differences are analysed with the aim at providing rough error bars on the data and models and pinpointing issues which could receive attention in future iterations of data processing algorithms and in future model development.
Ng'andu, N H
1997-03-30
In the analysis of survival data using the Cox proportional hazard (PH) model, it is important to verify that the explanatory variables analysed satisfy the proportional hazard assumption of the model. This paper presents results of a simulation study that compares five test statistics to check the proportional hazard assumption of Cox's model. The test statistics were evaluated under proportional hazards and the following types of departures from the proportional hazard assumption: increasing relative hazards; decreasing relative hazards; crossing hazards; diverging hazards, and non-monotonic hazards. The test statistics compared include those based on partitioning of failure time and those that do not require partitioning of failure time. The simulation results demonstrate that the time-dependent covariate test, the weighted residuals score test and the linear correlation test have equally good power for detection of non-proportionality in the varieties of non-proportional hazards studied. Using illustrative data from the literature, these test statistics performed similarly.
Monroe, Scott; Cai, Li
2015-01-01
This research is concerned with two topics in assessing model fit for categorical data analysis. The first topic involves the application of a limited-information overall test, introduced in the item response theory literature, to structural equation modeling (SEM) of categorical outcome variables. Most popular SEM test statistics assess how well the model reproduces estimated polychoric correlations. In contrast, limited-information test statistics assess how well the underlying categorical data are reproduced. Here, the recently introduced C2 statistic of Cai and Monroe (2014) is applied. The second topic concerns how the root mean square error of approximation (RMSEA) fit index can be affected by the number of categories in the outcome variable. This relationship creates challenges for interpreting RMSEA. While the two topics initially appear unrelated, they may conveniently be studied in tandem since RMSEA is based on an overall test statistic, such as C2. The results are illustrated with an empirical application to data from a large-scale educational survey.
Empirical validation of an agent-based model of wood markets in Switzerland
Hilty, Lorenz M.; Lemm, Renato; Thees, Oliver
2018-01-01
We present an agent-based model of wood markets and show our efforts to validate this model using empirical data from different sources, including interviews, workshops, experiments, and official statistics. Own surveys closed gaps where data was not available. Our approach to model validation used a variety of techniques, including the replication of historical production amounts, prices, and survey results, as well as a historical case study of a large sawmill entering the market and becoming insolvent only a few years later. Validating the model using this case provided additional insights, showing how the model can be used to simulate scenarios of resource availability and resource allocation. We conclude that the outcome of the rigorous validation qualifies the model to simulate scenarios concerning resource availability and allocation in our study region. PMID:29351300
ERIC Educational Resources Information Center
Hoover, H. D.; Plake, Barbara
The relative power of the Mann-Whitney statistic, the t-statistic, the median test, a test based on exceedances (A,B), and two special cases of (A,B) the Tukey quick test and the revised Tukey quick test, was investigated via a Monte Carlo experiment. These procedures were compared across four population probability models: uniform, beta, normal,…
A BRDF statistical model applying to space target materials modeling
NASA Astrophysics Data System (ADS)
Liu, Chenghao; Li, Zhi; Xu, Can; Tian, Qichen
2017-10-01
In order to solve the problem of poor effect in modeling the large density BRDF measured data with five-parameter semi-empirical model, a refined statistical model of BRDF which is suitable for multi-class space target material modeling were proposed. The refined model improved the Torrance-Sparrow model while having the modeling advantages of five-parameter model. Compared with the existing empirical model, the model contains six simple parameters, which can approximate the roughness distribution of the material surface, can approximate the intensity of the Fresnel reflectance phenomenon and the attenuation of the reflected light's brightness with the azimuth angle changes. The model is able to achieve parameter inversion quickly with no extra loss of accuracy. The genetic algorithm was used to invert the parameters of 11 different samples in the space target commonly used materials, and the fitting errors of all materials were below 6%, which were much lower than those of five-parameter model. The effect of the refined model is verified by comparing the fitting results of the three samples at different incident zenith angles in 0° azimuth angle. Finally, the three-dimensional modeling visualizations of these samples in the upper hemisphere space was given, in which the strength of the optical scattering of different materials could be clearly shown. It proved the good describing ability of the refined model at the material characterization as well.
The role of empirical Bayes methodology as a leading principle in modern medical statistics.
van Houwelingen, Hans C
2014-11-01
This paper reviews and discusses the role of Empirical Bayes methodology in medical statistics in the last 50 years. It gives some background on the origin of the empirical Bayes approach and its link with the famous Stein estimator. The paper describes the application in four important areas in medical statistics: disease mapping, health care monitoring, meta-analysis, and multiple testing. It ends with a warning that the application of the outcome of an empirical Bayes analysis to the individual "subjects" is a delicate matter that should be handled with prudence and care. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
NASA Astrophysics Data System (ADS)
Gontis, V.; Kononovicius, A.
2017-10-01
We address the problem of long-range memory in the financial markets. There are two conceptually different ways to reproduce power-law decay of auto-correlation function: using fractional Brownian motion as well as non-linear stochastic differential equations. In this contribution we address this problem by analyzing empirical return and trading activity time series from the Forex. From the empirical time series we obtain probability density functions of burst and inter-burst duration. Our analysis reveals that the power-law exponents of the obtained probability density functions are close to 3 / 2, which is a characteristic feature of the one-dimensional stochastic processes. This is in a good agreement with earlier proposed model of absolute return based on the non-linear stochastic differential equations derived from the agent-based herding model.
Linking customisation of ERP systems to support effort: an empirical study
NASA Astrophysics Data System (ADS)
Koch, Stefan; Mitteregger, Kurt
2016-01-01
The amount of customisation to an enterprise resource planning (ERP) system has always been a major concern in the context of the implementation. This article focuses on the phase of maintenance and presents an empirical study about the relationship between the amount of customising and the resulting support effort. We establish a structural equation modelling model that explains support effort using customisation effort, organisational characteristics and scope of implementation. The findings using data from an ERP provider show that there is a statistically significant effect: with an increasing amount of customisation, the quantity of telephone calls to support increases, as well as the duration of each call.
NASA Technical Reports Server (NTRS)
Matney, Mark
2011-01-01
A number of statistical tools have been developed over the years for assessing the risk of reentering objects to human populations. These tools make use of the characteristics (e.g., mass, material, shape, size) of debris that are predicted by aerothermal models to survive reentry. The statistical tools use this information to compute the probability that one or more of the surviving debris might hit a person on the ground and cause one or more casualties. The statistical portion of the analysis relies on a number of assumptions about how the debris footprint and the human population are distributed in latitude and longitude, and how to use that information to arrive at realistic risk numbers. Because this information is used in making policy and engineering decisions, it is important that these assumptions be tested using empirical data. This study uses the latest database of known uncontrolled reentry locations measured by the United States Department of Defense. The predicted ground footprint distributions of these objects are based on the theory that their orbits behave basically like simple Kepler orbits. However, there are a number of factors in the final stages of reentry - including the effects of gravitational harmonics, the effects of the Earth s equatorial bulge on the atmosphere, and the rotation of the Earth and atmosphere - that could cause them to diverge from simple Kepler orbit behavior and possibly change the probability of reentering over a given location. In this paper, the measured latitude and longitude distributions of these objects are directly compared with the predicted distributions, providing a fundamental empirical test of the model assumptions.
ERIC Educational Resources Information Center
Miller, Joshua D.; Lynam, Donald R.
2008-01-01
Assessment of the "Diagnostic and Statistical Manual of Mental Disorders" (4th Ed.; "DSM-IV") personality disorders (PDs) using five-factor model (FFM) prototypes and counts has shown substantial promise, with a few exceptions. Miller, Reynolds, and Pilkonis suggested that the expert-generated FFM dependent prototype might be misspecified in…
Empirical likelihood-based tests for stochastic ordering
BARMI, HAMMOU EL; MCKEAGUE, IAN W.
2013-01-01
This paper develops an empirical likelihood approach to testing for the presence of stochastic ordering among univariate distributions based on independent random samples from each distribution. The proposed test statistic is formed by integrating a localized empirical likelihood statistic with respect to the empirical distribution of the pooled sample. The asymptotic null distribution of this test statistic is found to have a simple distribution-free representation in terms of standard Brownian bridge processes. The approach is used to compare the lengths of rule of Roman Emperors over various historical periods, including the “decline and fall” phase of the empire. In a simulation study, the power of the proposed test is found to improve substantially upon that of a competing test due to El Barmi and Mukerjee. PMID:23874142
ERIC Educational Resources Information Center
Hsieh, Chueh-An; Maier, Kimberly S.
2009-01-01
The capacity of Bayesian methods in estimating complex statistical models is undeniable. Bayesian data analysis is seen as having a range of advantages, such as an intuitive probabilistic interpretation of the parameters of interest, the efficient incorporation of prior information to empirical data analysis, model averaging and model selection.…
ERIC Educational Resources Information Center
Collazo, Andrés A.
2018-01-01
A model derived from the theory of planned behavior was empirically assessed for understanding faculty intention to use student ratings for teaching improvement. A sample of 175 professors participated in the study. The model was statistically significant and had a very large explanatory power. Instrumental attitude, affective attitude, perceived…
Statistical wave climate projections for coastal impact assessments
NASA Astrophysics Data System (ADS)
Camus, P.; Losada, I. J.; Izaguirre, C.; Espejo, A.; Menéndez, M.; Pérez, J.
2017-09-01
Global multimodel wave climate projections are obtained at 1.0° × 1.0° scale from 30 Coupled Model Intercomparison Project Phase 5 (CMIP5) global circulation model (GCM) realizations. A semi-supervised weather-typing approach based on a characterization of the ocean wave generation areas and the historical wave information from the recent GOW2 database are used to train the statistical model. This framework is also applied to obtain high resolution projections of coastal wave climate and coastal impacts as port operability and coastal flooding. Regional projections are estimated using the collection of weather types at spacing of 1.0°. This assumption is feasible because the predictor is defined based on the wave generation area and the classification is guided by the local wave climate. The assessment of future changes in coastal impacts is based on direct downscaling of indicators defined by empirical formulations (total water level for coastal flooding and number of hours per year with overtopping for port operability). Global multimodel projections of the significant wave height and peak period are consistent with changes obtained in previous studies. Statistical confidence of expected changes is obtained due to the large number of GCMs to construct the ensemble. The proposed methodology is proved to be flexible to project wave climate at different spatial scales. Regional changes of additional variables as wave direction or other statistics can be estimated from the future empirical distribution with extreme values restricted to high percentiles (i.e., 95th, 99th percentiles). The statistical framework can also be applied to evaluate regional coastal impacts integrating changes in storminess and sea level rise.
NASA Technical Reports Server (NTRS)
Shipman, D. L.
1972-01-01
The development of a model to simulate the information system of a program management type of organization is reported. The model statistically determines the following parameters: type of messages, destinations, delivery durations, type processing, processing durations, communication channels, outgoing messages, and priorites. The total management information system of the program management organization is considered, including formal and informal information flows and both facilities and equipment. The model is written in General Purpose System Simulation 2 computer programming language for use on the Univac 1108, Executive 8 computer. The model is simulated on a daily basis and collects queue and resource utilization statistics for each decision point. The statistics are then used by management to evaluate proposed resource allocations, to evaluate proposed changes to the system, and to identify potential problem areas. The model employs both empirical and theoretical distributions which are adjusted to simulate the information flow being studied.
The Gaussian copula model for the joint deficit index for droughts
NASA Astrophysics Data System (ADS)
Van de Vyver, H.; Van den Bergh, J.
2018-06-01
The characterization of droughts and their impacts is very dependent on the time scale that is involved. In order to obtain an overall drought assessment, the cumulative effects of water deficits over different times need to be examined together. For example, the recently developed joint deficit index (JDI) is based on multivariate probabilities of precipitation over various time scales from 1- to 12-months, and was constructed from empirical copulas. In this paper, we examine the Gaussian copula model for the JDI. We model the covariance across the temporal scales with a two-parameter function that is commonly used in the specific context of spatial statistics or geostatistics. The validity of the covariance models is demonstrated with long-term precipitation series. Bootstrap experiments indicate that the Gaussian copula model has advantages over the empirical copula method in the context of drought severity assessment: (i) it is able to quantify droughts outside the range of the empirical copula, (ii) provides adequate drought quantification, and (iii) provides a better understanding of the uncertainty in the estimation.
High-order fuzzy time-series based on multi-period adaptation model for forecasting stock markets
NASA Astrophysics Data System (ADS)
Chen, Tai-Liang; Cheng, Ching-Hsue; Teoh, Hia-Jong
2008-02-01
Stock investors usually make their short-term investment decisions according to recent stock information such as the late market news, technical analysis reports, and price fluctuations. To reflect these short-term factors which impact stock price, this paper proposes a comprehensive fuzzy time-series, which factors linear relationships between recent periods of stock prices and fuzzy logical relationships (nonlinear relationships) mined from time-series into forecasting processes. In empirical analysis, the TAIEX (Taiwan Stock Exchange Capitalization Weighted Stock Index) and HSI (Heng Seng Index) are employed as experimental datasets, and four recent fuzzy time-series models, Chen’s (1996), Yu’s (2005), Cheng’s (2006) and Chen’s (2007), are used as comparison models. Besides, to compare with conventional statistic method, the method of least squares is utilized to estimate the auto-regressive models of the testing periods within the databases. From analysis results, the performance comparisons indicate that the multi-period adaptation model, proposed in this paper, can effectively improve the forecasting performance of conventional fuzzy time-series models which only factor fuzzy logical relationships in forecasting processes. From the empirical study, the traditional statistic method and the proposed model both reveal that stock price patterns in the Taiwan stock and Hong Kong stock markets are short-term.
NASA Astrophysics Data System (ADS)
Strauch, R. L.; Istanbulluoglu, E.
2017-12-01
We develop a landslide hazard modeling approach that integrates a data-driven statistical model and a probabilistic process-based shallow landslide model for mapping probability of landslide initiation, transport, and deposition at regional scales. The empirical model integrates the influence of seven site attribute (SA) classes: elevation, slope, curvature, aspect, land use-land cover, lithology, and topographic wetness index, on over 1,600 observed landslides using a frequency ratio (FR) approach. A susceptibility index is calculated by adding FRs for each SA on a grid-cell basis. Using landslide observations we relate susceptibility index to an empirically-derived probability of landslide impact. This probability is combined with results from a physically-based model to produce an integrated probabilistic map. Slope was key in landslide initiation while deposition was linked to lithology and elevation. Vegetation transition from forest to alpine vegetation and barren land cover with lower root cohesion leads to higher frequency of initiation. Aspect effects are likely linked to differences in root cohesion and moisture controlled by solar insulation and snow. We demonstrate the model in the North Cascades of Washington, USA and identify locations of high and low probability of landslide impacts that can be used by land managers in their design, planning, and maintenance.
Neutral dynamics with environmental noise: Age-size statistics and species lifetimes
NASA Astrophysics Data System (ADS)
Kessler, David; Suweis, Samir; Formentin, Marco; Shnerb, Nadav M.
2015-08-01
Neutral dynamics, where taxa are assumed to be demographically equivalent and their abundance is governed solely by the stochasticity of the underlying birth-death process, has proved itself as an important minimal model that accounts for many empirical datasets in genetics and ecology. However, the restriction of the model to demographic [O (√{N }) ] noise yields relatively slow dynamics that appears to be in conflict with both short-term and long-term characteristics of the observed systems. Here we analyze two of these problems—age-size relationships and species extinction time—in the framework of a neutral theory with both demographic and environmental stochasticity. It turns out that environmentally induced variations of the demographic rates control the long-term dynamics and modify dramatically the predictions of the neutral theory with demographic noise only, yielding much better agreement with empirical data. We consider two prototypes of "zero mean" environmental noise, one which is balanced with regard to the arithmetic abundance, another balanced in the logarithmic (fitness) space, study their species lifetime statistics, and discuss their relevance to realistic models of community dynamics.
Li, Ji; Gray, B.R.; Bates, D.M.
2008-01-01
Partitioning the variance of a response by design levels is challenging for binomial and other discrete outcomes. Goldstein (2003) proposed four definitions for variance partitioning coefficients (VPC) under a two-level logistic regression model. In this study, we explicitly derived formulae for multi-level logistic regression model and subsequently studied the distributional properties of the calculated VPCs. Using simulations and a vegetation dataset, we demonstrated associations between different VPC definitions, the importance of methods for estimating VPCs (by comparing VPC obtained using Laplace and penalized quasilikehood methods), and bivariate dependence between VPCs calculated at different levels. Such an empirical study lends an immediate support to wider applications of VPC in scientific data analysis.
Blanquart, François; Bataillon, Thomas
2016-01-01
The fitness landscape defines the relationship between genotypes and fitness in a given environment and underlies fundamental quantities such as the distribution of selection coefficient and the magnitude and type of epistasis. A better understanding of variation in landscape structure across species and environments is thus necessary to understand and predict how populations will adapt. An increasing number of experiments investigate the properties of fitness landscapes by identifying mutations, constructing genotypes with combinations of these mutations, and measuring the fitness of these genotypes. Yet these empirical landscapes represent a very small sample of the vast space of all possible genotypes, and this sample is often biased by the protocol used to identify mutations. Here we develop a rigorous statistical framework based on Approximate Bayesian Computation to address these concerns and use this flexible framework to fit a broad class of phenotypic fitness models (including Fisher’s model) to 26 empirical landscapes representing nine diverse biological systems. Despite uncertainty owing to the small size of most published empirical landscapes, the inferred landscapes have similar structure in similar biological systems. Surprisingly, goodness-of-fit tests reveal that this class of phenotypic models, which has been successful so far in interpreting experimental data, is a plausible in only three of nine biological systems. More precisely, although Fisher’s model was able to explain several statistical properties of the landscapes—including the mean and SD of selection and epistasis coefficients—it was often unable to explain the full structure of fitness landscapes. PMID:27052568
LD-SPatt: large deviations statistics for patterns on Markov chains.
Nuel, G
2004-01-01
Statistics on Markov chains are widely used for the study of patterns in biological sequences. Statistics on these models can be done through several approaches. Central limit theorem (CLT) producing Gaussian approximations are one of the most popular ones. Unfortunately, in order to find a pattern of interest, these methods have to deal with tail distribution events where CLT is especially bad. In this paper, we propose a new approach based on the large deviations theory to assess pattern statistics. We first recall theoretical results for empiric mean (level 1) as well as empiric distribution (level 2) large deviations on Markov chains. Then, we present the applications of these results focusing on numerical issues. LD-SPatt is the name of GPL software implementing these algorithms. We compare this approach to several existing ones in terms of complexity and reliability and show that the large deviations are more reliable than the Gaussian approximations in absolute values as well as in terms of ranking and are at least as reliable as compound Poisson approximations. We then finally discuss some further possible improvements and applications of this new method.
An operational GLS model for hydrologic regression
Tasker, Gary D.; Stedinger, J.R.
1989-01-01
Recent Monte Carlo studies have documented the value of generalized least squares (GLS) procedures to estimate empirical relationships between streamflow statistics and physiographic basin characteristics. This paper presents a number of extensions of the GLS method that deal with realities and complexities of regional hydrologic data sets that were not addressed in the simulation studies. These extensions include: (1) a more realistic model of the underlying model errors; (2) smoothed estimates of cross correlation of flows; (3) procedures for including historical flow data; (4) diagnostic statistics describing leverage and influence for GLS regression; and (5) the formulation of a mathematical program for evaluating future gaging activities. ?? 1989.
Invariance in the recurrence of large returns and the validation of models of price dynamics
NASA Astrophysics Data System (ADS)
Chang, Lo-Bin; Geman, Stuart; Hsieh, Fushing; Hwang, Chii-Ruey
2013-08-01
Starting from a robust, nonparametric definition of large returns (“excursions”), we study the statistics of their occurrences, focusing on the recurrence process. The empirical waiting-time distribution between excursions is remarkably invariant to year, stock, and scale (return interval). This invariance is related to self-similarity of the marginal distributions of returns, but the excursion waiting-time distribution is a function of the entire return process and not just its univariate probabilities. Generalized autoregressive conditional heteroskedasticity (GARCH) models, market-time transformations based on volume or trades, and generalized (Lévy) random-walk models all fail to fit the statistical structure of excursions.
Landowner interest in multifunctional agroforestry riparian buffers.
Katie Trozzo; John Munsell; James Chamberlain
2014-01-01
Adoption of temperate agroforestry practices generally remains limited despite considerable advances in basic science. This study builds on temperate agroforestry adoption research by empirically testing a statistical model of interest in native fruit and nut tree riparian buffers using technology and agroforestry adoption theory. Data...
Two models for evaluating landslide hazards
Davis, J.C.; Chung, C.-J.; Ohlmacher, G.C.
2006-01-01
Two alternative procedures for estimating landslide hazards were evaluated using data on topographic digital elevation models (DEMs) and bedrock lithologies in an area adjacent to the Missouri River in Atchison County, Kansas, USA. The two procedures are based on the likelihood ratio model but utilize different assumptions. The empirical likelihood ratio model is based on non-parametric empirical univariate frequency distribution functions under an assumption of conditional independence while the multivariate logistic discriminant model assumes that likelihood ratios can be expressed in terms of logistic functions. The relative hazards of occurrence of landslides were estimated by an empirical likelihood ratio model and by multivariate logistic discriminant analysis. Predictor variables consisted of grids containing topographic elevations, slope angles, and slope aspects calculated from a 30-m DEM. An integer grid of coded bedrock lithologies taken from digitized geologic maps was also used as a predictor variable. Both statistical models yield relative estimates in the form of the proportion of total map area predicted to already contain or to be the site of future landslides. The stabilities of estimates were checked by cross-validation of results from random subsamples, using each of the two procedures. Cell-by-cell comparisons of hazard maps made by the two models show that the two sets of estimates are virtually identical. This suggests that the empirical likelihood ratio and the logistic discriminant analysis models are robust with respect to the conditional independent assumption and the logistic function assumption, respectively, and that either model can be used successfully to evaluate landslide hazards. ?? 2006.
Vu, Duy; Lomi, Alessandro; Mascia, Daniele; Pallotti, Francesca
2017-06-30
The main objective of this paper is to introduce and illustrate relational event models, a new class of statistical models for the analysis of time-stamped data with complex temporal and relational dependencies. We outline the main differences between recently proposed relational event models and more conventional network models based on the graph-theoretic formalism typically adopted in empirical studies of social networks. Our main contribution involves the definition and implementation of a marked point process extension of currently available models. According to this approach, the sequence of events of interest is decomposed into two components: (a) event time and (b) event destination. This decomposition transforms the problem of selection of event destination in relational event models into a conditional multinomial logistic regression problem. The main advantages of this formulation are the possibility of controlling for the effect of event-specific data and a significant reduction in the estimation time of currently available relational event models. We demonstrate the empirical value of the model in an analysis of interhospital patient transfers within a regional community of health care organizations. We conclude with a discussion of how the models we presented help to overcome some the limitations of statistical models for networks that are currently available. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.
NASA Technical Reports Server (NTRS)
Forbes, G. S.; Pielke, R. A.
1985-01-01
Various empirical and statistical weather-forecasting studies which utilize stratification by weather regime are described. Objective classification was used to determine weather regime in some studies. In other cases the weather pattern was determined on the basis of a parameter representing the physical and dynamical processes relevant to the anticipated mesoscale phenomena, such as low level moisture convergence and convective precipitation, or the Froude number and the occurrence of cold-air damming. For mesoscale phenomena already in existence, new forecasting techniques were developed. The use of cloud models in operational forecasting is discussed. Models to calculate the spatial scales of forcings and resultant response for mesoscale systems are presented. The use of these models to represent the climatologically most prevalent systems, and to perform case-by-case simulations is reviewed. Operational implementation of mesoscale data into weather forecasts, using both actual simulation output and method-output statistics is discussed.
Limits of Predictability in Commuting Flows in the Absence of Data for Calibration
Yang, Yingxiang; Herrera, Carlos; Eagle, Nathan; González, Marta C.
2014-01-01
The estimation of commuting flows at different spatial scales is a fundamental problem for different areas of study. Many current methods rely on parameters requiring calibration from empirical trip volumes. Their values are often not generalizable to cases without calibration data. To solve this problem we develop a statistical expression to calculate commuting trips with a quantitative functional form to estimate the model parameter when empirical trip data is not available. We calculate commuting trip volumes at scales from within a city to an entire country, introducing a scaling parameter α to the recently proposed parameter free radiation model. The model requires only widely available population and facility density distributions. The parameter can be interpreted as the influence of the region scale and the degree of heterogeneity in the facility distribution. We explore in detail the scaling limitations of this problem, namely under which conditions the proposed model can be applied without trip data for calibration. On the other hand, when empirical trip data is available, we show that the proposed model's estimation accuracy is as good as other existing models. We validated the model in different regions in the U.S., then successfully applied it in three different countries. PMID:25012599
ERIC Educational Resources Information Center
Rupp, Andre A.
2012-01-01
In the focus article of this issue, von Davier, Naemi, and Roberts essentially coupled: (1) a short methodological review of structural similarities of latent variable models with discrete and continuous latent variables; and (2) 2 short empirical case studies that show how these models can be applied to real, rather than simulated, large-scale…
Climate change and the eco-hydrology of fire: Will area burned increase in a warming western USA?
Donald McKenzie; Jeremy S. Littell
2017-01-01
Wildfire area is predicted to increase with global warming. Empirical statistical models and process-based simulations agree almost universally. The key relationship for this unanimity, observed at multiple spatial and temporal scales, is between drought and fire. Predictive models often focus on ecosystems in which this relationship appears to be particularly strong,...
Robustness of fit indices to outliers and leverage observations in structural equation modeling.
Yuan, Ke-Hai; Zhong, Xiaoling
2013-06-01
Normal-distribution-based maximum likelihood (NML) is the most widely used method in structural equation modeling (SEM), although practical data tend to be nonnormally distributed. The effect of nonnormally distributed data or data contamination on the normal-distribution-based likelihood ratio (LR) statistic is well understood due to many analytical and empirical studies. In SEM, fit indices are used as widely as the LR statistic. In addition to NML, robust procedures have been developed for more efficient and less biased parameter estimates with practical data. This article studies the effect of outliers and leverage observations on fit indices following NML and two robust methods. Analysis and empirical results indicate that good leverage observations following NML and one of the robust methods lead most fit indices to give more support to the substantive model. While outliers tend to make a good model superficially bad according to many fit indices following NML, they have little effect on those following the two robust procedures. Implications of the results to data analysis are discussed, and recommendations are provided regarding the use of estimation methods and interpretation of fit indices. (PsycINFO Database Record (c) 2013 APA, all rights reserved).
Cascading Walks Model for Human Mobility Patterns
Han, Xiao-Pu; Wang, Xiang-Wen; Yan, Xiao-Yong; Wang, Bing-Hong
2015-01-01
Background Uncovering the mechanism behind the scaling laws and series of anomalies in human trajectories is of fundamental significance in understanding many spatio-temporal phenomena. Recently, several models, e.g. the explorations-returns model (Song et al., 2010) and the radiation model for intercity travels (Simini et al., 2012), have been proposed to study the origin of these anomalies and the prediction of human movements. However, an agent-based model that could reproduce most of empirical observations without priori is still lacking. Methodology/Principal Findings In this paper, considering the empirical findings on the correlations of move-lengths and staying time in human trips, we propose a simple model which is mainly based on the cascading processes to capture the human mobility patterns. In this model, each long-range movement activates series of shorter movements that are organized by the law of localized explorations and preferential returns in prescribed region. Conclusions/Significance Based on the numerical simulations and analytical studies, we show more than five statistical characters that are well consistent with the empirical observations, including several types of scaling anomalies and the ultraslow diffusion properties, implying the cascading processes associated with the localized exploration and preferential returns are indeed a key in the understanding of human mobility activities. Moreover, the model shows both of the diverse individual mobility and aggregated scaling displacements, bridging the micro and macro patterns in human mobility. In summary, our model successfully explains most of empirical findings and provides deeper understandings on the emergence of human mobility patterns. PMID:25860140
Cascading walks model for human mobility patterns.
Han, Xiao-Pu; Wang, Xiang-Wen; Yan, Xiao-Yong; Wang, Bing-Hong
2015-01-01
Uncovering the mechanism behind the scaling laws and series of anomalies in human trajectories is of fundamental significance in understanding many spatio-temporal phenomena. Recently, several models, e.g. the explorations-returns model (Song et al., 2010) and the radiation model for intercity travels (Simini et al., 2012), have been proposed to study the origin of these anomalies and the prediction of human movements. However, an agent-based model that could reproduce most of empirical observations without priori is still lacking. In this paper, considering the empirical findings on the correlations of move-lengths and staying time in human trips, we propose a simple model which is mainly based on the cascading processes to capture the human mobility patterns. In this model, each long-range movement activates series of shorter movements that are organized by the law of localized explorations and preferential returns in prescribed region. Based on the numerical simulations and analytical studies, we show more than five statistical characters that are well consistent with the empirical observations, including several types of scaling anomalies and the ultraslow diffusion properties, implying the cascading processes associated with the localized exploration and preferential returns are indeed a key in the understanding of human mobility activities. Moreover, the model shows both of the diverse individual mobility and aggregated scaling displacements, bridging the micro and macro patterns in human mobility. In summary, our model successfully explains most of empirical findings and provides deeper understandings on the emergence of human mobility patterns.
Geospace environment modeling 2008--2009 challenge: Dst index
Rastätter, L.; Kuznetsova, M.M.; Glocer, A.; Welling, D.; Meng, X.; Raeder, J.; Wittberger, M.; Jordanova, V.K.; Yu, Y.; Zaharia, S.; Weigel, R.S.; Sazykin, S.; Boynton, R.; Wei, H.; Eccles, V.; Horton, W.; Mays, M.L.; Gannon, J.
2013-01-01
This paper reports the metrics-based results of the Dst index part of the 2008–2009 GEM Metrics Challenge. The 2008–2009 GEM Metrics Challenge asked modelers to submit results for four geomagnetic storm events and five different types of observations that can be modeled by statistical, climatological or physics-based models of the magnetosphere-ionosphere system. We present the results of 30 model settings that were run at the Community Coordinated Modeling Center and at the institutions of various modelers for these events. To measure the performance of each of the models against the observations, we use comparisons of 1 hour averaged model data with the Dst index issued by the World Data Center for Geomagnetism, Kyoto, Japan, and direct comparison of 1 minute model data with the 1 minute Dst index calculated by the United States Geological Survey. The latter index can be used to calculate spectral variability of model outputs in comparison to the index. We find that model rankings vary widely by skill score used. None of the models consistently perform best for all events. We find that empirical models perform well in general. Magnetohydrodynamics-based models of the global magnetosphere with inner magnetosphere physics (ring current model) included and stand-alone ring current models with properly defined boundary conditions perform well and are able to match or surpass results from empirical models. Unlike in similar studies, the statistical models used in this study found their challenge in the weakest events rather than the strongest events.
NASA Astrophysics Data System (ADS)
Karpushin, P. A.; Popov, Yu B.; Popova, A. I.; Popova, K. Yu; Krasnenko, N. P.; Lavrinenko, A. V.
2017-11-01
In this paper, the probabilities of faultless operation of aerologic stations are analyzed, the hypothesis of normality of the empirical data required for using the Kalman filter algorithms is tested, and the spatial correlation functions of distributions of meteorological parameters are determined. The results of a statistical analysis of two-term (0, 12 GMT) radiosonde observations of the temperature and wind velocity components at some preset altitude ranges in the troposphere in 2001-2016 are presented. These data can be used in mathematical modeling of physical processes in the atmosphere.
Empirical Corrections to Nutation Amplitudes and Precession Computed from a Global VLBI Solution
NASA Astrophysics Data System (ADS)
Schuh, H.; Ferrandiz, J. M.; Belda-Palazón, S.; Heinkelmann, R.; Karbon, M.; Nilsson, T.
2017-12-01
The IAU2000A nutation and IAU2006 precession models were adopted to provide accurate estimations and predictions of the Celestial Intermediate Pole (CIP). However, they are not fully accurate and VLBI (Very Long Baseline Interferometry) observations show that the CIP deviates from the position resulting from the application of the IAU2006/2000A model. Currently, those deviations or offsets of the CIP (Celestial Pole Offsets - CPO), can only be obtained by the VLBI technique. The accuracy of the order of 0.1 milliseconds of arc (mas) allows to compare the observed nutation with theoretical prediction model for a rigid Earth and constrain geophysical parameters describing the Earth's interior. In this study, we empirically evaluate the consistency, systematics and deviations of the IAU 2006/2000A precession-nutation model using several CPO time series derived from the global analysis of VLBI sessions. The final objective is the reassessment of the precession offset and rate, and the amplitudes of the principal terms of nutation, trying to empirically improve the conventional values derived from the precession/nutation theories. The statistical analysis of the residuals after re-fitting the main nutation terms demonstrates that our empirical corrections attain an error reduction by almost 15 micro arc seconds.
The logical primitives of thought: Empirical foundations for compositional cognitive models.
Piantadosi, Steven T; Tenenbaum, Joshua B; Goodman, Noah D
2016-07-01
The notion of a compositional language of thought (LOT) has been central in computational accounts of cognition from earliest attempts (Boole, 1854; Fodor, 1975) to the present day (Feldman, 2000; Penn, Holyoak, & Povinelli, 2008; Fodor, 2008; Kemp, 2012; Goodman, Tenenbaum, & Gerstenberg, 2015). Recent modeling work shows how statistical inferences over compositionally structured hypothesis spaces might explain learning and development across a variety of domains. However, the primitive components of such representations are typically assumed a priori by modelers and theoreticians rather than determined empirically. We show how different sets of LOT primitives, embedded in a psychologically realistic approximate Bayesian inference framework, systematically predict distinct learning curves in rule-based concept learning experiments. We use this feature of LOT models to design a set of large-scale concept learning experiments that can determine the most likely primitives for psychological concepts involving Boolean connectives and quantification. Subjects' inferences are most consistent with a rich (nonminimal) set of Boolean operations, including first-order, but not second-order, quantification. Our results more generally show how specific LOT theories can be distinguished empirically. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
How Good Are Statistical Models at Approximating Complex Fitness Landscapes?
du Plessis, Louis; Leventhal, Gabriel E.; Bonhoeffer, Sebastian
2016-01-01
Fitness landscapes determine the course of adaptation by constraining and shaping evolutionary trajectories. Knowledge of the structure of a fitness landscape can thus predict evolutionary outcomes. Empirical fitness landscapes, however, have so far only offered limited insight into real-world questions, as the high dimensionality of sequence spaces makes it impossible to exhaustively measure the fitness of all variants of biologically meaningful sequences. We must therefore revert to statistical descriptions of fitness landscapes that are based on a sparse sample of fitness measurements. It remains unclear, however, how much data are required for such statistical descriptions to be useful. Here, we assess the ability of regression models accounting for single and pairwise mutations to correctly approximate a complex quasi-empirical fitness landscape. We compare approximations based on various sampling regimes of an RNA landscape and find that the sampling regime strongly influences the quality of the regression. On the one hand it is generally impossible to generate sufficient samples to achieve a good approximation of the complete fitness landscape, and on the other hand systematic sampling schemes can only provide a good description of the immediate neighborhood of a sequence of interest. Nevertheless, we obtain a remarkably good and unbiased fit to the local landscape when using sequences from a population that has evolved under strong selection. Thus, current statistical methods can provide a good approximation to the landscape of naturally evolving populations. PMID:27189564
Austin, Peter C; Steyerberg, Ewout W
2012-06-20
When outcomes are binary, the c-statistic (equivalent to the area under the Receiver Operating Characteristic curve) is a standard measure of the predictive accuracy of a logistic regression model. An analytical expression was derived under the assumption that a continuous explanatory variable follows a normal distribution in those with and without the condition. We then conducted an extensive set of Monte Carlo simulations to examine whether the expressions derived under the assumption of binormality allowed for accurate prediction of the empirical c-statistic when the explanatory variable followed a normal distribution in the combined sample of those with and without the condition. We also examine the accuracy of the predicted c-statistic when the explanatory variable followed a gamma, log-normal or uniform distribution in combined sample of those with and without the condition. Under the assumption of binormality with equality of variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the product of the standard deviation of the normal components (reflecting more heterogeneity) and the log-odds ratio (reflecting larger effects). Under the assumption of binormality with unequal variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the standardized difference of the explanatory variable in those with and without the condition. In our Monte Carlo simulations, we found that these expressions allowed for reasonably accurate prediction of the empirical c-statistic when the distribution of the explanatory variable was normal, gamma, log-normal, and uniform in the entire sample of those with and without the condition. The discriminative ability of a continuous explanatory variable cannot be judged by its odds ratio alone, but always needs to be considered in relation to the heterogeneity of the population.
Holgado-Tello, Fco P; Chacón-Moscoso, Salvador; Sanduvete-Chaves, Susana; Pérez-Gil, José A
2016-01-01
The Campbellian tradition provides a conceptual framework to assess threats to validity. On the other hand, different models of causal analysis have been developed to control estimation biases in different research designs. However, the link between design features, measurement issues, and concrete impact estimation analyses is weak. In order to provide an empirical solution to this problem, we use Structural Equation Modeling (SEM) as a first approximation to operationalize the analytical implications of threats to validity in quasi-experimental designs. Based on the analogies established between the Classical Test Theory (CTT) and causal analysis, we describe an empirical study based on SEM in which range restriction and statistical power have been simulated in two different models: (1) A multistate model in the control condition (pre-test); and (2) A single-trait-multistate model in the control condition (post-test), adding a new mediator latent exogenous (independent) variable that represents a threat to validity. Results show, empirically, how the differences between both the models could be partially or totally attributed to these threats. Therefore, SEM provides a useful tool to analyze the influence of potential threats to validity.
Holgado-Tello, Fco. P.; Chacón-Moscoso, Salvador; Sanduvete-Chaves, Susana; Pérez-Gil, José A.
2016-01-01
The Campbellian tradition provides a conceptual framework to assess threats to validity. On the other hand, different models of causal analysis have been developed to control estimation biases in different research designs. However, the link between design features, measurement issues, and concrete impact estimation analyses is weak. In order to provide an empirical solution to this problem, we use Structural Equation Modeling (SEM) as a first approximation to operationalize the analytical implications of threats to validity in quasi-experimental designs. Based on the analogies established between the Classical Test Theory (CTT) and causal analysis, we describe an empirical study based on SEM in which range restriction and statistical power have been simulated in two different models: (1) A multistate model in the control condition (pre-test); and (2) A single-trait-multistate model in the control condition (post-test), adding a new mediator latent exogenous (independent) variable that represents a threat to validity. Results show, empirically, how the differences between both the models could be partially or totally attributed to these threats. Therefore, SEM provides a useful tool to analyze the influence of potential threats to validity. PMID:27378991
Relating triggering processes in lab experiments with earthquakes.
NASA Astrophysics Data System (ADS)
Baro Urbea, J.; Davidsen, J.; Kwiatek, G.; Charalampidou, E. M.; Goebel, T.; Stanchits, S. A.; Vives, E.; Dresen, G.
2016-12-01
Statistical relations such as Gutenberg-Richter's, Omori-Utsu's and the productivity of aftershocks were first observed in seismology, but are also common to other physical phenomena exhibiting avalanche dynamics such as solar flares, rock fracture, structural phase transitions and even stock market transactions. All these examples exhibit spatio-temporal correlations that can be explained as triggering processes: Instead of being activated as a response to external driving or fluctuations, some events are consequence of previous activity. Although different plausible explanations have been suggested in each system, the ubiquity of such statistical laws remains unknown. However, the case of rock fracture may exhibit a physical connection with seismology. It has been suggested that some features of seismology have a microscopic origin and are reproducible over a vast range of scales. This hypothesis has motivated mechanical experiments to generate artificial catalogues of earthquakes at a laboratory scale -so called labquakes- and under controlled conditions. Microscopic fractures in lab tests release elastic waves that are recorded as ultrasonic (kHz-MHz) acoustic emission (AE) events by means of piezoelectric transducers. Here, we analyse the statistics of labquakes recorded during the failure of small samples of natural rocks and artificial porous materials under different controlled compression regimes. Temporal and spatio-temporal correlations are identified in certain cases. Specifically, we distinguish between the background and triggered events, revealing some differences in the statistical properties. We fit the data to statistical models of seismicity. As a particular case, we explore the branching process approach simplified in the Epidemic Type Aftershock Sequence (ETAS) model. We evaluate the empirical spatio-temporal kernel of the model and investigate the physical origins of triggering. Our analysis of the focal mechanisms implies that the occurrence of the empirical laws extends well beyond purely frictional sliding events, in contrast to what is often assumed.
Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses
Bayzid, Md Shamsuzzoha; Mirarab, Siavash; Boussau, Bastien; Warnow, Tandy
2015-01-01
Because biological processes can result in different loci having different evolutionary histories, species tree estimation requires multiple loci from across multiple genomes. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause for gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called "summary methods". Because summary methods are generally fast (and much faster than more complicated coalescent-based methods that co-estimate gene trees and species trees), they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate in biologically realistic conditions. Mirarab et al. (Science 2014) presented the "statistical binning" technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple heuristic to evaluate "combinability" and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomic pipeline does not have the desirable property of being statistically consistent. We show that weighting the re-calculated gene trees by the bin sizes makes statistical binning statistically consistent under the multispecies coalescent, and maintains the good empirical performance. Thus, "weighted statistical binning" enables highly accurate genome-scale species tree estimation, and is also statistically consistent under the multi-species coalescent model. New data used in this study are available at DOI: http://dx.doi.org/10.6084/m9.figshare.1411146, and the software is available at https://github.com/smirarab/binning. PMID:26086579
May, Michael R; Moore, Brian R
2016-11-01
Evolutionary biologists have long been fascinated by the extreme differences in species numbers across branches of the Tree of Life. This has motivated the development of statistical methods for detecting shifts in the rate of lineage diversification across the branches of phylogenic trees. One of the most frequently used methods, MEDUSA, explores a set of diversification-rate models, where each model assigns branches of the phylogeny to a set of diversification-rate categories. Each model is first fit to the data, and the Akaike information criterion (AIC) is then used to identify the optimal diversification model. Surprisingly, the statistical behavior of this popular method is uncharacterized, which is a concern in light of: (1) the poor performance of the AIC as a means of choosing among models in other phylogenetic contexts; (2) the ad hoc algorithm used to visit diversification models, and; (3) errors that we reveal in the likelihood function used to fit diversification models to the phylogenetic data. Here, we perform an extensive simulation study demonstrating that MEDUSA (1) has a high false-discovery rate (on average, spurious diversification-rate shifts are identified [Formula: see text] of the time), and (2) provides biased estimates of diversification-rate parameters. Understanding the statistical behavior of MEDUSA is critical both to empirical researchers-in order to clarify whether these methods can make reliable inferences from empirical datasets-and to theoretical biologists-in order to clarify the specific problems that need to be solved in order to develop more reliable approaches for detecting shifts in the rate of lineage diversification. [Akaike information criterion; extinction; lineage-specific diversification rates; phylogenetic model selection; speciation.]. © The Author(s) 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
May, Michael R.; Moore, Brian R.
2016-01-01
Evolutionary biologists have long been fascinated by the extreme differences in species numbers across branches of the Tree of Life. This has motivated the development of statistical methods for detecting shifts in the rate of lineage diversification across the branches of phylogenic trees. One of the most frequently used methods, MEDUSA, explores a set of diversification-rate models, where each model assigns branches of the phylogeny to a set of diversification-rate categories. Each model is first fit to the data, and the Akaike information criterion (AIC) is then used to identify the optimal diversification model. Surprisingly, the statistical behavior of this popular method is uncharacterized, which is a concern in light of: (1) the poor performance of the AIC as a means of choosing among models in other phylogenetic contexts; (2) the ad hoc algorithm used to visit diversification models, and; (3) errors that we reveal in the likelihood function used to fit diversification models to the phylogenetic data. Here, we perform an extensive simulation study demonstrating that MEDUSA (1) has a high false-discovery rate (on average, spurious diversification-rate shifts are identified ≈30% of the time), and (2) provides biased estimates of diversification-rate parameters. Understanding the statistical behavior of MEDUSA is critical both to empirical researchers—in order to clarify whether these methods can make reliable inferences from empirical datasets—and to theoretical biologists—in order to clarify the specific problems that need to be solved in order to develop more reliable approaches for detecting shifts in the rate of lineage diversification. [Akaike information criterion; extinction; lineage-specific diversification rates; phylogenetic model selection; speciation.] PMID:27037081
Predictive Modeling of Risk Associated with Temperature Extremes over Continental US
NASA Astrophysics Data System (ADS)
Kravtsov, S.; Roebber, P.; Brazauskas, V.
2016-12-01
We build an extremely statistically accurate, essentially bias-free empirical emulator of atmospheric surface temperature and apply it for meteorological risk assessment over the domain of continental US. The resulting prediction scheme achieves an order-of-magnitude or larger gain of numerical efficiency compared with the schemes based on high-resolution dynamical atmospheric models, leading to unprecedented accuracy of the estimated risk distributions. The empirical model construction methodology is based on our earlier work, but is further modified to account for the influence of large-scale, global climate change on regional US weather and climate. The resulting estimates of the time-dependent, spatially extended probability of temperature extremes over the simulation period can be used as a risk management tool by insurance companies and regulatory governmental agencies.
NASA Astrophysics Data System (ADS)
Deng, Wei; Wang, Jun
2015-06-01
We investigate and quantify the multifractal detrended cross-correlation of return interval series for Chinese stock markets and a proposed price model, the price model is established by oriented percolation. The return interval describes the waiting time between two successive price volatilities which are above some threshold, the present work is an attempt to quantify the level of multifractal detrended cross-correlation for the return intervals. Further, the concept of MF-DCCA coefficient of return intervals is introduced, and the corresponding empirical research is performed. The empirical results show that the return intervals of SSE and SZSE are weakly positive multifractal power-law cross-correlated, and exhibit the fluctuation patterns of MF-DCCA coefficients. The similar behaviors of return intervals for the price model is also demonstrated.
Regional analyses of labor markets and demography: a model based Norwegian example.
Stambol, L S; Stolen, N M; Avitsland, T
1998-01-01
The authors discuss the regional REGARD model, developed by Statistics Norway to analyze the regional implications of macroeconomic development of employment, labor force, and unemployment. "In building the model, empirical analyses of regional producer behavior in manufacturing industries have been performed, and the relation between labor market development and regional migration has been investigated. Apart from providing a short description of the REGARD model, this article demonstrates the functioning of the model, and presents some results of an application." excerpt
The Hierarchical Personality Structure of Aspiring Creative Writers
ERIC Educational Resources Information Center
Maslej, Marta M.; Rain, Marina; Fong, Katrina; Oatley, Keith; Mar, Raymond A.
2014-01-01
Empirical studies of personality traits in creative writers have demonstrated mixed findings, perhaps due to issues of sampling, measurement, and the reporting of statistical information. The goal of this study is to quantify the personality structure of aspiring creative writers according to a modern hierarchal model of trait personality. A…
Circulation Clusters--An Empirical Approach to Decentralization of Academic Libraries.
ERIC Educational Resources Information Center
McGrath, William E.
1986-01-01
Discusses the issue of centralization or decentralization of academic library collections, and describes a statistical analysis of book circulation at the University of Southwestern Louisiana that yielded subject area clusters as a compromise solution to the problem. Applications of the cluster model for all types of library catalogs are…
Multifactorial modelling of high-temperature treatment of timber in the saturated water steam medium
NASA Astrophysics Data System (ADS)
Prosvirnikov, D. B.; Safin, R. G.; Ziatdinova, D. F.; Timerbaev, N. F.; Lashkov, V. A.
2016-04-01
The paper analyses experimental data obtained in studies of high-temperature treatment of softwood and hardwood in an environment of saturated water steam. Data were processed in the Curve Expert software for the purpose of statistical modelling of processes and phenomena occurring during this process. The multifactorial modelling resulted in the empirical dependences, allowing determining the main parameters of this type of hydrothermal treatment with high accuracy.
Statistical mechanics of neocortical interactions: Constraints on 40-Hz models of short-term memory
NASA Astrophysics Data System (ADS)
Ingber, Lester
1995-10-01
Calculations presented in L. Ingber and P.L. Nunez, Phys. Rev. E 51, 5074 (1995) detailed the evolution of short-term memory in the neocortex, supporting the empirical 7+/-2 rule of constraints on the capacity of neocortical processing. These results are given further support when other recent models of 40-Hz subcycles of low-frequency oscillations are considered.
NASA Astrophysics Data System (ADS)
McCauley, Joseph L.
2009-09-01
Preface; 1. Econophysics: why and what; 2. Neo-classical economic theory; 3. Probability and stochastic processes; 4. Introduction to financial economics; 5. Introduction to portfolio selection theory; 6. Scaling, pair correlations, and conditional densities; 7. Statistical ensembles: deducing dynamics from time series; 8. Martingale option pricing; 9. FX market globalization: evolution of the dollar to worldwide reserve currency; 10. Macroeconomics and econometrics: regression models vs. empirically based modeling; 11. Complexity; Index.
Quantifying Confidence in Model Predictions for Hypersonic Aircraft Structures
2015-03-01
of isolating calibrations of models in the network, segmented and simultaneous calibration are compared using the Kullback - Leibler ...value of θ. While not all test -statistics are as simple as measuring goodness or badness of fit , their directional interpretations tend to remain...data quite well, qualitatively. Quantitative goodness - of - fit tests are problematic because they assume a true empirical CDF is being tested or
Steve P. Verrill; Frank C. Owens; David E. Kretschmann; Rubin Shmulsky
2017-01-01
It is common practice to assume that a two-parameter Weibull probability distribution is suitable for modeling lumber properties. Verrill and co-workers demonstrated theoretically and empirically that the modulus of rupture (MOR) distribution of visually graded or machine stress rated (MSR) lumber is not distributed as a Weibull. Instead, the tails of the MOR...
South, Susan C.; Hamdi, Nayla; Krueger, Robert F.
2015-01-01
For more than a decade, biometric moderation models have been used to examine whether genetic and environmental influences on individual differences might vary within the population. These quantitative gene × environment interaction (G×E) models not only have the potential to elucidate when genetic and environmental influences on a phenotype might differ, but why, as they provide an empirical test of several theoretical paradigms that serve as useful heuristics to explain etiology—diathesis-stress, bioecological, differential susceptibility, and social control. In the current manuscript, we review how these developmental theories align with different patterns of findings from statistical models of gene-environment interplay. We then describe the extant empirical evidence, using work by our own research group and others, to lay out genetically-informative plausible accounts of how phenotypes related to social inequality—physical health and cognition—might relate to these theoretical models. PMID:26426103
South, Susan C; Hamdi, Nayla R; Krueger, Robert F
2017-02-01
For more than a decade, biometric moderation models have been used to examine whether genetic and environmental influences on individual differences might vary within the population. These quantitative Gene × Environment interaction models have the potential to elucidate not only when genetic and environmental influences on a phenotype might differ, but also why, as they provide an empirical test of several theoretical paradigms that serve as useful heuristics to explain etiology-diathesis-stress, bioecological, differential susceptibility, and social control. In the current article, we review how these developmental theories align with different patterns of findings from statistical models of gene-environment interplay. We then describe the extant empirical evidence, using work by our own research group and others, to lay out genetically informative plausible accounts of how phenotypes related to social inequality-physical health and cognition-might relate to these theoretical models. © 2015 Wiley Periodicals, Inc.
Zuthi, M F R; Ngo, H H; Guo, W S; Nghiem, L D; Hai, F I; Xia, S Q; Zhang, Z Q; Li, J X
2015-08-01
This study investigates the influence of key biomass parameters on specific oxygen uptake rate (SOUR) in a sponge submerged membrane bioreactor (SSMBR) to develop mathematical models of biomass viability. Extra-cellular polymeric substances (EPS) were considered as a lumped parameter of bound EPS (bEPS) and soluble microbial products (SMP). Statistical analyses of experimental results indicate that the bEPS, SMP, mixed liquor suspended solids and volatile suspended solids (MLSS and MLVSS) have functional relationships with SOUR and their relative influence on SOUR was in the order of EPS>bEPS>SMP>MLVSS/MLSS. Based on correlations among biomass parameters and SOUR, two independent empirical models of biomass viability were developed. The models were validated using results of the SSMBR. However, further validation of the models for different operating conditions is suggested. Copyright © 2015 Elsevier Ltd. All rights reserved.
Effects of video modeling on communicative social skills of college students with Asperger syndrome.
Mason, Rose A; Rispoli, Mandy; Ganz, Jennifer B; Boles, Margot B; Orr, Kristie
2012-01-01
Empirical support regarding effective interventions for individuals with autism spectrum disorder (ASD) within a postsecondary community is limited. Video modeling, an empirically supported intervention for children and adolescents with ASD, may prove effective in addressing the needs of individuals with ASD in higher education. This study evaluated the effects of video modeling without additional treatment components to improve social-communicative skills, specifically, eye contact, facial expression, and conversational turntaking in college students with ASD. This study utilized a multiple baseline single-case design across behaviors for two post-secondary students with ASD to evaluate the effects of the video modeling intervention. Large effect sizes and statistically significant change across all targeted skills for one participant and eye contact and turntaking for the other participant were obtained. The use of video modeling without additional intervention may increase the social skills of post-secondary students with ASD. Implications for future research are discussed.
Flood Change Assessment and Attribution in Austrian alpine Basins
NASA Astrophysics Data System (ADS)
Claps, Pierluigi; Allamano, Paola; Como, Anastasia; Viglione, Alberto
2016-04-01
The present paper aims to investigate the sensitivity of flood peaks to global warming in the Austrian alpine basins. A group of 97 Austrian watersheds, with areas ranging from 14 to 6000 km2 and with average elevation ranging from 1000 to 2900 m a.s.l. have been considered. Annual maximum floods are available for the basins from 1890 to 2007 with two densities of observation. In a first period, until 1950, an average of 42 records of flood peaks are available. From 1951 to 2007 the density of observation increases to an average amount of contemporary peaks of 85. This information is very important with reference to the statistical tools used for the empirical assessment of change over time, that is linear quantile regressions. Application of this tool to the data set unveils trends in extreme events, confirmed by statistical testing, for the 0.75 and 0.95 empirical quantiles. All applications are made with specific (discharges/area) values . Similarly of what done in a previous approach, multiple quantile regressions have also been applied, confirming the presence of trends even when the possible interference of the specific discharge and morphoclimatic parameters (i.e. mean elevation and catchment area). Application of a geomorphoclimatic model by Allamano et al (2009) can allow to mimic to which extent the empirically available increase in air temperature and annual rainfall can justify the attribution of change derived by the empirical statistical tools. An comparison with data from Swiss alpine basins treated in a previous paper is finally undertaken.
Formulating Spatially Varying Performance in the Statistical Fusion Framework
Landman, Bennett A.
2012-01-01
To date, label fusion methods have primarily relied either on global (e.g. STAPLE, globally weighted vote) or voxelwise (e.g. locally weighted vote) performance models. Optimality of the statistical fusion framework hinges upon the validity of the stochastic model of how a rater errs (i.e., the labeling process model). Hitherto, approaches have tended to focus on the extremes of potential models. Herein, we propose an extension to the STAPLE approach to seamlessly account for spatially varying performance by extending the performance level parameters to account for a smooth, voxelwise performance level field that is unique to each rater. This approach, Spatial STAPLE, provides significant improvements over state-of-the-art label fusion algorithms in both simulated and empirical data sets. PMID:22438513
Soil Erosion as a stochastic process
NASA Astrophysics Data System (ADS)
Casper, Markus C.
2015-04-01
The main tools to provide estimations concerning risk and amount of erosion are different types of soil erosion models: on the one hand, there are empirically based model concepts on the other hand there are more physically based or process based models. However, both types of models have substantial weak points. All empirical model concepts are only capable of providing rough estimates over larger temporal and spatial scales, they do not account for many driving factors that are in the scope of scenario related analysis. In addition, the physically based models contain important empirical parts and hence, the demand for universality and transferability is not given. As a common feature, we find, that all models rely on parameters and input variables, which are to certain, extend spatially and temporally averaged. A central question is whether the apparent heterogeneity of soil properties or the random nature of driving forces needs to be better considered in our modelling concepts. Traditionally, researchers have attempted to remove spatial and temporal variability through homogenization. However, homogenization has been achieved through physical manipulation of the system, or by statistical averaging procedures. The price for obtaining this homogenized (average) model concepts of soils and soil related processes has often been a failure to recognize the profound importance of heterogeneity in many of the properties and processes that we study. Especially soil infiltrability and the resistance (also called "critical shear stress" or "critical stream power") are the most important empirical factors of physically based erosion models. The erosion resistance is theoretically a substrate specific parameter, but in reality, the threshold where soil erosion begins is determined experimentally. The soil infiltrability is often calculated with empirical relationships (e.g. based on grain size distribution). Consequently, to better fit reality, this value needs to be corrected experimentally. To overcome this disadvantage of our actual models, soil erosion models are needed that are able to use stochastic directly variables and parameter distributions. There are only some minor approaches in this direction. The most advanced is the model "STOSEM" proposed by Sidorchuk in 2005. In this model, only a small part of the soil erosion processes is described, the aggregate detachment and the aggregate transport by flowing water. The concept is highly simplified, for example, many parameters are temporally invariant. Nevertheless, the main problem is that our existing measurements and experiments are not geared to provide stochastic parameters (e.g. as probability density functions); in the best case they deliver a statistical validation of the mean values. Again, we get effective parameters, spatially and temporally averaged. There is an urgent need for laboratory and field experiments on overland flow structure, raindrop effects and erosion rate, which deliver information on spatial and temporal structure of soil and surface properties and processes.
Forecasting runout of rock and debris avalanches
Iverson, Richard M.; Evans, S.G.; Mugnozza, G.S.; Strom, A.; Hermanns, R.L.
2006-01-01
Physically based mathematical models and statistically based empirical equations each may provide useful means of forecasting runout of rock and debris avalanches. This paper compares the foundations, strengths, and limitations of a physically based model and a statistically based forecasting method, both of which were developed to predict runout across three-dimensional topography. The chief advantage of the physically based model results from its ties to physical conservation laws and well-tested axioms of soil and rock mechanics, such as the Coulomb friction rule and effective-stress principle. The output of this model provides detailed information about the dynamics of avalanche runout, at the expense of high demands for accurate input data, numerical computation, and experimental testing. In comparison, the statistical method requires relatively modest computation and no input data except identification of prospective avalanche source areas and a range of postulated avalanche volumes. Like the physically based model, the statistical method yields maps of predicted runout, but it provides no information on runout dynamics. Although the two methods differ significantly in their structure and objectives, insights gained from one method can aid refinement of the other.
NASA Technical Reports Server (NTRS)
1981-01-01
The application of statistical methods to recorded ozone measurements. The effects of a long term depletion of ozone at magnitudes predicted by the NAS is harmful to most forms of life. Empirical prewhitening filters the derivation of which is independent of the underlying physical mechanisms were analyzed. Statistical analysis performs a checks and balances effort. Time series filters variations into systematic and random parts, errors are uncorrelated, and significant phase lag dependencies are identified. The use of time series modeling to enhance the capability of detecting trends is discussed.
Milic, Natasa M.; Trajkovic, Goran Z.; Bukumiric, Zoran M.; Cirkovic, Andja; Nikolic, Ivan M.; Milin, Jelena S.; Milic, Nikola V.; Savic, Marko D.; Corac, Aleksandar M.; Marinkovic, Jelena M.; Stanisavljevic, Dejana M.
2016-01-01
Background Although recent studies report on the benefits of blended learning in improving medical student education, there is still no empirical evidence on the relative effectiveness of blended over traditional learning approaches in medical statistics. We implemented blended along with on-site (i.e. face-to-face) learning to further assess the potential value of web-based learning in medical statistics. Methods This was a prospective study conducted with third year medical undergraduate students attending the Faculty of Medicine, University of Belgrade, who passed (440 of 545) the final exam of the obligatory introductory statistics course during 2013–14. Student statistics achievements were stratified based on the two methods of education delivery: blended learning and on-site learning. Blended learning included a combination of face-to-face and distance learning methodologies integrated into a single course. Results Mean exam scores for the blended learning student group were higher than for the on-site student group for both final statistics score (89.36±6.60 vs. 86.06±8.48; p = 0.001) and knowledge test score (7.88±1.30 vs. 7.51±1.36; p = 0.023) with a medium effect size. There were no differences in sex or study duration between the groups. Current grade point average (GPA) was higher in the blended group. In a multivariable regression model, current GPA and knowledge test scores were associated with the final statistics score after adjusting for study duration and learning modality (p<0.001). Conclusion This study provides empirical evidence to support educator decisions to implement different learning environments for teaching medical statistics to undergraduate medical students. Blended and on-site training formats led to similar knowledge acquisition; however, students with higher GPA preferred the technology assisted learning format. Implementation of blended learning approaches can be considered an attractive, cost-effective, and efficient alternative to traditional classroom training in medical statistics. PMID:26859832
Milic, Natasa M; Trajkovic, Goran Z; Bukumiric, Zoran M; Cirkovic, Andja; Nikolic, Ivan M; Milin, Jelena S; Milic, Nikola V; Savic, Marko D; Corac, Aleksandar M; Marinkovic, Jelena M; Stanisavljevic, Dejana M
2016-01-01
Although recent studies report on the benefits of blended learning in improving medical student education, there is still no empirical evidence on the relative effectiveness of blended over traditional learning approaches in medical statistics. We implemented blended along with on-site (i.e. face-to-face) learning to further assess the potential value of web-based learning in medical statistics. This was a prospective study conducted with third year medical undergraduate students attending the Faculty of Medicine, University of Belgrade, who passed (440 of 545) the final exam of the obligatory introductory statistics course during 2013-14. Student statistics achievements were stratified based on the two methods of education delivery: blended learning and on-site learning. Blended learning included a combination of face-to-face and distance learning methodologies integrated into a single course. Mean exam scores for the blended learning student group were higher than for the on-site student group for both final statistics score (89.36±6.60 vs. 86.06±8.48; p = 0.001) and knowledge test score (7.88±1.30 vs. 7.51±1.36; p = 0.023) with a medium effect size. There were no differences in sex or study duration between the groups. Current grade point average (GPA) was higher in the blended group. In a multivariable regression model, current GPA and knowledge test scores were associated with the final statistics score after adjusting for study duration and learning modality (p<0.001). This study provides empirical evidence to support educator decisions to implement different learning environments for teaching medical statistics to undergraduate medical students. Blended and on-site training formats led to similar knowledge acquisition; however, students with higher GPA preferred the technology assisted learning format. Implementation of blended learning approaches can be considered an attractive, cost-effective, and efficient alternative to traditional classroom training in medical statistics.
Hristov, A N; Kebreab, E; Niu, M; Oh, J; Bannink, A; Bayat, A R; Boland, T B; Brito, A F; Casper, D P; Crompton, L A; Dijkstra, J; Eugène, M; Garnsworthy, P C; Haque, N; Hellwing, A L F; Huhtanen, P; Kreuzer, M; Kuhla, B; Lund, P; Madsen, J; Martin, C; Moate, P J; Muetzel, S; Muñoz, C; Peiren, N; Powell, J M; Reynolds, C K; Schwarm, A; Shingfield, K J; Storlien, T M; Weisbjerg, M R; Yáñez-Ruiz, D R; Yu, Z
2018-04-18
Ruminant production systems are important contributors to anthropogenic methane (CH 4 ) emissions, but there are large uncertainties in national and global livestock CH 4 inventories. Sources of uncertainty in enteric CH 4 emissions include animal inventories, feed dry matter intake (DMI), ingredient and chemical composition of the diets, and CH 4 emission factors. There is also significant uncertainty associated with enteric CH 4 measurements. The most widely used techniques are respiration chambers, the sulfur hexafluoride (SF 6 ) tracer technique, and the automated head-chamber system (GreenFeed; C-Lock Inc., Rapid City, SD). All 3 methods have been successfully used in a large number of experiments with dairy or beef cattle in various environmental conditions, although studies that compare techniques have reported inconsistent results. Although different types of models have been developed to predict enteric CH 4 emissions, relatively simple empirical (statistical) models have been commonly used for inventory purposes because of their broad applicability and ease of use compared with more detailed empirical and process-based mechanistic models. However, extant empirical models used to predict enteric CH 4 emissions suffer from narrow spatial focus, limited observations, and limitations of the statistical technique used. Therefore, prediction models must be developed from robust data sets that can only be generated through collaboration of scientists across the world. To achieve high prediction accuracy, these data sets should encompass a wide range of diets and production systems within regions and globally. Overall, enteric CH 4 prediction models are based on various animal or feed characteristic inputs but are dominated by DMI in one form or another. As a result, accurate prediction of DMI is essential for accurate prediction of livestock CH 4 emissions. Analysis of a large data set of individual dairy cattle data showed that simplified enteric CH 4 prediction models based on DMI alone or DMI and limited feed- or animal-related inputs can predict average CH 4 emission with a similar accuracy to more complex empirical models. These simplified models can be reliably used for emission inventory purposes. The Authors. Published by FASS Inc. and Elsevier Inc. on behalf of the American Dairy Science Association®. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).
Stochastic empirical loading and dilution model (SELDM) version 1.0.0
Granato, Gregory E.
2013-01-01
The Stochastic Empirical Loading and Dilution Model (SELDM) is designed to transform complex scientific data into meaningful information about the risk of adverse effects of runoff on receiving waters, the potential need for mitigation measures, and the potential effectiveness of such management measures for reducing these risks. The U.S. Geological Survey developed SELDM in cooperation with the Federal Highway Administration to help develop planning-level estimates of event mean concentrations, flows, and loads in stormwater from a site of interest and from an upstream basin. Planning-level estimates are defined as the results of analyses used to evaluate alternative management measures; planning-level estimates are recognized to include substantial uncertainties (commonly orders of magnitude). SELDM uses information about a highway site, the associated receiving-water basin, precipitation events, stormflow, water quality, and the performance of mitigation measures to produce a stochastic population of runoff-quality variables. SELDM provides input statistics for precipitation, prestorm flow, runoff coefficients, and concentrations of selected water-quality constituents from National datasets. Input statistics may be selected on the basis of the latitude, longitude, and physical characteristics of the site of interest and the upstream basin. The user also may derive and input statistics for each variable that are specific to a given site of interest or a given area. SELDM is a stochastic model because it uses Monte Carlo methods to produce the random combinations of input variable values needed to generate the stochastic population of values for each component variable. SELDM calculates the dilution of runoff in the receiving waters and the resulting downstream event mean concentrations and annual average lake concentrations. Results are ranked, and plotting positions are calculated, to indicate the level of risk of adverse effects caused by runoff concentrations, flows, and loads on receiving waters by storm and by year. Unlike deterministic hydrologic models, SELDM is not calibrated by changing values of input variables to match a historical record of values. Instead, input values for SELDM are based on site characteristics and representative statistics for each hydrologic variable. Thus, SELDM is an empirical model based on data and statistics rather than theoretical physiochemical equations. SELDM is a lumped parameter model because the highway site, the upstream basin, and the lake basin each are represented as a single homogeneous unit. Each of these source areas is represented by average basin properties, and results from SELDM are calculated as point estimates for the site of interest. Use of the lumped parameter approach facilitates rapid specification of model parameters to develop planning-level estimates with available data. The approach allows for parsimony in the required inputs to and outputs from the model and flexibility in the use of the model. For example, SELDM can be used to model runoff from various land covers or land uses by using the highway-site definition as long as representative water quality and impervious-fraction data are available.
Statistical label fusion with hierarchical performance models
Asman, Andrew J.; Dagley, Alexander S.; Landman, Bennett A.
2014-01-01
Label fusion is a critical step in many image segmentation frameworks (e.g., multi-atlas segmentation) as it provides a mechanism for generalizing a collection of labeled examples into a single estimate of the underlying segmentation. In the multi-label case, typical label fusion algorithms treat all labels equally – fully neglecting the known, yet complex, anatomical relationships exhibited in the data. To address this problem, we propose a generalized statistical fusion framework using hierarchical models of rater performance. Building on the seminal work in statistical fusion, we reformulate the traditional rater performance model from a multi-tiered hierarchical perspective. This new approach provides a natural framework for leveraging known anatomical relationships and accurately modeling the types of errors that raters (or atlases) make within a hierarchically consistent formulation. Herein, we describe several contributions. First, we derive a theoretical advancement to the statistical fusion framework that enables the simultaneous estimation of multiple (hierarchical) performance models within the statistical fusion context. Second, we demonstrate that the proposed hierarchical formulation is highly amenable to the state-of-the-art advancements that have been made to the statistical fusion framework. Lastly, in an empirical whole-brain segmentation task we demonstrate substantial qualitative and significant quantitative improvement in overall segmentation accuracy. PMID:24817809
Assessment of Current Jet Noise Prediction Capabilities
NASA Technical Reports Server (NTRS)
Hunter, Craid A.; Bridges, James E.; Khavaran, Abbas
2008-01-01
An assessment was made of the capability of jet noise prediction codes over a broad range of jet flows, with the objective of quantifying current capabilities and identifying areas requiring future research investment. Three separate codes in NASA s possession, representative of two classes of jet noise prediction codes, were evaluated, one empirical and two statistical. The empirical code is the Stone Jet Noise Module (ST2JET) contained within the ANOPP aircraft noise prediction code. It is well documented, and represents the state of the art in semi-empirical acoustic prediction codes where virtual sources are attributed to various aspects of noise generation in each jet. These sources, in combination, predict the spectral directivity of a jet plume. A total of 258 jet noise cases were examined on the ST2JET code, each run requiring only fractions of a second to complete. Two statistical jet noise prediction codes were also evaluated, JeNo v1, and Jet3D. Fewer cases were run for the statistical prediction methods because they require substantially more resources, typically a Reynolds-Averaged Navier-Stokes solution of the jet, volume integration of the source statistical models over the entire plume, and a numerical solution of the governing propagation equation within the jet. In the evaluation process, substantial justification of experimental datasets used in the evaluations was made. In the end, none of the current codes can predict jet noise within experimental uncertainty. The empirical code came within 2dB on a 1/3 octave spectral basis for a wide range of flows. The statistical code Jet3D was within experimental uncertainty at broadside angles for hot supersonic jets, but errors in peak frequency and amplitude put it out of experimental uncertainty at cooler, lower speed conditions. Jet3D did not predict changes in directivity in the downstream angles. The statistical code JeNo,v1 was within experimental uncertainty predicting noise from cold subsonic jets at all angles, but did not predict changes with heating of the jet and did not account for directivity changes at supersonic conditions. Shortcomings addressed here give direction for future work relevant to the statistical-based prediction methods. A full report will be released as a chapter in a NASA publication assessing the state of the art in aircraft noise prediction.
The Love of Large Numbers: A Popularity Bias in Consumer Choice.
Powell, Derek; Yu, Jingqi; DeWolf, Melissa; Holyoak, Keith J
2017-10-01
Social learning-the ability to learn from observing the decisions of other people and the outcomes of those decisions-is fundamental to human evolutionary and cultural success. The Internet now provides social evidence on an unprecedented scale. However, properly utilizing this evidence requires a capacity for statistical inference. We examined how people's interpretation of online review scores is influenced by the numbers of reviews-a potential indicator both of an item's popularity and of the precision of the average review score. Our task was designed to pit statistical information against social information. We modeled the behavior of an "intuitive statistician" using empirical prior information from millions of reviews posted on Amazon.com and then compared the model's predictions with the behavior of experimental participants. Under certain conditions, people preferred a product with more reviews to one with fewer reviews even though the statistical model indicated that the latter was likely to be of higher quality than the former. Overall, participants' judgments suggested that they failed to make meaningful statistical inferences.
CHILD CARE AVAILABILITY AND FIRST-BIRTH TIMING IN NORWAY*
Rindfuss, Ronald R.; Guilkey, David; Morgan, S. Philip; Kravdal, Øystein; Guzzo, Karen Benjamin
2010-01-01
Both sociological and economic theories posit that widely available, high-quality, and affordable child care should have pronatalist effects. Yet to date, the empirical evidence has not consistently supported this hypothesis. We argue that this previous empirical work has been plagued by the inability to control for endogenous placement of day care centers and the possibility that people migrate to take advantage of the availability of child care facilities. Using Norwegian register data and a statistically defensible fixed-effects model, we find strong positive effects of day care availability on the transition to motherhood. PMID:17583309
Gopnik, Alison
2012-09-28
New theoretical ideas and empirical research show that very young children's learning and thinking are strikingly similar to much learning and thinking in science. Preschoolers test hypotheses against data and make causal inferences; they learn from statistics and informal experimentation, and from watching and listening to others. The mathematical framework of probabilistic models and Bayesian inference can describe this learning in precise ways. These discoveries have implications for early childhood education and policy. In particular, they suggest both that early childhood experience is extremely important and that the trend toward more structured and academic early childhood programs is misguided.
Two Empirical Models for Land-falling Hurricane Gust Factors
NASA Technical Reports Server (NTRS)
Merceret, Franics J.
2008-01-01
Gaussian and lognormal models for gust factors as a function of height and mean windspeed in land-falling hurricanes are presented. The models were empirically derived using data from 2004 hurricanes Frances and Jeanne and independently verified using data from 2005 hurricane Wilma. The data were collected from three wind towers at Kennedy Space Center and Cape Canaveral Air Force Station with instrumentation at multiple levels from 12 to 500 feet above ground level. An additional 200-foot tower was available for the verification. Mean wind speeds from 15 to 60 knots were included in the data. The models provide formulas for the mean and standard deviation of the gust factor given the mean windspeed and height above ground. These statistics may then be used to assess the probability of exceeding a specified peak wind threshold of operational significance given a specified mean wind speed.
Management of Library Pedagogical Development: From Models to Statistics.
ERIC Educational Resources Information Center
Herron, David
Pedagogical research and empirical studies show that student learning is encouraged by a number of general characteristics displayed by teachers (Ramsden, 1992). One of them is enthusiasm for the subject and for teaching it. User-education in research libraries is often highly repetitive and contact time with students is limited. So the question…
Outlier Detection in High-Stakes Certification Testing.
ERIC Educational Resources Information Center
Meijer, Rob R.
2002-01-01
Used empirical data from a certification test to study methods from statistical process control that have been proposed to classify an item score pattern as fitting or misfitting the underlying item response theory model in computerized adaptive testing. Results for 1,392 examinees show that different types of misfit can be distinguished. (SLD)
Long Term Change in Personal Income Distribution: Theoretical Approaches, Evidence and Explanations.
ERIC Educational Resources Information Center
Schultz, T. Paul
The paper discusses various models and theories of personal income distribution inequality. The first section presents the logic for adopting one conceptual and statistical approach in measuring and analyzing income inequality and the second presents empirical evidence on income inequality from 1939 to 1970. A brief survey of the human capital…
On Predictability of System Anomalies in Real World
2011-08-01
distributed system SETI @home [44]. Different from the above work, this work focuses on quantifying the predictability of real-world system anomalies. V...J.-M. Vincent, and D. Anderson, “Mining for statistical models of availability in large-scale distributed systems: An empirical study of seti @home,” in Proc. of MASCOTS, sept. 2009.
Modelling innovation performance of European regions using multi-output neural networks
Henriques, Roberto
2017-01-01
Regional innovation performance is an important indicator for decision-making regarding the implementation of policies intended to support innovation. However, patterns in regional innovation structures are becoming increasingly diverse, complex and nonlinear. To address these issues, this study aims to develop a model based on a multi-output neural network. Both intra- and inter-regional determinants of innovation performance are empirically investigated using data from the 4th and 5th Community Innovation Surveys of NUTS 2 (Nomenclature of Territorial Units for Statistics) regions. The results suggest that specific innovation strategies must be developed based on the current state of input attributes in the region. Thus, it is possible to develop appropriate strategies and targeted interventions to improve regional innovation performance. We demonstrate that support of entrepreneurship is an effective instrument of innovation policy. We also provide empirical support that both business and government R&D activity have a sigmoidal effect, implying that the most effective R&D support should be directed to regions with below-average and average R&D activity. We further show that the multi-output neural network outperforms traditional statistical and machine learning regression models. In general, therefore, it seems that the proposed model can effectively reflect both the multiple-output nature of innovation performance and the interdependency of the output attributes. PMID:28968449
Modelling innovation performance of European regions using multi-output neural networks.
Hajek, Petr; Henriques, Roberto
2017-01-01
Regional innovation performance is an important indicator for decision-making regarding the implementation of policies intended to support innovation. However, patterns in regional innovation structures are becoming increasingly diverse, complex and nonlinear. To address these issues, this study aims to develop a model based on a multi-output neural network. Both intra- and inter-regional determinants of innovation performance are empirically investigated using data from the 4th and 5th Community Innovation Surveys of NUTS 2 (Nomenclature of Territorial Units for Statistics) regions. The results suggest that specific innovation strategies must be developed based on the current state of input attributes in the region. Thus, it is possible to develop appropriate strategies and targeted interventions to improve regional innovation performance. We demonstrate that support of entrepreneurship is an effective instrument of innovation policy. We also provide empirical support that both business and government R&D activity have a sigmoidal effect, implying that the most effective R&D support should be directed to regions with below-average and average R&D activity. We further show that the multi-output neural network outperforms traditional statistical and machine learning regression models. In general, therefore, it seems that the proposed model can effectively reflect both the multiple-output nature of innovation performance and the interdependency of the output attributes.
Nuttall, David; Parkin, David; Devlin, Nancy
2015-01-01
This paper describes the development of a methodology for the case-mix adjustment of patient-reported outcome measures (PROMs) data permitting the comparison of outcomes between providers on a like-for-like basis. Statistical models that take account of provider-specific effects form the basis of the proposed case-mix adjustment methodology. Indirect standardisation provides a transparent means of case mix adjusting the PROMs data, which are updated on a monthly basis. Recently published PROMs data for patients undergoing unilateral knee replacement are used to estimate empirical models and to demonstrate the application of the proposed case-mix adjustment methodology in practice. The results are illustrative and are used to highlight a number of theoretical and empirical issues that warrant further exploration. For example, because of differences between PROMs instruments, case-mix adjustment methodologies may require instrument-specific approaches. A number of key assumptions are made in estimating the empirical models, which could be open to challenge. The covariates of post-operative health status could be expanded, and alternative econometric methods could be employed. © 2013 Crown copyright.
Tarasova, Irina A; Goloborodko, Anton A; Perlova, Tatyana Y; Pridatchenko, Marina L; Gorshkov, Alexander V; Evreinov, Victor V; Ivanov, Alexander R; Gorshkov, Mikhail V
2015-07-07
The theory of critical chromatography for biomacromolecules (BioLCCC) describes polypeptide retention in reversed-phase HPLC using the basic principles of statistical thermodynamics. However, whether this theory correctly depicts a variety of empirical observations and laws introduced for peptide chromatography over the last decades remains to be determined. In this study, by comparing theoretical results with experimental data, we demonstrate that the BioLCCC: (1) fits the empirical dependence of the polypeptide retention on the amino acid sequence length with R(2) > 0.99 and allows in silico determination of the linear regression coefficients of the log-length correction in the additive model for arbitrary sequences and lengths and (2) predicts the distribution coefficients of polypeptides with an accuracy from 0.98 to 0.99 R(2). The latter enables direct calculation of the retention factors for given solvent compositions and modeling of the migration dynamics of polypeptides separated under isocratic or gradient conditions. The obtained results demonstrate that the suggested theory correctly relates the main aspects of polypeptide separation in reversed-phase HPLC.
Expert knowledge as a foundation for the management of secretive species and their habitat
Drew, C. Ashton; Collazo, Jaime
2012-01-01
In this chapter, we share lessons learned during the elicitation and application of expert knowledge in the form of a belief network model for the habitat of a waterbird, the King Rail (Rallus elegans). A belief network is a statistical framework used to graphically represent and evaluate hypothesized cause and effect relationships among variables. Our model was a pilot project to explore the value of such a model as a tool to help the US Fish and Wildlife Service (USFWS) conserve species that lack sufficient empirical data to guide management decisions. Many factors limit the availability of empirical data that can support landscape-scale conservation planning. Globally, most species simply have not yet been subject to empirical study (Wilson 2000). Even for well-studied species, data are often restricted to specific geographic extents, to particular seasons, or to specific segments of a species’ life history. The USFWS mandates that the agency’s conservation actions (1) be coordinated across regional landscapes, (2) be founded on the best available science (with testable assumptions), and (3) support adaptive management through monitoring and assessment of action outcomes. Given limits on the available data, the concept of “best available science” in the context of conservation planning generally includes a mix of empirical data and expert knowledge (Sullivan et al. 2006).
Using a Statistical Approach to Anticipate Leaf Wetness Duration Under Climate Change in France
NASA Astrophysics Data System (ADS)
Huard, F.; Imig, A. F.; Perrin, P.
2014-12-01
Leaf wetness plays a major role in the development of fungal plant diseases. Leaf wetness duration (LWD) above a threshold value is determinant for infection and can be seen as a good indicator of impact of climate on infection occurrence and risk. As LWD is not widely measured, several methods, based on physics and empirical approach, have been developed to estimate it from weather data. Many LWD statistical models do exist, but the lack of standard for measurements require reassessments. A new empirical LWD model, called MEDHI (Modèle d'Estimation de la Durée d'Humectation à l'Inra) was developed for french configuration for wetness sensors (angle : 90°, height : 50 cm). This deployment is different from what is usually recommended from constructors or authors in other countries (angle from 10 to 60°, height from 10 to 150 cm…). MEDHI is a decision support system based on hourly climatic conditions at time steps n and n-1 taking account relative humidity, rainfall and previously simulated LWD. Air temperature, relative humidity, wind speed, rain and LWD data from several sensors with 2 configurations were measured during 6 months in Toulouse and Avignon (South West and South East of France) to calibrate MEDHI. A comparison of empirical models : NHRH (RH threshold), DPD (dew point depression), CART (classification and regression tree analysis dependant on RH, wind speed and dew point depression) and MEDHI, using meteorological and LWD measurements obtained during 5 months in Toulouse, showed that the development of this new model MEHDI was definitely better adapted to French conditions. In the context of climate change, MEDHI was used for mapping the evolution of leaf wetness duration in France from 1950 to 2100 with the French regional climate model ALADIN under different Representative Concentration Pathways (RCPs) and using a QM (Quantile-Mapping) statistical downscaling method. Results give information on the spatial distribution of infection risks during the current century. Such approach could be easily combined with thermal response curves of fungal infection for various pathogens.
Irvine, Kathryn M.; Miller, Scott; Al-Chokhachy, Robert K.; Archer, Erik; Roper, Brett B.; Kershner, Jeffrey L.
2015-01-01
Conceptual models are an integral facet of long-term monitoring programs. Proposed linkages between drivers, stressors, and ecological indicators are identified within the conceptual model of most mandated programs. We empirically evaluate a conceptual model developed for a regional aquatic and riparian monitoring program using causal models (i.e., Bayesian path analysis). We assess whether data gathered for regional status and trend estimation can also provide insights on why a stream may deviate from reference conditions. We target the hypothesized causal pathways for how anthropogenic drivers of road density, percent grazing, and percent forest within a catchment affect instream biological condition. We found instream temperature and fine sediments in arid sites and only fine sediments in mesic sites accounted for a significant portion of the maximum possible variation explainable in biological condition among managed sites. However, the biological significance of the direct effects of anthropogenic drivers on instream temperature and fine sediments were minimal or not detected. Consequently, there was weak to no biological support for causal pathways related to anthropogenic drivers’ impact on biological condition. With weak biological and statistical effect sizes, ignoring environmental contextual variables and covariates that explain natural heterogeneity would have resulted in no evidence of human impacts on biological integrity in some instances. For programs targeting the effects of anthropogenic activities, it is imperative to identify both land use practices and mechanisms that have led to degraded conditions (i.e., moving beyond simple status and trend estimation). Our empirical evaluation of the conceptual model underpinning the long-term monitoring program provided an opportunity for learning and, consequently, we discuss survey design elements that require modification to achieve question driven monitoring, a necessary step in the practice of adaptive monitoring. We suspect our situation is not unique and many programs may suffer from the same inferential disconnect. Commonly, the survey design is optimized for robust estimates of regional status and trend detection and not necessarily to provide statistical inferences on the causal mechanisms outlined in the conceptual model, even though these relationships are typically used to justify and promote the long-term monitoring of a chosen ecological indicator. Our application demonstrates a process for empirical evaluation of conceptual models and exemplifies the need for such interim assessments in order for programs to evolve and persist.
Modeling the atmospheric chemistry of TICs
NASA Astrophysics Data System (ADS)
Henley, Michael V.; Burns, Douglas S.; Chynwat, Veeradej; Moore, William; Plitz, Angela; Rottmann, Shawn; Hearn, John
2009-05-01
An atmospheric chemistry model that describes the behavior and disposition of environmentally hazardous compounds discharged into the atmosphere was coupled with the transport and diffusion model, SCIPUFF. The atmospheric chemistry model was developed by reducing a detailed atmospheric chemistry mechanism to a simple empirical effective degradation rate term (keff) that is a function of important meteorological parameters such as solar flux, temperature, and cloud cover. Empirically derived keff functions that describe the degradation of target toxic industrial chemicals (TICs) were derived by statistically analyzing data generated from the detailed chemistry mechanism run over a wide range of (typical) atmospheric conditions. To assess and identify areas to improve the developed atmospheric chemistry model, sensitivity and uncertainty analyses were performed to (1) quantify the sensitivity of the model output (TIC concentrations) with respect to changes in the input parameters and (2) improve, where necessary, the quality of the input data based on sensitivity results. The model predictions were evaluated against experimental data. Chamber data were used to remove the complexities of dispersion in the atmosphere.
Statistical Prediction of Sea Ice Concentration over Arctic
NASA Astrophysics Data System (ADS)
Kim, Jongho; Jeong, Jee-Hoon; Kim, Baek-Min
2017-04-01
In this study, a statistical method that predict sea ice concentration (SIC) over the Arctic is developed. We first calculate the Season-reliant Empirical Orthogonal Functions (S-EOFs) of monthly Arctic SIC from Nimbus-7 SMMR and DMSP SSM/I-SSMIS Passive Microwave Data, which contain the seasonal cycles (12 months long) of dominant SIC anomaly patterns. Then, the current SIC state index is determined by projecting observed SIC anomalies for latest 12 months to the S-EOFs. Assuming the current SIC anomalies follow the spatio-temporal evolution in the S-EOFs, we project the future (upto 12 months) SIC anomalies by multiplying the SI and the corresponding S-EOF and then taking summation. The predictive skill is assessed by hindcast experiments initialized at all the months for 1980-2010. When comparing predictive skill of SIC predicted by statistical model and NCEP CFS v2, the statistical model shows a higher skill in predicting sea ice concentration and extent.
Do neural nets learn statistical laws behind natural language?
Takahashi, Shuntaro; Tanaka-Ishii, Kumiko
2017-01-01
The performance of deep learning in natural language processing has been spectacular, but the reasons for this success remain unclear because of the inherent complexity of deep learning. This paper provides empirical evidence of its effectiveness and of a limitation of neural networks for language engineering. Precisely, we demonstrate that a neural language model based on long short-term memory (LSTM) effectively reproduces Zipf's law and Heaps' law, two representative statistical properties underlying natural language. We discuss the quality of reproducibility and the emergence of Zipf's law and Heaps' law as training progresses. We also point out that the neural language model has a limitation in reproducing long-range correlation, another statistical property of natural language. This understanding could provide a direction for improving the architectures of neural networks.
NASA Technical Reports Server (NTRS)
Falls, L. W.
1973-01-01
This document replaces Cape Kennedy empirical wind component statistics which are presently being used for aerospace engineering applications that require component wind probabilities for various flight azimuths and selected altitudes. The normal (Gaussian) distribution is presented as an adequate statistical model to represent component winds at Cape Kennedy. Head-, tail-, and crosswind components are tabulated for all flight azimuths for altitudes from 0 to 70 km by monthly reference periods. Wind components are given for 11 selected percentiles ranging from 0.135 percent to 99,865 percent for each month. Results of statistical goodness-of-fit tests are presented to verify the use of the Gaussian distribution as an adequate model to represent component winds at Cape Kennedy, Florida.
Do neural nets learn statistical laws behind natural language?
Takahashi, Shuntaro
2017-01-01
The performance of deep learning in natural language processing has been spectacular, but the reasons for this success remain unclear because of the inherent complexity of deep learning. This paper provides empirical evidence of its effectiveness and of a limitation of neural networks for language engineering. Precisely, we demonstrate that a neural language model based on long short-term memory (LSTM) effectively reproduces Zipf’s law and Heaps’ law, two representative statistical properties underlying natural language. We discuss the quality of reproducibility and the emergence of Zipf’s law and Heaps’ law as training progresses. We also point out that the neural language model has a limitation in reproducing long-range correlation, another statistical property of natural language. This understanding could provide a direction for improving the architectures of neural networks. PMID:29287076
Power-law rheology controls aftershock triggering and decay
Zhang, Xiaoming; Shcherbakov, Robert
2016-01-01
The occurrence of aftershocks is a signature of physical systems exhibiting relaxation phenomena. They are observed in various natural or experimental systems and usually obey several non-trivial empirical laws. Here we consider a cellular automaton realization of a nonlinear viscoelastic slider-block model in order to infer the physical mechanisms of triggering responsible for the occurrence of aftershocks. We show that nonlinear viscoelasticity plays a critical role in the occurrence of aftershocks. The model reproduces several empirical laws describing the statistics of aftershocks. In case of earthquakes, the proposed model suggests that the power-law rheology of the fault gauge, underlying lower crust, and upper mantle controls the decay rate of aftershocks. This is verified by analysing several prominent aftershock sequences for which the rheological properties of the underlying crust and upper mantle were established. PMID:27819355
Viallon, Vivian; Banerjee, Onureena; Jougla, Eric; Rey, Grégoire; Coste, Joel
2014-03-01
Looking for associations among multiple variables is a topical issue in statistics due to the increasing amount of data encountered in biology, medicine, and many other domains involving statistical applications. Graphical models have recently gained popularity for this purpose in the statistical literature. In the binary case, however, exact inference is generally very slow or even intractable because of the form of the so-called log-partition function. In this paper, we review various approximate methods for structure selection in binary graphical models that have recently been proposed in the literature and compare them through an extensive simulation study. We also propose a modification of one existing method, that is shown to achieve good performance and to be generally very fast. We conclude with an application in which we search for associations among causes of death recorded on French death certificates. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Sheehan, D V; Sheehan, K H
1982-08-01
The history of the classification of anxiety, hysterical, and hypochondriacal disorders is reviewed. Problems in the ability of current classification schemes to predict, control, and describe the relationship between the symptoms and other phenomena are outlined. Existing classification schemes failed the first test of a good classification model--that of providing categories that are mutually exclusive. The independence of these diagnostic categories from each other does not appear to hold up on empirical testing. In the absence of inherently mutually exclusive categories, further empirical investigation of these classes is obstructed since statistically valid analysis of the nominal data and any useful multivariate analysis would be difficult if not impossible. It is concluded that the existing classifications are unsatisfactory and require some fundamental reconceptualization.
Three Empirical Strategies for Teaching Statistics
ERIC Educational Resources Information Center
Marson, Stephen M.
2007-01-01
This paper employs a three-step process to analyze three empirically supported strategies for teaching statistics to BSW students. The strategies included: repetition, immediate feedback, and use of original data. First, each strategy is addressed through the literature. Second, the application of employing each of the strategies over the period…
NASA Astrophysics Data System (ADS)
Bushra, N.; Trepanier, J. C.; Rohli, R. V.
2017-12-01
High winds, torrential rain, and storm surges from tropical cyclones (TCs) cause massive destruction to property and cost the lives of many people. The coastline of the Bay of Bengal (BoB) ranks as one of the most susceptible to TC storm surges in the world due to low-lying elevation and a high frequency of occurrence. Bangladesh suffers the most due to its geographical setting and population density. Various models have been developed to predict storm surge in this region but none of them quantify statistical risk with empirical data. This study describes the relationship and dependency between empirical TC storm surge and peak reported wind speed at the BoB using a bivariate statistical copula and data from 1885-2011. An Archimedean, Gumbel copula with margins defined by the empirical distributions is specified as the most appropriate choice for the BoB. The model provides return periods for pairs of TC storm surge and peak wind along the BoB coastline. The BoB can expect a TC with peak reported winds of at least 24 m s-1 and surge heights of at least 4.0 m, on average, once every 3.2 years, with a quartile pointwise confidence interval of 2.7-3.8 years. In addition, the BoB can expect peak reported winds of 62 m s-1 and surge heights of at least 8.0 m, on average, once every 115.4 years, with a quartile pointwise confidence interval of 55.8-381.1 years. The purpose of the analysis is to increase the understanding of these dangerous TC characteristics to reduce fatalities and monetary losses into the future. Application of the copula will mitigate future threats of storm surge impacts on coastal communities of the BoB.
A data colocation grid framework for big data medical image processing: backend design
NASA Astrophysics Data System (ADS)
Bao, Shunxing; Huo, Yuankai; Parvathaneni, Prasanna; Plassard, Andrew J.; Bermudez, Camilo; Yao, Yuang; Lyu, Ilwoo; Gokhale, Aniruddha; Landman, Bennett A.
2018-03-01
When processing large medical imaging studies, adopting high performance grid computing resources rapidly becomes important. We recently presented a "medical image processing-as-a-service" grid framework that offers promise in utilizing the Apache Hadoop ecosystem and HBase for data colocation by moving computation close to medical image storage. However, the framework has not yet proven to be easy to use in a heterogeneous hardware environment. Furthermore, the system has not yet validated when considering variety of multi-level analysis in medical imaging. Our target design criteria are (1) improving the framework's performance in a heterogeneous cluster, (2) performing population based summary statistics on large datasets, and (3) introducing a table design scheme for rapid NoSQL query. In this paper, we present a heuristic backend interface application program interface (API) design for Hadoop and HBase for Medical Image Processing (HadoopBase-MIP). The API includes: Upload, Retrieve, Remove, Load balancer (for heterogeneous cluster) and MapReduce templates. A dataset summary statistic model is discussed and implemented by MapReduce paradigm. We introduce a HBase table scheme for fast data query to better utilize the MapReduce model. Briefly, 5153 T1 images were retrieved from a university secure, shared web database and used to empirically access an in-house grid with 224 heterogeneous CPU cores. Three empirical experiments results are presented and discussed: (1) load balancer wall-time improvement of 1.5-fold compared with a framework with built-in data allocation strategy, (2) a summary statistic model is empirically verified on grid framework and is compared with the cluster when deployed with a standard Sun Grid Engine (SGE), which reduces 8-fold of wall clock time and 14-fold of resource time, and (3) the proposed HBase table scheme improves MapReduce computation with 7 fold reduction of wall time compare with a naïve scheme when datasets are relative small. The source code and interfaces have been made publicly available.
A Data Colocation Grid Framework for Big Data Medical Image Processing: Backend Design.
Bao, Shunxing; Huo, Yuankai; Parvathaneni, Prasanna; Plassard, Andrew J; Bermudez, Camilo; Yao, Yuang; Lyu, Ilwoo; Gokhale, Aniruddha; Landman, Bennett A
2018-03-01
When processing large medical imaging studies, adopting high performance grid computing resources rapidly becomes important. We recently presented a "medical image processing-as-a-service" grid framework that offers promise in utilizing the Apache Hadoop ecosystem and HBase for data colocation by moving computation close to medical image storage. However, the framework has not yet proven to be easy to use in a heterogeneous hardware environment. Furthermore, the system has not yet validated when considering variety of multi-level analysis in medical imaging. Our target design criteria are (1) improving the framework's performance in a heterogeneous cluster, (2) performing population based summary statistics on large datasets, and (3) introducing a table design scheme for rapid NoSQL query. In this paper, we present a heuristic backend interface application program interface (API) design for Hadoop & HBase for Medical Image Processing (HadoopBase-MIP). The API includes: Upload, Retrieve, Remove, Load balancer (for heterogeneous cluster) and MapReduce templates. A dataset summary statistic model is discussed and implemented by MapReduce paradigm. We introduce a HBase table scheme for fast data query to better utilize the MapReduce model. Briefly, 5153 T1 images were retrieved from a university secure, shared web database and used to empirically access an in-house grid with 224 heterogeneous CPU cores. Three empirical experiments results are presented and discussed: (1) load balancer wall-time improvement of 1.5-fold compared with a framework with built-in data allocation strategy, (2) a summary statistic model is empirically verified on grid framework and is compared with the cluster when deployed with a standard Sun Grid Engine (SGE), which reduces 8-fold of wall clock time and 14-fold of resource time, and (3) the proposed HBase table scheme improves MapReduce computation with 7 fold reduction of wall time compare with a naïve scheme when datasets are relative small. The source code and interfaces have been made publicly available.
A Data Colocation Grid Framework for Big Data Medical Image Processing: Backend Design
Huo, Yuankai; Parvathaneni, Prasanna; Plassard, Andrew J.; Bermudez, Camilo; Yao, Yuang; Lyu, Ilwoo; Gokhale, Aniruddha; Landman, Bennett A.
2018-01-01
When processing large medical imaging studies, adopting high performance grid computing resources rapidly becomes important. We recently presented a "medical image processing-as-a-service" grid framework that offers promise in utilizing the Apache Hadoop ecosystem and HBase for data colocation by moving computation close to medical image storage. However, the framework has not yet proven to be easy to use in a heterogeneous hardware environment. Furthermore, the system has not yet validated when considering variety of multi-level analysis in medical imaging. Our target design criteria are (1) improving the framework’s performance in a heterogeneous cluster, (2) performing population based summary statistics on large datasets, and (3) introducing a table design scheme for rapid NoSQL query. In this paper, we present a heuristic backend interface application program interface (API) design for Hadoop & HBase for Medical Image Processing (HadoopBase-MIP). The API includes: Upload, Retrieve, Remove, Load balancer (for heterogeneous cluster) and MapReduce templates. A dataset summary statistic model is discussed and implemented by MapReduce paradigm. We introduce a HBase table scheme for fast data query to better utilize the MapReduce model. Briefly, 5153 T1 images were retrieved from a university secure, shared web database and used to empirically access an in-house grid with 224 heterogeneous CPU cores. Three empirical experiments results are presented and discussed: (1) load balancer wall-time improvement of 1.5-fold compared with a framework with built-in data allocation strategy, (2) a summary statistic model is empirically verified on grid framework and is compared with the cluster when deployed with a standard Sun Grid Engine (SGE), which reduces 8-fold of wall clock time and 14-fold of resource time, and (3) the proposed HBase table scheme improves MapReduce computation with 7 fold reduction of wall time compare with a naïve scheme when datasets are relative small. The source code and interfaces have been made publicly available. PMID:29887668
DOE Office of Scientific and Technical Information (OSTI.GOV)
Metoyer, Candace N.; Walsh, Stephen J.; Tardiff, Mark F.
2008-10-30
The detection and identification of weak gaseous plumes using thermal imaging data is complicated by many factors. These include variability due to atmosphere, ground and plume temperature, and background clutter. This paper presents an analysis of one formulation of the physics-based model that describes the at-sensor observed radiance. The motivating question for the analyses performed in this paper is as follows. Given a set of backgrounds, is there a way to predict the background over which the probability of detecting a given chemical will be the highest? Two statistics were developed to address this question. These statistics incorporate data frommore » the long-wave infrared band to predict the background over which chemical detectability will be the highest. These statistics can be computed prior to data collection. As a preliminary exploration into the predictive ability of these statistics, analyses were performed on synthetic hyperspectral images. Each image contained one chemical (either carbon tetrachloride or ammonia) spread across six distinct background types. The statistics were used to generate predictions for the background ranks. Then, the predicted ranks were compared to the empirical ranks obtained from the analyses of the synthetic images. For the simplified images under consideration, the predicted and empirical ranks showed a promising amount of agreement. One statistic accurately predicted the best and worst background for detection in all of the images. Future work may include explorations of more complicated plume ingredients, background types, and noise structures.« less
Empirical evidence for acceleration-dependent amplification factors
Borcherdt, R.D.
2002-01-01
Site-specific amplification factors, Fa and Fv, used in current U.S. building codes decrease with increasing base acceleration level as implied by the Loma Prieta earthquake at 0.1g and extrapolated using numerical models and laboratory results. The Northridge earthquake recordings of 17 January 1994 and subsequent geotechnical data permit empirical estimates of amplification at base acceleration levels up to 0.5g. Distance measures and normalization procedures used to infer amplification ratios from soil-rock pairs in predetermined azimuth-distance bins significantly influence the dependence of amplification estimates on base acceleration. Factors inferred using a hypocentral distance norm do not show a statistically significant dependence on base acceleration. Factors inferred using norms implied by the attenuation functions of Abrahamson and Silva show a statistically significant decrease with increasing base acceleration. The decrease is statistically more significant for stiff clay and sandy soil (site class D) sites than for stiffer sites underlain by gravely soils and soft rock (site class C). The decrease in amplification with increasing base acceleration is more pronounced for the short-period amplification factor, Fa, than for the midperiod factor, Fv.
Bayesian modelling of lung function data from multiple-breath washout tests.
Mahar, Robert K; Carlin, John B; Ranganathan, Sarath; Ponsonby, Anne-Louise; Vuillermin, Peter; Vukcevic, Damjan
2018-05-30
Paediatric respiratory researchers have widely adopted the multiple-breath washout (MBW) test because it allows assessment of lung function in unsedated infants and is well suited to longitudinal studies of lung development and disease. However, a substantial proportion of MBW tests in infants fail current acceptability criteria. We hypothesised that a model-based approach to analysing the data, in place of traditional simple empirical summaries, would enable more efficient use of these tests. We therefore developed a novel statistical model for infant MBW data and applied it to 1197 tests from 432 individuals from a large birth cohort study. We focus on Bayesian estimation of the lung clearance index, the most commonly used summary of lung function from MBW tests. Our results show that the model provides an excellent fit to the data and shed further light on statistical properties of the standard empirical approach. Furthermore, the modelling approach enables the lung clearance index to be estimated by using tests with different degrees of completeness, something not possible with the standard approach. Our model therefore allows previously unused data to be used rather than discarded, as well as routine use of shorter tests without significant loss of precision. Beyond our specific application, our work illustrates a number of important aspects of Bayesian modelling in practice, such as the importance of hierarchical specifications to account for repeated measurements and the value of model checking via posterior predictive distributions. Copyright © 2018 John Wiley & Sons, Ltd.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Plavchan, Peter; Bilinski, Christopher
The discovery of ''hot Jupiters'' very close to their parent stars confirmed that Jovian planets migrate inward via several potential mechanisms. We present empirical constraints on planet migration halting mechanisms. We compute model density functions of close-in exoplanets in the orbital semi-major axis-stellar mass plane to represent planet migration that is halted via several mechanisms, including the interior 1:2 resonance with the magnetospheric disk truncation radius, the interior 1:2 resonance with the dust sublimation radius, and several scenarios for tidal halting. The models differ in the predicted power-law dependence of the exoplanet orbital semi-major axis as a function of stellarmore » mass, and thus we also include a power-law model with the exponent as a free parameter. We use a Bayesian analysis to assess the model success in reproducing empirical distributions of confirmed exoplanets and Kepler candidates that orbit interior to 0.1 AU. Our results confirm a correlation of the halting distance with stellar mass. Tidal halting provides the best fit to the empirical distribution of confirmed Jovian exoplanets at a statistically robust level, consistent with the Kozai mechanism and the spin-orbit misalignment of a substantial fraction of hot Jupiters. We can rule out migration halting at the interior 1:2 resonances with the magnetospheric disk truncation radius and the interior 1:2 resonance with the dust disk sublimation radius, a uniform random distribution, and a distribution with no dependence on stellar mass. Note that our results do not rule out Type-II migration, but rather eliminate the role of a circumstellar disk in stopping exoplanet migration. For Kepler candidates, which have a more restricted range in stellar mass compared to confirmed planets, we are unable to discern between the tidal dissipation and magnetospheric disk truncation braking mechanisms at a statistically significant level. The power-law model favors exponents in the range of 0.38-0.9. This is larger than that predicted for tidal halting (0.23-0.33), which suggests that additional physics may be missing in the tidal halting theory.« less
Extracting and Converting Quantitative Data into Human Error Probabilities
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tuan Q. Tran; Ronald L. Boring; Jeffrey C. Joe
2007-08-01
This paper discusses a proposed method using a combination of advanced statistical approaches (e.g., meta-analysis, regression, structural equation modeling) that will not only convert different empirical results into a common metric for scaling individual PSFs effects, but will also examine the complex interrelationships among PSFs. Furthermore, the paper discusses how the derived statistical estimates (i.e., effect sizes) can be mapped onto a HRA method (e.g. SPAR-H) to generate HEPs that can then be use in probabilistic risk assessment (PRA). The paper concludes with a discussion of the benefits of using academic literature in assisting HRA analysts in generating sound HEPsmore » and HRA developers in validating current HRA models and formulating new HRA models.« less
Nosedal-Sanchez, Alvaro; Jackson, Charles S.; Huerta, Gabriel
2016-07-20
A new test statistic for climate model evaluation has been developed that potentially mitigates some of the limitations that exist for observing and representing field and space dependencies of climate phenomena. Traditionally such dependencies have been ignored when climate models have been evaluated against observational data, which makes it difficult to assess whether any given model is simulating observed climate for the right reasons. The new statistic uses Gaussian Markov random fields for estimating field and space dependencies within a first-order grid point neighborhood structure. We illustrate the ability of Gaussian Markov random fields to represent empirical estimates of fieldmore » and space covariances using "witch hat" graphs. We further use the new statistic to evaluate the tropical response of a climate model (CAM3.1) to changes in two parameters important to its representation of cloud and precipitation physics. Overall, the inclusion of dependency information did not alter significantly the recognition of those regions of parameter space that best approximated observations. However, there were some qualitative differences in the shape of the response surface that suggest how such a measure could affect estimates of model uncertainty.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nosedal-Sanchez, Alvaro; Jackson, Charles S.; Huerta, Gabriel
A new test statistic for climate model evaluation has been developed that potentially mitigates some of the limitations that exist for observing and representing field and space dependencies of climate phenomena. Traditionally such dependencies have been ignored when climate models have been evaluated against observational data, which makes it difficult to assess whether any given model is simulating observed climate for the right reasons. The new statistic uses Gaussian Markov random fields for estimating field and space dependencies within a first-order grid point neighborhood structure. We illustrate the ability of Gaussian Markov random fields to represent empirical estimates of fieldmore » and space covariances using "witch hat" graphs. We further use the new statistic to evaluate the tropical response of a climate model (CAM3.1) to changes in two parameters important to its representation of cloud and precipitation physics. Overall, the inclusion of dependency information did not alter significantly the recognition of those regions of parameter space that best approximated observations. However, there were some qualitative differences in the shape of the response surface that suggest how such a measure could affect estimates of model uncertainty.« less
NASA Astrophysics Data System (ADS)
Mejnertsen, L.; Eastwood, J. P.; Hietala, H.; Schwartz, S. J.; Chittenden, J. P.
2018-01-01
Empirical models of the Earth's bow shock are often used to place in situ measurements in context and to understand the global behavior of the foreshock/bow shock system. They are derived statistically from spacecraft bow shock crossings and typically treat the shock surface as a conic section parameterized according to a uniform solar wind ram pressure, although more complex models exist. Here a global magnetohydrodynamic simulation is used to analyze the variability of the Earth's bow shock under real solar wind conditions. The shape and location of the bow shock is found as a function of time, and this is used to calculate the shock velocity over the shock surface. The results are compared to existing empirical models. Good agreement is found in the variability of the subsolar shock location. However, empirical models fail to reproduce the two-dimensional shape of the shock in the simulation. This is because significant solar wind variability occurs on timescales less than the transit time of a single solar wind phase front over the curved shock surface. Empirical models must therefore be used with care when interpreting spacecraft data, especially when observations are made far from the Sun-Earth line. Further analysis reveals a bias to higher shock speeds when measured by virtual spacecraft. This is attributed to the fact that the spacecraft only observes the shock when it is in motion. This must be accounted for when studying bow shock motion and variability with spacecraft data.
NASA Technical Reports Server (NTRS)
Murphy, Kyle R.; Mann, Ian R.; Rae, I. Jonathan; Sibeck, David G.; Watt, Clare E. J.
2016-01-01
Wave-particle interactions play a crucial role in energetic particle dynamics in the Earths radiation belts. However, the relative importance of different wave modes in these dynamics is poorly understood. Typically, this is assessed during geomagnetic storms using statistically averaged empirical wave models as a function of geomagnetic activity in advanced radiation belt simulations. However, statistical averages poorly characterize extreme events such as geomagnetic storms in that storm-time ultralow frequency wave power is typically larger than that derived over a solar cycle and Kp is a poor proxy for storm-time wave power.
Sample and population exponents of generalized Taylor's law.
Giometto, Andrea; Formentin, Marco; Rinaldo, Andrea; Cohen, Joel E; Maritan, Amos
2015-06-23
Taylor's law (TL) states that the variance V of a nonnegative random variable is a power function of its mean M; i.e., V = aM(b). TL has been verified extensively in ecology, where it applies to population abundance, physics, and other natural sciences. Its ubiquitous empirical verification suggests a context-independent mechanism. Sample exponents b measured empirically via the scaling of sample mean and variance typically cluster around the value b = 2. Some theoretical models of population growth, however, predict a broad range of values for the population exponent b pertaining to the mean and variance of population density, depending on details of the growth process. Is the widely reported sample exponent b ≃ 2 the result of ecological processes or could it be a statistical artifact? Here, we apply large deviations theory and finite-sample arguments to show exactly that in a broad class of growth models the sample exponent is b ≃ 2 regardless of the underlying population exponent. We derive a generalized TL in terms of sample and population exponents b(jk) for the scaling of the kth vs. the jth cumulants. The sample exponent b(jk) depends predictably on the number of samples and for finite samples we obtain b(jk) ≃ k = j asymptotically in time, a prediction that we verify in two empirical examples. Thus, the sample exponent b ≃ 2 may indeed be a statistical artifact and not dependent on population dynamics under conditions that we specify exactly. Given the broad class of models investigated, our results apply to many fields where TL is used although inadequately understood.
A physical-based gas-surface interaction model for rarefied gas flow simulation
NASA Astrophysics Data System (ADS)
Liang, Tengfei; Li, Qi; Ye, Wenjing
2018-01-01
Empirical gas-surface interaction models, such as the Maxwell model and the Cercignani-Lampis model, are widely used as the boundary condition in rarefied gas flow simulations. The accuracy of these models in the prediction of macroscopic behavior of rarefied gas flows is less satisfactory in some cases especially the highly non-equilibrium ones. Molecular dynamics simulation can accurately resolve the gas-surface interaction process at atomic scale, and hence can predict accurate macroscopic behavior. They are however too computationally expensive to be applied in real problems. In this work, a statistical physical-based gas-surface interaction model, which complies with the basic relations of boundary condition, is developed based on the framework of the washboard model. In virtue of its physical basis, this new model is capable of capturing some important relations/trends for which the classic empirical models fail to model correctly. As such, the new model is much more accurate than the classic models, and in the meantime is more efficient than MD simulations. Therefore, it can serve as a more accurate and efficient boundary condition for rarefied gas flow simulations.
Balasopoulos, T; Charonis, A; Athanasakis, K; Kyriopoulos, J; Pavi, E
2017-03-01
Since 2010, the memoranda of understanding were implemented in Greece as a measure of fiscal adjustment. Public pharmaceutical expenditure was one of the main focuses of this implementation. Numerous policies, targeted on pharma spending, reduced the pharmaceutical budget by 60.5%. Yet, generics' penetration in Greece remained among the lowest among OECD countries. This study aims to highlight the factors that affect the perceptions of the population on generic drugs and to suggest effective policy measures. The empirical analysis is based on a national cross-sectional survey that was conducted through a sample of 2003 individuals, representative of the general population. Two ordinal logistic regression models were constructed in order to identify the determinants that affect the respondents' beliefs on the safety and the effectiveness of generic drugs. The empirical findings presented a positive and statistically significant correlation with income, bill payment difficulties, safety and effectiveness of drugs, prescription and dispensing preferences and the views toward pharmaceutical companies. Also, age and trust toward medical community have a positive and statistically significant correlation with the perception on the safety of generic drugs. Policy interventions are suggested on the bases of the empirical results on 3 major categories; (a) information campaigns, (b) incentives to doctors and pharmacists and (c) to strengthen the bioequivalence control framework and the dissemination of results. Copyright © 2017 Elsevier B.V. All rights reserved.
On the Determination of Poisson Statistics for Haystack Radar Observations of Orbital Debris
NASA Technical Reports Server (NTRS)
Stokely, Christopher L.; Benbrook, James R.; Horstman, Matt
2007-01-01
A convenient and powerful method is used to determine if radar detections of orbital debris are observed according to Poisson statistics. This is done by analyzing the time interval between detection events. For Poisson statistics, the probability distribution of the time interval between events is shown to be an exponential distribution. This distribution is a special case of the Erlang distribution that is used in estimating traffic loads on telecommunication networks. Poisson statistics form the basis of many orbital debris models but the statistical basis of these models has not been clearly demonstrated empirically until now. Interestingly, during the fiscal year 2003 observations with the Haystack radar in a fixed staring mode, there are no statistically significant deviations observed from that expected with Poisson statistics, either independent or dependent of altitude or inclination. One would potentially expect some significant clustering of events in time as a result of satellite breakups, but the presence of Poisson statistics indicates that such debris disperse rapidly with respect to Haystack's very narrow radar beam. An exception to Poisson statistics is observed in the months following the intentional breakup of the Fengyun satellite in January 2007.
Xu, Maoqi; Chen, Liang
2018-01-01
The individual sample heterogeneity is one of the biggest obstacles in biomarker identification for complex diseases such as cancers. Current statistical models to identify differentially expressed genes between disease and control groups often overlook the substantial human sample heterogeneity. Meanwhile, traditional nonparametric tests lose detailed data information and sacrifice the analysis power, although they are distribution free and robust to heterogeneity. Here, we propose an empirical likelihood ratio test with a mean-variance relationship constraint (ELTSeq) for the differential expression analysis of RNA sequencing (RNA-seq). As a distribution-free nonparametric model, ELTSeq handles individual heterogeneity by estimating an empirical probability for each observation without making any assumption about read-count distribution. It also incorporates a constraint for the read-count overdispersion, which is widely observed in RNA-seq data. ELTSeq demonstrates a significant improvement over existing methods such as edgeR, DESeq, t-tests, Wilcoxon tests and the classic empirical likelihood-ratio test when handling heterogeneous groups. It will significantly advance the transcriptomics studies of cancers and other complex disease. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Reframing Serial Murder Within Empirical Research.
Gurian, Elizabeth A
2017-04-01
Empirical research on serial murder is limited due to the lack of consensus on a definition, the continued use of primarily descriptive statistics, and linkage to popular culture depictions. These limitations also inhibit our understanding of these offenders and affect credibility in the field of research. Therefore, this comprehensive overview of a sample of 508 cases (738 total offenders, including partnered groups of two or more offenders) provides analyses of solo male, solo female, and partnered serial killers to elucidate statistical differences and similarities in offending and adjudication patterns among the three groups. This analysis of serial homicide offenders not only supports previous research on offending patterns present in the serial homicide literature but also reveals that empirically based analyses can enhance our understanding beyond traditional case studies and descriptive statistics. Further research based on these empirical analyses can aid in the development of more accurate classifications and definitions of serial murderers.
Volatility behavior of visibility graph EMD financial time series from Ising interacting system
NASA Astrophysics Data System (ADS)
Zhang, Bo; Wang, Jun; Fang, Wen
2015-08-01
A financial market dynamics model is developed and investigated by stochastic Ising system, where the Ising model is the most popular ferromagnetic model in statistical physics systems. Applying two graph based analysis and multiscale entropy method, we investigate and compare the statistical volatility behavior of return time series and the corresponding IMF series derived from the empirical mode decomposition (EMD) method. And the real stock market indices are considered to be comparatively studied with the simulation data of the proposed model. Further, we find that the degree distribution of visibility graph for the simulation series has the power law tails, and the assortative network exhibits the mixing pattern property. All these features are in agreement with the real market data, the research confirms that the financial model established by the Ising system is reasonable.
The Effect of Graduate Education on the Performance of Air Force Officers
2007-03-01
models are estimated to examine the effects of graduate education on the retention of Captains and Majors and on promotion to Major using data from... education on the retention of Captains and Majors and on promotion to Major using data from the Active Duty Military Master File for fiscal years 1992...empirical methodology utilized. Chapter V discusses descriptive statistics and model results and their relevance to the effects of graduate education
Detection of Person Misfit in Computerized Adaptive Tests with Polytomous Items.
ERIC Educational Resources Information Center
van Krimpen-Stoop, Edith M. L. A.; Meijer, Rob R.
2002-01-01
Compared the nominal and empirical null distributions of the standardized log-likelihood statistic for polytomous items for paper-and-pencil (P&P) and computerized adaptive tests (CATs). Results show that the empirical distribution of the statistic differed from the assumed standard normal distribution for both P&P tests and CATs. Also…
Renault, Nisa K E; Pritchett, Sonja M; Howell, Robin E; Greer, Wenda L; Sapienza, Carmen; Ørstavik, Karen Helene; Hamilton, David C
2013-01-01
In eutherian mammals, one X-chromosome in every XX somatic cell is transcriptionally silenced through the process of X-chromosome inactivation (XCI). Females are thus functional mosaics, where some cells express genes from the paternal X, and the others from the maternal X. The relative abundance of the two cell populations (X-inactivation pattern, XIP) can have significant medical implications for some females. In mice, the ‘choice' of which X to inactivate, maternal or paternal, in each cell of the early embryo is genetically influenced. In humans, the timing of XCI choice and whether choice occurs completely randomly or under a genetic influence is debated. Here, we explore these questions by analysing the distribution of XIPs in large populations of normal females. Models were generated to predict XIP distributions resulting from completely random or genetically influenced choice. Each model describes the discrete primary distribution at the onset of XCI, and the continuous secondary distribution accounting for changes to the XIP as a result of development and ageing. Statistical methods are used to compare models with empirical data from Danish and Utah populations. A rigorous data treatment strategy maximises information content and allows for unbiased use of unphased XIP data. The Anderson–Darling goodness-of-fit statistics and likelihood ratio tests indicate that a model of genetically influenced XCI choice better fits the empirical data than models of completely random choice. PMID:23652377
Sanjak, Jaleal S.; Long, Anthony D.; Thornton, Kevin R.
2017-01-01
The genetic component of complex disease risk in humans remains largely unexplained. A corollary is that the allelic spectrum of genetic variants contributing to complex disease risk is unknown. Theoretical models that relate population genetic processes to the maintenance of genetic variation for quantitative traits may suggest profitable avenues for future experimental design. Here we use forward simulation to model a genomic region evolving under a balance between recurrent deleterious mutation and Gaussian stabilizing selection. We consider multiple genetic and demographic models, and several different methods for identifying genomic regions harboring variants associated with complex disease risk. We demonstrate that the model of gene action, relating genotype to phenotype, has a qualitative effect on several relevant aspects of the population genetic architecture of a complex trait. In particular, the genetic model impacts genetic variance component partitioning across the allele frequency spectrum and the power of statistical tests. Models with partial recessivity closely match the minor allele frequency distribution of significant hits from empirical genome-wide association studies without requiring homozygous effect sizes to be small. We highlight a particular gene-based model of incomplete recessivity that is appealing from first principles. Under that model, deleterious mutations in a genomic region partially fail to complement one another. This model of gene-based recessivity predicts the empirically observed inconsistency between twin and SNP based estimated of dominance heritability. Furthermore, this model predicts considerable levels of unexplained variance associated with intralocus epistasis. Our results suggest a need for improved statistical tools for region based genetic association and heritability estimation. PMID:28103232
Empirical models of wind conditions on Upper Klamath Lake, Oregon
Buccola, Norman L.; Wood, Tamara M.
2010-01-01
Upper Klamath Lake is a large (230 square kilometers), shallow (mean depth 2.8 meters at full pool) lake in southern Oregon. Lake circulation patterns are driven largely by wind, and the resulting currents affect the water quality and ecology of the lake. To support hydrodynamic modeling of the lake and statistical investigations of the relation between wind and lake water-quality measurements, the U.S. Geological Survey has monitored wind conditions along the lakeshore and at floating raft sites in the middle of the lake since 2005. In order to make the existing wind archive more useful, this report summarizes the development of empirical wind models that serve two purposes: (1) to fill short (on the order of hours or days) wind data gaps at raft sites in the middle of the lake, and (2) to reconstruct, on a daily basis, over periods of months to years, historical wind conditions at U.S. Geological Survey sites prior to 2005. Empirical wind models based on Artificial Neural Network (ANN) and Multivariate-Adaptive Regressive Splines (MARS) algorithms were compared. ANNs were better suited to simulating the 10-minute wind data that are the dependent variables of the gap-filling models, but the simpler MARS algorithm may be adequate to accurately simulate the daily wind data that are the dependent variables of the historical wind models. To further test the accuracy of the gap-filling models, the resulting simulated winds were used to force the hydrodynamic model of the lake, and the resulting simulated currents were compared to measurements from an acoustic Doppler current profiler. The error statistics indicated that the simulation of currents was degraded as compared to when the model was forced with observed winds, but probably is adequate for short gaps in the data of a few days or less. Transport seems to be less affected by the use of the simulated winds in place of observed winds. The simulated tracer concentration was similar between model results when simulated winds were used to force the model, and when observed winds were used to force the model, and differences between the two results did not accumulate over time.
ERIC Educational Resources Information Center
Goodboy, Alan K.
2017-01-01
For decades, instructional communication scholars have relied predominantly on cross-sectional survey methods to generate empirical associations between effective teaching and student learning. These studies typically correlate students' perceptions of their instructor's teaching behaviors with subjective self-report assessments of their own…
Three Insights from a Bayesian Interpretation of the One-Sided "P" Value
ERIC Educational Resources Information Center
Marsman, Maarten; Wagenmakers, Eric-Jan
2017-01-01
P values have been critiqued on several grounds but remain entrenched as the dominant inferential method in the empirical sciences. In this article, we elaborate on the fact that in many statistical models, the one-sided "P" value has a direct Bayesian interpretation as the approximate posterior mass for values lower than zero. The…
Use of dichotomous choice nonmarket methods to value the whooping crane resource
J. Michael Bowker; John R. Stoll
1985-01-01
A dichotomous choice form of contingent valuation is applied to quantify individuals' economic surplus associated with preservation of the whooping crane resource. Specific issues and limitations of the empirical approach are discussed. The results of this case study reveal that models with similar statistical fits can lead to very disparate measures of economic...
The Development of Rhythm at the Age of 6-11 Years: Non-Pitch Rhythmic Improvisation
ERIC Educational Resources Information Center
Paananen, Pirkko
2006-01-01
In the statistical and transcriptional analyses reported in this exploratory study, original rhythms of 6-11-year-old children (N=36) were examined. The hypotheses were based on a new model of musical development, and tested empirically using non-pitch rhythmic improvisation in a MIDI-environment. Several representational types were found in…
The Effect of Automobile Safety on Vehicle Type Choice: An Empirical Study.
ERIC Educational Resources Information Center
McCarthy, Patrick S.
An analysis was made of the extent to which the safety characteristics of new vehicles affect consumer purchase decisions. Using an extensive data set that combines vehicle data collected by the Automobile Club of Southern California Target Car Program with the responses from a national household survey of new car buyers, a statistical model of…
2013-06-01
or indicators are used as long range memory measurements. Hurst and Holder exponents are the most important and popular parameters. Traditionally...the relation between two important parameters, the Hurst exponent (measurement of global long range memory) and the Entropy (measurement of...empirical results and future study. II. BACKGROUND We recall briey the mathematical and statistical definitions and properties of the Hurst exponents
Weakly anomalous diffusion with non-Gaussian propagators
NASA Astrophysics Data System (ADS)
Cressoni, J. C.; Viswanathan, G. M.; Ferreira, A. S.; da Silva, M. A. A.
2012-08-01
A poorly understood phenomenon seen in complex systems is diffusion characterized by Hurst exponent H≈1/2 but with non-Gaussian statistics. Motivated by such empirical findings, we report an exact analytical solution for a non-Markovian random walk model that gives rise to weakly anomalous diffusion with H=1/2 but with a non-Gaussian propagator.
Rain Rate Statistics in Southern New Mexico
NASA Technical Reports Server (NTRS)
Paulic, Frank J., Jr.; Horan, Stephen
1997-01-01
The methodology used in determining empirical rain-rate distributions for Southern New Mexico in the vicinity of White Sands APT site is discussed. The hardware and the software developed to extract rain rate from the rain accumulation data collected at White Sands APT site are described. The accuracy of Crane's Global Model for rain rate predictions is analyzed.
Firm productivity, pollution, and output: theory and empirical evidence from China.
Tang, Erzi; Zhang, Jingjing; Haider, Zulfiqar
2015-11-01
Using a theoretical model, this paper argues that as firm productivity increases, there is a decrease in firm-level pollution intensity. However, as productivity increases, firms tend to increase their aggregate output, which requires the use of additional resources that increase pollution. Hence, an increase in productivity results in two opposing effects where increased productivity may in fact increase pollution created by a firm. We describe the joint effect of these two mechanisms on pollution emissions as the "productivity dilemma" of pollution emission. Based on firm-level data from China, we also empirically test this productivity dilemma hypothesis. Our empirical results suggest that, in general, firm productivity has a positive and statistically significant impact on pollution emission in China. However, the impact of productivity on pollution becomes negative when we control for increases in firm output. The empirical evidence also confirms the positive influence of productivity on output, which suggests that the main determinant of pollution is the firm's output. The empirical results provide evidence of the existence of, what we describe as, the productivity dilemma of pollution emission.
[Rank distributions in community ecology from the statistical viewpoint].
Maksimov, V N
2004-01-01
Traditional statistical methods for definition of empirical functions of abundance distribution (population, biomass, production, etc.) of species in a community are applicable for processing of multivariate data contained in the above quantitative indices of the communities. In particular, evaluation of moments of distribution suffices for convolution of the data contained in a list of species and their abundance. At the same time, the species should be ranked in the list in ascending rather than descending population and the distribution models should be analyzed on the basis of the data on abundant species only.
Empirical findings on socioeconomic determinants of fertility differentials in Costa Rica.
Carvajal, M J; Geithman, D T
1986-01-01
"This paper seeks to (1) identify socioeconomic variables that are expected to generate fertility differentials; (2) hypothesize the direction and magnitude of the effect of each variable by reference to a demand-for-children model; and (3) test empirically the model using evidence from Costa Rica. The estimates are obtained from a ten-percent systematic random sample of all Costa Rican individual-family households. There are 15,924 families in the sample...." The authors specifically seek "to capture the effects of changing relative prices and available income and time constraints on parental preferences for children. Least-squares estimates show statistically significant relationships between household fertility and opportunity cost of time, parental education, occurrence of an extended family, medical care, household sanitation, economic sector of employment, and household stock of nonhuman capital." excerpt
An empirical model to forecast solar wind velocity through statistical modeling
NASA Astrophysics Data System (ADS)
Gao, Y.; Ridley, A. J.
2013-12-01
The accurate prediction of the solar wind velocity has been a major challenge in the space weather community. Previous studies proposed many empirical and semi-empirical models to forecast the solar wind velocity based on either the historical observations, e.g. the persistence model, or the instantaneous observations of the sun, e.g. the Wang-Sheeley-Arge model. In this study, we use the one-minute WIND data from January 1995 to August 2012 to investigate and compare the performances of 4 models often used in literature, here referred to as the null model, the persistence model, the one-solar-rotation-ago model, and the Wang-Sheeley-Arge model. It is found that, measured by root mean square error, the persistence model gives the most accurate predictions within two days. Beyond two days, the Wang-Sheeley-Arge model serves as the best model, though it only slightly outperforms the null model and the one-solar-rotation-ago model. Finally, we apply the least-square regression to linearly combine the null model, the persistence model, and the one-solar-rotation-ago model to propose a 'general persistence model'. By comparing its performance against the 4 aforementioned models, it is found that the accuracy of the general persistence model outperforms the other 4 models within five days. Due to its great simplicity and superb performance, we believe that the general persistence model can serve as a benchmark in the forecast of solar wind velocity and has the potential to be modified to arrive at better models.
Puch-Solis, Roberto; Clayton, Tim
2014-07-01
The high sensitivity of the technology for producing profiles means that it has become routine to produce profiles from relatively small quantities of DNA. The profiles obtained from low template DNA (LTDNA) are affected by several phenomena which must be taken into consideration when interpreting and evaluating this evidence. Furthermore, many of the same phenomena affect profiles from higher amounts of DNA (e.g. where complex mixtures has been revealed). In this article we present a statistical model, which forms the basis of software DNA LiRa, and that is able to calculate likelihood ratios where one to four donors are postulated and for any number of replicates. The model can take into account dropin and allelic dropout for different contributors, template degradation and uncertain allele designations. In this statistical model unknown parameters are treated following the Empirical Bayesian paradigm. The performance of LiRa is tested using examples and the outputs are compared with those generated using two other statistical software packages likeLTD and LRmix. The concept of ban efficiency is introduced as a measure for assessing model sensitivity. Copyright © 2014. Published by Elsevier Ireland Ltd.
Getting the big picture in community science: methods that capture context.
Luke, Douglas A
2005-06-01
Community science has a rich tradition of using theories and research designs that are consistent with its core value of contextualism. However, a survey of empirical articles published in the American Journal of Community Psychology shows that community scientists utilize a narrow range of statistical tools that are not well suited to assess contextual data. Multilevel modeling, geographic information systems (GIS), social network analysis, and cluster analysis are recommended as useful tools to address contextual questions in community science. An argument for increased methodological consilience is presented, where community scientists are encouraged to adopt statistical methodology that is capable of modeling a greater proportion of the data than is typical with traditional methods.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Poyer, D.A.
In this report, tests of statistical significance of five sets of variables with household energy consumption (at the point of end-use) are described. Five models, in sequence, were empirically estimated and tested for statistical significance by using the Residential Energy Consumption Survey of the US Department of Energy, Energy Information Administration. Each model incorporated additional information, embodied in a set of variables not previously specified in the energy demand system. The variable sets were generally labeled as economic variables, weather variables, household-structure variables, end-use variables, and housing-type variables. The tests of statistical significance showed each of the variable sets tomore » be highly significant in explaining the overall variance in energy consumption. The findings imply that the contemporaneous interaction of different types of variables, and not just one exclusive set of variables, determines the level of household energy consumption.« less
Armour, Cherie; O'Connor, Maja; Elklit, Ask; Elhai, Jon D
2013-10-01
The three-factor structure of posttraumatic stress disorder (PTSD) specified by the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, is not supported in the empirical literature. Two alternative four-factor models have received a wealth of empirical support. However, a consensus regarding which is superior has not been reached. A recent five-factor model has been shown to provide superior fit over the existing four-factor models. The present study investigated the fit of the five-factor model against the existing four-factor models and assessed the resultant factors' association with depression in a bereaved European trauma sample (N = 325). The participants were assessed for PTSD via the Harvard Trauma Questionnaire and depression via the Beck Depression Inventory. The five-factor model provided superior fit to the data compared with the existing four-factor models. In the dysphoric arousal model, depression was equally related to both dysphoric arousal and emotional numbing, whereas depression was more related to dysphoric arousal than to anxious arousal.
Specification of ISS Plasma Environment Variability
NASA Technical Reports Server (NTRS)
Minow, Joseph I.; Neergaard, Linda F.; Bui, Them H.; Mikatarian, Ronald R.; Barsamian, H.; Koontz, Steven L.
2004-01-01
Quantifying spacecraft charging risks and associated hazards for the International Space Station (ISS) requires a plasma environment specification for the natural variability of ionospheric temperature (Te) and density (Ne). Empirical ionospheric specification and forecast models such as the International Reference Ionosphere (IRI) model typically only provide long term (seasonal) mean Te and Ne values for the low Earth orbit environment. This paper describes a statistical analysis of historical ionospheric low Earth orbit plasma measurements from the AE-C, AE-D, and DE-2 satellites used to derive a model of deviations of observed data values from IRI-2001 estimates of Ne, Te parameters for each data point to provide a statistical basis for modeling the deviations of the plasma environment from the IRI model output. Application of the deviation model with the IRI-2001 output yields a method for estimating extreme environments for the ISS spacecraft charging analysis.
NASA Astrophysics Data System (ADS)
Krawiecki, A.
A multi-agent spin model for changes of prices in the stock market based on the Ising-like cellular automaton with interactions between traders randomly varying in time is investigated by means of Monte Carlo simulations. The structure of interactions has topology of a small-world network obtained from regular two-dimensional square lattices with various coordination numbers by randomly cutting and rewiring edges. Simulations of the model on regular lattices do not yield time series of logarithmic price returns with statistical properties comparable with the empirical ones. In contrast, in the case of networks with a certain degree of randomness for a wide range of parameters the time series of the logarithmic price returns exhibit intermittent bursting typical of volatility clustering. Also the tails of distributions of returns obey a power scaling law with exponents comparable to those obtained from the empirical data.
Empirical flow parameters : a tool for hydraulic model validity
Asquith, William H.; Burley, Thomas E.; Cleveland, Theodore G.
2013-01-01
The objectives of this project were (1) To determine and present from existing data in Texas, relations between observed stream flow, topographic slope, mean section velocity, and other hydraulic factors, to produce charts such as Figure 1 and to produce empirical distributions of the various flow parameters to provide a methodology to "check if model results are way off!"; (2) To produce a statistical regional tool to estimate mean velocity or other selected parameters for storm flows or other conditional discharges at ungauged locations (most bridge crossings) in Texas to provide a secondary way to compare such values to a conventional hydraulic modeling approach. (3.) To present ancillary values such as Froude number, stream power, Rosgen channel classification, sinuosity, and other selected characteristics (readily determinable from existing data) to provide additional information to engineers concerned with the hydraulic-soil-foundation component of transportation infrastructure.
Krueger, Robert F.; Markon, Kristian E.
2008-01-01
Research on psychopathology is at a historical crossroads. New technologies offer the promise of lasting advances in our understanding of the causes of human psychological suffering. Making the best use of these technologies, however, requires an empirically accurate model of psychopathology. Much current research is framed by the model of psychopathology portrayed in current versions of the Diagnostic and Statistical Manual of Mental Disorders (DSM; American Psychiatric Association, 2000). Although the modern DSMs have been fundamental in advancing psychopathology research, recent research also challenges some assumptions made in the DSM—for example, the assumption that all forms of psychopathology are well conceived of as discrete categories. Psychological science has a critical role to play in working through the implications of this research and the challenges it presents. In particular, behavior-genetic, personality, and quantitative-psychological research perspectives can be melded to inform the development of an empirically based model of psychopathology that would constitute an evolution of the DSM. PMID:18392116
Auxiliary Parameter MCMC for Exponential Random Graph Models
NASA Astrophysics Data System (ADS)
Byshkin, Maksym; Stivala, Alex; Mira, Antonietta; Krause, Rolf; Robins, Garry; Lomi, Alessandro
2016-11-01
Exponential random graph models (ERGMs) are a well-established family of statistical models for analyzing social networks. Computational complexity has so far limited the appeal of ERGMs for the analysis of large social networks. Efficient computational methods are highly desirable in order to extend the empirical scope of ERGMs. In this paper we report results of a research project on the development of snowball sampling methods for ERGMs. We propose an auxiliary parameter Markov chain Monte Carlo (MCMC) algorithm for sampling from the relevant probability distributions. The method is designed to decrease the number of allowed network states without worsening the mixing of the Markov chains, and suggests a new approach for the developments of MCMC samplers for ERGMs. We demonstrate the method on both simulated and actual (empirical) network data and show that it reduces CPU time for parameter estimation by an order of magnitude compared to current MCMC methods.
On the galaxy-halo connection in the EAGLE simulation
NASA Astrophysics Data System (ADS)
Desmond, Harry; Mao, Yao-Yuan; Wechsler, Risa H.; Crain, Robert A.; Schaye, Joop
2017-10-01
Empirical models of galaxy formation require assumptions about the correlations between galaxy and halo properties. These may be calibrated against observations or inferred from physical models such as hydrodynamical simulations. In this Letter, we use the EAGLE simulation to investigate the correlation of galaxy size with halo properties. We motivate this analysis by noting that the common assumption of angular momentum partition between baryons and dark matter in rotationally supported galaxies overpredicts both the spread in the stellar mass-size relation and the anticorrelation of size and velocity residuals, indicating a problem with the galaxy-halo connection it implies. We find the EAGLE galaxy population to perform significantly better on both statistics, and trace this success to the weakness of the correlations of galaxy size with halo mass, concentration and spin at fixed stellar mass. Using these correlations in empirical models will enable fine-grained aspects of galaxy scalings to be matched.
2012-01-01
Background When outcomes are binary, the c-statistic (equivalent to the area under the Receiver Operating Characteristic curve) is a standard measure of the predictive accuracy of a logistic regression model. Methods An analytical expression was derived under the assumption that a continuous explanatory variable follows a normal distribution in those with and without the condition. We then conducted an extensive set of Monte Carlo simulations to examine whether the expressions derived under the assumption of binormality allowed for accurate prediction of the empirical c-statistic when the explanatory variable followed a normal distribution in the combined sample of those with and without the condition. We also examine the accuracy of the predicted c-statistic when the explanatory variable followed a gamma, log-normal or uniform distribution in combined sample of those with and without the condition. Results Under the assumption of binormality with equality of variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the product of the standard deviation of the normal components (reflecting more heterogeneity) and the log-odds ratio (reflecting larger effects). Under the assumption of binormality with unequal variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the standardized difference of the explanatory variable in those with and without the condition. In our Monte Carlo simulations, we found that these expressions allowed for reasonably accurate prediction of the empirical c-statistic when the distribution of the explanatory variable was normal, gamma, log-normal, and uniform in the entire sample of those with and without the condition. Conclusions The discriminative ability of a continuous explanatory variable cannot be judged by its odds ratio alone, but always needs to be considered in relation to the heterogeneity of the population. PMID:22716998
Human dynamics scaling characteristics for aerial inbound logistics operation
NASA Astrophysics Data System (ADS)
Wang, Qing; Guo, Jin-Li
2010-05-01
In recent years, the study of power-law scaling characteristics of real-life networks has attracted much interest from scholars; it deviates from the Poisson process. In this paper, we take the whole process of aerial inbound operation in a logistics company as the empirical object. The main aim of this work is to study the statistical scaling characteristics of the task-restricted work patterns. We found that the statistical variables have the scaling characteristics of unimodal distribution with a power-law tail in five statistical distributions - that is to say, there obviously exists a peak in each distribution, the shape of the left part closes to a Poisson distribution, and the right part has a heavy-tailed scaling statistics. Furthermore, to our surprise, there is only one distribution where the right parts can be approximated by the power-law form with exponent α=1.50. Others are bigger than 1.50 (three of four are about 2.50, one of four is about 3.00). We then obtain two inferences based on these empirical results: first, the human behaviors probably both close to the Poisson statistics and power-law distributions on certain levels, and the human-computer interaction behaviors may be the most common in the logistics operational areas, even in the whole task-restricted work pattern areas. Second, the hypothesis in Vázquez et al. (2006) [A. Vázquez, J. G. Oliveira, Z. Dezsö, K.-I. Goh, I. Kondor, A.-L. Barabási. Modeling burst and heavy tails in human dynamics, Phys. Rev. E 73 (2006) 036127] is probably not sufficient; it claimed that human dynamics can be classified as two discrete university classes. There may be a new human dynamics mechanism that is different from the classical Barabási models.
Styron, J D; Cooper, G W; Ruiz, C L; Hahn, K D; Chandler, G A; Nelson, A J; Torres, J A; McWatters, B R; Carpenter, Ken; Bonura, M A
2014-11-01
A methodology for obtaining empirical curves relating absolute measured scintillation light output to beta energy deposited is presented. Output signals were measured from thin plastic scintillator using NIST traceable beta and gamma sources and MCNP5 was used to model the energy deposition from each source. Combining the experimental and calculated results gives the desired empirical relationships. To validate, the sensitivity of a beryllium/scintillator-layer neutron activation detector was predicted and then exposed to a known neutron fluence from a Deuterium-Deuterium fusion plasma (DD). The predicted and the measured sensitivity were in statistical agreement.
Empirical equation for predicting the surface tension of some liquid metals at their melting point
NASA Astrophysics Data System (ADS)
Ceotto, D.
2014-07-01
A new empirical equation is proposed for predicting the surface tension of some pure metals at their melting point. The investigation has been conducted adopting a statistical approach using some of the most accredited data available in literature. It is found that for Ag, Al, Au, Co, Cu, Fe, Ni, and Pb the surface tension can be conveniently expressed in function of the latent heat of fusion and of the geometrical parameters of an ideal liquid spherical drop. The equation proposed has been compared also with the model proposed by Lu and Jiang giving satisfactory agreement for the metals considered.
Garcia, Ana Maria.; Alexander, Richard B.; Arnold, Jeffrey G.; Norfleet, Lee; White, Michael J.; Robertson, Dale M.; Schwarz, Gregory E.
2016-01-01
Despite progress in the implementation of conservation practices, related improvements in water quality have been challenging to measure in larger river systems. In this paper we quantify these downstream effects by applying the empirical U.S. Geological Survey water-quality model SPARROW to investigate whether spatial differences in conservation intensity were statistically correlated with variations in nutrient loads. In contrast to other forms of water quality data analysis, the application of SPARROW controls for confounding factors such as hydrologic variability, multiple sources and environmental processes. A measure of conservation intensity was derived from the USDA-CEAP regional assessment of the Upper Mississippi River and used as an explanatory variable in a model of the Upper Midwest. The spatial pattern of conservation intensity was negatively correlated (p = 0.003) with the total nitrogen loads in streams in the basin. Total phosphorus loads were weakly negatively correlated with conservation (p = 0.25). Regional nitrogen reductions were estimated to range from 5 to 34% and phosphorus reductions from 1 to 10% in major river basins of the Upper Mississippi region. The statistical associations between conservation and nutrient loads are consistent with hydrological and biogeochemical processes such as denitrification. The results provide empirical evidence at the regional scale that conservation practices have had a larger statistically detectable effect on nitrogen than on phosphorus loadings in streams and rivers of the Upper Mississippi Basin.
García, Ana María; Alexander, Richard B; Arnold, Jeffrey G; Norfleet, Lee; White, Michael J; Robertson, Dale M; Schwarz, Gregory
2016-07-05
Despite progress in the implementation of conservation practices, related improvements in water quality have been challenging to measure in larger river systems. In this paper we quantify these downstream effects by applying the empirical U.S. Geological Survey water-quality model SPARROW to investigate whether spatial differences in conservation intensity were statistically correlated with variations in nutrient loads. In contrast to other forms of water quality data analysis, the application of SPARROW controls for confounding factors such as hydrologic variability, multiple sources and environmental processes. A measure of conservation intensity was derived from the USDA-CEAP regional assessment of the Upper Mississippi River and used as an explanatory variable in a model of the Upper Midwest. The spatial pattern of conservation intensity was negatively correlated (p = 0.003) with the total nitrogen loads in streams in the basin. Total phosphorus loads were weakly negatively correlated with conservation (p = 0.25). Regional nitrogen reductions were estimated to range from 5 to 34% and phosphorus reductions from 1 to 10% in major river basins of the Upper Mississippi region. The statistical associations between conservation and nutrient loads are consistent with hydrological and biogeochemical processes such as denitrification. The results provide empirical evidence at the regional scale that conservation practices have had a larger statistically detectable effect on nitrogen than on phosphorus loadings in streams and rivers of the Upper Mississippi Basin.
NASA Astrophysics Data System (ADS)
Fatkullin, M. N.; Solodovnikov, G. K.; Trubitsyn, V. M.
2004-01-01
The results of developing the empirical model of parameters of radio signals propagating in the inhomogeneous ionosphere at middle and high latitudes are presented. As the initial data we took the homogeneous data obtained as a result of observations carried out at the Antarctic ``Molodezhnaya'' station by the method of continuous transmission probing of the ionosphere by signals of the satellite radionavigation ``Transit'' system at coherent frequencies of 150 and 400 MHz. The data relate to the summer season period in the Southern hemisphere of the Earth in 1988-1989 during high (F > 160) activity of the Sun. The behavior of the following statistical characteristics of radio signal parameters was analyzed: (a) the interval of correlation of fluctuations of amplitudes at a frequency of 150 MHz (τkA) (b) the interval of correlation of fluctuations of the difference phase (τkϕ) and (c) the parameter characterizing frequency spectra of amplitude (PA) and phase (Pϕ) fluctuations. A third-degree polynomial was used for modeling of propagation parameters. For all above indicated propagation parameters, the coefficients of the third-degree polynomial were calculated as a function of local time and magnetic activity. The results of calculations are tabulated.
NASA Technical Reports Server (NTRS)
Donnelly, R. E. (Editor)
1980-01-01
Papers about prediction of ionospheric and radio propagation conditions based primarily on empirical or statistical relations is discussed. Predictions of sporadic E, spread F, and scintillations generally involve statistical or empirical predictions. The correlation between solar-activity and terrestrial seismic activity and the possible relation between solar activity and biological effects is discussed.
Empirical performance of interpolation techniques in risk-neutral density (RND) estimation
NASA Astrophysics Data System (ADS)
Bahaludin, H.; Abdullah, M. H.
2017-03-01
The objective of this study is to evaluate the empirical performance of interpolation techniques in risk-neutral density (RND) estimation. Firstly, the empirical performance is evaluated by using statistical analysis based on the implied mean and the implied variance of RND. Secondly, the interpolation performance is measured based on pricing error. We propose using the leave-one-out cross-validation (LOOCV) pricing error for interpolation selection purposes. The statistical analyses indicate that there are statistical differences between the interpolation techniques:second-order polynomial, fourth-order polynomial and smoothing spline. The results of LOOCV pricing error shows that interpolation by using fourth-order polynomial provides the best fitting to option prices in which it has the lowest value error.
An empirical-statistical model for laser cladding of Ti-6Al-4V powder on Ti-6Al-4V substrate
NASA Astrophysics Data System (ADS)
Nabhani, Mohammad; Razavi, Reza Shoja; Barekat, Masoud
2018-03-01
In this article, Ti-6Al-4V powder alloy was directly deposited on Ti-6Al-4V substrate using laser cladding process. In this process, some key parameters such as laser power (P), laser scanning rate (V) and powder feeding rate (F) play important roles. Using linear regression analysis, this paper develops the empirical-statistical relation between these key parameters and geometrical characteristics of single clad tracks (i.e. clad height, clad width, penetration depth, wetting angle, and dilution) as a combined parameter (PαVβFγ). The results indicated that the clad width linearly depended on PV-1/3 and powder feeding rate had no effect on it. The dilution controlled by a combined parameter as VF-1/2 and laser power was a dispensable factor. However, laser power was the dominant factor for the clad height, penetration depth, and wetting angle so that they were proportional to PV-1F1/4, PVF-1/8, and P3/4V-1F-1/4, respectively. Based on the results of correlation coefficient (R > 0.9) and analysis of residuals, it was confirmed that these empirical-statistical relations were in good agreement with the measured values of single clad tracks. Finally, these relations led to the design of a processing map that can predict the geometrical characteristics of the single clad tracks based on the key parameters.
Statistical shear lag model - unraveling the size effect in hierarchical composites.
Wei, Xiaoding; Filleter, Tobin; Espinosa, Horacio D
2015-05-01
Numerous experimental and computational studies have established that the hierarchical structures encountered in natural materials, such as the brick-and-mortar structure observed in sea shells, are essential for achieving defect tolerance. Due to this hierarchy, the mechanical properties of natural materials have a different size dependence compared to that of typical engineered materials. This study aimed to explore size effects on the strength of bio-inspired staggered hierarchical composites and to define the influence of the geometry of constituents in their outstanding defect tolerance capability. A statistical shear lag model is derived by extending the classical shear lag model to account for the statistics of the constituents' strength. A general solution emerges from rigorous mathematical derivations, unifying the various empirical formulations for the fundamental link length used in previous statistical models. The model shows that the staggered arrangement of constituents grants composites a unique size effect on mechanical strength in contrast to homogenous continuous materials. The model is applied to hierarchical yarns consisting of double-walled carbon nanotube bundles to assess its predictive capabilities for novel synthetic materials. Interestingly, the model predicts that yarn gauge length does not significantly influence the yarn strength, in close agreement with experimental observations. Copyright © 2015 Acta Materialia Inc. Published by Elsevier Ltd. All rights reserved.
Nonlinear multi-analysis of agent-based financial market dynamics by epidemic system
NASA Astrophysics Data System (ADS)
Lu, Yunfan; Wang, Jun; Niu, Hongli
2015-10-01
Based on the epidemic dynamical system, we construct a new agent-based financial time series model. In order to check and testify its rationality, we compare the statistical properties of the time series model with the real stock market indices, Shanghai Stock Exchange Composite Index and Shenzhen Stock Exchange Component Index. For analyzing the statistical properties, we combine the multi-parameter analysis with the tail distribution analysis, the modified rescaled range analysis, and the multifractal detrended fluctuation analysis. For a better perspective, the three-dimensional diagrams are used to present the analysis results. The empirical research in this paper indicates that the long-range dependence property and the multifractal phenomenon exist in the real returns and the proposed model. Therefore, the new agent-based financial model can recurrence some important features of real stock markets.
Analyzing Dyadic Sequence Data—Research Questions and Implied Statistical Models
Fuchs, Peter; Nussbeck, Fridtjof W.; Meuwly, Nathalie; Bodenmann, Guy
2017-01-01
The analysis of observational data is often seen as a key approach to understanding dynamics in romantic relationships but also in dyadic systems in general. Statistical models for the analysis of dyadic observational data are not commonly known or applied. In this contribution, selected approaches to dyadic sequence data will be presented with a focus on models that can be applied when sample sizes are of medium size (N = 100 couples or less). Each of the statistical models is motivated by an underlying potential research question, the most important model results are presented and linked to the research question. The following research questions and models are compared with respect to their applicability using a hands on approach: (I) Is there an association between a particular behavior by one and the reaction by the other partner? (Pearson Correlation); (II) Does the behavior of one member trigger an immediate reaction by the other? (aggregated logit models; multi-level approach; basic Markov model); (III) Is there an underlying dyadic process, which might account for the observed behavior? (hidden Markov model); and (IV) Are there latent groups of dyads, which might account for observing different reaction patterns? (mixture Markov; optimal matching). Finally, recommendations for researchers to choose among the different models, issues of data handling, and advises to apply the statistical models in empirical research properly are given (e.g., in a new r-package “DySeq”). PMID:28443037
Measurement of positive direct current corona pulse in coaxial wire-cylinder gap
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yin, Han, E-mail: hanyin1986@gmail.com; Zhang, Bo, E-mail: shizbcn@mail.tsinghua.edu.cn; He, Jinliang, E-mail: hejl@tsinghua.edu.cn
In this paper, a system is designed and developed to measure the positive corona current in coaxial wire-cylinder gaps. The characteristic parameters of corona current pulses, such as the amplitude, rise time, half-wave time, and repetition frequency, are statistically analyzed and a new set of empirical formulas are derived by numerical fitting. The influence of space charges on corona currents is tested by using three corona cages with different radii. A numerical method is used to solve a simplified ion-flow model to explain the influence of space charges. Based on the statistical results, a stochastic model is developed to simulatemore » the corona pulse trains. And this model is verified by comparing the simulated frequency-domain responses with the measured ones.« less
Investigation using data from ERTS to develop and implement utilization of living marine resources
NASA Technical Reports Server (NTRS)
Stevenson, W. H. (Principal Investigator); Pastula, E. J., Jr.
1973-01-01
The author has identified the following significant results. The feasibility of utilizing ERTS-1 data in conjunction with aerial remote sensing and sea truth information to predict the distribution of menhaden in the Mississippi Sound during a specific time frame has been demonstrated by employing a number of uniquely designed empirical regression models. The construction of these models was made possible through innovative statistical routines specifically developed to meet the stated objectives.
Estimating total maximum daily loads with the Stochastic Empirical Loading and Dilution Model
Granato, Gregory; Jones, Susan Cheung
2017-01-01
The Massachusetts Department of Transportation (DOT) and the Rhode Island DOT are assessing and addressing roadway contributions to total maximum daily loads (TMDLs). Example analyses for total nitrogen, total phosphorus, suspended sediment, and total zinc in highway runoff were done by the U.S. Geological Survey in cooperation with FHWA to simulate long-term annual loads for TMDL analyses with the stochastic empirical loading and dilution model known as SELDM. Concentration statistics from 19 highway runoff monitoring sites in Massachusetts were used with precipitation statistics from 11 long-term monitoring sites to simulate long-term pavement yields (loads per unit area). Highway sites were stratified by traffic volume or surrounding land use to calculate concentration statistics for rural roads, low-volume highways, high-volume highways, and ultraurban highways. The median of the event mean concentration statistics in each traffic volume category was used to simulate annual yields from pavement for a 29- or 30-year period. Long-term average yields for total nitrogen, phosphorus, and zinc from rural roads are lower than yields from the other categories, but yields of sediment are higher than for the low-volume highways. The average yields of the selected water quality constituents from high-volume highways are 1.35 to 2.52 times the associated yields from low-volume highways. The average yields of the selected constituents from ultraurban highways are 1.52 to 3.46 times the associated yields from high-volume highways. Example simulations indicate that both concentration reduction and flow reduction by structural best management practices are crucial for reducing runoff yields.
NASA Technical Reports Server (NTRS)
Rino, C. L.; Livingston, R. C.; Whitney, H. E.
1976-01-01
This paper presents an analysis of ionospheric scintillation data which shows that the underlying statistical structure of the signal can be accurately modeled by the additive complex Gaussian perturbation predicted by the Born approximation in conjunction with an application of the central limit theorem. By making use of this fact, it is possible to estimate the in-phase, phase quadrature, and cophased scattered power by curve fitting to measured intensity histograms. By using this procedure, it is found that typically more than 80% of the scattered power is in phase quadrature with the undeviated signal component. Thus, the signal is modeled by a Gaussian, but highly non-Rician process. From simultaneous UHF and VHF data, only a weak dependence of this statistical structure on changes in the Fresnel radius is deduced. The signal variance is found to have a nonquadratic wavelength dependence. It is hypothesized that this latter effect is a subtle manifestation of locally homogeneous irregularity structures, a mathematical model proposed by Kolmogorov (1941) in his early studies of incompressible fluid turbulence.
Occupational Injury and Illness Surveillance: Conceptual Filters Explain Underreporting
Azaroff, Lenore S.; Levenstein, Charles; Wegman, David H.
2002-01-01
Occupational health surveillance data are key to effective intervention. However, the US Bureau of Labor Statistics survey significantly underestimates the incidence of work-related injuries and illnesses. Researchers supplement these statistics with data from other systems not designed for surveillance. The authors apply the filter model of Webb et al. to underreporting by the Bureau of Labor Statistics, workers’ compensation wage-replacement documents, physician reporting systems, and medical records of treatment charged to workers’ compensation. Mechanisms are described for the loss of cases at successive steps of documentation. Empirical findings indicate that workers repeatedly risk adverse consequences for attempting to complete these steps, while systems for ensuring their completion are weak or absent. PMID:12197968
Multiscale volatility duration characteristics on financial multi-continuum percolation dynamics
NASA Astrophysics Data System (ADS)
Wang, Min; Wang, Jun
A random stock price model based on the multi-continuum percolation system is developed to investigate the nonlinear dynamics of stock price volatility duration, in an attempt to explain various statistical facts found in financial data, and have a deeper understanding of mechanisms in the financial market. The continuum percolation system is usually referred to be a random coverage process or a Boolean model, it is a member of a class of statistical physics systems. In this paper, the multi-continuum percolation (with different values of radius) is employed to model and reproduce the dispersal of information among the investors. To testify the rationality of the proposed model, the nonlinear analyses of return volatility duration series are preformed by multifractal detrending moving average analysis and Zipf analysis. The comparison empirical results indicate the similar nonlinear behaviors for the proposed model and the actual Chinese stock market.
NASA Astrophysics Data System (ADS)
Dieppois, B.; Pohl, B.; Eden, J.; Crétat, J.; Rouault, M.; Keenlyside, N.; New, M. G.
2017-12-01
The water management community has hitherto neglected or underestimated many of the uncertainties in climate impact scenarios, in particular, uncertainties associated with decadal climate variability. Uncertainty in the state-of-the-art global climate models (GCMs) is time-scale-dependant, e.g. stronger at decadal than at interannual timescales, in response to the different parameterizations and to internal climate variability. In addition, non-stationarity in statistical downscaling is widely recognized as a key problem, in which time-scale dependency of predictors plays an important role. As with global climate modelling, therefore, the selection of downscaling methods must proceed with caution to avoid unintended consequences of over-correcting the noise in GCMs (e.g. interpreting internal climate variability as a model bias). GCM outputs from the Coupled Model Intercomparison Project 5 (CMIP5) have therefore first been selected based on their ability to reproduce southern African summer rainfall variability and their teleconnections with Pacific sea-surface temperature across the dominant timescales. In observations, southern African summer rainfall has recently been shown to exhibit significant periodicities at the interannual timescale (2-8 years), quasi-decadal (8-13 years) and inter-decadal (15-28 years) timescales, which can be interpret as the signature of ENSO, the IPO, and the PDO over the region. Most of CMIP5 GCMs underestimate southern African summer rainfall variability and their teleconnections with Pacific SSTs at these three timescales. In addition, according to a more in-depth analysis of historical and pi-control runs, this bias is might result from internal climate variability in some of the CMIP5 GCMs, suggesting potential for bias-corrected prediction based empirical statistical downscaling. A multi-timescale regression based downscaling procedure, which determines the predictors across the different timescales, has thus been used to simulate southern African summer rainfall. This multi-timescale procedure shows much better skills in simulating decadal timescales of variability compared to commonly used statistical downscaling approaches.
GRAM 88 - 4D GLOBAL REFERENCE ATMOSPHERE MODEL-1988
NASA Technical Reports Server (NTRS)
Johnson, D. L.
1994-01-01
The Four-D Global Reference Atmosphere program was developed from an empirical atmospheric model which generates values for pressure, density, temperature, and winds from surface level to orbital altitudes. This program can generate altitude profiles of atmospheric parameters along any simulated trajectory through the atmosphere. The program was developed for design applications in the Space Shuttle program, such as the simulation of external tank re-entry trajectories. Other potential applications are global circulation and diffusion studies; also the generation of profiles for comparison with other atmospheric measurement techniques such as satellite measured temperature profiles and infrasonic measurement of wind profiles. GRAM-88 is the latest version of the software GRAM. The software GRAM-88 contains a number of changes that have improved the model statistics, in particular, the small scale density perturbation statistics. It also corrected a low latitude grid problem as well as the SCIDAT data base. Furthermore, GRAM-88 now uses the U.S. Standard Atmosphere 1976 as a comparison standard rather than the US62 used in other versions. The program is an amalgamation of two empirical atmospheric models for the low (25km) and the high (90km) atmosphere, with a newly developed latitude-longitude dependent model for the middle atmosphere. The Jacchia (1970) model simulates the high atmospheric region above 115km. The Jacchia program sections are in separate subroutines so that other thermosphericexospheric models could easily be adapted if required for special applications. The improved code eliminated the calculation of geostrophic winds above 125 km altitude from the model. The atmospheric region between 30km and 90km is simulated by a latitude-longitude dependent empirical model modification of the latitude dependent empirical model of Groves (1971). A fairing technique between 90km and 115km accomplished a smooth transition between the modified Groves values and the Jacchia values. Below 25km the atmospheric parameters are computed by the 4-D worldwide atmospheric model of Spiegler and Fowler (1972). This data set is not included. GRAM-88 incorporates a hydrostatic/gas law check in the 0-30 km altitude range to flag and change any bad data points. Between 5km and 30km, an interpolation scheme is used between the 4-D results and the modified Groves values. The output parameters consist of components for: (1) latitude, longitude, and altitude dependent monthly and annual means, (2) quasi-biennial oscillations (QBO), and (3) random perturbations to partially simulate the variability due to synoptic, diurnal, planetary wave, and gravity wave variations. Quasi-biennial and random variation perturbations are computed from parameters determined by various empirical studies and are added to the monthly mean values. The GRAM-88 program is for batch execution on the IBM 3084. It is written in STANDARD FORTRAN 77 under the MVS/XA operating system. The IBM DISPLA graphics routines are necessary for graphical output. The program was developed in 1988.
NASA Astrophysics Data System (ADS)
Scarpa, Riccardo; Thiene, Mara; Hensher, David A.
2012-01-01
Preferences for attributes of complex goods may differ substantially among members of households. Some of these goods, such as tap water, are jointly supplied at the household level. This issue of jointness poses a series of theoretical and empirical challenges to economists engaged in empirical nonmarket valuation studies. While a series of results have already been obtained in the literature, the issue of how to empirically measure these differences, and how sensitive the results are to choice of model specification from the same data, is yet to be clearly understood. In this paper we use data from a widely employed form of stated preference survey for multiattribute goods, namely choice experiments. The salient feature of the data collection is that the same choice experiment was applied to both partners of established couples. The analysis focuses on models that simultaneously handle scale as well as preference heterogeneity in marginal rates of substitution (MRS), thereby isolating true differences between members of couples in their MRS, by removing interpersonal variation in scale. The models employed are different parameterizations of the mixed logit model, including the willingness to pay (WTP)-space model and the generalized multinomial logit model. We find that in this sample there is some evidence of significant statistical differences in values between women and men, but these are of small magnitude and only apply to a few attributes.
ERIC Educational Resources Information Center
Chang, Wei-Lung; Liu, Hsiang-Te; Lin, Tai-An; Wen, Yung-Sung
2008-01-01
The purpose of this research was to study the relationship between family communication structure, vanity trait, and related consumption behavior. The study used an empirical method with adolescent students from the northern part of Taiwan as the subjects. Multiple statistical methods and the SEM model were used for testing the hypotheses. The…
ERIC Educational Resources Information Center
Yoder, Paul; Watson, Linda R.; Lambert, Warren
2015-01-01
Eighty-seven preschoolers with autism spectrum disorders who were initially nonverbal (under 6 words in language sample and under 21 parent-reported words said) were assessed at five time points over 16 months. Statistical models that accounted for the intercorrelation among nine theoretically- and empirically-motivated predictors, as well as two…
The Effect of Education on Economic Growth in Greece over the 1960-2000 Period
ERIC Educational Resources Information Center
Tsamadias, Constantinos; Prontzas, Panagiotis
2012-01-01
This paper examines the impact of education on economic growth in Greece over the period 1960-2000 by applying the model introduced by Mankiw, Romer, and Weil. The findings of the empirical analysis reveal that education had a positive and statistically significant effect on economic growth in Greece over the period 1960-2000. The econometric…
Sedgley, Norman; Elmslie, Bruce
2011-01-01
Evidence of the importance of urban agglomeration and the offsetting effects of congestion are provided in a number of studies of productivity and wages. Little attention has been paid to this evidence in the economic growth literature, where the recent focus is on technological change. We extend the idea of agglomeration and congestion effects to the area of innovation by empirically looking for a nonlinear link between population density and patent activity. A panel data set consisting of observations on 302 USA metropolitan statistical areas (MSAs) over a 10-year period from 1990 to 1999 is utilized. Following the patent and R&D literature, models that account for the discreet nature of the dependent variable are employed. Strong evidence is found that agglomeration and congestion are important in explaining the vast differences in patent rates across US cities. The most important reason cities continue to exist, given the dramatic drop in transportation costs for physical goods over the last century, is probably related to the forces of agglomeration as they apply to knowledge spillovers. Therefore, the empirical investigation proposed here is an important part of understanding the viability of urban areas in the future.
Seven-parameter statistical model for BRDF in the UV band.
Bai, Lu; Wu, Zhensen; Zou, Xiren; Cao, Yunhua
2012-05-21
A new semi-empirical seven-parameter BRDF model is developed in the UV band using experimentally measured data. The model is based on the five-parameter model of Wu and the fourteen-parameter model of Renhorn and Boreman. Surface scatter, bulk scatter and retro-reflection scatter are considered. An optimizing modeling method, the artificial immune network genetic algorithm, is used to fit the BRDF measurement data over a wide range of incident angles. The calculation time and accuracy of the five- and seven-parameter models are compared. After fixing the seven parameters, the model can well describe scattering data in the UV band.
Financial Time Series Prediction Using Elman Recurrent Random Neural Networks
Wang, Jie; Wang, Jun; Fang, Wen; Niu, Hongli
2016-01-01
In recent years, financial market dynamics forecasting has been a focus of economic research. To predict the price indices of stock markets, we developed an architecture which combined Elman recurrent neural networks with stochastic time effective function. By analyzing the proposed model with the linear regression, complexity invariant distance (CID), and multiscale CID (MCID) analysis methods and taking the model compared with different models such as the backpropagation neural network (BPNN), the stochastic time effective neural network (STNN), and the Elman recurrent neural network (ERNN), the empirical results show that the proposed neural network displays the best performance among these neural networks in financial time series forecasting. Further, the empirical research is performed in testing the predictive effects of SSE, TWSE, KOSPI, and Nikkei225 with the established model, and the corresponding statistical comparisons of the above market indices are also exhibited. The experimental results show that this approach gives good performance in predicting the values from the stock market indices. PMID:27293423
Financial Time Series Prediction Using Elman Recurrent Random Neural Networks.
Wang, Jie; Wang, Jun; Fang, Wen; Niu, Hongli
2016-01-01
In recent years, financial market dynamics forecasting has been a focus of economic research. To predict the price indices of stock markets, we developed an architecture which combined Elman recurrent neural networks with stochastic time effective function. By analyzing the proposed model with the linear regression, complexity invariant distance (CID), and multiscale CID (MCID) analysis methods and taking the model compared with different models such as the backpropagation neural network (BPNN), the stochastic time effective neural network (STNN), and the Elman recurrent neural network (ERNN), the empirical results show that the proposed neural network displays the best performance among these neural networks in financial time series forecasting. Further, the empirical research is performed in testing the predictive effects of SSE, TWSE, KOSPI, and Nikkei225 with the established model, and the corresponding statistical comparisons of the above market indices are also exhibited. The experimental results show that this approach gives good performance in predicting the values from the stock market indices.
Statistical modelling for recurrent events: an application to sports injuries
Ullah, Shahid; Gabbett, Tim J; Finch, Caroline F
2014-01-01
Background Injuries are often recurrent, with subsequent injuries influenced by previous occurrences and hence correlation between events needs to be taken into account when analysing such data. Objective This paper compares five different survival models (Cox proportional hazards (CoxPH) model and the following generalisations to recurrent event data: Andersen-Gill (A-G), frailty, Wei-Lin-Weissfeld total time (WLW-TT) marginal, Prentice-Williams-Peterson gap time (PWP-GT) conditional models) for the analysis of recurrent injury data. Methods Empirical evaluation and comparison of different models were performed using model selection criteria and goodness-of-fit statistics. Simulation studies assessed the size and power of each model fit. Results The modelling approach is demonstrated through direct application to Australian National Rugby League recurrent injury data collected over the 2008 playing season. Of the 35 players analysed, 14 (40%) players had more than 1 injury and 47 contact injuries were sustained over 29 matches. The CoxPH model provided the poorest fit to the recurrent sports injury data. The fit was improved with the A-G and frailty models, compared to WLW-TT and PWP-GT models. Conclusions Despite little difference in model fit between the A-G and frailty models, in the interest of fewer statistical assumptions it is recommended that, where relevant, future studies involving modelling of recurrent sports injury data use the frailty model in preference to the CoxPH model or its other generalisations. The paper provides a rationale for future statistical modelling approaches for recurrent sports injury. PMID:22872683
Compounding approach for univariate time series with nonstationary variances
NASA Astrophysics Data System (ADS)
Schäfer, Rudi; Barkhofen, Sonja; Guhr, Thomas; Stöckmann, Hans-Jürgen; Kuhl, Ulrich
2015-12-01
A defining feature of nonstationary systems is the time dependence of their statistical parameters. Measured time series may exhibit Gaussian statistics on short time horizons, due to the central limit theorem. The sample statistics for long time horizons, however, averages over the time-dependent variances. To model the long-term statistical behavior, we compound the local distribution with the distribution of its parameters. Here, we consider two concrete, but diverse, examples of such nonstationary systems: the turbulent air flow of a fan and a time series of foreign exchange rates. Our main focus is to empirically determine the appropriate parameter distribution for the compounding approach. To this end, we extract the relevant time scales by decomposing the time signals into windows and determine the distribution function of the thus obtained local variances.
Compounding approach for univariate time series with nonstationary variances.
Schäfer, Rudi; Barkhofen, Sonja; Guhr, Thomas; Stöckmann, Hans-Jürgen; Kuhl, Ulrich
2015-12-01
A defining feature of nonstationary systems is the time dependence of their statistical parameters. Measured time series may exhibit Gaussian statistics on short time horizons, due to the central limit theorem. The sample statistics for long time horizons, however, averages over the time-dependent variances. To model the long-term statistical behavior, we compound the local distribution with the distribution of its parameters. Here, we consider two concrete, but diverse, examples of such nonstationary systems: the turbulent air flow of a fan and a time series of foreign exchange rates. Our main focus is to empirically determine the appropriate parameter distribution for the compounding approach. To this end, we extract the relevant time scales by decomposing the time signals into windows and determine the distribution function of the thus obtained local variances.
Bansal, Ravi; Peterson, Bradley S
2018-06-01
Identifying regional effects of interest in MRI datasets usually entails testing a priori hypotheses across many thousands of brain voxels, requiring control for false positive findings in these multiple hypotheses testing. Recent studies have suggested that parametric statistical methods may have incorrectly modeled functional MRI data, thereby leading to higher false positive rates than their nominal rates. Nonparametric methods for statistical inference when conducting multiple statistical tests, in contrast, are thought to produce false positives at the nominal rate, which has thus led to the suggestion that previously reported studies should reanalyze their fMRI data using nonparametric tools. To understand better why parametric methods may yield excessive false positives, we assessed their performance when applied both to simulated datasets of 1D, 2D, and 3D Gaussian Random Fields (GRFs) and to 710 real-world, resting-state fMRI datasets. We showed that both the simulated 2D and 3D GRFs and the real-world data contain a small percentage (<6%) of very large clusters (on average 60 times larger than the average cluster size), which were not present in 1D GRFs. These unexpectedly large clusters were deemed statistically significant using parametric methods, leading to empirical familywise error rates (FWERs) as high as 65%: the high empirical FWERs were not a consequence of parametric methods failing to model spatial smoothness accurately, but rather of these very large clusters that are inherently present in smooth, high-dimensional random fields. In fact, when discounting these very large clusters, the empirical FWER for parametric methods was 3.24%. Furthermore, even an empirical FWER of 65% would yield on average less than one of those very large clusters in each brain-wide analysis. Nonparametric methods, in contrast, estimated distributions from those large clusters, and therefore, by construct rejected the large clusters as false positives at the nominal FWERs. Those rejected clusters were outlying values in the distribution of cluster size but cannot be distinguished from true positive findings without further analyses, including assessing whether fMRI signal in those regions correlates with other clinical, behavioral, or cognitive measures. Rejecting the large clusters, however, significantly reduced the statistical power of nonparametric methods in detecting true findings compared with parametric methods, which would have detected most true findings that are essential for making valid biological inferences in MRI data. Parametric analyses, in contrast, detected most true findings while generating relatively few false positives: on average, less than one of those very large clusters would be deemed a true finding in each brain-wide analysis. We therefore recommend the continued use of parametric methods that model nonstationary smoothness for cluster-level, familywise control of false positives, particularly when using a Cluster Defining Threshold of 2.5 or higher, and subsequently assessing rigorously the biological plausibility of the findings, even for large clusters. Finally, because nonparametric methods yielded a large reduction in statistical power to detect true positive findings, we conclude that the modest reduction in false positive findings that nonparametric analyses afford does not warrant a re-analysis of previously published fMRI studies using nonparametric techniques. Copyright © 2018 Elsevier Inc. All rights reserved.
NASA Technical Reports Server (NTRS)
Manning, Robert M.
1986-01-01
A rain attenuation prediction model is described for use in calculating satellite communication link availability for any specific location in the world that is characterized by an extended record of rainfall. Such a formalism is necessary for the accurate assessment of such availability predictions in the case of the small user-terminal concept of the Advanced Communication Technology Satellite (ACTS) Project. The model employs the theory of extreme value statistics to generate the necessary statistical rainrate parameters from rain data in the form compiled by the National Weather Service. These location dependent rain statistics are then applied to a rain attenuation model to obtain a yearly prediction of the occurrence of attenuation on any satellite link at that location. The predictions of this model are compared to those of the Crane Two-Component Rain Model and some empirical data and found to be very good. The model is then used to calculate rain attenuation statistics at 59 locations in the United States (including Alaska and Hawaii) for the 20 GHz downlinks and 30 GHz uplinks of the proposed ACTS system. The flexibility of this modeling formalism is such that it allows a complete and unified treatment of the temporal aspects of rain attenuation that leads to the design of an optimum stochastic power control algorithm, the purpose of which is to efficiently counter such rain fades on a satellite link.
Statistical fluctuations in pedestrian evacuation times and the effect of social contagion
NASA Astrophysics Data System (ADS)
Nicolas, Alexandre; Bouzat, Sebastián; Kuperman, Marcelo N.
2016-08-01
Mathematical models of pedestrian evacuation and the associated simulation software have become essential tools for the assessment of the safety of public facilities and buildings. While a variety of models is now available, their calibration and test against empirical data are generally restricted to global averaged quantities; the statistics compiled from the time series of individual escapes ("microscopic" statistics) measured in recent experiments are thus overlooked. In the same spirit, much research has primarily focused on the average global evacuation time, whereas the whole distribution of evacuation times over some set of realizations should matter. In the present paper we propose and discuss the validity of a simple relation between this distribution and the microscopic statistics, which is theoretically valid in the absence of correlations. To this purpose, we develop a minimal cellular automaton, with features that afford a semiquantitative reproduction of the experimental microscopic statistics. We then introduce a process of social contagion of impatient behavior in the model and show that the simple relation under test may dramatically fail at high contagion strengths, the latter being responsible for the emergence of strong correlations in the system. We conclude with comments on the potential practical relevance for safety science of calculations based on microscopic statistics.
Computational and Statistical Models: A Comparison for Policy Modeling of Childhood Obesity
NASA Astrophysics Data System (ADS)
Mabry, Patricia L.; Hammond, Ross; Ip, Edward Hak-Sing; Huang, Terry T.-K.
As systems science methodologies have begun to emerge as a set of innovative approaches to address complex problems in behavioral, social science, and public health research, some apparent conflicts with traditional statistical methodologies for public health have arisen. Computational modeling is an approach set in context that integrates diverse sources of data to test the plausibility of working hypotheses and to elicit novel ones. Statistical models are reductionist approaches geared towards proving the null hypothesis. While these two approaches may seem contrary to each other, we propose that they are in fact complementary and can be used jointly to advance solutions to complex problems. Outputs from statistical models can be fed into computational models, and outputs from computational models can lead to further empirical data collection and statistical models. Together, this presents an iterative process that refines the models and contributes to a greater understanding of the problem and its potential solutions. The purpose of this panel is to foster communication and understanding between statistical and computational modelers. Our goal is to shed light on the differences between the approaches and convey what kinds of research inquiries each one is best for addressing and how they can serve complementary (and synergistic) roles in the research process, to mutual benefit. For each approach the panel will cover the relevant "assumptions" and how the differences in what is assumed can foster misunderstandings. The interpretations of the results from each approach will be compared and contrasted and the limitations for each approach will be delineated. We will use illustrative examples from CompMod, the Comparative Modeling Network for Childhood Obesity Policy. The panel will also incorporate interactive discussions with the audience on the issues raised here.
Transaction costs and sequential bargaining in transferable discharge permit markets.
Netusil, N R; Braden, J B
2001-03-01
Market-type mechanisms have been introduced and are being explored for various environmental programs. Several existing programs, however, have not attained the cost savings that were initially projected. Modeling that acknowledges the role of transactions costs and the discrete, bilateral, and sequential manner in which trades are executed should provide a more realistic basis for calculating potential cost savings. This paper presents empirical evidence on potential cost savings by examining a market for the abatement of sediment from farmland. Empirical results based on a market simulation model find no statistically significant change in mean abatement costs under several transaction cost levels when contracts are randomly executed. An alternative method of contract execution, gain-ranked, yields similar results. At the highest transaction cost level studied, trading reduces the total cost of compliance relative to a uniform standard that reflects current regulations.
Nonparametric spirometry reference values for Hispanic Americans.
Glenn, Nancy L; Brown, Vanessa M
2011-02-01
Recent literature sites ethnic origin as a major factor in developing pulmonary function reference values. Extensive studies established reference values for European and African Americans, but not for Hispanic Americans. The Third National Health and Nutrition Examination Survey defines Hispanic as individuals of Spanish speaking cultures. While no group was excluded from the target population, sample size requirements only allowed inclusion of individuals who identified themselves as Mexican Americans. This research constructs nonparametric reference value confidence intervals for Hispanic American pulmonary function. The method is applicable to all ethnicities. We use empirical likelihood confidence intervals to establish normal ranges for reference values. Its major advantage: it is model free, but shares asymptotic properties of model based methods. Statistical comparisons indicate that empirical likelihood interval lengths are comparable to normal theory intervals. Power and efficiency studies agree with previously published theoretical results.
NASA Astrophysics Data System (ADS)
Zhu, Xiaowei; Iungo, G. Valerio; Leonardi, Stefano; Anderson, William
2017-02-01
For a horizontally homogeneous, neutrally stratified atmospheric boundary layer (ABL), aerodynamic roughness length, z_0, is the effective elevation at which the streamwise component of mean velocity is zero. A priori prediction of z_0 based on topographic attributes remains an open line of inquiry in planetary boundary-layer research. Urban topographies - the topic of this study - exhibit spatial heterogeneities associated with variability of building height, width, and proximity with adjacent buildings; such variability renders a priori, prognostic z_0 models appealing. Here, large-eddy simulation (LES) has been used in an extensive parametric study to characterize the ABL response (and z_0) to a range of synthetic, urban-like topographies wherein statistical moments of the topography have been systematically varied. Using LES results, we determined the hierarchical influence of topographic moments relevant to setting z_0. We demonstrate that standard deviation and skewness are important, while kurtosis is negligible. This finding is reconciled with a model recently proposed by Flack and Schultz (J Fluids Eng 132:041203-1-041203-10, 2010), who demonstrate that z_0 can be modelled with standard deviation and skewness, and two empirical coefficients (one for each moment). We find that the empirical coefficient related to skewness is not constant, but exhibits a dependence on standard deviation over certain ranges. For idealized, quasi-uniform cubic topographies and for complex, fully random urban-like topographies, we demonstrate strong performance of the generalized Flack and Schultz model against contemporary roughness correlations.
Modeling of dielectric properties of aqueous salt solutions with an equation of state.
Maribo-Mogensen, Bjørn; Kontogeorgis, Georgios M; Thomsen, Kaj
2013-09-12
The static permittivity is the most important physical property for thermodynamic models that account for the electrostatic interactions between ions. The measured static permittivity in mixtures containing electrolytes is reduced due to kinetic depolarization and reorientation of the dipoles in the electrical field surrounding ions. Kinetic depolarization may explain 25-75% of the observed decrease in the permittivity of solutions containing salts, but since this is a dynamic property, this effect should not be included in the thermodynamic modeling of electrolytes. Kinetic depolarization has, however, been ignored in relation to thermodynamic modeling, and authors have either neglected the effect of salts on permittivity or used empirical correlations fitted to the measured static permittivity, leading to an overestimation of the reduction in the thermodynamic static permittivity. We present a new methodology for obtaining the static permittivity over wide ranges of temperatures, pressures, and compositions for use within an equation of state for mixed solvents containing salts. The static permittivity is calculated from a new extension of the framework developed by Onsager, Kirkwood, and Fröhlich to associating mixtures. Wertheim's association model as formulated in the statistical associating fluid theory is used to account for hydrogen-bonding molecules and ion-solvent association. Finally, we compare the Debye-Hückel Helmholtz energy obtained using an empirical model with the new physical model and show that the empirical models may introduce unphysical behavior in the equation of state.
Testing the Predictive Power of Coulomb Stress on Aftershock Sequences
NASA Astrophysics Data System (ADS)
Woessner, J.; Lombardi, A.; Werner, M. J.; Marzocchi, W.
2009-12-01
Empirical and statistical models of clustered seismicity are usually strongly stochastic and perceived to be uninformative in their forecasts, since only marginal distributions are used, such as the Omori-Utsu and Gutenberg-Richter laws. In contrast, so-called physics-based aftershock models, based on seismic rate changes calculated from Coulomb stress changes and rate-and-state friction, make more specific predictions: anisotropic stress shadows and multiplicative rate changes. We test the predictive power of models based on Coulomb stress changes against statistical models, including the popular Short Term Earthquake Probabilities and Epidemic-Type Aftershock Sequences models: We score and compare retrospective forecasts on the aftershock sequences of the 1992 Landers, USA, the 1997 Colfiorito, Italy, and the 2008 Selfoss, Iceland, earthquakes. To quantify predictability, we use likelihood-based metrics that test the consistency of the forecasts with the data, including modified and existing tests used in prospective forecast experiments within the Collaboratory for the Study of Earthquake Predictability (CSEP). Our results indicate that a statistical model performs best. Moreover, two Coulomb model classes seem unable to compete: Models based on deterministic Coulomb stress changes calculated from a given fault-slip model, and those based on fixed receiver faults. One model of Coulomb stress changes does perform well and sometimes outperforms the statistical models, but its predictive information is diluted, because of uncertainties included in the fault-slip model. Our results suggest that models based on Coulomb stress changes need to incorporate stochastic features that represent model and data uncertainty.
Distribution of tsunami interevent times
NASA Astrophysics Data System (ADS)
Geist, Eric L.; Parsons, Tom
2008-01-01
The distribution of tsunami interevent times is analyzed using global and site-specific (Hilo, Hawaii) tsunami catalogs. An empirical probability density distribution is determined by binning the observed interevent times during a period in which the observation rate is approximately constant. The empirical distributions for both catalogs exhibit non-Poissonian behavior in which there is an abundance of short interevent times compared to an exponential distribution. Two types of statistical distributions are used to model this clustering behavior: (1) long-term clustering described by a universal scaling law, and (2) Omori law decay of aftershocks and triggered sources. The empirical and theoretical distributions all imply an increased hazard rate after a tsunami, followed by a gradual decrease with time approaching a constant hazard rate. Examination of tsunami sources suggests that many of the short interevent times are caused by triggered earthquakes, though the triggered events are not necessarily on the same fault.
NASA Astrophysics Data System (ADS)
Qi, Di
Turbulent dynamical systems are ubiquitous in science and engineering. Uncertainty quantification (UQ) in turbulent dynamical systems is a grand challenge where the goal is to obtain statistical estimates for key physical quantities. In the development of a proper UQ scheme for systems characterized by both a high-dimensional phase space and a large number of instabilities, significant model errors compared with the true natural signal are always unavoidable due to both the imperfect understanding of the underlying physical processes and the limited computational resources available. One central issue in contemporary research is the development of a systematic methodology for reduced order models that can recover the crucial features both with model fidelity in statistical equilibrium and with model sensitivity in response to perturbations. In the first part, we discuss a general mathematical framework to construct statistically accurate reduced-order models that have skill in capturing the statistical variability in the principal directions of a general class of complex systems with quadratic nonlinearity. A systematic hierarchy of simple statistical closure schemes, which are built through new global statistical energy conservation principles combined with statistical equilibrium fidelity, are designed and tested for UQ of these problems. Second, the capacity of imperfect low-order stochastic approximations to model extreme events in a passive scalar field advected by turbulent flows is investigated. The effects in complicated flow systems are considered including strong nonlinear and non-Gaussian interactions, and much simpler and cheaper imperfect models with model error are constructed to capture the crucial statistical features in the stationary tracer field. Several mathematical ideas are introduced to improve the prediction skill of the imperfect reduced-order models. Most importantly, empirical information theory and statistical linear response theory are applied in the training phase for calibrating model errors to achieve optimal imperfect model parameters; and total statistical energy dynamics are introduced to improve the model sensitivity in the prediction phase especially when strong external perturbations are exerted. The validity of reduced-order models for predicting statistical responses and intermittency is demonstrated on a series of instructive models with increasing complexity, including the stochastic triad model, the Lorenz '96 model, and models for barotropic and baroclinic turbulence. The skillful low-order modeling methods developed here should also be useful for other applications such as efficient algorithms for data assimilation.
An Integrative Account of Constraints on Cross-Situational Learning
Yurovsky, Daniel; Frank, Michael C.
2015-01-01
Word-object co-occurrence statistics are a powerful information source for vocabulary learning, but there is considerable debate about how learners actually use them. While some theories hold that learners accumulate graded, statistical evidence about multiple referents for each word, others suggest that they track only a single candidate referent. In two large-scale experiments, we show that neither account is sufficient: Cross-situational learning involves elements of both. Further, the empirical data are captured by a computational model that formalizes how memory and attention interact with co-occurrence tracking. Together, the data and model unify opposing positions in a complex debate and underscore the value of understanding the interaction between computational and algorithmic levels of explanation. PMID:26302052
Distribution of lod scores in oligogenic linkage analysis.
Williams, J T; North, K E; Martin, L J; Comuzzie, A G; Göring, H H; Blangero, J
2001-01-01
In variance component oligogenic linkage analysis it can happen that the residual additive genetic variance bounds to zero when estimating the effect of the ith quantitative trait locus. Using quantitative trait Q1 from the Genetic Analysis Workshop 12 simulated general population data, we compare the observed lod scores from oligogenic linkage analysis with the empirical lod score distribution under a null model of no linkage. We find that zero residual additive genetic variance in the null model alters the usual distribution of the likelihood-ratio statistic.
Central Limit Theorems for Linear Statistics of Heavy Tailed Random Matrices
NASA Astrophysics Data System (ADS)
Benaych-Georges, Florent; Guionnet, Alice; Male, Camille
2014-07-01
We show central limit theorems (CLT) for the linear statistics of symmetric matrices with independent heavy tailed entries, including entries in the domain of attraction of α-stable laws and entries with moments exploding with the dimension, as in the adjacency matrices of Erdös-Rényi graphs. For the second model, we also prove a central limit theorem of the moments of its empirical eigenvalues distribution. The limit laws are Gaussian, but unlike the case of standard Wigner matrices, the normalization is the one of the classical CLT for independent random variables.
Counts-in-cylinders in the Sloan Digital Sky Survey with Comparisons to N-body Simulations
NASA Astrophysics Data System (ADS)
Berrier, Heather D.; Barton, Elizabeth J.; Berrier, Joel C.; Bullock, James S.; Zentner, Andrew R.; Wechsler, Risa H.
2011-01-01
Environmental statistics provide a necessary means of comparing the properties of galaxies in different environments, and a vital test of models of galaxy formation within the prevailing hierarchical cosmological model. We explore counts-in-cylinders, a common statistic defined as the number of companions of a particular galaxy found within a given projected radius and redshift interval. Galaxy distributions with the same two-point correlation functions do not necessarily have the same companion count distributions. We use this statistic to examine the environments of galaxies in the Sloan Digital Sky Survey Data Release 4 (SDSS DR4). We also make preliminary comparisons to four models for the spatial distributions of galaxies, based on N-body simulations and data from SDSS DR4, to study the utility of the counts-in-cylinders statistic. There is a very large scatter between the number of companions a galaxy has and the mass of its parent dark matter halo and the halo occupation, limiting the utility of this statistic for certain kinds of environmental studies. We also show that prevalent empirical models of galaxy clustering, that match observed two- and three-point clustering statistics well, fail to reproduce some aspects of the observed distribution of counts-in-cylinders on 1, 3, and 6 h -1 Mpc scales. All models that we explore underpredict the fraction of galaxies with few or no companions in 3 and 6 h -1 Mpc cylinders. Roughly 7% of galaxies in the real universe are significantly more isolated within a 6 h -1 Mpc cylinder than the galaxies in any of the models we use. Simple phenomenological models that map galaxies to dark matter halos fail to reproduce high-order clustering statistics in low-density environments.
Creative Destruction and Subjective Well-Being
Aghion, Philippe; Akcigit, Ufuk; Deaton, Angus; Roulet, Alexandra
2017-01-01
In this paper we analyze the relationship between turnover-driven growth and subjective wellbeing. Our model of innovation-led growth and unemployment predicts that: (i) the effect of creative destruction on expected individual welfare should be unambiguously positive if we control for unemployment, less so if we do not; (ii) job creation has a positive and job destruction has a negative impact on wellbeing; (iii) job destruction has a less negative impact in US Metropolitan Statistical Areas (MSA) within states with more generous unemployment insurance policies; (iv) job creation has a more positive effect on individuals that are more forward-looking. The empirical analysis using cross-sectional MSA-level and individual-level data provide empirical support to these predictions. PMID:28713168
Incorporating Yearly Derived Winter Wheat Maps Into Winter Wheat Yield Forecasting Model
NASA Technical Reports Server (NTRS)
Skakun, S.; Franch, B.; Roger, J.-C.; Vermote, E.; Becker-Reshef, I.; Justice, C.; Santamaría-Artigas, A.
2016-01-01
Wheat is one of the most important cereal crops in the world. Timely and accurate forecast of wheat yield and production at global scale is vital in implementing food security policy. Becker-Reshef et al. (2010) developed a generalized empirical model for forecasting winter wheat production using remote sensing data and official statistics. This model was implemented using static wheat maps. In this paper, we analyze the impact of incorporating yearly wheat masks into the forecasting model. We propose a new approach of producing in season winter wheat maps exploiting satellite data and official statistics on crop area only. Validation on independent data showed that the proposed approach reached 6% to 23% of omission error and 10% to 16% of commission error when mapping winter wheat 2-3 months before harvest. In general, we found a limited impact of using yearly winter wheat masks over a static mask for the study regions.
A General Model for Estimating Macroevolutionary Landscapes.
Boucher, Florian C; Démery, Vincent; Conti, Elena; Harmon, Luke J; Uyeda, Josef
2018-03-01
The evolution of quantitative characters over long timescales is often studied using stochastic diffusion models. The current toolbox available to students of macroevolution is however limited to two main models: Brownian motion and the Ornstein-Uhlenbeck process, plus some of their extensions. Here, we present a very general model for inferring the dynamics of quantitative characters evolving under both random diffusion and deterministic forces of any possible shape and strength, which can accommodate interesting evolutionary scenarios like directional trends, disruptive selection, or macroevolutionary landscapes with multiple peaks. This model is based on a general partial differential equation widely used in statistical mechanics: the Fokker-Planck equation, also known in population genetics as the Kolmogorov forward equation. We thus call the model FPK, for Fokker-Planck-Kolmogorov. We first explain how this model can be used to describe macroevolutionary landscapes over which quantitative traits evolve and, more importantly, we detail how it can be fitted to empirical data. Using simulations, we show that the model has good behavior both in terms of discrimination from alternative models and in terms of parameter inference. We provide R code to fit the model to empirical data using either maximum-likelihood or Bayesian estimation, and illustrate the use of this code with two empirical examples of body mass evolution in mammals. FPK should greatly expand the set of macroevolutionary scenarios that can be studied since it opens the way to estimating macroevolutionary landscapes of any conceivable shape. [Adaptation; bounds; diffusion; FPK model; macroevolution; maximum-likelihood estimation; MCMC methods; phylogenetic comparative data; selection.].
Graphic analysis and multifractal on percolation-based return interval series
NASA Astrophysics Data System (ADS)
Pei, A. Q.; Wang, J.
2015-05-01
A financial time series model is developed and investigated by the oriented percolation system (one of the statistical physics systems). The nonlinear and statistical behaviors of the return interval time series are studied for the proposed model and the real stock market by applying visibility graph (VG) and multifractal detrended fluctuation analysis (MF-DFA). We investigate the fluctuation behaviors of return intervals of the model for different parameter settings, and also comparatively study these fluctuation patterns with those of the real financial data for different threshold values. The empirical research of this work exhibits the multifractal features for the corresponding financial time series. Further, the VGs deviated from both of the simulated data and the real data show the behaviors of small-world, hierarchy, high clustering and power-law tail for the degree distributions.
ERIC Educational Resources Information Center
Al-Owidha, Amjed; Green, Kathy E.; Kroger, Jane
2009-01-01
The question of whether or not a developmental continuum underlies James Marcia's identity statuses has been a topic of debate among identity researchers for nearly 20 years. This study addressed the prefatory question of whether the identity statuses can be empirically ordered in a theoretically optimal way. This question was addressed via use of…
Texture classification using autoregressive filtering
NASA Technical Reports Server (NTRS)
Lawton, W. M.; Lee, M.
1984-01-01
A general theory of image texture models is proposed and its applicability to the problem of scene segmentation using texture classification is discussed. An algorithm, based on half-plane autoregressive filtering, which optimally utilizes second order statistics to discriminate between texture classes represented by arbitrary wide sense stationary random fields is described. Empirical results of applying this algorithm to natural and sysnthesized scenes are presented and future research is outlined.
Evaluation of cancer mortality in a cohort of workers exposed to low-level radiation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lea, C.S.
1995-12-01
The purpose of this dissertation was to re-analyze existing data to explore methodologic approaches that may determine whether excess cancer mortality in the ORNL cohort can be explained by time-related factors not previously considered; grouping of cancer outcomes; selection bias due to choice of method selected to incorporate an empirical induction period; or the type of statistical model chosen.
Teodoro, Douglas; Lovis, Christian
2013-01-01
Background Antibiotic resistance is a major worldwide public health concern. In clinical settings, timely antibiotic resistance information is key for care providers as it allows appropriate targeted treatment or improved empirical treatment when the specific results of the patient are not yet available. Objective To improve antibiotic resistance trend analysis algorithms by building a novel, fully data-driven forecasting method from the combination of trend extraction and machine learning models for enhanced biosurveillance systems. Methods We investigate a robust model for extraction and forecasting of antibiotic resistance trends using a decade of microbiology data. Our method consists of breaking down the resistance time series into independent oscillatory components via the empirical mode decomposition technique. The resulting waveforms describing intrinsic resistance trends serve as the input for the forecasting algorithm. The algorithm applies the delay coordinate embedding theorem together with the k-nearest neighbor framework to project mappings from past events into the future dimension and estimate the resistance levels. Results The algorithms that decompose the resistance time series and filter out high frequency components showed statistically significant performance improvements in comparison with a benchmark random walk model. We present further qualitative use-cases of antibiotic resistance trend extraction, where empirical mode decomposition was applied to highlight the specificities of the resistance trends. Conclusion The decomposition of the raw signal was found not only to yield valuable insight into the resistance evolution, but also to produce novel models of resistance forecasters with boosted prediction performance, which could be utilized as a complementary method in the analysis of antibiotic resistance trends. PMID:23637796
Christensen, Karl Bang; Makransky, Guido; Horton, Mike
2016-01-01
The assumption of local independence is central to all item response theory (IRT) models. Violations can lead to inflated estimates of reliability and problems with construct validity. For the most widely used fit statistic Q3, there are currently no well-documented suggestions of the critical values which should be used to indicate local dependence (LD), and for this reason, a variety of arbitrary rules of thumb are used. In this study, an empirical data example and Monte Carlo simulation were used to investigate the different factors that can influence the null distribution of residual correlations, with the objective of proposing guidelines that researchers and practitioners can follow when making decisions about LD during scale development and validation. A parametric bootstrapping procedure should be implemented in each separate situation to obtain the critical value of LD applicable to the data set, and provide example critical values for a number of data structure situations. The results show that for the Q3 fit statistic, no single critical value is appropriate for all situations, as the percentiles in the empirical null distribution are influenced by the number of items, the sample size, and the number of response categories. Furthermore, the results show that LD should be considered relative to the average observed residual correlation, rather than to a uniform value, as this results in more stable percentiles for the null distribution of an adjusted fit statistic. PMID:29881087
NASA Astrophysics Data System (ADS)
Batac, Rene C.; Paguirigan, Antonino A., Jr.; Tarun, Anjali B.; Longjas, Anthony G.
2017-04-01
We propose a cellular automata model for earthquake occurrences patterned after the sandpile model of self-organized criticality (SOC). By incorporating a single parameter describing the probability to target the most susceptible site, the model successfully reproduces the statistical signatures of seismicity. The energy distributions closely follow power-law probability density functions (PDFs) with a scaling exponent of around -1. 6, consistent with the expectations of the Gutenberg-Richter (GR) law, for a wide range of the targeted triggering probability values. Additionally, for targeted triggering probabilities within the range 0.004-0.007, we observe spatiotemporal distributions that show bimodal behavior, which is not observed previously for the original sandpile. For this critical range of values for the probability, model statistics show remarkable comparison with long-period empirical data from earthquakes from different seismogenic regions. The proposed model has key advantages, the foremost of which is the fact that it simultaneously captures the energy, space, and time statistics of earthquakes by just introducing a single parameter, while introducing minimal parameters in the simple rules of the sandpile. We believe that the critical targeting probability parameterizes the memory that is inherently present in earthquake-generating regions.
Right-Sizing Statistical Models for Longitudinal Data
Wood, Phillip K.; Steinley, Douglas; Jackson, Kristina M.
2015-01-01
Arguments are proposed that researchers using longitudinal data should consider more and less complex statistical model alternatives to their initially chosen techniques in an effort to “right-size” the model to the data at hand. Such model comparisons may alert researchers who use poorly fitting overly parsimonious models to more complex better fitting alternatives, and, alternatively, may identify more parsimonious alternatives to overly complex (and perhaps empirically under-identified and/or less powerful) statistical models. A general framework is proposed for considering (often nested) relationships between a variety of psychometric and growth curve models. A three-step approach is proposed in which models are evaluated based on the number and patterning of variance components prior to selection of better-fitting growth models that explain both mean and variation/covariation patterns. The orthogonal, free-curve slope-intercept (FCSI) growth model is considered as a general model which includes, as special cases, many models including the Factor Mean model (FM, McArdle & Epstein, 1987), McDonald's (1967) linearly constrained factor model, Hierarchical Linear Models (HLM), Repeated Measures MANOVA, and the Linear Slope Intercept (LinearSI) Growth Model. The FCSI model, in turn, is nested within the Tuckerized factor model. The approach is illustrated by comparing alternative models in a longitudinal study of children's vocabulary and by comparison of several candidate parametric growth and chronometric models in a Monte Carlo study. PMID:26237507
Right-sizing statistical models for longitudinal data.
Wood, Phillip K; Steinley, Douglas; Jackson, Kristina M
2015-12-01
Arguments are proposed that researchers using longitudinal data should consider more and less complex statistical model alternatives to their initially chosen techniques in an effort to "right-size" the model to the data at hand. Such model comparisons may alert researchers who use poorly fitting, overly parsimonious models to more complex, better-fitting alternatives and, alternatively, may identify more parsimonious alternatives to overly complex (and perhaps empirically underidentified and/or less powerful) statistical models. A general framework is proposed for considering (often nested) relationships between a variety of psychometric and growth curve models. A 3-step approach is proposed in which models are evaluated based on the number and patterning of variance components prior to selection of better-fitting growth models that explain both mean and variation-covariation patterns. The orthogonal free curve slope intercept (FCSI) growth model is considered a general model that includes, as special cases, many models, including the factor mean (FM) model (McArdle & Epstein, 1987), McDonald's (1967) linearly constrained factor model, hierarchical linear models (HLMs), repeated-measures multivariate analysis of variance (MANOVA), and the linear slope intercept (linearSI) growth model. The FCSI model, in turn, is nested within the Tuckerized factor model. The approach is illustrated by comparing alternative models in a longitudinal study of children's vocabulary and by comparing several candidate parametric growth and chronometric models in a Monte Carlo study. (c) 2015 APA, all rights reserved).
Wu, Baolin
2006-02-15
Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p > n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the 'large p small n' is over-fitting. Just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in the microarray data analysis. The SAM statistics proposed by Tusher et al. and the 'nearest shrunken centroid' proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple, intuitive and prove to be useful in empirical studies. Recently Wu proposed the penalized t/F-statistics with shrinkage by formally using the (1) penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discussed the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using the (1) penalized regression models. And we show that the penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection.
NASA Astrophysics Data System (ADS)
Baral, P.; Haq, M. A.; Mangan, P.
2017-12-01
The impacts of climate change on extent of permafrost degradation in the Himalayas and its effect upon the carbon cycle and ecosystem changes are not well understood due to lack of historical ground-based observations. We have used high resolution optical and satellite radar observations and applied empirical-statistical methods for the estimation of spatial and altitudinal limits of permafrost distribution in North-Western Himalayas. Visual interpretations of morphological characteristics using high resolution optical images have been used for mapping, identification and classification of distinctive geomorphological landforms. Subsequently, we have created a detail inventory of different types of rock glaciers and studied the contribution of topo climatic factors in their occurrence and distribution through Logistic Regression modelling. This model establishes the relationship between presence of permafrost and topo-climatic factors like Mean Annual Air Temperature (MAAT), Potential Incoming Solar Radiation (PISR), altitude, aspect and slope. This relationship has been used to estimate the distributed probability of permafrost occurrence, within a GIS environment. The ability of the model to predict permafrost occurrence has been tested using locations of mapped rock glaciers and the area under the Receiver Operating Characteristic (ROC) curve. Additionally, interferometric properties of Sentinel and ALOS PALSAR datasets are used for the identification and assessment of rock glacier activity in the region.
Ghosh, Sujit K
2010-01-01
Bayesian methods are rapidly becoming popular tools for making statistical inference in various fields of science including biology, engineering, finance, and genetics. One of the key aspects of Bayesian inferential method is its logical foundation that provides a coherent framework to utilize not only empirical but also scientific information available to a researcher. Prior knowledge arising from scientific background, expert judgment, or previously collected data is used to build a prior distribution which is then combined with current data via the likelihood function to characterize the current state of knowledge using the so-called posterior distribution. Bayesian methods allow the use of models of complex physical phenomena that were previously too difficult to estimate (e.g., using asymptotic approximations). Bayesian methods offer a means of more fully understanding issues that are central to many practical problems by allowing researchers to build integrated models based on hierarchical conditional distributions that can be estimated even with limited amounts of data. Furthermore, advances in numerical integration methods, particularly those based on Monte Carlo methods, have made it possible to compute the optimal Bayes estimators. However, there is a reasonably wide gap between the background of the empirically trained scientists and the full weight of Bayesian statistical inference. Hence, one of the goals of this chapter is to bridge the gap by offering elementary to advanced concepts that emphasize linkages between standard approaches and full probability modeling via Bayesian methods.
A study on some urban bus transport networks
NASA Astrophysics Data System (ADS)
Chen, Yong-Zhou; Li, Nan; He, Da-Ren
2007-03-01
In this paper, we present the empirical investigation results on the urban bus transport networks (BTNs) of four major cities in China. In BTN, nodes are bus stops. Two nodes are connected by an edge when the stops are serviced by a common bus route. The empirical results show that the degree distributions of BTNs take exponential function forms. Other two statistical properties of BTNs are also considered, and they are suggested as the distributions of so-called “the number of stops in a bus route” (represented by S) and “the number of bus routes a stop joins” (by R). The distributions of R also show exponential function forms, while the distributions of S follow asymmetric, unimodal functions. To explain these empirical results and attempt to simulate a possible evolution process of BTN, we introduce a model by which the analytic and numerical result obtained agrees well with the empirical facts. Finally, we also discuss some other possible evolution cases, where the degree distribution shows a power law or an interpolation between the power law and the exponential decay.
Nowcasting sunshine number using logistic modeling
NASA Astrophysics Data System (ADS)
Brabec, Marek; Badescu, Viorel; Paulescu, Marius
2013-04-01
In this paper, we present a formalized approach to statistical modeling of the sunshine number, binary indicator of whether the Sun is covered by clouds introduced previously by Badescu (Theor Appl Climatol 72:127-136, 2002). Our statistical approach is based on Markov chain and logistic regression and yields fully specified probability models that are relatively easily identified (and their unknown parameters estimated) from a set of empirical data (observed sunshine number and sunshine stability number series). We discuss general structure of the model and its advantages, demonstrate its performance on real data and compare its results to classical ARIMA approach as to a competitor. Since the model parameters have clear interpretation, we also illustrate how, e.g., their inter-seasonal stability can be tested. We conclude with an outlook to future developments oriented to construction of models allowing for practically desirable smooth transition between data observed with different frequencies and with a short discussion of technical problems that such a goal brings.
Protein and gene model inference based on statistical modeling in k-partite graphs.
Gerster, Sarah; Qeli, Ermir; Ahrens, Christian H; Bühlmann, Peter
2010-07-06
One of the major goals of proteomics is the comprehensive and accurate description of a proteome. Shotgun proteomics, the method of choice for the analysis of complex protein mixtures, requires that experimentally observed peptides are mapped back to the proteins they were derived from. This process is also known as protein inference. We present Markovian Inference of Proteins and Gene Models (MIPGEM), a statistical model based on clearly stated assumptions to address the problem of protein and gene model inference for shotgun proteomics data. In particular, we are dealing with dependencies among peptides and proteins using a Markovian assumption on k-partite graphs. We are also addressing the problems of shared peptides and ambiguous proteins by scoring the encoding gene models. Empirical results on two control datasets with synthetic mixtures of proteins and on complex protein samples of Saccharomyces cerevisiae, Drosophila melanogaster, and Arabidopsis thaliana suggest that the results with MIPGEM are competitive with existing tools for protein inference.
Empirical Bayes scan statistics for detecting clusters of disease risk variants in genetic studies.
McCallum, Kenneth J; Ionita-Laza, Iuliana
2015-12-01
Recent developments of high-throughput genomic technologies offer an unprecedented detailed view of the genetic variation in various human populations, and promise to lead to significant progress in understanding the genetic basis of complex diseases. Despite this tremendous advance in data generation, it remains very challenging to analyze and interpret these data due to their sparse and high-dimensional nature. Here, we propose novel applications and new developments of empirical Bayes scan statistics to identify genomic regions significantly enriched with disease risk variants. We show that the proposed empirical Bayes methodology can be substantially more powerful than existing scan statistics methods especially so in the presence of many non-disease risk variants, and in situations when there is a mixture of risk and protective variants. Furthermore, the empirical Bayes approach has greater flexibility to accommodate covariates such as functional prediction scores and additional biomarkers. As proof-of-concept we apply the proposed methods to a whole-exome sequencing study for autism spectrum disorders and identify several promising candidate genes. © 2015, The International Biometric Society.
A simple model for calculating air pollution within street canyons
NASA Astrophysics Data System (ADS)
Venegas, Laura E.; Mazzeo, Nicolás A.; Dezzutti, Mariana C.
2014-04-01
This paper introduces the Semi-Empirical Urban Street (SEUS) model. SEUS is a simple mathematical model based on the scaling of air pollution concentration inside street canyons employing the emission rate, the width of the canyon, the dispersive velocity scale and the background concentration. Dispersive velocity scale depends on turbulent motions related to wind and traffic. The parameterisations of these turbulent motions include two dimensionless empirical parameters. Functional forms of these parameters have been obtained from full scale data measured in street canyons at four European cities. The sensitivity of SEUS model is studied analytically. Results show that relative errors in the evaluation of the two dimensionless empirical parameters have less influence on model uncertainties than uncertainties in other input variables. The model estimates NO2 concentrations using a simple photochemistry scheme. SEUS is applied to estimate NOx and NO2 hourly concentrations in an irregular and busy street canyon in the city of Buenos Aires. The statistical evaluation of results shows that there is a good agreement between estimated and observed hourly concentrations (e.g. fractional bias are -10.3% for NOx and +7.8% for NO2). The agreement between the estimated and observed values has also been analysed in terms of its dependence on wind speed and direction. The model shows a better performance for wind speeds >2 m s-1 than for lower wind speeds and for leeward situations than for others. No significant discrepancies have been found between the results of the proposed model and that of a widely used operational dispersion model (OSPM), both using the same input information.
Reconstruction of stochastic temporal networks through diffusive arrival times
NASA Astrophysics Data System (ADS)
Li, Xun; Li, Xiang
2017-06-01
Temporal networks have opened a new dimension in defining and quantification of complex interacting systems. Our ability to identify and reproduce time-resolved interaction patterns is, however, limited by the restricted access to empirical individual-level data. Here we propose an inverse modelling method based on first-arrival observations of the diffusion process taking place on temporal networks. We describe an efficient coordinate-ascent implementation for inferring stochastic temporal networks that builds in particular but not exclusively on the null model assumption of mutually independent interaction sequences at the dyadic level. The results of benchmark tests applied on both synthesized and empirical network data sets confirm the validity of our algorithm, showing the feasibility of statistically accurate inference of temporal networks only from moderate-sized samples of diffusion cascades. Our approach provides an effective and flexible scheme for the temporally augmented inverse problems of network reconstruction and has potential in a broad variety of applications.
Empirical Investigations of the Opportunity Limits of Automatic Residential Electric Load Shaping
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cruickshank, Robert F.; Henze, Gregor P.; Balaji, Rajagopalan
Residential electric load shaping is often modeled as infrequent, utility-initiated, short-duration deferral of peak demand through direct load control. In contrast, modeled herein is the potential for frequent, transactive, intraday, consumer-configurable load shaping for storage-capable thermostatically controlled electric loads (TCLs), including refrigerators, freezers, and hot water heaters. Unique to this study are 28 months of 15-minute-interval observations of usage in 101 homes in the Pacific Northwest United States that specify exact start, duration, and usage patterns of approximately 25 submetered loads per home. The magnitudes of the load shift from voluntarily-participating TCL appliances are aggregated to form hourly upper andmore » lower load-shaping limits for the coordination of electrical generation, transmission, distribution, storage, and demand. Empirical data are statistically analyzed to define metrics that help quantify load-shaping opportunities.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cruickshank, Robert F.; Henze, Gregor P.; Balaji, Rajagopalan
Residential electric load shaping is often modeled as infrequent, utility-initiated, short-duration deferral of peak demand through direct load control. In contrast, modeled herein is the potential for frequent, transactive, intraday, consumer-configurable load shaping for storage-capable thermostatically controlled electric loads (TCLs), including refrigerators, freezers, and hot water heaters. Unique to this study are 28 months of 15-minute-interval observations of usage in 101 homes in the Pacific Northwest United States that specify exact start, duration, and usage patterns of approximately 25 submetered loads per home. The magnitudes of the load shift from voluntarily-participating TCL appliances are aggregated to form hourly upper andmore » lower load-shaping limits for the coordination of electrical generation, transmission, distribution, storage, and demand. Empirical data are statistically analyzed to define metrics that help quantify load-shaping opportunities.« less
Reconstruction of stochastic temporal networks through diffusive arrival times
Li, Xun; Li, Xiang
2017-01-01
Temporal networks have opened a new dimension in defining and quantification of complex interacting systems. Our ability to identify and reproduce time-resolved interaction patterns is, however, limited by the restricted access to empirical individual-level data. Here we propose an inverse modelling method based on first-arrival observations of the diffusion process taking place on temporal networks. We describe an efficient coordinate-ascent implementation for inferring stochastic temporal networks that builds in particular but not exclusively on the null model assumption of mutually independent interaction sequences at the dyadic level. The results of benchmark tests applied on both synthesized and empirical network data sets confirm the validity of our algorithm, showing the feasibility of statistically accurate inference of temporal networks only from moderate-sized samples of diffusion cascades. Our approach provides an effective and flexible scheme for the temporally augmented inverse problems of network reconstruction and has potential in a broad variety of applications. PMID:28604687
Woodward, Albert; Das, Abhik; Raskin, Ira E; Morgan-Lopez, Antonio A
2006-11-01
Data from the Alcohol and Drug Services Study (ADSS) are used to analyze the structure and operation of the substance abuse treatment industry in the United States. Published literature contains little systematic empirical analysis of the interaction between organizational characteristics and treatment outcomes. This paper addresses that deficit. It develops and tests a hierarchical linear model (HLM) to address questions about the empirical relationship between treatment inputs (industry costs, types and use of counseling and medical personnel, diagnosis mix, patient demographics, and the nature and level of services used in substance abuse treatment), and patient outcomes (retention and treatment completion rates). The paper adds to the literature by demonstrating a direct and statistically significant link between treatment completion and the organizational and staffing structure of the treatment setting. Related reimbursement issues, questions for future analysis, and limitations of the ADSS for this analysis are discussed.
On the galaxy–halo connection in the EAGLE simulation
Desmond, Harry; Mao, Yao -Yuan; Wechsler, Risa H.; ...
2017-06-13
Empirical models of galaxy formation require assumptions about the correlations between galaxy and halo properties. These may be calibrated against observations or inferred from physical models such as hydrodynamical simulations. In this Letter, we use the EAGLE simulation to investigate the correlation of galaxy size with halo properties. We motivate this analysis by noting that the common assumption of angular momentum partition between baryons and dark matter in rotationally supported galaxies overpredicts both the spread in the stellar mass–size relation and the anticorrelation of size and velocity residuals, indicating a problem with the galaxy–halo connection it implies. We find themore » EAGLE galaxy population to perform significantly better on both statistics, and trace this success to the weakness of the correlations of galaxy size with halo mass, concentration and spin at fixed stellar mass. Here by, using these correlations in empirical models will enable fine-grained aspects of galaxy scalings to be matched.« less
An empirical model for estimating solar radiation in the Algerian Sahara
NASA Astrophysics Data System (ADS)
Benatiallah, Djelloul; Benatiallah, Ali; Bouchouicha, Kada; Hamouda, Messaoud; Nasri, Bahous
2018-05-01
The present work aims to determine the empirical model R.sun that will allow us to evaluate the solar radiation flues on a horizontal plane and in clear-sky on the located Adrar city (27°18 N and 0°11 W) of Algeria and compare with the results measured at the localized site. The expected results of this comparison are of importance for the investment study of solar systems (solar power plants for electricity production, CSP) and also for the design and performance analysis of any system using the solar energy. Statistical indicators used to evaluate the accuracy of the model where the mean bias error (MBE), root mean square error (RMSE) and coefficient of determination. The results show that for global radiation, the daily correlation coefficient is 0.9984. The mean absolute percentage error is 9.44 %. The daily mean bias error is -7.94 %. The daily root mean square error is 12.31 %.
Samuel, Douglas B.; Widiger, Thomas A.
2008-01-01
Theory and research have suggested that the personality disorders contained within the American Psychiatric Association's Diagnostic and Statistical Manual of Mental Disorders (DSM-IV-TR) can be understood as maladaptive variants of the personality traits included within the five-factor model (FFM). The current meta-analysis of FFM personality disorder research both replicated and extended the 2004 work of Saulsman and Page (The five-factor model and personality disorder empirical literature: A meta-analytic review. Clinical Psychology Review, 23, 1055-1085) through a facet-level analysis that provides a more specific and nuanced description of each DSM-IV-TR personality disorder. The empirical FFM profiles generated for each personality disorder were generally congruent at the facet level with hypothesized FFM translations of the DSM-IV-TR personality disorders. However, notable exceptions to the hypotheses did occur and even some findings that were consistent with FFM theory could be said to be instrument specific. PMID:18708274
Modeling, simulation, and estimation of optical turbulence
NASA Astrophysics Data System (ADS)
Formwalt, Byron Paul
This dissertation documents three new contributions to simulation and modeling of optical turbulence. The first contribution is the formalization, optimization, and validation of a modeling technique called successively conditioned rendering (SCR). The SCR technique is empirically validated by comparing the statistical error of random phase screens generated with the technique. The second contribution is the derivation of the covariance delineation theorem, which provides theoretical bounds on the error associated with SCR. It is shown empirically that the theoretical bound may be used to predict relative algorithm performance. Therefore, the covariance delineation theorem is a powerful tool for optimizing SCR algorithms. For the third contribution, we introduce a new method for passively estimating optical turbulence parameters, and demonstrate the method using experimental data. The technique was demonstrated experimentally, using a 100 m horizontal path at 1.25 m above sun-heated tarmac on a clear afternoon. For this experiment, we estimated C2n ≈ 6.01 · 10-9 m-23 , l0 ≈ 17.9 mm, and L0 ≈ 15.5 m.
Estimating and Identifying Unspecified Correlation Structure for Longitudinal Data
Hu, Jianhua; Wang, Peng; Qu, Annie
2014-01-01
Identifying correlation structure is important to achieving estimation efficiency in analyzing longitudinal data, and is also crucial for drawing valid statistical inference for large size clustered data. In this paper, we propose a nonparametric method to estimate the correlation structure, which is applicable for discrete longitudinal data. We utilize eigenvector-based basis matrices to approximate the inverse of the empirical correlation matrix and determine the number of basis matrices via model selection. A penalized objective function based on the difference between the empirical and model approximation of the correlation matrices is adopted to select an informative structure for the correlation matrix. The eigenvector representation of the correlation estimation is capable of reducing the risk of model misspecification, and also provides useful information on the specific within-cluster correlation pattern of the data. We show that the proposed method possesses the oracle property and selects the true correlation structure consistently. The proposed method is illustrated through simulations and two data examples on air pollution and sonar signal studies. PMID:26361433
On the galaxy–halo connection in the EAGLE simulation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Desmond, Harry; Mao, Yao -Yuan; Wechsler, Risa H.
Empirical models of galaxy formation require assumptions about the correlations between galaxy and halo properties. These may be calibrated against observations or inferred from physical models such as hydrodynamical simulations. In this Letter, we use the EAGLE simulation to investigate the correlation of galaxy size with halo properties. We motivate this analysis by noting that the common assumption of angular momentum partition between baryons and dark matter in rotationally supported galaxies overpredicts both the spread in the stellar mass–size relation and the anticorrelation of size and velocity residuals, indicating a problem with the galaxy–halo connection it implies. We find themore » EAGLE galaxy population to perform significantly better on both statistics, and trace this success to the weakness of the correlations of galaxy size with halo mass, concentration and spin at fixed stellar mass. Here by, using these correlations in empirical models will enable fine-grained aspects of galaxy scalings to be matched.« less
EMERGE - an empirical model for the formation of galaxies since z ˜ 10
NASA Astrophysics Data System (ADS)
Moster, Benjamin P.; Naab, Thorsten; White, Simon D. M.
2018-06-01
We present EMERGE, an Empirical ModEl for the foRmation of GalaxiEs, describing the evolution of individual galaxies in large volumes from z ˜ 10 to the present day. We assign a star formation rate to each dark matter halo based on its growth rate, which specifies how much baryonic material becomes available, and the instantaneous baryon conversion efficiency, which determines how efficiently this material is converted to stars, thereby capturing the baryonic physics. Satellites are quenched following the delayed-then-rapid model, and they are tidally disrupted once their subhalo has lost a significant fraction of its mass. The model is constrained with observed data extending out to high redshift. The empirical relations are very flexible, and the model complexity is increased only if required by the data, assessed by several model selection statistics. We find that for the same final halo mass galaxies can have very different star formation histories. Galaxies that are quenched at z = 0 typically have a higher peak star formation rate compared to their star-forming counterparts. EMERGE predicts stellar-to-halo mass ratios for individual galaxies and introduces scatter self-consistently. We find that at fixed halo mass, passive galaxies have a higher stellar mass on average. The intracluster mass in massive haloes can be up to eight times larger than the mass of the central galaxy. Clustering for star-forming and quenched galaxies is in good agreement with observational constraints, indicating a realistic assignment of galaxies to haloes.
Teaching Integrity in Empirical Research: A Protocol for Documenting Data Management and Analysis
ERIC Educational Resources Information Center
Ball, Richard; Medeiros, Norm
2012-01-01
This article describes a protocol the authors developed for teaching undergraduates to document their statistical analyses for empirical research projects so that their results are completely reproducible and verifiable. The protocol is guided by the principle that the documentation prepared to accompany an empirical research project should be…
Empirical investigation into depth-resolution of Magnetotelluric data
NASA Astrophysics Data System (ADS)
Piana Agostinetti, N.; Ogaya, X.
2017-12-01
We investigate the depth-resolution of MT data comparing reconstructed 1D resistivity profiles with measured resistivity and lithostratigraphy from borehole data. Inversion of MT data has been widely used to reconstruct the 1D fine-layered resistivity structure beneath an isolated Magnetotelluric (MT) station. Uncorrelated noise is generally assumed to be associated to MT data. However, wrong assumptions on error statistics have been proved to strongly bias the results obtained in geophysical inversions. In particular the number of resolved layers at depth strongly depends on error statistics. In this study, we applied a trans-dimensional McMC algorithm for reconstructing the 1D resistivity profile near-by the location of a 1500 m-deep borehole, using MT data. We resolve the MT inverse problem imposing different models for the error statistics associated to the MT data. Following a Hierachical Bayes' approach, we also inverted for the hyper-parameters associated to each error statistics model. Preliminary results indicate that assuming un-correlated noise leads to a number of resolved layers larger than expected from the retrieved lithostratigraphy. Moreover, comparing the inversion of synthetic resistivity data obtained from the "true" resistivity stratification measured along the borehole shows that a consistent number of resistivity layers can be obtained using a Gaussian model for the error statistics, with substantial correlation length.
NASA Astrophysics Data System (ADS)
Balthazar, Vincent; Vanacker, Veerle; Lambin, Eric F.
2012-08-01
A topographic correction of optical remote sensing data is necessary to improve the quality of quantitative forest cover change analyses in mountainous terrain. The implementation of semi-empirical correction methods requires the calibration of model parameters that are empirically defined. This study develops a method to improve the performance of topographic corrections for forest cover change detection in mountainous terrain through an iterative tuning method of model parameters based on a systematic evaluation of the performance of the correction. The latter was based on: (i) the general matching of reflectances between sunlit and shaded slopes and (ii) the occurrence of abnormal reflectance values, qualified as statistical outliers, in very low illuminated areas. The method was tested on Landsat ETM+ data for rough (Ecuadorian Andes) and very rough mountainous terrain (Bhutan Himalayas). Compared to a reference level (no topographic correction), the ATCOR3 semi-empirical correction method resulted in a considerable reduction of dissimilarities between reflectance values of forested sites in different topographic orientations. Our results indicate that optimal parameter combinations are depending on the site, sun elevation and azimuth and spectral conditions. We demonstrate that the results of relatively simple topographic correction methods can be greatly improved through a feedback loop between parameter tuning and evaluation of the performance of the correction model.
Sul, Jae Hoon; Bilow, Michael; Yang, Wen-Yun; Kostem, Emrah; Furlotte, Nick; He, Dan; Eskin, Eleazar
2016-03-01
Although genome-wide association studies (GWASs) have discovered numerous novel genetic variants associated with many complex traits and diseases, those genetic variants typically explain only a small fraction of phenotypic variance. Factors that account for phenotypic variance include environmental factors and gene-by-environment interactions (GEIs). Recently, several studies have conducted genome-wide gene-by-environment association analyses and demonstrated important roles of GEIs in complex traits. One of the main challenges in these association studies is to control effects of population structure that may cause spurious associations. Many studies have analyzed how population structure influences statistics of genetic variants and developed several statistical approaches to correct for population structure. However, the impact of population structure on GEI statistics in GWASs has not been extensively studied and nor have there been methods designed to correct for population structure on GEI statistics. In this paper, we show both analytically and empirically that population structure may cause spurious GEIs and use both simulation and two GWAS datasets to support our finding. We propose a statistical approach based on mixed models to account for population structure on GEI statistics. We find that our approach effectively controls population structure on statistics for GEIs as well as for genetic variants.
Visual aftereffects and sensory nonlinearities from a single statistical framework
Laparra, Valero; Malo, Jesús
2015-01-01
When adapted to a particular scenery our senses may fool us: colors are misinterpreted, certain spatial patterns seem to fade out, and static objects appear to move in reverse. A mere empirical description of the mechanisms tuned to color, texture, and motion may tell us where these visual illusions come from. However, such empirical models of gain control do not explain why these mechanisms work in this apparently dysfunctional manner. Current normative explanations of aftereffects based on scene statistics derive gain changes by (1) invoking decorrelation and linear manifold matching/equalization, or (2) using nonlinear divisive normalization obtained from parametric scene models. These principled approaches have different drawbacks: the first is not compatible with the known saturation nonlinearities in the sensors and it cannot fully accomplish information maximization due to its linear nature. In the second, gain change is almost determined a priori by the assumed parametric image model linked to divisive normalization. In this study we show that both the response changes that lead to aftereffects and the nonlinear behavior can be simultaneously derived from a single statistical framework: the Sequential Principal Curves Analysis (SPCA). As opposed to mechanistic models, SPCA is not intended to describe how physiological sensors work, but it is focused on explaining why they behave as they do. Nonparametric SPCA has two key advantages as a normative model of adaptation: (i) it is better than linear techniques as it is a flexible equalization that can be tuned for more sensible criteria other than plain decorrelation (either full information maximization or error minimization); and (ii) it makes no a priori functional assumption regarding the nonlinearity, so the saturations emerge directly from the scene data and the goal (and not from the assumed function). It turns out that the optimal responses derived from these more sensible criteria and SPCA are consistent with dysfunctional behaviors such as aftereffects. PMID:26528165
Probabilistic empirical prediction of seasonal climate: evaluation and potential applications
NASA Astrophysics Data System (ADS)
Dieppois, B.; Eden, J.; van Oldenborgh, G. J.
2017-12-01
Preparing for episodes with risks of anomalous weather a month to a year ahead is an important challenge for governments, non-governmental organisations, and private companies and is dependent on the availability of reliable forecasts. The majority of operational seasonal forecasts are made using process-based dynamical models, which are complex, computationally challenging and prone to biases. Empirical forecast approaches built on statistical models to represent physical processes offer an alternative to dynamical systems and can provide either a benchmark for comparison or independent supplementary forecasts. Here, we present a new evaluation of an established empirical system used to predict seasonal climate across the globe. Forecasts for surface air temperature, precipitation and sea level pressure are produced by the KNMI Probabilistic Empirical Prediction (K-PREP) system every month and disseminated via the KNMI Climate Explorer (climexp.knmi.nl). K-PREP is based on multiple linear regression and built on physical principles to the fullest extent with predictive information taken from the global CO2-equivalent concentration, large-scale modes of variability in the climate system and regional-scale information. K-PREP seasonal forecasts for the period 1981-2016 will be compared with corresponding dynamically generated forecasts produced by operational forecast systems. While there are many regions of the world where empirical forecast skill is extremely limited, several areas are identified where K-PREP offers comparable skill to dynamical systems. We discuss two key points in the future development and application of the K-PREP system: (a) the potential for K-PREP to provide a more useful basis for reference forecasts than those based on persistence or climatology, and (b) the added value of including K-PREP forecast information in multi-model forecast products, at least for known regions of good skill. We also discuss the potential development of stakeholder-driven applications of the K-PREP system, including empirical forecasts for circumboreal fire activity.
Schwindt, Adam R; Winkelman, Dana L
2016-09-01
Urban freshwater streams in arid climates are wastewater effluent dominated ecosystems particularly impacted by bioactive chemicals including steroid estrogens that disrupt vertebrate reproduction. However, more understanding of the population and ecological consequences of exposure to wastewater effluent is needed. We used empirically derived vital rate estimates from a mesocosm study to develop a stochastic stage-structured population model and evaluated the effect of 17α-ethinylestradiol (EE2), the estrogen in human contraceptive pills, on fathead minnow Pimephales promelas stochastic population growth rate. Tested EE2 concentrations ranged from 3.2 to 10.9 ng L(-1) and produced stochastic population growth rates (λ S ) below 1 at the lowest concentration, indicating potential for population decline. Declines in λ S compared to controls were evident in treatments that were lethal to adult males despite statistically insignificant effects on egg production and juvenile recruitment. In fact, results indicated that λ S was most sensitive to the survival of juveniles and female egg production. More broadly, our results document that population model results may differ even when empirically derived estimates of vital rates are similar among experimental treatments, and demonstrate how population models integrate and project the effects of stressors throughout the life cycle. Thus, stochastic population models can more effectively evaluate the ecological consequences of experimentally derived vital rates.
Synthetic Earthquake Statistics From Physical Fault Models for the Lower Rhine Embayment
NASA Astrophysics Data System (ADS)
Brietzke, G. B.; Hainzl, S.; Zöller, G.
2012-04-01
As of today, seismic risk and hazard estimates mostly use pure empirical, stochastic models of earthquake fault systems tuned specifically to the vulnerable areas of interest. Although such models allow for reasonable risk estimates they fail to provide a link between the observed seismicity and the underlying physical processes. Solving a state-of-the-art fully dynamic description set of all relevant physical processes related to earthquake fault systems is likely not useful since it comes with a large number of degrees of freedom, poor constraints on its model parameters and a huge computational effort. Here, quasi-static and quasi-dynamic physical fault simulators provide a compromise between physical completeness and computational affordability and aim at providing a link between basic physical concepts and statistics of seismicity. Within the framework of quasi-static and quasi-dynamic earthquake simulators we investigate a model of the Lower Rhine Embayment (LRE) that is based upon seismological and geological data. We present and discuss statistics of the spatio-temporal behavior of generated synthetic earthquake catalogs with respect to simplification (e.g. simple two-fault cases) as well as to complication (e.g. hidden faults, geometric complexity, heterogeneities of constitutive parameters).
Improved estimation of PM2.5 using Lagrangian satellite-measured aerosol optical depth
NASA Astrophysics Data System (ADS)
Olivas Saunders, Rolando
Suspended particulate matter (aerosols) with aerodynamic diameters less than 2.5 mum (PM2.5) has negative effects on human health, plays an important role in climate change and also causes the corrosion of structures by acid deposition. Accurate estimates of PM2.5 concentrations are thus relevant in air quality, epidemiology, cloud microphysics and climate forcing studies. Aerosol optical depth (AOD) retrieved by the Moderate Resolution Imaging Spectroradiometer (MODIS) satellite instrument has been used as an empirical predictor to estimate ground-level concentrations of PM2.5 . These estimates usually have large uncertainties and errors. The main objective of this work is to assess the value of using upwind (Lagrangian) MODIS-AOD as predictors in empirical models of PM2.5. The upwind locations of the Lagrangian AOD were estimated using modeled backward air trajectories. Since the specification of an arrival elevation is somewhat arbitrary, trajectories were calculated to arrive at four different elevations at ten measurement sites within the continental United States. A systematic examination revealed trajectory model calculations to be sensitive to starting elevation. With a 500 m difference in starting elevation, the 48-hr mean horizontal separation of trajectory endpoints was 326 km. When the difference in starting elevation was doubled and tripled to 1000 m and 1500m, the mean horizontal separation of trajectory endpoints approximately doubled and tripled to 627 km and 886 km, respectively. A seasonal dependence of this sensitivity was also found: the smallest mean horizontal separation of trajectory endpoints was exhibited during the summer and the largest separations during the winter. A daily average AOD product was generated and coupled to the trajectory model in order to determine AOD values upwind of the measurement sites during the period 2003-2007. Empirical models that included in situ AOD and upwind AOD as predictors of PM2.5 were generated by multivariate linear regressions using the least squares method. The multivariate models showed improved performance over the single variable regression (PM2.5 and in situ AOD) models. The statistical significance of the improvement of the multivariate models over the single variable regression models was tested using the extra sum of squares principle. In many cases, even when the R-squared was high for the multivariate models, the improvement over the single models was not statistically significant. The R-squared of these multivariate models varied with respect to seasons, with the best performance occurring during the summer months. A set of seasonal categorical variables was included in the regressions to exploit this variability. The multivariate regression models that included these categorical seasonal variables performed better than the models that didn't account for seasonal variability. Furthermore, 71% of these regressions exhibited improvement over the single variable models that was statistically significant at a 95% confidence level.
Model of mobile agents for sexual interactions networks
NASA Astrophysics Data System (ADS)
González, M. C.; Lind, P. G.; Herrmann, H. J.
2006-02-01
We present a novel model to simulate real social networks of complex interactions, based in a system of colliding particles (agents). The network is build by keeping track of the collisions and evolves in time with correlations which emerge due to the mobility of the agents. Therefore, statistical features are a consequence only of local collisions among its individual agents. Agent dynamics is realized by an event-driven algorithm of collisions where energy is gained as opposed to physical systems which have dissipation. The model reproduces empirical data from networks of sexual interactions, not previously obtained with other approaches.
Stochastic modelling of non-stationary financial assets
NASA Astrophysics Data System (ADS)
Estevens, Joana; Rocha, Paulo; Boto, João P.; Lind, Pedro G.
2017-11-01
We model non-stationary volume-price distributions with a log-normal distribution and collect the time series of its two parameters. The time series of the two parameters are shown to be stationary and Markov-like and consequently can be modelled with Langevin equations, which are derived directly from their series of values. Having the evolution equations of the log-normal parameters, we reconstruct the statistics of the first moments of volume-price distributions which fit well the empirical data. Finally, the proposed framework is general enough to study other non-stationary stochastic variables in other research fields, namely, biology, medicine, and geology.
The non-equilibrium nature of culinary evolution
NASA Astrophysics Data System (ADS)
Kinouchi, Osame; Diez-Garcia, Rosa W.; Holanda, Adriano J.; Zambianchi, Pedro; Roque, Antonio C.
2008-07-01
Food is an essential part of civilization, with a scope that ranges from the biological to the economic and cultural levels. Here, we study the statistics of ingredients and recipes taken from Brazilian, British, French and Medieval cookery books. We find universal distributions with scale invariant behaviour. We propose a copy-mutate process to model culinary evolution that fits our empirical data very well. We find a cultural 'founder effect' produced by the non-equilibrium dynamics of the model. Both the invariant and idiosyncratic aspects of culture are accounted for by our model, which may have applications in other kinds of evolutionary processes.
NASA Astrophysics Data System (ADS)
Mohite, A. R.; Beria, H.; Behera, A. K.; Chatterjee, C.; Singh, R.
2016-12-01
Flood forecasting using hydrological models is an important and cost-effective non-structural flood management measure. For forecasting at short lead times, empirical models using real-time precipitation estimates have proven to be reliable. However, their skill depreciates with increasing lead time. Coupling a hydrologic model with real-time rainfall forecasts issued from numerical weather prediction (NWP) systems could increase the lead time substantially. In this study, we compared 1-5 days precipitation forecasts from India Meteorological Department (IMD) Multi-Model Ensemble (MME) with European Center for Medium Weather forecast (ECMWF) NWP forecasts for over 86 major river basins in India. We then evaluated the hydrologic utility of these forecasts over Basantpur catchment (approx. 59,000 km2) of the Mahanadi River basin. Coupled MIKE 11 RR (NAM) and MIKE 11 hydrodynamic (HD) models were used for the development of flood forecast system (FFS). RR model was calibrated using IMD station rainfall data. Cross-sections extracted from SRTM 30 were used as input to the MIKE 11 HD model. IMD started issuing operational MME forecasts from the year 2008, and hence, both the statistical and hydrologic evaluation were carried out from 2008-2014. The performance of FFS was evaluated using both the NWP datasets separately for the year 2011, which was a large flood year in Mahanadi River basin. We will present figures and metrics for statistical (threshold based statistics, skill in terms of correlation and bias) and hydrologic (Nash Sutcliffe efficiency, mean and peak error statistics) evaluation. The statistical evaluation will be at pan-India scale for all the major river basins and the hydrologic evaluation will be for the Basantpur catchment of the Mahanadi River basin.
Colloquium: Statistical mechanics of money, wealth, and income
NASA Astrophysics Data System (ADS)
Yakovenko, Victor M.; Rosser, J. Barkley, Jr.
2009-10-01
This Colloquium reviews statistical models for money, wealth, and income distributions developed in the econophysics literature since the late 1990s. By analogy with the Boltzmann-Gibbs distribution of energy in physics, it is shown that the probability distribution of money is exponential for certain classes of models with interacting economic agents. Alternative scenarios are also reviewed. Data analysis of the empirical distributions of wealth and income reveals a two-class distribution. The majority of the population belongs to the lower class, characterized by the exponential (“thermal”) distribution, whereas a small fraction of the population in the upper class is characterized by the power-law (“superthermal”) distribution. The lower part is very stable, stationary in time, whereas the upper part is highly dynamical and out of equilibrium.
What Can Be Learned from Inverse Statistics?
NASA Astrophysics Data System (ADS)
Ahlgren, Peter Toke Heden; Dahl, Henrik; Jensen, Mogens Høgh; Simonsen, Ingve
One stylized fact of financial markets is an asymmetry between the most likely time to profit and to loss. This gain-loss asymmetry is revealed by inverse statistics, a method closely related to empirically finding first passage times. Many papers have presented evidence about the asymmetry, where it appears and where it does not. Also, various interpretations and explanations for the results have been suggested. In this chapter, we review the published results and explanations. We also examine the results and show that some are at best fragile. Similarly, we discuss the suggested explanations and propose a new model based on Gaussian mixtures. Apart from explaining the gain-loss asymmetry, this model also has the potential to explain other stylized facts such as volatility clustering, fat tails, and power law behavior of returns.
Power laws in citation distributions: evidence from Scopus.
Brzezinski, Michal
Modeling distributions of citations to scientific papers is crucial for understanding how science develops. However, there is a considerable empirical controversy on which statistical model fits the citation distributions best. This paper is concerned with rigorous empirical detection of power-law behaviour in the distribution of citations received by the most highly cited scientific papers. We have used a large, novel data set on citations to scientific papers published between 1998 and 2002 drawn from Scopus. The power-law model is compared with a number of alternative models using a likelihood ratio test. We have found that the power-law hypothesis is rejected for around half of the Scopus fields of science. For these fields of science, the Yule, power-law with exponential cut-off and log-normal distributions seem to fit the data better than the pure power-law model. On the other hand, when the power-law hypothesis is not rejected, it is usually empirically indistinguishable from most of the alternative models. The pure power-law model seems to be the best model only for the most highly cited papers in "Physics and Astronomy". Overall, our results seem to support theories implying that the most highly cited scientific papers follow the Yule, power-law with exponential cut-off or log-normal distribution. Our findings suggest also that power laws in citation distributions, when present, account only for a very small fraction of the published papers (less than 1 % for most of science fields) and that the power-law scaling parameter (exponent) is substantially higher (from around 3.2 to around 4.7) than found in the older literature.
Chen, Baojiang; Qin, Jing
2014-05-10
In statistical analysis, a regression model is needed if one is interested in finding the relationship between a response variable and covariates. When the response depends on the covariate, then it may also depend on the function of this covariate. If one has no knowledge of this functional form but expect for monotonic increasing or decreasing, then the isotonic regression model is preferable. Estimation of parameters for isotonic regression models is based on the pool-adjacent-violators algorithm (PAVA), where the monotonicity constraints are built in. With missing data, people often employ the augmented estimating method to improve estimation efficiency by incorporating auxiliary information through a working regression model. However, under the framework of the isotonic regression model, the PAVA does not work as the monotonicity constraints are violated. In this paper, we develop an empirical likelihood-based method for isotonic regression model to incorporate the auxiliary information. Because the monotonicity constraints still hold, the PAVA can be used for parameter estimation. Simulation studies demonstrate that the proposed method can yield more efficient estimates, and in some situations, the efficiency improvement is substantial. We apply this method to a dementia study. Copyright © 2013 John Wiley & Sons, Ltd.
NASA Technical Reports Server (NTRS)
Gallagher, Dennis L.; Craven, Paul D.; Comfort, Richard H.
1999-01-01
Over 40 years of ground and spacecraft plasmaspheric measurements have resulted in many statistical descriptions of plasmaspheric properties. In some cases, these properties have been represented as analytical descriptions that are valid for specific regions or conditions. For the most part, what has not been done is to extend regional empirical descriptions or models to the plasmasphere as a whole. In contrast, many related investigations depend on the use of representative plasmaspheric conditions throughout the inner magnetosphere. Wave propagation, involving the transport of energy through the magnetosphere, is strongly affected by thermal plasma density and its composition. Ring current collisional and wave particle losses also strongly depend on these quantities. Plasmaspheric also plays a secondary role in influencing radio signals from the Global Positioning System satellites. The Global Core Plasma Model (GCPM) is an attempt to assimilate previous empirical evidence and regional models for plasmaspheric density into a continuous, smooth model of thermal plasma density in the inner magnetosphere. In that spirit, the International Reference Ionosphere is currently used to complete the low altitude description of density and composition in the model. The models and measurements on which the GCPM is currently based and its relationship to IRI will be discussed.
Empirical membrane lifetime model for heavy duty fuel cell systems
NASA Astrophysics Data System (ADS)
Macauley, Natalia; Watson, Mark; Lauritzen, Michael; Knights, Shanna; Wang, G. Gary; Kjeang, Erik
2016-12-01
Heavy duty fuel cells used in transportation system applications such as transit buses expose the fuel cell membranes to conditions that can lead to lifetime-limiting membrane failure via combined chemical and mechanical degradation. Highly durable membranes and reliable predictive models are therefore needed in order to achieve the ultimate heavy duty fuel cell lifetime target of 25,000 h. In the present work, an empirical membrane lifetime model was developed based on laboratory data from a suite of accelerated membrane durability tests. The model considers the effects of cell voltage, temperature, oxygen concentration, humidity cycling, humidity level, and platinum in the membrane using inverse power law and exponential relationships within the framework of a general log-linear Weibull life-stress statistical distribution. The obtained model is capable of extrapolating the membrane lifetime from accelerated test conditions to use level conditions during field operation. Based on typical conditions for the Whistler, British Columbia fuel cell transit bus fleet, the model predicts a stack lifetime of 17,500 h and a membrane leak initiation time of 9200 h. Validation performed with the aid of a field operated stack confirmed the initial goal of the model to predict membrane lifetime within 20% of the actual operating time.
Statistical prediction of space motion sickness
NASA Technical Reports Server (NTRS)
Reschke, Millard F.
1990-01-01
Studies designed to empirically examine the etiology of motion sickness to develop a foundation for enhancing its prediction are discussed. Topics addressed include early attempts to predict space motion sickness, multiple test data base that uses provocative and vestibular function tests, and data base subjects; reliability of provocative tests of motion sickness susceptibility; prediction of space motion sickness using linear discriminate analysis; and prediction of space motion sickness susceptibility using the logistic model.
SEGR in SiO$${}_2$$ –Si$$_3$$ N$$_4$$ Stacks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Javanainen, Arto; Ferlet-Cavrois, Veronique; Bosser, Alexandre
2014-04-17
This work presents experimental SEGR data for MOS-devices, where the gate dielectrics are are made of stacked SiO 2–Si 3N 4 structures. Also a semi-empirical model for predicting the critical gate voltage in these structures under heavy-ion exposure is proposed. Then statistical interrelationship between SEGR cross-section data and simulated energy deposition probabilities in thin dielectric layers is discussed.
Ozone data and mission sampling analysis
NASA Technical Reports Server (NTRS)
Robbins, J. L.
1980-01-01
A methodology was developed to analyze discrete data obtained from the global distribution of ozone. Statistical analysis techniques were applied to describe the distribution of data variance in terms of empirical orthogonal functions and components of spherical harmonic models. The effects of uneven data distribution and missing data were considered. Data fill based on the autocorrelation structure of the data is described. Computer coding of the analysis techniques is included.
NASA Astrophysics Data System (ADS)
Ahmad, Sajid Rashid
With the understanding that far more research remains to be done on the development and use of innovative and functional geospatial techniques and procedures to investigate coastline changes this thesis focussed on the integration of remote sensing, geographical information systems (GIS) and modelling techniques to provide meaningful insights on the spatial and temporal dynamics of coastline changes. One of the unique strengths of this research was the parameterization of the GIS with long-term empirical and remote sensing data. Annual empirical data from 1941--2007 were analyzed by the GIS, and then modelled with statistical techniques. Data were also extracted from Landsat TM and ETM+ images. The band ratio method was used to extract the coastlines. Topographic maps were also used to extract digital map data. All data incorporated into ArcGIS 9.2 were analyzed with various modules, including Spatial Analyst, 3D Analyst, and Triangulated Irregular Networks. The Digital Shoreline Analysis System was used to analyze and predict rates of coastline change. GIS results showed the spatial locations along the coast that will either advance or retreat over time. The linear regression results highlighted temporal changes which are likely to occur along the coastline. Box-Jenkins modelling procedures were utilized to determine statistical models which best described the time series (1941--2007) of coastline change data. After several iterations and goodness-of-fit tests, second-order spatial cyclic autoregressive models, first-order autoregressive models and autoregressive moving average models were identified as being appropriate for describing the deterministic and random processes operating in Guyana's coastal system. The models highlighted not only cyclical patterns in advance and retreat of the coastline, but also the existence of short and long-term memory processes. Long-term memory processes could be associated with mudshoal propagation and stabilization while short-term memory processes were indicative of transitory hydrodynamic and other processes. An innovative framework for a spatio-temporal information-based system (STIBS) was developed. STIBS incorporated diverse datasets within a GIS, dynamic computer-based simulation models, and a spatial information query and graphical subsystem. Tests of the STIBS proved that it could be used to simulate and visualize temporal variability in shifting morphological states of the coastline.
Ipsen, Andreas
2017-02-03
Here, the mass peak centroid is a quantity that is at the core of mass spectrometry (MS). However, despite its central status in the field, models of its statistical distribution are often chosen quite arbitrarily and without attempts at establishing a proper theoretical justification for their use. Recent work has demonstrated that for mass spectrometers employing analog-to-digital converters (ADCs) and electron multipliers, the statistical distribution of the mass peak intensity can be described via a relatively simple model derived essentially from first principles. Building on this result, the following article derives the corresponding statistical distribution for the mass peak centroidsmore » of such instruments. It is found that for increasing signal strength, the centroid distribution converges to a Gaussian distribution whose mean and variance are determined by physically meaningful parameters and which in turn determine bias and variability of the m/z measurements of the instrument. Through the introduction of the concept of “pulse-peak correlation”, the model also elucidates the complicated relationship between the shape of the voltage pulses produced by the preamplifier and the mean and variance of the centroid distribution. The predictions of the model are validated with empirical data and with Monte Carlo simulations.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ipsen, Andreas
Here, the mass peak centroid is a quantity that is at the core of mass spectrometry (MS). However, despite its central status in the field, models of its statistical distribution are often chosen quite arbitrarily and without attempts at establishing a proper theoretical justification for their use. Recent work has demonstrated that for mass spectrometers employing analog-to-digital converters (ADCs) and electron multipliers, the statistical distribution of the mass peak intensity can be described via a relatively simple model derived essentially from first principles. Building on this result, the following article derives the corresponding statistical distribution for the mass peak centroidsmore » of such instruments. It is found that for increasing signal strength, the centroid distribution converges to a Gaussian distribution whose mean and variance are determined by physically meaningful parameters and which in turn determine bias and variability of the m/z measurements of the instrument. Through the introduction of the concept of “pulse-peak correlation”, the model also elucidates the complicated relationship between the shape of the voltage pulses produced by the preamplifier and the mean and variance of the centroid distribution. The predictions of the model are validated with empirical data and with Monte Carlo simulations.« less
Stochastic Spatial Models in Ecology: A Statistical Physics Approach
NASA Astrophysics Data System (ADS)
Pigolotti, Simone; Cencini, Massimo; Molina, Daniel; Muñoz, Miguel A.
2018-07-01
Ecosystems display a complex spatial organization. Ecologists have long tried to characterize them by looking at how different measures of biodiversity change across spatial scales. Ecological neutral theory has provided simple predictions accounting for general empirical patterns in communities of competing species. However, while neutral theory in well-mixed ecosystems is mathematically well understood, spatial models still present several open problems, limiting the quantitative understanding of spatial biodiversity. In this review, we discuss the state of the art in spatial neutral theory. We emphasize the connection between spatial ecological models and the physics of non-equilibrium phase transitions and how concepts developed in statistical physics translate in population dynamics, and vice versa. We focus on non-trivial scaling laws arising at the critical dimension D = 2 of spatial neutral models, and their relevance for biological populations inhabiting two-dimensional environments. We conclude by discussing models incorporating non-neutral effects in the form of spatial and temporal disorder, and analyze how their predictions deviate from those of purely neutral theories.
Combining forecast weights: Why and how?
NASA Astrophysics Data System (ADS)
Yin, Yip Chee; Kok-Haur, Ng; Hock-Eam, Lim
2012-09-01
This paper proposes a procedure called forecast weight averaging which is a specific combination of forecast weights obtained from different methods of constructing forecast weights for the purpose of improving the accuracy of pseudo out of sample forecasting. It is found that under certain specified conditions, forecast weight averaging can lower the mean squared forecast error obtained from model averaging. In addition, we show that in a linear and homoskedastic environment, this superior predictive ability of forecast weight averaging holds true irrespective whether the coefficients are tested by t statistic or z statistic provided the significant level is within the 10% range. By theoretical proofs and simulation study, we have shown that model averaging like, variance model averaging, simple model averaging and standard error model averaging, each produces mean squared forecast error larger than that of forecast weight averaging. Finally, this result also holds true marginally when applied to business and economic empirical data sets, Gross Domestic Product (GDP growth rate), Consumer Price Index (CPI) and Average Lending Rate (ALR) of Malaysia.
Stochastic Spatial Models in Ecology: A Statistical Physics Approach
NASA Astrophysics Data System (ADS)
Pigolotti, Simone; Cencini, Massimo; Molina, Daniel; Muñoz, Miguel A.
2017-11-01
Ecosystems display a complex spatial organization. Ecologists have long tried to characterize them by looking at how different measures of biodiversity change across spatial scales. Ecological neutral theory has provided simple predictions accounting for general empirical patterns in communities of competing species. However, while neutral theory in well-mixed ecosystems is mathematically well understood, spatial models still present several open problems, limiting the quantitative understanding of spatial biodiversity. In this review, we discuss the state of the art in spatial neutral theory. We emphasize the connection between spatial ecological models and the physics of non-equilibrium phase transitions and how concepts developed in statistical physics translate in population dynamics, and vice versa. We focus on non-trivial scaling laws arising at the critical dimension D = 2 of spatial neutral models, and their relevance for biological populations inhabiting two-dimensional environments. We conclude by discussing models incorporating non-neutral effects in the form of spatial and temporal disorder, and analyze how their predictions deviate from those of purely neutral theories.
Tasker, Gary D.; Granato, Gregory E.
2000-01-01
Decision makers need viable methods for the interpretation of local, regional, and national-highway runoff and urban-stormwater data including flows, concentrations and loads of chemical constituents and sediment, potential effects on receiving waters, and the potential effectiveness of various best management practices (BMPs). Valid (useful for intended purposes), current, and technically defensible stormwater-runoff models are needed to interpret data collected in field studies, to support existing highway and urban-runoffplanning processes, to meet National Pollutant Discharge Elimination System (NPDES) requirements, and to provide methods for computation of Total Maximum Daily Loads (TMDLs) systematically and economically. Historically, conceptual, simulation, empirical, and statistical models of varying levels of detail, complexity, and uncertainty have been used to meet various data-quality objectives in the decision-making processes necessary for the planning, design, construction, and maintenance of highways and for other land-use applications. Water-quality simulation models attempt a detailed representation of the physical processes and mechanisms at a given site. Empirical and statistical regional water-quality assessment models provide a more general picture of water quality or changes in water quality over a region. All these modeling techniques share one common aspect-their predictive ability is poor without suitable site-specific data for calibration. To properly apply the correct model, one must understand the classification of variables, the unique characteristics of water-resources data, and the concept of population structure and analysis. Classifying variables being used to analyze data may determine which statistical methods are appropriate for data analysis. An understanding of the characteristics of water-resources data is necessary to evaluate the applicability of different statistical methods, to interpret the results of these techniques, and to use tools and techniques that account for the unique nature of water-resources data sets. Populations of data on stormwater-runoff quantity and quality are often best modeled as logarithmic transformations. Therefore, these factors need to be considered to form valid, current, and technically defensible stormwater-runoff models. Regression analysis is an accepted method for interpretation of water-resources data and for prediction of current or future conditions at sites that fit the input data model. Regression analysis is designed to provide an estimate of the average response of a system as it relates to variation in one or more known variables. To produce valid models, however, regression analysis should include visual analysis of scatterplots, an examination of the regression equation, evaluation of the method design assumptions, and regression diagnostics. A number of statistical techniques are described in the text and in the appendixes to provide information necessary to interpret data by use of appropriate methods. Uncertainty is an important part of any decisionmaking process. In order to deal with uncertainty problems, the analyst needs to know the severity of the statistical uncertainty of the methods used to predict water quality. Statistical models need to be based on information that is meaningful, representative, complete, precise, accurate, and comparable to be deemed valid, up to date, and technically supportable. To assess uncertainty in the analytical tools, the modeling methods, and the underlying data set, all of these components need be documented and communicated in an accessible format within project publications.
Effect of clustering on the emission of light charged particles
NASA Astrophysics Data System (ADS)
Kundu, Samir; Bhattacharya, C.; Rana, T. K.; Bhattacharya, S.; Pandey, R.; Banerjee, K.; Roy, Pratap; Meena, J. K.; Mukherjee, G.; Ghosh, T. K.; Mukhopadhyay, S.; Saha, A. K.; Sahoo, J. K.; Mandal Saha, R.; Srivastava, V.; Sinha, M.; Asgar, Md. A.
2018-04-01
Energy spectra of light charged particles emitted in the reaction p + {}^{27}Al → {}^{28}Si^{\\ast} have been studied and compared with statistical model calculation. Unlike 16O + 12C where a large deformation was observed in 28Si*, the energy spectra of α-particles were well explained by the statistical model calculation with standard "deformability parameters" obtained using the rotating liquid drop model. It seems that the α-clustering in the entrance channel causes extra deformation in 28Si* in the case of 16O + 12C, but the reanalysis of other published data shows that there are several cases where extra deformation was observed for composites produced via non-α-cluster entrance channels also. An empirical relation was found between mass-asymmetry in the entrance channel and deformation, which indicates that along with α-clustering, mass-asymmetry may also affect the emission of a light charged particle.
Uniting statistical and individual-based approaches for animal movement modelling.
Latombe, Guillaume; Parrott, Lael; Basille, Mathieu; Fortin, Daniel
2014-01-01
The dynamic nature of their internal states and the environment directly shape animals' spatial behaviours and give rise to emergent properties at broader scales in natural systems. However, integrating these dynamic features into habitat selection studies remains challenging, due to practically impossible field work to access internal states and the inability of current statistical models to produce dynamic outputs. To address these issues, we developed a robust method, which combines statistical and individual-based modelling. Using a statistical technique for forward modelling of the IBM has the advantage of being faster for parameterization than a pure inverse modelling technique and allows for robust selection of parameters. Using GPS locations from caribou monitored in Québec, caribou movements were modelled based on generative mechanisms accounting for dynamic variables at a low level of emergence. These variables were accessed by replicating real individuals' movements in parallel sub-models, and movement parameters were then empirically parameterized using Step Selection Functions. The final IBM model was validated using both k-fold cross-validation and emergent patterns validation and was tested for two different scenarios, with varying hardwood encroachment. Our results highlighted a functional response in habitat selection, which suggests that our method was able to capture the complexity of the natural system, and adequately provided projections on future possible states of the system in response to different management plans. This is especially relevant for testing the long-term impact of scenarios corresponding to environmental configurations that have yet to be observed in real systems.
Uniting Statistical and Individual-Based Approaches for Animal Movement Modelling
Latombe, Guillaume; Parrott, Lael; Basille, Mathieu; Fortin, Daniel
2014-01-01
The dynamic nature of their internal states and the environment directly shape animals' spatial behaviours and give rise to emergent properties at broader scales in natural systems. However, integrating these dynamic features into habitat selection studies remains challenging, due to practically impossible field work to access internal states and the inability of current statistical models to produce dynamic outputs. To address these issues, we developed a robust method, which combines statistical and individual-based modelling. Using a statistical technique for forward modelling of the IBM has the advantage of being faster for parameterization than a pure inverse modelling technique and allows for robust selection of parameters. Using GPS locations from caribou monitored in Québec, caribou movements were modelled based on generative mechanisms accounting for dynamic variables at a low level of emergence. These variables were accessed by replicating real individuals' movements in parallel sub-models, and movement parameters were then empirically parameterized using Step Selection Functions. The final IBM model was validated using both k-fold cross-validation and emergent patterns validation and was tested for two different scenarios, with varying hardwood encroachment. Our results highlighted a functional response in habitat selection, which suggests that our method was able to capture the complexity of the natural system, and adequately provided projections on future possible states of the system in response to different management plans. This is especially relevant for testing the long-term impact of scenarios corresponding to environmental configurations that have yet to be observed in real systems. PMID:24979047
Predicting the stability of nanodevices
NASA Astrophysics Data System (ADS)
Lin, Z. Z.; Yu, W. F.; Wang, Y.; Ning, X. J.
2011-05-01
A simple model based on the statistics of single atoms is developed to predict the stability or lifetime of nanodevices without empirical parameters. Under certain conditions, the model produces the Arrhenius law and the Meyer-Neldel compensation rule. Compared with the classical molecular-dynamics simulations for predicting the stability of monatomic carbon chain at high temperature, the model is proved to be much more accurate than the transition state theory. Based on the ab initio calculation of the static potential, the model can give out a corrected lifetime of monatomic carbon and gold chains at higher temperature, and predict that the monatomic chains are very stable at room temperature.
Punctuated equilibrium dynamics in human communications
NASA Astrophysics Data System (ADS)
Peng, Dan; Han, Xiao-Pu; Wei, Zong-Wen; Wang, Bing-Hong
2015-10-01
A minimal model based on network incorporating individual interactions is proposed to study the non-Poisson statistical properties of human behavior: individuals in system interact with their neighbors, the probability of an individual acting correlates to its activity, and all the individuals involved in action will change their activities randomly. The model reproduces varieties of spatial-temporal patterns observed in empirical studies of human daily communications, providing insight into various human activities and embracing a range of realistic social interacting systems, particularly, intriguing bimodal phenomenon. This model bridges priority queueing theory and punctuated equilibrium dynamics, and our modeling and analysis is likely to shed light on non-Poisson phenomena in many complex systems.
Interest-Driven Model for Human Dynamics
NASA Astrophysics Data System (ADS)
Shang, Ming-Sheng; Chen, Guan-Xiong; Dai, Shuang-Xing; Wang, Bing-Hong; Zhou, Tao
2010-04-01
Empirical observations indicate that the interevent time distribution of human actions exhibits heavy-tailed features. The queuing model based on task priorities is to some extent successful in explaining the origin of such heavy tails, however, it cannot explain all the temporal statistics of human behavior especially for the daily entertainments. We propose an interest-driven model, which can reproduce the power-law distribution of interevent time. The exponent can be analytically obtained and is in good accordance with the simulations. This model well explains the observed relationship between activities and power-law exponents, as reported recently for web-based behavior and the instant message communications.
An accurate behavioral model for single-photon avalanche diode statistical performance simulation
NASA Astrophysics Data System (ADS)
Xu, Yue; Zhao, Tingchen; Li, Ding
2018-01-01
An accurate behavioral model is presented to simulate important statistical performance of single-photon avalanche diodes (SPADs), such as dark count and after-pulsing noise. The derived simulation model takes into account all important generation mechanisms of the two kinds of noise. For the first time, thermal agitation, trap-assisted tunneling and band-to-band tunneling mechanisms are simultaneously incorporated in the simulation model to evaluate dark count behavior of SPADs fabricated in deep sub-micron CMOS technology. Meanwhile, a complete carrier trapping and de-trapping process is considered in afterpulsing model and a simple analytical expression is derived to estimate after-pulsing probability. In particular, the key model parameters of avalanche triggering probability and electric field dependence of excess bias voltage are extracted from Geiger-mode TCAD simulation and this behavioral simulation model doesn't include any empirical parameters. The developed SPAD model is implemented in Verilog-A behavioral hardware description language and successfully operated on commercial Cadence Spectre simulator, showing good universality and compatibility. The model simulation results are in a good accordance with the test data, validating high simulation accuracy.
NASA Astrophysics Data System (ADS)
Halbrügge, Marc
2010-12-01
This paper describes the creation of a cognitive model submitted to the ‘Dynamic Stocks and Flows’ (DSF) modeling challenge. This challenge aims at comparing computational cognitive models for human behavior during an open ended control task. Participants in the modeling competition were provided with a simulation environment and training data for benchmarking their models while the actual specification of the competition task was withheld. To meet this challenge, the cognitive model described here was designed and optimized for generalizability. Only two simple assumptions about human problem solving were used to explain the empirical findings of the training data. In-depth analysis of the data set prior to the development of the model led to the dismissal of correlations or other parametric statistics as goodness-of-fit indicators. A new statistical measurement based on rank orders and sequence matching techniques is being proposed instead. This measurement, when being applied to the human sample, also identifies clusters of subjects that use different strategies for the task. The acceptability of the fits achieved by the model is verified using permutation tests.
Modeling and forecasting the distribution of Vibrio vulnificus in Chesapeake Bay
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jacobs, John M.; Rhodes, M.; Brown, C. W.
The aim is to construct statistical models to predict the presence, abundance and potential virulence of Vibrio vulnificus in surface waters. A variety of statistical techniques were used in concert to identify water quality parameters associated with V. vulnificus presence, abundance and virulence markers in the interest of developing strong predictive models for use in regional oceanographic modeling systems. A suite of models are provided to represent the best model fit and alternatives using environmental variables that allow them to be put to immediate use in current ecological forecasting efforts. Conclusions: Environmental parameters such as temperature, salinity and turbidity aremore » capable of accurately predicting abundance and distribution of V. vulnificus in Chesapeake Bay. Forcing these empirical models with output from ocean modeling systems allows for spatially explicit forecasts for up to 48 h in the future. This study uses one of the largest data sets compiled to model Vibrio in an estuary, enhances our understanding of environmental correlates with abundance, distribution and presence of potentially virulent strains and offers a method to forecast these pathogens that may be replicated in other regions.« less
Empirical Reference Distributions for Networks of Different Size
Smith, Anna; Calder, Catherine A.; Browning, Christopher R.
2016-01-01
Network analysis has become an increasingly prevalent research tool across a vast range of scientific fields. Here, we focus on the particular issue of comparing network statistics, i.e. graph-level measures of network structural features, across multiple networks that differ in size. Although “normalized” versions of some network statistics exist, we demonstrate via simulation why direct comparison is often inappropriate. We consider normalizing network statistics relative to a simple fully parameterized reference distribution and demonstrate via simulation how this is an improvement over direct comparison, but still sometimes problematic. We propose a new adjustment method based on a reference distribution constructed as a mixture model of random graphs which reflect the dependence structure exhibited in the observed networks. We show that using simple Bernoulli models as mixture components in this reference distribution can provide adjusted network statistics that are relatively comparable across different network sizes but still describe interesting features of networks, and that this can be accomplished at relatively low computational expense. Finally, we apply this methodology to a collection of ecological networks derived from the Los Angeles Family and Neighborhood Survey activity location data. PMID:27721556
Etzioni, Ruth; Gulati, Roman
2013-04-01
In our article about limitations of basing screening policy on screening trials, we offered several examples of ways in which modeling, using data from large screening trials and population trends, provided insights that differed somewhat from those based only on empirical trial results. In this editorial, we take a step back and consider the general question of whether randomized screening trials provide the strongest evidence for clinical guidelines concerning population screening programs. We argue that randomized trials provide a process that is designed to protect against certain biases but that this process does not guarantee that inferences based on empirical results from screening trials will be unbiased. Appropriate quantitative methods are key to obtaining unbiased inferences from screening trials. We highlight several studies in the statistical literature demonstrating that conventional survival analyses of screening trials can be misleading and list a number of key questions concerning screening harms and benefits that cannot be answered without modeling. Although we acknowledge the centrality of screening trials in the policy process, we maintain that modeling constitutes a powerful tool for screening trial interpretation and screening policy development.
NASA Astrophysics Data System (ADS)
Ribeiro, Haroldo V.
2015-03-01
Since the seminal works of Wilson and Kelling [1] in 1982, the "broken windows theory" seems to have been widely accepted among the criminologists and, in fact, empirical findings actually point out that criminals tend to return to previously visited locations. Crime has always been part of the urban society's agenda and has also attracted the attention of scholars from social sciences ever since. Furthermore, over the past six decades the world has experienced a quick and notorious urbanization process: by the eighties the urban population was about 40% of total population, and today more than half (54%) of the world population is urban [2]. The urbanization has brought us many benefits such as better working opportunities and health care, but has also created several problems such as pollution and a considerable rise in the criminal activities. In this context of urban problems, crime deserves a special attention because there is a huge necessity of empirical and mathematical (modeling) investigations which, apart from the natural academic interest, may find direct implications for the organization of our society by improving political decisions and resource allocation.
Variety and volatility in financial markets
NASA Astrophysics Data System (ADS)
Lillo, Fabrizio; Mantegna, Rosario N.
2000-11-01
We study the price dynamics of stocks traded in a financial market by considering the statistical properties of both a single time series and an ensemble of stocks traded simultaneously. We use the n stocks traded on the New York Stock Exchange to form a statistical ensemble of daily stock returns. For each trading day of our database, we study the ensemble return distribution. We find that a typical ensemble return distribution exists in most of the trading days with the exception of crash and rally days and of the days following these extreme events. We analyze each ensemble return distribution by extracting its first two central moments. We observe that these moments fluctuate in time and are stochastic processes, themselves. We characterize the statistical properties of ensemble return distribution central moments by investigating their probability density functions and temporal correlation properties. In general, time-averaged and portfolio-averaged price returns have different statistical properties. We infer from these differences information about the relative strength of correlation between stocks and between different trading days. Last, we compare our empirical results with those predicted by the single-index model and we conclude that this simple model cannot explain the statistical properties of the second moment of the ensemble return distribution.
Sampling methods to the statistical control of the production of blood components.
Pereira, Paulo; Seghatchian, Jerard; Caldeira, Beatriz; Santos, Paula; Castro, Rosa; Fernandes, Teresa; Xavier, Sandra; de Sousa, Gracinda; de Almeida E Sousa, João Paulo
2017-12-01
The control of blood components specifications is a requirement generalized in Europe by the European Commission Directives and in the US by the AABB standards. The use of a statistical process control methodology is recommended in the related literature, including the EDQM guideline. The control reliability is dependent of the sampling. However, a correct sampling methodology seems not to be systematically applied. Commonly, the sampling is intended to comply uniquely with the 1% specification to the produced blood components. Nevertheless, on a purely statistical viewpoint, this model could be argued not to be related to a consistent sampling technique. This could be a severe limitation to detect abnormal patterns and to assure that the production has a non-significant probability of producing nonconforming components. This article discusses what is happening in blood establishments. Three statistical methodologies are proposed: simple random sampling, sampling based on the proportion of a finite population, and sampling based on the inspection level. The empirical results demonstrate that these models are practicable in blood establishments contributing to the robustness of sampling and related statistical process control decisions for the purpose they are suggested for. Copyright © 2017 Elsevier Ltd. All rights reserved.
Komilis, Dimitrios; Evangelou, Alexandros; Giannakis, Georgios; Lymperis, Constantinos
2012-03-01
In this work, the elemental content (C, N, H, S, O), the organic matter content and the calorific value of various organic components that are commonly found in the municipal solid waste stream were measured. The objective of this work was to develop an empirical equation to describe the calorific value of the organic fraction of municipal solid waste as a function of its elemental composition. The MSW components were grouped into paper wastes, food wastes, yard wastes and plastics. Sample sizes ranged from 0.2 to 0.5 kg. In addition to the above individual components, commingled municipal solid wastes were sampled from a bio-drying facility located in Crete (sample sizes ranged from 8 to 15 kg) and were analyzed for the same parameters. Based on the results of this work, an improved empirical model was developed that revealed that carbon, hydrogen and oxygen were the only statistically significant predictors of calorific value. Total organic carbon was statistically similar to total carbon for most materials in this work. The carbon to organic matter ratio of 26 municipal solid waste substrates and of 18 organic composts varied from 0.40 to 0.99. An approximate chemical empirical formula calculated for the organic fraction of commingled municipal solid wastes was C(32)NH(55)O(16). Copyright © 2011 Elsevier Ltd. All rights reserved.
Empirical methods for assessing meaningful neuropsychological change following epilepsy surgery.
Sawrie, S M; Chelune, G J; Naugle, R I; Lüders, H O
1996-11-01
Traditional methods for assessing the neurocognitive effects of epilepsy surgery are confounded by practice effects, test-retest reliability issues, and regression to the mean. This study employs 2 methods for assessing individual change that allow direct comparison of changes across both individuals and test measures. Fifty-one medically intractable epilepsy patients completed a comprehensive neuropsychological battery twice, approximately 8 months apart, prior to any invasive monitoring or surgical intervention. First, a Reliable Change (RC) index score was computed for each test score to take into account the reliability of that measure, and a cutoff score was empirically derived to establish the limits of statistically reliable change. These indices were subsequently adjusted for expected practice effects. The second approach used a regression technique to establish "change norms" along a common metric that models both expected practice effects and regression to the mean. The RC index scores provide the clinician with a statistical means of determining whether a patient's retest performance is "significantly" changed from baseline. The regression norms for change allow the clinician to evaluate the magnitude of a given patient's change on 1 or more variables along a common metric that takes into account the reliability and stability of each test measure. Case data illustrate how these methods provide an empirically grounded means for evaluating neurocognitive outcomes following medical interventions such as epilepsy surgery.
NASA Astrophysics Data System (ADS)
Tonitto, C.; Gurwick, N. P.
2012-12-01
Policy initiatives to reduce greenhouse gas emissions (GHG) have promoted the development of agricultural management protocols to increase SOC storage and reduce GHG emissions. We review approaches for quantifying N2O flux from agricultural landscapes. We summarize the temporal and spatial extent of observations across representative soil classes, climate zones, cropping systems, and management scenarios. We review applications of simulation and empirical modeling approaches and compare validation outcomes across modeling tools. Subsequently, we review current model application in agricultural management protocols. In particular, we compare approaches adapted for compliance with the California Global Warming Solutions Act, the Alberta Climate Change and Emissions Management Act, and by the American Carbon Registry. In the absence of regional data to drive model development, policies that require GHG quantification often use simple empirical models based on highly aggregated data of N2O flux as a function of applied N - Tier 1 models according to IPCC categorization. As participants in development of protocols that could be used in carbon offset markets, we observed that stakeholders outside of the biogeochemistry community favored outcomes from simulation modeling (Tier 3) rather than empirical modeling (Tier 2). In contrast, scientific advisors were more accepting of outcomes based on statistical approaches that rely on local observations, and their views sometimes swayed policy practitioners over the course of policy development. Both Tier 2 and Tier 3 approaches have been implemented in current policy development, and it is important that the strengths and limitations of both approaches, in the face of available data, be well-understood by those drafting and adopting policies and protocols. The reliability of all models is contingent on sufficient observations for model development and validation. Simulation models applied without site-calibration generally result in poor validation results, and this point particularly needs to be emphasized during policy development. For cases where sufficient calibration data are available, simulation models have demonstrated the ability to capture seasonal patterns of N2O flux. The reliability of statistical models likewise depends on data availability. Because soil moisture is a significant driver of N2O flux, the best outcomes occur when empirical models are applied to systems with relevant soil classification and climate. The structure of current carbon offset protocols is not well-aligned with a budgetary approach to GHG accounting. Current protocols credit field-scale reduction in N2O flux as a result of reduced fertilizer use. Protocols do not award farmers credit for reductions in CO2 emissions resulting from reduced production of synthetic N fertilizer. To achieve the greatest GHG emission reductions through reduced synthetic N production and reduced landscape N saturation requires a re-envisioning of the agricultural landscape to include cropping systems with legume and manure N sources. The current focus on on-farm GHG sources focuses credits on simple reductions of N applied in conventional systems rather than on developing cropping systems which promote higher recycling and retention of N.
Statistical self-similarity of width function maxima with implications to floods
Veitzer, S.A.; Gupta, V.K.
2001-01-01
Recently a new theory of random self-similar river networks, called the RSN model, was introduced to explain empirical observations regarding the scaling properties of distributions of various topologic and geometric variables in natural basins. The RSN model predicts that such variables exhibit statistical simple scaling, when indexed by Horton-Strahler order. The average side tributary structure of RSN networks also exhibits Tokunaga-type self-similarity which is widely observed in nature. We examine the scaling structure of distributions of the maximum of the width function for RSNs for nested, complete Strahler basins by performing ensemble simulations. The maximum of the width function exhibits distributional simple scaling, when indexed by Horton-Strahler order, for both RSNs and natural river networks extracted from digital elevation models (DEMs). We also test a powerlaw relationship between Horton ratios for the maximum of the width function and drainage areas. These results represent first steps in formulating a comprehensive physical statistical theory of floods at multiple space-time scales for RSNs as discrete hierarchical branching structures. ?? 2001 Published by Elsevier Science Ltd.
NASA Astrophysics Data System (ADS)
Dickson, N. C.; Gierens, K. M.; Rogers, H. L.; Jones, R. L.
2010-02-01
The global observation, assimilation and prediction in numerical models of ice super-saturated (ISS) regions (ISSR) are crucial if the climate impact of aircraft condensations trails (contrails) is to be fully understood, and if, for example, contrail formation is to be avoided through aircraft operational measures. A robust assessment of the global distribution of ISSR will further this debate, and ISS event occurrence, frequency and spatial scales have recently attracted significant attention. The mean horizontal path length through ISSR as observed by MOZAIC aircraft is 150 km (±250 km). The average vertical thickness of ISS layers is 600-800 m (±575 m) but layers ranging from 25 m to 3000 m have been observed, with up to one third of ISS layers thought to be less than 100 m deep. Given their small scales compared to typical atmospheric model grid sizes, statistical representations of the spatial scales of ISSR are required, in both horizontal and vertical dimensions, if global occurrence of ISSR is to be adequately represented in climate models. This paper uses radiosonde launches made by the UK Meteorological Office, from the British Isles, Gibraltar, St. Helena and the Falkland Islands between January 2002 and December 2006, to investigate the probabilistic occurrence of ISSR. Specifically each radiosonde profile is divided into 50- and 100-hPa pressure layers, to emulate the coarse vertical resolution of some atmospheric models. Then the high resolution observations contained within each thick pressure layer are used to calculate an average relative humidity and an ISS fraction for each individual thick pressure layer. These relative humidity pressure layer descriptions are then linked through a probability function to produce an s-shaped curve describing the ISS fraction in any average relative humidity pressure layer. An empirical investigation has shown that this one curve is statistically valid for mid-latitude locations, irrespective of season and altitude, however, pressure layer depth is an important variable. Using this empirical understanding of the s-shaped relationship a mathematical model was developed to represent the ISS fraction within any arbitrary thick pressure layer. Here the statistical distributions of actual high resolution RHi observations in any thick pressure layer, along with an error function, are used to mathematically describe the s-shape. Two models were developed to represent both 50- and 100-hPa pressure layers with each reconstructing their respective s-shapes within 8-10% of the empirical curves. These new models can be used, to represent the small scale structures of ISS events, in modelled data where only low vertical resolution is available. This will be useful in understanding, and improving the global distribution, both observed and forecasted, of ice super-saturation.
Microscopic saw mark analysis: an empirical approach.
Love, Jennifer C; Derrick, Sharon M; Wiersema, Jason M; Peters, Charles
2015-01-01
Microscopic saw mark analysis is a well published and generally accepted qualitative analytical method. However, little research has focused on identifying and mitigating potential sources of error associated with the method. The presented study proposes the use of classification trees and random forest classifiers as an optimal, statistically sound approach to mitigate the potential for error of variability and outcome error in microscopic saw mark analysis. The statistical model was applied to 58 experimental saw marks created with four types of saws. The saw marks were made in fresh human femurs obtained through anatomical gift and were analyzed using a Keyence digital microscope. The statistical approach weighed the variables based on discriminatory value and produced decision trees with an associated outcome error rate of 8.62-17.82%. © 2014 American Academy of Forensic Sciences.
NASA Astrophysics Data System (ADS)
Shevade, Abhijit V.; Ryan, Margaret A.; Homer, Margie L.; Zhou, Hanying; Manfreda, Allison M.; Lara, Liana M.; Yen, Shiao-Pin S.; Jewell, April D.; Manatt, Kenneth S.; Kisor, Adam K.
We have developed a Quantitative Structure-Activity Relationships (QSAR) based approach to correlate the response of chemical sensors in an array with molecular descriptors. A novel molecular descriptor set has been developed; this set combines descriptors of sensing film-analyte interactions, representing sensor response, with a basic analyte descriptor set commonly used in QSAR studies. The descriptors are obtained using a combination of molecular modeling tools and empirical and semi-empirical Quantitative Structure-Property Relationships (QSPR) methods. The sensors under investigation are polymer-carbon sensing films which have been exposed to analyte vapors at parts-per-million (ppm) concentrations; response is measured as change in film resistance. Statistically validated QSAR models have been developed using Genetic Function Approximations (GFA) for a sensor array for a given training data set. The applicability of the sensor response models has been tested by using it to predict the sensor activities for test analytes not considered in the training set for the model development. The validated QSAR sensor response models show good predictive ability. The QSAR approach is a promising computational tool for sensing materials evaluation and selection. It can also be used to predict response of an existing sensing film to new target analytes.
Empirical STORM-E Model. [I. Theoretical and Observational Basis
NASA Technical Reports Server (NTRS)
Mertens, Christopher J.; Xu, Xiaojing; Bilitza, Dieter; Mlynczak, Martin G.; Russell, James M., III
2013-01-01
Auroral nighttime infrared emission observed by the Sounding of the Atmosphere using Broadband Emission Radiometry (SABER) instrument onboard the Thermosphere-Ionosphere-Mesosphere Energetics and Dynamics (TIMED) satellite is used to develop an empirical model of geomagnetic storm enhancements to E-region peak electron densities. The empirical model is called STORM-E and will be incorporated into the 2012 release of the International Reference Ionosphere (IRI). The proxy for characterizing the E-region response to geomagnetic forcing is NO+(v) volume emission rates (VER) derived from the TIMED/SABER 4.3 lm channel limb radiance measurements. The storm-time response of the NO+(v) 4.3 lm VER is sensitive to auroral particle precipitation. A statistical database of storm-time to climatological quiet-time ratios of SABER-observed NO+(v) 4.3 lm VER are fit to widely available geomagnetic indices using the theoretical framework of linear impulse-response theory. The STORM-E model provides a dynamic storm-time correction factor to adjust a known quiescent E-region electron density peak concentration for geomagnetic enhancements due to auroral particle precipitation. Part II of this series describes the explicit development of the empirical storm-time correction factor for E-region peak electron densities, and shows comparisons of E-region electron densities between STORM-E predictions and incoherent scatter radar measurements. In this paper, Part I of the series, the efficacy of using SABER-derived NO+(v) VER as a proxy for the E-region response to solar-geomagnetic disturbances is presented. Furthermore, a detailed description of the algorithms and methodologies used to derive NO+(v) VER from SABER 4.3 lm limb emission measurements is given. Finally, an assessment of key uncertainties in retrieving NO+(v) VER is presented
A global empirical system for probabilistic seasonal climate prediction
NASA Astrophysics Data System (ADS)
Eden, J. M.; van Oldenborgh, G. J.; Hawkins, E.; Suckling, E. B.
2015-12-01
Preparing for episodes with risks of anomalous weather a month to a year ahead is an important challenge for governments, non-governmental organisations, and private companies and is dependent on the availability of reliable forecasts. The majority of operational seasonal forecasts are made using process-based dynamical models, which are complex, computationally challenging and prone to biases. Empirical forecast approaches built on statistical models to represent physical processes offer an alternative to dynamical systems and can provide either a benchmark for comparison or independent supplementary forecasts. Here, we present a simple empirical system based on multiple linear regression for producing probabilistic forecasts of seasonal surface air temperature and precipitation across the globe. The global CO2-equivalent concentration is taken as the primary predictor; subsequent predictors, including large-scale modes of variability in the climate system and local-scale information, are selected on the basis of their physical relationship with the predictand. The focus given to the climate change signal as a source of skill and the probabilistic nature of the forecasts produced constitute a novel approach to global empirical prediction. Hindcasts for the period 1961-2013 are validated against observations using deterministic (correlation of seasonal means) and probabilistic (continuous rank probability skill scores) metrics. Good skill is found in many regions, particularly for surface air temperature and most notably in much of Europe during the spring and summer seasons. For precipitation, skill is generally limited to regions with known El Niño-Southern Oscillation (ENSO) teleconnections. The system is used in a quasi-operational framework to generate empirical seasonal forecasts on a monthly basis.
An empirical system for probabilistic seasonal climate prediction
NASA Astrophysics Data System (ADS)
Eden, Jonathan; van Oldenborgh, Geert Jan; Hawkins, Ed; Suckling, Emma
2016-04-01
Preparing for episodes with risks of anomalous weather a month to a year ahead is an important challenge for governments, non-governmental organisations, and private companies and is dependent on the availability of reliable forecasts. The majority of operational seasonal forecasts are made using process-based dynamical models, which are complex, computationally challenging and prone to biases. Empirical forecast approaches built on statistical models to represent physical processes offer an alternative to dynamical systems and can provide either a benchmark for comparison or independent supplementary forecasts. Here, we present a simple empirical system based on multiple linear regression for producing probabilistic forecasts of seasonal surface air temperature and precipitation across the globe. The global CO2-equivalent concentration is taken as the primary predictor; subsequent predictors, including large-scale modes of variability in the climate system and local-scale information, are selected on the basis of their physical relationship with the predictand. The focus given to the climate change signal as a source of skill and the probabilistic nature of the forecasts produced constitute a novel approach to global empirical prediction. Hindcasts for the period 1961-2013 are validated against observations using deterministic (correlation of seasonal means) and probabilistic (continuous rank probability skill scores) metrics. Good skill is found in many regions, particularly for surface air temperature and most notably in much of Europe during the spring and summer seasons. For precipitation, skill is generally limited to regions with known El Niño-Southern Oscillation (ENSO) teleconnections. The system is used in a quasi-operational framework to generate empirical seasonal forecasts on a monthly basis.
Lovejoy, S; de Lima, M I P
2015-07-01
Over the range of time scales from about 10 days to 30-100 years, in addition to the familiar weather and climate regimes, there is an intermediate "macroweather" regime characterized by negative temporal fluctuation exponents: implying that fluctuations tend to cancel each other out so that averages tend to converge. We show theoretically and numerically that macroweather precipitation can be modeled by a stochastic weather-climate model (the Climate Extended Fractionally Integrated Flux, model, CEFIF) first proposed for macroweather temperatures and we show numerically that a four parameter space-time CEFIF model can approximately reproduce eight or so empirical space-time exponents. In spite of this success, CEFIF is theoretically and numerically difficult to manage. We therefore propose a simplified stochastic model in which the temporal behavior is modeled as a fractional Gaussian noise but the spatial behaviour as a multifractal (climate) cascade: a spatial extension of the recently introduced ScaLIng Macroweather Model, SLIMM. Both the CEFIF and this spatial SLIMM model have a property often implicitly assumed by climatologists that climate statistics can be "homogenized" by normalizing them with the standard deviation of the anomalies. Physically, it means that the spatial macroweather variability corresponds to different climate zones that multiplicatively modulate the local, temporal statistics. This simplified macroweather model provides a framework for macroweather forecasting that exploits the system's long range memory and spatial correlations; for it, the forecasting problem has been solved. We test this factorization property and the model with the help of three centennial, global scale precipitation products that we analyze jointly in space and in time.
2011-01-01
Background Meta-analysis is a popular methodology in several fields of medical research, including genetic association studies. However, the methods used for meta-analysis of association studies that report haplotypes have not been studied in detail. In this work, methods for performing meta-analysis of haplotype association studies are summarized, compared and presented in a unified framework along with an empirical evaluation of the literature. Results We present multivariate methods that use summary-based data as well as methods that use binary and count data in a generalized linear mixed model framework (logistic regression, multinomial regression and Poisson regression). The methods presented here avoid the inflation of the type I error rate that could be the result of the traditional approach of comparing a haplotype against the remaining ones, whereas, they can be fitted using standard software. Moreover, formal global tests are presented for assessing the statistical significance of the overall association. Although the methods presented here assume that the haplotypes are directly observed, they can be easily extended to allow for such an uncertainty by weighting the haplotypes by their probability. Conclusions An empirical evaluation of the published literature and a comparison against the meta-analyses that use single nucleotide polymorphisms, suggests that the studies reporting meta-analysis of haplotypes contain approximately half of the included studies and produce significant results twice more often. We show that this excess of statistically significant results, stems from the sub-optimal method of analysis used and, in approximately half of the cases, the statistical significance is refuted if the data are properly re-analyzed. Illustrative examples of code are given in Stata and it is anticipated that the methods developed in this work will be widely applied in the meta-analysis of haplotype association studies. PMID:21247440
Mair, Patrick; Hofmann, Eva; Gruber, Kathrin; Hatzinger, Reinhold; Zeileis, Achim; Hornik, Kurt
2015-01-01
One of the cornerstones of the R system for statistical computing is the multitude of packages contributed by numerous package authors. This amount of packages makes an extremely broad range of statistical techniques and other quantitative methods freely available. Thus far, no empirical study has investigated psychological factors that drive authors to participate in the R project. This article presents a study of R package authors, collecting data on different types of participation (number of packages, participation in mailing lists, participation in conferences), three psychological scales (types of motivation, psychological values, and work design characteristics), and various socio-demographic factors. The data are analyzed using item response models and subsequent generalized linear models, showing that the most important determinants for participation are a hybrid form of motivation and the social characteristics of the work design. Other factors are found to have less impact or influence only specific aspects of participation. PMID:26554005
Mair, Patrick; Hofmann, Eva; Gruber, Kathrin; Hatzinger, Reinhold; Zeileis, Achim; Hornik, Kurt
2015-12-01
One of the cornerstones of the R system for statistical computing is the multitude of packages contributed by numerous package authors. This amount of packages makes an extremely broad range of statistical techniques and other quantitative methods freely available. Thus far, no empirical study has investigated psychological factors that drive authors to participate in the R project. This article presents a study of R package authors, collecting data on different types of participation (number of packages, participation in mailing lists, participation in conferences), three psychological scales (types of motivation, psychological values, and work design characteristics), and various socio-demographic factors. The data are analyzed using item response models and subsequent generalized linear models, showing that the most important determinants for participation are a hybrid form of motivation and the social characteristics of the work design. Other factors are found to have less impact or influence only specific aspects of participation.
NASA Astrophysics Data System (ADS)
Wang, M.; Peng, Y.; Xie, X.; Liu, Y.
2017-12-01
Aerosol cloud interaction continues to constitute one of the most significant uncertainties for anthropogenic climate perturbations. The parameterization of cloud droplet size distribution and autoconversion process from large scale cloud to rain can influence the estimation of first and second aerosol indirect effects in global climate models. We design a series of experiments focusing on the microphysical cloud scheme of NCAR CAM5 (Community Atmospheric Model Version 5) in transient historical run with realistic sea surface temperature and sea ice. We investigate the effect of three empirical, two semi-empirical and one analytical expressions for droplet size distribution on cloud properties and explore the statistical relationships between aerosol optical thickness (AOT) and simulated cloud variables, including cloud top droplet effective radius (CDER), cloud optical depth (COD), cloud water path (CWP). We also introduce the droplet spectral shape parameter into the autoconversion process to incorporate the effect of droplet size distribution on second aerosol indirect effect. Three satellite datasets (MODIS Terra/ MODIS Aqua/ AVHRR) are used to evaluate the simulated aerosol indirect effect from the model. Evident CDER decreasing with significant AOT increasing is found in the east coast of China to the North Pacific Ocean and the east coast of USA to the North Atlantic Ocean. Analytical and semi-empirical expressions for spectral shape parameterization show stronger first aerosol indirect effect but weaker second aerosol indirect effect than empirical expressions because of the narrower droplet size distribution.
An Empirical Cumulus Parameterization Scheme for a Global Spectral Model
NASA Technical Reports Server (NTRS)
Rajendran, K.; Krishnamurti, T. N.; Misra, V.; Tao, W.-K.
2004-01-01
Realistic vertical heating and drying profiles in a cumulus scheme is important for obtaining accurate weather forecasts. A new empirical cumulus parameterization scheme based on a procedure to improve the vertical distribution of heating and moistening over the tropics is developed. The empirical cumulus parameterization scheme (ECPS) utilizes profiles of Tropical Rainfall Measuring Mission (TRMM) based heating and moistening derived from the European Centre for Medium- Range Weather Forecasts (ECMWF) analysis. A dimension reduction technique through rotated principal component analysis (RPCA) is performed on the vertical profiles of heating (Q1) and drying (Q2) over the convective regions of the tropics, to obtain the dominant modes of variability. Analysis suggests that most of the variance associated with the observed profiles can be explained by retaining the first three modes. The ECPS then applies a statistical approach in which Q1 and Q2 are expressed as a linear combination of the first three dominant principal components which distinctly explain variance in the troposphere as a function of the prevalent large-scale dynamics. The principal component (PC) score which quantifies the contribution of each PC to the corresponding loading profile is estimated through a multiple screening regression method which yields the PC score as a function of the large-scale variables. The profiles of Q1 and Q2 thus obtained are found to match well with the observed profiles. The impact of the ECPS is investigated in a series of short range (1-3 day) prediction experiments using the Florida State University global spectral model (FSUGSM, T126L14). Comparisons between short range ECPS forecasts and those with the modified Kuo scheme show a very marked improvement in the skill in ECPS forecasts. This improvement in the forecast skill with ECPS emphasizes the importance of incorporating realistic vertical distributions of heating and drying in the model cumulus scheme. This also suggests that in the absence of explicit models for convection, the proposed statistical scheme improves the modeling of the vertical distribution of heating and moistening in areas of deep convection.
Use of transport models for wildfire behavior simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Linn, R.R.; Harlow, F.H.
1998-01-01
Investigators have attempted to describe the behavior of wildfires for over fifty years. Current models for numerical description are mainly algebraic and based on statistical or empirical ideas. The authors have developed a transport model called FIRETEC. The use of transport formulations connects the propagation rates to the full conservation equations for energy, momentum, species concentrations, mass, and turbulence. In this paper, highlights of the model formulation and results are described. The goal of the FIRETEC model is to describe most probable average behavior of wildfires in a wide variety of conditions. FIRETEC represents the essence of the combination ofmore » many small-scale processes without resolving each process in complete detail.« less
Explorations in Statistics: the Bootstrap
ERIC Educational Resources Information Center
Curran-Everett, Douglas
2009-01-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This fourth installment of Explorations in Statistics explores the bootstrap. The bootstrap gives us an empirical approach to estimate the theoretical variability among possible values of a sample statistic such as the…
EpiModel: An R Package for Mathematical Modeling of Infectious Disease over Networks.
Jenness, Samuel M; Goodreau, Steven M; Morris, Martina
2018-04-01
Package EpiModel provides tools for building, simulating, and analyzing mathematical models for the population dynamics of infectious disease transmission in R. Several classes of models are included, but the unique contribution of this software package is a general stochastic framework for modeling the spread of epidemics on networks. EpiModel integrates recent advances in statistical methods for network analysis (temporal exponential random graph models) that allow the epidemic modeling to be grounded in empirical data on contacts that can spread infection. This article provides an overview of both the modeling tools built into EpiModel , designed to facilitate learning for students new to modeling, and the application programming interface for extending package EpiModel , designed to facilitate the exploration of novel research questions for advanced modelers.
EpiModel: An R Package for Mathematical Modeling of Infectious Disease over Networks
Jenness, Samuel M.; Goodreau, Steven M.; Morris, Martina
2018-01-01
Package EpiModel provides tools for building, simulating, and analyzing mathematical models for the population dynamics of infectious disease transmission in R. Several classes of models are included, but the unique contribution of this software package is a general stochastic framework for modeling the spread of epidemics on networks. EpiModel integrates recent advances in statistical methods for network analysis (temporal exponential random graph models) that allow the epidemic modeling to be grounded in empirical data on contacts that can spread infection. This article provides an overview of both the modeling tools built into EpiModel, designed to facilitate learning for students new to modeling, and the application programming interface for extending package EpiModel, designed to facilitate the exploration of novel research questions for advanced modelers. PMID:29731699
Linking Mechanics and Statistics in Epidermal Tissues
NASA Astrophysics Data System (ADS)
Kim, Sangwoo; Hilgenfeldt, Sascha
2015-03-01
Disordered cellular structures, such as foams, polycrystals, or living tissues, can be characterized by quantitative measurements of domain size and topology. In recent work, we showed that correlations between size and topology in 2D systems are sensitive to the shape (eccentricity) of the individual domains: From a local model of neighbor relations, we derived an analytical justification for the famous empirical Lewis law, confirming the theory with experimental data from cucumber epidermal tissue. Here, we go beyond this purely geometrical model and identify mechanical properties of the tissue as the root cause for the domain eccentricity and thus the statistics of tissue structure. The simple model approach is based on the minimization of an interfacial energy functional. Simulations with Surface Evolver show that the domain statistics depend on a single mechanical parameter, while parameter fluctuations from cell to cell play an important role in simultaneously explaining the shape distribution of cells. The simulations are in excellent agreement with experiments and analytical theory, and establish a general link between the mechanical properties of a tissue and its structure. The model is relevant to diagnostic applications in a variety of animal and plant tissues.
NASA Astrophysics Data System (ADS)
Baró, Jordi; Davidsen, Jörn
2018-03-01
The hypothesis of critical failure relates the presence of an ultimate stability point in the structural constitutive equation of materials to a divergence of characteristic scales in the microscopic dynamics responsible for deformation. Avalanche models involving critical failure have determined common universality classes for stick-slip processes and fracture. However, not all empirical failure processes exhibit the trademarks of criticality. The rheological properties of materials introduce dissipation, usually reproduced in conceptual models as a hardening of the coarse grained elements of the system. Here, we investigate the effects of transient hardening on (i) the activity rate and (ii) the statistical properties of avalanches. We find the explicit representation of transient hardening in the presence of generalized viscoelasticity and solve the corresponding mean-field model of fracture. In the quasistatic limit, the accelerated energy release is invariant with respect to rheology and the avalanche propagation can be reinterpreted in terms of a stochastic counting process. A single universality class can be defined from such analogy, and all statistical properties depend only on the distance to criticality. We also prove that interevent correlations emerge due to the hardening—even in the quasistatic limit—that can be interpreted as "aftershocks" and "foreshocks."
Forecasting of Radiation Belts: Results From the PROGRESS Project.
NASA Astrophysics Data System (ADS)
Balikhin, M. A.; Arber, T. D.; Ganushkina, N. Y.; Walker, S. N.
2017-12-01
Forecasting of Radiation Belts: Results from the PROGRESS Project. The overall goal of the PROGRESS project, funded in frame of EU Horizon2020 programme, is to combine first principles based models with the systems science methodologies to achieve reliable forecasts of the geo-space particle radiation environment.The PROGRESS incorporates three themes : The propagation of the solar wind to L1, Forecast of geomagnetic indices, and forecast of fluxes of energetic electrons within the magnetosphere. One of the important aspects of the PROGRESS project is the development of statistical wave models for magnetospheric waves that affect the dynamics of energetic electrons such as lower band chorus, hiss and equatorial noise. The error reduction ratio (ERR) concept has been used to optimise the set of solar wind and geomagnetic parameters for organisation of statistical wave models for these emissions. The resulting sets of parameters and statistical wave models will be presented and discussed. However the ERR analysis also indicates that the combination of solar wind and geomagnetic parameters accounts for only part of the variance of the emissions under investigation (lower band chorus, hiss and equatorial noise). In addition, advances in the forecast of fluxes of energetic electrons, exploiting empirical models and the first principles IMPTAM model achieved by the PROGRESS project is presented.
A semi-nonparametric Poisson regression model for analyzing motor vehicle crash data.
Ye, Xin; Wang, Ke; Zou, Yajie; Lord, Dominique
2018-01-01
This paper develops a semi-nonparametric Poisson regression model to analyze motor vehicle crash frequency data collected from rural multilane highway segments in California, US. Motor vehicle crash frequency on rural highway is a topic of interest in the area of transportation safety due to higher driving speeds and the resultant severity level. Unlike the traditional Negative Binomial (NB) model, the semi-nonparametric Poisson regression model can accommodate an unobserved heterogeneity following a highly flexible semi-nonparametric (SNP) distribution. Simulation experiments are conducted to demonstrate that the SNP distribution can well mimic a large family of distributions, including normal distributions, log-gamma distributions, bimodal and trimodal distributions. Empirical estimation results show that such flexibility offered by the SNP distribution can greatly improve model precision and the overall goodness-of-fit. The semi-nonparametric distribution can provide a better understanding of crash data structure through its ability to capture potential multimodality in the distribution of unobserved heterogeneity. When estimated coefficients in empirical models are compared, SNP and NB models are found to have a substantially different coefficient for the dummy variable indicating the lane width. The SNP model with better statistical performance suggests that the NB model overestimates the effect of lane width on crash frequency reduction by 83.1%.
IMPACT: a generic tool for modelling and simulating public health policy.
Ainsworth, J D; Carruthers, E; Couch, P; Green, N; O'Flaherty, M; Sperrin, M; Williams, R; Asghar, Z; Capewell, S; Buchan, I E
2011-01-01
Populations are under-served by local health policies and management of resources. This partly reflects a lack of realistically complex models to enable appraisal of a wide range of potential options. Rising computing power coupled with advances in machine learning and healthcare information now enables such models to be constructed and executed. However, such models are not generally accessible to public health practitioners who often lack the requisite technical knowledge or skills. To design and develop a system for creating, executing and analysing the results of simulated public health and healthcare policy interventions, in ways that are accessible and usable by modellers and policy-makers. The system requirements were captured and analysed in parallel with the statistical method development for the simulation engine. From the resulting software requirement specification the system architecture was designed, implemented and tested. A model for Coronary Heart Disease (CHD) was created and validated against empirical data. The system was successfully used to create and validate the CHD model. The initial validation results show concordance between the simulation results and the empirical data. We have demonstrated the ability to connect health policy-modellers and policy-makers in a unified system, thereby making population health models easier to share, maintain, reuse and deploy.
Jesus, Tiago S; Papadimitriou, Christina; Pinho, Cátia S; Hoenig, Helen
2018-06-01
To characterize the peer-reviewed quality improvement (QI) literature in rehabilitation. Five electronic databases were searched for English-language articles from 2010 to 2016. Keywords for QI and safety management were searched for in combination with keywords for rehabilitation content and journals. Secondary searches (eg, references-list scanning) were also performed. Two reviewers independently selected articles using working definitions of rehabilitation and QI study types; of 1016 references, 112 full texts were assessed for eligibility. Reported study characteristics including study focus, study setting, use of inferential statistics, stated limitations, and use of improvement cycles and theoretical models were extracted by 1 reviewer, with a second reviewer consulted whenever inferences or interpretation were involved. Fifty-nine empirical rehabilitation QI studies were found: 43 reporting on local QI activities, 7 reporting on QI effectiveness research, 8 reporting on QI facilitators or barriers, and 1 systematic review of a specific topic. The number of publications had significant yearly growth between 2010 and 2016 (P=.03). Among the 43 reports on local QI activities, 23.3% did not explicitly report any study limitations; 39.5% did not used inferential statistics to measure the QI impact; 95.3% did not cite/mention the appropriate reporting guidelines; only 18.6% reported multiple QI cycles; just over 50% reported using a model to guide the QI activity; and only 7% reported the use of a particular theoretical model. Study sites and focuses were diverse; however, nearly a third (30.2%) examined early mobilization in intensive care units. The number of empirical, peer-reviewed rehabilitation QI publications is growing but remains a tiny fraction of rehabilitation research publications. Rehabilitation QI studies could be strengthened by greater use of extant models and theory to guide the QI work, consistent reporting of study limitations, and use of inferential statistics. Copyright © 2017 American Congress of Rehabilitation Medicine. All rights reserved.
So, Jiyeon; Jeong, Se-Hoon; Hwang, Yoori
2017-04-01
The extant empirical research examining the effectiveness of statistical and exemplar-based health information is largely inconsistent. Under the premise that the inconsistency may be due to an unacknowledged moderator (O'Keefe, 2002), this study examined a moderating role of outcome-relevant involvement (Johnson & Eagly, 1989) in the effects of statistical and exemplified risk information on risk perception. Consistent with predictions based on elaboration likelihood model (Petty & Cacioppo, 1984), findings from an experiment (N = 237) concerning alcohol consumption risks showed that statistical risk information predicted risk perceptions of individuals with high, rather than low, involvement, while exemplified risk information predicted risk perceptions of those with low, rather than high, involvement. Moreover, statistical risk information contributed to negative attitude toward drinking via increased risk perception only for highly involved individuals, while exemplified risk information influenced the attitude through the same mechanism only for individuals with low involvement. Theoretical and practical implications for health risk communication are discussed.
Dendritic growth model of multilevel marketing
NASA Astrophysics Data System (ADS)
Pang, James Christopher S.; Monterola, Christopher P.
2017-02-01
Biologically inspired dendritic network growth is utilized to model the evolving connections of a multilevel marketing (MLM) enterprise. Starting from agents at random spatial locations, a network is formed by minimizing a distance cost function controlled by a parameter, termed the balancing factor bf, that weighs the wiring and the path length costs of connection. The paradigm is compared to an actual MLM membership data and is shown to be successful in statistically capturing the membership distribution, better than the previously reported agent based preferential attachment or analytic branching process models. Moreover, it recovers the known empirical statistics of previously studied MLM, specifically: (i) a membership distribution characterized by the existence of peak levels indicating limited growth, and (ii) an income distribution obeying the 80 - 20 Pareto principle. Extensive types of income distributions from uniform to Pareto to a "winner-take-all" kind are also modeled by varying bf. Finally, the robustness of our dendritic growth paradigm to random agent removals is explored and its implications to MLM income distributions are discussed.
Data-driven modeling of background and mine-related acidity and metals in river basins
Friedel, Michael J
2013-01-01
A novel application of self-organizing map (SOM) and multivariate statistical techniques is used to model the nonlinear interaction among basin mineral-resources, mining activity, and surface-water quality. First, the SOM is trained using sparse measurements from 228 sample sites in the Animas River Basin, Colorado. The model performance is validated by comparing stochastic predictions of basin-alteration assemblages and mining activity at 104 independent sites. The SOM correctly predicts (>98%) the predominant type of basin hydrothermal alteration and presence (or absence) of mining activity. Second, application of the Davies–Bouldin criteria to k-means clustering of SOM neurons identified ten unique environmental groups. Median statistics of these groups define a nonlinear water-quality response along the spatiotemporal hydrothermal alteration-mining gradient. These results reveal that it is possible to differentiate among the continuum between inputs of background and mine-related acidity and metals, and it provides a basis for future research and empirical model development.
NASA Astrophysics Data System (ADS)
Ghnimi, Thouraya; Hassini, Lamine; Bagane, Mohamed
2016-12-01
The aim of this work is to determine the desorption isotherms and the drying kinetics of bay laurel leaves ( Laurus Nobilis L.). The desorption isotherms were performed at three temperature levels: 50, 60 and 70 °C and at water activity ranging from 0.057 to 0.88 using the statistic gravimetric method. Five sorption models were used to fit desorption experimental isotherm data. It was found that Kuhn model offers the best fitting of experimental moisture isotherms in the mentioned investigated ranges of temperature and water activity. The Net isosteric heat of water desorption was evaluated using The Clausius-Clapeyron equation and was then best correlated to equilibrium moisture content by the empirical Tsami's equation. Thin layer convective drying curves of bay laurel leaves were obtained for temperatures of 45, 50, 60 and 70 °C, relative humidity of 5, 15, 30 and 45 % and air velocities of 1, 1.5 and 2 m/s. A non linear regression procedure of Levenberg-Marquardt was used to fit drying curves with five semi empirical mathematical models available in the literature, The R2 and χ2 were used to evaluate the goodness of fit of models to data. Based on the experimental drying curves the drying characteristic curve (DCC) has been established and fitted with a third degree polynomial function. It was found that the Midilli Kucuk model was the best semi-empirical model describing thin layer drying kinetics of bay laurel leaves. The bay laurel leaves effective moisture diffusivity and activation energy were also identified.
NASA Astrophysics Data System (ADS)
Chen, Lei; Kou, Yingxin; Li, Zhanwu; Xu, An; Wu, Cheng
2018-01-01
We build a complex networks model of combat System-of-Systems (SoS) based on empirical data from a real war-game, this model is a combination of command & control (C2) subnetwork, sensors subnetwork, influencers subnetwork and logistical support subnetwork, each subnetwork has idiographic components and statistical characteristics. The C2 subnetwork is the core of whole combat SoS, it has a hierarchical structure with no modularity, of which robustness is strong enough to maintain normal operation after any two nodes is destroyed; the sensors subnetwork and influencers subnetwork are like sense organ and limbs of whole combat SoS, they are both flat modular networks of which degree distribution obey GEV distribution and power-law distribution respectively. The communication network is the combination of all subnetworks, it is an assortative Small-World network with core-periphery structure, the Intelligence & Communication Stations/Command Center integrated with C2 nodes in the first three level act as the hub nodes in communication network, and all the fourth-level C2 nodes, sensors, influencers and logistical support nodes have communication capability, they act as the periphery nodes in communication network, its degree distribution obeys exponential distribution in the beginning, Gaussian distribution in the middle, and power-law distribution in the end, and its path length obeys GEV distribution. The betweenness centrality distribution, closeness centrality distribution and eigenvector centrality are also been analyzed to measure the vulnerability of nodes.
Zhang, Kui; Wiener, Howard; Beasley, Mark; George, Varghese; Amos, Christopher I; Allison, David B
2006-08-01
Individual genome scans for quantitative trait loci (QTL) mapping often suffer from low statistical power and imprecise estimates of QTL location and effect. This lack of precision yields large confidence intervals for QTL location, which are problematic for subsequent fine mapping and positional cloning. In prioritizing areas for follow-up after an initial genome scan and in evaluating the credibility of apparent linkage signals, investigators typically examine the results of other genome scans of the same phenotype and informally update their beliefs about which linkage signals in their scan most merit confidence and follow-up via a subjective-intuitive integration approach. A method that acknowledges the wisdom of this general paradigm but formally borrows information from other scans to increase confidence in objectivity would be a benefit. We developed an empirical Bayes analytic method to integrate information from multiple genome scans. The linkage statistic obtained from a single genome scan study is updated by incorporating statistics from other genome scans as prior information. This technique does not require that all studies have an identical marker map or a common estimated QTL effect. The updated linkage statistic can then be used for the estimation of QTL location and effect. We evaluate the performance of our method by using extensive simulations based on actual marker spacing and allele frequencies from available data. Results indicate that the empirical Bayes method can account for between-study heterogeneity, estimate the QTL location and effect more precisely, and provide narrower confidence intervals than results from any single individual study. We also compared the empirical Bayes method with a method originally developed for meta-analysis (a closely related but distinct purpose). In the face of marked heterogeneity among studies, the empirical Bayes method outperforms the comparator.
NASA Astrophysics Data System (ADS)
Alexander, J. S.; McElroy, B. J.
2015-12-01
Bar forms in wide sandy rivers store sediment, control channel hydraulics, and are fundamental units of riverine ecosystems. Bar form height is often used as a measure of channel depth in ancient fluvial deposits and is also a crucially important measure of habitat quality in modern rivers. In the Great Plains of North America, priority bird species use emergent bars to nest, and sandbar heights are a direct predictor of flood hazard for bird nests. Our current understanding of controls on bar height are limited to few datasets and ad hoc observations from specific settings. We here examine a new dataset of bar heights and explore models of bar growth. We present bar a height dataset from the Platte and Niobrara Rivers in Nebraska, and an unchannelized reach of the Missouri River along the Nebraska-South Dakota border. Bar height data are normalized by flow frequency, and we examine parsimonious statistical models between expected controls (depth, stage, discharge, flow duration, work etc.) and maximum bar heights. From this we generate empirical-statistical models of maximum bar height for wide, sand-bedded rivers in the Great Plains of the United States and rivers of similar morphology elsewhere. Migration of bar forms is driven by downstream slip-face additions of sediment sourced from their stoss sides, but bars also sequester sediment and grow vertically and longitudinally. We explore our empirical data with a geometric-kinematic model of bar growth driven by sediment transport from smaller-scale bedforms. Our goal is to understand physical limitations on bar growth and geometry, with implications for interpreting the rock record and predicting physically-driven riverine habitat variables.
NASA Technical Reports Server (NTRS)
Smith, Eric A.; Mugnai, Alberto; Cooper, Harry J.; Tripoli, Gregory J.; Xiang, Xuwu
1992-01-01
The relationship between emerging microwave brightness temperatures (T(B)s) and vertically distributed mixtures of liquid and frozen hydrometeors was investigated, using a cloud-radiation model, in order to establish the framework for a hybrid statistical-physical rainfall retrieval algorithm. Although strong relationships were found between the T(B) values and various rain parameters, these correlations are misleading in that the T(B)s are largely controlled by fluctuations in the ice-particle mixing ratios, which in turn are highly correlated to fluctuations in liquid-particle mixing ratios. However, the empirically based T(B)-rain-rate (T(B)-RR) algorithms can still be used as tools for estimating precipitation if the hydrometeor profiles used for T(B)-RR algorithms are not specified in an ad hoc fashion.
Improving Marine Ecosystem Models with Biochemical Tracers
NASA Astrophysics Data System (ADS)
Pethybridge, Heidi R.; Choy, C. Anela; Polovina, Jeffrey J.; Fulton, Elizabeth A.
2018-01-01
Empirical data on food web dynamics and predator-prey interactions underpin ecosystem models, which are increasingly used to support strategic management of marine resources. These data have traditionally derived from stomach content analysis, but new and complementary forms of ecological data are increasingly available from biochemical tracer techniques. Extensive opportunities exist to improve the empirical robustness of ecosystem models through the incorporation of biochemical tracer data and derived indices, an area that is rapidly expanding because of advances in analytical developments and sophisticated statistical techniques. Here, we explore the trophic information required by ecosystem model frameworks (species, individual, and size based) and match them to the most commonly used biochemical tracers (bulk tissue and compound-specific stable isotopes, fatty acids, and trace elements). Key quantitative parameters derived from biochemical tracers include estimates of diet composition, niche width, and trophic position. Biochemical tracers also provide powerful insight into the spatial and temporal variability of food web structure and the characterization of dominant basal and microbial food web groups. A major challenge in incorporating biochemical tracer data into ecosystem models is scale and data type mismatches, which can be overcome with greater knowledge exchange and numerical approaches that transform, integrate, and visualize data.
Photoresist and stochastic modeling
NASA Astrophysics Data System (ADS)
Hansen, Steven G.
2018-01-01
Analysis of physical modeling results can provide unique insights into extreme ultraviolet stochastic variation, which augment, and sometimes refute, conclusions based on physical intuition and even wafer experiments. Simulations verify the primacy of "imaging critical" counting statistics (photons, electrons, and net acids) and the image/blur-dependent dose sensitivity in describing the local edge or critical dimension variation. But the failure of simple counting when resist thickness is varied highlights a limitation of this exact analytical approach, so a calibratable empirical model offers useful simplicity and convenience. Results presented here show that a wide range of physical simulation results can be well matched by an empirical two-parameter model based on blurred image log-slope (ILS) for lines/spaces and normalized ILS for holes. These results are largely consistent with a wide range of published experimental results; however, there is some disagreement with the recently published dataset of De Bisschop. The present analysis suggests that the origin of this model failure is an unexpected blurred ILS:dose-sensitivity relationship failure in that resist process. It is shown that a photoresist mechanism based on high photodecomposable quencher loading and high quencher diffusivity can give rise to pitch-dependent blur, which may explain the discrepancy.
Zipf’s Law Arises Naturally When There Are Underlying, Unobserved Variables
Corradi, Nicola
2016-01-01
Zipf’s law, which states that the probability of an observation is inversely proportional to its rank, has been observed in many domains. While there are models that explain Zipf’s law in each of them, those explanations are typically domain specific. Recently, methods from statistical physics were used to show that a fairly broad class of models does provide a general explanation of Zipf’s law. This explanation rests on the observation that real world data is often generated from underlying causes, known as latent variables. Those latent variables mix together multiple models that do not obey Zipf’s law, giving a model that does. Here we extend that work both theoretically and empirically. Theoretically, we provide a far simpler and more intuitive explanation of Zipf’s law, which at the same time considerably extends the class of models to which this explanation can apply. Furthermore, we also give methods for verifying whether this explanation applies to a particular dataset. Empirically, these advances allowed us extend this explanation to important classes of data, including word frequencies (the first domain in which Zipf’s law was discovered), data with variable sequence length, and multi-neuron spiking activity. PMID:27997544
Lu, Ji; Pan, Junhao; Zhang, Qiang; Dubé, Laurette; Ip, Edward H
2015-01-01
With intensively collected longitudinal data, recent advances in the experience-sampling method (ESM) benefit social science empirical research, but also pose important methodological challenges. As traditional statistical models are not generally well equipped to analyze a system of variables that contain feedback loops, this paper proposes the utility of an extended hidden Markov model to model reciprocal the relationship between momentary emotion and eating behavior. This paper revisited an ESM data set (Lu, Huet, & Dube, 2011) that observed 160 participants' food consumption and momentary emotions 6 times per day in 10 days. Focusing on the analyses on feedback loop between mood and meal-healthiness decision, the proposed reciprocal Markov model (RMM) can accommodate both hidden ("general" emotional states: positive vs. negative state) and observed states (meal: healthier, same or less healthy than usual) without presuming independence between observations and smooth trajectories of mood or behavior changes. The results of RMM analyses illustrated the reciprocal chains of meal consumption and mood as well as the effect of contextual factors that moderate the interrelationship between eating and emotion. A simulation experiment that generated data consistent with the empirical study further demonstrated that the procedure is promising in terms of recovering the parameters.
Bagby, R Michael; Widiger, Thomas A
2018-01-01
The Five-Factor Model (FFM) is a dimensional model of general personality structure, consisting of the domains of neuroticism (or emotional instability), extraversion versus introversion, openness (or unconventionality), agreeableness versus antagonism, and conscientiousness (or constraint). The FFM is arguably the most commonly researched dimensional model of general personality structure. However, a notable limitation of existing measures of the FFM has been a lack of coverage of its maladaptive variants. A series of self-report inventories has been developed to assess for the maladaptive personality traits that define Diagnostic and Statistical Manual of Mental Disorders (fifth edition; DSM-5) Section II personality disorders (American Psychiatric Association [APA], 2013) from the perspective of the FFM. In this paper, we provide an introduction to this Special Section, presenting the rationale and empirical support for these measures and placing them in the historical context of the recent revision to the APA diagnostic manual. This introduction is followed by 5 papers that provide further empirical support for these measures and address current issues within the personality assessment literature. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
A Review of Multivariate Distributions for Count Data Derived from the Poisson Distribution.
Inouye, David; Yang, Eunho; Allen, Genevera; Ravikumar, Pradeep
2017-01-01
The Poisson distribution has been widely studied and used for modeling univariate count-valued data. Multivariate generalizations of the Poisson distribution that permit dependencies, however, have been far less popular. Yet, real-world high-dimensional count-valued data found in word counts, genomics, and crime statistics, for example, exhibit rich dependencies, and motivate the need for multivariate distributions that can appropriately model this data. We review multivariate distributions derived from the univariate Poisson, categorizing these models into three main classes: 1) where the marginal distributions are Poisson, 2) where the joint distribution is a mixture of independent multivariate Poisson distributions, and 3) where the node-conditional distributions are derived from the Poisson. We discuss the development of multiple instances of these classes and compare the models in terms of interpretability and theory. Then, we empirically compare multiple models from each class on three real-world datasets that have varying data characteristics from different domains, namely traffic accident data, biological next generation sequencing data, and text data. These empirical experiments develop intuition about the comparative advantages and disadvantages of each class of multivariate distribution that was derived from the Poisson. Finally, we suggest new research directions as explored in the subsequent discussion section.
Gene Level Meta-Analysis of Quantitative Traits by Functional Linear Models.
Fan, Ruzong; Wang, Yifan; Boehnke, Michael; Chen, Wei; Li, Yun; Ren, Haobo; Lobach, Iryna; Xiong, Momiao
2015-08-01
Meta-analysis of genetic data must account for differences among studies including study designs, markers genotyped, and covariates. The effects of genetic variants may differ from population to population, i.e., heterogeneity. Thus, meta-analysis of combining data of multiple studies is difficult. Novel statistical methods for meta-analysis are needed. In this article, functional linear models are developed for meta-analyses that connect genetic data to quantitative traits, adjusting for covariates. The models can be used to analyze rare variants, common variants, or a combination of the two. Both likelihood-ratio test (LRT) and F-distributed statistics are introduced to test association between quantitative traits and multiple variants in one genetic region. Extensive simulations are performed to evaluate empirical type I error rates and power performance of the proposed tests. The proposed LRT and F-distributed statistics control the type I error very well and have higher power than the existing methods of the meta-analysis sequence kernel association test (MetaSKAT). We analyze four blood lipid levels in data from a meta-analysis of eight European studies. The proposed methods detect more significant associations than MetaSKAT and the P-values of the proposed LRT and F-distributed statistics are usually much smaller than those of MetaSKAT. The functional linear models and related test statistics can be useful in whole-genome and whole-exome association studies. Copyright © 2015 by the Genetics Society of America.
Life Prediction of Large Lithium-Ion Battery Packs with Active and Passive Balancing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shi, Ying; Smith, Kandler A; Zane, Regan
Lithium-ion battery packs take a major part of large-scale stationary energy storage systems. One challenge in reducing battery pack cost is to reduce pack size without compromising pack service performance and lifespan. Prognostic life model can be a powerful tool to handle the state of health (SOH) estimate and enable active life balancing strategy to reduce cell imbalance and extend pack life. This work proposed a life model using both empirical and physical-based approaches. The life model described the compounding effect of different degradations on the entire cell with an empirical model. Then its lower-level submodels considered the complex physicalmore » links between testing statistics (state of charge level, C-rate level, duty cycles, etc.) and the degradation reaction rates with respect to specific aging mechanisms. The hybrid approach made the life model generic, robust and stable regardless of battery chemistry and application usage. The model was validated with a custom pack with both passive and active balancing systems implemented, which created four different aging paths in the pack. The life model successfully captured the aging trajectories of all four paths. The life model prediction errors on capacity fade and resistance growth were within +/-3% and +/-5% of the experiment measurements.« less
What You Learn is What You See: Using Eye Movements to Study Infant Cross-Situational Word Learning
Smith, Linda
2016-01-01
Recent studies show that both adults and young children possess powerful statistical learning capabilities to solve the word-to-world mapping problem. However, the underlying mechanisms that make statistical learning possible and powerful are not yet known. With the goal of providing new insights into this issue, the research reported in this paper used an eye tracker to record the moment-by-moment eye movement data of 14-month-old babies in statistical learning tasks. Various measures are applied to such fine-grained temporal data, such as looking duration and shift rate (the number of shifts in gaze from one visual object to the other) trial by trial, showing different eye movement patterns between strong and weak statistical learners. Moreover, an information-theoretic measure is developed and applied to gaze data to quantify the degree of learning uncertainty trial by trial. Next, a simple associative statistical learning model is applied to eye movement data and these simulation results are compared with empirical results from young children, showing strong correlations between these two. This suggests that an associative learning mechanism with selective attention can provide a cognitively plausible model of cross-situational statistical learning. The work represents the first steps to use eye movement data to infer underlying real-time processes in statistical word learning. PMID:22213894
Chen, Xiaohong; Fan, Yanqin; Pouzo, Demian; Ying, Zhiliang
2010-07-01
We study estimation and model selection of semiparametric models of multivariate survival functions for censored data, which are characterized by possibly misspecified parametric copulas and nonparametric marginal survivals. We obtain the consistency and root- n asymptotic normality of a two-step copula estimator to the pseudo-true copula parameter value according to KLIC, and provide a simple consistent estimator of its asymptotic variance, allowing for a first-step nonparametric estimation of the marginal survivals. We establish the asymptotic distribution of the penalized pseudo-likelihood ratio statistic for comparing multiple semiparametric multivariate survival functions subject to copula misspecification and general censorship. An empirical application is provided.
Chen, Xiaohong; Fan, Yanqin; Pouzo, Demian; Ying, Zhiliang
2013-01-01
We study estimation and model selection of semiparametric models of multivariate survival functions for censored data, which are characterized by possibly misspecified parametric copulas and nonparametric marginal survivals. We obtain the consistency and root-n asymptotic normality of a two-step copula estimator to the pseudo-true copula parameter value according to KLIC, and provide a simple consistent estimator of its asymptotic variance, allowing for a first-step nonparametric estimation of the marginal survivals. We establish the asymptotic distribution of the penalized pseudo-likelihood ratio statistic for comparing multiple semiparametric multivariate survival functions subject to copula misspecification and general censorship. An empirical application is provided. PMID:24790286
NASA Astrophysics Data System (ADS)
Boning, Duane S.; Chung, James E.
1998-11-01
Advanced process technology will require more detailed understanding and tighter control of variation in devices and interconnects. The purpose of statistical metrology is to provide methods to measure and characterize variation, to model systematic and random components of that variation, and to understand the impact of variation on both yield and performance of advanced circuits. Of particular concern are spatial or pattern-dependencies within individual chips; such systematic variation within the chip can have a much larger impact on performance than wafer-level random variation. Statistical metrology methods will play an important role in the creation of design rules for advanced technologies. For example, a key issue in multilayer interconnect is the uniformity of interlevel dielectric (ILD) thickness within the chip. For the case of ILD thickness, we describe phases of statistical metrology development and application to understanding and modeling thickness variation arising from chemical-mechanical polishing (CMP). These phases include screening experiments including design of test structures and test masks to gather electrical or optical data, techniques for statistical decomposition and analysis of the data, and approaches to calibrating empirical and physical variation models. These models can be integrated with circuit CAD tools to evaluate different process integration or design rule strategies. One focus for the generation of interconnect design rules are guidelines for the use of "dummy fill" or "metal fill" to improve the uniformity of underlying metal density and thus improve the uniformity of oxide thickness within the die. Trade-offs that can be evaluated via statistical metrology include the improvements to uniformity possible versus the effect of increased capacitance due to additional metal.
Yilmaz, A Erdem; Boncukcuoğlu, Recep; Kocakerim, M Muhtar
2007-06-01
In this study, it was investigated parameters affecting energy consumption in boron removal from boron containing wastewaters prepared synthetically, via electrocoagulation method. The solution pH, initial boron concentration, dose of supporting electrolyte, current density and temperature of solution were selected as experimental parameters affecting energy consumption. The obtained experimental results showed that boron removal efficiency reached up to 99% under optimum conditions, in which solution pH was 8.0, current density 6.0 mA/cm(2), initial boron concentration 100mg/L and solution temperature 293 K. The current density was an important parameter affecting energy consumption too. High current density applied to electrocoagulation cell increased energy consumption. Increasing solution temperature caused to decrease energy consumption that high temperature decreased potential applied under constant current density. That increasing initial boron concentration and dose of supporting electrolyte caused to increase specific conductivity of solution decreased energy consumption. As a result, it was seen that energy consumption for boron removal via electrocoagulation method could be minimized at optimum conditions. An empirical model was predicted by statistically. Experimentally obtained values were fitted with values predicted from empirical model being as following; [formula in text]. Unfortunately, the conditions obtained for optimum boron removal were not the conditions obtained for minimum energy consumption. It was determined that support electrolyte must be used for increase boron removal and decrease electrical energy consumption.
From empirical Bayes to full Bayes : methods for analyzing traffic safety data.
DOT National Transportation Integrated Search
2004-10-24
Traffic safety engineers are among the early adopters of Bayesian statistical tools for : analyzing crash data. As in many other areas of application, empirical Bayes methods were : their first choice, perhaps because they represent an intuitively ap...
ERIC Educational Resources Information Center
Wagler, Amy E.; Lesser, Lawrence M.
2018-01-01
The interaction between language and the learning of statistical concepts has been receiving increased attention. The Communication, Language, And Statistics Survey (CLASS) was developed in response to the need to focus on dynamics of language in light of the culturally and linguistically diverse environments of introductory statistics classrooms.…
Schwindt, Adam R.; Winkelman, Dana L.
2016-01-01
Urban freshwater streams in arid climates are wastewater effluent dominated ecosystems particularly impacted by bioactive chemicals including steroid estrogens that disrupt vertebrate reproduction. However, more understanding of the population and ecological consequences of exposure to wastewater effluent is needed. We used empirically derived vital rate estimates from a mesocosm study to develop a stochastic stage-structured population model and evaluated the effect of 17α-ethinylestradiol (EE2), the estrogen in human contraceptive pills, on fathead minnow Pimephales promelas stochastic population growth rate. Tested EE2 concentrations ranged from 3.2 to 10.9 ng L−1 and produced stochastic population growth rates (λ S ) below 1 at the lowest concentration, indicating potential for population decline. Declines in λ S compared to controls were evident in treatments that were lethal to adult males despite statistically insignificant effects on egg production and juvenile recruitment. In fact, results indicated that λ S was most sensitive to the survival of juveniles and female egg production. More broadly, our results document that population model results may differ even when empirically derived estimates of vital rates are similar among experimental treatments, and demonstrate how population models integrate and project the effects of stressors throughout the life cycle. Thus, stochastic population models can more effectively evaluate the ecological consequences of experimentally derived vital rates.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dierauf, Timothy; Kurtz, Sarah; Riley, Evan
This paper provides a recommended method for evaluating the AC capacity of a photovoltaic (PV) generating station. It also presents companion guidance on setting the facilitys capacity guarantee value. This is a principles-based approach that incorporates plant fundamental design parameters such as loss factors, module coefficients, and inverter constraints. This method has been used to prove contract guarantees for over 700 MW of installed projects. The method is transparent, and the results are deterministic. In contrast, current industry practices incorporate statistical regression where the empirical coefficients may only characterize the collected data. Though these methods may work well when extrapolationmore » is not required, there are other situations where the empirical coefficients may not adequately model actual performance.This proposed Fundamentals Approach method provides consistent results even where regression methods start to lose fidelity.« less
Atlas of susceptibility to pollution in marinas. Application to the Spanish coast.
Gómez, Aina G; Ondiviela, Bárbara; Fernández, María; Juanes, José A
2017-01-15
An atlas of susceptibility to pollution of 320 Spanish marinas is provided. Susceptibility is assessed through a simple, fast and low cost empirical method estimating the flushing capacity of marinas. The Complexity Tidal Range Index (CTRI) was selected among eleven empirical methods. The CTRI method was selected by means of statistical analyses because: it contributes to explain the system's variance; it is highly correlated to numerical model results; and, it is sensitive to marinas' location and typology. The process of implementation to the Spanish coast confirmed its usefulness, versatility and adaptability as a tool for the environmental management of marinas worldwide. The atlas of susceptibility, assessed through CTRI values, is an appropriate instrument to prioritize environmental and planning strategies at a regional scale. Copyright © 2016 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Wootten, A.; Dixon, K. W.; Lanzante, J. R.; Mcpherson, R. A.
2017-12-01
Empirical statistical downscaling (ESD) approaches attempt to refine global climate model (GCM) information via statistical relationships between observations and GCM simulations. The aim of such downscaling efforts is to create added-value climate projections by adding finer spatial detail and reducing biases. The results of statistical downscaling exercises are often used in impact assessments under the assumption that past performance provides an indicator of future results. Given prior research describing the danger of this assumption with regards to temperature, this study expands the perfect model experimental design from previous case studies to test the stationarity assumption with respect to precipitation. Assuming stationarity implies the performance of ESD methods are similar between the future projections and historical training. Case study results from four quantile-mapping based ESD methods demonstrate violations of the stationarity assumption for both central tendency and extremes of precipitation. These violations vary geographically and seasonally. For the four ESD methods tested the greatest challenges for downscaling of daily total precipitation projections occur in regions with limited precipitation and for extremes of precipitation along Southeast coastal regions. We conclude with a discussion of future expansion of the perfect model experimental design and the implications for improving ESD methods and providing guidance on the use of ESD techniques for impact assessments and decision-support.
Channel fading for mobile satellite communications using spread spectrum signaling and TDRSS
NASA Technical Reports Server (NTRS)
Jenkins, Jeffrey D.; Fan, Yiping; Osborne, William P.
1995-01-01
This paper will present some preliminary results from a propagation experiment which employed NASA's TDRSS and an 8 MHz chip rate spread spectrum signal. Channel fade statistics were measured and analyzed in 21 representative geographical locations covering urban/suburban, open plain, and forested areas. Cumulative distribution Functions (CDF's) of 12 individual locations are presented and classified based on location. Representative CDF's from each of these three types of terrain are summarized. These results are discussed, and the fade depths exceeded 10 percent of the time in three types of environments are tabulated. The spread spectrum fade statistics for tree-lined roads are compared with the Empirical Roadside Shadowing Model.
The Empirical Low Energy Ion Flux Model for the Terrestrial Magnetosphere
NASA Technical Reports Server (NTRS)
Blackwell, William C.; Minow, Joseph I.; Diekmann, Anne M.
2007-01-01
This document includes a viewgraph presentation plus the full paper presented at the conference. The Living With a Star Ion Flux Model (IFM) is a radiation environment risk mitigation tool that provides magnetospheric ion flux values for varying geomagnetic disturbance levels in the geospace environment. IFM incorporates flux observations from the Polar and Geotail spacecraft in a single statistical flux model. IFM is an engineering environment model which predicts the proton flux not only in the magnetosphere, but also in the solar wind and magnetosheath phenomenological regions. This paper describes the ion flux databases that allows for IFM output to be correlated with the geomagnetic activity level, as represented by the Kp index.
Fung, Monica; Kim, Jane; Marty, Francisco M; Schwarzinger, Michaël; Koo, Sophia
2015-01-01
Invasive fungal disease (IFD) causes significant morbidity and mortality in hematologic malignancy patients with high-risk febrile neutropenia (FN). These patients therefore often receive empirical antifungal therapy. Diagnostic test-guided pre-emptive antifungal therapy has been evaluated as an alternative treatment strategy in these patients. We conducted an electronic search for literature comparing empirical versus pre-emptive antifungal strategies in FN among adult hematologic malignancy patients. We systematically reviewed 9 studies, including randomized-controlled trials, cohort studies, and feasibility studies. Random and fixed-effect models were used to generate pooled relative risk estimates of IFD detection, IFD-related mortality, overall mortality, and rates and duration of antifungal therapy. Heterogeneity was measured via Cochran's Q test, I2 statistic, and between study τ2. Incorporating these parameters and direct costs of drugs and diagnostic testing, we constructed a comparative costing model for the two strategies. We conducted probabilistic sensitivity analysis on pooled estimates and one-way sensitivity analyses on other key parameters with uncertain estimates. Nine published studies met inclusion criteria. Compared to empirical antifungal therapy, pre-emptive strategies were associated with significantly lower antifungal exposure (RR 0.48, 95% CI 0.27-0.85) and duration without an increase in IFD-related mortality (RR 0.82, 95% CI 0.36-1.87) or overall mortality (RR 0.95, 95% CI 0.46-1.99). The pre-emptive strategy cost $324 less (95% credible interval -$291.88 to $418.65 pre-emptive compared to empirical) than the empirical approach per FN episode. However, the cost difference was influenced by relatively small changes in costs of antifungal therapy and diagnostic testing. Compared to empirical antifungal therapy, pre-emptive antifungal therapy in patients with high-risk FN may decrease antifungal use without increasing mortality. We demonstrate a state of economic equipoise between empirical and diagnostic-directed pre-emptive antifungal treatment strategies, influenced by small changes in cost of antifungal therapy and diagnostic testing, in the current literature. This work emphasizes the need for optimization of existing fungal diagnostic strategies, development of more efficient diagnostic strategies, and less toxic and more cost-effective antifungals.
Chenel, Marylore; Bouzom, François; Cazade, Fanny; Ogungbenro, Kayode; Aarons, Leon; Mentré, France
2008-12-01
To compare results of population PK analyses obtained with a full empirical design (FD) and an optimal sparse design (MD) in a Drug-Drug Interaction (DDI) study aiming to evaluate the potential CYP3A4 inhibitory effect of a drug in development, SX, on a reference substrate, midazolam (MDZ). Secondary aim was to evaluate the interaction of SX on MDZ in the in vivo study. Methods To compare designs, real data were analysed by population PK modelling technique using either FD or MD with NONMEM FOCEI for SX and with NONMEM FOCEI and MONOLIX SAEM for MDZ. When applicable a Wald test was performed to compare model parameter estimates, such as apparent clearance (CL/F), across designs. To conclude on the potential interaction of SX on MDZ PK, a Student paired test was applied to compare the individual PK parameters (i.e. log(AUC) and log(C(max))) obtained either by a non-compartmental approach (NCA) using FD or from empirical Bayes estimates (EBE) obtained after fitting the model separately on each treatment group using either FD or MD. For SX, whatever the design, CL/F was well estimated and no statistical differences were found between CL/F estimated values obtained with FD (CL/F = 8.2 l/h) and MD (CL/F = 8.2 l/h). For MDZ, only MONOLIX was able to estimate CL/F and to provide its standard error of estimation with MD. With MONOLIX, whatever the design and the administration setting, MDZ CL/F was well estimated and there were no statistical differences between CL/F estimated values obtained with FD (72 l/h and 40 l/h for MDZ alone and for MDZ with SX, respectively) and MD (77 l/h and 45 l/h for MDZ alone and for MDZ with SX, respectively). Whatever the approach, NCA or population PK modelling, and for the latter approach, whatever the design, MD or FD, comparison tests showed that there was a statistical difference (P < 0.0001) between individual MDZ log(AUC) obtained after MDZ administration alone and co-administered with SX. Regarding C(max), there was a statistical difference (P < 0.05) between individual MDZ log(C(max)) obtained under the 2 administration settings in all cases, except with the sparse design with MONOLIX. However, the effect on C(max) was small. Finally, SX was shown to be a moderate CYP3A4 inhibitor, which at therapeutic doses increased MDZ exposure by a factor of 2 in average and almost did not affect the C(max). The optimal sparse design enabled the estimation of CL/F of a CYP3A4 substrate and inhibitor when co-administered together and to show the interaction leading to the same conclusion as the full empirical design.
Chenel, Marylore; Bouzom, François; Cazade, Fanny; Ogungbenro, Kayode; Aarons, Leon; Mentré, France
2008-01-01
Purpose To compare results of population PK analyses obtained with a full empirical design (FD) and an optimal sparse design (MD) in a Drug-Drug Interaction (DDI) study aiming to evaluate the potential CYP3A4 inhibitory effect of a drug in development, SX, on a reference substrate, midazolam (MDZ). Secondary aim was to evaluate the interaction of SX on MDZ in the in vivo study. Methods To compare designs, real data were analysed by population PK modelling using either FD or MD with NONMEM FOCEI for SX and with NONMEM FOCEI and MONOLIX SAEM for MDZ. When applicable a Wald’s test was performed to compare model parameter estimates, such as apparent clearance (CL/F), across designs. To conclude on the potential interaction of SX on MDZ PK, a Student paired test was applied to compare the individual PK parameters (i.e. log(AUC) and log(Cmax)) obtained either by a non-compartmental approach (NCA) using FD or from empirical Bayes estimates (EBE) obtained after fitting the model separately on each treatment group using either FD or MD. Results For SX, whatever the design, CL/F was well estimated and no statistical differences were found between CL/F estimated values obtained with FD (CL/F = 8.2 L/h) and MD (CL/F = 8.2 L/h). For MDZ, only MONOLIX was able to estimate CL/F and to provide its standard error of estimation with MD. With MONOLIX, whatever the design and the administration setting, MDZ CL/F was well estimated and there were no statistical differences between CL/F estimated values obtained with FD (72 L/h and 40 L/h for MDZ alone and for MDZ with SX, respectively) and MD (77 L/h and 45 L/h for MDZ alone and for MDZ with SX, respectively). Whatever the approach, NCA or population PK modelling, and for the latter approach, whatever the design, MD or FD, comparison tests showed that there was a statistical difference (p<0.0001) between individual MDZ log(AUC) obtained after MDZ administration alone and co-administered with SX. Regarding Cmax, there was a statistical difference (p<0.05) between individual MDZ log(Cmax) obtained under the 2 administration settings in all cases, except with the sparse design with MONOLIX. However, the effect on Cmax was small. Finally, SX was shown to be a moderate CYP3A4 inhibitor, which at therapeutic doses increased MDZ exposure by a factor 2 in average and almost did not affect the Cmax. Conclusion The optimal sparse design enabled the estimation of CL/F of a CYP3A4 substrate and inhibitor when co-administered together and to show the interaction leading to the same conclusion than the full empirical design. PMID:19130187
NASA Astrophysics Data System (ADS)
Neumann, D. W.; Zagona, E. A.; Rajagopalan, B.
2005-12-01
Warm summer stream temperatures due to low flows and high air temperatures are a critical water quality problem in many western U.S. river basins because they impact threatened fish species' habitat. Releases from storage reservoirs and river diversions are typically driven by human demands such as irrigation, municipal and industrial uses and hydropower production. Historically, fish needs have not been formally incorporated in the operating procedures, which do not supply adequate flows for fish in the warmest, driest periods. One way to address this problem is for local and federal organizations to purchase water rights to be used to increase flows, hence decrease temperatures. A statistical model-predictive technique for efficient and effective use of a limited supply of fish water has been developed and incorporated in a Decision Support System (DSS) that can be used in an operations mode to effectively use water acquired to mitigate warm stream temperatures. The DSS is a rule-based system that uses the empirical, statistical predictive model to predict maximum daily stream temperatures based on flows that meet the non-fish operating criteria, and to compute reservoir releases of allocated fish water when predicted temperatures exceed fish habitat temperature targets with a user specified confidence of the temperature predictions. The empirical model is developed using a step-wise linear regression procedure to select significant predictors, and includes the computation of a prediction confidence interval to quantify the uncertainty of the prediction. The DSS also includes a strategy for managing a limited amount of water throughout the season based on degree-days in which temperatures are allowed to exceed the preferred targets for a limited number of days that can be tolerated by the fish. The DSS is demonstrated by an example application to the Truckee River near Reno, Nevada using historical flows from 1988 through 1994. In this case, the statistical model predicts maximum daily Truckee River stream temperatures in June, July, and August using predicted maximum daily air temperature and modeled average daily flow. The empirical relationship was created using a step-wise linear regression selection process using 1993 and 1994 data. The adjusted R2 value for this relationship is 0.91. The model is validated using historic data and demonstrated in a predictive mode with a prediction confidence interval to quantify the uncertainty. Results indicate that the DSS could substantially reduce the number of target temperature violations, i.e., stream temperatures exceeding the target temperature levels detrimental to fish habitat. The results show that large volumes of water are necessary to meet a temperature target with a high degree of certainty and violations may still occur if all of the stored water is depleted. A lower degree of certainty requires less water but there is a higher probability that the temperature targets will be exceeded. Addition of the rules that consider degree-days resulted in a reduction of the number of temperature violations without increasing the amount of water used. This work is described in detail in publications referenced in the URL below.
Inferring the parameters of a Markov process from snapshots of the steady state
NASA Astrophysics Data System (ADS)
Dettmer, Simon L.; Berg, Johannes
2018-02-01
We seek to infer the parameters of an ergodic Markov process from samples taken independently from the steady state. Our focus is on non-equilibrium processes, where the steady state is not described by the Boltzmann measure, but is generally unknown and hard to compute, which prevents the application of established equilibrium inference methods. We propose a quantity we call propagator likelihood, which takes on the role of the likelihood in equilibrium processes. This propagator likelihood is based on fictitious transitions between those configurations of the system which occur in the samples. The propagator likelihood can be derived by minimising the relative entropy between the empirical distribution and a distribution generated by propagating the empirical distribution forward in time. Maximising the propagator likelihood leads to an efficient reconstruction of the parameters of the underlying model in different systems, both with discrete configurations and with continuous configurations. We apply the method to non-equilibrium models from statistical physics and theoretical biology, including the asymmetric simple exclusion process (ASEP), the kinetic Ising model, and replicator dynamics.
Improving PAGER's real-time earthquake casualty and loss estimation toolkit: a challenge
Jaiswal, K.S.; Wald, D.J.
2012-01-01
We describe the on-going developments of PAGER’s loss estimation models, and discuss value-added web content that can be generated related to exposure, damage and loss outputs for a variety of PAGER users. These developments include identifying vulnerable building types in any given area, estimating earthquake-induced damage and loss statistics by building type, and developing visualization aids that help locate areas of concern for improving post-earthquake response efforts. While detailed exposure and damage information is highly useful and desirable, significant improvements are still necessary in order to improve underlying building stock and vulnerability data at a global scale. Existing efforts with the GEM’s GED4GEM and GVC consortia will help achieve some of these objectives. This will benefit PAGER especially in regions where PAGER’s empirical model is less-well constrained; there, the semi-empirical and analytical models will provide robust estimates of damage and losses. Finally, we outline some of the challenges associated with rapid casualty and loss estimation that we experienced while responding to recent large earthquakes worldwide.
Mediation analysis in nursing research: a methodological review.
Liu, Jianghong; Ulrich, Connie
2016-12-01
Mediation statistical models help clarify the relationship between independent predictor variables and dependent outcomes of interest by assessing the impact of third variables. This type of statistical analysis is applicable for many clinical nursing research questions, yet its use within nursing remains low. Indeed, mediational analyses may help nurse researchers develop more effective and accurate prevention and treatment programs as well as help bridge the gap between scientific knowledge and clinical practice. In addition, this statistical approach allows nurse researchers to ask - and answer - more meaningful and nuanced questions that extend beyond merely determining whether an outcome occurs. Therefore, the goal of this paper is to provide a brief tutorial on the use of mediational analyses in clinical nursing research by briefly introducing the technique and, through selected empirical examples from the nursing literature, demonstrating its applicability in advancing nursing science.
Statistical Learning and Language: An Individual Differences Study
ERIC Educational Resources Information Center
Misyak, Jennifer B.; Christiansen, Morten H.
2012-01-01
Although statistical learning and language have been assumed to be intertwined, this theoretical presupposition has rarely been tested empirically. The present study investigates the relationship between statistical learning and language using a within-subject design embedded in an individual-differences framework. Participants were administered…
Parks, Sean A; Parisien, Marc-André; Miller, Carol; Dobrowski, Solomon Z
2014-01-01
Numerous theoretical and empirical studies have shown that wildfire activity (e.g., area burned) at regional to global scales may be limited at the extremes of environmental gradients such as productivity or moisture. Fire activity, however, represents only one component of the fire regime, and no studies to date have characterized fire severity along such gradients. Given the importance of fire severity in dictating ecological response to fire, this is a considerable knowledge gap. For the western US, we quantify relationships between climate and the fire regime by empirically describing both fire activity and severity along two climatic water balance gradients, actual evapotranspiration (AET) and water deficit (WD), that can be considered proxies for fuel amount and fuel moisture, respectively. We also concurrently summarize fire activity and severity among ecoregions, providing an empirically based description of the geographic distribution of fire regimes. Our results show that fire activity in the western US increases with fuel amount (represented by AET) but has a unimodal (i.e., humped) relationship with fuel moisture (represented by WD); fire severity increases with fuel amount and fuel moisture. The explicit links between fire regime components and physical environmental gradients suggest that multivariable statistical models can be generated to produce an empirically based fire regime map for the western US. Such models will potentially enable researchers to anticipate climate-mediated changes in fire recurrence and its impacts based on gridded spatial data representing future climate scenarios.
Parks, Sean A.; Parisien, Marc-André; Miller, Carol; Dobrowski, Solomon Z.
2014-01-01
Numerous theoretical and empirical studies have shown that wildfire activity (e.g., area burned) at regional to global scales may be limited at the extremes of environmental gradients such as productivity or moisture. Fire activity, however, represents only one component of the fire regime, and no studies to date have characterized fire severity along such gradients. Given the importance of fire severity in dictating ecological response to fire, this is a considerable knowledge gap. For the western US, we quantify relationships between climate and the fire regime by empirically describing both fire activity and severity along two climatic water balance gradients, actual evapotranspiration (AET) and water deficit (WD), that can be considered proxies for fuel amount and fuel moisture, respectively. We also concurrently summarize fire activity and severity among ecoregions, providing an empirically based description of the geographic distribution of fire regimes. Our results show that fire activity in the western US increases with fuel amount (represented by AET) but has a unimodal (i.e., humped) relationship with fuel moisture (represented by WD); fire severity increases with fuel amount and fuel moisture. The explicit links between fire regime components and physical environmental gradients suggest that multivariable statistical models can be generated to produce an empirically based fire regime map for the western US. Such models will potentially enable researchers to anticipate climate-mediated changes in fire recurrence and its impacts based on gridded spatial data representing future climate scenarios. PMID:24941290
Key issues review: evolution on rugged adaptive landscapes
NASA Astrophysics Data System (ADS)
Obolski, Uri; Ram, Yoav; Hadany, Lilach
2018-01-01
Adaptive landscapes represent a mapping between genotype and fitness. Rugged adaptive landscapes contain two or more adaptive peaks: allele combinations with higher fitness than any of their neighbors in the genetic space. How do populations evolve on such rugged landscapes? Evolutionary biologists have struggled with this question since it was first introduced in the 1930s by Sewall Wright. Discoveries in the fields of genetics and biochemistry inspired various mathematical models of adaptive landscapes. The development of landscape models led to numerous theoretical studies analyzing evolution on rugged landscapes under different biological conditions. The large body of theoretical work suggests that adaptive landscapes are major determinants of the progress and outcome of evolutionary processes. Recent technological advances in molecular biology and microbiology allow experimenters to measure adaptive values of large sets of allele combinations and construct empirical adaptive landscapes for the first time. Such empirical landscapes have already been generated in bacteria, yeast, viruses, and fungi, and are contributing to new insights about evolution on adaptive landscapes. In this Key Issues Review we will: (i) introduce the concept of adaptive landscapes; (ii) review the major theoretical studies of evolution on rugged landscapes; (iii) review some of the recently obtained empirical adaptive landscapes; (iv) discuss recent mathematical and statistical analyses motivated by empirical adaptive landscapes, as well as provide the reader with instructions and source code to implement simulations of evolution on adaptive landscapes; and (v) discuss possible future directions for this exciting field.
Modelling Size Structured Food Webs Using a Modified Niche Model with Two Predator Traits
Klecka, Jan
2014-01-01
The structure of food webs is frequently described using phenomenological stochastic models. A prominent example, the niche model, was found to produce artificial food webs resembling real food webs according to a range of summary statistics. However, the size structure of food webs generated by the niche model and real food webs has not yet been rigorously compared. To fill this void, I use a body mass based version of the niche model and compare prey-predator body mass allometry and predator-prey body mass ratios predicted by the model to empirical data. The results show that the model predicts weaker size structure than observed in many real food webs. I introduce a modified version of the niche model which allows to control the strength of size-dependence of predator-prey links. In this model, optimal prey body mass depends allometrically on predator body mass and on a second trait, such as foraging mode. These empirically motivated extensions of the model allow to represent size structure of real food webs realistically and can be used to generate artificial food webs varying in several aspects of size structure in a controlled way. Hence, by explicitly including the role of species traits, this model provides new opportunities for simulating the consequences of size structure for food web dynamics and stability. PMID:25119999
NASA Astrophysics Data System (ADS)
Aochi, Hideo; Douglas, John; Ulrich, Thomas
2017-03-01
We compare ground motions simulated from dynamic rupture scenarios, for the seismic gap along the North Anatolian Fault under the Marmara Sea (Turkey), to estimates from empirical ground motion prediction equations (GMPEs). Ground motions are simulated using a finite difference method and a 3-D model of the local crustal structure. They are analyzed at more than a thousand locations in terms of horizontal peak ground velocity. Characteristics of probable earthquake scenarios are strongly dependent on the hypothesized level of accumulated stress, in terms of a normalized stress parameter T. With respect to the GMPEs, it is found that simulations for many scenarios systematically overestimate the ground motions at all distances. Simulations for only some scenarios, corresponding to moderate stress accumulation, match the estimates from the GMPEs. The difference between the simulations and the GMPEs is used to quantify the relative probabilities of each scenario and, therefore, to revise the probability of the stress field. A magnitude Mw7+ operating at moderate prestress field (0.6 < T ≤ 0.7) is statistically more probable, as previously assumed in the logic tree of probabilistic assessment of rupture scenarios. This approach of revising the mechanical hypothesis by means of comparison to an empirical statistical model (e.g., a GMPE) is useful not only for practical seismic hazard assessments but also to understand crustal dynamics.
An Empirical Bayes Approach to Mantel-Haenszel DIF Analysis.
ERIC Educational Resources Information Center
Zwick, Rebecca; Thayer, Dorothy T.; Lewis, Charles
1999-01-01
Developed an empirical Bayes enhancement to Mantel-Haenszel (MH) analysis of differential item functioning (DIF) in which it is assumed that the MH statistics are normally distributed and that the prior distribution of underlying DIF parameters is also normal. (Author/SLD)
NASA Astrophysics Data System (ADS)
Loomis, John
2003-04-01
Past recreation studies have noted that on-site or visitor intercept surveys are subject to over-sampling of avid users (i.e., endogenous stratification) and have offered econometric solutions to correct for this. However, past papers do not estimate the empirical magnitude of the bias in benefit estimates with a real data set, nor do they compare the corrected estimates to benefit estimates derived from a population sample. This paper empirically examines the magnitude of the recreation benefits per trip bias by comparing estimates from an on-site river visitor intercept survey to a household survey. The difference in average benefits is quite large, with the on-site visitor survey yielding 24 per day trip, while the household survey yields 9.67 per day trip. A simple econometric correction for endogenous stratification in our count data model lowers the benefit estimate to $9.60 per day trip, a mean value nearly identical and not statistically different from the household survey estimate.
Turbulent Statistics From Time-Resolved PIV Measurements of a Jet Using Empirical Mode Decomposition
NASA Technical Reports Server (NTRS)
Dahl, Milo D.
2013-01-01
Empirical mode decomposition is an adaptive signal processing method that when applied to a broadband signal, such as that generated by turbulence, acts as a set of band-pass filters. This process was applied to data from time-resolved, particle image velocimetry measurements of subsonic jets prior to computing the second-order, two-point, space-time correlations from which turbulent phase velocities and length and time scales could be determined. The application of this method to large sets of simultaneous time histories is new. In this initial study, the results are relevant to acoustic analogy source models for jet noise prediction. The high frequency portion of the results could provide the turbulent values for subgrid scale models for noise that is missed in large-eddy simulations. The results are also used to infer that the cross-correlations between different components of the decomposed signals at two points in space, neglected in this initial study, are important.
Turbulent Statistics from Time-Resolved PIV Measurements of a Jet Using Empirical Mode Decomposition
NASA Technical Reports Server (NTRS)
Dahl, Milo D.
2012-01-01
Empirical mode decomposition is an adaptive signal processing method that when applied to a broadband signal, such as that generated by turbulence, acts as a set of band-pass filters. This process was applied to data from time-resolved, particle image velocimetry measurements of subsonic jets prior to computing the second-order, two-point, space-time correlations from which turbulent phase velocities and length and time scales could be determined. The application of this method to large sets of simultaneous time histories is new. In this initial study, the results are relevant to acoustic analogy source models for jet noise prediction. The high frequency portion of the results could provide the turbulent values for subgrid scale models for noise that is missed in large-eddy simulations. The results are also used to infer that the cross-correlations between different components of the decomposed signals at two points in space, neglected in this initial study, are important.
NASA Astrophysics Data System (ADS)
Li, Hao; Lei, Ming
2018-02-01
For the carbon market, good trading mechanism is the basis for the healthy development of the carbon trading market. In order to explore the core problem of carbon price formation, our research explores the influencing factors of the price of carbon trading market. After the preliminary statistical analysis, our study found that Hubei Province is in the leading position among seven pilots in the carbon trading volume and the transaction, so our study of carbon price takes Hubei Province as sample of the empirical research. Multi-time series model and ARCH model analysis method are used in the research, we use the data of Hubei carbon trading pilot from June 2014 to December 2016 to carry out empirical research, the results found that industrial income, energy price, government intervention and the number of participating corporation have significant effect on the carbon price, which provides a meaningful reference for the other pilots in-depth study, as well as the construction of a national carbon trading market.
Listening to hypochondriasis and hearing health anxiety.
Braddock, Autumn E; Abramowitz, Jonathan S
2006-09-01
Although hypochondriasis is categorized as a somatoform disorder in the Diagnostic and Statistical Manual of Mental Disorders Fourth Edition--Text Revision (DSM-IV-TR) due to excessive focus on bodily symptoms for at least 6 months, a contemporary conceptualization suggests that hypochondriasis represents an intense form of health anxiety. This article discusses the clinical presentation of hypochondriasis, etiological underpinnings and multiple maintaining factors, including physiological, cognitive and behavioral components. A cognitive-behavioral model of hypochondriasis as health anxiety and the empirically supported treatment based on the model are articulated. Future directions and informational resources are provided for both clinicians and patients.
An Application of Epidemiological Modeling to Information Diffusion
NASA Astrophysics Data System (ADS)
McCormack, Robert; Salter, William
Messages often spread within a population through unofficial - particularly web-based - media. Such ideas have been termed "memes." To impede the flow of terrorist messages and to promote counter messages within a population, intelligence analysts must understand how messages spread. We used statistical language processing technologies to operationalize "memes" as latent topics in electronic text and applied epidemiological techniques to describe and analyze patterns of message propagation. We developed our methods and applied them to English-language newspapers and blogs in the Arab world. We found that a relatively simple epidemiological model can reproduce some dynamics of observed empirical relationships.
Zhu, Yenan; Hsieh, Yee-Hsee; Dhingra, Rishi R; Dick, Thomas E; Jacono, Frank J; Galán, Roberto F
2013-02-01
Interactions between oscillators can be investigated with standard tools of time series analysis. However, these methods are insensitive to the directionality of the coupling, i.e., the asymmetry of the interactions. An elegant alternative was proposed by Rosenblum and collaborators [M. G. Rosenblum, L. Cimponeriu, A. Bezerianos, A. Patzak, and R. Mrowka, Phys. Rev. E 65, 041909 (2002); M. G. Rosenblum and A. S. Pikovsky, Phys. Rev. E 64, 045202 (2001)] which consists in fitting the empirical phases to a generic model of two weakly coupled phase oscillators. This allows one to obtain the interaction functions defining the coupling and its directionality. A limitation of this approach is that a solution always exists in the least-squares sense, even in the absence of coupling. To preclude spurious results, we propose a three-step protocol: (1) Determine if a statistical dependency exists in the data by evaluating the mutual information of the phases; (2) if so, compute the interaction functions of the oscillators; and (3) validate the empirical oscillator model by comparing the joint probability of the phases obtained from simulating the model with that of the empirical phases. We apply this protocol to a model of two coupled Stuart-Landau oscillators and show that it reliably detects genuine coupling. We also apply this protocol to investigate cardiorespiratory coupling in anesthetized rats. We observe reciprocal coupling between respiration and heartbeat and that the influence of respiration on the heartbeat is generally much stronger than vice versa. In addition, we find that the vagus nerve mediates coupling in both directions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lovejoy, S., E-mail: lovejoy@physics.mcgill.ca; Lima, M. I. P. de; Department of Civil Engineering, University of Coimbra, 3030-788 Coimbra
2015-07-15
Over the range of time scales from about 10 days to 30–100 years, in addition to the familiar weather and climate regimes, there is an intermediate “macroweather” regime characterized by negative temporal fluctuation exponents: implying that fluctuations tend to cancel each other out so that averages tend to converge. We show theoretically and numerically that macroweather precipitation can be modeled by a stochastic weather-climate model (the Climate Extended Fractionally Integrated Flux, model, CEFIF) first proposed for macroweather temperatures and we show numerically that a four parameter space-time CEFIF model can approximately reproduce eight or so empirical space-time exponents. In spitemore » of this success, CEFIF is theoretically and numerically difficult to manage. We therefore propose a simplified stochastic model in which the temporal behavior is modeled as a fractional Gaussian noise but the spatial behaviour as a multifractal (climate) cascade: a spatial extension of the recently introduced ScaLIng Macroweather Model, SLIMM. Both the CEFIF and this spatial SLIMM model have a property often implicitly assumed by climatologists that climate statistics can be “homogenized” by normalizing them with the standard deviation of the anomalies. Physically, it means that the spatial macroweather variability corresponds to different climate zones that multiplicatively modulate the local, temporal statistics. This simplified macroweather model provides a framework for macroweather forecasting that exploits the system's long range memory and spatial correlations; for it, the forecasting problem has been solved. We test this factorization property and the model with the help of three centennial, global scale precipitation products that we analyze jointly in space and in time.« less
Zero-state Markov switching count-data models: an empirical assessment.
Malyshkina, Nataliya V; Mannering, Fred L
2010-01-01
In this study, a two-state Markov switching count-data model is proposed as an alternative to zero-inflated models to account for the preponderance of zeros sometimes observed in transportation count data, such as the number of accidents occurring on a roadway segment over some period of time. For this accident-frequency case, zero-inflated models assume the existence of two states: one of the states is a zero-accident count state, which has accident probabilities that are so low that they cannot be statistically distinguished from zero, and the other state is a normal-count state, in which counts can be non-negative integers that are generated by some counting process, for example, a Poisson or negative binomial. While zero-inflated models have come under some criticism with regard to accident-frequency applications - one fact is undeniable - in many applications they provide a statistically superior fit to the data. The Markov switching approach we propose seeks to overcome some of the criticism associated with the zero-accident state of the zero-inflated model by allowing individual roadway segments to switch between zero and normal-count states over time. An important advantage of this Markov switching approach is that it allows for the direct statistical estimation of the specific roadway-segment state (i.e., zero-accident or normal-count state) whereas traditional zero-inflated models do not. To demonstrate the applicability of this approach, a two-state Markov switching negative binomial model (estimated with Bayesian inference) and standard zero-inflated negative binomial models are estimated using five-year accident frequencies on Indiana interstate highway segments. It is shown that the Markov switching model is a viable alternative and results in a superior statistical fit relative to the zero-inflated models.
2012-01-01
Background Patient Safety Indicators (PSI) are being modestly used in Spain, somewhat due to concerns on their empirical properties. This paper provides evidence by answering three questions: a) Are PSI differences across hospitals systematic -rather than random?; b) Do PSI measure differences among hospital-providers -as opposed to differences among patients?; and, c) Are measurements able to detect hospitals with a higher than "expected" number of cases? Methods An empirical validation study on administrative data was carried out. All 2005 and 2006 publicly-funded hospital discharges were used to retrieve eligible cases of five PSI: Death in low-mortality DRGs (MLM); decubitus ulcer (DU); postoperative pulmonary embolism or deep-vein thrombosis (PE-DVT); catheter-related infections (CRI), and postoperative sepsis (PS). Empirical Bayes statistic (EB) was used to estimate whether the variation was systematic; logistic-multilevel modelling determined what proportion of the variation was explained by the hospital; and, shrunken residuals, as provided by multilevel modelling, were plotted to flag hospitals performing worse than expected. Results Variation across hospitals was observed to be systematic in all indicators, with EB values ranging from 0.19 (CI95%:0.12 to 0.28) in PE-DVT to 0.34 (CI95%:0.25 to 0.45) in DU. A significant proportion of the variance was explained by the hospital, once patient case-mix was adjusted: from a 6% in MLM (CI95%:3% to 11%) to a 24% (CI95%:20% to 30%) in CRI. All PSI were able to flag hospitals with rates over the expected, although this capacity decreased when the largest hospitals were analysed. Conclusion Five PSI showed reasonable empirical properties to screen healthcare performance in Spanish hospitals, particularly in the largest ones. PMID:22369291
Assessment of rockfall susceptibility by integrating statistical and physically-based approaches
NASA Astrophysics Data System (ADS)
Frattini, Paolo; Crosta, Giovanni; Carrara, Alberto; Agliardi, Federico
In Val di Fassa (Dolomites, Eastern Italian Alps) rockfalls constitute the most significant gravity-induced natural disaster that threatens both the inhabitants of the valley, who are few, and the thousands of tourists who populate the area in summer and winter. To assess rockfall susceptibility, we developed an integrated statistical and physically-based approach that aimed to predict both the susceptibility to onset and the probability that rockfalls will attain specific reaches. Through field checks and multi-temporal aerial photo-interpretation, we prepared a detailed inventory of both rockfall source areas and associated scree-slope deposits. Using an innovative technique based on GIS tools and a 3D rockfall simulation code, grid cells pertaining to the rockfall source-area polygons were classified as active or inactive, based on the state of activity of the associated scree-slope deposits. The simulation code allows one to link each source grid cell with scree deposit polygons by calculating the trajectory of each simulated launch of blocks. By means of discriminant analysis, we then identified the mix of environmental variables that best identifies grid cells with low or high susceptibility to rockfalls. Among these variables, structural setting, land use, and morphology were the most important factors that led to the initiation of rockfalls. We developed 3D simulation models of the runout distance, intensity and frequency of rockfalls, whose source grid cells corresponded either to the geomorphologically-defined source polygons ( geomorphological scenario) or to study area grid cells with slope angle greater than an empirically-defined value of 37° ( empirical scenario). For each scenario, we assigned to the source grid cells an either fixed or variable onset susceptibility; the latter was derived from the discriminant model group (active/inactive) membership probabilities. Comparison of these four models indicates that the geomorphological scenario with variable onset susceptibility appears to be the most realistic model. Nevertheless, political and legal issues seem to guide local administrators, who tend to select the more conservative empirically-based scenario as a land-planning tool.
Kolmogorov-Smirnov test for spatially correlated data
Olea, R.A.; Pawlowsky-Glahn, V.
2009-01-01
The Kolmogorov-Smirnov test is a convenient method for investigating whether two underlying univariate probability distributions can be regarded as undistinguishable from each other or whether an underlying probability distribution differs from a hypothesized distribution. Application of the test requires that the sample be unbiased and the outcomes be independent and identically distributed, conditions that are violated in several degrees by spatially continuous attributes, such as topographical elevation. A generalized form of the bootstrap method is used here for the purpose of modeling the distribution of the statistic D of the Kolmogorov-Smirnov test. The innovation is in the resampling, which in the traditional formulation of bootstrap is done by drawing from the empirical sample with replacement presuming independence. The generalization consists of preparing resamplings with the same spatial correlation as the empirical sample. This is accomplished by reading the value of unconditional stochastic realizations at the sampling locations, realizations that are generated by simulated annealing. The new approach was tested by two empirical samples taken from an exhaustive sample closely following a lognormal distribution. One sample was a regular, unbiased sample while the other one was a clustered, preferential sample that had to be preprocessed. Our results show that the p-value for the spatially correlated case is always larger that the p-value of the statistic in the absence of spatial correlation, which is in agreement with the fact that the information content of an uncorrelated sample is larger than the one for a spatially correlated sample of the same size. ?? Springer-Verlag 2008.
NASA Astrophysics Data System (ADS)
Bergant, Klemen; Kajfež-Bogataj, Lučka; Črepinšek, Zalika
2002-02-01
Phenological observations are a valuable source of information for investigating the relationship between climate variation and plant development. Potential climate change in the future will shift the occurrence of phenological phases. Information about future climate conditions is needed in order to estimate this shift. General circulation models (GCM) provide the best information about future climate change. They are able to simulate reliably the most important mean features on a large scale, but they fail on a regional scale because of their low spatial resolution. A common approach to bridging the scale gap is statistical downscaling, which was used to relate the beginning of flowering of Taraxacum officinale in Slovenia with the monthly mean near-surface air temperature for January, February and March in Central Europe. Statistical models were developed and tested with NCAR/NCEP Reanalysis predictor data and EARS predictand data for the period 1960-1999. Prior to developing statistical models, empirical orthogonal function (EOF) analysis was employed on the predictor data. Multiple linear regression was used to relate the beginning of flowering with expansion coefficients of the first three EOF for the Janauary, Febrauary and March air temperatures, and a strong correlation was found between them. Developed statistical models were employed on the results of two GCM (HadCM3 and ECHAM4/OPYC3) to estimate the potential shifts in the beginning of flowering for the periods 1990-2019 and 2020-2049 in comparison with the period 1960-1989. The HadCM3 model predicts, on average, 4 days earlier occurrence and ECHAM4/OPYC3 5 days earlier occurrence of flowering in the period 1990-2019. The analogous results for the period 2020-2049 are a 10- and 11-day earlier occurrence.
Generating an Empirical Probability Distribution for the Andrews-Pregibon Statistic.
ERIC Educational Resources Information Center
Jarrell, Michele G.
A probability distribution was developed for the Andrews-Pregibon (AP) statistic. The statistic, developed by D. F. Andrews and D. Pregibon (1978), identifies multivariate outliers. It is a ratio of the determinant of the data matrix with an observation deleted to the determinant of the entire data matrix. Although the AP statistic has been used…
Approximate Model of Zone Sedimentation
NASA Astrophysics Data System (ADS)
Dzianik, František
2011-12-01
The process of zone sedimentation is affected by many factors that are not possible to express analytically. For this reason, the zone settling is evaluated in practice experimentally or by application of an empirical mathematical description of the process. The paper presents the development of approximate model of zone settling, i.e. the general function which should properly approximate the behaviour of the settling process within its entire range and at the various conditions. Furthermore, the specification of the model parameters by the regression analysis of settling test results is shown. The suitability of the model is reviewed by graphical dependencies and by statistical coefficients of correlation. The approximate model could by also useful on the simplification of process design of continual settling tanks and thickeners.
Introduction to the special section on mixture modeling in personality assessment.
Wright, Aidan G C; Hallquist, Michael N
2014-01-01
Latent variable models offer a conceptual and statistical framework for evaluating the underlying structure of psychological constructs, including personality and psychopathology. Complex structures that combine or compare categorical and dimensional latent variables can be accommodated using mixture modeling approaches, which provide a powerful framework for testing nuanced theories about psychological structure. This special series includes introductory primers on cross-sectional and longitudinal mixture modeling, in addition to empirical examples applying these techniques to real-world data collected in clinical settings. This group of articles is designed to introduce personality assessment scientists and practitioners to a general latent variable framework that we hope will stimulate new research and application of mixture models to the assessment of personality and its pathology.
Comparison of field-aligned currents at ionospheric and magnetospheric altitudes
NASA Technical Reports Server (NTRS)
Spence, H. E.; Kivelson, M. G.; Walker, R. J.
1988-01-01
Using the empirical terrestrial magnetospheric magnetic field models of Tsyganenko and Usmanov (1982) and Tsyganenko (1987) the average field-aligned currents (FACs) in the magnetosphere were determined as a function of the Kp index. Three major model FAC systems were identified, namely, the dayside region 1, the nightside region 1, and the nightside polar cap. The models provide information about the sources of the current systems. Mapped ionospheric model FACs are compared with low-altitude measurements obtained by the spacecraft. It is found that low-altitude data can reveal either classic region 1/2 or more highly structured FAC patterns. Therefore, statistical results either obtained from observations or inferred from models are expected to be averages over temporally and spatially shifting patterns.
A review of failure models for unidirectional ceramic matrix composites under monotonic loads
NASA Technical Reports Server (NTRS)
Tripp, David E.; Hemann, John H.; Gyekenyesi, John P.
1989-01-01
Ceramic matrix composites offer significant potential for improving the performance of turbine engines. In order to achieve their potential, however, improvements in design methodology are needed. In the past most components using structural ceramic matrix composites were designed by trial and error since the emphasis of feasibility demonstration minimized the development of mathematical models. To understand the key parameters controlling response and the mechanics of failure, the development of structural failure models is required. A review of short term failure models with potential for ceramic matrix composite laminates under monotonic loads is presented. Phenomenological, semi-empirical, shear-lag, fracture mechanics, damage mechanics, and statistical models for the fast fracture analysis of continuous fiber unidirectional ceramic matrix composites under monotonic loads are surveyed.
Feder, Paul I; Ma, Zhenxu J; Bull, Richard J; Teuschler, Linda K; Rice, Glenn
2009-01-01
In chemical mixtures risk assessment, the use of dose-response data developed for one mixture to estimate risk posed by a second mixture depends on whether the two mixtures are sufficiently similar. While evaluations of similarity may be made using qualitative judgments, this article uses nonparametric statistical methods based on the "bootstrap" resampling technique to address the question of similarity among mixtures of chemical disinfectant by-products (DBP) in drinking water. The bootstrap resampling technique is a general-purpose, computer-intensive approach to statistical inference that substitutes empirical sampling for theoretically based parametric mathematical modeling. Nonparametric, bootstrap-based inference involves fewer assumptions than parametric normal theory based inference. The bootstrap procedure is appropriate, at least in an asymptotic sense, whether or not the parametric, distributional assumptions hold, even approximately. The statistical analysis procedures in this article are initially illustrated with data from 5 water treatment plants (Schenck et al., 2009), and then extended using data developed from a study of 35 drinking-water utilities (U.S. EPA/AMWA, 1989), which permits inclusion of a greater number of water constituents and increased structure in the statistical models.
Patel, Ashaben; Erb, Steven M; Strange, Linda; Shukla, Ravi S; Kumru, Ozan S; Smith, Lee; Nelson, Paul; Joshi, Sangeeta B; Livengood, Jill A; Volkin, David B
2018-05-24
A combination experimental approach, utilizing semi-empirical excipient screening followed by statistical modeling using design of experiments (DOE), was undertaken to identify stabilizing candidate formulations for a lyophilized live attenuated Flavivirus vaccine candidate. Various potential pharmaceutical compounds used in either marketed or investigative live attenuated viral vaccine formulations were first identified. The ability of additives from different categories of excipients, either alone or in combination, were then evaluated for their ability to stabilize virus against freeze-thaw, freeze-drying, and accelerated storage (25°C) stresses by measuring infectious virus titer. An exploratory data analysis and predictive DOE modeling approach was subsequently undertaken to gain a better understanding of the interplay between the key excipients and stability of virus as well as to determine which combinations were interacting to improve virus stability. The lead excipient combinations were identified and tested for stabilizing effects using a tetravalent mixture of viruses in accelerated and real time (2-8°C) stability studies. This work demonstrates the utility of combining semi-empirical excipient screening and DOE experimental design strategies in the formulation development of lyophilized live attenuated viral vaccine candidates. Copyright © 2017 Elsevier Ltd. All rights reserved.
Accounting for Sampling Error in Genetic Eigenvalues Using Random Matrix Theory.
Sztepanacz, Jacqueline L; Blows, Mark W
2017-07-01
The distribution of genetic variance in multivariate phenotypes is characterized by the empirical spectral distribution of the eigenvalues of the genetic covariance matrix. Empirical estimates of genetic eigenvalues from random effects linear models are known to be overdispersed by sampling error, where large eigenvalues are biased upward, and small eigenvalues are biased downward. The overdispersion of the leading eigenvalues of sample covariance matrices have been demonstrated to conform to the Tracy-Widom (TW) distribution. Here we show that genetic eigenvalues estimated using restricted maximum likelihood (REML) in a multivariate random effects model with an unconstrained genetic covariance structure will also conform to the TW distribution after empirical scaling and centering. However, where estimation procedures using either REML or MCMC impose boundary constraints, the resulting genetic eigenvalues tend not be TW distributed. We show how using confidence intervals from sampling distributions of genetic eigenvalues without reference to the TW distribution is insufficient protection against mistaking sampling error as genetic variance, particularly when eigenvalues are small. By scaling such sampling distributions to the appropriate TW distribution, the critical value of the TW statistic can be used to determine if the magnitude of a genetic eigenvalue exceeds the sampling error for each eigenvalue in the spectral distribution of a given genetic covariance matrix. Copyright © 2017 by the Genetics Society of America.
Evolving Scale-Free Networks by Poisson Process: Modeling and Degree Distribution.
Feng, Minyu; Qu, Hong; Yi, Zhang; Xie, Xiurui; Kurths, Jurgen
2016-05-01
Since the great mathematician Leonhard Euler initiated the study of graph theory, the network has been one of the most significant research subject in multidisciplinary. In recent years, the proposition of the small-world and scale-free properties of complex networks in statistical physics made the network science intriguing again for many researchers. One of the challenges of the network science is to propose rational models for complex networks. In this paper, in order to reveal the influence of the vertex generating mechanism of complex networks, we propose three novel models based on the homogeneous Poisson, nonhomogeneous Poisson and birth death process, respectively, which can be regarded as typical scale-free networks and utilized to simulate practical networks. The degree distribution and exponent are analyzed and explained in mathematics by different approaches. In the simulation, we display the modeling process, the degree distribution of empirical data by statistical methods, and reliability of proposed networks, results show our models follow the features of typical complex networks. Finally, some future challenges for complex systems are discussed.
Review of Nearshore Morphologic Prediction
NASA Astrophysics Data System (ADS)
Plant, N. G.; Dalyander, S.; Long, J.
2014-12-01
The evolution of the world's erodible coastlines will determine the balance between the benefits and costs associated with human and ecological utilization of shores, beaches, dunes, barrier islands, wetlands, and estuaries. So, we would like to predict coastal evolution to guide management and planning of human and ecological response to coastal changes. After decades of research investment in data collection, theoretical and statistical analysis, and model development we have a number of empirical, statistical, and deterministic models that can predict the evolution of the shoreline, beaches, dunes, and wetlands over time scales of hours to decades, and even predict the evolution of geologic strata over the course of millennia. Comparisons of predictions to data have demonstrated that these models can have meaningful predictive skill. But these comparisons also highlight the deficiencies in fundamental understanding, formulations, or data that are responsible for prediction errors and uncertainty. Here, we review a subset of predictive models of the nearshore to illustrate tradeoffs in complexity, predictive skill, and sensitivity to input data and parameterization errors. We identify where future improvement in prediction skill will result from improved theoretical understanding, and data collection, and model-data assimilation.
Selecting the most appropriate inferential statistical test for your quantitative research study.
Bettany-Saltikov, Josette; Whittaker, Victoria Jane
2014-06-01
To discuss the issues and processes relating to the selection of the most appropriate statistical test. A review of the basic research concepts together with a number of clinical scenarios is used to illustrate this. Quantitative nursing research generally features the use of empirical data which necessitates the selection of both descriptive and statistical tests. Different types of research questions can be answered by different types of research designs, which in turn need to be matched to a specific statistical test(s). Discursive paper. This paper discusses the issues relating to the selection of the most appropriate statistical test and makes some recommendations as to how these might be dealt with. When conducting empirical quantitative studies, a number of key issues need to be considered. Considerations for selecting the most appropriate statistical tests are discussed and flow charts provided to facilitate this process. When nursing clinicians and researchers conduct quantitative research studies, it is crucial that the most appropriate statistical test is selected to enable valid conclusions to be made. © 2013 John Wiley & Sons Ltd.
Hatfield, Laura A.; Gutreuter, Steve; Boogaard, Michael A.; Carlin, Bradley P.
2011-01-01
Estimation of extreme quantal-response statistics, such as the concentration required to kill 99.9% of test subjects (LC99.9), remains a challenge in the presence of multiple covariates and complex study designs. Accurate and precise estimates of the LC99.9 for mixtures of toxicants are critical to ongoing control of a parasitic invasive species, the sea lamprey, in the Laurentian Great Lakes of North America. The toxicity of those chemicals is affected by local and temporal variations in water chemistry, which must be incorporated into the modeling. We develop multilevel empirical Bayes models for data from multiple laboratory studies. Our approach yields more accurate and precise estimation of the LC99.9 compared to alternative models considered. This study demonstrates that properly incorporating hierarchical structure in laboratory data yields better estimates of LC99.9 stream treatment values that are critical to larvae control in the field. In addition, out-of-sample prediction of the results of in situ tests reveals the presence of a latent seasonal effect not manifest in the laboratory studies, suggesting avenues for future study and illustrating the importance of dual consideration of both experimental and observational data.
Chan, R W; Titze, I R
2000-01-01
The viscoelastic shear properties of human vocal fold mucosa (cover) were previously measured as a function of frequency [Chan and Titze, J. Acoust. Soc. Am. 106, 2008-2021 (1999)], but data were obtained only in a frequency range of 0.01-15 Hz, an order of magnitude below typical frequencies of vocal fold oscillation (on the order of 100 Hz). This study represents an attempt to extrapolate the data to higher frequencies based on two viscoelastic theories, (1) a quasilinear viscoelastic theory widely used for the constitutive modeling of the viscoelastic properties of biological tissues [Fung, Biomechanics (Springer-Verlag, New York, 1993), pp. 277-292], and (2) a molecular (statistical network) theory commonly used for the rheological modeling of polymeric materials [Zhu et al., J. Biomech. 24, 1007-1018 (1991)]. Analytical expressions of elastic and viscous shear moduli, dynamic viscosity, and damping ratio based on the two theories with specific model parameters were applied to curve-fit the empirical data. Results showed that the theoretical predictions matched the empirical data reasonably well, allowing for parametric descriptions of the data and their extrapolations to frequencies of phonation.
Modeling bias and variation in the stochastic processes of small RNA sequencing
Etheridge, Alton; Sakhanenko, Nikita; Galas, David
2017-01-01
Abstract The use of RNA-seq as the preferred method for the discovery and validation of small RNA biomarkers has been hindered by high quantitative variability and biased sequence counts. In this paper we develop a statistical model for sequence counts that accounts for ligase bias and stochastic variation in sequence counts. This model implies a linear quadratic relation between the mean and variance of sequence counts. Using a large number of sequencing datasets, we demonstrate how one can use the generalized additive models for location, scale and shape (GAMLSS) distributional regression framework to calculate and apply empirical correction factors for ligase bias. Bias correction could remove more than 40% of the bias for miRNAs. Empirical bias correction factors appear to be nearly constant over at least one and up to four orders of magnitude of total RNA input and independent of sample composition. Using synthetic mixes of known composition, we show that the GAMLSS approach can analyze differential expression with greater accuracy, higher sensitivity and specificity than six existing algorithms (DESeq2, edgeR, EBSeq, limma, DSS, voom) for the analysis of small RNA-seq data. PMID:28369495
An Empirical State Error Covariance Matrix Orbit Determination Example
NASA Technical Reports Server (NTRS)
Frisbee, Joseph H., Jr.
2015-01-01
State estimation techniques serve effectively to provide mean state estimates. However, the state error covariance matrices provided as part of these techniques suffer from some degree of lack of confidence in their ability to adequately describe the uncertainty in the estimated states. A specific problem with the traditional form of state error covariance matrices is that they represent only a mapping of the assumed observation error characteristics into the state space. Any errors that arise from other sources (environment modeling, precision, etc.) are not directly represented in a traditional, theoretical state error covariance matrix. First, consider that an actual observation contains only measurement error and that an estimated observation contains all other errors, known and unknown. Then it follows that a measurement residual (the difference between expected and observed measurements) contains all errors for that measurement. Therefore, a direct and appropriate inclusion of the actual measurement residuals in the state error covariance matrix of the estimate will result in an empirical state error covariance matrix. This empirical state error covariance matrix will fully include all of the errors in the state estimate. The empirical error covariance matrix is determined from a literal reinterpretation of the equations involved in the weighted least squares estimation algorithm. It is a formally correct, empirical state error covariance matrix obtained through use of the average form of the weighted measurement residual variance performance index rather than the usual total weighted residual form. Based on its formulation, this matrix will contain the total uncertainty in the state estimate, regardless as to the source of the uncertainty and whether the source is anticipated or not. It is expected that the empirical error covariance matrix will give a better, statistical representation of the state error in poorly modeled systems or when sensor performance is suspect. In its most straight forward form, the technique only requires supplemental calculations to be added to existing batch estimation algorithms. In the current problem being studied a truth model making use of gravity with spherical, J2 and J4 terms plus a standard exponential type atmosphere with simple diurnal and random walk components is used. The ability of the empirical state error covariance matrix to account for errors is investigated under four scenarios during orbit estimation. These scenarios are: exact modeling under known measurement errors, exact modeling under corrupted measurement errors, inexact modeling under known measurement errors, and inexact modeling under corrupted measurement errors. For this problem a simple analog of a distributed space surveillance network is used. The sensors in this network make only range measurements and with simple normally distributed measurement errors. The sensors are assumed to have full horizon to horizon viewing at any azimuth. For definiteness, an orbit at the approximate altitude and inclination of the International Space Station is used for the study. The comparison analyses of the data involve only total vectors. No investigation of specific orbital elements is undertaken. The total vector analyses will look at the chisquare values of the error in the difference between the estimated state and the true modeled state using both the empirical and theoretical error covariance matrices for each of scenario.
Jahn, I; Foraita, R
2008-01-01
In Germany gender-sensitive approaches are part of guidelines for good epidemiological practice as well as health reporting. They are increasingly claimed to realize the gender mainstreaming strategy in research funding by the federation and federal states. This paper focuses on methodological aspects of data analysis, as an empirical data example of which serves the health report of Bremen, a population-based cross-sectional study. Health reporting requires analysis and reporting methods that are able to discover sex/gender issues of questions, on the one hand, and consider how results can adequately be communicated, on the other hand. The core question is: Which consequences do a different inclusion of the category sex in different statistical analyses for identification of potential target groups have on the results? As evaluation methods logistic regressions as well as a two-stage procedure were exploratively conducted. This procedure combines graphical models with CHAID decision trees and allows for visualising complex results. Both methods are analysed by stratification as well as adjusted by sex/gender and compared with each other. As a result, only stratified analyses are able to detect differences between the sexes and within the sex/gender groups as long as one cannot resort to previous knowledge. Adjusted analyses can detect sex/gender differences only if interaction terms have been included in the model. Results are discussed from a statistical-epidemiological perspective as well as in the context of health reporting. As a conclusion, the question, if a statistical method is gender-sensitive, can only be answered by having concrete research questions and known conditions. Often, an appropriate statistic procedure can be chosen after conducting a separate analysis for women and men. Future gender studies deserve innovative study designs as well as conceptual distinctiveness with regard to the biological and the sociocultural elements of the category sex/gender.
Modeling Smoke Plume-Rise and Dispersion from Southern United States Prescribed Burns with Daysmoke.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Achtemeier, Gary, L.; Goodrick, Scott, A.; Liu, Yongqiang
2011-08-19
We present Daysmoke, an empirical-statistical plume rise and dispersion model for simulating smoke from prescribed burns. Prescribed fires are characterized by complex plume structure including multiple-core updrafts which makes modeling with simple plume models difficult. Daysmoke accounts for plume structure in a three-dimensional veering/sheering atmospheric environment, multiple-core updrafts, and detrainment of particulate matter. The number of empirical coefficients appearing in the model theory is reduced through a sensitivity analysis with the Fourier Amplitude Sensitivity Test (FAST). Daysmoke simulations for 'bent-over' plumes compare closely with Briggs theory although the two-thirds law is not explicit in Daysmoke. However, the solutions for themore » 'highly-tilted' plume characterized by weak buoyancy, low initial vertical velocity, and large initial plume diameter depart considerably from Briggs theory. Results from a study of weak plumes from prescribed burns at Fort Benning GA showed simulated ground-level PM2.5 comparing favorably with observations taken within the first eight kilometers of eleven prescribed burns. Daysmoke placed plume tops near the lower end of the range of observed plume tops for six prescribed burns. Daysmoke provides the levels and amounts of smoke injected into regional scale air quality models. Results from CMAQ with and without an adaptive grid are presented.« less
Studying aerodynamic drag for modeling the kinematical behavior of CMEs
NASA Astrophysics Data System (ADS)
Temmer, M.; Vrsnak, B.; Moestl, C.; Zic, T.; Veronig, A. M.; Rollett, T.
2013-12-01
With the SECCHI instrument suite aboard STEREO, coronal mass ejections (CMEs) can be observed from multiple vantage points during their entire propagation all the way from the Sun to 1 AU. The propagation behavior of CMEs in interplanetary space is mainly influenced by the ambient solar wind flow. CMEs that are faster than the ambient solar wind get decelerated, whereas slower ones are accelerated until the CME speed is finally adjusted to the solar wind speed. On a statistical basis, empirical models taking into account the drag force acting on CMEs, are able to describe the observed kinematical behaviors. For several well observed CME events we derive the kinematical evolution by combining remote sensing and in situ data. The observed kinematical behavior is compared to results from current empirical and numerical propagation models. For this we mainly use the drag based model DBM as well as the MHD model ENLIL. We aim to obtain the distance regime at which the solar wind drag force is dominating the CME propagation and quantify differences between different model results. This work has received funding from the FWF: V195-N16, and the European Commission FP7 Projects eHEROES (284461, www.eheroes.eu) and COMESEP (263252, www.comesep.eu).
NASA Astrophysics Data System (ADS)
Kawahara, Hajime; Reese, Erik D.; Kitayama, Tetsu; Sasaki, Shin; Suto, Yasushi
2008-11-01
Our previous analysis indicates that small-scale fluctuations in the intracluster medium (ICM) from cosmological hydrodynamic simulations follow the lognormal probability density function. In order to test the lognormal nature of the ICM directly against X-ray observations of galaxy clusters, we develop a method of extracting statistical information about the three-dimensional properties of the fluctuations from the two-dimensional X-ray surface brightness. We first create a set of synthetic clusters with lognormal fluctuations around their mean profile given by spherical isothermal β-models, later considering polytropic temperature profiles as well. Performing mock observations of these synthetic clusters, we find that the resulting X-ray surface brightness fluctuations also follow the lognormal distribution fairly well. Systematic analysis of the synthetic clusters provides an empirical relation between the three-dimensional density fluctuations and the two-dimensional X-ray surface brightness. We analyze Chandra observations of the galaxy cluster Abell 3667, and find that its X-ray surface brightness fluctuations follow the lognormal distribution. While the lognormal model was originally motivated by cosmological hydrodynamic simulations, this is the first observational confirmation of the lognormal signature in a real cluster. Finally we check the synthetic cluster results against clusters from cosmological hydrodynamic simulations. As a result of the complex structure exhibited by simulated clusters, the empirical relation between the two- and three-dimensional fluctuation properties calibrated with synthetic clusters when applied to simulated clusters shows large scatter. Nevertheless we are able to reproduce the true value of the fluctuation amplitude of simulated clusters within a factor of 2 from their two-dimensional X-ray surface brightness alone. Our current methodology combined with existing observational data is useful in describing and inferring the statistical properties of the three-dimensional inhomogeneity in galaxy clusters.
Duration on unemployment: geographic mobility and selectivity bias.
Goss, E P; Paul, C; Wilhite, A
1994-01-01
Modeling the factors affecting the duration of unemployment was found to be influenced by the inclusion of migration factors. Traditional models which did not control for migration factors were found to underestimate movers' probability of finding an acceptable job. The empirical test of the theory, based on the analysis of data on US household heads unemployed in 1982 and employed in 1982 and 1983, found that the cumulative probability of reemployment in the traditional model was .422 and in the migration selectivity model was .624 after 30 weeks of searching. In addition, controlling for selectivity eliminated the significance of the relationship between race and job search duration in the model. The relationship between search duration and the county unemployment rate in 1982 became statistically significant, and the relationship between search duration and 1980 population per square mile in the 1982 county of residence became statistically insignificant. The finding that non-Whites have a longer duration of unemployment can better be understood as non-Whites' lower geographic mobility and lack of greater job contacts. The statistical significance of a high unemployment rate in the home labor market reducing the probability of finding employment was more in keeping with expectations. The findings assumed that the duration of employment accurately reflected the length of job search. The sample was redrawn to exclude discouraged workers and the analysis was repeated. The findings were similar to the full sample, with the coefficient for migration variable being negative and statistically significant and the coefficient for alpha remaining positive and statistically significant. Race in the selectivity model remained statistically insignificant. The findings supported the Schwartz model hypothesizing that the expansion of the radius of the search would reduce the duration of unemployment. The exclusion of the migration factor misspecified the equation for unemployment duration. Policy should be directed to the problems of geographic mobility, particularly among non-Whites.
Scaling laws and fluctuations in the statistics of word frequencies
NASA Astrophysics Data System (ADS)
Gerlach, Martin; Altmann, Eduardo G.
2014-11-01
In this paper, we combine statistical analysis of written texts and simple stochastic models to explain the appearance of scaling laws in the statistics of word frequencies. The average vocabulary of an ensemble of fixed-length texts is known to scale sublinearly with the total number of words (Heaps’ law). Analyzing the fluctuations around this average in three large databases (Google-ngram, English Wikipedia, and a collection of scientific articles), we find that the standard deviation scales linearly with the average (Taylor's law), in contrast to the prediction of decaying fluctuations obtained using simple sampling arguments. We explain both scaling laws (Heaps’ and Taylor) by modeling the usage of words using a Poisson process with a fat-tailed distribution of word frequencies (Zipf's law) and topic-dependent frequencies of individual words (as in topic models). Considering topical variations lead to quenched averages, turn the vocabulary size a non-self-averaging quantity, and explain the empirical observations. For the numerous practical applications relying on estimations of vocabulary size, our results show that uncertainties remain large even for long texts. We show how to account for these uncertainties in measurements of lexical richness of texts with different lengths.
Do quantitative decadal forecasts from GCMs provide decision relevant skill?
NASA Astrophysics Data System (ADS)
Suckling, E. B.; Smith, L. A.
2012-04-01
It is widely held that only physics-based simulation models can capture the dynamics required to provide decision-relevant probabilistic climate predictions. This fact in itself provides no evidence that predictions from today's GCMs are fit for purpose. Empirical (data-based) models are employed to make probability forecasts on decadal timescales, where it is argued that these 'physics free' forecasts provide a quantitative 'zero skill' target for the evaluation of forecasts based on more complicated models. It is demonstrated that these zero skill models are competitive with GCMs on decadal scales for probability forecasts evaluated over the last 50 years. Complications of statistical interpretation due to the 'hindcast' nature of this experiment, and the likely relevance of arguments that the lack of hindcast skill is irrelevant as the signal will soon 'come out of the noise' are discussed. A lack of decision relevant quantiative skill does not bring the science-based insights of anthropogenic warming into doubt, but it does call for a clear quantification of limits, as a function of lead time, for spatial and temporal scales on which decisions based on such model output are expected to prove maladaptive. Failing to do so may risk the credibility of science in support of policy in the long term. The performance amongst a collection of simulation models is evaluated, having transformed ensembles of point forecasts into probability distributions through the kernel dressing procedure [1], according to a selection of proper skill scores [2] and contrasted with purely data-based empirical models. Data-based models are unlikely to yield realistic forecasts for future climate change if the Earth system moves away from the conditions observed in the past, upon which the models are constructed; in this sense the empirical model defines zero skill. When should a decision relevant simulation model be expected to significantly outperform such empirical models? Probability forecasts up to ten years ahead (decadal forecasts) are considered, both on global and regional spatial scales for surface air temperature. Such decadal forecasts are not only important in terms of providing information on the impacts of near-term climate change, but also from the perspective of climate model validation, as hindcast experiments and a sufficient database of historical observations allow standard forecast verification methods to be used. Simulation models from the ENSEMBLES hindcast experiment [3] are evaluated and contrasted with static forecasts of the observed climatology, persistence forecasts and against simple statistical models, called dynamic climatology (DC). It is argued that DC is a more apropriate benchmark in the case of a non-stationary climate. It is found that the ENSEMBLES models do not demonstrate a significant increase in skill relative to the empirical models even at global scales over any lead time up to a decade ahead. It is suggested that the contsruction and co-evaluation with the data-based models become a regular component of the reporting of large simulation model forecasts. The methodology presented may easily be adapted to other forecasting experiments and is expected to influence the design of future experiments. The inclusion of comparisons with dynamic climatology and other data-based approaches provide important information to both scientists and decision makers on which aspects of state-of-the-art simulation forecasts are likely to be fit for purpose. [1] J. Bröcker and L. A. Smith. From ensemble forecasts to predictive distributions, Tellus A, 60(4), 663-678 (2007). [2] J. Bröcker and L. A. Smith. Scoring probabilistic forecasts: The importance of being proper, Weather and Forecasting, 22, 382-388 (2006). [3] F. J. Doblas-Reyes, A. Weisheimer, T. N. Palmer, J. M. Murphy and D. Smith. Forecast quality asessment of the ENSEMBLES seasonal-to-decadal stream 2 hindcasts, ECMWF Technical Memorandum, 621 (2010).
Time irreversibility and multifractality of power along single particle trajectories in turbulence
NASA Astrophysics Data System (ADS)
Cencini, Massimo; Biferale, Luca; Boffetta, Guido; De Pietro, Massimo
2017-10-01
The irreversible turbulent energy cascade epitomizes strongly nonequilibrium systems. At the level of single fluid particles, time irreversibility is revealed by the asymmetry of the rate of kinetic energy change, the Lagrangian power, whose moments display a power-law dependence on the Reynolds number, as recently shown by Xu et al. [H. Xu et al., Proc. Natl. Acad. Sci. USA 111, 7558 (2014), 10.1073/pnas.1321682111]. Here Lagrangian power statistics are rationalized within the multifractal model of turbulence, whose predictions are shown to agree with numerical and empirical data. Multifractal predictions are also tested, for very large Reynolds numbers, in dynamical models of the turbulent cascade, obtaining remarkably good agreement for statistical quantities insensitive to the asymmetry and, remarkably, deviations for those probing the asymmetry. These findings raise fundamental questions concerning time irreversibility in the infinite-Reynolds-number limit of the Navier-Stokes equations.
Empirical estimation of astrophysical photodisintegration rates of 106Cd
NASA Astrophysics Data System (ADS)
Belyshev, S. S.; Kuznetsov, A. A.; Stopani, K. A.
2017-09-01
It has been noted in previous experiments that the ratio between the photoneutron and photoproton disintegration channels of 106Cd might be considerably different from predictions of statistical models. The thresholds of these reactions differ by several MeV and the total astrophysical rate of photodisintegration of 106Cd, which is mostly produced in photonuclear reactions during the p-process nucleosynthesis, might be noticeably different from the calculated value. In this work the bremsstrahlung beam of a 55.6 MeV microtron and the photon activation technique is used to measure yields of photonuclear reaction products on isotopically-enriched cadmium targets. The obtained results are compared with predictions of statistical models. The experimental yields are used to estimate photodisintegration reaction rates on 106Cd, which are then used in nuclear network calculations to examine the effects of uncertainties on the produced abundences of p-nuclei.
Liu, Jian; Pedroza, Luana S; Misch, Carissa; Fernández-Serra, Maria V; Allen, Philip B
2014-07-09
We present total energy and force calculations for the (GaN)1-x(ZnO)x alloy. Site-occupancy configurations are generated from Monte Carlo (MC) simulations, on the basis of a cluster expansion model proposed in a previous study. Local atomic coordinate relaxations of surprisingly large magnitude are found via density-functional calculations using a 432-atom periodic supercell, for three representative configurations at x = 0.5. These are used to generate bond-length distributions. The configurationally averaged composition- and temperature-dependent short-range order (SRO) parameters of the alloys are discussed. The entropy is approximated in terms of pair distribution statistics and thus related to SRO parameters. This approximate entropy is compared with accurate numerical values from MC simulations. An empirical model for the dependence of the bond length on the local chemical environments is proposed.
A comment on measuring the Hurst exponent of financial time series
NASA Astrophysics Data System (ADS)
Couillard, Michel; Davison, Matt
2005-03-01
A fundamental hypothesis of quantitative finance is that stock price variations are independent and can be modeled using Brownian motion. In recent years, it was proposed to use rescaled range analysis and its characteristic value, the Hurst exponent, to test for independence in financial time series. Theoretically, independent time series should be characterized by a Hurst exponent of 1/2. However, finite Brownian motion data sets will always give a value of the Hurst exponent larger than 1/2 and without an appropriate statistical test such a value can mistakenly be interpreted as evidence of long term memory. We obtain a more precise statistical significance test for the Hurst exponent and apply it to real financial data sets. Our empirical analysis shows no long-term memory in some financial returns, suggesting that Brownian motion cannot be rejected as a model for price dynamics.
Dynamical Constraints On The Galaxy-Halo Connection
NASA Astrophysics Data System (ADS)
Desmond, Harry
2017-07-01
Dark matter halos comprise the bulk of the universe's mass, yet must be probed by the luminous galaxies that form within them. A key goal of modern astrophysics, therefore, is to robustly relate the visible and dark mass, which to first order means relating the properties of galaxies and halos. This may be expected not only to improve our knowledge of galaxy formation, but also to enable high-precision cosmological tests using galaxies and hence maximise the utility of future galaxy surveys. As halos are inaccessible to observations - as galaxies are to N-body simulations - this relation requires an additional modelling step.The aim of this thesis is to develop and evaluate models of the galaxy-halo connection using observations of galaxy dynamics. In particular, I build empirical models based on the technique of halo abundance matching for five key dynamical scaling relations of galaxies - the Tully-Fisher, Faber-Jackson, mass-size and mass discrepancy-acceleration relations, and Fundamental Plane - which relate their baryon distributions and rotation or velocity dispersion profiles. I then develop a statistical scheme based on approximate Bayesian computation to compare the predicted and measured values of a number of summary statistics describing the relations' important features. This not only provides quantitative constraints on the free parameters of the models, but also allows absolute goodness-of-fit measures to be formulated. I find some features to be naturally accounted for by an abundance matching approach and others to impose new constraints on the galaxy-halo connection; the remainder are challenging to account for and may imply galaxy-halo correlations beyond the scope of basic abundance matching.Besides providing concrete statistical tests of specific galaxy formation theories, these results will be of use for guiding the inputs of empirical and semi-analytic galaxy formation models, which require galaxy-halo correlations to be imposed by hand. As galaxy datasets become larger and more precise in the future, we may expect these methods to continue providing insight into the relation between the visible and dark matter content of the universe and the physical processes that underlie it.
Re-entry survivability and risk
NASA Astrophysics Data System (ADS)
Fudge, Michael L.
1998-11-01
This paper is the culmination of the research effort which was reported on last year while still in-progress. As previously reported, statistical methods for expressing the impact risk posed to space systems in general [and the International Space Station (ISS) in particular] by other resident space objects have been examined. One of the findings of this investigation is that there are legitimate physical modeling reasons for the common statistical expression of the collision risk. A combination of statistical methods and physical modeling is also used to express the impact risk posed by reentering space systems to objects of interest (e.g., people and property) on Earth. One of the largest uncertainties in the expressing of this risk is the estimation of survivable material which survives reentry to impact Earth's surface. This point was demonstrated in dramatic fashion in January 1997 by the impact of an intact expendable launch vehicle (ELV) upper stage near a private residence in the continental United States. Since approximately half of the missions supporting ISS will utilize ELVs, it is appropriate to examine the methods used to estimate the amount and physical characteristics of ELV debris surviving reentry to impact Earth's surface. This report details reentry survivability estimation methodology, including the specific methodology used by ITT Systems' (formerly Kaman Sciences) 'SURVIVE' model. The major change to the model in the last twelve months has been the increase in the fidelity with which upper- atmospheric aerodynamics has been modeled. This has resulted in an adjustment in the factor relating the amount of kinetic energy loss to the amount of heating entering and reentering body, and also validated and removed the necessity for certain empirically-based adjustments made to the theoretical heating expressions. Comparisons between empirical results (observations of objects which have been recovered on Earth after surviving reentry) and SURVIVE estimates are presented for selected generic upper stage or spacecraft components, a Soyuz launch vehicle second stage, and for a Delta II launch vehicle second stage and its significant components. Significant similarity is demonstrated between the type and dispersion pattern of the recovered debris from the January 1997 Delta II 2nd stage event and the simulation of that reentry and breakup.
The Empirical Nature and Statistical Treatment of Missing Data
ERIC Educational Resources Information Center
Tannenbaum, Christyn E.
2009-01-01
Introduction. Missing data is a common problem in research and can produce severely misleading analyses, including biased estimates of statistical parameters, and erroneous conclusions. In its 1999 report, the APA Task Force on Statistical Inference encouraged authors to report complications such as missing data and discouraged the use of…
The Development of Statistical Literacy at School
ERIC Educational Resources Information Center
Callingham, Rosemary; Watson, Jane M.
2017-01-01
Statistical literacy increasingly is considered an important outcome of schooling. There is little information, however, about appropriate expectations of students at different stages of schooling. Some progress towards this goal was made by Watson and Callingham (2005), who identified an empirical 6-level hierarchy of statistical literacy and the…
Explorations in Statistics: Permutation Methods
ERIC Educational Resources Information Center
Curran-Everett, Douglas
2012-01-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This eighth installment of "Explorations in Statistics" explores permutation methods, empiric procedures we can use to assess an experimental result--to test a null hypothesis--when we are reluctant to trust statistical…
NASA Astrophysics Data System (ADS)
Dickson, N.
2009-12-01
The global observation, assimilation and prediction in numerical models of ice super-saturated (ISS) regions (ISSR) are crucial if the climate impact of aircraft condensations trails (contrails) is to be fully understood, and if, for example, contrail formation is to be avoided through aircraft operational measures. A robust assessment of the global distribution of ISSR will further this debate, and ISS event occurrence, frequency and spatial scales have recently attracted significant attention. The mean horizontal size of ISSR is 150 km (±250km) although 12-14% of ISS events occur on horizontal scales of less than 5km. The average vertical thickness of ISS layers is 600-800m (±575m) but layers ranging from 25m to 3000m have been observed, with up to one third of ISS layers thought to be less than 100m deep. Given their small scales compared to typical atmospheric model grid sizes, statistical representations of the spatial scales of ISSR are required, in both horizontal and vertical dimensions, if global occurrence of ISSR is to be adequately represented in climate models. This paper uses radiosonde launches made by the UK Meteorological Office, from the British Isles, Gibraltar, St. Helena and the Falkland Islands between January 2002 and December 2006, to investigate the probabilistic occurrence of ISSR. Specifically each radiosonde profile is divided into 50 and 100 hPa pressure layers, to emulate the coarse vertical resolution of some atmospheric models. Then the high resolution observations contained within each thick pressure layer are used to calculate an average relative humidity and an ISS fraction for each individual thick pressure layer. These relative humidity pressure layer descriptions are then linked through a probability function to produce an s-shaped curve describing the ISS fraction in any average relative humidity pressure layer. An empirical investigation has shown that this one curve is statistically valid for mid-latitude locations, irrespective of season and altitude, however, pressure layer depth is an important variable. Using this empirical understanding of the s-shaped relationship a mathematical model was developed to represent the ISS fraction within any arbitrary thick pressure layer. Here the statistical distributions of actual high resolution RHi observations in any thick pressure layer, along with an error function, are used to mathematically describe the s-shape. Two models were developed to represent both 50 and 100 hPa pressure layers with each reconstructing their respective s-shapes within 8-10% of the empirical curves. These new models can be used, to represent the small scale structures of ISS events, in modelled data where only low vertical resolution is available. This will be useful in understanding, and improving the global distribution, both observed and forecasted, of ice super-saturation.
Statistical distributions of extreme dry spell in Peninsular Malaysia
NASA Astrophysics Data System (ADS)
Zin, Wan Zawiah Wan; Jemain, Abdul Aziz
2010-11-01
Statistical distributions of annual extreme (AE) series and partial duration (PD) series for dry-spell event are analyzed for a database of daily rainfall records of 50 rain-gauge stations in Peninsular Malaysia, with recording period extending from 1975 to 2004. The three-parameter generalized extreme value (GEV) and generalized Pareto (GP) distributions are considered to model both series. In both cases, the parameters of these two distributions are fitted by means of the L-moments method, which provides a robust estimation of them. The goodness-of-fit (GOF) between empirical data and theoretical distributions are then evaluated by means of the L-moment ratio diagram and several goodness-of-fit tests for each of the 50 stations. It is found that for the majority of stations, the AE and PD series are well fitted by the GEV and GP models, respectively. Based on the models that have been identified, we can reasonably predict the risks associated with extreme dry spells for various return periods.
Biome-specific scaling of ocean productivity, temperature, and carbon export efficiency
NASA Astrophysics Data System (ADS)
Britten, Gregory L.; Primeau, François W.
2016-05-01
Mass conservation and metabolic theory place constraints on how marine export production (EP) scales with net primary productivity (NPP) and sea surface temperature (SST); however, little is empirically known about how these relationships vary across ecologically distinct ocean biomes. Here we compiled in situ observations of EP, NPP, and SST and used statistical model selection theory to demonstrate significant biome-specific scaling relationships among these variables. Multiple statistically similar models yield a threefold variation in the globally integrated carbon flux (~4-12 Pg C yr-1) when applied to climatological satellite-derived NPP and SST. Simulated NPP and SST input variables from a 4×CO2 climate model experiment further show that biome-specific scaling alters the predicted response of EP to simulated increases of atmospheric CO2. These results highlight the need to better understand distinct pathways of carbon export across unique ecological biomes and may help guide proposed efforts for in situ observations of the ocean carbon cycle.