Sample records for statistical model called

  1. Helping Students Develop Statistical Reasoning: Implementing a Statistical Reasoning Learning Environment

    ERIC Educational Resources Information Center

    Garfield, Joan; Ben-Zvi, Dani

    2009-01-01

    This article describes a model for an interactive, introductory secondary- or tertiary-level statistics course that is designed to develop students' statistical reasoning. This model is called a "Statistical Reasoning Learning Environment" and is built on the constructivist theory of learning.

  2. A comparative analysis of the statistical properties of large mobile phone calling networks.

    PubMed

    Li, Ming-Xia; Jiang, Zhi-Qiang; Xie, Wen-Jie; Miccichè, Salvatore; Tumminello, Michele; Zhou, Wei-Xing; Mantegna, Rosario N

    2014-05-30

    Mobile phone calling is one of the most widely used communication methods in modern society. The records of calls among mobile phone users provide a valuable proxy for understanding the human communication patterns embedded in social networks. Mobile phone users call each other, forming a directed calling network. If only reciprocal calls are considered, we obtain an undirected mutual calling network. The preferential communication behavior between two connected users can be statistically tested, and doing so yields two Bonferroni networks with statistically validated edges. We perform a comparative analysis of the statistical properties of these four networks, which are constructed from the calling records of more than nine million individuals in Shanghai over a period of 110 days. We find that these networks share many common structural properties and also exhibit idiosyncratic features when compared with previously studied large mobile calling networks. The empirical findings provide an intriguing picture of a representative large social network that might shed new light on the modelling of large social networks.
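
    A small sketch of the edge-validation idea is given below. It uses a binomial test with a Bonferroni-corrected threshold, which is an assumed simplification rather than the exact test of the paper, and the call counts are invented.

    ```python
    # Hedged sketch: validate an edge (i, j) if the number of calls i placed to j
    # is surprisingly large given i's total out-calls and j's share of all calls,
    # using a Bonferroni-corrected binomial test. Not the paper's exact procedure.
    from scipy.stats import binomtest

    def validated_edges(call_counts, alpha=0.01):
        """call_counts: dict {(i, j): number of calls from i to j}."""
        total = sum(call_counts.values())
        out_calls, in_calls = {}, {}
        for (i, j), n in call_counts.items():
            out_calls[i] = out_calls.get(i, 0) + n
            in_calls[j] = in_calls.get(j, 0) + n
        threshold = alpha / len(call_counts)          # Bonferroni correction
        edges = []
        for (i, j), n in call_counts.items():
            p_null = in_calls[j] / total              # chance a random call targets j
            p = binomtest(n, out_calls[i], p_null, alternative="greater").pvalue
            if p < threshold:
                edges.append((i, j))
        return edges

    # toy example with invented counts
    print(validated_edges({("a", "b"): 30, ("a", "c"): 1, ("d", "b"): 2, ("d", "c"): 25}))
    ```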

  3. Developing Statistical Knowledge for Teaching during Design-Based Research

    ERIC Educational Resources Information Center

    Groth, Randall E.

    2017-01-01

    Statistical knowledge for teaching is not precisely equivalent to statistics subject matter knowledge. Teachers must know how to make statistics understandable to others as well as understand the subject matter themselves. This dual demand on teachers calls for the development of viable teacher education models. This paper offers one such model,…

  4. Infinitely divisible cascades to model the statistics of natural images.

    PubMed

    Chainais, Pierre

    2007-12-01

    We propose to model the statistics of natural images using the large class of stochastic processes called Infinitely Divisible Cascades (IDC). IDC were first introduced in one dimension to provide multifractal time series modeling the so-called intermittency phenomenon in hydrodynamical turbulence. We have extended the definition of scalar infinitely divisible cascades from 1 to N dimensions and commented on the relevance of such a model to fully developed turbulence in [1]. In this article, we focus on the two-dimensional case. IDC appear to be good candidates for modeling the statistics of natural images: they share most of their usual properties and are consistent with several independent theoretical and experimental approaches in the literature. We point out the interest of IDC for applications to procedural texture synthesis.

  5. Solar granulation and statistical crystallography: A modeling approach using size-shape relations

    NASA Technical Reports Server (NTRS)

    Noever, D. A.

    1994-01-01

    The irregular polygonal pattern of solar granulation is analyzed for size-shape relations using statistical crystallography. In contrast to previous work, which has assumed perfectly hexagonal patterns for granulation, a more realistic accounting of cell (granule) shapes reveals a broader basis for quantitative analysis. Several features emerge as noteworthy: (1) a linear correlation between the number of cell sides and neighboring shapes (called Aboav-Weaire's law); (2) a linear correlation of both average cell area and perimeter with the number of cell sides (called Lewis's law and a perimeter law, respectively); and (3) a linear correlation between cell area and squared perimeter (called the convolution index). This statistical picture of granulation is consistent with a finding of no correlation in cell shapes beyond nearest neighbors. A comparative calculation between existing model predictions taken from luminosity data and the present analysis shows substantial agreement for cell-size distributions. A model for understanding grain lifetimes is proposed which links convective times to cell shape using crystallographic results.

  6. Calculating phase equilibrium properties of plasma pseudopotential model using hybrid Gibbs statistical ensemble Monte-Carlo technique

    NASA Astrophysics Data System (ADS)

    Butlitsky, M. A.; Zelener, B. B.; Zelener, B. V.

    2015-11-01

    Earlier, a two-component pseudopotential plasma model, which we call the “shelf Coulomb” model, was developed. A Monte-Carlo study of the canonical NVT ensemble with periodic boundary conditions was undertaken to calculate equations of state, pair distribution functions, internal energies, and other thermodynamic properties of the model. In the present work, an attempt is made to apply the so-called hybrid Gibbs statistical ensemble Monte-Carlo technique to this model. First simulation results show qualitatively similar behavior in the critical-point region for both methods. The Gibbs ensemble technique lets us estimate the melting curve position and the triple point of the model (in reduced temperature and specific volume coordinates): T* ≈ 0.0476, v* ≈ 6 × 10⁻⁴.

  7. Probability and Statistics in Sensor Performance Modeling

    DTIC Science & Technology

    2010-12-01

    The software program described is called Environmental Awareness for Sensor and Emitter Employment (EASEE). The report covers important numerical issues in the implementation and the statistical analysis used for measuring sensor performance; terms defined include the cumulative and complementary cumulative distribution functions (cdf) and the decision-support tool (DST).

  8. Statistical physics of vehicular traffic and some related systems

    NASA Astrophysics Data System (ADS)

    Chowdhury, Debashish; Santen, Ludger; Schadschneider, Andreas

    2000-05-01

    In the so-called “microscopic” models of vehicular traffic, attention is paid explicitly to each individual vehicle, each of which is represented by a “particle”; the nature of the “interactions” among these particles is determined by the way the vehicles influence each other's movement. Therefore, vehicular traffic, modeled as a system of interacting “particles” driven far from equilibrium, offers the possibility to study various fundamental aspects of truly nonequilibrium systems which are of current interest in statistical physics. Analytical as well as numerical techniques of statistical physics are being used to study these models to understand the rich variety of physical phenomena exhibited by vehicular traffic. Some of these phenomena, observed in vehicular traffic under different circumstances, include transitions from one dynamical phase to another, criticality and self-organized criticality, metastability and hysteresis, phase segregation, etc. In this critical review, written from the perspective of statistical physics, we explain the guiding principles behind all the main theoretical approaches, but we present detailed discussions of the results obtained mainly from the so-called “particle-hopping” models, particularly emphasizing those which have been formulated in recent years using the language of cellular automata.
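
    As an illustration of the particle-hopping, cellular-automaton approach, here is a minimal Nagel-Schreckenberg-style sketch; the lattice size, density and slowdown probability are illustrative assumptions, not parameters taken from the review.

    ```python
    # Minimal Nagel-Schreckenberg-style cellular automaton on a ring road.
    # Update rule per step: accelerate, brake to the gap ahead, random slowdown, move.
    import numpy as np

    def nasch_step(pos, vel, road_length, v_max=5, p_slow=0.3, rng=None):
        rng = rng if rng is not None else np.random.default_rng()
        order = np.argsort(pos)
        pos, vel = pos[order], vel[order]
        gaps = (np.roll(pos, -1) - pos - 1) % road_length     # empty cells ahead of each car
        vel = np.minimum(vel + 1, v_max)                      # acceleration
        vel = np.minimum(vel, gaps)                           # braking (no collisions)
        slow = rng.random(len(vel)) < p_slow
        vel = np.where(slow, np.maximum(vel - 1, 0), vel)     # random slowdown
        pos = (pos + vel) % road_length                       # parallel movement
        return pos, vel

    rng = np.random.default_rng(0)
    road_length, n_cars = 100, 30
    pos = rng.choice(road_length, size=n_cars, replace=False)
    vel = np.zeros(n_cars, dtype=int)
    for _ in range(200):
        pos, vel = nasch_step(pos, vel, road_length, rng=rng)
    print("mean speed (cells/step):", vel.mean())
    ```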

  9. Strongly magnetized classical plasma models

    NASA Technical Reports Server (NTRS)

    Montgomery, D. C.

    1972-01-01

    The class of plasma processes for which the so-called Vlasov approximation is inadequate is investigated. Results from the equilibrium statistical mechanics of two-dimensional plasmas are derived. These results are independent of the presence of an external dc magnetic field. The nonequilibrium statistical mechanics of the electrostatic guiding-center plasma, a two-dimensional plasma model, is discussed. This model is then generalized to three dimensions. The guiding-center model is relaxed to include finite Larmor radius effects for a two-dimensional plasma.

  10. Monte Carlo based statistical power analysis for mediation models: methods and software.

    PubMed

    Zhang, Zhiyong

    2014-12-01

    The existing literature on statistical power analysis for mediation models often assumes data normality and relies on the less powerful Sobel test rather than the more powerful bootstrap test. This study proposes estimating the statistical power to detect mediation effects on the basis of the bootstrap method through Monte Carlo simulation. Nonnormal data with excessive skewness and kurtosis are allowed in the proposed method. A free R package called bmem is developed to conduct the power analysis discussed in this study. Four examples, including a simple mediation model, a multiple-mediator model with a latent mediator, a multiple-group mediation model, and a longitudinal mediation model, are provided to illustrate the proposed method.
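
    The core idea, estimating power as the fraction of simulated datasets in which a bootstrap confidence interval for the indirect effect excludes zero, can be sketched as follows. This is a bare-bones Python illustration rather than the bmem package, and the effect sizes, sample size, and replication counts are arbitrary assumptions.

    ```python
    # Minimal sketch of Monte Carlo power estimation for the indirect effect
    # a*b in a simple mediation model, using a percentile-bootstrap test.
    import numpy as np

    rng = np.random.default_rng(0)

    def simulate_mediation(n, a=0.3, b=0.3, c=0.0):
        x = rng.normal(size=n)
        m = a * x + rng.normal(size=n)           # mediator
        y = b * m + c * x + rng.normal(size=n)   # outcome
        return x, m, y

    def indirect_effect(x, m, y):
        a_hat = np.polyfit(x, m, 1)[0]           # slope of m ~ x
        design = np.column_stack([np.ones_like(x), m, x])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        return a_hat * coef[1]                   # a_hat * b_hat

    def bootstrap_significant(x, m, y, n_boot=200, alpha=0.05):
        n = len(x)
        boot = [indirect_effect(*(arr[idx] for arr in (x, m, y)))
                for idx in (rng.integers(0, n, n) for _ in range(n_boot))]
        lo, hi = np.percentile(boot, [100 * alpha / 2, 100 * (1 - alpha / 2)])
        return not (lo <= 0.0 <= hi)             # significant if CI excludes zero

    def power(n, n_rep=100):
        return np.mean([bootstrap_significant(*simulate_mediation(n)) for _ in range(n_rep)])

    print("estimated power at n=100:", power(100))
    ```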

  11. Application of Hierarchy Theory to Cross-Scale Hydrologic Modeling of Nutrient Loads

    EPA Science Inventory

    We describe a model called Regional Hydrologic Modeling for Environmental Evaluation (RHyME2) for quantifying annual nutrient loads in stream networks and watersheds. RHyME2 is a cross-scale statistical and process-based water-quality model. The model ...

  12. FORSPAN Model Users Guide

    USGS Publications Warehouse

    Klett, T.R.; Charpentier, Ronald R.

    2003-01-01

    The USGS FORSPAN model is designed for the assessment of continuous accumulations of crude oil, natural gas, and natural gas liquids (collectively called petroleum). Continuous (also called "unconventional") accumulations have large spatial dimensions and lack well-defined down-dip petroleum/water contacts. Oil and natural gas therefore are not localized by buoyancy in water in these accumulations. Continuous accumulations include "tight gas reservoirs," coalbed gas, oil and gas in shale, oil and gas in chalk, and shallow biogenic gas. The FORSPAN model treats a continuous accumulation as a collection of petroleum-containing cells for assessment purposes. Each cell is capable of producing oil or gas, but the cells may vary significantly from one another in their production (and thus economic) characteristics. The potential additions to reserves from continuous petroleum resources are calculated by statistically combining probability distributions of the estimated number of untested cells having the potential for additions to reserves with the estimated volume of oil and natural gas that each of the untested cells may potentially produce (total recovery). One such statistical method for combination of number of cells with total recovery, used by the USGS, is called ACCESS.
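
    The general idea of statistically combining the two distributions can be sketched with a small Monte Carlo aggregation. The distribution families and parameters below are invented for illustration and do not represent the ACCESS methodology or any USGS assessment inputs.

    ```python
    # Hedged Monte Carlo sketch: combine a distribution for the number of untested
    # cells having potential additions with a per-cell total-recovery distribution.
    import numpy as np

    rng = np.random.default_rng(42)
    n_sims = 10_000

    # number of productive untested cells: triangular(min, mode, max) -- illustrative
    n_cells = rng.triangular(left=50, mode=200, right=600, size=n_sims).astype(int)

    # aggregate recovery: sum of lognormal per-cell recoveries -- illustrative units
    totals = np.array([rng.lognormal(mean=0.0, sigma=1.0, size=k).sum() for k in n_cells])

    # fractiles conventionally reported in resource assessments (F95, F50, F5)
    print({f"F{100 - q}": round(float(np.percentile(totals, q)), 1) for q in (5, 50, 95)})
    ```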

  13. Factors Contributing to Academic Achievement: A Bayesian Structure Equation Modelling Study

    ERIC Educational Resources Information Center

    Payandeh Najafabadi, Amir T.; Najafabadi, Maryam Omidi; Farid-Rohani, Mohammad Reza

    2013-01-01

    In Iran, high school graduates enter university after taking a very difficult entrance exam called the Konkoor. Therefore, only the top-performing students are admitted by universities to continue their bachelor's education in statistics. Surprisingly, statistics show that most such students fall into the following categories: (1) do not succeed in…

  14. A Computer-Assisted Instruction in Teaching Abstract Statistics to Public Affairs Undergraduates

    ERIC Educational Resources Information Center

    Ozturk, Ali Osman

    2012-01-01

    This article attempts to demonstrate the applicability of computer-assisted instruction, supported with simulated data, in teaching abstract statistical concepts to political science and public affairs students in an introductory research methods course. The software is called the Elaboration Model Computer Exercise (EMCE) in that it takes a great…

  15. Incorporating principal component analysis into air quality model evaluation

    EPA Science Inventory

    The efficacy of standard air quality model evaluation techniques is becoming compromised as the simulation periods continue to lengthen in response to ever increasing computing capacity. Accordingly, the purpose of this paper is to demonstrate a statistical approach called Princi...

  16. Efficiency Analysis of Public Universities in Thailand

    ERIC Educational Resources Information Center

    Kantabutra, Saranya; Tang, John C. S.

    2010-01-01

    This paper examines the performance of Thai public universities in terms of efficiency, using a non-parametric approach called data envelopment analysis. Two efficiency models, the teaching efficiency model and the research efficiency model, are developed and the analysis is conducted at the faculty level. Further statistical analyses are also…

  17. MAFsnp: A Multi-Sample Accurate and Flexible SNP Caller Using Next-Generation Sequencing Data

    PubMed Central

    Hu, Jiyuan; Li, Tengfei; Xiu, Zidi; Zhang, Hong

    2015-01-01

    Most existing statistical methods developed for calling single nucleotide polymorphisms (SNPs) using next-generation sequencing (NGS) data are based on Bayesian frameworks, and no SNP caller produces p-values for calling SNPs in a frequentist framework. To fill this gap, we develop a new method, MAFsnp, a Multiple-sample based Accurate and Flexible algorithm for calling SNPs with NGS data. MAFsnp is based on an estimated likelihood ratio test (eLRT) statistic. In practical situations, the involved parameter is very close to the boundary of the parameter space, so standard large-sample theory is not suitable for evaluating the finite-sample distribution of the eLRT statistic. Observing that the distribution of the test statistic is a mixture of zero and a continuous part, we propose to model the test statistic with a novel two-parameter mixture distribution. Once the parameters in the mixture distribution are estimated, p-values can be easily calculated for detecting SNPs, and the multiple-testing corrected p-values can be used to control the false discovery rate (FDR) at any pre-specified level. With simulated data, MAFsnp is shown to have much better control of FDR than the existing SNP callers. Through application to two real datasets, MAFsnp is also shown to outperform the existing SNP callers in terms of calling accuracy. An R package “MAFsnp” implementing the new SNP caller is freely available at http://homepage.fudan.edu.cn/zhangh/softwares/. PMID:26309201
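
    The zero-plus-continuous mixture idea can be sketched as follows; this stand-in uses a point mass at zero mixed with a fitted gamma component, which is an assumption for illustration and not the two-parameter distribution or estimation procedure used by MAFsnp.

    ```python
    # Sketch: model a test statistic whose null distribution mixes a point mass at
    # zero with a continuous positive part, then compute right-tail p-values.
    import numpy as np
    from scipy import stats

    def fit_zero_mixture(t_null):
        """Estimate P(T = 0) and a gamma(shape, scale) for the positive part."""
        t_null = np.asarray(t_null)
        pi0 = np.mean(t_null == 0.0)
        shape, _, scale = stats.gamma.fit(t_null[t_null > 0.0], floc=0.0)
        return pi0, shape, scale

    def p_value(t_obs, pi0, shape, scale):
        """Right-tail p-value under the fitted mixture."""
        if t_obs <= 0.0:
            return 1.0
        return (1.0 - pi0) * stats.gamma.sf(t_obs, shape, scale=scale)

    # illustrative use with simulated null statistics
    rng = np.random.default_rng(1)
    null_stats = np.where(rng.random(5000) < 0.5, 0.0, rng.chisquare(1, 5000))
    params = fit_zero_mixture(null_stats)
    print(round(p_value(6.63, *params), 5))
    ```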

  18. Parameterizing Phrase Based Statistical Machine Translation Models: An Analytic Study

    ERIC Educational Resources Information Center

    Cer, Daniel

    2011-01-01

    The goal of this dissertation is to determine the best way to train a statistical machine translation system. I first develop a state-of-the-art machine translation system called Phrasal and then use it to examine a wide variety of potential learning algorithms and optimization criteria and arrive at two very surprising results. First, despite the…

  19. Detection of Cutting Tool Wear using Statistical Analysis and Regression Model

    NASA Astrophysics Data System (ADS)

    Ghani, Jaharah A.; Rizal, Muhammad; Nuawi, Mohd Zaki; Haron, Che Hassan Che; Ramli, Rizauddin

    2010-10-01

    This study presents a new method for detecting cutting tool wear based on measured cutting force signals. A statistics-based method called the Integrated Kurtosis-based Algorithm for Z-Filter technique (I-kaz) was used to develop a regression model and a 3D graphic presentation of the I-kaz 3D coefficient during the machining process. The machining tests were carried out on a Colchester Master Tornado T4 CNC turning machine in dry cutting conditions. A Kistler 9255B dynamometer was used to measure the cutting force signals, which were transmitted, analyzed, and displayed in the DasyLab software. Various force signals from the machining operation were analyzed, and each has its own I-kaz 3D coefficient. This coefficient was examined and its relationship with flank wear land (VB) was determined. A regression model was developed from this relationship, and the results show that the I-kaz 3D coefficient value decreases as tool wear increases. The result is then used for real-time tool wear monitoring.

  20. Dark energy models through nonextensive Tsallis' statistics

    NASA Astrophysics Data System (ADS)

    Barboza, Edésio M.; Nunes, Rafael da C.; Abreu, Everton M. C.; Ananias Neto, Jorge

    2015-10-01

    The accelerated expansion of the Universe is one of the greatest challenges of modern physics. One candidate to explain this phenomenon is a new field called dark energy. In this work we use the Tsallis nonextensive statistical formulation of the Friedmann equation to explore the Barboza-Alcaniz and Chevallier-Polarski-Linder parametric dark energy models and the Wang-Meng and Dalal vacuum decay models. We then discuss the observational tests and the constraints on the Tsallis nonextensive parameter. Finally, we describe the dark energy physics through the role of the q-parameter.

  1. How much to trust the senses: Likelihood learning

    PubMed Central

    Sato, Yoshiyuki; Kording, Konrad P.

    2014-01-01

    Our brain often needs to estimate unknown variables from imperfect information. Our knowledge about the statistical distributions of quantities in our environment (called priors) and the currently available information from sensory inputs (called the likelihood) are the basis of all Bayesian models of perception and action. While we know that priors are learned, most studies of prior-likelihood integration simply assume that subjects know about the likelihood. However, as the quality of sensory inputs changes over time, we also need to learn about new likelihoods. Here, we show that human subjects readily learn the distribution of visual cues (likelihood function) in a way that can be predicted by models of statistically optimal learning. Using a likelihood that depended on color context, we found that a learned likelihood generalized to new priors. Thus, we conclude that subjects learn about likelihood. PMID:25398975

  2. A varying coefficient model to measure the effectiveness of mass media anti-smoking campaigns in generating calls to a Quitline.

    PubMed

    Bui, Quang M; Huggins, Richard M; Hwang, Wen-Han; White, Victoria; Erbas, Bircan

    2010-01-01

    Anti-smoking advertisements are an effective population-based smoking reduction strategy. The Quitline telephone service provides a first point of contact for adults considering quitting. Because of data complexity, the relationship between anti-smoking advertising placement, intensity, and time trends in total call volume is poorly understood. In this study we use a recently developed semi-varying coefficient model to elucidate this relationship. Semi-varying coefficient models comprise parametric and nonparametric components. The model is fitted to the daily number of calls to Quitline in Victoria, Australia, to estimate a nonparametric long-term trend and parametric terms for day-of-the-week effects, and to clarify the relationship with target audience rating points (TARPs) for the Quit and nicotine replacement advertising campaigns. The number of calls to Quitline increased with the TARP value of both the Quit and other smoking cessation advertisements; the TARP values associated with the Quit program were almost twice as effective. The varying coefficient term was statistically significant for peak periods with little or no advertising. Semi-varying coefficient models are useful for modeling public health data when there is little or no information on other factors related to the at-risk population. These models are well suited to modeling call volume to Quitline, because the varying coefficient allowed the underlying time trend to depend on fixed covariates that also vary with time, thereby explaining more of the variation in the call model.

  3. A Varying Coefficient Model to Measure the Effectiveness of Mass Media Anti-Smoking Campaigns in Generating Calls to a Quitline

    PubMed Central

    Bui, Quang M.; Huggins, Richard M.; Hwang, Wen-Han; White, Victoria; Erbas, Bircan

    2010-01-01

    Background Anti-smoking advertisements are an effective population-based smoking reduction strategy. The Quitline telephone service provides a first point of contact for adults considering quitting. Because of data complexity, the relationship between anti-smoking advertising placement, intensity, and time trends in total call volume is poorly understood. In this study we use a recently developed semi-varying coefficient model to elucidate this relationship. Methods Semi-varying coefficient models comprise parametric and nonparametric components. The model is fitted to the daily number of calls to Quitline in Victoria, Australia, to estimate a nonparametric long-term trend and parametric terms for day-of-the-week effects, and to clarify the relationship with target audience rating points (TARPs) for the Quit and nicotine replacement advertising campaigns. Results The number of calls to Quitline increased with the TARP value of both the Quit and other smoking cessation advertisements; the TARP values associated with the Quit program were almost twice as effective. The varying coefficient term was statistically significant for peak periods with little or no advertising. Conclusions Semi-varying coefficient models are useful for modeling public health data when there is little or no information on other factors related to the at-risk population. These models are well suited to modeling call volume to Quitline, because the varying coefficient allowed the underlying time trend to depend on fixed covariates that also vary with time, thereby explaining more of the variation in the call model. PMID:20827036
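
    A rough sketch of a semi-parametric regression in this spirit, with a smooth long-term trend plus parametric day-of-week and advertising (TARP) terms, is given below. The data are synthetic, the spline basis and knot placement are arbitrary assumptions, and this is not the authors' estimation procedure.

    ```python
    # Sketch: daily calls = smooth trend (cubic B-spline basis) + day-of-week
    # effects + TARP covariate, fitted by ordinary least squares on synthetic data.
    import numpy as np
    from scipy.interpolate import BSpline

    rng = np.random.default_rng(0)
    n_days = 365
    t = np.arange(n_days)
    dow = t % 7
    tarps = rng.gamma(2.0, 20.0, n_days)                      # synthetic advertising intensity
    calls = (100 + 0.3 * tarps + 10 * np.sin(2 * np.pi * t / 365)
             + np.where(dow == 0, 15, 0) + rng.normal(0, 8, n_days))

    knots = np.linspace(0, n_days, 8)                         # arbitrary knot placement
    knots = np.concatenate([[knots[0]] * 3, knots, [knots[-1]] * 3])
    trend_basis = BSpline.design_matrix(t.astype(float), knots, k=3).toarray()

    dow_dummies = np.eye(7)[dow][:, 1:]                       # first day as reference
    X = np.column_stack([trend_basis, dow_dummies, tarps])
    coef, *_ = np.linalg.lstsq(X, calls, rcond=None)
    print("estimated extra calls per TARP unit:", round(coef[-1], 3))
    ```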

  4. A call to improve methods for estimating tree biomass for regional and national assessments

    Treesearch

    Aaron R. Weiskittel; David W. MacFarlane; Philip J. Radtke; David L.R. Affleck; Hailemariam Temesgen; Christopher W. Woodall; James A. Westfall; John W. Coulston

    2015-01-01

    Tree biomass is typically estimated using statistical models. This review highlights five limitations of most tree biomass models, which include the following: (1) biomass data are costly to collect and alternative sampling methods are used; (2) belowground data and models are generally lacking; (3) models are often developed from small and geographically limited data...

  5. Development and evaluation of statistical shape modeling for principal inner organs on torso CT images.

    PubMed

    Zhou, Xiangrong; Xu, Rui; Hara, Takeshi; Hirano, Yasushi; Yokoyama, Ryujiro; Kanematsu, Masayuki; Hoshi, Hiroaki; Kido, Shoji; Fujita, Hiroshi

    2014-07-01

    The shapes of the inner organs are important information for medical image analysis. Statistical shape modeling provides a way of quantifying and measuring shape variations of the inner organs across different patients. In this study, we developed a universal scheme that can be used to build statistical shape models for different inner organs efficiently. This scheme combines traditional point distribution modeling with a group-wise optimization method based on a measure called minimum description length to provide a practical means for 3D organ shape modeling. In experiments, the proposed scheme was applied to the building of five statistical shape models for hearts, livers, spleens, and right and left kidneys, using 50 cases of 3D torso CT images. The performance of these models was evaluated by three measures: model compactness, model generalization, and model specificity. The experimental results showed that the constructed shape models have good "compactness" and satisfactory "generalization" performance for different organ shape representations; however, the "specificity" of these models should be improved in the future.
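
    The point-distribution-modeling step can be sketched as a principal component analysis over stacked landmark coordinates. The snippet uses synthetic, already-aligned landmarks and omits the group-wise minimum-description-length correspondence optimization the paper relies on.

    ```python
    # Point-distribution-model sketch: PCA of aligned landmark vectors gives a mean
    # shape plus principal modes of variation; synthetic data stand in for organs.
    import numpy as np

    rng = np.random.default_rng(0)
    n_shapes, n_landmarks = 50, 30
    shapes = rng.normal(size=(n_shapes, n_landmarks * 3))    # rows: (x1, y1, z1, ...)

    mean_shape = shapes.mean(axis=0)
    centered = shapes - mean_shape
    _, svals, vt = np.linalg.svd(centered, full_matrices=False)
    eigenvalues = svals**2 / (n_shapes - 1)

    # keep enough modes for 95% of the shape variance (related to "compactness")
    explained = np.cumsum(eigenvalues) / eigenvalues.sum()
    n_modes = int(np.searchsorted(explained, 0.95) + 1)

    def synthesize(b):
        """New shape from mode weights b (length n_modes)."""
        return mean_shape + vt[:n_modes].T @ (b * np.sqrt(eigenvalues[:n_modes]))

    print(n_modes, synthesize(np.zeros(n_modes)).shape)
    ```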

  6. Development of the statistical ARIMA model: an application for predicting the upcoming of MJO index

    NASA Astrophysics Data System (ADS)

    Hermawan, Eddy; Nurani Ruchjana, Budi; Setiawan Abdullah, Atje; Gede Nyoman Mindra Jaya, I.; Berliana Sipayung, Sinta; Rustiana, Shailla

    2017-10-01

    This study concerns one of the most important equatorial atmospheric phenomena, the Madden-Julian Oscillation (MJO), which has strong impacts on extreme rainfall anomalies over the Indonesian Maritime Continent (IMC). We focus on the big floods over Jakarta and the surrounding area suspected of being caused by the MJO. We concentrate on developing the MJO index, represented by the RMM (Real-time Multivariate MJO) components RMM1 and RMM2, using the statistical Box-Jenkins (ARIMA) model. Development proceeds in several steps, from data identification through model estimation and selection, before the model is finally applied to investigate the big floods that occurred in Jakarta in 1996, 2002, and 2007, respectively. We found that the best estimated model for RMM1 and RMM2 prediction is ARIMA(2,1,2). Detailed steps showing how the model is built and applied to predict rainfall anomalies over Jakarta 3 to 6 months ahead are discussed in this paper.
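
    For illustration, an ARIMA(2,1,2) model of the order the authors report as best can be fitted with statsmodels as below; the series is a synthetic MJO-like stand-in, not the actual RMM1/RMM2 data.

    ```python
    # Fit ARIMA(2,1,2) to a synthetic oscillating series and forecast ~3 months ahead.
    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(0)
    t = np.arange(500)
    rmm_like = np.sin(2 * np.pi * t / 45) + np.cumsum(rng.normal(0, 0.3, 500))

    fit = ARIMA(rmm_like, order=(2, 1, 2)).fit()
    forecast = fit.forecast(steps=90)
    print(round(fit.aic, 1), np.round(forecast[:5], 3))
    ```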

  7. New statistical scission-point model to predict fission fragment observables

    NASA Astrophysics Data System (ADS)

    Lemaître, Jean-François; Panebianco, Stefano; Sida, Jean-Luc; Hilaire, Stéphane; Heinrich, Sophie

    2015-09-01

    The development of high performance computing facilities makes possible a massive production of nuclear data in a full microscopic framework. Taking advantage of the individual potential calculations of more than 7000 nuclei, a new statistical scission-point model, called SPY, has been developed. It gives access to the absolute available energy at the scission point, which allows the use of a parameter-free microcanonical statistical description to calculate the distributions and the mean values of all fission observables. SPY uses the richness of microscopy in a rather simple theoretical framework, without any parameter except the scission-point definition, to draw clear answers based on perfect knowledge of the ingredients involved in the model, with very limited computing cost.

  8. An Item Fit Statistic Based on Pseudocounts from the Generalized Graded Unfolding Model: A Preliminary Report.

    ERIC Educational Resources Information Center

    Roberts, James S.

    Stone and colleagues (C. Stone, R. Ankenman, S. Lane, and M. Liu, 1993; C. Stone, R. Mislevy and J. Mazzeo, 1994; C. Stone, 2000) have proposed a fit index that explicitly accounts for the measurement error inherent in an estimated theta value, here called χ²_i*. The elements of this statistic are natural…

  9. Guessing and the Rasch Model

    ERIC Educational Resources Information Center

    Holster, Trevor A.; Lake, J.

    2016-01-01

    Stewart questioned Beglar's use of Rasch analysis of the Vocabulary Size Test (VST) and advocated the use of 3-parameter logistic item response theory (3PLIRT) on the basis that it models a non-zero lower asymptote for items, often called a "guessing" parameter. In support of this theory, Stewart presented fit statistics derived from…

  10. Shilling Attacks Detection in Recommender Systems Based on Target Item Analysis

    PubMed Central

    Zhou, Wei; Wen, Junhao; Koh, Yun Sing; Xiong, Qingyu; Gao, Min; Dobbie, Gillian; Alam, Shafiq

    2015-01-01

    Recommender systems are highly vulnerable to shilling attacks, both by individuals and groups. Attackers who introduce biased ratings in order to manipulate recommendations have been shown to negatively affect collaborative filtering (CF) algorithms. Previous research focuses only on the differences between genuine profiles and attack profiles, ignoring the group characteristics of attack profiles. In this paper, we study the use of statistical metrics to detect the rating patterns of attackers and the group characteristics in attack profiles. A further issue is that most existing detection methods are model-specific. Two metrics, Rating Deviation from Mean Agreement (RDMA) and Degree of Similarity with Top Neighbors (DegSim), are used for analyzing rating patterns between malicious profiles and genuine profiles in attack models. Building upon this, we also propose and evaluate a detection structure called RD-TIA for detecting shilling attacks in recommender systems using a statistical approach. In order to detect more complicated attack models, we propose a novel metric called DegSim’ based on DegSim. The experimental results show that our detection model based on target item analysis is an effective approach for detecting shilling attacks. PMID:26222882
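
    The two named metrics can be computed as in the toy sketch below; the formulas follow common definitions from the shilling-detection literature and may not match the paper's exact implementation, and the rating matrix is invented.

    ```python
    # RDMA and DegSim on a toy user-item rating matrix (NaN = unrated).
    import numpy as np

    def rdma(ratings):
        """Rating Deviation from Mean Agreement, one value per user profile."""
        item_mean = np.nanmean(ratings, axis=0)
        item_count = np.sum(~np.isnan(ratings), axis=0)
        dev = np.abs(ratings - item_mean) / item_count       # NaN where unrated
        return np.nanmean(dev, axis=1)

    def degsim(ratings, k=2):
        """Average Pearson similarity with the k most similar other users."""
        filled = np.where(np.isnan(ratings), np.nanmean(ratings), ratings)
        sim = np.corrcoef(filled)
        np.fill_diagonal(sim, -np.inf)                       # exclude self-similarity
        return np.sort(sim, axis=1)[:, -k:].mean(axis=1)

    toy = np.array([[5, 4, np.nan, 1],
                    [4, 5, 2, np.nan],
                    [5, 5, 4, 5],        # near-uniform, high-rating profile
                    [np.nan, 1, 2, 5]], dtype=float)
    print(np.round(rdma(toy), 3), np.round(degsim(toy), 3))
    ```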

  11. Modeling Antimicrobial Activity of Clorox(R) Using an Agar-Diffusion Test: A New Twist On an Old Experiment.

    ERIC Educational Resources Information Center

    Mitchell, James K.; Carter, William E.

    2000-01-01

    Describes using a computer statistical software package called Minitab to model the sensitivity of several microbes to the disinfectant NaOCl (Clorox) using the Kirby-Bauer technique. Each group of students collects data from one microbe, conducts regression analyses, then chooses the best-fit model based on the highest r-values obtained.…

  12. Recall of past use of mobile phone handsets.

    PubMed

    Parslow, R C; Hepworth, S J; McKinney, P A

    2003-01-01

    Previous studies investigating the health effects of mobile phones have based their estimation of exposure on self-reported levels of phone use. This UK validation study assesses the accuracy of reported voice calls made from mobile handsets. Data collected by postal questionnaire from 93 volunteers were compared to records obtained prospectively over 6 months from four network operators. Agreement was measured for outgoing calls using the kappa statistic, log-linear modelling, the Spearman correlation coefficient, and graphical methods. Agreement for the number of calls was moderate (kappa = 0.39), with better agreement for duration (kappa = 0.50). Log-linear modelling produced similar results. The Spearman correlation coefficient was 0.48 for number of calls and 0.60 for duration. Graphical agreement methods demonstrated patterns of over-reporting of call numbers (by a factor of 1.7) and duration (by a factor of 2.8). These results suggest that self-reported mobile phone use may not fully represent patterns of actual use. This has implications for calculating exposures from questionnaire data.
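
    As a reminder of how the kappa agreement statistic behaves, here is a small self-contained computation on invented categorical data (reported vs. recorded use banded into levels); the banding and values are illustrative, not the study's.

    ```python
    # Cohen's kappa for agreement between two categorical ratings of the same subjects.
    import numpy as np

    def cohen_kappa(a, b):
        cats = sorted(set(a) | set(b))
        idx = {c: i for i, c in enumerate(cats)}
        table = np.zeros((len(cats), len(cats)))
        for x, y in zip(a, b):
            table[idx[x], idx[y]] += 1          # cross-tabulate the two ratings
        n = table.sum()
        p_observed = np.trace(table) / n        # observed agreement
        p_expected = (table.sum(axis=1) @ table.sum(axis=0)) / n**2  # chance agreement
        return (p_observed - p_expected) / (1 - p_expected)

    reported = ["low", "low", "medium", "high", "medium", "high", "low", "medium"]
    recorded = ["low", "medium", "medium", "medium", "medium", "high", "low", "high"]
    print(round(cohen_kappa(reported, recorded), 2))
    ```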

  13. Landau's statistical mechanics for quasi-particle models

    NASA Astrophysics Data System (ADS)

    Bannur, Vishnu M.

    2014-04-01

    Landau's formalism of statistical mechanics [following L. D. Landau and E. M. Lifshitz, Statistical Physics (Pergamon Press, Oxford, 1980)] is applied to the quasi-particle model of quark-gluon plasma. Here, one starts from the expression for pressure and develops all of thermodynamics. It is a general formalism and is consistent with our earlier studies [V. M. Bannur, Phys. Lett. B647, 271 (2007)] based on Pathria's formalism [following R. K. Pathria, Statistical Mechanics (Butterworth-Heinemann, Oxford, 1977)]. In Pathria's formalism, one starts from the expression for energy density and develops thermodynamics. Both formalisms are consistent with thermodynamics and statistical mechanics. Under certain conditions, which are wrongly called the thermodynamic consistency relation, we recover other formalisms of quasi-particle systems, such as that of M. I. Gorenstein and S. N. Yang, Phys. Rev. D52, 5206 (1995), widely studied in quark-gluon plasma.

  14. Spatial Statistical Models and Optimal Survey Design for Rapid Geophysical characterization of UXO Sites

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    G. Ostrouchov; W.E.Doll; D.A.Wolf

    2003-07-01

    Unexploded ordnance (UXO) surveys encompass large areas, and the cost of surveying these areas can be high. Enactment of earlier protocols for sampling UXO sites has shown the shortcomings of these procedures and led to a call for the development of scientifically defensible statistical procedures for survey design and analysis. This project is one of three funded by SERDP to address this need.

  15. Exploratory analysis of real personal emergency response call conversations: considerations for personal emergency response spoken dialogue systems.

    PubMed

    Young, Victoria; Rochon, Elizabeth; Mihailidis, Alex

    2016-11-14

    The purpose of this study was to derive data from real, recorded, personal emergency response call conversations to help improve the artificial intelligence and decision making capability of a spoken dialogue system in a smart personal emergency response system. The main study objectives were to: develop a model of personal emergency response; determine categories for the model's features; identify and calculate measures from call conversations (verbal ability, conversational structure, timing); and examine conversational patterns and relationships between measures and model features applicable for improving the system's ability to automatically identify call model categories and predict a target response. This study was exploratory and used mixed methods. Personal emergency response calls were pre-classified according to call model categories identified qualitatively from response call transcripts. The relationships between six verbal ability measures, three conversational structure measures, two timing measures and three independent factors: caller type, risk level, and speaker type, were examined statistically. Emergency medical response services were the preferred response for the majority of medium and high risk calls for both caller types. Older adult callers mainly requested non-emergency medical service responders during medium risk situations. By measuring the number of spoken words-per-minute and turn-length-in-words for the first spoken utterance of a call, older adult and care provider callers could be identified with moderate accuracy. Average call taker response time was calculated using the number-of-speaker-turns and time-in-seconds measures. Care providers and older adults used different conversational strategies when responding to call takers. The words 'ambulance' and 'paramedic' may hold different latent connotations for different callers. The data derived from the real personal emergency response recordings may help a spoken dialogue system classify incoming calls by caller type with moderate probability shortly after the initial caller utterance. Knowing the caller type, the target response for the call may be predicted with some degree of probability and the output dialogue could be tailored to this caller type. The average call taker response time measured from real calls may be used to limit the conversation length in a spoken dialogue system before defaulting to a live call taker.

  16. 'Chain pooling' model selection as developed for the statistical analysis of a rotor burst protection experiment

    NASA Technical Reports Server (NTRS)

    Holms, A. G.

    1977-01-01

    A statistical decision procedure called chain pooling had been developed for model selection in fitting the results of a two-level fixed-effects full or fractional factorial experiment not having replication. The basic strategy included the use of one nominal level of significance for a preliminary test and a second nominal level of significance for the final test. The subject has been reexamined from the point of view of using as many as three successive statistical model deletion procedures in fitting the results of a single experiment. The investigation consisted of random number studies intended to simulate the results of a proposed aircraft turbine-engine rotor-burst-protection experiment. As a conservative approach, population model coefficients were chosen to represent a saturated 2 to the 4th power experiment with a distribution of parameter values unfavorable to the decision procedures. Three model selection strategies were developed.

  17. Semantic Importance Sampling for Statistical Model Checking

    DTIC Science & Technology

    2015-01-16

    … SMT calls while maintaining correctness. Finally, we implement SIS in a tool called osmosis and use it to verify a number of stochastic systems.

  18. CHALLENGES IN CONSTRUCTING STATISTICALLY-BASED SAR MODELS FOR DEVELOPMENTAL TOXICITY

    EPA Science Inventory

    Regulatory agencies are increasingly called upon to review large numbers of environmental contaminants that have not been characterized for their potential to pose a health risk. Additionally, there is special interest in protecting potentially sensitive subpopulations and identi...

  19. Pseudo-Boltzmann model for modeling the junctionless transistors

    NASA Astrophysics Data System (ADS)

    Avila-Herrera, F.; Cerdeira, A.; Roldan, J. B.; Sánchez-Moreno, P.; Tienda-Luna, I. M.; Iñiguez, B.

    2014-05-01

    Calculation of the carrier concentrations in semiconductors using the Fermi-Dirac integral requires complex numerical calculations; in this context, practically all analytical device models are based on Boltzmann statistics, even though it is known that this leads to an over-estimation of carrier densities at high doping concentrations. In this paper, a new approximation to the Fermi-Dirac integral, called the Pseudo-Boltzmann model, is presented for modeling junctionless transistors with high doping concentrations.
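
    The overestimation that motivates such approximations can be checked numerically by comparing the Fermi-Dirac integral of order 1/2 with its Boltzmann limit (sqrt(pi)/2)·exp(eta). The snippet below is generic textbook material, not the authors' Pseudo-Boltzmann expressions.

    ```python
    # Compare F_{1/2}(eta) = integral_0^inf sqrt(x)/(1 + exp(x - eta)) dx with its
    # Boltzmann limit; the ratio shows how Boltzmann statistics overestimate carrier
    # density as the reduced Fermi level eta rises (i.e., at high doping).
    import numpy as np
    from scipy.integrate import quad
    from scipy.special import expit

    def fermi_dirac_half(eta):
        val, _ = quad(lambda x: np.sqrt(x) * expit(eta - x), 0.0, np.inf)
        return val

    for eta in (-4.0, 0.0, 4.0, 8.0):
        fd = fermi_dirac_half(eta)
        boltzmann = 0.5 * np.sqrt(np.pi) * np.exp(eta)
        print(f"eta={eta:+.1f}  F_1/2={fd:.4g}  Boltzmann={boltzmann:.4g}  ratio={boltzmann / fd:.2f}")
    ```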

  20. A Comparison of Four Estimators of a Population Measure of Model Fit in Covariance Structure Analysis

    ERIC Educational Resources Information Center

    Zhang, Wei

    2008-01-01

    A major issue in the utilization of covariance structure analysis is model fit evaluation. Recent years have witnessed increasing interest in various test statistics and so-called fit indexes, most of which are actually based on or closely related to F[subscript 0], a measure of model fit in the population. This study aims to provide a systematic…

  1. Shift-Invariant Image Reconstruction of Speckle-Degraded Images Using Bispectrum Estimation

    DTIC Science & Technology

    1990-05-01

    … process with the requisite negative exponential pdf. I call this model the Negative Exponential Model. The remainder of the excerpt consists of figure captions (the model flowchart; statistical histograms and phase; the truth object speckled via the model).

  2. Stan: A Probabilistic Programming Language for Bayesian Inference and Optimization

    ERIC Educational Resources Information Center

    Gelman, Andrew; Lee, Daniel; Guo, Jiqiang

    2015-01-01

    Stan is a free and open-source C++ program that performs Bayesian inference or optimization for arbitrary user-specified models. It can be called from the command line, R, Python, Matlab, or Julia, and it has great promise for fitting large and complex statistical models in many areas of application. We discuss Stan from users' and developers'…

  3. Statistical mechanics of protein structural transitions: Insights from the island model

    PubMed Central

    Kobayashi, Yukio

    2016-01-01

    The so-called island model of protein structural transition holds that hydrophobic interactions are the key to both the folding and function of proteins. Herein, the genesis and statistical mechanical basis of the island model of transitions are reviewed, by presenting the results of simulations of such transitions. Elucidating the physicochemical mechanism of protein structural formation is the foundation for understanding the hierarchical structure of life at the microscopic level. Based on the results obtained to date using the island model, remaining problems and future work in the field of protein structures are discussed, referencing Professor Saitô’s views on the hierarchic structure of science. PMID:28409078

  4. Examining the Process of Responding to Circumplex Scales of Interpersonal Values Items: Should Ideal Point Scoring Methods Be Considered?

    PubMed

    Ling, Ying; Zhang, Minqiang; Locke, Kenneth D; Li, Guangming; Li, Zonglong

    2016-01-01

    The Circumplex Scales of Interpersonal Values (CSIV) is a 64-item self-report measure of goals from each octant of the interpersonal circumplex. We used item response theory methods to compare whether dominance models or ideal point models best described how people respond to CSIV items. Specifically, we fit a polytomous dominance model called the generalized partial credit model and an ideal point model of similar complexity called the generalized graded unfolding model to the responses of 1,893 college students. The results of both graphical comparisons of item characteristic curves and statistical comparisons of model fit suggested that an ideal point model best describes the process of responding to CSIV items. The different models produced different rank orderings of high-scoring respondents, but overall the models did not differ in their prediction of criterion variables (agentic and communal interpersonal traits and implicit motives).

  5. Statistical model specification and power: recommendations on the use of test-qualified pooling in analysis of experimental data

    PubMed Central

    Colegrave, Nick

    2017-01-01

    A common approach to the analysis of experimental data across much of the biological sciences is test-qualified pooling. Here non-significant terms are dropped from a statistical model, effectively pooling the variation associated with each removed term with the error term used to test hypotheses (or estimate effect sizes). This pooling is only carried out if statistical testing on the basis of applying that data to a previous more complicated model provides motivation for this model simplification; hence the pooling is test-qualified. In pooling, the researcher increases the degrees of freedom of the error term with the aim of increasing statistical power to test their hypotheses of interest. Despite this approach being widely adopted and explicitly recommended by some of the most widely cited statistical textbooks aimed at biologists, here we argue that (except in highly specialized circumstances that we can identify) the hoped-for improvement in statistical power will be small or non-existent, and there is likely to be much reduced reliability of the statistical procedures through deviation of type I error rates from nominal levels. We thus call for greatly reduced use of test-qualified pooling across experimental biology, more careful justification of any use that continues, and a different philosophy for initial selection of statistical models in the light of this change in procedure. PMID:28330912

  6. Probabilistic Evaluation of Competing Climate Models

    NASA Astrophysics Data System (ADS)

    Braverman, A. J.; Chatterjee, S.; Heyman, M.; Cressie, N.

    2017-12-01

    A standard paradigm for assessing the quality of climate model simulations is to compare what these models produce for past and present time periods, to observations of the past and present. Many of these comparisons are based on simple summary statistics called metrics. Here, we propose an alternative: evaluation of competing climate models through probabilities derived from tests of the hypothesis that climate-model-simulated and observed time sequences share common climate-scale signals. The probabilities are based on the behavior of summary statistics of climate model output and observational data, over ensembles of pseudo-realizations. These are obtained by partitioning the original time sequences into signal and noise components, and using a parametric bootstrap to create pseudo-realizations of the noise sequences. The statistics we choose come from working in the space of decorrelated and dimension-reduced wavelet coefficients. We compare monthly sequences of CMIP5 model output of average global near-surface temperature anomalies to similar sequences obtained from the well-known HadCRUT4 data set, as an illustration.

  7. Quantifying the impact of between-study heterogeneity in multivariate meta-analyses

    PubMed Central

    Jackson, Dan; White, Ian R; Riley, Richard D

    2012-01-01

    Measures that quantify the impact of heterogeneity in univariate meta-analysis, including the very popular I² statistic, are now well established. Multivariate meta-analysis, where studies provide multiple outcomes that are pooled in a single analysis, is also becoming more commonly used. The question of how to quantify heterogeneity in the multivariate setting is therefore raised. It is the univariate R² statistic, the ratio of the variance of the estimated treatment effect under the random and fixed effects models, that generalises most naturally, so this statistic provides our basis. This statistic is then used to derive a multivariate analogue of I², which we call . We also provide a multivariate H² statistic, the ratio of a generalisation of Cochran's heterogeneity statistic and its associated degrees of freedom, with an accompanying generalisation of the usual I² statistic, . Our proposed heterogeneity statistics can be used alongside all the usual estimates and inferential procedures used in multivariate meta-analysis. We apply our methods to some real datasets and show how our statistics are equally appropriate in the context of multivariate meta-regression, where study level covariate effects are included in the model. Our heterogeneity statistics may be used when applying any procedure for fitting the multivariate random effects model. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22763950
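
    For reference, the standard univariate quantities that the paper generalizes can be computed as below from study estimates and within-study variances; the numbers are synthetic, and the multivariate extensions themselves are not reproduced here.

    ```python
    # Univariate heterogeneity measures: Cochran's Q, H^2 = Q/df, I^2 = (H^2 - 1)/H^2.
    import numpy as np

    def heterogeneity(estimates, variances):
        w = 1.0 / np.asarray(variances)                # inverse-variance weights
        est = np.asarray(estimates)
        pooled = np.sum(w * est) / np.sum(w)           # fixed-effect pooled estimate
        q = np.sum(w * (est - pooled) ** 2)            # Cochran's Q
        df = len(est) - 1
        h2 = q / df
        i2 = max(0.0, (h2 - 1.0) / h2)
        return q, h2, i2

    q, h2, i2 = heterogeneity([0.30, 0.12, 0.45, 0.05, 0.38],
                              [0.010, 0.020, 0.015, 0.030, 0.010])
    print(f"Q={q:.2f}  H^2={h2:.2f}  I^2={i2:.1%}")
    ```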

  8. “Plateau”-related summary statistics are uninformative for comparing working memory models

    PubMed Central

    van den Berg, Ronald; Ma, Wei Ji

    2014-01-01

    Performance on visual working memory tasks decreases as more items need to be remembered. Over the past decade, a debate has unfolded between proponents of slot models and slotless models of this phenomenon. Zhang and Luck (2008) and Anderson, Vogel, and Awh (2011) noticed that as more items need to be remembered, “memory noise” seems to first increase and then reach a “stable plateau.” They argued that three summary statistics characterizing this plateau are consistent with slot models, but not with slotless models. Here, we assess the validity of their methods. We generated synthetic data both from a leading slot model and from a recent slotless model and quantified model evidence using log Bayes factors. We found that the summary statistics provided, at most, 0.15% of the expected model evidence in the raw data. In a model recovery analysis, a total of more than a million trials were required to achieve 99% correct recovery when models were compared on the basis of summary statistics, whereas fewer than 1,000 trials were sufficient when raw data were used. At realistic numbers of trials, plateau-related summary statistics are completely unreliable for model comparison. Applying the same analyses to subject data from Anderson et al. (2011), we found that the evidence in the summary statistics was, at most, 0.12% of the evidence in the raw data and far too weak to warrant any conclusions. These findings call into question claims about working memory that are based on summary statistics. PMID:24719235

  9. Financial Stylized Facts in the Word of Mouth Model

    NASA Astrophysics Data System (ADS)

    Misawa, Tadanobu; Watanabe, Kyoko; Shimokawa, Tetsuya

    Recently, we proposed an agent-based model called the word of mouth model to analyze the influence of the information transmission process on price formation in financial markets. In particular, we focused on the short-term predictability of asset returns and provided an information-transmission explanation of why this predictability is observed much more clearly in small-sized stocks. To extend the previous study, this paper demonstrates that the word of mouth model is also consistent with other important financial stylized facts. This strengthens the possibility that information transmission among investors plays a crucial role in price formation. Concretely, this paper addresses two famous statistical features of returns: the leptokurtic distribution of returns and the autocorrelation of return volatility. These statistical facts receive particular attention among financial stylized facts because of their statistical robustness and practical importance, for example in derivative pricing problems.

  10. Entropy of dynamical social networks

    NASA Astrophysics Data System (ADS)

    Zhao, Kun; Karsai, Marton; Bianconi, Ginestra

    2012-02-01

    Dynamical social networks evolve rapidly and are highly adaptive. Characterizing the information encoded in social networks is essential to gain insight into their structure, evolution, adaptability and dynamics. Recently, entropy measures have been used to quantify the information in email correspondence, static networks and mobility patterns. Nevertheless, we still lack methods to quantify the information encoded in time-varying dynamical social networks. In this talk we present a model to quantify the entropy of dynamical social networks and use this model to analyze phone-call communication data. We show evidence that the entropy of the phone-call interaction network changes according to circadian rhythms. Moreover, we show that social networks are extremely adaptive and are modified by the use of technologies such as mobile phone communication. Indeed, the statistics of phone-call durations are described by a Weibull distribution and differ significantly from the distribution of the durations of face-to-face interactions at a conference. Finally, we investigate how much the entropy of dynamical social networks changes in realistic models of phone-call or face-to-face interactions, characterizing in this way different types of human social behavior.

  11. Linking customisation of ERP systems to support effort: an empirical study

    NASA Astrophysics Data System (ADS)

    Koch, Stefan; Mitteregger, Kurt

    2016-01-01

    The amount of customisation to an enterprise resource planning (ERP) system has always been a major concern in the context of implementation. This article focuses on the maintenance phase and presents an empirical study of the relationship between the amount of customising and the resulting support effort. We establish a structural equation model that explains support effort using customisation effort, organisational characteristics and scope of implementation. The findings, using data from an ERP provider, show that there is a statistically significant effect: with an increasing amount of customisation, the number of telephone calls to support increases, as does the duration of each call.

  12. Improved Statistical Model Of 10.7-cm Solar Radiation

    NASA Technical Reports Server (NTRS)

    Vedder, John D.; Tabor, Jill L.

    1993-01-01

    An improved mathematical model simulates short-term fluctuations of the flux of 10.7-cm-wavelength solar radiation during a 91-day averaging period. Called the "F10.7 flux", this quantity is important as a measure of solar activity and because it is highly correlated with the ultraviolet radiation that causes fluctuations in the heating and density of the upper atmosphere. The F10.7 flux is easily measurable at the surface of the Earth.

  13. POOLMS: A computer program for fitting and model selection for two level factorial replication-free experiments

    NASA Technical Reports Server (NTRS)

    Amling, G. E.; Holms, A. G.

    1973-01-01

    A computer program is described that performs a statistical multiple-decision procedure called chain pooling. It uses a number of mean squares assigned to the error variance, conditioned on the relative magnitudes of the mean squares. Model selection is done according to user-specified levels of type 1 or type 2 error probabilities.

  14. High call volume at poison control centers: identification and implications for communication

    PubMed Central

    CARAVATI, E. M.; LATIMER, S.; REBLIN, M.; BENNETT, H. K. W.; CUMMINS, M. R.; CROUCH, B. I.; ELLINGTON, L.

    2016-01-01

    Context High volume surges in health care are uncommon and unpredictable events. Their impact on health system performance and capacity is difficult to study. Objectives To identify time periods that exhibited very busy conditions at a poison control center and to determine whether cases and communication during high volume call periods are different from cases during low volume periods. Methods Call data from a US poison control center over twelve consecutive months was collected via a call logger and an electronic case database (Toxicall®). Variables evaluated for high call volume conditions were: (1) call duration; (2) number of cases; and (3) number of calls per staff member per 30 minute period. Statistical analyses identified peak periods as busier than 99% of all other 30 minute time periods and low volume periods as slower than 70% of all other 30 minute periods. Case and communication characteristics of high volume and low volume calls were compared using logistic regression. Results A total of 65,364 incoming calls occurred over 12 months. One hundred high call volume and 4885 low call volume 30 minute periods were identified. High volume periods were more common between 1500 and 2300 hours and during the winter months. Coded verbal communication data were evaluated for 42 high volume and 296 low volume calls. The mean (standard deviation) call length of these calls during high volume and low volume periods was 3 minutes 27 seconds (1 minute 46 seconds) and 3 minutes 57 seconds (2 minutes 11 seconds), respectively. Regression analyses revealed a trend for fewer overall verbal statements and fewer staff questions during peak periods, but no other significant differences for staff-caller communication behaviors were found. Conclusion Peak activity for poison center call volume can be identified by statistical modeling. Calls during high volume periods were similar to low volume calls. Communication was more concise yet staff was able to maintain a good rapport with callers during busy call periods. This approach allows evaluation of poison exposure call characteristics and communication during high volume periods. PMID:22889059

  15. High call volume at poison control centers: identification and implications for communication.

    PubMed

    Caravati, E M; Latimer, S; Reblin, M; Bennett, H K W; Cummins, M R; Crouch, B I; Ellington, L

    2012-09-01

    High volume surges in health care are uncommon and unpredictable events. Their impact on health system performance and capacity is difficult to study. To identify time periods that exhibited very busy conditions at a poison control center and to determine whether cases and communication during high volume call periods are different from cases during low volume periods. Call data from a US poison control center over twelve consecutive months was collected via a call logger and an electronic case database (Toxicall®). Variables evaluated for high call volume conditions were: (1) call duration; (2) number of cases; and (3) number of calls per staff member per 30 minute period. Statistical analyses identified peak periods as busier than 99% of all other 30 minute time periods and low volume periods as slower than 70% of all other 30 minute periods. Case and communication characteristics of high volume and low volume calls were compared using logistic regression. A total of 65,364 incoming calls occurred over 12 months. One hundred high call volume and 4885 low call volume 30 minute periods were identified. High volume periods were more common between 1500 and 2300 hours and during the winter months. Coded verbal communication data were evaluated for 42 high volume and 296 low volume calls. The mean (standard deviation) call length of these calls during high volume and low volume periods was 3 minutes 27 seconds (1 minute 46 seconds) and 3 minutes 57 seconds (2 minutes 11 seconds), respectively. Regression analyses revealed a trend for fewer overall verbal statements and fewer staff questions during peak periods, but no other significant differences for staff-caller communication behaviors were found. Peak activity for poison center call volume can be identified by statistical modeling. Calls during high volume periods were similar to low volume calls. Communication was more concise yet staff was able to maintain a good rapport with callers during busy call periods. This approach allows evaluation of poison exposure call characteristics and communication during high volume periods.
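
    The peak-period identification described above can be sketched as follows: bin call start times into 30-minute periods, then flag periods busier than 99% of all periods as high volume and those slower than 70% of all periods as low volume. The timestamps are synthetic, and pandas is an assumed tool here, not one named in the study.

    ```python
    # Flag high- and low-volume 30-minute periods from call start times.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    start = pd.Timestamp("2024-01-01")
    calls = pd.DataFrame({
        "start_time": start + pd.to_timedelta(rng.integers(0, 365 * 24 * 3600, 60_000), unit="s")
    })

    per_period = calls.set_index("start_time").resample("30min").size()
    high_volume = per_period[per_period > per_period.quantile(0.99)]   # busier than 99% of periods
    low_volume = per_period[per_period < per_period.quantile(0.30)]    # slower than 70% of periods
    print(len(high_volume), "high-volume and", len(low_volume), "low-volume 30-minute periods")
    ```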

  16. Photolysis rates in correlated overlapping cloud fields: Cloud-J 7.3

    DOE PAGES

    Prather, M. J.

    2015-05-27

    A new approach for modeling photolysis rates (J values) in atmospheres with fractional cloud cover has been developed and implemented as Cloud-J – a multi-scattering eight-stream radiative transfer model for solar radiation based on Fast-J. Using observed statistics for the vertical correlation of cloud layers, Cloud-J 7.3 provides a practical and accurate method for modeling atmospheric chemistry. The combination of the new maximum-correlated cloud groups with the integration over all cloud combinations represented by four quadrature atmospheres produces mean J values in an atmospheric column with root-mean-square errors of 4% or less compared with 10–20% errors using simpler approximations. Cloud-J is practical for chemistry-climate models, requiring only an average of 2.8 Fast-J calls per atmosphere, vs. hundreds of calls with the correlated cloud groups, or 1 call with the simplest cloud approximations. Another improvement in modeling J values, the treatment of volatile organic compounds with pressure-dependent cross sections, is also incorporated into Cloud-J.

  17. An empirical analysis of the corporate call decision

    NASA Astrophysics Data System (ADS)

    Carlson, Murray Dean

    1998-12-01

    In this thesis we provide insights into the behavior of financial managers of utility companies by studying their decisions to redeem callable preferred shares. In particular, we investigate whether or not an option pricing based model of the call decision, with managers who maximize shareholder value, does a better job of explaining callable preferred share prices and call decisions than do other models of the decision. In order to perform these tests, we extend an empirical technique introduced by Rust (1987) to include the use of information from preferred share prices in addition to the call decisions. The model we develop to value the option embedded in a callable preferred share differs from standard models in two ways. First, as suggested in Kraus (1983), we explicitly account for transaction costs associated with a redemption. Second, we account for state variables that are observed by the decision makers but not by the preferred shareholders. We interpret these unobservable state variables as the benefits and costs associated with a change in capital structure that can accompany a call decision. When we add this variable, our empirical model changes from one which predicts exactly when a share should be called to one which predicts the probability of a call as a function of the observable state. These two modifications of the standard model result in predictions of calls, and therefore of callable preferred share prices, that are consistent with several previously unexplained features of the data; we show that the predictive power of the model is improved in a statistical sense by adding these features to the model. The pricing and call probability functions from our model do a good job of describing call decisions and preferred share prices for several utilities. Using data from shares of the Pacific Gas and Electric Co. (PGE) we obtain reasonable estimates for the transaction costs associated with a call. Using a formal empirical test, we are able to conclude that the managers of the Pacific Gas and Electric Company clearly take into account the value of the option to delay the call when making their call decisions. Overall, the model seems to be robust to tests of its specification and does a better job of describing the data than do simpler models of the decision making process. Limitations in the data do not allow us to perform the same tests in a larger cross-section of utility companies. However, we are able to estimate transaction cost parameters for many firms and these do not seem to vary significantly from those of PGE. This evidence does not cause us to reject our hypothesis that managerial behavior is consistent with a model in which managers maximize shareholder value.

  18. Climate Verification Using Running Mann Whitney Z Statistics

    USDA-ARS?s Scientific Manuscript database

    A robust method previously used to detect observed intra- to multi-decadal (IMD) climate regimes was adapted to test whether climate models could reproduce IMD variations in U.S. surface temperatures during 1919-2008. This procedure, called the running Mann Whitney Z (MWZ) method, samples data ranki...

  19. "Plateau"-related summary statistics are uninformative for comparing working memory models.

    PubMed

    van den Berg, Ronald; Ma, Wei Ji

    2014-10-01

    Performance on visual working memory tasks decreases as more items need to be remembered. Over the past decade, a debate has unfolded between proponents of slot models and slotless models of this phenomenon (Ma, Husain, and Bays, Nature Neuroscience 17, 347-356, 2014). Zhang and Luck (Nature 453, (7192), 233-235, 2008) and Anderson, Vogel, and Awh (Attention, Perception, Psychophys 74, (5), 891-910, 2011) noticed that as more items need to be remembered, "memory noise" seems to first increase and then reach a "stable plateau." They argued that three summary statistics characterizing this plateau are consistent with slot models, but not with slotless models. Here, we assess the validity of their methods. We generated synthetic data both from a leading slot model and from a recent slotless model and quantified model evidence using log Bayes factors. We found that the summary statistics provided at most 0.15 % of the expected model evidence in the raw data. In a model recovery analysis, a total of more than a million trials were required to achieve 99 % correct recovery when models were compared on the basis of summary statistics, whereas fewer than 1,000 trials were sufficient when raw data were used. Therefore, at realistic numbers of trials, plateau-related summary statistics are highly unreliable for model comparison. Applying the same analyses to subject data from Anderson et al. (Attention, Perception, Psychophys 74, (5), 891-910, 2011), we found that the evidence in the summary statistics was at most 0.12 % of the evidence in the raw data and far too weak to warrant any conclusions. The evidence in the raw data, in fact, strongly favored the slotless model. These findings call into question claims about working memory that are based on summary statistics.

  20. Testing statistical self-similarity in the topology of river networks

    USGS Publications Warehouse

    Troutman, Brent M.; Mantilla, Ricardo; Gupta, Vijay K.

    2010-01-01

    Recent work has demonstrated that the topological properties of real river networks deviate significantly from predictions of Shreve's random model. At the same time the property of mean self-similarity postulated by Tokunaga's model is well supported by data. Recently, a new class of network model called random self-similar networks (RSN) that combines self-similarity and randomness has been introduced to replicate important topological features observed in real river networks. We investigate if the hypothesis of statistical self-similarity in the RSN model is supported by data on a set of 30 basins located across the continental United States that encompass a wide range of hydroclimatic variability. We demonstrate that the generators of the RSN model obey a geometric distribution, and self-similarity holds in a statistical sense in 26 of these 30 basins. The parameters describing the distribution of interior and exterior generators are tested to be statistically different and the difference is shown to produce the well-known Hack's law. The inter-basin variability of RSN parameters is found to be statistically significant. We also test generator dependence on two climatic indices, mean annual precipitation and radiative index of dryness. Some indication of climatic influence on the generators is detected, but this influence is not statistically significant with the sample size available. Finally, two key applications of the RSN model to hydrology and geomorphology are briefly discussed.
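
    The geometric-distribution check on RSN generators is the kind of test that can be sketched directly. The snippet below fits a geometric law to a synthetic sample of generator values by maximum likelihood and runs a chi-square goodness-of-fit test with a pooled tail; it is a generic illustration, not the authors' exact procedure.

```python
# Sketch: test whether a sample of RSN generator values is consistent with a
# geometric distribution (support 1, 2, 3, ...). Synthetic data for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
generators = rng.geometric(p=0.45, size=500)      # stand-in for observed generators

# Maximum-likelihood estimate of the geometric parameter: p_hat = 1 / mean.
p_hat = 1.0 / generators.mean()

# Pool the tail so every expected count is reasonably large, then run a
# chi-square goodness-of-fit test (one degree of freedom lost to estimating p).
k_max = 6
observed = np.array([(generators == k).sum() for k in range(1, k_max)]
                    + [(generators >= k_max).sum()])
probs = np.array([stats.geom.pmf(k, p_hat) for k in range(1, k_max)]
                 + [stats.geom.sf(k_max - 1, p_hat)])
expected = probs * len(generators)

chi2 = ((observed - expected) ** 2 / expected).sum()
p_value = stats.chi2.sf(chi2, df=len(observed) - 1 - 1)
print(f"p_hat = {p_hat:.3f}, chi2 = {chi2:.2f}, p-value = {p_value:.3f}")
```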

  1. Markov model plus k-word distributions: a synergy that produces novel statistical measures for sequence comparison.

    PubMed

    Dai, Qi; Yang, Yanchun; Wang, Tianming

    2008-10-15

    Many proposed statistical measures can efficiently compare biological sequences to further infer their structures, functions and evolutionary information. They are related in spirit because all the ideas for sequence comparison try to use the information on the k-word distributions, Markov model or both. Motivated by adding k-word distributions to a Markov model directly, we investigated two novel statistical measures for sequence comparison, called wre.k.r and S2.k.r. The proposed measures were tested by similarity search, evaluation on functionally related regulatory sequences and phylogenetic analysis. This offers the systematic and quantitative experimental assessment of our measures. Moreover, we compared our results with those of alignment-based and alignment-free methods. We grouped our experiments into two sets. The first one, performed via ROC (receiver operating curve) analysis, aims at assessing the intrinsic ability of our statistical measures to search for similar sequences from a database and discriminate functionally related regulatory sequences from unrelated sequences. The second one aims at assessing how well our statistical measure is used for phylogenetic analysis. The experimental assessment demonstrates that our similarity measures intending to incorporate k-word distributions into a Markov model are more efficient.
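
    As a rough illustration of the k-word/Markov idea (not the wre.k.r or S2.k.r statistics themselves), the sketch below counts k-words in two sequences, subtracts their expected counts under a first-order Markov background, and correlates the resulting deviation vectors; the sequences and parameters are made up.

```python
# Generic sketch: compare two DNA sequences by how their observed k-word counts
# deviate from a first-order Markov background, then correlate the deviation
# vectors. Illustrates the flavour of k-word/Markov hybrid measures; it is not
# the wre.k.r or S2.k.r statistic itself.
from itertools import product
import numpy as np

def kmer_counts(seq, k):
    counts = {"".join(w): 0 for w in product("ACGT", repeat=k)}
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
    return counts

def markov_deviation(seq, k=3):
    """Observed k-word count minus expectation under a 1st-order Markov chain."""
    obs_k = kmer_counts(seq, k)
    obs_2 = kmer_counts(seq, 2)
    obs_1 = kmer_counts(seq, 1)
    n = len(seq) - k + 1
    dev = []
    for word, obs in sorted(obs_k.items()):
        # P(w1..wk) = P(w1 w2) * prod_i P(w_i w_{i+1}) / P(w_i)
        p = obs_2[word[:2]] / max(len(seq) - 1, 1)
        for i in range(1, k - 1):
            pair = obs_2[word[i:i + 2]] / max(len(seq) - 1, 1)
            single = obs_1[word[i]] / len(seq)
            p *= pair / single if single > 0 else 0.0
        dev.append(obs - n * p)
    return np.array(dev)

s1 = "ACGTACGTTGCAACGTGGTACGTACCGTA" * 5
s2 = "ACGGACGTTGCTACGAGGTACGAACCGTA" * 5
d1, d2 = markov_deviation(s1), markov_deviation(s2)
similarity = np.corrcoef(d1, d2)[0, 1]
print(f"deviation-vector correlation: {similarity:.3f}")
```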

  2. The mediating effect of calling on the relationship between medical school students' academic burnout and empathy.

    PubMed

    Chae, Su Jin; Jeong, So Mi; Chung, Yoon-Sok

    2017-09-01

    This study is aimed at identifying the relationships between medical school students' academic burnout, empathy, and calling, and determining whether their calling has a mediating effect on the relationship between academic burnout and empathy. A mixed method study was conducted. One hundred twenty-seven medical students completed a survey. Scales measuring academic burnout, medical students' empathy, and calling were utilized. For statistical analysis, correlation analysis, descriptive statistics analysis, and hierarchical multiple regression analyses were conducted. For the qualitative approach, eight medical students participated in a focus group interview. The study found that empathy has a statistically significant, negative correlation with academic burnout, while having a significant, positive correlation with calling. Sense of calling proved to be an effective mediator of the relationship between academic burnout and empathy. This result demonstrates that calling is a key variable that mediates the relationship between medical students' academic burnout and empathy. As such, this study provides baseline data for an education that could improve medical students' empathy skills.
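
    A hedged sketch of the regression-based mediation logic (total, direct, and indirect effects estimated with hierarchical ordinary least squares) is shown below on synthetic data; the variable names and effect sizes are invented and the study's actual scales and sample are not reproduced.

```python
# Sketch of regression-based mediation analysis (Baron & Kenny style) on
# synthetic data: does calling mediate the burnout -> empathy relationship?
# Variable names and effect sizes are invented for illustration only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 127
burnout = rng.normal(size=n)
calling = -0.5 * burnout + rng.normal(scale=0.8, size=n)                  # path a
empathy = 0.6 * calling - 0.2 * burnout + rng.normal(scale=0.8, size=n)   # paths b, c'
df = pd.DataFrame({"burnout": burnout, "calling": calling, "empathy": empathy})

total = smf.ols("empathy ~ burnout", data=df).fit()               # path c
path_a = smf.ols("calling ~ burnout", data=df).fit()              # path a
full = smf.ols("empathy ~ burnout + calling", data=df).fit()      # paths b and c'

print("total effect c   :", round(total.params["burnout"], 3))
print("direct effect c' :", round(full.params["burnout"], 3))
print("indirect a*b     :", round(path_a.params["burnout"] * full.params["calling"], 3))
```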

  3. Empirical comparison study of approximate methods for structure selection in binary graphical models.

    PubMed

    Viallon, Vivian; Banerjee, Onureena; Jougla, Eric; Rey, Grégoire; Coste, Joel

    2014-03-01

    Looking for associations among multiple variables is a topical issue in statistics due to the increasing amount of data encountered in biology, medicine, and many other domains involving statistical applications. Graphical models have recently gained popularity for this purpose in the statistical literature. In the binary case, however, exact inference is generally very slow or even intractable because of the form of the so-called log-partition function. In this paper, we review various approximate methods for structure selection in binary graphical models that have recently been proposed in the literature and compare them through an extensive simulation study. We also propose a modification of one existing method, which is shown to achieve good performance and to be generally very fast. We conclude with an application in which we search for associations among causes of death recorded on French death certificates. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
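
    One widely used approximate strategy in this setting, which may or may not coincide with the specific methods compared in the paper, is node-wise L1-penalized logistic regression with an OR rule for symmetrizing the selected neighborhoods. A minimal sketch on synthetic binary data:

```python
# Sketch of one common approximate structure-selection method for binary
# graphical models: node-wise L1-penalized logistic regression, where an edge
# (i, j) is kept if either node selects the other as a predictor.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n, p = 500, 6
X = rng.integers(0, 2, size=(n, p)).astype(float)
X[:, 1] = (X[:, 0] + rng.integers(0, 2, size=n) > 1).astype(float)  # induce dependence

edges = set()
for j in range(p):
    y = X[:, j]
    others = np.delete(np.arange(p), j)
    clf = LogisticRegression(penalty="l1", C=0.3, solver="liblinear")
    clf.fit(X[:, others], y)
    for idx, coef in zip(others, clf.coef_.ravel()):
        if abs(coef) > 1e-6:
            edges.add(tuple(sorted((j, idx))))   # "OR" rule for symmetrizing

print("selected edges:", sorted(edges))
```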

  4. Phylogenetic signal in the acoustic parameters of the advertisement calls of four clades of anurans.

    PubMed

    Gingras, Bruno; Mohandesan, Elmira; Boko, Drasko; Fitch, W Tecumseh

    2013-07-01

    Anuran vocalizations, especially their advertisement calls, are largely species-specific and can be used to identify taxonomic affiliations. Because anurans are not vocal learners, their vocalizations are generally assumed to have a strong genetic component. This suggests that the degree of similarity between advertisement calls may be related to large-scale phylogenetic relationships. To test this hypothesis, advertisement calls from 90 species belonging to four large clades (Bufo, Hylinae, Leptodactylus, and Rana) were analyzed. Phylogenetic distances were estimated based on the DNA sequences of the 12S mitochondrial ribosomal RNA gene, and, for a subset of 49 species, on the rhodopsin gene. Mean values for five acoustic parameters (coefficient of variation of root-mean-square amplitude, dominant frequency, spectral flux, spectral irregularity, and spectral flatness) were computed for each species. We then tested for phylogenetic signal on the body-size-corrected residuals of these five parameters, using three statistical tests (Moran's I, Mantel, and Blomberg's K) and three models of genetic distance (pairwise distances, Abouheif's proximities, and the variance-covariance matrix derived from the phylogenetic tree). A significant phylogenetic signal was detected for most acoustic parameters on the 12S dataset, across statistical tests and genetic distance models, both for the entire sample of 90 species and within clades in several cases. A further analysis on a subset of 49 species using genetic distances derived from rhodopsin and from 12S broadly confirmed the results obtained on the larger sample, indicating that the phylogenetic signals observed in these acoustic parameters can be detected using a variety of genetic distance models derived either from a variable mitochondrial sequence or from a conserved nuclear gene. We found a robust relationship, in a large number of species, between anuran phylogenetic relatedness and acoustic similarity in the advertisement calls in a taxon with no evidence for vocal learning, even after correcting for the effect of body size. This finding, covering a broad sample of species whose vocalizations are fairly diverse, indicates that the intense selection on certain call characteristics observed in many anurans does not eliminate all acoustic indicators of relatedness. Our approach could potentially be applied to other vocal taxa.
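
    The Mantel test mentioned above can be sketched in a few lines: correlate the off-diagonal entries of two distance matrices and assess significance by permuting the rows and columns of one of them. The matrices below are synthetic stand-ins for acoustic and genetic distances.

```python
# Minimal permutation Mantel test: correlation between two distance matrices
# (e.g., acoustic distance vs. genetic distance), with significance assessed by
# permuting the rows/columns of one matrix. Matrices here are synthetic.
import numpy as np

def mantel(d1, d2, n_perm=999, rng=None):
    rng = rng or np.random.default_rng()
    iu = np.triu_indices_from(d1, k=1)
    r_obs = np.corrcoef(d1[iu], d2[iu])[0, 1]
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(d1.shape[0])
        r_perm = np.corrcoef(d1[perm][:, perm][iu], d2[iu])[0, 1]
        if abs(r_perm) >= abs(r_obs):
            count += 1
    return r_obs, (count + 1) / (n_perm + 1)

rng = np.random.default_rng(4)
pts = rng.normal(size=(20, 2))
genetic = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
acoustic = genetic + rng.normal(scale=0.3, size=genetic.shape)
acoustic = (acoustic + acoustic.T) / 2          # keep it symmetric
np.fill_diagonal(acoustic, 0.0)

r, p = mantel(genetic, acoustic, rng=rng)
print(f"Mantel r = {r:.3f}, permutation p = {p:.3f}")
```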

  5. Toda hierarchies and their applications

    NASA Astrophysics Data System (ADS)

    Takasaki, Kanehisa

    2018-05-01

    The 2D Toda hierarchy occupies a central position in the family of integrable hierarchies of the Toda type. The 1D Toda hierarchy and the Ablowitz–Ladik (aka relativistic Toda) hierarchy can be derived from the 2D Toda hierarchy as reductions. These integrable hierarchies have been applied to various problems of mathematics and mathematical physics since the 1990s. A recent example is a series of studies on models of statistical mechanics called the melting crystal model. This research has revealed that the aforementioned two reductions of the 2D Toda hierarchy underlie two different melting crystal models. Technical clues are a fermionic realization of the quantum torus algebra, special algebraic relations therein called shift symmetries, and a matrix factorization problem. The two melting crystal models thus exhibit remarkable similarity with the Hermitian and unitary matrix models for which the two reductions of the 2D Toda hierarchy play the role of fundamental integrable structures.

  6. Site specific probability of passive acoustic detection of humpback whale calls from single fixed hydrophones.

    PubMed

    Helble, Tyler A; D'Spain, Gerald L; Hildebrand, John A; Campbell, Gregory S; Campbell, Richard L; Heaney, Kevin D

    2013-09-01

    Passive acoustic monitoring of marine mammal calls is an increasingly important method for assessing population numbers, distribution, and behavior. A common mistake in the analysis of marine mammal acoustic data is formulating conclusions about these animals without first understanding how environmental properties such as bathymetry, sediment properties, water column sound speed, and ocean acoustic noise influence the detection and character of vocalizations in the acoustic data. The approach in this paper is to use Monte Carlo simulations with a full wave field acoustic propagation model to characterize the site specific probability of detection of six types of humpback whale calls at three passive acoustic monitoring locations off the California coast. Results show that the probability of detection can vary by factors greater than ten when comparing detections across locations, or comparing detections at the same location over time, due to environmental effects. Effects of uncertainties in the inputs to the propagation model are also quantified, and the model accuracy is assessed by comparing calling statistics amassed from 24,690 humpback units recorded in the month of October 2008. Under certain conditions, the probability of detection can be estimated with uncertainties sufficiently small to allow for accurate density estimates.

  7. Statistical Analysis of Complexity Generators for Cost Estimation

    NASA Technical Reports Server (NTRS)

    Rowell, Ginger Holmes

    1999-01-01

    Predicting the cost of cutting edge new technologies involved with spacecraft hardware can be quite complicated. A new feature of the NASA Air Force Cost Model (NAFCOM), called the Complexity Generator, is being developed to model the complexity factors that drive the cost of space hardware. This parametric approach is also designed to account for the differences in cost, based on factors that are unique to each system and subsystem. The cost driver categories included in this model are weight, inheritance from previous missions, technical complexity, and management factors. This paper explains the Complexity Generator framework, the statistical methods used to select the best model within this framework, and the procedures used to find the region of predictability and the prediction intervals for the cost of a mission.

  8. Accurate landmarking of three-dimensional facial data in the presence of facial expressions and occlusions using a three-dimensional statistical facial feature model.

    PubMed

    Zhao, Xi; Dellandréa, Emmanuel; Chen, Liming; Kakadiaris, Ioannis A

    2011-10-01

    Three-dimensional face landmarking aims at automatically localizing facial landmarks and has a wide range of applications (e.g., face recognition, face tracking, and facial expression analysis). Existing methods assume neutral facial expressions and unoccluded faces. In this paper, we propose a general learning-based framework for reliable landmark localization on 3-D facial data under challenging conditions (i.e., facial expressions and occlusions). Our approach relies on a statistical model, called 3-D statistical facial feature model, which learns both the global variations in configurational relationships between landmarks and the local variations of texture and geometry around each landmark. Based on this model, we further propose an occlusion classifier and a fitting algorithm. Results from experiments on three publicly available 3-D face databases (FRGC, BU-3-DFE, and Bosphorus) demonstrate the effectiveness of our approach, in terms of landmarking accuracy and robustness, in the presence of expressions and occlusions.

  9. Modeling of adsorption isotherms of water vapor on Tunisian olive leaves using statistical mechanical formulation

    NASA Astrophysics Data System (ADS)

    Knani, S.; Aouaini, F.; Bahloul, N.; Khalfaoui, M.; Hachicha, M. A.; Ben Lamine, A.; Kechaou, N.

    2014-04-01

    Analytical expression for modeling water adsorption isotherms of food or agricultural products is developed using the statistical mechanics formalism. The model developed in this paper is further used to fit and interpret the isotherms of four varieties of Tunisian olive leaves called “Chemlali, Chemchali, Chetoui and Zarrazi”. The parameters involved in the model such as the number of adsorbed water molecules per site, n, the receptor sites density, NM, and the energetic parameters, a1 and a2, were determined by fitting the experimental adsorption isotherms at temperatures ranging from 303 to 323 K. We interpret the results of fitting. After that, the model is further applied to calculate thermodynamic functions which govern the adsorption mechanism such as entropy, the free enthalpy of Gibbs and the internal energy.

  10. In silico segmentations of lentivirus envelope sequences

    PubMed Central

    Boissin-Quillon, Aurélia; Piau, Didier; Leroux, Caroline

    2007-01-01

    Background The gene encoding the envelope of lentiviruses exhibits a considerable plasticity, particularly the region which encodes the surface (SU) glycoprotein. Interestingly, mutations do not appear uniformly along the sequence of SU, but they are clustered in restricted areas, called variable (V) regions, which are interspersed with relatively more stable regions, called constant (C) regions. We look for specific signatures of C/V regions, using hidden Markov models constructed with SU sequences of the equine, human, small ruminant and simian lentiviruses. Results Our models yield clear and accurate delimitations of the C/V regions, when the test set and the training set were made up of sequences of the same lentivirus, but also when they were made up of sequences of different lentiviruses. Interestingly, the models predicted the different regions of lentiviruses such as the bovine and feline lentiviruses, not used in the training set. Models based on composite training sets produce accurate segmentations of sequences of all these lentiviruses. Conclusion Our results suggest that each C/V region has a specific statistical oligonucleotide composition, and that the C (respectively V) regions of one of these lentiviruses are statistically more similar to the C (respectively V) regions of the other lentiviruses, than to the V (respectively C) regions of the same lentivirus. PMID:17376229
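
    The segmentation mechanics behind such HMMs can be illustrated with a tiny two-state (constant vs. variable) Viterbi decoder over a nucleotide string. The transition and emission probabilities below are invented for illustration; in practice they would be learned from labelled SU sequences.

```python
# Tiny Viterbi decoder for a two-state (Constant / Variable) HMM over a
# nucleotide sequence. Transition and emission probabilities are invented for
# illustration; a real application would train them on labelled SU sequences.
import numpy as np

states = ["C", "V"]
trans = np.log(np.array([[0.98, 0.02],       # C -> C, C -> V
                         [0.03, 0.97]]))     # V -> C, V -> V
emit = {                                     # per-state nucleotide frequencies
    "C": np.log(np.array([0.30, 0.25, 0.25, 0.20])),   # A, C, G, T
    "V": np.log(np.array([0.20, 0.20, 0.20, 0.40])),
}
idx = {b: i for i, b in enumerate("ACGT")}

def viterbi(seq):
    n, k = len(seq), len(states)
    score = np.full((n, k), -np.inf)
    back = np.zeros((n, k), dtype=int)
    score[0] = np.log(0.5) + np.array([emit[s][idx[seq[0]]] for s in states])
    for t in range(1, n):
        for j, s in enumerate(states):
            cand = score[t - 1] + trans[:, j]
            back[t, j] = cand.argmax()
            score[t, j] = cand.max() + emit[s][idx[seq[t]]]
    path = [int(score[-1].argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return "".join(states[i] for i in reversed(path))

seq = "ACGGCAGCGC" + "TTATTGTTAT" + "GCCGCAGGCA"
print(seq)
print(viterbi(seq))   # the labels should roughly track the T-rich middle block
```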

  11. Monetary policy and the effects of oil price shocks on the Japanese economy

    NASA Astrophysics Data System (ADS)

    Lee, Byung Rhae

    1998-12-01

    The evidence of output decreases and price level increases following oil price shocks in the Japanese economy is presented in this paper. These negative effects of oil shocks are better explained by Hamilton's (1996) net oil price increase measure (NOPI) than by other oil measures. The fact that an oil shock has a statistically significant effect on the call money rate and real output and that the call money rate also has a statistically significant effect on real output suggests that the effects of oil price shocks on economic activity are partially attributable to contractionary monetary policy responses. The asymmetric effects of positive and negative oil shocks are also found in the Japanese economy and this asymmetry can also be partially explained by monetary policy responses. To assess the relative contribution of oil shocks and endogenous monetary policy responses to the economic downturns, I shut off the responses of the call money rate to oil shocks utilizing the impulse response results from the VAR model. Then, I re-run the VAR with the adjusted call money rate series. The empirical results show that around 30-40% of the negative effects of oil price shocks on the Japanese economy can be accounted for by oil shock induced monetary tightening.

  12. Understanding Teacher Users of a Digital Library Service: A Clustering Approach

    ERIC Educational Resources Information Center

    Xu, Beijie

    2011-01-01

    This research examined teachers' online behaviors while using a digital library service--the Instructional Architect (IA)--through three consecutive studies. In the first two studies, a statistical model called latent class analysis (LCA) was applied to cluster different groups of IA teachers according to their diverse online behaviors. The third…

  13. Standard and Robust Methods in Regression Imputation

    ERIC Educational Resources Information Center

    Moraveji, Behjat; Jafarian, Koorosh

    2014-01-01

    The aim of this paper is to provide an introduction of new imputation algorithms for estimating missing values from official statistics in larger data sets of data pre-processing, or outliers. The goal is to propose a new algorithm called IRMI (iterative robust model-based imputation). This algorithm is able to deal with all challenges like…

  14. The mediating effect of calling on the relationship between medical school students’ academic burnout and empathy

    PubMed Central

    2017-01-01

    Purpose This study is aimed at identifying the relationships between medical school students’ academic burnout, empathy, and calling, and determining whether their calling has a mediating effect on the relationship between academic burnout and empathy. Methods A mixed method study was conducted. One hundred twenty-seven medical students completed a survey. Scales measuring academic burnout, medical students’ empathy, and calling were utilized. For statistical analysis, correlation analysis, descriptive statistics analysis, and hierarchical multiple regression analyses were conducted. For the qualitative approach, eight medical students participated in a focus group interview. Results The study found that empathy has a statistically significant, negative correlation with academic burnout, while having a significant, positive correlation with calling. Sense of calling proved to be an effective mediator of the relationship between academic burnout and empathy. Conclusion This result demonstrates that calling is a key variable that mediates the relationship between medical students’ academic burnout and empathy. As such, this study provides baseline data for an education that could improve medical students’ empathy skills. PMID:28870019

  15. Can Retinal Ganglion Cell Dipoles Seed Iso-Orientation Domains in the Visual Cortex?

    PubMed Central

    Schottdorf, Manuel; Eglen, Stephen J.; Wolf, Fred; Keil, Wolfgang

    2014-01-01

    It has been argued that the emergence of roughly periodic orientation preference maps (OPMs) in the primary visual cortex (V1) of carnivores and primates can be explained by a so-called statistical connectivity model. This model assumes that input to V1 neurons is dominated by feed-forward projections originating from a small set of retinal ganglion cells (RGCs). The typical spacing between adjacent cortical orientation columns preferring the same orientation then arises via Moiré-Interference between hexagonal ON/OFF RGC mosaics. While this Moiré-Interference critically depends on long-range hexagonal order within the RGC mosaics, a recent statistical analysis of RGC receptive field positions found no evidence for such long-range positional order. Hexagonal order may be only one of several ways to obtain spatially repetitive OPMs in the statistical connectivity model. Here, we investigate a more general requirement on the spatial structure of RGC mosaics that can seed the emergence of spatially repetitive cortical OPMs, namely that angular correlations between so-called RGC dipoles exhibit a spatial structure similar to that of OPM autocorrelation functions. Both in cat beta cell mosaics as well as primate parasol receptive field mosaics we find that RGC dipole angles are spatially uncorrelated. To help assess the level of these correlations, we introduce a novel point process that generates mosaics with realistic nearest neighbor statistics and a tunable degree of spatial correlations of dipole angles. Using this process, we show that given the size of available data sets, the presence of even weak angular correlations in the data is very unlikely. We conclude that the layout of ON/OFF ganglion cell mosaics lacks the spatial structure necessary to seed iso-orientation domains in the primary visual cortex. PMID:24475081

  16. Can retinal ganglion cell dipoles seed iso-orientation domains in the visual cortex?

    PubMed

    Schottdorf, Manuel; Eglen, Stephen J; Wolf, Fred; Keil, Wolfgang

    2014-01-01

    It has been argued that the emergence of roughly periodic orientation preference maps (OPMs) in the primary visual cortex (V1) of carnivores and primates can be explained by a so-called statistical connectivity model. This model assumes that input to V1 neurons is dominated by feed-forward projections originating from a small set of retinal ganglion cells (RGCs). The typical spacing between adjacent cortical orientation columns preferring the same orientation then arises via Moiré-Interference between hexagonal ON/OFF RGC mosaics. While this Moiré-Interference critically depends on long-range hexagonal order within the RGC mosaics, a recent statistical analysis of RGC receptive field positions found no evidence for such long-range positional order. Hexagonal order may be only one of several ways to obtain spatially repetitive OPMs in the statistical connectivity model. Here, we investigate a more general requirement on the spatial structure of RGC mosaics that can seed the emergence of spatially repetitive cortical OPMs, namely that angular correlations between so-called RGC dipoles exhibit a spatial structure similar to that of OPM autocorrelation functions. Both in cat beta cell mosaics as well as primate parasol receptive field mosaics we find that RGC dipole angles are spatially uncorrelated. To help assess the level of these correlations, we introduce a novel point process that generates mosaics with realistic nearest neighbor statistics and a tunable degree of spatial correlations of dipole angles. Using this process, we show that given the size of available data sets, the presence of even weak angular correlations in the data is very unlikely. We conclude that the layout of ON/OFF ganglion cell mosaics lacks the spatial structure necessary to seed iso-orientation domains in the primary visual cortex.

  17. From Weakly Chaotic Dynamics to Deterministic Subdiffusion via Copula Modeling

    NASA Astrophysics Data System (ADS)

    Nazé, Pierre

    2018-03-01

    Copula modeling consists in finding a probabilistic distribution, called copula, whereby its coupling with the marginal distributions of a set of random variables produces their joint distribution. The present work aims to use this technique to connect the statistical distributions of weakly chaotic dynamics and deterministic subdiffusion. More precisely, we decompose the jumps distribution of Geisel-Thomae map into a bivariate one and determine the marginal and copula distributions respectively by infinite ergodic theory and statistical inference techniques. We verify therefore that the characteristic tail distribution of subdiffusion is an extreme value copula coupling Mittag-Leffler distributions. We also present a method to calculate the exact copula and joint distributions in the case where weakly chaotic dynamics and deterministic subdiffusion statistical distributions are already known. Numerical simulations and consistency with the dynamical aspects of the map support our results.

  18. Statistical Mechanics of Node-perturbation Learning with Noisy Baseline

    NASA Astrophysics Data System (ADS)

    Hara, Kazuyuki; Katahira, Kentaro; Okada, Masato

    2017-02-01

    Node-perturbation learning is a type of statistical gradient descent algorithm that can be applied to problems where the objective function is not explicitly formulated, including reinforcement learning. It estimates the gradient of an objective function by using the change in the objective function in response to the perturbation. The value of the objective function for an unperturbed output is called a baseline. Cho et al. proposed node-perturbation learning with a noisy baseline. In this paper, we report on building the statistical mechanics of Cho's model and on deriving coupled differential equations of order parameters that depict learning dynamics. We also show how to derive the generalization error by solving the differential equations of order parameters. On the basis of the results, we show that Cho's results also apply in general cases, and we characterize some general performance properties of Cho's model.
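
    A minimal sketch of node-perturbation learning with a noisy baseline, for a linear student tracking a linear teacher, is given below. It only illustrates the gradient-estimation idea; the learning rate, noise levels, and network are illustrative and do not reproduce the paper's order-parameter analysis.

```python
# Minimal node-perturbation learning sketch for a linear student learning a
# linear teacher. The baseline (unperturbed error) is observed with additive
# noise, as in the noisy-baseline setting. Parameters are illustrative.
import numpy as np

rng = np.random.default_rng(5)
n = 50
w_teacher = rng.normal(size=n) / np.sqrt(n)
w = rng.normal(size=n) / np.sqrt(n)

eta, sigma_pert, sigma_baseline = 0.05, 0.1, 0.05

for step in range(20000):
    x = rng.normal(size=n)
    target = w_teacher @ x
    y = w @ x
    baseline = (y - target) ** 2 + rng.normal(scale=sigma_baseline)  # noisy baseline
    xi = rng.normal(scale=sigma_pert)            # perturbation of the output node
    perturbed = (y + xi - target) ** 2
    # Gradient estimate: correlate the error change with the perturbation.
    grad_est = (perturbed - baseline) / sigma_pert**2 * xi
    w -= eta * grad_est * x / n

# Generalization error for unit-variance Gaussian inputs: (1/2) ||w - w_teacher||^2.
err = 0.5 * float(np.sum((w - w_teacher) ** 2))
print(f"final generalization error: {err:.4f}")
```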

  19. Application of statistical mechanical methods to the modeling of social networks

    NASA Astrophysics Data System (ADS)

    Strathman, Anthony Robert

    With the recent availability of large-scale social data sets, social networks have become open to quantitative analysis via the methods of statistical physics. We examine the statistical properties of a real large-scale social network, generated from cellular phone call-trace logs. We find this network, like many other social networks, to be assortative (r = 0.31) and clustered (i.e., strongly transitive, C = 0.21). We measure fluctuation scaling to identify the presence of internal structure in the network and find that structural inhomogeneity effectively disappears at the scale of a few hundred nodes, though there is no sharp cutoff. We introduce an agent-based model of social behavior, designed to model the formation and dissolution of social ties. The model is a modified Metropolis algorithm containing agents operating under the basic sociological constraints of reciprocity, communication need and transitivity. The model introduces the concept of a social temperature. We go on to show that this simple model reproduces the global statistical network features (incl. assortativity, connected fraction, mean degree, clustering, and mean shortest path length) of the real network data and undergoes two phase transitions, one being from a "gas" to a "liquid" state and the second from a liquid to a glassy state as a function of this social temperature.

  20. The effect of work shift configurations on emergency medical dispatch center response.

    PubMed

    Montassier, Emmanuel; Labady, Julien; Andre, Antoine; Potel, Gilles; Berthier, Frederic; Jenvrin, Joel; Penverne, Yann

    2015-01-01

    It has been proved that emergency medical dispatch centers (EMDC) save lives by promoting an appropriate allocation of emergency medical service resources. Indeed, optimal dispatcher call duration is pivotal to reduce the time gap between the time a call is placed and the delivery of medical care. However, little is known about the impact of work shift configurations (i.e., work shift duration and work shift rotation throughout the day) on dispatcher call duration. Thus, the objective of our study was to assess the effect of work shift configurations on dispatcher call duration. During a 1-year study period, we analyzed the dispatcher call durations for medical and trauma calls during the 4 different work shift rotations (day, morning, evening, and night) and during the 10-hour work shift of each dispatcher in the EMDC of Nantes. We extracted dispatcher call durations from our advanced telephone system, configured with CC Pulse + (Genesys, Alcatel Lucent), and collected them in a custom designed database (Excel, Microsoft). Afterward, we analyzed these data using linear mixed effects models. During the study period, our EMDC received 408,077 calls. Globally, the mean dispatcher call duration was 107 ± 45 seconds. Based on multivariate linear mixed effects models, the dispatcher call duration was affected by night work shift and work shift duration greater than 8 hours, increasing it by about 10 ± 1 seconds and 4 ± 1 seconds, respectively (both p < 0.001). Our study showed that there was a statistically significant difference in dispatcher call duration over work shift rotation and duration, with longer durations seen over night shifts and shifts over 8 hours. While these differences are small and may not have clinical significance, they may have implications for EMDC efficiency.
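
    The mixed-effects setup described above can be sketched with a random intercept per dispatcher and fixed effects for night shift and long (>8 h) shift. The synthetic data, variable names, and effect sizes below are illustrative only.

```python
# Sketch: linear mixed-effects model of call duration with fixed effects for
# night shift and long (>8 h) shift and a random intercept per dispatcher.
# The synthetic data and effect sizes are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n_dispatchers, calls_per = 20, 200
rows = []
for d in range(n_dispatchers):
    dispatcher_effect = rng.normal(scale=5.0)              # seconds
    for _ in range(calls_per):
        night = rng.integers(0, 2)
        long_shift = rng.integers(0, 2)
        duration = (107 + 10 * night + 4 * long_shift + dispatcher_effect
                    + rng.normal(scale=45.0))
        rows.append({"dispatcher": d, "night": night,
                     "long_shift": long_shift, "duration": duration})
df = pd.DataFrame(rows)

model = smf.mixedlm("duration ~ night + long_shift", df, groups=df["dispatcher"])
result = model.fit()
print(result.summary())
```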

  1. Bayesian Sensitivity Analysis of Statistical Models with Missing Data

    PubMed Central

    ZHU, HONGTU; IBRAHIM, JOSEPH G.; TANG, NIANSHENG

    2013-01-01

    Methods for handling missing data depend strongly on the mechanism that generated the missing values, such as missing completely at random (MCAR) or missing at random (MAR), as well as other distributional and modeling assumptions at various stages. It is well known that the resulting estimates and tests may be sensitive to these assumptions as well as to outlying observations. In this paper, we introduce various perturbations to modeling assumptions and individual observations, and then develop a formal sensitivity analysis to assess these perturbations in the Bayesian analysis of statistical models with missing data. We develop a geometric framework, called the Bayesian perturbation manifold, to characterize the intrinsic structure of these perturbations. We propose several intrinsic influence measures to perform sensitivity analysis and quantify the effect of various perturbations to statistical models. We use the proposed sensitivity analysis procedure to systematically investigate the tenability of the non-ignorable missing at random (NMAR) assumption. Simulation studies are conducted to evaluate our methods, and a dataset is analyzed to illustrate the use of our diagnostic measures. PMID:24753718

  2. Using Computational Modeling to Assess the Impact of Clinical Decision Support on Cancer Screening within Community Health Centers

    PubMed Central

    Carney, Timothy Jay; Morgan, Geoffrey P.; Jones, Josette; McDaniel, Anna M.; Weaver, Michael; Weiner, Bryan; Haggstrom, David A.

    2014-01-01

    Our conceptual model demonstrates our goal to investigate the impact of clinical decision support (CDS) utilization on cancer screening improvement strategies in the community health care (CHC) setting. We employed a dual modeling technique using both statistical and computational modeling to evaluate impact. Our statistical model used the Spearman’s Rho test to evaluate the strength of the relationship between our proximal outcome measures (CDS utilization) and our distal outcome measure (provider self-reported cancer screening improvement). Our computational model relied on network evolution theory and made use of a tool called Construct-TM to model the use of CDS measured by the rate of organizational learning. We employed the use of previously collected survey data from community health centers Cancer Health Disparities Collaborative (HDCC). Our intent is to demonstrate the added value gained by using a computational modeling tool in conjunction with a statistical analysis when evaluating the impact of a health information technology, in the form of CDS, on health care quality process outcomes such as facility-level screening improvement. Significant simulated disparities in organizational learning over time were observed between community health centers beginning the simulation with high and low clinical decision support capability. PMID:24953241

  3. On an Additive Semigraphoid Model for Statistical Networks With Application to Pathway Analysis.

    PubMed

    Li, Bing; Chun, Hyonho; Zhao, Hongyu

    2014-09-01

    We introduce a nonparametric method for estimating non-gaussian graphical models based on a new statistical relation called additive conditional independence, which is a three-way relation among random vectors that resembles the logical structure of conditional independence. Additive conditional independence allows us to use one-dimensional kernel regardless of the dimension of the graph, which not only avoids the curse of dimensionality but also simplifies computation. It also gives rise to a parallel structure to the gaussian graphical model that replaces the precision matrix by an additive precision operator. The estimators derived from additive conditional independence cover the recently introduced nonparanormal graphical model as a special case, but outperform it when the gaussian copula assumption is violated. We compare the new method with existing ones by simulations and in genetic pathway analysis.

  4. Design of a testing strategy using non-animal based test methods: lessons learnt from the ACuteTox project.

    PubMed

    Kopp-Schneider, Annette; Prieto, Pilar; Kinsner-Ovaskainen, Agnieszka; Stanzel, Sven

    2013-06-01

    In the framework of toxicology, a testing strategy can be viewed as a series of steps which are taken to come to a final prediction about a characteristic of a compound under study. The testing strategy is performed as a single-step procedure, usually called a test battery, using simultaneously all information collected on different endpoints, or as tiered approach in which a decision tree is followed. Design of a testing strategy involves statistical considerations, such as the development of a statistical prediction model. During the EU FP6 ACuteTox project, several prediction models were proposed on the basis of statistical classification algorithms which we illustrate here. The final choice of testing strategies was not based on statistical considerations alone. However, without thorough statistical evaluations a testing strategy cannot be identified. We present here a number of observations made from the statistical viewpoint which relate to the development of testing strategies. The points we make were derived from problems we had to deal with during the evaluation of this large research project. A central issue during the development of a prediction model is the danger of overfitting. Procedures are presented to deal with this challenge. Copyright © 2012 Elsevier Ltd. All rights reserved.
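
    The overfitting danger raised above is usually handled by comparing apparent (resubstitution) performance with cross-validated performance. A minimal sketch with a synthetic endpoint matrix and an L2-regularized logistic classifier (one possible prediction model, not the project's actual strategy):

```python
# Sketch: build a classification-based prediction model from several in vitro
# endpoints and compare apparent (resubstitution) accuracy with cross-validated
# accuracy, which is the usual guard against overfitting. Data are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n_compounds, n_endpoints = 60, 12
X = rng.normal(size=(n_compounds, n_endpoints))          # in vitro endpoint values
toxic = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=1.0, size=n_compounds) > 0)

clf = LogisticRegression(max_iter=1000)
apparent = clf.fit(X, toxic).score(X, toxic)
cross_validated = cross_val_score(clf, X, toxic, cv=5).mean()

print(f"apparent accuracy        : {apparent:.2f}")
print(f"cross-validated accuracy : {cross_validated:.2f}")
```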

  5. Testing the Predictive Power of Coulomb Stress on Aftershock Sequences

    NASA Astrophysics Data System (ADS)

    Woessner, J.; Lombardi, A.; Werner, M. J.; Marzocchi, W.

    2009-12-01

    Empirical and statistical models of clustered seismicity are usually strongly stochastic and perceived to be uninformative in their forecasts, since only marginal distributions are used, such as the Omori-Utsu and Gutenberg-Richter laws. In contrast, so-called physics-based aftershock models, based on seismic rate changes calculated from Coulomb stress changes and rate-and-state friction, make more specific predictions: anisotropic stress shadows and multiplicative rate changes. We test the predictive power of models based on Coulomb stress changes against statistical models, including the popular Short Term Earthquake Probabilities and Epidemic-Type Aftershock Sequences models: We score and compare retrospective forecasts on the aftershock sequences of the 1992 Landers, USA, the 1997 Colfiorito, Italy, and the 2008 Selfoss, Iceland, earthquakes. To quantify predictability, we use likelihood-based metrics that test the consistency of the forecasts with the data, including modified and existing tests used in prospective forecast experiments within the Collaboratory for the Study of Earthquake Predictability (CSEP). Our results indicate that a statistical model performs best. Moreover, two Coulomb model classes seem unable to compete: Models based on deterministic Coulomb stress changes calculated from a given fault-slip model, and those based on fixed receiver faults. One model of Coulomb stress changes does perform well and sometimes outperforms the statistical models, but its predictive information is diluted, because of uncertainties included in the fault-slip model. Our results suggest that models based on Coulomb stress changes need to incorporate stochastic features that represent model and data uncertainty.

  6. Statistics of Optical Coherence Tomography Data From Human Retina

    PubMed Central

    de Juan, Joaquín; Ferrone, Claudia; Giannini, Daniela; Huang, David; Koch, Giorgio; Russo, Valentina; Tan, Ou; Bruni, Carlo

    2010-01-01

    Optical coherence tomography (OCT) has recently become one of the primary methods for noninvasive probing of the human retina. The pseudoimage formed by OCT (the so-called B-scan) varies probabilistically across pixels due to complexities in the measurement technique. Hence, sensitive automatic procedures of diagnosis using OCT may exploit statistical analysis of the spatial distribution of reflectance. In this paper, we perform a statistical study of retinal OCT data. We find that the stretched exponential probability density function can model well the distribution of intensities in OCT pseudoimages. Moreover, we show a small, but significant correlation between neighbor pixels when measuring OCT intensities with pixels of about 5 µm. We then develop a simple joint probability model for the OCT data consistent with known retinal features. This model fits well the stretched exponential distribution of intensities and their spatial correlation. In normal retinas, fit parameters of this model are relatively constant along retinal layers, but vary across layers. However, in retinas with diabetic retinopathy, large spikes of parameter modulation interrupt the constancy within layers, exactly where pathologies are visible. We argue that these results give hope for improvement in statistical pathology-detection methods even when the disease is in its early stages. PMID:20304733
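
    Fitting a stretched exponential to intensity samples can be sketched by direct maximum likelihood. The parameterization below, p(x) ∝ exp(-(x/λ)^β), is one common form, and the synthetic "intensities" merely stand in for OCT pixel values; this is not the paper's exact estimation procedure.

```python
# Sketch: maximum-likelihood fit of a stretched-exponential density
#   p(x) = beta / (lam * Gamma(1/beta)) * exp(-(x/lam)**beta),  x >= 0,
# to intensity samples. One common parameterization; the synthetic "intensities"
# below stand in for OCT pixel values.
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(8)
intensities = rng.gamma(shape=0.8, scale=30.0, size=5000)   # heavy-ish tailed stand-in

def neg_log_likelihood(params, x):
    log_lam, log_beta = params                 # optimize on the log scale
    lam, beta = np.exp(log_lam), np.exp(log_beta)
    log_norm = np.log(beta) - np.log(lam) - gammaln(1.0 / beta)
    return -np.sum(log_norm - (x / lam) ** beta)

res = minimize(neg_log_likelihood, x0=[np.log(intensities.mean()), 0.0],
               args=(intensities,), method="Nelder-Mead")
lam_hat, beta_hat = np.exp(res.x)
print(f"lambda = {lam_hat:.2f}, stretching exponent beta = {beta_hat:.2f}")
```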

  7. Assimilating NOAA SST data into BSH operational circulation model for North and Baltic Seas

    NASA Astrophysics Data System (ADS)

    Losa, Svetlana; Schroeter, Jens; Nerger, Lars; Janjic, Tijana; Danilov, Sergey; Janssen, Frank

    A data assimilation (DA) system is developed for the BSH operational circulation model in order to improve forecasts of current velocities, sea surface height, temperature and salinity in the North and Baltic Seas. Assimilated data are NOAA sea surface temperature (SST) data for the following period: 01.10.07-30.09.08. All data assimilation experiments are based on the implementation of one of the so-called statistical DA methods, the Singular Evolutive Interpolated Kalman (SEIK) filter, with different ways of prescribing the assumed model and data error statistics. Results of the experiments will be shown and compared against each other. Hydrographic data from MARNET stations and sea level at a series of tide gauges are used as independent information to validate the data assimilation system. Keywords: Operational Oceanography and forecasting

  8. Quantum statistics in complex networks

    NASA Astrophysics Data System (ADS)

    Bianconi, Ginestra

    The Barabasi-Albert (BA) model for a complex network shows a characteristic power law connectivity distribution typical of scale free systems. The Ising model on the BA network shows that the ferromagnetic phase transition temperature depends logarithmically on its size. We have introduced a fitness parameter for the BA network which describes the different abilities of nodes to compete for links. This model predicts the formation of a scale free network where each node increases its connectivity in time as a power-law with an exponent depending on its fitness. This model includes the fact that the node connectivity and growth rate do not depend on the node age alone and it reproduces non-trivial correlation properties of the Internet. We have proposed a model of bosonic networks by a generalization of the BA model where the properties of quantum statistics can be applied. We have introduced a fitness η_i = e^(-βε_i), where the temperature T = 1/β is determined by the noise in the system and the energy ε_i accounts for qualitative differences of each node for acquiring links. The results of this work show that a power law network with exponent γ = 2 can give a Bose condensation where a single node grabs a finite fraction of all the links. In order to address the connection with self-organized processes we have introduced a model for a growing Cayley tree that generalizes the dynamics of invasion percolation. At each node we associate a parameter ε_i (called energy) such that the probability for each node to grow is given by Π_i ∝ e^(βε_i), where T = 1/β is a statistical parameter of the system, determined by the noise, called the temperature. This model has been solved analytically with a mathematical technique similar to that used for the bosonic scale-free networks and it shows the self-organization of the low energy nodes at the interface. In the thermodynamic limit the Fermi distribution describes the probability of the energy distribution at the interface.
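
    The fitness-driven growth rule can be simulated in a few lines: each new node attaches to existing node i with probability proportional to η_i k_i, with η_i = e^(-βε_i). The sketch below uses illustrative sizes and a uniform energy distribution; it is a toy reproduction of the mechanism, not of the quantitative results.

```python
# Short simulation of fitness-driven preferential attachment: a new node with
# m links joins at each step and attaches to existing node i with probability
# proportional to eta_i * k_i, where eta_i = exp(-beta * e_i) is the fitness.
# Sizes and parameters are illustrative.
import numpy as np

rng = np.random.default_rng(9)
beta, m, n_nodes = 1.0, 2, 2000

energies = rng.uniform(0.0, 1.0, size=n_nodes)
fitness = np.exp(-beta * energies)
degree = np.zeros(n_nodes)
degree[:m + 1] = m                     # small fully connected seed

for new in range(m + 1, n_nodes):
    weights = fitness[:new] * degree[:new]
    probs = weights / weights.sum()
    targets = rng.choice(new, size=m, replace=False, p=probs)
    degree[targets] += 1
    degree[new] = m

# Fit (low-energy) nodes should end up with a disproportionate share of links.
top = np.argsort(degree)[-5:]
print("highest-degree nodes' energies:", np.round(energies[top], 2))
print("mean energy of all nodes      :", round(float(energies.mean()), 2))
```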

  9. The writer independent online handwriting recognition system frog on hand and cluster generative statistical dynamic time warping.

    PubMed

    Bahlmann, Claus; Burkhardt, Hans

    2004-03-01

    In this paper, we give a comprehensive description of our writer-independent online handwriting recognition system frog on hand. The focus of this work concerns the presentation of the classification/training approach, which we call cluster generative statistical dynamic time warping (CSDTW). CSDTW is a general, scalable, HMM-based method for variable-sized, sequential data that holistically combines cluster analysis and statistical sequence modeling. It can handle general classification problems that rely on this sequential type of data, e.g., speech recognition, genome processing, robotics, etc. Contrary to previous attempts, clustering and statistical sequence modeling are embedded in a single feature space and use a closely related distance measure. We show character recognition experiments of frog on hand using CSDTW on the UNIPEN online handwriting database. The recognition accuracy is significantly higher than reported results of other handwriting recognition systems. Finally, we describe the real-time implementation of frog on hand on a Linux Compaq iPAQ embedded device.
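
    CSDTW combines clustering with statistical sequence models; the alignment machinery underneath is dynamic time warping. The sketch below implements only the basic DTW recursion on two toy pen trajectories, as a reference point for the method described above.

```python
# Basic dynamic time warping (DTW) distance between two 2-D pen trajectories.
# CSDTW builds a statistical, cluster-based generalization of this alignment;
# this sketch only illustrates the underlying DTW recursion.
import numpy as np

def dtw_distance(a, b):
    """a, b: arrays of shape (len, dims). Returns the accumulated DTW cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],       # insertion
                                 cost[i, j - 1],       # deletion
                                 cost[i - 1, j - 1])   # match
    return cost[n, m]

t = np.linspace(0, 1, 60)
stroke_a = np.column_stack([t, np.sin(2 * np.pi * t)])
stroke_b = np.column_stack([t ** 1.3, np.sin(2 * np.pi * t ** 1.3)])  # time-warped copy
stroke_c = np.column_stack([t, np.cos(2 * np.pi * t)])                # different shape

print("warped copy :", round(dtw_distance(stroke_a, stroke_b), 3))
print("other shape :", round(dtw_distance(stroke_a, stroke_c), 3))
```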

  10. A Model for Post Hoc Evaluation.

    ERIC Educational Resources Information Center

    Theimer, William C., Jr.

    Often a research department in a school system is called on to make an after the fact evaluation of a program or project. Although the department is operating under a handicap, it can still provide some data useful for evaluative purposes. It is suggested that all the classical methods of descriptive statistics be brought into play. The use of…

  11. Searching for hidden unexpected features in the SnIa data

    NASA Astrophysics Data System (ADS)

    Shafieloo, A.; Perivolaropoulos, L.

    2010-06-01

    It is known that the χ² statistic and likelihood analysis may not be sensitive to all the features of the data. Despite the fact that by using the χ² statistic we can measure the overall goodness of fit for a model confronted with a data set, some specific features of the data can stay undetectable. For instance, it has been pointed out that there is an unexpected brightness of the SnIa data at z > 1 in the Union compilation. We quantify this statement by constructing a new statistic, called the Binned Normalized Difference (BND) statistic, which is applicable directly on the Type Ia Supernova (SnIa) distance moduli. This statistic is designed to pick up systematic brightness trends of SnIa data points with respect to a best fit cosmological model at high redshifts. According to this statistic, the consistency between the Gold06, Union08 and Constitution09 data and a spatially flat ΛCDM model is 2.2%, 5.3% and 12.6%, respectively, when the real data is compared with many realizations of the simulated Monte Carlo datasets. The corresponding realization probability in the context of a (w0,w1) = (-1.4,2) model is more than 30% for all mentioned datasets indicating a much better consistency for this model with respect to the BND statistic. The unexpected high z brightness of SnIa can be interpreted either as a trend towards more deceleration at high z than expected in the context of ΛCDM or as a statistical fluctuation or finally as a systematic effect perhaps due to a mild SnIa evolution at high z.

  12. Integrated driver modelling considering state transition feature for individual adaptation of driver assistance systems

    NASA Astrophysics Data System (ADS)

    Raksincharoensak, Pongsathorn; Khaisongkram, Wathanyoo; Nagai, Masao; Shimosaka, Masamichi; Mori, Taketoshi; Sato, Tomomasa

    2010-12-01

    This paper describes the modelling of naturalistic driving behaviour in real-world traffic scenarios, based on driving data collected via an experimental automobile equipped with a continuous sensing drive recorder. This paper focuses on the longitudinal driving situations which are classified into five categories - car following, braking, free following, decelerating and stopping - and are referred to as driving states. Here, the model is assumed to be represented by a state flow diagram. Statistical machine learning of driver-vehicle-environment system model based on driving database is conducted by a discriminative modelling approach called boosting sequential labelling method.

  13. Quantum Optics Models of EIT Noise and Power Broadening

    NASA Astrophysics Data System (ADS)

    Snider, Chad; Crescimanno, Michael; O'Leary, Shannon

    2011-04-01

    When two coherent beams of light interact with an atom they tend to drive the atom to a non-absorbing state through a process called Electromagnetically Induced Transparency (EIT). If the light's frequency dithers, the atom's state stochastically moves in and out of this non-absorbing state. We describe a simple quantum optics model of this process that captures the essential experimentally observed statistical features of this EIT noise, with a particular emphasis on understanding power broadening.

  14. A statistical method to estimate low-energy hadronic cross sections

    NASA Astrophysics Data System (ADS)

    Balassa, Gábor; Kovács, Péter; Wolf, György

    2018-02-01

    In this article we propose a model based on the Statistical Bootstrap approach to estimate the cross sections of different hadronic reactions up to a few GeV in c.m.s. energy. The method is based on the idea that, when two particles collide, a so-called fireball is formed, which after a short time period decays statistically into a specific final state. To calculate the probabilities we use a phase space description extended with quark combinatorial factors and the possibility of more than one fireball formation. In a few simple cases the probability of a specific final state can be calculated analytically, where we show that the model is able to reproduce the ratios of the considered cross sections. We also show that the model is able to describe proton-antiproton annihilation at rest. In the latter case we used a numerical method to calculate the more complicated final state probabilities. Additionally, we examined the formation of strange and charmed mesons as well, where we used existing data to fit the relevant model parameters.

  15. The epistemological status of general circulation models

    NASA Astrophysics Data System (ADS)

    Loehle, Craig

    2018-03-01

    Forecasts of both likely anthropogenic effects on climate and consequent effects on nature and society are based on large, complex software tools called general circulation models (GCMs). Forecasts generated by GCMs have been used extensively in policy decisions related to climate change. However, the relation between underlying physical theories and results produced by GCMs is unclear. In the case of GCMs, many discretizations and approximations are made, and simulating Earth system processes is far from simple and currently leads to some results with unknown energy balance implications. Statistical testing of GCM forecasts for degree of agreement with data would facilitate assessment of fitness for use. If model results need to be put on an anomaly basis due to model bias, then both visual and quantitative measures of model fit depend strongly on the reference period used for normalization, making testing problematic. Epistemology is here applied to problems of statistical inference during testing, the relationship between the underlying physics and the models, the epistemic meaning of ensemble statistics, problems of spatial and temporal scale, the existence or not of an unforced null for climate fluctuations, the meaning of existing uncertainty estimates, and other issues. Rigorous reasoning entails carefully quantifying levels of uncertainty.

  16. Two-state Markov-chain Poisson nature of individual cellphone call statistics

    NASA Astrophysics Data System (ADS)

    Jiang, Zhi-Qiang; Xie, Wen-Jie; Li, Ming-Xia; Zhou, Wei-Xing; Sornette, Didier

    2016-07-01

    Unfolding the burst patterns in human activities and social interactions is a very important issue, especially for understanding the spreading of disease and information and the formation of groups and organizations. Here, we conduct an in-depth study of the temporal patterns of cellphone conversation activities of 73 339 anonymous cellphone users, whose inter-call durations are Weibull distributed. We find that the individual call events exhibit a pattern of bursts, in which high-activity periods alternate with low-activity periods. In both periods, the number of calls is exponentially distributed for individuals, but power-law distributed for the population. Together with the exponential distributions of inter-call durations within bursts and of the intervals between consecutive bursts, we demonstrate that the individual call activities are driven by two independent Poisson processes, which can be combined within a minimal model in terms of a two-state first-order Markov chain, giving significant fits for nearly half of the individuals. By measuring directly the distributions of call rates across the population, which exhibit power-law tails, we account for the power-law distributions at the population level via the ‘superposition of distributions’ mechanism. Our findings shed light on the origins of bursty patterns in other human activities.
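    A minimal simulation of the two-state picture described above, assuming illustrative rates and switching probabilities rather than fitted values:

    ```python
    # Minimal sketch of the two-state Markov-chain Poisson picture of call activity:
    # an individual alternates between a high-activity and a low-activity state, and
    # within each state calls occur as a Poisson process (exponential inter-call times).
    # Rates and transition probabilities below are illustrative, not fitted.
    import numpy as np

    rng = np.random.default_rng(42)
    rates = {"high": 2.0, "low": 0.1}          # calls per hour in each state
    stay_prob = {"high": 0.8, "low": 0.95}     # probability of staying in the current state per call

    state, t, call_times = "high", 0.0, []
    for _ in range(10_000):
        # draw the next inter-call duration from the current state's Poisson process
        t += rng.exponential(1.0 / rates[state])
        call_times.append(t)
        # first-order Markov switching between the two activity states
        if rng.random() > stay_prob[state]:
            state = "low" if state == "high" else "high"

    gaps = np.diff(call_times)
    print(f"mean inter-call time: {gaps.mean():.2f} h, coefficient of variation: {gaps.std()/gaps.mean():.2f}")
    ```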

  17. EQS Goes R: Simulations for SEM Using the Package REQS

    ERIC Educational Resources Information Center

    Mair, Patrick; Wu, Eric; Bentler, Peter M.

    2010-01-01

    The REQS package is an interface between the R environment of statistical computing and the EQS software for structural equation modeling. The package consists of 3 main functions that read EQS script files and import the results into R, call EQS script files from R, and run EQS script files from R and import the results after EQS computations.…

  18. Asymptotic formulae for likelihood-based tests of new physics

    NASA Astrophysics Data System (ADS)

    Cowan, Glen; Cranmer, Kyle; Gross, Eilam; Vitells, Ofer

    2011-02-01

    We describe likelihood-based statistical tests for use in high energy physics for the discovery of new phenomena and for construction of confidence intervals on model parameters. We focus on the properties of the test procedures that allow one to account for systematic uncertainties. Explicit formulae for the asymptotic distributions of test statistics are derived using results of Wilks and Wald. We motivate and justify the use of a representative data set, called the "Asimov data set", which provides a simple method to obtain the median experimental sensitivity of a search or measurement as well as fluctuations about this expectation.
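    As a concrete illustration of the Asimov idea in its simplest setting, the sketch below evaluates the median discovery significance for a single-bin counting experiment with known background; the signal and background counts are arbitrary examples.

    ```python
    # Sketch: median discovery significance evaluated on the Asimov data set (n = s + b)
    # for a single-bin counting experiment with known background b. Numbers are illustrative.
    import math

    def asimov_significance(s: float, b: float) -> float:
        """Median significance Z_A = sqrt(2*((s+b)*ln(1+s/b) - s))."""
        return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))

    for s, b in [(5, 10), (10, 100), (50, 1000)]:
        print(f"s={s:4d}, b={b:5d} -> Z_A = {asimov_significance(s, b):.2f} "
              f"(naive s/sqrt(b) = {s / math.sqrt(b):.2f})")
    ```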

  19. Energetic expenditure during vocalization in pups of the subterranean rodent Ctenomys talarum

    NASA Astrophysics Data System (ADS)

    Schleich, Cristian Eric; Busch, Cristina

    2004-11-01

    Theoretical signaling models predict that to be honest, begging vocalizations must be costly. To test this hypothesis, oxygen consumption was measured during resting and begging (i.e., vocalizing) activities in pups of the subterranean rodent Ctenomys talarum by means of open-flow respirometry. No statistical differences in individual oxygen consumption between resting and calling pups ranging in age from day 2 to day 20 were found. Given these data, begging calls of C. talarum could not be considered honest advertisements of offspring need, contrary to what was suggested by the behavioral observations of the mother and pups during the nestling period.

  20. Model-based quality assessment and base-calling for second-generation sequencing data.

    PubMed

    Bravo, Héctor Corrada; Irizarry, Rafael A

    2010-09-01

    Second-generation sequencing (sec-gen) technology can sequence millions of short fragments of DNA in parallel, making it capable of assembling complex genomes for a small fraction of the price and time of previous technologies. In fact, a recently formed international consortium, the 1000 Genomes Project, plans to fully sequence the genomes of approximately 1200 people. The prospect of comparative analysis at the sequence level of a large number of samples across multiple populations may be achieved within the next five years. These data present unprecedented challenges in statistical analysis. For instance, analysis operates on millions of short nucleotide sequences, or reads (strings of A, C, G, or T's, between 30 and 100 characters long), which are the result of complex processing of noisy continuous fluorescence intensity measurements known as base-calling. The complexity of the base-calling discretization process results in reads of widely varying quality within and across sequence samples. This variation in processing quality results in infrequent but systematic errors that we have found to mislead downstream analysis of the discretized sequence read data. For instance, a central goal of the 1000 Genomes Project is to quantify across-sample variation at the single nucleotide level. At this resolution, small error rates in sequencing prove significant, especially for rare variants. Sec-gen sequencing is a relatively new technology for which potential biases and sources of obscuring variation are not yet fully understood. Therefore, modeling and quantifying the uncertainty inherent in the generation of sequence reads is of utmost importance. In this article, we present a simple model to capture uncertainty arising in the base-calling procedure of the Illumina/Solexa GA platform. Model parameters have a straightforward interpretation in terms of the chemistry of base-calling, allowing for informative and easily interpretable metrics that capture the variability in sequencing quality. Our model provides these informative estimates readily usable in quality assessment tools while significantly improving base-calling performance. © 2009, The International Biometric Society.

  1. A Public Health Initiative for Preventing Family Violence.

    ERIC Educational Resources Information Center

    Hargens, Yvonne Marie

    A community task force studying violence issues closely examined police statistics for domestic calls. Few records of referrals were made in response to these calls. Other statistics on child abuse and family violence reinforced the fact that family violence was a significant problem, making program response to family violence issues a top…

  2. A Prototype of Pilot Knowledge Evaluation by an Intelligent CAI (Computer -Aided Instruction) System Using a Bayesian Diagnostic Model.

    DTIC Science & Technology

    1987-06-01

    to a field of research called Computer-Aided Instruction (CAI). CAI is a powerful methodology for enhancing the overall quality and effectiveness of... provides a very powerful tool for statistical inference, especially when pooling information from different sources is appropriate. Thus, prior... The power of the model lies in its ability to adapt a diagnostic session to the level of knowledge

  3. Science and Engineering of an Operational Tsunami Forecasting System

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gonzalez, Frank

    2009-04-06

    After a review of tsunami statistics and the destruction caused by tsunamis, a means of forecasting tsunamis is discussed as part of an overall program of reducing fatalities through hazard assessment, education, training, mitigation, and a tsunami warning system. The forecast is accomplished via a concept called Deep Ocean Assessment and Reporting of Tsunamis (DART). Small changes of pressure at the sea floor are measured and relayed to warning centers. Under development is an international modeling network to transfer, maintain, and improve tsunami forecast models.

  4. Characterization of Cloud Water-Content Distribution

    NASA Technical Reports Server (NTRS)

    Lee, Seungwon

    2010-01-01

    The development of realistic cloud parameterizations for climate models requires accurate characterizations of subgrid distributions of thermodynamic variables. To this end, a software tool was developed to characterize cloud water-content distributions in climate-model sub-grid scales. This software characterizes distributions of cloud water content with respect to cloud phase, cloud type, precipitation occurrence, and geo-location using CloudSat radar measurements. It uses a statistical method called maximum likelihood estimation to estimate the probability density function of the cloud water content.
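    A hedged sketch of the maximum likelihood step, assuming a lognormal family and synthetic water-content values in place of CloudSat retrievals:

    ```python
    # Illustrative sketch: maximum likelihood estimation of a probability density for
    # (synthetic) cloud water content, here assuming a lognormal family. Real inputs
    # would be CloudSat retrievals stratified by phase, cloud type and location.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    cwc = rng.lognormal(mean=-1.0, sigma=0.8, size=5000)   # fake water content [g/m^3]

    # scipy's fit() maximizes the likelihood; floc=0 keeps the support on (0, inf)
    shape, loc, scale = stats.lognorm.fit(cwc, floc=0)
    print(f"fitted sigma = {shape:.3f}, median = {scale:.3f} g/m^3")

    # the fitted density can then be evaluated on a grid for a sub-grid parameterization
    x = np.linspace(0.01, cwc.max(), 200)
    pdf = stats.lognorm.pdf(x, shape, loc=loc, scale=scale)
    ```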

  5. Science and Engineering of an Operational Tsunami Forecasting System

    ScienceCinema

    Gonzalez, Frank

    2017-12-09

    After a review of tsunami statistics and the destruction caused by tsunamis, a means of forecasting tsunamis is discussed as part of an overall program of reducing fatalities through hazard assessment, education, training, mitigation, and a tsunami warning system. The forecast is accomplished via a concept called Deep Ocean Assessment and Reporting of Tsunamis (DART). Small changes of pressure at the sea floor are measured and relayed to warning centers. Under development is an international modeling network to transfer, maintain, and improve tsunami forecast models.

  6. Statistical auditing and randomness test of lotto k/N-type games

    NASA Astrophysics Data System (ADS)

    Coronel-Brizio, H. F.; Hernández-Montoya, A. R.; Rapallo, F.; Scalas, E.

    2008-11-01

    One of the most popular lottery games worldwide is the so-called “lotto k/N”. It considers N numbers 1,2,…,N from which k are drawn randomly, without replacement. A player selects k or more numbers and the first prize is shared amongst those players whose selected numbers match all of the k randomly drawn. Exact rules may vary in different countries. In this paper, mean values and covariances for the random variables representing the numbers drawn in this kind of game are presented, with the aim of using them to statistically audit the consistency of a given sample of historical results with the theoretical values coming from a hypergeometric statistical model. The method can be adapted to test pseudorandom number generators.
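    The sketch below illustrates the auditing idea in a simplified form (a chi-square check of per-number counts against their expectation under fair draws), not the paper's full mean/covariance machinery; the game parameters and history length are arbitrary.

    ```python
    # Simplified audit sketch: under a fair "lotto k/N", each number appears in a draw
    # with probability k/N, so over n draws its count has mean n*k/N. We simulate a
    # history of draws and compare observed counts to that expectation.
    import numpy as np
    from scipy import stats

    N, k, n_draws = 49, 6, 1000          # illustrative game parameters and history length
    rng = np.random.default_rng(7)
    # numbers 0..N-1 stand in for 1..N
    draws = np.array([rng.choice(N, size=k, replace=False) for _ in range(n_draws)])

    counts = np.bincount(draws.ravel(), minlength=N)
    expected = n_draws * k / N
    chi2 = ((counts - expected) ** 2 / expected).sum()
    # df = N - 1 is only approximate: counts within a single draw are negatively correlated
    p_value = stats.chi2.sf(chi2, df=N - 1)
    print(f"chi2 = {chi2:.1f}, approximate p = {p_value:.3f}")
    ```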

  7. Self-organized network of fractal-shaped components coupled through statistical interaction.

    PubMed

    Ugajin, R

    2001-09-01

    A dissipative dynamics is introduced to generate self-organized networks of interacting objects, which we call coupled-fractal networks. The growth model is constructed based on a growth hypothesis in which the growth rate of each object is a product of the probability of receiving source materials from faraway and the probability of receiving adhesives from other grown objects, where each object grows to be a random fractal if isolated, but connects with others if glued. The network is governed by the statistical interaction between fractal-shaped components, which can only be identified in a statistical manner over ensembles. This interaction is investigated using the degree of correlation between fractal-shaped components, enabling us to determine whether it is attractive or repulsive.

  8. Cross-Validation of Survival Bump Hunting by Recursive Peeling Methods.

    PubMed

    Dazard, Jean-Eudes; Choe, Michael; LeBlanc, Michael; Rao, J Sunil

    2014-08-01

    We introduce a survival/risk bump hunting framework to build a bump hunting model with a possibly censored time-to-event type of response and to validate model estimates. First, we describe the use of adequate survival peeling criteria to build a survival/risk bump hunting model based on recursive peeling methods. Our method called "Patient Recursive Survival Peeling" is a rule-induction method that makes use of specific peeling criteria such as hazard ratio or log-rank statistics. Second, to validate our model estimates and improve survival prediction accuracy, we describe a resampling-based validation technique specifically designed for the joint task of decision rule making by recursive peeling (i.e. decision-box) and survival estimation. This alternative technique, called "combined" cross-validation is done by combining test samples over the cross-validation loops, a design allowing for bump hunting by recursive peeling in a survival setting. We provide empirical results showing the importance of cross-validation and replication.

  9. Cross-Validation of Survival Bump Hunting by Recursive Peeling Methods

    PubMed Central

    Dazard, Jean-Eudes; Choe, Michael; LeBlanc, Michael; Rao, J. Sunil

    2015-01-01

    We introduce a survival/risk bump hunting framework to build a bump hunting model with a possibly censored time-to-event type of response and to validate model estimates. First, we describe the use of adequate survival peeling criteria to build a survival/risk bump hunting model based on recursive peeling methods. Our method called “Patient Recursive Survival Peeling” is a rule-induction method that makes use of specific peeling criteria such as hazard ratio or log-rank statistics. Second, to validate our model estimates and improve survival prediction accuracy, we describe a resampling-based validation technique specifically designed for the joint task of decision rule making by recursive peeling (i.e. decision-box) and survival estimation. This alternative technique, called “combined” cross-validation is done by combining test samples over the cross-validation loops, a design allowing for bump hunting by recursive peeling in a survival setting. We provide empirical results showing the importance of cross-validation and replication. PMID:26997922

  10. Time-course variation of statistics embedded in music: Corpus study on implicit learning and knowledge.

    PubMed

    Daikoku, Tatsuya

    2018-01-01

    Learning and knowledge of transitional probability in sequences like music, called statistical learning and knowledge, are considered implicit processes that occur without an intention to learn or awareness of what one knows. This implicit statistical knowledge can alternatively be expressed via an abstract medium such as musical melody, which suggests that this knowledge is reflected in melodies written by a composer. This study investigates how statistics in music vary over a composer's lifetime. Transitional probabilities of highest-pitch sequences in Ludwig van Beethoven's piano sonatas were calculated based on different hierarchical Markov models. Each interval pattern was ordered based on the sonata opus number. The transitional probabilities of sequential patterns that are universal in music gradually decreased, suggesting that time-course variations of statistics in music reflect time-course variations of a composer's statistical knowledge. This study sheds new light on novel methodologies that may be able to evaluate the time-course variation of a composer's implicit knowledge using musical scores.
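    A minimal sketch of the underlying corpus computation: estimating first-order (bigram) transition probabilities from a pitch sequence. The toy sequence stands in for an extracted highest-pitch line; the paper's higher-order hierarchical Markov models generalize this.

    ```python
    # Estimate first-order transition probabilities P(next pitch | current pitch)
    # from a toy pitch sequence (MIDI note numbers).
    from collections import Counter, defaultdict

    pitches = [60, 62, 64, 65, 64, 62, 60, 62, 64, 64, 65, 67, 65, 64, 62, 60]

    transitions = Counter(zip(pitches[:-1], pitches[1:]))
    outgoing = defaultdict(int)
    for (a, _), c in transitions.items():
        outgoing[a] += c

    # the first-order Markov (bigram) model of the melody
    probs = {(a, b): c / outgoing[a] for (a, b), c in transitions.items()}
    for (a, b), p in sorted(probs.items()):
        print(f"P({b} | {a}) = {p:.2f}")
    ```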

  11. Role of sufficient statistics in stochastic thermodynamics and its implication to sensory adaptation

    NASA Astrophysics Data System (ADS)

    Matsumoto, Takumi; Sagawa, Takahiro

    2018-04-01

    A sufficient statistic is a significant concept in statistics: a random variable that carries all the information required for an inference task. We investigate the roles of sufficient statistics and related quantities in stochastic thermodynamics. Specifically, we prove that for general continuous-time bipartite networks, the existence of a sufficient statistic implies that an informational quantity called the sensory capacity attains its maximum. Since the maximal sensory capacity imposes a constraint that the energetic efficiency cannot exceed one-half, our result implies that the existence of a sufficient statistic is inevitably accompanied by energetic dissipation. We also show that, in a particular parameter region of linear Langevin systems, there exists an optimal noise intensity at which the sensory capacity, the information-thermodynamic efficiency, and the total entropy production are optimized at the same time. We apply our general result to a model of sensory adaptation of E. coli and find that the sensory capacity is nearly maximal with experimentally realistic parameters.

  12. Stochastic geometry in disordered systems, applications to quantum Hall transitions

    NASA Astrophysics Data System (ADS)

    Gruzberg, Ilya

    2012-02-01

    A spectacular success in the study of random fractal clusters and their boundaries in statistical mechanics systems at or near criticality using Schramm-Loewner Evolutions (SLE) naturally calls for extensions in various directions. Can this success be repeated for disordered and/or non-equilibrium systems? Naively, when one thinks about disordered systems and their average correlation functions, one of the very basic assumptions of SLE, the so-called domain Markov property, is lost. Also, in some lattice models of Anderson transitions (the network models) there are no natural clusters to consider. Nevertheless, in this talk I will argue that one can apply so-called conformal restriction, a notion of stochastic conformal geometry closely related to SLE, to study the integer quantum Hall transition and its variants. I will focus on the Chalker-Coddington network model and will demonstrate that its average transport properties can be mapped to a classical problem where the basic objects are geometric shapes (loosely speaking, the current paths) that obey an important restriction property. At the transition point this allows one to use the theory of conformal restriction to derive exact expressions for point contact conductances in the presence of various non-trivial boundary conditions.

  13. A proposed technique for vehicle tracking, direction, and speed determination

    NASA Astrophysics Data System (ADS)

    Fisher, Paul S.; Angaye, Cleopas O.; Fisher, Howard P.

    2004-12-01

    A technique for recognition of vehicles in terms of direction, distance, and rate of change is presented. This represents very early work on this problem with significant hurdles still to be addressed. These are discussed in the paper. However, preliminary results also show promise for this technique for use in security and defense environments where the penetration of a perimeter is of concern. The material described herein indicates a process whereby the protection of a barrier could be augmented by computers and installed cameras assisting the individuals charged with this responsibility. The technique we employ is called Finite Inductive Sequences (FI) and is proposed as a means for eliminating data requiring storage and recognition where conventional mathematical models don't eliminate enough and statistical models eliminate too much. FI is a simple idea and is based upon a symbol push-out technique that allows the order (inductive base) of the model to be set to an a priori value for all derived rules. The rules are obtained from exemplar data sets, and are derived by a technique called Factoring, yielding a table of rules called a Ruling. These rules can then be used in pattern recognition applications such as described in this paper.

  14. Reservoir Modeling by Data Integration via Intermediate Spaces and Artificial Intelligence Tools in MPS Simulation Frameworks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ahmadi, Rouhollah, E-mail: rouhollahahmadi@yahoo.com; Khamehchi, Ehsan

    Conditioning stochastic simulations is very important in many geostatistical applications that call for the introduction of nonlinear and multiple-point data in reservoir modeling. Here, a new methodology is proposed for the incorporation of different data types into multiple-point statistics (MPS) simulation frameworks. Unlike the previous techniques that call for an approximate forward model (filter) for integration of secondary data into geologically constructed models, the proposed approach develops an intermediate space onto which all the primary and secondary data are easily mapped. Definition of the intermediate space, as may be achieved via application of artificial intelligence tools like neural networks and fuzzy inference systems, eliminates the need for using filters as in previous techniques. The applicability of the proposed approach in conditioning MPS simulations to static and geologic data is verified by modeling a real example of discrete fracture networks using conventional well-log data. The training patterns are well reproduced in the realizations, while the model is also consistent with the map of secondary data.

  15. Intimate partner violence in Madrid: a time series analysis (2008-2016).

    PubMed

    Sanz-Barbero, Belén; Linares, Cristina; Vives-Cases, Carmen; González, José Luis; López-Ossorio, Juan José; Díaz, Julio

    2018-06-02

    This study analyzes whether there are time patterns in different intimate partner violence (IPV) indicators and aims to obtain models that can predict the behavior of these time series. Univariate autoregressive moving average models were used to analyze the time series corresponding to the number of daily calls to the 016 telephone IPV helpline and the number of daily police reports filed in the Community of Madrid during the period 2008-2015. Predictions were made for both dependent variables for 2016. The daily number of calls to the 016 telephone IPV helpline decreased during January 2008-April 2012 and increased during April 2012-December 2015. No statistically significant change was observed in the trend of the number of daily IPV police reports. The number of IPV police reports filed increased on weekends and on Christmas holidays. The number of calls to the 016 IPV help line increased on Mondays. Using data from 2008 to 2015, the univariate autoregressive moving average models predicted 64.2% of calls to the 016 telephone IPV helpline and 73.2% of police reports filed during 2016 in the Community of Madrid. Our results suggest the need for an increase in police and judicial resources on nonwork days. Also, the 016 telephone IPV helpline should be especially active on work days. Copyright © 2018 Elsevier Inc. All rights reserved.
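    A hedged sketch of the modelling step, fitting a seasonal ARIMA model to a synthetic daily call-count series (not the Madrid data) with statsmodels and forecasting ahead:

    ```python
    # Sketch: fit a univariate (seasonal) ARIMA model to a daily count series and
    # forecast the next 30 days. The synthetic series below merely mimics a weekly
    # pattern with more calls on Mondays; orders are illustrative, not the fitted ones.
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(3)
    days = pd.date_range("2008-01-01", periods=365 * 2, freq="D")
    weekly = 10 * (days.dayofweek == 0)                 # e.g. extra calls on Mondays
    calls = 100 + weekly + rng.poisson(10, len(days))   # synthetic daily call counts
    series = pd.Series(calls.astype(float), index=days)

    model = ARIMA(series, order=(1, 0, 1), seasonal_order=(1, 0, 0, 7))
    fit = model.fit()
    forecast = fit.forecast(steps=30)
    print(forecast.head())
    ```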

  16. An improved approach for flight readiness certification: Probabilistic models for flaw propagation and turbine blade failure. Volume 1: Methodology and applications

    NASA Technical Reports Server (NTRS)

    Moore, N. R.; Ebbeler, D. H.; Newlin, L. E.; Sutharshana, S.; Creager, M.

    1992-01-01

    An improved methodology for quantitatively evaluating failure risk of spaceflight systems to assess flight readiness and identify risk control measures is presented. This methodology, called Probabilistic Failure Assessment (PFA), combines operating experience from tests and flights with analytical modeling of failure phenomena to estimate failure risk. The PFA methodology is of particular value when information on which to base an assessment of failure risk, including test experience and knowledge of parameters used in analytical modeling, is expensive or difficult to acquire. The PFA methodology is a prescribed statistical structure in which analytical models that characterize failure phenomena are used conjointly with uncertainties about analysis parameters and/or modeling accuracy to estimate failure probability distributions for specific failure modes. These distributions can then be modified, by means of statistical procedures of the PFA methodology, to reflect any test or flight experience. State-of-the-art analytical models currently employed for design, failure prediction, or performance analysis are used in this methodology. The rationale for the statistical approach taken in the PFA methodology is discussed, the PFA methodology is described, and examples of its application to structural failure modes are presented. The engineering models and computer software used in fatigue crack growth and fatigue crack initiation applications are thoroughly documented.

  17. An improved approach for flight readiness certification: Probabilistic models for flaw propagation and turbine blade failure. Volume 2: Software documentation

    NASA Technical Reports Server (NTRS)

    Moore, N. R.; Ebbeler, D. H.; Newlin, L. E.; Sutharshana, S.; Creager, M.

    1992-01-01

    An improved methodology for quantitatively evaluating failure risk of spaceflight systems to assess flight readiness and identify risk control measures is presented. This methodology, called Probabilistic Failure Assessment (PFA), combines operating experience from tests and flights with analytical modeling of failure phenomena to estimate failure risk. The PFA methodology is of particular value when information on which to base an assessment of failure risk, including test experience and knowledge of parameters used in analytical modeling, is expensive or difficult to acquire. The PFA methodology is a prescribed statistical structure in which analytical models that characterize failure phenomena are used conjointly with uncertainties about analysis parameters and/or modeling accuracy to estimate failure probability distributions for specific failure modes. These distributions can then be modified, by means of statistical procedures of the PFA methodology, to reflect any test or flight experience. State-of-the-art analytical models currently employed for design, failure prediction, or performance analysis are used in this methodology. The rationale for the statistical approach taken in the PFA methodology is discussed, the PFA methodology is described, and examples of its application to structural failure modes are presented. The engineering models and computer software used in fatigue crack growth and fatigue crack initiation applications are thoroughly documented.

  18. Statistical self-similarity of hotspot seamount volumes modeled as self-similar criticality

    USGS Publications Warehouse

    Tebbens, S.F.; Burroughs, S.M.; Barton, C.C.; Naar, D.F.

    2001-01-01

    The processes responsible for hotspot seamount formation are complex, yet the cumulative frequency-volume distribution of hotspot seamounts in the Easter Island/Salas y Gomez Chain (ESC) is found to be well-described by an upper-truncated power law. We develop a model for hotspot seamount formation where uniform energy input produces events initiated on a self-similar distribution of critical cells. We call this model Self-Similar Criticality (SSC). By allowing the spatial distribution of magma migration to be self-similar, the SSC model recreates the observed ESC seamount volume distribution. The SSC model may have broad applicability to other natural systems.

  19. Dose response explorer: an integrated open-source tool for exploring and modelling radiotherapy dose volume outcome relationships

    NASA Astrophysics Data System (ADS)

    El Naqa, I.; Suneja, G.; Lindsay, P. E.; Hope, A. J.; Alaly, J. R.; Vicic, M.; Bradley, J. D.; Apte, A.; Deasy, J. O.

    2006-11-01

    Radiotherapy treatment outcome models are a complicated function of treatment, clinical and biological factors. Our objective is to provide clinicians and scientists with an accurate, flexible and user-friendly software tool to explore radiotherapy outcomes data and build statistical tumour control or normal tissue complications models. The software tool, called the dose response explorer system (DREES), is based on Matlab, and uses a named-field structure array data type. DREES/Matlab in combination with another open-source tool (CERR) provides an environment for analysing treatment outcomes. DREES provides many radiotherapy outcome modelling features, including (1) fitting of analytical normal tissue complication probability (NTCP) and tumour control probability (TCP) models, (2) combined modelling of multiple dose-volume variables (e.g., mean dose, max dose, etc) and clinical factors (age, gender, stage, etc) using multi-term regression modelling, (3) manual or automated selection of logistic or actuarial model variables using bootstrap statistical resampling, (4) estimation of uncertainty in model parameters, (5) performance assessment of univariate and multivariate analyses using Spearman's rank correlation and chi-square statistics, boxplots, nomograms, Kaplan-Meier survival plots, and receiver operating characteristics curves, and (6) graphical capabilities to visualize NTCP or TCP prediction versus selected variable models using various plots. DREES provides clinical researchers with a tool customized for radiotherapy outcome modelling. DREES is freely distributed. We expect to continue developing DREES based on user feedback.

  20. Accurate Modeling of Galaxy Clustering on Small Scales: Testing the Standard ΛCDM + Halo Model

    NASA Astrophysics Data System (ADS)

    Sinha, Manodeep; Berlind, Andreas A.; McBride, Cameron; Scoccimarro, Roman

    2015-01-01

    The large-scale distribution of galaxies can be explained fairly simply by assuming (i) a cosmological model, which determines the dark matter halo distribution, and (ii) a simple connection between galaxies and the halos they inhabit. This conceptually simple framework, called the halo model, has been remarkably successful at reproducing the clustering of galaxies on all scales, as observed in various galaxy redshift surveys. However, none of these previous studies have carefully modeled the systematics and thus truly tested the halo model in a statistically rigorous sense. We present a new accurate and fully numerical halo model framework and test it against clustering measurements from two luminosity samples of galaxies drawn from the SDSS DR7. We show that the simple ΛCDM cosmology + halo model is not able to simultaneously reproduce the galaxy projected correlation function and the group multiplicity function. In particular, the more luminous sample shows significant tension with theory. We discuss the implications of our findings and how this work paves the way for constraining galaxy formation by accurate simultaneous modeling of multiple galaxy clustering statistics.

  1. Ontogenetic variation of heritability and maternal effects in yellow-bellied marmot alarm calls.

    PubMed

    Blumstein, Daniel T; Nguyen, Kathy T; Martin, Julien G A

    2013-05-07

    Individuals of many species produce distinctive vocalizations that may relay potential information about the signaller. The alarm calls of some species have been reported to be individually specific, and this distinctiveness may allow individuals to assess the reliability or kinship of callers. While not much is known generally about the heritability of mammalian vocalizations, if alarm calls were individually distinctive to permit kinship assessment, then call structure should be heritable. Here, we show conclusively for the first time that alarm call structure is heritable. We studied yellow-bellied marmots (Marmota flaviventris) and made nine quantitative measurements of their alarm calls. With a known genealogy, we used the animal model (a statistical technique) to estimate alarm call heritability. In juveniles, only one of the measured variables had heritability significantly different from zero; however, most variables had significant maternal environmental effects. By contrast, yearlings and adults had no significant maternal environmental effects, but the heritability of nearly all measured variables was significantly different from zero. Some, but not all of these heritable effects were significantly different across age classes. The presence of significantly non-zero maternal environmental effects in juveniles could reflect the impact of maternal environmental stresses on call structure. Regardless of this mechanism, maternal environmental effects could permit kinship recognition in juveniles. In older animals, the substantial genetic basis of alarm call structure suggests that calls could be used to assess kinship and, paradoxically, might also suggest a role of learning in call structure.

  2. Ontogenetic variation of heritability and maternal effects in yellow-bellied marmot alarm calls

    PubMed Central

    Blumstein, Daniel T.; Nguyen, Kathy T.; Martin, Julien G. A.

    2013-01-01

    Individuals of many species produce distinctive vocalizations that may relay potential information about the signaller. The alarm calls of some species have been reported to be individually specific, and this distinctiveness may allow individuals to assess the reliability or kinship of callers. While not much is known generally about the heritability of mammalian vocalizations, if alarm calls were individually distinctive to permit kinship assessment, then call structure should be heritable. Here, we show conclusively for the first time that alarm call structure is heritable. We studied yellow-bellied marmots (Marmota flaviventris) and made nine quantitative measurements of their alarm calls. With a known genealogy, we used the animal model (a statistical technique) to estimate alarm call heritability. In juveniles, only one of the measured variables had heritability significantly different from zero; however, most variables had significant maternal environmental effects. By contrast, yearlings and adults had no significant maternal environmental effects, but the heritability of nearly all measured variables was significantly different from zero. Some, but not all of these heritable effects were significantly different across age classes. The presence of significantly non-zero maternal environmental effects in juveniles could reflect the impact of maternal environmental stresses on call structure. Regardless of this mechanism, maternal environmental effects could permit kinship recognition in juveniles. In older animals, the substantial genetic basis of alarm call structure suggests that calls could be used to assess kinship and, paradoxically, might also suggest a role of learning in call structure. PMID:23466987

  3. Additive Genetic Variability and the Bayesian Alphabet

    PubMed Central

    Gianola, Daniel; de los Campos, Gustavo; Hill, William G.; Manfredi, Eduardo; Fernando, Rohan

    2009-01-01

    The use of all available molecular markers in statistical models for prediction of quantitative traits has led to what could be termed a genomic-assisted selection paradigm in animal and plant breeding. This article provides a critical review of some theoretical and statistical concepts in the context of genomic-assisted genetic evaluation of animals and crops. First, relationships between the (Bayesian) variance of marker effects in some regression models and additive genetic variance are examined under standard assumptions. Second, the connection between marker genotypes and resemblance between relatives is explored, and linkages between a marker-based model and the infinitesimal model are reviewed. Third, issues associated with the use of Bayesian models for marker-assisted selection, with a focus on the role of the priors, are examined from a theoretical angle. The sensitivity of a Bayesian specification that has been proposed (called “Bayes A”) with respect to priors is illustrated with a simulation. Methods that can solve potential shortcomings of some of these Bayesian regression procedures are discussed briefly. PMID:19620397

  4. Combining forecast weights: Why and how?

    NASA Astrophysics Data System (ADS)

    Yin, Yip Chee; Kok-Haur, Ng; Hock-Eam, Lim

    2012-09-01

    This paper proposes a procedure called forecast weight averaging, a specific combination of forecast weights obtained from different methods of constructing forecast weights, for the purpose of improving the accuracy of pseudo out-of-sample forecasting. It is found that under certain specified conditions, forecast weight averaging can lower the mean squared forecast error obtained from model averaging. In addition, we show that in a linear and homoskedastic environment, this superior predictive ability of forecast weight averaging holds irrespective of whether the coefficients are tested by a t statistic or a z statistic, provided the significance level is within the 10% range. By theoretical proofs and a simulation study, we show that model averaging schemes such as variance model averaging, simple model averaging and standard error model averaging each produce a mean squared forecast error larger than that of forecast weight averaging. Finally, this result also holds, marginally, when applied to business and economic empirical data sets: the Gross Domestic Product (GDP) growth rate, Consumer Price Index (CPI) and Average Lending Rate (ALR) of Malaysia.
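    A toy sketch of the forecast-weight-averaging idea under stated assumptions: two simple weighting schemes (equal weights and inverse-MSFE weights) are averaged and applied to hypothetical pseudo out-of-sample forecasts; the scheme names are generic stand-ins for those studied in the paper.

    ```python
    # Toy illustration: compute model weights by two simple schemes, average the
    # weight vectors, and form the combined forecast. Data and schemes are synthetic.
    import numpy as np

    rng = np.random.default_rng(5)
    y_true = rng.normal(0, 1, 200)
    # three hypothetical models' pseudo out-of-sample forecasts with different noise levels
    forecasts = np.column_stack([y_true + rng.normal(0, s, 200) for s in (0.5, 1.0, 1.5)])

    msfe = ((forecasts - y_true[:, None]) ** 2).mean(axis=0)

    w_equal = np.full(3, 1 / 3)                 # simple (equal-weight) model averaging
    w_inverse = (1 / msfe) / (1 / msfe).sum()   # inverse-MSFE weights

    w_avg = (w_equal + w_inverse) / 2           # averaging the forecast weights themselves
    combined = forecasts @ w_avg
    print("MSFE of combined forecast:", ((combined - y_true) ** 2).mean().round(3))
    ```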

  5. Semantic Importance Sampling for Statistical Model Checking

    DTIC Science & Technology

    2014-10-18

    we implement SIS in a tool called osmosis and use it to verify a number of stochastic systems with rare events. Our results indicate that SIS reduces... background definitions and concepts. Section 4 presents SIS, and Section 5 presents our tool osmosis. In Section 6, we present our experiments and results... (Fig. 5: Architecture of osmosis.)

  6. Statistical self-similarity of width function maxima with implications to floods

    USGS Publications Warehouse

    Veitzer, S.A.; Gupta, V.K.

    2001-01-01

    Recently a new theory of random self-similar river networks, called the RSN model, was introduced to explain empirical observations regarding the scaling properties of distributions of various topologic and geometric variables in natural basins. The RSN model predicts that such variables exhibit statistical simple scaling, when indexed by Horton-Strahler order. The average side tributary structure of RSN networks also exhibits Tokunaga-type self-similarity which is widely observed in nature. We examine the scaling structure of distributions of the maximum of the width function for RSNs for nested, complete Strahler basins by performing ensemble simulations. The maximum of the width function exhibits distributional simple scaling, when indexed by Horton-Strahler order, for both RSNs and natural river networks extracted from digital elevation models (DEMs). We also test a power-law relationship between Horton ratios for the maximum of the width function and drainage areas. These results represent first steps in formulating a comprehensive physical statistical theory of floods at multiple space-time scales for RSNs as discrete hierarchical branching structures. © 2001 Published by Elsevier Science Ltd.

  7. 3D automatic anatomy recognition based on iterative graph-cut-ASM

    NASA Astrophysics Data System (ADS)

    Chen, Xinjian; Udupa, Jayaram K.; Bagci, Ulas; Alavi, Abass; Torigian, Drew A.

    2010-02-01

    We call the computerized assistive process of recognizing, delineating, and quantifying organs and tissue regions in medical imaging, occurring automatically during clinical image interpretation, automatic anatomy recognition (AAR). The AAR system we are developing includes five main parts: model building, object recognition, object delineation, pathology detection, and organ system quantification. In this paper, we focus on the delineation part. For the modeling part, we employ the active shape model (ASM) strategy. For recognition and delineation, we integrate several hybrid strategies of combining purely image based methods with ASM. In this paper, an iterative Graph-Cut ASM (IGCASM) method is proposed for object delineation. An algorithm called GC-ASM was presented at this symposium last year for object delineation in 2D images which attempted to combine synergistically ASM and GC. Here, we extend this method to 3D medical image delineation. The IGCASM method effectively combines the rich statistical shape information embodied in ASM with the globally optimal delineation capability of the GC method. We propose a new GC cost function, which effectively integrates the specific image information with the ASM shape model information. The proposed methods are tested on a clinical abdominal CT data set. The preliminary results show that: (a) it is feasible to explicitly bring prior 3D statistical shape information into the GC framework; (b) the 3D IGCASM delineation method improves on ASM and GC and can provide practical operational time on clinical images.

  8. Disjunctive Normal Shape and Appearance Priors with Applications to Image Segmentation.

    PubMed

    Mesadi, Fitsum; Cetin, Mujdat; Tasdizen, Tolga

    2015-10-01

    The use of appearance and shape priors in image segmentation is known to improve accuracy; however, existing techniques have several drawbacks. Active shape and appearance models require landmark points and assume unimodal shape and appearance distributions. Level set based shape priors are limited to global shape similarity. In this paper, we present novel shape and appearance priors for image segmentation based on an implicit parametric shape representation called the disjunctive normal shape model (DNSM). DNSM is formed by disjunction of conjunctions of half-spaces defined by discriminants. We learn shape and appearance statistics at varying spatial scales using nonparametric density estimation. Our method can generate a rich set of shape variations by locally combining training shapes. Additionally, by studying the intensity and texture statistics around each discriminant of our shape model, we construct a local appearance probability map. Experiments carried out on both medical and natural image datasets show the potential of the proposed method.

  9. Modified retrieval algorithm for three types of precipitation distribution using x-band synthetic aperture radar

    NASA Astrophysics Data System (ADS)

    Xie, Yanan; Zhou, Mingliang; Pan, Dengke

    2017-10-01

    The forward-scattering model is introduced to describe the response of the normalized radar cross section (NRCS) of precipitation observed with synthetic aperture radar (SAR). Since the distribution of near-surface rainfall is related to the near-surface rainfall rate and a horizontal distribution factor, a retrieval algorithm called modified regression empirical and model-oriented statistical (M-M), based on Volterra integration theory, is proposed. Compared with the model-oriented statistical and Volterra integration (MOSVI) algorithm, the biggest difference is that the M-M algorithm uses the modified regression empirical algorithm rather than a linear regression formula to retrieve the near-surface rainfall rate. The number of empirical parameters in the weighted integration is halved, and a smaller average relative error is obtained when the rainfall rate is less than 100 mm/h. Therefore, the algorithm proposed in this paper can obtain high-precision rainfall information.

  10. Statistical Inference in Hidden Markov Models Using k-Segment Constraints

    PubMed Central

    Titsias, Michalis K.; Holmes, Christopher C.; Yau, Christopher

    2016-01-01

    Hidden Markov models (HMMs) are one of the most widely used statistical methods for analyzing sequence data. However, the reporting of output from HMMs has largely been restricted to the presentation of the most-probable (MAP) hidden state sequence, found via the Viterbi algorithm, or the sequence of most probable marginals using the forward–backward algorithm. In this article, we expand the amount of information we could obtain from the posterior distribution of an HMM by introducing linear-time dynamic programming recursions that, conditional on a user-specified constraint in the number of segments, allow us to (i) find MAP sequences, (ii) compute posterior probabilities, and (iii) simulate sample paths. We collectively call these recursions k-segment algorithms and illustrate their utility using simulated and real examples. We also highlight the prospective and retrospective use of k-segment constraints for fitting HMMs or exploring existing model fits. Supplementary materials for this article are available online. PMID:27226674
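    For context, a minimal log-space Viterbi recursion for a two-state discrete HMM is sketched below; the k-segment algorithms extend this kind of dynamic programming with a constraint on the number of segments. All matrices are toy values.

    ```python
    # Minimal Viterbi sketch for a discrete HMM (log-space): computes the MAP hidden
    # state sequence referred to above. Transition/emission/initial values are toy numbers.
    import numpy as np

    log = np.log
    A = log(np.array([[0.9, 0.1], [0.2, 0.8]]))        # state transition probabilities
    B = log(np.array([[0.7, 0.3], [0.1, 0.9]]))        # emission probabilities (2 symbols)
    pi = log(np.array([0.5, 0.5]))
    obs = [0, 0, 1, 1, 1, 0, 1, 1]

    n_states, T = A.shape[0], len(obs)
    delta = np.full((T, n_states), -np.inf)
    psi = np.zeros((T, n_states), dtype=int)
    delta[0] = pi + B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + A             # scores[i, j]: best path ending with i -> j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + B[:, obs[t]]

    # backtrack the most probable (MAP) hidden state sequence
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    print("MAP state sequence:", path[::-1])
    ```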

  11. Survival Data and Regression Models

    NASA Astrophysics Data System (ADS)

    Grégoire, G.

    2014-12-01

    We start this chapter by introducing some basic elements for the analysis of censored survival data. Then we focus on right censored data and develop two types of regression models. The first one concerns the so-called accelerated failure time models (AFT), which are parametric models where a function of a parameter depends linearly on the covariables. The second one is a semiparametric model, where the covariables enter in a multiplicative form in the expression of the hazard rate function. The main statistical tool for analysing these regression models is the maximum likelihood methodology and, in spite we recall some essential results about the ML theory, we refer to the chapter "Logistic Regression" for a more detailed presentation.

  12. Multiscale hidden Markov models for photon-limited imaging

    NASA Astrophysics Data System (ADS)

    Nowak, Robert D.

    1999-06-01

    Photon-limited image analysis is often hindered by low signal-to-noise ratios. A novel Bayesian multiscale modeling and analysis method is developed in this paper to assist in these challenging situations. In addition to providing a very natural and useful framework for modeling an d processing images, Bayesian multiscale analysis is often much less computationally demanding compared to classical Markov random field models. This paper focuses on a probabilistic graph model called the multiscale hidden Markov model (MHMM), which captures the key inter-scale dependencies present in natural image intensities. The MHMM framework presented here is specifically designed for photon-limited imagin applications involving Poisson statistics, and applications to image intensity analysis are examined.

  13. Mixed-order phase transition in a minimal, diffusion-based spin model.

    PubMed

    Fronczak, Agata; Fronczak, Piotr

    2016-07-01

    In this paper we exactly solve, within the grand canonical ensemble, a minimal spin model with the hybrid phase transition. We call the model diffusion based because its Hamiltonian can be recovered from a simple dynamic procedure, which can be seen as an equilibrium statistical mechanics representation of a biased random walk. We outline the derivation of the phase diagram of the model, in which the triple point has the hallmarks of the hybrid transition: discontinuity in the average magnetization and algebraically diverging susceptibilities. At this point, two second-order transition curves meet in equilibrium with the first-order curve, resulting in a prototypical mixed-order behavior.

  14. Six-vertex model and Schramm-Loewner evolution.

    PubMed

    Kenyon, Richard; Miller, Jason; Sheffield, Scott; Wilson, David B

    2017-05-01

    Square ice is a statistical mechanics model for two-dimensional ice, widely believed to have a conformally invariant scaling limit. We associate a Peano (space-filling) curve to a square ice configuration, and more generally to a so-called six-vertex model configuration, and argue that its scaling limit is a space-filling version of the random fractal curve SLE_{κ}, Schramm-Loewner evolution with parameter κ, where 4<κ≤12+8sqrt[2]. For square ice, κ=12. At the "free-fermion point" of the six-vertex model, κ=8+4sqrt[3]. These unusual values lie outside the classical interval 2≤κ≤8.

  15. ABrox-A user-friendly Python module for approximate Bayesian computation with a focus on model comparison.

    PubMed

    Mertens, Ulf Kai; Voss, Andreas; Radev, Stefan

    2018-01-01

    We give an overview of the basic principles of approximate Bayesian computation (ABC), a class of stochastic methods that enable flexible and likelihood-free model comparison and parameter estimation. Our new open-source software called ABrox is used to illustrate ABC for model comparison on two prominent statistical tests, the two-sample t-test and the Levene test. We further highlight the flexibility of ABC compared to classical Bayesian hypothesis testing by computing an approximate Bayes factor for two multinomial processing tree models. Last but not least, throughout the paper, we introduce ABrox using the accompanying graphical user interface.
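    A small sketch of ABC rejection for model comparison in the spirit described above (not ABrox's own interface): with equal prior model probabilities, the ratio of acceptance counts approximates the Bayes factor.

    ```python
    # ABC rejection sampling for comparing two toy models of the same data set.
    # The tolerance, summary statistic and priors are illustrative choices.
    import numpy as np

    rng = np.random.default_rng(2)
    observed = rng.normal(0.3, 1.0, 50)
    obs_stat = observed.mean()                      # summary statistic

    def simulate(model: int) -> float:
        if model == 0:                              # M0: mean fixed at 0
            mu = 0.0
        else:                                       # M1: mean drawn from a N(0, 1) prior
            mu = rng.normal(0.0, 1.0)
        return rng.normal(mu, 1.0, 50).mean()

    eps, n_sims = 0.05, 100_000
    accepted = {0: 0, 1: 0}
    for _ in range(n_sims):
        m = int(rng.integers(0, 2))                 # equal prior model probabilities
        if abs(simulate(m) - obs_stat) < eps:
            accepted[m] += 1

    bf_10 = accepted[1] / max(accepted[0], 1)       # approximate Bayes factor, M1 vs M0
    print("acceptances:", accepted, "approx BF_10:", round(bf_10, 2))
    ```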

  16. Three novel approaches to structural identifiability analysis in mixed-effects models.

    PubMed

    Janzén, David L I; Jirstrand, Mats; Chappell, Michael J; Evans, Neil D

    2016-05-06

    Structural identifiability is a concept that considers whether the structure of a model together with a set of input-output relations uniquely determines the model parameters. In the mathematical modelling of biological systems, structural identifiability is an important concept since biological interpretations are typically made from the parameter estimates. For a system defined by ordinary differential equations, several methods have been developed to analyse whether the model is structurally identifiable or otherwise. Another well-used modelling framework, which is particularly useful when the experimental data are sparsely sampled and the population variance is of interest, is mixed-effects modelling. However, established identifiability analysis techniques for ordinary differential equations are not directly applicable to such models. In this paper, we present and apply three different methods that can be used to study structural identifiability in mixed-effects models. The first method, called the repeated measurement approach, is based on applying a set of previously established statistical theorems. The second method, called the augmented system approach, is based on augmenting the mixed-effects model to an extended state-space form. The third method, called the Laplace transform mixed-effects extension, is based on considering the moment invariants of the system's transfer function as functions of random variables. To illustrate, compare and contrast the application of the three methods, they are applied to a set of mixed-effects models. Three structural identifiability analysis methods applicable to mixed-effects models have been presented in this paper. As method development of structural identifiability techniques for mixed-effects models has been given very little attention, despite mixed-effects models being widely used, the methods presented in this paper provide a way of handling structural identifiability in mixed-effects models that was previously not possible. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  17. PCI fuel failure analysis: a report on a cooperative program undertaken by Pacific Northwest Laboratory and Chalk River Nuclear Laboratories.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mohr, C.L.; Pankaskie, P.J.; Heasler, P.G.

    Reactor fuel failure data sets in the form of initial power (P_i), final power (P_f), transient increase in power (ΔP), and burnup (Bu) were obtained for pressurized heavy water reactors (PHWRs), boiling water reactors (BWRs), and pressurized water reactors (PWRs). These data sets were evaluated and used as the basis for developing two predictive fuel failure models, a graphical concept called the PCI-OGRAM, and a nonlinear regression based model called PROFIT. The PCI-OGRAM is an extension of the FUELOGRAM developed by AECL. It is based on a critical threshold concept for stress dependent stress corrosion cracking. The PROFIT model, developed at Pacific Northwest Laboratory, is the result of applying standard statistical regression methods to the available PCI fuel failure data and an analysis of the environmental and strain rate dependent stress-strain properties of the Zircaloy cladding.

  18. Network inference using informative priors

    PubMed Central

    Mukherjee, Sach; Speed, Terence P.

    2008-01-01

    Recent years have seen much interest in the study of systems characterized by multiple interacting components. A class of statistical models called graphical models, in which graphs are used to represent probabilistic relationships between variables, provides a framework for formal inference regarding such systems. In many settings, the object of inference is the network structure itself. This problem of “network inference” is well known to be a challenging one. However, in scientific settings there is very often existing information regarding network connectivity. A natural idea then is to take account of such information during inference. This article addresses the question of incorporating prior information into network inference. We focus on directed models called Bayesian networks, and use Markov chain Monte Carlo to draw samples from posterior distributions over network structures. We introduce prior distributions on graphs capable of capturing information regarding network features including edges, classes of edges, degree distributions, and sparsity. We illustrate our approach in the context of systems biology, applying our methods to network inference in cancer signaling. PMID:18799736

  19. Network inference using informative priors.

    PubMed

    Mukherjee, Sach; Speed, Terence P

    2008-09-23

    Recent years have seen much interest in the study of systems characterized by multiple interacting components. A class of statistical models called graphical models, in which graphs are used to represent probabilistic relationships between variables, provides a framework for formal inference regarding such systems. In many settings, the object of inference is the network structure itself. This problem of "network inference" is well known to be a challenging one. However, in scientific settings there is very often existing information regarding network connectivity. A natural idea then is to take account of such information during inference. This article addresses the question of incorporating prior information into network inference. We focus on directed models called Bayesian networks, and use Markov chain Monte Carlo to draw samples from posterior distributions over network structures. We introduce prior distributions on graphs capable of capturing information regarding network features including edges, classes of edges, degree distributions, and sparsity. We illustrate our approach in the context of systems biology, applying our methods to network inference in cancer signaling.

  20. Are Value-Added Measures of High Effectiveness Related to Students' Enrollment and Success in College? ACT Research Report Series 2016 (10)

    ERIC Educational Resources Information Center

    Bassiri, Dina

    2016-01-01

    One outcome of the implementation of No Child Left Behind Act of 2001 and its call for better accountability in public schools across the nation has been the use of student assessment data in measuring schools' effectiveness. In general, inferences about schools' effectiveness depend on the type of statistical model used to link student assessment…

  1. Boosting Stochastic Problem Solvers Through Online Self-Analysis of Performance

    DTIC Science & Technology

    2003-07-21

    Boosting Stochastic Problem Solvers Through Online Self-Analysis of Performance, Vincent A. Cicirello, CMU-RI-TR-03-27. Submitted in partial fulfillment... lead to the development of a search control framework, called QD-BEACON, that uses online-generated statistical models of search performance to

  2. Developing and validating a new national remote health advice syndromic surveillance system in England.

    PubMed

    Harcourt, S E; Morbey, R A; Loveridge, P; Carrilho, L; Baynham, D; Povey, E; Fox, P; Rutter, J; Moores, P; Tiffen, J; Bellerby, S; McIntosh, P; Large, S; McMenamin, J; Reynolds, A; Ibbotson, S; Smith, G E; Elliot, A J

    2017-03-01

    Public Health England (PHE) coordinates a suite of real-time national syndromic surveillance systems monitoring general practice, emergency department and remote health advice data. We describe the development and informal evaluation of a new syndromic surveillance system using NHS 111 remote health advice data. NHS 111 syndromic indicators were monitored daily at national and local level. Statistical models were applied to daily data to identify significant exceedances; statistical baselines were developed for each syndrome and area using a multi-level hierarchical mixed effects model. Between November 2013 and October 2014, there were on average 19 095 NHS 111 calls each weekday and 43 084 each weekend day in the PHE dataset. There was a predominance of females using the service (57%); the highest percentage of calls received was in the age group 1-4 years (14%). This system was used to monitor respiratory and gastrointestinal infections over the winter of 2013-14, the potential public health impact of severe flooding across parts of southern England and poor air quality episodes across England in April 2014. This new system complements and supplements the existing PHE syndromic surveillance systems and is now integrated into the routine daily processes that form this national syndromic surveillance service. © Crown copyright 2016.
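    A deliberately simplified illustration of exceedance detection (a weekday-stratified baseline with a crude threshold, not PHE's multi-level mixed-effects model); all counts are synthetic.

    ```python
    # Toy exceedance detection: fit a day-of-week baseline from historical counts and
    # flag days whose count exceeds a rough upper bound. Counts and the injected spike
    # are synthetic; real baselines would come from a hierarchical mixed-effects model.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(8)
    days = pd.date_range("2013-11-01", periods=200, freq="D")
    base = np.where(days.dayofweek < 5, 19000, 43000)       # weekday vs weekend call level
    counts = rng.poisson(base)
    counts[150] = int(counts[150] * 1.4)                    # inject an artificial spike

    df = pd.DataFrame({"count": counts, "dow": days.dayofweek}, index=days)
    baseline = df.groupby("dow")["count"].transform("mean")
    spread = df.groupby("dow")["count"].transform("std")
    df["exceedance"] = df["count"] > baseline + 3 * spread  # crude z-score style threshold
    print(df[df["exceedance"]])
    ```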

  3. A three-dimensional refractive index model for simulation of optical wave propagation in atmospheric turbulence

    NASA Astrophysics Data System (ADS)

    Paramonov, P. V.; Vorontsov, A. M.; Kunitsyn, V. E.

    2015-10-01

    Numerical modeling of optical wave propagation in atmospheric turbulence is traditionally performed with using the so-called "split"-operator method, when the influence of the propagation medium's refractive index inhomogeneities is accounted for only within a system of infinitely narrow layers (phase screens) where phase is distorted. Commonly, under certain assumptions, such phase screens are considered as mutually statistically uncorrelated. However, in several important applications including laser target tracking, remote sensing, and atmospheric imaging, accurate optical field propagation modeling assumes upper limitations on interscreen spacing. The latter situation can be observed, for instance, in the presence of large-scale turbulent inhomogeneities or in deep turbulence conditions, where interscreen distances become comparable with turbulence outer scale and, hence, corresponding phase screens cannot be statistically uncorrelated. In this paper, we discuss correlated phase screens. The statistical characteristics of screens are calculated based on a representation of turbulent fluctuations of three-dimensional (3D) refractive index random field as a set of sequentially correlated 3D layers displaced in the wave propagation direction. The statistical characteristics of refractive index fluctuations are described in terms of the von Karman power spectrum density. In the representation of these 3D layers by corresponding phase screens, the geometrical optics approximation is used.

  4. Japan Report

    DTIC Science & Technology

    1985-03-18

    … increased profitability, productivity and more effective management." Members of the British side also called for an increase in Japan's defense … copyright transfer, or rental permission is called technology trade. According to the Statistics Bureau of the Management and Coordination Agency … 66.2% of Japan's imported …

  5. Use of transport models for wildfire behavior simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Linn, R.R.; Harlow, F.H.

    1998-01-01

    Investigators have attempted to describe the behavior of wildfires for over fifty years. Current models for numerical description are mainly algebraic and based on statistical or empirical ideas. The authors have developed a transport model called FIRETEC. The use of transport formulations connects the propagation rates to the full conservation equations for energy, momentum, species concentrations, mass, and turbulence. In this paper, highlights of the model formulation and results are described. The goal of the FIRETEC model is to describe most probable average behavior of wildfires in a wide variety of conditions. FIRETEC represents the essence of the combination of many small-scale processes without resolving each process in complete detail.

  6. SCOUT: A Fast Monte-Carlo Modeling Tool of Scintillation Camera Output

    PubMed Central

    Hunter, William C. J.; Barrett, Harrison H.; Lewellen, Thomas K.; Miyaoka, Robert S.; Muzi, John P.; Li, Xiaoli; McDougald, Wendy; MacDonald, Lawrence R.

    2011-01-01

    We have developed a Monte-Carlo photon-tracking and readout simulator called SCOUT to study the stochastic behavior of signals output from a simplified rectangular scintillation-camera design. SCOUT models the salient processes affecting signal generation, transport, and readout. Presently, we compare output signal statistics from SCOUT to experimental results for both a discrete and a monolithic camera. We also benchmark the speed of this simulation tool and compare it to existing simulation tools. We find this modeling tool to be relatively fast and predictive of experimental results. Depending on the modeled camera geometry, we found SCOUT to be 4 to 140 times faster than other modeling tools. PMID:22072297

  7. SCOUT: a fast Monte-Carlo modeling tool of scintillation camera output†

    PubMed Central

    Hunter, William C J; Barrett, Harrison H.; Muzi, John P.; McDougald, Wendy; MacDonald, Lawrence R.; Miyaoka, Robert S.; Lewellen, Thomas K.

    2013-01-01

    We have developed a Monte-Carlo photon-tracking and readout simulator called SCOUT to study the stochastic behavior of signals output from a simplified rectangular scintillation-camera design. SCOUT models the salient processes affecting signal generation, transport, and readout of a scintillation camera. Presently, we compare output signal statistics from SCOUT to experimental results for both a discrete and a monolithic camera. We also benchmark the speed of this simulation tool and compare it to existing simulation tools. We find this modeling tool to be relatively fast and predictive of experimental results. Depending on the modeled camera geometry, we found SCOUT to be 4 to 140 times faster than other modeling tools. PMID:23640136

  8. A simple rapid approach using coupled multivariate statistical methods, GIS and trajectory models to delineate areas of common oil spill risk

    NASA Astrophysics Data System (ADS)

    Guillen, George; Rainey, Gail; Morin, Michelle

    2004-04-01

    Currently, the Minerals Management Service uses the Oil Spill Risk Analysis model (OSRAM) to predict the movement of potential oil spills greater than 1000 bbl originating from offshore oil and gas facilities. OSRAM generates oil spill trajectories using meteorological and hydrological data input from either actual physical measurements or estimates generated from other hydrological models. OSRAM and many other models produce output matrices of average, maximum and minimum contact probabilities to specific landfall or target segments (columns) from oil spills at specific points (rows). Analysts and managers are often interested in identifying geographic areas or groups of facilities that pose similar risks to specific targets or groups of targets if a spill occurred. Unfortunately, due to the potentially large matrix generated by many spill models, this question is difficult to answer without the use of data reduction and visualization methods. In our study we utilized a multivariate statistical method called cluster analysis to group areas of similar risk based on potential distribution of landfall target trajectory probabilities. We also utilized ArcView™ GIS to display spill launch point groupings. The combination of GIS and multivariate statistical techniques in the post-processing of trajectory model output is a powerful tool for identifying and delineating areas of similar risk from multiple spill sources. We strongly encourage modelers, statistical and GIS software programmers to closely collaborate to produce a more seamless integration of these technologies and approaches to analyzing data. They are complementary methods that strengthen the overall assessment of spill risks.
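
    As a rough sketch of the clustering step, the fragment below groups hypothetical launch points by the similarity of their landfall-contact probability profiles using hierarchical (Ward) clustering from SciPy; the probability matrix, the number of clusters, and the linkage choice are assumptions for illustration rather than the settings used in the study.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Rows = spill launch points, columns = landfall target segments;
# entries = modelled contact probabilities (hypothetical values).
prob = np.array([[0.10, 0.60, 0.05],
                 [0.12, 0.55, 0.08],
                 [0.70, 0.05, 0.10],
                 [0.65, 0.08, 0.12]])

# Ward linkage on the probability profiles; cut the tree into 2 groups.
Z = linkage(prob, method="ward")
groups = fcluster(Z, t=2, criterion="maxclust")
print(groups)          # e.g. [1 1 2 2]: two areas of similar risk

# The group labels can then be joined to the launch-point coordinates
# and displayed as a GIS layer, as done with ArcView in the study.
```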

  9. Representation of the contextual statistical model by hyperbolic amplitudes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Khrennikov, Andrei

    We continue the development of a so-called contextual statistical model (here context has the meaning of a complex of physical conditions). It is shown that, besides contexts producing the conventional trigonometric cos-interference, there exist contexts producing the hyperbolic cos-interference. Starting with the corresponding interference formula of total probability we represent such contexts by hyperbolic probabilistic amplitudes or, in the abstract formalism, by normalized vectors of a hyperbolic analogue of the Hilbert space. A hyperbolic Born's rule is obtained. Incompatible observables are represented by noncommutative operators. This paper can be considered as the first step towards hyperbolic quantum probability. We also discuss possibilities of experimental verification of hyperbolic quantum mechanics: in physics of elementary particles, string theory, as well as in experiments with nonphysical systems, e.g., in psychology, cognitive sciences, and economy.

  10. Representation of the contextual statistical model by hyperbolic amplitudes

    NASA Astrophysics Data System (ADS)

    Khrennikov, Andrei

    2005-06-01

    We continue the development of a so-called contextual statistical model (here context has the meaning of a complex of physical conditions). It is shown that, besides contexts producing the conventional trigonometric cos-interference, there exist contexts producing the hyperbolic cos-interference. Starting with the corresponding interference formula of total probability we represent such contexts by hyperbolic probabilistic amplitudes or, in the abstract formalism, by normalized vectors of a hyperbolic analogue of the Hilbert space. A hyperbolic Born's rule is obtained. Incompatible observables are represented by noncommutative operators. This paper can be considered as the first step towards hyperbolic quantum probability. We also discuss possibilities of experimental verification of hyperbolic quantum mechanics: in physics of elementary particles, string theory, as well as in experiments with nonphysical systems, e.g., in psychology, cognitive sciences, and economy.

  11. Optimization models for degrouping population data.

    PubMed

    Bermúdez, Silvia; Blanquero, Rafael

    2016-07-01

    In certain countries population data are available in grouped form only, usually as quinquennial age groups plus a large open-ended range for the elderly. However, official statistics call for data by individual age since many statistical operations, such as the calculation of demographic indicators, require the use of ungrouped population data. In this paper a number of mathematical models are proposed which, starting from population data given in age groups, enable these ranges to be degrouped into age-specific population values without leaving a fractional part. Unlike other existing procedures for disaggregating demographic data, ours makes it possible to process several years' data simultaneously in a coherent way, and provides accurate results longitudinally as well as transversally. This procedure is also shown to be helpful in dealing with degrouped population data affected by noise, such as those affected by the age-heaping phenomenon.
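
    The "no fractional part" requirement can be illustrated with a toy example. The sketch below is not the paper's optimization model; it simply splits a quinquennial group total evenly across single ages using largest-remainder rounding, so the disaggregated values are integers that sum exactly to the group total.

```python
def degroup_uniform(total, n_ages=5):
    """Split a grouped population count into n_ages integer values.

    Uses an even split with largest-remainder rounding so that the
    single-age values are whole numbers summing exactly to `total`.
    The published models optimize a smoother age profile; this only
    illustrates the integrality constraint they enforce.
    """
    base, remainder = divmod(total, n_ages)
    return [base + (1 if i < remainder else 0) for i in range(n_ages)]

print(degroup_uniform(10234))   # -> [2047, 2047, 2047, 2047, 2046], sums to 10234
```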

  12. Image Filtering with Boolean and Statistical Operators.

    DTIC Science & Technology

    1983-12-01

    [Abstract not recoverable; the indexed record text consists of OCR fragments of FORTRAN image-filtering code: COMPLEX array declarations for AMAT, BMAT and CMAT and calls to IOF, OPEN, CHECK, RDBLK and WRBLK that read two image blocks, combine them element-wise into CMAT, and write the result.]

  13. Impact of the mass media on calls to the CDC National AIDS Hotline.

    PubMed

    Fan, D P

    1996-06-01

    This paper considers new computer methodologies for assessing the impact of different types of public health information. The example used public service announcements (PSAs) and mass media news to predict the volume of attempts to call the CDC National AIDS Hotline from December 1992 through to the end of 1993. The analysis relied solely on data from electronic databases. Newspaper stories and television news transcripts were obtained from the NEXIS electronic database and were scored by machine for AIDS coverage. The PSA database was generated by computer monitoring of advertising distributed by the Centers for Disease Control and Prevention (CDC) and by others. The volume of call attempts was collected automatically by the public branch exchange (PBX) of the Hotline telephone system. The call attempts, the PSAs and the news story data were related to each other using both a standard time series method and the statistical model of ideodynamics. The analysis indicated that the only significant explanatory variable for the call attempts was PSAs produced by the CDC. One possible explanation was that these commercials all included the Hotline telephone number while the other information sources did not.

  14. Impact of Homeland Security Alert level on calls to a law enforcement peer support hotline.

    PubMed

    Omer, Saad B; Barnett, Daniel J; Castellano, Cherie; Wierzba, Rachel K; Hiremath, Girish S; Balicer, Ran D; Everly, George S

    2007-01-01

    The Homeland Security Advisory System (HSAS) was established by the Department of Homeland Security to communicate the risk of a terrorist event. In order to explore the potential psychological impacts of HSAS we analyzed the effects of terror alerts on the law enforcement community. We used data from the New Jersey Cop 2 Cop crisis intervention hotline. Incidence Rate Ratios--interpreted as average relative increases in the daily number of calls to the Cop 2 Cop hotline during an increased alert period--were computed from Poisson models. The hotline received a total of 4,145 initial calls during the study period. The mean daily number of calls was higher during alert level elevation compared to prior 7 days (7.68 vs. 8.00). In the Poisson regression analysis, the Incidence Rate Ratios of number of calls received during elevated alert levels compared to the reference period of seven days preceding each change in alert were close to 1, with confidence intervals crossing 1 (i.e. not statistically significant) for all lag periods evaluated. This investigation, in the context of New Jersey law enforcement personnel, does not support the concern that elevating the alert status places undue stress upon alert recipients.
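
    As a simplified illustration of the incidence-rate-ratio calculation (not the study's full Poisson regression with lag terms), the sketch below treats the total call counts in the elevated-alert and reference periods as Poisson and reports the IRR with a Wald-type 95% confidence interval; the daily counts shown are hypothetical.

```python
import numpy as np

def poisson_irr(calls_alert, calls_reference):
    """Incidence rate ratio (alert vs. reference) with a 95% Wald CI.

    Treats the total counts in each period as Poisson; the study used
    Poisson regression, which this single-comparison ratio mimics.
    """
    a, b = float(np.sum(calls_alert)), float(np.sum(calls_reference))
    n_a, n_b = len(calls_alert), len(calls_reference)
    irr = (a / n_a) / (b / n_b)
    se_log = np.sqrt(1.0 / a + 1.0 / b)          # SE of log(IRR)
    lo, hi = np.exp(np.log(irr) + np.array([-1.96, 1.96]) * se_log)
    return irr, (lo, hi)

# Hypothetical daily counts: 7 elevated-alert days vs. the prior 7 days.
print(poisson_irr([8, 9, 7, 8, 8, 9, 7], [7, 8, 8, 7, 8, 7, 9]))
```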

  15. Impact of genotyping errors on statistical power of association tests in genomic analyses: A case study

    PubMed Central

    Hou, Lin; Sun, Ning; Mane, Shrikant; Sayward, Fred; Rajeevan, Nallakkandi; Cheung, Kei-Hoi; Cho, Kelly; Pyarajan, Saiju; Aslan, Mihaela; Miller, Perry; Harvey, Philip D.; Gaziano, J. Michael; Concato, John; Zhao, Hongyu

    2017-01-01

    A key step in genomic studies is to assess high throughput measurements across millions of markers for each participant’s DNA, either using microarrays or sequencing techniques. Accurate genotype calling is essential for downstream statistical analysis of genotype-phenotype associations, and next generation sequencing (NGS) has recently become a more common approach in genomic studies. How the accuracy of variant calling in NGS-based studies affects downstream association analysis has not, however, been studied using empirical data in which both microarrays and NGS were available. In this article, we investigate the impact of variant calling errors on the statistical power to identify associations between single nucleotides and disease, and on associations between multiple rare variants and disease. Both differential and nondifferential genotyping errors are considered. Our results show that the power of burden tests for rare variants is strongly influenced by the specificity in variant calling, but is rather robust with regard to sensitivity. By using the variant calling accuracies estimated from a substudy of a Cooperative Studies Program project conducted by the Department of Veterans Affairs, we show that the power of association tests is mostly retained with commonly adopted variant calling pipelines. An R package, GWAS.PC, is provided to accommodate power analysis that takes account of genotyping errors (http://zhaocenter.org/software/). PMID:28019059

  16. Developing Statistical Literacy with Year 9 Students: A Collaborative Research Project

    ERIC Educational Resources Information Center

    Sharma, Sashi

    2013-01-01

    Advances in technology and communication have increased the amount of statistical information delivered through everyday media. The importance of statistics in everyday life has led to calls for increased attention to statistical literacy in the mathematics curriculum (Watson 2006). Gal (2004) sees statistical literacy as the need for students to…

  17. Genotyping and inflated type I error rate in genome-wide association case/control studies

    PubMed Central

    Sampson, Joshua N; Zhao, Hongyu

    2009-01-01

    Background One common goal of a case/control genome wide association study (GWAS) is to find SNPs associated with a disease. Traditionally, the first step in such studies is to assign a genotype to each SNP in each subject, based on a statistic summarizing fluorescence measurements. When the distributions of the summary statistics are not well separated by genotype, the act of genotype assignment can lead to more potential problems than acknowledged by the literature. Results Specifically, we show that the proportions of each called genotype need not equal the true proportions in the population, even as the number of subjects grows infinitely large. The called genotypes for two subjects need not be independent, even when their true genotypes are independent. Consequently, p-values from tests of association can be anti-conservative, even when the distributions of the summary statistic for the cases and controls are identical. To address these problems, we propose two new tests designed to reduce the inflation in the type I error rate caused by these problems. The first algorithm, logiCALL, measures call quality by fully exploring the likelihood profile of intensity measurements, and the second algorithm avoids genotyping by using a likelihood ratio statistic. Conclusion Genotyping can introduce avoidable false positives in GWAS. PMID:19236714

  18. The impact of media campaigns on smoking cessation activity: a structural vector autoregression analysis.

    PubMed

    Langley, Tessa E; McNeill, Ann; Lewis, Sarah; Szatkowski, Lisa; Quinn, Casey

    2012-11-01

    To evaluate the effect of tobacco control media campaigns and pharmaceutical company-funded advertising for nicotine replacement therapy (NRT) on smoking cessation activity. Multiple time series analysis using structural vector autoregression, January 2002-May 2010. England and Wales. Tobacco control campaign data from the Central Office of Information; commercial NRT campaign data; data on calls to the National Health Service (NHS) stop smoking helpline from the Department of Health; point-of-sale data on over-the-counter (OTC) sales of NRT; and prescribing data from The Health Improvement Network (THIN), a database of UK primary care records. Monthly calls to the NHS stop smoking helpline and monthly rates of OTC sales and prescribing of NRT. A 1% increase in tobacco control television ratings (TVRs), a standard measure of advertising exposure, was associated with a statistically significant 0.085% increase in calls in the same month (P = 0.007), and no statistically significant effect in subsequent months. Tobacco control TVRs were not associated with OTC NRT sales or prescribed NRT. NRT advertising TVRs had a significant effect on NRT sales which became non-significant in the seasonally adjusted model, and no significant effect on prescribing or calls. Tobacco control campaigns appear to be more effective at triggering quitting behaviour than pharmaceutical company NRT campaigns. Any effect of such campaigns on quitting behaviour seems to be restricted to the month of the campaign, suggesting that such campaigns need to be sustained over time. © 2012 The Authors, Addiction © 2012 Society for the Study of Addiction.

  19. A bilayer Double Semion model with symmetry-enriched topological order

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ortiz, L., E-mail: lauraort@ucm.es; Martin-Delgado, M.A.

    2016-12-15

    We construct a new model of two-dimensional quantum spin systems that combines intrinsic topological orders and a global symmetry called flavour symmetry. It is referred to as the bilayer Doubled Semion model (bDS) and is an instance of symmetry-enriched topological order. A honeycomb bilayer lattice is introduced to combine a Double Semion Topological Order with a global spin–flavour symmetry to get the fractionalization of its quasiparticles. The bDS model exhibits non-trivial braiding self-statistics of excitations and its dual model constitutes a Symmetry-Protected Topological Order with novel edge states. This dual model gives rise to a bilayer Non-Trivial Paramagnet that is invariant under the flavour symmetry and the well-known spin flip symmetry.

  20. Inverse statistical physics of protein sequences: a key issues review.

    PubMed

    Cocco, Simona; Feinauer, Christoph; Figliuzzi, Matteo; Monasson, Rémi; Weigt, Martin

    2018-03-01

    In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview over some biologically important questions, and how statistical-mechanics inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.

  1. Inverse statistical physics of protein sequences: a key issues review

    NASA Astrophysics Data System (ADS)

    Cocco, Simona; Feinauer, Christoph; Figliuzzi, Matteo; Monasson, Rémi; Weigt, Martin

    2018-03-01

    In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview over some biologically important questions, and how statistical-mechanics inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.

  2. Modeling the origins of mammalian sociality: moderate evidence for matrilineal signatures in mouse lemur vocalizations.

    PubMed

    Kessler, Sharon E; Radespiel, Ute; Hasiniaina, Alida I F; Leliveld, Lisette M C; Nash, Leanne T; Zimmermann, Elke

    2014-02-20

    Maternal kin selection is a driving force in the evolution of mammalian social complexity and it requires that kin are distinctive from nonkin. The transition from the ancestral state of asociality to the derived state of complex social groups is thought to have occurred via solitary foraging, in which individuals forage alone, but, unlike the asocial ancestors, maintain dispersed social networks via scent-marks and vocalizations. We hypothesize that matrilineal signatures in vocalizations were an important part of these networks. We used the solitary foraging gray mouse lemur (Microcebus murinus) as a model for ancestral solitary foragers and tested for matrilineal signatures in their calls, thus investigating whether such signatures are already present in solitary foragers and could have facilitated the kin selection thought to have driven the evolution of increased social complexity in mammals. Because agonism can be very costly, selection for matrilineal signatures in agonistic calls should help reduce agonism between unfamiliar matrilineal kin. We conducted this study on a well-studied population of wild mouse lemurs at Ankarafantsika National Park, Madagascar. We determined pairwise relatedness using seven microsatellite loci, matrilineal relatedness by sequencing the mitochondrial D-loop, and sleeping group associations using radio-telemetry. We recorded agonistic calls during controlled social encounters and conducted a multi-parametric acoustic analysis to determine the spectral and temporal structure of the agonistic calls. We measured 10 calls for each of 16 females from six different matrilineal kin groups. Calls were assigned to their matriline at a rate significantly higher than chance (pDFA: correct = 47.1%, chance = 26.7%, p = 0.03). There was a statistical trend for a negative correlation between acoustic distance and relatedness (Mantel Test: g = -1.61, Z = 4.61, r = -0.13, p = 0.058). Mouse lemur agonistic calls are moderately distinctive by matriline. Because sleeping groups consisted of close maternal kin, both genetics and social learning may have generated these acoustic signatures. As mouse lemurs are models for solitary foragers, we recommend further studies testing whether the lemurs use these calls to recognize kin. This would enable further modeling of how kin recognition in ancestral species could have shaped the evolution of complex sociality.

  3. VARIABLE SELECTION FOR REGRESSION MODELS WITH MISSING DATA

    PubMed Central

    Garcia, Ramon I.; Ibrahim, Joseph G.; Zhu, Hongtu

    2009-01-01

    We consider the variable selection problem for a class of statistical models with missing data, including missing covariate and/or response data. We investigate the smoothly clipped absolute deviation penalty (SCAD) and adaptive LASSO and propose a unified model selection and estimation procedure for use in the presence of missing data. We develop a computationally attractive algorithm for simultaneously optimizing the penalized likelihood function and estimating the penalty parameters. Particularly, we propose to use a model selection criterion, called the ICQ statistic, for selecting the penalty parameters. We show that the variable selection procedure based on ICQ automatically and consistently selects the important covariates and leads to efficient estimates with oracle properties. The methodology is very general and can be applied to numerous situations involving missing data, from covariates missing at random in arbitrary regression models to nonignorably missing longitudinal responses and/or covariates. Simulations are given to demonstrate the methodology and examine the finite sample performance of the variable selection procedures. Melanoma data from a cancer clinical trial is presented to illustrate the proposed methodology. PMID:20336190

  4. GAPIT: genome association and prediction integrated tool.

    PubMed

    Lipka, Alexander E; Tian, Feng; Wang, Qishan; Peiffer, Jason; Li, Meng; Bradbury, Peter J; Gore, Michael A; Buckler, Edward S; Zhang, Zhiwu

    2012-09-15

    Software programs that conduct genome-wide association studies and genomic prediction and selection need to use methodologies that maximize statistical power, provide high prediction accuracy and run in a computationally efficient manner. We developed an R package called Genome Association and Prediction Integrated Tool (GAPIT) that implements advanced statistical methods including the compressed mixed linear model (CMLM) and CMLM-based genomic prediction and selection. The GAPIT package can handle large datasets in excess of 10 000 individuals and 1 million single-nucleotide polymorphisms with minimal computational time, while providing user-friendly access and concise tables and graphs to interpret results. http://www.maizegenetics.net/GAPIT. zhiwu.zhang@cornell.edu Supplementary data are available at Bioinformatics online.

  5. JIGSAW: Preference-directed, co-operative scheduling

    NASA Technical Reports Server (NTRS)

    Linden, Theodore A.; Gaw, David

    1992-01-01

    Techniques that enable humans and machines to cooperate in the solution of complex scheduling problems have evolved out of work on the daily allocation and scheduling of Tactical Air Force resources. A generalized, formal model of these applied techniques is being developed. It is called JIGSAW by analogy with the multi-agent, constructive process used when solving jigsaw puzzles. JIGSAW begins from this analogy and extends it by propagating local preferences into global statistics that dynamically influence the value and variable ordering decisions. The statistical projections also apply to abstract resources and time periods--allowing more opportunities to find a successful variable ordering by reserving abstract resources and deferring the choice of a specific resource or time period.

  6. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Marekova, Elisaveta

    Series of relatively large earthquakes in different regions of the Earth are studied. The regions chosen have high seismic activity and good contemporary networks for recording seismic events. The main purpose of this investigation is to describe the seismic process analytically in space and time. We consider the statistical distributions of the distances and the times between consecutive earthquakes (so-called pair analysis). Studies conducted on approximating the statistical distribution of the parameters of consecutive seismic events indicate the existence of characteristic functions that describe them best. Such a mathematical description allows the distributions of the examined parameters to be compared to other model distributions.

  7. Bladder cancer mapping in Libya based on standardized morbidity ratio and log-normal model

    NASA Astrophysics Data System (ADS)

    Alhdiri, Maryam Ahmed; Samat, Nor Azah; Mohamed, Zulkifley

    2017-05-01

    Disease mapping comprises a set of statistical techniques that produce maps of rates based on estimated mortality, morbidity, and prevalence. A traditional approach to measuring the relative risk of a disease is called the Standardized Morbidity Ratio (SMR): the ratio of the observed to the expected number of cases in an area, which has the greatest uncertainty if the disease is rare or if the geographical area is small. Therefore, Bayesian models or statistical smoothing based on the log-normal model are introduced, which can address this weakness of the SMR. This study estimates the relative risk for bladder cancer incidence in Libya from 2006 to 2007 based on the SMR and the log-normal model, which were fitted to the data using WinBUGS software. The study starts with a brief review of these models, beginning with the SMR method and followed by the log-normal model, which is then applied to bladder cancer incidence in Libya. All results are compared using maps and tables. The study concludes that the log-normal model gives better relative risk estimates than the classical method and can overcome the SMR problem when there is no observed bladder cancer in an area.
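
    The SMR itself is simple to compute, and a toy calculation makes the instability described above visible. The sketch below (illustrative values only; the log-normal smoothing in the study was fitted in WinBUGS) returns observed/expected ratios and shows how a zero observed count or a tiny expected count produces unstable estimates.

```python
import numpy as np

def smr(observed, expected):
    """Standardized Morbidity Ratio per area: observed / expected counts.

    When observed = 0 the SMR collapses to 0 regardless of population
    size, and a small expected count makes the estimate very unstable;
    this is the weakness the log-normal (or other Bayesian) smoothing
    is meant to address.
    """
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return observed / expected

# Hypothetical areas: the third has no observed cases, the fourth a
# tiny expected count, illustrating the problem cases.
print(smr([12, 5, 0, 2], [10.0, 6.5, 3.2, 0.4]))
# -> [1.2  0.769...  0.  5.0]
```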

  8. Hold My Calls: An Activity for Introducing the Statistical Process

    ERIC Educational Resources Information Center

    Abel, Todd; Poling, Lisa

    2015-01-01

    Working with practicing teachers, this article demonstrates, through the facilitation of a statistical activity, how to introduce and investigate the unique qualities of the statistical process including: formulate a question, collect data, analyze data, and interpret data.

  9. Health Statistics

    MedlinePlus

    ... births, deaths, marriages, and divorces are sometimes called "vital statistics." Researchers use statistics to see patterns of diseases in groups of people. This can help in figuring out who is at risk for certain diseases, finding ways to control diseases and deciding which diseases ...

  10. The Reasoning behind Informal Statistical Inference

    ERIC Educational Resources Information Center

    Makar, Katie; Bakker, Arthur; Ben-Zvi, Dani

    2011-01-01

    Informal statistical inference (ISI) has been a frequent focus of recent research in statistics education. Considering the role that context plays in developing ISI calls into question the need to be more explicit about the reasoning that underpins ISI. This paper uses educational literature on informal statistical inference and philosophical…

  11. An improved approach for flight readiness certification: Methodology for failure risk assessment and application examples. Volume 2: Software documentation

    NASA Technical Reports Server (NTRS)

    Moore, N. R.; Ebbeler, D. H.; Newlin, L. E.; Sutharshana, S.; Creager, M.

    1992-01-01

    An improved methodology for quantitatively evaluating failure risk of spaceflight systems to assess flight readiness and identify risk control measures is presented. This methodology, called Probabilistic Failure Assessment (PFA), combines operating experience from tests and flights with engineering analysis to estimate failure risk. The PFA methodology is of particular value when information on which to base an assessment of failure risk, including test experience and knowledge of parameters used in engineering analyses of failure phenomena, is expensive or difficult to acquire. The PFA methodology is a prescribed statistical structure in which engineering analysis models that characterize failure phenomena are used conjointly with uncertainties about analysis parameters and/or modeling accuracy to estimate failure probability distributions for specific failure modes. These distributions can then be modified, by means of statistical procedures of the PFA methodology, to reflect any test or flight experience. Conventional engineering analysis models currently employed for design of failure prediction are used in this methodology. The PFA methodology is described and examples of its application are presented. Conventional approaches to failure risk evaluation for spaceflight systems are discussed, and the rationale for the approach taken in the PFA methodology is presented. The statistical methods, engineering models, and computer software used in fatigue failure mode applications are thoroughly documented.

  12. An improved approach for flight readiness certification: Methodology for failure risk assessment and application examples, volume 1

    NASA Technical Reports Server (NTRS)

    Moore, N. R.; Ebbeler, D. H.; Newlin, L. E.; Sutharshana, S.; Creager, M.

    1992-01-01

    An improved methodology for quantitatively evaluating failure risk of spaceflight systems to assess flight readiness and identify risk control measures is presented. This methodology, called Probabilistic Failure Assessment (PFA), combines operating experience from tests and flights with engineering analysis to estimate failure risk. The PFA methodology is of particular value when information on which to base an assessment of failure risk, including test experience and knowledge of parameters used in engineering analyses of failure phenomena, is expensive or difficult to acquire. The PFA methodology is a prescribed statistical structure in which engineering analysis models that characterize failure phenomena are used conjointly with uncertainties about analysis parameters and/or modeling accuracy to estimate failure probability distributions for specific failure modes. These distributions can then be modified, by means of statistical procedures of the PFA methodology, to reflect any test or flight experience. Conventional engineering analysis models currently employed for design of failure prediction are used in this methodology. The PFA methodology is described and examples of its application are presented. Conventional approaches to failure risk evaluation for spaceflight systems are discussed, and the rationale for the approach taken in the PFA methodology is presented. The statistical methods, engineering models, and computer software used in fatigue failure mode applications are thoroughly documented.

  13. Seeking for the rational basis of the median model: the optimal combination of multi-model ensemble results

    NASA Astrophysics Data System (ADS)

    Riccio, A.; Giunta, G.; Galmarini, S.

    2007-04-01

    In this paper we present an approach for the statistical analysis of multi-model ensemble results. The models considered here are operational long-range transport and dispersion models, also used for the real-time simulation of pollutant dispersion or the accidental release of radioactive nuclides. We first introduce the theoretical basis (with its roots sinking into the Bayes theorem) and then apply this approach to the analysis of model results obtained during the ETEX-1 exercise. We recover some interesting results, supporting the heuristic approach called "median model", originally introduced in Galmarini et al. (2004a, b). This approach also provides a way to systematically reduce (and quantify) model uncertainties, thus supporting the decision-making process and/or regulatory-purpose activities in a very effective manner.
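
    The heuristic "median model" referred to above reduces, in its simplest form, to taking the member-wise median of the ensemble predictions at every receptor point; the sketch below illustrates that operation on hypothetical concentrations and contrasts it with the ensemble mean, which an outlying member can pull away. The Bayesian analysis developed in the paper generalizes this simple operation.

```python
import numpy as np

# Hypothetical ensemble: 5 dispersion models x 4 receptor points,
# predicted surface concentrations (arbitrary units).
predictions = np.array([[1.0, 0.2, 3.5, 0.0],
                        [1.4, 0.1, 2.9, 0.1],
                        [0.9, 0.3, 4.1, 0.0],
                        [5.0, 0.2, 3.3, 0.2],   # one outlying member
                        [1.1, 0.2, 3.0, 0.1]])

# The "median model": member-wise median at every point, which damps
# the influence of outlying models compared with the ensemble mean.
median_model = np.median(predictions, axis=0)
ensemble_mean = predictions.mean(axis=0)
print(median_model)    # [1.1 0.2 3.3 0.1]
print(ensemble_mean)   # the outlier pulls the mean up at the first point
```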

  14. Seeking for the rational basis of the Median Model: the optimal combination of multi-model ensemble results

    NASA Astrophysics Data System (ADS)

    Riccio, A.; Giunta, G.; Galmarini, S.

    2007-12-01

    In this paper we present an approach for the statistical analysis of multi-model ensemble results. The models considered here are operational long-range transport and dispersion models, also used for the real-time simulation of pollutant dispersion or the accidental release of radioactive nuclides. We first introduce the theoretical basis (with its roots sinking into the Bayes theorem) and then apply this approach to the analysis of model results obtained during the ETEX-1 exercise. We recover some interesting results, supporting the heuristic approach called "median model", originally introduced in Galmarini et al. (2004a, b). This approach also provides a way to systematically reduce (and quantify) model uncertainties, thus supporting the decision-making process and/or regulatory-purpose activities in a very effective manner.

  15. Study of cliff activity dominating the gas and dust comae of comet 67P/Churyumov-Gerasimenko during the early phase of the Rosetta mission using ROSINA/COPS and OSIRIS data

    NASA Astrophysics Data System (ADS)

    Marschall, Raphael; Su, Cheng-Chin; Liao, Ying; Rubin, Martin; Wu, Jong-Shinn; Thomas, Nicolas; Altwegg, Kathrin; Sierks, Holger; OSIRIS, ROSINA

    2016-10-01

    The study by [1] has proposed the idea that the cometary dust jets in the northern hemisphere of comet 67P/Churyumov-Gerasimenko arise mainly from rough cliff-like terrain. Using our 3D gas and dust dynamics coma model [2] we have run simulations targeting the question of whether areas with high gravitational slopes alone can indeed account for both the ROSINA/COPS and the OSIRIS data obtained for mid August to end October 2014. The basis of our simulations is the shape model "SHAP4S" of [3]. Surface temperatures have been defined using a simple 1-D thermal model (including insolation, shadowing, thermal emission, and sublimation but neglecting conduction) computed for each facet of the shape model, allowing a consistent and known description of the gas flux and its initial temperature. In a next step we use the DSMC program PDSC++ [4] to calculate the gas properties in 3D space. The gas solution can be compared with the in situ measurements by ROSINA/COPS. In a subsequent step dust particles are introduced into the gas flow to determine dust densities and, with a column integrator and Mie theory, dust brightnesses that can be compared to OSIRIS data. To examine cliff activity we have divided the surface into two sets: one with gravitational slopes larger than 30°, which we call cliffs, and one with slopes less than 30°, which we call plains. We have set up two models, "cliffs only" and "plains only", where the respective set of areas is active and the others inert. The outgassing areas are assumed to be purely insolation driven. The "cliffs only" model is a statistically equally good fit to the ROSINA/COPS data as the global insolation-driven model presented in [2]. The "plains only" model, on the other hand, is statistically inferior to the "cliffs only" model. We found in [2] that increased activity in the Hapi region of the comet (called the inhomogeneous model) improves the fit of the gas results significantly. We can show in this study that a "cliffs + Hapi" model fits the ROSINA/COPS data equally as well as the inhomogeneous model. These results are consistent with OSIRIS data. [1] Vincent et al., 2016, A&A, 587, A14; [2] Marschall et al., 2016, A&A, 589, A90; [3] Preusker et al., 2015, A&A, 583, A33; [4] Su, C. C., 2013

  16. The development of ensemble theory. A new glimpse at the history of statistical mechanics

    NASA Astrophysics Data System (ADS)

    Inaba, Hajime

    2015-12-01

    This paper investigates the history of statistical mechanics from the viewpoint of the development of the ensemble theory from 1871 to 1902. In 1871, Ludwig Boltzmann introduced a prototype model of an ensemble that represents a polyatomic gas. In 1879, James Clerk Maxwell defined an ensemble as copies of systems of the same energy. Inspired by H.W. Watson, he called his approach "statistical". Boltzmann and Maxwell regarded the ensemble theory as a much more general approach than the kinetic theory. In the 1880s, influenced by Hermann von Helmholtz, Boltzmann made use of ensembles to establish thermodynamic relations. In Elementary Principles in Statistical Mechanics of 1902, Josiah Willard Gibbs tried to get his ensemble theory to mirror thermodynamics, including thermodynamic operations in its scope. Thermodynamics played the role of a "blind guide". His theory of ensembles can be characterized as more mathematically oriented than Einstein's theory proposed in the same year. Mechanical, empirical, and statistical approaches to foundations of statistical mechanics are presented. Although it was formulated in classical terms, the ensemble theory provided an infrastructure still valuable in quantum statistics because of its generality.

  17. A Method of Relating General Circulation Model Simulated Climate to the Observed Local Climate. Part I: Seasonal Statistics.

    NASA Astrophysics Data System (ADS)

    Karl, Thomas R.; Wang, Wei-Chyung; Schlesinger, Michael E.; Knight, Richard W.; Portman, David

    1990-10-01

    Important surface observations such as the daily maximum and minimum temperature, daily precipitation, and cloud ceilings often have localized characteristics that are difficult to reproduce with the current resolution and the physical parameterizations in state-of-the-art General Circulation climate Models (GCMs). Many of the difficulties can be partially attributed to mismatches in scale, local topography, regional geography and boundary conditions between models and surface-based observations. Here, we present a method, called climatological projection by model statistics (CPMS), to relate GCM grid-point free-atmosphere statistics, the predictors, to these important local surface observations. The method can be viewed as a generalization of the model output statistics (MOS) and perfect prog (PP) procedures used in numerical weather prediction (NWP) models. It consists of the application of three statistical methods: 1) principal component analysis (PCA), 2) canonical correlation, and 3) inflated regression analysis. The PCA reduces the redundancy of the predictors. The canonical correlation is used to develop simultaneous relationships between linear combinations of the predictors, the canonical variables, and the surface-based observations. Finally, inflated regression is used to relate the important canonical variables to each of the surface-based observed variables. We demonstrate that even an early version of the Oregon State University two-level atmospheric GCM (with prescribed sea surface temperature) produces free-atmosphere statistics that can, when standardized using the model's internal means and variances (the MOS-like version of CPMS), closely approximate the observed local climate. When the model data are standardized by the observed free-atmosphere means and variances (the PP version of CPMS), however, the model does not reproduce the observed surface climate as well. Our results indicate that in the MOS-like version of CPMS the differences between the output of a ten-year GCM control run and the surface-based observations are often smaller than the differences between the observations of two ten-year periods. Such positive results suggest that GCMs may already contain important climatological information that can be used to infer the local climate.
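
    A rough, synthetic-data sketch of the three CPMS steps is given below using scikit-learn: PCA on the free-atmosphere predictors, canonical correlation between the leading components and the surface variable, and a regression whose predictions are rescaled ("inflated") so their variance matches the observed variance. The data, component counts, and inflation rule are simplifications introduced here, not the configuration used in the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import CCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 20))                       # GCM free-atmosphere predictors
y = X[:, :3] @ np.array([1.0, -0.5, 0.3]) + rng.normal(scale=0.5, size=120)
y = y.reshape(-1, 1)                                 # local surface observable

# 1) PCA to remove redundancy among the predictors.
pcs = PCA(n_components=5).fit_transform(X)

# 2) Canonical correlation between the PCs and the surface variable.
cca = CCA(n_components=1).fit(pcs, y)
u, _ = cca.transform(pcs, y)                         # canonical variable

# 3) "Inflated" regression: rescale the predictions so their variance
#    matches the observed variance (plain OLS would underestimate it).
pred = LinearRegression().fit(u, y).predict(u)
inflated = y.mean() + (pred - pred.mean()) * (y.std() / pred.std())
print(np.corrcoef(inflated.ravel(), y.ravel())[0, 1])
```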

  18. First Polarized Power Spectra from HERA-19 Commissioning Data: Comparison with Simulations

    NASA Astrophysics Data System (ADS)

    Igarashi, Amy; Chichura, Paul; Fox Fortino, Austin; Kohn, Saul; Aguirre, James; HERA Collaboration, CHAMP

    2018-01-01

    The Hydrogen Epoch of Reionization Array (HERA) is a radio telescope whose primary goal is the detection of redshifted 21-cm line radiation produced from the spin-flip transition of HI during the Epoch of Reionization (EoR). HERA is currently under construction in South Africa, and will eventually be an array of 350 14-m antennas. HERA aims for a statistical detection of the power spectrum of this emission, using the so-called delay spectrum technique (Parsons et al. 2012). We examine a first season of commissioning data from the first 19 elements (HERA-19) to characterize Galactic and extragalactic foregrounds. We compare delay spectra for HERA-19 constructed from data to those constructed from simulations done using a detailed instrument electromagnetic model and the unpolarized Global Sky Model (GSM2008). We compare the data and simulations to explore the effects of Stokes-I to Q and U leakage, and further examine whether statistical models of polarization match the observed polarized power spectra.

  19. Optimized design and analysis of preclinical intervention studies in vivo

    PubMed Central

    Laajala, Teemu D.; Jumppanen, Mikael; Huhtaniemi, Riikka; Fey, Vidal; Kaur, Amanpreet; Knuuttila, Matias; Aho, Eija; Oksala, Riikka; Westermarck, Jukka; Mäkelä, Sari; Poutanen, Matti; Aittokallio, Tero

    2016-01-01

    Recent reports have called into question the reproducibility, validity and translatability of the preclinical animal studies due to limitations in their experimental design and statistical analysis. To this end, we implemented a matching-based modelling approach for optimal intervention group allocation, randomization and power calculations, which takes full account of the complex animal characteristics at baseline prior to interventions. In prostate cancer xenograft studies, the method effectively normalized the confounding baseline variability, and resulted in animal allocations which were supported by RNA-seq profiling of the individual tumours. The matching information increased the statistical power to detect true treatment effects at smaller sample sizes in two castration-resistant prostate cancer models, thereby leading to saving of both animal lives and research costs. The novel modelling approach and its open-source and web-based software implementations enable the researchers to conduct adequately-powered and fully-blinded preclinical intervention studies, with the aim to accelerate the discovery of new therapeutic interventions. PMID:27480578

  20. Optimized design and analysis of preclinical intervention studies in vivo.

    PubMed

    Laajala, Teemu D; Jumppanen, Mikael; Huhtaniemi, Riikka; Fey, Vidal; Kaur, Amanpreet; Knuuttila, Matias; Aho, Eija; Oksala, Riikka; Westermarck, Jukka; Mäkelä, Sari; Poutanen, Matti; Aittokallio, Tero

    2016-08-02

    Recent reports have called into question the reproducibility, validity and translatability of the preclinical animal studies due to limitations in their experimental design and statistical analysis. To this end, we implemented a matching-based modelling approach for optimal intervention group allocation, randomization and power calculations, which takes full account of the complex animal characteristics at baseline prior to interventions. In prostate cancer xenograft studies, the method effectively normalized the confounding baseline variability, and resulted in animal allocations which were supported by RNA-seq profiling of the individual tumours. The matching information increased the statistical power to detect true treatment effects at smaller sample sizes in two castration-resistant prostate cancer models, thereby leading to saving of both animal lives and research costs. The novel modelling approach and its open-source and web-based software implementations enable the researchers to conduct adequately-powered and fully-blinded preclinical intervention studies, with the aim to accelerate the discovery of new therapeutic interventions.

  1. Learning coefficient of generalization error in Bayesian estimation and Vandermonde matrix-type singularity.

    PubMed

    Aoyagi, Miki; Nagata, Kenji

    2012-06-01

    The term algebraic statistics arises from the study of probabilistic models and techniques for statistical inference using methods from algebra and geometry (Sturmfels, 2009). The purpose of our study is to consider the generalization error and stochastic complexity in learning theory by using the log-canonical threshold in algebraic geometry. Such thresholds correspond to the main term of the generalization error in Bayesian estimation, which is called a learning coefficient (Watanabe, 2001a, 2001b). The learning coefficient serves to measure the learning efficiencies in hierarchical learning models. In this letter, we consider learning coefficients for Vandermonde matrix-type singularities, by using a new approach: focusing on the generators of the ideal, which defines singularities. We give tight new bound values of learning coefficients for the Vandermonde matrix-type singularities and the explicit values with certain conditions. By applying our results, we can show the learning coefficients of three-layered neural networks and normal mixture models.

  2. Statistical dynamics of religion evolutions

    NASA Astrophysics Data System (ADS)

    Ausloos, M.; Petroni, F.

    2009-10-01

    A religion affiliation can be considered as a "degree of freedom" of an agent on the human genre network. A brief review is given on the state of the art in data analysis and modelization of religious "questions" in order to suggest and if possible initiate further research, after using a "statistical physics filter". We present a discussion of the evolution of 18 so-called religions, as measured through their number of adherents between 1900 and 2000. Some emphasis is made on a few cases presenting a minimum or a maximum in the investigated time range, thereby suggesting a competitive ingredient to be considered, besides the well accepted "at birth" attachment effect. The importance of the "external field" is still stressed through an Avrami late stage crystal growth-like parameter. The observed features and some intuitive interpretations point to opinion based models with vector, rather than scalar, like agents.

  3. Blind prediction of natural video quality.

    PubMed

    Saad, Michele A; Bovik, Alan C; Charrier, Christophe

    2014-03-01

    We propose a blind (no reference or NR) video quality evaluation model that is nondistortion specific. The approach relies on a spatio-temporal model of video scenes in the discrete cosine transform domain, and on a model that characterizes the type of motion occurring in the scenes, to predict video quality. We use the models to define video statistics and perceptual features that are the basis of a video quality assessment (VQA) algorithm that does not require the presence of a pristine video to compare against in order to predict a perceptual quality score. The contributions of this paper are threefold. 1) We propose a spatio-temporal natural scene statistics (NSS) model for videos. 2) We propose a motion model that quantifies motion coherency in video scenes. 3) We show that the proposed NSS and motion coherency models are appropriate for quality assessment of videos, and we utilize them to design a blind VQA algorithm that correlates highly with human judgments of quality. The proposed algorithm, called video BLIINDS, is tested on the LIVE VQA database and on the EPFL-PoliMi video database and shown to perform close to the level of top performing reduced and full reference VQA algorithms.

  4. On the Emergent Constraints of Climate Sensitivity [On proposed emergent constraints of climate sensitivity]

    DOE PAGES

    Qu, Xin; Hall, Alex; DeAngelis, Anthony M.; ...

    2018-01-11

    Differences among climate models in equilibrium climate sensitivity (ECS; the equilibrium surface temperature response to a doubling of atmospheric CO2) remain a significant barrier to the accurate assessment of societally important impacts of climate change. Relationships between ECS and observable metrics of the current climate in model ensembles, so-called emergent constraints, have been used to constrain ECS. Here a statistical method (including a backward selection process) is employed to achieve a better statistical understanding of the connections between four recently proposed emergent constraint metrics and individual feedbacks influencing ECS. The relationship between each metric and ECS is largely attributable to a statistical connection with shortwave low cloud feedback, the leading cause of intermodel ECS spread. This result bolsters confidence in some of the metrics, which had assumed such a connection in the first place. Additional analysis is conducted with a few thousand artificial metrics that are randomly generated but are well correlated with ECS. The relationships between the contrived metrics and ECS can also be linked statistically to shortwave cloud feedback. Thus, any proposed or forthcoming ECS constraint based on the current generation of climate models should be viewed as a potential constraint on shortwave cloud feedback, and physical links with that feedback should be investigated to verify that the constraint is real. Additionally, any proposed ECS constraint should not be taken at face value since other factors influencing ECS besides shortwave cloud feedback could be systematically biased in the models.

  5. On the Emergent Constraints of Climate Sensitivity [On proposed emergent constraints of climate sensitivity]

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Qu, Xin; Hall, Alex; DeAngelis, Anthony M.

    Differences among climate models in equilibrium climate sensitivity (ECS; the equilibrium surface temperature response to a doubling of atmospheric CO2) remain a significant barrier to the accurate assessment of societally important impacts of climate change. Relationships between ECS and observable metrics of the current climate in model ensembles, so-called emergent constraints, have been used to constrain ECS. Here a statistical method (including a backward selection process) is employed to achieve a better statistical understanding of the connections between four recently proposed emergent constraint metrics and individual feedbacks influencing ECS. The relationship between each metric and ECS is largely attributable to a statistical connection with shortwave low cloud feedback, the leading cause of intermodel ECS spread. This result bolsters confidence in some of the metrics, which had assumed such a connection in the first place. Additional analysis is conducted with a few thousand artificial metrics that are randomly generated but are well correlated with ECS. The relationships between the contrived metrics and ECS can also be linked statistically to shortwave cloud feedback. Thus, any proposed or forthcoming ECS constraint based on the current generation of climate models should be viewed as a potential constraint on shortwave cloud feedback, and physical links with that feedback should be investigated to verify that the constraint is real. Additionally, any proposed ECS constraint should not be taken at face value since other factors influencing ECS besides shortwave cloud feedback could be systematically biased in the models.

  6. Modelling a real-world buried valley system with vertical non-stationarity using multiple-point statistics

    NASA Astrophysics Data System (ADS)

    He, Xiulan; Sonnenborg, Torben O.; Jørgensen, Flemming; Jensen, Karsten H.

    2017-03-01

    Stationarity has traditionally been a requirement of geostatistical simulations. A common way to deal with non-stationarity is to divide the system into stationary sub-regions and subsequently merge the realizations for each region. Recently, the so-called partition approach that has the flexibility to model non-stationary systems directly was developed for multiple-point statistics simulation (MPS). The objective of this study is to apply the MPS partition method with conventional borehole logs and high-resolution airborne electromagnetic (AEM) data, for simulation of a real-world non-stationary geological system characterized by a network of connected buried valleys that incise deeply into layered Miocene sediments (case study in Denmark). The results show that, based on fragmented information of the formation boundaries, the MPS partition method is able to simulate a non-stationary system including valley structures embedded in a layered Miocene sequence in a single run. Besides, statistical information retrieved from the AEM data improved the simulation of the geology significantly, especially for the deep-seated buried valley sediments where borehole information is sparse.

  7. Machine learning to analyze images of shocked materials for precise and accurate measurements

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dresselhaus-Cooper, Leora; Howard, Marylesa; Hock, Margaret C.

    A supervised machine learning algorithm, called locally adaptive discriminant analysis (LADA), has been developed to locate boundaries between identifiable image features that have varying intensities. LADA is an adaptation of image segmentation, which includes techniques that find the positions of image features (classes) using statistical intensity distributions for each class in the image. In order to place a pixel in the proper class, LADA considers the intensity at that pixel and the distribution of intensities in local (nearby) pixels. This paper presents the use of LADA to provide, with statistical uncertainties, the positions and shapes of features within ultrafast images of shock waves. We demonstrate the ability to locate image features including crystals, density changes associated with shock waves, and material jetting caused by shock waves. This algorithm can analyze images that exhibit a wide range of physical phenomena because it does not rely on comparison to a model. LADA enables analysis of images from shock physics with statistical rigor independent of underlying models or simulations.

  8. Regularized learning of linear ordered-statistic constant false alarm rate filters (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Havens, Timothy C.; Cummings, Ian; Botts, Jonathan; Summers, Jason E.

    2017-05-01

    The linear ordered statistic (LOS) is a parameterized ordered statistic (OS) that is a weighted average of a rank-ordered sample. LOS operators are useful generalizations of aggregation as they can represent any linear aggregation, from minimum to maximum, including conventional aggregations, such as mean and median. In the fuzzy logic field, these aggregations are called ordered weighted averages (OWAs). Here, we present a method for learning LOS operators from training data, viz., data for which you know the output of the desired LOS. We then extend the learning process with regularization, such that a lower complexity or sparse LOS can be learned. Hence, we discuss what 'lower complexity' means in this context and how to represent that in the optimization procedure. Finally, we apply our learning methods to the well-known constant-false-alarm-rate (CFAR) detection problem, specifically for the case of background levels modeled by long-tailed distributions, such as the K-distribution. These backgrounds arise in several pertinent imaging problems, including the modeling of clutter in synthetic aperture radar and sonar (SAR and SAS) and in wireless communications.
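    Because the LOS is just a weighted average of the rank-ordered sample, a minimal numpy sketch may help fix ideas; the weight vectors for max, min, median, and mean shown below are standard OWA examples and are not taken from the paper.

```python
# Minimal sketch of a linear ordered statistic (OWA-style aggregation):
# sort the sample in descending rank order and take a weighted average.
import numpy as np

def los(x, w):
    """Linear ordered statistic: normalized weights w applied to the sorted sample."""
    x_sorted = np.sort(x)[::-1]          # descending rank order
    w = np.asarray(w, dtype=float)
    return float(np.dot(w / w.sum(), x_sorted))

x = np.array([3.0, 9.0, 1.0, 7.0, 5.0])
print(los(x, [1, 0, 0, 0, 0]))   # -> 9.0, behaves like max
print(los(x, [0, 0, 0, 0, 1]))   # -> 1.0, behaves like min
print(los(x, [0, 0, 1, 0, 0]))   # -> 5.0, behaves like median
print(los(x, np.ones(5)))        # -> 5.0, ordinary mean
```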

  9. Calls to Florida Poison Control Centers about mercury: Trends over 2003-2013.

    PubMed

    Gribble, Matthew O; Deshpande, Aniruddha; Stephan, Wendy B; Hunter, Candis M; Weisman, Richard S

    2017-11-01

    The aim of this analysis was to contrast trends in exposure-report calls and informational queries (a measure of public interest) about mercury to the Florida Poison Control Centers over 2003-2013. Poison-control specialists coded calls to Florida Poison Control Centers by substance of concern, caller demographics, and whether the call pertained to an exposure event or was an informational query. For the present study, call records regarding mercury were de-identified and provided along with daily total number of calls for statistical analysis. We fit Poisson models using generalized estimating equations to summarize changes across years in counts of daily calls to Florida Poison Control Centers, adjusting for month. In a second stage of analysis, we further adjusted for the total number of calls each day. We also conducted analyses stratified by age of the exposed. There was an overall decrease over 2003-2013 in the number of total calls about mercury [Ratio per year: 0.89, 95% CI: (0.88, 0.90)], and calls about mercury exposure [Ratio per year: 0.84, 95% CI: (0.83, 0.85)], but the number of informational queries about mercury increased over this time [Ratio per year: 1.15 (95% CI: 1.12, 1.18)]. After adjusting for the number of calls of that type each day (e.g., call volume), the associations remained similar: a ratio of 0.88 (95% CI: 0.87, 0.89) per year for total calls, 0.85 (0.83, 0.86) for exposure-related calls, and 1.17 (1.14, 1.21) for informational queries. Although the number of exposure-related calls decreased, informational queries increased over 2003-2013. This might suggest an increased public interest in mercury health risks despite a decrease in reported exposures over this time period. Copyright © 2017 Elsevier Inc. All rights reserved.
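    A hedged sketch of the kind of Poisson regression with generalized estimating equations described above, using statsmodels; the synthetic daily counts, the month-based grouping, and the column names are illustrative assumptions, not the Florida Poison Control Center records or the study's exact specification.

```python
# Illustrative sketch: Poisson GEE for daily call counts with a year trend and
# month adjustment (synthetic data standing in for the poison-center records).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
days = pd.date_range("2003-01-01", "2013-12-31", freq="D")
df = pd.DataFrame({"year": days.year - 2003, "month": days.month})
# synthetic counts declining ~11% per year, roughly matching the reported ratio
df["calls"] = rng.poisson(3.0 * 0.89 ** df["year"])

X = sm.add_constant(
    pd.get_dummies(df["month"], prefix="m", drop_first=True)
      .assign(year=df["year"]).astype(float)
)
model = sm.GEE(df["calls"], X, groups=df["month"],
               family=sm.families.Poisson(),
               cov_struct=sm.cov_struct.Exchangeable())
res = model.fit()
print(np.exp(res.params["year"]))   # multiplicative change in calls per year
```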

  10. Trends and fluctuations in the severity of interstate wars

    PubMed Central

    Clauset, Aaron

    2018-01-01

    Since 1945, there have been relatively few large interstate wars, especially compared to the preceding 30 years, which included both World Wars. This pattern, sometimes called the long peace, is highly controversial. Does it represent an enduring trend caused by a genuine change in the underlying conflict-generating processes? Or is it consistent with a highly variable but otherwise stable system of conflict? Using the empirical distributions of interstate war sizes and onset times from 1823 to 2003, we parameterize stationary models of conflict generation that can distinguish trends from statistical fluctuations in the statistics of war. These models indicate that both the long peace and the period of great violence that preceded it are not statistically uncommon patterns in realistic but stationary conflict time series. This fact does not detract from the importance of the long peace or the proposed mechanisms that explain it. However, the models indicate that the postwar pattern of peace would need to endure at least another 100 to 140 years to become a statistically significant trend. This fact places an implicit upper bound on the magnitude of any change in the true likelihood of a large war after the end of the Second World War. The historical patterns of war thus seem to imply that the long peace may be substantially more fragile than proponents believe, despite recent efforts to identify mechanisms that reduce the likelihood of interstate wars. PMID:29507877

  11. Humidity-corrected Arrhenius equation: The reference condition approach.

    PubMed

    Naveršnik, Klemen; Jurečič, Rok

    2016-03-16

    Accelerated and stress stability data is often used to predict shelf life of pharmaceuticals. Temperature, combined with humidity, accelerates chemical decomposition and the Arrhenius equation is used to extrapolate accelerated stability results to long-term stability. Statistical estimation of the humidity-corrected Arrhenius equation is not straightforward due to its non-linearity. A two stage nonlinear fitting approach is used in practice, followed by a prediction stage. We developed a single-stage statistical procedure, called the reference condition approach, which has better statistical properties (less collinearity, direct estimation of uncertainty, narrower prediction interval) and is significantly easier to use, compared to the existing approaches. Our statistical model was populated with data from a 35-day stress stability study on a laboratory batch of vitamin tablets and required a mere 30 laboratory assay determinations. The stability prediction agreed well with the actual 24-month long term stability of the product. The approach has high potential to assist product formulation, specification setting and stability statements. Copyright © 2016 Elsevier B.V. All rights reserved.
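    The humidity-corrected Arrhenius model is commonly written as k = A·exp(-Ea/RT)·exp(B·RH); a hedged curve-fit sketch follows, reparameterized around a reference temperature and humidity to reduce collinearity. The exact "reference condition" parameterization of the paper is not reproduced; the functional form, reference point, and synthetic data below are assumptions for illustration only.

```python
# Sketch of fitting a humidity-corrected Arrhenius model, reparameterized
# around a reference condition (Tref, RHref). Form and data are illustrative.
import numpy as np
from scipy.optimize import curve_fit

R = 8.314                     # gas constant, J mol-1 K-1
Tref, RHref = 298.15, 60.0    # hypothetical reference condition

def rate(X, k_ref, Ea, B):
    T, RH = X
    return k_ref * np.exp(-Ea / R * (1.0 / T - 1.0 / Tref) + B * (RH - RHref))

# synthetic stress-stability data: degradation rate at several T/RH conditions
T = np.array([313.15, 323.15, 333.15, 333.15, 343.15])
RH = np.array([75.0, 60.0, 40.0, 75.0, 25.0])
k_obs = rate((T, RH), 0.02, 90e3, 0.02) * np.exp(np.random.default_rng(2).normal(0, 0.05, T.size))

popt, pcov = curve_fit(rate, (T, RH), k_obs, p0=[0.01, 80e3, 0.01])
print(dict(zip(["k_ref", "Ea", "B"], popt)))
```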

  12. Statistical Inference for Porous Materials using Persistent Homology.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moon, Chul; Heath, Jason E.; Mitchell, Scott A.

    2017-12-01

    We propose a porous materials analysis pipeline using persistent homology. We first compute persistent homology of binarized 3D images of sampled material subvolumes. For each image we compute sets of homology intervals, which are represented as summary graphics called persistence diagrams. We convert persistence diagrams into image vectors in order to analyze the similarity of the homology of the material images using the mature tools for image analysis. Each image is treated as a vector and we compute its principal components to extract features. We fit a statistical model using the loadings of principal components to estimate material porosity, permeability, anisotropy, and tortuosity. We also propose an adaptive version of the structural similarity index (SSIM), a similarity metric for images, as a measure to determine the statistical representative elementary volumes (sREV) for persistence homology. Thus we provide a capability for making a statistical inference of the fluid flow and transport properties of porous materials based on their geometry and connectivity.

  13. On statistical independence of a contingency matrix

    NASA Astrophysics Data System (ADS)

    Tsumoto, Shusaku; Hirano, Shoji

    2005-03-01

    A contingency table summarizes the conditional frequencies of two attributes and shows how these two attributes are dependent on each other with the information on a partition of universe generated by these attributes. Thus, this table can be viewed as a relation between two attributes with respect to information granularity. This paper focuses on several characteristics of linear and statistical independence in a contingency table from the viewpoint of granular computing, which shows that statistical independence in a contingency table is a special form of linear dependence. The discussions also show that when a contingency table is viewed as a matrix, called a contingency matrix, its rank is equal to 1.0. Thus, the degree of independence, rank, plays a very important role in extracting a probabilistic model from a given contingency table. Furthermore, it is found that in some cases, partial rows or columns will satisfy the condition of statistical independence, which can be viewed as a solving process of Diophantine equations.
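    The rank-1 characterization of statistical independence can be illustrated directly: a contingency table whose cells follow independent marginals is an outer product of those marginals, so its matrix rank is 1, while any departure from independence raises the rank. A worked numpy example:

```python
# Worked example: a contingency table built from independent marginals is an
# outer product, hence has matrix rank 1; perturbing a cell breaks independence
# and raises the rank.
import numpy as np

row_marginal = np.array([0.2, 0.3, 0.5])
col_marginal = np.array([0.4, 0.6])

independent = 1000 * np.outer(row_marginal, col_marginal)   # expected counts under independence
dependent = independent.copy()
dependent[0, 0] += 50                                       # shift mass between two cells
dependent[0, 1] -= 50

print(np.linalg.matrix_rank(independent))   # 1
print(np.linalg.matrix_rank(dependent))     # 2
```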

  14. Acoustic Mechanisms of a Species-Based Discrimination of the chick-a-dee Call in Sympatric Black-Capped (Poecile atricapillus) and Mountain Chickadees (P. gambeli).

    PubMed

    Guillette, Lauren M; Farrell, Tara M; Hoeschele, Marisa; Sturdy, Christopher B

    2010-01-01

    Previous perceptual research with black-capped and mountain chickadees has demonstrated that these species treat each other's namesake chick-a-dee calls as belonging to separate, open-ended categories. Further, the terminal dee portion of the call has been implicated as the most prominent species marker. However, statistical classification using acoustic summary features suggests that all note-types contained within the chick-a-dee call should be sufficient for species classification. The current study seeks to better understand the note-type based mechanisms underlying species-based classification of the chick-a-dee call by black-capped and mountain chickadees. In two, complementary, operant discrimination experiments, both species were trained to discriminate the species of the signaler using either entire chick-a-dee calls, or individual note-types from chick-a-dee calls. In agreement with previous perceptual work we find that the D note had significant stimulus control over species-based discrimination. However, in line with statistical classifications, we find that all note-types carry species information. We discuss reasons why the most easily discriminated note-types are likely candidates to carry species-based cues.

  15. Testing alternative ground water models using cross-validation and other methods

    USGS Publications Warehouse

    Foglia, L.; Mehl, S.W.; Hill, M.C.; Perona, P.; Burlando, P.

    2007-01-01

    Many methods can be used to test alternative ground water models. Of concern in this work are methods able to (1) rank alternative models (also called model discrimination) and (2) identify observations important to parameter estimates and predictions (equivalent to the purpose served by some types of sensitivity analysis). Some of the measures investigated are computationally efficient; others are computationally demanding. The latter are generally needed to account for model nonlinearity. The efficient model discrimination methods investigated include the information criteria: the corrected Akaike information criterion, Bayesian information criterion, and generalized cross-validation. The efficient sensitivity analysis measures used are dimensionless scaled sensitivity (DSS), composite scaled sensitivity, and parameter correlation coefficient (PCC); the other statistics are DFBETAS, Cook's D, and observation-prediction statistic. Acronyms are explained in the introduction. Cross-validation (CV) is a computationally intensive nonlinear method that is used for both model discrimination and sensitivity analysis. The methods are tested using up to five alternative parsimoniously constructed models of the ground water system of the Maggia Valley in southern Switzerland. The alternative models differ in their representation of hydraulic conductivity. A new method for graphically representing CV and sensitivity analysis results for complex models is presented and used to evaluate the utility of the efficient statistics. The results indicate that for model selection, the information criteria produce similar results at much smaller computational cost than CV. For identifying important observations, the only obviously inferior linear measure is DSS; the poor performance was expected because DSS does not include the effects of parameter correlation and PCC reveals large parameter correlations. © 2007 National Ground Water Association.
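    For reference, the information criteria used for model discrimination can be computed from a least-squares fit in a few lines. The sketch below assumes Gaussian errors and generic residual vectors; it is not tied to the ground water models of the study.

```python
# Generic sketch: corrected AIC (AICc) and BIC from residuals of a least-squares
# fit (Gaussian errors assumed), as used for ranking models of differing complexity.
import numpy as np

def information_criteria(residuals, k):
    """k = number of estimated parameters (including the error variance)."""
    n = residuals.size
    sse = float(np.sum(residuals ** 2))
    log_lik = -0.5 * n * (np.log(2 * np.pi * sse / n) + 1)   # maximized Gaussian log-likelihood
    aic = 2 * k - 2 * log_lik
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    bic = k * np.log(n) - 2 * log_lik
    return aicc, bic

rng = np.random.default_rng(3)
res_simple = rng.normal(0, 1.2, 50)    # residuals from a 3-parameter model (illustrative)
res_complex = rng.normal(0, 1.0, 50)   # residuals from a 7-parameter model (illustrative)
print("simple :", information_criteria(res_simple, 3))
print("complex:", information_criteria(res_complex, 7))
```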

  16. Optimal Weights Mixed Filter for removing mixture of Gaussian and impulse noises

    PubMed Central

    Grama, Ion; Liu, Quansheng

    2017-01-01

    In this paper we consider the problem of restoration of an image contaminated by a mixture of Gaussian and impulse noises. We propose a new statistic called ROADGI which improves the well-known Rank-Ordered Absolute Differences (ROAD) statistic for detecting points contaminated with the impulse noise in this context. Combining the ROADGI statistic with the method of weights optimization we obtain a new algorithm called Optimal Weights Mixed Filter (OWMF) to deal with the mixed noise. Our simulation results show that the proposed filter is effective for mixed noises, as well as for single impulse noise and for single Gaussian noise. PMID:28692667

  17. Optimal Weights Mixed Filter for removing mixture of Gaussian and impulse noises.

    PubMed

    Jin, Qiyu; Grama, Ion; Liu, Quansheng

    2017-01-01

    In this paper we consider the problem of restoration of an image contaminated by a mixture of Gaussian and impulse noises. We propose a new statistic called ROADGI which improves the well-known Rank-Ordered Absolute Differences (ROAD) statistic for detecting points contaminated with the impulse noise in this context. Combining the ROADGI statistic with the method of weights optimization we obtain a new algorithm called Optimal Weights Mixed Filter (OWMF) to deal with the mixed noise. Our simulation results show that the proposed filter is effective for mixed noises, as well as for single impulse noise and for single Gaussian noise.
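    The underlying ROAD statistic is simple to state: for each pixel, take the absolute differences to its eight neighbours, sort them, and sum the smallest m (m = 4 is a common choice); a large value flags a likely impulse. A minimal sketch of ROAD itself (not the authors' ROADGI or OWMF code):

```python
# Minimal sketch of the Rank-Ordered Absolute Differences (ROAD) statistic:
# for each interior pixel, sum the m smallest absolute differences to its
# 8 neighbours; impulse-corrupted pixels tend to have large ROAD values.
import numpy as np

def road(image, m=4):
    img = image.astype(float)
    out = np.zeros_like(img)
    offsets = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]
    for i in range(1, img.shape[0] - 1):
        for j in range(1, img.shape[1] - 1):
            diffs = sorted(abs(img[i + di, j + dj] - img[i, j]) for di, dj in offsets)
            out[i, j] = sum(diffs[:m])
    return out

img = np.full((5, 5), 100.0)
img[2, 2] = 255.0                          # a "salt" impulse pixel
print(road(img)[2, 2], road(img)[1, 1])    # large vs. small ROAD value
```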

  18. A statistical approach to understanding reproductive isolation in two sympatric species of tree crickets.

    PubMed

    Bhattacharya, Monisha; Isvaran, Kavita; Balakrishnan, Rohini

    2017-04-01

    In acoustically communicating animals, reproductive isolation between sympatric species is usually maintained through species-specific calls. This requires that the receiver be tuned to the conspecific signal. Mapping the response space of the receiver onto the signal space of the conspecific investigates this tuning. A combinatorial approach to investigating the response space is more informative as the influence on the receiver of the interactions between the features is also elucidated. However, most studies have examined individual preference functions rather than the multivariate response space. We studied the maintenance of reproductive isolation between two sympatric tree cricket species (Oecanthus henryi and Oecanthus indicus) through the temporal features of the calls. Individual response functions were determined experimentally for O. henryi, the results from which were combined in a statistical framework to generate a multivariate quantitative receiver response space. The predicted response was higher for the signals of the conspecific than for signals of the sympatric heterospecific, indicating maintenance of reproductive isolation through songs. The model allows prediction of response to untested combinations of temporal features as well as delineation of the evolutionary constraints on the signal space. The model can also be used to predict the response of O. henryi to other heterospecific signals, making it a useful tool for the study of the evolution and maintenance of reproductive isolation via long-range acoustic signals. © 2017. Published by The Company of Biologists Ltd.

  19. Using GAISE and NCTM Standards as Frameworks for Teaching Probability and Statistics to Pre-Service Elementary and Middle School Mathematics Teachers

    ERIC Educational Resources Information Center

    Metz, Mary Louise

    2010-01-01

    Statistics education has become an increasingly important component of the mathematics education of today's citizens. In part to address the call for a more statistically literate citizenship, The "Guidelines for Assessment and Instruction in Statistics Education (GAISE)" were developed in 2005 by the American Statistical Association. These…

  20. Region-specific network plasticity in simulated and living cortical networks: comparison of the center of activity trajectory (CAT) with other statistics

    NASA Astrophysics Data System (ADS)

    Chao, Zenas C.; Bakkum, Douglas J.; Potter, Steve M.

    2007-09-01

    Electrically interfaced cortical networks cultured in vitro can be used as a model for studying the network mechanisms of learning and memory. Lasting changes in functional connectivity have been difficult to detect with extracellular multi-electrode arrays using standard firing rate statistics. We used both simulated and living networks to compare the ability of various statistics to quantify functional plasticity at the network level. Using a simulated integrate-and-fire neural network, we compared five established statistical methods to one of our own design, called center of activity trajectory (CAT). CAT, which depicts dynamics of the location-weighted average of spatiotemporal patterns of action potentials across the physical space of the neuronal circuitry, was the most sensitive statistic for detecting tetanus-induced plasticity in both simulated and living networks. By reducing the dimensionality of multi-unit data while still including spatial information, CAT allows efficient real-time computation of spatiotemporal activity patterns. Thus, CAT will be useful for studies in vivo or in vitro in which the locations of recording sites on multi-electrode probes are important.
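    The center of activity is essentially a firing-rate-weighted centroid of electrode positions in each time bin; tracking it across bins gives the trajectory. A hedged sketch with a hypothetical electrode layout and synthetic spike counts (not the study's recordings) follows.

```python
# Illustrative sketch of a center-of-activity trajectory (CAT): per time bin,
# the spike-count-weighted centroid of electrode positions on the array.
import numpy as np

rng = np.random.default_rng(4)
n_electrodes, n_bins = 60, 100
positions = rng.uniform(0, 1, size=(n_electrodes, 2))     # (x, y) of each electrode, mm (synthetic)
counts = rng.poisson(2.0, size=(n_bins, n_electrodes))    # spikes per electrode per bin (synthetic)

def center_of_activity(counts, positions):
    totals = counts.sum(axis=1, keepdims=True).astype(float)
    totals[totals == 0] = 1.0                              # avoid division by zero in silent bins
    return counts @ positions / totals                     # (n_bins, 2) trajectory

cat = center_of_activity(counts, positions)
print(cat.shape, cat[:3])
```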

  1. Statistical mechanics of human resource allocation

    NASA Astrophysics Data System (ADS)

    Inoue, Jun-Ichi; Chen, He

    2014-03-01

    We provide a mathematical platform to investigate the network topology of agents, say, university graduates who are looking for their positions in labor markets. The basic model is described by the so-called Potts spin glass which is well-known in the research field of statistical physics. In the model, each Potts spin (a tiny magnet in atomic scale length) represents the action of each student, and it takes a discrete variable corresponding to the company he/she applies for. We construct the energy to include three distinct effects on the students' behavior, namely, collective effect, market history and international ranking of companies. In this model system, the correlations (the adjacent matrix) between students are taken into account through the pairwise spin-spin interactions. We carry out computer simulations to examine the efficiency of the model. We also show that some chiral representation of the Potts spin enables us to obtain some analytical insights into our labor markets. This work was financially supported by Grant-in-Aid for Scientific Research (C) of Japan Society for the Promotion of Science No. 25330278.

  2. STATISTICS OF THE VELOCITY GRADIENT TENSOR IN SPACE PLASMA TURBULENT FLOWS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Consolini, Giuseppe; Marcucci, Maria Federica; Pallocchia, Giuseppe

    2015-10-10

    In the last decade, significant advances have been presented for the theoretical characterization and experimental techniques used to measure and model all of the components of the velocity gradient tensor in the framework of fluid turbulence. Here, we attempt the evaluation of the small-scale velocity gradient tensor for a case study of space plasma turbulence, observed in the Earth's magnetosheath region by the CLUSTER mission. In detail, we investigate the joint statistics P(R, Q) of the velocity gradient geometric invariants R and Q, and find that this P(R, Q) is similar to that of the low end of the inertial range for fluid turbulence, with a pronounced increase in the statistics along the so-called Vieillefosse tail. In the context of hydrodynamics, this result is referred to as the dissipation/dissipation-production due to vortex stretching.
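    For a traceless velocity gradient tensor A, the invariants are usually defined as Q = -tr(A²)/2 and R = -det(A); a short numpy sketch shows how the joint statistics P(R, Q) could be accumulated from sampled tensors. The random tensors below stand in for measured gradients and are not CLUSTER data.

```python
# Sketch: invariants Q = -tr(A^2)/2 and R = -det(A) of a traceless velocity
# gradient tensor, accumulated into a joint histogram approximating P(R, Q).
import numpy as np

rng = np.random.default_rng(5)

def invariants(A):
    A = A - np.trace(A) / 3.0 * np.eye(3)      # remove the trace (incompressible part)
    Q = -0.5 * np.trace(A @ A)
    R = -np.linalg.det(A)
    return R, Q

samples = [invariants(rng.normal(0, 1, (3, 3))) for _ in range(20000)]
R, Q = np.array(samples).T
hist, r_edges, q_edges = np.histogram2d(R, Q, bins=60, density=True)   # joint P(R, Q)
print(hist.shape)
```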

  3. Associations Between the Department of Veterans Affairs' Suicide Prevention Campaign and Calls to Related Crisis Lines

    PubMed Central

    Bossarte, Robert M.; Lu, Naiji; Tu, Xin; Stephens, Brady; Draper, John; Kemp, Janet E.

    2014-01-01

    Objective The Transit Authority Suicide Prevention (TASP) campaign was launched by the Department of Veterans Affairs (VA) in a limited number of U.S. cities to promote the use of crisis lines among veterans of military service. Methods We obtained the daily number of calls to the VCL and National Suicide Prevention Lifeline (NSPL) for six implementation cities (where the campaign was active) and four control cities (where there was no TASP campaign messaging) for a 14-month period. To identify changes in call volume associated with campaign implementation, VCL and NSPL daily call counts for three time periods of equal length (pre-campaign, during campaign, and post-campaign) were modeled using a Poisson log-linear regression with inference based on the generalized estimating equations. Results Statistically significant increases in calls to both the VCL and the NSPL were reported during the TASP campaign in implementation cities, but were not reported in control cities during or following the campaign. Secondary outcome measures were also reported for the VCL and included the percentage of callers who are veterans, and calls resulting in a rescue during the study period. Conclusions Results from this study reveal some promise for suicide prevention messaging to promote the use of telephone crisis services and contribute to an emerging area of research examining the effects of campaigns on help seeking. PMID:25364053

  4. Associations between the Department of Veterans Affairs' suicide prevention campaign and calls to related crisis lines.

    PubMed

    Bossarte, Robert M; Karras, Elizabeth; Lu, Naiji; Tu, Xin; Stephens, Brady; Draper, John; Kemp, Janet E

    2014-01-01

    The Transit Authority Suicide Prevention (TASP) campaign was launched by the Department of Veterans Affairs (VA) in a limited number of U.S. cities to promote the use of crisis lines among veterans of military service. We obtained the daily number of calls to the VCL and National Suicide Prevention Lifeline (NSPL) for six implementation cities (where the campaign was active) and four control cities (where there was no TASP campaign messaging) for a 14-month period. To identify changes in call volume associated with campaign implementation, VCL and NSPL daily call counts for three time periods of equal length (pre-campaign, during campaign, and post-campaign) were modeled using a Poisson log-linear regression with inference based on the generalized estimating equations. Statistically significant increases in calls to both the VCL and the NSPL were reported during the TASP campaign in implementation cities, but were not reported in control cities during or following the campaign. Secondary outcome measures were also reported for the VCL and included the percentage of callers who are veterans, and calls resulting in a rescue during the study period. Results from this study reveal some promise for suicide prevention messaging to promote the use of telephone crisis services and contribute to an emerging area of research examining the effects of campaigns on help seeking.

  5. Fiber Breakage Model for Carbon Composite Stress Rupture Phenomenon: Theoretical Development and Applications

    NASA Technical Reports Server (NTRS)

    Murthy, Pappu L. N.; Phoenix, S. Leigh; Grimes-Ledesma, Lorie

    2010-01-01

    Stress rupture failure of Carbon Composite Overwrapped Pressure Vessels (COPVs) is of serious concern to Science Mission and Constellation programs since there are a number of COPVs on board space vehicles with stored gases under high pressure for long durations of time. It has become customary to establish the reliability of these vessels using the so-called classic models. The classical models are based on Weibull statistics fitted to observed stress rupture data. These stochastic models cannot account for any additional damage due to the complex pressure-time histories characteristic of COPVs being supplied for NASA missions. In particular, it is suspected that the effects of proof test could significantly reduce the stress rupture lifetime of COPVs. The focus of this paper is to present an analytical appraisal of a model that incorporates damage due to proof test. The model examined in the current paper is based on physical mechanisms such as micromechanics based load sharing concepts coupled with creep rupture and Weibull statistics. For example, the classic model cannot accommodate damage due to proof testing, which every flight vessel undergoes. The paper compares the current model to the classic model with a number of examples. In addition, several applications of the model to current ISS and Constellation program issues are also examined.

  6. Can fatigue affect acquisition of new surgical skills? A prospective trial of pre- and post-call general surgery residents using the da Vinci surgical skills simulator.

    PubMed

    Robison, Weston; Patel, Sonya K; Mehta, Akshat; Senkowski, Tristan; Allen, John; Shaw, Eric; Senkowski, Christopher K

    2018-03-01

    To study the effects of fatigue on general surgery residents' performance on the da Vinci Skills Simulator (dVSS). 15 General Surgery residents from various postgraduate training years (PGY2, PGY3, PGY4, and PGY5) performed 5 simulation tasks on the dVSS as recommended by the Robotic Training Network (RTN). The General Surgery residents had no prior experience with the dVSS. Participants were assigned to either the Pre-call group or Post-call group based on call schedule. As a measure of subjective fatigue, residents were given the Epworth Sleepiness Scale (ESS) prior to their dVSS testing. The dVSS MScore™ software recorded various metrics (Objective Structured Assessment of Technical Skills, OSATS) that were used to evaluate the performance of each resident to compare the robotic simulation proficiency between the Pre-call and Post-call groups. Six general surgery residents were stratified into the Pre-call group and nine into the Post-call group. These residents were also stratified into Fatigued (10) or Nonfatigued (5) groups, as determined by their reported ESS scores. A statistically significant difference was found between the Pre-call and Post-call reported sleep hours (p = 0.036). There was no statistically significant difference between the Pre-call and Post-call groups or between the Fatigued and Nonfatigued groups in time to complete exercise, number of attempts, and high MScore™ score. Despite variation in fatigue levels, there was no effect on the acquisition of robotic simulator skills.

  7. The intermediates take it all: asymptotics of higher criticism statistics and a powerful alternative based on equal local levels.

    PubMed

    Gontscharuk, Veronika; Landwehr, Sandra; Finner, Helmut

    2015-01-01

    The higher criticism (HC) statistic, which can be seen as a normalized version of the famous Kolmogorov-Smirnov statistic, has a long history, dating back to the mid seventies. Originally, HC statistics were used in connection with goodness of fit (GOF) tests but they recently gained some attention in the context of testing the global null hypothesis in high dimensional data. The continuing interest for HC seems to be inspired by a series of nice asymptotic properties related to this statistic. For example, unlike Kolmogorov-Smirnov tests, GOF tests based on the HC statistic are known to be asymptotically sensitive in the moderate tails, hence it is favorably applied for detecting the presence of signals in sparse mixture models. However, some questions around the asymptotic behavior of the HC statistic are still open. We focus on two of them, namely, why a specific intermediate range is crucial for GOF tests based on the HC statistic and why the convergence of the HC distribution to the limiting one is extremely slow. Moreover, the inconsistency in the asymptotic and finite behavior of the HC statistic prompts us to provide a new HC test that has better finite properties than the original HC test while showing the same asymptotics. This test is motivated by the asymptotic behavior of the so-called local levels related to the original HC test. By means of numerical calculations and simulations we show that the new HC test is typically more powerful than the original HC test in normal mixture models. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
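    A minimal sketch of the standard higher criticism statistic computed from sorted p-values, HC = max over an intermediate range of sqrt(n)(i/n - p_(i))/sqrt(p_(i)(1 - p_(i))), may make the "intermediate range" point concrete. The search range used below (p_(i) > 1/n, i <= n/2) is one common convention and is not necessarily the one studied in the paper.

```python
# Minimal sketch of the higher criticism (HC) statistic over sorted p-values.
# The search range (p_(i) > 1/n, i <= n/2) is one common convention.
import numpy as np
from scipy.stats import norm

def higher_criticism(pvals):
    p = np.sort(np.asarray(pvals, dtype=float))
    n = p.size
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1.0 - p))
    mask = (p > 1.0 / n) & (i <= n // 2)        # restrict to the intermediate range
    return float(hc[mask].max())

rng = np.random.default_rng(6)
null_p = rng.uniform(size=1000)                                 # global null
signal_z = np.concatenate([rng.normal(0, 1, 990), rng.normal(3, 1, 10)])
sparse_p = norm.sf(signal_z)                                    # sparse mixture alternative
print(higher_criticism(null_p), higher_criticism(sparse_p))
```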

  8. A Statistical Analysis of Brain Morphology Using Wild Bootstrapping

    PubMed Central

    Ibrahim, Joseph G.; Tang, Niansheng; Rowe, Daniel B.; Hao, Xuejun; Bansal, Ravi; Peterson, Bradley S.

    2008-01-01

    Methods for the analysis of brain morphology, including voxel-based morphometry and surface-based morphometries, have been used to detect associations between brain structure and covariates of interest, such as diagnosis, severity of disease, age, IQ, and genotype. The statistical analysis of morphometric measures usually involves two statistical procedures: 1) invoking a statistical model at each voxel (or point) on the surface of the brain or brain subregion, followed by mapping test statistics (e.g., t test) or their associated p values at each of those voxels; 2) correction for the multiple statistical tests conducted across all voxels on the surface of the brain region under investigation. We propose the use of new statistical methods for each of these procedures. We first use a heteroscedastic linear model to test the associations between the morphological measures at each voxel on the surface of the specified subregion (e.g., cortical or subcortical surfaces) and the covariates of interest. Moreover, we develop a robust test procedure that is based on a resampling method, called wild bootstrapping. This procedure assesses the statistical significance of the associations between a measure of given brain structure and the covariates of interest. The value of this robust test procedure lies in its computational simplicity and in its applicability to a wide range of imaging data, including data from both anatomical and functional magnetic resonance imaging (fMRI). Simulation studies demonstrate that this robust test procedure can accurately control the family-wise error rate. We demonstrate the application of this robust test procedure to the detection of statistically significant differences in the morphology of the hippocampus over time across gender groups in a large sample of healthy subjects. PMID:17649909
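    As background, the generic wild-bootstrap recipe for a single voxel's heteroscedastic linear model is sketched below: form bootstrap responses from the null-model fit plus residuals multiplied by random sign (Rademacher) weights, refit, and collect the test statistic. This is the textbook recipe under assumed synthetic data, not the authors' exact implementation or multiple-testing correction.

```python
# Generic wild-bootstrap sketch for one voxel's linear model: flip residual
# signs with Rademacher weights under the null, refit, and compare t-statistics.
import numpy as np

rng = np.random.default_rng(7)
n = 40
group = rng.integers(0, 2, n)                               # e.g. a binary covariate such as gender
X = np.column_stack([np.ones(n), group])
y = 1.0 + 0.4 * group + rng.normal(0, 0.5 + 0.5 * group, n) # heteroscedastic errors

def t_stat(X, y, j=1):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[j] / np.sqrt(cov[j, j])

t_obs = t_stat(X, y)
beta0, *_ = np.linalg.lstsq(X[:, :1], y, rcond=None)        # null model (intercept only)
fit0 = X[:, :1] @ beta0
res0 = y - fit0

t_boot = []
for _ in range(2000):
    w = rng.choice([-1.0, 1.0], size=n)                     # Rademacher multipliers
    t_boot.append(t_stat(X, fit0 + w * res0))               # impose the null, keep heteroscedasticity
p_value = np.mean(np.abs(t_boot) >= abs(t_obs))
print(round(t_obs, 2), round(p_value, 3))
```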

  9. MultiGeMS: detection of SNVs from multiple samples using model selection on high-throughput sequencing data.

    PubMed

    Murillo, Gabriel H; You, Na; Su, Xiaoquan; Cui, Wei; Reilly, Muredach P; Li, Mingyao; Ning, Kang; Cui, Xinping

    2016-05-15

    Single nucleotide variant (SNV) detection procedures are being utilized as never before to analyze the recent abundance of high-throughput DNA sequencing data, both on single and multiple sample datasets. Building on previously published work with the single sample SNV caller genotype model selection (GeMS), a multiple sample version of GeMS (MultiGeMS) is introduced. Unlike other popular multiple sample SNV callers, the MultiGeMS statistical model accounts for enzymatic substitution sequencing errors. It also addresses the multiple testing problem endemic to multiple sample SNV calling and utilizes high performance computing (HPC) techniques. A simulation study demonstrates that MultiGeMS ranks highest in precision among a selection of popular multiple sample SNV callers, while showing exceptional recall in calling common SNVs. Further, both simulation studies and real data analyses indicate that MultiGeMS is robust to low-quality data. We also demonstrate that accounting for enzymatic substitution sequencing errors not only improves SNV call precision at low mapping quality regions, but also improves recall at reference allele-dominated sites with high mapping quality. The MultiGeMS package can be downloaded from https://github.com/cui-lab/multigems. Contact: xinping.cui@ucr.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  10. Adapt-Mix: learning local genetic correlation structure improves summary statistics-based analyses

    PubMed Central

    Park, Danny S.; Brown, Brielin; Eng, Celeste; Huntsman, Scott; Hu, Donglei; Torgerson, Dara G.; Burchard, Esteban G.; Zaitlen, Noah

    2015-01-01

    Motivation: Approaches to identifying new risk loci, training risk prediction models, imputing untyped variants and fine-mapping causal variants from summary statistics of genome-wide association studies are playing an increasingly important role in the human genetics community. Current summary statistics-based methods rely on global ‘best guess’ reference panels to model the genetic correlation structure of the dataset being studied. This approach, especially in admixed populations, has the potential to produce misleading results, ignores variation in local structure and is not feasible when appropriate reference panels are missing or small. Here, we develop a method, Adapt-Mix, that combines information across all available reference panels to produce estimates of local genetic correlation structure for summary statistics-based methods in arbitrary populations. Results: We applied Adapt-Mix to estimate the genetic correlation structure of both admixed and non-admixed individuals using simulated and real data. We evaluated our method by measuring the performance of two summary statistics-based methods: imputation and joint-testing. When using our method as opposed to the current standard of ‘best guess’ reference panels, we observed a 28% decrease in mean-squared error for imputation and a 73.7% decrease in mean-squared error for joint-testing. Availability and implementation: Our method is publicly available in a software package called ADAPT-Mix available at https://github.com/dpark27/adapt_mix. Contact: noah.zaitlen@ucsf.edu PMID:26072481

  11. Reflexion on linear regression trip production modelling method for ensuring good model quality

    NASA Astrophysics Data System (ADS)

    Suprayitno, Hitapriya; Ratnasari, Vita

    2017-11-01

    Transport modelling is important. In certain cases the conventional model still has to be used, for which a good trip production model is essential. A good model can only be obtained from a good sample. Two basic principles of good sampling are that the sample must be capable of representing the population characteristics and of producing an acceptable error at a given confidence level. These principles do not yet appear to be well understood or applied in trip production modelling. This motivates an investigation of trip production modelling practice in Indonesia and an attempt to formulate a better modelling method for ensuring model quality. The results are as follows. Statistics provides a method for calculating the span of a predicted value at a given confidence level for linear regression, called the confidence interval of the predicted value. Common modelling practice uses R2 as the principal quality measure, while sampling practice varies and does not always conform to sampling principles. An experiment indicates that a small sample can already give an excellent R2 value and that sample composition can significantly change the model; a good R2 value therefore does not always mean good model quality. This leads to three basic ideas for ensuring good model quality: reformulating the quality measure, the calculation procedure, and the sampling method. The quality measure is defined as having both a good R2 value and a good confidence interval of the predicted value. The calculation procedure must incorporate the appropriate statistical calculations and statistical tests. A good sampling method must use random, well-distributed, stratified sampling with a certain minimum number of samples. These three ideas need further development and testing.
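    The confidence interval of the predicted value mentioned above has a closed form for linear regression: ŷ0 ± t·s·sqrt(x0ᵀ(XᵀX)⁻¹x0) for the mean response (add 1 under the root for a single new observation). A short sketch under these standard formulas, with synthetic trip-production-style data:

```python
# Sketch: confidence interval of the predicted (mean) value in linear regression,
#   y_hat0 +/- t * s * sqrt(x0' (X'X)^-1 x0)
# (add 1 under the square root for a prediction interval on a new observation).
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n = 12                                              # deliberately small sample
x = rng.uniform(0, 10, n)                           # e.g. a zone attribute (synthetic)
y = 5.0 + 2.0 * x + rng.normal(0, 3.0, n)           # trip production (synthetic)

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
s2 = resid @ resid / (n - 2)
XtX_inv = np.linalg.inv(X.T @ X)

x0 = np.array([1.0, 7.5])                           # prediction point
y0 = x0 @ beta
half_width = stats.t.ppf(0.975, n - 2) * np.sqrt(s2 * x0 @ XtX_inv @ x0)
print(f"predicted {y0:.2f}, 95% CI [{y0 - half_width:.2f}, {y0 + half_width:.2f}]")
```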

  12. Constraining MHD Disk-Winds with X-ray Absorbers

    NASA Astrophysics Data System (ADS)

    Fukumura, Keigo; Tombesi, F.; Shrader, C. R.; Kazanas, D.; Contopoulos, J.; Behar, E.

    2014-01-01

    From the state-of-the-art spectroscopic observations of active galactic nuclei (AGNs) the robust features of absorption lines (e.g. most notably by H/He-like ions), called warm absorbers (WAs), have often been detected in soft X-rays (< 2 keV). While the identified WAs are often mildly blueshifted to yield line-of-sight velocities up to ~100-3,000 km/sec in typical X-ray-bright Seyfert 1 AGNs, a fraction of Seyfert galaxies such as PG 1211+143 exhibits even faster absorbers (v/c ~ 0.1-0.2) called ultra-fast outflows (UFOs) whose physical condition is much more extreme compared with the WAs. Motivated by these recent X-ray data we show that the magnetically-driven accretion-disk wind model is a plausible scenario to explain the characteristic property of these X-ray absorbers. As a preliminary case study we demonstrate that the wind model parameters (e.g. viewing angle and wind density) can be constrained by data from PG 1211+143 at a statistically significant level with chi-squared spectral analysis. Our wind models can thus be implemented into the standard analysis package, XSPEC, as a table spectrum model for general analysis of X-ray absorbers.

  13. Statistical properties of filtered pseudorandom digital sequences formed from the sum of maximum-length sequences

    NASA Technical Reports Server (NTRS)

    Wallace, G. R.; Weathers, G. D.; Graf, E. R.

    1973-01-01

    The statistics of filtered pseudorandom digital sequences called hybrid-sum sequences, formed from the modulo-two sum of several maximum-length sequences, are analyzed. The results indicate that a relation exists between the statistics of the filtered sequence and the characteristic polynomials of the component maximum length sequences. An analysis procedure is developed for identifying a large group of sequences with good statistical properties for applications requiring the generation of analog pseudorandom noise. By use of the analysis approach, the filtering process is approximated by the convolution of the sequence with a sum of unit step functions. A parameter reflecting the overall statistical properties of filtered pseudorandom sequences is derived. This parameter is called the statistical quality factor. A computer algorithm to calculate the statistical quality factor for the filtered sequences is presented, and the results for two examples of sequence combinations are included. The analysis reveals that the statistics of the signals generated with the hybrid-sum generator are potentially superior to the statistics of signals generated with maximum-length generators. Furthermore, fewer calculations are required to evaluate the statistics of a large group of hybrid-sum generators than are required to evaluate the statistics of the same size group of approximately equivalent maximum-length sequences.
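    Maximum-length sequences are produced by linear-feedback shift registers with primitive characteristic polynomials, and a hybrid-sum sequence is the modulo-two sum of several such sequences. A small sketch follows; the tap sets are standard maximal-length choices for 5- and 7-stage registers and are illustrative only, not the specific generators analyzed in the report.

```python
# Sketch: generate maximum-length (m-) sequences with linear-feedback shift
# registers and form a hybrid-sum sequence as their modulo-two (XOR) sum.
import numpy as np

def m_sequence(taps, n_stages, length=None):
    state = [1] * n_stages                          # any nonzero seed
    length = length or (2 ** n_stages - 1)
    out = []
    for _ in range(length):
        out.append(state[-1])                       # output the last stage
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]                # XOR of the tapped stages
        state = [feedback] + state[:-1]             # shift the register
    return np.array(out)

seq5 = m_sequence(taps=(5, 3), n_stages=5, length=127)   # maximal period 31, repeated
seq7 = m_sequence(taps=(7, 6), n_stages=7)               # maximal period 127
hybrid = seq5 ^ seq7                                     # modulo-two sum of the two sequences
print(hybrid[:20], hybrid.mean())                        # a roughly balanced 0/1 sequence
```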

  14. A New Statistical Model of Electroencephalogram Noise Spectra for Real-Time Brain-Computer Interfaces.

    PubMed

    Paris, Alan; Atia, George K; Vosoughi, Azadeh; Berman, Stephen A

    2017-08-01

    A characteristic of neurological signal processing is high levels of noise from subcellular ion channels up to whole-brain processes. In this paper, we propose a new model of electroencephalogram (EEG) background periodograms, based on a family of functions which we call generalized van der Ziel-McWhorter (GVZM) power spectral densities (PSDs). To the best of our knowledge, the GVZM PSD function is the only EEG noise model that has relatively few parameters, matches recorded EEG PSDs with high accuracy from 0 to over 30 Hz, and has approximately 1/f^θ behavior in the midfrequencies without infinities. We validate this model using three approaches. First, we show how GVZM PSDs can arise in a population of ion channels at maximum entropy equilibrium. Second, we present a class of mixed autoregressive models, which simulate brain background noise and whose periodograms are asymptotic to the GVZM PSD. Third, we present two real-time estimation algorithms for steady-state visual evoked potential (SSVEP) frequencies, and analyze their performance statistically. In pairwise comparisons, the GVZM-based algorithms showed statistically significant accuracy improvement over two well-known and widely used SSVEP estimators. The GVZM noise model can be a useful and reliable technique for EEG signal processing. Understanding EEG noise is essential for EEG-based neurology and applications such as real-time brain-computer interfaces, which must make accurate control decisions from very short data epochs. The GVZM approach represents a successful new paradigm for understanding and managing this neurological noise.

  15. Bubbles and denaturation in DNA

    NASA Astrophysics Data System (ADS)

    van Erp, T. S.; Cuesta-López, S.; Peyrard, M.

    2006-08-01

    The local opening of DNA is an intriguing phenomenon from a statistical-physics point of view, but is also essential for its biological function. For instance, the transcription and replication of our genetic code cannot take place without the unwinding of the DNA double helix. Although these biological processes are driven by proteins, there might well be a relation between these biological openings and the spontaneous bubble formation due to thermal fluctuations. Mesoscopic models, like the Peyrard-Bishop-Dauxois (PBD) model, have fairly accurately reproduced some experimental denaturation curves and the sharp phase transition in the thermodynamic limit. It is, hence, tempting to see whether these models could be used to predict the biological activity of DNA. In a previous study, we introduced a method that allows to obtain very accurate results on this subject, which showed that some previous claims in this direction, based on molecular-dynamics studies, were premature. This could either imply that the present PBD model should be improved or that biological activity can only be predicted in a more complex framework that involves interactions with proteins and super helical stresses. In this article, we give a detailed description of the statistical method introduced before. Moreover, for several DNA sequences, we give a thorough analysis of the bubble-statistics as a function of position and bubble size and the so-called l-denaturation curves that can be measured experimentally. These show that some important experimental observations are missing in the present model. We discuss how the present model could be improved.

  16. Protein structure modeling for CASP10 by multiple layers of global optimization.

    PubMed

    Joo, Keehyoung; Lee, Juyong; Sim, Sangjin; Lee, Sun Young; Lee, Kiho; Heo, Seungryong; Lee, In-Ho; Lee, Sung Jong; Lee, Jooyoung

    2014-02-01

    In the template-based modeling (TBM) category of CASP10 experiment, we introduced a new protocol called protein modeling system (PMS) to generate accurate protein structures in terms of side-chains as well as backbone trace. In the new protocol, a global optimization algorithm, called conformational space annealing (CSA), is applied to the three layers of TBM procedure: multiple sequence-structure alignment, 3D chain building, and side-chain re-modeling. For 3D chain building, we developed a new energy function which includes new distance restraint terms of Lorentzian type (derived from multiple templates), and new energy terms that combine (physical) energy terms such as dynamic fragment assembly (DFA) energy, DFIRE statistical potential energy, hydrogen bonding term, etc. These physical energy terms are expected to guide the structure modeling especially for loop regions where no template structures are available. In addition, we developed a new quality assessment method based on random forest machine learning algorithm to screen templates, multiple alignments, and final models. For TBM targets of CASP10, we find that, due to the combination of three stages of CSA global optimizations and quality assessment, the modeling accuracy of PMS improves at each additional stage of the protocol. It is especially noteworthy that the side-chains of the final PMS models are far more accurate than the models in the intermediate steps. Copyright © 2013 Wiley Periodicals, Inc.

  17. Mesh Dependence on Shear Driven Boundary Layers in Stable Stratification Generated by Large Eddy-Simulation

    NASA Astrophysics Data System (ADS)

    Berg, Jacob; Patton, Edward G.; Sullivan, Peter S.

    2017-11-01

    The effect of mesh resolution and size on shear driven atmospheric boundary layers in a stably stratified environment is investigated with the NCAR pseudo-spectral LES model (J. Atmos. Sci. v68, p2395, 2011 and J. Atmos. Sci. v73, p1815, 2016). The model applies FFT in the two horizontal directions and finite differencing in the vertical direction. With vanishing heat flux at the surface and a capping inversion entraining potential temperature into the boundary layer, the situation is often called the conditionally neutral atmospheric boundary layer (ABL). Due to its relevance in high wind applications such as wind power meteorology, we emphasize second-order statistics important for wind turbines, including spectral information. The simulations range from mesh sizes of 64³ to 1024³ grid points. Due to the non-stationarity of the problem, different simulations are compared at equal eddy-turnover times. Whereas grid convergence is mostly achieved in the middle portion of the ABL, close to the surface, where the presence of the ground limits the growth of the energy-containing eddies, second-order statistics are not converged on the studied meshes. Higher order structure functions also reveal non-Gaussian statistics highly dependent on the resolution.

  18. A Localized Ensemble Kalman Smoother

    NASA Technical Reports Server (NTRS)

    Butala, Mark D.

    2012-01-01

    Numerous geophysical inverse problems prove difficult because the available measurements are indirectly related to the underlying unknown dynamic state and the physics governing the system may involve imperfect models or unobserved parameters. Data assimilation addresses these difficulties by combining the measurements and physical knowledge. The main challenge in such problems usually involves their high dimensionality and the standard statistical methods prove computationally intractable. This paper develops and addresses the theoretical convergence of a new high-dimensional Monte-Carlo approach called the localized ensemble Kalman smoother.
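    As background, a minimal ensemble Kalman analysis step with covariance localization (a Schur product of the sample covariance with a distance-based taper) is sketched below; the smoother in the paper extends this idea across time, which is not reproduced here. The toy state, observation operator, and taper scale are illustrative assumptions.

```python
# Background sketch: one ensemble Kalman analysis step with covariance
# localization (elementwise product of the sample covariance with a taper).
import numpy as np

rng = np.random.default_rng(9)
n_state, n_obs, n_ens = 50, 10, 20

truth = np.sin(np.linspace(0, 2 * np.pi, n_state))
obs_idx = np.arange(0, n_state, 5)                    # observe every 5th state element
H = np.zeros((n_obs, n_state)); H[np.arange(n_obs), obs_idx] = 1.0
R = 0.05 * np.eye(n_obs)
y = H @ truth + rng.normal(0, np.sqrt(0.05), n_obs)

ensemble = truth[:, None] + rng.normal(0, 0.5, (n_state, n_ens))   # forecast ensemble
anom = ensemble - ensemble.mean(axis=1, keepdims=True)
P = anom @ anom.T / (n_ens - 1)                                    # sample covariance

dist = np.abs(np.subtract.outer(np.arange(n_state), np.arange(n_state)))
L = np.exp(-(dist / 10.0) ** 2)                                    # localization taper
P_loc = L * P                                                      # Schur product

K = P_loc @ H.T @ np.linalg.inv(H @ P_loc @ H.T + R)               # Kalman gain
perturbed_obs = y[:, None] + rng.normal(0, np.sqrt(0.05), (n_obs, n_ens))
analysis = ensemble + K @ (perturbed_obs - H @ ensemble)
print(np.abs(analysis.mean(axis=1) - truth).mean())                # analysis error (toy)
```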

  19. Large-Eddy Simulations of Dust Devils and Convective Vortices

    NASA Astrophysics Data System (ADS)

    Spiga, Aymeric; Barth, Erika; Gu, Zhaolin; Hoffmann, Fabian; Ito, Junshi; Jemmett-Smith, Bradley; Klose, Martina; Nishizawa, Seiya; Raasch, Siegfried; Rafkin, Scot; Takemi, Tetsuya; Tyler, Daniel; Wei, Wei

    2016-11-01

    In this review, we address the use of numerical computations called Large-Eddy Simulations (LES) to study dust devils, and the more general class of atmospheric phenomena they belong to (convective vortices). We describe the main elements of the LES methodology. We review the properties, statistics, and variability of dust devils and convective vortices resolved by LES in both terrestrial and Martian environments. The current challenges faced by modelers using LES for dust devils are also discussed in detail.

  20. Application of Statistical Linear Time-Varying System Theory to Modeling of High Grazing Angle Sea Clutter

    DTIC Science & Technology

    2017-10-25

    This report applies statistical linear time-varying system theory to the modeling of high grazing angle sea clutter. The approach draws on the so-called scattering function, which describes radar returns from a large object (such as a planet) in radar astronomy as a function of delay and Doppler shift. The scattering function concept originates in early radar astronomy and multipath communication work at MIT Lincoln Laboratory (P. E. Green, Jr., "Radar astronomy - communication via fluctuating multipath media," rept. 234, October 1960) and is also described briefly by Van Trees in his well-known book.

  1. New assessment of a structural alphabet

    PubMed Central

    de Brevern, Alexandre G.

    2005-01-01

    Summary: A statistical analysis of the Protein Databank (PDB) structures has led us to define a set of small 3D structural prototypes called Protein Blocks (PBs). This structural alphabet includes 16 PBs, each one defined by the (Φ, Ψ) dihedral angles of 5 consecutive residues. Here, we analyze the effect of the enlargement of the PDB on the PBs' definition. The results highlight the quality of the 3D approximation ensured by the PBs. The latter could be of great interest in ab initio modeling. PMID:15996119

  2. Resampling: A Marriage of Computers and Statistics. ERIC/TM Digest.

    ERIC Educational Resources Information Center

    Rudner, Lawrence M.; Shafer, Mary Morello

    Advances in computer technology are making it possible for educational researchers to use simpler statistical methods to address a wide range of questions with smaller data sets and fewer, and less restrictive, assumptions. This digest introduces computationally intensive statistics, collectively called resampling techniques. Resampling is a…

  3. A Novel Signal Modeling Approach for Classification of Seizure and Seizure-Free EEG Signals.

    PubMed

    Gupta, Anubha; Singh, Pushpendra; Karlekar, Mandar

    2018-05-01

    This paper presents a new signal modeling-based methodology for automatic seizure detection in EEG signals. The proposed method consists of three stages. First, a multirate filterbank structure is proposed that is constructed using the basis vectors of discrete cosine transform. The proposed filterbank decomposes EEG signals into their respective brain rhythms: delta, theta, alpha, beta, and gamma. Second, these brain rhythms are statistically modeled with the class of self-similar Gaussian random processes, namely, fractional Brownian motion and fractional Gaussian noises. The statistics of these processes are modeled using a single parameter called the Hurst exponent. In the last stage, the value of the Hurst exponent and autoregressive moving average parameters are used as features to design a binary support vector machine classifier to classify pre-ictal, inter-ictal (epileptic with seizure free interval), and ictal (seizure) EEG segments. The performance of the classifier is assessed via extensive analysis on two widely used data sets and is observed to provide good accuracy on both data sets. Thus, this paper proposes a novel signal model for EEG data that best captures the attributes of these signals and hence boosts the classification accuracy of seizure and seizure-free epochs.
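    A hedged sketch of one common way to extract a Hurst-exponent feature, the aggregated-variance method (the variance of block means scales as m^(2H-2) for fractional Gaussian noise), is given below; the paper's exact estimator is not specified here, and the white-noise test signal is synthetic.

```python
# Illustrative sketch: Hurst-exponent feature via the aggregated-variance method,
# which could feed a classifier alongside ARMA parameters. Not the paper's exact estimator.
import numpy as np

def hurst_aggvar(x, block_sizes=(4, 8, 16, 32, 64)):
    x = np.asarray(x, dtype=float)
    log_m, log_v = [], []
    for m in block_sizes:
        n_blocks = len(x) // m
        means = x[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)
        log_m.append(np.log(m))
        log_v.append(np.log(means.var()))
    slope = np.polyfit(log_m, log_v, 1)[0]          # slope = 2H - 2
    return 0.5 * slope + 1.0

rng = np.random.default_rng(10)
white = rng.normal(size=4096)                       # H ~ 0.5 expected for white noise
print(round(hurst_aggvar(white), 2))
```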

  4. Sparse intervertebral fence composition for 3D cervical vertebra segmentation

    NASA Astrophysics Data System (ADS)

    Liu, Xinxin; Yang, Jian; Song, Shuang; Cong, Weijian; Jiao, Peifeng; Song, Hong; Ai, Danni; Jiang, Yurong; Wang, Yongtian

    2018-06-01

    Statistical shape models are capable of extracting shape prior information, and are usually utilized to assist the task of segmentation of medical images. However, such models require large training datasets in the case of multi-object structures, and it also is difficult to achieve satisfactory results for complex shapes. This study proposed a novel statistical model for cervical vertebra segmentation, called sparse intervertebral fence composition (SiFC), which can reconstruct the boundary between adjacent vertebrae by modeling intervertebral fences. The complex shape of the cervical spine is replaced by a simple intervertebral fence, which considerably reduces the difficulty of cervical segmentation. The final segmentation results are obtained by using a 3D active contour deformation model without shape constraint, which substantially enhances the recognition capability of the proposed method for objects with complex shapes. The proposed segmentation framework is tested on a dataset with CT images from 20 patients. A quantitative comparison against corresponding reference vertebral segmentation yields an overall mean absolute surface distance of 0.70 mm and a dice similarity index of 95.47% for cervical vertebral segmentation. The experimental results show that the SiFC method achieves competitive cervical vertebral segmentation performances, and completely eliminates inter-process overlap.

  5. The coefficient of determination R2 and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded.

    PubMed

    Nakagawa, Shinichi; Johnson, Paul C D; Schielzeth, Holger

    2017-09-01

    The coefficient of determination R2 quantifies the proportion of variance explained by a statistical model and is an important summary statistic of biological interest. However, estimating R2 for generalized linear mixed models (GLMMs) remains challenging. We have previously introduced a version of R2 that we called R2GLMM for Poisson and binomial GLMMs, but not for other distributional families. Similarly, we earlier discussed how to estimate intra-class correlation coefficients (ICCs) using Poisson and binomial GLMMs. In this paper, we generalize our methods to all other non-Gaussian distributions, in particular to negative binomial and gamma distributions that are commonly used for modelling biological data. While expanding our approach, we highlight two useful concepts for biologists, Jensen's inequality and the delta method, both of which help us in understanding the properties of GLMMs. Jensen's inequality has important implications for biologically meaningful interpretation of GLMMs, whereas the delta method allows a general derivation of variance associated with non-Gaussian distributions. We also discuss some special considerations for binomial GLMMs with binary or proportion data. We illustrate the implementation of our extension by worked examples from the field of ecology and evolution in the R environment. However, our method can be used across disciplines and regardless of statistical environments. © 2017 The Author(s).
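    For orientation, the widely used marginal/conditional R2 decomposition for mixed models has the form marginal R2 = var(fixed) / (var(fixed) + var(random) + var(residual)), with a distribution-specific residual variance for non-Gaussian families. A toy Gaussian-case sketch in Python (the paper works in R, and its delta-method extensions for non-Gaussian GLMMs are not reproduced; the data and attribute usage below are illustrative assumptions):

```python
# Toy sketch (Gaussian linear mixed model case) of marginal and conditional R^2:
#   marginal    = var(fixed) / (var(fixed) + var(random) + var(residual))
#   conditional = (var(fixed) + var(random)) / (same denominator)
# For non-Gaussian GLMMs the residual term is distribution-specific (not shown).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(11)
n_groups, n_per = 20, 15
group = np.repeat(np.arange(n_groups), n_per)
x = rng.normal(size=n_groups * n_per)
u = rng.normal(0, 0.8, n_groups)[group]             # random intercepts
y = 1.0 + 0.6 * x + u + rng.normal(0, 1.0, n_groups * n_per)
df = pd.DataFrame({"y": y, "x": x, "group": group})

fit = smf.mixedlm("y ~ x", df, groups=df["group"]).fit()
fixed_pred = np.dot(np.column_stack([np.ones(len(df)), df["x"]]), fit.fe_params)
var_f = np.var(fixed_pred)                          # variance explained by fixed effects
var_r = float(fit.cov_re.iloc[0, 0])                # random-intercept variance
var_e = float(fit.scale)                            # residual variance
denom = var_f + var_r + var_e
print(round(var_f / denom, 3), round((var_f + var_r) / denom, 3))
```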

  6. Context-Aware Generative Adversarial Privacy

    NASA Astrophysics Data System (ADS)

    Huang, Chong; Kairouz, Peter; Chen, Xiao; Sankar, Lalitha; Rajagopal, Ram

    2017-12-01

    Preserving the utility of published datasets while simultaneously providing provable privacy guarantees is a well-known challenge. On the one hand, context-free privacy solutions, such as differential privacy, provide strong privacy guarantees, but often lead to a significant reduction in utility. On the other hand, context-aware privacy solutions, such as information theoretic privacy, achieve an improved privacy-utility tradeoff, but assume that the data holder has access to dataset statistics. We circumvent these limitations by introducing a novel context-aware privacy framework called generative adversarial privacy (GAP). GAP leverages recent advancements in generative adversarial networks (GANs) to allow the data holder to learn privatization schemes from the dataset itself. Under GAP, learning the privacy mechanism is formulated as a constrained minimax game between two players: a privatizer that sanitizes the dataset in a way that limits the risk of inference attacks on the individuals' private variables, and an adversary that tries to infer the private variables from the sanitized dataset. To evaluate GAP's performance, we investigate two simple (yet canonical) statistical dataset models: (a) the binary data model, and (b) the binary Gaussian mixture model. For both models, we derive game-theoretically optimal minimax privacy mechanisms, and show that the privacy mechanisms learned from data (in a generative adversarial fashion) match the theoretically optimal ones. This demonstrates that our framework can be easily applied in practice, even in the absence of dataset statistics.
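
    Schematically (our notation, not the paper's), the constrained minimax game can be written with a privatizer g, an adversary h that tries to recover the private variable Y from the released data g(X), an adversary loss l, a distortion measure d and a distortion budget D:

```latex
\[
  \min_{g}\;\max_{h}\;\; -\,\mathbb{E}\!\left[\ell\!\left(h\!\left(g(X)\right),\,Y\right)\right]
  \qquad \text{subject to} \qquad
  \mathbb{E}\!\left[d\!\left(g(X),\,X\right)\right] \;\le\; D .
\]
```

    The inner maximization is the adversary making its best inference; the outer minimization is the privatizer limiting that inference while staying within the distortion budget.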

  7. An improved approach for flight readiness certification: Methodology for failure risk assessment and application examples. Volume 3: Structure and listing of programs

    NASA Technical Reports Server (NTRS)

    Moore, N. R.; Ebbeler, D. H.; Newlin, L. E.; Sutharshana, S.; Creager, M.

    1992-01-01

    An improved methodology for quantitatively evaluating failure risk of spaceflight systems to assess flight readiness and identify risk control measures is presented. This methodology, called Probabilistic Failure Assessment (PFA), combines operating experience from tests and flights with engineering analysis to estimate failure risk. The PFA methodology is of particular value when information on which to base an assessment of failure risk, including test experience and knowledge of parameters used in engineering analyses of failure phenomena, is expensive or difficult to acquire. The PFA methodology is a prescribed statistical structure in which engineering analysis models that characterize failure phenomena are used conjointly with uncertainties about analysis parameters and/or modeling accuracy to estimate failure probability distributions for specific failure modes. These distributions can then be modified, by means of statistical procedures of the PFA methodology, to reflect any test or flight experience. Conventional engineering analysis models currently employed for design of failure prediction are used in this methodology. The PFA methodology is described and examples of its application are presented. Conventional approaches to failure risk evaluation for spaceflight systems are discussed, and the rationale for the approach taken in the PFA methodology is presented. The statistical methods, engineering models, and computer software used in fatigue failure mode applications are thoroughly documented.

  8. ALCHEMY: a reliable method for automated SNP genotype calling for small batch sizes and highly homozygous populations

    PubMed Central

    Wright, Mark H.; Tung, Chih-Wei; Zhao, Keyan; Reynolds, Andy; McCouch, Susan R.; Bustamante, Carlos D.

    2010-01-01

    Motivation: The development of new high-throughput genotyping products requires a significant investment in testing and training samples to evaluate and optimize the product before it can be used reliably on new samples. One reason for this is current methods for automated calling of genotypes are based on clustering approaches which require a large number of samples to be analyzed simultaneously, or an extensive training dataset to seed clusters. In systems where inbred samples are of primary interest, current clustering approaches perform poorly due to the inability to clearly identify a heterozygote cluster. Results: As part of the development of two custom single nucleotide polymorphism genotyping products for Oryza sativa (domestic rice), we have developed a new genotype calling algorithm called ‘ALCHEMY’ based on statistical modeling of the raw intensity data rather than modelless clustering. A novel feature of the model is the ability to estimate and incorporate inbreeding information on a per sample basis allowing accurate genotyping of both inbred and heterozygous samples even when analyzed simultaneously. Since clustering is not used explicitly, ALCHEMY performs well on small sample sizes with accuracy exceeding 99% with as few as 18 samples. Availability: ALCHEMY is available for both commercial and academic use free of charge and distributed under the GNU General Public License at http://alchemy.sourceforge.net/ Contact: mhw6@cornell.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20926420

  9. Statistical inference and Aristotle's Rhetoric.

    PubMed

    Macdonald, Ranald R

    2004-11-01

    Formal logic operates in a closed system where all the information relevant to any conclusion is present, whereas this is not the case when one reasons about events and states of the world. Pollard and Richardson drew attention to the fact that the reasoning behind statistical tests does not lead to logically justifiable conclusions. In this paper statistical inferences are defended not by logic but by the standards of everyday reasoning. Aristotle invented formal logic, but argued that people mostly get at the truth with the aid of enthymemes--incomplete syllogisms which include arguing from examples, analogies and signs. It is proposed that statistical tests work in the same way--in that they are based on examples, invoke the analogy of a model and use the size of the effect under test as a sign that the chance hypothesis is unlikely. Of existing theories of statistical inference only a weak version of Fisher's takes this into account. Aristotle anticipated Fisher by producing an argument of the form that there were too many cases in which an outcome went in a particular direction for that direction to be plausibly attributed to chance. We can therefore conclude that Aristotle would have approved of statistical inference and there is a good reason for calling this form of statistical inference classical.

  10. [Emotional well-being and discomfort at work in call center].

    PubMed

    Emanuel, Federica; Colombo, Lara; Ghislieri, Chiara

    2014-01-01

    The theme of well-being and discomfort at work has attracted increasing interest in recent years. Drawing on the Job Demands-Resources (JD-R) model, the present study investigates the effects of personal resources (optimism, internal locus of control), organizational resources (job autonomy, supervisor and colleague support), general demands (work-to-family conflict, workload), and context-specific demands (emotional dissonance) on emotional well-being and discomfort at work among call centre employees. The research was conducted through an online questionnaire, composed of measures drawn from the scientific literature and filled out individually by call center agents (N = 507) of the same telecommunication firm. Data analyses (PASW 18) included descriptive statistics, correlations, and multiple regressions. Personal and organizational resources improve emotional well-being at work, with the exception of colleague support. Optimism and supervisor support reduce emotional discomfort at work. Among the demands, work-family conflict and emotional dissonance increase emotional discomfort at work and, to a lesser extent, reduce emotional well-being at work. Consistent with the theoretical model, the results highlight the different roles of demands and resources in emotional well-being and discomfort at work. They suggest organizational policies and investments to promote emotional well-being at work, in particular training programs to support emotional skills, training for supervisors, increased job autonomy, and support for work-family balance.

  11. Chroma intra prediction based on inter-channel correlation for HEVC.

    PubMed

    Zhang, Xingyu; Gisquet, Christophe; François, Edouard; Zou, Feng; Au, Oscar C

    2014-01-01

    In this paper, we investigate a new inter-channel coding mode called LM mode proposed for the next generation video coding standard called high efficiency video coding. This mode exploits inter-channel correlation using reconstructed luma to predict chroma linearly with parameters derived from neighboring reconstructed luma and chroma pixels at both encoder and decoder to avoid overhead signaling. In this paper, we analyze the LM mode and prove that the LM parameters for predicting original chroma and reconstructed chroma are statistically the same. We also analyze the error sensitivity of the LM parameters. We identify some LM mode problematic situations and propose three novel LM-like modes called LMA, LML, and LMO to address the situations. To limit the increase in complexity due to the LM-like modes, we propose some fast algorithms with the help of some new cost functions. We further identify some potentially-problematic conditions in the parameter estimation (including regression dilution problem) and introduce a novel model correction technique to detect and correct those conditions. Simulation results suggest that considerable BD-rate reduction can be achieved by the proposed LM-like modes and model correction technique. In addition, the performance gain of the two techniques appears to be essentially additive when combined.
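
    As a rough illustration of the linear-model idea only (a floating-point sketch, not the HEVC reference implementation, which uses integer arithmetic and luma downsampling), the parameters can be fitted by least squares on the neighbouring reconstructed samples and applied to the co-located reconstructed luma block:

```python
import numpy as np

def lm_parameters(neigh_luma, neigh_chroma):
    """Least-squares fit of chroma = alpha * luma + beta on neighbouring
    reconstructed samples (floating point, for illustration only)."""
    L = np.asarray(neigh_luma, dtype=float)
    C = np.asarray(neigh_chroma, dtype=float)
    n = L.size
    denom = n * (L * L).sum() - L.sum() ** 2
    if denom == 0:
        return 0.0, C.mean()                      # flat neighbourhood fallback
    alpha = (n * (L * C).sum() - L.sum() * C.sum()) / denom
    beta = (C.sum() - alpha * L.sum()) / n
    return alpha, beta

def lm_predict(rec_luma_block, alpha, beta):
    """Predict the chroma block from the co-located reconstructed luma."""
    return alpha * np.asarray(rec_luma_block, dtype=float) + beta
```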

  12. The Intracranial Distribution of Gliomas in Relation to Exposure From Mobile Phones: Analyses From the INTERPHONE Study

    PubMed Central

    Grell, Kathrine; Frederiksen, Kirsten; Schüz, Joachim; Cardis, Elisabeth; Armstrong, Bruce; Siemiatycki, Jack; Krewski, Daniel R.; McBride, Mary L.; Johansen, Christoffer; Auvinen, Anssi; Hours, Martine; Blettner, Maria; Sadetzki, Siegal; Lagorio, Susanna; Yamaguchi, Naohito; Woodward, Alistair; Tynes, Tore; Feychting, Maria; Fleming, Sarah J.; Swerdlow, Anthony J.; Andersen, Per K.

    2016-01-01

    When investigating the association between brain tumors and use of mobile telephones, accurate data on tumor position are essential, due to the highly localized absorption of energy in the human brain from the radio-frequency fields emitted. We used a point process model to investigate this association using information that included tumor localization data from the INTERPHONE Study (Australia, Canada, Denmark, Finland, France, Germany, Israel, Italy, Japan, New Zealand, Norway, Sweden, and the United Kingdom). Our main analysis included 792 regular mobile phone users diagnosed with a glioma between 2000 and 2004. Similar to earlier results, we found a statistically significant association between the intracranial distribution of gliomas and the self-reported location of the phone. When we accounted for the preferred side of the head not being exclusively used for all mobile phone calls, the results were similar. The association was independent of the cumulative call time and cumulative number of calls. However, our model used reported side of mobile phone use, which is potentially influenced by recall bias. The point process method provides an alternative to previously used epidemiologic research designs when one is including localization in the investigation of brain tumors and mobile phone use. PMID:27810856

  13. Physical Regulation of the Self-Assembly of Tobacco Mosaic Virus Coat Protein

    PubMed Central

    Kegel, Willem K.; van der Schoot, Paul

    2006-01-01

    We present a statistical mechanical model based on the principle of mass action that explains the main features of the in vitro aggregation behavior of the coat protein of tobacco mosaic virus (TMV). By comparing our model to experimentally obtained stability diagrams, titration experiments, and calorimetric data, we pin down three competing factors that regulate the transitions between the different kinds of aggregated state of the coat protein. These are hydrophobic interactions, electrostatic interactions, and the formation of so-called “Caspar” carboxylate pairs. We suggest that these factors could be universal and relevant to a large class of virus coat proteins. PMID:16731551

  14. Design optimization and probabilistic analysis of a hydrodynamic journal bearing

    NASA Technical Reports Server (NTRS)

    Liniecki, Alexander G.

    1990-01-01

    A nonlinear constrained optimization of a hydrodynamic bearing was performed, yielding three main design variables: radial clearance, bearing length-to-diameter ratio, and lubricating oil viscosity. A combined model of temperature rise and oil supply was adopted as the objective function. The optimized bearing design was then simulated for a population of 1000 cases using the Monte Carlo statistical method. It appeared that the so-called 'optimal solution' produced more than 50 percent failed bearings, because their minimum oil film thickness violated the stipulated minimum constraint value. As a remedy, a change of oil viscosity is suggested, after the sensitivities of several variables were investigated.
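
    The failure count described above can be reproduced in outline by sampling the design variables and checking the film-thickness constraint; the sketch below is purely illustrative, with a placeholder film-thickness surrogate and made-up scatter, not the author's bearing equations.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000                                     # population size, as in the study

def min_film_thickness(clearance, ld_ratio, viscosity):
    """Placeholder surrogate for the bearing analysis -- NOT the real model."""
    return viscosity * ld_ratio / clearance

# Nominal "optimal" design values perturbed by assumed manufacturing and
# operating scatter (all numbers illustrative only).
clearance = rng.normal(1.00, 0.10, N)
ld_ratio  = rng.normal(0.80, 0.05, N)
viscosity = rng.normal(1.00, 0.15, N)

h_min   = min_film_thickness(clearance, ld_ratio, viscosity)
h_limit = min_film_thickness(1.00, 0.80, 1.00) * 0.95    # constraint threshold

print(f"fraction violating the constraint: {np.mean(h_min < h_limit):.1%}")
```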

  15. MODFLOW-2000, the U.S. Geological Survey modular ground-water model; user guide to the observation, sensitivity, and parameter-estimation processes and three post-processing programs

    USGS Publications Warehouse

    Hill, Mary C.; Banta, E.R.; Harbaugh, A.W.; Anderman, E.R.

    2000-01-01

    This report documents the Observation, Sensitivity, and Parameter-Estimation Processes of the ground-water modeling computer program MODFLOW-2000. The Observation Process generates model-calculated values for comparison with measured, or observed, quantities. A variety of statistics is calculated to quantify this comparison, including a weighted least-squares objective function. In addition, a number of files are produced that can be used to compare the values graphically. The Sensitivity Process calculates the sensitivity of hydraulic heads throughout the model with respect to specified parameters using the accurate sensitivity-equation method. These are called grid sensitivities. If the Observation Process is active, it uses the grid sensitivities to calculate sensitivities for the simulated values associated with the observations. These are called observation sensitivities. Observation sensitivities are used to calculate a number of statistics that can be used (1) to diagnose inadequate data, (2) to identify parameters that probably cannot be estimated by regression using the available observations, and (3) to evaluate the utility of proposed new data. The Parameter-Estimation Process uses a modified Gauss-Newton method to adjust values of user-selected input parameters in an iterative procedure to minimize the value of the weighted least-squares objective function. Statistics produced by the Parameter-Estimation Process can be used to evaluate estimated parameter values; statistics produced by the Observation Process and post-processing program RESAN-2000 can be used to evaluate how accurately the model represents the actual processes; statistics produced by post-processing program YCINT-2000 can be used to quantify the uncertainty of model simulated values. Parameters are defined in the Ground-Water Flow Process input files and can be used to calculate most model inputs, such as: for explicitly defined model layers, horizontal hydraulic conductivity, horizontal anisotropy, vertical hydraulic conductivity or vertical anisotropy, specific storage, and specific yield; and, for implicitly represented layers, vertical hydraulic conductivity. In addition, parameters can be defined to calculate the hydraulic conductance of the River, General-Head Boundary, and Drain Packages; areal recharge rates of the Recharge Package; maximum evapotranspiration of the Evapotranspiration Package; pumpage or the rate of flow at defined-flux boundaries of the Well Package; and the hydraulic head at constant-head boundaries. The spatial variation of model inputs produced using defined parameters is very flexible, including interpolated distributions that require the summation of contributions from different parameters. Observations can include measured hydraulic heads or temporal changes in hydraulic heads, measured gains and losses along head-dependent boundaries (such as streams), flows through constant-head boundaries, and advective transport through the system, which generally would be inferred from measured concentrations. MODFLOW-2000 is intended for use on any computer operating system. The program consists of algorithms programmed in Fortran 90, which efficiently performs numerical calculations and is fully compatible with the newer Fortran 95. The code is easily modified to be compatible with FORTRAN 77. Coordination for multiple processors is accommodated using Message Passing Interface (MPI) commands. 
The program is designed in a modular fashion that is intended to support inclusion of new capabilities.

  16. Baseline models of trace elements in major aquifers of the United States

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2005-01-01

    Trace-element concentrations in baseline samples from a survey of aquifers used as potable-water supplies in the United States are summarized using methods appropriate for data with multiple detection limits. The resulting statistical distribution models are used to develop summary statistics and estimate probabilities of exceeding water-quality standards. The models are based on data from the major aquifer studies of the USGS National Water Quality Assessment (NAWQA) Program. These data were produced with a nationally-consistent sampling and analytical framework specifically designed to determine the quality of the most important potable groundwater resources during the years 1991-2001. The analytical data for all elements surveyed contain values that were below several detection limits. Such datasets are referred to as multiply-censored data. To address this issue, a robust semi-parametric statistical method called regression on order statistics (ROS) is employed. Utilizing the 90th-95th percentile as an arbitrary range for the upper limits of expected baseline concentrations, the models show that baseline concentrations of dissolved Ba and Zn are below 500 µg/L. For the same percentile range, dissolved As, Cu and Mo concentrations are below 10 µg/L, and dissolved Ag, Be, Cd, Co, Cr, Ni, Pb, Sb and Se are below 1-5 µg/L. These models are also used to determine the probabilities that potable ground waters exceed drinking water standards. For dissolved Ba, Cr, Cu, Pb, Ni, Mo and Se, the likelihood of exceeding the US Environmental Protection Agency standards at the well-head is less than 1-1.5%. A notable exception is As, which has approximately a 7% chance of exceeding the maximum contaminant level (10 µg/L) at the well head.
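
    Regression on order statistics can be sketched roughly as follows; this is a deliberately simplified single-detection-limit version (the NAWQA data involve multiple limits, which require the full Helsel-Cohn plotting-position treatment), so it is illustration only.

```python
import numpy as np
from scipy import stats

def simple_ros(detects, n_censored):
    """Simplified ROS for one detection limit, all censored values below it.

    Fits log10(concentration) vs. normal scores of the detected values,
    imputes the censored ranks from the fitted line, and returns summary
    statistics of the combined sample.
    """
    detects = np.sort(np.asarray(detects, dtype=float))
    n = detects.size + n_censored
    # Blom plotting positions for the full (ranked) sample.
    pp = (np.arange(1, n + 1) - 0.375) / (n + 0.25)
    z = stats.norm.ppf(pp)
    # Detected values occupy the highest ranks; censored values the lowest.
    slope, intercept, *_ = stats.linregress(z[n_censored:], np.log10(detects))
    imputed = 10 ** (intercept + slope * z[:n_censored])
    full = np.concatenate([imputed, detects])
    return full.mean(), np.percentile(full, [90, 95])

mean, (p90, p95) = simple_ros(detects=[1.2, 2.5, 3.1, 4.8, 7.9, 12.0],
                              n_censored=4)
print(mean, p90, p95)
```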

  17. Personalized Modeling for Prediction with Decision-Path Models

    PubMed Central

    Visweswaran, Shyam; Ferreira, Antonio; Ribeiro, Guilherme A.; Oliveira, Alexandre C.; Cooper, Gregory F.

    2015-01-01

    Deriving predictive models in medicine typically relies on a population approach where a single model is developed from a dataset of individuals. In this paper we describe and evaluate a personalized approach in which we construct a new type of decision tree model called decision-path model that takes advantage of the particular features of a given person of interest. We introduce three personalized methods that derive personalized decision-path models. We compared the performance of these methods to that of Classification And Regression Tree (CART) that is a population decision tree to predict seven different outcomes in five medical datasets. Two of the three personalized methods performed statistically significantly better on area under the ROC curve (AUC) and Brier skill score compared to CART. The personalized approach of learning decision path models is a new approach for predictive modeling that can perform better than a population approach. PMID:26098570

  18. A Nonlinear Interactions Approximation Model for Large-Eddy Simulation

    NASA Astrophysics Data System (ADS)

    Haliloglu, Mehmet U.; Akhavan, Rayhaneh

    2003-11-01

    A new approach to LES modelling is proposed based on direct approximation of the nonlinear terms $\overline{u_i u_j}$ in the filtered Navier-Stokes equations, instead of the subgrid-scale stress, $\tau_{ij}$. The proposed model, which we call the Nonlinear Interactions Approximation (NIA) model, uses graded filters and deconvolution to parameterize the local interactions across the LES cutoff, and a Smagorinsky eddy viscosity term to parameterize the distant interactions. A dynamic procedure is used to determine the unknown eddy viscosity coefficient, rendering the model free of adjustable parameters. The proposed NIA model has been applied to LES of turbulent channel flows at Re_τ ≈ 210 and Re_τ ≈ 570. The results show good agreement with DNS not only for the mean and resolved second-order turbulence statistics but also for the full (resolved plus subgrid) Reynolds stress and turbulence intensities.

  19. A general science-based framework for dynamical spatio-temporal models

    USGS Publications Warehouse

    Wikle, C.K.; Hooten, M.B.

    2010-01-01

    Spatio-temporal statistical models are increasingly being used across a wide variety of scientific disciplines to describe and predict spatially-explicit processes that evolve over time. Correspondingly, in recent years there has been a significant amount of research on new statistical methodology for such models. Although descriptive models that approach the problem from the second-order (covariance) perspective are important, and innovative work is being done in this regard, many real-world processes are dynamic, and it can be more efficient in some cases to characterize the associated spatio-temporal dependence by the use of dynamical models. The chief challenge with the specification of such dynamical models has been related to the curse of dimensionality. Even in fairly simple linear, first-order Markovian, Gaussian error settings, statistical models are often overparameterized. Hierarchical models have proven invaluable in their ability to deal to some extent with this issue by allowing dependency among groups of parameters. In addition, this framework has allowed for the specification of science-based parameterizations (and associated prior distributions) in which classes of deterministic dynamical models (e.g., partial differential equations (PDEs), integro-difference equations (IDEs), matrix models, and agent-based models) are used to guide specific parameterizations. Most of the focus for the application of such models in statistics has been on the linear case. The problems mentioned above with linear dynamic models are compounded in the case of nonlinear models. In this sense, coherent and sensible model parameterizations are not only helpful, they are essential. Here, we present an overview of a framework for incorporating scientific information to motivate dynamical spatio-temporal models. First, we illustrate the methodology with the linear case. We then develop a general nonlinear spatio-temporal framework that we call general quadratic nonlinearity and demonstrate that it accommodates many different classes of science-based parameterizations as special cases. The model is presented in a hierarchical Bayesian framework and is illustrated with examples from ecology and oceanography. © 2010 Sociedad de Estadística e Investigación Operativa.

  20. Estimation of genetic variance for macro- and micro-environmental sensitivity using double hierarchical generalized linear models.

    PubMed

    Mulder, Han A; Rönnegård, Lars; Fikse, W Freddy; Veerkamp, Roel F; Strandberg, Erling

    2013-07-04

    Genetic variation for environmental sensitivity indicates that animals are genetically different in their response to environmental factors. Environmental factors are either identifiable (e.g. temperature) and called macro-environmental or unknown and called micro-environmental. The objectives of this study were to develop a statistical method to estimate genetic parameters for macro- and micro-environmental sensitivities simultaneously, to investigate bias and precision of resulting estimates of genetic parameters and to develop and evaluate use of Akaike's information criterion using h-likelihood to select the best fitting model. We assumed that genetic variation in macro- and micro-environmental sensitivities is expressed as genetic variance in the slope of a linear reaction norm and environmental variance, respectively. A reaction norm model to estimate genetic variance for macro-environmental sensitivity was combined with a structural model for residual variance to estimate genetic variance for micro-environmental sensitivity using a double hierarchical generalized linear model in ASReml. Akaike's information criterion was constructed as model selection criterion using approximated h-likelihood. Populations of sires with large half-sib offspring groups were simulated to investigate bias and precision of estimated genetic parameters. Designs with 100 sires, each with at least 100 offspring, are required to have standard deviations of estimated variances lower than 50% of the true value. When the number of offspring increased, standard deviations of estimates across replicates decreased substantially, especially for genetic variances of macro- and micro-environmental sensitivities. Standard deviations of estimated genetic correlations across replicates were quite large (between 0.1 and 0.4), especially when sires had few offspring. Practically, no bias was observed for estimates of any of the parameters. Using Akaike's information criterion the true genetic model was selected as the best statistical model in at least 90% of 100 replicates when the number of offspring per sire was 100. Application of the model to lactation milk yield in dairy cattle showed that genetic variance for micro- and macro-environmental sensitivities existed. The algorithm and model selection criterion presented here can contribute to better understand genetic control of macro- and micro-environmental sensitivities. Designs or datasets should have at least 100 sires each with 100 offspring.

  1. A methodology for the design of experiments in computational intelligence with multiple regression models.

    PubMed

    Fernandez-Lozano, Carlos; Gestal, Marcos; Munteanu, Cristian R; Dorado, Julian; Pazos, Alejandro

    2016-01-01

    The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence and especially on a correct comparison between the different results provided for different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational intelligence is implemented in an R package called RRegrs. This package includes ten simple and complex regression models to carry out predictive modeling using Machine Learning and well-known regression algorithms. The framework for experimental design presented herein is evaluated and validated against RRegrs. Our results are different for three out of five state-of-the-art simple datasets and it can be stated that the selection of the best model according to our proposal is statistically significant and relevant. It is of relevance to use a statistical approach to indicate whether the differences are statistically significant using this kind of algorithms. Furthermore, our results with three real complex datasets report different best models than with the previously published methodology. Our final goal is to provide a complete methodology for the use of different steps in order to compare the results obtained in Computational Intelligence problems, as well as from other fields, such as for bioinformatics, cheminformatics, etc., given that our proposal is open and modifiable.

  2. A methodology for the design of experiments in computational intelligence with multiple regression models

    PubMed Central

    Gestal, Marcos; Munteanu, Cristian R.; Dorado, Julian; Pazos, Alejandro

    2016-01-01

    The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence and especially on a correct comparison between the different results provided for different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational intelligence is implemented in an R package called RRegrs. This package includes ten simple and complex regression models to carry out predictive modeling using Machine Learning and well-known regression algorithms. The framework for experimental design presented herein is evaluated and validated against RRegrs. Our results are different for three out of five state-of-the-art simple datasets and it can be stated that the selection of the best model according to our proposal is statistically significant and relevant. It is of relevance to use a statistical approach to indicate whether the differences are statistically significant using this kind of algorithms. Furthermore, our results with three real complex datasets report different best models than with the previously published methodology. Our final goal is to provide a complete methodology for the use of different steps in order to compare the results obtained in Computational Intelligence problems, as well as from other fields, such as for bioinformatics, cheminformatics, etc., given that our proposal is open and modifiable. PMID:27920952

  3. Weighted Feature Significance: A Simple, Interpretable Model of Compound Toxicity Based on the Statistical Enrichment of Structural Features

    PubMed Central

    Huang, Ruili; Southall, Noel; Xia, Menghang; Cho, Ming-Hsuang; Jadhav, Ajit; Nguyen, Dac-Trung; Inglese, James; Tice, Raymond R.; Austin, Christopher P.

    2009-01-01

    In support of the U.S. Tox21 program, we have developed a simple and chemically intuitive model we call weighted feature significance (WFS) to predict the toxicological activity of compounds, based on the statistical enrichment of structural features in toxic compounds. We trained and tested the model on the following: (1) data from quantitative high–throughput screening cytotoxicity and caspase activation assays conducted at the National Institutes of Health Chemical Genomics Center, (2) data from Salmonella typhimurium reverse mutagenicity assays conducted by the U.S. National Toxicology Program, and (3) hepatotoxicity data published in the Registry of Toxic Effects of Chemical Substances. Enrichments of structural features in toxic compounds are evaluated for their statistical significance and compiled into a simple additive model of toxicity and then used to score new compounds for potential toxicity. The predictive power of the model for cytotoxicity was validated using an independent set of compounds from the U.S. Environmental Protection Agency tested also at the National Institutes of Health Chemical Genomics Center. We compared the performance of our WFS approach with classical classification methods such as Naive Bayesian clustering and support vector machines. In most test cases, WFS showed similar or slightly better predictive power, especially in the prediction of hepatotoxic compounds, where WFS appeared to have the best performance among the three methods. The new algorithm has the important advantages of simplicity, power, interpretability, and ease of implementation. PMID:19805409
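
    The core idea, enrichment of structural features among toxic compounds compiled into an additive score, can be sketched as follows; the feature matrix is hypothetical, Fisher's exact test stands in for the enrichment test, and the log odds ratio is used as an illustrative weight rather than the exact WFS weighting.

```python
import numpy as np
from scipy import stats

def feature_weights(X, y, alpha=0.05):
    """X: (n_compounds, n_features) binary feature matrix; y: 1 = toxic.

    Returns a weight per feature: 0 if not significantly enriched in toxic
    compounds, otherwise the log odds ratio of the 2x2 feature/toxicity table.
    """
    weights = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        a = np.sum((X[:, j] == 1) & (y == 1))    # toxic, feature present
        b = np.sum((X[:, j] == 1) & (y == 0))    # non-toxic, feature present
        c = np.sum((X[:, j] == 0) & (y == 1))    # toxic, feature absent
        d = np.sum((X[:, j] == 0) & (y == 0))    # non-toxic, feature absent
        odds, p = stats.fisher_exact([[a, b], [c, d]], alternative="greater")
        if p < alpha and np.isfinite(odds) and odds > 0:
            weights[j] = np.log(odds)
    return weights

def toxicity_score(X, weights):
    """Additive score: sum of the weights of the features a compound carries."""
    return X @ weights
```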

  4. Statistical analysis of CCSN/SS7 traffic data from working CCS subnetworks

    NASA Astrophysics Data System (ADS)

    Duffy, Diane E.; McIntosh, Allen A.; Rosenstein, Mark; Willinger, Walter

    1994-04-01

    In this paper, we report on an ongoing statistical analysis of actual CCSN traffic data. The data consist of approximately 170 million signaling messages collected from a variety of different working CCS subnetworks. The key findings from our analysis concern: (1) the characteristics of both the telephone call arrival process and the signaling message arrival process; (2) the tail behavior of the call holding time distribution; and (3) the observed performance of the CCSN with respect to a variety of performance and reliability measurements.

  5. Crises and Collective Socio-Economic Phenomena: Simple Models and Challenges

    NASA Astrophysics Data System (ADS)

    Bouchaud, Jean-Philippe

    2013-05-01

    Financial and economic history is strewn with bubbles and crashes, booms and busts, crises and upheavals of all sorts. Understanding the origin of these events is arguably one of the most important problems in economic theory. In this paper, we review recent efforts to include heterogeneities and interactions in models of decision-making. We argue that the so-called Random Field Ising model (RFIM) provides a unifying framework to account for many collective socio-economic phenomena that lead to sudden ruptures and crises. We discuss different models that can capture potentially destabilizing self-referential feedback loops, induced either by herding, i.e. reference to peers, or trending, i.e. reference to the past, and that account for some of the phenomenology missing in the standard models. We discuss some empirically testable predictions of these models, for example robust signatures of RFIM-like herding effects, or the logarithmic decay of spatial correlations of voting patterns. One of the most striking results, inspired by statistical physics methods, is that Adam Smith's invisible hand can fail badly at solving simple coordination problems. We also insist on the issue of time-scales, which can be extremely long in some cases and can prevent socially optimal equilibria from being reached. As a theoretical challenge, the study of so-called "detailed-balance" violating decision rules is needed to decide whether conclusions based on current models (that all assume detailed-balance) are indeed robust and generic.
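
    For reference, a standard textbook form of such an RFIM-type decision rule for binary choices s_i = ±1 (our rendering, not a quotation from the paper) is

```latex
\[
  s_i(t) \;=\; \operatorname{sign}\!\left( f_i \;+\; F(t) \;+\; J \sum_{j \in \partial i} s_j(t) \right),
\]
```

    where f_i is an idiosyncratic (random-field) preference, F(t) a common external influence, J the imitation strength and ∂i the set of neighbours of agent i; herding enters through the last term.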

  6. Hydrologic consistency as a basis for assessing complexity of monthly water balance models for the continental United States

    NASA Astrophysics Data System (ADS)

    Martinez, Guillermo F.; Gupta, Hoshin V.

    2011-12-01

    Methods to select parsimonious and hydrologically consistent model structures are useful for evaluating dominance of hydrologic processes and representativeness of data. While information criteria (appropriately constrained to obey underlying statistical assumptions) can provide a basis for evaluating appropriate model complexity, it is not sufficient to rely upon the principle of maximum likelihood (ML) alone. We suggest that one must also call upon a "principle of hydrologic consistency," meaning that selected ML structures and parameter estimates must be constrained (as well as possible) to reproduce desired hydrological characteristics of the processes under investigation. This argument is demonstrated in the context of evaluating the suitability of candidate model structures for lumped water balance modeling across the continental United States, using data from 307 snow-free catchments. The models are constrained to satisfy several tests of hydrologic consistency, a flow space transformation is used to ensure better consistency with underlying statistical assumptions, and information criteria are used to evaluate model complexity relative to the data. The results clearly demonstrate that the principle of consistency provides a sensible basis for guiding selection of model structures and indicate strong spatial persistence of certain model structures across the continental United States. Further work to untangle reasons for model structure predominance can help to relate conceptual model structures to physical characteristics of the catchments, facilitating the task of prediction in ungaged basins.

  7. CALL versus Paper: In Which Context Are L1 Glosses More Effective?

    ERIC Educational Resources Information Center

    Taylor, Alan M.

    2013-01-01

    CALL glossing in first language (L1) or second language (L2) texts has been shown by previous studies to be more effective than traditional, paper-and-pen L1 glossing. Using a pool of studies with much more statistical power and more accurate results, this meta-analysis demonstrates more precisely the degree to which CALL L1 glossing can be more…

  8. Analogical Instruction in Statistics: Implications for Social Work Educators

    ERIC Educational Resources Information Center

    Thomas, Leela

    2008-01-01

    This paper examines the use of analogies in statistics instruction. Much has been written about the difficulty social work students have with statistics. To address this concern, Glisson and Fischer (1987) called for the use of analogies. Understanding of analogical problem solving has surged in the last few decades with the integration of…

  9. Beneath the Skin: Statistics, Trust, and Status

    ERIC Educational Resources Information Center

    Smith, Richard

    2011-01-01

    Overreliance on statistics, and even faith in them--which Richard Smith in this essay calls a branch of "metricophilia"--is a common feature of research in education and in the social sciences more generally. Of course accurate statistics are important, but they often constitute essentially a powerful form of rhetoric. For purposes of analysis and…

  10. Stochastic modelling, Bayesian inference, and new in vivo measurements elucidate the debated mtDNA bottleneck mechanism

    PubMed Central

    Johnston, Iain G; Burgstaller, Joerg P; Havlicek, Vitezslav; Kolbe, Thomas; Rülicke, Thomas; Brem, Gottfried; Poulton, Jo; Jones, Nick S

    2015-01-01

    Dangerous damage to mitochondrial DNA (mtDNA) can be ameliorated during mammalian development through a highly debated mechanism called the mtDNA bottleneck. Uncertainty surrounding this process limits our ability to address inherited mtDNA diseases. We produce a new, physically motivated, generalisable theoretical model for mtDNA populations during development, allowing the first statistical comparison of proposed bottleneck mechanisms. Using approximate Bayesian computation and mouse data, we find most statistical support for a combination of binomial partitioning of mtDNAs at cell divisions and random mtDNA turnover, meaning that the debated exact magnitude of mtDNA copy number depletion is flexible. New experimental measurements from a wild-derived mtDNA pairing in mice confirm the theoretical predictions of this model. We analytically solve a mathematical description of this mechanism, computing probabilities of mtDNA disease onset, efficacy of clinical sampling strategies, and effects of potential dynamic interventions, thus developing a quantitative and experimentally-supported stochastic theory of the bottleneck. DOI: http://dx.doi.org/10.7554/eLife.07464.001 PMID:26035426
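
    The binomial-partitioning-plus-turnover mechanism favoured by the analysis can be illustrated with a toy simulation of heteroplasmy spread across cell divisions (illustrative parameter values, not the fitted model):

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_heteroplasmy(n_cells=5000, n_divisions=15, copies=1000, h0=0.3):
    """Toy model: each division binomially partitions mutant/wild-type mtDNAs
    to a daughter cell, then copy number is restored by random re-amplification."""
    mutant = np.full(n_cells, int(h0 * copies))
    for _ in range(n_divisions):
        # Binomial partitioning: each molecule passed to the daughter w.p. 1/2.
        kept_mut = rng.binomial(mutant, 0.5)
        kept_wt = rng.binomial(copies - mutant, 0.5)
        total = kept_mut + kept_wt
        # Random turnover/amplification back to the original copy number.
        h = np.where(total > 0, kept_mut / np.maximum(total, 1), 0.0)
        mutant = rng.binomial(copies, h)
    heteroplasmy = mutant / copies
    return heteroplasmy.mean(), heteroplasmy.var()

print(simulate_heteroplasmy())
```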

  11. Brain Cancer—Patient Version

    Cancer.gov

    Brain cancer refers to growths of malignant cells in tissues of the brain. Tumors that start in the brain are called primary brain tumors. Tumors that spread to the brain are called metastatic brain tumors. Start here to find information on brain cancer treatment, research, and statistics.

  12. Quantitative trait nucleotide analysis using Bayesian model selection.

    PubMed

    Blangero, John; Goring, Harald H H; Kent, Jack W; Williams, Jeff T; Peterson, Charles P; Almasy, Laura; Dyer, Thomas D

    2005-10-01

    Although much attention has been given to statistical genetic methods for the initial localization and fine mapping of quantitative trait loci (QTLs), little methodological work has been done to date on the problem of statistically identifying the most likely functional polymorphisms using sequence data. In this paper we provide a general statistical genetic framework, called Bayesian quantitative trait nucleotide (BQTN) analysis, for assessing the likely functional status of genetic variants. The approach requires the initial enumeration of all genetic variants in a set of resequenced individuals. These polymorphisms are then typed in a large number of individuals (potentially in families), and marker variation is related to quantitative phenotypic variation using Bayesian model selection and averaging. For each sequence variant a posterior probability of effect is obtained and can be used to prioritize additional molecular functional experiments. An example of this quantitative nucleotide analysis is provided using the GAW12 simulated data. The results show that the BQTN method may be useful for choosing the most likely functional variants within a gene (or set of genes). We also include instructions on how to use our computer program, SOLAR, for association analysis and BQTN analysis.

  13. Frogs Exploit Statistical Regularities in Noisy Acoustic Scenes to Solve Cocktail-Party-like Problems.

    PubMed

    Lee, Norman; Ward, Jessica L; Vélez, Alejandro; Micheyl, Christophe; Bee, Mark A

    2017-03-06

    Noise is a ubiquitous source of errors in all forms of communication [1]. Noise-induced errors in speech communication, for example, make it difficult for humans to converse in noisy social settings, a challenge aptly named the "cocktail party problem" [2]. Many nonhuman animals also communicate acoustically in noisy social groups and thus face biologically analogous problems [3]. However, we know little about how the perceptual systems of receivers are evolutionarily adapted to avoid the costs of noise-induced errors in communication. In this study of Cope's gray treefrog (Hyla chrysoscelis; Hylidae), we investigated whether receivers exploit a potential statistical regularity present in noisy acoustic scenes to reduce errors in signal recognition and discrimination. We developed an anatomical/physiological model of the peripheral auditory system to show that temporal correlation in amplitude fluctuations across the frequency spectrum ("comodulation") [4-6] is a feature of the noise generated by large breeding choruses of sexually advertising males. In four psychophysical experiments, we investigated whether females exploit comodulation in background noise to mitigate noise-induced errors in evolutionarily critical mate-choice decisions. Subjects experienced fewer errors in recognizing conspecific calls and in selecting the calls of high-quality mates in the presence of simulated chorus noise that was comodulated. These data show unequivocally, and for the first time, that exploiting statistical regularities present in noisy acoustic scenes is an important biological strategy for solving cocktail-party-like problems in nonhuman animal communication. Copyright © 2017 Elsevier Ltd. All rights reserved.

  14. Delayed Majority Game with Heterogeneous Learning Speeds for Financial Markets

    NASA Astrophysics Data System (ADS)

    Yoshimura, Yushi; Yamada, Kenta

    There are two well-known statistical laws, so-called stylized facts, in financial markets. One is the fat tail, where the distribution of price returns has a power-law tail. The other is volatility clustering, in which the autocorrelation function of absolute price returns decays as a power law. In order to understand the relationships between the stylized facts and dealers' behaviors, we constructed a new agent-based model based on the grand canonical minority game (GCMG) and the Giardina-Bouchaud (GB) model. The recovery of the stylized facts by the GCMG and the GB model lacks robustness. Therefore, building on the GCMG and the GB model, we develop a new model that reproduces the stylized facts robustly. Furthermore, we find that heterogeneity in the learning speeds of agents is important for reproducing the stylized facts.
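
    Both stylized facts can be checked on any simulated or empirical return series along the following lines; a minimal sketch with a Hill-style tail-index estimate and the autocorrelation of absolute returns (illustrative only, not the model code):

```python
import numpy as np

def hill_tail_exponent(returns, tail_fraction=0.05):
    """Hill estimator of the power-law tail index of |returns|."""
    x = np.sort(np.abs(np.asarray(returns, dtype=float)))[::-1]
    k = max(int(tail_fraction * x.size), 2)
    tail = x[:k]
    return 1.0 / np.mean(np.log(tail[:-1] / tail[-1]))

def abs_return_autocorr(returns, max_lag=100):
    """Autocorrelation of |returns|; slow decay indicates volatility clustering."""
    a = np.abs(np.asarray(returns, dtype=float))
    a = a - a.mean()
    denom = np.dot(a, a)
    return np.array([np.dot(a[:-lag], a[lag:]) / denom
                     for lag in range(1, max_lag + 1)])

# Example with Gaussian returns (no fat tail, no clustering expected):
r = np.random.default_rng(3).standard_normal(20000)
print(hill_tail_exponent(r), abs_return_autocorr(r, 5))
```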

  15. Torsion of DNA modeled as a heterogeneous fluctuating rod

    NASA Astrophysics Data System (ADS)

    Argudo, David; Purohit, Prashant K.

    2014-01-01

    We discuss the statistical mechanics of a heterogeneous elastic rod with bending, twisting and stretching. Our model goes beyond earlier works where only homogeneous rods were considered in the limit of high forces and long lengths. Our methods allow us to consider shorter fluctuating rods for which boundary conditions can play an important role. We use our theory to study structural transitions in torsionally constrained DNA where there is coexistence of states with different effective properties. In particular, we examine whether a newly discovered left-handed DNA conformation called L-DNA is a mixture of two known states. We also use our model to investigate the mechanical effects of the binding of small molecules to DNA. For both these applications we make experimentally falsifiable predictions.

  16. The Extended Erlang-Truncated Exponential distribution: Properties and application to rainfall data.

    PubMed

    Okorie, I E; Akpanta, A C; Ohakwe, J; Chikezie, D C

    2017-06-01

    The Erlang-Truncated Exponential (ETE) distribution is modified and the new lifetime distribution is called the Extended Erlang-Truncated Exponential (EETE) distribution. Some statistical and reliability properties of the new distribution are given, and maximum likelihood is proposed for estimating the model parameters. The usefulness and flexibility of the EETE distribution are illustrated with an uncensored data set, and its fit is compared with that of the ETE and three other three-parameter distributions. Results based on the minimized log-likelihood ([Formula: see text]), the Akaike information criterion (AIC), the Bayesian information criterion (BIC) and the generalized Cramér-von Mises [Formula: see text] statistic show that the EETE distribution provides a more reasonable fit than the competing distributions.

  17. Evaluating climate models: Should we use weather or climate observations?

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Oglesby, Robert J; Erickson III, David J

    2009-12-01

    Calling the numerical models that we use for simulations of climate change 'climate models' is a bit of a misnomer. These 'general circulation models' (GCMs, AKA global climate models) and their cousins the 'regional climate models' (RCMs) are actually physically-based weather simulators. That is, these models simulate, either globally or locally, daily weather patterns in response to some change in forcing or boundary condition. These simulated weather patterns are then aggregated into climate statistics, very much as we aggregate observations into 'real climate statistics'. Traditionally, the output of GCMs has been evaluated using climate statistics, as opposed to their ability to simulate realistic daily weather observations. At the coarse global scale this may be a reasonable approach; however, as RCMs downscale to increasingly higher resolutions, the conjunction between weather and climate becomes more problematic. We present results from a series of present-day climate simulations using the WRF ARW for domains that cover North America, much of Latin America, and South Asia. The basic domains are at a 12 km resolution, but several inner domains at 4 km have also been simulated. These include regions of complex topography in Mexico, Colombia, Peru, and Sri Lanka, as well as a region of low topography and fairly homogeneous land surface type (the U.S. Great Plains). Model evaluations are performed using standard climate analyses (e.g., reanalyses; NCDC data) but also using time series of daily station observations. Preliminary results suggest little difference in the assessment of long-term mean quantities, but the variability on seasonal and interannual timescales is better described. Furthermore, the value added by using daily weather observations as an evaluation tool increases with the model resolution.

  18. Factors contributing to academic achievement: a Bayesian structure equation modelling study

    NASA Astrophysics Data System (ADS)

    Payandeh Najafabadi, Amir T.; Omidi Najafabadi, Maryam; Farid-Rohani, Mohammad Reza

    2013-06-01

    In Iran, high school graduates enter university after taking a very difficult entrance exam called the Konkoor. Therefore, only the top-performing students are admitted by universities to continue their bachelor's education in statistics. Surprisingly, most of these students fall into one or more of the following categories: (1) they do not succeed in their education despite their excellent performance on the Konkoor and in high school; (2) they graduate with a grade point average (GPA) that is considerably lower than their high school GPA; (3) they continue their master's education in majors other than statistics; or (4) they try to find jobs unrelated to statistics. This article employs a well-known and powerful statistical technique, Bayesian structural equation modelling (SEM), to study the academic success of recent graduates who studied statistics at Shahid Beheshti University in Iran. This research (i) treated academic success as a latent variable, measured by GPA and other indicators of academic success (see below) for students in the target population; (ii) employed Bayesian SEM, which works properly for small sample sizes and ordinal variables; (iii) developed, drawing on the literature, five main factors affecting academic success; and (iv) considered several standard psychological tests and measured characteristics such as 'self-esteem' and 'anxiety'. We then study the impact of these factors on the academic success of the target population. Six factors that positively affect student academic success were identified, in the following order of relative impact (from greatest to least): 'Teaching-Evaluation', 'Learner', 'Environment', 'Family', 'Curriculum' and 'Teaching Knowledge'.

  19. Differential Impact and Use of a Telehealth Intervention by Persons with MS or SCI.

    PubMed

    Mercier, Hannah W; Ni, Pensheng; Houlihan, Bethlyn V; Jette, Alan M

    2015-11-01

    The objective of this study was to compare outcomes and patterns of engaging with a telehealth intervention (CareCall) by adult wheelchair users with severe mobility limitations with a diagnosis of multiple sclerosis (MS) or spinal cord injury (SCI). The design of this study is a secondary analysis from a pilot randomized controlled trial with 106 participants with SCI and 36 participants with MS. General linear model results showed that an interaction between baseline depression score and study group significantly predicted reduced depression at 6 mos for subjects with both diagnoses (P = 0.01). For those with MS, CareCall increased participants' physical independence (P < 0.001). No statistically significant differences in skin integrity were found between study groups for subjects with either diagnosis. All participants were similarly satisfied with CareCall, although those with MS engaged in almost double the amount of calls per person than those with SCI (P = 0.005). Those with SCI missed more calls (P < 0.001) and required more extensive support from a nurse (P = 0.006) than those with MS. An interactive telephone intervention was effective in reducing depression in adult wheelchair users with either MS or SCI, and in increasing health care access and physical independence for those with a diagnosis of MS. Future research should aim to enhance the efficacy of such an intervention for participants with SCI.

  20. Lattice QCD Thermodynamics and RHIC-BES Particle Production within Generic Nonextensive Statistics

    NASA Astrophysics Data System (ADS)

    Tawfik, Abdel Nasser

    2018-05-01

    The current status of implementing Tsallis (nonextensive) statistics in high-energy physics is briefly reviewed. The remarkably low freeze-out temperature, which apparently fails to reproduce first-principles lattice QCD thermodynamics and the measured particle ratios, is discussed. The present work suggests a novel interpretation of the so-called "Tsallis temperature". It is proposed that the low Tsallis temperature is due to incomplete implementation of the Tsallis algebra, through exponential and logarithmic functions, in high-energy particle production. Substituting the Tsallis algebra into the grand-canonical partition function of the hadron resonance gas model does not seem to assure full incorporation of nonextensivity or correlations in that model. The statistics describing the phase-space volume, the number of states and the possible changes in the elementary cells should instead be modified to reflect the interacting, correlated subsystems of which the phase space consists. Alternatively, two asymptotic properties, each associated with a scaling function, are utilized to classify a generalized entropy for such a system with a large ensemble (produced particles) and strong correlations. Both scaling exponents define equivalence classes for all interacting and noninteracting systems and unambiguously characterize any statistical system in its thermodynamic limit. We conclude that the nature of lattice QCD simulations is apparently extensive and accordingly Boltzmann-Gibbs statistics is fully fulfilled. Furthermore, we find that the ratios of various particle yields at the extreme high and extreme low energies of RHIC-BES are likely nonextensive but not necessarily of Tsallis type.
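
    For orientation, the standard Tsallis entropy with nonextensivity parameter q (a textbook definition, not taken from this paper) is

```latex
\[
  S_q \;=\; k_B\,\frac{1 - \sum_i p_i^{\,q}}{q - 1},
  \qquad
  \lim_{q \to 1} S_q \;=\; -\,k_B \sum_i p_i \ln p_i ,
\]
```

    so Boltzmann-Gibbs statistics is recovered in the limit q → 1, which is the extensive behaviour the abstract attributes to lattice QCD.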

  1. Why Bayesian Psychologists Should Change the Way They Use the Bayes Factor.

    PubMed

    Hoijtink, Herbert; van Kooten, Pascal; Hulsker, Koenraad

    2016-01-01

    The discussion following Bem's ( 2011 ) psi research highlights that applications of the Bayes factor in psychological research are not without problems. The first problem is the omission to translate subjective prior knowledge into subjective prior distributions. In the words of Savage ( 1961 ): "they make the Bayesian omelet without breaking the Bayesian egg." The second problem occurs if the Bayesian egg is not broken: the omission to choose default prior distributions such that the ensuing inferences are well calibrated. The third problem is the adherence to inadequate rules for the interpretation of the size of the Bayes factor. The current paper will elaborate these problems and show how to avoid them using the basic hypotheses and statistical model used in the first experiment described in Bem ( 2011 ). It will be argued that a thorough investigation of these problems in the context of more encompassing hypotheses and statistical models is called for if Bayesian psychologists want to add a well-founded Bayes factor to the tool kit of psychological researchers.

  2. A Comparison of the Performance of Advanced Statistical Techniques for the Refinement of Day-ahead and Longer NWP-based Wind Power Forecasts

    NASA Astrophysics Data System (ADS)

    Zack, J. W.

    2015-12-01

    Predictions from Numerical Weather Prediction (NWP) models are the foundation for wind power forecasts for day-ahead and longer forecast horizons. The NWP models directly produce three-dimensional wind forecasts on their respective computational grids, which can be interpolated to the location and time of interest. However, these direct predictions typically contain significant systematic errors ("biases"). This is due to a variety of factors, including the limited space-time resolution of the NWP models and shortcomings in the models' representation of physical processes. It has become common practice to attempt to improve the raw NWP forecasts by statistically adjusting them through a procedure that is widely known as Model Output Statistics (MOS). The challenge is to identify complex patterns of systematic errors and then use this knowledge to adjust the NWP predictions. The MOS-based improvements are the basis for much of the value added by commercial wind power forecast providers. There is an enormous number of statistical approaches that can be used to generate the MOS adjustments to the raw NWP forecasts. In order to obtain insight into the potential value of some of the newer and more sophisticated statistical techniques, often referred to as "machine learning methods", a MOS-method comparison experiment has been performed for wind power generation facilities in six wind resource areas of California. The underlying NWP models that provided the raw forecasts were the two primary operational models of the US National Weather Service: the GFS and NAM models. The focus was on 1- and 2-day-ahead forecasts of hourly wind-based generation. The statistical methods evaluated included: (1) screening multiple linear regression, which served as a baseline method, (2) artificial neural networks, (3) a decision-tree approach called random forests, (4) gradient boosted regression based upon a decision-tree algorithm, (5) support vector regression and (6) the analog ensemble, which is a case-matching scheme. The presentation will provide (1) an overview of each method and the experimental design, (2) performance comparisons based on standard metrics such as bias, MAE and RMSE, (3) a summary of the performance characteristics of each approach and (4) a preview of further experiments to be conducted.
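
    A stripped-down analogue of such a MOS comparison, using scikit-learn estimators in place of the operational tools and a synthetic feature matrix standing in for the NWP predictors, might look like this (illustrative only, not the experiment described above):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(4)
X = rng.normal(size=(2000, 6))                      # hypothetical NWP predictors
y = X[:, 0] * 3 + np.sin(X[:, 1]) + rng.normal(scale=0.5, size=2000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "linear regression": LinearRegression(),
    "neural network": MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000),
    "random forest": RandomForestRegressor(n_estimators=200),
    "gradient boosting": GradientBoostingRegressor(),
    "support vector regression": SVR(),
}

for name, model in models.items():
    err = model.fit(X_tr, y_tr).predict(X_te) - y_te
    print(f"{name:26s} bias={err.mean():+.3f} "
          f"MAE={np.abs(err).mean():.3f} RMSE={np.sqrt((err**2).mean()):.3f}")
```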

  3. Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses

    PubMed Central

    Bayzid, Md Shamsuzzoha; Mirarab, Siavash; Boussau, Bastien; Warnow, Tandy

    2015-01-01

    Because biological processes can result in different loci having different evolutionary histories, species tree estimation requires multiple loci from across multiple genomes. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause for gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called "summary methods". Because summary methods are generally fast (and much faster than more complicated coalescent-based methods that co-estimate gene trees and species trees), they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate in biologically realistic conditions. Mirarab et al. (Science 2014) presented the "statistical binning" technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple heuristic to evaluate "combinability" and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomic pipeline does not have the desirable property of being statistically consistent. We show that weighting the re-calculated gene trees by the bin sizes makes statistical binning statistically consistent under the multispecies coalescent, and maintains the good empirical performance. Thus, "weighted statistical binning" enables highly accurate genome-scale species tree estimation, and is also statistically consistent under the multi-species coalescent model. New data used in this study are available at DOI: http://dx.doi.org/10.6084/m9.figshare.1411146, and the software is available at https://github.com/smirarab/binning. PMID:26086579
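
    The weighting step itself is simple to express in code. The sketch below replicates each bin's re-estimated gene tree in proportion to the number of genes in the bin before the trees are handed to a summary method; the Newick strings and bin sizes are hypothetical, and this is only an illustration of the weighting idea, not the authors' pipeline.

```python
def weight_binned_trees(bin_trees, bin_sizes):
    """Replicate each bin's re-estimated gene tree in proportion to the
    number of genes in the bin, so a downstream summary method sees the
    bin-size weighting (the 'weighted statistical binning' idea)."""
    weighted = []
    for tree, size in zip(bin_trees, bin_sizes):
        weighted.extend([tree] * size)
    return weighted

# Hypothetical Newick strings standing in for re-estimated bin trees.
trees = ["((A,B),(C,D));", "((A,C),(B,D));"]
print(weight_binned_trees(trees, [3, 1]))
```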

  4. Change detection in the dynamics of an intracellular protein synthesis model using nonlinear Kalman filtering.

    PubMed

    Rigatos, Gerasimos G; Rigatou, Efthymia G; Djida, Jean Daniel

    2015-10-01

    A method for early diagnosis of parametric changes in intracellular protein synthesis models (e.g. the p53 protein - mdm2 inhibitor model) is developed with the use of a nonlinear Kalman Filtering approach (Derivative-free nonlinear Kalman Filter) and of statistical change detection methods. The intracellular protein synthesis dynamic model is described by a set of coupled nonlinear differential equations. It is shown that such a dynamical system satisfies differential flatness properties, which allows it to be transformed, through a change of variables (a diffeomorphism), into the so-called linear canonical form. For the linearized equivalent of the dynamical system, state estimation can be performed using the Kalman Filter recursion. Moreover, by applying an inverse transformation based on the previous diffeomorphism, it also becomes possible to obtain estimates of the state variables of the initial nonlinear model. By comparing the output of the Kalman Filter (which is assumed to correspond to the undistorted dynamical model) with measurements obtained from the monitored protein synthesis system, a sequence of differences (residuals) is obtained. The statistical processing of the residuals with the use of χ² change detection tests can provide indications, within specific confidence intervals, of parametric changes in the considered biological system and consequently of the appearance of specific diseases (e.g. malignancies).
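
    A minimal sketch of the residual-based change test described above: the normalized sum of squared residuals is referred to a chi-square distribution, and a parametric change is flagged when the statistic leaves the confidence interval. The residual sequences, noise level, and thresholds are invented for illustration and do not come from the p53-mdm2 model.

```python
import numpy as np
from scipy.stats import chi2

def chi_square_change_test(residuals, sigma, alpha=0.01):
    """Flag a parametric change when the normalized sum of squared
    residuals leaves the chi-square confidence interval."""
    r = np.asarray(residuals)
    stat = np.sum((r / sigma) ** 2)          # ~ chi2(n) if the model is unchanged
    lo, hi = chi2.ppf([alpha / 2, 1 - alpha / 2], df=r.size)
    return stat, not (lo <= stat <= hi)

# Illustrative residual sequences from a hypothetical filter run.
rng = np.random.default_rng(1)
clean = rng.normal(0, 0.1, 200)
drifted = clean + np.linspace(0, 0.5, 200)   # mimic a slow parametric change
print(chi_square_change_test(clean, 0.1))    # (statistic, change flagged?)
print(chi_square_change_test(drifted, 0.1))
```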

  5. Commentary on the statistical properties of noise and its implication on general linear models in functional near-infrared spectroscopy.

    PubMed

    Huppert, Theodore J

    2016-01-01

    Functional near-infrared spectroscopy (fNIRS) is a noninvasive neuroimaging technique that uses low levels of light to measure changes in cerebral blood oxygenation levels. In the majority of NIRS functional brain studies, analysis of this data is based on a statistical comparison of hemodynamic levels between a baseline and task or between multiple task conditions by means of a linear regression model: the so-called general linear model. Although these methods are similar to their implementation in other fields, particularly for functional magnetic resonance imaging, the specific application of these methods in fNIRS research differs in several key ways related to the sources of noise and artifacts unique to fNIRS. In this brief communication, we discuss the application of linear regression models in fNIRS and the modifications needed to generalize these models in order to deal with structured (colored) noise due to systemic physiology and noise heteroscedasticity due to motion artifacts. The objective of this work is to present an overview of these noise properties in the context of the linear model as it applies to fNIRS data. This work is aimed at explaining these mathematical issues to the general fNIRS experimental researcher but is not intended to be a complete mathematical treatment of these concepts.
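
    To make the colored-noise point concrete, the sketch below compares ordinary least squares with an AR(1)-prewhitened fit on a synthetic boxcar task regressor contaminated by autocorrelated noise. It assumes statsmodels' GLSAR as a stand-in for the generalized models discussed in the commentary; the signal and noise parameters are invented.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
# Hypothetical boxcar task regressor and an AR(1) "systemic physiology" noise term.
task = (np.arange(n) % 100 < 50).astype(float)
noise = np.zeros(n)
for t in range(1, n):
    noise[t] = 0.8 * noise[t - 1] + rng.normal(scale=0.5)
y = 1.5 * task + noise

X = sm.add_constant(task)
ols = sm.OLS(y, X).fit()                     # ignores the colored noise
ar1 = sm.GLSAR(y, X, rho=1).iterative_fit()  # prewhitens with an estimated AR(1) model
print("OLS  beta, se:", ols.params[1], ols.bse[1])
print("AR1  beta, se:", ar1.params[1], ar1.bse[1])
```

    The OLS standard error is typically too optimistic here, which is the inference problem the commentary warns about.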

  6. Weather and emotional state: a search for associations between weather and calls to telephone counseling services

    NASA Astrophysics Data System (ADS)

    Driscoll, Dennis; Stillman, Daniel

    2002-08-01

    Previous research has revealed that an emotional response to weather might be indicated by calls to telephone counseling services. We analyzed call frequency from such "hotlines", each serving communities in a major metropolitan area of the United States (Detroit, Washington DC, Dallas and Seattle). The periods examined were all, or parts of, the years 1997 and 1998. Associations with subjectively derived synoptic weather types for all cities except Seattle, as well as with individual weather elements [cloudiness (sky cover), precipitation, windspeed, and interdiurnal temperature change] for all four cities, were investigated. Analysis of variance and t-tests (significance of means) were applied to test the statistical significance of differences. Although statistically significant results were obtained in scattered instances, the total number was within that expected by chance, and there was little in the way of consistency to these associations. One clear exception was the increased call frequency during destructive (severe) weather, when there is obvious concern about the damage done by it.

  7. Stochastic analysis of a pulse-type prey-predator model

    NASA Astrophysics Data System (ADS)

    Wu, Y.; Zhu, W. Q.

    2008-04-01

    A stochastic Lotka-Volterra model, a so-called pulse-type model, for the interaction between two species and their random natural environment is investigated. The effect of a random environment is modeled as random pulse trains in the birth rate of the prey and the death rate of the predator. The generalized cell mapping method is applied to calculate the probability distributions of the species populations at a state of statistical quasistationarity. The time evolution of the population densities is studied, and the probability of the near extinction time, from an initial state to a critical state, is obtained. The effects on the ecosystem behaviors of the prey self-competition term and of the pulse mean arrival rate are also discussed. Our results indicate that the proposed pulse-type model shows obviously distinguishable characteristics from a Gaussian-type model, and may confer a significant advantage for modeling the prey-predator system under discrete environmental fluctuations.

  8. Stochastic analysis of a pulse-type prey-predator model.

    PubMed

    Wu, Y; Zhu, W Q

    2008-04-01

    A stochastic Lotka-Volterra model, a so-called pulse-type model, for the interaction between two species and their random natural environment is investigated. The effect of a random environment is modeled as random pulse trains in the birth rate of the prey and the death rate of the predator. The generalized cell mapping method is applied to calculate the probability distributions of the species populations at a state of statistical quasistationarity. The time evolution of the population densities is studied, and the probability of the near extinction time, from an initial state to a critical state, is obtained. The effects on the ecosystem behaviors of the prey self-competition term and of the pulse mean arrival rate are also discussed. Our results indicate that the proposed pulse-type model shows obviously distinguishable characteristics from a Gaussian-type model, and may confer a significant advantage for modeling the prey-predator system under discrete environmental fluctuations.
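
    For orientation, a crude Euler-integration sketch of a Lotka-Volterra system with Poisson pulse trains perturbing the prey birth rate and the predator death rate is given below. All rates and pulse parameters are invented, and the paper's actual analysis uses the generalized cell mapping method rather than direct simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, T = 0.01, 200.0
steps = int(T / dt)
a, b, c, d, k = 1.0, 0.02, 1.0, 0.02, 0.01   # hypothetical rates; k = prey self-competition
lam, amp = 0.5, 0.3                          # pulse arrival rate and amplitude

x, y = 40.0, 20.0                            # prey and predator populations
traj = np.empty((steps, 2))
for i in range(steps):
    # Random pulse trains: Poisson arrivals perturb the birth and death rates.
    pulse_x = amp * rng.poisson(lam * dt)
    pulse_y = amp * rng.poisson(lam * dt)
    dx = ((a + pulse_x) - b * y - k * x) * x * dt
    dy = (-(c + pulse_y) + d * x) * y * dt
    x, y = max(x + dx, 0.0), max(y + dy, 0.0)
    traj[i] = x, y

print("mean prey/predator populations (second half):", traj[steps // 2:].mean(axis=0))
```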

  9. System implications of the ambulance arrival-to-patient contact interval on response interval compliance.

    PubMed

    Campbell, J P; Gratton, M C; Salomone, J A; Lindholm, D J; Watson, W A

    1994-01-01

    In some emergency medical services (EMS) system designs, response time intervals are mandated with monetary penalties for noncompliance. These times are set with the goal of providing rapid, definitive patient care. The time interval of vehicle at scene-to-patient access (VSPA) has been measured, but its effect on response time interval compliance has not been determined. To determine the effect of the VSPA interval on the mandated code 1 (< 9 min) and code 2 (< 13 min) response time interval compliance in an urban, public-utility model system. A prospective, observational study used independent third-party riders to collect the VSPA interval for emergency life-threatening (code 1) and emergency nonlife-threatening (code 2) calls. The VSPA interval was added to the 9-1-1 call-to-dispatch and vehicle dispatch-to-scene intervals to determine the total time interval from call received until paramedic access to the patient (9-1-1 call-to-patient access). Compliance with the mandated response time intervals was determined using the traditional time intervals (9-1-1 call-to-scene) plus the VSPA time intervals (9-1-1 call-to-patient access). Chi-square was used to determine statistical significance. Of the 216 observed calls, 198 were matched to the traditional time intervals. Sixty-three were code 1, and 135 were code 2. Of the code 1 calls, 90.5% were compliant using 9-1-1 call-to-scene intervals dropping to 63.5% using 9-1-1 call-to-patient access intervals (p < 0.0005). Of the code 2 calls, 94.1% were compliant using 9-1-1 call-to-scene intervals. Compliance decreased to 83.7% using 9-1-1 call-to-patient access intervals (p = 0.012). The addition of the VSPA interval to the traditional time intervals impacts system response time compliance. Using 9-1-1 call-to-scene compliance as a basis for measuring system performance underestimates the time for the delivery of definitive care. This must be considered when response time interval compliances are defined.
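
    The compliance comparison can be reproduced approximately from the percentages reported above with a standard chi-square test; the counts below are reconstructed from those percentages and are therefore approximate, not the study's raw data.

```python
from scipy.stats import chi2_contingency

# Rows: 9-1-1 call-to-scene vs call-to-patient-access; columns: compliant, non-compliant.
code1 = [[57, 6], [40, 23]]    # ~90.5% vs ~63.5% of 63 code 1 calls
code2 = [[127, 8], [113, 22]]  # ~94.1% vs ~83.7% of 135 code 2 calls

for label, table in [("code 1", code1), ("code 2", code2)]:
    stat, p, dof, _ = chi2_contingency(table)
    print(f"{label}: chi2={stat:.2f}, p={p:.4f}")
```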

  10. Foreign exchange market data analysis reveals statistical features that predict price movement acceleration.

    PubMed

    Nacher, Jose C; Ochiai, Tomoshiro

    2012-05-01

    Increasingly accessible financial data allow researchers to infer market-dynamics-based laws and to propose models that are able to reproduce them. In recent years, several stylized facts have been uncovered. Here we perform an extensive analysis of foreign exchange data that leads to the unveiling of a statistical financial law. First, our findings show that, on average, volatility increases more when the price exceeds the highest (or lowest) value, i.e., breaks the resistance line. We call this the breaking-acceleration effect. Second, our results show that the probability P(T) to break the resistance line in the past time T follows a power law in both real data and theoretically simulated data. However, the probability calculated using real data is rather lower than the one obtained using a traditional Black-Scholes (BS) model. Taken together, the present analysis characterizes a different stylized fact of financial markets and shows that the market exceeds a past (historical) extreme price fewer times than expected by the BS model (the resistance effect). However, when the market does, we predict that the average volatility at that time point will be much higher. These findings indicate that any Markovian model does not faithfully capture the market dynamics.

  11. Foreign exchange market data analysis reveals statistical features that predict price movement acceleration

    NASA Astrophysics Data System (ADS)

    Nacher, Jose C.; Ochiai, Tomoshiro

    2012-05-01

    Increasingly accessible financial data allow researchers to infer market-dynamics-based laws and to propose models that are able to reproduce them. In recent years, several stylized facts have been uncovered. Here we perform an extensive analysis of foreign exchange data that leads to the unveiling of a statistical financial law. First, our findings show that, on average, volatility increases more when the price exceeds the highest (or lowest) value, i.e., breaks the resistance line. We call this the breaking-acceleration effect. Second, our results show that the probability P(T) to break the resistance line in the past time T follows a power law in both real data and theoretically simulated data. However, the probability calculated using real data is rather lower than the one obtained using a traditional Black-Scholes (BS) model. Taken together, the present analysis characterizes a different stylized fact of financial markets and shows that the market exceeds a past (historical) extreme price fewer times than expected by the BS model (the resistance effect). However, when the market does, we predict that the average volatility at that time point will be much higher. These findings indicate that any Markovian model does not faithfully capture the market dynamics.
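
    The Black-Scholes benchmark for the break probability P(T) can be sketched by simulating a geometric Brownian motion path and counting how often the price exceeds its maximum over the previous T steps. The drift, volatility, and horizons below are arbitrary illustrative choices, not calibrated to the foreign exchange data analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, mu, sigma, dt = 20000, 0.0, 0.01, 1.0
# A Black-Scholes-style (geometric Brownian motion) price path.
log_p = np.cumsum((mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * rng.normal(size=n_steps))
price = np.exp(log_p)

def break_probability(price, T):
    """Fraction of time steps at which the price exceeds its maximum over the previous T steps."""
    breaks = [price[t] > price[t - T:t].max() for t in range(T, len(price))]
    return np.mean(breaks)

for T in (10, 30, 100, 300, 1000):
    print(f"T={T:5d}  P(T)={break_probability(price, T):.4f}")
```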

  12. Relating triggering processes in lab experiments with earthquakes.

    NASA Astrophysics Data System (ADS)

    Baro Urbea, J.; Davidsen, J.; Kwiatek, G.; Charalampidou, E. M.; Goebel, T.; Stanchits, S. A.; Vives, E.; Dresen, G.

    2016-12-01

    Statistical relations such as Gutenberg-Richter's, Omori-Utsu's and the productivity of aftershocks were first observed in seismology, but are also common to other physical phenomena exhibiting avalanche dynamics such as solar flares, rock fracture, structural phase transitions and even stock market transactions. All these examples exhibit spatio-temporal correlations that can be explained as triggering processes: Instead of being activated as a response to external driving or fluctuations, some events are a consequence of previous activity. Although different plausible explanations have been suggested in each system, the ubiquity of such statistical laws remains unexplained. However, the case of rock fracture may exhibit a physical connection with seismology. It has been suggested that some features of seismology have a microscopic origin and are reproducible over a vast range of scales. This hypothesis has motivated mechanical experiments to generate artificial catalogues of earthquakes at a laboratory scale (so-called labquakes) under controlled conditions. Microscopic fractures in lab tests release elastic waves that are recorded as ultrasonic (kHz-MHz) acoustic emission (AE) events by means of piezoelectric transducers. Here, we analyse the statistics of labquakes recorded during the failure of small samples of natural rocks and artificial porous materials under different controlled compression regimes. Temporal and spatio-temporal correlations are identified in certain cases. Specifically, we distinguish between the background and triggered events, revealing some differences in the statistical properties. We fit the data to statistical models of seismicity. As a particular case, we explore the branching process approach simplified in the Epidemic Type Aftershock Sequence (ETAS) model. We evaluate the empirical spatio-temporal kernel of the model and investigate the physical origins of triggering. Our analysis of the focal mechanisms implies that the occurrence of the empirical laws extends well beyond purely frictional sliding events, in contrast to what is often assumed.

  13. Evolutionary neural networks for anomaly detection based on the behavior of a program.

    PubMed

    Han, Sang-Jun; Cho, Sung-Bae

    2006-06-01

    The process of learning the behavior of a given program by using machine-learning techniques (based on system-call audit data) is effective for detecting intrusions. Rule learning, neural networks, statistics, and hidden Markov models (HMMs) are some of the representative methods for intrusion detection. Among them, neural networks are known for good performance in learning system-call sequences. In order to apply this knowledge to real-world problems successfully, it is important to determine the structures and weights of the neural networks that model these call sequences. However, finding the appropriate structures requires very long time periods because there are no suitable analytical solutions. In this paper, a novel intrusion-detection technique based on evolutionary neural networks (ENNs) is proposed. One advantage of using ENNs is that it takes less time to obtain superior neural networks than when using conventional approaches. This is because they discover the structures and weights of the neural networks simultaneously. Experimental results with the 1999 Defense Advanced Research Projects Agency (DARPA) Intrusion Detection Evaluation (IDEVAL) data confirm that ENNs are promising tools for intrusion detection.

  14. Advances in the meta-analysis of heterogeneous clinical trials I: The inverse variance heterogeneity model.

    PubMed

    Doi, Suhail A R; Barendregt, Jan J; Khan, Shahjahan; Thalib, Lukman; Williams, Gail M

    2015-11-01

    This article examines an improved alternative to the random effects (RE) model for meta-analysis of heterogeneous studies. It is shown that the known issues of underestimation of the statistical error and spuriously overconfident estimates with the RE model can be resolved by the use of an estimator under the fixed effect model assumption with a quasi-likelihood based variance structure - the IVhet model. Extensive simulations confirm that this estimator retains a correct coverage probability and a lower observed variance than the RE model estimator, regardless of heterogeneity. When the proposed IVhet method is applied to the controversial meta-analysis of intravenous magnesium for the prevention of mortality after myocardial infarction, the pooled OR is 1.01 (95% CI 0.71-1.46) which not only favors the larger studies but also indicates more uncertainty around the point estimate. In comparison, under the RE model the pooled OR is 0.71 (95% CI 0.57-0.89) which, given the simulation results, reflects underestimation of the statistical error. Given the compelling evidence generated, we recommend that the IVhet model replace both the FE and RE models. To facilitate this, it has been implemented into free meta-analysis software called MetaXL which can be downloaded from www.epigear.com. Copyright © 2015 Elsevier Inc. All rights reserved.
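
    A compact sketch of the IVhet idea as I read it from the abstract: fixed-effect (inverse-variance) weights for the pooled point estimate, combined with a variance that is inflated by a between-study component. The tau-squared estimator and the example numbers are assumptions made for illustration; consult the original paper or MetaXL for the exact formulation.

```python
import numpy as np

def ivhet(theta, v):
    """Sketch of an IVhet-style pooled estimate: inverse-variance weights for
    the point estimate, with a variance inflated by a DerSimonian-Laird-style
    tau^2 (an assumption about the formulation, not the authors' code)."""
    theta, v = np.asarray(theta, float), np.asarray(v, float)
    w = 1.0 / v
    pooled = np.sum(w * theta) / np.sum(w)
    # DerSimonian-Laird between-study variance estimate.
    q = np.sum(w * (theta - pooled) ** 2)
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (len(theta) - 1)) / c)
    var = np.sum((w / np.sum(w)) ** 2 * (v + tau2))
    return pooled, np.sqrt(var)

# Illustrative log-odds ratios and within-study variances (not the magnesium data).
est, se = ivhet([-0.3, 0.1, 0.05, -0.6, 0.02], [0.04, 0.01, 0.02, 0.09, 0.005])
print(f"pooled={est:.3f}, 95% CI = ({est - 1.96 * se:.3f}, {est + 1.96 * se:.3f})")
```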

  15. The Environmental Data Book: A Guide to Statistics on the Environment and Development.

    ERIC Educational Resources Information Center

    Sheram, Katherine

    This book presents statistics on countries with populations of more than 1 million related to the quality of the environment, economic development, and how each is affected by the other. Sometimes called indicators, the statistics are measures of environmental, economic, and social conditions in developing and industrial countries. The book is…

  16. Tri-Center Analysis: Determining Measures of Trichotomous Central Tendency for the Parametric Analysis of Tri-Squared Test Results

    ERIC Educational Resources Information Center

    Osler, James Edward

    2014-01-01

    This monograph provides an epistemological rationale for the design of a novel post hoc statistical measure called "Tri-Center Analysis". This new statistic is designed to analyze the post hoc outcomes of the Tri-Squared Test. In Tri-Center Analysis, trichotomous parametric inferential statistical measures are calculated from…

  17. Modeling missing data in knowledge space theory.

    PubMed

    de Chiusole, Debora; Stefanutti, Luca; Anselmi, Pasquale; Robusto, Egidio

    2015-12-01

    Missing data are a well-known issue in statistical inference, because some responses may be missing, even when data are collected carefully. The problem that arises in these cases is how to deal with missing data. In this article, the missingness is analyzed in knowledge space theory, and in particular when the basic local independence model (BLIM) is applied to the data. Two extensions of the BLIM to missing data are proposed: The former, called ignorable missing BLIM (IMBLIM), assumes that missing data are missing completely at random; the latter, called missing BLIM (MissBLIM), introduces specific dependencies of the missing data on the knowledge states, thus assuming that the missing data are missing not at random. The IMBLIM and the MissBLIM modeled the missingness in a satisfactory way, in both a simulation study and an empirical application, depending on the process that generates the missingness: If the missing data-generating process is of type missing completely at random, then either IMBLIM or MissBLIM provide adequate fit to the data. However, if the pattern of missingness is functionally dependent upon unobservable features of the data (e.g., missing answers are more likely to be wrong), then only a correctly specified model of the missingness distribution provides an adequate fit to the data. (c) 2015 APA, all rights reserved.

  18. PRANAS: A New Platform for Retinal Analysis and Simulation.

    PubMed

    Cessac, Bruno; Kornprobst, Pierre; Kraria, Selim; Nasser, Hassan; Pamplona, Daniela; Portelli, Geoffrey; Viéville, Thierry

    2017-01-01

    The retina encodes visual scenes by trains of action potentials that are sent to the brain via the optic nerve. In this paper, we describe new free-access, user-end software that allows this coding to be better understood. It is called PRANAS (https://pranas.inria.fr), standing for Platform for Retinal ANalysis And Simulation. PRANAS targets neuroscientists and modelers by providing a unique set of retina-related tools. PRANAS integrates a retina simulator allowing large-scale simulations while keeping a strong biological plausibility, and a toolbox for the analysis of spike train population statistics. The statistical method (entropy maximization under constraints) takes into account both spatial and temporal correlations as constraints, allowing analysis of the effects of memory on statistics. PRANAS also integrates a tool that computes and represents receptive fields in 3D (time-space). All these tools are accessible through a friendly graphical user interface. The most CPU-costly of them have been implemented to run in parallel.

  19. Non-arbitrage in financial markets: A Bayesian approach for verification

    NASA Astrophysics Data System (ADS)

    Cerezetti, F. V.; Stern, Julio Michael

    2012-10-01

    The concept of non-arbitrage plays an essential role in finance theory. Under certain regularity conditions, the Fundamental Theorem of Asset Pricing states that, in non-arbitrage markets, prices of financial instruments are martingale processes. In this theoretical framework, the analysis of the statistical distributions of financial assets can assist in understanding how participants behave in the markets, and may or may not engender arbitrage conditions. Assuming an underlying Variance Gamma statistical model, this study aims to test, using the FBST - Full Bayesian Significance Test, if there is a relevant price difference between essentially the same financial asset traded at two distinct locations. Specifically, we investigate and compare the behavior of call options on the BOVESPA Index traded at (a) the Equities Segment and (b) the Derivatives Segment of BM&FBovespa. Our results seem to point out significant statistical differences. To what extent this evidence is actually the expression of perennial arbitrage opportunities is still an open question.

  20. On Fluctuations of Eigenvalues of Random Band Matrices

    NASA Astrophysics Data System (ADS)

    Shcherbina, M.

    2015-10-01

    We consider the fluctuations of linear eigenvalue statistics of random band matrices whose entries have the form with i.i.d. possessing the th moment, where the function u has a finite support, so that M has only nonzero diagonals. The parameter b (called the bandwidth) is assumed to grow with n in a way such that. Without any additional assumptions on the growth of b we prove a CLT for linear eigenvalue statistics for a rather wide class of test functions. Thus we improve and generalize the results of the previous papers (Jana et al., arXiv:1412.2445; Li et al. Random Matrices 2:04, 2013), where the CLT was proven under the assumption. Moreover, we develop a method which allows one to automatically prove the CLT for linear eigenvalue statistics of smooth test functions for almost all classical models of random matrix theory: deformed Wigner and sample covariance matrices, sparse matrices, diluted random matrices, matrices with heavy tails, etc.

  1. Plasma Cell Neoplasms (Including Multiple Myeloma)—Patient Version

    Cancer.gov

    Plasma cell neoplasms occur when abnormal plasma cells form cancerous tumors. When there is only one tumor, the disease is called a plasmacytoma. When there are multiple tumors, it is called multiple myeloma. Start here to find information on plasma cell neoplasms treatment, research, and statistics.

  2. 78 FR 17469 - Government Securities: Call for Large Position Reports

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-03-21

    ... DEPARTMENT OF THE TREASURY Government Securities: Call for Large Position Reports AGENCY: Office... Reserve Bank of New York, Government Securities Dealer Statistics Unit, 4th Floor, 33 Liberty Street, New... Eidemiller, or Kevin Hawkins; Government Securities Regulations Staff, Department of the Treasury, at 202-504...

  3. On the cold denaturation of globular proteins

    NASA Astrophysics Data System (ADS)

    Ascolese, Eduardo; Graziano, Giuseppe

    2008-12-01

    The recent finding that yeast frataxin shows, at pH 7.0, cold denaturation at 274 K and hot denaturation at 303 K [A. Pastore, S.R. Martin, A. Politou, K.C. Kondapalli, T. Stemmler, P.A. Temussi, J. Am. Chem. Soc. 129 (2007) 5374] calls for a deeper rationalization of the molecular mechanisms stabilizing-destabilizing the native state of globular proteins. It is shown that the statistical thermodynamic model originally developed by Ikegami can reproduce in a more-than-qualitative manner the two conformational transitions of yeast frataxin, providing important clues on their molecular origin.

  4. An Artificial Intelligence Approach to Analyzing Student Errors in Statistics.

    ERIC Educational Resources Information Center

    Sebrechts, Marc M.; Schooler, Lael J.

    1987-01-01

    Describes the development of an artificial intelligence system called GIDE that analyzes student errors in statistics problems by inferring the students' intentions. Learning strategies involved in problem solving are discussed and the inclusion of goal structures is explained. (LRW)

  5. Statistical Analysis of Q-matrix Based Diagnostic Classification Models

    PubMed Central

    Chen, Yunxiao; Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang

    2014-01-01

    Diagnostic classification models have recently gained prominence in educational assessment, psychiatric evaluation, and many other disciplines. Central to the model specification is the so-called Q-matrix that provides a qualitative specification of the item-attribute relationship. In this paper, we develop theories on the identifiability for the Q-matrix under the DINA and the DINO models. We further propose an estimation procedure for the Q-matrix through the regularized maximum likelihood. The applicability of this procedure is not limited to the DINA or the DINO model and it can be applied to essentially all Q-matrix based diagnostic classification models. Simulation studies are conducted to illustrate its performance. Furthermore, two case studies are presented. The first case is a data set on fraction subtraction (educational application) and the second case is a subsample of the National Epidemiological Survey on Alcohol and Related Conditions concerning the social anxiety disorder (psychiatric application). PMID:26294801
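
    To make the role of the Q-matrix concrete, here is a small sketch of the DINA response model, in which an examinee answers an item correctly with probability 1 - slip if they master every attribute the Q-matrix requires for that item, and with the guessing probability otherwise. The Q-matrix, attribute profiles, and slip/guess values are hypothetical.

```python
import numpy as np

def dina_probs(alpha, Q, slip, guess):
    """P(correct) under the DINA model: eta = 1 iff the respondent masters
    every attribute the Q-matrix requires for the item."""
    alpha, Q = np.asarray(alpha), np.asarray(Q)
    eta = np.all(alpha[:, None, :] >= Q[None, :, :], axis=2)   # respondents x items
    return np.where(eta, 1 - np.asarray(slip), np.asarray(guess))

# Hypothetical 2-attribute example: 3 respondents, 3 items.
Q = [[1, 0], [0, 1], [1, 1]]          # item-attribute requirements
alpha = [[0, 0], [1, 0], [1, 1]]      # respondent attribute profiles
print(dina_probs(alpha, Q, slip=[0.1, 0.1, 0.2], guess=[0.2, 0.25, 0.1]))
```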

  6. Probabilistic atlas and geometric variability estimation to drive tissue segmentation.

    PubMed

    Xu, Hao; Thirion, Bertrand; Allassonnière, Stéphanie

    2014-09-10

    Computerized anatomical atlases play an important role in medical image analysis. While an atlas usually refers to a standard or mean image, also called a template, which presumably represents a given population well, it is not enough to characterize the observed population in detail. A template image should be learned jointly with the geometric variability of the shapes represented in the observations. These two quantities will, in the sequel, form the atlas of the corresponding population. The geometric variability is modeled as deformations of the template image so that it fits the observations. In this paper, we provide a detailed analysis of a new generative statistical model based on dense deformable templates that represents several tissue types observed in medical images. Our atlas contains both an estimation of probability maps of each tissue (called a class) and the deformation metric. We use a stochastic algorithm for the estimation of the probabilistic atlas given a dataset. This atlas is then used in an atlas-based segmentation method to segment new images. Experiments are shown on brain T1 MRI datasets. Copyright © 2014 John Wiley & Sons, Ltd.

  7. BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU.

    PubMed

    Luo, Ruibang; Wong, Yiu-Lun; Law, Wai-Chun; Lee, Lap-Kei; Cheung, Jeanno; Liu, Chi-Man; Lam, Tak-Wah

    2014-01-01

    This paper reports an integrated solution, called BALSA, for the secondary analysis of next generation sequencing data; it exploits the computational power of the GPU and intricate memory management to give a fast and accurate analysis. From raw reads to variants (including SNPs and Indels), BALSA, using just a single computing node with a commodity GPU board, takes 5.5 h to process 50-fold whole genome sequencing (∼750 million 100 bp paired-end reads), or just 25 min for 210-fold whole exome sequencing. BALSA's speed is rooted in its parallel algorithms to effectively exploit a GPU to speed up processes like alignment, realignment and statistical testing. BALSA incorporates a 16-genotype model to support the calling of SNPs and Indels and achieves competitive variant calling accuracy and sensitivity when compared to the ensemble of six popular variant callers. BALSA also supports efficient identification of somatic SNVs and CNVs; experiments showed that BALSA recovers all the previously validated somatic SNVs and CNVs, and it is more sensitive for somatic Indel detection. BALSA outputs variants in VCF format. A pileup-like SNAPSHOT format, while maintaining the same fidelity as BAM in variant calling, enables efficient storage and indexing, and facilitates the App development of downstream analyses. BALSA is available at: http://sourceforge.net/p/balsa.

  8. Space-filling designs for computer experiments: A review

    DOE PAGES

    Joseph, V. Roshan

    2016-01-29

    Improving the quality of a product/process using a computer simulator is a much less expensive option than real physical testing. However, simulation using computationally intensive computer models can be time consuming and, therefore, directly doing the optimization on the computer simulator can be infeasible. Experimental design and statistical modeling techniques can be used for overcoming this problem. This article reviews experimental designs known as space-filling designs that are suitable for computer simulations. In the review, a special emphasis is given to a recently developed space-filling design called maximum projection design. Furthermore, its advantages are illustrated using a simulation conducted for optimizing a milling process.
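
    As a simple point of reference, the sketch below generates a generic space-filling design (a Latin hypercube) and scales it to hypothetical factor ranges for a milling process. It uses scipy's quasi-Monte Carlo module and is not an implementation of the maximum projection design highlighted in the review.

```python
import numpy as np
from scipy.stats import qmc

# A basic space-filling design: Latin hypercube in 3 factors, 10 runs.
sampler = qmc.LatinHypercube(d=3, seed=0)
unit_design = sampler.sample(n=10)

# Scale to hypothetical milling factor ranges: spindle speed, feed, depth of cut.
lower, upper = [500, 0.05, 0.5], [2000, 0.30, 3.0]
design = qmc.scale(unit_design, lower, upper)
print(np.round(design, 3))
```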

  9. The role of communication and imitation in limit order markets

    NASA Astrophysics Data System (ADS)

    Tedeschi, G.; Iori, G.; Gallegati, M.

    2009-10-01

    In this paper we develop an order-driven market model with heterogeneous traders that imitate each other on different network structures. We assess how imitation among otherwise noise traders can give rise to well-known stylized facts such as fat tails and volatility clustering. We examine the impact of communication and imitation on the statistical properties of prices and order flows when changing the networks' structure, and show that the imitation of a given, fixed agent, called a "guru", can generate clustering of volatility in the model. We also find a positive correlation between volatility and bid-ask spread, and between fat-tailed fluctuations in asset prices and gap sizes in the order book.

  10. Space-filling designs for computer experiments: A review

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Joseph, V. Roshan

    Improving the quality of a product/process using a computer simulator is a much less expensive option than real physical testing. However, simulation using computationally intensive computer models can be time consuming and, therefore, directly doing the optimization on the computer simulator can be infeasible. Experimental design and statistical modeling techniques can be used for overcoming this problem. This article reviews experimental designs known as space-filling designs that are suitable for computer simulations. In the review, a special emphasis is given to a recently developed space-filling design called maximum projection design. Furthermore, its advantages are illustrated using a simulation conducted for optimizing a milling process.

  11. Ranking Theory and Conditional Reasoning.

    PubMed

    Skovgaard-Olsen, Niels

    2016-05-01

    Ranking theory is a formal epistemology that has been developed in over 600 pages in Spohn's recent book The Laws of Belief, which aims to provide a normative account of the dynamics of beliefs that presents an alternative to current probabilistic approaches. It has long been received in the AI community, but it has not yet found application in experimental psychology. The purpose of this paper is to derive clear, quantitative predictions by exploiting a parallel between ranking theory and a statistical model called logistic regression. This approach is illustrated by the development of a model for the conditional inference task using Spohn's (2013) ranking theoretic approach to conditionals. Copyright © 2015 Cognitive Science Society, Inc.

  12. Normalization, bias correction, and peak calling for ChIP-seq

    PubMed Central

    Diaz, Aaron; Park, Kiyoub; Lim, Daniel A.; Song, Jun S.

    2012-01-01

    Next-generation sequencing is rapidly transforming our ability to profile the transcriptional, genetic, and epigenetic states of a cell. In particular, sequencing DNA from the immunoprecipitation of protein-DNA complexes (ChIP-seq) and methylated DNA (MeDIP-seq) can reveal the locations of protein binding sites and epigenetic modifications. These approaches contain numerous biases which may significantly influence the interpretation of the resulting data. Rigorous computational methods for detecting and removing such biases are still lacking. Also, multi-sample normalization still remains an important open problem. This theoretical paper systematically characterizes the biases and properties of ChIP-seq data by comparing 62 separate publicly available datasets, using rigorous statistical models and signal processing techniques. Statistical methods for separating ChIP-seq signal from background noise, as well as correcting enrichment test statistics for sequence-dependent and sonication biases, are presented. Our method effectively separates reads into signal and background components prior to normalization, improving the signal-to-noise ratio. Moreover, most peak callers currently use a generic null model which suffers from low specificity at the sensitivity level requisite for detecting subtle, but true, ChIP enrichment. The proposed method of determining a cell type-specific null model, which accounts for cell type-specific biases, is shown to be capable of achieving a lower false discovery rate at a given significance threshold than current methods. PMID:22499706

  13. Statistical inference of dynamic resting-state functional connectivity using hierarchical observation modeling.

    PubMed

    Sojoudi, Alireza; Goodyear, Bradley G

    2016-12-01

    Spontaneous fluctuations of blood-oxygenation level-dependent functional magnetic resonance imaging (BOLD fMRI) signals are highly synchronous between brain regions that serve similar functions. This provides a means to investigate functional networks; however, most analysis techniques assume functional connections are constant over time. This may be problematic in the case of neurological disease, where functional connections may be highly variable. Recently, several methods have been proposed to determine moment-to-moment changes in the strength of functional connections over an imaging session (so-called dynamic connectivity). Here a novel analysis framework based on a hierarchical observation modeling approach was proposed, to permit statistical inference of the presence of dynamic connectivity. A two-level linear model composed of overlapping sliding windows of fMRI signals, incorporating the fact that overlapping windows are not independent, was described. To test this approach, datasets were synthesized whereby functional connectivity was either constant (significant or insignificant) or modulated by an external input. The method successfully determines the statistical significance of a functional connection in phase with the modulation, and it exhibits greater sensitivity and specificity in detecting regions with variable connectivity, when compared with sliding-window correlation analysis. For real data, this technique possesses greater reproducibility and provides a more discriminative estimate of dynamic connectivity than sliding-window correlation analysis. Hum Brain Mapp 37:4566-4580, 2016. © 2016 Wiley Periodicals, Inc.
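
    The baseline the proposed framework is compared against, sliding-window correlation, is easy to sketch: the code below computes windowed Pearson correlations for two synthetic time series whose coupling switches on halfway through. Window length, step size, and signals are invented, and this is the comparator rather than the hierarchical observation model itself.

```python
import numpy as np

def sliding_window_corr(x, y, window, step=1):
    """Baseline dynamic-connectivity estimate: Pearson correlation of two
    BOLD time series in overlapping sliding windows."""
    starts = range(0, len(x) - window + 1, step)
    return np.array([np.corrcoef(x[s:s + window], y[s:s + window])[0, 1] for s in starts])

rng = np.random.default_rng(0)
t = np.arange(400)
shared = np.sin(2 * np.pi * t / 80) * (t > 200)        # coupling switched on halfway
x = shared + rng.normal(scale=1.0, size=t.size)
y = shared + rng.normal(scale=1.0, size=t.size)
print(sliding_window_corr(x, y, window=60, step=20).round(2))
```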

  14. A Statistical Method to Distinguish Functional Brain Networks

    PubMed Central

    Fujita, André; Vidal, Maciel C.; Takahashi, Daniel Y.

    2017-01-01

    One major problem in neuroscience is the comparison of functional brain networks of different populations, e.g., distinguishing the networks of controls and patients. Traditional algorithms are based on search for isomorphism between networks, assuming that they are deterministic. However, biological networks present randomness that cannot be well modeled by those algorithms. For instance, functional brain networks of distinct subjects of the same population can be different due to individual characteristics. Moreover, networks of subjects from different populations can be generated through the same stochastic process. Thus, a better hypothesis is that networks are generated by random processes. In this case, subjects from the same group are samples from the same random process, whereas subjects from different groups are generated by distinct processes. Using this idea, we developed a statistical test called ANOGVA to test whether two or more populations of graphs are generated by the same random graph model. Our simulations' results demonstrate that we can precisely control the rate of false positives and that the test is powerful to discriminate random graphs generated by different models and parameters. The method also showed to be robust for unbalanced data. As an example, we applied ANOGVA to an fMRI dataset composed of controls and patients diagnosed with autism or Asperger. ANOGVA identified the cerebellar functional sub-network as statistically different between controls and autism (p < 0.001). PMID:28261045

  15. A Statistical Method to Distinguish Functional Brain Networks.

    PubMed

    Fujita, André; Vidal, Maciel C; Takahashi, Daniel Y

    2017-01-01

    One major problem in neuroscience is the comparison of functional brain networks of different populations, e.g., distinguishing the networks of controls and patients. Traditional algorithms are based on search for isomorphism between networks, assuming that they are deterministic. However, biological networks present randomness that cannot be well modeled by those algorithms. For instance, functional brain networks of distinct subjects of the same population can be different due to individual characteristics. Moreover, networks of subjects from different populations can be generated through the same stochastic process. Thus, a better hypothesis is that networks are generated by random processes. In this case, subjects from the same group are samples from the same random process, whereas subjects from different groups are generated by distinct processes. Using this idea, we developed a statistical test called ANOGVA to test whether two or more populations of graphs are generated by the same random graph model. Our simulations' results demonstrate that we can precisely control the rate of false positives and that the test is powerful to discriminate random graphs generated by different models and parameters. The method also showed to be robust for unbalanced data. As an example, we applied ANOGVA to an fMRI dataset composed of controls and patients diagnosed with autism or Asperger. ANOGVA identified the cerebellar functional sub-network as statistically different between controls and autism ( p < 0.001).

  16. A Fe K Line in GRB 970508

    NASA Astrophysics Data System (ADS)

    Protassov, R.; van Dyk, D.; Connors, A.; Kashyap, V.; Siemiginowska, A.

    2000-12-01

    We examine the x-ray spectrum of the afterglow of GRB 970508, analyzed for Fe line emission by Piro et al (1999, ApJL, 514, L73). This is a difficult and extremely important measurement: the detection of x-ray afterglows from γ-ray bursts is at best a tricky business, relying on near-real-time satellite response to unpredictable events, and a great deal of luck in catching a burst bright enough for a useful spectral analysis. Detecting a clear atomic (or cyclotron) line in the generally smooth and featureless afterglow (or burst) emission not only gives one of the few very specific keys to the physics local to the emission region, but also provides clues to, or confirmation of, its distance (via redshift). Unfortunately, neither the likelihood ratio test nor the related F-statistic commonly used to detect spectral lines adheres to its nominal chi-square or F-distribution. Thus we begin by calibrating the F-statistic used in Piro et al (1999, ApJL, 514, L73) via a simulation study. The simulation study relies on a completely specified source model, i.e. we do Monte Carlo simulations with all model parameters fixed (so-called "parametric bootstrapping"). Second, we employ the method of posterior predictive p-values to calibrate an LRT statistic while accounting for the uncertainty in the parameters of the source model. Our analysis reveals evidence for the Fe K line.
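
    The parametric-bootstrap calibration step generalizes beyond this dataset: simulate many datasets from the fully specified null model, recompute the test statistic on each, and read the p-value off the rank of the observed statistic. The toy null model below (a flat Poisson spectrum with a single-channel excess statistic) is an invented stand-in for the fitted continuum model and F-statistic used in the paper.

```python
import numpy as np

def parametric_bootstrap_pvalue(observed_stat, simulate_null, statistic, n_sim=1000, seed=0):
    """Calibrate a test statistic whose nominal reference distribution is unreliable:
    simulate datasets from the fully specified null model and compare."""
    rng = np.random.default_rng(seed)
    sims = np.array([statistic(simulate_null(rng)) for _ in range(n_sim)])
    return (1 + np.sum(sims >= observed_stat)) / (n_sim + 1)

# Toy stand-in for a line search: null = flat Poisson spectrum in 50 channels;
# statistic = the largest single-channel excess over the null expectation.
mu = 20.0
simulate_null = lambda rng: rng.poisson(mu, size=50)
statistic = lambda counts: np.max((counts - mu) / np.sqrt(mu))

observed = statistic(np.r_[np.random.default_rng(42).poisson(mu, 49), 38])  # one boosted channel
print(parametric_bootstrap_pvalue(observed, simulate_null, statistic))
```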

  17. Effect Sizes in Gifted Education Research

    ERIC Educational Resources Information Center

    Gentry, Marcia; Peters, Scott J.

    2009-01-01

    Recent calls for reporting and interpreting effect sizes have been numerous, with the 5th edition of the "Publication Manual of the American Psychological Association" (2001) calling for the inclusion of effect sizes to interpret quantitative findings. Many top journals have required that effect sizes accompany claims of statistical significance.…

  18. Pili-taxis: Clustering of Neisseria gonorrhoeae bacteria

    NASA Astrophysics Data System (ADS)

    Taktikos, Johannes; Zaburdaev, Vasily; Biais, Nicolas; Stark, Holger; Weitz, David A.

    2012-02-01

    The first step of colonization of Neisseria gonorrhoeae bacteria, the etiological agent of gonorrhea, is the attachment to human epithelial cells. The attachment of N. gonorrhoeae bacteria to surfaces or other cells is primarily mediated by filamentous appendages, called type IV pili (Tfp). Cycles of elongation and retraction of Tfp are responsible for a common bacterial motility called twitching motility which allows the bacteria to crawl over surfaces. Experimentally, N. gonorrhoeae cells initially dispersed over a surface agglomerate into round microcolonies within hours. It is so far not known whether this clustering is driven entirely by the Tfp dynamics or if chemotactic interactions are needed. Thus, we investigate whether the agglomeration may stem solely from the pili-mediated attraction between cells. By developing a statistical model for pili-taxis, we try to explain the experimental measurements of the time evolution of the mean cluster size, number of clusters, and area fraction covered by the cells.

  19. Asteroid shape and spin statistics from convex models

    NASA Astrophysics Data System (ADS)

    Torppa, J.; Hentunen, V.-P.; Pääkkönen, P.; Kehusmaa, P.; Muinonen, K.

    2008-11-01

    We introduce techniques for characterizing convex shape models of asteroids with a small number of parameters, and apply these techniques to a set of 87 models from convex inversion. We present three different approaches for determining the overall dimensions of an asteroid. With the first technique, we measured the dimensions of the shapes in the direction of the rotation axis and in the equatorial plane, and with the two other techniques we derived the best-fit ellipsoid. We also computed the inertia matrix of the model shape to test how well it represents the target asteroid, i.e., to find indications of possible non-convex features or albedo variegation, which the convex shape model cannot reproduce. We used shape models for 87 asteroids to perform statistical analyses and to study dependencies between shape and rotation period, size, and taxonomic type. We detected correlations, but more data are required, especially on small and large objects, as well as slow and fast rotators, to reach a more thorough understanding of the dependencies. Results show, e.g., that convex models of asteroids are not that far from ellipsoids in a root-mean-square sense, even though clearly irregular features are present. We also present new spin and shape solutions for Asteroids (31) Euphrosyne, (54) Alexandra, (79) Eurynome, (93) Minerva, (130) Elektra, (376) Geometria, (471) Papagena, and (776) Berbericia. We used a so-called semi-statistical approach to obtain a set of possible spin state solutions. The number of solutions depends on the abundance of the data, which for Eurynome, Elektra, and Geometria was extensive enough for determining an unambiguous spin and shape solution. Data of Euphrosyne, on the other hand, provided a wide distribution of possible spin solutions, whereas the rest of the targets have two or three possible solutions.

  20. Measurement invariance via multigroup SEM: Issues and solutions with chi-square-difference tests.

    PubMed

    Yuan, Ke-Hai; Chan, Wai

    2016-09-01

    Multigroup structural equation modeling (SEM) plays a key role in studying measurement invariance and in group comparison. When population covariance matrices are deemed not equal across groups, the next step to substantiate measurement invariance is to see whether the sample covariance matrices in all the groups can be adequately fitted by the same factor model, called configural invariance. After configural invariance is established, cross-group equalities of factor loadings, error variances, and factor variances-covariances are then examined in sequence. With mean structures, cross-group equalities of intercepts and factor means are also examined. The established rule is that if the statistic at the current model is not significant at the level of .05, one then moves on to testing the next more restricted model using a chi-square-difference statistic. This article argues that such an established rule is unable to control either Type I or Type II errors. Analysis, an example, and Monte Carlo results show why and how chi-square-difference tests are easily misused. The fundamental issue is that chi-square-difference tests are developed under the assumption that the base model is sufficiently close to the population, and a nonsignificant chi-square statistic tells little about how good the model is. To overcome this issue, this article further proposes that null hypothesis testing in multigroup SEM be replaced by equivalence testing, which allows researchers to effectively control the size of misspecification before moving on to testing a more restricted model. R code is also provided to facilitate the applications of equivalence testing for multigroup SEM. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
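
    For reference, the conventional chi-square-difference computation that the article critiques is shown below; the fit statistics are hypothetical. The article's point is that a nonsignificant result from this test does not by itself establish invariance, which is why equivalence testing is proposed as a replacement.

```python
from scipy.stats import chi2

def chi_square_difference(chi2_restricted, df_restricted, chi2_base, df_base):
    """Conventional nested-model test in multigroup SEM: the difference of the
    two chi-square statistics referred to a chi-square distribution with the
    difference in degrees of freedom."""
    d_stat = chi2_restricted - chi2_base
    d_df = df_restricted - df_base
    return d_stat, d_df, chi2.sf(d_stat, d_df)

# Hypothetical fit statistics for a configural model and a metric-invariance model.
print(chi_square_difference(chi2_restricted=312.4, df_restricted=180,
                            chi2_base=295.1, df_base=170))
```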

  1. Seasonal ENSO forecasting: Where does a simple model stand amongst other operational ENSO models?

    NASA Astrophysics Data System (ADS)

    Halide, Halmar

    2017-01-01

    We apply a simple linear multiple regression model called IndOzy for predicting ENSO up to 7 seasonal lead times. The model uses five predictors of past seasonal Niño 3.4 ENSO indices derived from chaos theory and is rolling-validated to give a one-step-ahead forecast. The model skill was evaluated against data from the season of May-June-July (MJJ) 2003 to November-December-January (NDJ) 2015/2016. Three skill measures, Pearson correlation, RMSE, and Euclidean distance, were used for forecast verification. The skill of this simple model was then compared to those of combined statistical and dynamical models compiled at the IRI (International Research Institute) website. It was found that the simple model was capable of producing a useful ENSO prediction only up to 3 seasonal leads, while the IRI statistical and dynamical models were still useful up to 4 and 6 seasonal leads, respectively. Even with its short-range seasonal prediction skill, however, the simple model still has the potential to give ENSO-derived tailored products such as probabilistic measures of precipitation and air temperature. Both meteorological conditions affect the presence of wild-land fire hot-spots in Sumatera and Kalimantan. It is suggested that, to improve its long-range skill, the simple IndOzy model needs to incorporate a nonlinear model such as an artificial neural network technique.
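
    A minimal sketch of an IndOzy-style setup: a multiple linear regression on the five most recent seasonal values, refit on a rolling window to produce one-step-ahead forecasts that are then scored by correlation and RMSE. The synthetic AR(1) series, window length, and lag structure are assumptions for illustration, not the model's actual predictors or data.

```python
import numpy as np

def rolling_lagged_forecast(series, n_lags=5, train_len=60):
    """One-step-ahead forecasts from a linear regression on the previous
    n_lags seasonal values, refit on a rolling window."""
    preds, actual = [], []
    for t in range(train_len + n_lags, len(series)):
        X = np.array([series[i - n_lags:i] for i in range(t - train_len, t)])
        y = series[t - train_len:t]
        coef, *_ = np.linalg.lstsq(np.c_[np.ones(len(X)), X], y, rcond=None)
        preds.append(coef[0] + coef[1:] @ series[t - n_lags:t])
        actual.append(series[t])
    preds, actual = np.array(preds), np.array(actual)
    rmse = np.sqrt(np.mean((preds - actual) ** 2))
    return np.corrcoef(preds, actual)[0, 1], rmse

# Synthetic stand-in for a seasonal Nino 3.4 index (an AR(1)-like series).
rng = np.random.default_rng(0)
x = np.zeros(200)
for t in range(1, 200):
    x[t] = 0.85 * x[t - 1] + rng.normal(scale=0.3)
print(rolling_lagged_forecast(x))  # (correlation, RMSE)
```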

  2. Modeling time-to-event (survival) data using classification tree analysis.

    PubMed

    Linden, Ariel; Yarnold, Paul R

    2017-12-01

    Time to the occurrence of an event is often studied in health research. Survival analysis differs from other designs in that follow-up times for individuals who do not experience the event by the end of the study (called censored) are accounted for in the analysis. Cox regression is the standard method for analysing censored data, but the assumptions required of these models are easily violated. In this paper, we introduce classification tree analysis (CTA) as a flexible alternative for modelling censored data. Classification tree analysis is a "decision-tree"-like classification model that provides parsimonious, transparent (ie, easy to visually display and interpret) decision rules that maximize predictive accuracy, derives exact P values via permutation tests, and evaluates model cross-generalizability. Using empirical data, we identify all statistically valid, reproducible, longitudinally consistent, and cross-generalizable CTA survival models and then compare their predictive accuracy to estimates derived via Cox regression and an unadjusted naïve model. Model performance is assessed using integrated Brier scores and a comparison between estimated survival curves. The Cox regression model best predicts average incidence of the outcome over time, whereas CTA survival models best predict either relatively high, or low, incidence of the outcome over time. Classification tree analysis survival models offer many advantages over Cox regression, such as explicit maximization of predictive accuracy, parsimony, statistical robustness, and transparency. Therefore, researchers interested in accurate prognoses and clear decision rules should consider developing models using the CTA-survival framework. © 2017 John Wiley & Sons, Ltd.

  3. Turnover intentions in a call center: The role of emotional dissonance, job resources, and job satisfaction.

    PubMed

    Zito, Margherita; Emanuel, Federica; Molino, Monica; Cortese, Claudio Giovanni; Ghislieri, Chiara; Colombo, Lara

    2018-01-01

    Turnover intentions refer to employees' intent to leave the organization and, within call centers, they can be influenced by factors such as relational variables or the perception of the quality of working life, which can be affected by emotional dissonance. This specific job demand to express emotions not felt is peculiar to call centers, and can influence job satisfaction and turnover intentions, a crucial problem in these working contexts. This study aims to detect, within the theoretical framework of the Job Demands-Resources Model, the role of emotional dissonance (job demand) and two resources, job autonomy and supervisors' support, in the perception of job satisfaction and turnover intentions in an Italian call center. The study involved 318 call center agents of an Italian Telecommunication Company. Data analysis was first performed with descriptive statistics in SPSS 22. A path analysis was then performed through LISREL 8.72 and tested both direct and indirect effects. Results suggest the role of resources in fostering job satisfaction and in decreasing turnover intentions. Emotional dissonance reveals a negative relation with job satisfaction and a positive relation with turnover. Moreover, job satisfaction is negatively related with turnover and mediates the relationship between job resources and turnover. This study contributes to extending the knowledge about the variables influencing turnover intentions, a crucial problem among call centers. Moreover, the study identifies theoretical considerations and practical implications to promote well-being among call center employees. To foster job satisfaction and reduce turnover intentions, in fact, it is important to make resources available, but also to offer specific training programs to make employees and supervisors aware of the consequences of emotional dissonance.

  4. Turnover intentions in a call center: The role of emotional dissonance, job resources, and job satisfaction

    PubMed Central

    Zito, Margherita; Molino, Monica; Cortese, Claudio Giovanni; Ghislieri, Chiara; Colombo, Lara

    2018-01-01

    Background Turnover intentions refer to employees’ intent to leave the organization and, within call centers, they can be influenced by factors such as relational variables or the perception of the quality of working life, which can be affected by emotional dissonance. This specific job demand to express emotions not felt is peculiar to call centers, and can influence job satisfaction and turnover intentions, a crucial problem in these working contexts. This study aims to detect, within the theoretical framework of the Job Demands-Resources Model, the role of emotional dissonance (job demand) and two resources, job autonomy and supervisors’ support, in the perception of job satisfaction and turnover intentions in an Italian call center. Method The study involved 318 call center agents of an Italian Telecommunication Company. Data analysis was first performed with descriptive statistics in SPSS 22. A path analysis was then performed through LISREL 8.72 and tested both direct and indirect effects. Results Results suggest the role of resources in fostering job satisfaction and in decreasing turnover intentions. Emotional dissonance reveals a negative relation with job satisfaction and a positive relation with turnover. Moreover, job satisfaction is negatively related with turnover and mediates the relationship between job resources and turnover. Conclusion This study contributes to extending the knowledge about the variables influencing turnover intentions, a crucial problem among call centers. Moreover, the study identifies theoretical considerations and practical implications to promote well-being among call center employees. To foster job satisfaction and reduce turnover intentions, in fact, it is important to make resources available, but also to offer specific training programs to make employees and supervisors aware of the consequences of emotional dissonance. PMID:29401507

  5. Bio-inspired computational heuristics to study Lane-Emden systems arising in astrophysics model.

    PubMed

    Ahmad, Iftikhar; Raja, Muhammad Asif Zahoor; Bilal, Muhammad; Ashraf, Farooq

    2016-01-01

    This study reports novel hybrid computational methods for the solution of nonlinear singular Lane-Emden type differential equations arising in astrophysics models, exploiting the strength of unsupervised neural network models and stochastic optimization techniques. In the scheme, the neural network, a sub-field of the larger area called soft computing, is exploited to model the equation in an unsupervised manner. The proposed approximate solutions of the higher-order ordinary differential equation are calculated with the weights of neural networks trained with a genetic algorithm, and with pattern search hybridized with sequential quadratic programming for rapid local convergence. The results of the proposed solvers for the nonlinear singular systems are in good agreement with the standard solutions. Accuracy and convergence of the design schemes are demonstrated by statistical performance measures based on a sufficiently large number of independent runs.
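
    As a rough illustration of the idea (a neural-network trial solution whose residual is minimized by a stochastic optimizer), the sketch below treats the Lane-Emden equation y'' + (2/x) y' + y^m = 0 with y(0) = 1, y'(0) = 0 for index m = 5, where a closed-form solution exists. It substitutes scipy's differential evolution for the authors' genetic-algorithm / pattern-search / SQP hybrid; the network size, collocation grid and optimizer settings are illustrative assumptions, not the paper's configuration.

      # Unsupervised neural-network sketch for the Lane-Emden equation, trained by
      # a population-based optimizer (differential evolution) instead of the
      # authors' GA / pattern-search / SQP hybrid.
      import numpy as np
      from scipy.optimize import differential_evolution

      m = 5                                   # Lane-Emden index with a known exact solution
      x = np.linspace(0.05, 2.0, 40)          # collocation points (avoid the x=0 singularity)
      h = 1e-3                                # finite-difference step for derivatives
      H = 6                                   # hidden units in a single tanh layer

      def net(xv, w):
          w1, b1, w2 = w[:H], w[H:2*H], w[2*H:3*H]
          return np.tanh(np.outer(xv, w1) + b1) @ w2

      def trial(xv, w):
          # y_hat(x) = 1 + x^2 * N(x) satisfies y(0)=1 and y'(0)=0 by construction
          return 1.0 + xv**2 * net(xv, w)

      def residual_loss(w):
          y = trial(x, w)
          yp = (trial(x + h, w) - trial(x - h, w)) / (2 * h)
          ypp = (trial(x + h, w) - 2 * y + trial(x - h, w)) / h**2
          # sign/abs form keeps y^m real if y dips below zero during the search
          return np.mean((ypp + 2.0 / x * yp + np.sign(y) * np.abs(y)**m) ** 2)

      bounds = [(-3, 3)] * (3 * H)
      result = differential_evolution(residual_loss, bounds, seed=1,
                                      maxiter=300, tol=1e-8, polish=True)

      exact = (1.0 + x**2 / 3.0) ** -0.5      # closed-form solution for m = 5
      print("residual loss:", result.fun)
      print("max abs error vs exact:", np.max(np.abs(trial(x, result.x) - exact)))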

  6. Automatic classification of animal vocalizations

    NASA Astrophysics Data System (ADS)

    Clemins, Patrick J.

    2005-11-01

    Bioacoustics, the study of animal vocalizations, has begun to use increasingly sophisticated analysis techniques in recent years. Some common tasks in bioacoustics are repertoire determination, call detection, individual identification, stress detection, and behavior correlation. Each research study, however, uses a wide variety of different measured variables, called features, and classification systems to accomplish these tasks. The well-established field of human speech processing has developed a number of different techniques to perform many of the aforementioned bioacoustics tasks. Mel-frequency cepstral coefficients (MFCCs) and perceptual linear prediction (PLP) coefficients are two popular feature sets. The hidden Markov model (HMM), a statistical model similar to a finite automaton, is the most commonly used supervised classification model and is capable of modeling both temporal and spectral variations. This research designs a framework that applies models from human speech processing for bioacoustic analysis tasks. The development of the generalized perceptual linear prediction (gPLP) feature extraction model is one of the more important novel contributions of the framework. Perceptual information from the species under study can be incorporated into the gPLP feature extraction model to represent the vocalizations as the animals might perceive them. By including this perceptual information and modifying parameters of the HMM classification system, this framework can be applied to a wide range of species. The effectiveness of the framework is shown by analyzing African elephant and beluga whale vocalizations. The features extracted from the African elephant data are used as input to a supervised classification system and compared to results from traditional statistical tests. The gPLP features extracted from the beluga whale data are used in an unsupervised classification system and the results are compared to labels assigned by experts. The development of a framework from which to build animal vocalization classifiers will provide bioacoustics researchers with a consistent platform to analyze and classify vocalizations. A common framework will also allow studies to compare results across species and institutions. In addition, the use of automated classification techniques can speed analysis and uncover behavioral correlations not readily apparent using traditional techniques.
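
    A minimal version of the generic feature-plus-HMM pipeline sketched above can be put together with librosa and hmmlearn. Note this uses plain MFCCs rather than the gPLP features developed in the thesis, and synthetic tones stand in for real recordings, so it is only an illustration of the workflow.

      # MFCC + hidden Markov model call classifier: one Gaussian HMM per call type,
      # classification by maximum log-likelihood.  Synthetic "calls" replace real audio.
      import numpy as np
      import librosa
      from hmmlearn import hmm

      sr = 16000
      rng = np.random.default_rng(0)

      def toy_call(f0):
          # 1-second synthetic call: a tone with slight frequency jitter plus noise
          t = np.arange(sr) / sr
          return (np.sin(2 * np.pi * (f0 + 20 * rng.standard_normal()) * t)
                  + 0.1 * rng.standard_normal(sr))

      def mfcc_features(y):
          # frames as rows: (n_frames, n_mfcc)
          return librosa.feature.mfcc(y=y.astype(np.float32), sr=sr, n_mfcc=13).T

      train_sets = {"low_call": [toy_call(300) for _ in range(5)],
                    "high_call": [toy_call(1200) for _ in range(5)]}

      models = {}
      for label, signals in train_sets.items():
          feats = [mfcc_features(s) for s in signals]
          X, lengths = np.vstack(feats), [len(f) for f in feats]
          m = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=30)
          m.fit(X, lengths)                  # Baum-Welch training per call type
          models[label] = m

      test = mfcc_features(toy_call(1200))
      # classify by the HMM giving the highest log-likelihood
      print(max(models, key=lambda lab: models[lab].score(test)))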

  7. Pressure balance inconsistency exhibited in a statistical model of magnetospheric plasma

    NASA Astrophysics Data System (ADS)

    Garner, T. W.; Wolf, R. A.; Spiro, R. W.; Thomsen, M. F.; Korth, H.

    2003-08-01

    While quantitative theories of plasma flow from the magnetotail to the inner magnetosphere typically assume adiabatic convection, it has long been understood that these convection models tend to overestimate the plasma pressure in the inner magnetosphere. This phenomenon is called the pressure crisis or the pressure balance inconsistency. In order to analyze it in a new and more detailed manner we utilize an empirical model of the proton and electron distribution functions in the near-Earth plasma sheet (-50 RE < X < -10 RE), which uses the [1989] magnetic field model and a plasma sheet representation based upon several previously published statistical studies. We compare our results to a statistically derived particle distribution function at geosynchronous orbit. In this analysis the particle distribution function is characterized by the isotropic energy invariant λ = EV2/3, where E is the particle's kinetic energy and V is the magnetic flux tube volume. The energy invariant is conserved in guiding center drift under the assumption of strong, elastic pitch angle scattering. If, in addition, loss is negligible, the phase space density f(λ) is also conserved along the same path. The statistical model indicates that f(λ, ?) is approximately independent of X for X ≤ -35 RE but decreases with increasing X for X ≥ -35 RE. The tailward gradient of f(λ, ?) might be attributed to gradient/curvature drift for large isotropic energy invariants but not for small invariants. The tailward gradient of the distribution function indicates a violation of the adiabatic drift condition in the plasma sheet. It also confirms the existence of a "number crisis" in addition to the pressure crisis. In addition, plasma sheet pressure gradients, when crossed with the gradient of flux tube volume computed from the [1989] magnetic field model, indicate Region 1 currents on the dawn and dusk sides of the outer plasma sheet.

  8. Risk estimation using probability machines

    PubMed Central

    2014-01-01

    Background Logistic regression has been the de facto, and often the only, model used in the description and analysis of relationships between a binary outcome and observed features. It is widely used to obtain the conditional probabilities of the outcome given predictors, as well as predictor effect size estimates using conditional odds ratios. Results We show how statistical learning machines for binary outcomes, provably consistent for the nonparametric regression problem, can be used to provide both consistent conditional probability estimation and conditional effect size estimates. Effect size estimates from learning machines leverage our understanding of counterfactual arguments central to the interpretation of such estimates. We show that, if the data generating model is logistic, we can recover accurate probability predictions and effect size estimates with nearly the same efficiency as a correct logistic model, both for main effects and interactions. We also propose a method using learning machines to scan for possible interaction effects quickly and efficiently. Simulations using random forest probability machines are presented. Conclusions The models we propose make no assumptions about the data structure, and capture the patterns in the data by just specifying the predictors involved and not any particular model structure. So they do not run the same risks of model mis-specification and the resultant estimation biases as a logistic model. This methodology, which we call a “risk machine”, will share properties from the statistical machine that it is derived from. PMID:24581306

  9. Modelling of binary logistic regression for obesity among secondary students in a rural area of Kedah

    NASA Astrophysics Data System (ADS)

    Kamaruddin, Ainur Amira; Ali, Zalila; Noor, Norlida Mohd.; Baharum, Adam; Ahmad, Wan Muhamad Amir W.

    2014-07-01

    Logistic regression analysis examines the influence of various factors on a dichotomous outcome by estimating the probability of the event's occurrence. Logistic regression, also called a logit model, is a statistical procedure used to model dichotomous outcomes. In the logit model the log odds of the dichotomous outcome is modeled as a linear combination of the predictor variables. The log odds ratio in logistic regression provides a description of the probabilistic relationship of the variables and the outcome. In conducting logistic regression, selection procedures are used to select important predictor variables; diagnostics are used to check that model assumptions hold, including independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers; and a test statistic is calculated to determine the aptness of the model. This study used the binary logistic regression model to investigate overweight and obesity among rural secondary school students on the basis of their demographic profile, medical history, diet and lifestyle. The results indicate that overweight and obesity of students are influenced by obesity in the family and the interaction between a student's ethnicity and routine meals intake. The odds of a student being overweight and obese are higher for a student having a family history of obesity and for a non-Malay student who frequently takes routine meals as compared to a Malay student.
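
    A hedged sketch of this kind of logit model, with a family-obesity main effect and an ethnicity-by-routine-meals interaction as reported above, might look as follows in statsmodels; the variable names and the simulated data are placeholders rather than the study's dataset.

      # Binary logit with a main effect and an interaction term; odds ratios from exp(coef).
      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(42)
      n = 500
      df = pd.DataFrame({
          "family_obesity": rng.integers(0, 2, n),
          "non_malay": rng.integers(0, 2, n),
          "routine_meals": rng.integers(0, 2, n),
      })
      logit_p = (-1.5 + 0.9 * df.family_obesity
                 + 0.8 * df.non_malay * df.routine_meals)
      df["overweight"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

      # log odds modeled as a linear combination of predictors (the logit model)
      fit = smf.logit("overweight ~ family_obesity + non_malay * routine_meals",
                      data=df).fit(disp=0)
      print(fit.summary())
      print("odds ratios:\n", np.exp(fit.params))   # e.g. OR for family history of obesity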

  10. Risk estimation using probability machines.

    PubMed

    Dasgupta, Abhijit; Szymczak, Silke; Moore, Jason H; Bailey-Wilson, Joan E; Malley, James D

    2014-03-01

    Logistic regression has been the de facto, and often the only, model used in the description and analysis of relationships between a binary outcome and observed features. It is widely used to obtain the conditional probabilities of the outcome given predictors, as well as predictor effect size estimates using conditional odds ratios. We show how statistical learning machines for binary outcomes, provably consistent for the nonparametric regression problem, can be used to provide both consistent conditional probability estimation and conditional effect size estimates. Effect size estimates from learning machines leverage our understanding of counterfactual arguments central to the interpretation of such estimates. We show that, if the data generating model is logistic, we can recover accurate probability predictions and effect size estimates with nearly the same efficiency as a correct logistic model, both for main effects and interactions. We also propose a method using learning machines to scan for possible interaction effects quickly and efficiently. Simulations using random forest probability machines are presented. The models we propose make no assumptions about the data structure, and capture the patterns in the data by just specifying the predictors involved and not any particular model structure. So they do not run the same risks of model mis-specification and the resultant estimation biases as a logistic model. This methodology, which we call a "risk machine", will share properties from the statistical machine that it is derived from.
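
    The core idea described above (a consistent nonparametric probability estimator used both for conditional risks and for counterfactual effect sizes) can be sketched with a random forest acting as the probability machine. The snippet below is a simulation-based illustration, not the authors' implementation; all tuning values are arbitrary assumptions.

      # Random-forest "probability machine": estimate conditional risk with predict_proba,
      # then obtain a counterfactual effect size for one predictor by scoring everyone
      # with that predictor set to 1 versus 0.  Data are simulated.
      import numpy as np
      from sklearn.ensemble import RandomForestClassifier

      rng = np.random.default_rng(7)
      n = 2000
      x1 = rng.integers(0, 2, n)          # exposure of interest
      x2 = rng.normal(size=n)             # covariate
      p = 1 / (1 + np.exp(-(-1.0 + 1.2 * x1 + 0.5 * x2)))
      y = rng.binomial(1, p)
      X = np.column_stack([x1, x2])

      rf = RandomForestClassifier(n_estimators=500, min_samples_leaf=20, random_state=0)
      rf.fit(X, y)

      X1, X0 = X.copy(), X.copy()
      X1[:, 0], X0[:, 0] = 1, 0
      risk1 = rf.predict_proba(X1)[:, 1].mean()   # counterfactual risk, exposure present
      risk0 = rf.predict_proba(X0)[:, 1].mean()   # counterfactual risk, exposure absent

      print(f"risk difference: {risk1 - risk0:.3f}")
      print(f"odds ratio:      {(risk1/(1-risk1)) / (risk0/(1-risk0)):.2f}")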

  11. A semiparametric Bayesian proportional hazards model for interval censored data with frailty effects.

    PubMed

    Henschel, Volkmar; Engel, Jutta; Hölzel, Dieter; Mansmann, Ulrich

    2009-02-10

    Multivariate analysis of interval-censored event data based on classical likelihood methods is notoriously cumbersome, and likelihood inference for models which additionally include random effects is not available at all. Existing algorithms pose problems for practical users, such as matrix inversion, slow convergence, and no assessment of statistical uncertainty. MCMC procedures combined with imputation are used to implement hierarchical models for interval-censored data within a Bayesian framework. Two examples from clinical practice demonstrate the handling of clustered interval-censored event times as well as multilayer random effects for inter-institutional quality assessment. The software developed is called survBayes and is freely available on CRAN. The proposed software supports complex analyses in many fields of clinical epidemiology as well as health services research.

  12. Functional form diagnostics for Cox's proportional hazards model.

    PubMed

    León, Larry F; Tsai, Chih-Ling

    2004-03-01

    We propose a new type of residual and an easily computed functional form test for the Cox proportional hazards model. The proposed test is a modification of the omnibus test for testing the overall fit of a parametric regression model, developed by Stute, González Manteiga, and Presedo Quindimil (1998, Journal of the American Statistical Association 93, 141-149), and is based on what we call censoring consistent residuals. In addition, we develop residual plots that can be used to identify the correct functional forms of covariates. We compare our test with the functional form test of Lin, Wei, and Ying (1993, Biometrika 80, 557-572) in a simulation study. The practical application of the proposed residuals and functional form test is illustrated using both a simulated data set and a real data set.

  13. Reinventing Biostatistics Education for Basic Scientists

    PubMed Central

    Weissgerber, Tracey L.; Garovic, Vesna D.; Milin-Lazovic, Jelena S.; Winham, Stacey J.; Obradovic, Zoran; Trzeciakowski, Jerome P.; Milic, Natasa M.

    2016-01-01

    Numerous studies demonstrating that statistical errors are common in basic science publications have led to calls to improve statistical training for basic scientists. In this article, we sought to evaluate statistical requirements for PhD training and to identify opportunities for improving biostatistics education in the basic sciences. We provide recommendations for improving statistics training for basic biomedical scientists, including: 1. Encouraging departments to require statistics training, 2. Tailoring coursework to the students’ fields of research, and 3. Developing tools and strategies to promote education and dissemination of statistical knowledge. We also provide a list of statistical considerations that should be addressed in statistics education for basic scientists. PMID:27058055

  14. The advertisement calls of Brazilian anurans: Historical review, current knowledge and future directions

    PubMed Central

    Gambale, Priscilla Guedes; de Morais, Alessandro Ribeiro; Márquez, Rafael; Bastos, Rogério Pereira

    2018-01-01

    Advertisement calls are often used as essential basic information in studies of animal behaviour, ecology, evolution, conservation, taxonomy or biodiversity inventories. Yet the description of this type of acoustic signals is far from complete, especially in tropical regions, and is frequently non-standardized or limited in information, restricting the application of bioacoustics in science. Here we conducted a scientometric review of the described advertisement calls of anuran species of Brazil, the world's richest territory in anurans, to evaluate the amount, standard and trends of the knowledge on this key life-history trait and to identify gaps and directions for future research strategies. Based on our review, 607 studies were published between 1960 and 2016 describing the calls of 719 Brazilian anuran species (68.8% of all species), a publication rate of 10.6 descriptions per year. From each of these studies, thirty-one variables were recorded and examined with descriptive and inferential statistics. In spite of an exponential rise over the last six decades in the number of studies, described calls, and quantity of published metadata, as revealed by regression models, clear shortfalls were identified with regard to anuran families, biomes, and categories of threat. More than 55% of these species belong to the two richest families, Hylidae or Leptodactylidae. The lowest percentage of species with described calls corresponds to the most diverse biomes, namely Atlantic Forest (65.1%) and Amazon (71.5%), and to the IUCN categories of threat (56.8%), relative to the less-than-threatened categories (74.3%). Moreover, only 52.3% of the species have some of their calls deposited in the main scientific sound collections. Our findings reveal remarkable knowledge gaps on advertisement calls of Brazilian anuran species, emphasizing the need for further efforts to standardize and increase the description of anuran calls for their application in studies of the behaviour, ecology, biogeography or taxonomy of the species. PMID:29381750

  15. The advertisement calls of Brazilian anurans: Historical review, current knowledge and future directions.

    PubMed

    Guerra, Vinicius; Llusia, Diego; Gambale, Priscilla Guedes; Morais, Alessandro Ribeiro de; Márquez, Rafael; Bastos, Rogério Pereira

    2018-01-01

    Advertisement calls are often used as essential basic information in studies of animal behaviour, ecology, evolution, conservation, taxonomy or biodiversity inventories. Yet the description of this type of acoustic signals is far from complete, especially in tropical regions, and is frequently non-standardized or limited in information, restricting the application of bioacoustics in science. Here we conducted a scientometric review of the described advertisement calls of anuran species of Brazil, the world's richest territory in anurans, to evaluate the amount, standard and trends of the knowledge on this key life-history trait and to identify gaps and directions for future research strategies. Based on our review, 607 studies were published between 1960 and 2016 describing the calls of 719 Brazilian anuran species (68.8% of all species), a publication rate of 10.6 descriptions per year. From each of these studies, thirty-one variables were recorded and examined with descriptive and inferential statistics. In spite of an exponential rise over the last six decades in the number of studies, described calls, and quantity of published metadata, as revealed by regression models, clear shortfalls were identified with regard to anuran families, biomes, and categories of threat. More than 55% of these species belong to the two richest families, Hylidae or Leptodactylidae. The lowest percentage of species with described calls corresponds to the most diverse biomes, namely Atlantic Forest (65.1%) and Amazon (71.5%), and to the IUCN categories of threat (56.8%), relative to the less-than-threatened categories (74.3%). Moreover, only 52.3% of the species have some of their calls deposited in the main scientific sound collections. Our findings reveal remarkable knowledge gaps on advertisement calls of Brazilian anuran species, emphasizing the need for further efforts to standardize and increase the description of anuran calls for their application in studies of the behaviour, ecology, biogeography or taxonomy of the species.

  16. A local structure model for network analysis

    DOE PAGES

    Casleton, Emily; Nordman, Daniel; Kaiser, Mark

    2017-04-01

    The statistical analysis of networks is a popular research topic with ever widening applications. Exponential random graph models (ERGMs), which specify a model through interpretable, global network features, are common for this purpose. In this study we introduce a new class of models for network analysis, called local structure graph models (LSGMs). In contrast to an ERGM, a LSGM specifies a network model through local features and allows for an interpretable and controllable local dependence structure. In particular, LSGMs are formulated by a set of full conditional distributions for each network edge, e.g., the probability of edge presence/absence, depending on neighborhoods of other edges. Additional model features are introduced to aid in specification and to help alleviate a common issue (occurring also with ERGMs) of model degeneracy. Finally, the proposed models are demonstrated on a network of tornadoes in Arkansas where a LSGM is shown to perform significantly better than a model without local dependence.
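
    The defining ingredient of such models, a full conditional distribution for each edge that depends on a neighborhood of other edges, can be illustrated with a toy Gibbs sampler in which an edge's log-odds depend on how many currently present edges share one of its endpoints. This is a generic autologistic-style sketch for intuition only, not the authors' LSGM specification; the parameters alpha and beta are arbitrary.

      # Toy network model specified through edge-level full conditionals and sampled
      # by Gibbs updates.  Local dependence: an edge's presence probability depends on
      # the states of edges sharing one of its endpoints.
      import itertools
      import numpy as np

      rng = np.random.default_rng(3)
      n_nodes = 10
      edges = list(itertools.combinations(range(n_nodes), 2))
      state = dict.fromkeys(edges, 0)              # 0/1 indicator per possible edge

      alpha, beta = -2.0, 0.15                     # baseline log-odds, local dependence strength

      def neighbors(e):
          # edges that share an endpoint with e form its local neighborhood
          return [f for f in edges if f != e and set(f) & set(e)]

      def gibbs_sweep():
          for e in edges:
              s = sum(state[f] for f in neighbors(e))
              p = 1.0 / (1.0 + np.exp(-(alpha + beta * s)))   # full conditional P(edge present)
              state[e] = int(rng.random() < p)

      for _ in range(200):                          # burn-in sweeps
          gibbs_sweep()

      density = sum(state.values()) / len(edges)
      print(f"sampled graph has {sum(state.values())} edges (density {density:.2f})")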

  17. A local structure model for network analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Casleton, Emily; Nordman, Daniel; Kaiser, Mark

    The statistical analysis of networks is a popular research topic with ever widening applications. Exponential random graph models (ERGMs), which specify a model through interpretable, global network features, are common for this purpose. In this study we introduce a new class of models for network analysis, called local structure graph models (LSGMs). In contrast to an ERGM, a LSGM specifies a network model through local features and allows for an interpretable and controllable local dependence structure. In particular, LSGMs are formulated by a set of full conditional distributions for each network edge, e.g., the probability of edge presence/absence, depending on neighborhoods of other edges. Additional model features are introduced to aid in specification and to help alleviate a common issue (occurring also with ERGMs) of model degeneracy. Finally, the proposed models are demonstrated on a network of tornadoes in Arkansas where a LSGM is shown to perform significantly better than a model without local dependence.

  18. Nonlinear Schrödinger approach to European option pricing

    NASA Astrophysics Data System (ADS)

    Wróblewski, Marcin

    2017-05-01

    This paper deals with numerical option pricing methods based on a Schrödinger model rather than the Black-Scholes model. Nonlinear Schrödinger boundary value problems seem to be alternatives to linear models which better reflect the complexity and behavior of real markets. Therefore, based on the nonlinear Schrödinger option pricing model proposed in the literature, in this paper a model augmented by external atomic potentials is proposed and numerically tested. In terms of statistical physics the developed model describes the option in analogy to a pair of two identical quantum particles occupying the same state. The proposed model is used to price European call options on a stock index. The model is calibrated using the Levenberg-Marquardt algorithm based on market data. A Runge-Kutta method is used to solve the discretized boundary value problem numerically. Numerical results are provided and discussed. It seems that our proposal more accurately models phenomena observed in the real market than do linear models.

  19. EPR paradox, quantum nonlocality and physical reality

    NASA Astrophysics Data System (ADS)

    Kupczynski, M.

    2016-03-01

    Eighty years ago Einstein, Podolsky and Rosen demonstrated that instantaneous reduction of the wave function, believed to describe completely a pair of entangled physical systems, led to the EPR paradox. The paradox disappears in the statistical interpretation of quantum mechanics (QM), according to which a wave function describes only an ensemble of identically prepared physical systems. QM predicts strong correlations between outcomes of measurements performed on different members of EPR pairs in far-away locations. Searching for an intuitive explanation of these correlations, John Bell analysed so-called local realistic hidden variable models and proved that correlations consistent with these models satisfy Bell inequalities, which are violated by some predictions of QM and by experimental data. Several different local models were constructed and inequalities proven. Some eminent physicists concluded that Nature is definitely nonlocal and that it is acting according to a law of nonlocal randomness. According to this law, perfectly random but strongly correlated events can be produced at the same time in far-away locations, and a local and causal explanation of their occurrence cannot be given. We strongly disagree with this conclusion and we prove the contrary by analysing in detail some influential finite sample proofs of Bell and CHSH inequalities and the so-called Quantum Randi Challenges. We also show how one can win the so-called Bell's game without violating locality of Nature. Nonlocal randomness is inconsistent with local quantum field theory, with the standard model of elementary particle physics and with causal laws and adaptive dynamics prevailing in the world surrounding us. The experimental violation of Bell-type inequalities does not prove the nonlocality of Nature but only confirms the contextual character of quantum observables and gives a strong argument against counterfactual definiteness and against the point of view according to which experimental outcomes are produced in an irreducibly random way.
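
    The CHSH bound that local realistic models must respect can be demonstrated with a toy Monte Carlo: outcomes are deterministic functions of a shared hidden variable, and the resulting CHSH combination never exceeds 2, whereas quantum mechanics allows up to 2√2. The response function below is one illustrative choice, not a model discussed in the paper.

      # Toy local hidden-variable simulation of the CHSH setup.
      import numpy as np

      rng = np.random.default_rng(6)
      N = 200_000
      lam = rng.uniform(0, 2 * np.pi, N)          # shared hidden variable per pair

      def outcome(angle, lam):
          # deterministic local response: +/-1 depending only on the local setting and lam
          return np.sign(np.cos(angle - lam))

      def E(a, b):
          return np.mean(outcome(a, lam) * outcome(b, lam))

      a, a2, b, b2 = 0.0, np.pi / 2, np.pi / 4, 3 * np.pi / 4
      S = E(a, b) - E(a, b2) + E(a2, b) + E(a2, b2)
      print(f"CHSH S for this local model: {S:.3f} "
            f"(local bound 2, quantum maximum {2*np.sqrt(2):.3f})")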

  20. Supersonic projectile models for asynchronous shooter localization

    NASA Astrophysics Data System (ADS)

    Kozick, Richard J.; Whipps, Gene T.; Ash, Joshua N.

    2011-06-01

    In this work we consider the localization of a gunshot using a distributed sensor network measuring time differences of arrival between a firearm's muzzle blast and the shockwave induced by a supersonic bullet. This so-called MB-SW approach is desirable because time synchronization is not required between the sensors; however, it suffers from increased computational complexity and requires knowledge of the bullet's velocity at all points along its trajectory. While the actual velocity profile of a particular gunshot is unknown, one may use a parameterized model for the velocity profile and simultaneously fit the model and localize the shooter. In this paper we study efficient solutions for the localization problem and identify deceleration models that trade off localization accuracy and computational complexity. We also develop a statistical analysis that includes bias due to mismatch between the true and assumed deceleration models and covariance due to additive noise.

  1. Applications of MIDAS regression in analysing trends in water quality

    NASA Astrophysics Data System (ADS)

    Penev, Spiridon; Leonte, Daniela; Lazarov, Zdravetz; Mann, Rob A.

    2014-04-01

    We discuss novel statistical methods in analysing trends in water quality. Such analysis uses complex data sets of different classes of variables, including water quality, hydrological and meteorological. We analyse the effect of rainfall and flow on trends in water quality utilising a flexible model called Mixed Data Sampling (MIDAS). This model arises because of the mixed frequency in the data collection. Typically, water quality variables are sampled fortnightly, whereas the rain data is sampled daily. The advantage of using MIDAS regression is in the flexible and parsimonious modelling of the influence of the rain and flow on trends in water quality variables. We discuss the model and its implementation on a data set from the Shoalhaven Supply System and Catchments in the state of New South Wales, Australia. Information criteria indicate that MIDAS modelling improves upon simplistic approaches that do not utilise the mixed data sampling nature of the data.
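
    The mechanics of a MIDAS-style regression (high-frequency rainfall entering a low-frequency water-quality equation through a parsimonious lag-weight function) can be sketched as below, using an exponential Almon weight scheme fitted by nonlinear least squares on simulated data. This is a generic illustration, not the model or data of the Shoalhaven study.

      # MIDAS-style sketch: fortnightly water-quality observations regressed on daily
      # rainfall over the preceding 14 days via exponential Almon lag weights.
      import numpy as np
      from scipy.optimize import least_squares

      rng = np.random.default_rng(11)
      n_fortnights, lag = 120, 14
      rain = rng.gamma(2.0, 3.0, size=(n_fortnights, lag))   # daily rain, most recent day first

      def almon_weights(theta1, theta2, lag):
          k = np.arange(1, lag + 1)
          w = np.exp(theta1 * k + theta2 * k**2)
          return w / w.sum()                                  # weights sum to one

      true_w = almon_weights(0.2, -0.05, lag)
      quality = 5.0 + 0.8 * rain @ true_w + rng.normal(0, 0.5, n_fortnights)

      def residuals(params):
          b0, b1, t1, t2 = params
          return quality - (b0 + b1 * rain @ almon_weights(t1, t2, lag))

      fit = least_squares(residuals, x0=[0.0, 1.0, 0.0, 0.0])
      b0, b1, t1, t2 = fit.x
      print("intercept, slope:", round(b0, 2), round(b1, 2))
      print("fitted lag weights:", np.round(almon_weights(t1, t2, lag), 3))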

  2. Statistical Teleodynamics: Toward a Theory of Emergence.

    PubMed

    Venkatasubramanian, Venkat

    2017-10-24

    The central scientific challenge of the 21st century is developing a mathematical theory of emergence that can explain and predict phenomena such as consciousness and self-awareness. The most successful research program of the 20th century, reductionism, which goes from the whole to parts, seems unable to address this challenge. This is because addressing this challenge inherently requires an opposite approach, going from parts to the whole. In addition, reductionism, by the very nature of its inquiry, typically does not concern itself with teleology or purposeful behavior. Modeling emergence, in contrast, requires the addressing of teleology. Together, these two requirements present a formidable challenge in developing a successful mathematical theory of emergence. In this article, I describe a new theory of emergence, called statistical teleodynamics, that addresses certain aspects of the general problem. Statistical teleodynamics is a mathematical framework that unifies three seemingly disparate domains (purpose-free entities in statistical mechanics, human-engineered teleological systems in systems engineering, and nature-evolved teleological systems in biology and sociology) within the same conceptual formalism. This theory rests on several key conceptual insights, the most important one being the recognition that entropy mathematically models the concept of fairness in economics and philosophy and, equivalently, the concept of robustness in systems engineering. These insights help prove that the fairest inequality of income is a log-normal distribution, which will emerge naturally at equilibrium in an ideal free market society. Similarly, the theory predicts the emergence of the three classes of network organization (exponential, scale-free, and Poisson) seen widely in a variety of domains. Statistical teleodynamics is the natural generalization of statistical thermodynamics, the most successful parts-to-whole systems theory to date, but this generalization is only a modest step toward a more comprehensive mathematical theory of emergence.

  3. Environmental statistics and optimal regulation

    NASA Astrophysics Data System (ADS)

    Sivak, David; Thomson, Matt

    2015-03-01

    The precision with which an organism can detect its environment, and the timescale for and statistics of environmental change, will affect the suitability of different strategies for regulating protein levels in response to environmental inputs. We propose a general framework--here applied to the enzymatic regulation of metabolism in response to changing nutrient concentrations--to predict the optimal regulatory strategy given the statistics of fluctuations in the environment and measurement apparatus, and the costs associated with enzyme production. We find: (i) relative convexity of enzyme expression cost and benefit influences the fitness of thresholding or graded responses; (ii) intermediate levels of measurement uncertainty call for a sophisticated Bayesian decision rule; and (iii) in dynamic contexts, intermediate levels of uncertainty call for retaining memory of the past. Statistical properties of the environment, such as variability and correlation times, set optimal biochemical parameters, such as thresholds and decay rates in signaling pathways. Our framework provides a theoretical basis for interpreting molecular signal processing algorithms and a classification scheme that organizes known regulatory strategies and may help conceptualize heretofore unknown ones.

  4. Evolution and mass extinctions as lognormal stochastic processes

    NASA Astrophysics Data System (ADS)

    Maccone, Claudio

    2014-10-01

    In a series of recent papers and in a book, this author put forward a mathematical model capable of embracing the search for extra-terrestrial intelligence (SETI), Darwinian Evolution and Human History into a single, unified statistical picture, concisely called Evo-SETI. The relevant mathematical tools are: (1) Geometric Brownian motion (GBM), the stochastic process representing evolution as the stochastic increase of the number of species living on Earth over the last 3.5 billion years. This GBM is well known in the mathematics of finances (Black-Scholes models). Its main features are that its probability density function (pdf) is a lognormal pdf, and its mean value is either an increasing or, more rarely, decreasing exponential function of time. (2) The probability distributions known as b-lognormals, i.e. lognormals starting at a certain positive instant b>0 rather than at the origin. These b-lognormals were then forced by us to have their peak value located on the exponential mean-value curve of the GBM (Peak-Locus theorem). In the framework of Darwinian Evolution, the resulting mathematical construction was shown to be what evolutionary biologists call Cladistics. (3) The (Shannon) entropy of such b-lognormals is then seen to represent the `degree of progress' reached by each living organism or by each big set of living organisms, like historic human civilizations. Having understood this fact, human history may then be cast into the language of b-lognormals that are more and more organized in time (i.e. having smaller and smaller entropy, or smaller and smaller `chaos'), and have their peaks on the increasing GBM exponential. This exponential is thus the `trend of progress' in human history. (4) All these results also match with SETI in that the statistical Drake equation (generalization of the ordinary Drake equation to encompass statistics) leads just to the lognormal distribution as the probability distribution for the number of extra-terrestrial civilizations existing in the Galaxy (as a consequence of the central limit theorem of statistics). (5) But the most striking new result is that the well-known `Molecular Clock of Evolution', namely the `constant rate of Evolution at the molecular level' as shown by Kimura's Neutral Theory of Molecular Evolution, identifies with the growth rate of the entropy of our Evo-SETI model, because they both grew linearly in time since the origin of life. (6) Furthermore, we apply our Evo-SETI model to lognormal stochastic processes other than GBMs. For instance, we provide two models for the mass extinctions that occurred in the past: (a) one based on GBMs and (b) the other based on a parabolic mean value capable of covering both the extinction and the subsequent recovery of life forms. (7) Finally, we show that the Markov & Korotayev (2007, 2008) model for Darwinian Evolution identifies with an Evo-SETI model for which the mean value of the underlying lognormal stochastic process is a cubic function of time. In conclusion: we have provided a new mathematical model capable of embracing molecular evolution, SETI and entropy into a simple set of statistical equations based upon b-lognormals and lognormal stochastic processes with arbitrary mean, of which the GBMs are the particular case of exponential growth.
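
    The building block of the model, a geometric Brownian motion whose value at any fixed time is lognormally distributed with an exponentially growing mean, is easy to simulate and check numerically; the drift, volatility and horizon below are illustrative values only.

      # Simulate geometric Brownian motion N(t) and check the lognormal/exponential-mean
      # properties that the Evo-SETI construction relies on.
      import numpy as np

      rng = np.random.default_rng(2)
      mu, sigma = 0.05, 0.20        # drift and volatility per unit time (illustrative)
      N0, T, steps, paths = 1.0, 10.0, 1000, 20000
      dt = T / steps

      # exact GBM update: N_{t+dt} = N_t * exp((mu - sigma^2/2) dt + sigma sqrt(dt) Z)
      Z = rng.standard_normal((paths, steps))
      log_increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z
      N_T = N0 * np.exp(log_increments.sum(axis=1))

      print("simulated mean E[N(T)]:      ", N_T.mean())
      print("theoretical mean N0*exp(mu*T):", N0 * np.exp(mu * T))
      # lognormality check: log N(T) should be Gaussian with these moments
      print("mean/std of log N(T):", np.log(N_T).mean(), np.log(N_T).std())
      print("theory:              ", (mu - 0.5 * sigma**2) * T, sigma * np.sqrt(T))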

  5. Red-shouldered hawk occupancy surveys in central Minnesota, USA

    USGS Publications Warehouse

    Henneman, C.; McLeod, M.A.; Andersen, D.E.

    2007-01-01

    Forest-dwelling raptors are often difficult to detect because many species occur at low density or are secretive. Broadcasting conspecific vocalizations can increase the probability of detecting forest-dwelling raptors and has been shown to be an effective method for locating raptors and assessing their relative abundance. Recent advances in statistical techniques based on presence-absence data use probabilistic arguments to derive probability of detection when it is <1 and to provide a model and likelihood-based method for estimating proportion of sites occupied. We used these maximum-likelihood models with data from red-shouldered hawk (Buteo lineatus) call-broadcast surveys conducted in central Minnesota, USA, in 1994-1995 and 2004-2005. Our objectives were to obtain estimates of occupancy and detection probability 1) over multiple sampling seasons (yr), 2) incorporating within-season time-specific detection probabilities, 3) with call type and breeding stage included as covariates in models of probability of detection, and 4) with different sampling strategies. We visited individual survey locations 2-9 times per year, and estimates of both probability of detection (range = 0.28-0.54) and site occupancy (range = 0.81-0.97) varied among years. Detection probability was affected by inclusion of a within-season time-specific covariate, call type, and breeding stage. In 2004 and 2005 we used survey results to assess the effect that number of sample locations, double sampling, and discontinued sampling had on parameter estimates. We found that estimates of probability of detection and proportion of sites occupied were similar across different sampling strategies, and we suggest ways to reduce sampling effort in a monitoring program.
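
    The presence-absence machinery referred to above can be illustrated with a minimal single-season occupancy model, in which a constant occupancy probability psi and detection probability p are estimated by maximum likelihood from detection histories. The sketch uses simulated data and no covariates, whereas the study's models also included covariates such as call type and breeding stage.

      # Single-season occupancy model: constant psi (occupancy) and p (detection)
      # fitted by maximum likelihood to simulated site-by-visit detection histories.
      import numpy as np
      from scipy.optimize import minimize
      from scipy.special import expit

      rng = np.random.default_rng(5)
      sites, visits = 80, 4
      psi_true, p_true = 0.85, 0.4
      occupied = rng.binomial(1, psi_true, sites)
      Y = rng.binomial(1, p_true * occupied[:, None], (sites, visits))  # detection histories

      def neg_log_lik(params):
          psi, p = np.clip(expit(params), 1e-9, 1 - 1e-9)   # keep both in (0, 1)
          det = Y.sum(axis=1)
          # sites with >=1 detection: occupied and detected that pattern;
          # all-zero sites: either occupied but missed every visit, or unoccupied
          ll_pos = np.log(psi) + det * np.log(p) + (visits - det) * np.log(1 - p)
          ll_zero = np.log(psi * (1 - p) ** visits + (1 - psi))
          return -np.sum(np.where(det > 0, ll_pos, ll_zero))

      fit = minimize(neg_log_lik, x0=[0.0, 0.0], method="Nelder-Mead")
      psi_hat, p_hat = expit(fit.x)
      print(f"psi_hat = {psi_hat:.2f}, p_hat = {p_hat:.2f}")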

  6. Data communications in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-11-12

    Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer composed of compute nodes that execute a parallel application, each compute node including application processors that execute the parallel application and at least one management processor dedicated to gathering information regarding data communications. The PAMI is composed of data communications endpoints, each endpoint composed of a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources. Embodiments function by gathering call site statistics describing data communications resulting from execution of data communications instructions and identifying in dependence upon the call cite statistics a data communications algorithm for use in executing a data communications instruction at a call site in the parallel application.

  7. A SIGNIFICANCE TEST FOR THE LASSO

    PubMed Central

    Lockhart, Richard; Taylor, Jonathan; Tibshirani, Ryan J.; Tibshirani, Robert

    2014-01-01

    In the sparse linear regression setting, we consider testing the significance of the predictor variable that enters the current lasso model, in the sequence of models visited along the lasso solution path. We propose a simple test statistic based on lasso fitted values, called the covariance test statistic, and show that when the true model is linear, this statistic has an Exp(1) asymptotic distribution under the null hypothesis (the null being that all truly active variables are contained in the current lasso model). Our proof of this result for the special case of the first predictor to enter the model (i.e., testing for a single significant predictor variable against the global null) requires only weak assumptions on the predictor matrix X. On the other hand, our proof for a general step in the lasso path places further technical assumptions on X and the generative model, but still allows for the important high-dimensional case p > n, and does not necessarily require that the current lasso model achieves perfect recovery of the truly active variables. Of course, for testing the significance of an additional variable between two nested linear models, one typically uses the chi-squared test, comparing the drop in residual sum of squares (RSS) to a χ2 distribution with 1 degree of freedom. But when this additional variable is not fixed, and has been chosen adaptively or greedily, this test is no longer appropriate: adaptivity makes the drop in RSS stochastically much larger than a χ2 with 1 degree of freedom under the null hypothesis. Our analysis explicitly accounts for adaptivity, as it must, since the lasso builds an adaptive sequence of linear models as the tuning parameter λ decreases. In this analysis, shrinkage plays a key role: though additional variables are chosen adaptively, the coefficients of lasso active variables are shrunken due to the ℓ1 penalty. Therefore, the test statistic (which is based on lasso fitted values) is in a sense balanced by these two opposing properties, adaptivity and shrinkage, and its null distribution is tractable and asymptotically Exp(1). PMID:25574062
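
    Under the global null the previous active set is empty, so the covariance statistic for the first variable to enter reduces to the inner product of y with the lasso fitted values at the second knot, divided by σ². The simulation below is a sketch assuming this reduction and a known σ, and checks the Exp(1) approximation using scikit-learn's lars_path; it is not the paper's software.

      # Null simulation of the covariance statistic for the first lasso knot.
      import numpy as np
      from sklearn.linear_model import lars_path

      rng = np.random.default_rng(0)
      n, p, sigma, reps = 100, 10, 1.0, 2000
      stats = []
      for _ in range(reps):
          X = rng.standard_normal((n, p))
          X /= np.linalg.norm(X, axis=0)            # standardize columns to unit norm
          y = sigma * rng.standard_normal(n)        # global null: no true signal
          _, _, coefs = lars_path(X, y, method="lasso")
          beta_at_knot2 = coefs[:, 1]               # lasso coefficients at the second knot
          stats.append(y @ (X @ beta_at_knot2) / sigma**2)

      stats = np.array(stats)
      print("mean (Exp(1) limit is 1):      ", stats.mean().round(3))
      print("P(T > 1) (Exp(1) limit is 1/e):", (stats > 1).mean().round(3))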

  8. Final Report: Quantification of Uncertainty in Extreme Scale Computations (QUEST)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Marzouk, Youssef; Conrad, Patrick; Bigoni, Daniele

    QUEST (www.quest-scidac.org) is a SciDAC Institute that is focused on uncertainty quantification (UQ) in large-scale scientific computations. Our goals are to (1) advance the state of the art in UQ mathematics, algorithms, and software; and (2) provide modeling, algorithmic, and general UQ expertise, together with software tools, to other SciDAC projects, thereby enabling and guiding a broad range of UQ activities in their respective contexts. QUEST is a collaboration among six institutions (Sandia National Laboratories, Los Alamos National Laboratory, the University of Southern California, Massachusetts Institute of Technology, the University of Texas at Austin, and Duke University) with a history of joint UQ research. Our vision encompasses all aspects of UQ in leadership-class computing. This includes the well-founded setup of UQ problems; characterization of the input space given available data/information; local and global sensitivity analysis; adaptive dimensionality and order reduction; forward and inverse propagation of uncertainty; handling of application code failures, missing data, and hardware/software fault tolerance; and model inadequacy, comparison, validation, selection, and averaging. The nature of the UQ problem requires the seamless combination of data, models, and information across this landscape in a manner that provides a self-consistent quantification of requisite uncertainties in predictions from computational models. Accordingly, our UQ methods and tools span an interdisciplinary space across applied math, information theory, and statistics. The MIT QUEST effort centers on statistical inference and methods for surrogate or reduced-order modeling. MIT personnel have been responsible for the development of adaptive sampling methods, methods for approximating computationally intensive models, and software for both forward uncertainty propagation and statistical inverse problems. A key software product of the MIT QUEST effort is the MIT Uncertainty Quantification library, called MUQ (muq.mit.edu).

  9. Statistical mechanics of unsupervised feature learning in a restricted Boltzmann machine with binary synapses

    NASA Astrophysics Data System (ADS)

    Huang, Haiping

    2017-05-01

    Revealing hidden features in unlabeled data is called unsupervised feature learning, which plays an important role in pretraining a deep neural network. Here we provide a statistical mechanics analysis of the unsupervised learning in a restricted Boltzmann machine with binary synapses. A message passing equation to infer the hidden feature is derived, and furthermore, variants of this equation are analyzed. A statistical analysis by replica theory describes the thermodynamic properties of the model. Our analysis confirms an entropy crisis preceding the non-convergence of the message passing equation, suggesting a discontinuous phase transition as a key characteristic of the restricted Boltzmann machine. A continuous phase transition is also confirmed, depending on the embedded feature strength in the data. The mean-field result under the replica symmetric assumption agrees with that obtained by running message passing algorithms on single instances of finite sizes. Interestingly, in an approximate Hopfield model, the entropy crisis is absent, and a continuous phase transition is observed instead. We also develop an iterative equation to infer the hyper-parameter (temperature) hidden in the data, which in physics corresponds to iteratively imposing the Nishimori condition. Our study provides insights towards understanding the thermodynamic properties of restricted Boltzmann machine learning, and moreover an important theoretical basis for building simplified deep networks.

  10. Static Methods in the Design of Nonlinear Automatic Control Systems,

    DTIC Science & Technology

    1984-06-27

    Chapter VI. Ways of Decrease of the Number of Statistical Nodes During the Research of Nonlinear Systems... at present occupies the central place. This region of research was called the statistical dynamics of nonlinear automatic control systems... receives further development in the numerous research of Soviet and foreign scientists. Special role in the development of the statistical dynamics of

  11. Turbulent Concentration of mm-Size Particles in the Protoplanetary Nebula: Scale-Dependent Cascades

    NASA Technical Reports Server (NTRS)

    Cuzzi, J. N.; Hartlep, T.

    2015-01-01

    The initial accretion of primitive bodies (here, asteroids in particular) from freely-floating nebula particles remains problematic. Traditional growth-by-sticking models encounter a formidable "meter-size barrier" (or even a mm-to-cm-size barrier) in turbulent nebulae, making the preconditions for so-called "streaming instabilities" difficult to achieve even for so-called "lucky" particles. Even if growth by sticking could somehow breach the meter size barrier, turbulent nebulae present further obstacles through the 1-10km size range. On the other hand, nonturbulent nebulae form large asteroids too quickly to explain long spreads in formation times, or the dearth of melted asteroids. Theoretical understanding of nebula turbulence is itself in flux; recent models of MRI (magnetically-driven) turbulence favor low- or no-turbulence environments, but purely hydrodynamic turbulence is making a comeback, with two recently discovered mechanisms generating robust turbulence which do not rely on magnetic fields at all. An important clue regarding planetesimal formation is an apparent 100km diameter peak in the pre-depletion, pre-erosion mass distribution of asteroids; scenarios leading directly from independent nebula particulates to large objects of this size, which avoid the problematic m-km size range, could be called "leapfrog" scenarios. The leapfrog scenario we have studied in detail involves formation of dense clumps of aerodynamically selected, typically mm-size particles in turbulence, which can under certain conditions shrink inexorably on 100-1000 orbit timescales and form 10-100km diameter sandpile planetesimals. There is evidence that at least the ordinary chondrite parent bodies were initially composed entirely of a homogeneous mix of such particles. Thus, while they are arcane, turbulent concentration models acting directly on chondrule size particles are worthy of deeper study. The typical sizes of planetesimals and the rate of their formation can be estimated using a statistical model with properties inferred from large numerical simulations of turbulence. Nebula turbulence is described by its Reynolds number Re = (L/η)^(4/3), where L = Hα^(1/2) is the largest eddy scale, H is the nebula gas vertical scale height, α is the turbulent viscosity parameter, and η is the Kolmogorov or smallest scale in turbulence (typically about 1 km), with eddy turnover time t_η. In the nebula, Re is far larger than any numerical simulation can handle, so some physical arguments are needed to extend the results of numerical simulations to nebula conditions. In this paper, we report new physics to be incorporated into our statistical models.

  12. Role of hydrogen in volatile behaviour of defects in SiO2-based electronic devices

    NASA Astrophysics Data System (ADS)

    Wimmer, Yannick; El-Sayed, Al-Moatasem; Gös, Wolfgang; Grasser, Tibor; Shluger, Alexander L.

    2016-06-01

    Charge capture and emission by point defects in gate oxides of metal-oxide-semiconductor field-effect transistors (MOSFETs) strongly affect reliability and performance of electronic devices. Recent advances in experimental techniques used for probing defect properties have led to new insights into their characteristics. In particular, these experimental data show a repeated dis- and reappearance (the so-called volatility) of the defect-related signals. We use multiscale modelling to explain the charge capture and emission as well as defect volatility in amorphous SiO2 gate dielectrics. We first briefly discuss the recent experimental results and use a multiphonon charge capture model to describe the charge-trapping behaviour of defects in silicon-based MOSFETs. We then link this model to ab initio calculations that investigate the three most promising defect candidates. Statistical distributions of defect characteristics obtained from ab initio calculations in amorphous SiO2 are compared with the experimentally measured statistical properties of charge traps. This allows us to suggest an atomistic mechanism to explain the experimentally observed volatile behaviour of defects. We conclude that the hydroxyl-E' centre is a promising candidate to explain all the observed features, including defect volatility.

  13. Limit order book and its modeling in terms of Gibbs Grand-Canonical Ensemble

    NASA Astrophysics Data System (ADS)

    Bicci, Alberto

    2016-12-01

    In the domain of so-called Econophysics, some attempts have already been made to apply the theory of thermodynamics and statistical mechanics to economics and financial markets. In this paper a similar approach is made from a different perspective, trying to model the limit order book and price formation process of a given stock by the Grand-Canonical Gibbs Ensemble for the bid and ask orders. The application of Bose-Einstein statistics to this ensemble then allows the derivation of the distribution of the sell and buy orders as a function of price. As a consequence we can define in a meaningful way expressions for the temperatures of the ensembles of bid orders and of ask orders, which are a function of the minimum bid, maximum ask and closing prices of the stock as well as of the exchanged volume of shares. It is demonstrated that the difference between the ask and bid order temperatures can be related to the VAO (Volume Accumulation Oscillator), an indicator empirically defined in Technical Analysis of stock markets. Furthermore the derived distributions for aggregate bid and ask orders can be subject to well-defined validations against real data, giving a falsifiable character to the model.

  14. 75 FR 7445 - Pacific Fishery Management Council; Public Meetings

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-02-19

    ... to Order 1. Opening Remarks and Introductions 2. Roll Call 3. Executive Director's Report 4. Approve... Permits for 2010 SCHEDULE OF ANCILLARY MEETINGS Friday, March 5, 2010 Scientific and Statistical Committee... California Ballroom Salon 2 Scientific and Statistical Committee 8 am California Ballroom Salon 4 Legislative...

  15. ASURV: Astronomical SURVival Statistics

    NASA Astrophysics Data System (ADS)

    Feigelson, E. D.; Nelson, P. I.; Isobe, T.; LaValley, M.

    2014-06-01

    ASURV (Astronomical SURVival Statistics) provides astronomy survival analysis for right- and left-censored data including the maximum-likelihood Kaplan-Meier estimator and several univariate two-sample tests, bivariate correlation measures, and linear regressions. ASURV is written in FORTRAN 77, and is stand-alone and does not call any specialized libraries.
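
    A rough modern analogue of the Kaplan-Meier piece of ASURV can be run with the Python lifelines package, as sketched below on simulated censored data; ASURV itself is stand-alone Fortran 77 and also handles the left-censored (upper-limit) case common in astronomy, which this sketch does not.

      # Kaplan-Meier estimate on simulated right-censored measurements with lifelines.
      import numpy as np
      from lifelines import KaplanMeierFitter

      rng = np.random.default_rng(1)
      n = 200
      true_values = rng.exponential(10.0, n)          # quantity of interest
      censor_at = rng.exponential(15.0, n)            # per-object detection limit
      observed = np.minimum(true_values, censor_at)   # measured value or limit
      detected = true_values <= censor_at             # False -> censored observation

      kmf = KaplanMeierFitter()
      kmf.fit(observed, event_observed=detected)
      print(kmf.survival_function_.head())
      print("median:", kmf.median_survival_time_)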

  16. Biological Parametric Mapping: A Statistical Toolbox for Multi-Modality Brain Image Analysis

    PubMed Central

    Casanova, Ramon; Ryali, Srikanth; Baer, Aaron; Laurienti, Paul J.; Burdette, Jonathan H.; Hayasaka, Satoru; Flowers, Lynn; Wood, Frank; Maldjian, Joseph A.

    2006-01-01

    In recent years multiple brain MR imaging modalities have emerged; however, analysis methodologies have mainly remained modality specific. In addition, when comparing across imaging modalities, most researchers have been forced to rely on simple region-of-interest type analyses, which do not allow the voxel-by-voxel comparisons necessary to answer more sophisticated neuroscience questions. To overcome these limitations, we developed a toolbox for multimodal image analysis called biological parametric mapping (BPM), based on a voxel-wise use of the general linear model. The BPM toolbox incorporates information obtained from other modalities as regressors in a voxel-wise analysis, thereby permitting investigation of more sophisticated hypotheses. The BPM toolbox has been developed in MATLAB with a user friendly interface for performing analyses, including voxel-wise multimodal correlation, ANCOVA, and multiple regression. It has a high degree of integration with the SPM (statistical parametric mapping) software relying on it for visualization and statistical inference. Furthermore, statistical inference for a correlation field, rather than a widely-used T-field, has been implemented in the correlation analysis for more accurate results. An example with in-vivo data is presented demonstrating the potential of the BPM methodology as a tool for multimodal image analysis. PMID:17070709
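
    The essence of the BPM approach, a voxel-wise general linear model in which a second modality supplies a voxel-specific regressor, can be sketched in plain numpy; the arrays below are simulated stand-ins for co-registered images, and the design (intercept, group, other-modality value) is an illustrative choice rather than the toolbox's interface.

      # Voxel-wise GLM with an imaging regressor from another modality.
      import numpy as np

      rng = np.random.default_rng(4)
      n_subj, n_vox = 30, 5000
      group = rng.integers(0, 2, n_subj).astype(float)      # e.g. patient vs control
      modality_B = rng.normal(size=(n_subj, n_vox))         # e.g. gray-matter density
      modality_A = (0.4 * modality_B                        # e.g. fMRI contrast values
                    + 0.3 * group[:, None] + rng.normal(size=(n_subj, n_vox)))

      betas = np.empty((3, n_vox))
      tvals = np.empty(n_vox)
      for v in range(n_vox):                                # one GLM per voxel
          X = np.column_stack([np.ones(n_subj), group, modality_B[:, v]])
          b, res, *_ = np.linalg.lstsq(X, modality_A[:, v], rcond=None)
          dof = n_subj - X.shape[1]
          sigma2 = res[0] / dof
          cov = sigma2 * np.linalg.inv(X.T @ X)
          betas[:, v] = b
          tvals[v] = b[2] / np.sqrt(cov[2, 2])              # t-map for the imaging regressor

      print("voxels with |t| > 3:", int((np.abs(tvals) > 3).sum()))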

  17. Predictive models in urology.

    PubMed

    Cestari, Andrea

    2013-01-01

    Predictive modeling is emerging as an important knowledge-based technology in healthcare. The interest in the use of predictive modeling reflects advances on different fronts, such as the availability of health information from increasingly complex databases and electronic health records, a better understanding of causal or statistical predictors of health, disease processes and multifactorial models of ill-health, and developments in nonlinear computer models using artificial intelligence or neural networks. These new computer-based forms of modeling are increasingly able to establish technical credibility in clinical contexts. The current state of knowledge is still quite young in understanding the likely future direction of this so-called 'machine intelligence' and therefore how current relatively sophisticated predictive models will evolve in response to improvements in technology, which is advancing along a wide front. Predictive models in urology are gaining progressive popularity, not only for academic and scientific purposes but also in clinical practice, with the introduction of several nomograms dealing with the main fields of onco-urology.

  18. Concordance measure and discriminatory accuracy in transformation cure models.

    PubMed

    Zhang, Yilong; Shao, Yongzhao

    2018-01-01

    Many populations of early-stage cancer patients have non-negligible latent cure fractions that can be modeled using transformation cure models. However, there is a lack of statistical metrics to evaluate prognostic utility of biomarkers in this context due to the challenges associated with unknown cure status and heavy censorship. In this article, we develop general concordance measures as evaluation metrics for the discriminatory accuracy of transformation cure models including the so-called promotion time cure models and mixture cure models. We introduce explicit formulas for the consistent estimates of the concordance measures, and show that their asymptotically normal distributions do not depend on the unknown censoring distribution. The estimates work for both parametric and semiparametric transformation models as well as transformation cure models. Numerical feasibility of the estimates and their robustness to the censoring distributions are illustrated via simulation studies and demonstrated using a melanoma data set.
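
    For orientation, a plain (non-cure-model) concordance index on right-censored data can be computed with lifelines as below; the measures developed in the article are tailored to transformation cure models and are not reproduced here, so this is only a baseline illustration of discrimination metrics.

      # Harrell-type C-index for a biomarker on simulated right-censored survival data.
      import numpy as np
      from lifelines.utils import concordance_index

      rng = np.random.default_rng(9)
      n = 300
      biomarker = rng.normal(size=n)
      event_time = rng.exponential(np.exp(-0.7 * biomarker))   # higher biomarker -> earlier event
      censor_time = rng.exponential(1.5, n)
      observed_time = np.minimum(event_time, censor_time)
      event = (event_time <= censor_time).astype(int)

      # concordance_index expects scores where higher values mean longer survival,
      # so pass the negated risk marker
      c = concordance_index(observed_time, -biomarker, event)
      print(f"C-index: {c:.3f}")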

  19. Estimation of genetic variance for macro- and micro-environmental sensitivity using double hierarchical generalized linear models

    PubMed Central

    2013-01-01

    Background Genetic variation for environmental sensitivity indicates that animals are genetically different in their response to environmental factors. Environmental factors are either identifiable (e.g. temperature) and called macro-environmental or unknown and called micro-environmental. The objectives of this study were to develop a statistical method to estimate genetic parameters for macro- and micro-environmental sensitivities simultaneously, to investigate bias and precision of resulting estimates of genetic parameters and to develop and evaluate use of Akaike’s information criterion using h-likelihood to select the best fitting model. Methods We assumed that genetic variation in macro- and micro-environmental sensitivities is expressed as genetic variance in the slope of a linear reaction norm and environmental variance, respectively. A reaction norm model to estimate genetic variance for macro-environmental sensitivity was combined with a structural model for residual variance to estimate genetic variance for micro-environmental sensitivity using a double hierarchical generalized linear model in ASReml. Akaike’s information criterion was constructed as model selection criterion using approximated h-likelihood. Populations of sires with large half-sib offspring groups were simulated to investigate bias and precision of estimated genetic parameters. Results Designs with 100 sires, each with at least 100 offspring, are required to have standard deviations of estimated variances lower than 50% of the true value. When the number of offspring increased, standard deviations of estimates across replicates decreased substantially, especially for genetic variances of macro- and micro-environmental sensitivities. Standard deviations of estimated genetic correlations across replicates were quite large (between 0.1 and 0.4), especially when sires had few offspring. Practically, no bias was observed for estimates of any of the parameters. Using Akaike’s information criterion the true genetic model was selected as the best statistical model in at least 90% of 100 replicates when the number of offspring per sire was 100. Application of the model to lactation milk yield in dairy cattle showed that genetic variance for micro- and macro-environmental sensitivities existed. Conclusion The algorithm and model selection criterion presented here can contribute to better understand genetic control of macro- and micro-environmental sensitivities. Designs or datasets should have at least 100 sires each with 100 offspring. PMID:23827014

  20. Ecological Momentary Assessments and Automated Time Series Analysis to Promote Tailored Health Care: A Proof-of-Principle Study.

    PubMed

    van der Krieke, Lian; Emerencia, Ando C; Bos, Elisabeth H; Rosmalen, Judith Gm; Riese, Harriëtte; Aiello, Marco; Sytema, Sjoerd; de Jonge, Peter

    2015-08-07

    Health promotion can be tailored by combining ecological momentary assessments (EMA) with time series analysis. This combined method allows for studying the temporal order of dynamic relationships among variables, which may provide concrete indications for intervention. However, application of this method in health care practice is hampered because analyses are conducted manually and advanced statistical expertise is required. This study aims to show how this limitation can be overcome by introducing automated vector autoregressive modeling (VAR) of EMA data and to evaluate its feasibility through comparisons with results of previously published manual analyses. We developed a Web-based open source application, called AutoVAR, which automates time series analyses of EMA data and provides output that is intended to be interpretable by nonexperts. The statistical technique we used was VAR. AutoVAR tests and evaluates all possible VAR models within a given combinatorial search space and summarizes their results, thereby replacing the researcher's tasks of conducting the analysis, making an informed selection of models, and choosing the best model. We compared the output of AutoVAR to the output of a previously published manual analysis (n=4). An illustrative example consisting of 4 analyses was provided. Compared to the manual output, the AutoVAR output presents similar model characteristics and statistical results in terms of the Akaike information criterion, the Bayesian information criterion, and the test statistic of the Granger causality test. Results suggest that automated analysis and interpretation of time series is feasible. Compared to a manual procedure, the automated procedure is more robust and can save days of time. These findings may pave the way for using time series analysis for health promotion on a larger scale. AutoVAR was evaluated using the results of a previously conducted manual analysis. Analysis of additional datasets is needed in order to validate and refine the application for general use.
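
    AutoVAR itself is a Web application; purely as a sketch of the kind of search it automates (assuming Python with pandas and statsmodels, and two hypothetical EMA variables), one can fit VAR models over a small lag range, keep the fit with the lowest information criterion, and run a Granger causality test:

      import numpy as np
      import pandas as pd
      from statsmodels.tsa.api import VAR

      # Hypothetical EMA diary data: two variables sampled at 90 time points,
      # with yesterday's mood feeding into today's activity.
      rng = np.random.default_rng(1)
      mood = rng.normal(size=90)
      activity = 0.6 * np.roll(mood, 1) + rng.normal(scale=0.5, size=90)
      data = pd.DataFrame({"mood": mood, "activity": activity})

      # Evaluate every lag order in a small search space and keep the best fits.
      fits = {p: VAR(data).fit(p) for p in range(1, 6)}
      best_aic = min(fits, key=lambda p: fits[p].aic)
      best_bic = min(fits, key=lambda p: fits[p].bic)
      print("lag order chosen by AIC:", best_aic, "and by BIC:", best_bic)

      # Does past mood help predict activity? (Granger causality F-test)
      print(fits[best_aic].test_causality("activity", ["mood"], kind="f").summary())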

  1. Ecological Momentary Assessments and Automated Time Series Analysis to Promote Tailored Health Care: A Proof-of-Principle Study

    PubMed Central

    Emerencia, Ando C; Bos, Elisabeth H; Rosmalen, Judith GM; Riese, Harriëtte; Aiello, Marco; Sytema, Sjoerd; de Jonge, Peter

    2015-01-01

    Background Health promotion can be tailored by combining ecological momentary assessments (EMA) with time series analysis. This combined method allows for studying the temporal order of dynamic relationships among variables, which may provide concrete indications for intervention. However, application of this method in health care practice is hampered because analyses are conducted manually and advanced statistical expertise is required. Objective This study aims to show how this limitation can be overcome by introducing automated vector autoregressive modeling (VAR) of EMA data and to evaluate its feasibility through comparisons with results of previously published manual analyses. Methods We developed a Web-based open source application, called AutoVAR, which automates time series analyses of EMA data and provides output that is intended to be interpretable by nonexperts. The statistical technique we used was VAR. AutoVAR tests and evaluates all possible VAR models within a given combinatorial search space and summarizes their results, thereby replacing the researcher’s tasks of conducting the analysis, making an informed selection of models, and choosing the best model. We compared the output of AutoVAR to the output of a previously published manual analysis (n=4). Results An illustrative example consisting of 4 analyses was provided. Compared to the manual output, the AutoVAR output presents similar model characteristics and statistical results in terms of the Akaike information criterion, the Bayesian information criterion, and the test statistic of the Granger causality test. Conclusions Results suggest that automated analysis and interpretation of time series is feasible. Compared to a manual procedure, the automated procedure is more robust and can save days of time. These findings may pave the way for using time series analysis for health promotion on a larger scale. AutoVAR was evaluated using the results of a previously conducted manual analysis. Analysis of additional datasets is needed in order to validate and refine the application for general use. PMID:26254160

  2. Assessing the statistical significance of the achieved classification error of classifiers constructed using serum peptide profiles, and a prescription for random sampling repeated studies for massive high-throughput genomic and proteomic studies.

    PubMed

    Lyons-Weiler, James; Pelikan, Richard; Zeh, Herbert J; Whitcomb, David C; Malehorn, David E; Bigbee, William L; Hauskrecht, Milos

    2005-01-01

    Peptide profiles generated using SELDI/MALDI time of flight mass spectrometry provide a promising source of patient-specific information with high potential impact on the early detection and classification of cancer and other diseases. The new profiling technology comes, however, with numerous challenges and concerns. Particularly important are concerns of reproducibility of classification results and their significance. In this work we describe a computational validation framework, called PACE (Permutation-Achieved Classification Error), that lets us assess, for a given classification model, the significance of the Achieved Classification Error (ACE) on the profile data. The framework compares the performance statistic of the classifier on true data samples and checks if these are consistent with the behavior of the classifier on the same data with randomly reassigned class labels. A statistically significant ACE increases our belief that a discriminative signal was found in the data. The advantage of PACE analysis is that it can be easily combined with any classification model and is relatively easy to interpret. PACE analysis does not protect researchers against confounding in the experimental design, or other sources of systematic or random error. We use PACE analysis to assess significance of classification results we have achieved on a number of published data sets. The results show that many of these datasets indeed possess a signal that leads to a statistically significant ACE.
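
    As a hedged sketch of the general permutation idea (not the authors' PACE implementation; scikit-learn is assumed and the data are simulated), one can compare a cross-validated error on the true labels with the error distribution obtained after repeatedly shuffling the labels:

      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      X = rng.normal(size=(80, 200))             # hypothetical peptide intensities
      y = rng.integers(0, 2, 80)                 # case/control labels
      X[y == 1, :5] += 1.0                       # plant a weak discriminative signal

      clf = LogisticRegression(penalty="l2", C=0.1, max_iter=1000)
      ace = 1 - cross_val_score(clf, X, y, cv=5).mean()    # achieved classification error

      # Null distribution: the same classifier on data with randomly reassigned labels.
      null_errors = np.array([1 - cross_val_score(clf, X, rng.permutation(y), cv=5).mean()
                              for _ in range(200)])
      p_value = np.mean(null_errors <= ace)      # how often chance does at least as well
      print(f"ACE = {ace:.3f}, permutation p-value = {p_value:.3f}")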

  3. Hidden Markov model tracking of continuous gravitational waves from a binary neutron star with wandering spin. II. Binary orbital phase tracking

    NASA Astrophysics Data System (ADS)

    Suvorova, S.; Clearwater, P.; Melatos, A.; Sun, L.; Moran, W.; Evans, R. J.

    2017-11-01

    A hidden Markov model (HMM) scheme for tracking continuous-wave gravitational radiation from neutron stars in low-mass x-ray binaries (LMXBs) with wandering spin is extended by introducing a frequency-domain matched filter, called the J-statistic, which sums the signal power in orbital sidebands coherently. The J-statistic is similar but not identical to the binary-modulated F-statistic computed by demodulation or resampling. By injecting synthetic LMXB signals into Gaussian noise characteristic of the Advanced Laser Interferometer Gravitational-wave Observatory (Advanced LIGO), it is shown that the J-statistic HMM tracker detects signals with characteristic wave strain h0 ≥ 2 × 10^-26 in 370 d of data from two interferometers, divided into 37 coherent blocks of equal length. When applied to data from Stage I of the Scorpius X-1 Mock Data Challenge organized by the LIGO Scientific Collaboration, the tracker detects all 50 closed injections (h0 ≥ 6.84 × 10^-26), recovering the frequency with a root-mean-square accuracy of ≤ 1.95 × 10^-5 Hz. Of the 50 injections, 43 (with h0 ≥ 1.09 × 10^-25) are detected in a single, coherent 10 d block of data. The tracker employs an efficient, recursive HMM solver based on the Viterbi algorithm, which requires ~10^5 CPU-hours for a typical broadband (0.5 kHz) LMXB search.
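
    The J-statistic and the LIGO search pipeline are far beyond a short snippet, but the recursive HMM solver mentioned at the end is the standard Viterbi algorithm; a generic log-domain version, with made-up transition and emission matrices for a spin frequency that wanders by at most one bin per block, might look like this (numpy assumed):

      import numpy as np

      def viterbi(log_init, log_trans, log_emit):
          """Most probable hidden path; log_emit has shape (T, n_states)."""
          T, n = log_emit.shape
          delta = log_init + log_emit[0]
          back = np.zeros((T, n), dtype=int)
          for t in range(1, T):
              scores = delta[:, None] + log_trans          # scores[i, j]: from state i to state j
              back[t] = scores.argmax(axis=0)
              delta = scores.max(axis=0) + log_emit[t]
          path = np.empty(T, dtype=int)
          path[-1] = delta.argmax()
          for t in range(T - 1, 0, -1):
              path[t - 1] = back[t, path[t]]
          return path

      # Toy example: 3 frequency bins, 6 coherent blocks of data.
      trans = np.array([[0.80, 0.15, 0.05],
                        [0.10, 0.80, 0.10],
                        [0.05, 0.15, 0.80]])               # wandering-spin transition matrix
      emit = np.array([[0.6, 0.3, 0.1],
                       [0.2, 0.6, 0.2],
                       [0.1, 0.3, 0.6]])                   # P(loudest bin | true bin)
      obs = [0, 0, 1, 1, 2, 2]                             # loudest detection-statistic bin per block
      log_emit = np.log(emit[:, obs]).T                    # shape (T, n_states)
      log_init = np.log(np.full(3, 1.0 / 3.0))
      print(viterbi(log_init, np.log(trans), log_emit))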

  4. Statistical Learning Theory for High Dimensional Prediction: Application to Criterion-Keyed Scale Development

    PubMed Central

    Chapman, Benjamin P.; Weiss, Alexander; Duberstein, Paul

    2016-01-01

    Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in “big data” problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how three common SLT algorithms–Supervised Principal Components, Regularization, and Boosting—can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach—or perhaps because of them–SLT methods may hold value as a statistically rigorous approach to exploratory regression. PMID:27454257
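
    A minimal sketch of the regularization-plus-cross-validation workflow described above, assuming scikit-learn; the item pool and mortality outcome below are simulated stand-ins, not the cohort data of the paper:

      import numpy as np
      from sklearn.linear_model import LogisticRegressionCV

      rng = np.random.default_rng(42)
      n_people, n_items = 500, 300
      items = rng.integers(1, 6, size=(n_people, n_items)).astype(float)  # Likert item pool
      true_beta = np.zeros(n_items)
      true_beta[:10] = 0.4                                                # only 10 items matter
      logit = items @ true_beta - 13.0
      died = rng.random(n_people) < 1.0 / (1.0 + np.exp(-logit))          # all-cause mortality

      # L1-penalized logistic regression; the penalty strength is tuned by
      # cross-validation, i.e. by (approximately) minimizing expected prediction
      # error rather than the within-sample likelihood.
      scale = LogisticRegressionCV(Cs=10, cv=5, penalty="l1", solver="saga",
                                   scoring="neg_log_loss", max_iter=5000)
      scale.fit(items, died)
      kept = np.flatnonzero(scale.coef_[0])
      print(f"{kept.size} items retained for the criterion-keyed scale")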

  5. A non-parametric peak calling algorithm for DamID-Seq.

    PubMed

    Li, Renhua; Hempel, Leonie U; Jiang, Tingbo

    2015-01-01

    Protein-DNA interactions play a significant role in gene regulation and expression. In order to identify transcription factor binding sites (TFBS) of doublesex (DSX), an important transcription factor in sex determination, we applied the DNA adenine methylation identification (DamID) technology to the fat body tissue of Drosophila, followed by deep sequencing (DamID-Seq). One feature of DamID-Seq data is that induced adenine methylation signals are not assured to be symmetrically distributed at TFBS, which renders the existing peak calling algorithms for ChIP-Seq, including SPP and MACS, inappropriate for DamID-Seq data. This challenged us to develop a new algorithm for peak calling. A challenge in peak calling based on sequence data is estimating the averaged behavior of background signals. We applied a bootstrap resampling method to short sequence reads in the control (Dam only). After data quality checks and mapping reads to a reference genome, the peak calling procedure comprises the following steps: 1) read resampling; 2) read scaling (normalization) and computing signal-to-noise fold changes; 3) filtering; 4) calling peaks based on a statistically significant threshold. This is a non-parametric method for peak calling (NPPC). We also used irreproducible discovery rate (IDR) analysis, as well as ChIP-Seq data, to compare the peaks called by the NPPC. We identified approximately 6,000 peaks for DSX, which point to 1,225 genes related to the fat body tissue difference between female and male Drosophila. Statistical evidence from IDR analysis indicated that these peaks are reproducible across biological replicates. In addition, these peaks are comparable to those identified by use of ChIP-Seq on S2 cells, in terms of peak number, location, and peak width.
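
    A toy sketch of the resampling-and-fold-change idea, not the published NPPC code (numpy assumed, coverage counts simulated):

      import numpy as np

      rng = np.random.default_rng(7)
      n_bins = 1000
      control = rng.poisson(10, n_bins)                  # Dam-only coverage per genomic bin
      signal = rng.poisson(10, n_bins)
      signal[400:410] += 40                              # a hypothetical DSX-bound region

      # 1) resample control reads to estimate the averaged background per bin
      boot = np.array([rng.choice(control, n_bins, replace=True) for _ in range(200)])
      background = boot.mean(axis=0)

      # 2) scale libraries to equal depth and compute signal-to-noise fold changes
      scaled_signal = signal * control.sum() / signal.sum()
      fold = (scaled_signal + 1) / (background + 1)

      # 3)-4) filter and call peaks above a bootstrap-derived significance threshold
      threshold = np.quantile((boot + 1) / (background + 1), 0.999)
      print("called bins:", np.flatnonzero(fold > threshold))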

  6. Structurally Sound Statistics Instruction

    ERIC Educational Resources Information Center

    Casey, Stephanie A.; Bostic, Jonathan D.

    2016-01-01

    The Common Core's Standards for Mathematical Practice (SMP) call for all K-grade 12 students to develop expertise in the processes and proficiencies of doing mathematics. However, the Common Core State Standards for Mathematics (CCSSM) (CCSSI 2010) as a whole addresses students' learning of not only mathematics but also statistics. This situation…

  7. ENVIRONMENTAL MONITORING AND ASSESSMENT PROGRAM (EMAP): WESTERN STREAMS AND RIVERS STATISTICAL SUMMARY

    EPA Science Inventory

    This statistical summary reports data from the Environmental Monitoring and Assessment Program (EMAP) Western Pilot (EMAP-W). EMAP-W was a sample survey (or probability survey, often simply called 'random') of streams and rivers in 12 states of the western U.S. (Arizona, Californ...

  8. Survival analysis, or what to do with upper limits in astronomical surveys

    NASA Technical Reports Server (NTRS)

    Isobe, Takashi; Feigelson, Eric D.

    1986-01-01

    A field of applied statistics called survival analysis has been developed over several decades to deal with censored data, which occur in astronomical surveys when objects are too faint to be detected. How these methods can assist in the statistical interpretation of astronomical data is reviewed.
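
    For example, with the lifelines package (an assumption; any survival-analysis library would do), the Kaplan-Meier estimator treats detections and upper limits together; fluxes with upper limits are left-censored, so they are first flipped to magnitudes to become right-censored:

      import numpy as np
      from lifelines import KaplanMeierFitter

      rng = np.random.default_rng(3)
      flux = rng.lognormal(mean=0.0, sigma=1.0, size=60)       # hypothetical source fluxes
      limit = 0.8                                              # survey detection limit
      detected = flux > limit
      mag = -2.5 * np.log10(np.where(detected, flux, limit))   # non-detections: only a lower bound in magnitude

      kmf = KaplanMeierFitter()
      kmf.fit(mag - mag.min() + 0.01, event_observed=detected)  # shift so "durations" are positive
      print(kmf.survival_function_.head())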

  9. Soldier Decision-Making for Allocation of Intelligence, Surveillance, and Reconnaissance Assets

    DTIC Science & Technology

    2014-06-01

    Judgments; also called algorithmic or statistical judgments (Computer Science, Psychology, and Statistics); actuarial or algorithmic ... Jan. 2011. [17] R. M. Dawes, D. Faust, and P. E. Meehl, “Clinical versus Actuarial Judgment,” Science, vol. 243, no. 4899, pp. 1668–1674, 1989. [18] ... School of Computer Science

  10. 17 CFR Appendix A to Part 145 - Compilation of Commission Records Available to the Public

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... photographs. (10) Statistical data concerning the Commission's budget. (11) Statistical data concerning...) Complaint packages, which contain the Reparation Rules, Brochure “Questions and Answers About How You Can... grain reports. (2) Weekly cotton or call reports. (f) Division of Enforcement. Complaint package...

  11. Cosmological Distance Scale to Gamma-Ray Bursts

    NASA Astrophysics Data System (ADS)

    Azzam, W. J.; Linder, E. V.; Petrosian, V.

    1993-05-01

    The source counts or the so-called log N -- log S relations are the primary data that constrain the spatial distribution of sources with unknown distances, such as gamma-ray bursts. In order to test galactic, halo, and cosmological models for gamma-ray bursts we compare theoretical characteristics of the log N -- log S relations to those obtained from data gathered by the BATSE instrument on board the Compton Observatory (GRO) and other instruments. We use a new and statistically correct method, that takes proper account of the variable nature of the triggering threshold, to analyze the data. Constraints on models obtained by this comparison will be presented. This work is supported by NASA grants NAGW 2290, NAG5 2036, and NAG5 1578.

  12. Molecular Dynamic Simulation of Water Vapor and Determination of Diffusion Characteristics in the Pore

    NASA Astrophysics Data System (ADS)

    Nikonov, Eduard G.; Pavluš, Miron; Popovičová, Mária

    2018-02-01

    One of the varieties of pores often found in natural or artificial building materials is the so-called blind pore of dead-end or saccate type. A three-dimensional model of this kind of pore has been developed in this work. The model has been used to simulate the interaction of water vapor with an individual pore by molecular dynamics in combination with the diffusion equation method. Special investigations have been carried out to find dependencies between thermostat implementations and conservation of the thermodynamic and statistical properties of the water vapor - pore system. Two types of evolution of the water - pore system have been investigated: drying and wetting of the pore. A full study of the diffusion coefficient, diffusion velocity and other diffusion parameters has been made.

  13. Spectrophotometry of comets Giacobini-Zinner and Halley

    NASA Technical Reports Server (NTRS)

    Tegler, Stephen C.; O'Dell, C. R.

    1987-01-01

    Optical window spectrophotometry was performed on comets Giacobini-Zinner and Halley over the interval 300-1000 nm. Band and band-sequence fluxes were obtained for the brightest features of OH, CN, NH, and C2, special care having been given to determinations of extinction, instrumental sensitivities, and corrections for Fraunhofer lines. C2 Swan band-sequence flux ratios were determined with unprecedented accuracy and compared with the predictions of the detailed equilibrium models of Krishna Swamy et al. (1977, 1979, 1981, and 1987). It is found that these band sequences do not agree with the predictions, which calls into question the assumptions made in deriving the model, namely resonance fluorescence statistical equilibrium. Suggestions are made as to how to resolve this discrepancy.

  14. Refining the detection of the zero crossing for the three-gluon vertex in symmetric and asymmetric momentum subtraction schemes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Boucaud, Ph.; De Soto, F.; Rodriguez-Quintero, J.

    This article reports on a detailed study of the three-gluon vertex in four-dimensional $SU(3)$ Yang-Mills theory employing lattice simulations with large physical volumes and high statistics. A meticulous scrutiny of the so-called symmetric and asymmetric kinematical configurations is performed and it is shown that the associated form factor changes sign within a given range of momenta. Here, the lattice results are compared to the model-independent predictions of Schwinger-Dyson equations and very good agreement between the two is found.

  15. Refining the detection of the zero crossing for the three-gluon vertex in symmetric and asymmetric momentum subtraction schemes

    DOE PAGES

    Boucaud, Ph.; De Soto, F.; Rodriguez-Quintero, J.; ...

    2017-06-14

    This article reports on a detailed study of the three-gluon vertex in four-dimensional $SU(3)$ Yang-Mills theory employing lattice simulations with large physical volumes and high statistics. A meticulous scrutiny of the so-called symmetric and asymmetric kinematical configurations is performed and it is shown that the associated form factor changes sign within a given range of momenta. Here, the lattice results are compared to the model-independent predictions of Schwinger-Dyson equations and very good agreement between the two is found.

  16. The cardiorespiratory interaction: a nonlinear stochastic model and its synchronization properties

    NASA Astrophysics Data System (ADS)

    Bahraminasab, A.; Kenwright, D.; Stefanovska, A.; McClintock, P. V. E.

    2007-06-01

    We address the problem of interactions between the phases of the cardiac and respiratory oscillatory components. The coupling between these two quantities is investigated experimentally using the theory of stochastic Markovian processes. The so-called Markov analysis allows us to derive nonlinear stochastic equations for the reconstruction of the cardiorespiratory signals. The properties of these equations provide interesting new insights into the strength and direction of coupling, which enable us to divide the coupling into two parts: deterministic and stochastic. It is shown that the synchronization behavior of the reconstructed signals is statistically identical to that of the original ones.

  17. Constrained Stochastic Extended Redundancy Analysis.

    PubMed

    DeSarbo, Wayne S; Hwang, Heungsun; Stadler Blank, Ashley; Kappe, Eelco

    2015-06-01

    We devise a new statistical methodology called constrained stochastic extended redundancy analysis (CSERA) to examine the comparative impact of various conceptual factors, or drivers, as well as the specific predictor variables that contribute to each driver on designated dependent variable(s). The technical details of the proposed methodology, the maximum likelihood estimation algorithm, and model selection heuristics are discussed. A sports marketing consumer psychology application is provided in a Major League Baseball (MLB) context where the effects of six conceptual drivers of game attendance and their defining predictor variables are estimated. Results compare favorably to those obtained using traditional extended redundancy analysis (ERA).

  18. Somatic experiencing treatment with social service workers following Hurricanes Katrina and Rita.

    PubMed

    Leitch, M Laurie; Vanslyke, Jan; Allen, Marisa

    2009-01-01

    In a disaster, social service workers are often survivors themselves. This study examines whether somatic intervention using a brief (one- to two-session) stabilization model now called the Trauma Resiliency Model (TRM), which uses the skills of Somatic Experiencing (SE), can reduce the postdisaster symptoms of social service workers involved in postdisaster service delivery. The study was implemented with a nonrandom sample of 142 social service workers who were survivors of Hurricanes Katrina and Rita in New Orleans and Baton Rouge, Louisiana, two to three months after the disasters. Ninety-one participants received SE/TRM and were compared with a matched comparison group of 51 participants through the use of propensity score matching. All participants first received group psychoeducation. Results support the benefits of the brief intervention inspired by SE. The treatment group showed statistically significant gains in resiliency indicators and decreases in posttraumatic stress disorder symptoms. Although psychological symptoms increased in both groups at the three- to four-month follow-up, the treatment group's psychological symptoms were statistically lower than those of the comparison group.

  19. Length bias correction in gene ontology enrichment analysis using logistic regression.

    PubMed

    Mi, Gu; Di, Yanming; Emerson, Sarah; Cumbie, Jason S; Chang, Jeff H

    2012-01-01

    When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called "length bias", will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily biologically more relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both the Gene Ontology membership and the significance of the differential expression test. The inclusion of the transcript length as a covariate allows one to investigate the direct correlation between the Gene Ontology membership and the significance of testing differential expression, conditional on the transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible.
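
    A compact sketch of the adjustment (statsmodels assumed; the gene-level inputs are simulated so that both the DE calls and the category membership depend only on transcript length): regress GO-category membership on the differential-expression indicator, with log transcript length as the covariate.

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(11)
      n_genes = 5000
      log_len = rng.normal(8.0, 1.0, n_genes)                              # log transcript length
      de = rng.random(n_genes) < 1 / (1 + np.exp(-(log_len - 8.0)))        # longer genes more likely called DE
      member = rng.random(n_genes) < 1 / (1 + np.exp(-(log_len - 8.5)))    # membership also tracks length only

      # Unadjusted test: DE status looks "enriched" in the category (length bias).
      X0 = sm.add_constant(de.astype(float))
      print(sm.Logit(member.astype(float), X0).fit(disp=0).pvalues)

      # Adjusted test: with log length as a covariate, the spurious enrichment vanishes.
      X1 = sm.add_constant(np.column_stack([de.astype(float), log_len]))
      print(sm.Logit(member.astype(float), X1).fit(disp=0).pvalues)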

  20. Fingerprint identification: advances since the 2009 National Research Council report

    PubMed Central

    Champod, Christophe

    2015-01-01

    This paper will discuss the major developments in the area of fingerprint identification that followed the publication of the National Research Council (NRC, of the US National Academies of Sciences) report in 2009 entitled: Strengthening Forensic Science in the United States: A Path Forward. The report portrayed an image of a field of expertise used for decades without the necessary scientific research-based underpinning. The advances since the report and the needs in selected areas of fingerprinting will be detailed. These include the measurement of the accuracy, reliability, repeatability and reproducibility of the conclusions offered by fingerprint experts. The paper will also pay attention to the development of statistical models allowing assessment of fingerprint comparisons. As a corollary of these developments, the next challenge is to reconcile a traditional practice dominated by deterministic conclusions with the probabilistic logic of any statistical model. There is a call for greater candour, and fingerprint experts will need to communicate differently on the strengths and limitations of their findings. Their testimony will have to go beyond the blunt assertion of the uniqueness of fingerprints or an opinion delivered ipse dixit. PMID:26101284

  1. The Complete Redistribution Approximation in Optically Thick Line-Driven Winds

    NASA Astrophysics Data System (ADS)

    Gayley, K. G.; Onifer, A. J.

    2001-05-01

    Wolf-Rayet winds are thought to exhibit large momentum fluxes, which has in part been explained by ionization stratification in the wind. However, it is the cause of high mass loss, not high momentum flux, that remains largely a mystery, because standard models fail to achieve sufficient acceleration near the surface where the mass-loss rate is set. We consider a radiative transfer approximation, called the complete redistribution approximation, that allows the dynamics of optically thick Wolf-Rayet winds to be modeled without detailed treatment of the radiation field. In it, it is assumed that thermalization processes cause the photon frequencies to be completely randomized over the course of propagating through the wind, which allows the radiation field to be treated statistically rather than in detail. Thus the approach is similar to the statistical treatment of the line list used in the celebrated CAK approach. The results differ from the effectively gray treatment in that the radiation field is influenced by the line distribution, and the role of gaps in the line distribution is enhanced. The ramifications for the driving of large mass-loss rates are explored.

  2. Proof-of-Concept Demonstrations for Computation-Based Human Reliability Analysis. Modeling Operator Performance During Flooding Scenarios

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Joe, Jeffrey Clark; Boring, Ronald Laurids; Herberger, Sarah Elizabeth Marie

    The United States (U.S.) Department of Energy (DOE) Light Water Reactor Sustainability (LWRS) program has the overall objective to help sustain the existing commercial nuclear power plants (NPPs). To accomplish this program objective, there are multiple LWRS “pathways,” or research and development (R&D) focus areas. One LWRS focus area is called the Risk-Informed Safety Margin and Characterization (RISMC) pathway. Initial efforts under this pathway to combine probabilistic and plant multi-physics models to quantify safety margins and support business decisions also included HRA, but in a somewhat simplified manner. HRA experts at Idaho National Laboratory (INL) have been collaborating with other experts to develop a computational HRA approach, called the Human Unimodel for Nuclear Technology to Enhance Reliability (HUNTER), for inclusion into the RISMC framework. The basic premise of this research is to leverage applicable computational techniques, namely simulation and modeling, to develop and then, using RAVEN as a controller, seamlessly integrate virtual operator models (HUNTER) with 1) the dynamic computational MOOSE runtime environment that includes a full-scope plant model, and 2) the RISMC framework PRA models already in use. The HUNTER computational HRA approach is a hybrid approach that leverages past work from cognitive psychology, human performance modeling, and HRA, but it is also a significant departure from existing static and even dynamic HRA methods. This report is divided into five chapters that cover the development of an external flooding event test case and associated statistical modeling considerations.

  3. Trauma-related dissociation and altered states of consciousness: a call for clinical, treatment, and neuroscience research

    PubMed Central

    Lanius, Ruth A.

    2015-01-01

    The primary aim of this commentary is to describe trauma-related dissociation and altered states of consciousness in the context of a four-dimensional model that has recently been proposed (Frewen & Lanius, 2015). This model categorizes symptoms of trauma-related psychopathology into (1) those that occur within normal waking consciousness and (2) those that are dissociative and are associated with trauma-related altered states of consciousness (TRASC) along four dimensions: (1) time; (2) thought; (3) body; and (4) emotion. Clinical applications and future research directions relevant to each dimension are discussed. Conceptualizing TRASC across the dimensions of time, thought, body, and emotion has transdiagnostic implications for trauma-related disorders described in both the Diagnostic and Statistical Manual of Mental Disorders and the International Classification of Diseases. The four-dimensional model provides a framework, guided by existing models of dissociation, for future research examining the phenomenological, neurobiological, and physiological underpinnings of trauma-related dissociation. PMID:25994026

  4. Search for a dark photon in e(+)e(-) collisions at BABAR.

    PubMed

    Lees, J P; Poireau, V; Tisserand, V; Grauges, E; Palano, A; Eigen, G; Stugu, B; Brown, D N; Feng, M; Kerth, L T; Kolomensky, Yu G; Lee, M J; Lynch, G; Koch, H; Schroeder, T; Hearty, C; Mattison, T S; McKenna, J A; So, R Y; Khan, A; Blinov, V E; Buzykaev, A R; Druzhinin, V P; Golubev, V B; Kravchenko, E A; Onuchin, A P; Serednyakov, S I; Skovpen, Yu I; Solodov, E P; Todyshev, K Yu; Lankford, A J; Mandelkern, M; Dey, B; Gary, J W; Long, O; Campagnari, C; Franco Sevilla, M; Hong, T M; Kovalskyi, D; Richman, J D; West, C A; Eisner, A M; Lockman, W S; Panduro Vazquez, W; Schumm, B A; Seiden, A; Chao, D S; Cheng, C H; Echenard, B; Flood, K T; Hitlin, D G; Miyashita, T S; Ongmongkolkul, P; Porter, F C; Andreassen, R; Huard, Z; Meadows, B T; Pushpawela, B G; Sokoloff, M D; Sun, L; Bloom, P C; Ford, W T; Gaz, A; Smith, J G; Wagner, S R; Ayad, R; Toki, W H; Spaan, B; Bernard, D; Verderi, M; Playfer, S; Bettoni, D; Bozzi, C; Calabrese, R; Cibinetto, G; Fioravanti, E; Garzia, I; Luppi, E; Piemontese, L; Santoro, V; Calcaterra, A; de Sangro, R; Finocchiaro, G; Martellotti, S; Patteri, P; Peruzzi, I M; Piccolo, M; Rama, M; Zallo, A; Contri, R; Lo Vetere, M; Monge, M R; Passaggio, S; Patrignani, C; Robutti, E; Bhuyan, B; Prasad, V; Adametz, A; Uwer, U; Lacker, H M; Dauncey, P D; Mallik, U; Chen, C; Cochran, J; Prell, S; Ahmed, H; Gritsan, A V; Arnaud, N; Davier, M; Derkach, D; Grosdidier, G; Le Diberder, F; Lutz, A M; Malaescu, B; Roudeau, P; Stocchi, A; Wormser, G; Lange, D J; Wright, D M; Coleman, J P; Fry, J R; Gabathuler, E; Hutchcroft, D E; Payne, D J; Touramanis, C; Bevan, A J; Di Lodovico, F; Sacco, R; Cowan, G; Bougher, J; Brown, D N; Davis, C L; Denig, A G; Fritsch, M; Gradl, W; Griessinger, K; Hafner, A; Schubert, K R; Barlow, R J; Lafferty, G D; Cenci, R; Hamilton, B; Jawahery, A; Roberts, D A; Cowan, R; Sciolla, G; Cheaib, R; Patel, P M; Robertson, S H; Neri, N; Palombo, F; Cremaldi, L; Godang, R; Sonnek, P; Summers, D J; Simard, M; Taras, P; De Nardo, G; Onorato, G; Sciacca, C; Martinelli, M; Raven, G; Jessop, C P; LoSecco, J M; Honscheid, K; Kass, R; Feltresi, E; Margoni, M; Morandin, M; Posocco, M; Rotondo, M; Simi, G; Simonetto, F; Stroili, R; Akar, S; Ben-Haim, E; Bomben, M; Bonneaud, G R; Briand, H; Calderini, G; Chauveau, J; Leruste, Ph; Marchiori, G; Ocariz, J; Biasini, M; Manoni, E; Pacetti, S; Rossi, A; Angelini, C; Batignani, G; Bettarini, S; Carpinelli, M; Casarosa, G; Cervelli, A; Chrzaszcz, M; Forti, F; Giorgi, M A; Lusiani, A; Oberhof, B; Paoloni, E; Perez, A; Rizzo, G; Walsh, J J; Lopes Pegna, D; Olsen, J; Smith, A J S; Faccini, R; Ferrarotto, F; Ferroni, F; Gaspero, M; Li Gioi, L; Pilloni, A; Piredda, G; Bünger, C; Dittrich, S; Grünberg, O; Hartmann, T; Hess, M; Leddig, T; Voß, C; Waldi, R; Adye, T; Olaiya, E O; Wilson, F F; Emery, S; Vasseur, G; Anulli, F; Aston, D; Bard, D J; Cartaro, C; Convery, M R; Dorfan, J; Dubois-Felsmann, G P; Dunwoodie, W; Ebert, M; Field, R C; Fulsom, B G; Graham, M T; Hast, C; Innes, W R; Kim, P; Leith, D W G S; Lewis, P; Lindemann, D; Luitz, S; Luth, V; Lynch, H L; MacFarlane, D B; Muller, D R; Neal, H; Perl, M; Pulliam, T; Ratcliff, B N; Roodman, A; Salnikov, A A; Schindler, R H; Snyder, A; Su, D; Sullivan, M K; Va'vra, J; Wisniewski, W J; Wulsin, H W; Purohit, M V; White, R M; Wilson, J R; Randle-Conde, A; Sekula, S J; Bellis, M; Burchat, P R; Puccio, E M T; Alam, M S; Ernst, J A; Gorodeisky, R; Guttman, N; Peimer, D R; Soffer, A; Spanier, S M; Ritchie, J L; Ruland, A M; Schwitters, R F; Wray, B C; Izen, J M; Lou, X C; Bianchi, F; De Mori, 
F; Filippi, A; Gamba, D; Lanceri, L; Vitale, L; Martinez-Vidal, F; Oyanguren, A; Villanueva-Perez, P; Albert, J; Banerjee, Sw; Beaulieu, A; Bernlochner, F U; Choi, H H F; King, G J; Kowalewski, R; Lewczuk, M J; Lueck, T; Nugent, I M; Roney, J M; Sobie, R J; Tasneem, N; Gershon, T J; Harrison, P F; Latham, T E; Band, H R; Dasu, S; Pan, Y; Prepost, R; Wu, S L

    2014-11-14

    Dark sectors charged under a new Abelian interaction have recently received much attention in the context of dark matter models. These models introduce a light new mediator, the so-called dark photon (A^{'}), connecting the dark sector to the standard model. We present a search for a dark photon in the reaction e^{+}e^{-}→γA^{'}, A^{'}→e^{+}e^{-}, μ^{+}μ^{-} using 514  fb^{-1} of data collected with the BABAR detector. We observe no statistically significant deviations from the standard model predictions, and we set 90% confidence level upper limits on the mixing strength between the photon and dark photon at the level of 10^{-4}-10^{-3} for dark photon masses in the range 0.02-10.2  GeV. We further constrain the range of the parameter space favored by interpretations of the discrepancy between the calculated and measured anomalous magnetic moment of the muon.

  5. Dark/visible parallel universes and Big Bang nucleosynthesis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bertulani, C. A.; Frederico, T.; Fuqua, J.

    We develop a model for visible matter-dark matter interaction based on the exchange of a massive gray boson called herein the Mulato. Our model hinges on the assumption that all known particles in the visible matter have their counterparts in the dark matter. We postulate six families of particles, five of which are dark. This leads to the unavoidable postulation of six parallel worlds, the visible one and five invisible worlds. A close study of big bang nucleosynthesis (BBN), baryon asymmetries, cosmic microwave background (CMB) bounds, and galaxy dynamics, together with the Standard Model assumptions, helps us to set a limit on the mass and width of the new gauge boson. Modification of the statistics underlying the kinetic energy distribution of particles during the BBN is also discussed. The changes in reaction rates during the BBN due to a departure from the Debye-Hueckel electron screening model are also investigated.

  6. Consistent Partial Least Squares Path Modeling via Regularization.

    PubMed

    Jung, Sunho; Park, JaeHong

    2018-01-01

    Partial least squares (PLS) path modeling is a component-based structural equation modeling that has been adopted in social and psychological research due to its data-analytic capability and flexibility. A recent methodological advance is consistent PLS (PLSc), designed to produce consistent estimates of path coefficients in structural models involving common factors. In practice, however, PLSc may frequently encounter multicollinearity in part because it takes a strategy of estimating path coefficients based on consistent correlations among independent latent variables. PLSc has yet no remedy for this multicollinearity problem, which can cause loss of statistical power and accuracy in parameter estimation. Thus, a ridge type of regularization is incorporated into PLSc, creating a new technique called regularized PLSc. A comprehensive simulation study is conducted to evaluate the performance of regularized PLSc as compared to its non-regularized counterpart in terms of power and accuracy. The results show that our regularized PLSc is recommended for use when serious multicollinearity is present.
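
    The PLSc algorithm itself is more involved, but the ridge idea it borrows is simple; a minimal numpy sketch of ridge-stabilized coefficients under multicollinearity (synthetic data, not the simulation design of the paper):

      import numpy as np

      rng = np.random.default_rng(5)
      n = 200
      z = rng.normal(size=n)
      x1 = z + 0.05 * rng.normal(size=n)          # two nearly collinear predictors
      x2 = z + 0.05 * rng.normal(size=n)
      X = np.column_stack([x1, x2])
      y = 0.5 * x1 + 0.5 * x2 + rng.normal(size=n)

      def ridge(X, y, lam):
          # beta = (X'X + lam * I)^{-1} X'y : the type of regularization added to PLSc
          return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

      print("OLS  :", ridge(X, y, 0.0))           # unstable under multicollinearity
      print("ridge:", ridge(X, y, 10.0))          # shrunken, much lower-variance estimates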

  7. Urns and Chameleons: two metaphors for two different types of measurements

    NASA Astrophysics Data System (ADS)

    Accardi, Luigi

    2013-09-01

    The awareness of the physical possibility of models of space, alternative with respect to the Euclidean one, begun to emerge towards the end of the 19-th century. At the end of the 20-th century a similar awareness emerged concerning the physical possibility of models of the laws of chance alternative with respect to the classical probabilistic models (Kolmogorov model). In geometry the mathematical construction of several non-Euclidean models of space preceded of about one century their applications in physics, which came with the theory of relativity. In physics the opposite situation took place. In fact, while the first example of non Kolmogorov probabilistic models emerged in quantum physics approximately one century ago, at the beginning of 1900, the awareness of the fact that this new mathematical formalism reflected a new mathematical model of the laws of chance had to wait until the early 1980's. In this long time interval the classical and the new probabilistic models were both used in the description and the interpretation of quantum phenomena and negatively interfered with each other because of the absence (for many decades) of a mathematical theory that clearly delimited the respective domains of application. The result of this interference was the emergence of the so-called the "paradoxes of quantum theory". For several decades there have been many different attempts to solve these paradoxes giving rise to what K. Popper baptized "the great quantum muddle": a debate which has been at the core of the philosophy of science for more than 50 years. However these attempts have led to contradictions between the two fundamental theories of the contemporary physical: the quantum theory and the theory of the relativity. Quantum probability identifies the reason of the emergence of non Kolmogorov models, and therefore of the so-called the paradoxes of quantum theory, in the difference between the notion of passive measurements like "reading pre-existent properties" (urn metaphor) and measurements consisting in reading "a response to an interaction" (chameleon metaphor). The non-trivial point is that one can prove that, while the urn scheme cannot lead to empirical data outside of classic probability, response based measurements can give rise to non classical statistics. The talk will include entirely classical examples of non classical statistics and potential applications to economic, sociological or biomedical phenomena.

  8. Muver, a computational framework for accurately calling accumulated mutations.

    PubMed

    Burkholder, Adam B; Lujan, Scott A; Lavender, Christopher A; Grimm, Sara A; Kunkel, Thomas A; Fargo, David C

    2018-05-09

    Identification of mutations from next-generation sequencing data typically requires a balance between sensitivity and accuracy. This is particularly true of DNA insertions and deletions (indels), that can impart significant phenotypic consequences on cells but are harder to call than substitution mutations from whole genome mutation accumulation experiments. To overcome these difficulties, we present muver, a computational framework that integrates established bioinformatics tools with novel analytical methods to generate mutation calls with the extremely low false positive rates and high sensitivity required for accurate mutation rate determination and comparison. Muver uses statistical comparison of ancestral and descendant allelic frequencies to identify variant loci and assigns genotypes with models that include per-sample assessments of sequencing errors by mutation type and repeat context. Muver identifies maximally parsimonious mutation pathways that connect these genotypes, differentiating potential allelic conversion events and delineating ambiguities in mutation location, type, and size. Benchmarking with a human gold standard father-son pair demonstrates muver's sensitivity and low false positive rates. In DNA mismatch repair (MMR) deficient Saccharomyces cerevisiae, muver detects multi-base deletions in homopolymers longer than the replicative polymerase footprint at rates greater than predicted for sequential single-base deletions, implying a novel multi-repeat-unit slippage mechanism. Benchmarking results demonstrate the high accuracy and sensitivity achieved with muver, particularly for indels, relative to available tools. Applied to an MMR-deficient Saccharomyces cerevisiae system, muver mutation calls facilitate mechanistic insights into DNA replication fidelity.
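
    Muver's genotyping models are considerably richer, but the core "compare ancestral and descendant allelic frequencies" step can be illustrated with a simple read-count test (scipy assumed; the counts are made up):

      from scipy.stats import fisher_exact

      # Reference/alternate read counts at one locus (hypothetical numbers).
      ancestor = [48, 2]        # 48 ref reads, 2 alt reads (consistent with sequencing error)
      descendant = [20, 25]     # roughly half alt reads (a heterozygous-looking mutation)

      odds_ratio, p = fisher_exact([ancestor, descendant])
      if p < 1e-3:
          print(f"candidate variant locus (p = {p:.2e})")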

  9. On statistical inference in time series analysis of the evolution of road safety.

    PubMed

    Commandeur, Jacques J F; Bijleveld, Frits D; Bergel-Hayat, Ruth; Antoniou, Constantinos; Yannis, George; Papadimitriou, Eleonora

    2013-11-01

    Data collected for building a road safety observatory usually include observations made sequentially through time. Examples of such data, called time series data, include annual (or monthly) number of road traffic accidents, traffic fatalities or vehicle kilometers driven in a country, as well as the corresponding values of safety performance indicators (e.g., data on speeding, seat belt use, alcohol use, etc.). Some commonly used statistical techniques imply assumptions that are often violated by the special properties of time series data, namely serial dependency among disturbances associated with the observations. The first objective of this paper is to demonstrate the impact of such violations to the applicability of standard methods of statistical inference, which leads to an under or overestimation of the standard error and consequently may produce erroneous inferences. Moreover, having established the adverse consequences of ignoring serial dependency issues, the paper aims to describe rigorous statistical techniques used to overcome them. In particular, appropriate time series analysis techniques of varying complexity are employed to describe the development over time, relating the accident-occurrences to explanatory factors such as exposure measures or safety performance indicators, and forecasting the development into the near future. Traditional regression models (whether they are linear, generalized linear or nonlinear) are shown not to naturally capture the inherent dependencies in time series data. Dedicated time series analysis techniques, such as the ARMA-type and DRAG approaches are discussed next, followed by structural time series models, which are a subclass of state space methods. The paper concludes with general recommendations and practice guidelines for the use of time series models in road safety research. Copyright © 2012 Elsevier Ltd. All rights reserved.
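
    As a sketch of the structural (state-space) approach mentioned above, using statsmodels' UnobservedComponents with simulated annual fatality counts and one explanatory exposure series (not real road-safety data):

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(2)
      n_years = 30
      exposure = np.linspace(100, 160, n_years) + rng.normal(0, 3, n_years)   # vehicle-km index
      trend = np.linspace(7.0, 6.0, n_years)                                  # slowly improving safety
      fatalities = np.exp(trend + 0.005 * exposure + rng.normal(0, 0.05, n_years))

      # Local linear trend model with an explanatory regressor; the serial dependence
      # that ordinary regression ignores is absorbed by the latent trend component.
      model = sm.tsa.UnobservedComponents(np.log(fatalities),
                                          level="local linear trend",
                                          exog=exposure)
      fit = model.fit(disp=False)
      print(fit.summary())
      # Near-future projection, pretending exposure stays near its recent level.
      print(fit.forecast(steps=3, exog=exposure[-3:, None]))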

  10. Extreme current fluctuations in lattice gases: Beyond nonequilibrium steady states

    NASA Astrophysics Data System (ADS)

    Meerson, Baruch; Sasorov, Pavel V.

    2014-01-01

    We use the macroscopic fluctuation theory (MFT) to study large current fluctuations in nonstationary diffusive lattice gases. We identify two universality classes of these fluctuations, which we call elliptic and hyperbolic. They emerge in the limit when the deterministic mass flux is small compared to the mass flux due to the shot noise. The two classes are determined by the sign of compressibility of effective fluid, obtained by mapping the MFT into an inviscid hydrodynamics. An example of the elliptic class is the symmetric simple exclusion process, where, for some initial conditions, we can solve the effective hydrodynamics exactly. This leads to a super-Gaussian extreme current statistics conjectured by Derrida and Gerschenfeld [J. Stat. Phys. 137, 978 (2009), 10.1007/s10955-009-9830-1] and yields the optimal path of the system. For models of the hyperbolic class, the deterministic mass flux cannot be neglected, leading to a different extreme current statistics.

  11. Unbiased estimation of oceanic mean rainfall from satellite borne radiometer measurements

    NASA Technical Reports Server (NTRS)

    Mittal, M. C.

    1981-01-01

    The statistical properties of the radar-derived rainfall obtained during the GARP Atlantic Tropical Experiment (GATE) are used to derive quantitative estimates of the spatial and temporal sampling errors associated with estimating rainfall from brightness temperature measurements such as would be obtained from a satellite-borne microwave radiometer employing a practical-size antenna aperture. A basis for a method of correcting the so-called beam-filling problem, i.e., the effect of nonuniformity of rainfall over the radiometer beamwidth, is provided. The method presented employs the statistical properties of the observations themselves without the need for physical assumptions beyond those associated with the radiative transfer model. The simulation results presented offer a validation of the estimated accuracy that can be achieved, and the graphs included permit evaluation of the effect of the antenna resolution on both the temporal and spatial sampling errors.

  12. Pseudochaotic dynamics near global periodicity

    NASA Astrophysics Data System (ADS)

    Fan, Rong; Zaslavsky, George M.

    2007-09-01

    In this paper, we study a piecewise linear version of the kicked oscillator model: the saw-tooth map. A special case of global periodicity, in which every phase point belongs to a periodic orbit, is presented. With few analytic results known for the corresponding map on the torus, we numerically investigate transport properties and the statistical behavior of the Poincaré recurrence time in two cases of deviation from global periodicity. Non-KAM behavior of the system, as well as subdiffusion and superdiffusion, is observed through numerical simulations. The statistics of Poincaré recurrences show that the Kac lemma is valid in the system and that there is a relation between the transport exponent and the Poincaré recurrence exponent. We also perform careful numerical computation of the capacity, information and correlation dimensions of the so-called exceptional set in both cases. Our results show that the fractal dimension of the exceptional set is strictly less than 2 and that the fractal structures are unifractal rather than multifractal.

  13. Spontaneous collective synchronization in the Kuramoto model with additional non-local interactions

    NASA Astrophysics Data System (ADS)

    Gupta, Shamik

    2017-10-01

    In the context of the celebrated Kuramoto model of globally-coupled phase oscillators of distributed natural frequencies, which serves as a paradigm to investigate spontaneous collective synchronization in many-body interacting systems, we report on a very rich phase diagram in presence of thermal noise and an additional non-local interaction on a one-dimensional periodic lattice. Remarkably, the phase diagram involves both equilibrium and non-equilibrium phase transitions. In two contrasting limits of the dynamics, we obtain exact analytical results for the phase transitions. These two limits correspond to (i) the absence of thermal noise, when the dynamics reduces to that of a non-linear dynamical system, and (ii) the oscillators having the same natural frequency, when the dynamics becomes that of a statistical system in contact with a heat bath and relaxing to a statistical equilibrium state. In the former case, our exact analysis is based on the use of the so-called Ott-Antonsen ansatz to derive a reduced set of nonlinear partial differential equations for the macroscopic evolution of the system. Our results for the case of statistical equilibrium are on the other hand obtained by extending the well-known transfer matrix approach for nearest-neighbor Ising model to consider non-local interactions. The work offers a case study of exact analysis in many-body interacting systems. The results obtained underline the crucial role of additional non-local interactions in either destroying or enhancing the possibility of observing synchrony in mean-field systems exhibiting spontaneous synchronization.
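
    For readers unfamiliar with the baseline model, a minimal Euler simulation of the noisy, globally coupled Kuramoto dynamics and its synchronization order parameter might look like the sketch below (numpy assumed; the lattice and non-local coupling studied in the paper are omitted):

      import numpy as np

      rng = np.random.default_rng(0)
      N, K, D, dt, steps = 500, 2.0, 0.2, 0.01, 10000
      omega = rng.normal(0.0, 0.5, N)              # distributed natural frequencies
      theta = rng.uniform(0.0, 2.0 * np.pi, N)

      for _ in range(steps):
          z = np.exp(1j * theta).mean()            # complex order parameter r * exp(i * psi)
          r, psi = np.abs(z), np.angle(z)
          drift = omega + K * r * np.sin(psi - theta)                        # mean-field coupling
          theta += drift * dt + np.sqrt(2.0 * D * dt) * rng.normal(size=N)   # thermal noise

      print("synchronization order parameter r =", np.abs(np.exp(1j * theta).mean()))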

  14. Generation of dense statistical connectomes from sparse morphological data

    PubMed Central

    Egger, Robert; Dercksen, Vincent J.; Udvary, Daniel; Hege, Hans-Christian; Oberlaender, Marcel

    2014-01-01

    Sensory-evoked signal flow, at cellular and network levels, is primarily determined by the synaptic wiring of the underlying neuronal circuitry. Measurements of synaptic innervation, connection probabilities and subcellular organization of synaptic inputs are thus among the most active fields of research in contemporary neuroscience. Methods to measure these quantities range from electrophysiological recordings over reconstructions of dendrite-axon overlap at light-microscopic levels to dense circuit reconstructions of small volumes at electron-microscopic resolution. However, quantitative and complete measurements at subcellular resolution and mesoscopic scales to obtain all local and long-range synaptic in/outputs for any neuron within an entire brain region are beyond present methodological limits. Here, we present a novel concept, implemented within an interactive software environment called NeuroNet, which allows (i) integration of sparsely sampled (sub)cellular morphological data into an accurate anatomical reference frame of the brain region(s) of interest, (ii) up-scaling to generate an average dense model of the neuronal circuitry within the respective brain region(s) and (iii) statistical measurements of synaptic innervation between all neurons within the model. We illustrate our approach by generating a dense average model of the entire rat vibrissal cortex, providing the required anatomical data, and illustrate how to measure synaptic innervation statistically. Comparing our results with data from paired recordings in vitro and in vivo, as well as with reconstructions of synaptic contact sites at light- and electron-microscopic levels, we find that our in silico measurements are in line with previous results. PMID:25426033

  15. Building flexible real-time systems using the Flex language

    NASA Technical Reports Server (NTRS)

    Kenny, Kevin B.; Lin, Kwei-Jay

    1991-01-01

    The design and implementation of a real-time programming language called Flex, which is a derivative of C++, are presented. It is shown how different types of timing requirements might be expressed and enforced in Flex, how they might be fulfilled in a flexible way using different program models, and how the programming environment can help in making binding and scheduling decisions. The timing constraint primitives in Flex are easy to use yet powerful enough to define both independent and relative timing constraints. Program models like imprecise computation and performance polymorphism can carry out flexible real-time programs. In addition, programmers can use a performance measurement tool that produces statistically correct timing models to predict the expected execution time of a program and to help make binding decisions. A real-time programming environment is also presented.

  16. The beta Burr type X distribution properties with application.

    PubMed

    Merovci, Faton; Khaleel, Mundher Abdullah; Ibrahim, Noor Akma; Shitan, Mahendran

    2016-01-01

    We develop a new continuous distribution called the beta-Burr type X distribution that extends the Burr type X distribution. We provide a comprehensive mathematical treatment of this distribution's properties. Furthermore, various structural properties of the new distribution are derived, including the moment generating function and the rth moment, thus generalizing some results in the literature. We also obtain expressions for the density, moment generating function and rth moment of the order statistics. We use maximum likelihood estimation to estimate the parameters. Additionally, the asymptotic confidence intervals for the parameters are derived from the Fisher information matrix. Finally, a simulation study is carried out under varying sample sizes to assess the performance of this model. An illustration with a real data set indicates that this new distribution can serve as a good alternative model for positive real data in many areas.
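
    For orientation, the beta-generated construction behind such a distribution can be written down directly; the sketch below follows the standard beta-G recipe, and the scale parameterization of the Burr type X baseline is one common convention that may differ from the authors' exact notation:

      % Beta-G family applied to a Burr type X baseline with CDF G and pdf g;
      % B(a, b) is the beta function and a, b > 0 are the extra shape parameters.
      \[
        F(x) = \frac{1}{B(a,b)} \int_{0}^{G(x)} t^{a-1} (1-t)^{b-1}\, dt ,
        \qquad
        f(x) = \frac{g(x)}{B(a,b)}\, G(x)^{a-1} \bigl(1 - G(x)\bigr)^{b-1},
      \]
      \[
        G(x) = \bigl(1 - e^{-(\lambda x)^{2}}\bigr)^{\theta},
        \qquad
        g(x) = 2\theta\lambda^{2} x\, e^{-(\lambda x)^{2}} \bigl(1 - e^{-(\lambda x)^{2}}\bigr)^{\theta-1},
        \qquad x > 0 .
      \]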

  17. SurfKin: an ab initio kinetic code for modeling surface reactions.

    PubMed

    Le, Thong Nguyen-Minh; Liu, Bin; Huynh, Lam K

    2014-10-05

    In this article, we describe a C/C++ program called SurfKin (Surface Kinetics) to construct microkinetic mechanisms for modeling gas-surface reactions. Thermodynamic properties of reaction species are estimated based on density functional theory calculations and statistical mechanics. Rate constants for elementary steps (including adsorption, desorption, and chemical reactions on surfaces) are calculated using the classical collision theory and transition state theory. Methane decomposition and water-gas shift reaction on Ni(111) surface were chosen as test cases to validate the code implementations. The good agreement with literature data suggests this is a powerful tool to facilitate the analysis of complex reactions on surfaces, and thus it helps to effectively construct detailed microkinetic mechanisms for such surface reactions. SurfKin also opens a possibility for designing nanoscale model catalysts. Copyright © 2014 Wiley Periodicals, Inc.
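
    The transition state theory expression referred to here is the textbook one; as a sketch (not necessarily the exact convention implemented in SurfKin), the rate constant of an elementary surface step is

      % Conventional transition state theory; the Q are partition functions built from
      % DFT-derived frequencies and E_a is the zero-point-corrected barrier.
      \[
        k(T) = \frac{k_{\mathrm{B}} T}{h}\,
               \frac{Q^{\ddagger}(T)}{Q_{\mathrm{reac}}(T)}\,
               \exp\!\left(-\frac{E_{a}}{k_{\mathrm{B}} T}\right).
      \]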

  18. Evolutionary diversification of the auditory organ sensilla in Neoconocephalus katydids (Orthoptera: Tettigoniidae) correlates with acoustic signal diversification over phylogenetic relatedness and life history.

    PubMed

    Strauß, J; Alt, J A; Ekschmitt, K; Schul, J; Lakes-Harlan, R

    2017-06-01

    Neoconocephalus Tettigoniidae are a model for the evolution of acoustic signals as male calls have diversified in temporal structure during the radiation of the genus. The call divergence and phylogeny in Neoconocephalus are established, but in tettigoniids in general, accompanying evolutionary changes in hearing organs are not studied. We investigated anatomical changes of the tympanal hearing organs during the evolutionary radiation and divergence of intraspecific acoustic signals. We compared the neuroanatomy of auditory sensilla (crista acustica) from nine Neoconocephalus species for the number of auditory sensilla and the crista acustica length. These parameters were correlated with differences in temporal call features, body size, life histories and different phylogenetic positions. By this, adaptive responses to shifting frequencies of male calls and changes in their temporal patterns can be evaluated against phylogenetic constraints and allometry. All species showed well-developed auditory sensilla, on average 32-35 between species. Crista acustica length and sensillum numbers correlated with body size, but not with phylogenetic position or life history. Statistically significant correlations existed also with specific call patterns: a higher number of auditory sensilla occurred in species with continuous calls or slow pulse rates, and a longer crista acustica occurred in species with double pulses or slow pulse rates. The auditory sensilla show significant differences between species despite their recent radiation, and morphological and ecological similarities. This indicates the responses to natural and sexual selection, including divergence of temporal and spectral signal properties. Phylogenetic constraints are unlikely to limit these changes of the auditory systems. © 2017 European Society For Evolutionary Biology. Journal of Evolutionary Biology © 2017 European Society For Evolutionary Biology.

  19. Leveraging Genomic Annotations and Pleiotropic Enrichment for Improved Replication Rates in Schizophrenia GWAS

    PubMed Central

    Wang, Yunpeng; Thompson, Wesley K.; Schork, Andrew J.; Holland, Dominic; Chen, Chi-Hua; Bettella, Francesco; Desikan, Rahul S.; Li, Wen; Witoelar, Aree; Zuber, Verena; Devor, Anna; Nöthen, Markus M.; Rietschel, Marcella; Chen, Qiang; Werge, Thomas; Cichon, Sven; Weinberger, Daniel R.; Djurovic, Srdjan; O’Donovan, Michael; Visscher, Peter M.; Andreassen, Ole A.; Dale, Anders M.

    2016-01-01

    Most of the genetic architecture of schizophrenia (SCZ) has not yet been identified. Here, we apply a novel statistical algorithm called Covariate-Modulated Mixture Modeling (CM3), which incorporates auxiliary information (heterozygosity, total linkage disequilibrium, genomic annotations, pleiotropy) for each single nucleotide polymorphism (SNP) to enable more accurate estimation of replication probabilities, conditional on the observed test statistic (“z-score”) of the SNP. We use a multiple logistic regression on z-scores to combine the auxiliary information into a “relative enrichment score” for each SNP. For each stratum of these relative enrichment scores, we obtain nonparametric estimates of posterior expected test statistics and replication probabilities as a function of discovery z-scores, using a resampling-based approach that repeatedly and randomly partitions meta-analysis sub-studies into training and replication samples. We fit a scale mixture of two Gaussians model to each stratum, obtaining parameter estimates that minimize the sum of squared differences between the scale-mixture model and the stratified nonparametric estimates. We apply this approach to the recent genome-wide association study (GWAS) of SCZ (n = 82,315), obtaining a good fit between the model-based and observed effect sizes and replication probabilities. We observed that SNPs with low enrichment scores replicate with a lower probability than SNPs with high enrichment scores even when both are genome-wide significant (p < 5x10^-8). There were 693 and 219 independent loci with model-based replication rates ≥80% and ≥90%, respectively. Compared to analyses not incorporating relative enrichment scores, CM3 increased out-of-sample yield for SNPs that replicate at a given rate. This demonstrates that replication probabilities can be more accurately estimated using prior enrichment information with CM3. PMID:26808560
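
    A minimal sketch of the enrichment-score idea, assuming (hypothetically) that per-SNP covariates and a held-out replication indicator are available as plain arrays; the logistic-regression-plus-stratification step below mirrors the description in the abstract but is not the CM3 code.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    n_snps = 10_000

    # hypothetical per-SNP auxiliary covariates (heterozygosity, total LD, annotation, pleiotropy)
    X = rng.normal(size=(n_snps, 4))
    # hypothetical 0/1 label: did the SNP replicate in a held-out sub-study?
    replicated = rng.binomial(1, p=1.0 / (1.0 + np.exp(-(X @ [0.8, 0.4, 0.2, 0.6]))))

    # a logistic regression combines the covariates into a single relative enrichment score
    model = LogisticRegression().fit(X, replicated)
    enrichment_score = model.predict_proba(X)[:, 1]

    # stratify SNPs by enrichment score; replication curves vs. discovery z-scores would then
    # be estimated separately within each stratum, as described in the abstract
    strata = np.digitize(enrichment_score, np.quantile(enrichment_score, [0.25, 0.5, 0.75]))
    ```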

  20. Two new algorithms to combine kriging with stochastic modelling

    NASA Astrophysics Data System (ADS)

    Venema, Victor; Lindau, Ralf; Varnai, Tamas; Simmer, Clemens

    2010-05-01

    Two main groups of statistical methods used in the Earth sciences are geostatistics and stochastic modelling. Geostatistical methods, such as various kriging algorithms, aim at estimating the mean value for every point as well as possible. In the case of sparse measurements, such fields have less variability at small scales and a narrower distribution than the true field. This can lead to biases if a nonlinear process driven by such a kriged field is simulated. Stochastic modelling aims at reproducing the statistical structure of the data in space and time. One of the stochastic modelling methods, the so-called surrogate data approach, replicates the value distribution and power spectrum of a certain data set. While stochastic methods reproduce the statistical properties of the data, the location of the measurement is not considered. Taking the measurement locations into account requires the use of so-called constrained stochastic models. Because radiative transfer through clouds is a highly nonlinear process, it is essential to model the distribution (e.g. of optical depth, extinction, liquid water content or liquid water path) accurately. In addition, the correlations within the cloud field are important, especially because of horizontal photon transport. This explains the success of surrogate cloud fields for use in 3D radiative transfer studies. Up to now, however, we could only achieve good results for the radiative properties averaged over the field, but not for a radiation measurement located at a certain position. Therefore we have developed a new algorithm that combines the accuracy of stochastic (surrogate) modelling with the positioning capabilities of kriging. In this way, we can automatically profit from the large geostatistical literature and software. This algorithm is similar to the standard iterative amplitude adjusted Fourier transform (IAAFT) algorithm, but has an additional iterative step in which the surrogate field is nudged towards the kriged field. The nudging strength is gradually reduced to zero during successive iterations. A second algorithm, which we call step-wise kriging, pursues the same aim. Each time the kriging algorithm estimates a value, noise is added to it, after which this new point is accounted for in the estimation of all the later points. In this way, the autocorrelation of the step-kriged field is close to that found in the pseudo-measurements. The amount of noise is determined by the kriging uncertainty. The algorithms are tested on cloud fields from large eddy simulations (LES). On these clouds, a measurement is simulated. From these pseudo-measurements, we estimated the power spectrum for the surrogates, the semi-variogram for the (stepwise) kriging and the distribution. Furthermore, the pseudo-measurement is kriged. Because we work with LES clouds and the truth is known, we can validate the algorithm by performing 3D radiative transfer calculations on the original LES clouds and on the two new types of stochastic clouds. For comparison, the radiative properties of the kriged fields and standard surrogate fields are also computed. Preliminary results show that both algorithms reproduce the structure of the original clouds well, and the minima and maxima are located where the pseudo-measurements see them. The main problem for the quality of the structure and the root mean square error is the amount of data, which is especially limited in the case of just one zenith-pointing measurement.
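
    A one-dimensional sketch of the nudged IAAFT idea described above, under several assumptions: the target series supplies the amplitude distribution and power spectrum, a pre-computed kriged field on the same grid acts as the positional constraint, and the linear decay of the nudging weight is an illustrative choice rather than the authors' schedule.

    ```python
    import numpy as np

    def nudged_iaaft(target, kriged, n_iter=100):
        """IAAFT surrogate of `target` nudged toward a kriged field (1-D sketch).

        target : data whose amplitude distribution and power spectrum are reproduced
        kriged : kriged estimate on the same grid, used as the positional constraint
        """
        amp = np.abs(np.fft.rfft(target))           # target Fourier amplitudes
        sorted_vals = np.sort(target)               # target value distribution
        surrogate = np.random.permutation(target)   # random start

        for it in range(n_iter):
            # 1) impose the target power spectrum, keeping the current phases
            spec = np.fft.rfft(surrogate)
            spec = amp * np.exp(1j * np.angle(spec))
            surrogate = np.fft.irfft(spec, n=target.size)

            # 2) impose the target amplitude distribution by rank ordering
            ranks = np.argsort(np.argsort(surrogate))
            surrogate = sorted_vals[ranks]

            # 3) nudge toward the kriged field; the strength decays to zero
            w = 1.0 - it / (n_iter - 1)
            surrogate = (1.0 - 0.5 * w) * surrogate + 0.5 * w * kriged

        # one final rank ordering so the value distribution is reproduced exactly
        return sorted_vals[np.argsort(np.argsort(surrogate))]
    ```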

  1. Effect of call organization on burnout and quality of life in psychiatry residents.

    PubMed

    Scarella, Timothy M; Nelligan, Julia; Roberts, Jacqueline; Boland, Robert J

    2017-02-01

    We aimed to measure the effects of a residency program's mid-year shift from 24-h call to night float on resident burnout and quality of life. At the end of the year, residents who started the year with 24-h call had worse burnout and quality of life, with statistical significance and large effect sizes. Exposure to a twenty-four hour call system, when compared to a full year of night float, may be associated with increased burnout and decreased quality of life, though measuring this effect is not straightforward. Copyright © 2016 Elsevier B.V. All rights reserved.

  2. Time, frequency, and time-varying Granger-causality measures in neuroscience.

    PubMed

    Cekic, Sezen; Grandjean, Didier; Renaud, Olivier

    2018-05-20

    This article proposes a systematic methodological review and an objective criticism of existing methods enabling the derivation of time, frequency, and time-varying Granger-causality statistics in neuroscience. The capacity to describe the causal links between signals recorded at different brain locations during a neuroscience experiment is indeed of primary interest for neuroscientists, who often have very precise prior hypotheses about the relationships between recorded brain signals. The increasing interest and the huge number of publications related to this topic call for this systematic review, which describes the very complex methodological aspects underlying the derivation of these statistics. In this article, we first present a general framework that allows us to review and compare Granger-causality statistics in the time domain, and the link with transfer entropy. Then, the spectral and the time-varying extensions are presented and discussed together with their estimation and distributional properties. Although not the focus of this article, partial and conditional Granger causality, dynamical causal modelling, directed transfer function, directed coherence, partial directed coherence, and their variants are also mentioned. Copyright © 2018 John Wiley & Sons, Ltd.
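
    For readers unfamiliar with the time-domain statistic, the sketch below implements the textbook Granger F-test (restricted versus full autoregression) with plain least squares. The lag order and the toy data are arbitrary, and none of the spectral or time-varying extensions surveyed in the article are covered.

    ```python
    import numpy as np
    from scipy import stats

    def granger_f_test(y, x, p=2):
        """F-test of whether lags of x improve an AR(p) model of y (time-domain Granger)."""
        n = len(y)
        Y = y[p:]
        lag_y = np.array([y[t - p:t][::-1] for t in range(p, n)])   # shape (n-p, p)
        lag_x = np.array([x[t - p:t][::-1] for t in range(p, n)])   # shape (n-p, p)
        ones = np.ones((n - p, 1))

        def rss(design):
            beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
            resid = Y - design @ beta
            return resid @ resid

        rss_restricted = rss(np.hstack([ones, lag_y]))          # lags of y only
        rss_full = rss(np.hstack([ones, lag_y, lag_x]))         # lags of y and of x
        df_num, df_den = p, (n - p) - (2 * p + 1)
        F = ((rss_restricted - rss_full) / df_num) / (rss_full / df_den)
        return F, stats.f.sf(F, df_num, df_den)

    # toy usage: x drives y with a one-step lag, so the test should reject
    rng = np.random.default_rng(2)
    x = rng.normal(size=500)
    y = 0.8 * np.roll(x, 1) + rng.normal(scale=0.5, size=500)
    print(granger_f_test(y, x, p=2))
    ```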

  3. Publication Bias ( The "File-Drawer Problem") in Scientific Inference

    NASA Technical Reports Server (NTRS)

    Scargle, Jeffrey D.; DeVincenzi, Donald (Technical Monitor)

    1999-01-01

    Publication bias arises whenever the probability that a study is published depends on the statistical significance of its results. This bias, often called the file-drawer effect since the unpublished results are imagined to be tucked away in researchers' file cabinets, is potentially a severe impediment to combining the statistical results of studies collected from the literature. With almost any reasonable quantitative model for publication bias, only a small number of studies lost in the file-drawer will produce a significant bias. This result contradicts the well-known Fail Safe File Drawer (FSFD) method for setting limits on the potential harm of publication bias, widely used in social, medical and psychic research. This method incorrectly treats the file drawer as unbiased, and almost always misestimates the seriousness of publication bias. A large body of not only psychic research but also medical and social science studies has mistakenly relied on this method to validate claimed discoveries. Statistical combination can be trusted only if it is known with certainty that all studies that have been carried out are included. Such certainty is virtually impossible to achieve in literature surveys.

  4. PEPA test: fast and powerful differential analysis from relative quantitative proteomics data using shared peptides.

    PubMed

    Jacob, Laurent; Combes, Florence; Burger, Thomas

    2018-06-18

    We propose a new hypothesis test for the differential abundance of proteins in mass-spectrometry-based relative quantification. An important feature of this type of high-throughput analysis is that it involves an enzymatic digestion of the sample proteins into peptides prior to identification and quantification. Due to numerous sequence homologies, different proteins can lead to peptides with identical amino acid chains, so that their parent protein is ambiguous. These so-called shared peptides make the protein-level statistical analysis a challenge and are often not accounted for. In this article, we use a linear model describing peptide-protein relationships to build a likelihood ratio test of differential abundance for proteins. We show that the likelihood ratio statistic can be computed in linear time with the number of peptides. We also provide the asymptotic null distribution of a regularized version of our statistic. Experiments on both real and simulated datasets show that our procedures outperform state-of-the-art methods. The procedures are available via the pepa.test function of the DAPAR Bioconductor R package.

  5. Inverse problems-based maximum likelihood estimation of ground reflectivity for selected regions of interest from stripmap SAR data [Regularized maximum likelihood estimation of ground reflectivity from stripmap SAR data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    West, R. Derek; Gunther, Jacob H.; Moon, Todd K.

    In this study, we derive a comprehensive forward model for the data collected by stripmap synthetic aperture radar (SAR) that is linear in the ground reflectivity parameters. It is also shown that if the noise model is additive, then the forward model fits into the linear statistical model framework, and the ground reflectivity parameters can be estimated by statistical methods. We derive the maximum likelihood (ML) estimates for the ground reflectivity parameters in the case of additive white Gaussian noise. Furthermore, we show that obtaining the ML estimates of the ground reflectivity requires two steps. The first step amounts to a cross-correlation of the data with a model of the data acquisition parameters, and it is shown that this step has essentially the same processing as the so-called convolution back-projection algorithm. The second step is a complete system inversion that is capable of mitigating the sidelobes of the spatially variant impulse responses remaining after the correlation processing. We also state the Cramer-Rao lower bound (CRLB) for the ML ground reflectivity estimates. We show that the CRLB is linked to the SAR system parameters, the flight path of the SAR sensor, and the image reconstruction grid. We demonstrate the ML image formation and the CRLB bound for synthetically generated data.
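
    A toy numerical sketch of the two-step structure described above, for a generic linear model y = A x + n with white Gaussian noise: the correlation A^H y plays the role of the back-projection-like first step, and solving the normal equations is the system inversion. The operator, array sizes, and noise level are made up and carry none of the actual SAR geometry.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    m, n = 400, 100    # number of measurements and reflectivity pixels (toy sizes)

    A = rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))    # toy forward operator
    x_true = rng.normal(size=n) + 1j * rng.normal(size=n)         # ground reflectivity
    y = A @ x_true + 0.1 * (rng.normal(size=m) + 1j * rng.normal(size=m))

    # step 1: correlate the data with the acquisition model (back-projection-like image)
    backproj = A.conj().T @ y

    # step 2: full system inversion to deconvolve the spatially variant impulse responses
    x_ml = np.linalg.solve(A.conj().T @ A, backproj)

    print(np.linalg.norm(x_ml - x_true) / np.linalg.norm(x_true))  # relative error
    ```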

  6. Predicting changes of glass optical properties in polluted atmospheric environment by a neural network model

    NASA Astrophysics Data System (ADS)

    Verney-Carron, A.; Dutot, A. L.; Lombardo, T.; Chabas, A.

    2012-07-01

    Soiling results from the deposition of pollutants on materials. On glass, it leads to an alteration of its intrinsic optical properties. The nature and intensity of this phenomenon mirror the pollution of an environment. This paper proposes a new statistical model in order to predict the evolution of haze (H) (i.e. diffuse/direct transmitted light ratio) as a function of time and major pollutant concentrations in the atmosphere (SO2, NO2, and PM10 (Particulate Matter < 10 μm)). The model was parameterized by using a large set of data collected in European cities (especially Paris and its suburbs, Athens, Krakow, Prague, and Rome) during field exposure campaigns (French, European, and international programs). This statistical model, called NEUROPT-Glass, is based on an artificial neural network with two hidden layers and uses a non-linear parametric regression, the Multilayer Perceptron (MLP). The results display a high coefficient of determination (R2 = 0.88) between the measured and the predicted hazes, with less dispersion of the data than existing multilinear dose-response functions. Therefore, this model can be used with great confidence in order to predict the soiling of glass as a function of time in world cities with different levels of pollution or to assess the effect of pollution reduction policies on glass soiling problems in urban environments.
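
    A minimal sketch of an MLP regression with two hidden layers in the spirit of the model described, assuming hypothetical predictors (exposure time, SO2, NO2, PM10) and a synthetic haze target; it is not the NEUROPT-Glass parameterization or its data.

    ```python
    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(4)
    n = 500
    # hypothetical predictors: exposure time (months), SO2, NO2, PM10 concentrations
    X = np.column_stack([rng.uniform(1, 48, n), rng.uniform(1, 40, n),
                         rng.uniform(5, 80, n), rng.uniform(5, 60, n)])
    # synthetic haze target with a saturating dependence on time and particulates
    haze = 0.2 * np.sqrt(X[:, 0]) * (0.02 * X[:, 3] + 0.01 * X[:, 1]) + rng.normal(0, 0.05, n)

    # two hidden layers, inputs standardized before training
    model = make_pipeline(StandardScaler(),
                          MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=5000, random_state=0))
    model.fit(X, haze)
    print(model.score(X, haze))   # in-sample R^2, for illustration only
    ```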

  7. Inverse problems-based maximum likelihood estimation of ground reflectivity for selected regions of interest from stripmap SAR data [Regularized maximum likelihood estimation of ground reflectivity from stripmap SAR data

    DOE PAGES

    West, R. Derek; Gunther, Jacob H.; Moon, Todd K.

    2016-12-01

    In this study, we derive a comprehensive forward model for the data collected by stripmap synthetic aperture radar (SAR) that is linear in the ground reflectivity parameters. It is also shown that if the noise model is additive, then the forward model fits into the linear statistical model framework, and the ground reflectivity parameters can be estimated by statistical methods. We derive the maximum likelihood (ML) estimates for the ground reflectivity parameters in the case of additive white Gaussian noise. Furthermore, we show that obtaining the ML estimates of the ground reflectivity requires two steps. The first step amounts to a cross-correlation of the data with a model of the data acquisition parameters, and it is shown that this step has essentially the same processing as the so-called convolution back-projection algorithm. The second step is a complete system inversion that is capable of mitigating the sidelobes of the spatially variant impulse responses remaining after the correlation processing. We also state the Cramer-Rao lower bound (CRLB) for the ML ground reflectivity estimates. We show that the CRLB is linked to the SAR system parameters, the flight path of the SAR sensor, and the image reconstruction grid. We demonstrate the ML image formation and the CRLB bound for synthetically generated data.

  8. Bias Characterization in Probabilistic Genotype Data and Improved Signal Detection with Multiple Imputation

    PubMed Central

    Palmer, Cameron; Pe’er, Itsik

    2016-01-01

    Missing data are an unavoidable component of modern statistical genetics. Different array or sequencing technologies cover different single nucleotide polymorphisms (SNPs), leading to a complicated mosaic pattern of missingness where both individual genotypes and entire SNPs are sporadically absent. Such missing data patterns cannot be ignored without introducing bias, yet cannot be inferred exclusively from nonmissing data. In genome-wide association studies, the accepted solution to missingness is to impute missing data using external reference haplotypes. The resulting probabilistic genotypes may be analyzed in the place of genotype calls. A general-purpose paradigm, called Multiple Imputation (MI), is known to model uncertainty in many contexts, yet it is not widely used in association studies. Here, we undertake a systematic evaluation of existing imputed data analysis methods and MI. We characterize biases related to uncertainty in association studies, and find that bias is introduced both at the imputation level, when imputation algorithms generate inconsistent genotype probabilities, and at the association level, when analysis methods inadequately model genotype uncertainty. We find that MI performs at least as well as existing methods or in some cases much better, and provides a straightforward paradigm for adapting existing genotype association methods to uncertain data. PMID:27310603

  9. Chaos as an intermittently forced linear system.

    PubMed

    Brunton, Steven L; Brunton, Bingni W; Proctor, Joshua L; Kaiser, Eurika; Kutz, J Nathan

    2017-05-30

    Understanding the interplay of order and disorder in chaos is a central challenge in modern quantitative science. Approximate linear representations of nonlinear dynamics have long been sought, driving considerable interest in Koopman theory. We present a universal, data-driven decomposition of chaos as an intermittently forced linear system. This work combines delay embedding and Koopman theory to decompose chaotic dynamics into a linear model in the leading delay coordinates with forcing by low-energy delay coordinates; this is called the Hankel alternative view of Koopman (HAVOK) analysis. This analysis is applied to the Lorenz system and real-world examples including Earth's magnetic field reversal and measles outbreaks. In each case, forcing statistics are non-Gaussian, with long tails corresponding to rare intermittent forcing that precedes switching and bursting phenomena. The forcing activity demarcates coherent phase space regions where the dynamics are approximately linear from those that are strongly nonlinear. The huge amount of data generated in fields like neuroscience or finance calls for effective strategies that mine data to reveal underlying dynamics. Here Brunton et al. develop a data-driven technique to analyze chaotic systems and predict their dynamics in terms of a forced linear model.
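
    A compact sketch of the HAVOK recipe on the Lorenz system: build a Hankel matrix of delay coordinates of a scalar measurement, truncate its SVD, and regress the derivatives of the leading coordinates on all retained coordinates so that the coefficients of the last coordinate act as the intermittent forcing. The delay count and truncation rank are illustrative choices, not the values used by the authors.

    ```python
    import numpy as np
    from scipy.integrate import solve_ivp

    def lorenz(t, s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
        x, y, z = s
        return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

    dt = 0.01
    sol = solve_ivp(lorenz, (0, 100), [1.0, 1.0, 1.0], t_eval=np.arange(0, 100, dt))
    x = sol.y[0]                              # scalar measurement x(t)

    # Hankel matrix of delay coordinates
    q = 100                                   # number of delays (assumption)
    H = np.array([x[i:i + len(x) - q] for i in range(q)])
    U, S, Vt = np.linalg.svd(H, full_matrices=False)
    r = 15                                    # truncation rank (assumption)
    V = Vt[:r].T                              # delay coordinates v_1 ... v_r over time

    # regress dv/dt of the first r-1 coordinates on all r coordinates;
    # the coefficients of v_r (last row of `coeffs`) give the forcing term
    dV = np.gradient(V[:, :r - 1], dt, axis=0)
    coeffs, *_ = np.linalg.lstsq(V, dV, rcond=None)   # shape (r, r-1)
    print(coeffs.shape)
    ```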

  10. Boosting Bayesian parameter inference of nonlinear stochastic differential equation models by Hamiltonian scale separation.

    PubMed

    Albert, Carlo; Ulzega, Simone; Stoop, Ruedi

    2016-04-01

    Parameter inference is a fundamental problem in data-driven modeling. Given observed data that is believed to be a realization of some parameterized model, the aim is to find parameter values that are able to explain the observed data. In many situations, the dominant sources of uncertainty must be included into the model for making reliable predictions. This naturally leads to stochastic models. Stochastic models render parameter inference much harder, as the aim then is to find a distribution of likely parameter values. In Bayesian statistics, which is a consistent framework for data-driven learning, this so-called posterior distribution can be used to make probabilistic predictions. We propose a novel, exact, and very efficient approach for generating posterior parameter distributions for stochastic differential equation models calibrated to measured time series. The algorithm is inspired by reinterpreting the posterior distribution as a statistical mechanics partition function of an object akin to a polymer, where the measurements are mapped on heavier beads compared to those of the simulated data. To arrive at distribution samples, we employ a Hamiltonian Monte Carlo approach combined with a multiple time-scale integration. A separation of time scales naturally arises if either the number of measurement points or the number of simulation points becomes large. Furthermore, at least for one-dimensional problems, we can decouple the harmonic modes between measurement points and solve the fastest part of their dynamics analytically. Our approach is applicable to a wide range of inference problems and is highly parallelizable.

  11. On the selection of ordinary differential equation models with application to predator-prey dynamical models.

    PubMed

    Zhang, Xinyu; Cao, Jiguo; Carroll, Raymond J

    2015-03-01

    We consider model selection and estimation in a context where there are competing ordinary differential equation (ODE) models, and all the models are special cases of a "full" model. We propose a computationally inexpensive approach that employs statistical estimation of the full model, followed by a combination of a least squares approximation (LSA) and the adaptive Lasso. We show the resulting method, here called the LSA method, to be an (asymptotically) oracle model selection method. The finite sample performance of the proposed LSA method is investigated with Monte Carlo simulations, in which we examine the percentage of selecting true ODE models, the efficiency of the parameter estimation compared to simply using the full and true models, and coverage probabilities of the estimated confidence intervals for ODE parameters, all of which have satisfactory performances. Our method is also demonstrated by selecting the best predator-prey ODE to model a lynx and hare population dynamical system among some well-known and biologically interpretable ODE models. © 2014, The International Biometric Society.

  12. Expert Systems on Multiprocessor Architectures. Volume 4. Technical Reports

    DTIC Science & Technology

    1991-06-01

    Floated-Current-Time0 -> The time that this function is called in user time units, expressed as a floating point number. Halt-Poligono Arrests the...default a statistics file will be printed out, if it can be. To prevent this make No-Statistics true. Unhalt-Poligono Unarrests the process in which the

  13. 78 FR 30922 - Agency Information Collection Activities: Submission for OMB Review; Joint Comment Request

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-05-23

    ..., Division of Research and Statistics, Board of Governors of the Federal Reserve System, 20th and C Streets... individual institutions and the industry as a whole. Call Report data provide the most current statistical data available for evaluating institutions' corporate applications, identifying areas of focus for on...

  14. Hispanics in 1979--A Statistical Appraisal.

    ERIC Educational Resources Information Center

    Martinez, Douglas R.

    1979-01-01

    Public Law 94-311, passed by Congress in 1976, called for the expansion of statistics reflecting the socioeconomic status of Hispanics. The article discusses the state of the Hispanic community in early 1979, one agency's difficulties in implementing the mandates of P.L. 94-311, and the status of other agencies' work on the law's requirements. (NQ)

  15. 75 FR 29517 - Pacific Fishery Management Council; Public Meetings

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-05-26

    ... Remarks and Introductions 2. Roll Call 3. Executive Director's Report 4. Approve Agenda B. Groundfish... and Statistical Committee 8 a.m Habitat Committee 8:30 a.m Budget Committee 1:15 p.m Saturday, June 12... Statistical Committee 8 a.m Council Chair's Reception 6 p.m Sunday, June 13, 2010 California State Delegation...

  16. 75 FR 51240 - Pacific Fishery Management Council; Public Meetings

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-08-19

    ... Remarks and Introductions 2. Council Member Appointments 3. Roll Call 4. Executive Director's Report 5... Technical Team/Salmon Amendment Committee Joint 8 a.m. Session Scientific and Statistical Committee 8 a.m.... Salmon Technical Team 8 a.m. Scientific and Statistical Committee 8 a.m. Enforcement Consultants 4:30 p.m...

  17. Statistical Literacy for Active Citizenship: A Call for Data Science Education

    ERIC Educational Resources Information Center

    Engel, Joachim

    2017-01-01

    Data are abundant, quantitative information about the state of society and the wider world is around us more than ever. Paradoxically, recent trends in the public discourse point towards a post-factual world that seems content to ignore or misrepresent empirical evidence. As statistics educators we are challenged to promote understanding of…

  18. A bilayer Double Semion Model with Symmetry-Enriched Topological Order

    NASA Astrophysics Data System (ADS)

    Ortiz, Laura; Martin-Delgado, Miguel Angel

    We construct a new model of two-dimensional quantum spin systems that combines intrinsic topological orders and a global symmetry called flavour symmetry. It is referred to as the bilayer Doubled Semion model (bDS) and is an instance of symmetry-enriched topological order. A honeycomb bilayer lattice is introduced to combine a Double Semion Topological Order with a global spin-flavour symmetry to obtain the fractionalization of its quasiparticles. The bDS model exhibits non-trivial braiding self-statistics of excitations and its dual model constitutes a Symmetry-Protected Topological Order with novel edge states. This dual model gives rise to a bilayer Non-Trivial Paramagnet that is invariant under the flavour symmetry and the well-known spin flip symmetry. We acknowledge financial support from the Spanish MINECO Grants FIS2012-33152, FIS2015-67411, and the CAM research consortium QUITEMAD+, Grant No. S2013/ICE-2801. The research of M.A.M.-D. has been supported in part by the U.S. Army Research Office throu.

  19. Guide to Using Onionskin Analysis Code (U)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fugate, Michael Lynn; Morzinski, Jerome Arthur

    2016-09-15

    This document is a guide to using R-code written for the purpose of analyzing onionskin experiments. We expect the user to be very familiar with statistical methods and the R programming language. For more details about onionskin experiments and the statistical methods mentioned in this document see Storlie, Fugate, et al. (2013). Engineers at LANL experiment with detonators and high explosives to assess performance. The experimental unit, called an onionskin, is a hemisphere consisting of a detonator and a booster pellet surrounded by explosive material. When the detonator explodes, a streak camera mounted above the pole of the hemisphere records when the shock wave arrives at the surface. The output from the camera is a two-dimensional image that is transformed into a curve that shows the arrival time as a function of polar angle. The statistical challenge is to characterize a baseline population of arrival time curves and to compare the baseline curves to curves from a new, so-called test series. The hope is that the new test series of curves is statistically similar to the baseline population.

  20. Educator and participant perceptions and cost analysis of stage-tailored educational telephone calls.

    PubMed

    Esters, Onikia N; Boeckner, Linda S; Hubert, Melanie; Horacek, Tanya; Kritsch, Karen R; Oakland, Mary J; Lohse, Barbara; Greene, Geoffrey; Nitzke, Susan

    2008-01-01

    To identify strengths and weaknesses of nutrition education via telephone calls as part of a larger stage-of-change tailored intervention with mailed materials. Evaluative feedback was elicited from educators who placed the calls and respondents who received the calls. An internet and telephone survey of 10 states in the midwestern United States. 21 educators in 10 states reached via the internet and 50 young adults reached via telephone. VARIABLES MEASURED AND ANALYSIS: Rankings of intervention components, ratings of key aspects of educational calls, and cost data (as provided by a lead researcher in each state) were summarized via descriptive statistics. RESULTS, CONCLUSIONS, AND IMPLICATIONS: Educational calls used 6 to 17 minutes of preparation time, required 8 to 15 minutes of contact time, and had a mean estimated cost of $5.82 per call. Low-income young adults favored print materials over educational calls. However, the calls were reported to have positive effects on motivating participants to set goals. Educators who use educational telephone calls to reach young adults, a highly mobile target audience, may require a robust and flexible contact plan.

  1. Cold Calling and Web Postings: Do They Improve Students' Preparation and Learning in Statistics?

    ERIC Educational Resources Information Center

    Levy, Dan

    2014-01-01

    Getting students to prepare well for class is a common challenge faced by instructors all over the world. This study investigates the effects that two frequently used techniques to increase student preparation--web postings and cold calling--have on student outcomes. The study is based on two experiments and a qualitative study conducted in a…

  2. No Place to Call Home: Child & Youth Homelessness in the United States. Poverty Fact Sheet

    ERIC Educational Resources Information Center

    Damron, Neil

    2015-01-01

    "No Place to Call Home: Child and Youth Homelessness in the United States," prepared by intern Neil Damron and released in May 2015, presents the statistics on child and youth homelessness and recent trends in Wisconsin and the United States. It explores the major challenges faced by homeless minors, and, drawing from recent research by…

  3. Preventive Intervention in Families at Risk: The Limits of Liberalism

    ERIC Educational Resources Information Center

    Snik, Ger; De Jong, Johan; Van Haaften, Wouter

    2004-01-01

    There is an increasing call for preventive state interventions in so-called families at risk - that is, interventions before any overt harm has been done by parents to their children or by the children to a third party, in families that are statistically known to be liable to harm children. One of the basic principles of liberal morality, however,…

  4. Offset Stream Technology Test-Summary of Results

    NASA Technical Reports Server (NTRS)

    Brown, Clifford A.; Bridges, James E.; Henderson, Brenda

    2007-01-01

    Statistical jet noise prediction codes that accurately predict spectral directivity for both cold and hot jets are highly sought both in industry and academia. Their formulation, whether based upon manipulations of the Navier-Stokes equations or upon heuristic arguments, requires substantial experimental observation of jet turbulence statistics. Unfortunately, the statistics of most interest involve the space-time correlation of flow quantities, especially velocity. Until the last 10 years, all turbulence statistics were measured with single-point probes, such as hotwires or laser Doppler anemometry. Particle image velocimetry (PIV) brought many new insights with its ability to measure velocity fields over large regions of jets simultaneously; however, it could not measure velocity at rates higher than a few fields per second, making it unsuitable for obtaining temporal spectra and correlations. The development of time-resolved PIV, herein called TR-PIV, has removed this limitation, enabling measurement of velocity fields at high resolution in both space and time. In this paper, ground-breaking results from the application of TR-PIV to single-flow hot jets are used to explore the impact of heat on turbulent statistics of interest to jet noise models. First, a brief summary of validation studies is reported, undertaken to show that the new technique produces the same trusted results as hotwires in cold, low-speed jets. Second, velocity spectra from cold and hot jets are compared to see the effect of heat on the spectra. It is seen that heated jets possess 10 percent more turbulence intensity compared to unheated jets with the same velocity. The spectral shapes, when normalized using Strouhal scaling, are insensitive to temperature if the stream-wise location is normalized relative to the potential core length. Similarly, second order velocity correlations, of interest in modeling of jet noise sources, are also insensitive to temperature.

  5. Effect of Temperature on Jet Velocity Spectra

    NASA Technical Reports Server (NTRS)

    Bridges, James E.; Wernet, Mark P.

    2007-01-01

    Statistical jet noise prediction codes that accurately predict spectral directivity for both cold and hot jets are highly sought both in industry and academia. Their formulation, whether based upon manipulations of the Navier-Stokes equations or upon heuristic arguments, requires substantial experimental observation of jet turbulence statistics. Unfortunately, the statistics of most interest involve the space-time correlation of flow quantities, especially velocity. Until the last 10 years, all turbulence statistics were measured with single-point probes, such as hotwires or laser Doppler anemometry. Particle image velocimetry (PIV) brought many new insights with its ability to measure velocity fields over large regions of jets simultaneously; however, it could not measure velocity at rates higher than a few fields per second, making it unsuitable for obtaining temporal spectra and correlations. The development of time-resolved PIV, herein called TR-PIV, has removed this limitation, enabling measurement of velocity fields at high resolution in both space and time. In this paper, ground-breaking results from the application of TR-PIV to single-flow hot jets are used to explore the impact of heat on turbulent statistics of interest to jet noise models. First, a brief summary of validation studies is reported, undertaken to show that the new technique produces the same trusted results as hotwires in cold, low-speed jets. Second, velocity spectra from cold and hot jets are compared to see the effect of heat on the spectra. It is seen that heated jets possess 10 percent more turbulence intensity compared to unheated jets with the same velocity. The spectral shapes, when normalized using Strouhal scaling, are insensitive to temperature if the stream-wise location is normalized relative to the potential core length. Similarly, second order velocity correlations, of interest in modeling of jet noise sources, are also insensitive to temperature.

  6. A resilient and efficient CFD framework: Statistical learning tools for multi-fidelity and heterogeneous information fusion

    NASA Astrophysics Data System (ADS)

    Lee, Seungjoon; Kevrekidis, Ioannis G.; Karniadakis, George Em

    2017-09-01

    Exascale-level simulations require fault-resilient algorithms that are robust against repeated and expected software and/or hardware failures during computations, which may render the simulation results unsatisfactory. If each processor can share some global information about the simulation from a coarse, limited-accuracy but relatively costless auxiliary simulator, we can effectively fill in the missing spatial data at the required times, on the fly, by a statistical learning technique - multi-level Gaussian process regression; this has been demonstrated in previous work [1]. Based on the previous work, we also employ another (nonlinear) statistical learning technique, Diffusion Maps, which detects computational redundancy in time and hence accelerates the simulation by projective time integration, giving the overall computation a "patch dynamics" flavor. Furthermore, we are now able to perform information fusion with multi-fidelity and heterogeneous data (including stochastic data). Finally, we set the foundations of a new framework in CFD, called patch simulation, that combines information fusion techniques from, in principle, multiple fidelity and resolution simulations (and even experiments) with a new adaptive timestep refinement technique. We present two benchmark problems (the heat equation and the Navier-Stokes equations) to demonstrate the new capability that statistical learning tools can bring to traditional scientific computing algorithms. For each problem, we rely on heterogeneous and multi-fidelity data, either from a coarse simulation of the same equation or from a stochastic, particle-based, more "microscopic" simulation. We consider, as such "auxiliary" models, a Monte Carlo random walk for the heat equation and a dissipative particle dynamics (DPD) model for the Navier-Stokes equations. More broadly, in this paper we demonstrate the symbiotic and synergistic combination of statistical learning, domain decomposition, and scientific computing in exascale simulations.
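
    A two-level Gaussian-process sketch of the gap-filling idea: fit one GP to the cheap auxiliary simulator and a second GP to the discrepancy at the few surviving fine-level samples, then add the two predictions. The stand-in "coarse" and "fine" functions and the kernel choices are assumptions for illustration only.

    ```python
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def coarse_model(x):   # cheap, limited-accuracy auxiliary simulator (stand-in)
        return np.sin(2 * np.pi * x)

    def fine_model(x):     # expensive high-fidelity "truth" (stand-in)
        return np.sin(2 * np.pi * x) + 0.3 * x ** 2

    x_coarse = np.linspace(0, 1, 40)[:, None]          # plentiful coarse-level data
    x_fine = np.array([[0.1], [0.4], [0.7], [0.95]])   # few surviving fine-level samples

    # level 1: GP on the coarse simulator
    gp_lo = GaussianProcessRegressor(kernel=RBF(0.2)).fit(x_coarse, coarse_model(x_coarse.ravel()))
    # level 2: GP on the discrepancy between fine data and the level-1 prediction
    resid = fine_model(x_fine.ravel()) - gp_lo.predict(x_fine)
    gp_hi = GaussianProcessRegressor(kernel=RBF(0.3), alpha=1e-6).fit(x_fine, resid)

    # reconstructed fine-level field anywhere in the domain
    x_new = np.linspace(0, 1, 200)[:, None]
    fill_in = gp_lo.predict(x_new) + gp_hi.predict(x_new)
    ```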

  7. Instability of political preferences and the role of mass media: a dynamical representation in a quantum framework.

    PubMed

    Khrennikova, Polina; Haven, Emmanuel

    2016-01-13

    We seek to devise a new paradigm, borrowed from concepts and mathematical tools of quantum physics, to model the decision-making process of the US electorate. The statistical data of the election outcomes in the period between 2008 and 2014 are analysed, in order to explore in more depth the emergence of the so-called divided government. A growing strand of the political literature indicates that preference reversal (strictly speaking, a violation of the transitivity axiom) is a consequence of the so-called non-separability phenomenon (i.e. a strong interrelation of choices). In the political science literature, non-separable behaviour is characterized by a conditioning of decisions on the outcomes of some issues of interest. An additional source of preference reversal is ascribed to the time dynamics of the voters' cognitive states, in the context of new upcoming political information. As we discuss in this paper, the primary source of political information can be attributed to the mass media. In order to shed more light on the phenomenon of preference reversal among the US electorate, we accommodate the obtained statistical data in a classical probabilistic (Kolmogorovian) scheme. Based on the obtained results, we attribute the strong ties between the voters' non-separable decisions, which cannot be explained by conditioning with the Bayes scheme, to the quantum phenomenon of entanglement. Second, we compute the degree of interference of voters' belief states with the aid of the quantum analogue of the formula of total probability. Lastly, a model based on the quantum master equation is proposed to incorporate the impact of the mass-media bath. © 2015 The Author(s).

  8. Site specific passive acoustic detection and densities of humpback whale calls off the coast of California

    NASA Astrophysics Data System (ADS)

    Helble, Tyler Adam

    Passive acoustic monitoring of marine mammal calls is an increasingly important method for assessing population numbers, distribution, and behavior. Automated methods are needed to aid in the analyses of the recorded data. When a mammal vocalizes in the marine environment, the received signal is a filtered version of the original waveform emitted by the marine mammal. The waveform is reduced in amplitude and distorted due to propagation effects that are influenced by the bathymetry and environment. It is important to account for these effects to determine a site-specific probability of detection for marine mammal calls in a given study area. A knowledge of that probability function over a range of environmental and ocean noise conditions allows vocalization statistics from recordings of single, fixed, omnidirectional sensors to be compared across sensors and at the same sensor over time with less bias and uncertainty in the results than direct comparison of the raw statistics. This dissertation focuses on both the development of new tools needed to automatically detect humpback whale vocalizations from single-fixed omnidirectional sensors as well as the determination of the site-specific probability of detection for monitoring sites off the coast of California. Using these tools, detected humpback calls are "calibrated" for environmental properties using the site-specific probability of detection values, and presented as call densities (calls per square kilometer per time). A two-year monitoring effort using these calibrated call densities reveals important biological and ecological information on migrating humpback whales off the coast of California. Call density trends are compared between the monitoring sites and at the same monitoring site over time. Call densities also are compared to several natural and human-influenced variables including season, time of day, lunar illumination, and ocean noise. The results reveal substantial differences in call densities between the two sites which were not noticeable using uncorrected (raw) call counts. Additionally, a Lombard effect was observed for humpback whale vocalizations in response to increasing ocean noise. The results presented in this thesis develop techniques to accurately measure marine mammal abundances from passive acoustic sensors.

  9. A Review of the Historical, Criminological, and Theoretical Understandings of the Cambodian American Population: A Call for More Comprehensive Research.

    PubMed

    Chheang, Dany; Connolly, Eric J

    2017-09-01

    The collective view of Asian Americans as model minorities is evident with the extensive amount of statistical data showing support for the academic and socioeconomic success of Asian Americans in the United States. This perception, however, often presents an inaccurate portrayal of Asian Americans, in general, as it overlooks many of the difficulties and hardships experienced by Asian American ethnic groups such as Southeast Asians. Within this group, Cambodian Americans are at the highest risk for experiencing socioeconomic hardships, behavioral health problems, substance use disorders, and contact with the criminal justice system, with deportation also being a prevailing issue. Unfortunately, research in this area is scant and contemporary research on Cambodian Americans has several limitations. To begin to address this issue, the present article merges information from existing research on this population from a sociohistorical, criminological, and theoretical standpoint to call for more comprehensive research on Cambodian Americans.

  10. Social Communication and Vocal Recognition in Free-Ranging Rhesus Monkeys

    NASA Astrophysics Data System (ADS)

    Rendall, Christopher Andrew

    Kinship and individual identity are key determinants of primate sociality, and the capacity for vocal recognition of individuals and kin is hypothesized to be an important adaptation facilitating intra-group social communication. Research was conducted on adult female rhesus monkeys on Cayo Santiago, Puerto Rico to test this hypothesis for three acoustically distinct calls characterized by varying selective pressures on communicating identity: coos (contact calls), grunts (close range social calls), and noisy screams (agonistic recruitment calls). Vocalization playback experiments confirmed a capacity for both individual and kin recognition of coos, but not screams (grunts were not tested). Acoustic analyses, using traditional spectrographic methods as well as linear predictive coding techniques, indicated that coos (but not grunts or screams) were highly distinctive, and that the effects of vocal tract filtering--formants --contributed more to statistical discriminations of both individuals and kin groups than did temporal or laryngeal source features. Formants were identified from very short (23 ms.) segments of coos and were stable within calls, indicating that formant cues to individual and kin identity were available throughout a call. This aspect of formant cues is predicted to be an especially important design feature for signaling identity efficiently in complex acoustic environments. Results of playback experiments involving manipulated coo stimuli provided preliminary perceptual support for the statistical inference that formant cues take precedence in facilitating vocal recognition. The similarity of formants among female kin suggested a mechanism for the development of matrilineal vocal signatures from the genetic and environmental determinants of vocal tract morphology shared among relatives. The fact that screams --calls strongly expected to communicate identity--were not individually distinctive nor recognized suggested the possibility that their acoustic structure and role in signaling identity might be constrained by functional or morphological design requirements associated with their role in signaling submission.

  11. Sex-specific developmental models for Creophilus maxillosus (L.) (Coleoptera: Staphylinidae): searching for larger accuracy of insect age estimates.

    PubMed

    Frątczak-Łagiewska, Katarzyna; Matuszewski, Szymon

    2018-05-01

    Differences in size between males and females, called the sexual size dimorphism, are common in insects. These differences may be followed by differences in the duration of development. Accordingly, it is believed that insect sex may be used to increase the accuracy of insect age estimates in forensic entomology. Here, the sex-specific differences in the development of Creophilus maxillosus were studied at seven constant temperatures. We have also created separate developmental models for males and females of C. maxillosus and tested them in a validation study to answer a question whether sex-specific developmental models improve the accuracy of insect age estimates. Results demonstrate that males of C. maxillosus developed significantly longer than females. The sex-specific and general models for the total immature development had the same optimal temperature range and similar developmental threshold but different thermal constant K, which was the largest in the case of the male-specific model and the smallest in the case of the female-specific model. Despite these differences, validation study revealed just minimal and statistically insignificant differences in the accuracy of age estimates using sex-specific and general thermal summation models. This finding indicates that in spite of statistically significant differences in the duration of immature development between females and males of C. maxillosus, there is no increase in the accuracy of insect age estimates while using the sex-specific thermal summation models compared to the general model. Accordingly, this study does not support the use of sex-specific developmental data for the estimation of insect age in forensic entomology.

  12. An Alternative Approach to Analyze Ipsative Data. Revisiting Experiential Learning Theory.

    PubMed

    Batista-Foguet, Joan M; Ferrer-Rosell, Berta; Serlavós, Ricard; Coenders, Germà; Boyatzis, Richard E

    2015-01-01

    The ritualistic use of statistical models regardless of the type of data actually available is a common practice across disciplines, which we dare to call a type zero error. Statistical models involve a series of assumptions whose existence is often neglected altogether; this is especially the case with ipsative data. This paper illustrates the consequences of this ritualistic practice within Kolb's Experiential Learning Theory (ELT) operationalized through its Learning Style Inventory (KLSI). We show how, using a well-known methodology from other disciplines - compositional data analysis (CODA) and log ratio transformations - KLSI data can be properly analyzed. In addition, the method has theoretical implications: a third dimension of the KLSI is unveiled, providing room for future research. This third dimension describes an individual's relative preference for learning by prehension rather than by transformation. Using a sample of international MBA students, we relate this dimension with another self-assessment instrument, the Philosophical Orientation Questionnaire (POQ), and with an observer-assessed instrument, the Emotional and Social Competency Inventory (ESCI-U). Both show plausible statistical relationships. An intellectual operating philosophy (IOP) is linked to a preference for prehension, whereas a pragmatic operating philosophy (POP) is linked to transformation. Self-management and social awareness competencies are linked to a learning preference for transforming knowledge, whereas relationship management and cognitive competencies are more related to approaching learning by prehension.

  13. An Alternative Approach to Analyze Ipsative Data. Revisiting Experiential Learning Theory

    PubMed Central

    Batista-Foguet, Joan M.; Ferrer-Rosell, Berta; Serlavós, Ricard; Coenders, Germà; Boyatzis, Richard E.

    2015-01-01

    The ritualistic use of statistical models regardless of the type of data actually available is a common practice across disciplines, which we dare to call a type zero error. Statistical models involve a series of assumptions whose existence is often neglected altogether; this is especially the case with ipsative data. This paper illustrates the consequences of this ritualistic practice within Kolb's Experiential Learning Theory (ELT) operationalized through its Learning Style Inventory (KLSI). We show how, using a well-known methodology from other disciplines—compositional data analysis (CODA) and log ratio transformations—KLSI data can be properly analyzed. In addition, the method has theoretical implications: a third dimension of the KLSI is unveiled, providing room for future research. This third dimension describes an individual's relative preference for learning by prehension rather than by transformation. Using a sample of international MBA students, we relate this dimension with another self-assessment instrument, the Philosophical Orientation Questionnaire (POQ), and with an observer-assessed instrument, the Emotional and Social Competency Inventory (ESCI-U). Both show plausible statistical relationships. An intellectual operating philosophy (IOP) is linked to a preference for prehension, whereas a pragmatic operating philosophy (POP) is linked to transformation. Self-management and social awareness competencies are linked to a learning preference for transforming knowledge, whereas relationship management and cognitive competencies are more related to approaching learning by prehension. PMID:26617561
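
    A minimal CODA sketch, assuming the KLSI scores can be treated as a four-part composition with the usual mode labels (CE, RO, AC, AE); the centered log-ratio transform below is one standard log-ratio option, not necessarily the exact transformation used by the authors.

    ```python
    import numpy as np
    import pandas as pd

    def clr(df):
        """Centered log-ratio transform: log of each part relative to the row geometric mean."""
        logs = np.log(df)
        return logs.sub(logs.mean(axis=1), axis=0)

    # hypothetical ipsative scores on the four KLSI modes (each row sums to a fixed total)
    scores = pd.DataFrame(
        {"CE": [30, 25, 20], "RO": [25, 30, 35], "AC": [25, 20, 30], "AE": [20, 25, 15]}
    )
    scores_clr = clr(scores)
    print(scores_clr)   # clr rows sum to zero; standard multivariate tools now apply
    ```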

  14. A Statistical Bias Correction Tool for Generating Climate Change Scenarios in Indonesia based on CMIP5 Datasets

    NASA Astrophysics Data System (ADS)

    Faqih, A.

    2017-03-01

    Providing information regarding future climate scenarios is very important in climate change studies. The climate scenario can be used as basic information to support adaptation and mitigation studies. In order to deliver future climate scenarios over a specific region, baseline and projection data from the outputs of global climate models (GCMs) are needed. However, due to their coarse resolution, the data have to be downscaled and bias corrected in order to get scenario data with better spatial resolution that match the characteristics of the observed data. Generating this downscaled data is often difficult for scientists who do not have a specific background, experience and skill in dealing with the complex data from the GCM outputs. In this regard, it is necessary to develop a tool that can be used to simplify the downscaling processes in order to help scientists, especially in Indonesia, generate future climate scenario data that can be used for their climate change-related studies. In this paper, we introduce a tool called “Statistical Bias Correction for Climate Scenarios (SiBiaS)”. The tool is specially designed to facilitate the use of CMIP5 GCM data outputs and process their statistical bias corrections relative to the reference data from observations. It is intended to support capacity building in climate modeling in Indonesia as part of the Indonesia 3rd National Communication (TNC) project activities.
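
    The abstract does not spell out SiBiaS's correction scheme, so the sketch below shows empirical quantile mapping, one common statistical bias-correction approach, on made-up gamma-distributed "rainfall"; treat it purely as an illustration of the kind of step such a tool performs.

    ```python
    import numpy as np

    def quantile_map(model_hist, obs_hist, model_future):
        """Empirical quantile mapping: map model values onto the observed distribution."""
        quantiles = np.linspace(0.01, 0.99, 99)
        mq = np.quantile(model_hist, quantiles)   # model climatology quantiles
        oq = np.quantile(obs_hist, quantiles)     # observed climatology quantiles
        # locate each future value in the model climatology, then read off the
        # corresponding observed value (linear interpolation between quantiles)
        return np.interp(model_future, mq, oq)

    rng = np.random.default_rng(5)
    obs = rng.gamma(2.0, 5.0, size=3000)           # "observed" daily rainfall
    gcm = rng.gamma(2.0, 7.0, size=3000)           # biased GCM baseline
    gcm_future = rng.gamma(2.0, 8.0, size=3000)    # GCM projection
    corrected = quantile_map(gcm, obs, gcm_future)
    ```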

  15. A model for family-based case-control studies of genetic imprinting and epistasis.

    PubMed

    Li, Xin; Sui, Yihan; Liu, Tian; Wang, Jianxin; Li, Yongci; Lin, Zhenwu; Hegarty, John; Koltun, Walter A; Wang, Zuoheng; Wu, Rongling

    2014-11-01

    Genetic imprinting, also called the parent-of-origin effect, has been recognized to play an important role in the formation and pathogenesis of human diseases. Although the epigenetic mechanisms that establish genetic imprinting have been a focus of many genetic studies, our knowledge about the number of imprinting genes and their chromosomal locations and interactions with other genes is still scarce, limiting precise inference of the genetic architecture of complex diseases. In this article, we present a statistical model for testing and estimating the effects of genetic imprinting on complex diseases using a commonly used case-control design with family structure. For each subject sampled from a case and control population, we not only genotype its own single nucleotide polymorphisms (SNPs) but also collect its parents' genotypes. By tracing the transmission pattern of SNP alleles from the parental to the offspring generation, the model allows the characterization of genetic imprinting effects based on Pearson tests of a 2 × 2 contingency table. The model is expanded to test the interactions between imprinting effects and additive, dominant and epistatic effects in a complex web of genetic interactions. Statistical properties of the model are investigated, and its practical usefulness is validated by a real data analysis. The model will provide a useful tool for genome-wide association studies aimed at elucidating the picture of genetic control over complex human diseases. © The Author 2013. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
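
    A minimal sketch of a Pearson test on a 2 × 2 contingency table like the one mentioned above, with invented counts of transmitted alleles by parental origin in cases and controls; the table layout is an assumption for illustration, not the authors' exact construction.

    ```python
    import numpy as np
    from scipy.stats import chi2_contingency

    # hypothetical counts of the minor allele by parental origin, in cases and controls
    #                 paternal  maternal
    table = np.array([[120,      80],    # cases
                      [100,     105]])   # controls

    chi2, p, dof, expected = chi2_contingency(table, correction=False)
    print(f"chi2 = {chi2:.2f}, p = {p:.3g}")   # a small p suggests a parent-of-origin effect
    ```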

  16. Performance Analysis of Garbage Collection and Dynamic Reordering in a Lisp System. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Llames, Rene Lim

    1991-01-01

    Generation-based garbage collection and dynamic reordering of objects are two techniques for improving the efficiency of memory management in Lisp and similar dynamic language systems. An analysis of the effect of generation configuration is presented, focusing on the number of generations and on generation capacities. Analytic timing and survival models are used to represent garbage collection runtime and to derive structural results on its behavior. The survival model provides bounds on the age of objects surviving a garbage collection at a particular level. Empirical results show that execution time is most sensitive to the capacity of the youngest generation. A technique called scanning for transport statistics, for evaluating the effectiveness of reordering independent of main memory size, is presented.

  17. Phenomenological approach to mechanical damage growth analysis.

    PubMed

    Pugno, Nicola; Bosia, Federico; Gliozzi, Antonio S; Delsanto, Pier Paolo; Carpinteri, Alberto

    2008-10-01

    The problem of characterizing damage evolution in a generic material is addressed with the aim of tracing it back to existing growth models in other fields of research. Based on energetic considerations, a system evolution equation is derived for a generic damage indicator describing a material system subjected to an increasing external stress. The latter is found to fit into the framework of a recently developed phenomenological universality (PUN) approach and, more specifically, the so-called U2 class. Analytical results are confirmed by numerical simulations based on a fiber-bundle model and statistically assigned local strengths at the microscale. The fits with numerical data prove, with an excellent degree of reliability, that the typical evolution of the damage indicator belongs to the aforementioned PUN class. Applications of this result are briefly discussed and suggested.

  18. Associative memory model for searching an image database by image snippet

    NASA Astrophysics Data System (ADS)

    Khan, Javed I.; Yun, David Y.

    1994-09-01

    This paper presents an associative memory called multidimensional holographic associative computing (MHAC), which can potentially be used to perform feature-based image database queries using an image snippet. MHAC has the unique capability to selectively focus on specific segments of a query frame during associative retrieval. As a result, this model can perform search on the basis of featural significance described by a subset of the snippet pixels. This capability is critical for visual query in image databases because quite often the cognitive index features in the snippet are statistically weak. Unlike conventional artificial associative memories, MHAC uses a two-level representation and incorporates additional meta-knowledge about the reliability status of the segments of information it receives and forwards. In this paper we present an analysis of the focus characteristics of MHAC.

  19. Simulating stick-slip failure in a sheared granular layer using a physics-based constitutive model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lieou, Charles K. C.; Daub, Eric G.; Guyer, Robert A.

    In this paper, we model laboratory earthquakes in a biaxial shear apparatus using the Shear-Transformation-Zone (STZ) theory of dense granular flow. The theory is based on the observation that slip events in a granular layer are attributed to grain rearrangement at soft spots called STZs, which can be characterized according to principles of statistical physics. We model lab data on granular shear using STZ theory and document direct connections between the STZ approach and rate-and-state friction. We discuss the stability transition from stable shear to stick-slip failure and show that stick slip is predicted by STZ when the applied shear load exceeds a threshold value that is modulated by elastic stiffness and frictional rheology. Finally, we also show that STZ theory mimics fault zone dilation during the stick phase, consistent with lab observations.

  20. Spatial Ensemble Postprocessing of Precipitation Forecasts Using High Resolution Analyses

    NASA Astrophysics Data System (ADS)

    Lang, Moritz N.; Schicker, Irene; Kann, Alexander; Wang, Yong

    2017-04-01

    Ensemble prediction systems are designed to account for errors or uncertainties in the initial and boundary conditions, imperfect parameterizations, etc. However, due to sampling errors and underestimation of the model errors, these ensemble forecasts tend to be underdispersive, and to lack both reliability and sharpness. To overcome such limitations, statistical postprocessing methods are commonly applied to these forecasts. In this study, a full-distributional spatial postprocessing method is applied to short-range precipitation forecasts over Austria using Standardized Anomaly Model Output Statistics (SAMOS). Following Stauffer et al. (2016), observation and forecast fields are transformed into standardized anomalies by subtracting a site-specific climatological mean and dividing by the climatological standard deviation. Because only a single regression model needs to be fitted for the whole domain, the SAMOS framework provides a computationally inexpensive method to create operationally calibrated probabilistic forecasts for any arbitrary location or for all grid points in the domain simultaneously. Taking advantage of the INCA system (Integrated Nowcasting through Comprehensive Analysis), high resolution analyses are used for the computation of the observed climatology and for model training. The INCA system operationally combines station measurements and remote sensing data into real-time objective analysis fields at 1 km horizontal resolution and 1 h temporal resolution. The precipitation forecast used in this study is obtained from a limited area model ensemble prediction system also operated by ZAMG. The so-called ALADIN-LAEF provides, by applying a multi-physics approach, a 17-member forecast at a horizontal resolution of 10.9 km and a temporal resolution of 1 hour. The SAMOS approach thus statistically combines the in-house high-resolution analysis and ensemble prediction system. The station-based validation of 6-hour precipitation sums shows a mean improvement of more than 40% in CRPS when compared to bilinearly interpolated uncalibrated ensemble forecasts. The validation on randomly selected grid points, representing the true height distribution over Austria, still indicates a mean improvement of 35%. The applied statistical model is currently set up for 6-hourly and daily accumulation periods, but will be extended to a temporal resolution of 1-3 hours within a new probabilistic nowcasting system operated by ZAMG.
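    A minimal sketch of the standardized-anomaly idea behind SAMOS, assuming toy data and a plain least-squares fit; the array names, climatology values and regression form are illustrative assumptions, not the operational ZAMG implementation.

```python
import numpy as np

def standardized_anomaly(x, clim_mean, clim_sd):
    """Transform a value into a standardized anomaly using a site climatology."""
    return (x - clim_mean) / clim_sd

# Hypothetical training data: analyses ("observations") and ensemble-mean forecasts.
rng = np.random.default_rng(1)
clim_mean, clim_sd = 2.0, 1.5                      # site-specific climatology (assumed)
obs  = clim_mean + clim_sd * rng.normal(size=5000)
fcst = obs + rng.normal(scale=0.8, size=5000)      # noisy/biased ensemble mean

y = standardized_anomaly(obs,  clim_mean, clim_sd)
x = standardized_anomaly(fcst, clim_mean, clim_sd)

# One regression model for the whole domain: y_anom = a + b * x_anom
b, a = np.polyfit(x, y, 1)
calibrated = clim_mean + clim_sd * (a + b * standardized_anomaly(3.1, clim_mean, clim_sd))
print(f"a = {a:.3f}, b = {b:.3f}, calibrated value for a forecast of 3.1: {calibrated:.2f}")
```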

  1. In search of a statistical probability model for petroleum-resource assessment : a critique of the probabilistic significance of certain concepts and methods used in petroleum-resource assessment : to that end, a probabilistic model is sketched

    USGS Publications Warehouse

    Grossling, Bernardo F.

    1975-01-01

    Exploratory drilling is still in incipient or youthful stages in those areas of the world where the bulk of the potential petroleum resources is yet to be discovered. Methods of assessing resources from projections based on historical production and reserve data are limited to mature areas. For most of the world's petroleum-prospective areas, a more speculative situation calls for a critical review of resource-assessment methodology. The language of mathematical statistics is required to define more rigorously the appraisal of petroleum resources. Basically, two approaches have been used to appraise the amounts of undiscovered mineral resources in a geologic province: (1) projection models, which use statistical data on the past outcome of exploration and development in the province; and (2) estimation models of the overall resources of the province, which use certain known parameters of the province together with the outcome of exploration and development in analogous provinces. These two approaches often lead to widely different estimates. Some of the controversy that arises results from a confusion of the probabilistic significance of the quantities yielded by each of the two approaches. Also, inherent limitations of analytic projection models, such as those using the logistic and Gompertz functions, have often been ignored. The resource-assessment problem should be recast in terms that provide for consideration of the probability of existence of the resource and of the probability of discovery of a deposit. Then the two above-mentioned models occupy the two ends of the probability range. The new approach accounts for (1) what can be expected with reasonably high certainty by mere projections of what has been accomplished in the past; (2) the inherent biases of decision-makers and resource estimators; (3) upper bounds that can be set up as goals for exploration; and (4) the uncertainties in geologic conditions in a search for minerals. Actual outcomes can then be viewed as phenomena subject to statistical uncertainty and responsive to changes in economic and technologic factors.

  2. Perspectives on statistics education: observations from statistical consulting in an academic nursing environment.

    PubMed

    Hayat, Matthew J; Schmiege, Sarah J; Cook, Paul F

    2014-04-01

    Statistics knowledge is essential for understanding the nursing and health care literature, as well as for applying rigorous science in nursing research. Statistical consultants providing services to faculty and students in an academic nursing program have the opportunity to identify gaps and challenges in statistics education for nursing students. This information may be useful to curriculum committees and statistics educators. This article aims to provide perspective on statistics education stemming from the experiences of three experienced statistics educators who regularly collaborate and consult with nurse investigators. The authors share their knowledge and express their views about data management, data screening and manipulation, statistical software, types of scientific investigation, and advanced statistical topics not covered in the usual coursework. The suggestions provided include a call for data to study these topics. Relevant data about statistics education can assist educators in developing comprehensive statistics coursework for nursing students. Copyright 2014, SLACK Incorporated.

  3. Stochastic Partial Differential Equation Solver for Hydroacoustic Modeling: Improvements to Paracousti Sound Propagation Solver

    NASA Astrophysics Data System (ADS)

    Preston, L. A.

    2017-12-01

    Marine hydrokinetic (MHK) devices offer a clean, renewable alternative energy source for the future. Responsible utilization of MHK devices, however, requires that the effects of acoustic noise produced by these devices on marine life and marine-related human activities be well understood. Paracousti is a 3-D full waveform acoustic modeling suite that can accurately propagate MHK noise signals in the complex bathymetry found in the near-shore to open ocean environment and considers real properties of the seabed, water column, and air-surface interface. However, this is a deterministic simulation that assumes the environment and source are exactly known. In reality, environmental and source characteristics are often only known in a statistical sense. Thus, to fully characterize the expected noise levels within the marine environment, this uncertainty in environmental and source factors should be incorporated into the acoustic simulations. One method is to use Monte Carlo (MC) techniques where simulation results from a large number of deterministic solutions are aggregated to provide statistical properties of the output signal. However, MC methods can be computationally prohibitive since they can require tens of thousands or more simulations to build up an accurate representation of those statistical properties. An alternative method, using the technique of stochastic partial differential equations (SPDE), allows computation of the statistical properties of output signals at a small fraction of the computational cost of MC. We are developing a SPDE solver for the 3-D acoustic wave propagation problem called Paracousti-UQ to help regulators and operators assess the statistical properties of environmental noise produced by MHK devices. In this presentation, we present the SPDE method and compare statistical distributions of simulated acoustic signals in simple models to MC simulations to show the accuracy and efficiency of the SPDE method. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia LLC, a wholly owned subsidiary of Honeywell International Inc. for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA0003525.
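    A minimal sketch of the Monte Carlo baseline described above: a deterministic solver is run many times with randomly drawn environment and source parameters, and the outputs are aggregated into statistics. The toy solver below merely stands in for Paracousti; its formula and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def toy_solver(sound_speed, source_level):
    """Stand-in for one deterministic propagation run: received level at 1 km."""
    geometric_spreading_loss = 20.0 * np.log10(1000.0)   # spherical spreading to 1 km
    environment_effect = 0.01 * (sound_speed - 1500.0)   # crude environmental sensitivity
    return source_level - geometric_spreading_loss + environment_effect

n_runs = 5000
sound_speeds  = rng.normal(1500.0, 10.0, n_runs)   # uncertain environment parameter
source_levels = rng.normal(170.0, 3.0, n_runs)     # uncertain source level

received = toy_solver(sound_speeds, source_levels)  # aggregate many deterministic runs
print(f"mean = {received.mean():.2f} dB, std = {received.std():.2f} dB, "
      f"95th percentile = {np.percentile(received, 95):.2f} dB")
```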

  4. Spatiotemporal Patterns of Urban Human Mobility

    NASA Astrophysics Data System (ADS)

    Hasan, Samiul; Schneider, Christian M.; Ukkusuri, Satish V.; González, Marta C.

    2013-04-01

    The modeling of human mobility is adopting new directions due to the increasing availability of big data sources from human activity. These sources contain digital information about the daily visited locations of a large number of individuals. Examples of these data include mobile phone calls, credit card transactions, bank note dispersal and check-ins in internet applications, among several others. In this study, we consider the data obtained from smart subway fare card transactions to characterize and model urban mobility patterns. We present a simple mobility model for predicting people's visited locations using the popularity of places in the city as an interaction parameter between different individuals. This ingredient is sufficient to reproduce several characteristics of the observed travel behavior such as the number of trips between different locations in the city, the exploration of new places and the frequency of individual visits to a particular location. Moreover, we indicate the limitations of the proposed model and discuss open questions in the current state-of-the-art statistical models of human mobility.

  5. Bayesian inference for psychology, part IV: parameter estimation and Bayes factors.

    PubMed

    Rouder, Jeffrey N; Haaf, Julia M; Vandekerckhove, Joachim

    2018-02-01

    In the psychological literature, there are two seemingly different approaches to inference: that from estimation of posterior intervals and that from Bayes factors. We provide an overview of each method and show that a salient difference is the choice of models. The two approaches as commonly practiced can be unified with a certain model specification, now popular in the statistics literature, called spike-and-slab priors. A spike-and-slab prior is a mixture of a null model, the spike, with an effect model, the slab. The estimate of the effect size here is a function of the Bayes factor, showing that estimation and model comparison can be unified. The salient difference is that common Bayes factor approaches provide for privileged consideration of theoretically useful parameter values, such as the value corresponding to the null hypothesis, while estimation approaches do not. Both approaches, either privileging the null or not, are useful depending on the goals of the analyst.
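    A minimal numeric illustration of the spike-and-slab idea, assuming a normal mean problem with known standard deviation: the spike is a point null at zero, the slab is a normal prior on the effect, and the Bayes factor is the ratio of the sample mean's marginal likelihoods under the two components. All numerical values are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

sigma, n = 1.0, 30
tau = 0.5                                   # slab scale (prior spread of the effect)
rng = np.random.default_rng(0)
data = rng.normal(loc=0.3, scale=sigma, size=n)
xbar, se = data.mean(), sigma / np.sqrt(n)

# Marginal likelihood of the sample mean under each mixture component.
m_spike = stats.norm.pdf(xbar, loc=0.0, scale=se)                        # point null
m_slab  = stats.norm.pdf(xbar, loc=0.0, scale=np.sqrt(se**2 + tau**2))   # effect model
bf10 = m_slab / m_spike
print(f"xbar = {xbar:.3f}, BF10 (slab vs spike) = {bf10:.2f}")
```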

  6. Promoting Active Learning When Teaching Introductory Statistics and Probability Using a Portfolio Curriculum Approach

    ERIC Educational Resources Information Center

    Adair, Desmond; Jaeger, Martin; Price, Owen M.

    2018-01-01

    The use of a portfolio curriculum approach, when teaching a university introductory statistics and probability course to engineering students, is developed and evaluated. The portfolio curriculum approach, so called, as the students need to keep extensive records both as hard copies and digitally of reading materials, interactions with faculty,…

  7. Retrieving Essential Material at the End of Lectures Improves Performance on Statistics Exams

    ERIC Educational Resources Information Center

    Lyle, Keith B.; Crawford, Nicole A.

    2011-01-01

    At the end of each lecture in a statistics for psychology course, students answered a small set of questions that required them to retrieve information from the same day's lecture. These exercises constituted retrieval practice for lecture material subsequently tested on four exams throughout the course. This technique is called the PUREMEM…

  8. Doing Research That Matters: A Success Story from Statistics Education

    ERIC Educational Resources Information Center

    Hipkins, Rosemary

    2014-01-01

    This is the first report from a new initiative called TLRI Project Plus. It aims to add value to the Teaching and Learning Research Initiative (TLRI), which NZCER manages on behalf of the government, by synthesising findings across multiple projects. This report focuses on two projects in statistics education and explores the factors that…

  9. Ten Ways to Improve the Use of Statistical Mediation Analysis in the Practice of Child and Adolescent Treatment Research

    ERIC Educational Resources Information Center

    Maric, Marija; Wiers, Reinout W.; Prins, Pier J. M.

    2012-01-01

    Despite guidelines and repeated calls from the literature, statistical mediation analysis in youth treatment outcome research is rare. Even more concerning is that many studies that "have" reported mediation analyses do not fulfill basic requirements for mediation analysis, providing inconclusive data and clinical implications. As a result, after…

  10. 78 FR 63966 - Gulf of Mexico Fishery Management Council; Public Meeting

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-10-25

    ... Scientific and Statistical Committee (SSC). DATES: The meeting will be held from 9 a.m. until 5 p.m. on.... Recommendations to the Council 4. Other Business For meeting materials, call (813) 348-1630. Although other non-emergency issues not on the agenda may come before the Scientific and Statistical Committees for discussion...

  11. Non-resonant multipactor--A statistical model

    NASA Astrophysics Data System (ADS)

    Rasch, J.; Johansson, J. F.

    2012-12-01

    High power microwave systems operating in vacuum or near vacuum run the risk of multipactor breakdown. In order to avoid multipactor, it is necessary to make theoretical predictions of critical parameter combinations. These treatments are generally based on the assumption of electrons moving in resonance with the electric field while traversing the gap between critical surfaces. Through comparison with experiments, it has been found that only for small system dimensions will the resonant approach give correct predictions. Apparently, the resonance is destroyed due to the statistical spread in electron emission velocity, and for a more valid description it is necessary to resort to rather complicated statistical treatments of the electron population, and extensive simulations. However, in the limit where resonance is completely destroyed it is possible to use a much simpler treatment, here called non-resonant theory. In this paper, we develop the formalism for this theory, use it to calculate universal curves for the existence of multipactor, and compare with previous results. Two important effects that lead to an increase in the multipactor threshold in comparison with the resonant prediction are identified. These are the statistical spread of impact speed, which leads to a lower average electron impact speed, and the impact of electrons in phase regions where the secondary electrons are immediately reabsorbed, leading to an effective removal of electrons from the discharge.

  12. RevBayes: Bayesian Phylogenetic Inference Using Graphical Models and an Interactive Model-Specification Language

    PubMed Central

    Höhna, Sebastian; Landis, Michael J.

    2016-01-01

    Programs for Bayesian inference of phylogeny currently implement a unique and fixed suite of models. Consequently, users of these software packages are simultaneously forced to use a number of programs for a given study, while also lacking the freedom to explore models that have not been implemented by the developers of those programs. We developed a new open-source software package, RevBayes, to address these problems. RevBayes is entirely based on probabilistic graphical models, a powerful generic framework for specifying and analyzing statistical models. Phylogenetic-graphical models can be specified interactively in RevBayes, piece by piece, using a new succinct and intuitive language called Rev. Rev is similar to the R language and the BUGS model-specification language, and should be easy to learn for most users. The strength of RevBayes is the simplicity with which one can design, specify, and implement new and complex models. Fortunately, this tremendous flexibility does not come at the cost of slower computation; as we demonstrate, RevBayes outperforms competing software for several standard analyses. Compared with other programs, RevBayes has fewer black-box elements. Users need to explicitly specify each part of the model and analysis. Although this explicitness may initially be unfamiliar, we are convinced that this transparency will improve understanding of phylogenetic models in our field. Moreover, it will motivate the search for improvements to existing methods by brazenly exposing the model choices that we make to critical scrutiny. RevBayes is freely available at http://www.RevBayes.com. [Bayesian inference; Graphical models; MCMC; statistical phylogenetics.] PMID:27235697

  13. RevBayes: Bayesian Phylogenetic Inference Using Graphical Models and an Interactive Model-Specification Language.

    PubMed

    Höhna, Sebastian; Landis, Michael J; Heath, Tracy A; Boussau, Bastien; Lartillot, Nicolas; Moore, Brian R; Huelsenbeck, John P; Ronquist, Fredrik

    2016-07-01

    Programs for Bayesian inference of phylogeny currently implement a unique and fixed suite of models. Consequently, users of these software packages are simultaneously forced to use a number of programs for a given study, while also lacking the freedom to explore models that have not been implemented by the developers of those programs. We developed a new open-source software package, RevBayes, to address these problems. RevBayes is entirely based on probabilistic graphical models, a powerful generic framework for specifying and analyzing statistical models. Phylogenetic-graphical models can be specified interactively in RevBayes, piece by piece, using a new succinct and intuitive language called Rev. Rev is similar to the R language and the BUGS model-specification language, and should be easy to learn for most users. The strength of RevBayes is the simplicity with which one can design, specify, and implement new and complex models. Fortunately, this tremendous flexibility does not come at the cost of slower computation; as we demonstrate, RevBayes outperforms competing software for several standard analyses. Compared with other programs, RevBayes has fewer black-box elements. Users need to explicitly specify each part of the model and analysis. Although this explicitness may initially be unfamiliar, we are convinced that this transparency will improve understanding of phylogenetic models in our field. Moreover, it will motivate the search for improvements to existing methods by brazenly exposing the model choices that we make to critical scrutiny. RevBayes is freely available at http://www.RevBayes.com [Bayesian inference; Graphical models; MCMC; statistical phylogenetics.]. © The Author(s) 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.

  14. Three-Dimensional Algebraic Models of the tRNA Code and 12 Graphs for Representing the Amino Acids.

    PubMed

    José, Marco V; Morgado, Eberto R; Guimarães, Romeu Cardoso; Zamudio, Gabriel S; de Farías, Sávio Torres; Bobadilla, Juan R; Sosa, Daniela

    2014-08-11

    Three-dimensional algebraic models, also called Genetic Hotels, are developed to represent the Standard Genetic Code, the Standard tRNA Code (S-tRNA-C), and the Human tRNA code (H-tRNA-C). New algebraic concepts are introduced to be able to describe these models, to wit, the generalization of the 2n-Klein Group and the concept of a subgroup coset with a tail. We found that the H-tRNA-C displayed broken symmetries in regard to the S-tRNA-C, which is highly symmetric. We also show that there are only 12 ways to represent each of the corresponding phenotypic graphs of amino acids. The averages of statistical centrality measures of the 12 graphs for each of the three codes are carried out and they are statistically compared. The phenotypic graphs of the S-tRNA-C display a common triangular prism of amino acids in 10 out of the 12 graphs, whilst the corresponding graphs for the H-tRNA-C display only two triangular prisms. The graphs exhibit disjoint clusters of amino acids when their polar requirement values are used. We contend that the S-tRNA-C is in a frozen-like state, whereas the H-tRNA-C may be in an evolving state.

  15. High-order statistics of weber local descriptors for image representation.

    PubMed

    Han, Xian-Hua; Chen, Yen-Wei; Xu, Gang

    2015-06-01

    Highly discriminant visual features play a key role in different image classification applications. This study aims to realize a method for extracting highly discriminant features from images by exploring a robust local descriptor inspired by Weber's law. The investigated local descriptor is based on the fact that human perception for distinguishing a pattern depends not only on the absolute intensity of the stimulus but also on the relative variance of the stimulus. Therefore, we first transform the original stimulus (the images in our study) into a differential excitation domain according to Weber's law, and then explore a local patch, called a micro-Texton, in the transformed domain as the Weber local descriptor (WLD). Furthermore, we propose to employ a parametric probability process to model the Weber local descriptors, and extract the higher-order statistics of the model parameters for image representation. The proposed strategy can adaptively characterize the WLD space using a generative probability model, and then learn the parameters for better fitting the training space, which leads to a more discriminant representation of images. In order to validate the efficiency of the proposed strategy, we consider three different image classification applications, including texture, food image and HEp-2 cell pattern recognition, and the results validate that our proposed strategy has advantages over the state-of-the-art approaches.
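    A minimal sketch of the differential-excitation transform commonly used in Weber local descriptors (arctan of the summed relative differences between a pixel and its 3 × 3 neighbours); the micro-Texton extraction and the parametric probability modelling described above are not reproduced here, and the exact formulation in the paper may differ.

```python
import numpy as np

def differential_excitation(image, eps=1e-6):
    """Weber-law style transform: arctan of summed relative neighbour differences."""
    img = image.astype(np.float64)
    padded = np.pad(img, 1, mode="edge")
    acc = np.zeros_like(img)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            neighbour = padded[1 + di:1 + di + img.shape[0],
                               1 + dj:1 + dj + img.shape[1]]
            acc += (neighbour - img) / (img + eps)   # relative variance of the stimulus
    return np.arctan(acc)

rng = np.random.default_rng(0)
xi = differential_excitation(rng.integers(0, 256, size=(64, 64)))
print(xi.shape, float(xi.min()), float(xi.max()))
```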

  16. Isotropic Inelastic Collisions in a Multiterm Atom with Hyperfine Structure

    NASA Astrophysics Data System (ADS)

    Belluzzi, Luca; Landi Degl'Innocenti, Egidio; Trujillo Bueno, Javier

    2015-10-01

    A correct modeling of the scattering polarization profiles observed in some spectral lines of diagnostic interest, the sodium doublet being one of the most important examples, requires taking hyperfine structure (HFS) and quantum interference between different J-levels into account. An atomic model suitable for taking these physical ingredients into account is the so-called multiterm atom with HFS. In this work, we introduce and study the transfer and relaxation rates due to isotropic inelastic collisions with electrons, which enter the statistical equilibrium equations (SEE) for the atomic density matrix of this atomic model. Under the hypothesis that the electron-atom interaction is described by a dipolar operator, we provide useful relations between the rates describing the transfer and relaxation of quantum interference between different levels (whose numerical values are in most cases unknown) and the usual rates for the atomic level populations, for which experimental data and/or approximate theoretical expressions are generally available. For the particular case of a two-term atom with HFS, we present an analytical solution of the SEE for the spherical statistical tensors of the upper term, including both radiative and collisional processes, and we derive the expression of the emission coefficient in the four Stokes parameters. Finally, an illustrative application to the Na i D1 and D2 lines is presented.

  17. The ratio of profile peak separations as a probe of pulsar radio-beam structure

    NASA Astrophysics Data System (ADS)

    Dyks, J.; Pierbattista, M.

    2015-12-01

    The known population of pulsars contains objects with four- and five-component profiles, for which the peak-to-peak separations between the inner and outer components can be measured. These Q- and M-type profiles can be interpreted as a result of sightline cut through a nested-cone beam, or through a set of azimuthal fan beams. We show that the ratio RW of the components' separations provides a useful measure of the beam shape, which is mostly independent of parameters that determine the beam scale and complicate interpretation of simpler profiles. In particular, the method does not depend on the emission altitude and the dipole tilt distribution. The different structures of the radio beam imply manifestly different statistical distributions of RW, with the conal model being several orders of magnitude less consistent with data than the fan-beam model. To bring the conal model into consistency with the data, strong observational selection effects must be invoked, with 80 per cent of Q and M profiles assumed to be undetected because of intrinsic blending effects. It is concluded that the statistical properties of Q and M profiles are more consistent with the fan-shaped beams, than with the traditional nested-cone geometry.

  18. Perceptual quality prediction on authentically distorted images using a bag of features approach

    PubMed Central

    Ghadiyaram, Deepti; Bovik, Alan C.

    2017-01-01

    Current top-performing blind perceptual image quality prediction models are generally trained on legacy databases of human quality opinion scores on synthetically distorted images. Therefore, they learn image features that effectively predict human visual quality judgments of inauthentic and usually isolated (single) distortions. However, real-world images usually contain complex composite mixtures of multiple distortions. We study the perceptually relevant natural scene statistics of such authentically distorted images in different color spaces and transform domains. We propose a “bag of feature maps” approach that avoids assumptions about the type of distortion(s) contained in an image and instead focuses on capturing consistencies—or departures therefrom—of the statistics of real-world images. Using a large database of authentically distorted images, human opinions of them, and bags of features computed on them, we train a regressor to conduct image quality prediction. We demonstrate the competence of the features toward improving automatic perceptual quality prediction by testing a learned algorithm using them on a benchmark legacy database as well as on a newly introduced distortion-realistic resource called the LIVE In the Wild Image Quality Challenge Database. We extensively evaluate the perceptual quality prediction model and algorithm and show that it is able to achieve good-quality prediction power that is better than other leading models. PMID:28129417

  19. Statistical learning theory for high dimensional prediction: Application to criterion-keyed scale development.

    PubMed

    Chapman, Benjamin P; Weiss, Alexander; Duberstein, Paul R

    2016-12-01

    Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in "big data" problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how 3 common SLT algorithms-supervised principal components, regularization, and boosting-can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach-or perhaps because of them-SLT methods may hold value as a statistically rigorous approach to exploratory regression. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
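    A minimal sketch of the regularization route described above, assuming simulated item responses and a continuous outcome: scikit-learn's cross-validated lasso picks the penalty that minimizes estimated prediction error, and the items with nonzero weights form the criterion-keyed scale. This is an illustration of the general approach, not the authors' mortality analysis.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n_people, n_items = 500, 200
items = rng.normal(size=(n_people, n_items))         # simulated item pool
true_weights = np.zeros(n_items)
true_weights[:10] = 0.5                               # only 10 items truly predictive
outcome = items @ true_weights + rng.normal(size=n_people)

# Cross-validation chooses the penalty that minimizes expected prediction error.
model = LassoCV(cv=5).fit(items, outcome)
selected = np.flatnonzero(model.coef_)
print(f"penalty alpha = {model.alpha_:.3f}, items retained in the scale: {selected.size}")
```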

  20. A real-time computer model to assess resident work-hours scenarios.

    PubMed

    McDonald, Furman S; Ramakrishna, Gautam; Schultz, Henry J

    2002-07-01

    To accurately model residents' work hours and assess options to forthrightly meet Residency Review Committee-Internal Medicine (RRC-IM) requirements. The requirements limiting residents' work hours are clearly defined by the Accreditation Council for Graduate Medical Education (ACGME) and the RRC-IM: "When averaged over any four-week rotation or assignment, residents must not spend more than 80 hours per week in patient care duties."(1) The call for the profession to realistically address work-hours violations is of paramount importance.(2) Unfortunately, work hours are hard to calculate. We developed an electronic model of residents' work-hours scenarios using Microsoft Excel 97. This model allows the input of multiple parameters (i.e., call frequency, call position, days off, short-call, weeks per rotation, outpatient weeks, clinic day of the week, additional time due to clinic) and start and stop times for post-call, non-call, short-call, and weekend days. For each resident on a rotation, the model graphically demonstrates call schedules, plots clinic days, and portrays all possible and preferred days off. We tested the model for accuracy in several scenarios. For example, the model predicted average work hours of 85.1 hours per week for fourth-night-call rotations. This was compared with logs of actual work hours of 84.6 hours per week. Model accuracy for this scenario was 99.4% (95% CI 96.2%-100%). The model prospectively predicted work hours of 89.9 hours/week in the cardiac intensive care unit (CCU). Subsequent surveys found mean CCU work hours of 88.1 hours per week. Model accuracy for this scenario was 98% (95% CI 93.2-100%). Thus validated, we then used the model to test proposed scenarios for complying with RRC-IM limits. The flexibility of the model allowed demonstration of the full range of work-hours scenarios in every rotation of our 36-month program. Demonstrations of status-quo work-hours scenarios were presented to faculty as well as real-time demonstrations of the feasibility, or unfeasibility, of their proposed solutions. The model clearly demonstrated that non-call (i.e., short-call) admissions without concomitant decreases in overnight call frequency resulted in substantial increases in total work hours. Attempts to "get the resident out" an hour or two earlier each day had negligible effects on total hours and were unrealistic paper solutions. For fourth-night-call rotations, the addition of a "golden weekend" (i.e., a fifth day off per month) was found to significantly reduce work hours. The electronic model allowed the development of creative schedules for previously third-night-call rotations that limit resident work hours without decreasing continuity of care by scheduling overnight call every sixth night alternating with sixth-night-short-call rotations. Our electronic model is sufficiently robust to accurately estimate work hours on multiple and varied rotations. This model clearly demonstrates that it is very difficult to meet the RRC-IM work-hours limitations under standard fourth-night-call schedules with only four days off per month. We are successfully using our model to test proposed alternative scenarios, to overcome faculty misconceptions about resident work-hours "solutions," and to make changes to our call schedules that both are realistic for residents to accomplish and truly diminish total resident work hours toward the requirements of the RRC-IM.
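    A minimal sketch of the averaging such a model automates, assuming a hypothetical every-fourth-night call cycle; the in-hospital day lengths below are invented for illustration and are not the article's parameters.

```python
# Hours in hospital on each day of an assumed 4-day call cycle (illustrative values).
call_day_hours      = 18.0   # e.g. 06:00 to midnight on the call day
post_call_day_hours = 13.0   # midnight through 13:00 on the post-call day
regular_day_hours   = 11.0   # ordinary non-call weekday
days_off_per_month  = 4      # roughly one regular day off per week

cycle_hours  = call_day_hours + post_call_day_hours + 2 * regular_day_hours
weekly_hours = cycle_hours * 7 / 4 - days_off_per_month * regular_day_hours / 4
print(f"{weekly_hours:.1f} hours/week averaged over a four-week rotation")
```

    Even with these modest assumed day lengths the four-week average lands above the 80-hour limit, which is the pattern the article reports for standard fourth-night-call schedules with only four days off per month.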

  1. Pesticide poisoning in Palestine: a retrospective analysis of calls received by Poison Control and Drug Information Center from 2006-2010.

    PubMed

    Sawalha, Ansam F; O'Malley, Gerald F; Sweileh, Waleed M

    2012-01-01

    The agricultural industry is the largest economic sector in Palestine and is characterized by extensive and unregulated use of pesticides. The objective of this study was to analyze phone calls received by the Poison Control and Drug Information Center (PCDIC) in Palestine regarding pesticide poisoning. All phone calls regarding pesticide poisoning received by the PCDIC from 2006 to 2010 were descriptively analyzed. The Statistical Package for the Social Sciences (SPSS version 16) was used for statistical analysis and to create figures. A total of 290 calls regarding pesticide poisoning were received during the study period. Most calls (83.8%) were made by physicians. The average age of reported cases was 19.6 ± 15 years. Pesticide poisoning occurred mostly in males (56.9%). Pesticide poisoning was most common (75, 25.9%) in the age category of 20-29.9 years. The majority (51.7%) of the cases were deliberate self-harm, while the remainder were accidental exposures. The majority of phone calls (250, 86.2%) described oral exposure to pesticides. Approximately one third (32.9%) of the cases had symptoms consistent with organophosphate poisoning. Gastric lavage (31.7%) was the major decontamination method used, while charcoal was utilized in only 1.4% of the cases. Follow-up was performed in 45.5% of the cases; two patients died after hospital admission, while the remainder had positive outcomes. Pesticide poisoning is a major health problem in Palestine, and the PCDIC has a clear mission to help in recommending therapy and gathering information.

  2. Graph wavelet alignment kernels for drug virtual screening.

    PubMed

    Smalter, Aaron; Huan, Jun; Lushington, Gerald

    2009-06-01

    In this paper, we introduce a novel statistical modeling technique for target property prediction, with applications to virtual screening and drug design. In our method, we use graphs to model chemical structures and apply a wavelet analysis of graphs to summarize features capturing graph local topology. We design a novel graph kernel function to utilize the topology features to build predictive models for chemicals via a Support Vector Machine classifier. We call the new graph kernel a graph wavelet-alignment kernel. We have evaluated the efficacy of the wavelet-alignment kernel using a set of chemical structure-activity prediction benchmarks. Our results indicate that the use of the kernel function yields performance profiles comparable to, and sometimes exceeding, those of existing state-of-the-art chemical classification approaches. In addition, our results also show that the use of wavelet functions significantly decreases the computational costs for graph kernel computation, with a more than tenfold speedup.
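    A minimal sketch of the classification stage only: once a graph kernel produces a Gram matrix (here a toy RBF kernel on per-graph summary features, standing in for the wavelet-alignment kernel, which is not reproduced), it can be passed to a Support Vector Machine through scikit-learn's precomputed-kernel interface. The data are simulated for illustration.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_graphs = 60
features = rng.normal(size=(n_graphs, 8))                     # stand-in graph summaries
labels = (features[:, 0] + features[:, 1] > 0).astype(int)    # toy activity labels

def gram(A, B, gamma=0.5):
    """Toy RBF Gram matrix between two sets of per-graph feature vectors."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

train, test = np.arange(0, 40), np.arange(40, 60)
clf = SVC(kernel="precomputed").fit(gram(features[train], features[train]), labels[train])
pred = clf.predict(gram(features[test], features[train]))
print("test accuracy:", (pred == labels[test]).mean())
```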

  3. COINSTAC: Decentralizing the future of brain imaging analysis

    PubMed Central

    Ming, Jing; Verner, Eric; Sarwate, Anand; Kelly, Ross; Reed, Cory; Kahleck, Torran; Silva, Rogers; Panta, Sandeep; Turner, Jessica; Plis, Sergey; Calhoun, Vince

    2017-01-01

    In the era of Big Data, sharing neuroimaging data across multiple sites has become increasingly important. However, researchers who want to engage in centralized, large-scale data sharing and analysis must often contend with problems such as high database cost, long data transfer time, extensive manual effort, and privacy issues for sensitive data. To remove these barriers to enable easier data sharing and analysis, we introduced a new, decentralized, privacy-enabled infrastructure model for brain imaging data called COINSTAC in 2016. We have continued development of COINSTAC since this model was first introduced. One of the challenges with such a model is adapting the required algorithms to function within a decentralized framework. In this paper, we report on how we are solving this problem, along with our progress on several fronts, including the implementation of additional decentralized algorithms, user interface enhancements, decentralized regression statistic calculation, and complete pipeline specifications. PMID:29123643
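    An illustrative sketch, not COINSTAC's actual implementation, of how a regression statistic can be computed in a decentralized way: each site shares only its sufficient statistics XᵀX and Xᵀy rather than raw rows, and the aggregated estimate equals the one a pooled analysis would give.

```python
import numpy as np

rng = np.random.default_rng(0)
true_beta = np.array([1.0, -2.0, 0.5])

def site_statistics(n):
    """One site's contribution: only X^T X and X^T y ever leave the site."""
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = X @ true_beta + rng.normal(scale=0.1, size=n)
    return X.T @ X, X.T @ y

sites = [site_statistics(n) for n in (120, 200, 80)]   # three hypothetical sites
XtX = sum(s[0] for s in sites)
Xty = sum(s[1] for s in sites)
beta_hat = np.linalg.solve(XtX, Xty)   # identical to pooling all rows centrally
print("decentralized estimate:", np.round(beta_hat, 3))
```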

  4. Wind models for the NSTS ascent trajectory biasing for wind load alleviation

    NASA Technical Reports Server (NTRS)

    Smith, O. E.; Adelfang, S. I.; Batts, G. W.; Hill, C. K.

    1989-01-01

    New concepts are presented for aerospace vehicle ascent wind profile biasing. The purpose for wind biasing the ascent trajectory is to provide ascent wind loads relief and thus decrease the probability for launch delays due to wind loads exceeding critical limits. Wind biasing trajectories to the profile of monthly mean winds have been widely used for this purpose. The wind profile models presented give additional alternatives for wind biased trajectories. They are derived from the properties of the bivariate normal probability function using the available wind statistical parameters for the launch site. The analytical expressions are presented to permit generalizations. Specific examples are given to illustrate the procedures. The wind profile models can be used to establish the ascent trajectory steering commands to guide the vehicle through the first stage. For the National Space Transportation System (NSTS) program these steering commands are called I-loads.

  5. Simulating stick-slip failure in a sheared granular layer using a physics-based constitutive model

    DOE PAGES

    Lieou, Charles K. C.; Daub, Eric G.; Guyer, Robert A.; ...

    2017-01-14

    In this paper, we model laboratory earthquakes in a biaxial shear apparatus using the Shear-Transformation-Zone (STZ) theory of dense granular flow. The theory is based on the observation that slip events in a granular layer are attributed to grain rearrangement at soft spots called STZs, which can be characterized according to principles of statistical physics. We model lab data on granular shear using STZ theory and document direct connections between the STZ approach and rate-and-state friction. We discuss the stability transition from stable shear to stick-slip failure and show that stick slip is predicted by STZ when the applied shear load exceeds a threshold value that is modulated by elastic stiffness and frictional rheology. Finally, we also show that STZ theory mimics fault zone dilation during the stick phase, consistent with lab observations.

  6. Energy landscape analysis of neuroimaging data

    NASA Astrophysics Data System (ADS)

    Ezaki, Takahiro; Watanabe, Takamitsu; Ohzeki, Masayuki; Masuda, Naoki

    2017-05-01

    Computational neuroscience models have been used for understanding neural dynamics in the brain and how they may be altered when physiological or other conditions change. We review and develop a data-driven approach to neuroimaging data called the energy landscape analysis. The methods are rooted in statistical physics theory, in particular the Ising model, also known as the (pairwise) maximum entropy model and Boltzmann machine. The methods have been applied to fitting electrophysiological data in neuroscience for a decade, but their use in neuroimaging data is still in its infancy. We first review the methods and discuss some algorithms and technical aspects. Then, we apply the methods to functional magnetic resonance imaging data recorded from healthy individuals to inspect the relationship between the accuracy of fitting, the size of the brain system to be analysed and the data length. This article is part of the themed issue `Mathematical methods in medicine: neuroscience, cardiology and pathology'.
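    A minimal sketch of the pairwise maximum entropy (Ising) model that underlies the energy landscape analysis, assuming a small number of regions so that all 2^N binarized activity patterns can be enumerated exactly; the fields h and couplings J are arbitrary assumed values, and fitting them to neuroimaging data is not shown.

```python
import itertools
import numpy as np

N = 4
rng = np.random.default_rng(0)
h = rng.normal(scale=0.3, size=N)                          # regional "fields" (assumed)
J = np.triu(rng.normal(scale=0.3, size=(N, N)), k=1)       # pairwise couplings, i < j only

def energy(s):
    """Ising energy E(s) = -sum_i h_i s_i - sum_{i<j} J_ij s_i s_j."""
    return -h @ s - s @ J @ s

states = np.array(list(itertools.product([-1, 1], repeat=N)))   # all 2^N activity patterns
E = np.array([energy(s) for s in states])
p = np.exp(-E)
p /= p.sum()                                # Boltzmann / maximum entropy distribution

print("most probable pattern:", states[np.argmax(p)], f"(p = {p.max():.3f})")
print(f"energy range across patterns: [{E.min():.2f}, {E.max():.2f}]")
```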

  7. Cross-correlation detection and analysis for California's electricity market based on analogous multifractal analysis

    NASA Astrophysics Data System (ADS)

    Wang, Fang; Liao, Gui-ping; Li, Jian-hui; Zou, Rui-biao; Shi, Wen

    2013-03-01

    A novel method, which we called the analogous multifractal cross-correlation analysis, is proposed in this paper to study the multifractal behavior in the power-law cross-correlation between price and load in California electricity market. In addition, a statistic ρAMF-XA, which we call the analogous multifractal cross-correlation coefficient, is defined to test whether the cross-correlation between two given signals is genuine or not. Our analysis finds that both the price and load time series in California electricity market express multifractal nature. While, as indicated by the ρAMF-XA statistical test, there is a huge difference in the cross-correlation behavior between the years 1999 and 2000 in California electricity markets.

  8. Cross-correlation detection and analysis for California's electricity market based on analogous multifractal analysis.

    PubMed

    Wang, Fang; Liao, Gui-ping; Li, Jian-hui; Zou, Rui-biao; Shi, Wen

    2013-03-01

    A novel method, which we called the analogous multifractal cross-correlation analysis, is proposed in this paper to study the multifractal behavior in the power-law cross-correlation between price and load in California electricity market. In addition, a statistic ρAMF-XA, which we call the analogous multifractal cross-correlation coefficient, is defined to test whether the cross-correlation between two given signals is genuine or not. Our analysis finds that both the price and load time series in California electricity market express multifractal nature. While, as indicated by the ρAMF-XA statistical test, there is a huge difference in the cross-correlation behavior between the years 1999 and 2000 in California electricity markets.

  9. [The epidemiological relationship of periodontitis, intestinal dysbiosis, atherogenic dyslipidemia and metabolic syndrome].

    PubMed

    Petrukhina, N B; Zorina, O A; Rabinovich, I M; Shilov, A M

    2015-01-01

    To study risk factors for the cardiovascular continuum (CVC) and the influence of digestive tract endobiosis on lipid-carbohydrate metabolism and clinical status, a retrospective analysis was performed of 1000 medical records of patients aged 20 to 55 years suffering from various diseases of the internal organs (gastrointestinal tract disease, coronary heart disease, type 2 diabetes, obesity) in combination with periodontitis of varying severity. A statistically significant, directly proportional relationship was found between the severity of periodontal tissue inflammation and body mass index (BMI), especially pronounced in patients with a BMI ≥25 kg/m2, which is the "calling card" of the metabolic syndrome, a clinical model of polymorbidity.

  10. The Clinical Ethnographic Interview: A user-friendly guide to the cultural formulation of distress and help seeking

    PubMed Central

    Arnault, Denise Saint; Shimabukuro, Shizuka

    2013-01-01

    Transcultural nursing, psychiatry, and medical anthropology have theorized that practitioners and researchers need more flexible instruments to gather culturally relevant illness experience, meaning, and help seeking. The state of the science is sufficiently developed to allow standardized yet ethnographically sound protocols for assessment. However, vigorous calls for culturally adapted assessment models have yielded little real change in routine practice. This paper describes the conversion of the Diagnostic and Statistical Manual IV, Appendix I Outline for Cultural Formulation into a user-friendly Clinical Ethnographic Interview (CEI), and provides clinical examples of its use in a sample of highly distressed Japanese women. PMID:22194348

  11. New Stopping Criteria for Segmenting DNA Sequences

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Wentian

    2001-06-18

    We propose a solution to the stopping-criterion problem in segmenting inhomogeneous DNA sequences with complex statistical patterns. This new stopping criterion is based on the Bayesian information criterion in the model selection framework. When this criterion is applied to the telomere of S. cerevisiae and the complete sequence of E. coli, borders of biologically meaningful units were identified, and a more reasonable number of domains was obtained. We also introduce a measure called segmentation strength which can be used to control the delineation of large domains. The relationship between the average domain size and the threshold of segmentation strength is determined for several genome sequences.
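    A minimal sketch of a BIC-based stopping decision for a single candidate split, simplified to a two-symbol sequence with Bernoulli segment models; the recursive segmentation procedure and the segmentation-strength measure introduced above are not reproduced, and the parameter counting is only one reasonable convention.

```python
import numpy as np

def log_lik(seq):
    """Bernoulli log-likelihood of a 0/1 segment under its maximum-likelihood rate."""
    n, k = len(seq), seq.sum()
    p = np.clip(k / n, 1e-12, 1 - 1e-12)
    return k * np.log(p) + (n - k) * np.log(1 - p)

def bic(log_l, n_params, n_obs):
    return -2.0 * log_l + n_params * np.log(n_obs)

rng = np.random.default_rng(0)
seq = np.concatenate([rng.random(400) < 0.3, rng.random(400) < 0.7]).astype(int)

n = len(seq)
bic_one = bic(log_lik(seq), 1, n)
split = max(range(50, n - 50), key=lambda i: log_lik(seq[:i]) + log_lik(seq[i:]))
bic_two = bic(log_lik(seq[:split]) + log_lik(seq[split:]), 3, n)  # 2 rates + 1 boundary
print(f"best split at {split}; accept the split: {bic_two < bic_one}")
```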

  12. Language acquisition and use: learning and applying probabilistic constraints.

    PubMed

    Seidenberg, M S

    1997-03-14

    What kinds of knowledge underlie the use of language and how is this knowledge acquired? Linguists equate knowing a language with knowing a grammar. Classic "poverty of the stimulus" arguments suggest that grammar identification is an intractable inductive problem and that acquisition is possible only because children possess innate knowledge of grammatical structure. An alternative view is emerging from studies of statistical and probabilistic aspects of language, connectionist models, and the learning capacities of infants. This approach emphasizes continuity between how language is acquired and how it is used. It retains the idea that innate capacities constrain language learning, but calls into question whether they include knowledge of grammatical structure.

  13. Interval-based reconstruction for uncertainty quantification in PET

    NASA Astrophysics Data System (ADS)

    Kucharczak, Florentin; Loquin, Kevin; Buvat, Irène; Strauss, Olivier; Mariano-Goulart, Denis

    2018-02-01

    A new directed interval-based tomographic reconstruction algorithm, called non-additive interval-based expectation maximization (NIBEM), is presented. It uses non-additive modeling of the forward operator that provides intervals instead of single-valued projections. The detailed approach is an interval-based extension of the maximum-likelihood expectation-maximization (MLEM) algorithm. The main motivation for this extension is that the resulting intervals have appealing properties for estimating the statistical uncertainty associated with the reconstructed activity values. After reviewing previously published theoretical concepts related to interval-based projectors, this paper describes the NIBEM algorithm and gives examples that highlight the properties and advantages of this interval-valued reconstruction.
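    For reference, the classical MLEM update that NIBEM extends can be written in generic notation (system matrix elements a_ij, measured counts y_i, activity estimate λ_j); this is the textbook form, not notation taken from the paper.

```latex
\lambda_j^{(k+1)} \;=\; \frac{\lambda_j^{(k)}}{\sum_i a_{ij}}
\sum_i a_{ij}\, \frac{y_i}{\sum_{j'} a_{ij'}\, \lambda_{j'}^{(k)}}
```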

  14. Last call: Passive acoustic monitoring shows continued rapid decline of critically endangered vaquita.

    PubMed

    Thomas, Len; Jaramillo-Legorreta, Armando; Cardenas-Hinojosa, Gustavo; Nieto-Garcia, Edwyna; Rojas-Bracho, Lorenzo; Ver Hoef, Jay M; Moore, Jeffrey; Taylor, Barbara; Barlow, Jay; Tregenza, Nicholas

    2017-11-01

    The vaquita is a critically endangered species of porpoise. It produces echolocation clicks, making it a good candidate for passive acoustic monitoring. A systematic grid of sensors has been deployed for 3 months annually since 2011; results from 2016 are reported here. Statistical models (to compensate for non-uniform data loss) show an overall decline in the acoustic detection rate between 2015 and 2016 of 49% (95% credible interval 82% decline to 8% increase), and total decline between 2011 and 2016 of over 90%. Assuming the acoustic detection rate is proportional to population size, approximately 30 vaquita (95% credible interval 8-96) remained in November 2016.

  15. A computer program (MODFLOWP) for estimating parameters of a transient, three-dimensional ground-water flow model using nonlinear regression

    USGS Publications Warehouse

    Hill, Mary Catherine

    1992-01-01

    This report documents a new version of the U.S. Geological Survey modular, three-dimensional, finite-difference, ground-water flow model (MODFLOW) which, with the new Parameter-Estimation Package that also is documented in this report, can be used to estimate parameters by nonlinear regression. The new version of MODFLOW is called MODFLOWP (pronounced MOD-FLOW*P), and functions nearly identically to MODFLOW when the Parameter-Estimation Package is not used. Parameters are estimated by minimizing a weighted least-squares objective function by the modified Gauss-Newton method or by a conjugate-direction method. Parameters used to calculate the following MODFLOW model inputs can be estimated: Transmissivity and storage coefficient of confined layers; hydraulic conductivity and specific yield of unconfined layers; vertical leakance; vertical anisotropy (used to calculate vertical leakance); horizontal anisotropy; hydraulic conductance of the River, Streamflow-Routing, General-Head Boundary, and Drain Packages; areal recharge rates; maximum evapotranspiration; pumpage rates; and the hydraulic head at constant-head boundaries. Any spatial variation in parameters can be defined by the user. Data used to estimate parameters can include existing independent estimates of parameter values, observed hydraulic heads or temporal changes in hydraulic heads, and observed gains and losses along head-dependent boundaries (such as streams). Model output includes statistics for analyzing the parameter estimates and the model; these statistics can be used to quantify the reliability of the resulting model, to suggest changes in model construction, and to compare results of models constructed in different ways.
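    The modified Gauss-Newton iteration used for this kind of weighted least-squares estimation can be sketched in generic notation (sensitivity matrix X of simulated values with respect to parameters b, weight matrix ω, observations y and simulated equivalents y'(b), Marquardt parameter m and damping ρ); this is the standard textbook form, and the exact scaling used in MODFLOWP is not reproduced here.

```latex
\left( X^{T}\,\omega\, X + m I \right) d \;=\; X^{T}\,\omega\,\bigl( y - y'(b) \bigr),
\qquad b_{\mathrm{new}} \;=\; b + \rho\, d
```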

  16. The statistical mechanics of complex signaling networks: nerve growth factor signaling

    NASA Astrophysics Data System (ADS)

    Brown, K. S.; Hill, C. C.; Calero, G. A.; Myers, C. R.; Lee, K. H.; Sethna, J. P.; Cerione, R. A.

    2004-10-01

    The inherent complexity of cellular signaling networks and their importance to a wide range of cellular functions necessitates the development of modeling methods that can be applied toward making predictions and highlighting the appropriate experiments to test our understanding of how these systems are designed and function. We use methods of statistical mechanics to extract useful predictions for complex cellular signaling networks. A key difficulty with signaling models is that, while significant effort is being made to experimentally measure the rate constants for individual steps in these networks, many of the parameters required to describe their behavior remain unknown or at best represent estimates. To establish the usefulness of our approach, we have applied our methods toward modeling the nerve growth factor (NGF)-induced differentiation of neuronal cells. In particular, we study the actions of NGF and mitogenic epidermal growth factor (EGF) in rat pheochromocytoma (PC12) cells. Through a network of intermediate signaling proteins, each of these growth factors stimulates extracellular regulated kinase (Erk) phosphorylation with distinct dynamical profiles. Using our modeling approach, we are able to predict the influence of specific signaling modules in determining the integrated cellular response to the two growth factors. Our methods also raise some interesting insights into the design and possible evolution of cellular systems, highlighting an inherent property of these systems that we call 'sloppiness.'

  17. Personalised news filtering and recommendation system using Chi-square statistics-based K-nearest neighbour (χ2SB-KNN) model

    NASA Astrophysics Data System (ADS)

    Adeniyi, D. A.; Wei, Z.; Yang, Y.

    2017-10-01

    The recommendation problem has been extensively studied by researchers in the fields of data mining, databases and information retrieval. This study presents the design and realisation of an automated, personalised news recommendation system based on a Chi-square statistics-based K-nearest neighbour (χ2SB-KNN) model. The proposed χ2SB-KNN model has the potential to overcome computational complexity and information overloading problems, and reduces runtime and speeds up the execution process through the use of the critical value of the χ2 distribution. The proposed recommendation engine can alleviate scalability challenges through combined online pattern discovery and pattern matching for real-time recommendations. This work also showcases the development of a novel method of feature selection referred to as the Data Discretisation-Based feature selection method. This is used for selecting the best features for the proposed χ2SB-KNN algorithm at the preprocessing stage of the classification procedures. The implementation of the proposed χ2SB-KNN model is achieved through an in-house Java program on an experimental website called the OUC newsreaders' website. Finally, we compared the performance of our system with two baseline methods: traditional Euclidean distance K-nearest neighbour and Naive Bayesian techniques. The result shows a significant improvement of our method over the baseline methods studied.
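    An illustrative scikit-learn sketch of the two ingredients named above, chi-square feature scoring followed by a K-nearest-neighbour classifier, on simulated term-count data; this is not the χ2SB-KNN algorithm itself, whose pattern-discovery and discretisation steps are not reproduced.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_articles, n_terms = 300, 1000
X = rng.poisson(0.3, size=(n_articles, n_terms)).astype(float)   # toy term counts
y = rng.integers(0, 3, size=n_articles)                          # three toy news categories
X[y == 1, :20] += rng.poisson(2.0, size=(np.sum(y == 1), 20))    # category-specific terms

# Keep the 50 terms with the highest chi-square scores, then classify with KNN.
model = make_pipeline(SelectKBest(chi2, k=50), KNeighborsClassifier(n_neighbors=5))
model.fit(X[:250], y[:250])
print("held-out accuracy:", model.score(X[250:], y[250:]))
```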

  18. FADO: a statistical method to detect favored or avoided distances between occurrences of motifs using the Hawkes' model.

    PubMed

    Gusto, Gaelle; Schbath, Sophie

    2005-01-01

    We propose an original statistical method to estimate how the occurrences of a given process along a genome, genes or motifs for instance, may be influenced by the occurrences of a second process. More precisely, the aim is to detect avoided and/or favored distances between two motifs, for instance, suggesting possible interactions at a molecular level. For this, we consider occurrences along the genome as point processes and we use the so-called Hawkes' model. In such a model, the intensity at position t depends linearly on the distances to past occurrences of both processes via two unknown profile functions to estimate. We perform a nonparametric estimation of both profiles by using B-spline decompositions and a constrained maximum likelihood method. Finally, we use the AIC criterion for the model selection. Simulations show the excellent behavior of our estimation procedure. We then apply it to study (i) the dependence between gene occurrences along the E. coli genome and the occurrences of a motif known to be part of the major promoter for this bacterium, and (ii) the dependence between the yeast S. cerevisiae genes and the occurrences of putative polyadenylation signals. The results are coherent with known biological properties or previous predictions, meaning this method can be of great interest for functional motif detection, or to improve knowledge of some biological mechanisms.
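    The linear Hawkes-type intensity described above can be written, in notation chosen here rather than taken from the paper, as a baseline plus contributions from past occurrences of both processes (at positions T_i and S_j) through two profile functions, each decomposed on a B-spline basis:

```latex
\lambda(t) \;=\; \mu \;+\; \sum_{T_i < t} g_1\!\left(t - T_i\right) \;+\; \sum_{S_j < t} g_2\!\left(t - S_j\right),
\qquad g_k(u) \;=\; \sum_{\ell} a_{k\ell}\, B_{\ell}(u)
```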

  19. Stochastic reduced order models for inverse problems under uncertainty

    PubMed Central

    Warner, James E.; Aquino, Wilkins; Grigoriu, Mircea D.

    2014-01-01

    This work presents a novel methodology for solving inverse problems under uncertainty using stochastic reduced order models (SROMs). Given statistical information about an observed state variable in a system, unknown parameters are estimated probabilistically through the solution of a model-constrained, stochastic optimization problem. The point of departure and crux of the proposed framework is the representation of a random quantity using a SROM - a low dimensional, discrete approximation to a continuous random element that permits efficient and non-intrusive stochastic computations. Characterizing the uncertainties with SROMs transforms the stochastic optimization problem into a deterministic one. The non-intrusive nature of SROMs facilitates efficient gradient computations for random vector unknowns and relies entirely on calls to existing deterministic solvers. Furthermore, the method is naturally extended to handle multiple sources of uncertainty in cases where state variable data, system parameters, and boundary conditions are all considered random. The new and widely-applicable SROM framework is formulated for a general stochastic optimization problem in terms of an abstract objective function and constraining model. For demonstration purposes, however, we study its performance in the specific case of inverse identification of random material parameters in elastodynamics. We demonstrate the ability to efficiently recover random shear moduli given material displacement statistics as input data. We also show that the approach remains effective for the case where the loading in the problem is random as well. PMID:25558115
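
    The idea of an SROM as a low-dimensional discrete surrogate can be illustrated with a minimal Python sketch (our own construction, not the authors' formulation): support points are placed at quantiles of a target random variable, and non-negative probabilities are chosen by a small moment-matching least-squares problem.

      import numpy as np
      from scipy.optimize import nnls

      rng = np.random.default_rng(0)
      target = rng.lognormal(mean=0.0, sigma=0.5, size=20000)  # stand-in random variable

      # Support points of the reduced model: a handful of quantiles of the target.
      m = 8
      x = np.quantile(target, np.linspace(0.05, 0.95, m))

      # Choose probabilities p >= 0 so that the discrete model reproduces the first
      # few moments of the target and sums (approximately) to one; the sum-to-one
      # condition is appended as a heavily weighted row of the least-squares system.
      orders = np.arange(1, 5)
      A = np.vstack([x**k for k in orders] + [np.full(m, 50.0)])
      b = np.concatenate([[np.mean(target**k) for k in orders], [50.0]])
      p, _ = nnls(A, b)
      p = p / p.sum()

      print("SROM mean vs target mean:", np.dot(p, x), target.mean())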

  20. Efficient identification and referral of low-income women at high risk for hereditary breast cancer: a practice-based approach.

    PubMed

    Joseph, G; Kaplan, C; Luce, J; Lee, R; Stewart, S; Guerra, C; Pasick, R

    2012-01-01

    Identification of low-income women with the rare but serious risk of hereditary cancer and their referral to appropriate services presents an important public health challenge. We report the results of formative research to reach thousands of women for efficient identification of those at high risk and expedient access to free genetic services. External validity is maximized by emphasizing intervention fit with the two end-user organizations that must connect to make this possible. This study phase informed the design of a subsequent randomized controlled trial. We conducted a randomized controlled pilot study (n = 38) to compare two intervention models for feasibility and impact. The main outcome was receipt of genetic counseling during a two-month intervention period. Model 1 was based on the usual outcall protocol of an academic hospital genetic risk program, and Model 2 drew on the screening and referral procedures of a statewide toll-free phone line through which large numbers of high-risk women can be identified. In Model 1, the risk program proactively calls patients to schedule genetic counseling; for Model 2, women are notified of their eligibility for counseling and make the call themselves. We also developed and pretested a family history screener for administration by phone to identify women appropriate for genetic counseling. There was no statistically significant difference in receipt of genetic counseling between women randomized to Model 1 (3/18) and those randomized to Model 2 (3/20) during the intervention period. However, when unresponsive women in Model 2 were called after 2 months, 7 more obtained counseling; 4 women from Model 1 were also counseled after the intervention. Thus, the intervention model that closely aligned with the risk program's outcall to high-risk women was found to be feasible and brought more low-income women to free genetic counseling. Our screener was easy to administer by phone and appeared to identify high-risk callers effectively. The model and screener are now in use in the main trial to test the effectiveness of this screening and referral intervention. A validation analysis of the screener is also underway. Identification of intervention strategies and tools, and their systematic comparison for impact and efficiency in the context where they will ultimately be used, are critical elements of practice-based research. Copyright © 2012 S. Karger AG, Basel.

  1. Conceptual Models of Depression in Primary Care Patients: A Comparative Study

    PubMed Central

    Karasz, Alison; Garcia, Nerina; Ferri, Lucia

    2009-01-01

    Conventional psychiatric treatment models are based on a biopsychiatric model of depression. A plausible explanation for low rates of depression treatment utilization among ethnic minorities and the poor is that members of these communities do not share the cultural assumptions underlying the biopsychiatric model. The study examined conceptual models of depression among depressed patients from various ethnic groups, focusing on the degree to which patients’ conceptual models ‘matched’ a biopsychiatric model of depression. The sample included 74 primary care patients from three ethnic groups screening positive for depression. We administered qualitative interviews assessing patients’ conceptual representations of depression. The analysis proceeded in two phases. The first phase involved a strategy called ‘quantitizing’ the qualitative data. A rating scheme was developed and applied to the data by a rater blind to study hypotheses. The data was subjected to statistical analyses. The second phase of the analysis involved the analysis of thematic data using standard qualitative techniques. Study hypotheses were largely supported. The qualitative analysis provided a detailed picture of primary care patients’ conceptual models of depression and suggested interesting directions for future research. PMID:20182550

  2. Impacts of Austrian Climate Variability on Honey Bee Mortality

    NASA Astrophysics Data System (ADS)

    Switanek, Matt; Brodschneider, Robert; Crailsheim, Karl; Truhetz, Heimo

    2015-04-01

    Global food production, as it is today, is not possible without pollinators such as the honey bee. It is therefore alarming that honey bee populations across the world have seen increased mortality rates in the last few decades. The challenges facing the honey bee call into question the future of our food supply. Besides various infectious diseases, Varroa destructor is one of the main culprits leading to increased rates of honey bee mortality. Varroa destructor is a parasitic mite which strongly depends on honey bee brood for reproduction and can wipe out entire colonies. However, climate variability may also importantly influence honey bee breeding cycles and bee mortality rates. Persistent weather events affect vegetation and hence foraging possibilities for honey bees. This study first defines critical statistical relationships between key climate indicators (e.g., precipitation and temperature) and bee mortality rates across Austria, using 6 consecutive years of data. Next, these leading indicators, as they vary in space and time, are used to build a statistical model to predict bee mortality rates and the respective number of colonies affected. Using leave-one-out cross validation, the model reduces the Root Mean Square Error (RMSE) by 21% with respect to predictions made with the mean mortality rate and the number of colonies. Furthermore, a Monte Carlo test is used to establish that the model's predictions are statistically significant at the 99.9% confidence level. These results highlight the influence of climate variables on honey bee populations, although variability in climate, by itself, cannot fully explain colony losses. This study was funded by the Austrian project 'Zukunft Biene'.

  3. Optical turbulence forecast: ready for an operational application

    NASA Astrophysics Data System (ADS)

    Masciadri, E.; Lascaux, F.; Turchi, A.; Fini, L.

    2017-04-01

    One of the main goals of the feasibility study MOSE (MOdelling ESO Sites) is to evaluate the performances of a method conceived to forecast the optical turbulence (OT) above the European Southern Observatory (ESO) sites of the Very Large Telescope (VLT) and the European Extremely Large Telescope (E-ELT) in Chile. The method relies on a dedicated code conceived for the OT, called ASTRO-MESO-NH. In this paper, we present results obtained at the conclusion of this project concerning the performances of this method in forecasting the most relevant parameters related to the OT (CN^2, seeing ɛ, isoplanatic angle θ0 and wavefront coherence time τ0). Numerical predictions related to a very rich statistical sample of nights uniformly distributed along a solar year and belonging to different years have been compared to observations, and different statistical operators have been analysed, such as the classical bias, the root-mean-squared error, σ, and more sophisticated statistical operators derived from the contingency tables that quantify the success of a predictive method, such as the percentage of correct detection (PC) and the probability of detecting a parameter within a specific range of values (POD). The main conclusion of the study is that the ASTRO-MESO-NH model already performs well enough to guarantee a non-negligible positive impact on the service mode of top-class telescopes and ELTs. A demonstrator for an automatic and operational version of the ASTRO-MESO-NH model will soon be implemented on the sites of VLT and E-ELT.
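
    The contingency-table scores mentioned above can be illustrated with a short Python sketch; the definitions below (hits, misses, false alarms, correct negatives) are the common ones and the threshold and values are hypothetical, so the exact operators used in the MOSE study may differ.

      import numpy as np

      def contingency_scores(forecast, observed, threshold):
          """Percentage correct (PC) and probability of detection (POD) for a
          yes/no event defined by exceeding `threshold` (common definitions)."""
          f = np.asarray(forecast) > threshold
          o = np.asarray(observed) > threshold
          hits = np.sum(f & o)
          misses = np.sum(~f & o)
          false_alarms = np.sum(f & ~o)
          correct_neg = np.sum(~f & ~o)
          n = hits + misses + false_alarms + correct_neg
          pc = 100.0 * (hits + correct_neg) / n
          pod = 100.0 * hits / (hits + misses) if (hits + misses) else float("nan")
          return pc, pod

      # Hypothetical seeing values (arcsec): model forecasts vs observations.
      print(contingency_scores([0.6, 1.2, 0.9, 1.5], [0.7, 1.1, 0.8, 0.9], 1.0))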

  4. Investigation of Super Learner Methodology on HIV-1 Small Sample: Application on Jaguar Trial Data.

    PubMed

    Houssaïni, Allal; Assoumou, Lambert; Marcelin, Anne Geneviève; Molina, Jean Michel; Calvez, Vincent; Flandre, Philippe

    2012-01-01

    Background. Many statistical models have been tested to predict phenotypic or virological response from genotypic data. A statistical framework called Super Learner has been introduced either to compare different methods/learners (discrete Super Learner) or to combine them in a Super Learner prediction method. Methods. The Jaguar trial is used to apply the Super Learner framework. The Jaguar study is an "add-on" trial comparing the efficacy of adding didanosine to an on-going failing regimen. Our aim was also to investigate the impact of the use of different cross-validation strategies and different loss functions. Four different splits between training and validation sets were tested with two loss functions. Six statistical methods were compared. We assess performance by evaluating R(2) values and accuracy by calculating the rates of patients being correctly classified. Results. Our results indicated that the more recent Super Learner methodology of building a new predictor based on a weighted combination of different methods/learners provided good performance. A simple linear model provided similar results to those of this new predictor. Slight discrepancies arise between the two loss functions investigated, and also between results based on cross-validated risks and results from the full dataset. The Super Learner methodology and linear model provided around 80% of patients correctly classified. The difference between the lower and higher rates is around 10 percent. The number of mutations retained in different learners also varies from one to 41. Conclusions. The more recent Super Learner methodology combining the prediction of many learners provided good performance on our small dataset.
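
    The weighted-combination idea behind the Super Learner can be sketched in a few lines of Python with scikit-learn and scipy (illustrative learners and simulated data, not the authors' candidate library, loss functions, or data splits).

      import numpy as np
      from scipy.optimize import nnls
      from sklearn.datasets import make_regression
      from sklearn.linear_model import LinearRegression, Ridge
      from sklearn.neighbors import KNeighborsRegressor
      from sklearn.model_selection import cross_val_predict

      X, y = make_regression(n_samples=80, n_features=10, noise=5.0, random_state=1)
      learners = [LinearRegression(), Ridge(alpha=1.0), KNeighborsRegressor(5)]

      # Level-one data: cross-validated predictions of each candidate learner.
      Z = np.column_stack([cross_val_predict(m, X, y, cv=5) for m in learners])

      # Non-negative weights minimising squared error of the combination,
      # normalised to sum to one (one common convention for the Super Learner).
      w, _ = nnls(Z, y)
      w = w / w.sum()
      print("weights:", np.round(w, 3))

      # The final predictor refits each learner on all data and combines them.
      preds = np.column_stack([m.fit(X, y).predict(X) for m in learners])
      print("combined in-sample RMSE:", np.sqrt(np.mean((preds @ w - y) ** 2)))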

  5. Practical interpretation of CYP2D6 haplotypes: Comparison and integration of automated and expert calling.

    PubMed

    Ruaño, Gualberto; Kocherla, Mohan; Graydon, James S; Holford, Theodore R; Makowski, Gregory S; Goethe, John W

    2016-05-01

    We describe a population genetic approach to compare samples interpreted with expert calling (EC) versus automated calling (AC) for CYP2D6 haplotyping. The analysis represents 4812 haplotype calls based on signal data generated by the Luminex xMap analyzers from 2406 patients referred to a high-complexity molecular diagnostics laboratory for CYP450 testing. DNA was extracted from buccal swabs. We compared the results of expert calls (EC) and automated calls (AC) with regard to haplotype number and frequency. The ratio of EC to AC was 1:3. Haplotype frequencies from EC and AC samples were convergent across haplotypes, and their distribution was not statistically different between the groups. Most duplications required EC, as only expansions with homozygous or hemizygous haplotypes could be called automatically. High-complexity laboratories can offer equivalent interpretation to automated calling for non-expanded CYP2D6 loci, and superior interpretation for duplications. We have validated scientific expert calling, specified by scoring rules as a standard operating procedure, integrated with an automated calling algorithm. The integration of EC with AC is a practical strategy for CYP2D6 clinical haplotyping. Copyright © 2016 Elsevier B.V. All rights reserved.

  6. Data Processing System (DPS) software with experimental design, statistical analysis and data mining developed for use in entomological research.

    PubMed

    Tang, Qi-Yi; Zhang, Chuan-Xi

    2013-04-01

    A comprehensive but simple-to-use software package called DPS (Data Processing System) has been developed to execute a range of standard numerical analyses and operations used in experimental design, statistics and data mining. This program runs on standard Windows computers. Many of the functions are specific to entomological and other biological research and are not found in standard statistical software. This paper presents applications of DPS to experimental design, statistical analysis and data mining in entomology. © 2012 The Authors Insect Science © 2012 Institute of Zoology, Chinese Academy of Sciences.

  7. From seconds to months: an overview of multi-scale dynamics of mobile telephone calls

    NASA Astrophysics Data System (ADS)

    Saramäki, Jari; Moro, Esteban

    2015-06-01

    Big Data on electronic records of social interactions allow approaching human behaviour and sociality from a quantitative point of view with unforeseen statistical power. Mobile telephone Call Detail Records (CDRs), automatically collected by telecom operators for billing purposes, have proven especially fruitful for understanding one-to-one communication patterns as well as the dynamics of social networks that are reflected in such patterns. We present an overview of empirical results on the multi-scale dynamics of social behaviour and networks inferred from mobile telephone calls. We begin with the shortest timescales and fastest dynamics, such as burstiness of call sequences between individuals, and "zoom out" towards longer temporal and larger structural scales, from temporal motifs formed by correlated calls between multiple individuals to long-term dynamics of social groups. We conclude this overview with a future outlook.

  8. An Application of Extreme Value Theory to Learning Analytics: Predicting Collaboration Outcome from Eye-Tracking Data

    ERIC Educational Resources Information Center

    Sharma, Kshitij; Chavez-Demoulin, Valérie; Dillenbourg, Pierre

    2017-01-01

    The statistics used in education research are based on central tendencies such as the mean or standard deviation, discarding outliers. This paper adopts another viewpoint that has emerged in statistics, called extreme value theory (EVT). EVT claims that the bulk of a normal distribution consists mainly of uninteresting variations while the most…
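
    In practice, EVT analyses often fit a generalized Pareto distribution to exceedances over a high threshold (peaks over threshold); the sketch below, with hypothetical data, shows that general recipe rather than the authors' specific eye-tracking analysis.

      import numpy as np
      from scipy.stats import genpareto

      rng = np.random.default_rng(2)
      # Hypothetical per-student measurements (e.g., fixation durations in ms).
      data = rng.gamma(shape=2.0, scale=150.0, size=5000)

      # Peaks-over-threshold: keep only exceedances above a high quantile and fit
      # a generalized Pareto distribution to the tail, the usual EVT recipe.
      u = np.quantile(data, 0.95)
      excess = data[data > u] - u
      shape, loc, scale = genpareto.fit(excess, floc=0.0)

      # Tail quantile implied by the fitted model (here, the 99.9th percentile).
      p_exceed = np.mean(data > u)
      q = 0.999
      x999 = u + genpareto.ppf(1 - (1 - q) / p_exceed, shape, loc=0.0, scale=scale)
      print("fitted GPD shape:", round(shape, 3), " 99.9% quantile:", round(x999, 1))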

  9. Including the Tukey Mean-Difference (Bland-Altman) Plot in a Statistics Course

    ERIC Educational Resources Information Center

    Kozak, Marcin; Wnuk, Agnieszka

    2014-01-01

    The Tukey mean-difference plot, also called the Bland-Altman plot, is a recognized graphical tool in the exploration of biometrical data. We show that this technique deserves a place on an introductory statistics course by encouraging students to think about the kind of graph they wish to create, rather than just creating the default graph for the…
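
    A minimal matplotlib sketch of the plot, using hypothetical paired measurements from two methods, might look like this:

      import numpy as np
      import matplotlib.pyplot as plt

      rng = np.random.default_rng(3)
      method_a = rng.normal(100, 15, size=60)              # hypothetical measurements
      method_b = method_a + rng.normal(2, 5, size=60)      # second method, slight bias

      mean = (method_a + method_b) / 2
      diff = method_a - method_b
      bias, sd = diff.mean(), diff.std(ddof=1)

      plt.scatter(mean, diff)
      plt.axhline(bias, linestyle="--")
      for k in (-1.96, 1.96):                              # limits of agreement
          plt.axhline(bias + k * sd, linestyle=":")
      plt.xlabel("Mean of the two methods")
      plt.ylabel("Difference (A - B)")
      plt.title("Tukey mean-difference (Bland-Altman) plot")
      plt.show()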

  10. vFitness: a web-based computing tool for improving estimation of in vitro HIV-1 fitness experiments

    PubMed Central

    2010-01-01

    Background The replication rate (or fitness) between viral variants has been investigated in vivo and in vitro for human immunodeficiency virus (HIV). HIV fitness plays an important role in the development and persistence of drug resistance. The accurate estimation of viral fitness relies on complicated computations based on statistical methods. This calls for tools that are easy to access and intuitive to use for various experiments of viral fitness. Results Based on a mathematical model and several statistical methods (least-squares approach and measurement error models), a Web-based computing tool has been developed for improving estimation of virus fitness in growth competition assays of human immunodeficiency virus type 1 (HIV-1). Conclusions Unlike the two-point calculation used in previous studies, the estimation here uses linear regression methods with all observed data in the competition experiment to more accurately estimate relative viral fitness parameters. The dilution factor is introduced for making the computational tool more flexible to accommodate various experimental conditions. This Web-based tool is implemented in C# language with Microsoft ASP.NET, and is publicly available on the Web at http://bis.urmc.rochester.edu/vFitness/. PMID:20482791

  11. vFitness: a web-based computing tool for improving estimation of in vitro HIV-1 fitness experiments.

    PubMed

    Ma, Jingming; Dykes, Carrie; Wu, Tao; Huang, Yangxin; Demeter, Lisa; Wu, Hulin

    2010-05-18

    The replication rate (or fitness) between viral variants has been investigated in vivo and in vitro for human immunodeficiency virus (HIV). HIV fitness plays an important role in the development and persistence of drug resistance. The accurate estimation of viral fitness relies on complicated computations based on statistical methods. This calls for tools that are easy to access and intuitive to use for various experiments of viral fitness. Based on a mathematical model and several statistical methods (least-squares approach and measurement error models), a Web-based computing tool has been developed for improving estimation of virus fitness in growth competition assays of human immunodeficiency virus type 1 (HIV-1). Unlike the two-point calculation used in previous studies, the estimation here uses linear regression methods with all observed data in the competition experiment to more accurately estimate relative viral fitness parameters. The dilution factor is introduced for making the computational tool more flexible to accommodate various experimental conditions. This Web-based tool is implemented in C# language with Microsoft ASP.NET, and is publicly available on the Web at http://bis.urmc.rochester.edu/vFitness/.

  12. Observability of ionospheric space-time structure with ISR: A simulation study

    NASA Astrophysics Data System (ADS)

    Swoboda, John; Semeter, Joshua; Zettergren, Matthew; Erickson, Philip J.

    2017-02-01

    The sources of error from electronically steerable array (ESA) incoherent scatter radar (ISR) systems are investigated both theoretically and with the use of an open-source ISR simulator, developed by the authors, called Simulator for ISR (SimISR). The main sources of error incorporated in the simulator include statistical uncertainty, which arises due to the nature of the measurement mechanism, and the inherent space-time ambiguity of the sensor. SimISR can take a field of plasma parameters, parameterized by time and space, and create simulated ISR data at the scattered electric field (i.e., complex receiver voltage) level, subsequently processing these data to show possible reconstructions of the original parameter field. To demonstrate general utility, we show a number of simulation examples, with two cases using data from a self-consistent multifluid transport model. Results highlight the significant influence of the forward model of the ISR process and the resulting statistical uncertainty on plasma parameter measurements and the core experiment design trade-offs that must be made when planning observations. These conclusions further underscore the utility of this class of measurement simulator as a design tool for more optimal experiment design efforts using flexible ESA class ISR systems.

  13. SD-MSAEs: Promoter recognition in human genome based on deep feature extraction.

    PubMed

    Xu, Wenxuan; Zhang, Li; Lu, Yaping

    2016-06-01

    The prediction and recognition of promoters in the human genome play an important role in DNA sequence analysis. Entropy, in the Shannon sense, is a widely used tool in bioinformatic sequence analysis. Relative entropy estimators based on statistical divergence (SD) are used to extract meaningful features that distinguish different regions of DNA sequences. In this paper, we choose context features and use a set of SD methods to select the n-mers that most effectively distinguish promoter regions from other DNA regions in the human genome. From the total possible combinations of n-mers, we obtain four sparse distributions based on promoter and non-promoter training samples. The informative n-mers are selected by optimizing the differentiating extents of these distributions. Specifically, we combine the advantages of statistical divergence and multiple sparse auto-encoders (MSAEs) in deep learning to extract deep features for promoter recognition. We then apply multiple SVMs and a decision model to construct a human promoter recognition method called SD-MSAEs. The framework is flexible in that it can freely integrate new feature extraction or classification models. Experimental results show that our method has high sensitivity and specificity. Copyright © 2016 Elsevier Inc. All rights reserved.
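
    As a rough sketch of the statistical-divergence step only (not the authors' exact estimators, and omitting the sparse auto-encoder and SVM stages), the snippet below ranks n-mers by their contribution to a symmetric relative entropy between promoter and background frequency distributions; the sequences are toy examples.

      import numpy as np
      from collections import Counter
      from itertools import product

      def nmer_freqs(seqs, n=3):
          counts = Counter()
          for s in seqs:
              counts.update(s[i:i + n] for i in range(len(s) - n + 1))
          kmers = ["".join(p) for p in product("ACGT", repeat=n)]
          total = sum(counts[k] for k in kmers) or 1
          # A small pseudocount keeps the divergence finite for unseen n-mers.
          return np.array([(counts[k] + 1) / (total + len(kmers)) for k in kmers]), kmers

      promoters = ["TATAAAGGCGTATAAT", "GGTATAATGCCTATAA"]        # toy sequences
      background = ["GCGCGGCCGCATGCGC", "ATGCATGCCGGCGCGC"]

      p, kmers = nmer_freqs(promoters)
      q, _ = nmer_freqs(background)

      # Per-n-mer contribution to the symmetric Kullback-Leibler divergence.
      score = p * np.log(p / q) + q * np.log(q / p)
      top = np.argsort(score)[::-1][:5]
      print([(kmers[i], round(float(score[i]), 4)) for i in top])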

  14. Detecting very low allele fraction variants using targeted DNA sequencing and a novel molecular barcode-aware variant caller.

    PubMed

    Xu, Chang; Nezami Ranjbar, Mohammad R; Wu, Zhong; DiCarlo, John; Wang, Yexun

    2017-01-03

    Detection of DNA mutations at very low allele fractions with high accuracy will significantly improve the effectiveness of precision medicine for cancer patients. To achieve this goal through next generation sequencing, researchers need a detection method that 1) captures rare mutation-containing DNA fragments efficiently in the mix of abundant wild-type DNA; 2) sequences the DNA library extensively to deep coverage; and 3) distinguishes low level true variants from amplification and sequencing errors with high accuracy. Targeted enrichment using PCR primers provides researchers with a convenient way to achieve deep sequencing for a small, yet most relevant region using benchtop sequencers. Molecular barcoding (or indexing) provides a unique solution for reducing sequencing artifacts analytically. Although different molecular barcoding schemes have been reported in recent literature, most variant calling has been done on limited targets, using simple custom scripts. The analytical performance of barcode-aware variant calling can be significantly improved by incorporating advanced statistical models. We present here a highly efficient, simple and scalable enrichment protocol that integrates molecular barcodes in multiplex PCR amplification. In addition, we developed smCounter, an open source, generic, barcode-aware variant caller based on a Bayesian probabilistic model. smCounter was optimized and benchmarked on two independent read sets with SNVs and indels at 5 and 1% allele fractions. Variants were called with very good sensitivity and specificity within coding regions. We demonstrated that we can accurately detect somatic mutations with allele fractions as low as 1% in coding regions using our enrichment protocol and variant caller.

  15. Motifs in triadic random graphs based on Steiner triple systems

    NASA Astrophysics Data System (ADS)

    Winkler, Marco; Reichardt, Jörg

    2013-08-01

    Conventionally, pairwise relationships between nodes are considered to be the fundamental building blocks of complex networks. However, over the last decade, the overabundance of certain subnetwork patterns, i.e., the so-called motifs, has attracted much attention. It has been hypothesized that these motifs, instead of links, serve as the building blocks of network structures. Although the relation between a network's topology and the general properties of the system, such as its function, its robustness against perturbations, or its efficiency in spreading information, is the central theme of network science, there is still a lack of sound generative models needed for testing the functional role of subgraph motifs. Our work aims to overcome this limitation. We employ the framework of exponential random graph models (ERGMs) to define models based on triadic substructures. The fact that only a small portion of triads can actually be set independently poses a challenge for the formulation of such models. To overcome this obstacle, we use Steiner triple systems (STSs). These are partitions of sets of nodes into pair-disjoint triads, which thus can be specified independently. Combining the concepts of ERGMs and STSs, we suggest generative models capable of generating ensembles of networks with nontrivial triadic Z-score profiles. Further, we discover inevitable correlations between the abundance of triad patterns, which occur solely for statistical reasons and need to be taken into account when discussing the functional implications of motif statistics. Moreover, we calculate the degree distributions of our triadic random graphs analytically.
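
    The ERGM family referred to above has a generic exponential form; the sketch below uses illustrative notation (x_a, θ_a, Z) rather than the authors' specific triadic statistics:

      P_\theta(G) = \frac{1}{Z(\theta)} \exp\Big( \sum_a \theta_a\, x_a(G) \Big),
      \qquad Z(\theta) = \sum_{G'} \exp\Big( \sum_a \theta_a\, x_a(G') \Big),

    where the sufficient statistics x_a(G) count substructures (here, the triad types imposed on the pair-disjoint Steiner triples) and the θ_a are their conjugate parameters; the intractable normalisation Z(θ) is the usual reason ensembles of such graphs are explored by sampling.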

  16. Estimating global cropland production from 1961 to 2010

    NASA Astrophysics Data System (ADS)

    Han, Pengfei; Zeng, Ning; Zhao, Fang; Lin, Xiaohui

    2017-09-01

    Global cropland net primary production (NPP) has tripled over the last 50 years, contributing 17-45 % to the increase in global atmospheric CO2 seasonal amplitude. Although many regional-scale comparisons have been made between statistical data and modeling results, long-term national comparisons across global croplands are scarce due to the lack of detailed spatiotemporal management data. Here, we conducted a simulation study of global cropland NPP from 1961 to 2010 using a process-based model called Vegetation-Global Atmosphere-Soil (VEGAS) and compared the results with Food and Agriculture Organization of the United Nations (FAO) statistical data on both continental and country scales. According to the FAO data, the global cropland NPP was 1.3, 1.8, 2.2, 2.6, 3.0, and 3.6 PgC yr-1 in the 1960s, 1970s, 1980s, 1990s, 2000s, and 2010s, respectively. The VEGAS model captured these major trends on global and continental scales. The NPP increased most notably in the US Midwest, western Europe, and the North China Plain and increased modestly in Africa and Oceania. However, significant biases remained in some regions such as Africa and Oceania, especially in temporal evolution. This finding is not surprising as VEGAS is the first global carbon cycle model with full parameterization representing the Green Revolution. To improve model performance for different major regions, we modified the default values of management intensity associated with the agricultural Green Revolution differences across various regions to better match the FAO statistical data at the continental level and for selected countries. Across all the selected countries, the updated results reduced the RMSE from 19.0 to 10.5 TgC yr-1 (˜ 45 % decrease). The results suggest that these regional differences in model parameterization are due to differences in socioeconomic development. To better explain the past changes and predict the future trends, it is important to calibrate key parameters on regional scales and develop data sets for land management history.

  17. Maximizing time from the constraining European Working Time Directive (EWTD): The Heidelberg New Working Time Model.

    PubMed

    Schimmack, Simon; Hinz, Ulf; Wagner, Andreas; Schmidt, Thomas; Strothmann, Hendrik; Büchler, Markus W; Schmitz-Winnenthal, Hubertus

    2014-01-01

    The introduction of the European Working Time Directive (EWTD) has greatly reduced training hours of surgical residents, which translates into 30% less surgical and clinical experience. Such a dramatic drop in attendance has serious implications, such as compromised quality of medical care. At the surgical department of the University of Heidelberg, our goal was to establish a model that was compliant with the EWTD while avoiding a reduction in the quality of patient care and surgical training. We first performed workload analyses and performance statistics for all working areas of our department (operation theater, emergency room, specialized consultations, surgical wards and on-call duties) using personal interviews, time cards, medical documentation software as well as data of the financial- and personnel-controlling sector of our administration. Using that information, we specifically designed an EWTD-compatible work model and implemented it. Surgical wards and operating rooms (ORs) were not compliant with the EWTD. Between 5 pm and 8 pm, three ORs were still operating two-thirds of the time. By creating an extended work shift (7:30 am-7:30 pm), we effectively reduced the workload to less than 49% between 4 pm and 8 am, allowing the combination of an eight-hour working day with a 16-hour on-call duty, thus maximizing surgical resident training and ensuring patient continuity of care while maintaining EWTD guidelines. A precise workload analysis is the key to success. The Heidelberg New Working Time Model provides a legal model, which, by avoiding rotating work shifts, assures quality of patient care and surgical training.

  18. On the assessment of the added value of new predictive biomarkers.

    PubMed

    Chen, Weijie; Samuelson, Frank W; Gallas, Brandon D; Kang, Le; Sahiner, Berkman; Petrick, Nicholas

    2013-07-29

    The surge in biomarker development calls for research on statistical evaluation methodology to rigorously assess emerging biomarkers and classification models. Recently, several authors reported the puzzling observation that, in assessing the added value of new biomarkers to existing ones in a logistic regression model, statistical significance of new predictor variables does not necessarily translate into a statistically significant increase in the area under the ROC curve (AUC). Vickers et al. concluded that this inconsistency is because AUC "has vastly inferior statistical properties," i.e., it is extremely conservative. This statement is based on simulations that misuse the DeLong et al. method. Our purpose is to provide a fair comparison of the likelihood ratio (LR) test and the Wald test versus diagnostic accuracy (AUC) tests. We present a test to compare ideal AUCs of nested linear discriminant functions via an F test. We compare it with the LR test and the Wald test for the logistic regression model. The null hypotheses of these three tests are equivalent; however, the F test is an exact test whereas the LR test and the Wald test are asymptotic tests. Our simulation shows that the F test has the nominal type I error even with a small sample size. Our results also indicate that the LR test and the Wald test have inflated type I errors when the sample size is small, while the type I error converges to the nominal value asymptotically with increasing sample size as expected. We further show that the DeLong et al. method tests a different hypothesis and has the nominal type I error when it is used within its designed scope. Finally, we summarize the pros and cons of all four methods we consider in this paper. We show that there is nothing inherently less powerful or disagreeable about ROC analysis for showing the usefulness of new biomarkers or characterizing the performance of classification models. Each statistical method for assessing biomarkers and classification models has its own strengths and weaknesses. Investigators need to choose methods based on the assessment purpose, the biomarker development phase at which the assessment is being performed, the available patient data, and the validity of assumptions behind the methodologies.

  19. PubMed related articles: a probabilistic topic-based model for content similarity

    PubMed Central

    Lin, Jimmy; Wilbur, W John

    2007-01-01

    Background We present a probabilistic topic-based model for content similarity called pmra that underlies the related article search feature in PubMed. Whether or not a document is about a particular topic is computed from term frequencies, modeled as Poisson distributions. Unlike previous probabilistic retrieval models, we do not attempt to estimate relevance; rather, our focus is "relatedness", the probability that a user would want to examine a particular document given known interest in another. We also describe a novel technique for estimating parameters that does not require human relevance judgments; instead, the process is based on the existence of MeSH® in MEDLINE®. Results The pmra retrieval model was compared against bm25, a competitive probabilistic model that shares theoretical similarities. Experiments using the test collection from the TREC 2005 genomics track show a small but statistically significant improvement of pmra over bm25 in terms of precision. Conclusion Our experiments suggest that the pmra model provides an effective ranking algorithm for related article search. PMID:17971238
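
    As a rough illustration of the Poisson modelling mentioned above, the probability that a document is "about" a topic given k occurrences of a term can be written with a generic two-Poisson (eliteness) model; this sketch uses our own symbols (λ, μ, η) and is not the exact pmra weighting:

      P(\text{elite} \mid k) = \frac{\eta\, e^{-\lambda} \lambda^{k}}{\eta\, e^{-\lambda} \lambda^{k} + (1 - \eta)\, e^{-\mu} \mu^{k}},

    where λ and μ are the expected term frequencies in on-topic ("elite") and background documents, η is the prior probability of eliteness, and the k! factors cancel between numerator and denominator.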

  20. Searching for a two-factor model of marriage duration: commentary on Gottman and Levenson.

    PubMed

    DeKay, Michael L; Greeno, Catherine G; Houck, Patricia R

    2002-01-01

    Gottman and Levenson (2002) report a number of post hoc ordinary least squares regressions to "predict" the length of marriage, given that divorce has occurred. We argue that the type of statistical model they use is inappropriate for answering clinically relevant questions about the causes and timing of divorce, and present several reasons why an alternative family of models called duration models would be more appropriate. The distribution of marriage length is not bimodal, as Gottman and Levenson suggest, and their search for a two-factor model for explaining marriage length is misguided. Their regression models omit many variables known to affect marriage length, and instead use variables that were pre-screened for their predictive ability. Their final model is based on data for only 15 cases, including one unusual case that has undue influence on the results. For these and other technical reasons presented in the text, we believe that Gottman and Levenson's results are not replicable, and that they should not be used to guide interventions for couples in clinical settings.

  1. Inverse Gaussian gamma distribution model for turbulence-induced fading in free-space optical communication.

    PubMed

    Cheng, Mingjian; Guo, Ya; Li, Jiangting; Zheng, Xiaotong; Guo, Lixin

    2018-04-20

    We introduce an alternative distribution to the gamma-gamma (GG) distribution, called inverse Gaussian gamma (IGG) distribution, which can efficiently describe moderate-to-strong irradiance fluctuations. The proposed stochastic model is based on a modulation process between small- and large-scale irradiance fluctuations, which are modeled by gamma and inverse Gaussian distributions, respectively. The model parameters of the IGG distribution are directly related to atmospheric parameters. The accuracy of the fit of the IGG, log-normal (LN), and GG distributions to the experimental probability density functions in moderate-to-strong turbulence is compared, and results indicate that the newly proposed IGG model provides an excellent fit to the experimental data. As the receiving diameter is comparable with the atmospheric coherence radius, the proposed IGG model can reproduce the shape of the experimental data, whereas the GG and LN models fail to match the experimental data. The fundamental channel statistics of a free-space optical communication system are also investigated in an IGG-distributed turbulent atmosphere, and a closed-form expression for the outage probability of the system is derived with Meijer's G-function.
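
    The modulation construction described above can be simulated directly: a unit-mean gamma factor for the small-scale fluctuations multiplied by a unit-mean inverse-Gaussian factor for the large-scale ones. The parameter values below are illustrative only, and numpy's `wald` sampler is used to draw the inverse-Gaussian variates.

      import numpy as np

      rng = np.random.default_rng(4)
      n = 200_000

      # Illustrative parameters (not fitted to any experiment).
      alpha = 4.0   # shape of the small-scale (gamma) fluctuations
      lam = 2.5     # shape-like parameter of the large-scale (inverse Gaussian) part

      # Modulation construction: small-scale gamma factor times large-scale
      # inverse-Gaussian factor, each with unit mean so that <I> = 1.
      small = rng.gamma(shape=alpha, scale=1.0 / alpha, size=n)
      large = rng.wald(mean=1.0, scale=lam, size=n)
      irradiance = small * large

      # Scintillation index of the simulated irradiance.
      si = irradiance.var() / irradiance.mean() ** 2
      print("scintillation index:", round(float(si), 3))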

  2. Modeling and experiments for the time-dependent diffusion coefficient during methane desorption from coal

    NASA Astrophysics Data System (ADS)

    Cheng-Wu, Li; Hong-Lai, Xue; Cheng, Guan; Wen-biao, Liu

    2018-04-01

    Statistical analysis shows that in the coal matrix, the diffusion coefficient for methane is time-varying, and its integral satisfies the formula μt^κ/(1 + βt^κ). Therefore, a so-called dynamic diffusion coefficient model (DDC model) is developed. To verify the suitability and accuracy of the DDC model, a series of gas diffusion experiments were conducted using coal particles of different sizes. The results show that the experimental data can be accurately described by the DDC and bidisperse models, but the fit to the DDC model is slightly better. For all coal samples, as time increases, the effective diffusion coefficient first shows a sudden drop, followed by a gradual decrease before stabilizing at longer times. The effective diffusion coefficient has a negative relationship with the size of the coal particle. Finally, the relationship between the constants of the DDC model and the effective diffusion coefficient is discussed. The constant α (μ/R^2) denotes the effective coefficient at the initial time, and the constants κ and β control the attenuation characteristic of the effective diffusion coefficient.

  3. Probabilistic models of eukaryotic evolution: time for integration

    PubMed Central

    Lartillot, Nicolas

    2015-01-01

    In spite of substantial work and recent progress, a global and fully resolved picture of the macroevolutionary history of eukaryotes is still under construction. This concerns not only the phylogenetic relations among major groups, but also the general characteristics of the underlying macroevolutionary processes, including the patterns of gene family evolution associated with endosymbioses, as well as their impact on the sequence evolutionary process. All these questions raise formidable methodological challenges, calling for a more powerful statistical paradigm. In this direction, model-based probabilistic approaches have played an increasingly important role. In particular, improved models of sequence evolution accounting for heterogeneities across sites and across lineages have led to significant, although insufficient, improvement in phylogenetic accuracy. More recently, one main trend has been to move away from simple parametric models and stepwise approaches, towards integrative models explicitly considering the intricate interplay between multiple levels of macroevolutionary processes. Such integrative models are in their infancy, and their application to the phylogeny of eukaryotes still requires substantial improvement of the underlying models, as well as additional computational developments. PMID:26323768

  4. Summary Statistics for Homemade "Play Dough" -- Data Acquired at LLNL

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kallman, J S; Morales, K E; Whipple, R E

    Using x-ray computerized tomography (CT), we have characterized the x-ray linear attenuation coefficients (LAC) of a homemade Play Dough™-like material, designated as PDA. Table 1 gives the first-order statistics for each of four CT measurements, estimated with a Gaussian kernel density estimator (KDE) analysis. The mean values of the LAC range from a high of about 2700 LMHU_D at 100kVp to a low of about 1200 LMHU_D at 300kVp. The standard deviation of each measurement is around 10% to 15% of the mean. The entropy covers the range from 6.0 to 7.4. Ordinarily, we would model the LAC of the material and compare the modeled values to the measured values. In this case, however, we did not have the detailed chemical composition of the material and therefore did not model the LAC. Using a method recently proposed by Lawrence Livermore National Laboratory (LLNL), we estimate the value of the effective atomic number, Z_eff, to be near 10. LLNL prepared about 50mL of the homemade 'Play Dough' in a polypropylene vial and firmly compressed it immediately prior to the x-ray measurements. We used the computer program IMGREC to reconstruct the CT images. The values of the key parameters used in the data capture and image reconstruction are given in this report. Additional details may be found in the experimental SOP and a separate document. To characterize the statistical distribution of LAC values in each CT image, we first isolated an 80% central-core segment of volume elements ('voxels') lying completely within the specimen, away from the walls of the polypropylene vial. All of the voxels within this central core, including those comprised of voids and inclusions, are included in the statistics. We then calculated the mean value, standard deviation and entropy for (a) the four image segments and for (b) their digital gradient images. (A digital gradient image of a given image was obtained by taking the absolute value of the difference between the initial image and that same image offset by one voxel horizontally, parallel to the rows of the x-ray detector array.) The statistics of the initial image of LAC values are called 'first order statistics;' those of the gradient image, 'second order statistics.'

  5. Impact of a telephonic outreach program on medication adherence in Medicare Advantage Prescription Drug (MAPD) plan beneficiaries.

    PubMed

    Park, Haesuk; Adeyemi, Ayoade; Wang, Wei; Roane, Teresa E

    To determine the impact of a telephone call reminder program provided by a campus-based medication therapy management call center on medication adherence in Medicare Advantage Part D (MAPD) beneficiaries with hypertension. The reminder call services were offered to eligible MAPD beneficiaries, and they included a live interactive conversation with patients to assess the use of their medications. This study used a quasi-experimental design for comparing the change in medication adherence between the intervention and matched control groups. Adherence, defined by proportion of days covered (PDC), was measured using incurred medication claims 6 months before and after the adherence program was implemented. A difference-in-differences approach with propensity score matching was used. After propensity score matching, paired samples included 563 patients in each of the intervention and control groups. The mean PDC (standard deviation) increased significantly during postintervention period by 17.3% (33.6; P <0.001) and 13.8% (32.3; P <0.001) for the intervention and the control groups, respectively; the greater difference-in-differences increase of 3.5% (36.3) in the intervention group over the control group was statistically significant (P = 0.022). A generalized estimating equation model adjusting for covariates further confirmed that the reminder call group had a significant increase in pre-post PDC (P = 0.021), as compared with the control group. Antihypertensive medication adherence increased in both reminder call and control groups, but the increase was significantly higher in the intervention group. A telephonic outreach program was effective in improving antihypertensive medication adherence in MAPD beneficiaries. Copyright © 2017 American Pharmacists Association®. Published by Elsevier Inc. All rights reserved.
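
    The reported 3.5% estimate follows the generic difference-in-differences form (a sketch in our own notation, not the authors'):

      \widehat{\mathrm{DiD}} = \big(\bar{Y}^{\,I}_{\mathrm{post}} - \bar{Y}^{\,I}_{\mathrm{pre}}\big) - \big(\bar{Y}^{\,C}_{\mathrm{post}} - \bar{Y}^{\,C}_{\mathrm{pre}}\big),

    where I and C index the intervention (reminder-call) and matched control groups and Y is the PDC; with the pre-post increases reported above, 17.3% - 13.8% = 3.5%.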

  6. The power to detect linkage in complex disease by means of simple LOD-score analyses.

    PubMed Central

    Greenberg, D A; Abreu, P; Hodge, S E

    1998-01-01

    Maximum-likelihood analysis (via LOD score) provides the most powerful method for finding linkage when the mode of inheritance (MOI) is known. However, because one must assume an MOI, the application of LOD-score analysis to complex disease has been questioned. Although it is known that one can legitimately maximize the maximum LOD score with respect to genetic parameters, this approach raises three concerns: (1) multiple testing, (2) effect on power to detect linkage, and (3) adequacy of the approximate MOI for the true MOI. We evaluated the power of LOD scores to detect linkage when the true MOI was complex but a LOD score analysis assumed simple models. We simulated data from 14 different genetic models, including dominant and recessive at high (80%) and low (20%) penetrances, intermediate models, and several additive two-locus models. We calculated LOD scores by assuming two simple models, dominant and recessive, each with 50% penetrance, then took the higher of the two LOD scores as the raw test statistic and corrected for multiple tests. We call this test statistic "MMLS-C." We found that the ELODs for MMLS-C are >=80% of the ELOD under the true model when the ELOD for the true model is >=3. Similarly, the power to reach a given LOD score was usually >=80% that of the true model, when the power under the true model was >=60%. These results underscore that a critical factor in LOD-score analysis is the MOI at the linked locus, not that of the disease or trait per se. Thus, a limited set of simple genetic models in LOD-score analysis can work well in testing for linkage. PMID:9718328

  7. The power to detect linkage in complex disease by means of simple LOD-score analyses.

    PubMed

    Greenberg, D A; Abreu, P; Hodge, S E

    1998-09-01

    Maximum-likelihood analysis (via LOD score) provides the most powerful method for finding linkage when the mode of inheritance (MOI) is known. However, because one must assume an MOI, the application of LOD-score analysis to complex disease has been questioned. Although it is known that one can legitimately maximize the maximum LOD score with respect to genetic parameters, this approach raises three concerns: (1) multiple testing, (2) effect on power to detect linkage, and (3) adequacy of the approximate MOI for the true MOI. We evaluated the power of LOD scores to detect linkage when the true MOI was complex but a LOD score analysis assumed simple models. We simulated data from 14 different genetic models, including dominant and recessive at high (80%) and low (20%) penetrances, intermediate models, and several additive two-locus models. We calculated LOD scores by assuming two simple models, dominant and recessive, each with 50% penetrance, then took the higher of the two LOD scores as the raw test statistic and corrected for multiple tests. We call this test statistic "MMLS-C." We found that the ELODs for MMLS-C are >=80% of the ELOD under the true model when the ELOD for the true model is >=3. Similarly, the power to reach a given LOD score was usually >=80% that of the true model, when the power under the true model was >=60%. These results underscore that a critical factor in LOD-score analysis is the MOI at the linked locus, not that of the disease or trait per se. Thus, a limited set of simple genetic models in LOD-score analysis can work well in testing for linkage.

  8. Increasing the relevance of GCM simulations for Climate Services

    NASA Astrophysics Data System (ADS)

    Smith, L. A.; Suckling, E.

    2012-12-01

    The design and interpretation of model simulations for climate services differ significantly from experimental design for the advancement of the fundamental research on predictability that underpins it. Climate services consider the best sources of information available today; this calls for a frank evaluation of model skill in the face of statistical benchmarks defined by empirical models. The fact that physical simulation models are thought to provide the only reliable method for extrapolating into conditions not previously observed has no bearing on whether or not today's simulation models outperform empirical models. Evidence on the length scales on which today's simulation models fail to outperform empirical benchmarks is presented; it is illustrated that this occurs even on global scales in decadal prediction. At all timescales considered thus far (as of July 2012), predictions based on simulation models are improved by blending with the output of statistical models. Blending is shown to be more interesting in the climate context than it is in the weather context, where blending with a history-based climatology is straightforward. As GCMs improve and as the Earth's climate moves further from that of the last century, the skill from simulation models and their relevance to climate services are expected to increase. Examples from both seasonal and decadal forecasting will be used to discuss a third approach that may increase the role of current GCMs more quickly. Specifically, aspects of the experimental design in previous hindcast experiments are shown to hinder the use of GCM simulations for climate services. Alternative designs are proposed. The value in revisiting Thompson's classic approach to improving weather forecasting in the fifties in the context of climate services is discussed.

  9. Meta-analysis of Gaussian individual patient data: Two-stage or not two-stage?

    PubMed

    Morris, Tim P; Fisher, David J; Kenward, Michael G; Carpenter, James R

    2018-04-30

    Quantitative evidence synthesis through meta-analysis is central to evidence-based medicine. For well-documented reasons, the meta-analysis of individual patient data is held in higher regard than that of aggregate data. With access to individual patient data, the analysis is not restricted to a "two-stage" approach (combining estimates and standard errors) but can estimate parameters of interest by fitting a single model to all of the data, a so-called "one-stage" analysis. There has been debate about the merits of one- and two-stage analysis. Arguments for one-stage analysis have typically noted that a wider range of models can be fitted and overall estimates may be more precise. The two-stage side has emphasised that the models that can be fitted in two stages are sufficient to answer the relevant questions, with less scope for mistakes because there are fewer modelling choices to be made in the two-stage approach. For Gaussian data, we consider the statistical arguments for flexibility and precision in small-sample settings. Regarding flexibility, several of the models that can be fitted only in one stage may not be of serious interest to most meta-analysis practitioners. Regarding precision, we consider fixed- and random-effects meta-analysis and see that, for a model making certain assumptions, the number of stages used to fit this model is irrelevant; the precision will be approximately equal. Meta-analysts should choose modelling assumptions carefully. Sometimes relevant models can only be fitted in one stage. Otherwise, meta-analysts are free to use whichever procedure is most convenient to fit the identified model. © 2018 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
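
    For reference, the "two-stage" route for Gaussian outcomes reduces to inverse-variance pooling of study-level estimates; the sketch below uses hypothetical per-study estimates and standard errors and a common-effect (fixed-effect) model.

      import numpy as np

      # Stage 1 (hypothetical): per-study treatment-effect estimates and standard errors.
      theta = np.array([0.42, 0.31, 0.55, 0.12])
      se = np.array([0.15, 0.10, 0.20, 0.08])

      # Stage 2: fixed-effect (common-effect) inverse-variance pooling.
      w = 1.0 / se**2
      pooled = np.sum(w * theta) / np.sum(w)
      pooled_se = np.sqrt(1.0 / np.sum(w))
      ci = pooled + np.array([-1.96, 1.96]) * pooled_se
      print(f"pooled effect {pooled:.3f} (95% CI {ci[0]:.3f} to {ci[1]:.3f})")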

  10. Climatic Models Ensemble-based Mid-21st Century Runoff Projections: A Bayesian Framework

    NASA Astrophysics Data System (ADS)

    Achieng, K. O.; Zhu, J.

    2017-12-01

    There are a number of North American Regional Climate Change Assessment Program (NARCCAP) climatic models that have been used to project surface runoff in the mid-21st century. Statistical model selection techniques are often used to select the model that best fits the data. However, model selection techniques often lead to different conclusions. In this study, ten models are averaged in a Bayesian paradigm to project runoff. Bayesian Model Averaging (BMA) is used to project runoff and identify the effect of model uncertainty on future runoff projections. Baseflow separation - a two-parameter digital filter which is also called the Eckhardt filter - is used to separate USGS streamflow (total runoff) into two components: baseflow and surface runoff. We use this surface runoff as the a priori runoff when conducting BMA of runoff simulated from the ten RCM models. The primary objective of this study is to evaluate how well RCM multi-model ensembles simulate surface runoff, in a Bayesian framework. Specifically, we investigate and discuss the following questions: How well does an ensemble of ten RCM models jointly simulate surface runoff when averaging over all the models using BMA, given the a priori surface runoff? What are the effects of model uncertainty on surface runoff simulation?
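
    The BMA combination described above takes the generic form below (a sketch in our own notation; Δ is the projected runoff, D the a priori surface runoff, and M_k the k-th RCM):

      p(\Delta \mid D) = \sum_{k=1}^{10} p(\Delta \mid M_k, D)\, p(M_k \mid D),
      \qquad \sum_{k=1}^{10} p(M_k \mid D) = 1,

    so the posterior model weights p(M_k | D) quantify how much each RCM contributes and how model uncertainty propagates into the runoff projection.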

  11. Monitoring Method of Cow Anthrax Based on Gis and Spatial Statistical Analysis

    NASA Astrophysics Data System (ADS)

    Li, Lin; Yang, Yong; Wang, Hongbin; Dong, Jing; Zhao, Yujun; He, Jianbin; Fan, Honggang

    A geographic information system (GIS) is a computer application system that can manipulate spatial information and has been used in many fields related to spatial information management. Many methods and models have been established for analyzing animal disease distribution and temporal-spatial transmission. Great benefits have been gained from the application of GIS in animal disease epidemiology, and GIS is now a very important tool in animal disease epidemiological research. The spatial analysis functions of GIS can be widened and strengthened by spatial statistical analysis, allowing deeper exploration, manipulation and interpretation of the spatial pattern and spatial correlation of animal disease. In this paper, we analyzed the spatial distribution characteristics of cow anthrax in a target district (called district A because the epidemic data are confidential), based on a GIS of cow anthrax established for this district, combining spatial statistical analysis with GIS. Cow anthrax is a biogeochemical disease; its geographical distribution is closely related to environmental factors of habitats and shows spatial structure, so correct analysis of its spatial distribution plays a very important role in monitoring, prevention and control. However, applying classic statistical methods in some areas is very difficult because of the pastoral nomadic context: the high mobility of livestock and the lack of suitable sampling are among the difficulties in monitoring that currently make it nearly impossible to apply rigorous random sampling methods. It is thus necessary to develop an alternative sampling method that overcomes the lack of sampling while meeting the requirements for randomness. The GIS software ArcGIS 9.1 was used to compensate for the lack of data at sampling sites. Using ArcGIS 9.1 and GeoDa to analyze the spatial distribution of cow anthrax in district A, we reached two conclusions about anthrax density: (1) it follows a spatial clustering pattern, and (2) it shows strong spatial autocorrelation. We established a prediction model to estimate the anthrax distribution based on the spatial characteristics of cow anthrax density. Compared with the true distribution, the prediction model agrees well and is feasible in application. The GIS-based method can be readily applied in cow anthrax monitoring and investigation, and the spatial-statistics-based prediction model provides a foundation for studies of other spatially structured animal diseases.
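
    The global spatial autocorrelation reported above is commonly quantified with Moran's I; the sketch below implements the standard formula in numpy with a hypothetical contiguity weight matrix (GeoDa and ArcGIS provide the same statistic with richer weighting options).

      import numpy as np

      def morans_i(x, w):
          """Global Moran's I for values x and a spatial weight matrix w
          (w[i, j] > 0 when sites i and j are neighbours, diagonal zero)."""
          x = np.asarray(x, dtype=float)
          z = x - x.mean()
          n, s0 = len(x), w.sum()
          return (n / s0) * (z @ w @ z) / np.sum(z**2)

      # Hypothetical case density at 5 sites and a contiguity weight matrix.
      density = [12.0, 10.5, 11.8, 2.1, 1.7]
      w = np.array([[0, 1, 1, 0, 0],
                    [1, 0, 1, 0, 0],
                    [1, 1, 0, 1, 0],
                    [0, 0, 1, 0, 1],
                    [0, 0, 0, 1, 0]], dtype=float)
      print("Moran's I:", round(float(morans_i(density, w)), 3))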

  12. Tobacco Cessation May Improve Lung Cancer Patient Survival.

    PubMed

    Dobson Amato, Katharine A; Hyland, Andrew; Reed, Robert; Mahoney, Martin C; Marshall, James; Giovino, Gary; Bansal-Travers, Maansi; Ochs-Balcom, Heather M; Zevon, Michael A; Cummings, K Michael; Nwogu, Chukwumere; Singh, Anurag K; Chen, Hongbin; Warren, Graham W; Reid, Mary

    2015-07-01

    This study characterizes tobacco cessation patterns and the association of cessation with survival among lung cancer patients at Roswell Park Cancer Institute: an NCI Designated Comprehensive Cancer Center. Lung cancer patients presenting at this institution were screened with a standardized tobacco assessment, and those who had used tobacco within the past 30 days were automatically referred to a telephone-based cessation service. Demographic, clinical information, and self-reported tobacco use at last contact were obtained via electronic medical records and the Roswell Park Cancer Institute tumor registry for all lung cancer patients referred to the service between October 2010 and October 2012. Descriptive statistics and Cox proportional hazards models were used to assess whether tobacco cessation and other factors were associated with lung cancer survival through May 2014. Calls were attempted to 313 of 388 lung cancer patients referred to the cessation service. Eighty percent of patients (250 of 313) were successfully contacted and participated in at least one telephone-based cessation call; 40.8% (102 of 250) of persons contacted reported having quit at the last contact. After controlling for age, pack year history, sex, Eastern Cooperative Oncology Group performance status, time between diagnosis and last contact, tumor histology, and clinical stage, a statistically significant increase in survival was associated with quitting compared with continued tobacco use at last contact (HR = 1.79; 95% confidence interval: 1.14-2.82) with a median 9 month improvement in overall survival. Tobacco cessation among lung cancer patients after diagnosis may increase overall survival.

  13. Tobacco Cessation May Improve Lung Cancer Patient Survival

    PubMed Central

    Dobson Amato, Katharine A.; Hyland, Andrew; Reed, Robert; Mahoney, Martin C.; Marshall, James; Giovino, Gary; Bansal-Travers, Maansi; Ochs-Balcom, Heather M.; Zevon, Michael A.; Cummings, K. Michael; Nwogu, Chukwumere; Singh, Anurag K.; Chen, Hongbin; Warren, Graham W.; Reid, Mary

    2015-01-01

    Introduction This study characterizes tobacco cessation patterns and the association of cessation with survival among lung cancer patients at Roswell Park Cancer Institute, an NCI-Designated Comprehensive Cancer Center. Methods Lung cancer patients presenting at this institution were screened with a standardized tobacco assessment, and those who had used tobacco within the past 30 days were automatically referred to a telephone-based cessation service. Demographic and clinical information, as well as self-reported tobacco use at last contact, were obtained via electronic medical records and the RPCI tumor registry for all lung cancer patients referred to the service between October 2010 and October 2012. Descriptive statistics and Cox proportional hazards models were used to assess whether tobacco cessation and other factors were associated with lung cancer survival through May 2014. Results Calls were attempted to 313 of 388 lung cancer patients referred to the cessation service. Eighty percent of patients (250/313) were successfully contacted and participated in at least one telephone-based cessation call; 40.8% (102/250) of persons contacted reported having quit at the last contact. After controlling for age, pack-year history, sex, ECOG performance status, time between diagnosis and last contact, tumor histology, and clinical stage, a statistically significant increase in survival was associated with quitting compared with continued tobacco use at last contact (HR = 1.79; 95% CI: 1.14-2.82), with a median 9-month improvement in overall survival. Conclusions Tobacco cessation among lung cancer patients after diagnosis may increase overall survival. PMID:26102442

  14. Detailed validation of the bidirectional effect in various Case 1 waters for application to Ocean Color imagery

    NASA Astrophysics Data System (ADS)

    Voss, K. J.; Morel, A.; Antoine, D.

    2007-06-01

    The radiance viewed from the ocean depends on the illumination and viewing geometry as well as on the water properties; this variation is called the bidirectional effect, or the BRDF of the water. This BRDF depends on the inherent optical properties of the water, including the volume scattering function, and is important when comparing data from different satellite sensors. The current model by Morel et al. (2002) depends on modeled water parameters and thus must be carefully validated. In this paper we combined upwelling radiance distribution data from several cruises, in varied water types and with a wide range of solar zenith angles. We found that the average error of the model, when compared with the data, was less than 1%, while the RMS difference between model and data was on the order of 0.02-0.03. This is well within the statistical noise of the data, which was on the order of 0.04-0.05, due to environmental noise sources such as wave focusing.
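
    The comparison quoted above reduces to two summary metrics: a mean (bias-like) error and an RMS difference between modelled and measured values. A minimal sketch of that kind of comparison, with purely hypothetical arrays standing in for the model output and the measured radiance distribution:

        import numpy as np

        # Hypothetical paired samples: model prediction and measurement of the
        # BRDF correction factor at matching viewing/illumination geometries.
        model = np.array([1.02, 0.97, 1.10, 0.97, 1.03])
        data  = np.array([1.00, 0.99, 1.08, 0.95, 1.06])

        mean_relative_error = np.mean((model - data) / data)         # average bias
        rms_difference      = np.sqrt(np.mean((model - data) ** 2))  # RMS of residuals

        print(f"mean relative error: {mean_relative_error:.3%}")
        print(f"RMS difference:      {rms_difference:.3f}")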

  15. Subcellular localization for Gram positive and Gram negative bacterial proteins using linear interpolation smoothing model.

    PubMed

    Saini, Harsh; Raicar, Gaurav; Dehzangi, Abdollah; Lal, Sunil; Sharma, Alok

    2015-12-07

    Protein subcellular localization is an important topic in proteomics since it is related to a protein's overall function, helps in the understanding of metabolic pathways, and aids in drug design and discovery. In this paper, a basic approximation technique from natural language processing called the linear interpolation smoothing model is applied to predicting protein subcellular localizations. The proposed approach extracts features from syntactical information in protein sequences to build probabilistic profiles using dependency models, which are used in linear interpolation to determine how likely a sequence is to belong to a particular subcellular location. This technique builds a statistical model based on maximum likelihood. It is able to deal effectively with the high dimensionality that hinders other traditional classifiers such as Support Vector Machines or k-Nearest Neighbours, without sacrificing performance. This approach has been evaluated by predicting subcellular localizations of Gram positive and Gram negative bacterial proteins. Copyright © 2015 Elsevier Ltd. All rights reserved.
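
    Linear interpolation smoothing is the standard language-modelling idea of mixing higher- and lower-order conditional probabilities with weights that sum to one; here it is applied to amino-acid sequences rather than words. The sketch below is only a generic illustration of that smoothing step (the feature extraction and per-location profiles of the actual method are not reproduced), with a hypothetical interpolation weight:

        import math
        from collections import Counter

        def train_counts(sequences):
            """Count unigrams and bigrams of amino acids in a list of sequences."""
            uni, bi = Counter(), Counter()
            for seq in sequences:
                uni.update(seq)
                bi.update(zip(seq, seq[1:]))
            return uni, bi

        def interpolated_prob(aa, prev, uni, bi, lam=0.7):
            """P(aa | prev) smoothed by linear interpolation with the unigram model.

            lam is a hypothetical interpolation weight; in practice it would be
            tuned, e.g. by maximum likelihood on held-out data.
            """
            total = sum(uni.values())
            p_uni = uni[aa] / total if total else 0.0
            p_bi = bi[(prev, aa)] / uni[prev] if uni[prev] else 0.0
            return lam * p_bi + (1.0 - lam) * p_uni

        def log_likelihood(seq, uni, bi):
            """Score a sequence under the interpolated model (used to rank locations)."""
            score = 0.0
            for prev, aa in zip(seq, seq[1:]):
                score += math.log(interpolated_prob(aa, prev, uni, bi) + 1e-12)
            return score

        # Hypothetical usage: one model per subcellular location, pick the best score.
        cytoplasm_uni, cytoplasm_bi = train_counts(["MKKLLP", "MKAVLL"])
        print(log_likelihood("MKKL", cytoplasm_uni, cytoplasm_bi))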

  16. Generating a Multiphase Equation of State with Swarm Intelligence

    NASA Astrophysics Data System (ADS)

    Cox, Geoffrey

    2017-06-01

    Hydrocode calculations require knowledge of the variation of pressure of a material with density and temperature, which is given by the equation of state. An accurate model needs to account for discontinuities in energy, density and properties of a material across a phase boundary. When generating a multiphase equation of state the modeller attempts to balance the agreement between the available data for compression, expansion and phase boundary location. However, this can prove difficult because minor adjustments in the equation of state for a single phase can have a large impact on the overall phase diagram. Recently, Cox and Christie described a method for combining statistical-mechanics-based condensed matter physics models with a stochastic analysis technique called particle swarm optimisation. The models produced show good agreement with experiment over a wide range of pressure-temperature space. This talk details the general implementation of this technique, shows example results, and describes the types of analysis that can be performed with this method.
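
    Particle swarm optimisation itself is a generic stochastic optimiser: a swarm of candidate parameter vectors moves through parameter space, each particle pulled toward its own best position so far and toward the swarm's best. The sketch below shows that core update rule on a stand-in objective (a simple misfit function); the actual equation-of-state models, data, and tuning constants of the work above are not reproduced, so everything here is illustrative.

        import numpy as np

        rng = np.random.default_rng(0)

        def misfit(params):
            """Stand-in objective: squared distance to a 'true' parameter set.
            A real application would compare model pressures against experiment."""
            true_params = np.array([2.0, -1.0, 0.5])
            return np.sum((params - true_params) ** 2)

        def particle_swarm(objective, dim=3, n_particles=20, n_iter=200,
                           w=0.7, c1=1.5, c2=1.5):
            # Initialise positions and velocities of the swarm.
            pos = rng.uniform(-5.0, 5.0, size=(n_particles, dim))
            vel = np.zeros_like(pos)
            best_pos = pos.copy()                          # per-particle best
            best_val = np.array([objective(p) for p in pos])
            g_best = best_pos[np.argmin(best_val)].copy()  # global best

            for _ in range(n_iter):
                r1 = rng.random((n_particles, dim))
                r2 = rng.random((n_particles, dim))
                # Velocity update: inertia + pull to personal best + pull to global best.
                vel = w * vel + c1 * r1 * (best_pos - pos) + c2 * r2 * (g_best - pos)
                pos = pos + vel
                vals = np.array([objective(p) for p in pos])
                improved = vals < best_val
                best_pos[improved] = pos[improved]
                best_val[improved] = vals[improved]
                g_best = best_pos[np.argmin(best_val)].copy()
            return g_best

        print(particle_swarm(misfit))   # should approach [2.0, -1.0, 0.5]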

  17. The Building Game: From Enumerative Combinatorics to Conformational Diffusion

    NASA Astrophysics Data System (ADS)

    Johnson-Chyzhykov, Daniel; Menon, Govind

    2016-08-01

    We study a discrete attachment model for the self-assembly of polyhedra called the building game. We investigate two distinct aspects of the model: (i) the enumerative combinatorics of the intermediate states and (ii) a notion of Brownian motion for the polyhedral linkage defined by each intermediate, which we term conformational diffusion. The combinatorial configuration space of the model is computed for the Platonic, Archimedean, and Catalan solids of up to 30 faces, and several novel enumerative results are generated. These represent the most exhaustive computations of this nature to date. We further extend the building game to include geometric information. The combinatorial structure of each intermediate yields a system of constraints specifying a polyhedral linkage and its moduli space. We use a random walk to simulate a reflected Brownian motion in each moduli space. Empirical statistics of the random walk may be used to define the rates of transition for a Markov process modeling the process of self-assembly.
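
    A reflected Brownian motion of the kind simulated here can be approximated by a random walk whose Gaussian steps are reflected back whenever they leave the allowed region. The moduli spaces of the polyhedral linkages are far more complicated than this, so the sketch below only illustrates the reflection idea on a simple box-shaped domain, with made-up parameters:

        import numpy as np

        rng = np.random.default_rng(1)

        def reflected_walk(n_steps=10_000, dim=2, step=0.05, lo=0.0, hi=1.0):
            """Random-walk approximation of Brownian motion reflected in the box [lo, hi]^dim."""
            x = np.full(dim, 0.5)          # start in the middle of the domain
            visits = []
            for _ in range(n_steps):
                x = x + rng.normal(scale=step, size=dim)   # Gaussian increment
                # Reflect any coordinate that left the box back inside it.
                x = np.where(x < lo, 2 * lo - x, x)
                x = np.where(x > hi, 2 * hi - x, x)
                visits.append(x.copy())
            return np.array(visits)

        path = reflected_walk()
        # Empirical statistics of the path (e.g. time spent near a boundary) could then
        # feed the transition rates of a coarse-grained Markov model, as in the abstract.
        print(path.mean(axis=0), path.std(axis=0))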

  18. Search for a Dark Photon in e + e - Collisions at BaBar

    DOE PAGES

    Lees, J. P.; Poireau, V.; Tisserand, V.; ...

    2014-11-10

    Dark sectors charged under a new Abelian interaction have recently received much attention in the context of dark matter models. These models introduce a light new mediator, the so-called dark photon (A'), connecting the dark sector to the standard model. We present a search for a dark photon in the reaction e+e- → γA', A' → e+e-, μ+μ- using 514 fb^-1 of data collected with the BABAR detector. We observe no statistically significant deviations from the standard model predictions, and we set 90% confidence level upper limits on the mixing strength between the photon and the dark photon at the level of 10^-4 to 10^-3 for dark photon masses in the range 0.02-10.2 GeV. We further constrain the range of the parameter space favored by interpretations of the discrepancy between the calculated and measured anomalous magnetic moment of the muon.

  19. Consistent Partial Least Squares Path Modeling via Regularization

    PubMed Central

    Jung, Sunho; Park, JaeHong

    2018-01-01

    Partial least squares (PLS) path modeling is a component-based structural equation modeling approach that has been adopted in social and psychological research due to its data-analytic capability and flexibility. A recent methodological advance is consistent PLS (PLSc), designed to produce consistent estimates of path coefficients in structural models involving common factors. In practice, however, PLSc may frequently encounter multicollinearity, in part because it estimates path coefficients from consistent correlations among independent latent variables. PLSc does not yet have a remedy for this multicollinearity problem, which can cause loss of statistical power and accuracy in parameter estimation. Thus, a ridge type of regularization is incorporated into PLSc, creating a new technique called regularized PLSc. A comprehensive simulation study is conducted to evaluate the performance of regularized PLSc as compared to its non-regularized counterpart in terms of power and accuracy. The results show that our regularized PLSc is recommended for use when serious multicollinearity is present. PMID:29515491
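
    The regularization added to PLSc is of the ridge type, i.e. a penalty on the squared size of the coefficients that stabilises estimation when predictors (here, latent variable scores) are highly correlated. The sketch below shows only that generic ridge step on ordinary regression data, not the PLSc algorithm itself; the data and penalty value are hypothetical.

        import numpy as np

        def ridge_coefficients(X, y, lam=1.0):
            """Ridge estimate beta = (X'X + lam * I)^(-1) X'y.

            lam > 0 shrinks the coefficients and keeps X'X + lam*I well conditioned
            even when the columns of X are nearly collinear.
            """
            n_features = X.shape[1]
            gram = X.T @ X + lam * np.eye(n_features)
            return np.linalg.solve(gram, X.T @ y)

        # Hypothetical, nearly collinear predictors (the situation ridge is meant for).
        rng = np.random.default_rng(2)
        x1 = rng.normal(size=100)
        x2 = x1 + rng.normal(scale=0.01, size=100)     # almost a copy of x1
        X = np.column_stack([x1, x2])
        y = 1.0 * x1 + 0.5 * x2 + rng.normal(scale=0.1, size=100)

        print(ridge_coefficients(X, y, lam=0.0))   # no penalty: unstable estimates
        print(ridge_coefficients(X, y, lam=1.0))   # ridge: shrunken, stable estimates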

  20. Powerful Inference with the D-Statistic on Low-Coverage Whole-Genome Data

    PubMed Central

    Soraggi, Samuele; Wiuf, Carsten; Albrechtsen, Anders

    2017-01-01

    The detection of ancient gene flow between human populations is an important issue in population genetics. A common tool for detecting ancient admixture events is the D-statistic. The D-statistic is based on the hypothesis of a genetic relationship that involves four populations, whose correctness is assessed by evaluating specific coincidences of alleles between the groups. When working with high-throughput sequencing data, calling genotypes accurately is not always possible; therefore, the D-statistic currently samples a single base from the reads of one individual per population. This implies ignoring much of the information in the data, an issue especially striking in the case of ancient genomes. We provide a significant improvement to overcome the problems of the D-statistic by considering all reads from multiple individuals in each population. We also apply type-specific error correction to combat the problems of sequencing errors, and show a way to correct for introgression from an external population that is not part of the supposed genetic relationship, and how this leads to an estimate of the admixture rate. We prove that the D-statistic is approximated by a standard normal distribution. Furthermore, we show that our method outperforms the traditional D-statistic in detecting admixtures. The power gain is most pronounced for low and medium sequencing depth (1–10×), and performances are as good as with perfectly called genotypes at a sequencing depth of 2×. We show the reliability of error correction in scenarios with simulated errors and ancient data, and correct for introgression in known scenarios to estimate the admixture rates. PMID:29196497
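
    The D-statistic referred to here is the classic ABBA-BABA test: across biallelic sites it contrasts the counts of the two discordant allele patterns among four groups, D = (nABBA - nBABA) / (nABBA + nBABA), and under the null of no gene flow D is centred on zero. The sketch below computes D from a single called allele per population, which is precisely the simplification the paper improves upon by using all reads from multiple individuals; the allele arrays are hypothetical.

        import numpy as np

        def d_statistic(h1, h2, h3, outgroup):
            """ABBA-BABA D-statistic from 0/1 allele calls (one allele per population).

            Each argument is an array over sites; 0 denotes the ancestral ('A')
            allele carried by the outgroup and 1 the derived ('B') allele.
            """
            h1, h2, h3, out = map(np.asarray, (h1, h2, h3, outgroup))
            abba = np.sum((h1 == out) & (h2 == h3) & (h1 != h2))  # pattern A B B A
            baba = np.sum((h2 == out) & (h1 == h3) & (h1 != h2))  # pattern B A B A
            return (abba - baba) / (abba + baba)

        # Hypothetical allele calls at 8 sites for (H1, H2, H3, outgroup).
        h1  = [0, 1, 0, 0, 1, 0, 0, 1]
        h2  = [1, 0, 1, 0, 1, 1, 0, 0]
        h3  = [1, 1, 1, 0, 0, 1, 0, 0]
        out = [0, 0, 0, 0, 0, 0, 0, 0]
        print(d_statistic(h1, h2, h3, out))  # an excess of ABBA (D > 0) suggests H2-H3 gene flow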
