Sample records for simple statistical approach

  1. New approach in the quantum statistical parton distribution

    NASA Astrophysics Data System (ADS)

    Sohaily, Sozha; Vaziri (Khamedi), Mohammad

    2017-12-01

    An attempt to find simple parton distribution functions (PDFs) based on a quantum statistical approach is presented. The PDFs described by the statistical model have very interesting physical properties that help in understanding the structure of partons. The longitudinal part of the distribution functions is given by applying the maximum entropy principle. An interesting and simple approach to determining the statistical variables exactly, without fitting or fixing parameters, is surveyed. Analytic expressions for the x-dependent PDFs are obtained over the whole x region [0, 1], and the computed distributions are consistent with experimental observations. The agreement with experimental data gives robust confirmation of the simple statistical model presented.

  2. A Critique of One-Tailed Hypothesis Test Procedures in Business and Economics Statistics Textbooks.

    ERIC Educational Resources Information Center

    Liu, Tung; Stone, Courtenay C.

    1999-01-01

    Surveys introductory business and economics statistics textbooks and finds that they differ over the best way to explain one-tailed hypothesis tests: the simple null-hypothesis approach or the composite null-hypothesis approach. Argues that the composite null-hypothesis approach contains methodological shortcomings that make it more difficult for…

  3. Calibration of Response Data Using MIRT Models with Simple and Mixed Structures

    ERIC Educational Resources Information Center

    Zhang, Jinming

    2012-01-01

    It is common to assume during a statistical analysis of a multiscale assessment that the assessment is composed of several unidimensional subtests or that it has simple structure. Under this assumption, the unidimensional and multidimensional approaches can be used to estimate item parameters. These two approaches are equivalent in parameter…

  4. Flipped Statistics Class Results: Better Performance than Lecture over One Year Later

    ERIC Educational Resources Information Center

    Winquist, Jennifer R.; Carlson, Keith A.

    2014-01-01

    In this paper, we compare an introductory statistics course taught using a flipped classroom approach to the same course taught using a traditional lecture based approach. In the lecture course, students listened to lecture, took notes, and completed homework assignments. In the flipped course, students read relatively simple chapters and answered…

  5. Are Statisticians Cold-Blooded Bosses? A New Perspective on the "Old" Concept of Statistical Population

    ERIC Educational Resources Information Center

    Lu, Yonggang; Henning, Kevin S. S.

    2013-01-01

    Spurred by recent writings regarding statistical pragmatism, we propose a simple, practical approach to introducing students to a new style of statistical thinking that models nature through the lens of data-generating processes, not populations. (Contains 5 figures.)

  6. Statistics Using Just One Formula

    ERIC Educational Resources Information Center

    Rosenthal, Jeffrey S.

    2018-01-01

    This article advocates that introductory statistics be taught by basing all calculations on a single simple margin-of-error formula and deriving all of the standard introductory statistical concepts (confidence intervals, significance tests, comparisons of means and proportions, etc) from that one formula. It is argued that this approach will…
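
    As an illustration of the single-formula idea (not necessarily the exact formula Rosenthal advocates), the sketch below builds a 95% confidence interval and a significance test for a proportion from the familiar margin-of-error expression 1.96*sqrt(p(1-p)/n); the survey numbers are invented.

    ```python
    import math

    def margin_of_error(p_hat, n, z=1.96):
        """95% margin of error for a sample proportion (normal approximation)."""
        return z * math.sqrt(p_hat * (1 - p_hat) / n)

    # Hypothetical survey: 540 of 1000 respondents agree.
    p_hat, n = 0.54, 1000
    moe = margin_of_error(p_hat, n)

    # Confidence interval derived directly from the margin of error.
    ci = (p_hat - moe, p_hat + moe)

    # Significance test of H0: p = 0.5, reusing the same formula:
    # reject at the 5% level iff |p_hat - 0.5| exceeds the null margin of error.
    reject = abs(p_hat - 0.5) > margin_of_error(0.5, n)

    print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f}), reject H0: {reject}")
    ```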

  7. Improving validation methods for molecular diagnostics: application of Bland-Altman, Deming and simple linear regression analyses in assay comparison and evaluation for next-generation sequencing

    PubMed Central

    Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L

    2018-01-01

    Aims A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R2), using R2 as the primary metric of assay agreement. However, the use of R2 alone does not adequately quantify constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. Methods We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing assays (NGS). NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Results Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. Conclusions The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory. PMID:28747393
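
    A minimal sketch of the kind of comparison described above, assuming two paired vectors of measurements from a new and a reference assay (the numbers here are invented): simple linear regression to expose constant and proportional error, plus Bland-Altman bias and limits of agreement. Deming regression is omitted for brevity.

    ```python
    import numpy as np
    from scipy import stats

    # Hypothetical paired measurements (e.g., variant allele fractions, %).
    reference = np.array([5.1, 10.2, 19.8, 40.5, 49.7, 75.3, 90.1])
    new_assay = np.array([5.9, 11.0, 21.1, 41.9, 51.0, 77.0, 91.5])

    # Simple linear regression: slope != 1 suggests proportional error,
    # intercept != 0 suggests constant error; R^2 alone hides both.
    res = stats.linregress(reference, new_assay)
    print(f"slope={res.slope:.3f}, intercept={res.intercept:.3f}, R^2={res.rvalue**2:.4f}")

    # Bland-Altman: mean bias and 95% limits of agreement.
    diff = new_assay - reference
    mean_pair = (new_assay + reference) / 2   # x-axis of a Bland-Altman plot
    bias = diff.mean()
    loa = 1.96 * diff.std(ddof=1)
    print(f"bias={bias:.2f}, limits of agreement=({bias - loa:.2f}, {bias + loa:.2f})")
    ```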

  8. Entropy Is Simple, Qualitatively.

    ERIC Educational Resources Information Center

    Lambert, Frank L.

    2002-01-01

    Suggests that qualitatively, entropy is simple. Entropy increase from a macro viewpoint is a measure of the dispersal of energy from localized to spread out at a temperature T. Fundamentally based on statistical and quantum mechanics, this approach is superior to the non-fundamental "disorder" as a descriptor of entropy change. (MM)

  9. An Experimental Approach to Teaching and Learning Elementary Statistical Mechanics

    ERIC Educational Resources Information Center

    Ellis, Frank B.; Ellis, David C.

    2008-01-01

    Introductory statistical mechanics is studied for a simple two-state system using an inexpensive and easily built apparatus. A large variety of demonstrations, suitable for students in high school and introductory university chemistry courses, are possible. This article details demonstrations for exothermic and endothermic reactions, the dynamic…
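
    For readers who want a numerical companion to a two-state demonstration like this (an illustration, not the authors' apparatus), the short sketch below computes the Boltzmann population of the upper level and the resulting mean energy as temperature varies; the level spacing is a made-up value.

    ```python
    import math

    K_B = 1.380649e-23  # Boltzmann constant, J/K

    def upper_state_fraction(delta_e, temperature):
        """Equilibrium fraction of particles in the upper level of a two-state system."""
        x = math.exp(-delta_e / (K_B * temperature))
        return x / (1 + x)

    delta_e = 4.0e-21  # hypothetical level spacing, J
    for T in (100, 300, 1000, 5000):
        f = upper_state_fraction(delta_e, T)
        print(f"T = {T:5d} K: upper-state fraction = {f:.3f}, mean energy = {f * delta_e:.2e} J")
    ```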

  10. Application of Transformations in Parametric Inference

    ERIC Educational Resources Information Center

    Brownstein, Naomi; Pensky, Marianna

    2008-01-01

    The objective of the present paper is to provide a simple approach to statistical inference using the method of transformations of variables. We demonstrate performance of this powerful tool on examples of constructions of various estimation procedures, hypothesis testing, Bayes analysis and statistical inference for the stress-strength systems.…

  11. Improving validation methods for molecular diagnostics: application of Bland-Altman, Deming and simple linear regression analyses in assay comparison and evaluation for next-generation sequencing.

    PubMed

    Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L

    2018-02-01

    A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R2), using R2 as the primary metric of assay agreement. However, the use of R2 alone does not adequately quantify constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing assays (NGS). NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory.

  12. A Complementary Note to 'A Lag-1 Smoother Approach to System-Error Estimation': The Intrinsic Limitations of Residual Diagnostics

    NASA Technical Reports Server (NTRS)

    Todling, Ricardo

    2015-01-01

    Recently, this author studied an approach to the estimation of system error based on combining observation residuals derived from a sequential filter and fixed lag-1 smoother. While extending the methodology to a variational formulation, experimenting with simple models and making sure consistency was found between the sequential and variational formulations, the limitations of the residual-based approach came clearly to the surface. This note uses the sequential assimilation application to simple nonlinear dynamics to highlight the issue. Only when some of the underlying error statistics are assumed known is it possible to estimate the unknown component. In general, when considerable uncertainties exist in the underlying statistics as a whole, attempts to obtain separate estimates of the various error covariances are bound to lead to misrepresentation of errors. The conclusions are particularly relevant to present-day attempts to estimate observation-error correlations from observation residual statistics. A brief illustration of the issue is also provided by comparing estimates of error correlations derived from a quasi-operational assimilation system and a corresponding Observing System Simulation Experiments framework.

  13. A Simple and Robust Method for Partially Matched Samples Using the P-Values Pooling Approach

    PubMed Central

    Kuan, Pei Fen; Huang, Bo

    2013-01-01

    This paper focuses on statistical analyses in scenarios where some samples from the matched pairs design are missing, resulting in partially matched samples. Motivated by the idea of meta-analysis, we recast the partially matched samples as coming from two experimental designs, and propose a simple yet robust approach based on the weighted Z-test to integrate the p-values computed from these two designs. We show that the proposed approach achieves better operating characteristics in simulations and a case study, compared to existing methods for partially matched samples. PMID:23417968
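
    The weighted Z-test (Stouffer-type) combination at the heart of this approach is easy to state in code. The sketch below is a generic illustration rather than the authors' implementation: the one-sided p-values are placeholders, and square-root-of-sample-size weights are used as one common choice.

    ```python
    import math
    from scipy.stats import norm

    def weighted_z_pvalue(p_values, weights):
        """Combine one-sided p-values with a weighted Z (Stouffer) rule."""
        z_scores = [norm.ppf(1 - p) for p in p_values]
        z_combined = sum(w * z for w, z in zip(weights, z_scores)) / math.sqrt(sum(w**2 for w in weights))
        return 1 - norm.cdf(z_combined)

    # Hypothetical partially matched data: p1 from the paired subset (25 pairs),
    # p2 from the unpaired subset (40 subjects); weights by sqrt of sample size.
    p_paired, p_unpaired = 0.03, 0.20
    w = [math.sqrt(25), math.sqrt(40)]
    print(f"pooled one-sided p = {weighted_z_pvalue([p_paired, p_unpaired], w):.4f}")
    ```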

  14. Reversed inverse regression for the univariate linear calibration and its statistical properties derived using a new methodology

    NASA Astrophysics Data System (ADS)

    Kang, Pilsang; Koo, Changhoi; Roh, Hokyu

    2017-11-01

    Since simple linear regression theory was established at the beginning of the 1900s, it has been used in a variety of fields. Unfortunately, it cannot be used directly for calibration. In practical calibrations, the observed measurements (the inputs) are subject to errors, and hence they vary, thus violating the assumption that the inputs are fixed. Therefore, in the case of calibration, the regression line fitted using the method of least squares is not consistent with the statistical properties of simple linear regression as already established based on this assumption. To resolve this problem, "classical regression" and "inverse regression" have been proposed. However, they do not completely resolve the problem. As a fundamental solution, we introduce "reversed inverse regression" along with a new methodology for deriving its statistical properties. In this study, the statistical properties of this regression are derived using the "error propagation rule" and the "method of simultaneous error equations" and are compared with those of the existing regression approaches. The accuracy of the statistical properties thus derived is investigated in a simulation study. We conclude that the newly proposed regression and methodology constitute the complete regression approach for univariate linear calibrations.

  15. An automated approach to the design of decision tree classifiers

    NASA Technical Reports Server (NTRS)

    Argentiero, P.; Chin, R.; Beaudet, P.

    1982-01-01

    An automated technique is presented for designing effective decision tree classifiers predicated only on a priori class statistics. The procedure relies on linear feature extractions and Bayes table look-up decision rules. Associated error matrices are computed and utilized to provide an optimal design of the decision tree at each so-called 'node'. A by-product of this procedure is a simple algorithm for computing the global probability of correct classification assuming the statistical independence of the decision rules. Attention is given to a more precise definition of decision tree classification, the mathematical details on the technique for automated decision tree design, and an example of a simple application of the procedure using class statistics acquired from an actual Landsat scene.

  16. Self-organization of cosmic radiation pressure instability. II - One-dimensional simulations

    NASA Technical Reports Server (NTRS)

    Hogan, Craig J.; Woods, Jorden

    1992-01-01

    The clustering of statistically uniform discrete absorbing particles moving solely under the influence of radiation pressure from uniformly distributed emitters is studied in a simple one-dimensional model. Radiation pressure tends to amplify statistical clustering in the absorbers; the absorbing material is swept into empty bubbles, the biggest bubbles grow bigger almost as they would in a uniform medium, and the smaller ones get crushed and disappear. Numerical simulations of a one-dimensional system are used to support the conjecture that the system is self-organizing. Simple statistics indicate that a wide range of initial conditions produce structure approaching the same self-similar statistical distribution, whose scaling properties follow those of the attractor solution for an isolated bubble. The importance of the process for large-scale structuring of the interstellar medium is briefly discussed.

  17. Dressed tunneling approximation for electronic transport through molecular transistors

    NASA Astrophysics Data System (ADS)

    Seoane Souto, R.; Yeyati, A. Levy; Martín-Rodero, A.; Monreal, R. C.

    2014-02-01

    A theoretical approach for the nonequilibrium transport properties of nanoscale systems coupled to metallic electrodes with strong electron-phonon interactions is presented. It consists of a resummation of the dominant Feynman diagrams from the perturbative expansion in the coupling to the leads. We show that this scheme eliminates the main pathologies found in previous simple analytical approaches for the polaronic regime. The results for the spectral and transport properties are compared with those from several other approaches for a wide range of parameters. The method can be formulated in a simple way to obtain the full counting statistics. Results for the shot and thermal noise are presented.

  18. NAUSEA and the Principle of Supplementarity of Damping and Isolation in Noise Control.

    DTIC Science & Technology

    1980-02-01

    New approaches and uses of the statistical energy analysis (NAUSEA) have been considered and developed in recent months. The advances were made...possible in that the requirement, in the older statistical energy analysis, that the dynamic systems be highly reverberant and the couplings between the...analytical consideration in terms of the statistical energy analysis (SEA). A brief discussion and simple examples that relate to these recent advances are presented.

  19. A Robust Outlier Approach to Prevent Type I Error Inflation in Differential Item Functioning

    ERIC Educational Resources Information Center

    Magis, David; De Boeck, Paul

    2012-01-01

    The identification of differential item functioning (DIF) is often performed by means of statistical approaches that consider the raw scores as proxies for the ability trait level. One of the most popular approaches, the Mantel-Haenszel (MH) method, belongs to this category. However, replacing the ability level by the simple raw score is a source…

  20. Improving the Crossing-SIBTEST Statistic for Detecting Non-uniform DIF.

    PubMed

    Chalmers, R Philip

    2018-06-01

    This paper demonstrates that, after applying a simple modification to Li and Stout's (Psychometrika 61(4):647-677, 1996) CSIBTEST statistic, an improved variant of the statistic could be realized. It is shown that this modified version of CSIBTEST has a more direct association with the SIBTEST statistic presented by Shealy and Stout (Psychometrika 58(2):159-194, 1993). In particular, the asymptotic sampling distributions and general interpretation of the effect size estimates are the same for SIBTEST and the new CSIBTEST. Given the more natural connection to SIBTEST, it is shown that Li and Stout's hypothesis testing approach is insufficient for CSIBTEST; thus, an improved hypothesis testing procedure is required. Based on the presented arguments, a new chi-squared-based hypothesis testing approach is proposed for the modified CSIBTEST statistic. Positive results from a modest Monte Carlo simulation study strongly suggest the original CSIBTEST procedure and randomization hypothesis testing approach should be replaced by the modified statistic and hypothesis testing method.

  1. Statistical primer: how to deal with missing data in scientific research?

    PubMed

    Papageorgiou, Grigorios; Grant, Stuart W; Takkenberg, Johanna J M; Mokhles, Mostafa M

    2018-05-10

    Missing data are a common challenge encountered in research which can compromise the results of statistical inference when not handled appropriately. This paper aims to introduce basic concepts of missing data to a non-statistical audience, list and compare some of the most popular approaches for handling missing data in practice and provide guidelines and recommendations for dealing with and reporting missing data in scientific research. Complete case analysis and single imputation are simple approaches for handling missing data and are popular in practice; however, in most cases they are not guaranteed to provide valid inferences. Multiple imputation is a robust and general alternative which is appropriate for data missing at random, overcoming the disadvantages of the simpler approaches, but it should always be conducted with care. The aforementioned approaches are illustrated and compared in an example application using Cox regression.
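
    To make the contrast between the simpler strategies concrete, here is a small hedged sketch (toy data, not from the paper) comparing complete-case analysis with single mean imputation for a covariate; multiple imputation would instead repeat the imputation with draws from a predictive model and pool the resulting estimates.

    ```python
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    n = 200
    x = rng.normal(size=n)
    y = 1.0 + 0.5 * x + rng.normal(scale=0.8, size=n)
    x_obs = np.where(rng.random(n) < 0.3, np.nan, x)   # ~30% of x missing completely at random
    df = pd.DataFrame({"x": x_obs, "y": y})

    # Complete-case analysis: drop rows with missing x.
    cc = df.dropna()
    slope_cc = np.polyfit(cc["x"], cc["y"], 1)[0]

    # Single mean imputation: fill missing x with the observed mean
    # (distorts the variance of x and understates uncertainty).
    imp = df.fillna({"x": df["x"].mean()})
    slope_imp = np.polyfit(imp["x"], imp["y"], 1)[0]

    print(f"complete-case slope = {slope_cc:.3f}, mean-imputation slope = {slope_imp:.3f}")
    ```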

  2. Variational Approach in the Theory of Liquid-Crystal State

    NASA Astrophysics Data System (ADS)

    Gevorkyan, E. V.

    2018-03-01

    The variational calculus of Leonhard Euler is a foundation of modern mathematics and theoretical physics. The efficiency of the variational approach in the statistical theory of the liquid-crystal state, and more generally in condensed-state theory, is shown. In particular, the developed approach allows effective pair interactions to be introduced correctly and simple models of liquid crystals to be optimized with the help of realistic intermolecular potentials.

  3. Structure-guided statistical textural distinctiveness for salient region detection in natural images.

    PubMed

    Scharfenberger, Christian; Wong, Alexander; Clausi, David A

    2015-01-01

    We propose a simple yet effective structure-guided statistical textural distinctiveness approach to salient region detection. Our method uses a multilayer approach to analyze the structural and textural characteristics of natural images as important features for salient region detection from a scale point of view. To represent the structural characteristics, we abstract the image using structured image elements and extract rotational-invariant neighborhood-based textural representations to characterize each element by an individual texture pattern. We then learn a set of representative texture atoms for sparse texture modeling and construct a statistical textural distinctiveness matrix to determine the distinctiveness between all representative texture atom pairs in each layer. Finally, we determine saliency maps for each layer based on the occurrence probability of the texture atoms and their respective statistical textural distinctiveness and fuse them to compute a final saliency map. Experimental results using four public data sets and a variety of performance evaluation metrics show that our approach provides promising results when compared with existing salient region detection approaches.

  4. Respiratory Artefact Removal in Forced Oscillation Measurements: A Machine Learning Approach.

    PubMed

    Pham, Thuy T; Thamrin, Cindy; Robinson, Paul D; McEwan, Alistair L; Leong, Philip H W

    2017-08-01

    Respiratory artefact removal for the forced oscillation technique can be treated as an anomaly detection problem. Manual removal is currently considered the gold standard, but this approach is laborious and subjective. Most existing automated techniques used simple statistics and/or rejected anomalous data points. Unfortunately, simple statistics are insensitive to numerous artefacts, leading to low reproducibility of results. Furthermore, rejecting anomalous data points causes an imbalance between the inspiratory and expiratory contributions. From a machine learning perspective, such methods are unsupervised and can be considered simple feature extraction. We hypothesize that supervised techniques can be used to find improved features that are more discriminative and more highly correlated with the desired output. Features thus found are then used for anomaly detection by applying quartile thresholding, which rejects complete breaths if one of its features is out of range. The thresholds are determined by both saliency and performance metrics rather than qualitative assumptions as in previous works. Feature ranking indicates that our new landmark features are among the highest scoring candidates regardless of age across saliency criteria. F1-scores, receiver operating characteristic, and variability of the mean resistance metrics show that the proposed scheme outperforms previous simple feature extraction approaches. Our subject-independent detector, 1IQR-SU, demonstrated approval rates of 80.6% for adults and 98% for children, higher than existing methods. Our new features are more relevant. Our removal is objective and comparable to the manual method. This is a critical work to automate forced oscillation technique quality control.

  5. Masquerade Detection Using a Taxonomy-Based Multinomial Modeling Approach in UNIX Systems

    DTIC Science & Technology

    2008-08-25

    primarily the modeling of statistical features, such as the frequency of events, the duration of events, the co-occurrence of multiple events...are identified, we can extract features representing such behavior while auditing the user's behavior. [Figure 1: Taxonomy of Linux and Unix...] ...achieved when the features are extracted just from simple commands. [Table fragment: Method, Hit Rate, False Positive Rate; ocSVM using simple cmds (freq.-based...)]

  6. Statistical mechanics of simple models of protein folding and design.

    PubMed Central

    Pande, V S; Grosberg, A Y; Tanaka, T

    1997-01-01

    It is now believed that the primary equilibrium aspects of simple models of protein folding are understood theoretically. However, current theories often resort to rather heavy mathematics to overcome some technical difficulties inherent in the problem or start from a phenomenological model. To this end, we take a new approach in this pedagogical review of the statistical mechanics of protein folding. The benefit of our approach is a drastic mathematical simplification of the theory, without resort to any new approximations or phenomenological prescriptions. Indeed, the results we obtain agree precisely with previous calculations. Because of this simplification, we are able to present here a thorough and self-contained treatment of the problem. Topics discussed include the statistical mechanics of the random energy model (REM), tests of the validity of REM as a model for heteropolymer freezing, freezing transition of random sequences, phase diagram of designed ("minimally frustrated") sequences, and the degree to which errors in the interactions employed in simulations of either folding or design can still lead to correct folding behavior. PMID:9414231

  7. A Statistical Ontology-Based Approach to Ranking for Multiword Search

    ERIC Educational Resources Information Center

    Kim, Jinwoo

    2013-01-01

    Keyword search is a prominent data retrieval method for the Web, largely because the simple and efficient nature of keyword processing allows a large amount of information to be searched with fast response. However, keyword search approaches do not formally capture the clear meaning of a keyword query and fail to address the semantic relationships…

  8. Comparison of Unidimensional and Multidimensional Approaches to IRT Parameter Estimation. Research Report. ETS RR-04-44

    ERIC Educational Resources Information Center

    Zhang, Jinming

    2004-01-01

    It is common to assume during statistical analysis of a multiscale assessment that the assessment has simple structure or that it is composed of several unidimensional subtests. Under this assumption, both the unidimensional and multidimensional approaches can be used to estimate item parameters. This paper theoretically demonstrates that these…

  9. A simple implementation of a normal mixture approach to differential gene expression in multiclass microarrays.

    PubMed

    McLachlan, G J; Bean, R W; Jones, L Ben-Tovim

    2006-07-01

    An important problem in microarray experiments is the detection of genes that are differentially expressed in a given number of classes. We provide a straightforward and easily implemented method for estimating the posterior probability that an individual gene is null. The problem can be expressed in a two-component mixture framework, using an empirical Bayes approach. Current methods of implementing this approach either have some limitations due to the minimal assumptions made or with more specific assumptions are computationally intensive. By converting to a z-score the value of the test statistic used to test the significance of each gene, we propose a simple two-component normal mixture that models adequately the distribution of this score. The usefulness of our approach is demonstrated on three real datasets.
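
    A hedged sketch of the general idea (not the authors' implementation): fit a two-component normal mixture to gene-level z-scores and read off each gene's posterior probability of belonging to the null component. The data are simulated.

    ```python
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(1)
    # Simulated z-scores: 90% null genes ~ N(0,1), 10% differentially expressed ~ N(3,1).
    z = np.concatenate([rng.normal(0, 1, 900), rng.normal(3, 1, 100)]).reshape(-1, 1)

    gm = GaussianMixture(n_components=2, random_state=0).fit(z)
    null_comp = int(np.argmin(np.abs(gm.means_.ravel())))   # component centred nearest 0

    # Posterior probability that each gene is null; small values flag candidates.
    post_null = gm.predict_proba(z)[:, null_comp]
    print(f"genes with P(null) < 0.1: {(post_null < 0.1).sum()}")
    ```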

  10. IT Control Deficiencies That Affect the Financial Reporting of Companies since the Enactment of the Sarbanes Oxley Act

    ERIC Educational Resources Information Center

    Harper, Roosevelt

    2014-01-01

    This research study examined the specific categories of IT control deficiencies and their related effects on financial reporting. The approach to this study was considered non-experimental, an approach sometimes called descriptive. Descriptive statistics are used to describe the basic features of the data in a study, providing simple summaries…

  11. On the (In)Validity of Tests of Simple Mediation: Threats and Solutions

    PubMed Central

    Pek, Jolynn; Hoyle, Rick H.

    2015-01-01

    Mediation analysis is a popular framework for identifying underlying mechanisms in social psychology. In the context of simple mediation, we review and discuss the implications of three facets of mediation analysis: (a) conceptualization of the relations between the variables, (b) statistical approaches, and (c) relevant elements of design. We also highlight the issue of equivalent models that are inherent in simple mediation. The extent to which results are meaningful stems directly from choices regarding these three facets of mediation analysis. We conclude by discussing how mediation analysis can be better applied to examine causal processes, highlight the limits of simple mediation, and make recommendations for better practice. PMID:26985234

  12. Information Entropy Production of Maximum Entropy Markov Chains from Spike Trains

    NASA Astrophysics Data System (ADS)

    Cofré, Rodrigo; Maldonado, Cesar

    2018-01-01

    We consider the maximum entropy Markov chain inference approach to characterize the collective statistics of neuronal spike trains, focusing on the statistical properties of the inferred model. We review large deviations techniques useful in this context to describe properties of accuracy and convergence in terms of sampling size. We use these results to study the statistical fluctuation of correlations, distinguishability and irreversibility of maximum entropy Markov chains. We illustrate these applications using simple examples where the large deviation rate function is explicitly obtained for maximum entropy models of relevance in this field.

  13. Statistical Emulation of Climate Model Projections Based on Precomputed GCM Runs*

    DOE PAGES

    Castruccio, Stefano; McInerney, David J.; Stein, Michael L.; ...

    2014-02-24

    The authors describe a new approach for emulating the output of a fully coupled climate model under arbitrary forcing scenarios that is based on a small set of precomputed runs from the model. Temperature and precipitation are expressed as simple functions of the past trajectory of atmospheric CO2 concentrations, and a statistical model is fit using a limited set of training runs. The approach is demonstrated to be a useful and computationally efficient alternative to pattern scaling and captures the nonlinear evolution of spatial patterns of climate anomalies inherent in transient climates. The approach does as well as pattern scaling in all circumstances and substantially better in many; it is not computationally demanding; and, once the statistical model is fit, it produces emulated climate output effectively instantaneously. It may therefore find wide application in climate impacts assessments and other policy analyses requiring rapid climate projections.

  14. Using ontology network structure in text mining.

    PubMed

    Berndt, Donald J; McCart, James A; Luther, Stephen L

    2010-11-13

    Statistical text mining treats documents as bags of words, with a focus on term frequencies within documents and across document collections. Unlike natural language processing (NLP) techniques that rely on an engineered vocabulary or a full-featured ontology, statistical approaches do not make use of domain-specific knowledge. The freedom from biases can be an advantage, but at the cost of ignoring potentially valuable knowledge. The approach proposed here investigates a hybrid strategy based on computing graph measures of term importance over an entire ontology and injecting the measures into the statistical text mining process. As a starting point, we adapt existing search engine algorithms such as PageRank and HITS to determine term importance within an ontology graph. The graph-theoretic approach is evaluated using a smoking data set from the i2b2 National Center for Biomedical Computing, cast as a simple binary classification task for categorizing smoking-related documents, demonstrating consistent improvements in accuracy.
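
    A small, hedged sketch of the hybrid idea (using networkx rather than the authors' system): compute PageRank over a toy ontology graph and use the scores to re-weight raw term frequencies before classification. The graph and frequencies are invented.

    ```python
    import networkx as nx

    # Toy ontology fragment: edges point from broader to narrower concepts.
    ontology = nx.DiGraph([
        ("substance_use", "tobacco"), ("tobacco", "smoking"),
        ("tobacco", "nicotine"), ("smoking", "cigarette"),
    ])

    # Graph-theoretic term importance (PageRank); HITS could be used similarly.
    importance = nx.pagerank(ontology)

    # Raw term frequencies for one document (bag of words).
    tf = {"smoking": 4, "cigarette": 2, "nicotine": 1, "aspirin": 3}

    # Inject ontology knowledge: weight each frequency by the term's graph score.
    weighted = {t: f * importance.get(t, 0.0) for t, f in tf.items()}
    print(weighted)
    ```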

  15. A Geometrical Approach to Bell's Theorem

    NASA Technical Reports Server (NTRS)

    Rubincam, David Parry

    2000-01-01

    Bell's theorem can be proved through simple geometrical reasoning, without the need for the Psi function, probability distributions, or calculus. The proof is based on N. David Mermin's explication of the Einstein-Podolsky-Rosen-Bohm experiment, which involves Stern-Gerlach detectors which flash red or green lights when detecting spin-up or spin-down. The statistics of local hidden variable theories for this experiment can be arranged in colored strips from which simple inequalities can be deduced. These inequalities lead to a demonstration of Bell's theorem. Moreover, all local hidden variable theories can be graphed in such a way as to enclose their statistics in a pyramid, with the quantum-mechanical result lying a finite distance beneath the base of the pyramid.

  16. Renormalization Group Tutorial

    NASA Technical Reports Server (NTRS)

    Bell, Thomas L.

    2004-01-01

    Complex physical systems sometimes have statistical behavior characterized by power- law dependence on the parameters of the system and spatial variability with no particular characteristic scale as the parameters approach critical values. The renormalization group (RG) approach was developed in the fields of statistical mechanics and quantum field theory to derive quantitative predictions of such behavior in cases where conventional methods of analysis fail. Techniques based on these ideas have since been extended to treat problems in many different fields, and in particular, the behavior of turbulent fluids. This lecture will describe a relatively simple but nontrivial example of the RG approach applied to the diffusion of photons out of a stellar medium when the photons have wavelengths near that of an emission line of atoms in the medium.

  17. Additivity of nonlinear biomass equations

    Treesearch

    Bernard R. Parresol

    2001-01-01

    Two procedures that guarantee the property of additivity among the components of tree biomass and total tree biomass utilizing nonlinear functions are developed. Procedure 1 is a simple combination approach, and procedure 2 is based on nonlinear joint-generalized regression (nonlinear seemingly unrelated regressions) with parameter restrictions. Statistical theory is...

  18. Mechanics and statistics of the worm-like chain

    NASA Astrophysics Data System (ADS)

    Marantan, Andrew; Mahadevan, L.

    2018-02-01

    The worm-like chain model is a simple continuum model for the statistical mechanics of a flexible polymer subject to an external force. We offer a tutorial introduction to it using three approaches. First, we use a mesoscopic view, treating a long polymer (in two dimensions) as though it were made of many groups of correlated links or "clinks," allowing us to calculate its average extension as a function of the external force via scaling arguments. We then provide a standard statistical mechanics approach, obtaining the average extension by two different means: the equipartition theorem and the partition function. Finally, we work in a probabilistic framework, taking advantage of the Gaussian properties of the chain in the large-force limit to improve upon the previous calculations of the average extension.

  19. Empirical likelihood-based tests for stochastic ordering

    PubMed Central

    BARMI, HAMMOU EL; MCKEAGUE, IAN W.

    2013-01-01

    This paper develops an empirical likelihood approach to testing for the presence of stochastic ordering among univariate distributions based on independent random samples from each distribution. The proposed test statistic is formed by integrating a localized empirical likelihood statistic with respect to the empirical distribution of the pooled sample. The asymptotic null distribution of this test statistic is found to have a simple distribution-free representation in terms of standard Brownian bridge processes. The approach is used to compare the lengths of rule of Roman Emperors over various historical periods, including the “decline and fall” phase of the empire. In a simulation study, the power of the proposed test is found to improve substantially upon that of a competing test due to El Barmi and Mukerjee. PMID:23874142

  20. Prediction of the dollar to the ruble rate. A system-theoretic approach

    NASA Astrophysics Data System (ADS)

    Borodachev, Sergey M.

    2017-07-01

    A simple state-space model of dollar-rate formation is proposed, based on changes in oil prices and some mechanisms of money transfer between the monetary and stock markets. Predictions from an input-output model and from the state-space model are compared. It is concluded that, with proper use of the statistical data (via a Kalman filter), the second approach provides more adequate predictions of the dollar rate.

  1. Statistical power for nonequivalent pretest-posttest designs. The impact of change-score versus ANCOVA models.

    PubMed

    Oakes, J M; Feldman, H A

    2001-02-01

    Nonequivalent controlled pretest-posttest designs are central to evaluation science, yet no practical and unified approach for estimating power in the two most widely used analytic approaches to these designs exists. This article fills the gap by presenting and comparing useful, unified power formulas for ANCOVA and change-score analyses, indicating the implications of each on sample-size requirements. The authors close with practical recommendations for evaluators. Mathematical details and a simple spreadsheet approach are included in appendices.
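
    As a back-of-the-envelope companion (standard textbook factors, not the authors' specific formulas), the sketch below compares the sample-size inflation factors commonly quoted for the two analyses relative to a posttest-only comparison: roughly 2(1 - rho) for change scores and 1 - rho^2 for ANCOVA, where rho is the pretest-posttest correlation.

    ```python
    # Relative sample-size factors (vs a posttest-only design) as a function of
    # the pretest-posttest correlation rho. Smaller means fewer subjects needed.
    def change_score_factor(rho):
        return 2 * (1 - rho)

    def ancova_factor(rho):
        return 1 - rho**2

    for rho in (0.2, 0.5, 0.8):
        print(f"rho={rho:.1f}: change score x{change_score_factor(rho):.2f}, "
              f"ANCOVA x{ancova_factor(rho):.2f}")
    ```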

  2. A Deterministic Annealing Approach to Clustering AIRS Data

    NASA Technical Reports Server (NTRS)

    Guillaume, Alexandre; Braverman, Amy; Ruzmaikin, Alexander

    2012-01-01

    We will examine the validity of means and standard deviations as a basis for climate data products. We will explore the conditions under which these two simple statistics are inadequate summaries of the underlying empirical probability distributions by contrasting them with a nonparametric method called the deterministic annealing technique.

  3. Sample Size Estimation: The Easy Way

    ERIC Educational Resources Information Center

    Weller, Susan C.

    2015-01-01

    This article presents a simple approach to making quick sample size estimates for basic hypothesis tests. Although there are many sources available for estimating sample sizes, methods are not often integrated across statistical tests, levels of measurement of variables, or effect sizes. A few parameters are required to estimate sample sizes and…

  4. Chaotic oscillations and noise transformations in a simple dissipative system with delayed feedback

    NASA Astrophysics Data System (ADS)

    Zverev, V. V.; Rubinstein, B. Ya.

    1991-04-01

    We analyze the statistical behavior of signals in nonlinear circuits with delayed feedback in the presence of external Markovian noise. For the special class of circuits with intense phase mixing we develop an approach for the computation of the probability distributions and multitime correlation functions based on the random phase approximation. Both Gaussian and Kubo-Andersen models of external noise statistics are analyzed and the existence of the stationary (asymptotic) random process in the long-time limit is shown. We demonstrate that a nonlinear system with chaotic behavior becomes a noise amplifier with specific statistical transformation properties.

  5. On the Stability of Jump-Linear Systems Driven by Finite-State Machines with Markovian Inputs

    NASA Technical Reports Server (NTRS)

    Patilkulkarni, Sudarshan; Herencia-Zapana, Heber; Gray, W. Steven; Gonzalez, Oscar R.

    2004-01-01

    This paper presents two mean-square stability tests for a jump-linear system driven by a finite-state machine with a first-order Markovian input process. The first test is based on conventional Markov jump-linear theory and avoids the use of any higher-order statistics. The second test is developed directly using the higher-order statistics of the machine's output process. The two approaches are illustrated with a simple model for a recoverable computer control system.

  6. Of pacemakers and statistics: the actuarial method extended.

    PubMed

    Dussel, J; Wolbarst, A B; Scott-Millar, R N; Obel, I W

    1980-01-01

    Pacemakers cease functioning because of either natural battery exhaustion (nbe) or component failure (cf). A study of four series of pacemakers shows that a simple extension of the actuarial method, so as to incorporate Normal statistics, makes possible a quantitative differentiation between the two modes of failure. This involves the separation of the overall failure probability density function PDF(t) into constituent parts pdfnbe(t) and pdfcf(t). The approach should allow a meaningful comparison of the characteristics of different pacemaker types.

  7. Vehicle track segmentation using higher order random fields

    DOE PAGES

    Quach, Tu -Thach

    2017-01-09

    Here, we present an approach to segment vehicle tracks in coherent change detection images, a product of combining two synthetic aperture radar images taken at different times. The approach uses multiscale higher order random field models to capture track statistics, such as curvatures and their parallel nature, that are not currently utilized in existing methods. These statistics are encoded as 3-by-3 patterns at different scales. The model can complete disconnected tracks often caused by sensor noise and various environmental effects. Coupling the model with a simple classifier, our approach is effective at segmenting salient tracks. We improve the F-measure on a standard vehicle track data set to 0.963, up from 0.897 obtained by the current state-of-the-art method.

  8. Biosignature Discovery for Substance Use Disorders Using Statistical Learning.

    PubMed

    Baurley, James W; McMahan, Christopher S; Ervin, Carolyn M; Pardamean, Bens; Bergen, Andrew W

    2018-02-01

    There are limited biomarkers for substance use disorders (SUDs). Traditional statistical approaches are identifying simple biomarkers in large samples, but clinical use cases are still being established. High-throughput clinical, imaging, and 'omic' technologies are generating data from SUD studies and may lead to more sophisticated and clinically useful models. However, analytic strategies suited for high-dimensional data are not regularly used. We review strategies for identifying biomarkers and biosignatures from high-dimensional data types. Focusing on penalized regression and Bayesian approaches, we address how to leverage evidence from existing studies and knowledge bases, using nicotine metabolism as an example. We posit that big data and machine learning approaches will considerably advance SUD biomarker discovery. However, translation to clinical practice will require integrated scientific efforts.
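
    As a hedged illustration of the penalized-regression strategy mentioned above (not the authors' pipeline), the sketch fits an L1-penalized logistic regression to simulated high-dimensional 'omic'-style features and keeps the nonzero coefficients as a candidate biosignature.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    n, p = 120, 500                       # few samples, many features
    X = rng.normal(size=(n, p))
    true_beta = np.zeros(p)
    true_beta[:5] = 1.5                   # only 5 features truly informative
    y = (X @ true_beta + rng.normal(size=n) > 0).astype(int)

    # L1 penalty performs embedded feature selection.
    model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
    selected = np.flatnonzero(model.coef_)
    print(f"selected features: {selected.tolist()}")
    ```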

  9. Vehicle track segmentation using higher order random fields

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Quach, Tu -Thach

    Here, we present an approach to segment vehicle tracks in coherent change detection images, a product of combining two synthetic aperture radar images taken at different times. The approach uses multiscale higher order random field models to capture track statistics, such as curvatures and their parallel nature, that are not currently utilized in existing methods. These statistics are encoded as 3-by-3 patterns at different scales. The model can complete disconnected tracks often caused by sensor noise and various environmental effects. Coupling the model with a simple classifier, our approach is effective at segmenting salient tracks. We improve the F-measure on a standard vehicle track data set to 0.963, up from 0.897 obtained by the current state-of-the-art method.

  10. A Diagrammatic Exposition of Regression and Instrumental Variables for the Beginning Student

    ERIC Educational Resources Information Center

    Foster, Gigi

    2009-01-01

    Some beginning students of statistics and econometrics have difficulty with traditional algebraic approaches to explaining regression and related techniques. For these students, a simple and intuitive diagrammatic introduction as advocated by Kennedy (2008) may prove a useful framework to support further study. The author presents a series of…

  11. Revisiting the Scale-Invariant, Two-Dimensional Linear Regression Method

    ERIC Educational Resources Information Center

    Patzer, A. Beate C.; Bauer, Hans; Chang, Christian; Bolte, Jan; Sülzle, Detlev

    2018-01-01

    The scale-invariant way to analyze two-dimensional experimental and theoretical data with statistical errors in both the independent and dependent variables is revisited by using what we call the triangular linear regression method. This is compared to the standard least-squares fit approach by applying it to typical simple sets of example data…

  12. Estimating annual bole biomass production using uncertainty analysis

    Treesearch

    Travis J. Woolley; Mark E. Harmon; Kari B. O' Connell

    2007-01-01

    Two common sampling methodologies coupled with a simple statistical model were evaluated to determine the accuracy and precision of annual bole biomass production (BBP) and inter-annual variability estimates using this type of approach. We performed an uncertainty analysis using Monte Carlo methods in conjunction with radial growth core data from trees in three Douglas...

  13. Simulating Metabolism with Statistical Thermodynamics

    PubMed Central

    Cannon, William R.

    2014-01-01

    New methods are needed for large scale modeling of metabolism that predict metabolite levels and characterize the thermodynamics of individual reactions and pathways. Current approaches use either kinetic simulations, which are difficult to extend to large networks of reactions because of the need for rate constants, or flux-based methods, which have a large number of feasible solutions because they are unconstrained by the law of mass action. This report presents an alternative modeling approach based on statistical thermodynamics. The principles of this approach are demonstrated using a simple set of coupled reactions, and then the system is characterized with respect to the changes in energy, entropy, free energy, and entropy production. Finally, the physical and biochemical insights that this approach can provide for metabolism are demonstrated by application to the tricarboxylic acid (TCA) cycle of Escherichia coli. The reaction and pathway thermodynamics are evaluated and predictions are made regarding changes in concentration of TCA cycle intermediates due to 10- and 100-fold changes in the ratio of NAD+:NADH concentrations. Finally, the assumptions and caveats regarding the use of statistical thermodynamics to model non-equilibrium reactions are discussed. PMID:25089525
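
    To make the statistical-thermodynamic viewpoint tangible, here is a minimal sketch (toy numbers, not the paper's TCA-cycle model) that converts standard reaction free energies into Boltzmann weights and equilibrium probabilities for a simple set of competing states.

    ```python
    import numpy as np

    R = 8.314      # gas constant, J/(mol*K)
    T = 298.15     # temperature, K

    # Hypothetical standard free energies (J/mol) of three competing states.
    delta_g = np.array([-5000.0, 0.0, 3000.0])

    # Boltzmann weights and normalized equilibrium probabilities.
    weights = np.exp(-delta_g / (R * T))
    probs = weights / weights.sum()

    for g, p in zip(delta_g, probs):
        print(f"dG = {g:8.1f} J/mol -> equilibrium probability {p:.3f}")
    ```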

  14. Simulating metabolism with statistical thermodynamics.

    PubMed

    Cannon, William R

    2014-01-01

    New methods are needed for large scale modeling of metabolism that predict metabolite levels and characterize the thermodynamics of individual reactions and pathways. Current approaches use either kinetic simulations, which are difficult to extend to large networks of reactions because of the need for rate constants, or flux-based methods, which have a large number of feasible solutions because they are unconstrained by the law of mass action. This report presents an alternative modeling approach based on statistical thermodynamics. The principles of this approach are demonstrated using a simple set of coupled reactions, and then the system is characterized with respect to the changes in energy, entropy, free energy, and entropy production. Finally, the physical and biochemical insights that this approach can provide for metabolism are demonstrated by application to the tricarboxylic acid (TCA) cycle of Escherichia coli. The reaction and pathway thermodynamics are evaluated and predictions are made regarding changes in concentration of TCA cycle intermediates due to 10- and 100-fold changes in the ratio of NAD+:NADH concentrations. Finally, the assumptions and caveats regarding the use of statistical thermodynamics to model non-equilibrium reactions are discussed.

  15. Conditional maximum-entropy method for selecting prior distributions in Bayesian statistics

    NASA Astrophysics Data System (ADS)

    Abe, Sumiyoshi

    2014-11-01

    The conditional maximum-entropy method (abbreviated here as C-MaxEnt) is formulated for selecting prior probability distributions in Bayesian statistics for parameter estimation. This method is inspired by a statistical-mechanical approach to systems governed by dynamics with largely separated time scales and is based on three key concepts: conjugate pairs of variables, dimensionless integration measures with coarse-graining factors and partial maximization of the joint entropy. The method enables one to calculate a prior purely from a likelihood in a simple way. It is shown, in particular, how it not only yields Jeffreys's rules but also reveals new structures hidden behind them.

  16. On a simple molecular–statistical model of a liquid-crystal suspension of anisometric particles

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zakhlevnykh, A. N., E-mail: anz@psu.ru; Lubnin, M. S.; Petrov, D. A.

    2016-11-15

    A molecular–statistical mean-field theory is constructed for suspensions of anisometric particles in nematic liquid crystals (NLCs). The spherical approximation, well known in the physics of ferromagnetic materials, is considered; it allows one to obtain an analytic expression for the free energy and simple equations for the orientational state of a suspension that describe the temperature dependence of the order parameters of the suspension components. The transition temperature from ordered to isotropic state and the jumps in the order parameters at the phase-transition point are studied as a function of the anchoring energy of dispersed particles to the matrix, the concentration of the impurity phase, and the size of particles. The proposed approach allows one to generalize the model to the case of biaxial ordering.

  17. A common base method for analysis of qPCR data and the application of simple blocking in qPCR experiments.

    PubMed

    Ganger, Michael T; Dietz, Geoffrey D; Ewing, Sarah J

    2017-12-01

    qPCR has established itself as the technique of choice for the quantification of gene expression. Procedures for conducting qPCR have received significant attention; however, more rigorous approaches to the statistical analysis of qPCR data are needed. Here we develop a mathematical model, termed the Common Base Method, for analysis of qPCR data based on threshold cycle values (Cq) and efficiencies of reactions (E). The Common Base Method keeps all calculations in the log scale as long as possible by working with log10(E)·Cq, which we call the efficiency-weighted Cq value; subsequent statistical analyses are then applied in the log scale. We show how efficiency-weighted Cq values may be analyzed using a simple paired or unpaired experimental design and develop blocking methods to help reduce unexplained variation. The Common Base Method has several advantages. It allows for the incorporation of well-specific efficiencies and multiple reference genes. The method does not necessitate the pairing of samples that must be performed using traditional analysis methods in order to calculate relative expression ratios. Our method is also simple enough to be implemented in any spreadsheet or statistical software without additional scripts or proprietary components.
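
    A rough sketch of the kind of log-scale bookkeeping described above (a simplified reading, not the authors' published workflow): compute efficiency-weighted Cq values, log10(E)·Cq, for target and reference genes in control and treated samples, and compare the group differences in the log scale. All well data, efficiencies, and group labels here are invented.

    ```python
    import numpy as np
    from scipy import stats

    def efficiency_weighted_cq(e, cq):
        """Efficiency-weighted Cq: log10(E) * Cq, keeping values in the log scale."""
        return np.log10(e) * np.asarray(cq)

    # Hypothetical per-gene amplification efficiencies and per-well Cq values.
    e_target, e_ref = 1.94, 1.98
    ctrl = efficiency_weighted_cq(e_ref, [18.1, 18.3, 18.0]) - efficiency_weighted_cq(e_target, [24.0, 24.2, 23.9])
    trt  = efficiency_weighted_cq(e_ref, [18.2, 18.0, 18.4]) - efficiency_weighted_cq(e_target, [22.5, 22.7, 22.4])

    # Each value is a log10 relative-expression estimate; compare groups in the log scale.
    t, p = stats.ttest_ind(trt, ctrl)
    print(f"mean log10 fold change = {trt.mean() - ctrl.mean():.3f}, p = {p:.3f}")
    ```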

  18. A computational visual saliency model based on statistics and machine learning.

    PubMed

    Lin, Ru-Je; Lin, Wei-Song

    2014-08-01

    Identifying the type of stimuli that attracts human visual attention has been an appealing topic for scientists for many years. In particular, marking the salient regions in images is useful for both psychologists and many computer vision applications. In this paper, we propose a computational approach for producing saliency maps using statistics and machine learning methods. Based on four assumptions, three properties (Feature-Prior, Position-Prior, and Feature-Distribution) can be derived and combined by a simple intersection operation to obtain a saliency map. These properties are implemented by a similarity computation, support vector regression (SVR) technique, statistical analysis of training samples, and information theory using low-level features. This technique is able to learn the preferences of human visual behavior while simultaneously considering feature uniqueness. Experimental results show that our approach performs better in predicting human visual attention regions than 12 other models in two test databases.

  19. DETECTORS AND EXPERIMENTAL METHODS: Heuristic approach for peak regions estimation in gamma-ray spectra measured by a NaI detector

    NASA Astrophysics Data System (ADS)

    Zhu, Meng-Hua; Liu, Liang-Gang; You, Zhong; Xu, Ao-Ao

    2009-03-01

    In this paper, a heuristic approach based on Slavic's peak searching method has been employed to estimate the width of peak regions for background removal. Synthetic and experimental data are used to test this method. Using the peak regions estimated by the proposed method over the whole spectrum, we find it simple and effective enough to be used together with the Statistics-sensitive Nonlinear Iterative Peak-Clipping method.

  20. Impact resistance of fiber composites - Energy-absorbing mechanisms and environmental effects

    NASA Technical Reports Server (NTRS)

    Chamis, C. C.; Sinclair, J. H.

    1985-01-01

    Energy absorbing mechanisms were identified by several approaches. The energy absorbing mechanisms considered are those in unidirectional composite beams subjected to impact. The approaches used include: mechanistic models, statistical models, transient finite element analysis, and simple beam theory. Predicted results are correlated with experimental data from Charpy impact tests. The environmental effects on impact resistance are evaluated. Working definitions for energy absorbing and energy releasing mechanisms are proposed and a dynamic fracture progression is outlined. Possible generalizations to angle-plied laminates are described.

  1. Impact resistance of fiber composites: Energy absorbing mechanisms and environmental effects

    NASA Technical Reports Server (NTRS)

    Chamis, C. C.; Sinclair, J. H.

    1983-01-01

    Energy absorbing mechanisms were identified by several approaches. The energy absorbing mechanisms considered are those in unidirectional composite beams subjected to impact. The approaches used include: mechanistic models, statistical models, transient finite element analysis, and simple beam theory. Predicted results are correlated with experimental data from Charpy impact tests. The environmental effects on impact resistance are evaluated. Working definitions for energy absorbing and energy releasing mechanisms are proposed and a dynamic fracture progression is outlined. Possible generalizations to angle-plied laminates are described.

  2. Variation in reaction norms: Statistical considerations and biological interpretation.

    PubMed

    Morrissey, Michael B; Liefting, Maartje

    2016-09-01

    Analysis of reaction norms, the functions by which the phenotype produced by a given genotype depends on the environment, is critical to studying many aspects of phenotypic evolution. Different techniques are available for quantifying different aspects of reaction norm variation. We examine what biological inferences can be drawn from some of the more readily applicable analyses for studying reaction norms. We adopt a strongly biologically motivated view, but draw on statistical theory to highlight strengths and drawbacks of different techniques. In particular, consideration of some formal statistical theory leads to revision of some recently, and forcefully, advocated opinions on reaction norm analysis. We clarify what simple analysis of the slope between mean phenotype in two environments can tell us about reaction norms, explore the conditions under which polynomial regression can provide robust inferences about reaction norm shape, and explore how different existing approaches may be used to draw inferences about variation in reaction norm shape. We show how mixed model-based approaches can provide more robust inferences than more commonly used multistep statistical approaches, and derive new metrics of the relative importance of variation in reaction norm intercepts, slopes, and curvatures.

  3. Conformity and statistical tolerancing

    NASA Astrophysics Data System (ADS)

    Leblond, Laurent; Pillet, Maurice

    2018-02-01

    Statistical tolerancing was first proposed by Shewhart (Economic Control of Quality of Manufactured Product, 1931; reprinted 1980 by ASQC). In spite of this long history, its use remains moderate. One probable reason for this low utilization is the difficulty for designers to anticipate the risks of the approach. Arithmetic (worst-case) tolerancing allows a simple interpretation: conformity is defined by the presence of the characteristic in an interval. Statistical tolerancing is more complex in its definition; an interval is not sufficient to define conformance. To justify the statistical tolerancing formula used by designers, a tolerance interval should be interpreted as the interval where most of the parts produced should probably be located. This tolerance is justified by considering a conformity criterion for the parts that guarantees low offsets on the latter characteristics. Unlike traditional arithmetic tolerancing, statistical tolerancing requires a sustained exchange of information between design and manufacture to be used safely. This paper proposes a formal definition of conformity, which we apply successively to quadratic and arithmetic tolerancing. We introduce a concept of concavity, which helps us demonstrate the link between the tolerancing approach and conformity. We use this concept to demonstrate the various acceptable propositions of statistical tolerancing in the decentring-dispersion space.
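    For readers unfamiliar with the two tolerancing styles being compared, the Python sketch below contrasts the arithmetic (worst-case) stack-up of a set of component tolerances with the quadratic (root-sum-square) stack-up that underlies statistical tolerancing; the component values are invented, and the paper's formal conformity criterion is not reproduced.

```python
import math

# toy stack of five component tolerances (each ±t_i around its nominal)
tolerances = [0.10, 0.05, 0.08, 0.12, 0.06]

# arithmetic (worst-case) stack-up: tolerances simply add
t_arithmetic = sum(tolerances)

# quadratic (statistical, root-sum-square) stack-up: independent, centred
# component deviations are assumed, so variances add
t_statistical = math.sqrt(sum(t**2 for t in tolerances))

print(f"worst-case assembly tolerance : ±{t_arithmetic:.3f}")
print(f"statistical (RSS) tolerance   : ±{t_statistical:.3f}")
```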

  4. Universality classes of fluctuation dynamics in hierarchical complex systems

    NASA Astrophysics Data System (ADS)

    Macêdo, A. M. S.; González, Iván R. Roa; Salazar, D. S. P.; Vasconcelos, G. L.

    2017-03-01

    A unified approach is proposed to describe the statistics of the short-time dynamics of multiscale complex systems. The probability density function of the relevant time series (signal) is represented as a statistical superposition of a large time-scale distribution weighted by the distribution of certain internal variables that characterize the slowly changing background. The dynamics of the background is formulated as a hierarchical stochastic model whose form is derived from simple physical constraints, which in turn restrict the dynamics to only two possible classes. The probability distributions of both the signal and the background have simple representations in terms of Meijer G functions. The two universality classes for the background dynamics manifest themselves in the signal distribution as two types of tails: power law and stretched exponential, respectively. A detailed analysis of empirical data from classical turbulence and financial markets shows excellent agreement with the theory.

  5. Simulation methods to estimate design power: an overview for applied research.

    PubMed

    Arnold, Benjamin F; Hogan, Daniel R; Colford, John M; Hubbard, Alan E

    2011-06-20

    Estimating the required sample size and statistical power for a study is an integral part of study design. For standard designs, power equations provide an efficient solution to the problem, but they are unavailable for many complex study designs that arise in practice. For such complex study designs, computer simulation is a useful alternative for estimating study power. Although this approach is well known among statisticians, in our experience many epidemiologists and social scientists are unfamiliar with the technique. This article aims to address this knowledge gap. We review an approach to estimate study power for individual- or cluster-randomized designs using computer simulation. This flexible approach arises naturally from the model used to derive conventional power equations, but extends those methods to accommodate arbitrarily complex designs. The method is universally applicable to a broad range of designs and outcomes, and we present the material in a way that is approachable for quantitative, applied researchers. We illustrate the method using two examples (one simple, one complex) based on sanitation and nutritional interventions to improve child growth. We first show how simulation reproduces conventional power estimates for simple randomized designs over a broad range of sample scenarios to familiarize the reader with the approach. We then demonstrate how to extend the simulation approach to more complex designs. Finally, we discuss extensions to the examples in the article, and provide computer code to efficiently run the example simulations in both R and Stata. Simulation methods offer a flexible option to estimate statistical power for standard and non-traditional study designs and parameters of interest. The approach we have described is universally applicable for evaluating study designs used in epidemiologic and social science research.

  6. Simulation methods to estimate design power: an overview for applied research

    PubMed Central

    2011-01-01

    Background Estimating the required sample size and statistical power for a study is an integral part of study design. For standard designs, power equations provide an efficient solution to the problem, but they are unavailable for many complex study designs that arise in practice. For such complex study designs, computer simulation is a useful alternative for estimating study power. Although this approach is well known among statisticians, in our experience many epidemiologists and social scientists are unfamiliar with the technique. This article aims to address this knowledge gap. Methods We review an approach to estimate study power for individual- or cluster-randomized designs using computer simulation. This flexible approach arises naturally from the model used to derive conventional power equations, but extends those methods to accommodate arbitrarily complex designs. The method is universally applicable to a broad range of designs and outcomes, and we present the material in a way that is approachable for quantitative, applied researchers. We illustrate the method using two examples (one simple, one complex) based on sanitation and nutritional interventions to improve child growth. Results We first show how simulation reproduces conventional power estimates for simple randomized designs over a broad range of sample scenarios to familiarize the reader with the approach. We then demonstrate how to extend the simulation approach to more complex designs. Finally, we discuss extensions to the examples in the article, and provide computer code to efficiently run the example simulations in both R and Stata. Conclusions Simulation methods offer a flexible option to estimate statistical power for standard and non-traditional study designs and parameters of interest. The approach we have described is universally applicable for evaluating study designs used in epidemiologic and social science research. PMID:21689447
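    The two records above (the journal and PubMed Central versions of the same overview) describe estimating power by repeatedly simulating a study and counting how often the null is rejected. The Python sketch below applies that idea to the simplest case, a two-arm comparison of means analysed with a t-test; the effect size, standard deviation and sample size are placeholders, and the paper's own worked examples and R/Stata code are not reproduced here.

```python
import numpy as np
from scipy import stats

def simulated_power(n_per_arm, effect, sd, n_sims=2000, alpha=0.05, seed=0):
    """Estimate power for a two-arm comparison of means by repeated simulation."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, sd, n_per_arm)
        treated = rng.normal(effect, sd, n_per_arm)
        _, p = stats.ttest_ind(treated, control)
        rejections += p < alpha
    return rejections / n_sims

# e.g. a 0.3 SD difference in a growth outcome, 100 children per arm
print(simulated_power(n_per_arm=100, effect=0.3, sd=1.0))
```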

  7. Information categorization approach to literary authorship disputes

    NASA Astrophysics Data System (ADS)

    Yang, Albert C.-C.; Peng, C.-K.; Yien, H.-W.; Goldberger, Ary L.

    2003-11-01

    Scientific analysis of the linguistic styles of different authors has generated considerable interest. We present a generic approach to measuring the similarity of two symbolic sequences that requires minimal background knowledge about a given human language. Our analysis is based on word rank order-frequency statistics and phylogenetic tree construction. We demonstrate the applicability of this method to historic authorship questions related to the classic Chinese novel “The Dream of the Red Chamber,” to the plays of William Shakespeare, and to the Federalist papers. This method may also provide a simple approach to other large databases based on their information content.
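    The abstract above builds authorship trees from word rank-order-frequency statistics. As a loose illustration of that kind of statistic (not the authors' actual similarity measure), the Python sketch below ranks words by frequency in two texts and averages the rank differences of the shared words; a matrix of such distances over many texts could then be fed to any tree-building routine.

```python
from collections import Counter

def rank_order(text, top=200):
    """Map each of the `top` most frequent words to its frequency rank."""
    counts = Counter(text.lower().split())
    return {w: r for r, (w, _) in enumerate(counts.most_common(top))}

def rank_distance(text_a, text_b, top=200):
    """Average rank difference over words common to both top-frequency lists."""
    ra, rb = rank_order(text_a, top), rank_order(text_b, top)
    shared = set(ra) & set(rb)
    if not shared:
        return float("inf")
    return sum(abs(ra[w] - rb[w]) for w in shared) / len(shared)

# texts with similar word-usage rankings yield small distances
print(rank_distance("the quick brown fox the fox", "the lazy brown dog the dog"))
```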

  8. Modeling epidemics on adaptively evolving networks: A data-mining perspective.

    PubMed

    Kattis, Assimakis A; Holiday, Alexander; Stoica, Ana-Andreea; Kevrekidis, Ioannis G

    2016-01-01

    The exploration of epidemic dynamics on dynamically evolving ("adaptive") networks poses nontrivial challenges to the modeler, such as the determination of a small number of informative statistics of the detailed network state (that is, a few "good observables") that usefully summarize the overall (macroscopic, systems-level) behavior. Obtaining reduced, small size accurate models in terms of these few statistical observables--that is, trying to coarse-grain the full network epidemic model to a small but useful macroscopic one--is even more daunting. Here we describe a data-based approach to solving the first challenge: the detection of a few informative collective observables of the detailed epidemic dynamics. This is accomplished through Diffusion Maps (DMAPS), a recently developed data-mining technique. We illustrate the approach through simulations of a simple mathematical model of epidemics on a network: a model known to exhibit complex temporal dynamics. We discuss potential extensions of the approach, as well as possible shortcomings.

  9. A Statistical Approach to Exoplanetary Molecular Spectroscopy Using Spitzer Eclipses

    NASA Astrophysics Data System (ADS)

    Deming, Drake; Garhart, Emily; Burrows, Adam; Fortney, Jonathan; Knutson, Heather; Todorov, Kamen

    2018-01-01

    Secondary eclipses of exoplanets observed using the Spitzer Space Telescope measure the total emission emergent from exoplanetary atmospheres integrated over broad photometric bands. Spitzer photometry is excellent for measuring day side temperatures, but is less well suited to the detection of molecular absorption or emission features. Even for very hot exoplanets, it can be difficult to attain the accuracy on eclipse depth that is needed to unambiguously interpret the Spitzer results in terms of molecular absorption or emission. However, a statistical approach, wherein we seek deviations from a simple blackbody planet as a function of the planet's equilibrium temperature, shows promise for defining the nature and strength of molecular absorption in ensembles of planets. In this paper, we explore such an approach using secondary eclipses observed for tens of hot exoplanets during Spitzer's Cycles 10, 12, and 13. We focus on the possibility that the hottest planets exhibit molecular features in emission, due to temperature inversions.

  10. Causality

    NASA Astrophysics Data System (ADS)

    Pearl, Judea

    2000-03-01

    Written by one of the pre-eminent researchers in the field, this book provides a comprehensive exposition of modern analysis of causation. It shows how causality has grown from a nebulous concept into a mathematical theory with significant applications in the fields of statistics, artificial intelligence, philosophy, cognitive science, and the health and social sciences. Pearl presents a unified account of the probabilistic, manipulative, counterfactual and structural approaches to causation, and devises simple mathematical tools for analyzing the relationships between causal connections, statistical associations, actions and observations. The book will open the way for including causal analysis in the standard curriculum of statistics, artifical intelligence, business, epidemiology, social science and economics. Students in these areas will find natural models, simple identification procedures, and precise mathematical definitions of causal concepts that traditional texts have tended to evade or make unduly complicated. This book will be of interest to professionals and students in a wide variety of fields. Anyone who wishes to elucidate meaningful relationships from data, predict effects of actions and policies, assess explanations of reported events, or form theories of causal understanding and causal speech will find this book stimulating and invaluable.

  11. Mortality table construction

    NASA Astrophysics Data System (ADS)

    Sutawanir

    2015-12-01

    Mortality tables play an important role in actuarial studies such as life annuities, premium determination, premium reserves, pension plan valuation, and pension funding. Some well-known mortality tables are the CSO mortality table, the Indonesian Mortality Table, the Bowers mortality table, and the Japan Mortality Table. For actuarial applications, tables are constructed under different environments such as single decrement, double decrement, and multiple decrement. There are two approaches to mortality table construction: a mathematical approach and a statistical approach. Distribution models and estimation theory are the statistical concepts used in mortality table construction. This article discusses the statistical approach to mortality table construction. The distributional assumptions are the uniform distribution of deaths (UDD) and constant force (exponential). Moment estimation and maximum likelihood are used to estimate the mortality parameter. Moment estimators are easier to manipulate than maximum likelihood estimators (MLE); however, they do not use the complete mortality data. Maximum likelihood exploits all available information in mortality estimation, although some likelihood equations are complicated and must be solved numerically. The article focuses on single-decrement estimation using moment and maximum likelihood estimation, and some extensions to double decrement are introduced. A simple dataset is used to illustrate the mortality estimation and the resulting mortality table.
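    To make the moment versus maximum-likelihood contrast concrete, the following Python sketch estimates a one-year mortality rate q from toy counts of deaths and central exposure, once under the constant-force (exponential) assumption, where the MLE of the force of mortality is deaths divided by central exposure, and once with a moment-style actuarial estimate under UDD. The figures are invented and the sketch only illustrates the standard textbook estimators mentioned in the abstract.

```python
import math

# toy single-decrement data for one age interval
deaths = 18
central_exposure = 950.0     # person-years observed in the interval

# constant-force (exponential) assumption: MLE of the force of mortality
mu_hat = deaths / central_exposure
q_constant_force = 1.0 - math.exp(-mu_hat)

# moment-style actuarial estimate under UDD: initial exposed-to-risk
# approximated as central exposure plus half the deaths
q_udd = deaths / (central_exposure + 0.5 * deaths)

print(f"q (constant force) = {q_constant_force:.5f}")
print(f"q (UDD, moment)    = {q_udd:.5f}")
```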

  12. Statistical mechanics of the vertex-cover problem

    NASA Astrophysics Data System (ADS)

    Hartmann, Alexander K.; Weigt, Martin

    2003-10-01

    We review recent progress in the study of the vertex-cover problem (VC). The VC belongs to the class of NP-complete graph theoretical problems, which plays a central role in theoretical computer science. On ensembles of random graphs, VC exhibits a coverable-uncoverable phase transition. Very close to this transition, depending on the solution algorithm, easy-hard transitions in the typical running time of the algorithms occur. We explain a statistical mechanics approach, which works by mapping the VC to a hard-core lattice gas, and then applying techniques such as the replica trick or the cavity approach. Using these methods, the phase diagram of the VC could be obtained exactly for connectivities c < e, where the VC is replica symmetric. Recently, this result could be confirmed using traditional mathematical techniques. For c > e, the solution of the VC exhibits full replica symmetry breaking. The statistical mechanics approach can also be used to study analytically the typical running time of simple complete and incomplete algorithms for the VC. Finally, we describe recent results for the VC when studied on other ensembles of finite- and infinite-dimensional graphs.

  13. Complex dynamics and empirical evidence (Invited Paper)

    NASA Astrophysics Data System (ADS)

    Delli Gatti, Domenico; Gaffeo, Edoardo; Giulioni, Gianfranco; Gallegati, Mauro; Kirman, Alan; Palestrini, Antonio; Russo, Alberto

    2005-05-01

    Standard macroeconomics, based on a reductionist approach centered on the representative agent, is badly equipped to explain the empirical evidence where heterogeneity and industrial dynamics are the rule. In this paper we show that a simple agent-based model of heterogeneous financially fragile agents is able to replicate a large number of scaling type stylized facts with a remarkable degree of statistical precision.

  14. Housing decision making methods for initiation development phase process

    NASA Astrophysics Data System (ADS)

    Zainal, Rozlin; Kasim, Narimah; Sarpin, Norliana; Wee, Seow Ta; Shamsudin, Zarina

    2017-10-01

    Late delivery and 'sick' housing project problems have been attributed to poor decision making. These problems stem from housing developers who prefer to create their own approaches based on their experience and expertise, taking the simplest path of applying whatever standards and rules are readily available when making decisions. This paper seeks to identify the decision-making methods used for housing development at the initiation phase in Malaysia. The research applied the Delphi method using a questionnaire survey, with 50 developers sampled in the first stage of data collection. However, only 34 developers contributed to the second stage of the information-gathering process, and only 12 developers remained for the final data collection process. The findings affirm that Malaysian developers prefer to make their investment decisions based on simple interpolation of historical data, using simple statistical or mathematical techniques to produce the required reports. They appear to skip several important decision-making functions at the primary development stage. These shortcomings were mainly due to time and financial constraints and the lack of statistical or mathematical expertise among the professional and management groups in developer organisations.

  15. A Monte Carlo–Based Bayesian Approach for Measuring Agreement in a Qualitative Scale

    PubMed Central

    Pérez Sánchez, Carlos Javier

    2014-01-01

    Agreement analysis has been an active research area whose techniques have been widely applied in psychology and other fields. However, statistical agreement among raters has been mainly considered from a classical statistics point of view. Bayesian methodology is a viable alternative that allows the inclusion of subjective initial information coming from expert opinions, personal judgments, or historical data. A Bayesian approach is proposed by providing a unified Monte Carlo–based framework to estimate all types of measures of agreement in a qualitative scale of response. The approach is conceptually simple and has a low computational cost. Both informative and non-informative scenarios are considered. When no initial information is available, the results are in line with the classical methodology, while providing more information on the measures of agreement. For the informative case, some guidelines are presented to elicit the prior distribution. The approach has been applied to two applications related to schizophrenia diagnosis and sensory analysis. PMID:29881002
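    As a minimal sketch of the kind of Monte Carlo computation described above, the Python code below places a symmetric Dirichlet prior on the cells of a two-rater agreement table, samples the posterior, and computes Cohen's kappa for each draw; the table, prior value and number of draws are arbitrary, and the paper's full set of agreement measures and elicitation guidelines are not covered.

```python
import numpy as np

def kappa_posterior(counts, n_draws=5000, prior=1.0, seed=0):
    """Monte Carlo posterior of Cohen's kappa for a KxK agreement table.

    counts : KxK array of joint classification counts from two raters
    prior  : symmetric Dirichlet prior parameter (1.0 = flat, 'non-informative')
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    alpha = counts.flatten() + prior
    draws = rng.dirichlet(alpha, size=n_draws).reshape(n_draws, *counts.shape)
    po = draws.trace(axis1=1, axis2=2)                        # observed agreement
    pe = (draws.sum(axis=2) * draws.sum(axis=1)).sum(axis=1)  # chance agreement
    return (po - pe) / (1.0 - pe)

table = [[40, 5], [8, 47]]          # e.g. two raters, diagnosis yes/no
kappa = kappa_posterior(table)
print(np.mean(kappa), np.percentile(kappa, [2.5, 97.5]))
```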

  16. Atmospheric Tracer Inverse Modeling Using Markov Chain Monte Carlo (MCMC)

    NASA Astrophysics Data System (ADS)

    Kasibhatla, P.

    2004-12-01

    In recent years, there has been an increasing emphasis on the use of Bayesian statistical estimation techniques to characterize the temporal and spatial variability of atmospheric trace gas sources and sinks. The applications have been varied in terms of the particular species of interest, as well as in terms of the spatial and temporal resolution of the estimated fluxes. However, one common characteristic has been the use of relatively simple statistical models for describing the measurement and chemical transport model error statistics and prior source statistics. For example, multivariate normal probability distribution functions (pdfs) are commonly used to model these quantities, and inverse source estimates are derived for fixed values of pdf parameters. While the advantage of this approach is that closed-form analytical solutions for the a posteriori pdfs of interest are available, it is worth exploring Bayesian analysis approaches which allow for a more general treatment of error and prior source statistics. Here, we present an application of the Markov Chain Monte Carlo (MCMC) methodology to an atmospheric tracer inversion problem to demonstrate how more general statistical models for errors can be incorporated into the analysis in a relatively straightforward manner. The MCMC approach to Bayesian analysis, which has found wide application in a variety of fields, is a statistical simulation approach that involves computing moments of interest of the a posteriori pdf by efficiently sampling this pdf. The specific inverse problem that we focus on is the annual mean CO2 source/sink estimation problem considered by the TransCom3 project. TransCom3 was a collaborative effort involving various modeling groups and followed a common modeling and analysis protocol. As such, this problem provides a convenient case study to demonstrate the applicability of the MCMC methodology to atmospheric tracer source/sink estimation problems.
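    The following Python sketch illustrates the general point of the abstract, that MCMC frees the analyst from conjugate Gaussian assumptions, on a toy linear source-estimation problem: a random-walk Metropolis sampler targets a posterior built from Gaussian measurement error and a non-Gaussian (Laplace) prior on the sources. The transport matrix, noise level and prior scale are all invented; nothing here reproduces the TransCom3 setup.

```python
import numpy as np

rng = np.random.default_rng(2)

# toy linear transport problem: y = H @ s + noise
n_obs, n_src = 20, 4
H = rng.random((n_obs, n_src))           # stand-in for transport-model responses
s_true = np.array([2.0, -1.0, 0.5, 3.0])
y = H @ s_true + rng.normal(0, 0.3, n_obs)

def log_post(s, sigma=0.3, prior_scale=5.0):
    # Gaussian measurement error plus a Laplace (double-exponential) prior,
    # i.e. a non-Gaussian choice with no closed-form posterior
    resid = y - H @ s
    return -0.5 * np.sum(resid**2) / sigma**2 - np.sum(np.abs(s)) / prior_scale

# random-walk Metropolis sampling of the source vector
s = np.zeros(n_src)
lp = log_post(s)
samples = []
for it in range(20000):
    prop = s + rng.normal(0, 0.05, n_src)
    lp_prop = log_post(prop)
    if np.log(rng.random()) < lp_prop - lp:
        s, lp = prop, lp_prop
    if it > 5000:                         # discard burn-in
        samples.append(s.copy())

print("posterior means:", np.mean(samples, axis=0))
```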

  17. Rainfall runoff modelling of the Upper Ganga and Brahmaputra basins using PERSiST.

    PubMed

    Futter, M N; Whitehead, P G; Sarkar, S; Rodda, H; Crossman, J

    2015-06-01

    There are ongoing discussions about the appropriate level of complexity and sources of uncertainty in rainfall runoff models. Simulations for operational hydrology, flood forecasting or nutrient transport all warrant different levels of complexity in the modelling approach. More complex model structures are appropriate for simulations of land-cover dependent nutrient transport, while more parsimonious model structures may be adequate for runoff simulation. The appropriate level of complexity is also dependent on data availability. Here, we use PERSiST, a simple, semi-distributed dynamic rainfall-runoff modelling toolkit, to simulate flows in the Upper Ganges and Brahmaputra rivers. We present two sets of simulations driven by single time series of daily precipitation and temperature using simple (A) and complex (B) model structures based on uniform and hydrochemically relevant land covers respectively. Models were compared based on ensembles of Bayesian Information Criterion (BIC) statistics. Equifinality was observed for parameters but not for model structures. Model performance was better for the more complex (B) structural representations than for parsimonious model structures. The results show that structural uncertainty is more important than parameter uncertainty. The ensembles of BIC statistics suggested that neither structural representation was preferable in a statistical sense. Simulations presented here confirm that relatively simple models with limited data requirements can be used to credibly simulate flows and water balance components needed for nutrient flux modelling in large, data-poor basins.

  18. PGT: A Statistical Approach to Prediction and Mechanism Design

    NASA Astrophysics Data System (ADS)

    Wolpert, David H.; Bono, James W.

    One of the biggest challenges facing behavioral economics is the lack of a single theoretical framework that is capable of directly utilizing all types of behavioral data. One of the biggest challenges of game theory is the lack of a framework for making predictions and designing markets in a manner that is consistent with the axioms of decision theory. An approach in which solution concepts are distribution-valued rather than set-valued (i.e. equilibrium theory) has both capabilities. We call this approach Predictive Game Theory (or PGT). This paper outlines a general Bayesian approach to PGT. It also presents one simple example to illustrate the way in which this approach differs from equilibrium approaches in both prediction and mechanism design settings.

  19. Machine learning Z2 quantum spin liquids with quasiparticle statistics

    NASA Astrophysics Data System (ADS)

    Zhang, Yi; Melko, Roger G.; Kim, Eun-Ah

    2017-12-01

    After decades of progress and effort, obtaining a phase diagram for a strongly correlated topological system still remains a challenge. Although in principle one could turn to Wilson loops and long-range entanglement, evaluating these nonlocal observables at many points in phase space can be prohibitively costly. With growing excitement over topological quantum computation comes the need for an efficient approach for obtaining topological phase diagrams. Here we turn to machine learning using quantum loop topography (QLT), a notion we have recently introduced. Specifically, we propose a construction of QLT that is sensitive to quasiparticle statistics. We then use mutual statistics between the spinons and visons to detect a Z2 quantum spin liquid in a multiparameter phase space. We successfully obtain the quantum phase boundary between the topological and trivial phases using a simple feed-forward neural network. Furthermore, we demonstrate advantages of our approach for the evaluation of phase diagrams relating to speed and storage. Such statistics-based machine learning of topological phases opens new efficient routes to studying topological phase diagrams in strongly correlated systems.

  20. Nature of Driving Force for Protein Folding: A Result From Analyzing the Statistical Potential

    NASA Astrophysics Data System (ADS)

    Li, Hao; Tang, Chao; Wingreen, Ned S.

    1997-07-01

    In a statistical approach to protein structure analysis, Miyazawa and Jernigan derived a 20×20 matrix of inter-residue contact energies between different types of amino acids. Using the method of eigenvalue decomposition, we find that the Miyazawa-Jernigan matrix can be accurately reconstructed from its first two principal component vectors as M_ij = C_0 + C_1(q_i + q_j) + C_2 q_i q_j, with constant C's, and 20 q values associated with the 20 amino acids. This regularity is due to hydrophobic interactions and a force of demixing, the latter obeying Hildebrand's solubility theory of simple liquids.
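    Because the quoted matrix form M_ij = C_0 + C_1(q_i + q_j) + C_2 q_i q_j is spanned by the constant vector and the vector of q values, it has rank at most two, which is why two principal components suffice. The Python sketch below checks this on a synthetic 20×20 matrix built from made-up q values (the actual Miyazawa-Jernigan energies are not reproduced here) by reconstructing it from its two largest-magnitude eigencomponents.

```python
import numpy as np

rng = np.random.default_rng(3)

# synthetic 20x20 symmetric "contact energy" matrix of the quoted form,
# M_ij = C0 + C1*(q_i + q_j) + C2*q_i*q_j, plus a little noise
q = rng.normal(0.0, 1.0, 20)
C0, C1, C2 = -1.0, 0.8, 2.0
M = C0 + C1 * (q[:, None] + q[None, :]) + C2 * np.outer(q, q)
M += rng.normal(0, 0.02, M.shape)
M = 0.5 * (M + M.T)                       # keep it exactly symmetric

# eigenvalue decomposition; keep the two largest-magnitude components
w, V = np.linalg.eigh(M)
idx = np.argsort(np.abs(w))[::-1][:2]
M_rank2 = (V[:, idx] * w[idx]) @ V[:, idx].T

rel_err = np.linalg.norm(M - M_rank2) / np.linalg.norm(M)
print(f"relative error of the two-component reconstruction: {rel_err:.3f}")
```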

  1. Mapping Quantitative Traits in Unselected Families: Algorithms and Examples

    PubMed Central

    Dupuis, Josée; Shi, Jianxin; Manning, Alisa K.; Benjamin, Emelia J.; Meigs, James B.; Cupples, L. Adrienne; Siegmund, David

    2009-01-01

    Linkage analysis has been widely used to identify from family data genetic variants influencing quantitative traits. Common approaches have both strengths and limitations. Likelihood ratio tests typically computed in variance component analysis can accommodate large families but are highly sensitive to departure from normality assumptions. Regression-based approaches are more robust but their use has primarily been restricted to nuclear families. In this paper, we develop methods for mapping quantitative traits in moderately large pedigrees. Our methods are based on the score statistic which in contrast to the likelihood ratio statistic, can use nonparametric estimators of variability to achieve robustness of the false positive rate against departures from the hypothesized phenotypic model. Because the score statistic is easier to calculate than the likelihood ratio statistic, our basic mapping methods utilize relatively simple computer code that performs statistical analysis on output from any program that computes estimates of identity-by-descent. This simplicity also permits development and evaluation of methods to deal with multivariate and ordinal phenotypes, and with gene-gene and gene-environment interaction. We demonstrate our methods on simulated data and on fasting insulin, a quantitative trait measured in the Framingham Heart Study. PMID:19278016

  2. Modeling and replicating statistical topology and evidence for CMB nonhomogeneity

    PubMed Central

    Agami, Sarit

    2017-01-01

    Under the banner of “big data,” the detection and classification of structure in extremely large, high-dimensional, data sets are two of the central statistical challenges of our times. Among the most intriguing new approaches to this challenge is “TDA,” or “topological data analysis,” one of the primary aims of which is providing nonmetric, but topologically informative, preanalyses of data which make later, more quantitative, analyses feasible. While TDA rests on strong mathematical foundations from topology, in applications, it has faced challenges due to difficulties in handling issues of statistical reliability and robustness, often leading to an inability to make scientific claims with verifiable levels of statistical confidence. We propose a methodology for the parametric representation, estimation, and replication of persistence diagrams, the main diagnostic tool of TDA. The power of the methodology lies in the fact that even if only one persistence diagram is available for analysis—the typical case for big data applications—the replications permit conventional statistical hypothesis testing. The methodology is conceptually simple and computationally practical, and provides a broadly effective statistical framework for persistence diagram TDA analysis. We demonstrate the basic ideas on a toy example, and the power of the parametric approach to TDA modeling in an analysis of cosmic microwave background (CMB) nonhomogeneity. PMID:29078301

  3. A methodology for the design of experiments in computational intelligence with multiple regression models.

    PubMed

    Fernandez-Lozano, Carlos; Gestal, Marcos; Munteanu, Cristian R; Dorado, Julian; Pazos, Alejandro

    2016-01-01

    The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence and especially on a correct comparison between the different results provided for different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational intelligence is implemented in an R package called RRegrs. This package includes ten simple and complex regression models to carry out predictive modeling using Machine Learning and well-known regression algorithms. The framework for experimental design presented herein is evaluated and validated against RRegrs. Our results are different for three out of five state-of-the-art simple datasets and it can be stated that the selection of the best model according to our proposal is statistically significant and relevant. It is of relevance to use a statistical approach to indicate whether the differences are statistically significant using this kind of algorithms. Furthermore, our results with three real complex datasets report different best models than with the previously published methodology. Our final goal is to provide a complete methodology for the use of different steps in order to compare the results obtained in Computational Intelligence problems, as well as from other fields, such as for bioinformatics, cheminformatics, etc., given that our proposal is open and modifiable.

  4. A methodology for the design of experiments in computational intelligence with multiple regression models

    PubMed Central

    Gestal, Marcos; Munteanu, Cristian R.; Dorado, Julian; Pazos, Alejandro

    2016-01-01

    The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence and especially on a correct comparison between the different results provided for different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational intelligence is implemented in an R package called RRegrs. This package includes ten simple and complex regression models to carry out predictive modeling using Machine Learning and well-known regression algorithms. The framework for experimental design presented herein is evaluated and validated against RRegrs. Our results are different for three out of five state-of-the-art simple datasets and it can be stated that the selection of the best model according to our proposal is statistically significant and relevant. It is of relevance to use a statistical approach to indicate whether the differences are statistically significant using this kind of algorithms. Furthermore, our results with three real complex datasets report different best models than with the previously published methodology. Our final goal is to provide a complete methodology for the use of different steps in order to compare the results obtained in Computational Intelligence problems, as well as from other fields, such as for bioinformatics, cheminformatics, etc., given that our proposal is open and modifiable. PMID:27920952

  5. MAI statistics estimation and analysis in a DS-CDMA system

    NASA Astrophysics Data System (ADS)

    Alami Hassani, A.; Zouak, M.; Mrabti, M.; Abdi, F.

    2018-05-01

    A primary limitation of Direct Sequence Code Division Multiple Access DS-CDMA link performance and system capacity is multiple access interference (MAI). To examine the performance of CDMA systems in the presence of MAI, i.e., in a multiuser environment, several works assumed that the interference can be approximated by a Gaussian random variable. In this paper, we first develop a new and simple approach to characterize the MAI in a multiuser system. In addition to statistically quantifying the MAI power, the paper also proposes a statistical model for both variance and mean of the MAI for synchronous and asynchronous CDMA transmission. We show that the MAI probability density function (PDF) is Gaussian for the equal-received-energy case and validate it by computer simulations.

  6. Dynamics of non-Markovian exclusion processes

    NASA Astrophysics Data System (ADS)

    Khoromskaia, Diana; Harris, Rosemary J.; Grosskinsky, Stefan

    2014-12-01

    Driven diffusive systems are often used as simple discrete models of collective transport phenomena in physics, biology or social sciences. Restricting attention to one-dimensional geometries, the asymmetric simple exclusion process (ASEP) plays a paradigmatic role to describe noise-activated driven motion of entities subject to an excluded volume interaction and many variants have been studied in recent years. While in the standard ASEP the noise is Poissonian and the process is therefore Markovian, in many applications the statistics of the activating noise has a non-standard distribution with possible memory effects resulting from internal degrees of freedom or external sources. This leads to temporal correlations and can significantly affect the shape of the current-density relation as has been studied recently for a number of scenarios. In this paper we report a general framework to derive the fundamental diagram of ASEPs driven by non-Poissonian noise by using effectively only two simple quantities, viz., the mean residual lifetime of the jump distribution and a suitably defined temporal correlation length. We corroborate our results by detailed numerical studies for various noise statistics under periodic boundary conditions and discuss how our approach can be applied to more general driven diffusive systems.

  7. Estimation of critical behavior from the density of states in classical statistical models

    NASA Astrophysics Data System (ADS)

    Malakis, A.; Peratzakis, A.; Fytas, N. G.

    2004-12-01

    We present a simple and efficient approximation scheme which greatly facilitates the extension of Wang-Landau sampling (or similar techniques) in large systems for the estimation of critical behavior. The method, presented in an algorithmic approach, is based on a very simple idea, familiar in statistical mechanics from the notion of thermodynamic equivalence of ensembles and the central limit theorem. It is illustrated that we can predict with high accuracy the critical part of the energy space and by using this restricted part we can extend our simulations to larger systems and improve the accuracy of critical parameters. It is proposed that the extensions of the finite-size critical part of the energy space, determining the specific heat, satisfy a scaling law involving the thermal critical exponent. The method is applied successfully for the estimation of the scaling behavior of specific heat of both square and simple cubic Ising lattices. The proposed scaling law is verified by estimating the thermal critical exponent from the finite-size behavior of the critical part of the energy space. The density of states of the zero-field Ising model on these lattices is obtained via a multirange Wang-Landau sampling.
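    For readers unfamiliar with the underlying sampler, the Python sketch below is a bare-bones Wang-Landau run on a small 2D Ising lattice: it accumulates an estimate of the log density of states ln g(E), accepting spin flips with probability min(1, g(E_old)/g(E_new)) and halving the modification factor in stages. It omits the histogram-flatness criterion and, in particular, the restricted critical-energy-window scheme that is the actual contribution of the paper; the lattice size and stage lengths are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
L = 8
N = L * L
spins = rng.choice([-1, 1], size=(L, L))

def total_energy(s):
    # nearest-neighbour Ising energy with periodic boundaries
    return -np.sum(s * (np.roll(s, 1, axis=0) + np.roll(s, 1, axis=1)))

E = int(total_energy(spins))
log_g = {}           # running estimate of ln g(E)
ln_f = 1.0

while ln_f > 1e-4:                       # reduce the modification factor in stages
    for sweep in range(2000):            # fixed sweeps per stage (no flatness check)
        for _ in range(N):
            i, j = rng.integers(L), rng.integers(L)
            nn = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
                  + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
            dE = 2 * spins[i, j] * nn
            E_new = E + dE
            g_old = log_g.get(E, 0.0)
            g_new = log_g.get(E_new, 0.0)
            # accept with probability min(1, g(E_old)/g(E_new)), in log space
            if np.log(rng.random()) < g_old - g_new:
                spins[i, j] *= -1
                E = E_new
            log_g[E] = log_g.get(E, 0.0) + ln_f
    ln_f /= 2.0

print("number of visited energy levels:", len(log_g))
```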

  8. Non-abelian anyons and topological quantum information processing in 1D wire networks

    NASA Astrophysics Data System (ADS)

    Alicea, Jason

    2012-02-01

    Topological quantum computation provides an elegant solution to decoherence, circumventing this infamous problem at the hardware level. The most basic requirement in this approach is the ability to stabilize and manipulate particles exhibiting non-Abelian exchange statistics -- Majorana fermions being the simplest example. Curiously, Majorana fermions have been predicted to arise both in 2D systems, where non-Abelian statistics is well established, and in 1D, where exchange statistics of any type is ill-defined. An important question then arises: do Majorana fermions in 1D hold the same technological promise as their 2D counterparts? In this talk I will answer this question in the affirmative, describing how one can indeed manipulate and harness the non-Abelian statistics of Majoranas in a remarkably simple fashion using networks formed by quantum wires or topological insulator edges.

  9. A new statistical approach to climate change detection and attribution

    NASA Astrophysics Data System (ADS)

    Ribes, Aurélien; Zwiers, Francis W.; Azaïs, Jean-Marc; Naveau, Philippe

    2017-01-01

    We propose here a new statistical approach to climate change detection and attribution that is based on additive decomposition and simple hypothesis testing. Most current statistical methods for detection and attribution rely on linear regression models where the observations are regressed onto expected response patterns to different external forcings. These methods do not use physical information provided by climate models regarding the expected response magnitudes to constrain the estimated responses to the forcings. Climate modelling uncertainty is difficult to take into account with regression-based methods and is almost never treated explicitly. As an alternative to this approach, our statistical model is only based on the additivity assumption; the proposed method does not regress observations onto expected response patterns. We introduce estimation and testing procedures based on likelihood maximization, and show that climate modelling uncertainty can easily be accounted for. Some discussion is provided on how to practically estimate the climate modelling uncertainty based on an ensemble of opportunity. Our approach is based on the "models are statistically indistinguishable from the truth" paradigm, where the difference between any given model and the truth has the same distribution as the difference between any pair of models, but other choices might also be considered. The properties of this approach are illustrated and discussed based on synthetic data. Lastly, the method is applied to the linear trend in global mean temperature over the period 1951-2010. Consistent with the last IPCC assessment report, we find that most of the observed warming over this period (+0.65 K) is attributable to anthropogenic forcings (+0.67 ± 0.12 K, 90% confidence range), with a very limited contribution from natural forcings (-0.01 ± 0.02 K).

  10. Statistical Approaches for Spatiotemporal Prediction of Low Flows

    NASA Astrophysics Data System (ADS)

    Fangmann, A.; Haberlandt, U.

    2017-12-01

    An adequate assessment of regional climate change impacts on streamflow requires the integration of various sources of information and modeling approaches. This study proposes simple statistical tools for inclusion into model ensembles, which are fast and straightforward in their application, yet able to yield accurate streamflow predictions in time and space. Target variables for all approaches are annual low flow indices derived from a data set of 51 records of average daily discharge for northwestern Germany. The models require input of climatic data in the form of meteorological drought indices, derived from observed daily climatic variables, averaged over the streamflow gauges' catchment areas. Four different modeling approaches are analyzed. The basis for all of them is multiple linear regression models that estimate low flows as a function of a set of meteorological indices and/or physiographic and climatic catchment descriptors. For the first method, individual regression models are fitted at each station, predicting annual low flow values from a set of annual meteorological indices, which are subsequently regionalized using a set of catchment characteristics. The second method combines temporal and spatial prediction within a single panel data regression model, allowing estimation of annual low flow values from input of both annual meteorological indices and catchment descriptors. The third and fourth methods represent non-stationary low flow frequency analyses and require fitting of regional distribution functions. Method three is subject to a spatiotemporal prediction of an index value, method four to estimation of L-moments that adapt the regional frequency distribution to the at-site conditions. The results show that method two outperforms successive prediction in time and space. Method three also shows a high performance in the near future period, but since it relies on a stationary distribution, its application for prediction of far future changes may be problematic. Spatiotemporal prediction of L-moments appeared highly uncertain for higher-order moments, resulting in unrealistic future low flow values. All in all, the results promote an inclusion of simple statistical methods in climate change impact assessment.
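    As a minimal illustration of the simplest ingredient shared by these approaches, the Python sketch below fits a pooled (panel-style) multiple linear regression of an annual low-flow index on a meteorological drought index and one catchment descriptor, using invented data for a handful of gauges; the regionalization, frequency-analysis and L-moment steps of the study are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(4)

# toy panel data: n_gauges stations x n_years of annual values
n_gauges, n_years = 12, 30
spi = rng.normal(0, 1, (n_gauges, n_years))     # a meteorological drought index
area = rng.uniform(50, 500, n_gauges)           # catchment descriptor (km^2)
base = 0.02 * area                              # larger catchments: higher baseline
low_flow = base[:, None] + 1.5 * spi + rng.normal(0, 0.5, (n_gauges, n_years))

# pooled ("panel") regression: low flow ~ intercept + drought index + area
X = np.column_stack([
    np.ones(n_gauges * n_years),
    spi.ravel(),
    np.repeat(area, n_years),
])
y = low_flow.ravel()
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("intercept, drought-index coefficient, area coefficient:", beta)
```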

  11. Aggregative Learning Method and Its Application for Communication Quality Evaluation

    NASA Astrophysics Data System (ADS)

    Akhmetov, Dauren F.; Kotaki, Minoru

    2007-12-01

    In this paper, the so-called Aggregative Learning Method (ALM) is proposed to improve and simplify the learning and classification abilities of different data processing systems. It provides a universal basis for the design and analysis of a wide class of mathematical models. A procedure was elaborated for time series model reconstruction and analysis in both linear and nonlinear cases. Data approximation accuracy (during the learning phase) and data classification quality (during the recall phase) are estimated from the introduced statistical parameters. The validity and efficiency of the proposed approach have been demonstrated through its application to monitoring of wireless communication quality, namely for a Fixed Wireless Access (FWA) system. Low memory and computational resources were shown to be needed for the procedure, especially for the data classification (recall) stage. Characterized by high computational efficiency and a simple decision-making procedure, the derived approaches can be useful for the design of simple and reliable real-time surveillance and control systems.

  12. Physics Based Modeling and Rendering of Vegetation in the Thermal Infrared

    NASA Technical Reports Server (NTRS)

    Smith, J. A.; Ballard, J. R., Jr.

    1999-01-01

    We outline a procedure for rendering physically-based thermal infrared images of simple vegetation scenes. Our approach incorporates the biophysical processes that affect the temperature distribution of the elements within a scene. Computer graphics plays a key role in two respects. First, in computing the distribution of scene shaded and sunlit facets and, second, in the final image rendering once the temperatures of all the elements in the scene have been computed. We illustrate our approach for a simple corn scene where the three-dimensional geometry is constructed based on measured morphological attributes of the row crop. Statistical methods are used to construct a representation of the scene in agreement with the measured characteristics. Our results are quite good. The rendered images exhibit realistic behavior in directional properties as a function of view and sun angle. The root-mean-square error in measured versus predicted brightness temperatures for the scene was 2.1 deg C.

  13. The Importance of Proving the Null

    PubMed Central

    Gallistel, C. R.

    2010-01-01

    Null hypotheses are simple, precise, and theoretically important. Conventional statistical analysis cannot support them; Bayesian analysis can. The challenge in a Bayesian analysis is to formulate a suitably vague alternative, because the vaguer the alternative is (the more it spreads out the unit mass of prior probability), the more the null is favored. A general solution is a sensitivity analysis: Compute the odds for or against the null as a function of the limit(s) on the vagueness of the alternative. If the odds on the null approach 1 from above as the hypothesized maximum size of the possible effect approaches 0, then the data favor the null over any vaguer alternative to it. The simple computations and the intuitive graphic representation of the analysis are illustrated by the analysis of diverse examples from the current literature. They pose 3 common experimental questions: (a) Are 2 means the same? (b) Is performance at chance? (c) Are factors additive? PMID:19348549
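    The sensitivity analysis described above is easy to reproduce for the "is performance at chance?" question with binomial data: the odds for the null are the binomial likelihood at chance divided by the likelihood averaged over a uniform alternative whose width is the hypothesized maximum effect. The Python sketch below computes these odds for a range of maximum effect sizes; the counts are invented and the examples analysed in the paper are not reproduced.

```python
import numpy as np
from scipy import stats, integrate

def bf_null_vs_vague(k, n, chance=0.5, max_effect=0.2):
    """Odds for the null (performance at chance) against an alternative that is
    uniform between chance and chance + max_effect."""
    like_null = stats.binom.pmf(k, n, chance)
    like_alt, _ = integrate.quad(
        lambda p: stats.binom.pmf(k, n, p) / max_effect,
        chance, chance + max_effect)
    return like_null / like_alt

# sensitivity analysis: odds on the null as the alternative's vagueness grows
k, n = 52, 100
for max_effect in (0.02, 0.05, 0.1, 0.2, 0.4):
    print(max_effect, round(bf_null_vs_vague(k, n, max_effect=max_effect), 2))
```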

  14. Simplified estimation of age-specific reference intervals for skewed data.

    PubMed

    Wright, E M; Royston, P

    1997-12-30

    Age-specific reference intervals are commonly used in medical screening and clinical practice, where interest lies in the detection of extreme values. Many different statistical approaches have been published on this topic. The advantages of a parametric method are that they necessarily produce smooth centile curves, the entire density is estimated and an explicit formula is available for the centiles. The method proposed here is a simplified version of a recent approach proposed by Royston and Wright. Basic transformations of the data and multiple regression techniques are combined to model the mean, standard deviation and skewness. Using these simple tools, which are implemented in almost all statistical computer packages, age-specific reference intervals may be obtained. The scope of the method is illustrated by fitting models to several real data sets and assessing each model using goodness-of-fit techniques.
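    The Python sketch below gives a deliberately crude version of this idea, simplified even further than the paper (no skewness term): the mean is modelled by a linear regression on age, a second regression of the squared residuals on age supplies an age-dependent standard deviation, and centiles are formed as mean ± z × SD. All data are simulated, and the variance regression is only a stand-in for the paper's scale model.

```python
import numpy as np

rng = np.random.default_rng(5)

# toy data: a measurement whose mean and spread both increase with age
age = rng.uniform(16, 40, 400)
y = 2.0 + 0.15 * age + rng.normal(0, 0.05 * age)

X = np.column_stack([np.ones_like(age), age])

# model the mean as a linear function of age
beta_mean, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_mean

# crude scale model: regress squared residuals on age, take the square root
beta_var, *_ = np.linalg.lstsq(X, resid**2, rcond=None)

z = 1.645                                   # 90% reference interval
x30 = np.array([1.0, 30.0])                 # evaluate the interval at age 30
sd30 = np.sqrt(max(x30 @ beta_var, 1e-12))
print("90% reference interval at age 30:",
      np.round([x30 @ beta_mean - z * sd30, x30 @ beta_mean + z * sd30], 2))
```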

  15. Cortical Auditory Evoked Potentials with Simple (Tone Burst) and Complex (Speech) Stimuli in Children with Cochlear Implant

    PubMed Central

    Martins, Kelly Vasconcelos Chaves; Gil, Daniela

    2017-01-01

    Introduction  The registry of the component P1 of the cortical auditory evoked potential has been widely used to analyze the behavior of auditory pathways in response to cochlear implant stimulation. Objective  To determine the influence of aural rehabilitation in the parameters of latency and amplitude of the P1 cortical auditory evoked potential component elicited by simple auditory stimuli (tone burst) and complex stimuli (speech) in children with cochlear implants. Method  The study included six individuals of both genders aged 5 to 10 years old who have been cochlear implant users for at least 12 months, and who attended auditory rehabilitation with an aural rehabilitation therapy approach. Participants were submitted to research of the cortical auditory evoked potential at the beginning of the study and after 3 months of aural rehabilitation. To elicit the responses, simple stimuli (tone burst) and complex stimuli (speech) were used and presented in free field at 70 dB HL. The results were statistically analyzed, and both evaluations were compared. Results  There was no significant difference between the type of eliciting stimulus of the cortical auditory evoked potential for the latency and the amplitude of P1. There was a statistically significant difference in the P1 latency between the evaluations for both stimuli, with reduction of the latency in the second evaluation after 3 months of auditory rehabilitation. There was no statistically significant difference regarding the amplitude of P1 under the two types of stimuli or in the two evaluations. Conclusion  A decrease in latency of the P1 component elicited by both simple and complex stimuli was observed within a three-month interval in children with cochlear implant undergoing aural rehabilitation. PMID:29018498

  16. Bootstrap Methods: A Very Leisurely Look.

    ERIC Educational Resources Information Center

    Hinkle, Dennis E.; Winstead, Wayland H.

    The Bootstrap method, a computer-intensive statistical method of estimation, is illustrated using a simple and efficient Statistical Analysis System (SAS) routine. The utility of the method for generating unknown parameters, including standard errors for simple statistics, regression coefficients, discriminant function coefficients, and factor…

  17. Fitting mechanistic epidemic models to data: A comparison of simple Markov chain Monte Carlo approaches.

    PubMed

    Li, Michael; Dushoff, Jonathan; Bolker, Benjamin M

    2018-07-01

    Simple mechanistic epidemic models are widely used for forecasting and parameter estimation of infectious diseases based on noisy case reporting data. Despite the widespread application of models to emerging infectious diseases, we know little about the comparative performance of standard computational-statistical frameworks in these contexts. Here we build a simple stochastic, discrete-time, discrete-state epidemic model with both process and observation error and use it to characterize the effectiveness of different flavours of Bayesian Markov chain Monte Carlo (MCMC) techniques. We use fits to simulated data, where parameters (and future behaviour) are known, to explore the limitations of different platforms and quantify parameter estimation accuracy, forecasting accuracy, and computational efficiency across combinations of modeling decisions (e.g. discrete vs. continuous latent states, levels of stochasticity) and computational platforms (JAGS, NIMBLE, Stan).
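    The kind of model the abstract describes can be written down in a few lines; the Python sketch below simulates a discrete-time, discrete-state SIR epidemic with binomial process error and binomial under-reporting as observation error. Parameter values are arbitrary, and the MCMC fitting in JAGS, NIMBLE or Stan that the paper compares is not shown; the simulated case counts are simply the sort of data one would hand to those samplers.

```python
import numpy as np

def simulate_epidemic(beta, gamma, report_prob, S0, I0, n_steps, seed=0):
    """Discrete-time, discrete-state SIR with process and observation error.

    Process error: binomial draws for new infections and recoveries.
    Observation error: each new case is reported with probability report_prob.
    """
    rng = np.random.default_rng(seed)
    S, I = S0, I0
    N = S0 + I0
    reported = []
    for _ in range(n_steps):
        p_inf = 1.0 - np.exp(-beta * I / N)     # per-susceptible infection prob
        new_inf = rng.binomial(S, p_inf)
        new_rec = rng.binomial(I, 1.0 - np.exp(-gamma))
        S -= new_inf
        I += new_inf - new_rec
        reported.append(rng.binomial(new_inf, report_prob))
    return np.array(reported)

cases = simulate_epidemic(beta=0.5, gamma=0.25, report_prob=0.6,
                          S0=10000, I0=10, n_steps=60)
print(cases)
```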

  18. Symmetric log-domain diffeomorphic Registration: a demons-based approach.

    PubMed

    Vercauteren, Tom; Pennec, Xavier; Perchant, Aymeric; Ayache, Nicholas

    2008-01-01

    Modern morphometric studies use non-linear image registration to compare anatomies and perform group analysis. Recently, log-Euclidean approaches have contributed to promote the use of such computational anatomy tools by permitting simple computations of statistics on a rather large class of invertible spatial transformations. In this work, we propose a non-linear registration algorithm perfectly fit for log-Euclidean statistics on diffeomorphisms. Our algorithm works completely in the log-domain, i.e. it uses a stationary velocity field. This implies that we guarantee the invertibility of the deformation and have access to the true inverse transformation. This also means that our output can be directly used for log-Euclidean statistics without relying on the heavy computation of the log of the spatial transformation. As it is often desirable, our algorithm is symmetric with respect to the order of the input images. Furthermore, we use an alternate optimization approach related to Thirion's demons algorithm to provide a fast non-linear registration algorithm. First results show that our algorithm outperforms both the demons algorithm and the recently proposed diffeomorphic demons algorithm in terms of accuracy of the transformation while remaining computationally efficient.

  19. Two Simple Approaches to Overcome a Problem with the Mantel-Haenszel Statistic: Comments on Wang, Bradlow, Wainer, and Muller (2008)

    ERIC Educational Resources Information Center

    Sinharay, Sandip; Dorans, Neil J.

    2010-01-01

    The Mantel-Haenszel (MH) procedure (Mantel and Haenszel) is a popular method for estimating and testing a common two-factor association parameter in a 2 x 2 x K table. Holland and Holland and Thayer described how to use the procedure to detect differential item functioning (DIF) for tests with dichotomously scored items. Wang, Bradlow, Wainer, and…

  20. Weighted Feature Significance: A Simple, Interpretable Model of Compound Toxicity Based on the Statistical Enrichment of Structural Features

    PubMed Central

    Huang, Ruili; Southall, Noel; Xia, Menghang; Cho, Ming-Hsuang; Jadhav, Ajit; Nguyen, Dac-Trung; Inglese, James; Tice, Raymond R.; Austin, Christopher P.

    2009-01-01

    In support of the U.S. Tox21 program, we have developed a simple and chemically intuitive model we call weighted feature significance (WFS) to predict the toxicological activity of compounds, based on the statistical enrichment of structural features in toxic compounds. We trained and tested the model on the following: (1) data from quantitative high–throughput screening cytotoxicity and caspase activation assays conducted at the National Institutes of Health Chemical Genomics Center, (2) data from Salmonella typhimurium reverse mutagenicity assays conducted by the U.S. National Toxicology Program, and (3) hepatotoxicity data published in the Registry of Toxic Effects of Chemical Substances. Enrichments of structural features in toxic compounds are evaluated for their statistical significance and compiled into a simple additive model of toxicity and then used to score new compounds for potential toxicity. The predictive power of the model for cytotoxicity was validated using an independent set of compounds from the U.S. Environmental Protection Agency tested also at the National Institutes of Health Chemical Genomics Center. We compared the performance of our WFS approach with classical classification methods such as Naive Bayesian clustering and support vector machines. In most test cases, WFS showed similar or slightly better predictive power, especially in the prediction of hepatotoxic compounds, where WFS appeared to have the best performance among the three methods. The new algorithm has the important advantages of simplicity, power, interpretability, and ease of implementation. PMID:19805409
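    The additive scheme described above is easy to sketch: score each structural feature by the statistical significance of its enrichment among toxic compounds, then score a compound by summing the weights of the features it contains. The Python code below does this with Fisher's exact test on a simulated 0/1 feature matrix; the data, the choice of test and the weighting by -log10 p are illustrative stand-ins rather than the exact WFS recipe.

```python
import numpy as np
from scipy import stats

def feature_weights(feature_matrix, toxic):
    """-log10 enrichment p-value for each structural feature among toxic compounds.

    feature_matrix : (n_compounds, n_features) 0/1 presence matrix
    toxic          : length-n_compounds boolean array
    """
    weights = np.zeros(feature_matrix.shape[1])
    for j in range(feature_matrix.shape[1]):
        has = feature_matrix[:, j].astype(bool)
        table = [[np.sum(has & toxic),  np.sum(has & ~toxic)],
                 [np.sum(~has & toxic), np.sum(~has & ~toxic)]]
        _, p = stats.fisher_exact(table, alternative="greater")
        weights[j] = -np.log10(p)
    return weights

# toy data: 200 compounds, 30 features; feature 0 is genuinely enriched
rng = np.random.default_rng(6)
X = rng.integers(0, 2, (200, 30))
tox = (X[:, 0] + rng.random(200)) > 1.2
w = feature_weights(X, tox)
scores = X @ w                 # simple additive toxicity score per compound
print("top-weighted feature:", int(np.argmax(w)))
```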

  1. Nature of Driving Force for Protein Folding-- A Result From Analyzing the Statistical Potential

    NASA Astrophysics Data System (ADS)

    Li, Hao; Tang, Chao; Wingreen, Ned S.

    1998-03-01

    In a statistical approach to protein structure analysis, Miyazawa and Jernigan (MJ) derived a 20×20 matrix of inter-residue contact energies between different types of amino acids. Using the method of eigenvalue decomposition, we find that the MJ matrix can be accurately reconstructed from its first two principal component vectors as M_ij = C_0 + C_1(q_i + q_j) + C_2 q_i q_j, with constant C's, and 20 q values associated with the 20 amino acids. This regularity is due to hydrophobic interactions and a force of demixing, the latter obeying Hildebrand's solubility theory of simple liquids.

  2. Are V1 Simple Cells Optimized for Visual Occlusions? A Comparative Study

    PubMed Central

    Bornschein, Jörg; Henniges, Marc; Lücke, Jörg

    2013-01-01

    Simple cells in primary visual cortex were famously found to respond to low-level image components such as edges. Sparse coding and independent component analysis (ICA) emerged as the standard computational models for simple cell coding because they linked their receptive fields to the statistics of visual stimuli. However, a salient feature of image statistics, occlusions of image components, is not considered by these models. Here we ask if occlusions have an effect on the predicted shapes of simple cell receptive fields. We use a comparative approach to answer this question and investigate two models for simple cells: a standard linear model and an occlusive model. For both models we simultaneously estimate optimal receptive fields, sparsity and stimulus noise. The two models are identical except for their component superposition assumption. We find the image encoding and receptive fields predicted by the models to differ significantly. While both models predict many Gabor-like fields, the occlusive model predicts a much sparser encoding and high percentages of ‘globular’ receptive fields. This relatively new center-surround type of simple cell response is observed since reverse correlation is used in experimental studies. While high percentages of ‘globular’ fields can be obtained using specific choices of sparsity and overcompleteness in linear sparse coding, no or only low proportions are reported in the vast majority of studies on linear models (including all ICA models). Likewise, for the here investigated linear model and optimal sparsity, only low proportions of ‘globular’ fields are observed. In comparison, the occlusive model robustly infers high proportions and can match the experimentally observed high proportions of ‘globular’ fields well. Our computational study, therefore, suggests that ‘globular’ fields may be evidence for an optimal encoding of visual occlusions in primary visual cortex. PMID:23754938

  3. A classification procedure for the effective management of changes during the maintenance process

    NASA Technical Reports Server (NTRS)

    Briand, Lionel C.; Basili, Victor R.

    1992-01-01

    During software operation, maintainers are often faced with numerous change requests. Given available resources such as effort and calendar time, changes, if approved, have to be planned to fit within budget and schedule constraints. In this paper, we address the issue of assessing the difficulty of a change based on known or predictable data. This paper should be considered as a first step towards the construction of customized economic models for maintainers. In it, we propose a modeling approach, based on regular statistical techniques, that can be used in a variety of software maintenance environments. The approach can be easily automated, and is simple for people with limited statistical experience to use. Moreover, it deals effectively with the uncertainty usually associated with both model inputs and outputs. The modeling approach is validated on a data set provided by NASA/GSFC which shows it was effective in classifying changes with respect to the effort involved in implementing them. Other advantages of the approach are discussed along with additional steps to improve the results.

  4. Comparison between the basic least squares and the Bayesian approach for elastic constants identification

    NASA Astrophysics Data System (ADS)

    Gogu, C.; Haftka, R.; LeRiche, R.; Molimard, J.; Vautrin, A.; Sankar, B.

    2008-11-01

    The basic formulation of the least squares method, based on the L2 norm of the misfit, is still widely used today for identifying elastic material properties from experimental data. An alternative statistical approach is the Bayesian method. We seek here situations with significant difference between the material properties found by the two methods. For a simple three-bar truss example, we illustrate three such situations in which the Bayesian approach leads to more accurate results: different magnitude of the measurements, different uncertainty in the measurements and correlation among measurements. When all three effects add up, the Bayesian approach can have a large advantage. We then compare the two methods for identification of elastic constants from plate vibration natural frequencies.

  5. A simple white noise analysis of neuronal light responses.

    PubMed

    Chichilnisky, E J

    2001-05-01

    A white noise technique is presented for estimating the response properties of spiking visual system neurons. The technique is simple, robust, efficient and well suited to simultaneous recordings from multiple neurons. It provides a complete and easily interpretable model of light responses even for neurons that display a common form of response nonlinearity that precludes classical linear systems analysis. A theoretical justification of the technique is presented that relies only on elementary linear algebra and statistics. Implementation is described with examples. The technique and the underlying model of neural responses are validated using recordings from retinal ganglion cells, and in principle are applicable to other neurons. Advantages and disadvantages of the technique relative to classical approaches are discussed.
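    The core of such a white noise analysis is reverse correlation: average the stimulus preceding each spike to estimate the neuron's linear filter, then map filter output to firing rate. The sketch below illustrates the spike-triggered average on a simulated linear-nonlinear neuron; it is a schematic of the idea, not the paper's exact pipeline, and all parameter values are invented for the demo.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def spike_triggered_average(stimulus, spikes, n_lags):
    """Reverse-correlation estimate of a linear filter from white-noise input.

    stimulus, spikes : equal-length 1-D arrays (stimulus samples, spike counts);
    the filter spans the n_lags stimulus samples up to and including each bin.
    """
    windows = sliding_window_view(stimulus, n_lags)   # windows[t] = stimulus[t:t+n_lags]
    counts = spikes[n_lags - 1:]                      # spike counts aligned to window ends
    return (counts @ windows) / max(counts.sum(), 1)

# Simulated linear-nonlinear-Poisson neuron with invented parameters
rng = np.random.default_rng(0)
T, n_lags = 100_000, 20
true_filter = np.sin(np.linspace(0.0, np.pi, n_lags))         # arbitrary demo filter
x = rng.normal(size=T)                                        # Gaussian white-noise stimulus
drive = sliding_window_view(x, n_lags) @ true_filter
spikes = np.zeros(T, dtype=int)
spikes[n_lags - 1:] = rng.poisson(0.1 * np.maximum(drive, 0.0))
sta = spike_triggered_average(x, spikes, n_lags)              # roughly proportional to true_filter
```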

  6. Fock expansion of multimode pure Gaussian states

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cariolaro, Gianfranco; Pierobon, Gianfranco, E-mail: gianfranco.pierobon@unipd.it

    2015-12-15

    The Fock expansion of multimode pure Gaussian states is derived starting from their representation as displaced and squeezed multimode vacuum states. The approach is new and appears to be simpler and more general than previous ones starting from the phase-space representation given by the characteristic or Wigner function. Fock expansion is performed in terms of easily evaluable two-variable Hermite–Kampé de Fériet polynomials. A relatively simple and compact expression for the joint statistical distribution of the photon numbers in the different modes is obtained. In particular, this result enables one to give a simple characterization of separable and entangled states, as shown for two-mode and three-mode Gaussian states.

  7. Bayesian models based on test statistics for multiple hypothesis testing problems.

    PubMed

    Ji, Yuan; Lu, Yiling; Mills, Gordon B

    2008-04-01

    We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as the differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check if our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool. Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. In the end, we apply the proposed methodology to an siRNA screening and a gene expression experiment.
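    A stripped-down version of modelling test statistics under null and alternative hypotheses is a two-groups mixture fitted to z-scores, with rejections chosen so that the average posterior null probability (a Bayesian FDR) stays below a target level. The sketch below is schematic and much simpler than the paper's hierarchical model; the single Gaussian alternative component is an assumption made for illustration.

```python
import numpy as np
from scipy.stats import norm

def fit_two_groups(z, n_iter=200):
    """EM fit of f(z) = p0*N(0,1) + (1-p0)*N(mu, sigma^2) to observed test
    statistics (schematic two-groups model, not the paper's hierarchical one)."""
    p0, mu, sigma = 0.9, 2.0, 1.0
    for _ in range(n_iter):
        f0 = p0 * norm.pdf(z, 0.0, 1.0)
        f1 = (1.0 - p0) * norm.pdf(z, mu, sigma)
        w = f1 / (f0 + f1 + 1e-300)                      # responsibility of the alternative
        p0 = 1.0 - w.mean()
        mu = np.sum(w * z) / (np.sum(w) + 1e-12)
        sigma = max(np.sqrt(np.sum(w * (z - mu) ** 2) / (np.sum(w) + 1e-12)), 1e-3)
    return p0, mu, sigma

def bayesian_fdr_reject(post_null, alpha=0.05):
    """Reject hypotheses with the smallest posterior null probabilities while
    keeping their average (the Bayesian FDR) below alpha."""
    order = np.argsort(post_null)
    running_mean = np.cumsum(post_null[order]) / np.arange(1, post_null.size + 1)
    k = np.searchsorted(running_mean, alpha, side="right")
    rejected = np.zeros(post_null.size, dtype=bool)
    rejected[order[:k]] = True
    return rejected

# Usage on synthetic z-scores: 90% nulls, 10% shifted alternatives
rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(0, 1, 9000), rng.normal(3, 1, 1000)])
p0, mu, sigma = fit_two_groups(z)
f0 = p0 * norm.pdf(z, 0, 1)
f1 = (1 - p0) * norm.pdf(z, mu, sigma)
rejected = bayesian_fdr_reject(f0 / (f0 + f1 + 1e-300))
print(p0, mu, sigma, rejected.sum())
```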

  8. Statistics of dislocation pinning at localized obstacles

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dutta, A.; Bhattacharya, M., E-mail: mishreyee@vecc.gov.in; Barat, P.

    2014-10-14

    Pinning of dislocations at nanosized obstacles like precipitates, voids, and bubbles is a crucial mechanism in the context of phenomena like hardening and creep. The interaction between such an obstacle and a dislocation is often studied at fundamental level by means of analytical tools, atomistic simulations, and finite element methods. Nevertheless, the information extracted from such studies cannot be utilized to its maximum extent on account of insufficient information about the underlying statistics of this process comprising a large number of dislocations and obstacles in a system. Here, we propose a new statistical approach, where the statistics of pinning ofmore » dislocations by idealized spherical obstacles is explored by taking into account the generalized size-distribution of the obstacles along with the dislocation density within a three-dimensional framework. Starting with a minimal set of material parameters, the framework employs the method of geometrical statistics with a few simple assumptions compatible with the real physical scenario. The application of this approach, in combination with the knowledge of fundamental dislocation-obstacle interactions, has successfully been demonstrated for dislocation pinning at nanovoids in neutron irradiated type 316-stainless steel in regard to the non-conservative motion of dislocations. An interesting phenomenon of transition from rare pinning to multiple pinning regimes with increasing irradiation temperature is revealed.« less

  9. Massive optimal data compression and density estimation for scalable, likelihood-free inference in cosmology

    NASA Astrophysics Data System (ADS)

    Alsing, Justin; Wandelt, Benjamin; Feeney, Stephen

    2018-07-01

    Many statistical models in cosmology can be simulated forwards but have intractable likelihood functions. Likelihood-free inference methods allow us to perform Bayesian inference from these models using only forward simulations, free from any likelihood assumptions or approximations. Likelihood-free inference generically involves simulating mock data and comparing to the observed data; this comparison in data space suffers from the curse of dimensionality and requires compression of the data to a small number of summary statistics to be tractable. In this paper, we use massive asymptotically optimal data compression to reduce the dimensionality of the data space to just one number per parameter, providing a natural and optimal framework for summary statistic choice for likelihood-free inference. Secondly, we present the first cosmological application of Density Estimation Likelihood-Free Inference (DELFI), which learns a parametrized model for the joint distribution of data and parameters, yielding both the parameter posterior and the model evidence. This approach is conceptually simple, requires less tuning than traditional Approximate Bayesian Computation approaches to likelihood-free inference and can give high-fidelity posteriors from orders of magnitude fewer forward simulations. As an additional bonus, it enables parameter inference and Bayesian model comparison simultaneously. We demonstrate DELFI with massive data compression on an analysis of the joint light-curve analysis supernova data, as a simple validation case study. We show that high-fidelity posterior inference is possible for full-scale cosmological data analyses with as few as ~10^4 simulations, with substantial scope for further improvement, demonstrating the scalability of likelihood-free inference to large and complex cosmological data sets.

  10. Simple heuristics and rules of thumb: where psychologists and behavioural biologists might meet.

    PubMed

    Hutchinson, John M C; Gigerenzer, Gerd

    2005-05-31

    The Centre for Adaptive Behaviour and Cognition (ABC) has hypothesised that much human decision-making can be described by simple algorithmic process models (heuristics). This paper explains this approach and relates it to research in biology on rules of thumb, which we also review. As an example of a simple heuristic, consider the lexicographic strategy of Take The Best for choosing between two alternatives: cues are searched in turn until one discriminates, then search stops and all other cues are ignored. Heuristics consist of building blocks, and building blocks exploit evolved or learned abilities such as recognition memory; it is the complexity of these abilities that allows the heuristics to be simple. Simple heuristics have an advantage in making decisions fast and with little information, and in avoiding overfitting. Furthermore, humans are observed to use simple heuristics. Simulations show that the statistical structures of different environments affect which heuristics perform better, a relationship referred to as ecological rationality. We contrast ecological rationality with the stronger claim of adaptation. Rules of thumb from biology provide clearer examples of adaptation because animals can be studied in the environments in which they evolved. The range of examples is also much more diverse. To investigate them, biologists have sometimes used similar simulation techniques to ABC, but many examples depend on empirically driven approaches. ABC's theoretical framework can be useful in connecting some of these examples, particularly the scattered literature on how information from different cues is integrated. Optimality modelling is usually used to explain less detailed aspects of behaviour but might more often be redirected to investigate rules of thumb.
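    Take The Best is simple enough to state directly in code. The function below is an illustrative implementation of the lexicographic rule described above: cues are inspected in order of validity, the first cue that discriminates decides, and all remaining cues are ignored. The city example and cue values are invented for the demo.

```python
def take_the_best(cues_a, cues_b, validity_order):
    """Lexicographic 'Take The Best' choice between alternatives A and B.

    Cues are 0/1 values; validity_order lists cue indices from most to least valid.
    Returns 'A', 'B', or 'guess' if no cue discriminates.
    """
    for i in validity_order:          # search cues in order of validity
        if cues_a[i] != cues_b[i]:    # the first discriminating cue decides...
            return "A" if cues_a[i] > cues_b[i] else "B"
    return "guess"                    # ...all remaining cues are ignored

# Example: choose the larger of two cities from three binary cues
print(take_the_best(cues_a=[1, 0, 1], cues_b=[1, 1, 0], validity_order=[0, 1, 2]))  # -> 'B'
```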

  11. Assessing differential gene expression with small sample sizes in oligonucleotide arrays using a mean-variance model.

    PubMed

    Hu, Jianhua; Wright, Fred A

    2007-03-01

    The identification of the genes that are differentially expressed in two-sample microarray experiments remains a difficult problem when the number of arrays is very small. We discuss the implications of using ordinary t-statistics and examine other commonly used variants. For oligonucleotide arrays with multiple probes per gene, we introduce a simple model relating the mean and variance of expression, possibly with gene-specific random effects. Parameter estimates from the model have natural shrinkage properties that guard against inappropriately small variance estimates, and the model is used to obtain a differential expression statistic. A limiting value to the positive false discovery rate (pFDR) for ordinary t-tests provides motivation for our use of the data structure to improve variance estimates. Our approach performs well compared to other proposed approaches in terms of the false discovery rate.

  12. Automated sampling assessment for molecular simulations using the effective sample size

    PubMed Central

    Zhang, Xin; Bhatt, Divesh; Zuckerman, Daniel M.

    2010-01-01

    To quantify the progress in the development of algorithms and forcefields used in molecular simulations, a general method for the assessment of the sampling quality is needed. Statistical mechanics principles suggest the populations of physical states characterize equilibrium sampling in a fundamental way. We therefore develop an approach for analyzing the variances in state populations, which quantifies the degree of sampling in terms of the effective sample size (ESS). The ESS estimates the number of statistically independent configurations contained in a simulated ensemble. The method is applicable both to traditional dynamics simulations and to more modern (e.g., multicanonical) approaches. Our procedure is tested in a variety of systems from toy models to atomistic protein simulations. We also introduce a simple automated procedure to obtain approximate physical states from dynamic trajectories: this allows sample-size estimation in systems for which physical states are not known in advance. PMID:21221418
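    A crude version of a population-variance ESS estimate can be written in a few lines: split the trajectory into blocks, measure how much the state populations fluctuate across blocks, and compare that to the binomial variance expected from independent samples. This is only a sketch inspired by the abstract, not the authors' estimator, and the block-based construction is our assumption.

```python
import numpy as np

def effective_sample_size(state_traj, n_blocks=10):
    """Crude ESS estimate from the variance of state populations across blocks.

    state_traj : 1-D array of integer state labels along a trajectory
    """
    traj = np.asarray(state_traj)
    states = np.unique(traj)
    blocks = np.array_split(traj, n_blocks)
    pops = np.array([[np.mean(b == s) for s in states] for b in blocks])
    p_mean = pops.mean(axis=0)
    observed_var = pops.var(axis=0, ddof=1)
    # For N independent samples per block, var(p_hat) = p*(1-p)/N
    ess_per_block = p_mean * (1.0 - p_mean) / np.maximum(observed_var, 1e-12)
    return n_blocks * np.median(ess_per_block)      # total ESS over the trajectory

# Toy usage: a strongly correlated two-state trajectory has far fewer effective samples
rng = np.random.default_rng(0)
traj = np.cumsum(rng.random(50_000) < 0.01) % 2     # rare switching between states 0 and 1
print(effective_sample_size(traj))
```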

  13. Improving Robot Locomotion Through Learning Methods for Expensive Black-Box Systems

    DTIC Science & Technology

    2013-11-01

    development of a class of “gradient free” optimization techniques; these include local approaches, such as a Nelder-Mead simplex search (cf. [73]), and global approaches, such as the Non-dominated Sorting Genetic Algorithm.

  14. Φ⁴ kinks: Statistical mechanics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Habib, S.

    1995-12-31

    Some recent investigations of the thermal equilibrium properties of kinks in a 1+1-dimensional, classical φ⁴ field theory are reviewed. The distribution function, kink density, correlation function, and certain thermodynamic quantities were studied both theoretically and via large scale simulations. A simple double Gaussian variational approach within the transfer operator formalism was shown to give good results in the intermediate temperature range where the dilute gas theory is known to fail.

  15. A simple approach to quantitative analysis using three-dimensional spectra based on selected Zernike moments.

    PubMed

    Zhai, Hong Lin; Zhai, Yue Yuan; Li, Pei Zhen; Tian, Yue Li

    2013-01-21

    A very simple approach to quantitative analysis is proposed based on the technology of digital image processing using three-dimensional (3D) spectra obtained by high-performance liquid chromatography coupled with a diode array detector (HPLC-DAD). As region-based shape features of a grayscale image, Zernike moments, which have inherent invariance properties, were employed to establish the linear quantitative models. This approach was applied to the quantitative analysis of three compounds in mixed samples using 3D HPLC-DAD spectra, and three linear models were obtained, respectively. The correlation coefficients (R²) for training and test sets were more than 0.999, and the statistical parameters and strict validation supported the reliability of the established models. The analytical results suggest that the Zernike moments selected by stepwise regression can be used in the quantitative analysis of target compounds. Our study provides a new idea for quantitative analysis using 3D spectra, which can be extended to the analysis of other 3D spectra obtained by different methods or instruments.

  16. Valid statistical approaches for analyzing sholl data: Mixed effects versus simple linear models.

    PubMed

    Wilson, Machelle D; Sethi, Sunjay; Lein, Pamela J; Keil, Kimberly P

    2017-03-01

    The Sholl technique is widely used to quantify dendritic morphology. Data from such studies, which typically sample multiple neurons per animal, are often analyzed using simple linear models. However, simple linear models fail to account for intra-class correlation that occurs with clustered data, which can lead to faulty inferences. Mixed effects models account for intra-class correlation that occurs with clustered data; thus, these models more accurately estimate the standard deviation of the parameter estimate, which produces more accurate p-values. While mixed models are not new, their use in neuroscience has lagged behind their use in other disciplines. A review of the published literature illustrates common mistakes in analyses of Sholl data. Analysis of Sholl data collected from Golgi-stained pyramidal neurons in the hippocampus of male and female mice using both simple linear and mixed effects models demonstrates that the p-values and standard deviations obtained using the simple linear models are biased downwards and lead to erroneous rejection of the null hypothesis in some analyses. The mixed effects approach more accurately models the true variability in the data set, which leads to correct inference. Mixed effects models avoid faulty inference in Sholl analysis of data sampled from multiple neurons per animal by accounting for intra-class correlation. Given the widespread practice in neuroscience of obtaining multiple measurements per subject, there is a critical need to apply mixed effects models more widely. Copyright © 2017 Elsevier B.V. All rights reserved.
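    In practice the difference between the two analyses is one line of model specification. The sketch below uses statsmodels to fit both a simple linear model and a mixed effects model with a random intercept per animal on synthetic Sholl-type data; all column names, effect sizes, and sample sizes are invented for the illustration, not taken from the study.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic Sholl-type data: 6 animals, 8 neurons each, intersections vs. radius,
# with an animal-level random offset that creates intra-class correlation.
rng = np.random.default_rng(0)
rows = []
for animal in range(6):
    sex = "M" if animal < 3 else "F"
    animal_offset = rng.normal(0, 3)
    for neuron in range(8):
        for radius in np.arange(10, 110, 10):
            y = 25 - 0.15 * radius + (2 if sex == "M" else 0) + animal_offset + rng.normal(0, 2)
            rows.append({"animal": animal, "neuron": neuron, "sex": sex,
                         "radius": radius, "intersections": y})
df = pd.DataFrame(rows)

# Simple linear model: ignores that neurons are clustered within animals
ols_fit = smf.ols("intersections ~ radius + sex", data=df).fit()

# Mixed effects model: a random intercept per animal absorbs the intra-class correlation
lmm_fit = smf.mixedlm("intersections ~ radius + sex", data=df, groups=df["animal"]).fit()

print(ols_fit.bse)   # standard errors for animal-level effects are too small here
print(lmm_fit.bse)   # and larger (more honest) in the mixed model
```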

  17. Expected Monotonicity – A Desirable Property for Evidence Measures?

    PubMed Central

    Hodge, Susan E.; Vieland, Veronica J.

    2010-01-01

    We consider here the principle of ‘evidential consistency’ – that as one gathers more data, any well-behaved evidence measure should, in some sense, approach the true answer. Evidential consistency is essential for the genome-scan design (GWAS or linkage), where one selects the most promising locus(i) for follow-up, expecting that new data will increase evidence for the correct hypothesis. Earlier work [Vieland, Hum Hered 2006;61:144–156] showed that many popular statistics do not satisfy this principle; Vieland concluded that the problem stems from fundamental difficulties in how we measure evidence and argued for determining criteria to evaluate evidence measures. Here, we investigate in detail one proposed consistency criterion – expected monotonicity (ExpM) – for a simple statistical model (binomial) and four likelihood ratio (LR)-based evidence measures. We show that, with one limited exception, none of these measures displays ExpM; what they do display is sometimes counterintuitive. We conclude that ExpM is not a reasonable requirement for evidence measures; moreover, no requirement based on expected values seems feasible. We demonstrate certain desirable properties of the simple LR and demonstrate a connection between the simple and integrated LRs. We also consider an alternative version of consistency, which is satisfied by certain forms of the integrated LR and posterior probability of linkage. PMID:20664208

  18. In defence of model-based inference in phylogeography

    PubMed Central

    Beaumont, Mark A.; Nielsen, Rasmus; Robert, Christian; Hey, Jody; Gaggiotti, Oscar; Knowles, Lacey; Estoup, Arnaud; Panchal, Mahesh; Corander, Jukka; Hickerson, Mike; Sisson, Scott A.; Fagundes, Nelson; Chikhi, Lounès; Beerli, Peter; Vitalis, Renaud; Cornuet, Jean-Marie; Huelsenbeck, John; Foll, Matthieu; Yang, Ziheng; Rousset, Francois; Balding, David; Excoffier, Laurent

    2017-01-01

    Recent papers have promoted the view that model-based methods in general, and those based on Approximate Bayesian Computation (ABC) in particular, are flawed in a number of ways, and are therefore inappropriate for the analysis of phylogeographic data. These papers further argue that Nested Clade Phylogeographic Analysis (NCPA) offers the best approach in statistical phylogeography. In order to remove the confusion and misconceptions introduced by these papers, we justify and explain the reasoning behind model-based inference. We argue that ABC is a statistically valid approach, alongside other computational statistical techniques that have been successfully used to infer parameters and compare models in population genetics. We also examine the NCPA method and highlight numerous deficiencies, whether it is used with single or multiple loci. We further show that the ages of clades are carelessly used to infer ages of demographic events, that these ages are estimated under a simple model of panmixia and population stationarity but are then used under different and unspecified models to test hypotheses, a usage that invalidates these testing procedures. We conclude by encouraging researchers to study and use model-based inference in population genetics. PMID:29284924

  19. Exploratory study on a statistical method to analyse time resolved data obtained during nanomaterial exposure measurements

    NASA Astrophysics Data System (ADS)

    Clerc, F.; Njiki-Menga, G.-H.; Witschger, O.

    2013-04-01

    Most of the measurement strategies that are suggested at the international level to assess workplace exposure to nanomaterials rely on devices measuring, in real time, airborne particle concentrations (according to different metrics). Since none of the instruments to measure aerosols can distinguish a particle of interest from the background aerosol, the statistical analysis of time resolved data requires special attention. So far, very few approaches have been used for statistical analysis in the literature. These range from simple qualitative analysis of graphs to the implementation of more complex statistical models. To date, there is still no consensus on a particular approach, and the search for an appropriate and robust method is ongoing. In this context, this exploratory study investigates a statistical method to analyse time resolved data based on a Bayesian probabilistic approach. To investigate and illustrate the use of this statistical method, particle number concentration data from a workplace study that investigated the potential for exposure via inhalation from cleanout operations by sandpapering of a reactor producing nanocomposite thin films have been used. In this workplace study, the background issue has been addressed through the near-field and far-field approaches and several size integrated and time resolved devices have been used. The analysis of the results presented here focuses only on data obtained with two handheld condensation particle counters. While one was measuring at the source of the released particles, the other one was measuring in parallel far-field. The Bayesian probabilistic approach allows a probabilistic modelling of data series, and the observed task is modelled in the form of probability distributions. The probability distributions issuing from time resolved data obtained at the source can be compared with the probability distributions issuing from the time resolved data obtained far-field, leading to a quantitative estimate of the airborne particles released at the source when the task is performed. Beyond the results obtained, this exploratory study indicates that the analysis of the results requires specific experience in statistics.

  20. Benchmarking Inverse Statistical Approaches for Protein Structure and Design with Exactly Solvable Models.

    PubMed

    Jacquin, Hugo; Gilson, Amy; Shakhnovich, Eugene; Cocco, Simona; Monasson, Rémi

    2016-05-01

    Inverse statistical approaches to determine protein structure and function from Multiple Sequence Alignments (MSA) are emerging as powerful tools in computational biology. However the underlying assumptions of the relationship between the inferred effective Potts Hamiltonian and real protein structure and energetics remain untested so far. Here we use lattice protein model (LP) to benchmark those inverse statistical approaches. We build MSA of highly stable sequences in target LP structures, and infer the effective pairwise Potts Hamiltonians from those MSA. We find that inferred Potts Hamiltonians reproduce many important aspects of 'true' LP structures and energetics. Careful analysis reveals that effective pairwise couplings in inferred Potts Hamiltonians depend not only on the energetics of the native structure but also on competing folds; in particular, the coupling values reflect both positive design (stabilization of native conformation) and negative design (destabilization of competing folds). In addition to providing detailed structural information, the inferred Potts models used as protein Hamiltonian for design of new sequences are able to generate with high probability completely new sequences with the desired folds, which is not possible using independent-site models. Those are remarkable results as the effective LP Hamiltonians used to generate MSA are not simple pairwise models due to the competition between the folds. Our findings elucidate the reasons for the success of inverse approaches to the modelling of proteins from sequence data, and their limitations.
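    The object being inferred in such approaches is a pairwise Potts Hamiltonian, whose energy function is simple to write down. The snippet below evaluates that generic energy for an integer-encoded sequence; the fields h and couplings J are random placeholders, not values inferred from any MSA or lattice-protein ensemble.

```python
import numpy as np

def potts_energy(seq, h, J):
    """Energy of a sequence under a pairwise Potts model,
    E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).

    seq : (L,) integer-encoded sequence
    h   : (L, q) single-site fields
    J   : (L, L, q, q) pairwise couplings (only i < j is used)
    """
    L = len(seq)
    e = -sum(h[i, seq[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            e -= J[i, j, seq[i], seq[j]]
    return e

# Tiny usage with random placeholder parameters (L = 5 sites, q = 20 states)
rng = np.random.default_rng(0)
L_, q_ = 5, 20
h = rng.normal(size=(L_, q_))
J = rng.normal(scale=0.1, size=(L_, L_, q_, q_))
print(potts_energy(rng.integers(q_, size=L_), h, J))
```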

  1. Loop series for discrete statistical models on graphs

    NASA Astrophysics Data System (ADS)

    Chertkov, Michael; Chernyak, Vladimir Y.

    2006-06-01

    In this paper we present the derivation details, logic, and motivation for the three loop calculus introduced in Chertkov and Chernyak (2006 Phys. Rev. E 73 065102(R)). Generating functions for each of the three interrelated discrete statistical models are expressed in terms of a finite series. The first term in the series corresponds to the Bethe-Peierls belief-propagation (BP) contribution; the other terms are labelled by loops on the factor graph. All loop contributions are simple rational functions of spin correlation functions calculated within the BP approach. We discuss two alternative derivations of the loop series. One approach implements a set of local auxiliary integrations over continuous fields with the BP contribution corresponding to an integrand saddle-point value. The integrals are replaced by sums in the complementary approach, briefly explained in Chertkov and Chernyak (2006 Phys. Rev. E 73 065102(R)). Local gauge symmetry transformations that clarify an important invariant feature of the BP solution are revealed in both approaches. The individual terms change under the gauge transformation while the partition function remains invariant. The requirement for all individual terms to be nonzero only for closed loops in the factor graph (as opposed to paths with loose ends) is equivalent to fixing the first term in the series to be exactly equal to the BP contribution. Further applications of the loop calculus to problems in statistical physics, computer and information sciences are discussed.

  2. Efficient multi-atlas abdominal segmentation on clinically acquired CT with SIMPLE context learning.

    PubMed

    Xu, Zhoubing; Burke, Ryan P; Lee, Christopher P; Baucom, Rebeccah B; Poulose, Benjamin K; Abramson, Richard G; Landman, Bennett A

    2015-08-01

    Abdominal segmentation on clinically acquired computed tomography (CT) has been a challenging problem given the inter-subject variance of human abdomens and complex 3-D relationships among organs. Multi-atlas segmentation (MAS) provides a potentially robust solution by leveraging label atlases via image registration and statistical fusion. We posit that the efficiency of atlas selection requires further exploration in the context of substantial registration errors. The selective and iterative method for performance level estimation (SIMPLE) method is a MAS technique integrating atlas selection and label fusion that has proven effective for prostate radiotherapy planning. Herein, we revisit atlas selection and fusion techniques for segmenting 12 abdominal structures using clinically acquired CT. Using a re-derived SIMPLE algorithm, we show that performance on multi-organ classification can be improved by accounting for exogenous information through Bayesian priors (so called context learning). These innovations are integrated with the joint label fusion (JLF) approach to reduce the impact of correlated errors among selected atlases for each organ, and a graph cut technique is used to regularize the combined segmentation. In a study of 100 subjects, the proposed method outperformed other comparable MAS approaches, including majority vote, SIMPLE, JLF, and the Wolz locally weighted vote technique. The proposed technique provides consistent improvement over state-of-the-art approaches (median improvement of 7.0% and 16.2% in DSC over JLF and Wolz, respectively) and moves toward efficient segmentation of large-scale clinically acquired CT data for biomarker screening, surgical navigation, and data mining. Copyright © 2015 Elsevier B.V. All rights reserved.

  3. Analysis and modeling of wafer-level process variability in 28 nm FD-SOI using split C-V measurements

    NASA Astrophysics Data System (ADS)

    Pradeep, Krishna; Poiroux, Thierry; Scheer, Patrick; Juge, André; Gouget, Gilles; Ghibaudo, Gérard

    2018-07-01

    This work details the analysis of wafer-level global process variability in 28 nm FD-SOI using split C-V measurements. The proposed approach initially evaluates the native on-wafer process variability using efficient extraction methods on split C-V measurements. The on-wafer threshold voltage (VT) variability is first studied and modeled using a simple analytical model. Then, a statistical model based on the Leti-UTSOI compact model is proposed to describe the total C-V variability in different bias conditions. This statistical model is finally used to study the contribution of each process parameter to the total C-V variability.

  4. Statistical foundations of liquid-crystal theory

    PubMed Central

    Seguin, Brian; Fried, Eliot

    2013-01-01

    We develop a mechanical theory for systems of rod-like particles. Central to our approach is the assumption that the external power expenditure for any subsystem of rods is independent of the underlying frame of reference. This assumption is used to derive the basic balance laws for forces and torques. By considering inertial forces on par with other forces, these laws hold relative to any frame of reference, inertial or noninertial. Finally, we introduce a simple set of constitutive relations to govern the interactions between rods and find restrictions necessary and sufficient for these laws to be consistent with thermodynamics. Our framework provides a foundation for a statistical mechanical derivation of the macroscopic balance laws governing liquid crystals. PMID:23772091

  5. Robust matching for voice recognition

    NASA Astrophysics Data System (ADS)

    Higgins, Alan; Bahler, L.; Porter, J.; Blais, P.

    1994-10-01

    This paper describes an automated method of comparing a voice sample of an unknown individual with samples from known speakers in order to establish or verify the individual's identity. The method is based on a statistical pattern matching approach that employs a simple training procedure, requires no human intervention (transcription, word or phonetic marking, etc.), and makes no assumptions regarding the expected form of the statistical distributions of the observations. The content of the speech material (vocabulary, grammar, etc.) is not assumed to be constrained in any way. An algorithm is described which incorporates frame pruning and channel equalization processes designed to achieve robust performance with reasonable computational resources. An experimental implementation demonstrating the feasibility of the concept is described.

  6. Statistical theory of chromatography: new outlooks for affinity chromatography.

    PubMed Central

    Denizot, F C; Delaage, M A

    1975-01-01

    We have developed further the statistical approach to chromatography initiated by Giddings and Eyring, and applied it to affinity chromatography. By means of a convenient expression of moments the convergence towards the Laplace-Gauss distribution has been established. The Gaussian character is not preserved if other causes of dispersion are taken into account, but expressions of moments can be obtained in a generalized form. A simple procedure is deduced for expressing the fundamental constants of the model in terms of purely experimental quantities. Thus, affinity chromatography can be used to determine rate constants of association and dissociation in a range considered as the domain of the stopped-flow methods. PMID:1061072

  7. A simple signaling rule for variable life-adjusted display derived from an equivalent risk-adjusted CUSUM chart.

    PubMed

    Wittenberg, Philipp; Gan, Fah Fatt; Knoth, Sven

    2018-04-17

    The variable life-adjusted display (VLAD) is the first risk-adjusted graphical procedure proposed in the literature for monitoring the performance of a surgeon. It displays the cumulative sum of expected minus observed deaths. It has since become highly popular because the statistic plotted is easy to understand. But it is also easy to misinterpret a surgeon's performance by utilizing the VLAD, potentially leading to grave consequences. The problem of misinterpretation is essentially caused by the variance of the VLAD's statistic that increases with sample size. In order for the VLAD to be truly useful, a simple signaling rule is desperately needed. Various forms of signaling rules have been developed, but they are usually quite complicated. Without signaling rules, making inferences using the VLAD alone is difficult if not misleading. In this paper, we establish an equivalence between a VLAD with V-mask and a risk-adjusted cumulative sum (RA-CUSUM) chart based on the difference between the estimated probability of death and surgical outcome. Average run length analysis based on simulation shows that this particular RA-CUSUM chart has similar performance as compared to the established RA-CUSUM chart based on the log-likelihood ratio statistic obtained by testing the odds ratio of death. We provide a simple design procedure for determining the V-mask parameters based on a resampling approach. Resampling from a real data set ensures that these parameters can be estimated appropriately. Finally, we illustrate the monitoring of a real surgeon's performance using VLAD with V-mask. Copyright © 2018 John Wiley & Sons, Ltd.
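    The VLAD statistic itself is a one-liner: the cumulative sum of expected minus observed deaths over consecutive cases. The sketch below computes it for made-up risks and outcomes; the signaling rule (the V-mask / RA-CUSUM equivalence) developed in the paper is not reproduced here.

```python
import numpy as np

def vlad(expected_risk, died):
    """Variable life-adjusted display: cumulative sum of expected minus observed deaths.

    expected_risk : per-case predicted probability of death (risk-adjusted)
    died          : per-case outcome, 1 = death, 0 = survival
    A rising curve indicates fewer deaths than expected ('lives saved').
    """
    return np.cumsum(np.asarray(expected_risk, dtype=float) - np.asarray(died, dtype=float))

# Toy usage with invented risks and outcomes (not data from the paper)
risk = [0.05, 0.20, 0.10, 0.50, 0.02]
outcome = [0, 0, 1, 0, 0]
print(vlad(risk, outcome))   # approximately [0.05, 0.25, -0.65, -0.15, -0.13]
```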

  8. A Simple Illustration for the Need of Multiple Comparison Procedures

    ERIC Educational Resources Information Center

    Carter, Rickey E.

    2010-01-01

    Statistical adjustments to accommodate multiple comparisons are routinely covered in introductory statistical courses. The fundamental rationale for such adjustments, however, may not be readily understood. This article presents a simple illustration to help remedy this.

  9. Probability distributions of molecular observables computed from Markov models. II. Uncertainties in observables and their time-evolution

    NASA Astrophysics Data System (ADS)

    Chodera, John D.; Noé, Frank

    2010-09-01

    Discrete-state Markov (or master equation) models provide a useful simplified representation for characterizing the long-time statistical evolution of biomolecules in a manner that allows direct comparison with experiments as well as the elucidation of mechanistic pathways for an inherently stochastic process. A vital part of meaningful comparison with experiment is the characterization of the statistical uncertainty in the predicted experimental measurement, which may take the form of an equilibrium measurement of some spectroscopic signal, the time-evolution of this signal following a perturbation, or the observation of some statistic (such as the correlation function) of the equilibrium dynamics of a single molecule. Without meaningful error bars (which arise from both approximation and statistical error), there is no way to determine whether the deviations between model and experiment are statistically meaningful. Previous work has demonstrated that a Bayesian method that enforces microscopic reversibility can be used to characterize the statistical component of correlated uncertainties in state-to-state transition probabilities (and functions thereof) for a model inferred from molecular simulation data. Here, we extend this approach to include the uncertainty in observables that are functions of molecular conformation (such as surrogate spectroscopic signals) characterizing each state, permitting the full statistical uncertainty in computed spectroscopic experiments to be assessed. We test the approach in a simple model system to demonstrate that the computed uncertainties provide a useful indicator of statistical variation, and then apply it to the computation of the fluorescence autocorrelation function measured for a dye-labeled peptide previously studied by both experiment and simulation.

  10. A simple and fast representation space for classifying complex time series

    NASA Astrophysics Data System (ADS)

    Zunino, Luciano; Olivares, Felipe; Bariviera, Aurelio F.; Rosso, Osvaldo A.

    2017-03-01

    In the context of time series analysis considerable effort has been directed towards the implementation of efficient discriminating statistical quantifiers. Very recently, a simple and fast representation space has been introduced, namely the number of turning points versus the Abbe value. It is able to separate time series from stationary and non-stationary processes with long-range dependences. In this work we show that this bidimensional approach is useful for distinguishing complex time series: different sets of financial and physiological data are efficiently discriminated. Additionally, a multiscale generalization that takes into account the multiple time scales often involved in complex systems has been also proposed. This multiscale analysis is essential to reach a higher discriminative power between physiological time series in health and disease.
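    Both coordinates of this representation space are cheap to compute. The sketch below uses a turning-point fraction and the Abbe value (half the mean squared successive difference over the variance); the exact normalizations may differ from the paper's, so treat it as an illustration of the idea rather than a reproduction of it.

```python
import numpy as np

def turning_points_fraction(x):
    """Fraction of interior points that are strict local maxima or minima."""
    x = np.asarray(x, dtype=float)
    interior = (x[1:-1] - x[:-2]) * (x[1:-1] - x[2:]) > 0
    return interior.mean()

def abbe_value(x):
    """Abbe value: half the mean squared successive difference over the variance."""
    x = np.asarray(x, dtype=float)
    return 0.5 * np.mean(np.diff(x) ** 2) / np.var(x)

# White noise sits near (2/3, 1) in this plane; a random walk falls well below.
rng = np.random.default_rng(0)
noise = rng.normal(size=10_000)
walk = np.cumsum(noise)
print(turning_points_fraction(noise), abbe_value(noise))  # about 0.667 and 1.0
print(turning_points_fraction(walk), abbe_value(walk))    # about 0.5 and close to 0
```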

  11. A Recommended Procedure for Estimating the Cosmic-Ray Spectral Parameter of a Simple Power Law With Applications to Detector Design

    NASA Technical Reports Server (NTRS)

    Howell, L. W.

    2001-01-01

    A simple power law model consisting of a single spectral index α1 is believed to be an adequate description of the galactic cosmic-ray (GCR) proton flux at energies below 10^13 eV. Two procedures for estimating α1, the method of moments and maximum likelihood (ML), are developed and their statistical performance compared. It is concluded that the ML procedure attains the most desirable statistical properties and is hence the recommended statistical estimation procedure for estimating α1. The ML procedure is then generalized for application to a set of real cosmic-ray data and thereby makes this approach applicable to existing cosmic-ray data sets. Several other important results, such as the relationship between collecting power and detector energy resolution, as well as inclusion of a non-Gaussian detector response function, are presented. These results have many practical benefits in the design phase of a cosmic-ray detector as they permit instrument developers to make important trade studies in design parameters as a function of one of the science objectives. This is particularly important for space-based detectors where physical parameters, such as dimension and weight, impose rigorous practical limits to the design envelope.
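    For a pure power law above a known threshold, the ML estimate of the index has a simple closed form, which is the standard result the recommendation above builds on. The snippet below applies it to synthetic events; the true index and threshold are arbitrary demo values, and the detector response effects discussed in the report are not modelled.

```python
import numpy as np

def ml_spectral_index(energies, e_min):
    """Maximum-likelihood estimate of alpha for dN/dE proportional to E**(-alpha), E >= e_min."""
    e = np.asarray(energies, dtype=float)
    e = e[e >= e_min]
    return 1.0 + e.size / np.sum(np.log(e / e_min))

# Check on synthetic events drawn from a power law with alpha = 2.7
rng = np.random.default_rng(0)
alpha_true, e_min = 2.7, 1.0
u = rng.uniform(size=100_000)
energies = e_min * (1.0 - u) ** (-1.0 / (alpha_true - 1.0))   # inverse-CDF sampling
print(ml_spectral_index(energies, e_min))                     # close to 2.7
```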

  12. Efficient statistical tests to compare Youden index: accounting for contingency correlation.

    PubMed

    Chen, Fangyao; Xue, Yuqiang; Tan, Ming T; Chen, Pingyan

    2015-04-30

    Youden index is widely utilized in studies evaluating accuracy of diagnostic tests and performance of predictive, prognostic, or risk models. However, both one and two independent sample tests on Youden index have been derived ignoring the dependence (association) between sensitivity and specificity, resulting in potentially misleading findings. Besides, paired sample test on Youden index is currently unavailable. This article develops efficient statistical inference procedures for one sample, independent, and paired sample tests on Youden index by accounting for contingency correlation, namely associations between sensitivity and specificity and paired samples typically represented in contingency tables. For one and two independent sample tests, the variances are estimated by Delta method, and the statistical inference is based on the central limit theory, which are then verified by bootstrap estimates. For paired samples test, we show that the estimated covariance of the two sensitivities and specificities can be represented as a function of kappa statistic so the test can be readily carried out. We then show the remarkable accuracy of the estimated variance using a constrained optimization approach. Simulation is performed to evaluate the statistical properties of the derived tests. The proposed approaches yield more stable type I errors at the nominal level and substantially higher power (efficiency) than does the original Youden's approach. Therefore, the simple explicit large sample solution performs very well. Because we can readily implement the asymptotic and exact bootstrap computation with common software like R, the method is broadly applicable to the evaluation of diagnostic tests and model performance. Copyright © 2015 John Wiley & Sons, Ltd.

  13. Power-law behaviour evaluation from foreign exchange market data using a wavelet transform method

    NASA Astrophysics Data System (ADS)

    Wei, H. L.; Billings, S. A.

    2009-09-01

    Numerous studies in the literature have shown that the dynamics of many time series including observations in foreign exchange markets exhibit scaling behaviours. A simple new statistical approach, derived from the concept of the continuous wavelet transform correlation function (WTCF), is proposed for the evaluation of power-law properties from observed data. The new method reveals that foreign exchange rates obey power-laws and thus belong to the class of self-similarity processes.

  14. Verifying the botanical authenticity of commercial tannins through sugars and simple phenols profiles.

    PubMed

    Malacarne, Mario; Nardin, Tiziana; Bertoldi, Daniela; Nicolini, Giorgio; Larcher, Roberto

    2016-09-01

    Commercial tannins from several botanical sources and with different chemical and technological characteristics are used in the food and winemaking industries. Different ways to check their botanical authenticity have been studied in the last few years, through investigation of different analytical parameters. This work proposes a new, effective approach based on the quantification of 6 carbohydrates, 7 polyalcohols, and 55 phenols. 87 tannins from 12 different botanical sources were analysed following a very simple sample preparation procedure. Using Forward Stepwise Discriminant Analysis, 3 statistical models were created based on sugars content, phenols concentration and combination of the two classes of compounds for the 8 most abundant categories (i.e. oak, grape seed, grape skin, gall, chestnut, quebracho, tea and acacia). The last approach provided good results in attributing tannins to the correct botanical origin. Validation, repeated 3 times on subsets of 10% of samples, confirmed the reliability of this model. Copyright © 2016 Elsevier Ltd. All rights reserved.
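    A rough analogue of the forward stepwise discriminant workflow can be assembled from scikit-learn pieces, swapping the classical Wilks'-lambda stepwise selection for sequential forward feature selection wrapped around an LDA classifier. The synthetic data, class count, and number of selected features below are all invented stand-ins for the real compositional profiles.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the profiles: 8 botanical classes, 68 measured compounds
# (6 sugars + 7 polyalcohols + 55 phenols), 12 samples per class.
rng = np.random.default_rng(0)
n_classes, n_per_class, n_compounds = 8, 12, 68
X = np.vstack([rng.normal(loc=rng.normal(0, 1, n_compounds), scale=1.0,
                          size=(n_per_class, n_compounds)) for _ in range(n_classes)])
y = np.repeat(np.arange(n_classes), n_per_class)

# Forward selection of discriminating compounds wrapped around an LDA classifier
lda = LinearDiscriminantAnalysis()
selector = SequentialFeatureSelector(lda, n_features_to_select=10, direction="forward")
X_sel = selector.fit_transform(X, y)
print(cross_val_score(lda, X_sel, y, cv=3).mean())    # rough classification accuracy
```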

  15. Obtaining natural-like flow releases in diverted river reaches from simple riparian benefit economic models.

    PubMed

    Perona, Paolo; Dürrenmatt, David J; Characklis, Gregory W

    2013-03-30

    We propose a theoretical river modeling framework for generating variable flow patterns in diverted streams (i.e., no reservoir). Using a simple economic model and the principle of equal marginal utility in an inverse fashion, we first quantify the benefit of the water that goes to the environment in relation to that of the anthropic activity. Then, we obtain exact expressions for optimal water allocation rules between the two competing uses, as well as the related statistical distributions. These rules are applied using both synthetic and observed streamflow data, to demonstrate that this approach may be useful in 1) generating more natural flow patterns in the river reach downstream of the diversion, thus reducing the ecodeficit; 2) obtaining a more enlightened economic interpretation of Minimum Flow Release (MFR) strategies; and 3) comparing the long-term costs and benefits of variable versus MFR policies and showing the greater ecological sustainability of this new approach. Copyright © 2013 Elsevier Ltd. All rights reserved.

  16. The importance of proving the null.

    PubMed

    Gallistel, C R

    2009-04-01

    Null hypotheses are simple, precise, and theoretically important. Conventional statistical analysis cannot support them; Bayesian analysis can. The challenge in a Bayesian analysis is to formulate a suitably vague alternative, because the vaguer the alternative is (the more it spreads out the unit mass of prior probability), the more the null is favored. A general solution is a sensitivity analysis: Compute the odds for or against the null as a function of the limit(s) on the vagueness of the alternative. If the odds on the null approach 1 from above as the hypothesized maximum size of the possible effect approaches 0, then the data favor the null over any vaguer alternative to it. The simple computations and the intuitive graphic representation of the analysis are illustrated by the analysis of diverse examples from the current literature. They pose 3 common experimental questions: (a) Are 2 means the same? (b) Is performance at chance? (c) Are factors additive? © 2009 APA, all rights reserved
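    For the "is performance at chance?" question, the sensitivity analysis can be sketched directly: compute the odds for the point null against a uniform alternative whose width is the hypothesized maximum effect size, and watch how the odds behave as that width shrinks toward zero. The binomial setting and the specific numbers below are illustrative assumptions, not examples taken from the article.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import binom

def odds_for_null(k, n, p0=0.5, max_effect=0.25):
    """Bayes factor of the point null p = p0 against an alternative that is
    uniform on [p0 - max_effect, p0 + max_effect]."""
    lo, hi = max(p0 - max_effect, 0.0), min(p0 + max_effect, 1.0)
    marginal_alt, _ = quad(lambda p: binom.pmf(k, n, p) / (hi - lo), lo, hi)
    return binom.pmf(k, n, p0) / marginal_alt

# Odds on the null as the assumed maximum effect size shrinks toward zero
for w in [0.5, 0.25, 0.1, 0.05, 0.01]:
    print(w, odds_for_null(k=52, n=100, max_effect=w))   # odds approach 1 as w -> 0
```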

  17. Simple F Test Reveals Gene-Gene Interactions in Case-Control Studies

    PubMed Central

    Chen, Guanjie; Yuan, Ao; Zhou, Jie; Bentley, Amy R.; Adeyemo, Adebowale; Rotimi, Charles N.

    2012-01-01

    Missing heritability is still a challenge for Genome Wide Association Studies (GWAS). Gene-gene interactions may partially explain this residual genetic influence and contribute broadly to complex disease. To analyze the gene-gene interactions in case-control studies of complex disease, we propose a simple, non-parametric method that utilizes the F-statistic. This approach consists of three steps. First, we examine the joint distribution of a pair of SNPs in cases and controls separately. Second, an F-test is used to evaluate the ratio of dependence in cases to that of controls. Finally, results are adjusted for multiple tests. This method was used to evaluate gene-gene interactions that are associated with risk of Type 2 Diabetes among African Americans in the Howard University Family Study. We identified 18 gene-gene interactions (P < 0.0001). Compared with the commonly-used logistical regression method, we demonstrate that the F-ratio test is an efficient approach to measuring gene-gene interactions, especially for studies with limited sample size. PMID:22837643

  18. Local and global approaches to the problem of Poincaré recurrences. Applications in nonlinear dynamics

    NASA Astrophysics Data System (ADS)

    Anishchenko, V. S.; Boev, Ya. I.; Semenova, N. I.; Strelkova, G. I.

    2015-07-01

    We review rigorous and numerical results on the statistics of Poincaré recurrences which are related to the modern development of the Poincaré recurrence problem. We analyze and describe the rigorous results which are achieved both in the classical (local) approach and in the recently developed global approach. These results are illustrated by numerical simulation data for simple chaotic and ergodic systems. It is shown that the basic theoretical laws can be applied to noisy systems if the probability measure is ergodic and stationary. Poincaré recurrences are studied numerically in nonautonomous systems. Statistical characteristics of recurrences are analyzed in the framework of the global approach for the cases of positive and zero topological entropy. We show that for positive entropy, there is a relationship between the Afraimovich-Pesin dimension, Lyapunov exponents and the Kolmogorov-Sinai entropy, both without and in the presence of external noise. The case of zero topological entropy is exemplified by numerical results for the Poincaré recurrence statistics in the circle map. We show and prove that the dependence of minimal recurrence times on the return region size demonstrates universal properties for the golden and the silver ratio. The behavior of Poincaré recurrences is analyzed at the critical point of Feigenbaum attractor birth. We explore Poincaré recurrences for an ergodic set which is generated in the stroboscopic section of a nonautonomous oscillator and is similar to a circle shift. Based on the obtained results we show how the Poincaré recurrence statistics can be applied for solving a number of nonlinear dynamics issues. We propose and illustrate alternative methods for diagnosing effects of external and mutual synchronization of chaotic systems in the context of the local and global approaches. The properties of the recurrence time probability density can be used to detect the stochastic resonance phenomenon. We also discuss how the fractal dimension of chaotic attractors can be estimated using the Poincaré recurrence statistics.

  19. Comparing and combining process-based crop models and statistical models with some implications for climate change

    NASA Astrophysics Data System (ADS)

    Roberts, Michael J.; Braun, Noah O.; Sinclair, Thomas R.; Lobell, David B.; Schlenker, Wolfram

    2017-09-01

    We compare predictions of a simple process-based crop model (Soltani and Sinclair 2012), a simple statistical model (Schlenker and Roberts 2009), and a combination of both models to actual maize yields on a large, representative sample of farmer-managed fields in the Corn Belt region of the United States. After statistical post-model calibration, the process model (Simple Simulation Model, or SSM) predicts actual outcomes slightly better than the statistical model, but the combined model performs significantly better than either model. The SSM, statistical model and combined model all show similar relationships with precipitation, while the SSM better accounts for temporal patterns of precipitation, vapor pressure deficit and solar radiation. The statistical and combined models show a more negative impact associated with extreme heat for which the process model does not account. Due to the extreme heat effect, predicted impacts under uniform climate change scenarios are considerably more severe for the statistical and combined models than for the process-based model.

  20. PDF approach for turbulent scalar field: Some recent developments

    NASA Technical Reports Server (NTRS)

    Gao, Feng

    1993-01-01

    The probability density function (PDF) method has proven to be a very useful approach in turbulence research. It has been particularly effective in simulating turbulent reacting flows and in studying some detailed statistical properties generated by a turbulent field. There are, however, some important questions that have yet to be answered in PDF studies. Our efforts in the past year have been focused on two areas. First, a simple mixing model suitable for Monte Carlo simulations has been developed based on the mapping closure. Secondly, the mechanism of turbulent transport has been analyzed in order to understand the recently observed abnormal PDFs of turbulent temperature fields generated by linear heat sources.

  1. A Statistical Approach to Illustrate the Challenge of Astrobiology for Public Outreach.

    PubMed

    Foucher, Frédéric; Hickman-Lewis, Keyron; Westall, Frances; Brack, André

    2017-10-26

    In this study, we attempt to illustrate the competition that constitutes the main challenge of astrobiology, namely the competition between the probability of extraterrestrial life and its detectability. To illustrate this fact, we propose a simple statistical approach based on our knowledge of the Universe and the Milky Way, the Solar System, and the evolution of life on Earth permitting us to obtain the order of magnitude of the distance between Earth and bodies inhabited by more or less evolved past or present life forms, and the consequences of this probability for the detection of associated biosignatures. We thus show that the probability of the existence of evolved extraterrestrial forms of life increases with distance from the Earth while, at the same time, the number of detectable biosignatures decreases due to technical and physical limitations. This approach allows us to easily explain to the general public why it is very improbable to detect a signal of extraterrestrial intelligence while it is justified to launch space probes dedicated to the search for microbial life in the Solar System.

  2. A Statistical Approach to Illustrate the Challenge of Astrobiology for Public Outreach

    PubMed Central

    Westall, Frances; Brack, André

    2017-01-01

    In this study, we attempt to illustrate the competition that constitutes the main challenge of astrobiology, namely the competition between the probability of extraterrestrial life and its detectability. To illustrate this fact, we propose a simple statistical approach based on our knowledge of the Universe and the Milky Way, the Solar System, and the evolution of life on Earth permitting us to obtain the order of magnitude of the distance between Earth and bodies inhabited by more or less evolved past or present life forms, and the consequences of this probability for the detection of associated biosignatures. We thus show that the probability of the existence of evolved extraterrestrial forms of life increases with distance from the Earth while, at the same time, the number of detectable biosignatures decreases due to technical and physical limitations. This approach allows us to easily explain to the general public why it is very improbable to detect a signal of extraterrestrial intelligence while it is justified to launch space probes dedicated to the search for microbial life in the Solar System. PMID:29072614

  3. Applying compressive sensing to TEM video: A substantial frame rate increase on any camera

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stevens, Andrew; Kovarik, Libor; Abellan, Patricia

    One of the main limitations of imaging at high spatial and temporal resolution during in-situ transmission electron microscopy (TEM) experiments is the frame rate of the camera being used to image the dynamic process. While the recent development of direct detectors has provided the hardware to achieve frame rates approaching 0.1 ms, the cameras are expensive and must replace existing detectors. In this paper, we examine the use of coded aperture compressive sensing (CS) methods to increase the frame rate of any camera with simple, low-cost hardware modifications. The coded aperture approach allows multiple sub-frames to be coded and integrated into a single camera frame during the acquisition process, and then extracted upon readout using statistical CS inversion. Here we describe the background of CS and statistical methods in depth and simulate the frame rates and efficiencies for in-situ TEM experiments. Depending on the resolution and signal/noise of the image, it should be possible to increase the speed of any camera by more than an order of magnitude using this approach.

  4. Applying compressive sensing to TEM video: A substantial frame rate increase on any camera

    DOE PAGES

    Stevens, Andrew; Kovarik, Libor; Abellan, Patricia; ...

    2015-08-13

    One of the main limitations of imaging at high spatial and temporal resolution during in-situ transmission electron microscopy (TEM) experiments is the frame rate of the camera being used to image the dynamic process. While the recent development of direct detectors has provided the hardware to achieve frame rates approaching 0.1 ms, the cameras are expensive and must replace existing detectors. In this paper, we examine the use of coded aperture compressive sensing (CS) methods to increase the frame rate of any camera with simple, low-cost hardware modifications. The coded aperture approach allows multiple sub-frames to be coded and integrated into a single camera frame during the acquisition process, and then extracted upon readout using statistical CS inversion. Here we describe the background of CS and statistical methods in depth and simulate the frame rates and efficiencies for in-situ TEM experiments. Depending on the resolution and signal/noise of the image, it should be possible to increase the speed of any camera by more than an order of magnitude using this approach.

  5. On-line estimation of error covariance parameters for atmospheric data assimilation

    NASA Technical Reports Server (NTRS)

    Dee, Dick P.

    1995-01-01

    A simple scheme is presented for on-line estimation of covariance parameters in statistical data assimilation systems. The scheme is based on a maximum-likelihood approach in which estimates are produced on the basis of a single batch of simultaneous observations. Single-sample covariance estimation is reasonable as long as the number of available observations exceeds the number of tunable parameters by two or three orders of magnitude. Not much is known at present about model error associated with actual forecast systems. Our scheme can be used to estimate some important statistical model error parameters such as regionally averaged variances or characteristic correlation length scales. The advantage of the single-sample approach is that it does not rely on any assumptions about the temporal behavior of the covariance parameters: time-dependent parameter estimates can be continuously adjusted on the basis of current observations. This is of practical importance since it is likely to be the case that both model error and observation error strongly depend on the actual state of the atmosphere. The single-sample estimation scheme can be incorporated into any four-dimensional statistical data assimilation system that involves explicit calculation of forecast error covariances, including optimal interpolation (OI) and the simplified Kalman filter (SKF). The computational cost of the scheme is high but not prohibitive; on-line estimation of one or two covariance parameters in each analysis box of an operational boxed-OI system is currently feasible. A number of numerical experiments performed with an adaptive SKF and an adaptive version of OI, using a linear two-dimensional shallow-water model and artificially generated model error, are described. The performance of the nonadaptive versions of these methods turns out to depend rather strongly on correct specification of model error parameters. These parameters are estimated under a variety of conditions, including uniformly distributed model error and time-dependent model error statistics.
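    As a rough illustration of the single-batch maximum-likelihood idea (a sketch under assumed values, not the operational scheme described above), the snippet below estimates a forecast-error variance and a correlation length scale from one batch of observation-minus-forecast innovations, whose covariance combines an exponential forecast-error correlation with uncorrelated observation error.

```python
# Single-batch ML estimation of covariance parameters (assumed 1-D geometry).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 200                                   # stations in one analysis region (assumed)
x = rng.uniform(0, 1000.0, n)             # 1-D station coordinates, km (assumed)
sig_o2_true, sig_f2_true, L_true = 1.0, 2.0, 150.0

dist = np.abs(x[:, None] - x[None, :])
def innov_cov(sig_f2, L, sig_o2=sig_o2_true):
    # innovation covariance = forecast-error part (exponential correlation) + obs error
    return sig_f2 * np.exp(-dist / L) + sig_o2 * np.eye(n)

# One batch of simultaneous observation-minus-forecast innovations.
d = rng.multivariate_normal(np.zeros(n), innov_cov(sig_f2_true, L_true))

def neg_log_lik(theta):
    sig_f2, L = np.exp(theta)             # log parametrization keeps parameters positive
    S = innov_cov(sig_f2, L)
    _, logdet = np.linalg.slogdet(S)
    return 0.5 * (logdet + d @ np.linalg.solve(S, d))

res = minimize(neg_log_lik, x0=np.log([1.0, 100.0]), method="Nelder-Mead")
print("ML estimates (sig_f^2, L):", np.exp(res.x))
```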

  6. Statistical power and optimal design in experiments in which samples of participants respond to samples of stimuli.

    PubMed

    Westfall, Jacob; Kenny, David A; Judd, Charles M

    2014-10-01

    Researchers designing experiments in which a sample of participants responds to a sample of stimuli are faced with difficult questions about optimal study design. The conventional procedures of statistical power analysis fail to provide appropriate answers to these questions because they are based on statistical models in which stimuli are not assumed to be a source of random variation in the data, models that are inappropriate for experiments involving crossed random factors of participants and stimuli. In this article, we present new methods of power analysis for designs with crossed random factors, and we give detailed, practical guidance to psychology researchers planning experiments in which a sample of participants responds to a sample of stimuli. We extensively examine 5 commonly used experimental designs, describe how to estimate statistical power in each, and provide power analysis results based on a reasonable set of default parameter values. We then develop general conclusions and formulate rules of thumb concerning the optimal design of experiments in which a sample of participants responds to a sample of stimuli. We show that in crossed designs, statistical power typically does not approach unity as the number of participants goes to infinity but instead approaches a maximum attainable power value that is possibly small, depending on the stimulus sample. We also consider the statistical merits of designs involving multiple stimulus blocks. Finally, we provide a simple and flexible Web-based power application to aid researchers in planning studies with samples of stimuli.
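    The plateau described in the abstract is easy to reproduce by simulation. The sketch below (assumed variance components and a stimuli-nested-in-condition design; it is not the authors' Web application) aggregates responses over participants and tests the condition effect across stimuli, showing that power rises with the number of participants but levels off well below 1 once stimulus variability dominates.

```python
# Simulation sketch: power plateaus in a crossed participants-by-stimuli design.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
effect, sd_ppt, sd_stim, sd_res = 0.5, 1.0, 0.4, 2.0   # assumed variance components
n_stim_per_cond = 8

def power(n_ppt, n_sims=2000):
    hits = 0
    for _ in range(n_sims):
        ppt = rng.normal(0, sd_ppt, n_ppt)[:, None]            # participant intercepts
        stim = rng.normal(0, sd_stim, 2 * n_stim_per_cond)      # stimulus intercepts
        cond = np.repeat([0.0, effect], n_stim_per_cond)        # condition means
        y = cond + stim + ppt + rng.normal(0, sd_res, (n_ppt, 2 * n_stim_per_cond))
        by_stim = y.mean(axis=0)                                # aggregate over participants
        _, p = stats.ttest_ind(by_stim[n_stim_per_cond:], by_stim[:n_stim_per_cond])
        hits += p < 0.05
    return hits / n_sims

for n_ppt in (10, 50, 500):
    print(n_ppt, "participants -> power ~", power(n_ppt))
```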

  7. Is There a Critical Distance for Fickian Transport? - a Statistical Approach to Sub-Fickian Transport Modelling in Porous Media

    NASA Astrophysics Data System (ADS)

    Most, S.; Nowak, W.; Bijeljic, B.

    2014-12-01

    Transport processes in porous media are frequently simulated as particle movement. This process can be formulated as a stochastic process of particle position increments. At the pore scale, the geometry and micro-heterogeneities prohibit the commonly made assumption of independent and normally distributed increments to represent dispersion. Many recent particle methods seek to loosen this assumption. Recent experimental data suggest that we have not yet reached the end of the need to generalize, because particle increments show statistical dependency beyond linear correlation and over many time steps. The goal of this work is to better understand the validity regions of commonly made assumptions. We investigate after what transport distances we can observe: (1) a statistical dependence between increments that can be modelled as an order-k Markov process reducing to order 1, which would mark the Markovian distance for the process, where the validity of yet-unexplored non-Gaussian-but-Markovian random walks would start; (2) a bivariate statistical dependence that simplifies to a multi-Gaussian dependence based on simple linear correlation (validity of correlated PTRW); and (3) complete absence of statistical dependence (validity of classical PTRW/CTRW). The approach is to derive a statistical model for pore-scale transport from a powerful experimental data set via copula analysis. The model is formulated as a non-Gaussian, mutually dependent Markov process of higher order, which allows us to investigate the validity ranges of simpler models.

  8. Statistical limitations in functional neuroimaging. I. Non-inferential methods and statistical models.

    PubMed Central

    Petersson, K M; Nichols, T E; Poline, J B; Holmes, A P

    1999-01-01

    Functional neuroimaging (FNI) provides experimental access to the intact living brain making it possible to study higher cognitive functions in humans. In this review and in a companion paper in this issue, we discuss some common methods used to analyse FNI data. The emphasis in both papers is on assumptions and limitations of the methods reviewed. There are several methods available to analyse FNI data indicating that none is optimal for all purposes. In order to make optimal use of the methods available it is important to know the limits of applicability. For the interpretation of FNI results it is also important to take into account the assumptions, approximations and inherent limitations of the methods used. This paper gives a brief overview over some non-inferential descriptive methods and common statistical models used in FNI. Issues relating to the complex problem of model selection are discussed. In general, proper model selection is a necessary prerequisite for the validity of the subsequent statistical inference. The non-inferential section describes methods that, combined with inspection of parameter estimates and other simple measures, can aid in the process of model selection and verification of assumptions. The section on statistical models covers approaches to global normalization and some aspects of univariate, multivariate, and Bayesian models. Finally, approaches to functional connectivity and effective connectivity are discussed. In the companion paper we review issues related to signal detection and statistical inference. PMID:10466149

  9. Uncertainty Quantification Techniques for Population Density Estimates Derived from Sparse Open Source Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stewart, Robert N; White, Devin A; Urban, Marie L

    2013-01-01

    The Population Density Tables (PDT) project at the Oak Ridge National Laboratory (www.ornl.gov) is developing population density estimates for specific human activities under normal patterns of life based largely on information available in open source. Currently, activity based density estimates are based on simple summary data statistics such as range and mean. Researchers are interested in improving activity estimation and uncertainty quantification by adopting a Bayesian framework that considers both data and sociocultural knowledge. Under a Bayesian approach, knowledge about population density may be encoded through the process of expert elicitation. Due to the scale of the PDT effort, which considers over 250 countries, spans 40 human activity categories, and includes numerous contributors, an elicitation tool is required that can be operationalized within an enterprise data collection and reporting system. Such a method would ideally require that the contributor have minimal statistical knowledge, require minimal input by a statistician or facilitator, consider human difficulties in expressing qualitative knowledge in a quantitative setting, and provide methods by which the contributor can appraise whether their understanding and associated uncertainty was well captured. This paper introduces an algorithm that transforms answers to simple, non-statistical questions into a bivariate Gaussian distribution as the prior for the Beta distribution. Based on geometric properties of the Beta distribution parameter feasibility space and the bivariate Gaussian distribution, an automated method for encoding is developed that responds to these challenging enterprise requirements. Though created within the context of population density, this approach may be applicable to a wide array of problem domains requiring informative priors for the Beta distribution.
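    For orientation only, the snippet below shows the generic idea of encoding elicited knowledge as a Beta prior by moment matching an expert's stated mean and uncertainty; the paper's actual algorithm works through a bivariate Gaussian over the Beta parameter space and is not reproduced here.

```python
# Generic moment-matching elicitation sketch (not the paper's encoding algorithm).
def beta_from_elicitation(mean, sd):
    """Return (a, b) of a Beta distribution with the elicited mean and sd."""
    if not 0 < mean < 1:
        raise ValueError("mean must lie strictly between 0 and 1")
    var = sd ** 2
    if var >= mean * (1 - mean):
        raise ValueError("sd too large for a Beta distribution with this mean")
    k = mean * (1 - mean) / var - 1          # total concentration a + b
    return mean * k, (1 - mean) * k

# Hypothetical expert answer: the activity fraction is about 0.3, give or take 0.1.
a, b = beta_from_elicitation(0.3, 0.1)
print(f"Beta({a:.2f}, {b:.2f}) prior")
```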

  10. The questioned p value: clinical, practical and statistical significance.

    PubMed

    Jiménez-Paneque, Rosa

    2016-09-09

    The use of the p-value and statistical significance has been questioned from the early 1980s to the present day. Much has been discussed about it in the field of statistics and its applications, especially in Epidemiology and Public Health. As a matter of fact, the p-value and its equivalent, statistical significance, are difficult concepts to grasp for the many health professionals who are in some way involved in research applied to their work areas. However, their meaning should be clear in intuitive terms even though they are based on theoretical concepts from the field of Statistics. This paper attempts to present the p-value as a concept that applies to everyday life and is therefore intuitively simple, but whose proper use cannot be separated from theoretical and methodological elements of inherent complexity. The reasons behind the criticism received by the p-value and its isolated use are intuitively explained, mainly the need to demarcate statistical significance from clinical significance, and some of the recommended remedies for these problems are discussed as well. The paper finally refers to the current trend to vindicate the p-value, appealing to the convenience of its use in certain situations, and to the recent statement of the American Statistical Association in this regard.

  11. A Methodology to Separate and Analyze a Seismic Wide Angle Profile

    NASA Astrophysics Data System (ADS)

    Weinzierl, Wolfgang; Kopp, Heidrun

    2010-05-01

    General solutions of inverse problems can often be obtained through the introduction of probability distributions to sample the model space. We present a simple approach to defining an a priori space in a tomographic study and retrieve the velocity-depth posterior distribution by a Monte Carlo method. Utilizing a fitting routine designed for very low statistics to set up and analyze the obtained tomography results, it is possible to statistically separate the velocity-depth model space derived from the inversion of seismic refraction data. An example of a profile acquired in the Lesser Antilles subduction zone reveals the effectiveness of this approach. The resolution analysis of the structural heterogeneity includes a divergence analysis which proves to be capable of dissecting long wide-angle profiles for deep crust and upper mantle studies. The complete information of any parameterised physical system is contained in the a posteriori distribution. Methods for analyzing and displaying key properties of the a posteriori distributions of highly nonlinear inverse problems are therefore essential in the scope of any interpretation. From this study we infer several conclusions concerning the interpretation of the tomographic approach. By calculating global as well as singular misfits of velocities, we are able to map different geological units along the profile. Comparing velocity distributions with the result of a tomographic inversion along the profile, we can mimic the subsurface structures in their extent and composition. The possibility of gaining a priori information for seismic refraction analysis by a simple solution to an inverse problem, and of subsequently resolving structural heterogeneities through a divergence analysis, is a new and simple way of defining the a priori space and estimating the a posteriori mean and covariance in singular and general form. The major advantage of a Monte Carlo based approach in our case study is the obtained knowledge of velocity-depth distributions. Certainly the decision of where to extract velocity information on the profile for setting up a Monte Carlo ensemble limits the a priori space. However, the general conclusion of analyzing the velocity field according to distinct reference distributions gives us the possibility to define the covariance according to any geological unit if we have a priori information on the velocity-depth distributions. Using the wide-angle data recorded across the Lesser Antilles arc, we are able to resolve a shallow feature like the backstop by a robust and simple divergence analysis. We demonstrate the effectiveness of the new methodology to extract key features and properties from the inversion results by including information concerning the confidence level of the results.

  12. A new method to address verification bias in studies of clinical screening tests: cervical cancer screening assays as an example.

    PubMed

    Xue, Xiaonan; Kim, Mimi Y; Castle, Philip E; Strickler, Howard D

    2014-03-01

    Studies to evaluate clinical screening tests often face the problem that the "gold standard" diagnostic approach is costly and/or invasive. It is therefore common to verify only a subset of negative screening tests using the gold standard method. However, undersampling the screen negatives can lead to substantial overestimation of the sensitivity and underestimation of the specificity of the diagnostic test. Our objective was to develop a simple and accurate statistical method to address this "verification bias." We developed a weighted generalized estimating equation approach to estimate, in a single model, the accuracy (eg, sensitivity/specificity) of multiple assays and simultaneously compare results between assays while addressing verification bias. This approach can be implemented using standard statistical software. Simulations were conducted to assess the proposed method. An example is provided using a cervical cancer screening trial that compared the accuracy of human papillomavirus and Pap tests, with histologic data as the gold standard. The proposed approach performed well in estimating and comparing the accuracy of multiple assays in the presence of verification bias. The proposed approach is an easy to apply and accurate method for addressing verification bias in studies of multiple screening methods. Copyright © 2014 Elsevier Inc. All rights reserved.
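    A toy numerical illustration of the bias and of the inverse-probability-weighting idea behind such corrections (the paper's method is a weighted GEE fitted jointly across assays; the sketch below only weights a single test's sensitivity and uses made-up prevalence and accuracy values):

```python
# Verification bias: naive vs inverse-probability-weighted sensitivity (toy data).
import numpy as np

rng = np.random.default_rng(3)
n, prev, sens_true, spec_true = 100_000, 0.05, 0.80, 0.90
disease = rng.random(n) < prev
screen = np.where(disease, rng.random(n) < sens_true, rng.random(n) < 1 - spec_true)

p_verify = np.where(screen, 1.0, 0.10)      # verify all positives, 10% of negatives
verified = rng.random(n) < p_verify
w = 1.0 / p_verify                          # weight = 1 / P(verified | screen result)

v, d, s = verified, disease, screen
naive_sens = (v & d & s).sum() / (v & d).sum()
ipw_sens = (w * (v & d & s)).sum() / (w * (v & d)).sum()
print(f"true {sens_true:.2f}  naive {naive_sens:.2f}  weighted {ipw_sens:.2f}")
```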

  13. Full of Sound and Fury, Signifying Nothing? A Reply to Dave Hill's "Race and Class in Britain: A Critique of the Statistical Basis for Critical Race Theory in Britain"

    ERIC Educational Resources Information Center

    Gillborn, David

    2010-01-01

    This paper is a reply to an earlier piece by Dave Hill, in this journal, that attacked critical race theory (CRT) in general and my own work in particular. I begin with a brief introduction to CRT which highlights the differences between the reality, of a broad and dynamic approach, as opposed to the simple and monolithic version constructed by…

  14. Predicting the Ability of Marine Mammal Populations to Compensate for Behavioral Disturbances

    DTIC Science & Technology

    2015-09-30

    approaches, including simple theoretical models as well as statistical analysis of data-rich conditions. Building on models developed for PCoD [2,3], we...conditions is population trajectory most likely to be affected (the central aim of PCoD). For the revised model presented here, we include a population...averaged condition individuals (here used as a proxy for individual health as defined in PCoD), and E is the quality of the environment in which the

  15. A multi-resolution approach for optimal mass transport

    NASA Astrophysics Data System (ADS)

    Dominitz, Ayelet; Angenent, Sigurd; Tannenbaum, Allen

    2007-09-01

    Optimal mass transport is an important technique with numerous applications in econometrics, fluid dynamics, automatic control, statistical physics, shape optimization, expert systems, and meteorology. Motivated by certain problems in image registration and medical image visualization, in this note, we describe a simple gradient descent methodology for computing the optimal L2 transport mapping which may be easily implemented using a multiresolution scheme. We also indicate how the optimal transport map may be computed on the sphere. A numerical example is presented illustrating our ideas.
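    For intuition, the one-dimensional special case is easy to compute exactly: the optimal L2 transport map between two samples is monotone, so sorting gives both the map and the cost. The sketch below (simulated samples; this is not the paper's multiresolution gradient descent) shows this.

```python
# 1-D optimal L2 transport by quantile (sorted-sample) matching.
import numpy as np

rng = np.random.default_rng(7)
source = rng.normal(0.0, 1.0, 5000)
target = rng.gamma(2.0, 1.0, 5000)

src_sorted = np.sort(source)
tgt_sorted = np.sort(target)

def transport(x):
    """Monotone map pushing the source sample distribution onto the target's."""
    return np.interp(x, src_sorted, tgt_sorted)

cost = np.mean((tgt_sorted - src_sorted) ** 2)     # squared-L2 transport cost estimate
print("estimated optimal transport cost:", cost)
print("image of 0 under the map:", transport(0.0))
```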

  16. Perspectives on jet noise

    NASA Technical Reports Server (NTRS)

    Ribner, H. S.

    1981-01-01

    Jet noise is a byproduct of turbulence. Until recently, turbulence was assumed to be known statistically, and jet noise was computed therefrom. However, as a result of new findings on the behavior of vortices and instability waves, a more integrated view of the problem has lately been accepted. After presenting a simple view of jet noise, the paper attempts to resolve the apparent differences between Lighthill's and Lilley's interpretations of mean-flow shear, and examines a number of ad hoc approaches to jet noise suppression.

  17. Finite-sample and asymptotic sign-based tests for parameters of non-linear quantile regression with Markov noise

    NASA Astrophysics Data System (ADS)

    Sirenko, M. A.; Tarasenko, P. F.; Pushkarev, M. I.

    2017-01-01

    One of the most noticeable features of sign-based statistical procedures is an opportunity to build an exact test for simple hypothesis testing of parameters in a regression model. In this article, we expanded the sign-based approach to the nonlinear case with dependent noise. The examined model is a multi-quantile regression, which makes it possible to test hypotheses not only about regression parameters, but about noise parameters as well.

  18. Discriminating between Graduates and Failure in the USAF Medical Laboratory Specialist School: An Explorative Approach.

    DTIC Science & Technology

    1981-12-01

    occurred on the Introversion Scale of the NMPI. 20 A review of the use of psychological tests on MT's was accomplished by Driver and Feeley [1974...programs, Gondek [1981] has recommended that the best procedure for variable inclusion when using a stepwise procedure is to use the threshold default...values supplied by the package, since no simple rules exist for determining entry or removal thresholds for partial F’s, tolerance statistics, or any of

  19. Molecular vibrational energy flow

    NASA Astrophysics Data System (ADS)

    Gruebele, M.; Bigwood, R.

    This article reviews some recent work in molecular vibrational energy flow (IVR), with emphasis on our own computational and experimental studies. We consider the problem in various representations, and use these to develop a family of simple models which combine specific molecular properties (e.g. size, vibrational frequencies) with statistical properties of the potential energy surface and wavefunctions. This marriage of molecular detail and statistical simplification captures trends of IVR mechanisms and survival probabilities beyond the abilities of purely statistical models or the computational limitations of full ab initio approaches. Of particular interest is IVR in the intermediate time regime, where heavy-atom skeletal modes take over the IVR process from hydrogenic motions even upon X H bond excitation. Experiments and calculations on prototype heavy-atom systems show that intermediate time IVR differs in many aspects from the early stages of hydrogenic mode IVR. As a result, IVR can be coherently frozen, with potential applications to selective chemistry.

  20. Artificial neural network study on organ-targeting peptides

    NASA Astrophysics Data System (ADS)

    Jung, Eunkyoung; Kim, Junhyoung; Choi, Seung-Hoon; Kim, Minkyoung; Rhee, Hokyoung; Shin, Jae-Min; Choi, Kihang; Kang, Sang-Kee; Lee, Nam Kyung; Choi, Yun-Jaie; Jung, Dong Hyun

    2010-01-01

    We report a new approach to studying organ targeting of peptides on the basis of peptide sequence information. The positive control data sets consist of organ-targeting peptide sequences identified by the peroral phage-display technique for four organs, and the negative control data are prepared from random sequences. The capacity of our models to make appropriate predictions is validated by statistical indicators including sensitivity, specificity, enrichment curve, and the area under the receiver operating characteristic (ROC) curve (the ROC score). VHSE descriptor produces statistically significant training models and the models with simple neural network architectures show slightly greater predictive power than those with complex ones. The training and test set statistics indicate that our models could discriminate between organ-targeting and random sequences. We anticipate that our models will be applicable to the selection of organ-targeting peptides for generating peptide drugs or peptidomimetics.

  1. Velocity bias in the distribution of dark matter halos

    NASA Astrophysics Data System (ADS)

    Baldauf, Tobias; Desjacques, Vincent; Seljak, Uroš

    2015-12-01

    The standard formalism for the coevolution of halos and dark matter predicts that any initial halo velocity bias rapidly decays to zero. We argue that, when the purpose is to compute statistics like power spectra etc., the coupling in the momentum conservation equation for the biased tracers must be modified. Our new formulation predicts the constancy in time of any statistical halo velocity bias present in the initial conditions, in agreement with peak theory. We test this prediction by studying the evolution of a conserved halo population in N -body simulations. We establish that the initial simulated halo density and velocity statistics show distinct features of the peak model and, thus, deviate from the simple local Lagrangian bias. We demonstrate, for the first time, that the time evolution of their velocity is in tension with the rapid decay expected in the standard approach.

  2. Effects of Nongray Opacity on Radiatively Driven Wolf-Rayet Winds

    NASA Astrophysics Data System (ADS)

    Onifer, A. J.; Gayley, K. G.

    2002-05-01

    Wolf-Rayet winds are characterized by their large momentum fluxes, and simulations of radiation driving have been increasingly successful in modeling these winds. Simple analytic approaches that help understand the most critical processes for copious momentum deposition already exist in the effectively gray approximation, but these have not been extended to more realistic nongray opacities. With this in mind, we have developed a simplified theory for describing the interaction of the stellar flux with nongray wind opacity. We replace the detailed line list with a set of statistical parameters that are sensitive not only to the strength but also the wavelength distribution of lines, incorporating as a free parameter the rate of photon frequency redistribution. We label the resulting flux-weighted opacity the statistical Sobolev-Rosseland (SSR) mean, and explore how changing these various statistical parameters affects the flux/opacity interaction. We wish to acknowledge NSF grant AST-0098155.

  3. Simple instruments facilitating achievement of transanal total mesorectal excision in male patients.

    PubMed

    Xu, Chang; Song, Hua-Yu; Han, Shao-Liang; Ni, Shi-Chang; Zhang, Hu-Xiang; Xing, Chun-Gen

    2017-08-21

    To assess the efficacy of a modified approach with transanal total mesorectal excision (taTME) using simple customized instruments in male patients with low rectal cancer. A total of 115 male patients with low rectal cancer from December 2006 to August 2015 were retrospectively studied. All patients had a bulky tumor (tumor diameter ≥ 40 mm). Forty-one patients (group A) underwent a classical approach of transabdominal total mesorectal excision (TME) and transanal intersphincteric resection (ISR), and the other 74 patients (group B) underwent a modified approach with transabdominal TME, transanal ISR, and taTME. Some simple instruments including modified retractors and an anal dilator with a papilionaceous fixture were used to perform taTME. The operative time, quality of mesorectal excision, circumferential resection margin, local recurrence, and postoperative survival were evaluated. All 115 patients had successful sphincter preservation. The operative time in group B (240 min, range: 160-330 min) was significantly shorter than that in group A (280 min, range: 200-360 min; P = 0.000). Compared with group A, more complete distal mesorectum and total mesorectum were achieved in group B (100% vs 75.6%, P = 0.000; 90.5% vs 70.7%, P = 0.008, respectively). After 46.1 ± 25.6 mo follow-up, group B had a lower local recurrence rate and higher disease-free survival rate compared with group A, but these differences were not statistically significant (5.4% vs 14.6%, P = 0.093; 79.5% vs 65.1%, P = 0.130). Retrograde taTME with simple customized instruments can achieve high-quality TME, and it might be an effective and economical alternative for male patients with bulky tumors.

  4. Simple instruments facilitating achievement of transanal total mesorectal excision in male patients

    PubMed Central

    Xu, Chang; Song, Hua-Yu; Han, Shao-Liang; Ni, Shi-Chang; Zhang, Hu-Xiang; Xing, Chun-Gen

    2017-01-01

    AIM To assess the efficacy of a modified approach with transanal total mesorectal excision (taTME) using simple customized instruments in male patients with low rectal cancer. METHODS A total of 115 male patients with low rectal cancer from December 2006 to August 2015 were retrospectively studied. All patients had a bulky tumor (tumor diameter ≥ 40 mm). Forty-one patients (group A) underwent a classical approach of transabdominal total mesorectal excision (TME) and transanal intersphincteric resection (ISR), and the other 74 patients (group B) underwent a modified approach with transabdominal TME, transanal ISR, and taTME. Some simple instruments including modified retractors and an anal dilator with a papilionaceous fixture were used to perform taTME. The operative time, quality of mesorectal excision, circumferential resection margin, local recurrence, and postoperative survival were evaluated. RESULTS All 115 patients had successful sphincter preservation. The operative time in group B (240 min, range: 160-330 min) was significantly shorter than that in group A (280 min, range: 200-360 min; P = 0.000). Compared with group A, more complete distal mesorectum and total mesorectum were achieved in group B (100% vs 75.6%, P = 0.000; 90.5% vs 70.7%, P = 0.008, respectively). After 46.1 ± 25.6 mo follow-up, group B had a lower local recurrence rate and higher disease-free survival rate compared with group A, but these differences were not statistically significant (5.4% vs 14.6%, P = 0.093; 79.5% vs 65.1%, P = 0.130). CONCLUSION Retrograde taTME with simple customized instruments can achieve high-quality TME, and it might be an effective and economical alternative for male patients with bulky tumors. PMID:28883706

  5. A Simple Test of Class-Level Genetic Association Can Reveal Novel Cardiometabolic Trait Loci.

    PubMed

    Qian, Jing; Nunez, Sara; Reed, Eric; Reilly, Muredach P; Foulkes, Andrea S

    2016-01-01

    Characterizing the genetic determinants of complex diseases can be further augmented by incorporating knowledge of underlying structure or classifications of the genome, such as newly developed mappings of protein-coding genes, epigenetic marks, enhancer elements and non-coding RNAs. We apply a simple class-level testing framework, termed Genetic Class Association Testing (GenCAT), to identify protein-coding gene association with 14 cardiometabolic (CMD) related traits across 6 publicly available genome wide association (GWA) meta-analysis data resources. GenCAT uses SNP-level meta-analysis test statistics across all SNPs within a class of elements, as well as the size of the class and its unique correlation structure, to determine if the class is statistically meaningful. The novelty of findings is evaluated through investigation of regional signals. A subset of findings are validated using recently updated, larger meta-analysis resources. A simulation study is presented to characterize overall performance with respect to power, control of family-wise error and computational efficiency. All analysis is performed using the GenCAT package, R version 3.2.1. We demonstrate that class-level testing complements the common first stage minP approach that involves individual SNP-level testing followed by post-hoc ascribing of statistically significant SNPs to genes and loci. GenCAT suggests 54 protein-coding genes at 41 distinct loci for the 13 CMD traits investigated in the discovery analysis, that are beyond the discoveries of minP alone. An additional application to biological pathways demonstrates flexibility in defining genetic classes. We conclude that it would be prudent to include class-level testing as standard practice in GWA analysis. GenCAT, for example, can be used as a simple, complementary and efficient strategy for class-level testing that leverages existing data resources, requires only summary level data in the form of test statistics, and adds significant value with respect to its potential for identifying multiple novel and clinically relevant trait associations.
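    The following sketch shows a generic class-level test in the same spirit, combining SNP-level z-statistics within a gene through a quadratic form that accounts for their correlation (this is an illustrative stand-in, not GenCAT's exact statistic; the LD matrix and z-scores are simulated):

```python
# Generic class-level association test: quadratic form of correlated z-scores.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
k = 5                                        # SNPs in the class (assumed)
corr = 0.4 ** np.abs(np.subtract.outer(np.arange(k), np.arange(k)))   # toy LD structure

# SNP-level meta-analysis z statistics under a modest shared signal.
z = rng.multivariate_normal(np.full(k, 1.0), corr)

T = z @ np.linalg.solve(corr, z)             # quadratic-form class statistic
p = stats.chi2.sf(T, df=k)                   # chi-square(k) reference under the global null
print(f"class statistic {T:.2f}, p = {p:.3g}")
```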

  6. Spectral likelihood expansions for Bayesian inference

    NASA Astrophysics Data System (ADS)

    Nagel, Joseph B.; Sudret, Bruno

    2016-03-01

    A spectral approach to Bayesian inference is presented. It pursues the emulation of the posterior probability density. The starting point is a series expansion of the likelihood function in terms of orthogonal polynomials. From this spectral likelihood expansion all statistical quantities of interest can be calculated semi-analytically. The posterior is formally represented as the product of a reference density and a linear combination of polynomial basis functions. Both the model evidence and the posterior moments are related to the expansion coefficients. This formulation avoids Markov chain Monte Carlo simulation and allows one to make use of linear least squares instead. The pros and cons of spectral Bayesian inference are discussed and demonstrated on the basis of simple applications from classical statistics and inverse modeling.
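    A minimal one-dimensional sketch of the idea, assuming a uniform prior on [-1, 1] and a Gaussian likelihood: fit the likelihood with a Legendre series by least squares, then read the evidence and the posterior mean directly off the first two coefficients, checking both against quadrature.

```python
# Spectral likelihood expansion in 1-D with a uniform prior (illustrative only).
import numpy as np
from numpy.polynomial import legendre as L

# Likelihood of one observation y_obs = 0.3 under the model y = x with Gaussian noise.
y_obs, sigma = 0.3, 0.4
lik = lambda x: np.exp(-0.5 * ((y_obs - x) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x_train = np.linspace(-1, 1, 200)                 # "experimental design" points
c = L.legfit(x_train, lik(x_train), deg=10)       # spectral likelihood coefficients

# With a uniform prior (density 1/2), orthogonality of the Legendre P_k gives:
evidence = c[0]                                   # Z = (1/2) * c0 * int P0 dx = c0
post_mean = c[1] / (3.0 * c[0])                   # (1/2Z) * c1 * int x P1 dx = c1 / (3 c0)

# Cross-check against brute-force integration.
xs = np.linspace(-1, 1, 10_001)
dx = xs[1] - xs[0]
w = lik(xs) * 0.5
print("evidence:", evidence, "vs", (w * dx).sum())
print("posterior mean:", post_mean, "vs", (xs * w * dx).sum() / (w * dx).sum())
```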

  7. Statistical foundations of liquid-crystal theory: I. Discrete systems of rod-like molecules.

    PubMed

    Seguin, Brian; Fried, Eliot

    2012-12-01

    We develop a mechanical theory for systems of rod-like particles. Central to our approach is the assumption that the external power expenditure for any subsystem of rods is independent of the underlying frame of reference. This assumption is used to derive the basic balance laws for forces and torques. By considering inertial forces on par with other forces, these laws hold relative to any frame of reference, inertial or noninertial. Finally, we introduce a simple set of constitutive relations to govern the interactions between rods and find restrictions necessary and sufficient for these laws to be consistent with thermodynamics. Our framework provides a foundation for a statistical mechanical derivation of the macroscopic balance laws governing liquid crystals.

  8. Nongrayness Effects in Wolf-Rayet Wind Momentum Deposition

    NASA Astrophysics Data System (ADS)

    Onifer, A. J.; Gayley, K. G.

    2004-05-01

    Wolf-Rayet winds are characterized by their large momentum fluxes and optically thick winds. A simple analytic approach that helps to understand the most critical processes is the effectively gray approximation, but this has not been generalized to more realistic nongray opacities. We have developed a simplified theory for describing the interaction of the stellar flux with nongray wind opacity. We replace the detailed line list with a set of statistical parameters that are sensitive to the line strengths as well as the wavelength distribution of lines. We determine these statistical parameters for several real line lists, exploring the effects of temperature and density changes on the efficiency of momentum driving relative to gray opacity. We wish to acknowledge NSF grant AST-0098155.

  9. Recent applications of boxed molecular dynamics: a simple multiscale technique for atomistic simulations.

    PubMed

    Booth, Jonathan; Vazquez, Saulo; Martinez-Nunez, Emilio; Marks, Alison; Rodgers, Jeff; Glowacki, David R; Shalashilin, Dmitrii V

    2014-08-06

    In this paper, we briefly review the boxed molecular dynamics (BXD) method which allows analysis of thermodynamics and kinetics in complicated molecular systems. BXD is a multiscale technique, in which thermodynamics and long-time dynamics are recovered from a set of short-time simulations. In this paper, we review previous applications of BXD to peptide cyclization, solution phase organic reaction dynamics and desorption of ions from self-assembled monolayers (SAMs). We also report preliminary results of simulations of diamond etching mechanisms and protein unfolding in atomic force microscopy experiments. The latter demonstrate a correlation between the protein's structural motifs and its potential of mean force. Simulations of these processes by standard molecular dynamics (MD) is typically not possible, because the experimental time scales are very long. However, BXD yields well-converged and physically meaningful results. Compared with other methods of accelerated MD, our BXD approach is very simple; it is easy to implement, and it provides an integrated approach for simultaneously obtaining both thermodynamics and kinetics. It also provides a strategy for obtaining statistically meaningful dynamical results in regions of configuration space that standard MD approaches would visit only very rarely.

  10. A novel multivariate approach using science-based calibration for direct coating thickness determination in real-time NIR process monitoring.

    PubMed

    Möltgen, C-V; Herdling, T; Reich, G

    2013-11-01

    This study demonstrates an approach, using science-based calibration (SBC), for direct coating thickness determination on heart-shaped tablets in real-time. Near-Infrared (NIR) spectra were collected during four full industrial pan coating operations. The tablets were coated with a thin hydroxypropyl methylcellulose (HPMC) film up to a film thickness of 28 μm. The application of SBC permits the calibration of the NIR spectral data without using costly determined reference values. This is due to the fact that SBC combines classical methods to estimate the coating signal and statistical methods for the noise estimation. The approach enabled the use of NIR for the measurement of the film thickness increase from around 8 to 28 μm of four independent batches in real-time. The developed model provided a spectroscopic limit of detection for the coating thickness of 0.64 ± 0.03 μm root-mean square (RMS). In the commonly used statistical methods for calibration, such as Partial Least Squares (PLS), sufficiently varying reference values are needed for calibration. For thin non-functional coatings this is a challenge because the quality of the model depends on the accuracy of the selected calibration standards. The obvious and simple approach of SBC eliminates many of the problems associated with the conventional statistical methods and offers an alternative for multivariate calibration. Copyright © 2013 Elsevier B.V. All rights reserved.

  11. Metaplot: a novel stata graph for assessing heterogeneity at a glance.

    PubMed

    Poorolajal, J; Mahmoodi, M; Majdzadeh, R; Fotouhi, A

    2010-01-01

    Heterogeneity is usually a major concern in meta-analysis. Although there are some statistical approaches for assessing variability across studies, here we present a new approach to heterogeneity using "MetaPlot", which investigates the influence of a single study on the overall heterogeneity. MetaPlot is a two-way (x, y) graph, which can be considered as a complementary graphical approach for testing heterogeneity. This method shows graphically as well as numerically the results of an influence analysis, in which Higgins' I(2) statistic with its 95% confidence interval (CI) is computed omitting one study in each turn and is then plotted against the reciprocal of the standard error (1/SE), or "precision". In this graph, 1/SE lies on the x axis and the I(2) results lie on the y axis. Having a first glance at MetaPlot, one can predict to what extent omission of a single study may influence the overall heterogeneity. The precision on the x-axis enables us to distinguish the size of each trial. The graph describes the I(2) statistic with its 95% CI graphically as well as numerically in one view for prompt comparison. It is possible to implement MetaPlot for meta-analysis of different types of outcome data and summary measures. This method presents a simple graphical approach to identify an outlier and its effect on overall heterogeneity at a glance. We wish to suggest to Stata experts that they prepare a MetaPlot module for the software.
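    A short Python sketch of the same leave-one-out idea (illustrative effect sizes and standard errors; this is not the Stata module the authors propose) computes Higgins' I(2) with each study omitted in turn and plots it against that study's precision:

```python
# Leave-one-out I^2 plotted against precision (MetaPlot-style, toy data).
import numpy as np
import matplotlib.pyplot as plt

effect = np.array([0.10, 0.25, 0.30, 0.90, 0.20, 0.15])   # hypothetical log odds ratios
se = np.array([0.10, 0.15, 0.12, 0.20, 0.25, 0.11])

def i_squared(eff, s):
    w = 1.0 / s**2
    pooled = (w * eff).sum() / w.sum()
    Q = (w * (eff - pooled) ** 2).sum()              # Cochran's Q
    df = len(eff) - 1
    return max(0.0, (Q - df) / Q) * 100 if Q > 0 else 0.0

loo_i2 = [i_squared(np.delete(effect, k), np.delete(se, k)) for k in range(len(effect))]

plt.scatter(1 / se, loo_i2)
plt.axhline(i_squared(effect, se), linestyle="--", label="all studies")
plt.xlabel("precision (1 / SE)")
plt.ylabel("I2 (%) omitting each study")
plt.legend()
plt.show()
```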

  12. Metaplot: A Novel Stata Graph for Assessing Heterogeneity at a Glance

    PubMed Central

    Poorolajal, J; Mahmoodi, M; Majdzadeh, R; Fotouhi, A

    2010-01-01

    Background: Heterogeneity is usually a major concern in meta-analysis. Although there are some statistical approaches for assessing variability across studies, here we present a new approach to heterogeneity using “MetaPlot”, which investigates the influence of a single study on the overall heterogeneity. Methods: MetaPlot is a two-way (x, y) graph, which can be considered as a complementary graphical approach for testing heterogeneity. This method shows graphically as well as numerically the results of an influence analysis, in which Higgins’ I2 statistic with its 95% confidence interval (CI) is computed omitting one study in each turn and is then plotted against the reciprocal of the standard error (1/SE), or “precision”. In this graph, 1/SE lies on the x axis and the I2 results lie on the y axis. Results: Having a first glance at MetaPlot, one can predict to what extent omission of a single study may influence the overall heterogeneity. The precision on the x-axis enables us to distinguish the size of each trial. The graph describes the I2 statistic with its 95% CI graphically as well as numerically in one view for prompt comparison. It is possible to implement MetaPlot for meta-analysis of different types of outcome data and summary measures. Conclusion: This method presents a simple graphical approach to identify an outlier and its effect on overall heterogeneity at a glance. We wish to suggest to Stata experts that they prepare a MetaPlot module for the software. PMID:23113013

  13. Unbiased split variable selection for random survival forests using maximally selected rank statistics.

    PubMed

    Wright, Marvin N; Dankowski, Theresa; Ziegler, Andreas

    2017-04-15

    The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption may not always be fulfilled. An alternative approach for survival prediction is random forests for survival outcomes. The standard split criterion for random survival forests is the log-rank test statistic, which favors splitting variables with many possible split points. Conditional inference forests avoid this split variable selection bias. However, linear rank statistics are utilized by default in conditional inference forests to select the optimal splitting variable, which cannot detect non-linear effects in the independent variables. An alternative is to use maximally selected rank statistics for the split point selection. As in conditional inference forests, splitting variables are compared on the p-value scale. However, instead of the conditional Monte-Carlo approach used in conditional inference forests, p-value approximations are employed. We describe several p-value approximations and the implementation of the proposed random forest approach. A simulation study demonstrates that unbiased split variable selection is possible. However, there is a trade-off between unbiased split variable selection and runtime. In benchmark studies of prediction performance on simulated and real datasets, the new method performs better than random survival forests if informative dichotomous variables are combined with uninformative variables with more categories and better than conditional inference forests if non-linear covariate effects are included. In a runtime comparison, the method proves to be computationally faster than both alternatives, if a simple p-value approximation is used. Copyright © 2017 John Wiley & Sons, Ltd.

  14. A simple approach to estimate daily loads of total, refractory, and labile organic carbon from their seasonal loads in a watershed.

    PubMed

    Ouyang, Ying; Grace, Johnny M; Zipperer, Wayne C; Hatten, Jeff; Dewey, Janet

    2018-05-22

    Loads of naturally occurring total organic carbons (TOC), refractory organic carbon (ROC), and labile organic carbon (LOC) in streams control the availability of nutrients and the solubility and toxicity of contaminants and affect biological activities through absorption of light and complex metals with production of carcinogenic compounds. Although computer models have become increasingly popular in understanding and management of TOC, ROC, and LOC loads in streams, the usefulness of these models hinges on the availability of daily data for model calibration and validation. Unfortunately, these daily data are usually insufficient and/or unavailable for most watersheds due to a variety of reasons, such as budget and time constraints. A simple approach was developed here to calculate daily loads of TOC, ROC, and LOC in streams based on their seasonal loads. We concluded that the predictions from our approach adequately match field measurements based on statistical comparisons between model calculations and field measurements. Our approach demonstrates that an increase in stream discharge results in increased stream TOC, ROC, and LOC concentrations and loads, although high peak discharge did not necessarily result in high peaks of TOC, ROC, and LOC concentrations and loads. The approach developed herein is a useful tool to convert seasonal loads of TOC, ROC, and LOC into daily loads in the absence of measured daily load data.

  15. Applications of Ergodic Theory to Coverage Analysis

    NASA Technical Reports Server (NTRS)

    Lo, Martin W.

    2003-01-01

    The study of differential equations, or dynamical systems in general, has two fundamentally different approaches. We are most familiar with the construction of solutions to differential equations. Another approach is to study the statistical behavior of the solutions. Ergodic Theory is one of the most developed methods to study the statistical behavior of the solutions of differential equations. In the theory of satellite orbits, the statistical behavior of the orbits is used to produce 'Coverage Analysis', or how often a spacecraft is in view of a site on the ground. In this paper, we consider the use of Ergodic Theory for Coverage Analysis. This allows us to greatly simplify the computation of quantities such as the total time for which a ground station can see a satellite, without ever integrating the trajectory (see Lo [1,2]). Moreover, for any quantity which is an integrable function of the ground track, its average may be computed similarly without integration of the trajectory. For example, the data rate for a simple telecom system is a function of the distance between the satellite and the ground station. We show that such a function may be averaged using the Ergodic Theorem.
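    The substitution of a statistical average for trajectory propagation is easy to demonstrate on a toy system. Below, an ergodic rotation of the circle stands in for the ground-track dynamics (an assumption for illustration, not Lo's formulation): the time average of an "in view" indicator matches its average under the uniform invariant measure, i.e. the visibility fraction, without propagating any trajectory.

```python
# Toy ergodic averaging: time average equals space average for a circle rotation.
import numpy as np

alpha = np.sqrt(2) % 1.0                      # irrational rotation step => ergodic
in_view = lambda theta: (theta % 1.0) < 0.2   # station sees the satellite on 20% of the circle

n_steps = 200_000
theta = (np.arange(n_steps) * alpha) % 1.0    # orbit of the rotation, no ODE integration
time_average = in_view(theta).mean()
space_average = 0.2                           # measure of the visibility arc

print(f"time average {time_average:.4f}  vs  space average {space_average:.4f}")
```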

  16. Statistical complexity without explicit reference to underlying probabilities

    NASA Astrophysics Data System (ADS)

    Pennini, F.; Plastino, A.

    2018-06-01

    We show that extremely simple systems of a not too large number of particles can be simultaneously thermally stable and complex. To this end, we extend the notion of statistical complexity to simple configurations of non-interacting particles, without appealing to probabilities, and discuss their configurational properties.

  17. Additive effects in high-voltage layered-oxide cells: A statistics of mixtures approach

    DOE PAGES

    Sahore, Ritu; Peebles, Cameron; Abraham, Daniel P.; ...

    2017-07-20

    Li1.03(Ni0.5Mn0.3Co0.2)0.97O2 (NMC)-based coin cells containing the electrolyte additives vinylene carbonate (VC) and tris(trimethylsilyl)phosphite (TMSPi) in the range of 0-2 wt% were cycled between 3.0 and 4.4 V. The changes in capacity at rates of C/10 and C/1 and resistance at 60% state of charge were found to follow linear-with-time kinetic rate laws. Further, the C/10 capacity and resistance data were amenable to modeling by a statistics of mixtures approach. Applying physical meaning to the terms in the empirical models indicated that the interactions between the electrolyte and additives were not simple. For example, there were strong, synergistic interactions between VC and TMSPi affecting C/10 capacity loss, as expected, but there were other, more subtle interactions between the electrolyte components. In conclusion, the interactions between these components controlled the C/10 capacity decline and resistance increase.

  18. Valid randomization-based p-values for partially post hoc subgroup analyses.

    PubMed

    Lee, Joseph J; Rubin, Donald B

    2015-10-30

    By 'partially post-hoc' subgroup analyses, we mean analyses that compare existing data from a randomized experiment-from which a subgroup specification is derived-to new, subgroup-only experimental data. We describe a motivating example in which partially post hoc subgroup analyses instigated statistical debate about a medical device's efficacy. We clarify the source of such analyses' invalidity and then propose a randomization-based approach for generating valid posterior predictive p-values for such partially post hoc subgroups. Lastly, we investigate the approach's operating characteristics in a simple illustrative setting through a series of simulations, showing that it can have desirable properties under both null and alternative hypotheses. Copyright © 2015 John Wiley & Sons, Ltd.
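    For readers unfamiliar with randomization-based p-values, the basic building block is a plain randomization (permutation) test, sketched below on simulated subgroup data; the paper's posterior predictive machinery for partially post hoc subgroups is not reproduced here.

```python
# Plain randomization test for a treatment effect (simulated subgroup data).
import numpy as np

rng = np.random.default_rng(4)
treated = rng.normal(0.4, 1.0, 30)      # hypothetical subgroup outcomes, treatment arm
control = rng.normal(0.0, 1.0, 30)      # hypothetical subgroup outcomes, control arm

obs = treated.mean() - control.mean()
pooled = np.concatenate([treated, control])

n_perm, count = 10_000, 0
for _ in range(n_perm):
    perm = rng.permutation(pooled)      # re-randomize arm labels under the null
    count += abs(perm[:30].mean() - perm[30:].mean()) >= abs(obs)
p_value = (count + 1) / (n_perm + 1)    # add-one correction keeps p strictly positive
print("randomization p-value:", p_value)
```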

  19. Domain Adaption of Parsing for Operative Notes

    PubMed Central

    Wang, Yan; Pakhomov, Serguei; Ryan, James O.; Melton, Genevieve B.

    2016-01-01

    Background Full syntactic parsing of clinical text as a part of clinical natural language processing (NLP) is critical for a wide range of applications, such as identification of adverse drug reactions, patient cohort identification, and gene interaction extraction. Several robust syntactic parsers are publicly available to produce linguistic representations for sentences. However, these existing parsers are mostly trained on general English text and often require adaptation for optimal performance on clinical text. Our objective was to adapt an existing general English parser for the clinical text of operative reports via lexicon augmentation, statistics adjusting, and grammar rule modification based on a set of biomedical text. Method The Stanford unlexicalized probabilistic context-free grammar (PCFG) parser lexicon was expanded with the SPECIALIST lexicon, along with statistics collected from a limited set of operative notes tagged with two POS taggers (GENIA tagger and MedPost). The most frequently occurring verb entries of the SPECIALIST lexicon were adjusted based on manual review of verb usage in operative notes. Stanford parser grammar production rules were also modified based on linguistic features of operative reports. An analogous approach was then applied to the GENIA corpus to test the generalizability of this approach to biomedical text. Results The new unlexicalized PCFG parser, extended with the extra lexicon from SPECIALIST along with accurate statistics collected from an operative note corpus tagged with the GENIA POS tagger, improved parser performance by 2.26%, from 87.64% to 89.90%. There was a progressive improvement with the addition of multiple approaches. Most of the improvement occurred with lexicon augmentation combined with statistics from the operative notes corpus. Application of this approach to the GENIA corpus showed that parsing performance was boosted by 3.81% with a simple new grammar and the addition of the GENIA corpus lexicon. Conclusion Using statistics collected from clinical text tagged with POS taggers, along with proper modification of the grammars and lexicons of an unlexicalized PCFG parser, can improve parsing performance. PMID:25661593

  20. Gaussian statistics for palaeomagnetic vectors

    USGS Publications Warehouse

    Love, J.J.; Constable, C.G.

    2003-01-01

    With the aim of treating the statistics of palaeomagnetic directions and intensities jointly and consistently, we represent the mean and the variance of palaeomagnetic vectors, at a particular site and of a particular polarity, by a probability density function in a Cartesian three-space of orthogonal magnetic-field components consisting of a single (unimodal) non-zero mean, spherically-symmetrical (isotropic) Gaussian function. For palaeomagnetic data of mixed polarities, we consider a bimodal distribution consisting of a pair of such symmetrical Gaussian functions, with equal, but opposite, means and equal variances. For both the Gaussian and bi-Gaussian distributions, and in the spherical three-space of intensity, inclination, and declination, we obtain analytical expressions for the marginal density functions, the cumulative distributions, and the expected values and variances for each spherical coordinate (including the angle with respect to the axis of symmetry of the distributions). The mathematical expressions for the intensity and off-axis angle are closed-form and especially manageable, with the intensity distribution being Rayleigh-Rician. In the limit of small relative vectorial dispersion, the Gaussian (bi-Gaussian) directional distribution approaches a Fisher (Bingham) distribution and the intensity distribution approaches a normal distribution. In the opposite limit of large relative vectorial dispersion, the directional distributions approach a spherically-uniform distribution and the intensity distribution approaches a Maxwell distribution. We quantify biases in estimating the properties of the vector field resulting from the use of simple arithmetic averages, such as estimates of the intensity or the inclination of the mean vector, or the variances of these quantities. With the statistical framework developed here and using the maximum-likelihood method, which gives unbiased estimates in the limit of large data numbers, we demonstrate how to formulate the inverse problem, and how to estimate the mean and variance of the magnetic vector field, even when the data consist of mixed combinations of directions and intensities. We examine palaeomagnetic secular-variation data from Hawaii and Réunion, and although these two sites are on almost opposite latitudes, we find significant differences in the mean vector and differences in the local vectorial variances, with the Hawaiian data being particularly anisotropic. These observations are inconsistent with a description of the mean field as being a simple geocentric axial dipole and with secular variation being statistically symmetrical with respect to reflection through the equatorial plane. Finally, our analysis of palaeomagnetic acquisition data from the 1960 Kilauea flow in Hawaii and the Holocene Xitle flow in Mexico, is consistent with the widely held suspicion that directional data are more accurate than intensity data.

  1. Gaussian statistics for palaeomagnetic vectors

    NASA Astrophysics Data System (ADS)

    Love, J. J.; Constable, C. G.

    2003-03-01

    With the aim of treating the statistics of palaeomagnetic directions and intensities jointly and consistently, we represent the mean and the variance of palaeomagnetic vectors, at a particular site and of a particular polarity, by a probability density function in a Cartesian three-space of orthogonal magnetic-field components consisting of a single (unimodal) non-zero mean, spherically-symmetrical (isotropic) Gaussian function. For palaeomagnetic data of mixed polarities, we consider a bimodal distribution consisting of a pair of such symmetrical Gaussian functions, with equal, but opposite, means and equal variances. For both the Gaussian and bi-Gaussian distributions, and in the spherical three-space of intensity, inclination, and declination, we obtain analytical expressions for the marginal density functions, the cumulative distributions, and the expected values and variances for each spherical coordinate (including the angle with respect to the axis of symmetry of the distributions). The mathematical expressions for the intensity and off-axis angle are closed-form and especially manageable, with the intensity distribution being Rayleigh-Rician. In the limit of small relative vectorial dispersion, the Gaussian (bi-Gaussian) directional distribution approaches a Fisher (Bingham) distribution and the intensity distribution approaches a normal distribution. In the opposite limit of large relative vectorial dispersion, the directional distributions approach a spherically-uniform distribution and the intensity distribution approaches a Maxwell distribution. We quantify biases in estimating the properties of the vector field resulting from the use of simple arithmetic averages, such as estimates of the intensity or the inclination of the mean vector, or the variances of these quantities. With the statistical framework developed here and using the maximum-likelihood method, which gives unbiased estimates in the limit of large data numbers, we demonstrate how to formulate the inverse problem, and how to estimate the mean and variance of the magnetic vector field, even when the data consist of mixed combinations of directions and intensities. We examine palaeomagnetic secular-variation data from Hawaii and Réunion, and although these two sites are on almost opposite latitudes, we find significant differences in the mean vector and differences in the local vectorial variances, with the Hawaiian data being particularly anisotropic. These observations are inconsistent with a description of the mean field as being a simple geocentric axial dipole and with secular variation being statistically symmetrical with respect to reflection through the equatorial plane. Finally, our analysis of palaeomagnetic acquisition data from the 1960 Kilauea flow in Hawaii and the Holocene Xitle flow in Mexico, is consistent with the widely held suspicion that directional data are more accurate than intensity data.
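    The bias from simple arithmetic averages that the authors quantify can be seen numerically in a few lines (assumed field strength and scatter; isotropic Gaussian model as in the paper): the mean of sample intensities exceeds the intensity of the mean vector.

```python
# Bias of the arithmetic mean of intensities under isotropic Gaussian scatter.
import numpy as np

rng = np.random.default_rng(5)
mean_vec = np.array([0.0, 0.0, 40.0])        # mean field vector, microtesla (assumed)
sigma = 15.0                                  # isotropic per-component scatter (assumed)

samples = mean_vec + sigma * rng.standard_normal((100_000, 3))
intensities = np.linalg.norm(samples, axis=1)

print("intensity of mean vector  :", np.linalg.norm(samples.mean(axis=0)))
print("mean of sample intensities:", intensities.mean())   # noticeably larger
```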

  2. A simple rain attenuation model for earth-space radio links operating at 10-35 GHz

    NASA Technical Reports Server (NTRS)

    Stutzman, W. L.; Yon, K. M.

    1986-01-01

    The simple attenuation model has been improved from an earlier version and now includes the effect of wave polarization. The model is for the prediction of rain attenuation statistics on earth-space communication links operating in the 10-35 GHz band. Simple calculations produce attenuation values as a function of average rain rate. These together with rain rate statistics (either measured or predicted) can be used to predict annual rain attenuation statistics. In this paper model predictions are compared to measured data from a data base of 62 experiments performed in the U.S., Europe, and Japan. Comparisons are also made to predictions from other models.

  3. A perceptual space of local image statistics.

    PubMed

    Victor, Jonathan D; Thengone, Daniel J; Rizvi, Syed M; Conte, Mary M

    2015-12-01

    Local image statistics are important for visual analysis of textures, surfaces, and form. There are many kinds of local statistics, including those that capture luminance distributions, spatial contrast, oriented segments, and corners. While sensitivity to each of these kinds of statistics has been well studied, much less is known about visual processing when multiple kinds of statistics are relevant, in large part because the dimensionality of the problem is high and different kinds of statistics interact. To approach this problem, we focused on binary images on a square lattice - a reduced set of stimuli which nevertheless taps many kinds of local statistics. In this 10-parameter space, we determined psychophysical thresholds to each kind of statistic (16 observers) and all of their pairwise combinations (4 observers). Sensitivities and isodiscrimination contours were consistent across observers. Isodiscrimination contours were elliptical, implying a quadratic interaction rule, which in turn determined ellipsoidal isodiscrimination surfaces in the full 10-dimensional space, and made predictions for sensitivities to complex combinations of statistics. These predictions, including the prediction of a combination of statistics that was metameric to random, were verified experimentally. Finally, check size had only a mild effect on sensitivities over the range from 2.8 to 14 min, but sensitivities to second- and higher-order statistics were substantially lower at 1.4 min. In sum, local image statistics form a perceptual space that is highly stereotyped across observers, in which different kinds of statistics interact according to simple rules. Copyright © 2015 Elsevier Ltd. All rights reserved.

  4. A perceptual space of local image statistics

    PubMed Central

    Victor, Jonathan D.; Thengone, Daniel J.; Rizvi, Syed M.; Conte, Mary M.

    2015-01-01

    Local image statistics are important for visual analysis of textures, surfaces, and form. There are many kinds of local statistics, including those that capture luminance distributions, spatial contrast, oriented segments, and corners. While sensitivity to each of these kinds of statistics has been well studied, much less is known about visual processing when multiple kinds of statistics are relevant, in large part because the dimensionality of the problem is high and different kinds of statistics interact. To approach this problem, we focused on binary images on a square lattice – a reduced set of stimuli which nevertheless taps many kinds of local statistics. In this 10-parameter space, we determined psychophysical thresholds to each kind of statistic (16 observers) and all of their pairwise combinations (4 observers). Sensitivities and isodiscrimination contours were consistent across observers. Isodiscrimination contours were elliptical, implying a quadratic interaction rule, which in turn determined ellipsoidal isodiscrimination surfaces in the full 10-dimensional space, and made predictions for sensitivities to complex combinations of statistics. These predictions, including the prediction of a combination of statistics that was metameric to random, were verified experimentally. Finally, check size had only a mild effect on sensitivities over the range from 2.8 to 14 min, but sensitivities to second- and higher-order statistics were substantially lower at 1.4 min. In sum, local image statistics form a perceptual space that is highly stereotyped across observers, in which different kinds of statistics interact according to simple rules. PMID:26130606

  5. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Palmintier, Bryan S; Bugbee, Bruce; Gotseff, Peter

    Capturing technical and economic impacts of solar photovoltaics (PV) and other distributed energy resources (DERs) on electric distribution systems can require high-time-resolution (e.g. 1 minute), long-duration (e.g. 1 year) simulations. However, such simulations can be computationally prohibitive, particularly when including complex control schemes in quasi-steady-state time series (QSTS) simulation. Various approaches have been used in the literature to down-select representative time segments (e.g. days), but typically these are best suited for lower time resolutions or consider only a single data stream (e.g. PV production) for selection. We present a statistical approach that combines stratified sampling and bootstrapping to select representative days while also providing a simple method to reassemble annual results. We describe the approach in the context of a recent study with a utility partner. This approach enables much faster QSTS analysis by simulating only a subset of days, while maintaining accurate annual estimates.
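
    As a rough sketch of the general idea (not the study's exact procedure), the snippet below stratifies days by a daily metric, samples a few days per stratum, reweights them to an annual total, and bootstraps the selection to gauge the stability of the reassembled estimate. The daily metric, number of strata, and days per stratum are all illustrative assumptions.

      import numpy as np

      rng = np.random.default_rng(1)

      # Hypothetical daily metric used for stratification (e.g. daily PV energy), one per day.
      daily_metric = rng.gamma(shape=2.0, scale=3.0, size=365)

      def representative_day_estimate(metric, n_strata=4, days_per_stratum=3, rng=rng):
          """Stratify days by quantile of the metric, sample a few days per stratum,
          and scale the sampled mean back up to an annual estimate."""
          edges = np.quantile(metric, np.linspace(0, 1, n_strata + 1))
          strata = np.clip(np.digitize(metric, edges[1:-1]), 0, n_strata - 1)
          annual = 0.0
          for s in range(n_strata):
              days = np.flatnonzero(strata == s)
              chosen = rng.choice(days, size=min(days_per_stratum, days.size), replace=False)
              annual += metric[chosen].mean() * days.size   # reweight stratum by its day count
          return annual

      # Bootstrap the day selection to see how stable the reassembled annual estimate is.
      boot = np.array([representative_day_estimate(daily_metric) for _ in range(500)])
      print("true annual total :", daily_metric.sum().round(1))
      print("estimate mean, sd :", boot.mean().round(1), boot.std().round(1))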

  6. Comparing Indirect Effects in Different Groups in Single-Group and Multi-Group Structural Equation Models

    PubMed Central

    Ryu, Ehri; Cheong, Jeewon

    2017-01-01

    In this article, we evaluated the performance of statistical methods in single-group and multi-group analysis approaches for testing group difference in indirect effects and for testing simple indirect effects in each group. We also investigated whether the performance of the methods in the single-group approach was affected when the assumption of equal variance was not satisfied. The assumption was critical for the performance of the two methods in the single-group analysis: the method using a product term for testing the group difference in a single path coefficient, and the Wald test for testing the group difference in the indirect effect. Bootstrap confidence intervals in the single-group approach and all methods in the multi-group approach were not affected by the violation of the assumption. We compared the performance of the methods and provided recommendations. PMID:28553248

  7. Modeling gene expression measurement error: a quasi-likelihood approach

    PubMed Central

    Strimmer, Korbinian

    2003-01-01

    Background Using suitable error models for gene expression measurements is essential in the statistical analysis of microarray data. However, the true probabilistic model underlying gene expression intensity readings is generally not known. Instead, in currently used approaches some simple parametric model is assumed (usually a transformed normal distribution) or the empirical distribution is estimated. However, both these strategies may not be optimal for gene expression data, as the non-parametric approach ignores known structural information whereas the fully parametric models run the risk of misspecification. A further related problem is the choice of a suitable scale for the model (e.g. observed vs. log-scale). Results Here a simple semi-parametric model for gene expression measurement error is presented. In this approach inference is based on an approximate likelihood function (the extended quasi-likelihood). Only partial knowledge about the unknown true distribution is required to construct this function. In the case of gene expression this information is available in the form of the postulated (e.g. quadratic) variance structure of the data. As the quasi-likelihood behaves (almost) like a proper likelihood, it allows for the estimation of calibration and variance parameters, and it is also straightforward to obtain corresponding approximate confidence intervals. Unlike most other frameworks, it also allows analysis on any preferred scale, i.e. both on the original linear scale as well as on a transformed scale. It can also be employed in regression approaches to model systematic (e.g. array or dye) effects. Conclusions The quasi-likelihood framework provides a simple and versatile approach to analyze gene expression data that does not make any strong distributional assumptions about the underlying error model. For several simulated as well as real data sets it provides a better fit to the data than competing models. In an example it also improved the power of tests to identify differential expression. PMID:12659637

  8. Single particle fluorescence: a simple experimental approach to evaluate coincidence effects.

    PubMed

    Wu, Xihong; Omenetto, Nicoló; Smith, Benjamin W; Winefordner, James D

    2007-07-01

    Real-time characterization of the chemical and physical properties of individual aerosol particles is an important issue in environmental studies. A well-established way of accomplishing this task relies on the use of laser-induced fluorescence or laser ionization mass spectrometry. We describe here a simple approach aimed at experimentally verifying that single particles are indeed addressed. The approach has been tested with a system consisting of a series of aerodynamic lenses to form a beam of dye-doped particles aerosolized from a solution of known concentration with a medical nebulizer. Two independent spectral detection channels simultaneously measure the fluorescence signals generated in two different spectral regions by the passage of a mixture of two dye-doped particles through a focused laser beam in a vacuum chamber. Coincidence effects, arising from the simultaneous observation of both fluorescence emissions, can then be directly observed. Both dual-color fluorescence and pulse height distribution have been analyzed. As expected, the probability of single- or multiple-particle interaction strongly depends on the particle flux in the chamber, which is related to the concentration of particles in the nebulized solution. In our case, to achieve a two-particle coincidence smaller than 10%, a particle concentration lower than 1.2×10^5 particles/mL is required. Moreover, it was found that the experimental observations are in agreement with a simple mathematical model based on Poisson statistics. Although the results obtained refer to particle concentrations in solution, our approach is equally applicable to experiments involving direct air sampling, provided that the number density of particles in air can be measured a priori, e.g., with a particle counter.
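
    The Poisson coincidence argument can be sketched in a few lines: if the mean number of particles in the probe volume per laser shot is mu, then among shots containing at least one particle the fraction containing two or more follows directly from the Poisson probabilities. The mu values below are illustrative, not the paper's measured fluxes.

      import math

      def coincidence_fraction(mu):
          """Among shots that contain at least one particle, the fraction containing
          two or more, assuming Poisson statistics with mean mu per probe volume."""
          p0 = math.exp(-mu)
          p1 = mu * math.exp(-mu)
          p_at_least_one = 1.0 - p0
          return (p_at_least_one - p1) / p_at_least_one

      for mu in (0.01, 0.05, 0.1, 0.2, 0.5):
          print(f"mean particles per probe volume {mu:4.2f} -> "
                f"coincidence fraction {coincidence_fraction(mu):.3f}")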

  9. Intellicount: High-Throughput Quantification of Fluorescent Synaptic Protein Puncta by Machine Learning

    PubMed Central

    Fantuzzo, J. A.; Mirabella, V. R.; Zahn, J. D.

    2017-01-01

    Abstract Synapse formation analyses can be performed by imaging and quantifying fluorescent signals of synaptic markers. Traditionally, these analyses are done using simple or multiple thresholding and segmentation approaches or by labor-intensive manual analysis by a human observer. Here, we describe Intellicount, a high-throughput, fully-automated synapse quantification program which applies a novel machine learning (ML)-based image processing algorithm to systematically improve region of interest (ROI) identification over simple thresholding techniques. Through processing large datasets from both human and mouse neurons, we demonstrate that this approach allows image processing to proceed independently of carefully set thresholds, thus reducing the need for human intervention. As a result, this method can efficiently and accurately process large image datasets with minimal interaction by the experimenter, making it less prone to bias and less liable to human error. Furthermore, Intellicount is integrated into an intuitive graphical user interface (GUI) that provides a set of valuable features, including automated and multifunctional figure generation, routine statistical analyses, and the ability to run full datasets through nested folders, greatly expediting the data analysis process. PMID:29218324

  10. A Simple and Robust Statistical Test for Detecting the Presence of Recombination

    PubMed Central

    Bruen, Trevor C.; Philippe, Hervé; Bryant, David

    2006-01-01

    Recombination is a powerful evolutionary force that merges historically distinct genotypes. But the extent of recombination within many organisms is unknown, and even determining its presence within a set of homologous sequences is a difficult question. Here we develop a new statistic, Φw, that can be used to test for recombination. We show through simulation that our test can discriminate effectively between the presence and absence of recombination, even in diverse situations such as exponential growth (star-like topologies) and patterns of substitution rate correlation. A number of other tests, Max χ2, NSS, a coalescent-based likelihood permutation test (from LDHat), and correlation of linkage disequilibrium (both r2 and |D′|) with distance, all tend to underestimate the presence of recombination under strong population growth. Moreover, both Max χ2 and NSS falsely infer the presence of recombination under a simple model of mutation rate correlation. Results on empirical data show that our test can be used to detect recombination between closely as well as distantly related samples, regardless of the suspected rate of recombination. The results suggest that Φw is one of the best approaches to distinguish recurrent mutation from recombination in a wide variety of circumstances. PMID:16489234

  11. Formalized Conflicts Detection Based on the Analysis of Multiple Emails: An Approach Combining Statistics and Ontologies

    NASA Astrophysics Data System (ADS)

    Zakaria, Chahnez; Curé, Olivier; Salzano, Gabriella; Smaïli, Kamel

    In Computer Supported Cooperative Work (CSCW), it is crucial for project leaders to detect conflicting situations as early as possible. Generally, this task is performed manually by studying a set of documents exchanged between team members. In this paper, we propose a full-fledged automatic solution that identifies documents, subjects and actors involved in relational conflicts. Our approach detects conflicts in emails, probably the most popular type of documents in CSCW, but the methods used can handle other text-based documents. These methods rely on the combination of statistical and ontological operations. The proposed solution is decomposed into several steps: (i) we enrich a simple negative emotion ontology with terms occurring in the corpus of emails, (ii) we categorize each conflicting email according to the concepts of this ontology and (iii) we identify emails, subjects and team members involved in conflicting emails using possibilistic description logic and a set of proposed measures. Each of these steps is evaluated and validated on concrete examples. Moreover, this approach's framework is generic and can be easily adapted to domains other than conflicts, e.g. security issues, and extended with operations making use of our proposed set of measures.

  12. A review of statistical updating methods for clinical prediction models.

    PubMed

    Su, Ting-Li; Jaki, Thomas; Hickey, Graeme L; Buchan, Iain; Sperrin, Matthew

    2018-01-01

    A clinical prediction model is a tool for predicting healthcare outcomes, usually within a specific population and context. A common approach is to develop a new clinical prediction model for each population and context; however, this wastes potentially useful historical information. A better approach is to update or incorporate the existing clinical prediction models already developed for use in similar contexts or populations. In addition, clinical prediction models commonly become miscalibrated over time, and need replacing or updating. In this article, we review a range of approaches for re-using and updating clinical prediction models; these fall into three main categories: simple coefficient updating, combining multiple previous clinical prediction models in a meta-model, and dynamic updating of models. We evaluated the performance (discrimination and calibration) of the different strategies using data on mortality following cardiac surgery in the United Kingdom. We found that no single strategy performed sufficiently well to be used to the exclusion of the others. In conclusion, useful tools exist for updating existing clinical prediction models to a new population or context, and these should be implemented, using a breadth of complementary statistical methods, rather than developing a new clinical prediction model from scratch.
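
    A minimal sketch of the simplest strategy named above, coefficient updating by logistic recalibration: the existing model's linear predictor is treated as a single covariate, and only an intercept and a common slope are refit on data from the new population. The data and coefficients below are synthetic stand-ins, not values from the cardiac surgery study.

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(2)

      # Synthetic "new population": two predictors and an outcome whose true coefficients
      # differ from the old model's, i.e. the old model is miscalibrated here.
      X = rng.standard_normal((2000, 2))
      true_lp = -1.0 + 1.2 * X[:, 0] - 0.8 * X[:, 1]
      y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_lp)))

      # Old (existing) clinical prediction model, developed elsewhere.
      old_intercept, old_coefs = -0.2, np.array([0.8, -0.5])
      old_lp = old_intercept + X @ old_coefs

      # Simple coefficient updating: refit only an intercept and a common slope on the
      # old model's linear predictor, rather than developing a new model from scratch.
      recal = LogisticRegression().fit(old_lp.reshape(-1, 1), y)
      print("recalibration slope    :", recal.coef_[0, 0].round(2))
      print("recalibration intercept:", recal.intercept_[0].round(2))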

  13. Investigating implicit statistical learning mechanisms through contextual cueing.

    PubMed

    Goujon, Annabelle; Didierjean, André; Thorpe, Simon

    2015-09-01

    Since its inception, the contextual cueing (CC) paradigm has generated considerable interest in various fields of cognitive sciences because it constitutes an elegant approach to understanding how statistical learning (SL) mechanisms can detect contextual regularities during a visual search. In this article we review and discuss five aspects of CC: (i) the implicit nature of learning, (ii) the mechanisms involved in CC, (iii) the mediating factors affecting CC, (iv) the generalization of CC phenomena, and (v) the dissociation between implicit and explicit CC phenomena. The findings suggest that implicit SL is an inherent component of ongoing processing which operates through clustering, associative, and reinforcement processes at various levels of sensory-motor processing, and might result from simple spike-timing-dependent plasticity. Copyright © 2015 Elsevier Ltd. All rights reserved.

  14. Crossed beam (E--VRT) energy transfer experiment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hertel, I.V.; Hofmann, H.; Rost, K.A.

    A molecular crossed beam apparatus which has been developed to perform electronic-to-vibrational, rotational, translational (E--V,R,T) energy transfer studies is described. Its capabilities are illustrated on the basis of a number of energy transfer spectra obtained for collision systems of the type Na* + Mol(ν, j) → Na + Mol(ν′, j′), where Na* represents a laser-excited sodium atom and Mol a diatomic or polyatomic molecule. Because of the lack of reliable dynamic theories on quenching processes, statistical approaches such as the "linearly forced harmonic oscillator" and "prior distributions" have been used to model the experimental spectra. The agreement is found to be satisfactory, so even such simple statistics may be useful to describe (E--V,R,T) energy transfer processes in collision systems with small molecules.

  15. A practical approach to language complexity: a Wikipedia case study.

    PubMed

    Yasseri, Taha; Kornai, András; Kertész, János

    2012-01-01

    In this paper we present statistical analysis of English texts from Wikipedia. We try to address the issue of language complexity empirically by comparing the simple English Wikipedia (Simple) to comparable samples of the main English Wikipedia (Main). Simple is supposed to use a more simplified language with a limited vocabulary, and editors are explicitly requested to follow this guideline, yet in practice the vocabulary richness of both samples is at the same level. Detailed analysis of longer units (n-grams of words and part-of-speech tags) shows that the language of Simple is less complex than that of Main primarily due to the use of shorter sentences, as opposed to drastically simplified syntax or vocabulary. Comparing the two language varieties by the Gunning readability index supports this conclusion. We also report on the topical dependence of language complexity, that is, that the language is more advanced in conceptual articles compared to person-based (biographical) and object-based articles. Finally, we investigate the relation between conflict and language complexity by analyzing the content of the talk pages associated with controversial and peacefully developing articles, concluding that controversy has the effect of reducing language complexity.
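
    For reference, the Gunning readability (fog) index used above is 0.4 times the sum of the average sentence length and the percentage of "complex" (three-or-more-syllable) words. The sketch below implements it with a crude vowel-group syllable heuristic, which is an assumption rather than the authors' exact tokenization.

      import re

      def count_syllables(word):
          # Crude heuristic: count groups of vowels; not dictionary-accurate.
          return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

      def gunning_fog(text):
          sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
          words = re.findall(r"[A-Za-z']+", text)
          complex_words = [w for w in words if count_syllables(w) >= 3]
          avg_sentence_len = len(words) / max(1, len(sentences))
          pct_complex = 100.0 * len(complex_words) / max(1, len(words))
          return 0.4 * (avg_sentence_len + pct_complex)

      print(round(gunning_fog("The cat sat on the mat. It was warm."), 1))
      print(round(gunning_fog("Comparative readability estimation necessitates "
                              "considerable methodological sophistication."), 1))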

  16. The power to detect linkage in complex disease by means of simple LOD-score analyses.

    PubMed Central

    Greenberg, D A; Abreu, P; Hodge, S E

    1998-01-01

    Maximum-likelihood analysis (via LOD score) provides the most powerful method for finding linkage when the mode of inheritance (MOI) is known. However, because one must assume an MOI, the application of LOD-score analysis to complex disease has been questioned. Although it is known that one can legitimately maximize the maximum LOD score with respect to genetic parameters, this approach raises three concerns: (1) multiple testing, (2) effect on power to detect linkage, and (3) adequacy of the approximate MOI for the true MOI. We evaluated the power of LOD scores to detect linkage when the true MOI was complex but a LOD score analysis assumed simple models. We simulated data from 14 different genetic models, including dominant and recessive at high (80%) and low (20%) penetrances, intermediate models, and several additive two-locus models. We calculated LOD scores by assuming two simple models, dominant and recessive, each with 50% penetrance, then took the higher of the two LOD scores as the raw test statistic and corrected for multiple tests. We call this test statistic "MMLS-C." We found that the ELODs for MMLS-C are >=80% of the ELOD under the true model when the ELOD for the true model is >=3. Similarly, the power to reach a given LOD score was usually >=80% that of the true model, when the power under the true model was >=60%. These results underscore that a critical factor in LOD-score analysis is the MOI at the linked locus, not that of the disease or trait per se. Thus, a limited set of simple genetic models in LOD-score analysis can work well in testing for linkage. PMID:9718328

  17. The power to detect linkage in complex disease by means of simple LOD-score analyses.

    PubMed

    Greenberg, D A; Abreu, P; Hodge, S E

    1998-09-01

    Maximum-likelihood analysis (via LOD score) provides the most powerful method for finding linkage when the mode of inheritance (MOI) is known. However, because one must assume an MOI, the application of LOD-score analysis to complex disease has been questioned. Although it is known that one can legitimately maximize the maximum LOD score with respect to genetic parameters, this approach raises three concerns: (1) multiple testing, (2) effect on power to detect linkage, and (3) adequacy of the approximate MOI for the true MOI. We evaluated the power of LOD scores to detect linkage when the true MOI was complex but a LOD score analysis assumed simple models. We simulated data from 14 different genetic models, including dominant and recessive at high (80%) and low (20%) penetrances, intermediate models, and several additive two-locus models. We calculated LOD scores by assuming two simple models, dominant and recessive, each with 50% penetrance, then took the higher of the two LOD scores as the raw test statistic and corrected for multiple tests. We call this test statistic "MMLS-C." We found that the ELODs for MMLS-C are >=80% of the ELOD under the true model when the ELOD for the true model is >=3. Similarly, the power to reach a given LOD score was usually >=80% that of the true model, when the power under the true model was >=60%. These results underscore that a critical factor in LOD-score analysis is the MOI at the linked locus, not that of the disease or trait per se. Thus, a limited set of simple genetic models in LOD-score analysis can work well in testing for linkage.

  18. Long-range prediction of Indian summer monsoon rainfall using data mining and statistical approaches

    NASA Astrophysics Data System (ADS)

    H, Vathsala; Koolagudi, Shashidhar G.

    2017-10-01

    This paper presents a hybrid model to better predict Indian summer monsoon rainfall. The algorithm considers suitable techniques for processing dense datasets. The proposed three-step algorithm comprises closed itemset generation-based association rule mining for feature selection, cluster membership for dimensionality reduction, and a simple logistic function for prediction. The application to classifying rainfall into flood, excess, normal, deficit, and drought categories, based on 36 predictors consisting of land and ocean variables, is presented. Results show good accuracy over the considered study period of 37 years (1969-2005).

  19. An empirical Bayes approach to analyzing recurring animal surveys

    USGS Publications Warehouse

    Johnson, D.H.

    1989-01-01

    Recurring estimates of the size of animal populations are often required by biologists or wildlife managers. Because of cost or other constraints, estimates frequently lack the accuracy desired but cannot readily be improved by additional sampling. This report proposes a statistical method employing empirical Bayes (EB) estimators as alternatives to those customarily used to estimate population size, and evaluates them by a subsampling experiment on waterfowl surveys. EB estimates, especially a simple limited-translation version, were more accurate and provided shorter confidence intervals with greater coverage probabilities than customary estimates.

  20. The brain MRI classification problem from wavelets perspective

    NASA Astrophysics Data System (ADS)

    Bendib, Mohamed M.; Merouani, Hayet F.; Diaba, Fatma

    2015-02-01

    Haar and Daubechies 4 (DB4) are the most used wavelets for brain MRI (Magnetic Resonance Imaging) classification. The former is simple and fast to compute while the latter is more complex and offers a better resolution. This paper explores the potential of both of them in performing Normal versus Pathological discrimination on the one hand, and Multiclassification on the other hand. The Whole Brain Atlas is used as a validation database, and the Random Forest (RF) algorithm is employed as a learning approach. The achieved results are discussed and statistically compared.
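
    A hedged sketch of this kind of pipeline (not the paper's exact implementation): decompose each image with a Haar or DB4 wavelet, summarize each sub-band by a few statistics, and train a Random Forest on the resulting feature vectors. PyWavelets and scikit-learn are assumed to be available, and the images below are synthetic stand-ins for MRI slices.

      import numpy as np
      import pywt
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(3)

      def wavelet_features(image, wavelet="db4", level=2):
          """Mean absolute value and standard deviation of each wavelet sub-band."""
          coeffs = pywt.wavedec2(image, wavelet, level=level)
          feats = []
          for c in coeffs:
              bands = [c] if isinstance(c, np.ndarray) else list(c)
              for band in bands:
                  feats.extend([np.mean(np.abs(band)), np.std(band)])
          return np.array(feats)

      # Synthetic stand-in data: "pathological" images carry extra high-frequency texture.
      images = rng.standard_normal((120, 64, 64))
      labels = rng.integers(0, 2, size=120)
      images[labels == 1] += 0.5 * rng.standard_normal((int((labels == 1).sum()), 64, 64))

      X = np.array([wavelet_features(im) for im in images])
      clf = RandomForestClassifier(n_estimators=200, random_state=0)
      print("CV accuracy:", cross_val_score(clf, X, labels, cv=5).mean().round(2))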

  1. The change and development of statistical methods used in research articles in child development 1930-2010.

    PubMed

    Køppe, Simo; Dammeyer, Jesper

    2014-09-01

    The evolution of developmental psychology has been characterized by the use of different quantitative and qualitative methods and procedures. But how does the use of methods and procedures change over time? This study explores the change and development of statistical methods used in articles published in Child Development from 1930 to 2010. The methods used in every article in the first issue of every volume were categorized into four categories. Until 1980, relatively simple statistical methods were used. Over the last 30 years there has been an explosive growth in the use of more advanced statistical methods, and the absence of statistical methods, or the use of only simple methods, has been eliminated.

  2. A simple approach to polymer mixture miscibility.

    PubMed

    Higgins, Julia S; Lipson, Jane E G; White, Ronald P

    2010-03-13

    Polymeric mixtures are important materials, but the control and understanding of mixing behaviour poses problems. The original Flory-Huggins theoretical approach, using a lattice model to compute the statistical thermodynamics, provides the basic understanding of the thermodynamic processes involved but is deficient in describing most real systems, and has little or no predictive capability. We have developed an approach using a lattice integral equation theory, and in this paper we demonstrate that this not only describes well the literature data on polymer mixtures but allows new insights into the behaviour of polymers and their mixtures. The characteristic parameters obtained by fitting the data have been successfully shown to be transferable from one dataset to another, to be able to correctly predict behaviour outside the experimental range of the original data and to allow meaningful comparisons to be made between different polymer mixtures.

  3. A Statistical Physicist's Approach to Biological Motion: From the Kinesin Walk to Muscle Contraction

    NASA Astrophysics Data System (ADS)

    Vicsek, Tamas

    1997-03-01

    It is demonstrated that a wide range of experimental results on biological motion can be successfully interpreted in terms of statistical physics motivated models taking into account the relevant microscopic details of motor proteins and allowing analytic solutions. Two important examples are considered, i) the motion of a single kinesin molecule along microtubules inside individual cells and ii) muscle contraction which is a macroscopic phenomenon due to the collective action of a large number of myosin heads along actin filaments. i) Recently individual two-headed kinesin molecules have been studied in in vitro motility assays revealing a number of their peculiar transport properties. Here we propose a simple and robust model for the kinesin stepping process with elastically coupled Brownian heads showing all of these properties. The analytic treatment of our model results in a very good fit to the experimental data and practically has no free parameters. ii) Myosin is an ATPase enzyme that converts the chemical energy stored in ATP molecules into mechanical work. During muscle contraction, the myosin cross-bridges attach to the actin filaments and exert force on them yielding a relative sliding of the actin and myosin filaments. In this paper we present a simple mechanochemical model for the cross-bridge interaction involving the relevant kinetic data and providing simple analytic solutions for the mechanical properties of muscle contraction, such as the force-velocity relationship or the relative number of the attached cross-bridges. So far the only analytic formula which could be fitted to the measured force-velocity curves has been the well known Hill equation containing parameters lacking clear microscopic origin. The main advantages of our new approach are that it explicitly connects the mechanical data with the kinetic data and the concentration of the ATP and ATPase products and as such it leads to new analytic solutions which agree extremely well with a wide range of experimental curves, while the parameters of the corresponding expressions have well defined microscopic meaning.

  4. Enhancing communication with distressed patients, families and colleagues: the value of the Simple Skills Secrets model of communication for the nursing and healthcare workforce.

    PubMed

    Jack, Barbara A; O'Brien, Mary R; Kirton, Jennifer A; Marley, Kate; Whelan, Alison; Baldry, Catherine R; Groves, Karen E

    2013-12-01

    Good communication skills in healthcare professionals are acknowledged as a core competency. The consequences of poor communication are well recognised, with far-reaching costs including reduced treatment compliance, higher psychological morbidity, incorrect or delayed diagnoses, and increased complaints. The Simple Skills Secrets is a visual, easily memorised, model of communication for healthcare staff to respond to the distress or unanswerable questions of patients, families and colleagues. To explore the impact of the Simple Skills Secrets model of communication training on the general healthcare workforce. An evaluation methodology encompassing a quantitative pre- and post-course testing of confidence and willingness to have conversations with distressed patients, carers and colleagues and qualitative semi-structured telephone interviews with participants 6-8 weeks post course. During the evaluation, 153 staff undertook the training, of which 149 completed the pre- and post-training questionnaire. A purposive sampling approach was adopted for the follow-up qualitative interviews and 14 agreed to participate. There is a statistically significant improvement in both willingness and confidence for all categories (overall confidence score: t(148) = -15.607, p < 0.05; overall willingness score: t(148) = -10.878, p < 0.05), with the greatest improvement in confidence in communicating with carers (pre-course mean 6.171 to post-course mean 8.171). There is no statistically significant difference between the registered and support staff. Several themes were obtained from the qualitative data, including: a method of communicating differently, a structured approach, thinking differently and additional skills. The value of the model in clinical practice was reported. This model can be suggested as increasing the confidence of staff in dealing with a myriad of situations which, if handled appropriately, can lead to increased patient and carers' satisfaction. Empowering staff appears to have increased their willingness to undertake these conversations, which could lead to earlier intervention and minimise distress. Copyright © 2013 Elsevier Ltd. All rights reserved.

  5. Electron transfer statistics and thermal fluctuations in molecular junctions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Goswami, Himangshu Prabal; Harbola, Upendra

    2015-02-28

    We derive analytical expressions for the probability distribution function (PDF) of electron transport in a simple model of a quantum junction in the presence of thermal fluctuations. Our approach is based on the large deviation theory combined with the generating function method. For a large number of electrons transferred, the PDF is found to decay exponentially in the tails with different rates due to applied bias. This asymmetry in the PDF is related to the fluctuation theorem. Statistics of fluctuations are analyzed in terms of the Fano factor. Thermal fluctuations play a quantitative role in determining the statistics of electron transfer; they tend to suppress the average current while enhancing the fluctuations in particle transfer. This gives rise to both bunching and antibunching phenomena as determined by the Fano factor. The thermal fluctuations and shot noise compete with each other and determine the net (effective) statistics of particle transfer. An exact analytical expression is obtained for the delay time distribution. The optimal values of the delay time between successive electron transfers can be lowered below the corresponding shot noise values by tuning the thermal effects.

  6. Fisher's method of combining dependent statistics using generalizations of the gamma distribution with applications to genetic pleiotropic associations.

    PubMed

    Li, Qizhai; Hu, Jiyuan; Ding, Juan; Zheng, Gang

    2014-04-01

    A classical approach to combining independent test statistics is Fisher's combination of p-values, which follows the χ2 distribution. When the test statistics are dependent, the gamma distribution (GD) is commonly used for the Fisher's combination test (FCT). We propose to use two generalizations of the GD: the generalized and the exponentiated GDs. We study some properties of mis-using the GD for the FCT to combine dependent statistics when one of the two proposed distributions is true. Our results show that both generalizations have better control of type I error rates than the GD, which tends to have inflated type I error rates at more extreme tails. In practice, common model selection criteria (e.g. Akaike information criterion/Bayesian information criterion) can be used to help select a better distribution to use for the FCT. A simple strategy for applying the two generalizations of the GD in genome-wide association studies is discussed. Applications of the results to genetic pleiotropic associations are described, where multiple traits are tested for association with a single marker.
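
    For orientation, the plain-GD version of the FCT can be sketched as follows: Fisher's statistic T = -2 Σ log p follows a χ2 distribution with 2k degrees of freedom under independence, and under dependence a gamma distribution is fitted to T by matching its (externally estimated) mean and variance. The variance-inflation factor below is a placeholder, and the generalized and exponentiated GDs studied in the paper go beyond this plain gamma approximation.

      import numpy as np
      from scipy import stats

      p_values = np.array([0.03, 0.20, 0.01, 0.08])   # illustrative p-values from k dependent tests
      k = len(p_values)
      T = -2.0 * np.sum(np.log(p_values))             # Fisher's combination statistic

      # Independent case: T ~ chi-square with 2k degrees of freedom.
      p_independent = stats.chi2.sf(T, df=2 * k)

      # Dependent case (plain gamma approximation): match the first two moments of T.
      mean_T = 2.0 * k
      var_T = 4.0 * k * 1.5        # 1.5 is a placeholder variance-inflation factor for dependence
      shape, scale = mean_T**2 / var_T, var_T / mean_T
      p_dependent = stats.gamma.sf(T, a=shape, scale=scale)

      print("T =", round(T, 2))
      print("p (independence, chi2):", round(p_independent, 4))
      print("p (dependence, gamma) :", round(p_dependent, 4))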

  7. Effect of the absolute statistic on gene-sampling gene-set analysis methods.

    PubMed

    Nam, Dougu

    2017-06-01

    Gene-set enrichment analysis and its modified versions have commonly been used for identifying altered functions or pathways in disease from microarray data. In particular, the simple gene-sampling gene-set analysis methods have been heavily used for datasets with only a few sample replicates. The biggest problem with this approach is the highly inflated false-positive rate. In this paper, the effect of the absolute gene statistic on gene-sampling gene-set analysis methods is systematically investigated. Thus far, the absolute gene statistic has merely been regarded as a supplementary method for capturing the bidirectional changes in each gene set. Here, it is shown that incorporating the absolute gene statistic in gene-sampling gene-set analysis substantially reduces the false-positive rate and improves the overall discriminatory ability. Its effect was investigated in terms of power, false-positive rate, and receiver operating characteristic curves for a number of simulated and real datasets. The performances of gene-set analysis methods in one-tailed (genome-wide association study) and two-tailed (gene expression data) tests were also compared and discussed.

  8. Improving UWB-Based Localization in IoT Scenarios with Statistical Models of Distance Error.

    PubMed

    Monica, Stefania; Ferrari, Gianluigi

    2018-05-17

    Interest in the Internet of Things (IoT) is rapidly increasing, as the number of connected devices is exponentially growing. One of the application scenarios envisaged for IoT technologies involves indoor localization and context awareness. In this paper, we focus on a localization approach that relies on a particular type of communication technology, namely Ultra Wide Band (UWB). UWB technology is an attractive choice for indoor localization, owing to its high accuracy. Since localization algorithms typically rely on estimated inter-node distances, the goal of this paper is to evaluate the improvement brought by a simple (linear) statistical model of the distance error. On the basis of an extensive experimental measurement campaign, we propose a general analytical framework, based on a Least Square (LS) method, to derive a novel statistical model for the range estimation error between a pair of UWB nodes. The proposed statistical model is then applied to improve the performance of a few illustrative localization algorithms in various realistic scenarios. The obtained experimental results show that the use of the proposed statistical model improves the accuracy of the considered localization algorithms with a reduction of the localization error up to 66%.
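
    The flavour of the linear error model can be sketched as follows: from a calibration set of true and measured distances, fit error ≈ a + b·d by least squares and invert the fit to correct new range readings before they enter the localization algorithm. The calibration data and coefficients below are synthetic, not the paper's measurement campaign.

      import numpy as np

      rng = np.random.default_rng(4)

      # Synthetic calibration campaign: true distances and UWB range measurements with
      # a distance-dependent bias plus noise (coefficients are illustrative).
      true_d = rng.uniform(1.0, 20.0, size=200)
      measured_d = true_d + (0.15 + 0.02 * true_d) + 0.05 * rng.standard_normal(200)

      # Least-squares fit of a linear error model: error = a + b * distance.
      b, a = np.polyfit(true_d, measured_d - true_d, deg=1)
      print(f"fitted error model: error ~ {a:.3f} + {b:.3f} * d")

      # Correct a new range estimate by inverting the linear model.
      def correct_range(measured):
          return (measured - a) / (1.0 + b)

      print("raw 10.5 m reading corrected to", round(correct_range(10.5), 2), "m")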

  9. Reply to "Comment on `Third law of thermodynamics as a key test of generalized entropies' "

    NASA Astrophysics Data System (ADS)

    Bento, E. P.; Viswanathan, G. M.; da Luz, M. G. E.; Silva, R.

    2015-07-01

    In Bento et al. [Phys. Rev. E 91, 039901 (2015), 10.1103/PhysRevE.91.039901] we develop a method to verify if an arbitrary generalized statistics does or does not obey the third law of thermodynamics. As examples, we address two important formulations, Kaniadakis and Tsallis. In their Comment on the paper, Bagci and Oikonomou suggest that our examination of the Tsallis statistics is valid only for q ≥1 , using arguments like there is no distribution maximizing the Tsallis entropy for the interval q <0 (in which the third law is not verified) compatible with the problem energy expression. In this Reply, we first (and most importantly) show that the Comment misses the point. In our original work we have considered the now already standard construction of the Tsallis statistics. So, if indeed such statistics lacks a maximization principle (a fact irrelevant in our protocol), this is an inherent feature of the statistics itself and not a problem with our analysis. Second, some arguments used by Bagci and Oikonomou (for 0

  10. Approximate message passing with restricted Boltzmann machine priors

    NASA Astrophysics Data System (ADS)

    Tramel, Eric W.; Drémeau, Angélique; Krzakala, Florent

    2016-07-01

    Approximate message passing (AMP) has been shown to be an excellent statistical approach to signal inference and compressed sensing problems. The AMP framework provides modularity in the choice of signal prior; here we propose a hierarchical form of the Gauss-Bernoulli prior which utilizes a restricted Boltzmann machine (RBM) trained on the signal support to push reconstruction performance beyond that of simple i.i.d. priors for signals whose support can be well represented by a trained binary RBM. We present and analyze two methods of RBM factorization and demonstrate how these affect signal reconstruction performance within our proposed algorithm. Finally, using the MNIST handwritten digit dataset, we show experimentally that using an RBM allows AMP to approach oracle-support performance.

  11. Cover estimation and payload location using Markov random fields

    NASA Astrophysics Data System (ADS)

    Quach, Tu-Thach

    2014-02-01

    Payload location is an approach to find the message bits hidden in steganographic images, but not necessarily their logical order. Its success relies primarily on the accuracy of the underlying cover estimators and can be improved if more estimators are used. This paper presents an approach based on Markov random field to estimate the cover image given a stego image. It uses pairwise constraints to capture the natural two-dimensional statistics of cover images and forms a basis for more sophisticated models. Experimental results show that it is competitive against current state-of-the-art estimators and can locate payload embedded by simple LSB steganography and group-parity steganography. Furthermore, when combined with existing estimators, payload location accuracy improves significantly.

  12. The Bootstrap, the Jackknife, and the Randomization Test: A Sampling Taxonomy.

    PubMed

    Rodgers, J L

    1999-10-01

    A simple sampling taxonomy is defined that shows the differences between and relationships among the bootstrap, the jackknife, and the randomization test. Each method has as its goal the creation of an empirical sampling distribution that can be used to test statistical hypotheses, estimate standard errors, and/or create confidence intervals. Distinctions between the methods can be made based on the sampling approach (with replacement versus without replacement) and the sample size (replacing the whole original sample versus replacing a subset of the original sample). The taxonomy is useful for teaching the goals and purposes of resampling schemes. An extension of the taxonomy implies other possible resampling approaches that have not previously been considered. Univariate and multivariate examples are presented.
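
    The taxonomy can be made concrete with a short numerical sketch applying all three schemes to a difference in group means: the bootstrap resamples with replacement at the full sample size, the jackknife leaves out one observation at a time, and the randomization test reshuffles group labels without replacement. The data are synthetic.

      import numpy as np

      rng = np.random.default_rng(5)
      x = rng.normal(0.0, 1.0, 30)
      y = rng.normal(0.5, 1.0, 30)
      observed = y.mean() - x.mean()

      # Bootstrap: resample each group with replacement, keeping the full sample size.
      boot = np.array([rng.choice(y, y.size).mean() - rng.choice(x, x.size).mean()
                       for _ in range(2000)])
      print("bootstrap SE of the difference:", boot.std().round(3))

      # Jackknife: recompute the statistic leaving out one observation at a time.
      def jackknife_se_of_mean(v):
          loo = np.array([np.delete(v, i).mean() for i in range(v.size)])
          return np.sqrt((v.size - 1) / v.size * np.sum((loo - loo.mean()) ** 2))

      jack_se = np.hypot(jackknife_se_of_mean(x), jackknife_se_of_mean(y))
      print("jackknife SE of the difference:", jack_se.round(3))

      # Randomization (permutation) test: reshuffle group labels without replacement.
      pooled = np.concatenate([x, y])
      perm = []
      for _ in range(2000):
          s = rng.permutation(pooled)
          perm.append(s[x.size:].mean() - s[:x.size].mean())
      p_value = float(np.mean(np.abs(perm) >= abs(observed)))
      print("permutation p-value:", round(p_value, 3))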

  13. Simple surgical approach with high-frequency radio-wave electrosurgery for conjunctivochalasis.

    PubMed

    Youm, Dong Ju; Kim, Joon Mo; Choi, Chul Young

    2010-11-01

    To introduce a new simple surgical approach with high-frequency radio-wave electrosurgery to reduce conjunctivochalasis (CCh). Prospective, noncomparative, interventional case series analysis. Twelve patients (20 eyes) with CCh were recruited from the outpatient service of the Department of Ophthalmology, Kangbuk Samsung Hospital, Seoul, Korea. On the inferior bulbar conjunctiva, subconjunctival coagulation was performed with a fine-needle electrode using a high-frequency radio-wave electrosurgical unit (Ellman Surgitron; Ellman International, Inc., Hewlett, NY) in coagulation mode. Conjunctivochalasis grade; epiphora and dry eye symptoms (the Ocular Surface Disease Index [OSDI]; Allergan Inc., Irvine, CA, holds the copyright); and intraoperative and postoperative complications. Eighteen eyes (90%) recovered a smooth, wet, and noninflamed conjunctival surface within 1 month and remained stable for a follow-up period of 3 months. At 3 months postoperatively, 18 eyes (90%) had grade 0 CCh. There was a statistically significant decrease of the OSDI score at 3 months postoperatively (P < 0.001). A surgical approach with high-frequency radio-wave electrosurgery produced a significant reduction in CCh and an improvement in symptoms. Radio-wave surgical techniques represent a favorable alternative to surgical treatment of CCh. The author(s) have no proprietary or commercial interest in any materials discussed in this article. Copyright © 2010 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.

  14. Statistical Methods for Generalized Linear Models with Covariates Subject to Detection Limits.

    PubMed

    Bernhardt, Paul W; Wang, Huixia J; Zhang, Daowen

    2015-05-01

    Censored observations are a common occurrence in biomedical data sets. Although a large amount of research has been devoted to estimation and inference for data with censored responses, very little research has focused on proper statistical procedures when predictors are censored. In this paper, we consider statistical methods for dealing with multiple predictors subject to detection limits within the context of generalized linear models. We investigate and adapt several conventional methods and develop a new multiple imputation approach for analyzing data sets with predictors censored due to detection limits. We establish the consistency and asymptotic normality of the proposed multiple imputation estimator and suggest a computationally simple and consistent variance estimator. We also demonstrate that the conditional mean imputation method often leads to inconsistent estimates in generalized linear models, while several other methods are either computationally intensive or lead to parameter estimates that are biased or more variable compared to the proposed multiple imputation estimator. In an extensive simulation study, we assess the bias and variability of different approaches within the context of a logistic regression model and compare variance estimation methods for the proposed multiple imputation estimator. Lastly, we apply several methods to analyze the data set from a recently-conducted GenIMS study.

  15. Data-adaptive test statistics for microarray data.

    PubMed

    Mukherjee, Sach; Roberts, Stephen J; van der Laan, Mark J

    2005-09-01

    An important task in microarray data analysis is the selection of genes that are differentially expressed between different tissue samples, such as healthy and diseased. However, microarray data contain an enormous number of dimensions (genes) and very few samples (arrays), a mismatch which poses fundamental statistical problems for the selection process that have defied easy resolution. In this paper, we present a novel approach to the selection of differentially expressed genes in which test statistics are learned from data using a simple notion of reproducibility in selection results as the learning criterion. Reproducibility, as we define it, can be computed without any knowledge of the 'ground-truth', but takes advantage of certain properties of microarray data to provide an asymptotically valid guide to expected loss under the true data-generating distribution. We are therefore able to indirectly minimize expected loss, and obtain results substantially more robust than conventional methods. We apply our method to simulated and oligonucleotide array data. By request to the corresponding author.

  16. On the statistical mechanics of species abundance distributions.

    PubMed

    Bowler, Michael G; Kelly, Colleen K

    2012-09-01

    A central issue in ecology is that of the factors determining the relative abundance of species within a natural community. The proper application of the principles of statistical physics to species abundance distributions (SADs) shows that simple ecological properties could account for the near universal features observed. These properties are (i) a limit on the number of individuals in an ecological guild and (ii) per capita birth and death rates. They underpin the neutral theory of Hubbell (2001), the master equation approach of Volkov et al. (2003, 2005) and the idiosyncratic (extreme niche) theory of Pueyo et al. (2007); they result in an underlying log series SAD, regardless of neutral or niche dynamics. The success of statistical mechanics in this application implies that communities are in dynamic equilibrium and hence that niches must be flexible and that temporal fluctuations on all sorts of scales are likely to be important in community structure. Copyright © 2012 Elsevier Inc. All rights reserved.

  17. Langevin modelling of high-frequency Hang-Seng index data

    NASA Astrophysics Data System (ADS)

    Tang, Lei-Han

    2003-06-01

    Accurate statistical characterization of financial time series, such as compound stock indices, foreign currency exchange rates, etc., is fundamental to investment risk management, pricing of derivative products and financial decision making. Traditionally, such data were analyzed and modeled from a purely statistics point of view, with little concern on the specifics of financial markets. Increasingly, however, attention has been paid to the underlying economic forces and the collective behavior of investors. Here we summarize a novel approach to the statistical modeling of a major stock index (the Hang Seng index). Based on mathematical results previously derived in the fluid turbulence literature, we show that a Langevin equation with a variable noise amplitude correctly reproduces the ubiquitous fat tails in the probability distribution of intra-day price moves. The form of the Langevin equation suggests that, despite the extremely complex nature of financial concerns and investment strategies at the individual's level, there exist simple universal rules governing the high-frequency price move in a stock market.
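
    A purely illustrative simulation (not the fitted Hang Seng model) of the mechanism described above: when the noise amplitude in a Langevin-type update itself fluctuates slowly, the resulting increments are a mixture of Gaussians and show the fat tails (positive excess kurtosis) that a constant-amplitude process lacks.

      import numpy as np

      rng = np.random.default_rng(6)
      n, dt = 200000, 1.0

      # Noise amplitude sigma_t follows its own slow, mean-reverting log process
      # (parameters are illustrative).
      log_sigma = np.empty(n)
      log_sigma[0] = 0.0
      for t in range(1, n):
          log_sigma[t] = (log_sigma[t - 1] - 0.02 * log_sigma[t - 1] * dt
                          + 0.15 * np.sqrt(dt) * rng.standard_normal())
      sigma = np.exp(log_sigma)

      # Price moves driven by state-dependent noise (drift omitted for simplicity),
      # compared with moves driven by a constant noise amplitude.
      dx_variable = sigma * np.sqrt(dt) * rng.standard_normal(n)
      dx_constant = sigma.mean() * np.sqrt(dt) * rng.standard_normal(n)

      def excess_kurtosis(v):
          z = (v - v.mean()) / v.std()
          return np.mean(z**4) - 3.0

      print("excess kurtosis, variable noise:", round(excess_kurtosis(dx_variable), 2))
      print("excess kurtosis, constant noise:", round(excess_kurtosis(dx_constant), 2))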

  18. Statistical approaches to account for missing values in accelerometer data: Applications to modeling physical activity.

    PubMed

    Yue Xu, Selene; Nelson, Sandahl; Kerr, Jacqueline; Godbole, Suneeta; Patterson, Ruth; Merchant, Gina; Abramson, Ian; Staudenmayer, John; Natarajan, Loki

    2018-04-01

    Physical inactivity is a recognized risk factor for many chronic diseases. Accelerometers are increasingly used as an objective means to measure daily physical activity. One challenge in using these devices is missing data due to device nonwear. We used a well-characterized cohort of 333 overweight postmenopausal breast cancer survivors to examine missing data patterns of accelerometer outputs over the day. Based on these observed missingness patterns, we created pseudo-simulated datasets with realistic missing data patterns. We developed statistical methods to design imputation and variance weighting algorithms to account for missing data effects when fitting regression models. Bias and precision of each method were evaluated and compared. Our results indicated that not accounting for missing data in the analysis yielded unstable estimates in the regression analysis. Incorporating variance weights and/or subject-level imputation improved precision by >50%, compared to ignoring missing data. We recommend that these simple, easy-to-implement statistical tools be used to improve analysis of accelerometer data.

  19. A Statistical Analysis of Reviewer Agreement and Bias in Evaluating Medical Abstracts 1

    PubMed Central

    Cicchetti, Domenic V.; Conn, Harold O.

    1976-01-01

    Observer variability affects virtually all aspects of clinical medicine and investigation. One important aspect, not previously examined, is the selection of abstracts for presentation at national medical meetings. In the present study, 109 abstracts, submitted to the American Association for the Study of Liver Disease, were evaluated by three “blind” reviewers for originality, design-execution, importance, and overall scientific merit. Of the 77 abstracts rated for all parameters by all observers, interobserver agreement ranged between 81 and 88%. However, corresponding intraclass correlations varied between 0.16 (approaching statistical significance) and 0.37 (p < 0.01). Specific tests of systematic differences in scoring revealed statistically significant levels of observer bias on most of the abstract components. Moreover, the mean differences in interobserver ratings were quite small compared to the standard deviations of these differences. These results emphasize the importance of evaluating the simple percentage of rater agreement within the broader context of observer variability and systematic bias. PMID:997596

  20. A critique of Rasch residual fit statistics.

    PubMed

    Karabatsos, G

    2000-01-01

    In test analysis involving the Rasch model, a large degree of importance is placed on the "objective" measurement of individual abilities and item difficulties. The degree to which the objectivity properties are attained, of course, depends on the degree to which the data fit the Rasch model. It is therefore important to utilize fit statistics that accurately and reliably detect the person-item response inconsistencies that threaten the measurement objectivity of persons and items. Given this argument, it is somewhat surprising that far more emphasis is placed on the objective measurement of persons and items than on the measurement quality of Rasch fit statistics. This paper provides a critical analysis of the residual fit statistics of the Rasch model, arguably the most often used fit statistics, in an effort to illustrate that the task of Rasch fit analysis is not as simple and straightforward as it appears to be. The faulty statistical properties of the residual fit statistics do not allow either a convenient or a straightforward approach to Rasch fit analysis. For instance, given a residual fit statistic, the use of a single minimum critical value for misfit diagnosis across different testing situations, where the situations vary in sample and test properties, leads to both the overdetection and underdetection of misfit. To improve this situation, it is argued that psychometricians need to implement residual-free Rasch fit statistics that are based on the number of Guttman response errors, or use indices that are statistically optimal in detecting measurement disturbances.

  1. Meta-analysis of gene-level associations for rare variants based on single-variant statistics.

    PubMed

    Hu, Yi-Juan; Berndt, Sonja I; Gustafsson, Stefan; Ganna, Andrea; Hirschhorn, Joel; North, Kari E; Ingelsson, Erik; Lin, Dan-Yu

    2013-08-08

    Meta-analysis of genome-wide association studies (GWASs) has led to the discoveries of many common variants associated with complex human diseases. There is a growing recognition that identifying "causal" rare variants also requires large-scale meta-analysis. The fact that association tests with rare variants are performed at the gene level rather than at the variant level poses unprecedented challenges in the meta-analysis. First, different studies may adopt different gene-level tests, so the results are not compatible. Second, gene-level tests require multivariate statistics (i.e., components of the test statistic and their covariance matrix), which are difficult to obtain. To overcome these challenges, we propose to perform gene-level tests for rare variants by combining the results of single-variant analysis (i.e., p values of association tests and effect estimates) from participating studies. This simple strategy is possible because of an insight that multivariate statistics can be recovered from single-variant statistics, together with the correlation matrix of the single-variant test statistics, which can be estimated from one of the participating studies or from a publicly available database. We show both theoretically and numerically that the proposed meta-analysis approach provides accurate control of the type I error and is as powerful as joint analysis of individual participant data. This approach accommodates any disease phenotype and any study design and produces all commonly used gene-level tests. An application to the GWAS summary results of the Genetic Investigation of ANthropometric Traits (GIANT) consortium reveals rare and low-frequency variants associated with human height. The relevant software is freely available. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
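
    The key insight can be sketched for one common gene-level test, a weighted burden (score) test: given per-variant z-scores and the correlation matrix of the single-variant statistics, T = w'z / sqrt(w'Rw) is standard normal under the null. The z-scores, correlation matrix, and equal weights below are illustrative; the paper's framework recovers a broader family of gene-level tests.

      import numpy as np
      from scipy import stats

      # Per-variant meta-analysis results for one gene: signed z-scores for m rare variants.
      z = np.array([1.8, -0.4, 2.1, 0.9])

      # Correlation matrix of the single-variant test statistics (e.g. from one participating
      # study or a reference panel); values here are illustrative.
      R = np.array([[1.0, 0.1, 0.2, 0.0],
                    [0.1, 1.0, 0.1, 0.1],
                    [0.2, 0.1, 1.0, 0.3],
                    [0.0, 0.1, 0.3, 1.0]])

      # Weighted burden test: T = w'z / sqrt(w'Rw) is standard normal under the null.
      w = np.ones_like(z)                      # simple equal weights; could be frequency-based
      T = w @ z / np.sqrt(w @ R @ w)
      p_burden = 2.0 * stats.norm.sf(abs(T))
      print("burden z =", round(T, 2), " p =", round(p_burden, 4))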

  2. Nomogram for sample size calculation on a straightforward basis for the kappa statistic.

    PubMed

    Hong, Hyunsook; Choi, Yunhee; Hahn, Seokyung; Park, Sue Kyung; Park, Byung-Joo

    2014-09-01

    Kappa is a widely used measure of agreement. However, it may not be straightforward in some situations, such as sample size calculation, due to the kappa paradox: high agreement but low kappa. Hence, it seems reasonable in sample size calculation that the level of agreement under a certain marginal prevalence is considered in terms of a simple proportion of agreement rather than a kappa value. Therefore, sample size formulae and nomograms using a simple proportion of agreement rather than a kappa under certain marginal prevalences are proposed. A sample size formula was derived using the kappa statistic under the common correlation model and goodness-of-fit statistic. The nomogram for the sample size formula was developed using SAS 9.3. Sample size formulae using a simple proportion of agreement instead of a kappa statistic, together with nomograms that eliminate the inconvenience of using a mathematical formula, were produced. A nomogram for sample size calculation with a simple proportion of agreement should be useful in the planning stages when the focus of interest is on testing the hypothesis of interobserver agreement involving two raters and nominal outcome measures. Copyright © 2014 Elsevier Inc. All rights reserved.
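
    The relation underlying such a nomogram can be sketched directly: for two raters sharing a marginal prevalence p, chance agreement is pe = p² + (1 − p)², and kappa = (po − pe)/(1 − pe) for an overall proportion of agreement po. The short sketch below also shows the kappa paradox, with the same observed agreement giving very different kappa values as the prevalence becomes extreme.

      def kappa_from_agreement(po, prevalence):
          """Cohen's kappa for two raters sharing the same marginal prevalence,
          given the overall proportion of agreement po."""
          pe = prevalence**2 + (1.0 - prevalence)**2   # chance agreement
          return (po - pe) / (1.0 - pe)

      # Same observed agreement (90%), very different kappa depending on prevalence:
      for prev in (0.50, 0.30, 0.10, 0.05):
          print(f"prevalence {prev:.2f}: agreement 0.90 -> "
                f"kappa {kappa_from_agreement(0.90, prev):.2f}")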

  3. [Comparative study of the repair of full thickness tear of the supraspinatus by means of "single row" or "suture bridge" techniques].

    PubMed

    Arroyo-Hernández, M; Mellado-Romero, M A; Páramo-Díaz, P; Martín-López, C M; Cano-Egea, J M; Vilá Y Rico, J

    2015-01-01

    The purpose of this study is to analyze whether there is any difference between arthroscopic repair of full-thickness supraspinatus tears using the single row technique versus the suture bridge technique. We conducted a retrospective study of 123 patients with full-thickness supraspinatus tears treated between January 2009 and January 2013 in our hospital: 60 single row repairs and 63 suture bridge repairs. The mean age was 62.9 years in the single row group and 63.3 years in the suture bridge group, and there were more women than men in both groups (67%). All patients were evaluated with the Constant score. The mean Constant score was 76.7 in the suture bridge group and 72.4 in the single row group. We also performed a statistical analysis of each Constant item. Strength was higher in the suture bridge group, with a statistically significant difference (p = 0.04). The range of movement was also greater in the suture bridge group, but the difference was not statistically significant. Overall, the suture bridge technique showed better clinical results than single row repair, but the difference was not statistically significant (p = 0.298).

  4. Spatial Differentiation of Landscape Values in the Murray River Region of Victoria, Australia

    NASA Astrophysics Data System (ADS)

    Zhu, Xuan; Pfueller, Sharron; Whitelaw, Paul; Winter, Caroline

    2010-05-01

    This research advances the understanding of the location of perceived landscape values through a statistically based approach to spatial analysis of value densities. Survey data were obtained from a sample of people living in and using the Murray River region, Australia, where declining environmental quality prompted a reevaluation of its conservation status. When densities of 12 perceived landscape values were mapped using geographic information systems (GIS), valued places clustered along the entire river bank and in associated National/State Parks and reserves. While simple density mapping revealed high value densities in various locations, it did not indicate what density of a landscape value could be regarded as a statistically significant hotspot or distinguish whether overlapping areas of high density for different values indicate identical or adjacent locations. A spatial statistic Getis-Ord Gi* was used to indicate statistically significant spatial clusters of high value densities or “hotspots”. Of 251 hotspots, 40% were for single non-use values, primarily spiritual, therapeutic or intrinsic. Four hotspots had 11 landscape values. Two, lacking economic value, were located in ecologically important river red gum forests and two, lacking wilderness value, were near the major towns of Echuca-Moama and Albury-Wodonga. Hotspots for eight values showed statistically significant associations with another value. There were high associations between learning and heritage values while economic and biological diversity values showed moderate associations with several other direct and indirect use values. This approach may improve confidence in the interpretation of spatial analysis of landscape values by enhancing understanding of value relationships.
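
    For readers who want to experiment with the hotspot step, the sketch below implements the standard Getis-Ord Gi* z-score on a toy set of value densities with binary neighbourhood weights; the data and weights are hypothetical, and a real analysis would typically rely on GIS software or a spatial statistics library such as PySAL.

    ```python
    import numpy as np

    def getis_ord_gi_star(x, w):
        """Getis-Ord Gi* z-score for each location.

        x : 1-D array of observed value densities (n locations)
        w : n x n spatial weights matrix that INCLUDES the focal location
            (w[i, i] > 0), e.g. binary contiguity weights.
        Large positive values flag statistically significant density hotspots.
        """
        x = np.asarray(x, dtype=float)
        n = x.size
        x_bar = x.mean()
        s = np.sqrt((x**2).mean() - x_bar**2)
        wi = w.sum(axis=1)                     # total weight around each i
        num = w @ x - x_bar * wi
        den = s * np.sqrt((n * (w**2).sum(axis=1) - wi**2) / (n - 1))
        return num / den

    if __name__ == "__main__":
        # Hypothetical densities at 5 locations on a line; neighbours = adjacent cells.
        x = np.array([1.0, 2.0, 9.0, 8.5, 1.5])
        w = np.eye(5)
        for i in range(4):
            w[i, i + 1] = w[i + 1, i] = 1.0    # binary adjacency, self included
        print(np.round(getis_ord_gi_star(x, w), 2))
    ```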

  5. Markov Logic Networks in the Analysis of Genetic Data

    PubMed Central

    Sakhanenko, Nikita A.

    2010-01-01

    Complex, non-additive genetic interactions are common and can be critical in determining phenotypes. Genome-wide association studies (GWAS) and similar statistical studies of linkage data, however, assume additive models of gene interactions in looking for genotype-phenotype associations. These statistical methods view the compound effects of multiple genes on a phenotype as a sum of influences of each gene and often miss a substantial part of the heritable effect. Such methods do not use any biological knowledge about underlying mechanisms. Modeling approaches from the artificial intelligence (AI) field that incorporate deterministic knowledge into models to perform statistical analysis can be applied to include prior knowledge in genetic analysis. We chose to use the most general such approach, Markov Logic Networks (MLNs), for combining deterministic knowledge with statistical analysis. Using simple, logistic regression-type MLNs we can replicate the results of traditional statistical methods, but we also show that we are able to go beyond finding independent markers linked to a phenotype by using joint inference without an independence assumption. The method is applied to genetic data on yeast sporulation, a complex phenotype with gene interactions. In addition to detecting all of the previously identified loci associated with sporulation, our method identifies four loci with smaller effects. Since their effect on sporulation is small, these four loci were not detected with methods that do not account for dependence between markers due to gene interactions. We show how gene interactions can be detected using more complex models, which can be used as a general framework for incorporating systems biology with genetics. PMID:20958249

  6. Bayesian estimation of the transmissivity spatial structure from pumping test data

    NASA Astrophysics Data System (ADS)

    Demir, Mehmet Taner; Copty, Nadim K.; Trinchero, Paolo; Sanchez-Vila, Xavier

    2017-06-01

    Estimating the statistical parameters (mean, variance, and integral scale) that define the spatial structure of the transmissivity or hydraulic conductivity fields is a fundamental step for the accurate prediction of subsurface flow and contaminant transport. In practice, the determination of the spatial structure is a challenge because of spatial heterogeneity and data scarcity. In this paper, we describe a novel approach that uses time drawdown data from multiple pumping tests to determine the transmissivity statistical spatial structure. The method builds on the pumping test interpretation procedure of Copty et al. (2011) (Continuous Derivation method, CD), which uses the time-drawdown data and its time derivative to estimate apparent transmissivity values as a function of radial distance from the pumping well. A Bayesian approach is then used to infer the statistical parameters of the transmissivity field by combining prior information about the parameters and the likelihood function expressed in terms of radially-dependent apparent transmissivities determined from pumping tests. A major advantage of the proposed Bayesian approach is that the likelihood function is readily determined from randomly generated multiple realizations of the transmissivity field, without the need to solve the groundwater flow equation. Applying the method to synthetically-generated pumping test data, we demonstrate that, through a relatively simple procedure, information on the spatial structure of the transmissivity may be inferred from pumping tests data. It is also shown that the prior parameter distribution has a significant influence on the estimation procedure, given the non-uniqueness of the estimation procedure. Results also indicate that the reliability of the estimated transmissivity statistical parameters increases with the number of available pumping tests.

  7. Towards first principle medical diagnostics: on the importance of disease-disease and sign-sign interactions

    NASA Astrophysics Data System (ADS)

    Ramezanpour, Abolfazl; Mashaghi, Alireza

    2017-07-01

    A fundamental problem in medicine and biology is to assign states, e.g. healthy or diseased, to cells, organs or individuals. State assignment or making a diagnosis is often a nontrivial and challenging process and, with the advent of omics technologies, the diagnostic challenge is becoming more and more serious. The challenge lies not only in the increasing number of measured properties and dynamics of the system (e.g. cell or human body) but also in the co-evolution of multiple states and overlapping properties, and degeneracy of states. We develop, from first principles, a generic rational framework for state assignment in cell biology and medicine, and demonstrate its applicability with a few simple theoretical case studies from medical diagnostics. We show how disease-related statistical information can be used to build a comprehensive model that includes the relevant dependencies between clinical and laboratory findings (signs) and diseases. In particular, we include disease-disease and sign-sign interactions and study how one can infer the probability of a disease in a patient with given signs. We perform a comparative analysis with simple benchmark models to check the performance of our models. We find that including interactions can significantly change the statistical importance of the signs and diseases. This first-principles approach, as we show, facilitates the early diagnosis of disease by taking interactions into account, and enables the construction of consensus diagnostic flow charts. Additionally, we envision that our approach will find applications in systems biology, and in particular, in characterizing the phenome via the metabolome, the proteome, the transcriptome, and the genome.

  8. A mixed-effects model approach for the statistical analysis of vocal fold viscoelastic shear properties.

    PubMed

    Xu, Chet C; Chan, Roger W; Sun, Han; Zhan, Xiaowei

    2017-11-01

    A mixed-effects model approach was introduced in this study for the statistical analysis of rheological data of vocal fold tissues, in order to account for the data correlation caused by multiple measurements of each tissue sample across the test frequency range. Such data correlation has often been overlooked in previous studies. The viscoelastic shear properties of the vocal fold lamina propria of two commonly used laryngeal research animal species (i.e., rabbit and porcine) were measured by a linear, controlled-strain simple-shear rheometer. Along with published canine and human rheological data, the vocal fold viscoelastic shear moduli of these animal species were compared to those of human over a frequency range of 1-250 Hz using the mixed-effects models. Our results indicated that tissues of the rabbit, canine and porcine vocal fold lamina propria were significantly stiffer and more viscous than those of human. Mixed-effects models were shown to be able to analyze rheological data generated from repeated measurements more accurately. Copyright © 2017 Elsevier Ltd. All rights reserved.
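
    A sketch of how such an analysis might look in Python with statsmodels, assuming a long-format table with hypothetical column names (log_modulus, log_frequency, species, sample_id) and a random intercept per tissue sample; this is illustrative only, not the authors' exact model specification.

    ```python
    # Mixed-effects sketch: repeated measurements of the same tissue sample across
    # frequencies are correlated, so a random intercept per sample is included.
    # File name and column names are hypothetical placeholders.
    import pandas as pd
    import statsmodels.formula.api as smf

    data = pd.read_csv("vocal_fold_rheology.csv")   # hypothetical long-format table

    model = smf.mixedlm(
        "log_modulus ~ log_frequency * species",    # fixed effects
        data,
        groups=data["sample_id"],                   # random intercept per sample
    )
    result = model.fit()
    print(result.summary())
    ```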

  9. Effective model approach to the dense state of QCD matter

    NASA Astrophysics Data System (ADS)

    Fukushima, Kenji

    2011-12-01

    The first-principles approach to the dense state of QCD matter, i.e. lattice-QCD simulation at finite baryon density, is not under theoretical control for the moment. The effective model study based on QCD symmetries is a practical alternative. However, the model parameters that are fixed by hadronic properties in the vacuum may have unknown dependence on the baryon chemical potential. We propose a new prescription to constrain the effective model parameters by the matching condition with the thermal Statistical Model. In the transitional region where thermal quantities blow up in the Statistical Model, deconfined quarks and gluons should smoothly take over the relevant degrees of freedom from hadrons and resonances. We use the Polyakov-loop coupled Nambu-Jona-Lasinio (PNJL) model as an effective description on the quark side and show how the matching condition is satisfied by a simple ansatz on the Polyakov loop potential. Our results favor a phase diagram with the chiral phase transition located at a slightly higher temperature than deconfinement, which stays close to the chemical freeze-out points.

  10. A score-statistic approach for determining threshold values in QTL mapping.

    PubMed

    Kao, Chen-Hung; Ho, Hsiang-An

    2012-06-01

    So far, issues in determining the threshold values for QTL mapping have mostly been investigated in backcross and F2 populations, which have relatively simple genome structures. Investigations of these issues in the progeny populations after F2 (advanced populations), whose genomes are relatively more complicated, are generally inadequate. As these advanced populations are now widely used in QTL mapping, it is important to address these issues for them in more detail. Due to the increasing number of meiosis cycles, the genomes of the advanced populations can be very different from the backcross and F2 genomes. Therefore, special devices that consider the specific genome structures present in the advanced populations are required to resolve these issues. By considering the differences in genome structure between populations, we formulate more general score test statistics and Gaussian processes to evaluate their threshold values. In general, we found that, given a significance level and a genome size, threshold values for QTL detection are higher for denser marker maps and for more advanced populations. Simulations were performed to validate our approach.

  11. Statistical mechanics of homogeneous partly pinned fluid systems.

    PubMed

    Krakoviack, Vincent

    2010-12-01

    The homogeneous partly pinned fluid systems are simple models of a fluid confined in a disordered porous matrix, obtained by arresting randomly chosen particles in a one-component bulk fluid or in one of the two components of a binary mixture. In this paper, their configurational properties are investigated. It is shown that a peculiar complementarity exists between the mobile and immobile phases, which originates from the fact that the solid is prepared in the presence of, and in equilibrium with, the adsorbed fluid. Simple identities follow, which connect different types of configurational averages, either relative to the fluid-matrix system or to the bulk fluid from which it is prepared. Crucial simplifications result for the computation of important structural quantities, both in computer simulations and in theoretical approaches. Finally, possible applications of the model in the field of dynamics in confinement or in strongly asymmetric mixtures are suggested.

  12. Universal Power Law Governing Pedestrian Interactions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Karamouzas, Ioannis; Skinner, Brian; Guy, Stephen J.

    2014-12-01

    Human crowds often bear a striking resemblance to interacting particle systems, and this has prompted many researchers to describe pedestrian dynamics in terms of interaction forces and potential energies. The correct quantitative form of this interaction, however, has remained an open question. Here, we introduce a novel statistical-mechanical approach to directly measure the interaction energy between pedestrians. This analysis, when applied to a large collection of human motion data, reveals a simple power-law interaction that is based not on the physical separation between pedestrians but on their projected time to a potential future collision, and is therefore fundamentally anticipatory in nature. Remarkably, this simple law is able to describe human interactions across a wide variety of situations, speeds, and densities. We further show, through simulations, that the interaction law we identify is sufficient to reproduce many known crowd phenomena.
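
    The statistical-mechanical measurement idea can be sketched as a Boltzmann-type inversion of pair statistics in a time-to-collision variable: compare the observed distribution with that of a non-interacting reference and read off an effective energy. The synthetic data below merely stand in for tracked pedestrian pairs, and the built-in exponent is an arbitrary assumption, not the paper's result.

    ```python
    import numpy as np

    def interaction_energy(tau_obs, tau_null, bins=50, rng=(0.1, 10.0)):
        """Boltzmann-style inversion: E(tau) ~ -log[P(tau) / P0(tau)] (units of kT).

        tau_obs  : pair statistic (e.g. projected time-to-collision) from real crowds
        tau_null : the same statistic for a non-interacting reference ensemble
        """
        p, edges = np.histogram(tau_obs, bins=bins, range=rng, density=True)
        p0, _ = np.histogram(tau_null, bins=bins, range=rng, density=True)
        centers = 0.5 * (edges[:-1] + edges[1:])
        mask = (p > 0) & (p0 > 0)
        return centers[mask], -np.log(p[mask] / p0[mask])

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Hypothetical synthetic data: thin a uniform null with a made-up energy 1/tau^2.
        tau_null = rng.uniform(0.1, 10.0, 100_000)
        keep = rng.uniform(size=tau_null.size) < np.exp(-1.0 / tau_null**2)
        tau_obs = tau_null[keep]
        tau, energy = interaction_energy(tau_obs, tau_null)
        print(tau[:5], energy[:5])   # the recovered energy rises steeply at small tau
    ```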

  13. Simple and Accurate Method for Central Spin Problems

    NASA Astrophysics Data System (ADS)

    Lindoy, Lachlan P.; Manolopoulos, David E.

    2018-06-01

    We describe a simple quantum mechanical method that can be used to obtain accurate numerical results over long timescales for the spin correlation tensor of an electron spin that is hyperfine coupled to a large number of nuclear spins. This method does not suffer from the statistical errors that accompany a Monte Carlo sampling of the exact eigenstates of the central spin Hamiltonian obtained from the algebraic Bethe ansatz, or from the growth of the truncation error with time in the time-dependent density matrix renormalization group (TDMRG) approach. As a result, it can be applied to larger central spin problems than the algebraic Bethe ansatz, and for longer times than the TDMRG algorithm. It is therefore an ideal method to use to solve central spin problems, and we expect that it will also prove useful for a variety of related problems that arise in a number of different research fields.

  14. Interaction between mean flow and turbulence in two dimensions

    PubMed Central

    2016-01-01

    This short note is written to call attention to an analytic approach to the interaction of developed turbulence with mean flows of simple geometry (jets and vortices). It is instructive to compare cases in two and three dimensions and see why the former are solvable and the latter are not (yet). We present the analytical solutions for two-dimensional mean flows generated by an inverse turbulent cascade on a sphere and in planar domains of different aspect ratios. These solutions are obtained in the limit of small friction when the flow is strong while turbulence can be considered weak and treated perturbatively. I then discuss when these simple solutions can be realized and when more complicated flows may appear instead. The next step of describing turbulence statistics inside a flow and directions of possible future progress are briefly discussed at the end. PMID:27493579

  15. A simple dynamic subgrid-scale model for LES of particle-laden turbulence

    NASA Astrophysics Data System (ADS)

    Park, George Ilhwan; Bassenne, Maxime; Urzay, Javier; Moin, Parviz

    2017-04-01

    In this study, a dynamic model for large-eddy simulations is proposed in order to describe the motion of small inertial particles in turbulent flows. The model is simple, involves no significant computational overhead, contains no adjustable parameters, and is flexible enough to be deployed in any type of flow solvers and grids, including unstructured setups. The approach is based on the use of elliptic differential filters to model the subgrid-scale velocity. The only model parameter, which is related to the nominal filter width, is determined dynamically by imposing consistency constraints on the estimated subgrid energetics. The performance of the model is tested in large-eddy simulations of homogeneous-isotropic turbulence laden with particles, where improved agreement with direct numerical simulation results is observed in the dispersed-phase statistics, including particle acceleration, local carrier-phase velocity, and preferential-concentration metrics.

  16. Determination of Flavonoids, Phenolic Acids, and Xanthines in Mate Tea (Ilex paraguariensis St.-Hil.)

    PubMed Central

    Bojić, Mirza; Simon Haas, Vicente; Maleš, Željan

    2013-01-01

    Raw materials, different food formulations, and dietary supplements of mate require control of their content of bioactive substances, for which high performance thin layer chromatography (TLC), described here, provides a simple and rapid approach for both detection and quantification. Using TLC densitometry, the following bioactive compounds were identified and quantified: chlorogenic acid (2.1 mg/g), caffeic acid (1.5 mg/g), rutin (5.2 mg/g), quercetin (2.2 mg/g), and kaempferol (4.5 mg/g). The results obtained with TLC densitometry for caffeine (5.4 mg/g) and theobromine (2.7 mg/g) show no statistical difference from the content of total xanthines (7.6 mg/g) obtained by UV-Vis spectrophotometry. Thus, TLC remains a technique of choice for simple and rapid analysis of large numbers of samples, as well as a primary screening technique in plant analysis. PMID:23841023

  17. Are the correct herbal claims by Hildegard von Bingen only lucky strikes? A new statistical approach.

    PubMed

    Uehleke, Bernhard; Hopfenmueller, Werner; Stange, Rainer; Saller, Reinhard

    2012-01-01

    Ancient and medieval herbal books are often believed to describe the same claims still in use today. Medieval herbal books, however, provide long lists of claims for each herb, most of which are not approved today, while the herb's modern use is often missing. So the hypothesis arises that a medieval author could have randomly hit on 'correct' claims among his many 'wrong' ones. We developed a statistical procedure based on a simple probability model. We applied our procedure to the herbal books of Hildegard von Bingen (1098-1179) as an example of its usefulness. Claim attributions for a certain herb were classified as 'correct' if they were approximately the same as those indicated in current monographs. The number of 'correct' claim attributions was significantly higher than could have been expected by pure chance, even though the vast majority of Hildegard von Bingen's claims were not 'correct'. The hypothesis that Hildegard would have achieved her 'correct' claims purely by chance can therefore be clearly rejected. The finding that medical claims provided by a medieval author are significantly related to modern herbal use supports the importance of traditional medicinal systems as an empirical source. However, since many traditional claims are not in accordance with modern applications, they should be used carefully and analyzed in a systematic, statistics-based manner. Our statistical approach can be used for further systematic comparison of herbal claims from traditional sources, as well as in the fields of ethnobotany and ethnopharmacology. Copyright © 2012 S. Karger AG, Basel.
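
    The core statistical question can be illustrated with a one-sided binomial test; the authors' actual probability model is more elaborate, and all numbers below are made up.

    ```python
    # Are k "correct" claim attributions out of n more than would be expected if
    # each attribution had only probability p0 of being correct by chance?
    from scipy.stats import binomtest

    n = 120    # hypothetical number of claim attributions examined
    k = 30     # hypothetical number judged "correct" against modern monographs
    p0 = 0.1   # hypothetical chance probability of a "correct" hit

    result = binomtest(k, n, p0, alternative="greater")
    print(f"observed rate = {k / n:.2f}, one-sided p-value = {result.pvalue:.2e}")
    ```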

  18. A powerful approach for association analysis incorporating imprinting effects

    PubMed Central

    Xia, Fan; Zhou, Ji-Yuan; Fung, Wing Kam

    2011-01-01

    Motivation: For a diallelic marker locus, the transmission disequilibrium test (TDT) is a simple and powerful design for genetic studies. The TDT was originally proposed for use in families with both parents available (complete nuclear families) and has further been extended to 1-TDT for use in families with only one of the parents available (incomplete nuclear families). Currently, the increasing interest in the influence of parental imprinting on heritability highlights the importance of incorporating imprinting effects into the mapping of association variants. Results: In this article, we extend the TDT-type statistics to incorporate imprinting effects and develop a series of new test statistics in a general two-stage framework for association studies. Our test statistics retain the advantages of family-based designs, requiring no assumption of Hardy–Weinberg equilibrium. Also, the proposed methods accommodate complete and incomplete nuclear families with one or more affected children. In the simulation study, we verify the validity of the proposed test statistics under various scenarios, and compare the powers of the proposed statistics with some existing test statistics. It is shown that our methods greatly improve the power for detecting association in the presence of imprinting effects. We further demonstrate the advantage of our methods by applying the proposed test statistics to a rheumatoid arthritis dataset. Contact: wingfung@hku.hk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21798962

  19. A powerful approach for association analysis incorporating imprinting effects.

    PubMed

    Xia, Fan; Zhou, Ji-Yuan; Fung, Wing Kam

    2011-09-15

    For a diallelic marker locus, the transmission disequilibrium test (TDT) is a simple and powerful design for genetic studies. The TDT was originally proposed for use in families with both parents available (complete nuclear families) and has further been extended to 1-TDT for use in families with only one of the parents available (incomplete nuclear families). Currently, the increasing interest in the influence of parental imprinting on heritability highlights the importance of incorporating imprinting effects into the mapping of association variants. In this article, we extend the TDT-type statistics to incorporate imprinting effects and develop a series of new test statistics in a general two-stage framework for association studies. Our test statistics retain the advantages of family-based designs, requiring no assumption of Hardy-Weinberg equilibrium. Also, the proposed methods accommodate complete and incomplete nuclear families with one or more affected children. In the simulation study, we verify the validity of the proposed test statistics under various scenarios, and compare the powers of the proposed statistics with some existing test statistics. It is shown that our methods greatly improve the power for detecting association in the presence of imprinting effects. We further demonstrate the advantage of our methods by applying the proposed test statistics to a rheumatoid arthritis dataset. Contact: wingfung@hku.hk Supplementary data are available at Bioinformatics online.
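
    For orientation, the classic TDT that this work builds on can be written in a few lines; the imprinting-aware extensions proposed in the paper are not reproduced here, and the transmission counts are hypothetical.

    ```python
    # Classic TDT for a diallelic marker in complete nuclear families: heterozygous
    # parents transmit allele A to an affected child b times and allele a c times;
    # under no linkage/association, b and c should be comparable.
    from scipy.stats import chi2

    def tdt(b, c):
        """McNemar-type TDT statistic and asymptotic p-value (1 df)."""
        stat = (b - c) ** 2 / (b + c)
        return stat, chi2.sf(stat, df=1)

    if __name__ == "__main__":
        stat, p = tdt(b=62, c=38)        # hypothetical transmission counts
        print(f"TDT = {stat:.2f}, p = {p:.4f}")
    ```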

  20. Detecting atypical examples of known domain types by sequence similarity searching: the SBASE domain library approach.

    PubMed

    Dhir, Somdutta; Pacurar, Mircea; Franklin, Dino; Gáspári, Zoltán; Kertész-Farkas, Attila; Kocsor, András; Eisenhaber, Frank; Pongor, Sándor

    2010-11-01

    SBASE is a project initiated to detect known domain types and predict domain architectures using sequence similarity searching (Simon et al., Protein Seq Data Anal 5:39-42, 1992; Pongor et al., Nucl. Acids Res. 21:3111-3115, 1992). The current approach uses a curated collection of domain sequences - the SBASE domain library - and standard similarity search algorithms, followed by postprocessing based on simple statistics of the domain similarity network (http://hydra.icgeb.trieste.it/sbase/). It is especially useful in detecting rare, atypical examples of known domain types that are sometimes missed even by more sophisticated methodologies. The approach does not require multiple alignment or machine learning techniques, and can be a useful complement to other domain detection methodologies. This article gives an overview of the project history as well as of the concepts and principles developed within the project.

  1. A Probabilistic Approach to Fitting Period–luminosity Relations and Validating Gaia Parallaxes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sesar, Branimir; Fouesneau, Morgan; Bailer-Jones, Coryn A. L.

    Pulsating stars, such as Cepheids, Miras, and RR Lyrae stars, are important distance indicators and calibrators of the “cosmic distance ladder,” and yet their period–luminosity–metallicity (PLZ) relations are still constrained using simple statistical methods that cannot take full advantage of available data. To enable optimal usage of data provided by the Gaia mission, we present a probabilistic approach that simultaneously constrains parameters of PLZ relations and uncertainties in Gaia parallax measurements. We demonstrate this approach by constraining PLZ relations of type ab RR Lyrae stars in near-infrared W1 and W2 bands, using Tycho-Gaia Astrometric Solution (TGAS) parallax measurements for a sample of ≈100 type ab RR Lyrae stars located within 2.5 kpc of the Sun. The fitted PLZ relations are consistent with previous studies, and in combination with other data, deliver distances precise to 6% (once various sources of uncertainty are taken into account). To a precision of 0.05 mas (1σ), we do not find a statistically significant offset in TGAS parallaxes for this sample of distant RR Lyrae stars (median parallax of 0.8 mas and distance of 1.4 kpc). With only minor modifications, our probabilistic approach can be used to constrain PLZ relations of other pulsating stars, and we intend to apply it to Cepheid and Mira stars in the near future.

  2. Identifying significant gene‐environment interactions using a combination of screening testing and hierarchical false discovery rate control

    PubMed Central

    Shen, Li; Saykin, Andrew J.; Williams, Scott M.; Moore, Jason H.

    2016-01-01

    Although gene-environment (G×E) interactions play an important role in many biological systems, detecting these interactions within genome-wide data can be challenging due to the loss in statistical power incurred by multiple hypothesis correction. To address the challenge of poor power and the limitations of existing multistage methods, we recently developed a screening-testing approach for G×E interaction detection that combines elastic net penalized regression with joint estimation to support a single omnibus test for the presence of G×E interactions. In our original work on this technique, however, we did not assess type I error control or power and evaluated the method using just a single, small bladder cancer data set. In this paper, we extend the original method in two important directions and provide a more rigorous performance evaluation. First, we introduce a hierarchical false discovery rate approach to formally assess the significance of individual G×E interactions. Second, to support the analysis of truly genome-wide data sets, we incorporate a score statistic-based prescreening step to reduce the number of single nucleotide polymorphisms prior to fitting the first stage penalized regression model. To assess the statistical properties of our method, we compare the type I error rate and statistical power of our approach with competing techniques using both simple simulation designs as well as designs based on real disease architectures. Finally, we demonstrate the ability of our approach to identify biologically plausible SNP-education interactions relative to Alzheimer's disease status using genome-wide association study data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). PMID:27578615

  3. Sequence-Based Prioritization of Nonsynonymous Single-Nucleotide Polymorphisms for the Study of Disease Mutations

    PubMed Central

    Jiang, Rui; Yang, Hua; Zhou, Linqi; Kuo, C.-C. Jay; Sun, Fengzhu; Chen, Ting

    2007-01-01

    The increasing demand for the identification of genetic variation responsible for common diseases has translated into a need for sophisticated methods for effectively prioritizing mutations occurring in disease-associated genetic regions. In this article, we prioritize candidate nonsynonymous single-nucleotide polymorphisms (nsSNPs) through a bioinformatics approach that takes advantage of a set of improved numeric features derived from protein-sequence information and a new statistical learning model called “multiple selection rule voting” (MSRV). The sequence-based features can maximize the scope of applications of our approach, and the MSRV model can capture subtle characteristics of individual mutations. Systematic validation demonstrates that the approach is capable of prioritizing causal mutations for both simple monogenic diseases and complex polygenic diseases. Further studies of familial Alzheimer disease and diabetes show that the approach can enrich mutations underlying these polygenic diseases near the top of the candidate list. Application of the approach to unclassified mutations suggests that 10 suspicious mutations are likely to cause disease, a conclusion with strong support in the literature. PMID:17668383

  4. To t-Test or Not to t-Test? A p-Values-Based Point of View in the Receiver Operating Characteristic Curve Framework.

    PubMed

    Vexler, Albert; Yu, Jihnhee

    2018-04-13

    A common statistical doctrine supported by many introductory courses and textbooks is that t-test type procedures based on normally distributed data points are anticipated to provide a standard in decision-making. In order to motivate scholars to examine this convention, we introduce a simple approach based on graphical tools of receiver operating characteristic (ROC) curve analysis, a well-established biostatistical methodology. In this context, we propose employing a p-values-based method, taking into account the stochastic nature of p-values. We draw on the modern statistical literature to address the expected p-value (EPV) as a measure of the performance of decision-making rules. During the course of our study, we extend the EPV concept to be considered in terms of the ROC curve technique. This provides expressive evaluations and visualizations of a wide spectrum of testing mechanisms' properties. We show that the conventional power characterization of tests is a partial aspect of the presented EPV/ROC technique. We hope that this presentation convinces researchers of the usefulness of the EPV/ROC approach for depicting different characteristics of decision-making procedures, in light of the growing interest in correct p-values-based applications.

  5. Quantitative identification of riverine nitrogen from point, direct runoff and base flow sources.

    PubMed

    Huang, Hong; Zhang, Baifa; Lu, Jun

    2014-01-01

    We present a methodological example for quantifying the contributions of riverine total nitrogen (TN) from point, direct runoff and base flow sources by combining a recursive digital filter technique and statistical methods. First, we separated daily riverine flow into direct runoff and base flow using a recursive digital filter technique; then, a statistical model was established using daily simultaneous data for TN load, direct runoff rate, base flow rate, and temperature; and finally, the TN loading from direct runoff and base flow sources could be inversely estimated. As a case study, this approach was adopted to identify the TN source contributions in Changle River, eastern China. Results showed that, during 2005-2009, the total annual TN input to the river was 1,700.4±250.2 ton, and the contributions of point, direct runoff and base flow sources were 17.8±2.8%, 45.0±3.6%, and 37.2±3.9%, respectively. The innovation of the approach is that the nitrogen from direct runoff and base flow sources could be separately quantified. The approach is simple but detailed enough to take the major factors into account, providing an effective and reliable method for riverine nitrogen loading estimation and source apportionment.
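
    The flow-separation step can be illustrated with a commonly used one-parameter recursive digital filter (a Lyne-Hollick-type form); the filter parameter, the streamflow series, and the constraint handling below are illustrative assumptions, and the paper's statistical TN model is not reproduced.

    ```python
    import numpy as np

    def baseflow_filter(q, alpha=0.925):
        """One-parameter recursive digital filter (Lyne-Hollick-type form).

        q     : daily streamflow series
        alpha : filter parameter (values around 0.9-0.95 are commonly used)
        Returns (baseflow, direct_runoff); quickflow is constrained to [0, q].
        """
        q = np.asarray(q, dtype=float)
        quick = np.zeros_like(q)
        for t in range(1, q.size):
            quick[t] = alpha * quick[t - 1] + 0.5 * (1 + alpha) * (q[t] - q[t - 1])
            quick[t] = min(max(quick[t], 0.0), q[t])   # keep quickflow physical
        base = q - quick
        return base, quick

    if __name__ == "__main__":
        # Hypothetical hydrograph: low base flow plus two storm peaks.
        q = np.array([2, 2, 2, 8, 15, 9, 5, 3, 2.5, 2.2, 6, 12, 7, 4, 3, 2.5])
        base, quick = baseflow_filter(q)
        print(np.round(base, 2))
    ```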

  6. Experimental design and data analysis of Ago-RIP-Seq experiments for the identification of microRNA targets.

    PubMed

    Tichy, Diana; Pickl, Julia Maria Anna; Benner, Axel; Sültmann, Holger

    2017-03-31

    The identification of microRNA (miRNA) target genes is crucial for understanding miRNA function. Many methods for genome-wide miRNA target identification have been developed in recent years; however, they have several limitations, including dependence on low-confidence prediction programs and artificial miRNA manipulations. Ago-RNA immunoprecipitation combined with high-throughput sequencing (Ago-RIP-Seq) is a promising alternative. However, appropriate statistical data analysis algorithms taking into account the experimental design and the inherent noise of such experiments are largely lacking. Here, we investigate the experimental design for Ago-RIP-Seq and examine biostatistical methods to identify de novo miRNA target genes. The statistical approaches considered are either based on a negative binomial model fit to the read count data or applied to transformed data using a normal distribution-based generalized linear model. We compare them in a real-data simulation study using plasmode data sets and evaluate the suitability of the approaches to detect true miRNA targets by sensitivity and false discovery rates. Our results suggest that simple approaches like linear regression models on (appropriately) transformed read count data are preferable. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  7. Understanding system dynamics of an adaptive enzyme network from globally profiled kinetic parameters.

    PubMed

    Chiang, Austin W T; Liu, Wei-Chung; Charusanti, Pep; Hwang, Ming-Jing

    2014-01-15

    A major challenge in mathematical modeling of biological systems is to determine how model parameters contribute to systems dynamics. As biological processes are often complex in nature, it is desirable to address this issue using a systematic approach. Here, we propose a simple methodology that first performs an enrichment test to find patterns in the values of globally profiled kinetic parameters with which a model can produce the required system dynamics; this is then followed by a statistical test to elucidate the association between individual parameters and different parts of the system's dynamics. We demonstrate our methodology on a prototype biological system of perfect adaptation dynamics, namely the chemotaxis model for Escherichia coli. Our results agreed well with those derived from experimental data and theoretical studies in the literature. Using this model system, we showed that there are motifs in kinetic parameters and that these motifs are governed by constraints of the specified system dynamics. A systematic approach based on enrichment statistical tests has been developed to elucidate the relationships between model parameters and the roles they play in affecting system dynamics of a prototype biological network. The proposed approach is generally applicable and therefore can find wide use in systems biology modeling research.

  8. Entry trajectory and atmosphere reconstruction methodologies for the Mars Exploration Rover mission

    NASA Astrophysics Data System (ADS)

    Desai, Prasun N.; Blanchard, Robert C.; Powell, Richard W.

    2004-02-01

    The Mars Exploration Rover (MER) mission will land two landers on the surface of Mars, arriving in January 2004. Both landers will deliver the rovers to the surface by decelerating with the aid of an aeroshell, a supersonic parachute, retro-rockets, and air bags for safely landing on the surface. The reconstruction of the MER descent trajectory and atmosphere profile will be performed for all the phases from hypersonic flight through landing. A description of multiple methodologies for the flight reconstruction is presented from simple parameter identification methods through a statistical Kalman filter approach.
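
    The filtering step can be illustrated with a generic one-dimensional constant-velocity Kalman filter; this is a textbook sketch, not the MER reconstruction itself, and the trajectory and noise levels are hypothetical.

    ```python
    import numpy as np

    def kalman_1d(z, dt=1.0, q=1e-3, r=1.0):
        """Minimal constant-velocity Kalman filter for noisy position measurements z.

        State x = [position, velocity]; q scales process noise, r is measurement noise.
        """
        F = np.array([[1.0, dt], [0.0, 1.0]])        # state transition
        H = np.array([[1.0, 0.0]])                   # only position is measured
        Q = q * np.array([[dt**3 / 3, dt**2 / 2],
                          [dt**2 / 2, dt]])          # process noise covariance
        R = np.array([[r]])
        x = np.zeros((2, 1))
        P = np.eye(2)
        out = []
        for zk in z:
            # predict
            x = F @ x
            P = F @ P @ F.T + Q
            # update with the new measurement
            y = np.array([[zk]]) - H @ x
            S = H @ P @ H.T + R
            K = P @ H.T @ np.linalg.inv(S)
            x = x + K @ y
            P = (np.eye(2) - K @ H) @ P
            out.append(x.ravel().copy())
        return np.array(out)

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        truth = 0.5 * np.arange(50)                    # hypothetical trajectory
        z = truth + rng.normal(scale=1.0, size=50)     # noisy measurements
        print(kalman_1d(z)[-1])                        # filtered [position, velocity]
    ```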

  9. Allele-sharing models: LOD scores and accurate linkage tests.

    PubMed

    Kong, A; Cox, N J

    1997-11-01

    Starting with a test statistic for linkage analysis based on allele sharing, we propose an associated one-parameter model. Under general missing-data patterns, this model allows exact calculation of likelihood ratios and LOD scores and has been implemented by a simple modification of existing software. Most important, accurate linkage tests can be performed. Using an example, we show that some previously suggested approaches to handling less than perfectly informative data can be unacceptably conservative. Situations in which this model may not perform well are discussed, and an alternative model that requires additional computations is suggested.

  10. Allele-sharing models: LOD scores and accurate linkage tests.

    PubMed Central

    Kong, A; Cox, N J

    1997-01-01

    Starting with a test statistic for linkage analysis based on allele sharing, we propose an associated one-parameter model. Under general missing-data patterns, this model allows exact calculation of likelihood ratios and LOD scores and has been implemented by a simple modification of existing software. Most important, accurate linkage tests can be performed. Using an example, we show that some previously suggested approaches to handling less than perfectly informative data can be unacceptably conservative. Situations in which this model may not perform well are discussed, and an alternative model that requires additional computations is suggested. PMID:9345087

  11. Collisional-radiative switching - A powerful technique for converging non-LTE calculations

    NASA Technical Reports Server (NTRS)

    Hummer, D. G.; Voels, S. A.

    1988-01-01

    A very simple technique has been developed to converge statistical equilibrium and model atmospheric calculations in extreme non-LTE conditions when the usual iterative methods fail to converge from an LTE starting model. The proposed technique is based on a smooth transition from a collision-dominated LTE situation to the desired non-LTE conditions in which radiation dominates, at least in the most important transitions. The proposed approach was used to successfully compute stellar models with He abundances of 0.20, 0.30, and 0.50; Teff = 30,000 K, and log g = 2.9.

  12. Invariant approach to the character classification

    NASA Astrophysics Data System (ADS)

    Šariri, Kristina; Demoli, Nazif

    2008-04-01

    Image moments analysis is a very useful tool which allows image description invariant to translation, rotation, scale change and some types of image distortion. The aim of this work was the development of a simple method for fast and reliable classification of characters using Hu's and affine moment invariants. The Euclidean distance was used as the discrimination feature, with its statistical parameters estimated. The method was tested on the classification of Times New Roman font letters as well as sets of handwritten characters. It is shown that using all Hu invariants and three affine invariants as the discrimination set improves the recognition rate by 30%.
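
    A sketch of this kind of classifier using OpenCV's Hu moments and a nearest-template Euclidean distance; the affine invariants are omitted here, and the image file names are hypothetical placeholders.

    ```python
    import cv2
    import numpy as np

    def hu_features(path):
        """Log-scaled Hu moment invariants of a binarized character image."""
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        hu = cv2.HuMoments(cv2.moments(binary)).flatten()
        return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)   # compress dynamic range

    # Hypothetical template images, one per class.
    templates = {c: hu_features(f"template_{c}.png") for c in "ABC"}
    query = hu_features("unknown_character.png")

    # Classify by minimum Euclidean distance to the class templates.
    label = min(templates, key=lambda c: np.linalg.norm(query - templates[c]))
    print("classified as:", label)
    ```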

  13. Computation of fluid flow and pore-space properties estimation on micro-CT images of rock samples

    NASA Astrophysics Data System (ADS)

    Starnoni, M.; Pokrajac, D.; Neilson, J. E.

    2017-09-01

    Accurate determination of the petrophysical properties of rocks, namely REV, mean pore and grain size and absolute permeability, is essential for a broad range of engineering applications. Here, the petrophysical properties of rocks are calculated using an integrated approach comprising image processing, statistical correlation and numerical simulations. The Stokes equations of creeping flow for incompressible fluids are solved using the Finite-Volume SIMPLE algorithm. Simulations are then carried out on three-dimensional digital images obtained from micro-CT scanning of two rock formations: one sandstone and one carbonate. Permeability is predicted from the computed flow field using Darcy's law. It is shown that REV, REA and mean pore and grain size are effectively estimated using the two-point spatial correlation function. Homogeneity and anisotropy are also evaluated using the same statistical tools. A comparison of different absolute permeability estimates is also presented, revealing a good agreement between the numerical value and the experimentally determined one for the carbonate sample, but a large discrepancy for the sandstone. Finally, a new convergence criterion for the SIMPLE algorithm, and more generally for the family of pressure-correction methods, is presented. This criterion is based on satisfaction of bulk momentum balance, which makes it particularly useful for pore-scale modelling of reservoir rocks.
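
    The final step, permeability prediction from the computed flow field, reduces to Darcy's law; the sketch below only shows that back-calculation with hypothetical values (the Stokes/SIMPLE solver itself is not reproduced).

    ```python
    # Back out absolute permeability from a computed (or measured) flow rate
    # using Darcy's law, k = Q * mu * L / (A * dP). All numbers are hypothetical.
    def darcy_permeability(q_total, mu, length, area, dp):
        """Permeability in m^2 for volumetric flow rate q_total (m^3/s) through a sample."""
        return q_total * mu * length / (area * dp)

    k = darcy_permeability(
        q_total=2.0e-9,   # m^3/s, hypothetical flux from the pore-scale simulation
        mu=1.0e-3,        # Pa.s, water viscosity
        length=1.0e-3,    # m, sample length along the flow direction
        area=1.0e-6,      # m^2, cross-sectional area
        dp=1.0e3,         # Pa, imposed pressure drop
    )
    print(f"k = {k:.3e} m^2  (~{k / 9.87e-13:.2f} darcy)")
    ```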

  14. A Simple Approach for Monitoring Business Service Time Variation

    PubMed Central

    2014-01-01

    Control charts are effective tools for signal detection in both manufacturing processes and service processes. Much of the data in service industries comes from processes having nonnormal or unknown distributions. The commonly used Shewhart variable control charts, which depend heavily on the normality assumption, are not appropriately used here. In this paper, we propose a new asymmetric EWMA variance chart (EWMA-AV chart) and an asymmetric EWMA mean chart (EWMA-AM chart) based on two simple statistics to monitor process variance and mean shifts simultaneously. Further, we explore the sampling properties of the new monitoring statistics and calculate the average run lengths when using both the EWMA-AV chart and the EWMA-AM chart. The performance of the EWMA-AV and EWMA-AM charts and that of some existing variance and mean charts are compared. A numerical example involving nonnormal service times from the service system of a bank branch in Taiwan is used to illustrate the applications of the EWMA-AV and EWMA-AM charts and to compare them with the existing variance (or standard deviation) and mean charts. The proposed EWMA-AV chart and EWMA-AM charts show superior detection performance compared to the existing variance and mean charts. The EWMA-AV chart and EWMA-AM chart are thus recommended. PMID:24895647

  15. A simple approach for monitoring business service time variation.

    PubMed

    Yang, Su-Fen; Arnold, Barry C

    2014-01-01

    Control charts are effective tools for signal detection in both manufacturing processes and service processes. Much of the data in service industries comes from processes having nonnormal or unknown distributions. The commonly used Shewhart variable control charts, which depend heavily on the normality assumption, are not appropriately used here. In this paper, we propose a new asymmetric EWMA variance chart (EWMA-AV chart) and an asymmetric EWMA mean chart (EWMA-AM chart) based on two simple statistics to monitor process variance and mean shifts simultaneously. Further, we explore the sampling properties of the new monitoring statistics and calculate the average run lengths when using both the EWMA-AV chart and the EWMA-AM chart. The performance of the EWMA-AV and EWMA-AM charts and that of some existing variance and mean charts are compared. A numerical example involving nonnormal service times from the service system of a bank branch in Taiwan is used to illustrate the applications of the EWMA-AV and EWMA-AM charts and to compare them with the existing variance (or standard deviation) and mean charts. The proposed EWMA-AV chart and EWMA-AM charts show superior detection performance compared to the existing variance and mean charts. The EWMA-AV chart and EWMA-AM chart are thus recommended.
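
    For reference, a standard EWMA mean chart (not the asymmetric EWMA-AM/EWMA-AV variants proposed in the paper) can be sketched as follows, with hypothetical service-time data.

    ```python
    import numpy as np

    def ewma_chart(x, mu0, sigma0, lam=0.2, L=3.0):
        """Standard EWMA mean chart: statistic z_t and time-varying control limits."""
        x = np.asarray(x, dtype=float)
        z = np.empty_like(x)
        z_prev = mu0
        for t, xt in enumerate(x):
            z_prev = lam * xt + (1 - lam) * z_prev   # z_t = lam*x_t + (1-lam)*z_{t-1}
            z[t] = z_prev
        t = np.arange(1, x.size + 1)
        half_width = L * sigma0 * np.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
        return z, mu0 - half_width, mu0 + half_width

    if __name__ == "__main__":
        rng = np.random.default_rng(2)
        # Hypothetical nonnormal service times with a mean shift after sample 30.
        service_times = np.r_[rng.exponential(2.0, 30), rng.exponential(3.0, 10)]
        z, lcl, ucl = ewma_chart(service_times, mu0=2.0, sigma0=2.0)
        print("signals at samples:", np.where((z > ucl) | (z < lcl))[0])
    ```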

  16. Using Fisher information to track stability in multivariate ...

    EPA Pesticide Factsheets

    With the current proliferation of data, the proficient use of statistical and mining techniques offers substantial benefits for capturing useful information from any dataset. As numerous approaches make use of information theory concepts, we discuss here how Fisher information (FI) can be applied to sustainability science problems and used in data mining applications by analyzing patterns in data. FI was developed as a measure of information content in data, and it has been adapted to assess order in complex system behaviors. The main advantage of the approach is the ability to collapse multiple variables into an index that can be used to assess stability and track overall trends in a system, including its regimes and regime shifts. We provide a brief overview of FI theory, followed by a simple step-by-step numerical example of how to compute FI. Furthermore, we introduce an open source Python library that can be freely downloaded from GitHub, and we use it in a simple case study to evaluate the evolution of FI for the global-mean temperature from 1880 to 2015. Results indicate significant declines in FI starting in 1978, suggesting a possible regime shift. Overall, the work demonstrates Fisher information as a useful method for assessing patterns in big data.
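
    As a generic illustration only (this is not the referenced GitHub library, and binning conventions differ across implementations), a sliding-window Fisher information index might be sketched as follows; the window width, bin count, and series are all assumptions.

    ```python
    import numpy as np

    def fisher_information(window, bins=10):
        """One simple discretization of Fisher information for a data window:
        bin the values, take amplitudes q_i = sqrt(p_i), and sum squared
        successive differences, FI ~ 4 * sum (q_{i+1} - q_i)^2.
        """
        p, _ = np.histogram(window, bins=bins)
        p = p / p.sum()
        q = np.sqrt(p)
        return 4.0 * np.sum(np.diff(q) ** 2)

    def sliding_fi(series, width=24, step=1, bins=10):
        """Track FI over a sliding window; sustained changes may flag regime shifts."""
        series = np.asarray(series, dtype=float)
        idx = range(0, series.size - width + 1, step)
        return np.array([fisher_information(series[i:i + width], bins) for i in idx])

    if __name__ == "__main__":
        rng = np.random.default_rng(3)
        # Hypothetical series: a stable regime followed by a drifting, noisier one.
        series = np.r_[rng.normal(0, 0.1, 120),
                       np.linspace(0, 1, 120) + rng.normal(0, 0.3, 120)]
        print(np.round(sliding_fi(series, width=24)[::12], 2))
    ```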

  17. Agile Text Mining for the 2014 i2b2/UTHealth Cardiac Risk Factors Challenge

    PubMed Central

    Cormack, James; Nath, Chinmoy; Milward, David; Raja, Kalpana; Jonnalagadda, Siddhartha R

    2016-01-01

    This paper describes the use of an agile text mining platform (Linguamatics’ Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors in patient records as defined in the i2b2/UTHealth 2014 Challenge. The approach uses a data-driven rule-based methodology with the addition of a simple supervised classifier. We demonstrate that agile text mining allows for rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluation of this approach on the test data gave an F-Score of 91.7%, one percent behind the top performing system. PMID:26209007

  18. Sculpting bespoke mountains: Determining free energies with basis expansions

    NASA Astrophysics Data System (ADS)

    Whitmer, Jonathan K.; Fluitt, Aaron M.; Antony, Lucas; Qin, Jian; McGovern, Michael; de Pablo, Juan J.

    2015-07-01

    The intriguing behavior of a wide variety of physical systems, ranging from amorphous solids or glasses to proteins, is a direct manifestation of underlying free energy landscapes riddled with local minima separated by large barriers. Exploring such landscapes has arguably become one of statistical physics's great challenges. A new method is proposed here for uniform sampling of rugged free energy surfaces. The method, which relies on special Green's functions to approximate the Dirac delta function, improves significantly on existing simulation techniques by providing a boundary-agnostic approach that is capable of mapping complex features in multidimensional free energy surfaces. The usefulness of the proposed approach is established in the context of a simple model glass former and model proteins, demonstrating improved convergence and accuracy over existing methods.

  19. A Tiered Approach to Evaluating Salinity Sources in Water at Oil and Gas Production Sites.

    PubMed

    Paquette, Shawn M; Molofsky, Lisa J; Connor, John A; Walker, Kenneth L; Hopkins, Harley; Chakraborty, Ayan

    2017-09-01

    A suspected increase in the salinity of fresh water resources can trigger a site investigation to identify the source(s) of salinity and the extent of any impacts. These investigations can be complicated by the presence of naturally elevated total dissolved solids or chlorides concentrations, multiple potential sources of salinity, and incomplete data and information on both naturally occurring conditions and the characteristics of potential sources. As a result, data evaluation techniques that are effective at one site may not be effective at another. In order to match the complexity of the evaluation effort to the complexity of the specific site, this paper presents a strategic tiered approach that utilizes established techniques for evaluating and identifying the source(s) of salinity in an efficient step-by-step manner. The tiered approach includes: (1) a simple screening process to evaluate whether an impact has occurred and if the source is readily apparent; (2) basic geochemical characterization of the impacted water resource(s) and potential salinity sources coupled with simple visual and statistical data evaluation methods to determine the source(s); and (3) advanced laboratory analyses (e.g., isotopes) and data evaluation methods to identify the source(s) and the extent of salinity impacts where it was not otherwise conclusive. A case study from the U.S. Gulf Coast is presented to illustrate the application of this tiered approach. © 2017, National Ground Water Association.

  20. Bayesian generalized least squares regression with application to log Pearson type 3 regional skew estimation

    NASA Astrophysics Data System (ADS)

    Reis, D. S.; Stedinger, J. R.; Martins, E. S.

    2005-10-01

    This paper develops a Bayesian approach to analysis of a generalized least squares (GLS) regression model for regional analyses of hydrologic data. The new approach allows computation of the posterior distributions of the parameters and the model error variance using a quasi-analytic approach. Two regional skew estimation studies illustrate the value of the Bayesian GLS approach for regional statistical analysis of a shape parameter and demonstrate that regional skew models can be relatively precise with effective record lengths in excess of 60 years. With Bayesian GLS the marginal posterior distribution of the model error variance and the corresponding mean and variance of the parameters can be computed directly, thereby providing a simple but important extension of the regional GLS regression procedures popularized by Tasker and Stedinger (1989), which is sensitive to the likely values of the model error variance when it is small relative to the sampling error in the at-site estimator.
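
    The core GLS computation can be sketched as below, holding the model error variance fixed at a hypothetical value; the Bayesian quasi-analytic integration over that variance described in the paper is not reproduced.

    ```python
    import numpy as np

    def gls_estimate(X, y, sigma_eta2, sampling_cov):
        """GLS estimate for a regional model y = X b + eta + eps, where eta has
        model-error variance sigma_eta2 and eps has known sampling covariance.
        (A Bayesian analysis would integrate over sigma_eta2; here it is fixed.)
        """
        Lam = sigma_eta2 * np.eye(len(y)) + sampling_cov        # total covariance
        Lam_inv = np.linalg.inv(Lam)
        cov_b = np.linalg.inv(X.T @ Lam_inv @ X)                # parameter covariance
        b_hat = cov_b @ X.T @ Lam_inv @ y
        return b_hat, cov_b

    if __name__ == "__main__":
        rng = np.random.default_rng(4)
        n = 25
        X = np.c_[np.ones(n), rng.normal(size=n)]               # intercept + a basin descriptor
        sampling_cov = np.diag(rng.uniform(0.05, 0.2, n))       # hypothetical at-site variances
        true_b = np.array([0.1, 0.3])
        y = X @ true_b + rng.multivariate_normal(np.zeros(n), 0.02 * np.eye(n) + sampling_cov)
        print(gls_estimate(X, y, sigma_eta2=0.02, sampling_cov=sampling_cov)[0])
    ```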

  1. The Prediction Properties of Inverse and Reverse Regression for the Simple Linear Calibration Problem

    NASA Technical Reports Server (NTRS)

    Parker, Peter A.; Geoffrey, Vining G.; Wilson, Sara R.; Szarka, John L., III; Johnson, Nels G.

    2010-01-01

    The calibration of measurement systems is a fundamental but under-studied problem within industrial statistics. The origins of this problem go back to basic chemical analysis based on NIST standards. In today's world these issues extend to mechanical, electrical, and materials engineering. Often, these new scenarios do not provide "gold standards" such as the standard weights provided by NIST. This paper considers the classic "forward regression followed by inverse regression" approach. In this approach the initial experiment treats the "standards" as the regressor and the observed values as the response to calibrate the instrument. The analyst then must invert the resulting regression model in order to use the instrument to make actual measurements in practice. This paper compares this classical approach to "reverse regression," which treats the standards as the response and the observed measurements as the regressor in the calibration experiment. Such an approach is intuitively appealing because it avoids the need for the inverse regression. However, it also violates some of the basic regression assumptions.
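
    A minimal numerical contrast of the two approaches, with hypothetical calibration data; the statistical comparison carried out in the paper is not reproduced, only the two point estimates.

    ```python
    import numpy as np

    # Classical (inverse) calibration: regress observed readings y on standards x,
    # then invert; reverse calibration: regress x on y and predict directly.
    rng = np.random.default_rng(5)
    x_std = np.linspace(1.0, 10.0, 10)                            # "gold standard" values
    y_obs = 0.5 + 2.0 * x_std + rng.normal(0, 0.2, x_std.size)    # instrument readings

    # Forward fit y = b0 + b1*x, then invert for a new reading.
    b1, b0 = np.polyfit(x_std, y_obs, 1)
    y_new = 12.3                                                  # a new instrument reading
    x_hat_classical = (y_new - b0) / b1

    # Reverse fit x = c0 + c1*y, predict directly (avoids inversion but
    # violates the usual regression error assumptions).
    c1, c0 = np.polyfit(y_obs, x_std, 1)
    x_hat_reverse = c0 + c1 * y_new

    print(x_hat_classical, x_hat_reverse)
    ```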

  2. [Bayesian statistics in medicine -- part II: main applications and inference].

    PubMed

    Montomoli, C; Nichelatti, M

    2008-01-01

    Bayesian statistics is not only used when one is dealing with 2-way tables, but it can be used for inferential purposes. Using the basic concepts presented in the first part, this paper aims to give a simple overview of Bayesian methods by introducing its foundation (Bayes' theorem) and then applying this rule to a very simple practical example; whenever possible, the elementary processes at the basis of analysis are compared to those of frequentist (classical) statistical analysis. The Bayesian reasoning is naturally connected to medical activity, since it appears to be quite similar to a diagnostic process.
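
    The diagnostic use of Bayes' theorem mentioned here can be made concrete with a small worked example (all numbers hypothetical).

    ```python
    # Update the probability of disease after a positive test, from prevalence,
    # sensitivity and specificity, via Bayes' theorem.
    def posterior_positive(prevalence, sensitivity, specificity):
        """P(disease | positive test)."""
        p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
        return sensitivity * prevalence / p_pos

    print(posterior_positive(prevalence=0.01, sensitivity=0.95, specificity=0.90))
    # ~0.09: even a fairly accurate test yields a modest post-test probability
    # when the disease is rare, which is the essence of the Bayesian reasoning.
    ```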

  3. Using expert knowledge to incorporate uncertainty in cause-of-death assignments for modeling of cause-specific mortality

    USGS Publications Warehouse

    Walsh, Daniel P.; Norton, Andrew S.; Storm, Daniel J.; Van Deelen, Timothy R.; Heisy, Dennis M.

    2018-01-01

    Implicit and explicit use of expert knowledge to inform ecological analyses is becoming increasingly common because it often represents the sole source of information in many circumstances. Thus, there is a need to develop statistical methods that explicitly incorporate expert knowledge, and can successfully leverage this information while properly accounting for associated uncertainty during analysis. Studies of cause-specific mortality provide an example of implicit use of expert knowledge when causes-of-death are uncertain and assigned based on the observer's knowledge of the most likely cause. To explicitly incorporate this use of expert knowledge and the associated uncertainty, we developed a statistical model for estimating cause-specific mortality using a data augmentation approach within a Bayesian hierarchical framework. Specifically, for each mortality event, we elicited the observer's belief of cause-of-death by having them specify the probability that the death was due to each potential cause. These probabilities were then used as prior predictive values within our framework. This hierarchical framework permitted a simple and rigorous estimation method that was easily modified to include covariate effects and regularizing terms. Although applied to survival analysis, this method can be extended to any event-time analysis with multiple event types, for which there is uncertainty regarding the true outcome. We conducted simulations to determine how our framework compared to traditional approaches that use expert knowledge implicitly and assume that cause-of-death is specified accurately. Simulation results supported the inclusion of observer uncertainty in cause-of-death assignment in modeling of cause-specific mortality to improve model performance and inference. Finally, we applied the statistical model we developed and a traditional method to cause-specific survival data for white-tailed deer, and compared results. We demonstrate that model selection results changed between the two approaches, and incorporating observer knowledge in cause-of-death increased the variability associated with parameter estimates when compared to the traditional approach. These differences between the two approaches can impact reported results, and therefore, it is critical to explicitly incorporate expert knowledge in statistical methods to ensure rigorous inference.

  4. Combining Statistics and Physics to Improve Climate Downscaling

    NASA Astrophysics Data System (ADS)

    Gutmann, E. D.; Eidhammer, T.; Arnold, J.; Nowak, K.; Clark, M. P.

    2017-12-01

    Getting useful information from climate models is an ongoing problem that has plagued climate science and hydrologic prediction for decades. While it is possible to develop statistical corrections for climate models that mimic current climate almost perfectly, this does not necessarily guarantee that future changes are portrayed correctly. In contrast, convection permitting regional climate models (RCMs) have begun to provide an excellent representation of the regional climate system purely from first principles, providing greater confidence in their change signal. However, the computational cost of such RCMs prohibits the generation of ensembles of simulations or long time periods, thus limiting their applicability for hydrologic applications. Here we discuss a new approach combining statistical corrections with physical relationships for a modest computational cost. We have developed the Intermediate Complexity Atmospheric Research model (ICAR) to provide a climate and weather downscaling option that is based primarily on physics for a fraction of the computational requirements of a traditional regional climate model. ICAR also enables the incorporation of statistical adjustments directly within the model. We demonstrate that applying even simple corrections to precipitation while the model is running can improve the simulation of land atmosphere feedbacks in ICAR. For example, by incorporating statistical corrections earlier in the modeling chain, we permit the model physics to better represent the effect of mountain snowpack on air temperature changes.

  5. Generalized statistical mechanics approaches to earthquakes and tectonics.

    PubMed

    Vallianatos, Filippos; Papadakis, Giorgos; Michas, Georgios

    2016-12-01

    Despite the extreme complexity that characterizes the mechanism of the earthquake generation process, simple empirical scaling relations apply to the collective properties of earthquakes and faults in a variety of tectonic environments and scales. The physical characterization of those properties and the scaling relations that describe them attract a wide scientific interest and are incorporated in the probabilistic forecasting of seismicity in local, regional and planetary scales. Considerable progress has been made in the analysis of the statistical mechanics of earthquakes, which, based on the principle of entropy, can provide a physical rationale to the macroscopic properties frequently observed. The scale-invariant properties, the (multi) fractal structures and the long-range interactions that have been found to characterize fault and earthquake populations have recently led to the consideration of non-extensive statistical mechanics (NESM) as a consistent statistical mechanics framework for the description of seismicity. The consistency between NESM and observations has been demonstrated in a series of publications on seismicity, faulting, rock physics and other fields of geosciences. The aim of this review is to present in a concise manner the fundamental macroscopic properties of earthquakes and faulting and how these can be derived by using the notions of statistical mechanics and NESM, providing further insights into earthquake physics and fault growth processes.

  6. Generalized statistical mechanics approaches to earthquakes and tectonics

    PubMed Central

    Papadakis, Giorgos; Michas, Georgios

    2016-01-01

    Despite the extreme complexity that characterizes the mechanism of the earthquake generation process, simple empirical scaling relations apply to the collective properties of earthquakes and faults in a variety of tectonic environments and scales. The physical characterization of those properties and the scaling relations that describe them attract a wide scientific interest and are incorporated in the probabilistic forecasting of seismicity in local, regional and planetary scales. Considerable progress has been made in the analysis of the statistical mechanics of earthquakes, which, based on the principle of entropy, can provide a physical rationale to the macroscopic properties frequently observed. The scale-invariant properties, the (multi) fractal structures and the long-range interactions that have been found to characterize fault and earthquake populations have recently led to the consideration of non-extensive statistical mechanics (NESM) as a consistent statistical mechanics framework for the description of seismicity. The consistency between NESM and observations has been demonstrated in a series of publications on seismicity, faulting, rock physics and other fields of geosciences. The aim of this review is to present in a concise manner the fundamental macroscopic properties of earthquakes and faulting and how these can be derived by using the notions of statistical mechanics and NESM, providing further insights into earthquake physics and fault growth processes. PMID:28119548

  7. Statistics without Tears: Complex Statistics with Simple Arithmetic

    ERIC Educational Resources Information Center

    Smith, Brian

    2011-01-01

    One of the often overlooked aspects of modern statistics is the analysis of time series data. Modern introductory statistics courses tend to rush to probabilistic applications involving risk and confidence. Rarely does the first level course linger on such useful and fascinating topics as time series decomposition, with its practical applications…

  8. Applied statistics in ecology: common pitfalls and simple solutions

    Treesearch

    E. Ashley Steel; Maureen C. Kennedy; Patrick G. Cunningham; John S. Stanovick

    2013-01-01

    The most common statistical pitfalls in ecological research are those associated with data exploration, the logic of sampling and design, and the interpretation of statistical results. Although one can find published errors in calculations, the majority of statistical pitfalls result from incorrect logic or interpretation despite correct numerical calculations. There...

  9. On representing the prognostic value of continuous gene expression biomarkers with the restricted mean survival curve.

    PubMed

    Eng, Kevin H; Schiller, Emily; Morrell, Kayla

    2015-11-03

    Researchers developing biomarkers for cancer prognosis from quantitative gene expression data are often faced with an odd methodological discrepancy: while Cox's proportional hazards model, the appropriate and popular technique, produces a continuous and relative risk score, it is hard to cast the estimate in clear clinical terms like median months of survival and percent of patients affected. To produce a familiar Kaplan-Meier plot, researchers commonly make the decision to dichotomize a continuous (often unimodal and symmetric) score. It is well known in the statistical literature that this procedure induces significant bias. We illustrate the liabilities of common techniques for categorizing a risk score and discuss alternative approaches. We promote the use of the restricted mean survival (RMS) and the corresponding RMS curve that may be thought of as an analog to the best fit line from simple linear regression. Continuous biomarker workflows should be modified to include the more rigorous statistical techniques and descriptive plots described in this article. All statistics discussed can be computed via standard functions in the Survival package of the R statistical programming language. Example R language code for the RMS curve is presented in the appendix.
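
    The article points to R's survival package; as a language-agnostic illustration of the same quantity, the sketch below computes the restricted mean survival as the area under a Kaplan-Meier curve up to a horizon tau, using a hand-rolled estimator and invented follow-up data.

    ```python
    import numpy as np

    def kaplan_meier(times, events):
        """Kaplan-Meier estimate at each unique event time (events: 1 = death, 0 = censored)."""
        times, events = np.asarray(times, float), np.asarray(events, int)
        t_unique = np.unique(times[events == 1])
        s, surv = 1.0, []
        for t in t_unique:
            d = np.sum((times == t) & (events == 1))   # deaths at t
            n = np.sum(times >= t)                     # number at risk just before t
            s *= 1.0 - d / n
            surv.append(s)
        return t_unique, np.array(surv)

    def restricted_mean_survival(times, events, tau):
        """Area under the KM curve up to horizon tau (survival is piecewise constant)."""
        t, s = kaplan_meier(times, events)
        keep = t <= tau
        grid = np.concatenate(([0.0], t[keep], [tau]))   # segment boundaries
        heights = np.concatenate(([1.0], s[keep]))       # S(t) on each segment
        return float(np.sum(np.diff(grid) * heights))

    # Illustrative data: follow-up times in months and event indicators
    times  = [2, 5, 6, 8, 12, 15, 20, 22, 30, 34]
    events = [1, 1, 0, 1,  1,  0,  1,  0,  1,  0]
    print(restricted_mean_survival(times, events, tau=24.0))
    ```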

  10. Time-to-event methodology improved statistical evaluation in register-based health services research.

    PubMed

    Bluhmki, Tobias; Bramlage, Peter; Volk, Michael; Kaltheuner, Matthias; Danne, Thomas; Rathmann, Wolfgang; Beyersmann, Jan

    2017-02-01

    Complex longitudinal sampling and the observational structure of patient registers in health services research are associated with methodological challenges regarding data management and statistical evaluation. We exemplify common pitfalls and want to stimulate discussions on the design, development, and deployment of future longitudinal patient registers and register-based studies. For illustrative purposes, we use data from the prospective, observational, German DIabetes Versorgungs-Evaluation register. One aim was to explore predictors for the initiation of a basal insulin supported therapy in patients with type 2 diabetes initially prescribed to glucose-lowering drugs alone. Major challenges are missing mortality information, time-dependent outcomes, delayed study entries, different follow-up times, and competing events. We show that time-to-event methodology is a valuable tool for improved statistical evaluation of register data and should be preferred to simple case-control approaches. Patient registers provide rich data sources for health services research. Analyses are accompanied with the trade-off between data availability, clinical plausibility, and statistical feasibility. Cox' proportional hazards model allows for the evaluation of the outcome-specific hazards, but prediction of outcome probabilities is compromised by missing mortality information. Copyright © 2016 Elsevier Inc. All rights reserved.

  11. Estimating maize production in Kenya using NDVI: Some statistical considerations

    USGS Publications Warehouse

Lewis, J.E.; Rowland, James; Nadeau, A.

    1998-01-01

A regression model approach using a normalized difference vegetation index (NDVI) has the potential for estimating crop production in East Africa. However, before production estimation can become a reality, the underlying model assumptions and statistical nature of the sample data (NDVI and crop production) must be examined rigorously. Annual maize production statistics from 1982-90 for 36 agricultural districts within Kenya were used as the dependent variable; median area NDVI (independent variable) values from each agricultural district and year were extracted from the annual maximum NDVI data set. The input data and the statistical association of NDVI with maize production for Kenya were tested systematically for the following items: (1) homogeneity of the data when pooling the sample, (2) gross data errors and influence points, (3) serial (time) correlation, (4) spatial autocorrelation and (5) stability of the regression coefficients. The results of using a simple regression model with NDVI as the only independent variable are encouraging (r = 0.75, p < 0.05) and illustrate that NDVI can be a responsive indicator of maize production, especially in areas of high NDVI spatial variability, which coincide with areas of production variability in Kenya.
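
    The checks listed above (influence points, serial correlation, coefficient stability) map onto standard regression diagnostics. The sketch below shows the general pattern on invented district-year data using statsmodels; it is not a reconstruction of the Kenyan analysis.

    ```python
    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.stattools import durbin_watson

    rng = np.random.default_rng(1)

    # Illustrative district-year data (not the Kenyan data from the paper)
    ndvi = rng.uniform(0.2, 0.7, 50)                      # median annual maximum NDVI
    maize = 1500 * ndvi + rng.normal(0, 80, ndvi.size)    # maize production proxy

    X = sm.add_constant(ndvi)
    fit = sm.OLS(maize, X).fit()

    print(fit.rsquared)                  # strength of the NDVI-production association
    print(durbin_watson(fit.resid))      # ~2 indicates little serial correlation
    print(fit.get_influence().cooks_distance[0].max())    # screen for influence points
    ```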

  12. The Statistics of wood assays for preservative retention

    Treesearch

    Patricia K. Lebow; Scott W. Conklin

    2011-01-01

    This paper covers general statistical concepts that apply to interpreting wood assay retention values. In particular, since wood assays are typically obtained from a single composited sample, the statistical aspects, including advantages and disadvantages, of simple compositing are covered.

  13. New Statistical Model for Variability of Aerosol Optical Thickness: Theory and Application to MODIS Data over Ocean

    NASA Technical Reports Server (NTRS)

    Alexandrov, Mikhail Dmitrievic; Geogdzhayev, Igor V.; Tsigaridis, Konstantinos; Marshak, Alexander; Levy, Robert; Cairns, Brian

    2016-01-01

    A novel model for the variability in aerosol optical thickness (AOT) is presented. This model is based on the consideration of AOT fields as realizations of a stochastic process, that is the exponent of an underlying Gaussian process with a specific autocorrelation function. In this approach AOT fields have lognormal PDFs and structure functions having the correct asymptotic behavior at large scales. The latter is an advantage compared with fractal (scale-invariant) approaches. The simple analytical form of the structure function in the proposed model facilitates its use for the parameterization of AOT statistics derived from remote sensing data. The new approach is illustrated using a month-long global MODIS AOT dataset (over ocean) with 10 km resolution. It was used to compute AOT statistics for sample cells forming a grid with 5deg spacing. The observed shapes of the structure functions indicated that in a large number of cases the AOT variability is split into two regimes that exhibit different patterns of behavior: small-scale stationary processes and trends reflecting variations at larger scales. The small-scale patterns are suggested to be generated by local aerosols within the marine boundary layer, while the large-scale trends are indicative of elevated aerosols transported from remote continental sources. This assumption is evaluated by comparison of the geographical distributions of these patterns derived from MODIS data with those obtained from the GISS GCM. This study shows considerable potential to enhance comparisons between remote sensing datasets and climate models beyond regional mean AOTs.

  14. The taxonomy statistic uncovers novel clinical patterns in a population of ischemic stroke patients.

    PubMed

    Tukiendorf, Andrzej; Kaźmierski, Radosław; Michalak, Sławomir

    2013-01-01

    In this paper, we describe a simple taxonomic approach for clinical data mining elaborated by Marczewski and Steinhaus (M-S), whose performance equals the advanced statistical methodology known as the expectation-maximization (E-M) algorithm. We tested these two methods on a cohort of ischemic stroke patients. The comparison of both methods revealed strong agreement. Direct agreement between M-S and E-M classifications reached 83%, while Cohen's coefficient of agreement was κ = 0.766(P < 0.0001). The statistical analysis conducted and the outcomes obtained in this paper revealed novel clinical patterns in ischemic stroke patients. The aim of the study was to evaluate the clinical usefulness of Marczewski-Steinhaus' taxonomic approach as a tool for the detection of novel patterns of data in ischemic stroke patients and the prediction of disease outcome. In terms of the identification of fairly frequent types of stroke patients using their age, National Institutes of Health Stroke Scale (NIHSS), and diabetes mellitus (DM) status, when dealing with rough characteristics of patients, four particular types of patients are recognized, which cannot be identified by means of routine clinical methods. Following the obtained taxonomical outcomes, the strong correlation between the health status at moment of admission to emergency department (ED) and the subsequent recovery of patients is established. Moreover, popularization and simplification of the ideas of advanced mathematicians may provide an unconventional explorative platform for clinical problems.
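
    A hedged sketch of the comparison idea: cluster the same patient features with an E-M mixture model and with a second, simpler partitioning method, then quantify agreement. scikit-learn's GaussianMixture and KMeans stand in for the E-M and M-S procedures, and the patient features are invented; note that Cohen's kappa, as used in the paper, presumes the two label sets have been aligned.

    ```python
    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.cluster import KMeans
    from sklearn.metrics import cohen_kappa_score, adjusted_rand_score

    rng = np.random.default_rng(2)

    # Illustrative patient features (age, stroke-severity score, diabetes flag) -- invented data
    n = 200
    X = np.column_stack([
        rng.normal(70, 10, n),          # age
        rng.normal(8, 4, n),            # NIHSS-like severity score
        rng.integers(0, 2, n),          # diabetes status
    ])

    # E-M clustering via a Gaussian mixture vs. a simpler partitioning method
    em_labels = GaussianMixture(n_components=4, random_state=0).fit_predict(X)
    alt_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

    # Adjusted Rand index is invariant to label permutations;
    # Cohen's kappa (as in the paper) assumes the two label sets are already aligned.
    print(adjusted_rand_score(em_labels, alt_labels))
    print(cohen_kappa_score(em_labels, alt_labels))
    ```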

  15. Statistical Paradigm for Organic Optoelectronic Devices: Normal Force Testing for Adhesion of Organic Photovoltaics and Organic Light-Emitting Diodes.

    PubMed

    Vasilak, Lindsay; Tanu Halim, Silvie M; Das Gupta, Hrishikesh; Yang, Juan; Kamperman, Marleen; Turak, Ayse

    2017-04-19

In this study, we assess the utility of a normal force (pull-test) approach to measuring adhesion in organic solar cells and organic light-emitting diodes. This approach is a simple and practical method of monitoring the impact of systematic changes in materials, processing conditions, or environmental exposure on interfacial strength and electrode delamination. The ease of measurement enables a statistical description with numerous samples, variant geometry, and minimal preparation. After examining over 70 samples, using the Weibull modulus and the characteristic breaking strength as metrics, we were able to successfully differentiate the adhesion values between tris(8-hydroxyquinoline)aluminum (Alq3) and poly(3-hexyl-thiophene) and [6,6]-phenyl C61-butyric acid methyl ester (P3HT:PCBM) interfaces with Al and between two annealing times for the bulk heterojunction polymer blends. Additionally, the Weibull modulus, a relative measure of the range of flaw sizes at the fracture plane, can be correlated with the roughness of the organic surface. Finite element modeling of the delamination process suggests that the out-of-plane elastic modulus for Alq3 is lower than the reported in-plane elastic values. We suggest a statistical treatment of a large volume of tests be part of the standard protocol for investigating adhesion to accommodate the unavoidable variability in morphology and interfacial structure found in most organic devices.
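
    The two metrics named above, the Weibull modulus and the characteristic breaking strength, are simply the shape and scale parameters of a fitted Weibull distribution. A minimal sketch with simulated pull-test strengths (values invented) using scipy:

    ```python
    from scipy import stats

    # Illustrative pull-test breaking strengths; real data would come from the experiment
    strengths = stats.weibull_min.rvs(c=4.0, scale=1.2, size=70, random_state=3)

    # Two-parameter Weibull fit (location fixed at zero)
    shape, loc, scale = stats.weibull_min.fit(strengths, floc=0)
    print(f"Weibull modulus (shape): {shape:.2f}")
    print(f"characteristic strength (scale): {scale:.2f}")
    ```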

  16. Graphical statistical approach to soil organic matter resilience using analytical pyrolysis data.

    PubMed

    Almendros, Gonzalo; Hernández, Zulimar; Sanz, Jesús; Rodríguez-Sánchez, Sonia; Jiménez-González, Marco A; González-Pérez, José A

    2018-01-19

    Pyrolysis-gas chromatography-mass spectrometry (Py-GC-MS) of humic acids (HAs) from 30 agricultural soils from a volcanic island (Tenerife, Spain) was used to discern the molecular characteristics of soil organic matter (SOM) associated to resilience. For faster perceptual identification of the results, the yields of the pyrolysis products in the form of surface density plots were compared in an update of the Van Krevelen graphical statistical method. This approach, with respect to data reduction and visualization, was also used to collectively represent statistical indices that were obtained after simple and partial least squares (PLS) regression. The resulting plots illustrate different SOM structural domains (for example, carbohydrate- and lignin-derived and condensed lipid). The content of SOM and total mineralization coefficient (TMC) values can be well estimated from the relative abundance of 57 major pyrolysis compounds: SOM content and composition parallels the accumulation of lignin- and carbohydrate-derived structures (lignocellulosic material) and the depletion of condensed polyalkyl structures. In other words, in the volcanic ash soils that were studied, we found that the higher the amount of SOM, the lower its quality in terms of resilience. Although no cause-and-effect is inferred from this fact, it is evident that the resistance to biodegradation of the SOM is related to its molecular composition. Copyright © 2017 Elsevier B.V. All rights reserved.
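
    A minimal sketch of the PLS step described above, regressing a SOM-related property on a matrix of pyrolysis-product yields; the matrix dimensions mirror the study (30 soils, 57 compounds) but the values are random placeholders, and scikit-learn's PLSRegression is assumed as the implementation.

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(4)

    # Illustrative matrix: 30 soils x 57 pyrolysis-compound relative abundances
    X = rng.random((30, 57))
    y = X[:, :5].sum(axis=1) + rng.normal(0, 0.1, 30)   # stand-in for SOM content or TMC

    pls = PLSRegression(n_components=3)
    r2_cv = cross_val_score(pls, X, y, cv=5, scoring="r2")
    print(r2_cv.mean())

    pls.fit(X, y)
    # Coefficient magnitudes indicate which pyrolysis products drive the prediction
    top = np.argsort(np.abs(pls.coef_.ravel()))[::-1][:5]
    print(top)
    ```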

  17. Statistical and linguistic features of DNA sequences

    NASA Technical Reports Server (NTRS)

    Havlin, S.; Buldyrev, S. V.; Goldberger, A. L.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    We present evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range--indeed, base pairs thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationary" feature of the sequence of base pairs by applying a new algorithm called Detrended Fluctuation Analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and noncoding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to all eukaryotic DNA sequences (33 301 coding and 29 453 noncoding) in the entire GenBank database. We describe a simple model to account for the presence of long-range power-law correlations which is based upon a generalization of the classic Levy walk. Finally, we describe briefly some recent work showing that the noncoding sequences have certain statistical features in common with natural languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts, and the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function. We suggest that noncoding regions in plants and invertebrates may display a smaller entropy and larger redundancy than coding regions, further supporting the possibility that noncoding regions of DNA may carry biological information.
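
    Detrended Fluctuation Analysis itself is short enough to sketch. The version below uses linear detrending in non-overlapping windows and a purine/pyrimidine +1/-1 mapping of a random sequence, which is one common convention and is assumed here for illustration; it is not the authors' code.

    ```python
    import numpy as np

    def dfa(signal, window_sizes):
        """Return the fluctuation F(n) for each window size n (linear detrending)."""
        x = np.asarray(signal, float)
        profile = np.cumsum(x - x.mean())                 # integrated, mean-removed series
        F = []
        for n in window_sizes:
            n_windows = len(profile) // n
            segs = profile[: n_windows * n].reshape(n_windows, n)
            t = np.arange(n)
            rms = []
            for seg in segs:
                coef = np.polyfit(t, seg, 1)              # local linear trend
                rms.append(np.sqrt(np.mean((seg - np.polyval(coef, t)) ** 2)))
            F.append(np.mean(rms))
        return np.array(F)

    # Map a DNA-like sequence to +1/-1 (purine/pyrimidine walk; illustrative random sequence)
    rng = np.random.default_rng(5)
    seq = rng.choice(list("ACGT"), size=4096)
    walk = np.where(np.isin(seq, ["A", "G"]), 1, -1)

    sizes = np.array([4, 8, 16, 32, 64, 128, 256])
    F = dfa(walk, sizes)
    alpha = np.polyfit(np.log(sizes), np.log(F), 1)[0]    # scaling exponent
    print(alpha)    # ~0.5 for an uncorrelated sequence
    ```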

  18. Science and Facebook: The same popularity law!

    PubMed

    Néda, Zoltán; Varga, Levente; Biró, Tamás S

    2017-01-01

The distributions of scientific citations for publications selected with different rules (author, topic, institution, country, journal, etc.) collapse onto a single curve if one plots the citations relative to their mean value. We find that the distribution of "shares" for Facebook posts rescales in the same manner onto the very same curve as scientific citations. This finding suggests that citations are subject to the same growth mechanism as Facebook popularity measures, being influenced by a statistically similar social environment and selection mechanism. In a simple master-equation approach, the exponential growth of the number of publications combined with a preferential selection mechanism leads to a Tsallis-Pareto distribution that offers an excellent description of the observed statistics. Based on our model and on data derived from PubMed, we predict that, following the present trend, the average number of citations per scientific publication relaxes exponentially to about 4.

  19. Taking Ockham's razor to enzyme dynamics and catalysis.

    PubMed

    Glowacki, David R; Harvey, Jeremy N; Mulholland, Adrian J

    2012-01-29

    The role of protein dynamics in enzyme catalysis is a matter of intense current debate. Enzyme-catalysed reactions that involve significant quantum tunnelling can give rise to experimental kinetic isotope effects with complex temperature dependences, and it has been suggested that standard statistical rate theories, such as transition-state theory, are inadequate for their explanation. Here we introduce aspects of transition-state theory relevant to the study of enzyme reactivity, taking cues from chemical kinetics and dynamics studies of small molecules in the gas phase and in solution--where breakdowns of statistical theories have received significant attention and their origins are relatively better understood. We discuss recent theoretical approaches to understanding enzyme activity and then show how experimental observations for a number of enzymes may be reproduced using a transition-state-theory framework with physically reasonable parameters. Essential to this simple model is the inclusion of multiple conformations with different reactivity.

  20. Full counting statistics and shot noise of cotunneling in quantum dots and single-molecule transistors

    NASA Astrophysics Data System (ADS)

    Kaasbjerg, Kristen; Belzig, Wolfgang

    2015-06-01

    We develop a conceptually simple scheme based on a master-equation approach to evaluate the full-counting statistics (FCS) of elastic and inelastic off-resonant tunneling (cotunneling) in quantum dots (QDs) and molecules. We demonstrate the method by showing that it reproduces known results for the FCS and shot noise in the cotunneling regime. For a QD with an excited state, we obtain an analytic expression for the cumulant generating function (CGF) taking into account elastic and inelastic cotunneling. From the CGF we find that the shot noise above the inelastic threshold in the cotunneling regime is inherently super-Poissonian when external relaxation is weak. Furthermore, a complete picture of the shot noise across the different transport regimes is given. In the case where the excited state is a blocking state, strongly enhanced shot noise is predicted both in the resonant and cotunneling regimes.

  1. Optimizing Integrated Terminal Airspace Operations Under Uncertainty

    NASA Technical Reports Server (NTRS)

    Bosson, Christabelle; Xue, Min; Zelinski, Shannon

    2014-01-01

In the terminal airspace, integrated departures and arrivals have the potential to increase operations efficiency. Recent research has developed genetic-algorithm-based schedulers for integrated arrival and departure operations under uncertainty. This paper presents an alternate method using a machine job-shop scheduling formulation to model the integrated airspace operations. A multistage stochastic programming approach is chosen to formulate the problem and candidate solutions are obtained by solving sample average approximation problems with finite sample size. Because approximate solutions are computed, the proposed algorithm incorporates the computation of statistical bounds to estimate the optimality of the candidate solutions. A proof-of-concept study is conducted on a baseline implementation of a simple problem considering a fleet mix of 14 aircraft evolving in a model of the Los Angeles terminal airspace. A more thorough statistical analysis is also performed to evaluate the impact of the number of scenarios considered in the sampled problem. To handle extensive sampling computations, a multithreading technique is introduced.

  2. Science and Facebook: The same popularity law!

    PubMed Central

    Varga, Levente; Biró, Tamás S.

    2017-01-01

The distributions of scientific citations for publications selected with different rules (author, topic, institution, country, journal, etc.) collapse onto a single curve if one plots the citations relative to their mean value. We find that the distribution of “shares” for Facebook posts rescales in the same manner onto the very same curve as scientific citations. This finding suggests that citations are subject to the same growth mechanism as Facebook popularity measures, being influenced by a statistically similar social environment and selection mechanism. In a simple master-equation approach, the exponential growth of the number of publications combined with a preferential selection mechanism leads to a Tsallis-Pareto distribution that offers an excellent description of the observed statistics. Based on our model and on data derived from PubMed, we predict that, following the present trend, the average number of citations per scientific publication relaxes exponentially to about 4. PMID:28678796

  3. Prediction of drug transport processes using simple parameters and PLS statistics. The use of ACD/logP and ACD/ChemSketch descriptors.

    PubMed

    Osterberg, T; Norinder, U

    2001-01-01

    A method of modelling and predicting biopharmaceutical properties using simple theoretically computed molecular descriptors and multivariate statistics has been investigated for several data sets related to solubility, IAM chromatography, permeability across Caco-2 cell monolayers, human intestinal perfusion, brain-blood partitioning, and P-glycoprotein ATPase activity. The molecular descriptors (e.g. molar refractivity, molar volume, index of refraction, surface tension and density) and logP were computed with ACD/ChemSketch and ACD/logP, respectively. Good statistical models were derived that permit simple computational prediction of biopharmaceutical properties. All final models derived had R(2) values ranging from 0.73 to 0.95 and Q(2) values ranging from 0.69 to 0.86. The RMSEP values for the external test sets ranged from 0.24 to 0.85 (log scale).

  4. Learning from physics-based earthquake simulators: a minimal approach

    NASA Astrophysics Data System (ADS)

    Artale Harris, Pietro; Marzocchi, Warner; Melini, Daniele

    2017-04-01

Physics-based earthquake simulators aim to generate synthetic seismic catalogs of arbitrary length, accounting for fault interaction, elastic rebound, realistic fault networks, and some simple earthquake nucleation process such as rate-and-state friction. Through comparison of synthetic and real catalogs, seismologists can gain insight into the earthquake occurrence process. Moreover, earthquake simulators can be used to infer some aspects of the statistical behavior of earthquakes within the simulated region by analyzing timescales not accessible through observations. The development of earthquake simulators is commonly driven by the approach "the more physics, the better", pushing seismologists towards ever more Earth-like simulators. However, despite its immediate attractiveness, we argue that this kind of approach makes it more and more difficult to understand which physical parameters are really relevant for describing the features of the seismic catalog in which we are interested. For this reason, here we take the opposite, minimal approach and analyze the behavior of a purposely simple earthquake simulator applied to a set of California faults. The idea is that a simple model may be more informative than a complex one for some specific scientific objectives, because it is more understandable. The model has three main components: the first is a realistic tectonic setting, i.e., a fault dataset of California; the other two are quantitative laws for earthquake generation on each single fault, and the Coulomb failure function for modeling fault interaction. The final goal of this work is twofold. On the one hand, we aim to identify the minimum set of physical ingredients that can satisfactorily reproduce the features of the real seismic catalog, such as short-term seismic clustering, and to investigate the hypothetical long-term behavior and fault synchronization. On the other hand, we want to investigate the limits of predictability of the model itself.

  5. A simple rapid process for semi-automated brain extraction from magnetic resonance images of the whole mouse head.

    PubMed

    Delora, Adam; Gonzales, Aaron; Medina, Christopher S; Mitchell, Adam; Mohed, Abdul Faheem; Jacobs, Russell E; Bearer, Elaine L

    2016-01-15

    Magnetic resonance imaging (MRI) is a well-developed technique in neuroscience. Limitations in applying MRI to rodent models of neuropsychiatric disorders include the large number of animals required to achieve statistical significance, and the paucity of automation tools for the critical early step in processing, brain extraction, which prepares brain images for alignment and voxel-wise statistics. This novel timesaving automation of template-based brain extraction ("skull-stripping") is capable of quickly and reliably extracting the brain from large numbers of whole head images in a single step. The method is simple to install and requires minimal user interaction. This method is equally applicable to different types of MR images. Results were evaluated with Dice and Jacquard similarity indices and compared in 3D surface projections with other stripping approaches. Statistical comparisons demonstrate that individual variation of brain volumes are preserved. A downloadable software package not otherwise available for extraction of brains from whole head images is included here. This software tool increases speed, can be used with an atlas or a template from within the dataset, and produces masks that need little further refinement. Our new automation can be applied to any MR dataset, since the starting point is a template mask generated specifically for that dataset. The method reliably and rapidly extracts brain images from whole head images, rendering them useable for subsequent analytical processing. This software tool will accelerate the exploitation of mouse models for the investigation of human brain disorders by MRI. Copyright © 2015 Elsevier B.V. All rights reserved.

  6. Selecting Statistical Procedures for Quality Control Planning Based on Risk Management.

    PubMed

    Yago, Martín; Alcover, Silvia

    2016-07-01

    According to the traditional approach to statistical QC planning, the performance of QC procedures is assessed in terms of its probability of rejecting an analytical run that contains critical size errors (PEDC). Recently, the maximum expected increase in the number of unacceptable patient results reported during the presence of an undetected out-of-control error condition [Max E(NUF)], has been proposed as an alternative QC performance measure because it is more related to the current introduction of risk management concepts for QC planning in the clinical laboratory. We used a statistical model to investigate the relationship between PEDC and Max E(NUF) for simple QC procedures widely used in clinical laboratories and to construct charts relating Max E(NUF) with the capability of the analytical process that allow for QC planning based on the risk of harm to a patient due to the report of erroneous results. A QC procedure shows nearly the same Max E(NUF) value when used for controlling analytical processes with the same capability, and there is a close relationship between PEDC and Max E(NUF) for simple QC procedures; therefore, the value of PEDC can be estimated from the value of Max E(NUF) and vice versa. QC procedures selected by their high PEDC value are also characterized by a low value for Max E(NUF). The PEDC value can be used for estimating the probability of patient harm, allowing for the selection of appropriate QC procedures in QC planning based on risk management. © 2016 American Association for Clinical Chemistry.

  7. Statistical precision of the intensities retrieved from constrained fitting of overlapping peaks in high-resolution mass spectra

    DOE PAGES

    Cubison, M. J.; Jimenez, J. L.

    2015-06-05

Least-squares fitting of overlapping peaks is often needed to separately quantify ions in high-resolution mass spectrometer data. A statistical simulation approach is used to assess the statistical precision of the retrieved peak intensities. The sensitivity of the fitted peak intensities to statistical noise due to ion counting is probed for synthetic data systems consisting of two overlapping ion peaks whose positions are pre-defined and fixed in the fitting procedure. The fitted intensities are sensitive to imperfections in the m/Q calibration. These propagate as a limiting precision in the fitted intensities that may greatly exceed the precision arising from counting statistics. The precision on the fitted peak intensity falls into one of three regimes. In the "counting-limited regime" (regime I), above a peak separation χ ~ 2 to 3 half-widths at half-maximum (HWHM), the intensity precision is similar to that due to counting error for an isolated ion. For smaller χ and higher ion counts (~ 1000 and higher), the intensity precision rapidly degrades as the peak separation is reduced ("calibration-limited regime", regime II). Alternatively for χ < 1.6 but lower ion counts (e.g. 10–100) the intensity precision is dominated by the additional ion count noise from the overlapping ion and is not affected by the imprecision in the m/Q calibration ("overlapping-limited regime", regime III). The transition between the counting and m/Q calibration-limited regimes is shown to be weakly dependent on resolving power and data spacing and can thus be approximated by a simple parameterisation based only on peak intensity ratios and separation. A simple equation can be used to find potentially problematic ion pairs when evaluating results from fitted spectra containing many ions. Longer integration times can improve the precision in regimes I and III, but a given ion pair can only be moved out of regime II through increased spectrometer resolving power. As a result, studies presenting data obtained from least-squares fitting procedures applied to mass spectral peaks should explicitly consider these limits on statistical precision.
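
    The simulation idea described above can be reproduced in miniature: generate two overlapping Gaussian ion peaks with Poisson (counting) noise and recover their intensities by least squares. The peak positions, widths, and intensities below are invented, and scipy's curve_fit stands in for the fitting procedure.

    ```python
    import numpy as np
    from scipy.optimize import curve_fit

    def two_peaks(mz, a1, mu1, a2, mu2, sigma):
        """Two Gaussian peaks with a shared width, as for a closely spaced ion pair."""
        return (a1 * np.exp(-0.5 * ((mz - mu1) / sigma) ** 2)
                + a2 * np.exp(-0.5 * ((mz - mu2) / sigma) ** 2))

    rng = np.random.default_rng(6)
    mz = np.linspace(43.95, 44.05, 200)

    # Synthetic overlapping peaks plus counting-like noise (illustrative parameters)
    truth = two_peaks(mz, 800, 43.990, 300, 44.000, 0.004)
    signal = rng.poisson(truth).astype(float)

    # In practice peak positions are fixed from the m/Q calibration; here they are left free
    p0 = [700, 43.99, 200, 44.00, 0.004]
    popt, pcov = curve_fit(two_peaks, mz, signal, p0=p0)
    perr = np.sqrt(np.diag(pcov))
    print("fitted intensities:", popt[0], popt[2])
    print("1-sigma uncertainties:", perr[0], perr[2])
    ```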

  8. A Bayesian Approach to Model Selection in Hierarchical Mixtures-of-Experts Architectures.

    PubMed

    Tanner, Martin A.; Peng, Fengchun; Jacobs, Robert A.

    1997-03-01

    There does not exist a statistical model that shows good performance on all tasks. Consequently, the model selection problem is unavoidable; investigators must decide which model is best at summarizing the data for each task of interest. This article presents an approach to the model selection problem in hierarchical mixtures-of-experts architectures. These architectures combine aspects of generalized linear models with those of finite mixture models in order to perform tasks via a recursive "divide-and-conquer" strategy. Markov chain Monte Carlo methodology is used to estimate the distribution of the architectures' parameters. One part of our approach to model selection attempts to estimate the worth of each component of an architecture so that relatively unused components can be pruned from the architecture's structure. A second part of this approach uses a Bayesian hypothesis testing procedure in order to differentiate inputs that carry useful information from nuisance inputs. Simulation results suggest that the approach presented here adheres to the dictum of Occam's razor; simple architectures that are adequate for summarizing the data are favored over more complex structures. Copyright 1997 Elsevier Science Ltd. All Rights Reserved.

  9. Statistical tools for transgene copy number estimation based on real-time PCR.

    PubMed

    Yuan, Joshua S; Burris, Jason; Stewart, Nathan R; Mentewab, Ayalew; Stewart, C Neal

    2007-11-01

As compared with traditional transgene copy number detection technologies such as Southern blot analysis, real-time PCR provides a fast, inexpensive and high-throughput alternative. However, real-time PCR based transgene copy number estimation tends to be ambiguous and subjective, stemming from the lack of proper statistical analysis and data quality control to render a reliable estimation of copy number with a prediction value. Despite recent progress in the statistical analysis of real-time PCR, few publications have integrated these advances in real-time PCR based transgene copy number determination. Three experimental designs and four data quality control integrated statistical models are presented. For the first method, external calibration curves are established for the transgene based on serially-diluted templates. The Ct numbers from a control transgenic event and a putative transgenic event are compared to derive the transgene copy number or zygosity estimation. Simple linear regression and two group T-test procedures were combined to model the data from this design. For the second experimental design, standard curves were generated for both an internal reference gene and the transgene, and the copy number of the transgene was compared with that of the internal reference gene. Multiple regression models and ANOVA models can be employed to analyze the data and perform quality control for this approach. In the third experimental design, transgene copy number is compared with the reference gene without a standard curve, but rather, is based directly on fluorescence data. Two different multiple regression models were proposed to analyze the data based on two different approaches to amplification efficiency integration. Our results highlight the importance of proper statistical treatment and quality control integration in real-time PCR-based transgene copy number determination. These statistical methods allow the real-time PCR-based transgene copy number estimation to be more reliable and precise with a proper statistical estimation. Proper confidence intervals are necessary for unambiguous prediction of transgene copy number. The four different statistical methods are compared for their advantages and disadvantages. Moreover, the statistical methods can also be applied to other real-time PCR-based quantification assays including transfection efficiency analysis and pathogen quantification.
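
    A minimal sketch of the first experimental design (external calibration curve from serial dilutions, then comparison of a control and a putative transgenic event). All Ct values below are invented; a full analysis would add the regression-based quality control and confidence intervals the paper emphasizes.

    ```python
    import numpy as np
    from scipy import stats

    # Illustrative standard curve: Ct vs log10(template amount) from serial dilutions
    log_template = np.log10([1e5, 1e4, 1e3, 1e2, 1e1])
    ct_standard  = np.array([17.1, 20.5, 23.8, 27.2, 30.6])

    slope, intercept, r, p, se = stats.linregress(log_template, ct_standard)
    efficiency = 10 ** (-1.0 / slope) - 1.0        # amplification efficiency from the slope
    print(f"slope = {slope:.2f}, efficiency = {efficiency:.2%}")

    def relative_quantity(ct):
        """Back-calculate template amount from Ct via the calibration curve."""
        return 10 ** ((ct - intercept) / slope)

    # Compare a putative transgenic event against a single-copy control event
    ct_control, ct_putative = 24.6, 23.6
    copy_ratio = relative_quantity(ct_putative) / relative_quantity(ct_control)
    print(f"estimated copy number relative to control: {copy_ratio:.1f}")   # ~2 here
    ```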

  10. A Role for Chunk Formation in Statistical Learning of Second Language Syntax

    ERIC Educational Resources Information Center

    Hamrick, Phillip

    2014-01-01

    Humans are remarkably sensitive to the statistical structure of language. However, different mechanisms have been proposed to account for such statistical sensitivities. The present study compared adult learning of syntax and the ability of two models of statistical learning to simulate human performance: Simple Recurrent Networks, which learn by…

  11. A simple rapid approach using coupled multivariate statistical methods, GIS and trajectory models to delineate areas of common oil spill risk

    NASA Astrophysics Data System (ADS)

    Guillen, George; Rainey, Gail; Morin, Michelle

    2004-04-01

Currently, the Minerals Management Service uses the Oil Spill Risk Analysis model (OSRAM) to predict the movement of potential oil spills greater than 1000 bbl originating from offshore oil and gas facilities. OSRAM generates oil spill trajectories using meteorological and hydrological data input from either actual physical measurements or estimates generated from other hydrological models. OSRAM and many other models produce output matrices of average, maximum and minimum contact probabilities to specific landfall or target segments (columns) from oil spills at specific points (rows). Analysts and managers are often interested in identifying geographic areas or groups of facilities that pose similar risks to specific targets or groups of targets if a spill occurred. Unfortunately, due to the potentially large matrix generated by many spill models, this question is difficult to answer without the use of data reduction and visualization methods. In our study we utilized a multivariate statistical method called cluster analysis to group areas of similar risk based on the potential distribution of landfall target trajectory probabilities. We also utilized ArcView™ GIS to display spill launch point groupings. The combination of GIS and multivariate statistical techniques in the post-processing of trajectory model output is a powerful tool for identifying and delineating areas of similar risk from multiple spill sources. We strongly encourage modelers, statistical and GIS software programmers to closely collaborate to produce a more seamless integration of these technologies and approaches to analyzing data. They are complementary methods that strengthen the overall assessment of spill risks.
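
    The post-processing step described above amounts to clustering the rows of the trajectory-probability matrix. Below is a hedged sketch with an invented probability matrix and scipy's hierarchical clustering; the distance metric and number of groups are arbitrary choices, not those of the study.

    ```python
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import pdist

    rng = np.random.default_rng(7)

    # Illustrative model output: rows = launch points, columns = landfall segments,
    # entries = probability that a spill from that point contacts that segment
    prob = rng.dirichlet(np.ones(12), size=40)

    # Group launch points with similar landfall-probability profiles
    dist = pdist(prob, metric="cityblock")          # other profile-comparison metrics are also common
    tree = linkage(dist, method="average")
    groups = fcluster(tree, t=5, criterion="maxclust")
    print(groups)        # cluster label per launch point, ready to join to a GIS layer
    ```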

  12. Brief communication: Skeletal biology past and present: Are we moving in the right direction?

    PubMed

    Hens, Samantha M; Godde, Kanya

    2008-10-01

    In 1982, Spencer's edited volume A History of American Physical Anthropology: 1930-1980 allowed numerous authors to document the state of our science, including a critical examination of skeletal biology. Some authors argued that the first 50 years of skeletal biology were characterized by the descriptive-historical approach with little regard for processual problems and that technological and statistical analyses were not rooted in theory. In an effort to determine whether Spencer's landmark volume impacted the field of skeletal biology, a content analysis was carried out for the American Journal of Physical Anthropology from 1980 to 2004. The percentage of skeletal biology articles is similar to that of previous decades. Analytical articles averaged only 32% and are defined by three criteria: statistical analysis, hypothesis testing, and broader explanatory context. However, when these criteria were scored individually, nearly 80% of papers attempted a broader theoretical explanation, 44% tested hypotheses, and 67% used advanced statistics, suggesting that the skeletal biology papers in the journal have an analytical emphasis. Considerable fluctuation exists between subfields; trends toward a more analytical approach are witnessed in the subfields of age/sex/stature/demography, skeletal maturation, anatomy, and nonhuman primate studies, which also increased in frequency, while paleontology and pathology were largely descriptive. Comparisons to the International Journal of Osteoarchaeology indicate that there are statistically significant differences between the two journals in terms of analytical criteria. These data indicate a positive shift in theoretical thinking, i.e., an attempt by most to explain processes rather than present a simple description of events.

  13. Ehrenfest dynamics is purity non-preserving: A necessary ingredient for decoherence

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Alonso, J. L.; Instituto de Biocomputacion y Fisica de Sistemas Complejos; Unidad Asociada IQFR-BIFI, Universidad de Zaragoza, Mariano Esquillor s/n, E-50018 Zaragoza

    2012-08-07

We discuss the evolution of purity in mixed quantum/classical approaches to electronic nonadiabatic dynamics in the context of the Ehrenfest model. As it is impossible to exactly determine initial conditions for a realistic system, we choose to work in the statistical Ehrenfest formalism that we introduced in Alonso et al. [J. Phys. A: Math. Theor. 44, 396004 (2011)]. From it, we develop a new framework to determine exactly the change in the purity of the quantum subsystem along with the evolution of a statistical Ehrenfest system. In a simple case, we verify how and to which extent Ehrenfest statistical dynamics makes a system with more than one classical trajectory, and an initial quantum pure state become a quantum mixed one. We prove this numerically showing how the evolution of purity depends on time, on the dimension of the quantum state space D, and on the number of classical trajectories N of the initial distribution. The results in this work open new perspectives for studying decoherence with Ehrenfest dynamics.

  14. A statistical method to estimate low-energy hadronic cross sections

    NASA Astrophysics Data System (ADS)

    Balassa, Gábor; Kovács, Péter; Wolf, György

    2018-02-01

In this article we propose a model based on the Statistical Bootstrap approach to estimate the cross sections of different hadronic reactions up to a few GeV in c.m.s. energy. The method is based on the idea that when two particles collide, a so-called fireball is formed, which after a short time decays statistically into a specific final state. To calculate the probabilities we use a phase space description extended with quark combinatorial factors and the possibility of more than one fireball formation. In a few simple cases the probability of a specific final state can be calculated analytically, and we show that the model is able to reproduce the ratios of the considered cross sections. We also show that the model is able to describe proton-antiproton annihilation at rest. In the latter case we used a numerical method to calculate the more complicated final state probabilities. Additionally, we examined the formation of strange and charmed mesons as well, where we used existing data to fit the relevant model parameters.

  15. A statistical and experimental approach for assessing the preservation of plant lipids in soil

    NASA Astrophysics Data System (ADS)

    Mueller, K. E.; Eissenstat, D. M.; Oleksyn, J.; Freeman, K. H.

    2011-12-01

    Plant-derived lipids contribute to stable soil organic matter, but further interpretations of their abundance in soils are limited because the factors that control lipid preservation are poorly understood. Using data from a long-term field experiment and simple statistical models, we provide novel constraints on several predictors of the concentration of hydrolyzable lipids in forest mineral soils. Focal lipids included common monomers of cutin, suberin, and plant waxes present in tree leaves and roots. Soil lipid concentrations were most strongly influenced by the concentrations of lipids in leaves and roots of the overlying trees, but were also affected by the type of lipid (e.g. alcohols vs. acids), lipid chain length, and whether lipids originated in leaves or roots. Collectively, these factors explained ~80% of the variation in soil lipid concentrations beneath 11 different tree species. In order to use soil lipid analyses to test and improve conceptual models of soil organic matter stabilization, additional studies that provide experimental and quantitative (i.e. statistical) constraints on plant lipid preservation are needed.

  16. Eutrophication risk assessment in coastal embayments using simple statistical models.

    PubMed

    Arhonditsis, G; Eleftheriadou, M; Karydis, M; Tsirtsis, G

    2003-09-01

    A statistical methodology is proposed for assessing the risk of eutrophication in marine coastal embayments. The procedure followed was the development of regression models relating the levels of chlorophyll a (Chl) with the concentration of the limiting nutrient--usually nitrogen--and the renewal rate of the systems. The method was applied in the Gulf of Gera, Island of Lesvos, Aegean Sea and a surrogate for renewal rate was created using the Canberra metric as a measure of the resemblance between the Gulf and the oligotrophic waters of the open sea in terms of their physical, chemical and biological properties. The Chl-total dissolved nitrogen-renewal rate regression model was the most significant, accounting for 60% of the variation observed in Chl. Predicted distributions of Chl for various combinations of the independent variables, based on Bayesian analysis of the models, enabled comparison of the outcomes of specific scenarios of interest as well as further analysis of the system dynamics. The present statistical approach can be used as a methodological tool for testing the resilience of coastal ecosystems under alternative managerial schemes and levels of exogenous nutrient loading.

  17. Learning investment indicators through data extension

    NASA Astrophysics Data System (ADS)

    Dvořák, Marek

    2017-07-01

Stock prices in the form of time series were analysed using univariate and multivariate statistical methods. After simple data preprocessing in the form of logarithmic differences, we augmented this univariate time series to a multivariate representation. This method makes use of sliding windows to calculate several dozen new variables using simple statistical tools such as first and second moments, as well as more complicated statistics such as auto-regression coefficients and residual analysis, followed by an optional quadratic transformation used for further data extension. These were used as explanatory variables in a regularized logistic LASSO regression that estimated a Buy-Sell Index (BSI) from real stock market data.
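
    A minimal sketch of the workflow described above: derive log returns, build sliding-window summary features, and fit an L1-penalized logistic regression. The price series and the buy/sell label are simulated stand-ins for the real BSI target.

    ```python
    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(8)

    # Illustrative price series -> log returns (logarithmic differences)
    price = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 1000))))
    ret = np.log(price).diff().dropna()

    # Augment the univariate series with sliding-window summary statistics
    win = 20
    feats = pd.DataFrame({
        "mean": ret.rolling(win).mean(),
        "std": ret.rolling(win).std(),
        "skew": ret.rolling(win).skew(),
        "kurt": ret.rolling(win).kurt(),
        "lag1_autocorr": ret.rolling(win).apply(lambda w: pd.Series(w).autocorr(), raw=False),
    }).dropna()

    # Toy buy/sell label: next-period return positive (stand-in for the BSI target)
    y = (ret.shift(-1).loc[feats.index] > 0).astype(int)
    feats, y = feats.iloc[:-1], y.iloc[:-1]

    model = make_pipeline(StandardScaler(),
                          LogisticRegression(penalty="l1", solver="liblinear", C=0.5))
    model.fit(feats, y)
    print(model.named_steps["logisticregression"].coef_)   # LASSO-style sparse coefficients
    ```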

  18. Entropy for Mechanically Vibrating Systems

    NASA Astrophysics Data System (ADS)

    Tufano, Dante

The research contained within this thesis deals with the subject of entropy as defined for and applied to mechanically vibrating systems. This work begins with an overview of entropy as it is understood in the fields of classical thermodynamics, information theory, statistical mechanics, and statistical vibroacoustics. Khinchin's definition of entropy, which is the primary definition used for the work contained in this thesis, is introduced in the context of vibroacoustic systems. The main goal of this research is to establish a mathematical framework for the application of Khinchin's entropy in the field of statistical vibroacoustics by examining the entropy context of mechanically vibrating systems. The introduction of this thesis provides an overview of statistical energy analysis (SEA), a modeling approach to vibroacoustics that motivates this work on entropy. The objective of this thesis is given and followed by a discussion of the intellectual merit of this work as well as a literature review of relevant material. Following the introduction, an entropy analysis of systems of coupled oscillators is performed utilizing Khinchin's definition of entropy. This analysis develops upon the mathematical theory relating to mixing entropy, which is generated by the coupling of vibroacoustic systems. The mixing entropy is shown to provide insight into the qualitative behavior of such systems. Additionally, it is shown that the entropy inequality property of Khinchin's entropy can be reduced to an equality using the mixing entropy concept. This equality can be interpreted as a facet of the second law of thermodynamics for vibroacoustic systems. Following this analysis, an investigation of continuous systems is performed using Khinchin's entropy. It is shown that entropy analyses using Khinchin's entropy are valid for continuous systems that can be decomposed into a finite number of modes. The results are shown to be analogous to those obtained for simple oscillators, which demonstrates the applicability of entropy-based approaches to real-world systems. Three systems are considered to demonstrate these findings: 1) a rod end-coupled to a simple oscillator, 2) two end-coupled rods, and 3) two end-coupled beams. The aforementioned work utilizes the weak coupling assumption to determine the entropy of composite systems. Following this discussion, a direct method of finding entropy is developed which does not rely on this limiting assumption. The resulting entropy provides a useful benchmark for evaluating the accuracy of the weak coupling approach, and is validated using systems of coupled oscillators. The later chapters of this work discuss Khinchin's entropy as applied to nonlinear and nonconservative systems, respectively. The discussion of entropy for nonlinear systems is motivated by the desire to expand the applicability of SEA techniques beyond the linear regime. The discussion of nonconservative systems is also crucial, since real-world systems interact with their environment, and it is necessary to confirm the validity of an entropy approach for systems that are relevant in the context of SEA. Having developed a mathematical framework for determining entropy under a number of previously unexplored cases, the relationship between thermodynamics and statistical vibroacoustics can be better understood. Specifically, vibroacoustic temperatures can be obtained for systems that are not necessarily linear or weakly coupled. In this way, entropy provides insight into how the power flow proportionality of statistical energy analysis (SEA) can be applied to a broader class of vibroacoustic systems. As such, entropy is a useful tool for both justifying and expanding the foundational results of SEA.

  19. Robust Combining of Disparate Classifiers Through Order Statistics

    NASA Technical Reports Server (NTRS)

    Tumer, Kagan; Ghosh, Joydeep

    2001-01-01

Integrating the outputs of multiple classifiers via combiners or meta-learners has led to substantial improvements in several difficult pattern recognition problems. In this article we investigate a family of combiners based on order statistics, for robust handling of situations where there are large discrepancies in the performance of individual classifiers. Based on a mathematical modeling of how the decision boundaries are affected by order statistic combiners, we derive expressions for the reductions in error expected when simple output combination methods based on the median, the maximum, and, in general, the ith order statistic are used. Furthermore, we analyze the trim and spread combiners, both based on linear combinations of the ordered classifier outputs, and show that in the presence of uneven classifier performance, they often provide substantial gains over both linear and simple order statistics combiners. Experimental results on both real world data and standard public domain data sets corroborate these findings.
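
    The simple output-combination rules analyzed above (median, maximum, trimmed combinations of the ordered outputs) are easy to state in code. The sketch below applies them to invented classifier posteriors; it illustrates the combiners only, not the error-reduction analysis.

    ```python
    import numpy as np

    rng = np.random.default_rng(9)

    # Illustrative posterior outputs from 5 classifiers for 4 samples and 3 classes
    # (in practice these would come from trained models)
    outputs = rng.dirichlet(np.ones(3), size=(5, 4))    # shape (n_classifiers, n_samples, n_classes)

    def combine(outputs, rule="median", trim=1):
        """Combine classifier outputs with simple order-statistic rules."""
        if rule == "median":
            combined = np.median(outputs, axis=0)
        elif rule == "max":
            combined = np.max(outputs, axis=0)
        elif rule == "trim":                            # trimmed mean of the ordered outputs
            ordered = np.sort(outputs, axis=0)
            combined = ordered[trim:-trim].mean(axis=0)
        else:
            raise ValueError(rule)
        return combined.argmax(axis=1)                  # predicted class per sample

    for rule in ("median", "max", "trim"):
        print(rule, combine(outputs, rule))
    ```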

  20. Counting statistics for genetic switches based on effective interaction approximation

    NASA Astrophysics Data System (ADS)

    Ohkubo, Jun

    2012-09-01

The applicability of counting statistics to a system with an infinite number of states is investigated. Counting statistics has been studied extensively for systems with a finite number of states. While the scheme can in principle be used to count specific transitions in a system with an infinite number of states, the resulting equations are in general not closed. A simple genetic switch can be described by a master equation with an infinite number of states, and we use counting statistics to count the number of transitions from the inactive to the active state of the gene. To avoid the non-closed equations, an effective interaction approximation is employed. As a result, it is shown that the switching problem can be treated approximately as a simple two-state model, which immediately indicates that the switching obeys non-Poisson statistics.
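
    The counted quantity, the number of inactive-to-active transitions in a time window, can be illustrated with a bare-bones two-state Gillespie simulation (rates invented). Comparing the variance-to-mean ratio of the counts to 1 gives a quick check for non-Poisson switching, in the spirit of the paper's result.

    ```python
    import numpy as np

    rng = np.random.default_rng(10)

    def count_activations(k_on, k_off, t_max):
        """Simulate a two-state switch and count inactive -> active transitions up to t_max."""
        t, state, n_on = 0.0, 0, 0            # state 0 = inactive, 1 = active
        while True:
            rate = k_on if state == 0 else k_off
            t += rng.exponential(1.0 / rate)   # waiting time to the next transition
            if t > t_max:
                return n_on
            if state == 0:
                n_on += 1
            state = 1 - state

    counts = np.array([count_activations(k_on=0.5, k_off=2.0, t_max=100.0) for _ in range(2000)])
    # For a Poisson process the Fano factor is 1; deviations signal non-Poisson switching
    print(counts.mean(), counts.var() / counts.mean())
    ```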

  1. Asymptotic Linear Spectral Statistics for Spiked Hermitian Random Matrices

    NASA Astrophysics Data System (ADS)

    Passemier, Damien; McKay, Matthew R.; Chen, Yang

    2015-07-01

Using the Coulomb Fluid method, this paper derives central limit theorems (CLTs) for linear spectral statistics of three "spiked" Hermitian random matrix ensembles. These include Johnstone's spiked model (i.e., central Wishart with spiked correlation), non-central Wishart with rank-one non-centrality, and a related class of non-central matrices. For a generic linear statistic, we derive simple and explicit CLT expressions as the matrix dimensions grow large. For all three ensembles under consideration, we find that the primary effect of the spike is to introduce a correction term to the asymptotic mean of the linear spectral statistic, which we characterize with simple formulas. The utility of our proposed framework is demonstrated through application to three different linear statistics problems: the classical likelihood ratio test for a population covariance, the capacity analysis of multi-antenna wireless communication systems with a line-of-sight transmission path, and a classical multiple sample significance testing problem.

  2. Distinguishing Positive Selection From Neutral Evolution: Boosting the Performance of Summary Statistics

    PubMed Central

    Lin, Kao; Li, Haipeng; Schlötterer, Christian; Futschik, Andreas

    2011-01-01

    Summary statistics are widely used in population genetics, but they suffer from the drawback that no simple sufficient summary statistic exists which captures all information required to distinguish different evolutionary hypotheses. Here, we apply boosting, a recent statistical method that combines simple classification rules to maximize their joint predictive performance. We show that our implementation of boosting has a high power to detect selective sweeps. Demographic events, such as bottlenecks, do not result in a large excess of false positives. A comparison with other neutrality tests shows that our boosting implementation performs favorably. Furthermore, we evaluated the relative contribution of different summary statistics to the identification of selection and found that for recent sweeps integrated haplotype homozygosity is very informative, whereas older sweeps are better detected by Tajima's π. Overall, Watterson's θ was found to contribute the most information for distinguishing between bottlenecks and selection. PMID:21041556

  3. Evaluating the performance of a fault detection and diagnostic system for vapor compression equipment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Breuker, M.S.; Braun, J.E.

    This paper presents a detailed evaluation of the performance of a statistical, rule-based fault detection and diagnostic (FDD) technique presented by Rossi and Braun (1997). Steady-state and transient tests were performed on a simple rooftop air conditioner over a range of conditions and fault levels. The steady-state data without faults were used to train models that predict outputs for normal operation. The transient data with faults were used to evaluate FDD performance. The effect of a number of design variables on FDD sensitivity for different faults was evaluated and two prototype systems were specified for more complete evaluation. Good performance was achieved in detecting and diagnosing five faults using only six temperatures (two input and four output) and linear models. The performance improved by about a factor of two when ten measurements (three input and seven output) and higher order models were used. This approach for evaluating and optimizing the performance of the statistical, rule-based FDD technique could be used as a design and evaluation tool when applying this FDD method to other packaged air-conditioning systems. Furthermore, the approach could also be modified to evaluate the performance of other FDD methods.

  4. Ensemble stacking mitigates biases in inference of synaptic connectivity.

    PubMed

    Chambers, Brendan; Levy, Maayan; Dechery, Joseph B; MacLean, Jason N

    2018-01-01

    A promising alternative to directly measuring the anatomical connections in a neuronal population is inferring the connections from the activity. We employ simulated spiking neuronal networks to compare and contrast commonly used inference methods that identify likely excitatory synaptic connections using statistical regularities in spike timing. We find that simple adjustments to standard algorithms improve inference accuracy: A signing procedure improves the power of unsigned mutual-information-based approaches and a correction that accounts for differences in mean and variance of background timing relationships, such as those expected to be induced by heterogeneous firing rates, increases the sensitivity of frequency-based methods. We also find that different inference methods reveal distinct subsets of the synaptic network and each method exhibits different biases in the accurate detection of reciprocity and local clustering. To correct for errors and biases specific to single inference algorithms, we combine methods into an ensemble. Ensemble predictions, generated as a linear combination of multiple inference algorithms, are more sensitive than the best individual measures alone, and are more faithful to ground-truth statistics of connectivity, mitigating biases specific to single inference methods. These weightings generalize across simulated datasets, emphasizing the potential for the broad utility of ensemble-based approaches.
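
    A minimal sketch of the stacking idea, under heavy assumptions: three hypothetical inference scores for candidate connections are combined linearly (here with scikit-learn's LogisticRegression standing in for the weighting step), and the ensemble is compared with the best single score. None of the names, noise levels, or data come from the study.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)

    # Hypothetical setup: 5000 candidate connections, 10% truly present, and three
    # imperfect inference scores that each see the truth through different noise.
    truth = rng.binomial(1, 0.1, size=5000)
    scores = np.column_stack([truth + rng.normal(0, s, truth.shape)
                              for s in (0.8, 1.0, 1.2)])

    # Stacking: learn a linear combination of the individual scores on half the data.
    train = rng.random(truth.shape) < 0.5
    stack = LogisticRegression().fit(scores[train], truth[train])
    ensemble_score = stack.decision_function(scores[~train])

    print("best single method AUC:",
          round(max(roc_auc_score(truth[~train], scores[~train, j]) for j in range(3)), 3))
    print("ensemble AUC:          ", round(roc_auc_score(truth[~train], ensemble_score), 3))
    ```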

  5. Statistical inference from multiple iTRAQ experiments without using common reference standards.

    PubMed

    Herbrich, Shelley M; Cole, Robert N; West, Keith P; Schulze, Kerry; Yager, James D; Groopman, John D; Christian, Parul; Wu, Lee; O'Meally, Robert N; May, Damon H; McIntosh, Martin W; Ruczinski, Ingo

    2013-02-01

    Isobaric tags for relative and absolute quantitation (iTRAQ) is a prominent mass spectrometry technology for protein identification and quantification that is capable of analyzing multiple samples in a single experiment. Frequently, iTRAQ experiments are carried out using an aliquot from a pool of all samples, or "masterpool", in one of the channels as a reference sample standard to estimate protein relative abundances in the biological samples and to combine abundance estimates from multiple experiments. In this manuscript, we show that using a masterpool is counterproductive. We obtain more precise estimates of protein relative abundance by using the available biological data instead of the masterpool and do not need to occupy a channel that could otherwise be used for another biological sample. In addition, we introduce a simple statistical method to associate proteomic data from multiple iTRAQ experiments with a numeric response and show that this approach is more powerful than the conventionally employed masterpool-based approach. We illustrate our methods using data from four replicate iTRAQ experiments on aliquots of the same pool of plasma samples and from a 406-sample project designed to identify plasma proteins that covary with nutrient concentrations in chronically undernourished children from South Asia.

  6. Bayesian modelling of lung function data from multiple-breath washout tests.

    PubMed

    Mahar, Robert K; Carlin, John B; Ranganathan, Sarath; Ponsonby, Anne-Louise; Vuillermin, Peter; Vukcevic, Damjan

    2018-05-30

    Paediatric respiratory researchers have widely adopted the multiple-breath washout (MBW) test because it allows assessment of lung function in unsedated infants and is well suited to longitudinal studies of lung development and disease. However, a substantial proportion of MBW tests in infants fail current acceptability criteria. We hypothesised that a model-based approach to analysing the data, in place of traditional simple empirical summaries, would enable more efficient use of these tests. We therefore developed a novel statistical model for infant MBW data and applied it to 1197 tests from 432 individuals from a large birth cohort study. We focus on Bayesian estimation of the lung clearance index, the most commonly used summary of lung function from MBW tests. Our results show that the model provides an excellent fit to the data and shed further light on statistical properties of the standard empirical approach. Furthermore, the modelling approach enables the lung clearance index to be estimated by using tests with different degrees of completeness, something not possible with the standard approach. Our model therefore allows previously unused data to be used rather than discarded, as well as routine use of shorter tests without significant loss of precision. Beyond our specific application, our work illustrates a number of important aspects of Bayesian modelling in practice, such as the importance of hierarchical specifications to account for repeated measurements and the value of model checking via posterior predictive distributions. Copyright © 2018 John Wiley & Sons, Ltd.

  7. Agile text mining for the 2014 i2b2/UTHealth Cardiac risk factors challenge.

    PubMed

    Cormack, James; Nath, Chinmoy; Milward, David; Raja, Kalpana; Jonnalagadda, Siddhartha R

    2015-12-01

    This paper describes the use of an agile text mining platform (Linguamatics' Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors in patient records as defined in the i2b2/UTHealth 2014 challenge. The approach uses a data-driven rule-based methodology with the addition of a simple supervised classifier. We demonstrate that agile text mining allows for rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluation of this approach on the test data gave an F-score of 91.7%, one percentage point behind the top-performing system. Copyright © 2015 Elsevier Inc. All rights reserved.

  8. Complexity-entropy causality plane: A useful approach for distinguishing songs

    NASA Astrophysics Data System (ADS)

    Ribeiro, Haroldo V.; Zunino, Luciano; Mendes, Renio S.; Lenzi, Ervin K.

    2012-04-01

    Nowadays we are often faced with huge databases resulting from the rapid growth of data storage technologies. This is particularly true when dealing with music databases. In this context, it is essential to have techniques and tools able to discriminate properties from these massive sets. In this work, we report on a statistical analysis of more than ten thousand songs aiming to obtain a complexity hierarchy. Our approach is based on the estimation of the permutation entropy combined with an intensive complexity measure, building up the complexity-entropy causality plane. The results obtained indicate that this representation space is very promising for discriminating songs as well as for allowing a relative quantitative comparison among songs. Additionally, we believe that the method reported here may be applied in practical situations since it is simple, robust and has a fast numerical implementation.
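
    The permutation-entropy half of the representation can be sketched in a few lines; the complexity coordinate of the causality plane is omitted here. The embedding dimension d=4 and the test signals are arbitrary choices, not the settings used in the paper.

    ```python
    import numpy as np
    from math import factorial

    def permutation_entropy(x, d=4, tau=1, normalize=True):
        """Shannon entropy of the ordinal patterns of embedding dimension d."""
        x = np.asarray(x, dtype=float)
        n = len(x) - (d - 1) * tau
        patterns = {}
        for i in range(n):
            pattern = tuple(np.argsort(x[i:i + d * tau:tau]))
            patterns[pattern] = patterns.get(pattern, 0) + 1
        p = np.array(list(patterns.values()), dtype=float) / n
        h = -np.sum(p * np.log(p))
        return h / np.log(factorial(d)) if normalize else h

    rng = np.random.default_rng(1)
    print(permutation_entropy(rng.normal(size=5000)))          # white noise: close to 1
    print(permutation_entropy(np.sin(np.arange(5000) * 0.1)))  # regular signal: much lower
    ```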

  9. An approach to quality and performance control in a computer-assisted clinical chemistry laboratory.

    PubMed Central

    Undrill, P E; Frazer, S C

    1979-01-01

    A locally developed, computer-based clinical chemistry laboratory system has been in operation since 1970. This utilises a Digital Equipment Co Ltd PDP 12 and an interconnected PDP 8/F computer. Details are presented of the performance and quality control techniques incorporated into the system. Laboratory performance is assessed through analysis of results from fixed-level control sera as well as from cumulative sum methods. At a simple level the presentation may be considered purely indicative, while at a more sophisticated level statistical concepts have been introduced to aid the laboratory controller in decision-making processes. PMID:438340

  10. 3D tracking of laparoscopic instruments using statistical and geometric modeling.

    PubMed

    Wolf, Rémi; Duchateau, Josselin; Cinquin, Philippe; Voros, Sandrine

    2011-01-01

    During a laparoscopic surgery, the endoscope can be manipulated by an assistant or a robot. Several teams have worked on the tracking of surgical instruments, based on methods ranging from the development of specific devices to image processing methods. We propose to exploit the instruments' insertion points, which are fixed on the patient's abdominal cavity, as a geometric constraint for the localization of the instruments. A simple geometric model of a laparoscopic instrument is described, as well as a parametrization that exploits a spherical geometric grid, which offers attracting homogeneity and isotropy properties. The general architecture of our proposed approach is based on the probabilistic Condensation algorithm.

  11. Information Visualization Techniques for Effective Cross-Discipline Communication

    NASA Astrophysics Data System (ADS)

    Fisher, Ward

    2013-04-01

    Collaboration between research groups in different fields is a common occurrence, but it can often be frustrating due to the absence of a common vocabulary. This lack of a shared context can make expressing important concepts and discussing results difficult. This problem may be further exacerbated when communicating to an audience of laypeople. Without a clear frame of reference, simple concepts are often rendered difficult-to-understand at best, and unintelligible at worst. An easy way to alleviate this confusion is with the use of clear, well-designed visualizations to illustrate an idea, process or conclusion. There exist a number of well-described machine-learning and statistical techniques which can be used to illuminate the information present within complex high-dimensional datasets. Once the information has been separated from the data, clear communication becomes a matter of selecting an appropriate visualization. Ideally, the visualization is information-rich but data-scarce. Anything from a simple bar chart, to a line chart with confidence intervals, to an animated set of 3D point-clouds can be used to render a complex idea as an easily understood image. Several case studies will be presented in this work. In the first study, we will examine how a complex statistical analysis was applied to a high-dimensional dataset, and how the results were succinctly communicated to an audience of microbiologists and chemical engineers. Next, we will examine a technique used to illustrate the concept of the singular value decomposition, as used in the field of computer vision, to a lay audience of undergraduate students from mixed majors. We will then examine a case where a simple animated line plot was used to communicate an approach to signal decomposition, and will finish with a discussion of the tools available to create these visualizations.

  12. What to Do When K-Means Clustering Fails: A Simple yet Principled Alternative Algorithm.

    PubMed

    Raykov, Yordan P; Boukouvalas, Alexis; Baig, Fahd; Little, Max A

    The K-means algorithm is one of the most popular clustering algorithms in current use, as it is relatively fast yet simple to understand and deploy in practice. Nevertheless, its use entails certain restrictive assumptions about the data, the negative consequences of which are not always immediately apparent, as we demonstrate. While more flexible algorithms have been developed, their widespread use has been hindered by their computational and technical complexity. Motivated by these considerations, we present a flexible alternative to K-means that relaxes most of the assumptions, whilst remaining almost as fast and simple. This novel algorithm, which we call MAP-DP (maximum a-posteriori Dirichlet process mixtures), is statistically rigorous as it is based on nonparametric Bayesian Dirichlet process mixture modeling. This approach allows us to overcome most of the limitations imposed by K-means. The number of clusters K is estimated from the data instead of being fixed a priori as in K-means. In addition, while K-means is restricted to continuous data, the MAP-DP framework can be applied to many kinds of data, for example, binary, count or ordinal data. Also, it can efficiently separate outliers from the data. This additional flexibility does not incur a significant computational overhead compared to K-means, with MAP-DP convergence typically achieved in the order of seconds for many practical problems. Finally, in contrast to K-means, since the algorithm is based on an underlying statistical model, the MAP-DP framework can deal with missing data and enables model testing such as cross-validation in a principled way. We demonstrate the simplicity and effectiveness of this algorithm on the health informatics problem of clinical sub-typing in a cluster of diseases known as parkinsonism.
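
    MAP-DP itself is described in the paper; as a stand-in illustration of a Dirichlet-process mixture that infers the number of clusters from the data, the sketch below uses scikit-learn's variational BayesianGaussianMixture with a Dirichlet-process prior on hypothetical data. It is not the authors' algorithm, only a related off-the-shelf approximation.

    ```python
    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    rng = np.random.default_rng(0)
    # Hypothetical data: three clusters of unequal size and spread.
    X = np.vstack([rng.normal(0, 1, (200, 2)),
                   rng.normal(6, 0.5, (80, 2)),
                   rng.normal((0, 8), 2.0, (120, 2))])

    # Truncated Dirichlet-process mixture: K is inferred, not fixed a priori.
    dpgmm = BayesianGaussianMixture(
        n_components=10,                                  # upper bound, not the answer
        weight_concentration_prior_type="dirichlet_process",
        covariance_type="full",
        random_state=0,
    ).fit(X)

    labels = dpgmm.predict(X)
    print("effective number of clusters:", len(np.unique(labels)))
    ```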

  13. What to Do When K-Means Clustering Fails: A Simple yet Principled Alternative Algorithm

    PubMed Central

    Baig, Fahd; Little, Max A.

    2016-01-01

    The K-means algorithm is one of the most popular clustering algorithms in current use, as it is relatively fast yet simple to understand and deploy in practice. Nevertheless, its use entails certain restrictive assumptions about the data, the negative consequences of which are not always immediately apparent, as we demonstrate. While more flexible algorithms have been developed, their widespread use has been hindered by their computational and technical complexity. Motivated by these considerations, we present a flexible alternative to K-means that relaxes most of the assumptions, whilst remaining almost as fast and simple. This novel algorithm, which we call MAP-DP (maximum a-posteriori Dirichlet process mixtures), is statistically rigorous as it is based on nonparametric Bayesian Dirichlet process mixture modeling. This approach allows us to overcome most of the limitations imposed by K-means. The number of clusters K is estimated from the data instead of being fixed a priori as in K-means. In addition, while K-means is restricted to continuous data, the MAP-DP framework can be applied to many kinds of data, for example, binary, count or ordinal data. Also, it can efficiently separate outliers from the data. This additional flexibility does not incur a significant computational overhead compared to K-means, with MAP-DP convergence typically achieved in the order of seconds for many practical problems. Finally, in contrast to K-means, since the algorithm is based on an underlying statistical model, the MAP-DP framework can deal with missing data and enables model testing such as cross-validation in a principled way. We demonstrate the simplicity and effectiveness of this algorithm on the health informatics problem of clinical sub-typing in a cluster of diseases known as parkinsonism. PMID:27669525

  14. Using a Five-Step Procedure for Inferential Statistical Analyses

    ERIC Educational Resources Information Center

    Kamin, Lawrence F.

    2010-01-01

    Many statistics texts pose inferential statistical problems in a disjointed way. By using a simple five-step procedure as a template for statistical inference problems, the student can solve problems in an organized fashion. The problem and its solution will thus be a stand-by-itself organic whole and a single unit of thought and effort. The…

  15. Switching theory-based steganographic system for JPEG images

    NASA Astrophysics Data System (ADS)

    Cherukuri, Ravindranath C.; Agaian, Sos S.

    2007-04-01

    Cellular communications constitute a significant portion of the global telecommunications market, so the need for secure communication over mobile platforms has increased exponentially. Steganography, the art of hiding critical data inside an innocuous signal, answers this need. JPEG is one of the most commonly used formats for storing and transmitting images on the web, and pictures captured with mobile cameras are mostly stored in JPEG format. In this article, we introduce a switching-theory-based steganographic system for JPEG images that is applicable to both mobile and computer platforms. The proposed algorithm uses the fact that the energy distribution among the quantized AC coefficients varies from block to block and from coefficient to coefficient. Existing approaches are effective for a subset of these coefficients but become ineffective when applied to all of them. We therefore propose an approach that treats each set of AC coefficients within a different framework, enhancing overall performance. The proposed system offers high capacity and embedding efficiency simultaneously while withstanding simple statistical attacks. In addition, the embedded information can be retrieved without prior knowledge of the cover image. Based on simulation results, the proposed method demonstrates an improved embedding capacity over existing algorithms while maintaining a high embedding efficiency and preserving the statistics of the JPEG image after hiding information.

  16. Calculation of the detection limit in radiation measurements with systematic uncertainties

    NASA Astrophysics Data System (ADS)

    Kirkpatrick, J. M.; Russ, W.; Venkataraman, R.; Young, B. M.

    2015-06-01

    The detection limit (LD) or Minimum Detectable Activity (MDA) is an a priori evaluation of assay sensitivity intended to quantify the suitability of an instrument or measurement arrangement for the needs of a given application. Traditional approaches as pioneered by Currie rely on Gaussian approximations to yield simple, closed-form solutions, and neglect the effects of systematic uncertainties in the instrument calibration. These approximations are applicable over a wide range of applications, but are of limited use in low-count applications, when high confidence values are required, or when systematic uncertainties are significant. One proposed modification to the Currie formulation attempts to account for systematic uncertainties within a Gaussian framework. We have previously shown that this approach results in an approximation formula that works best only for small values of the relative systematic uncertainty, for which the modification of Currie's method is the least necessary, and that it significantly overestimates the detection limit or gives infinite or otherwise non-physical results for larger systematic uncertainties where such a correction would be the most useful. We have developed an alternative approach for calculating detection limits based on realistic statistical modeling of the counting distributions which accurately represents statistical and systematic uncertainties. Instead of a closed-form solution, numerical and iterative methods are used to evaluate the result. Accurate detection limits can be obtained by this method for the general case.

  17. Simple taper: Taper equations for the field forester

    Treesearch

    David R. Larsen

    2017-01-01

    "Simple taper" is set of linear equations that are based on stem taper rates; the intent is to provide taper equation functionality to field foresters. The equation parameters are two taper rates based on differences in diameter outside bark at two points on a tree. The simple taper equations are statistically equivalent to more complex equations. The linear...

  18. Using Simple Linear Regression to Assess the Success of the Montreal Protocol in Reducing Atmospheric Chlorofluorocarbons

    ERIC Educational Resources Information Center

    Nelson, Dean

    2009-01-01

    Following the Guidelines for Assessment and Instruction in Statistics Education (GAISE) recommendation to use real data, an example is presented in which simple linear regression is used to evaluate the effect of the Montreal Protocol on atmospheric concentration of chlorofluorocarbons. This simple set of data, obtained from a public archive, can…

  19. Clairvoyant fusion: a new methodology for designing robust detection algorithms

    NASA Astrophysics Data System (ADS)

    Schaum, Alan

    2016-10-01

    Many realistic detection problems cannot be solved with simple statistical tests for known alternative probability models. Uncontrollable environmental conditions, imperfect sensors, and other uncertainties transform simple detection problems with likelihood ratio solutions into composite hypothesis (CH) testing problems. Recently many multi- and hyperspectral sensing CH problems have been addressed with a new approach. Clairvoyant fusion (CF) integrates the optimal detectors ("clairvoyants") associated with every unspecified value of the parameters appearing in a detection model. For problems with discrete parameter values, logical rules emerge for combining the decisions of the associated clairvoyants. For many problems with continuous parameters, analytic methods of CF have been found that produce closed-form solutions, or approximations for intractable problems. Here the principles of CF are reviewed and mathematical insights are described that have proven useful in the derivation of solutions. It is also shown how a second-stage fusion procedure can be used to create theoretically superior detection algorithms for all discrete parameter problems.

  20. Audio Classification in Speech and Music: A Comparison between a Statistical and a Neural Approach

    NASA Astrophysics Data System (ADS)

    Bugatti, Alessandro; Flammini, Alessandra; Migliorati, Pierangelo

    2002-12-01

    We focus the attention on the problem of audio classification in speech and music for multimedia applications. In particular, we present a comparison between two different techniques for speech/music discrimination. The first method is based on the zero-crossing rate and Bayesian classification. It is very simple from a computational point of view, and gives good results in the case of pure music or speech. The simulation results show that some performance degradation arises when the music segment also contains some speech superimposed on the music, or strong rhythmic components. To overcome these problems, we propose a second method that uses more features and is based on neural networks (specifically a multi-layer perceptron). In this case we obtain better performance, at the expense of a limited growth in the computational complexity. In practice, the proposed neural network is simple to implement if a suitable polynomial is used as the activation function, and a real-time implementation is possible even if low-cost embedded systems are used.
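
    A minimal sketch of the zero-crossing-rate feature used by the first (statistical) method; the frame length, hop size and test signals are arbitrary assumptions rather than the paper's settings.

    ```python
    import numpy as np

    def zero_crossing_rate(signal, frame_len=1024, hop=512):
        """Fraction of sign changes per frame; tends to be higher for noise/speech-like frames."""
        signal = np.asarray(signal, dtype=float)
        rates = []
        for start in range(0, len(signal) - frame_len + 1, hop):
            frame = signal[start:start + frame_len]
            crossings = np.sum(np.abs(np.diff(np.sign(frame))) > 0)
            rates.append(crossings / (frame_len - 1))
        return np.array(rates)

    rng = np.random.default_rng(0)
    tone = np.sin(2 * np.pi * 220 * np.arange(44100) / 44100)   # music-like: low ZCR
    noise = rng.normal(size=44100)                              # noise-like: high ZCR
    print(zero_crossing_rate(tone).mean(), zero_crossing_rate(noise).mean())
    ```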

  1. Simple estimation procedures for regression analysis of interval-censored failure time data under the proportional hazards model.

    PubMed

    Sun, Jianguo; Feng, Yanqin; Zhao, Hui

    2015-01-01

    Interval-censored failure time data occur in many fields including epidemiological and medical studies as well as financial and sociological studies, and many authors have investigated their analysis (Sun, The statistical analysis of interval-censored failure time data, 2006; Zhang, Stat Modeling 9:321-343, 2009). In particular, a number of procedures have been developed for regression analysis of interval-censored data arising from the proportional hazards model (Finkelstein, Biometrics 42:845-854, 1986; Huang, Ann Stat 24:540-568, 1996; Pan, Biometrics 56:199-203, 2000). For most of these procedures, however, one drawback is that they involve estimation of both regression parameters and baseline cumulative hazard function. In this paper, we propose two simple estimation approaches that do not need estimation of the baseline cumulative hazard function. The asymptotic properties of the resulting estimates are given, and an extensive simulation study is conducted and indicates that they work well for practical situations.

  2. Advanced statistics: linear regression, part II: multiple linear regression.

    PubMed

    Marill, Keith A

    2004-01-01

    The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.

  3. Asymptotically Optimal and Private Statistical Estimation

    NASA Astrophysics Data System (ADS)

    Smith, Adam

    Differential privacy is a definition of "privacy" for statistical databases. The definition is simple, yet it implies strong semantics even in the presence of an adversary with arbitrary auxiliary information about the database.

  4. Confidence of compliance: a Bayesian approach for percentile standards.

    PubMed

    McBride, G B; Ellis, J C

    2001-04-01

    Rules for assessing compliance with percentile standards commonly limit the number of exceedances permitted in a batch of samples taken over a defined assessment period. Such rules are commonly developed using classical statistical methods. Results from alternative Bayesian methods are presented (using beta-distributed prior information and a binomial likelihood), yielding "confidence of compliance" graphs. These allow simple reading of the consumer's and supplier's risks for any proposed rule. The influence of the prior assumptions required by the Bayesian technique on the confidence results is demonstrated, using two reference priors (uniform and Jeffreys') and also using optimistic and pessimistic user-defined priors. All four give less pessimistic results than does the classical technique, because interpreting classical results as "confidence of compliance" actually invokes a Bayesian approach with an extreme prior distribution. Jeffreys' prior is shown to be the most generally appropriate choice of prior distribution. Cost savings can be expected using rules based on this approach.
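
    The Beta-binomial calculation behind a "confidence of compliance" curve can be sketched directly; the standard (at most 5% exceedances) and the sample size below are hypothetical, and a = b = 0.5 corresponds to the Jeffreys prior favoured in the abstract.

    ```python
    from scipy.stats import beta

    def confidence_of_compliance(n, x, p0=0.05, a=0.5, b=0.5):
        """P(true exceedance rate <= p0 | x exceedances in n samples).

        A Beta(a, b) prior with a binomial likelihood gives a Beta(a + x, b + n - x)
        posterior; a = b = 0.5 is the Jeffreys prior.
        """
        return beta.cdf(p0, a + x, b + n - x)

    # Hypothetical rule: 20 samples assessed against a 95th-percentile standard.
    for exceedances in range(4):
        print(exceedances, round(confidence_of_compliance(20, exceedances), 3))
    ```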

  5. The Use of Correctional “NO!” Approach to Reduce Destructive Behavior on Autism Student of CANDA Educational Institution in Surakarta

    NASA Astrophysics Data System (ADS)

    Anggraini, N.

    2017-02-01

    This research aims to reduce destructive behavior, such as throwing learning materials, in an autism student at the CANDA educational institution in Surakarta by using the correctional “NO!” approach. The research uses the Single Subject Research (SSR) method with an A-B design, namely baseline and intervention. The subject of this research is one autism student of the CANDA educational institution, named G.A.P. Data were collected through direct observation, recording events during the baseline and intervention phases. Data were analyzed by simple descriptive statistical analysis and are displayed in graphical form. Based on the results of the data analysis, it could be concluded that destructive behavior such as throwing learning materials was significantly reduced after the intervention was given. Based on these results, the correctional “NO!” approach can be used by teachers or therapists to reduce destructive behavior in autism students.

  6. A Conceptual Framework for Pharmacodynamic Genome-wide Association Studies in Pharmacogenomics

    PubMed Central

    Wu, Rongling; Tong, Chunfa; Wang, Zhong; Mauger, David; Tantisira, Kelan; Szefler, Stanley J.; Chinchilli, Vernon M.; Israel, Elliot

    2013-01-01

    Genome-wide association studies (GWAS) have emerged as a powerful tool to identify loci that affect drug response or susceptibility to adverse drug reactions. However, current GWAS based on a simple analysis of associations between genotype and phenotype ignores the biochemical reactions of drug response, thus limiting the scope of inference about its genetic architecture. To facilitate the inference of GWAS in pharmacogenomics, we sought to undertake the mathematical integration of the pharmacodynamic process of drug reactions through computational models. By estimating and testing the genetic control of pharmacodynamic and pharmacokinetic parameters, this mechanistic approach not only enhances the biological and clinical relevance of significant genetic associations, but also improves the statistical power and robustness of gene detection. This report discusses the general principle and development of pharmacodynamics-based GWAS, highlights the practical use of this approach in addressing various pharmacogenomic problems, and suggests that this approach will be an important method to study the genetic architecture of drug responses or reactions. PMID:21920452

  7. Using Peptide-Level Proteomics Data for Detecting Differentially Expressed Proteins.

    PubMed

    Suomi, Tomi; Corthals, Garry L; Nevalainen, Olli S; Elo, Laura L

    2015-11-06

    The expression of proteins can be quantified in high-throughput means using different types of mass spectrometers. In recent years, there have emerged label-free methods for determining protein abundance. Although the expression is initially measured at the peptide level, a common approach is to combine the peptide-level measurements into protein-level values before differential expression analysis. However, this simple combination is prone to inconsistencies between peptides and may lose valuable information. To this end, we introduce here a method for detecting differentially expressed proteins by combining peptide-level expression-change statistics. Using controlled spike-in experiments, we show that the approach of averaging peptide-level expression changes yields more accurate lists of differentially expressed proteins than does the conventional protein-level approach. This is particularly true when there are only few replicate samples or the differences between the sample groups are small. The proposed technique is implemented in the Bioconductor package PECA, and it can be downloaded from http://www.bioconductor.org.

  8. Statistical Tutorial | Center for Cancer Research

    Cancer.gov

    Recent advances in cancer biology have resulted in the need for increased statistical analysis of research data. This Statistical Tutorial is designed as a follow-up to Statistical Analysis of Research Data (SARD), held in April 2018. The tutorial will apply the general principles of statistical analysis of research data, including descriptive statistics, z- and t-tests of means and mean differences, simple and multiple linear regression, ANOVA tests, and the Chi-squared distribution.

  9. Using Statistical Process Control to Make Data-Based Clinical Decisions.

    ERIC Educational Resources Information Center

    Pfadt, Al; Wheeler, Donald J.

    1995-01-01

    Statistical process control (SPC), which employs simple statistical tools and problem-solving techniques such as histograms, control charts, flow charts, and Pareto charts to implement continual product improvement procedures, can be incorporated into human service organizations. Examples illustrate use of SPC procedures to analyze behavioral data…

  10. A new approach for the assessment of temporal clustering of extratropical wind storms

    NASA Astrophysics Data System (ADS)

    Schuster, Mareike; Eddounia, Fadoua; Kuhnel, Ivan; Ulbrich, Uwe

    2017-04-01

    A widely used methodology to assess the clustering of storms in a region is based on the dispersion statistic of a simple homogeneous Poisson process. This clustering measure is the ratio of the variance to the mean of the local storm counts per grid point. Values larger than 1, i.e. when the variance exceeds the mean, indicate clustering, while values lower than 1 indicate a sequencing of storms that is more regular than a random process. A disadvantage of this methodology, however, is that the characteristics are valid only for a pre-defined climatological time period, so temporal variability of clustering cannot be identified. The absolute value of the dispersion statistic is also not particularly intuitive. We have developed an approach to describe temporal clustering of storms which offers a more intuitive interpretation and at the same time allows temporal variations to be assessed. The approach is based on the local distribution of waiting times between two individual storm events; these waiting times are computed by post-processing individual windstorm tracks obtained from an objective tracking algorithm. Based on this distribution, a threshold can be set, either from the waiting time expected under a random process or from a quantile of the observed distribution, to determine whether two consecutive windstorm events count as part of a (temporal) cluster. We analyze extratropical windstorms in a reanalysis dataset and compare the results of the traditional clustering measure with our new methodology. We assess what range of clustering events (in terms of duration and frequency) is covered and identify whether the historically known clustered seasons are detectable with the new clustering measure in the reanalysis.
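
    For reference, the traditional measure that the new waiting-time approach is compared against is simply a variance-to-mean ratio of per-season storm counts. The sketch below illustrates it on synthetic counts (a plain Poisson sample and a gamma-mixed, i.e. overdispersed, sample); it is not the reanalysis data.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    def dispersion(counts):
        """Variance-to-mean ratio of per-season event counts (1 for a homogeneous Poisson process)."""
        counts = np.asarray(counts, dtype=float)
        return counts.var(ddof=1) / counts.mean()

    poisson_counts = rng.poisson(3.0, size=60)                         # random occurrence
    clustered = rng.poisson(rng.gamma(shape=1.5, scale=2.0, size=60))  # mixed rate -> clustering
    print(dispersion(poisson_counts))   # near 1
    print(dispersion(clustered))        # clearly above 1
    ```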

  11. Avoiding the ensemble decorrelation problem using member-by-member post-processing

    NASA Astrophysics Data System (ADS)

    Van Schaeybroeck, Bert; Vannitsem, Stéphane

    2014-05-01

    Forecast calibration or post-processing has become a standard tool in atmospheric and climatological science due to the presence of systematic initial-condition and model errors. For ensemble forecasts, the most competitive methods derive from the assumption of a fixed ensemble distribution. However, when such 'statistical' methods are applied independently at different locations, at different lead times, or to multiple variables, the correlation structure of individual ensemble members is destroyed. Instead of re-establishing the correlation structure as in Schefzik et al. (2013), we propose a calibration method that avoids this problem by correcting each ensemble member individually. Moreover, we analyse the fundamental mechanisms by which the probabilistic ensemble skill can be enhanced. In terms of the continuous ranked probability score, our member-by-member approach yields a skill gain that extends to lead times far beyond the error doubling time and that is as good as that of the most competitive statistical approach, non-homogeneous Gaussian regression (Gneiting et al. 2005). Besides the conservation of the correlation structure, additional benefits arise, including the fact that higher-order ensemble moments such as kurtosis and skewness are inherited from the uncorrected forecasts. Our detailed analysis is performed in the context of the Kuramoto-Sivashinsky equation and different simple models, but the results extend successfully to the ensemble forecasts of the European Centre for Medium-Range Weather Forecasts (Van Schaeybroeck and Vannitsem, 2013, 2014). References [1] Gneiting, T., Raftery, A. E., Westveld, A., Goldman, T., 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Weather Rev. 133, 1098-1118. [2] Schefzik, R., T.L. Thorarinsdottir, and T. Gneiting, 2013: Uncertainty Quantification in Complex Simulation Models Using Ensemble Copula Coupling. To appear in Statistical Science 28. [3] Van Schaeybroeck, B., and S. Vannitsem, 2013: Reliable probabilities through statistical post-processing of ensemble forecasts. Proceedings of the European Conference on Complex Systems 2012, Springer proceedings on complexity, XVI, p. 347-352. [4] Van Schaeybroeck, B., and S. Vannitsem, 2014: Ensemble post-processing using member-by-member approaches: theoretical aspects, under review.

  12. S-SPatt: simple statistics for patterns on Markov chains.

    PubMed

    Nuel, Grégory

    2005-07-01

    S-SPatt allows the counting of pattern occurrences in text files and, assuming these texts are generated from a random Markovian source, the computation of the P-value of a given observation using a simple binomial approximation.
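
    The binomial approximation mentioned in the abstract amounts to a one-sided tail probability; in the sketch below the per-position occurrence probability is a placeholder, whereas S-SPatt would derive it from the fitted Markov model.

    ```python
    from scipy.stats import binom

    def pattern_pvalue(n_observed, n_positions, p_per_position):
        """P-value of seeing at least n_observed occurrences under a binomial approximation.

        p_per_position would come from the Markov model of the text; here it is a
        placeholder value for illustration.
        """
        return binom.sf(n_observed - 1, n_positions, p_per_position)

    # Hypothetical: a pattern expected about once per 4096 positions, seen 12 times in 20000.
    print(pattern_pvalue(12, 20000, 1 / 4096))
    ```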

  13. Clinical and pharmacogenomic data mining: 2. A simple method for the combination of information from associations and multivariances to facilitate analysis, decision, and design in clinical research and practice.

    PubMed

    Robson, Barry; Mushlin, Richard

    2004-01-01

    The physician and researcher must ultimately be able to combine qualitative and quantitative features from a variety of combinations of observations on data of many component items (i.e., many dimensions), and hence reach simple conclusions about interpretation, rational courses of action, and design. In the first paper of this series, it was noted that such needs are challenging the classical means of using statistics. Hence, that paper proposed the use of a Generalized Theory of Expected Information, or "Zeta Theory". The conjoint event [a,b,c,...] is seen as a rule of association for a,b,c,... with a rule strength I(a;b;c;...) = xi(s, o[a,b,c,...]) - xi(s, e[a,b,c,...]), where xi is the incomplete Zeta function, o[a,b,c,...] is the observed frequency of occurrence of the conjoint event [a,b,c,...], and e[a,b,c,...] is the expected frequency. The present paper explores how output from this approach might be assembled in a form better suited for decision support. A related difficulty is that the treatment of covariance and multivariance was previously rendered as a "fuzzy association" so that the output would fall into a form similar to that of the true associations, but this was a somewhat ad hoc approach in which only the final I( ) had any meaning. Users at clinical research sites had subsequently requested an alternative approach in which "effective frequencies" o[ ] and e[ ], calculated from the above variances and used to evaluate I( ), give some intuitive feeling analogous to the association treatment, and this is explored here. Though the present paper is theoretical, real examples are used to illustrate application. One clinical-genomic example illustrates experimental design by identifying data that are, or are not, statistically germane to the study. We also report on impressions from applying these techniques in studies of real, extensive patient record data which are now emerging, as well as on molecular design data originally studied in part to test the ability to deduce the effects of simple natural patient sequence variations ("SNPs") on patient protein activity. On the basis of these study experiences, methods of rationalizing and condensing the rules implied by associations and variances between data, as well as a discussion of the difficulty of what is meant by "condensed", are presented in the Appendix.

  14. Statistics-related and reliability-physics-related failure processes in electronics devices and products

    NASA Astrophysics Data System (ADS)

    Suhir, E.

    2014-05-01

    The well-known and widely used experimental reliability "passport" of a mass-manufactured electronic or photonic product — the bathtub curve — reflects the combined contribution of the statistics-related and reliability-physics (physics-of-failure)-related processes. As time progresses, the first process results in a decreasing failure rate, while the second process, associated with material aging and degradation, leads to an increased failure rate. An attempt has been made in this analysis to assess the level of the reliability-physics-related aging process from the available bathtub curve (diagram). It is assumed that the products of interest underwent burn-in testing and therefore the obtained bathtub curve does not contain the infant mortality portion. It has also been assumed that the two random processes in question are statistically independent, and that the failure rate of the physical process can be obtained by deducting the theoretically assessed statistical failure rate from the bathtub curve ordinates. In the numerical example carried out, the Rayleigh distribution was used for the statistical failure rate, for the sake of a relatively simple illustration. The developed methodology can be used in reliability physics evaluations when there is a need to better understand the roles of the statistics-related and reliability-physics-related irreversible random processes in reliability evaluations. Future work should include investigations of how powerful and flexible methods and approaches of statistical mechanics can be effectively employed, in addition to reliability physics techniques, to model the operational reliability of electronic and photonic products.

  15. Equivalence between Step Selection Functions and Biased Correlated Random Walks for Statistical Inference on Animal Movement.

    PubMed

    Duchesne, Thierry; Fortin, Daniel; Rivest, Louis-Paul

    2015-01-01

    Animal movement has a fundamental impact on population and community structure and dynamics. Biased correlated random walks (BCRW) and step selection functions (SSF) are commonly used to study movements. Because no studies have contrasted the parameters and the statistical properties of their estimators for models constructed under these two Lagrangian approaches, it remains unclear whether or not they allow for similar inference. First, we used the Weak Law of Large Numbers to demonstrate that the log-likelihood function for estimating the parameters of BCRW models can be approximated by the log-likelihood of SSFs. Second, we illustrated the link between the two approaches by fitting BCRW with maximum likelihood and with SSF to simulated movement data in virtual environments and to the trajectory of bison (Bison bison L.) trails in natural landscapes. Using simulated and empirical data, we found that the parameters of a BCRW estimated directly from maximum likelihood and by fitting an SSF were remarkably similar. Movement analysis is increasingly used as a tool for understanding the influence of landscape properties on animal distribution. In the rapidly developing field of movement ecology, management and conservation biologists must decide which method they should implement to accurately assess the determinants of animal movement. We showed that BCRW and SSF can provide similar insights into the environmental features influencing animal movements. Both techniques have advantages. BCRW has already been extended to allow for multi-state modeling. Unlike BCRW, however, SSF can be estimated using most statistical packages, it can simultaneously evaluate habitat selection and movement biases, and can easily integrate a large number of movement taxes at multiple scales. SSF thus offers a simple, yet effective, statistical technique to identify movement taxis.

  16. Thermodynamic properties of non-conformal soft-sphere fluids with effective hard-sphere diameters.

    PubMed

    Rodríguez-López, Tonalli; del Río, Fernando

    2012-01-28

    In this work we study a set of soft-sphere systems characterised by a well-defined variation of their softness. These systems represent an extension of the repulsive Lennard-Jones potential widely used in statistical mechanics of fluids. This type of soft spheres is of interest because they represent quite accurately the effective intermolecular repulsion in fluid substances and also because they exhibit interesting properties. The thermodynamics of the soft-sphere fluids is obtained via an effective hard-sphere diameter approach that leads to a compact and accurate equation of state. The virial coefficients of soft spheres are shown to follow quite simple relationships that are incorporated into the equation of state. The approach followed exhibits the rescaling of the density that produces a unique equation for all systems and temperatures. The scaling is carried through to the level of the structure of the fluids.

  17. Friction Laws Derived From the Acoustic Emissions of a Laboratory Fault by Machine Learning

    NASA Astrophysics Data System (ADS)

    Rouet-Leduc, B.; Hulbert, C.; Ren, C. X.; Bolton, D. C.; Marone, C.; Johnson, P. A.

    2017-12-01

    Fault friction controls nearly all aspects of fault rupture, yet it can only be measured in the laboratory. Here we describe laboratory experiments in which acoustic emissions are recorded from the fault. We find that by applying a machine learning approach known as "extreme gradient boosting trees" to the continuous acoustic signal, the fault friction can be directly inferred, showing that instantaneous characteristics of the acoustic signal are a fingerprint of the frictional state. This machine-learning-based inference leads to a simple law that links the acoustic signal to the friction state and holds for every stress cycle the laboratory fault goes through. The approach uses no measured quantity other than instantaneous statistics of the acoustic signal. This finding may be important for inferring frictional characteristics from seismic waves in the Earth, where fault friction cannot be measured.
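
    A toy version of the pipeline, with every detail assumed: a synthetic signal whose local variance tracks a hidden "friction" variable, window statistics as the only features, and scikit-learn's GradientBoostingRegressor standing in for the extreme gradient boosting used in the study.

    ```python
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(0)

    # Hypothetical stand-in for the laboratory data: a noisy "acoustic" signal whose
    # local variance slowly tracks a hidden "friction" variable.
    n_windows, window = 2000, 256
    friction = np.sin(np.linspace(0, 20 * np.pi, n_windows)) ** 2
    signal = [rng.normal(0, 0.1 + f, window) for f in friction]

    # Instantaneous statistics of each window serve as the only features.
    features = np.array([[w.var(), np.abs(w).mean(), np.percentile(np.abs(w), 90)]
                         for w in signal])

    split = int(0.8 * n_windows)
    model = GradientBoostingRegressor(random_state=0).fit(features[:split], friction[:split])
    pred = model.predict(features[split:])
    print("correlation with held-out 'friction':",
          round(np.corrcoef(pred, friction[split:])[0, 1], 3))
    ```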

  18. A nonlinear quality-related fault detection approach based on modified kernel partial least squares.

    PubMed

    Jiao, Jianfang; Zhao, Ning; Wang, Guang; Yin, Shen

    2017-01-01

    In this paper, a new nonlinear quality-related fault detection method is proposed based on kernel partial least squares (KPLS) model. To deal with the nonlinear characteristics among process variables, the proposed method maps these original variables into feature space in which the linear relationship between kernel matrix and output matrix is realized by means of KPLS. Then the kernel matrix is decomposed into two orthogonal parts by singular value decomposition (SVD) and the statistics for each part are determined appropriately for the purpose of quality-related fault detection. Compared with relevant existing nonlinear approaches, the proposed method has the advantages of simple diagnosis logic and stable performance. A widely used literature example and an industrial process are used for the performance evaluation for the proposed method. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.

  19. GPS baseline configuration design based on robustness analysis

    NASA Astrophysics Data System (ADS)

    Yetkin, M.; Berber, M.

    2012-11-01

    The robustness analysis results obtained from a Global Positioning System (GPS) network are dramatically influenced by the configuration of the observed baselines. The selection of optimal GPS baselines may allow for a cost-effective survey campaign and a sufficiently robust network. Furthermore, using the approach described in this paper, the required number of sessions, the baselines to be observed, and the significance levels for statistical testing and robustness analysis can be determined even before the GPS campaign starts. In this study, we propose a robustness criterion for the optimal design of geodetic networks, and present a very simple and efficient algorithm based on this criterion for the selection of optimal GPS baselines. We also show the relationship between the number of sessions and the non-centrality parameter. Finally, a numerical example is given to verify the efficacy of the proposed approach.

  20. Density functional theory for polymeric systems in 2D.

    PubMed

    Słyk, Edyta; Roth, Roland; Bryk, Paweł

    2016-06-22

    We propose density functional theory for polymeric fluids in two dimensions. The approach is based on Wertheim's first order thermodynamic perturbation theory (TPT) and closely follows density functional theory for polymers proposed by Yu and Wu (2002 J. Chem. Phys. 117 2368). As a simple application we evaluate the density profiles of tangent hard-disk polymers at hard walls. The theoretical predictions are compared against the results of the Monte Carlo simulations. We find that for short chain lengths the theoretical density profiles are in an excellent agreement with the Monte Carlo data. The agreement is less satisfactory for longer chains. The performance of the theory can be improved by recasting the approach using the self-consistent field theory formalism. When the self-avoiding chain statistics is used, the theory yields a marked improvement in the low density limit. Further improvements for long chains could be reached by going beyond the first order of TPT.

  1. Using R-Project for Free Statistical Analysis in Extension Research

    ERIC Educational Resources Information Center

    Mangiafico, Salvatore S.

    2013-01-01

    One option for Extension professionals wishing to use free statistical software is to use online calculators, which are useful for common, simple analyses. A second option is to use a free computing environment capable of performing statistical analyses, like R-project. R-project is free, cross-platform, powerful, and respected, but may be…

  2. Using Data from Climate Science to Teach Introductory Statistics

    ERIC Educational Resources Information Center

    Witt, Gary

    2013-01-01

    This paper shows how the application of simple statistical methods can reveal to students important insights from climate data. While the popular press is filled with contradictory opinions about climate science, teachers can encourage students to use introductory-level statistics to analyze data for themselves on this important issue in public…

  3. Using R in Introductory Statistics Courses with the pmg Graphical User Interface

    ERIC Educational Resources Information Center

    Verzani, John

    2008-01-01

    The pmg add-on package for the open source statistics software R is described. This package provides a simple to use graphical user interface (GUI) that allows introductory statistics students, without advanced computing skills, to quickly create the graphical and numeric summaries expected of them. (Contains 9 figures.)

  4. Evaluation of Two Statistical Methods Provides Insights into the Complex Patterns of Alternative Polyadenylation Site Switching

    PubMed Central

    Li, Jie; Li, Rui; You, Leiming; Xu, Anlong; Fu, Yonggui; Huang, Shengfeng

    2015-01-01

    Switching between different alternative polyadenylation (APA) sites plays an important role in the fine tuning of gene expression. New technologies for the execution of 3’-end enriched RNA-seq allow genome-wide detection of the genes that exhibit significant APA site switching between different samples. Here, we show that the independence test gives better results than the linear trend test in detecting APA site-switching events. Further examination suggests that the discrepancy between these two statistical methods arises from complex APA site-switching events that cannot be represented by a simple change of average 3’-UTR length. In theory, the linear trend test is only effective in detecting these simple changes. We classify the switching events into four switching patterns: two simple patterns (3’-UTR shortening and lengthening) and two complex patterns. By comparing the results of the two statistical methods, we show that complex patterns account for 1/4 of all observed switching events that happen between normal and cancerous human breast cell lines. Because simple and complex switching patterns may convey different biological meanings, they merit separate study. We therefore propose to combine both the independence test and the linear trend test in practice. First, the independence test should be used to detect APA site switching; second, the linear trend test should be invoked to identify simple switching events; and third, those complex switching events that pass independence testing but fail linear trend testing can be identified. PMID:25875641

  5. Techniques for estimating selected streamflow characteristics of rural unregulated streams in Ohio

    USGS Publications Warehouse

    Koltun, G.F.; Whitehead, Matthew T.

    2002-01-01

    This report provides equations for estimating mean annual streamflow, mean monthly streamflows, harmonic mean streamflow, and streamflow quartiles (the 25th-, 50th-, and 75th-percentile streamflows) as a function of selected basin characteristics for rural, unregulated streams in Ohio. The equations were developed from streamflow statistics and basin-characteristics data for as many as 219 active or discontinued streamflow-gaging stations on rural, unregulated streams in Ohio with 10 or more years of homogenous daily streamflow record. Streamflow statistics and basin-characteristics data for the 219 stations are presented in this report. Simple equations (based on drainage area only) and best-fit equations (based on drainage area and at least two other basin characteristics) were developed by means of ordinary least-squares regression techniques. Application of the best-fit equations generally involves quantification of basin characteristics that require or are facilitated by use of a geographic information system. In contrast, the simple equations can be used with information that can be obtained without use of a geographic information system; however, the simple equations have larger prediction errors than the best-fit equations and exhibit geographic biases for most streamflow statistics. The best-fit equations should be used instead of the simple equations whenever possible.

  6. Simple versus composite indicators of socioeconomic status in resource allocation formulae: the case of the district resource allocation formula in Malawi

    PubMed Central

    2010-01-01

    Background The district resource allocation formula in Malawi was recently reviewed to include stunting as a proxy measure of socioeconomic status. In many countries where the concept of need has been incorporated in resource allocation, composite indicators of socioeconomic status have been used. In the Malawi case, it is important to ascertain whether there are differences between using single variable or composite indicators of socioeconomic status in allocations made to districts, holding all other factors in the resource allocation formula constant. Methods Principal components analysis was used to calculate asset indices for all districts from variables that capture living standards using data from the Malawi Multiple Indicator Cluster Survey 2006. These were normalized and used to weight district populations. District proportions of national population weighted by both the simple and composite indicators were then calculated for all districts and compared. District allocations were also calculated using the two approaches and compared. Results The two types of indicators are highly correlated, with a Spearman rank correlation coefficient of 0.97 at the 1% level of significance. For 21 out of the 26 districts included in the study, proportions of national population weighted by the simple indicator are higher by an average of 0.6 percentage points. For the remaining 5 districts, district proportions of national population weighted by the composite indicator are higher by an average of 2 percentage points. Though the average percentage point differences are low and the actual allocations using both approaches highly correlated (ρ of 0.96), differences in actual allocations exceed 10% for 8 districts and have an average of 4.2% for the remaining 17. For 21 districts allocations based on the single variable indicator are higher. Conclusions Variations in district allocations made using either the simple or the composite indicator of socioeconomic status do not differ enough to recommend one over the other. However, the single variable indicator is favourable for its ease of computation. PMID:20053274
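
    The asset-index construction can be sketched as follows (Python; the district variables, populations, and the orientation of the index are illustrative assumptions, not the survey data used in the paper):

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.preprocessing import StandardScaler

        # hypothetical district-level living-standard variables (rows = districts),
        # e.g. shares of households with improved water, electricity, a radio
        X = np.array([[0.31, 0.05, 0.40],
                      [0.55, 0.12, 0.62],
                      [0.20, 0.03, 0.35],
                      [0.72, 0.25, 0.80]])
        pop = np.array([410_000, 980_000, 350_000, 620_000])   # district populations

        # composite indicator: first principal component of the standardized variables
        score = PCA(n_components=1).fit_transform(StandardScaler().fit_transform(X)).ravel()
        # orient and normalize so that higher values mean greater need
        # (the sign of a principal component is arbitrary and must be checked)
        need = 1 - (score - score.min()) / (score.max() - score.min())

        weighted = pop * need
        print(np.round(weighted / weighted.sum(), 3))   # district shares of the allocation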

  7. Statistical Techniques to Analyze Pesticide Data Program Food Residue Observations.

    PubMed

    Szarka, Arpad Z; Hayworth, Carol G; Ramanarayanan, Tharacad S; Joseph, Robert S I

    2018-06-26

    The U.S. EPA conducts dietary-risk assessments to ensure that levels of pesticides on food in the U.S. food supply are safe. Often these assessments utilize conservative residue estimates, maximum residue levels (MRLs), and a high-end estimate derived from registrant-generated field-trial data sets. A more realistic estimate of consumers' pesticide exposure from food may be obtained by utilizing residues from food-monitoring programs, such as the Pesticide Data Program (PDP) of the U.S. Department of Agriculture. A substantial portion of food-residue concentrations in PDP monitoring programs are below the limits of detection (left-censored), which makes the comparison of regulatory-field-trial and PDP residue levels difficult. In this paper, we present a novel adaptation of established statistical techniques, the Kaplan-Meier estimator (K-M), robust regression on order statistics (ROS), and the maximum-likelihood estimator (MLE), to quantify the pesticide-residue concentrations in the presence of heavily censored data sets. The examined statistical approaches include the most commonly used parametric and nonparametric methods for handling left-censored data that have been used in the fields of medical and environmental sciences. This work presents a case study in which data of thiamethoxam residue on bell pepper generated from registrant field trials were compared with PDP-monitoring residue values. The results from the statistical techniques were evaluated and compared with commonly used simple substitution methods for the determination of summary statistics. It was found that the MLE is the most appropriate statistical method to analyze this residue data set. Using the MLE technique, the data analyses showed that the median and mean PDP bell pepper residue levels were approximately 19 and 7 times lower, respectively, than the corresponding statistics of the field-trial residues.
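
    For readers unfamiliar with censored-data estimation, a minimal MLE sketch for left-censored, lognormally distributed residues follows (Python; the residue values and detection limits are invented, and constant Jacobian terms are dropped from the likelihood since they do not affect the estimates):

        import numpy as np
        from scipy import stats, optimize

        detects = np.array([0.012, 0.020, 0.035, 0.050, 0.080])        # measured residues (ppm)
        lods = np.array([0.010, 0.010, 0.020, 0.020, 0.020, 0.050])    # non-detects: value < LOD

        def neg_loglik(params):
            mu, log_sigma = params
            sigma = np.exp(log_sigma)
            ll = stats.norm.logpdf(np.log(detects), mu, sigma).sum()   # detected values
            ll += stats.norm.logcdf(np.log(lods), mu, sigma).sum()     # left-censored contributions
            return -ll

        res = optimize.minimize(neg_loglik, x0=[np.log(detects).mean(), 0.0])
        mu, sigma = res.x[0], np.exp(res.x[1])
        print("median:", np.exp(mu), "mean:", np.exp(mu + sigma ** 2 / 2))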

  8. Measuring the statistical validity of summary meta-analysis and meta-regression results for use in clinical practice.

    PubMed

    Willis, Brian H; Riley, Richard D

    2017-09-20

    An important question for clinicians appraising a meta-analysis is: are the findings likely to be valid in their own practice? That is, does the reported effect accurately represent the effect that would occur in their own clinical population? To this end we advance the concept of statistical validity, where the parameter being estimated equals the corresponding parameter for a new independent study. Using a simple ('leave-one-out') cross-validation technique, we demonstrate how we may test meta-analysis estimates for statistical validity using a new validation statistic, Vn, and derive its distribution. We compare this with the usual approach of investigating heterogeneity in meta-analyses and demonstrate the link between statistical validity and homogeneity. Using a simulation study, the properties of Vn and the Q statistic are compared for univariate random effects meta-analysis and a tailored meta-regression model, where information from the setting (included as model covariates) is used to calibrate the summary estimate to the setting of application. Their properties are found to be similar when there are 50 studies or more, but for fewer studies Vn has greater power but a higher type 1 error rate than Q. The power and type 1 error rate of Vn are also shown to depend on the within-study variance, between-study variance, study sample size, and the number of studies in the meta-analysis. Finally, we apply Vn to two published meta-analyses and conclude that it usefully augments standard methods when deciding upon the likely validity of summary meta-analysis estimates in clinical practice. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
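
    The leave-one-out idea (though not the paper's Vn statistic or its distribution) can be illustrated with a short sketch (Python; the study effects, variances, and the z-type discrepancy measure are illustrative assumptions):

        import numpy as np

        # hypothetical study effects (e.g. log odds ratios) and within-study variances
        y = np.array([0.30, 0.10, 0.45, 0.20, 0.55, 0.05])
        v = np.array([0.04, 0.02, 0.06, 0.03, 0.05, 0.02])

        def pooled(y, v):
            # inverse-variance (fixed-effect) pooled estimate and its variance
            w = 1.0 / v
            return np.sum(w * y) / np.sum(w), 1.0 / np.sum(w)

        # leave one study out, pool the rest, and ask whether the pooled estimate
        # predicts the left-out study within its sampling error
        z = []
        for i in range(len(y)):
            keep = np.arange(len(y)) != i
            m, var_m = pooled(y[keep], v[keep])
            z.append((y[i] - m) / np.sqrt(v[i] + var_m))
        print(np.round(z, 2))   # large |z| values flag poor transportability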

  9. Characterizing complex networks through statistics of Möbius transformations

    NASA Astrophysics Data System (ADS)

    Jaćimović, Vladimir; Crnkić, Aladin

    2017-04-01

    It is well-known now that dynamics of large populations of globally (all-to-all) coupled oscillators can be reduced to low-dimensional submanifolds (WS transformation and OA ansatz). Marvel et al. (2009) described an intriguing algebraic structure standing behind this reduction: oscillators evolve by the action of the group of Möbius transformations. Of course, dynamics in complex networks of coupled oscillators is highly complex and not reducible. Still, closer look unveils that even in complex networks some (possibly overlapping) groups of oscillators evolve by Möbius transformations. In this paper, we study properties of the network by identifying Möbius transformations in the dynamics of oscillators. This enables us to introduce some new (statistical) concepts that characterize the network. In particular, the notion of coherence of the network (or subnetwork) is proposed. This conceptual approach is meaningful for the broad class of networks, including those with time-delayed, noisy or mixed interactions. In this paper, several simple (random) graphs are studied illustrating the meaning of the concepts introduced in the paper.
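
    For reference, the disk-preserving Möbius maps underlying this reduction (Marvel et al., 2009) can be written, in one standard parametrization of the unit-disk automorphisms, as

        M_{a,\theta}(z) \;=\; e^{i\theta}\,\frac{z - a}{1 - \bar{a}\,z}, \qquad |a| < 1, \ \theta \in [0, 2\pi).

    Tracking how a pair (a, \theta) of this kind maps an oscillator configuration at one time onto the configuration at a later time is one way such transformations can be identified in the dynamics.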

  10. Impact of distributions on the archetypes and prototypes in heterogeneous nanoparticle ensembles.

    PubMed

    Fernandez, Michael; Wilson, Hugh F; Barnard, Amanda S

    2017-01-05

    The magnitude and complexity of the structural and functional data available on nanomaterials requires data analytics, statistical analysis and information technology to drive discovery. We demonstrate that multivariate statistical analysis can recognise the sets of truly significant nanostructures and their most relevant properties in heterogeneous ensembles with different probability distributions. The prototypical and archetypal nanostructures of five virtual ensembles of Si quantum dots (SiQDs) with Boltzmann, frequency, normal, Poisson and random distributions are identified using clustering and archetypal analysis, where we find that their diversity is defined by size and shape, regardless of the type of distribution. At the convex hull of the SiQD ensembles, simple configuration archetypes can efficiently describe a large number of SiQDs, whereas more complex shapes are needed to represent the average ordering of the ensembles. This approach provides a route towards the characterisation of computationally intractable virtual nanomaterial spaces, which can convert big data into smart data, and significantly reduce the workload to simulate experimentally relevant virtual samples.

  11. Stochastic Spatial Models in Ecology: A Statistical Physics Approach

    NASA Astrophysics Data System (ADS)

    Pigolotti, Simone; Cencini, Massimo; Molina, Daniel; Muñoz, Miguel A.

    2018-07-01

    Ecosystems display a complex spatial organization. Ecologists have long tried to characterize them by looking at how different measures of biodiversity change across spatial scales. Ecological neutral theory has provided simple predictions accounting for general empirical patterns in communities of competing species. However, while neutral theory in well-mixed ecosystems is mathematically well understood, spatial models still present several open problems, limiting the quantitative understanding of spatial biodiversity. In this review, we discuss the state of the art in spatial neutral theory. We emphasize the connection between spatial ecological models and the physics of non-equilibrium phase transitions and how concepts developed in statistical physics translate in population dynamics, and vice versa. We focus on non-trivial scaling laws arising at the critical dimension D = 2 of spatial neutral models, and their relevance for biological populations inhabiting two-dimensional environments. We conclude by discussing models incorporating non-neutral effects in the form of spatial and temporal disorder, and analyze how their predictions deviate from those of purely neutral theories.

  12. Stochastic Spatial Models in Ecology: A Statistical Physics Approach

    NASA Astrophysics Data System (ADS)

    Pigolotti, Simone; Cencini, Massimo; Molina, Daniel; Muñoz, Miguel A.

    2017-11-01

    Ecosystems display a complex spatial organization. Ecologists have long tried to characterize them by looking at how different measures of biodiversity change across spatial scales. Ecological neutral theory has provided simple predictions accounting for general empirical patterns in communities of competing species. However, while neutral theory in well-mixed ecosystems is mathematically well understood, spatial models still present several open problems, limiting the quantitative understanding of spatial biodiversity. In this review, we discuss the state of the art in spatial neutral theory. We emphasize the connection between spatial ecological models and the physics of non-equilibrium phase transitions and how concepts developed in statistical physics translate in population dynamics, and vice versa. We focus on non-trivial scaling laws arising at the critical dimension D = 2 of spatial neutral models, and their relevance for biological populations inhabiting two-dimensional environments. We conclude by discussing models incorporating non-neutral effects in the form of spatial and temporal disorder, and analyze how their predictions deviate from those of purely neutral theories.

  13. Performance metrics for the assessment of satellite data products: an ocean color case study

    PubMed Central

    Seegers, Bridget N.; Stumpf, Richard P.; Schaeffer, Blake A.; Loftin, Keith A.; Werdell, P. Jeremy

    2018-01-01

    Performance assessment of ocean color satellite data has generally relied on statistical metrics chosen for their common usage and the rationale for selecting certain metrics is infrequently explained. Commonly reported statistics based on mean squared errors, such as the coefficient of determination (r2), root mean square error, and regression slopes, are most appropriate for Gaussian distributions without outliers and, therefore, are often not ideal for ocean color algorithm performance assessment, which is often limited by sample availability. In contrast, metrics based on simple deviations, such as bias and mean absolute error, as well as pair-wise comparisons, often provide more robust and straightforward quantities for evaluating ocean color algorithms with non-Gaussian distributions and outliers. This study uses a SeaWiFS chlorophyll-a validation data set to demonstrate a framework for satellite data product assessment and recommends a multi-metric and user-dependent approach that can be applied within science, modeling, and resource management communities. PMID:29609296
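
    One way to compute the deviation-based metrics discussed here for a quantity such as chlorophyll-a is sketched below (Python; the log10 transformation and the matchup values are assumptions for illustration):

        import numpy as np

        # hypothetical matchups: satellite-retrieved vs in situ chlorophyll-a (mg m^-3)
        sat = np.array([0.8, 1.5, 3.2, 0.3, 10.0])
        obs = np.array([1.0, 1.2, 4.0, 0.4, 6.0])

        d = np.log10(sat) - np.log10(obs)
        bias = 10 ** np.mean(d)           # multiplicative bias (1.0 = unbiased)
        mae = 10 ** np.mean(np.abs(d))    # multiplicative mean absolute error
        print(f"bias = {bias:.2f}x, MAE = {mae:.2f}x")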

  14. Simple, empirical approach to predict neutron capture cross sections from nuclear masses

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Couture, Aaron Joseph; Casten, Richard F.; Cakirli, R. B.

    Here, neutron capture cross sections are essential to understanding the astrophysical s and r processes, the modeling of nuclear reactor design and performance, and for a wide variety of nuclear forensics applications. Often, cross sections are needed for nuclei where experimental measurements are difficult. Enormous effort, over many decades, has gone into attempting to develop sophisticated statistical reaction models to predict these cross sections. Such work has met with some success but is often unable to reproduce measured cross sections to better than 40%, and has limited predictive power, with predictions from different models rapidly differing by an order of magnitude a few nucleons from the last measurement.

  15. Simple, empirical approach to predict neutron capture cross sections from nuclear masses

    DOE PAGES

    Couture, Aaron Joseph; Casten, Richard F.; Cakirli, R. B.

    2017-12-20

    Here, neutron capture cross sections are essential to understanding the astrophysical s and r processes, the modeling of nuclear reactor design and performance, and for a wide variety of nuclear forensics applications. Often, cross sections are needed for nuclei where experimental measurements are difficult. Enormous effort, over many decades, has gone into attempting to develop sophisticated statistical reaction models to predict these cross sections. Such work has met with some success but is often unable to reproduce measured cross sections to better than 40%, and has limited predictive power, with predictions from different models rapidly differing by an order of magnitude a few nucleons from the last measurement.

  16. A Simple and Computationally Efficient Approach to Multifactor Dimensionality Reduction Analysis of Gene-Gene Interactions for Quantitative Traits

    PubMed Central

    Gui, Jiang; Moore, Jason H.; Williams, Scott M.; Andrews, Peter; Hillege, Hans L.; van der Harst, Pim; Navis, Gerjan; Van Gilst, Wiek H.; Asselbergs, Folkert W.; Gilbert-Diamond, Diane

    2013-01-01

    We present an extension of the two-class multifactor dimensionality reduction (MDR) algorithm that enables detection and characterization of epistatic SNP-SNP interactions in the context of a quantitative trait. The proposed Quantitative MDR (QMDR) method handles continuous data by modifying MDR’s constructive induction algorithm to use a T-test. QMDR replaces the balanced accuracy metric with a T-test statistic as the score to determine the best interaction model. We used a simulation to identify the empirical distribution of QMDR’s testing score. We then applied QMDR to genetic data from the ongoing prospective Prevention of Renal and Vascular End-Stage Disease (PREVEND) study. PMID:23805232
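
    A sketch of the scoring step only (Python; cell pooling by comparison with the overall trait mean is assumed, and the cross-validation used in the full QMDR procedure is omitted):

        import numpy as np
        from scipy import stats

        def qmdr_score(g1, g2, trait):
            # pool genotype combinations into 'high'/'low' groups by comparing each
            # cell's mean trait with the overall mean, then score with a t statistic
            overall = trait.mean()
            labels = np.zeros(trait.size, dtype=bool)
            for a in np.unique(g1):
                for b in np.unique(g2):
                    cell = (g1 == a) & (g2 == b)
                    if cell.any():
                        labels[cell] = trait[cell].mean() >= overall
            t, p = stats.ttest_ind(trait[labels], trait[~labels], equal_var=False)
            return abs(t), p

        rng = np.random.default_rng(0)
        g1, g2 = rng.integers(0, 3, 200), rng.integers(0, 3, 200)     # SNP genotypes 0/1/2
        trait = rng.normal(size=200) + 0.5 * ((g1 == 2) & (g2 == 0))  # planted interaction
        print(qmdr_score(g1, g2, trait))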

  17. Cloud Compute for Global Climate Station Summaries

    NASA Astrophysics Data System (ADS)

    Baldwin, R.; May, B.; Cogbill, P.

    2017-12-01

    Global Climate Station Summaries are simple indicators of observational normals which include climatic data summarizations and frequency distributions. These typically are statistical analyses of station data over 5-, 10-, 20-, 30-year or longer time periods. The summaries are computed from the global surface hourly dataset. This dataset totaling over 500 gigabytes is comprised of 40 different types of weather observations with 20,000 stations worldwide. NCEI and the U.S. Navy developed these value added products in the form of hourly summaries from many of these observations. Enabling this compute functionality in the cloud is the focus of the project. An overview of approach and challenges associated with application transition to the cloud will be presented.

  18. No-Fault Malpractice Insurance

    PubMed Central

    Bush, J. W.; Chen, M. M.; Bush, A. S.

    1975-01-01

    No-fault medical malpractice insurance has been proposed as an alternative to the present tort liability approach. Statistical examination of the concept of proximate cause reveals not only that the question of acceptable care, and therefore of fault, is unavoidable in identifying patients deserving compensation, but also that specifying fault in an individual case is scientifically untenable. A simple formula for a Coefficient of Causality clarifies the question of proximate cause in existing trial practices and suggests that many of the threats associated with malpractice suits arise from the structure of the tort-insurance system rather than from professional responsibility for medical injury. The concepts could provide the basis for a revised claims and compensation procedure. PMID:1146300

  19. Current Status and Challenges of Atmospheric Data Assimilation

    NASA Astrophysics Data System (ADS)

    Atlas, R. M.; Gelaro, R.

    2016-12-01

    The issues of modern atmospheric data assimilation are fairly simple to comprehend but difficult to address, involving the combination of literally billions of model variables and tens of millions of observations daily. In addition to traditional meteorological variables such as wind, temperature, pressure, and humidity, model state vectors are being expanded to include explicit representation of precipitation, clouds, aerosols and atmospheric trace gases. At the same time, model resolutions are approaching single-kilometer scales globally and new observation types have error characteristics that are increasingly non-Gaussian. This talk describes the current status and challenges of atmospheric data assimilation, including an overview of current methodologies, the difficulty of estimating error statistics, and progress toward coupled earth system analyses.

  20. Do the Modified Uncertainty Principle and Polymer Quantization predict same physics?

    NASA Astrophysics Data System (ADS)

    Majumder, Barun; Sen, Sourav

    2012-10-01

    In this Letter we study the effects of the Modified Uncertainty Principle as proposed in Ali et al. (2009) [5] in simple quantum mechanical systems and study its thermodynamic properties. We have assumed that the quantum particles follow Maxwell-Boltzmann statistics with no spin. We compare our results with the results found in the GUP and polymer quantum mechanical frameworks. Interestingly we find that the corrected thermodynamic entities are exactly the same compared to the polymer results but the length scale considered has a theoretically different origin. Hence we express the need of further study for an investigation whether these two approaches are conceptually connected in the fundamental level.

  1. Tuning the critical solution temperature of polymers by copolymerization

    NASA Astrophysics Data System (ADS)

    Schulz, Bernhard; Chudoba, Richard; Heyda, Jan; Dzubiella, Joachim

    2015-12-01

    We study statistical copolymerization effects on the upper critical solution temperature (CST) of generic homopolymers by means of coarse-grained Langevin dynamics computer simulations and mean-field theory. Our systematic investigation reveals that the CST can change monotonically or non-monotonically with copolymerization, as observed in experimental studies, depending on the degree of non-additivity of the monomer (A-B) cross-interactions. The simulation findings are confirmed and qualitatively explained by a combination of a two-component Flory-de Gennes model for polymer collapse and a simple thermodynamic expansion approach. Our findings provide some rationale behind the effects of copolymerization and may be helpful for tuning CST behavior of polymers in soft material design.

  2. A paper-based cantilever array sensor: Monitoring volatile organic compounds with naked eye.

    PubMed

    Fraiwan, Arwa; Lee, Hankeun; Choi, Seokheun

    2016-09-01

    Volatile organic compound (VOC) detection is critical for controlling industrial and commercial emissions, environmental monitoring, and public health. Simple, portable, rapid and low-cost VOC sensing platforms offer the benefits of on-site and real-time monitoring anytime and anywhere. The best and most practically useful approaches to monitoring would include equipment-free and power-free detection by the naked eye. In this work, we created a novel, paper-based cantilever sensor array that allows simple and rapid naked-eye VOC detection without the need for power, electronics or readout interface/equipment. This simple VOC detection method was achieved using (i) low-cost paper materials as a substrate and (ii) swellable thin polymers adhered to the paper. Upon exposure to VOCs, the polymer swelling adhered to the paper-based cantilever, inducing mechanical deflection that generated a distinctive composite pattern of the deflection angles for a specific VOC. The angle is directly measured by the naked eye on a 3-D protractor printed on a paper facing the cantilevers. The generated angle patterns are subjected to statistical algorithms (linear discriminant analysis (LDA)) to classify each VOC sample and selectively detect a VOC. We classified four VOC samples with 100% accuracy using LDA. Copyright © 2016 Elsevier B.V. All rights reserved.
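
    The classification step can be sketched with scikit-learn's LDA (Python; the deflection-angle patterns below are simulated stand-ins, not the paper's measurements):

        import numpy as np
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.model_selection import cross_val_score

        # hypothetical training data: each row is the pattern of deflection angles
        # (degrees) read from a 3-cantilever array for one exposure
        rng = np.random.default_rng(1)
        centers = {"methanol": [12, 4, 8], "ethanol": [9, 7, 5],
                   "acetone": [4, 11, 3], "toluene": [6, 2, 12]}
        X = np.vstack([rng.normal(c, 0.8, size=(15, 3)) for c in centers.values()])
        y = np.repeat(list(centers), 15)

        lda = LinearDiscriminantAnalysis()
        print(cross_val_score(lda, X, y, cv=5).mean())   # cross-validated accuracy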

  3. Emergent dynamic structures and statistical law in spherical lattice gas automata.

    PubMed

    Yao, Zhenwei

    2017-12-01

    Various lattice gas automata have been proposed in the past decades to simulate physics and address a host of problems on collective dynamics arising in diverse fields. In this work, we employ the lattice gas model defined on the sphere to investigate the curvature-driven dynamic structures and analyze the statistical behaviors in equilibrium. Under the simple propagation and collision rules, we show that the uniform collective movement of the particles on the sphere is geometrically frustrated, leading to several nonequilibrium dynamic structures not found in the planar lattice, such as the emergent bubble and vortex structures. With the accumulation of the collision effect, the system ultimately reaches equilibrium in the sense that the distribution of the coarse-grained speed approaches the two-dimensional Maxwell-Boltzmann distribution despite the population fluctuations in the coarse-grained cells. The emergent regularity in the statistical behavior of the system is rationalized by mapping our system to a generalized random walk model. This work demonstrates the capability of the spherical lattice gas automaton in revealing the lattice-guided dynamic structures and simulating the equilibrium physics. It suggests the promising possibility of using lattice gas automata defined on various curved surfaces to explore geometrically driven nonequilibrium physics.

  4. Emergent dynamic structures and statistical law in spherical lattice gas automata

    NASA Astrophysics Data System (ADS)

    Yao, Zhenwei

    2017-12-01

    Various lattice gas automata have been proposed in the past decades to simulate physics and address a host of problems on collective dynamics arising in diverse fields. In this work, we employ the lattice gas model defined on the sphere to investigate the curvature-driven dynamic structures and analyze the statistical behaviors in equilibrium. Under the simple propagation and collision rules, we show that the uniform collective movement of the particles on the sphere is geometrically frustrated, leading to several nonequilibrium dynamic structures not found in the planar lattice, such as the emergent bubble and vortex structures. With the accumulation of the collision effect, the system ultimately reaches equilibrium in the sense that the distribution of the coarse-grained speed approaches the two-dimensional Maxwell-Boltzmann distribution despite the population fluctuations in the coarse-grained cells. The emergent regularity in the statistical behavior of the system is rationalized by mapping our system to a generalized random walk model. This work demonstrates the capability of the spherical lattice gas automaton in revealing the lattice-guided dynamic structures and simulating the equilibrium physics. It suggests the promising possibility of using lattice gas automata defined on various curved surfaces to explore geometrically driven nonequilibrium physics.

  5. Linking Mechanics and Statistics in Epidermal Tissues

    NASA Astrophysics Data System (ADS)

    Kim, Sangwoo; Hilgenfeldt, Sascha

    2015-03-01

    Disordered cellular structures, such as foams, polycrystals, or living tissues, can be characterized by quantitative measurements of domain size and topology. In recent work, we showed that correlations between size and topology in 2D systems are sensitive to the shape (eccentricity) of the individual domains: From a local model of neighbor relations, we derived an analytical justification for the famous empirical Lewis law, confirming the theory with experimental data from cucumber epidermal tissue. Here, we go beyond this purely geometrical model and identify mechanical properties of the tissue as the root cause for the domain eccentricity and thus the statistics of tissue structure. The simple model approach is based on the minimization of an interfacial energy functional. Simulations with Surface Evolver show that the domain statistics depend on a single mechanical parameter, while parameter fluctuations from cell to cell play an important role in simultaneously explaining the shape distribution of cells. The simulations are in excellent agreement with experiments and analytical theory, and establish a general link between the mechanical properties of a tissue and its structure. The model is relevant to diagnostic applications in a variety of animal and plant tissues.

  6. Functionalization of surfactant wrapped graphene nanosheets with alkylazides for enhanced dispersibility

    NASA Astrophysics Data System (ADS)

    Vadukumpully, Sajini; Gupta, Jhinuk; Zhang, Yongping; Xu, Guo Qin; Valiyaveettil, Suresh

    2011-01-01

    A facile and simple approach for the covalent functionalization of surfactant wrapped graphene sheets is described. The approach involves functionalization of dispersible graphene sheets with various alkylazides; 11-azidoundecanoic acid proved the best azide for enhanced dispersibility. The functionalization was confirmed by infrared spectroscopy and scanning tunneling microscopy. The free carboxylic acid groups can bind to gold nanoparticles, which were introduced as markers for the reactive sites. The interaction between gold nanoparticles and the graphene sheets was followed by UV-vis spectroscopy. The gold nanoparticle-graphene composite was characterized by transmission electron microscopy and atomic force microscopy, demonstrating the uniform distribution of gold nanoparticles all over the surface. Our results open the possibility to control the functionalization on graphene in the construction of composite nanomaterials. Electronic Supplementary Information (ESI) available: Synthesis and characterization details of dodecylazide, hexylazide, 11-azidoundecanol (AUO), micrographs (SEM and TEM images) of the various azide functionalized samples and the statistical analysis of the graphene thickness. See DOI: 10.1039/c0nr00547a.

  7. Fault Detection and Diagnosis In Hall-Héroult Cells Based on Individual Anode Current Measurements Using Dynamic Kernel PCA

    NASA Astrophysics Data System (ADS)

    Yao, Yuchen; Bao, Jie; Skyllas-Kazacos, Maria; Welch, Barry J.; Akhmetov, Sergey

    2018-04-01

    Individual anode current signals in aluminum reduction cells provide localized cell conditions in the vicinity of each anode, which contain more information than the conventionally measured cell voltage and line current. One common use of this measurement is to identify process faults that can cause significant changes in the anode current signals. While this method is simple and direct, it ignores the interactions between anode currents and other important process variables. This paper presents an approach that applies multivariate statistical analysis techniques to individual anode currents and other process operating data, for the detection and diagnosis of local process abnormalities in aluminum reduction cells. Specifically, since the Hall-Héroult process is time-varying with its process variables dynamically and nonlinearly correlated, dynamic kernel principal component analysis with moving windows is used. The cell is discretized into a number of subsystems, with each subsystem representing one anode and cell conditions in its vicinity. The fault associated with each subsystem is identified based on multivariate statistical control charts. The results show that the proposed approach is able to not only effectively pinpoint the problematic areas in the cell, but also assess the effect of the fault on different parts of the cell.
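
    A stripped-down sketch of kernel-PCA-based monitoring (static, without the paper's dynamic moving-window scheme; the data, kernel settings, and control limit below are illustrative assumptions):

        import numpy as np
        from sklearn.decomposition import KernelPCA
        from sklearn.preprocessing import StandardScaler

        rng = np.random.default_rng(2)
        normal = rng.normal(size=(300, 8))     # stand-in for anode currents + cell variables
        faulty = normal[-50:].copy()
        faulty[:, 2] += 3.0                    # simulated drift on one anode current

        scaler = StandardScaler().fit(normal[:200])
        kpca = KernelPCA(n_components=4, kernel="rbf", gamma=0.1)
        Tref = kpca.fit_transform(scaler.transform(normal[:200]))
        inv_cov = np.linalg.inv(np.cov(Tref, rowvar=False))

        def t2(X):
            # Hotelling-T2-like statistic in the retained kernel principal components
            T = kpca.transform(scaler.transform(X)) - Tref.mean(axis=0)
            return np.einsum("ij,jk,ik->i", T, inv_cov, T)

        limit = np.percentile(t2(normal[:200]), 99)   # empirical 99% control limit
        print((t2(faulty) > limit).mean())            # fraction of faulty samples flagged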

  8. Star Masses and Star-Planet Distances for Earth-like Habitability.

    PubMed

    Waltham, David

    2017-01-01

    This paper presents statistical estimates for the location and duration of habitable zones (HZs) around stars of different mass. The approach is based upon the assumption that Earth's location, and the Sun's mass, should not be highly atypical of inhabited planets. The results support climate-model-based estimates for the location of the Sun's HZ except models giving a present-day outer-edge beyond 1.64 AU. The statistical approach also demonstrates that there is a habitability issue for stars smaller than 0.65 solar masses since, otherwise, Earth would be an extremely atypical inhabited world. It is difficult to remove this anomaly using the assumption that poor habitability of planets orbiting low-mass stars results from unfavorable radiation regimes either before, or after, their stars enter the main sequence. However, the anomaly is well explained if poor habitability results from tidal locking of planets in the HZs of small stars. The expected host-star mass for planets with intelligent life then has a 95% confidence range of 0.78 M⊙ < M < 1.04 M⊙, and the range for planets with at least simple life is 0.57 M⊙ < M < 1.64 M⊙. Key Words: Habitability-Habitable zone-Anthropic-Red dwarfs-Initial mass function. Astrobiology 17, 61-77.

  9. Star Masses and Star-Planet Distances for Earth-like Habitability

    PubMed Central

    2017-01-01

    Abstract This paper presents statistical estimates for the location and duration of habitable zones (HZs) around stars of different mass. The approach is based upon the assumption that Earth's location, and the Sun's mass, should not be highly atypical of inhabited planets. The results support climate-model-based estimates for the location of the Sun's HZ except models giving a present-day outer-edge beyond 1.64 AU. The statistical approach also demonstrates that there is a habitability issue for stars smaller than 0.65 solar masses since, otherwise, Earth would be an extremely atypical inhabited world. It is difficult to remove this anomaly using the assumption that poor habitability of planets orbiting low-mass stars results from unfavorable radiation regimes either before, or after, their stars enter the main sequence. However, the anomaly is well explained if poor habitability results from tidal locking of planets in the HZs of small stars. The expected host-star mass for planets with intelligent life then has a 95% confidence range of 0.78 M⊙ < M < 1.04 M⊙, and the range for planets with at least simple life is 0.57 M⊙ < M < 1.64 M⊙. Key Words: Habitability—Habitable zone—Anthropic—Red dwarfs—Initial mass function. Astrobiology 17, 61–77. PMID:28103107

  10. Water quality assessment by means of HFNI valvometry and high-frequency data modeling.

    PubMed

    Sow, Mohamedou; Durrieu, Gilles; Briollais, Laurent; Ciret, Pierre; Massabuau, Jean-Charles

    2011-11-01

    The high-frequency measurements of valve activity in bivalves (e.g., valvometry) over a long period of time and in various environmental conditions allow a very accurate study of their behaviors as well as a global analysis of possible perturbations due to the environment. Valvometry uses the bivalve's ability to close its shell when exposed to a contaminant or other abnormal environmental conditions as an alarm to indicate possible perturbations in the environment. The modeling of such high-frequency serial valvometry data is statistically challenging, and here, a nonparametric approach based on kernel estimation is proposed. This method has the advantage of summarizing complex data into a simple density profile obtained from each animal at every 24-h period to ultimately make inference about time effect and external conditions on this profile. The statistical properties of the estimator are presented. Through an application to a sample of 16 oysters living in the Bay of Arcachon (France), we demonstrate that this method can be used to first estimate the normal biological rhythms of permanently immersed oysters and second to detect perturbations of these rhythms due to changes in their environment. We anticipate that this approach could have an important contribution to the survey of aquatic systems.
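
    A minimal version of the daily density-profile idea (Python; the valve-opening values are simulated and the kernel bandwidth is left at its default, both assumptions for illustration):

        import numpy as np
        from scipy.stats import gaussian_kde

        # hypothetical valve-opening amplitudes for one oyster over one 24 h period
        rng = np.random.default_rng(3)
        opening = np.clip(np.concatenate([rng.normal(0.8, 0.10, 4000),    # mostly open
                                          rng.normal(0.1, 0.05, 800)]),   # closure episodes
                          0, 1)

        kde = gaussian_kde(opening)        # nonparametric estimate of the daily profile
        grid = np.linspace(0, 1, 101)
        profile = kde(grid)                # curve that can be compared from day to day
        print(grid[np.argmax(profile)])    # mode of the daily profile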

  11. Simple but accurate GCM-free approach for quantifying anthropogenic climate change

    NASA Astrophysics Data System (ADS)

    Lovejoy, S.

    2014-12-01

    We are so used to analysing the climate with the help of giant computer models (GCMs) that it is easy to get the impression that they are indispensable. Yet anthropogenic warming is so large (roughly 0.9 °C) that it turns out that it is straightforward to quantify it with more empirically based methodologies that can be readily understood by the layperson. The key is to use the CO2 forcing as a linear surrogate for all the anthropogenic effects from 1880 to the present (implicitly including all effects due to greenhouse gases, aerosols and land use changes). To a good approximation, double the economic activity, double the effects. The relationship between the forcing and global mean temperature is extremely linear, as can be seen graphically and understood without fancy statistics [Lovejoy, 2014a] (see the attached figure and http://www.physics.mcgill.ca/~gang/Lovejoy.htm). To an excellent approximation, the deviations from the linear forcing-temperature relation can be interpreted as the natural variability. For example, this direct yet accurate approach makes it graphically obvious that the "pause" or "hiatus" in the warming since 1998 is simply a natural cooling event that has roughly offset the anthropogenic warming [Lovejoy, 2014b]. Rather than trying to prove that the warming is anthropogenic, with a little extra work (and some nonlinear geophysics theory and pre-industrial multiproxies) we can disprove the competing theory that it is natural. This approach leads to the estimate that the probability of the industrial-scale warming being a giant natural fluctuation is ≈0.1%: it can be dismissed. This destroys the last climate-skeptic argument: that the models are wrong and the warming is natural. It finally allows for a closure of the debate. In this talk we argue that this new, direct, simple, intuitive approach provides an indispensable tool for communicating, and convincing, the public of both the reality and the amplitude of anthropogenic warming. References: Lovejoy, S. (2014a), Scaling fluctuation analysis and statistical hypothesis testing of anthropogenic warming, Climate Dynamics, 42, 2339-2351, doi: 10.1007/s00382-014-2128-2. Lovejoy, S. (2014b), Return periods of global climate fluctuations and the pause, Geophys. Res. Lett., 41, 4704-4710, doi: 10.1002/2014GL060478.
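
    The core of the argument, regressing global mean temperature on a CO2-based forcing surrogate and treating the residuals as natural variability, can be sketched as follows (Python; the series below are synthetic placeholders, not the observational records used by the author):

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(4)
        forcing = np.linspace(0.0, 2.5, 135)                      # forcing surrogate (W m^-2), 1880-2014
        temp = 0.4 * forcing + rng.normal(0, 0.1, forcing.size)   # temperature anomaly (K), toy data

        slope, intercept, r, p, se = stats.linregress(forcing, temp)
        anthropogenic = slope * forcing + intercept               # estimated forced response
        natural = temp - anthropogenic                            # residuals ~ natural variability
        print(f"sensitivity ~= {slope:.2f} K per W m^-2, "
              f"warming ~= {slope * (forcing[-1] - forcing[0]):.2f} K")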

  12. The transformed-stationary approach: a generic and simplified methodology for non-stationary extreme value analysis

    NASA Astrophysics Data System (ADS)

    Mentaschi, Lorenzo; Vousdoukas, Michalis; Voukouvalas, Evangelos; Sartini, Ludovica; Feyen, Luc; Besio, Giovanni; Alfieri, Lorenzo

    2016-09-01

    Statistical approaches to study extreme events require, by definition, long time series of data. In many scientific disciplines, these series are often subject to variations at different temporal scales that affect the frequency and intensity of their extremes. Therefore, the assumption of stationarity is violated and alternative methods to conventional stationary extreme value analysis (EVA) must be adopted. Using the example of environmental variables subject to climate change, in this study we introduce the transformed-stationary (TS) methodology for non-stationary EVA. This approach consists of (i) transforming a non-stationary time series into a stationary one, to which the stationary EVA theory can be applied, and (ii) reverse transforming the result into a non-stationary extreme value distribution. As a transformation, we propose and discuss a simple time-varying normalization of the signal and show that it enables a comprehensive formulation of non-stationary generalized extreme value (GEV) and generalized Pareto distribution (GPD) models with a constant shape parameter. A validation of the methodology is carried out on time series of significant wave height, residual water level, and river discharge, which show varying degrees of long-term and seasonal variability. The results from the proposed approach are comparable with the results from (a) a stationary EVA on quasi-stationary slices of non-stationary series and (b) the established method for non-stationary EVA. However, the proposed technique comes with advantages in both cases. For example, in contrast to (a), the proposed technique uses the whole time horizon of the series for the estimation of the extremes, allowing for a more accurate estimation of large return levels. Furthermore, with respect to (b), it decouples the detection of non-stationary patterns from the fitting of the extreme value distribution. As a result, the steps of the analysis are simplified and intermediate diagnostics are possible. In particular, the transformation can be carried out by means of simple statistical techniques such as low-pass filters based on the running mean and the standard deviation, and the fitting procedure is a stationary one with a few degrees of freedom and is easy to implement and control. An open-source MATLAB toolbox has been developed to cover this methodology, which is available at https://github.com/menta78/tsEva/ (Mentaschi et al., 2016).
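
    A rough Python sketch of the two TS steps (the reference implementation is the authors' MATLAB toolbox; the synthetic series, window lengths, and annual-maxima/GEV choice below are assumptions):

        import numpy as np
        import pandas as pd
        from scipy.stats import genextreme

        # hypothetical daily series with slow changes in both level and variability
        rng = np.random.default_rng(5)
        t = pd.date_range("1980-01-01", "2019-12-31", freq="D")
        n = t.size
        x = 1.5 + 0.00004 * np.arange(n) + rng.gumbel(0, 0.3 + 0.00001 * np.arange(n))
        s = pd.Series(x, index=t)

        # step (i): transform to an approximately stationary series with running statistics
        mu = s.rolling(365, center=True, min_periods=180).mean()
        sd = s.rolling(365, center=True, min_periods=180).std()
        y = ((s - mu) / sd).dropna()

        # step (ii): stationary EVA on the transformed series (annual maxima + GEV fit),
        # then map the return level back through the time-varying transformation
        am = y.groupby(y.index.year).max()
        c, loc, scale = genextreme.fit(am)
        rl100_std = genextreme.ppf(1 - 1 / 100, c, loc=loc, scale=scale)
        rl100_2019 = rl100_std * sd.loc["2019"].mean() + mu.loc["2019"].mean()
        print(round(float(rl100_2019), 2))   # approximate 100-year level for 2019 conditions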

  13. Humans make efficient use of natural image statistics when performing spatial interpolation.

    PubMed

    D'Antona, Anthony D; Perry, Jeffrey S; Geisler, Wilson S

    2013-12-16

    Visual systems learn through evolution and experience over the lifespan to exploit the statistical structure of natural images when performing visual tasks. Understanding which aspects of this statistical structure are incorporated into the human nervous system is a fundamental goal in vision science. To address this goal, we measured human ability to estimate the intensity of missing image pixels in natural images. Human estimation accuracy is compared with various simple heuristics (e.g., local mean) and with optimal observers that have nearly complete knowledge of the local statistical structure of natural images. Human estimates are more accurate than those of simple heuristics, and they match the performance of an optimal observer that knows the local statistical structure of relative intensities (contrasts). This optimal observer predicts the detailed pattern of human estimation errors and hence the results place strong constraints on the underlying neural mechanisms. However, humans do not reach the performance of an optimal observer that knows the local statistical structure of the absolute intensities, which reflect both local relative intensities and local mean intensity. As predicted from a statistical analysis of natural images, human estimation accuracy is negligibly improved by expanding the context from a local patch to the whole image. Our results demonstrate that the human visual system exploits efficiently the statistical structure of natural images.

  14. Morse Code, Scrabble, and the Alphabet

    ERIC Educational Resources Information Center

    Richardson, Mary; Gabrosek, John; Reischman, Diann; Curtiss, Phyliss

    2004-01-01

    In this paper we describe an interactive activity that illustrates simple linear regression. Students collect data and analyze it using simple linear regression techniques taught in an introductory applied statistics course. The activity is extended to illustrate checks for regression assumptions and regression diagnostics taught in an…

  15. Computational Modeling of Statistical Learning: Effects of Transitional Probability versus Frequency and Links to Word Learning

    ERIC Educational Resources Information Center

    Mirman, Daniel; Estes, Katharine Graf; Magnuson, James S.

    2010-01-01

    Statistical learning mechanisms play an important role in theories of language acquisition and processing. Recurrent neural network models have provided important insights into how these mechanisms might operate. We examined whether such networks capture two key findings in human statistical learning. In Simulation 1, a simple recurrent network…

  16. A Laboratory Experiment, Based on the Maillard Reaction, Conducted as a Project in Introductory Statistics

    ERIC Educational Resources Information Center

    Kravchuk, Olena; Elliott, Antony; Bhandari, Bhesh

    2005-01-01

    A simple laboratory experiment, based on the Maillard reaction, served as a project in Introductory Statistics for undergraduates in Food Science and Technology. By using the principles of randomization and replication and reflecting on the sources of variation in the experimental data, students reinforced the statistical concepts and techniques…

  17. A Semiparametric Approach for Composite Functional Mapping of Dynamic Quantitative Traits

    PubMed Central

    Yang, Runqing; Gao, Huijiang; Wang, Xin; Zhang, Ji; Zeng, Zhao-Bang; Wu, Rongling

    2007-01-01

    Functional mapping has emerged as a powerful tool for mapping quantitative trait loci (QTL) that control developmental patterns of complex dynamic traits. Original functional mapping has been constructed within the context of simple interval mapping, without consideration of separate multiple linked QTL for a dynamic trait. In this article, we present a statistical framework for mapping QTL that affect dynamic traits by capitalizing on the strengths of functional mapping and composite interval mapping. Within this so-called composite functional-mapping framework, functional mapping models the time-dependent genetic effects of a QTL tested within a marker interval using a biologically meaningful parametric function, whereas composite interval mapping models the time-dependent genetic effects of the markers outside the test interval to control the genome background using a flexible nonparametric approach based on Legendre polynomials. Such a semiparametric framework was formulated by a maximum-likelihood model and implemented with the EM algorithm, allowing for the estimation and the test of the mathematical parameters that define the QTL effects and the regression coefficients of the Legendre polynomials that describe the marker effects. Simulation studies were performed to investigate the statistical behavior of composite functional mapping and compare its advantage in separating multiple linked QTL as compared to functional mapping. We used the new mapping approach to analyze a genetic mapping example in rice, leading to the identification of multiple QTL, some of which are linked on the same chromosome, that control the developmental trajectory of leaf age. PMID:17947431
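
    As a small illustration of the nonparametric half of this framework, a Legendre expansion of a time-varying marker effect can be evaluated as follows (Python; the times and coefficients are arbitrary placeholders):

        import numpy as np
        from numpy.polynomial import legendre

        t = np.linspace(0, 10, 6)                         # measurement times
        u = 2 * (t - t.min()) / (t.max() - t.min()) - 1   # rescaled to [-1, 1]
        coef = np.array([0.5, -0.2, 0.1])                 # fitted order-2 Legendre coefficients

        marker_effect = legendre.legval(u, coef)          # time-dependent marker effect curve
        print(np.round(marker_effect, 3))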

  18. Path planning in uncertain flow fields using ensemble method

    NASA Astrophysics Data System (ADS)

    Wang, Tong; Le Maître, Olivier P.; Hoteit, Ibrahim; Knio, Omar M.

    2016-10-01

    An ensemble-based approach is developed to conduct optimal path planning in unsteady ocean currents under uncertainty. We focus our attention on two-dimensional steady and unsteady uncertain flows, and adopt a sampling methodology that is well suited to operational forecasts, where an ensemble of deterministic predictions is used to model and quantify uncertainty. In an operational setting, much about dynamics, topography, and forcing of the ocean environment is uncertain. To address this uncertainty, the flow field is parametrized using a finite number of independent canonical random variables with known densities, and the ensemble is generated by sampling these variables. For each of the resulting realizations of the uncertain current field, we predict the path that minimizes the travel time by solving a boundary value problem (BVP), based on the Pontryagin maximum principle. A family of backward-in-time trajectories starting at the end position is used to generate suitable initial values for the BVP solver. This allows us to examine and analyze the performance of the sampling strategy and to develop insight into extensions dealing with general circulation ocean models. In particular, the ensemble method enables us to perform a statistical analysis of travel times and consequently develop a path planning approach that accounts for these statistics. The proposed methodology is tested for a number of scenarios. We first validate our algorithms by reproducing simple canonical solutions, and then demonstrate our approach in more complex flow fields, including idealized, steady and unsteady double-gyre flows.

  19. Statistical bias correction method applied on CMIP5 datasets over the Indian region during the summer monsoon season for climate change applications

    NASA Astrophysics Data System (ADS)

    Prasanna, V.

    2018-01-01

    This study makes use of temperature and precipitation from CMIP5 climate model output for climate change application studies over the Indian region during the summer monsoon season (JJAS). Bias correction of temperature and precipitation from CMIP5 GCM simulation results with respect to observation is discussed in detail. The non-linear statistical bias correction is a suitable bias correction method for climate change data because it is simple and does not add artificial uncertainties to the impact assessment of climate change scenarios for climate change application studies (agricultural production changes) in the future. The simple statistical bias correction uses observational constraints on the GCM baseline, and the projected results are scaled with respect to the changing magnitude in future scenarios, varying from one model to the other. Two types of bias correction techniques are shown here: (1) a simple bias correction using a percentile-based quantile-mapping algorithm and (2) a simple but improved bias correction method, a cumulative distribution function (CDF; Weibull distribution function)-based quantile-mapping algorithm. This study shows that the percentile-based quantile mapping method gives results similar to the CDF (Weibull)-based quantile mapping method, and both the methods are comparable. The bias correction is applied on temperature and precipitation variables for present climate and future projected data to make use of it in a simple statistical model to understand the future changes in crop production over the Indian region during the summer monsoon season. In total, 12 CMIP5 models are used for Historical (1901-2005), RCP4.5 (2005-2100), and RCP8.5 (2005-2100) scenarios. The climate index from each CMIP5 model and the observed agricultural yield index over the Indian region are used in a regression model to project the changes in the agricultural yield over India from RCP4.5 and RCP8.5 scenarios. The results revealed a better convergence of model projections in the bias corrected data compared to the uncorrected data. The study can be extended to localized regional domains aimed at understanding the changes in the agricultural productivity in the future with an agro-economy or a simple statistical model. The statistical model indicated that the total food grain yield is going to increase over the Indian region in the future; the increase is approximately 50 kg/ha for the RCP4.5 scenario from 2001 until the end of 2100, and approximately 90 kg/ha for the RCP8.5 scenario over the same period. There are many studies using bias correction techniques, but this study applies the bias correction technique to future climate scenario data from CMIP5 models and applies it to crop statistics to find future crop yield changes over the Indian region.
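
    A compact version of the percentile-based quantile mapping described above (Python; the rainfall samples are synthetic and the 99-quantile grid is an assumption):

        import numpy as np

        def quantile_map(model_hist, obs_hist, model_scen, n_q=99):
            # empirical quantile mapping: find each scenario value's percentile in the
            # historical model distribution, then read off the observed quantile there
            q = np.linspace(1, n_q, n_q)
            mq, oq = np.percentile(model_hist, q), np.percentile(obs_hist, q)
            p = np.interp(model_scen, mq, q)       # values beyond the range are clamped
            return np.interp(p, q, oq)

        rng = np.random.default_rng(6)
        obs_hist = rng.gamma(2.0, 4.0, 3000)            # observed JJAS rainfall (toy)
        model_hist = rng.gamma(2.0, 3.0, 3000) + 1.0    # biased GCM, historical period
        model_rcp = rng.gamma(2.2, 3.2, 3000) + 1.0     # biased GCM, scenario period

        corrected = quantile_map(model_hist, obs_hist, model_rcp)
        print(model_rcp.mean(), corrected.mean(), obs_hist.mean())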

  20. Outbreak statistics and scaling laws for externally driven epidemics.

    PubMed

    Singh, Sarabjeet; Myers, Christopher R

    2014-04-01

    Power-law scalings are ubiquitous to physical phenomena undergoing a continuous phase transition. The classic susceptible-infectious-recovered (SIR) model of epidemics is one such example where the scaling behavior near a critical point has been studied extensively. In this system the distribution of outbreak sizes scales as P(n) ~ n^(-3/2) at the critical point as the system size N becomes infinite. The finite-size scaling laws for the outbreak size and duration are also well understood and characterized. In this work, we report scaling laws for a model with SIR structure coupled with a constant force of infection per susceptible, akin to a "reservoir forcing". We find that the statistics of outbreaks in this system fundamentally differ from those in a simple SIR model. Instead of fixed exponents, all scaling laws exhibit tunable exponents parameterized by the dimensionless rate of external forcing. As the external driving rate approaches a critical value, the scale of the average outbreak size converges to that of the maximal size, and above the critical point, the scaling laws bifurcate into two regimes. Whereas a simple SIR process can only exhibit outbreaks of size O(N^(1/3)) and O(N) depending on whether the system is at or above the epidemic threshold, a driven SIR process can exhibit a richer spectrum of outbreak sizes that scale as O(N^ξ), where ξ ∈ (0,1]∖{2/3}, and O((N/ln N)^(2/3)) at the multicritical point.

  1. Detection of outliers in the response and explanatory variables of the simple circular regression model

    NASA Astrophysics Data System (ADS)

    Mahmood, Ehab A.; Rana, Sohel; Hussin, Abdul Ghapor; Midi, Habshah

    2016-06-01

    The circular regression model may contain one or more data points which appear to be peculiar or inconsistent with the main part of the model. This may occur due to recording errors, sudden short events, sampling under abnormal conditions, etc. The existence of these data points ("outliers") in the data set causes many problems in the research results and conclusions. Therefore, we should identify them before applying statistical analysis. In this article, we aim to propose a statistic to identify outliers in both the response and explanatory variables of the simple circular regression model. Our proposed statistic is the robust circular distance RCDxy, and it is justified by three robustness measures: the proportion of detected outliers and the masking and swamping rates.

  2. Use of iPhone technology in improving acetabular component position in total hip arthroplasty.

    PubMed

    Tay, Xiau Wei; Zhang, Benny Xu; Gayagay, George

    2017-09-01

    Improper acetabular cup positioning is associated with a high risk of complications after total hip arthroplasty. The aim of our study is to objectively compare 3 methods, namely (1) free hand, (2) alignment jig (Sputnik), and (3) iPhone application, to identify an easy, reproducible, and accurate method of improving acetabular cup placement. We designed a simple setup and carried out a simple experiment (see Method section). Using statistical analysis, the difference in inclination angles using the iPhone application compared with the freehand method was found to be statistically significant (F[2, 51] = 4.17, P = .02) in the "untrained" group. No statistically significant difference was detected for the other groups. This suggests a potential role for iPhone applications in helping junior surgeons overcome the steep learning curve.

  3. Statistical iterative material image reconstruction for spectral CT using a semi-empirical forward model

    NASA Astrophysics Data System (ADS)

    Mechlem, Korbinian; Ehn, Sebastian; Sellerer, Thorsten; Pfeiffer, Franz; Noël, Peter B.

    2017-03-01

    In spectral computed tomography (spectral CT), the additional information about the energy dependence of attenuation coefficients can be exploited to generate material selective images. These images have found applications in various areas such as artifact reduction, quantitative imaging or clinical diagnosis. However, significant noise amplification on material decomposed images remains a fundamental problem of spectral CT. Most spectral CT algorithms separate the process of material decomposition and image reconstruction. Separating these steps is suboptimal because the full statistical information contained in the spectral tomographic measurements cannot be exploited. Statistical iterative reconstruction (SIR) techniques provide an alternative, mathematically elegant approach to obtaining material selective images with improved tradeoffs between noise and resolution. Furthermore, image reconstruction and material decomposition can be performed jointly. This is accomplished by a forward model which directly connects the (expected) spectral projection measurements and the material selective images. To obtain this forward model, detailed knowledge of the different photon energy spectra and the detector response was assumed in previous work. However, accurately determining the spectrum is often difficult in practice. In this work, a new algorithm for statistical iterative material decomposition is presented. It uses a semi-empirical forward model which relies on simple calibration measurements. Furthermore, an efficient optimization algorithm based on separable surrogate functions is employed. This partially negates one of the major shortcomings of SIR, namely high computational cost and long reconstruction times. Numerical simulations and real experiments show strongly improved image quality and reduced statistical bias compared to projection-based material decomposition.

  4. Evaluating statistical cloud schemes: What can we gain from ground-based remote sensing?

    NASA Astrophysics Data System (ADS)

    Grützun, V.; Quaas, J.; Morcrette, C. J.; Ament, F.

    2013-09-01

    Statistical cloud schemes with prognostic probability distribution functions have become more important in atmospheric modeling, especially since they are in principle scale adaptive and capture cloud physics in more detail. While in theory the schemes have great potential, their accuracy is still questionable. High-resolution three-dimensional observational data of water vapor and cloud water, which could be used for testing them, are missing. We explore the potential of ground-based remote sensing such as lidar, microwave, and radar to evaluate prognostic distribution moments using the "perfect model approach." This means that we employ a high-resolution weather model as virtual reality and retrieve full three-dimensional atmospheric quantities and virtual ground-based observations. We then use statistics from the virtual observations to validate the modeled 3-D statistics. Since the data are entirely consistent, any discrepancy that occurs is due to the method. Focusing on the total water mixing ratio, we find that its mean can be evaluated reasonably well, but whether the variance and skewness are reliable depends strongly on the meteorological conditions. Using a simple schematic description of different synoptic conditions, we show how statistics obtained from point or line measurements can be poor at representing the full three-dimensional distribution of water in the atmosphere. We argue that a careful analysis of measurement data and detailed knowledge of the meteorological situation are necessary to judge whether we can use the data for an evaluation of higher moments of the humidity distribution used by a statistical cloud scheme.

  5. A weighted generalized score statistic for comparison of predictive values of diagnostic tests.

    PubMed

    Kosinski, Andrzej S

    2013-03-15

    Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations that are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we presented, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic that incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, always reduces to the score statistic in the independent samples situation, and preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe that the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the WGS test statistic in a general GEE setting. Copyright © 2012 John Wiley & Sons, Ltd.

  6. A weighted generalized score statistic for comparison of predictive values of diagnostic tests

    PubMed Central

    Kosinski, Andrzej S.

    2013-01-01

    Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations which are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we present, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic which incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, it always reduces to the score statistic in the independent samples situation, and it preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the weighted generalized score test statistic in a general GEE setting. PMID:22912343

  7. Correlation and simple linear regression.

    PubMed

    Zou, Kelly H; Tuncali, Kemal; Silverman, Stuart G

    2003-06-01

    In this tutorial article, the concepts of correlation and regression are reviewed and demonstrated. The authors review and compare two correlation coefficients, the Pearson correlation coefficient and the Spearman rho, for measuring linear and nonlinear relationships between two continuous variables. In the case of measuring the linear relationship between a predictor and an outcome variable, simple linear regression analysis is conducted. These statistical concepts are illustrated by using a data set from published literature to assess a computed tomography-guided interventional technique. These statistical methods are important for exploring the relationships between variables and can be applied to many radiologic studies.
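
    A minimal sketch (not from the cited tutorial) of how the two correlation coefficients and a simple linear regression can be computed with SciPy; the small x and y arrays are hypothetical stand-ins for the radiologic data.

      import numpy as np
      from scipy import stats

      # Hypothetical predictor and outcome values
      x = np.array([1.2, 2.3, 2.9, 3.8, 4.1, 5.0, 6.2])
      y = np.array([2.0, 2.8, 3.9, 4.2, 4.8, 6.1, 6.9])

      pearson_r, pearson_p = stats.pearsonr(x, y)       # linear association
      spearman_rho, spearman_p = stats.spearmanr(x, y)  # monotonic (rank) association

      # Simple linear regression y = intercept + slope * x
      fit = stats.linregress(x, y)
      print(f"Pearson r = {pearson_r:.3f} (p = {pearson_p:.3g})")
      print(f"Spearman rho = {spearman_rho:.3f} (p = {spearman_p:.3g})")
      print(f"slope = {fit.slope:.3f}, intercept = {fit.intercept:.3f}, R^2 = {fit.rvalue**2:.3f}")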

  8. Building gene expression profile classifiers with a simple and efficient rejection option in R.

    PubMed

    Benso, Alfredo; Di Carlo, Stefano; Politano, Gianfranco; Savino, Alessandro; Hafeezurrehman, Hafeez

    2011-01-01

    The collection of gene expression profiles from DNA microarrays and their analysis with pattern recognition algorithms is a powerful technology applied to several biological problems. Common pattern recognition systems classify samples by assigning them to a set of known classes. However, in a clinical diagnostics setup, novel and unknown classes (new pathologies) may appear and one must be able to reject those samples that do not fit the trained model. The problem of implementing a rejection option in a multi-class classifier has not been widely addressed in the statistical literature. Gene expression profiles represent a critical case study since they suffer from the curse of dimensionality, which negatively affects the reliability of both traditional rejection models and more recent approaches such as one-class classifiers. This paper presents a set of empirical decision rules that can be used to implement a rejection option in a set of multi-class classifiers widely used for the analysis of gene expression profiles. In particular, we focus on the classifiers implemented in the R Language and Environment for Statistical Computing (R for short in the remainder of this paper). The main contribution of the proposed rules is their simplicity, which enables easy integration with available data analysis environments. Since tuning the parameters involved in defining a rejection model is often a complex and delicate task, we exploit an evolutionary strategy to automate this process. This allows the final user to maximize the rejection accuracy with minimum manual intervention. This paper shows how simple decision rules can help in applying complex machine learning algorithms in real experimental setups. The proposed approach is almost completely automated and therefore a good candidate for integration in data analysis flows in labs where the machine learning expertise required to tune traditional classifiers might not be available.
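
    A hedged sketch of the general idea of a rejection option, using scikit-learn and Python rather than the R classifiers discussed in the paper, and a generic posterior-threshold rule rather than the authors' empirical rules; the data, class structure, and threshold are all hypothetical.

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier

      rng = np.random.default_rng(0)
      # Hypothetical expression profiles: two known classes for training, one novel class at test time
      X_known = np.vstack([rng.normal(0.0, 1.0, (60, 50)), rng.normal(2.0, 1.0, (60, 50))])
      y_known = np.array([0] * 60 + [1] * 60)
      X_novel = rng.normal(1.0, 1.0, (20, 50))      # samples from an unseen "pathology"

      clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_known, y_known)

      def predict_with_rejection(clf, X, threshold=0.8):
          """Return predicted labels, or -1 (reject) when the best class probability is below threshold."""
          proba = clf.predict_proba(X)
          labels = proba.argmax(axis=1)
          labels[proba.max(axis=1) < threshold] = -1
          return labels

      print(predict_with_rejection(clf, X_novel))   # ambiguous samples come back as -1 ("rejected")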

  9. Evaluating and learning from RNA pseudotorsional space: quantitative validation of a reduced representation for RNA structure.

    PubMed

    Wadley, Leven M; Keating, Kevin S; Duarte, Carlos M; Pyle, Anna Marie

    2007-09-28

    Quantitatively describing RNA structure and conformational elements remains a formidable problem. Seven standard torsion angles and the sugar pucker are necessary to characterize the conformation of an RNA nucleotide completely. Progress has been made toward understanding the discrete nature of RNA structure, but classifying simple and ubiquitous structural elements such as helices and motifs remains a difficult task. One approach for describing RNA structure in a simple, mathematically consistent, and computationally accessible manner involves the invocation of two pseudotorsions, eta (C4'(n-1), P(n), C4'(n), P(n+1)) and theta (P(n), C4'(n), P(n+1), C4'(n+1)), which can be used to describe RNA conformation in much the same way that phi and psi are used to describe backbone configuration of proteins. Here, we conduct an exploration and statistical evaluation of pseudotorsional space and of the Ramachandran-like eta-theta plot. We show that, through the rigorous quantitative analysis of the eta-theta plot, the pseudotorsional descriptors eta and theta, together with sugar pucker, are sufficient to describe RNA backbone conformation fully in most cases. These descriptors are also shown to contain considerable information about nucleotide base conformation, revealing a previously uncharacterized interplay between backbone and base orientation. A window function analysis is used to discern statistically relevant regions of density in the eta-theta scatter plot and then nucleotides in colocalized clusters in the eta-theta plane are shown to have similar 3-D structures through RMSD analysis of the RNA structural constituents. We find that major clusters in the eta-theta plot are few, underscoring the discrete nature of RNA backbone conformation. Like the Ramachandran plot, the eta-theta plot is a valuable system for conceptualizing biomolecular conformation, it is a useful tool for analyzing RNA tertiary structures, and it is a vital component of new approaches for solving the 3-D structures of large RNA molecules and RNA assemblies.
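
    A brief sketch (not the authors' code) of how the pseudotorsions eta and theta could be computed from backbone atom coordinates; the dihedral routine follows the standard torsion-angle formula, and the coordinates below are purely illustrative.

      import numpy as np

      def dihedral(p0, p1, p2, p3):
          """Torsion angle in degrees defined by four 3-D points (IUPAC sign convention)."""
          b0 = -1.0 * (p1 - p0)
          b1 = p2 - p1
          b2 = p3 - p2
          b1 = b1 / np.linalg.norm(b1)
          v = b0 - np.dot(b0, b1) * b1
          w = b2 - np.dot(b2, b1) * b1
          return np.degrees(np.arctan2(np.dot(np.cross(b1, v), w), np.dot(v, w)))

      # eta uses C4'(n-1), P(n), C4'(n), P(n+1); theta uses P(n), C4'(n), P(n+1), C4'(n+1)
      def eta_theta(c4_prev, p_n, c4_n, p_next, c4_next):
          return (dihedral(c4_prev, p_n, c4_n, p_next),
                  dihedral(p_n, c4_n, p_next, c4_next))

      # Hypothetical coordinates (angstroms) standing in for PDB-derived atom positions
      atoms = [np.array(v) for v in [(0.0, 0.0, 0.0), (1.5, 0.2, 0.1),
                                     (2.3, 1.4, 0.8), (3.9, 1.6, 0.5), (4.8, 2.9, 1.2)]]
      print(eta_theta(*atoms))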

  10. The ambient dose equivalent at flight altitudes: a fit to a large set of data using a Bayesian approach.

    PubMed

    Wissmann, F; Reginatto, M; Möller, T

    2010-09-01

    The problem of finding a simple, generally applicable description of worldwide measured ambient dose equivalent rates at aviation altitudes between 8 and 12 km is difficult to solve due to the large variety of functional forms and parametrisations that are possible. We present an approach that uses Bayesian statistics and Monte Carlo methods to fit mathematical models to a large set of data and to compare the different models. About 2500 data points measured in the periods 1997-1999 and 2003-2006 were used. Since the data cover wide ranges of barometric altitude, vertical cut-off rigidity and phases in the solar cycle 23, we developed functions which depend on these three variables. Whereas the dependence on the vertical cut-off rigidity is described by an exponential, the dependences on barometric altitude and solar activity may be approximated by linear functions in the ranges under consideration. Therefore, a simple Taylor expansion was used to define different models and to investigate the relevance of the different expansion coefficients. With the method presented here, it is possible to obtain probability distributions for each expansion coefficient and thus to extract reliable uncertainties even for the dose rate evaluated. The resulting function agrees well with new measurements made at fixed geographic positions and during long haul flights covering a wide range of latitudes.

  11. The differential impact of the financial crisis on Health in Ireland and Greece: A quasi-experimental approach

    PubMed Central

    Hessel, P; Vandoros, S; Avendano, M

    2015-01-01

    Objectives: Greece and Ireland suffered an economic recession of similar magnitude, but whether their health has deteriorated as a result has not yet been well established. Study design: Based on five waves (2006-2010) of the European Union Statistics of Income and Living Conditions (EU-SILC) survey, we implemented a difference-in-differences (DID) approach that compared trends in self-rated health in Greece and Ireland before and after the crisis with trends in a control population (Poland) that did not experience a recession and had health trends comparable to both countries before the crisis. Methods: Logistic regression using a difference-in-differences (DID) approach. Results: A simple examination of trends suggests that there was no significant change in health in Greece or Ireland following the onset of the financial crisis. However, DID estimates that incorporated a control population suggest an increase in the prevalence of poor self-rated health in Greece (OR = 1.216; CI = 1.11-1.32). Effects were most pronounced for older individuals and those living in high-density areas, but effects in Greece were overwhelmingly consistent across different population sub-groups. In contrast, DID estimates revealed no effect of the financial crisis in Ireland (OR = 0.97; CI = 0.81-1.16). Conclusions: Contradicting results from a simple comparison of single-country trends, DID estimates suggest that the financial crisis has led to a deterioration of population health trends in Greece but not in Ireland, where policies may have prevented a worsening of health as a result of the recent economic crisis. PMID:25369355
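
    A hedged sketch of a difference-in-differences logistic model of this general kind (not the authors' analysis); the simulated data frame, the variable names (poor_health, treated, post), and the use of statsmodels are all assumptions.

      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(0)
      n = 4000
      # Hypothetical panel: treated = 1 for the crisis country, post = 1 after crisis onset
      df = pd.DataFrame({
          "treated": rng.integers(0, 2, n),
          "post": rng.integers(0, 2, n),
          "age": rng.integers(25, 80, n),
      })
      # Simulate poor self-rated health with a small interaction (DID) effect
      logit = -1.0 + 0.2*df.treated + 0.1*df.post + 0.25*df.treated*df.post + 0.01*(df.age - 50)
      df["poor_health"] = rng.binomial(1, 1/(1 + np.exp(-logit)))

      # The DID effect is carried by the treated:post interaction term
      model = smf.logit("poor_health ~ treated * post + age", data=df).fit(disp=0)
      print(np.exp(model.params["treated:post"]))   # odds ratio for the interaction
      print(model.conf_int().loc["treated:post"])   # 95% CI on the log-odds scale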

  12. Experimental designs and risk assessment in combination toxicology: panel discussion.

    PubMed

    Henschler, D; Bolt, H M; Jonker, D; Pieters, M N; Groten, J P

    1996-01-01

    Advancing our knowledge of the toxicology of combined exposures to chemicals and implementation of this knowledge in guidelines for health risk assessment of such combined exposures are necessities dictated by the simple fact that humans are continuously exposed to a multitude of chemicals. A prerequisite for successful research and fruitful discussions on the toxicology of combined exposures (mixtures of chemicals) is the use of defined terminology implemented by an authoritative international body such as, for example, the International Union of Pure and Applied Chemistry (IUPAC) Toxicology Committee. The extreme complexity of mixture toxicology calls for new research methodologies to study interactive effects, taking into account limited resources. Of these methodologies, statistical designs and mathematical modelling of toxicokinetics and toxicodynamics seem to be most promising. Emphasis should be placed on low-dose modelling and experimental validation. The scientifically sound so-called bottom-up approach should be supplemented with more pragmatic approaches, focusing on selection of the most hazardous chemicals in a mixture and careful consideration of the mode of action and possible interactive effects of these chemicals. Pragmatic approaches may be of particular importance to study and evaluate complex mixtures; after identification of the 'top ten' (most risky) chemicals in the mixture, they can be examined and evaluated as a defined (simple) chemical mixture. In setting exposure limits for individual chemicals, the use of an additional safety factor to compensate for potential increased risk due to simultaneous exposure to other chemicals has no clear scientific justification. The use of such an additional factor is a political rather than a scientific choice.

  13. Detecting single-trial EEG evoked potential using a wavelet domain linear mixed model: application to error potentials classification.

    PubMed

    Spinnato, J; Roubaud, M-C; Burle, B; Torrésani, B

    2015-06-01

    The main goal of this work is to develop a model for multisensor signals, such as magnetoencephalography or electroencephalography (EEG) signals, that accounts for inter-trial variability and is suitable for the corresponding binary classification problems. An important constraint is that the model be simple enough to handle small and unbalanced datasets, as often encountered in BCI-type experiments. The method involves the linear mixed effects statistical model, wavelet transform, and spatial filtering, and aims at the characterization of localized discriminant features in multisensor signals. After discrete wavelet transform and spatial filtering, a projection onto the relevant wavelet and spatial channel subspaces is used for dimension reduction. The projected signals are then decomposed as the sum of a signal of interest (i.e., discriminant) and background noise, using a very simple Gaussian linear mixed model. Thanks to the simplicity of the model, the corresponding parameter estimation problem is simplified. Robust estimates of class-covariance matrices are obtained from small sample sizes and an effective Bayes plug-in classifier is derived. The approach is applied to the detection of error potentials in multichannel EEG data in a very unbalanced situation (detection of rare events). Classification results prove the relevance of the proposed approach in such a context. The combination of the linear mixed model, wavelet transform and spatial filtering for EEG classification is, to the best of our knowledge, an original approach, which is proven to be effective. This paper improves upon earlier results on similar problems, and the three main ingredients all play an important role.

  14. Measuring the statistical validity of summary meta‐analysis and meta‐regression results for use in clinical practice

    PubMed Central

    Riley, Richard D.

    2017-01-01

    An important question for clinicians appraising a meta‐analysis is: are the findings likely to be valid in their own practice—does the reported effect accurately represent the effect that would occur in their own clinical population? To this end we advance the concept of statistical validity—where the parameter being estimated equals the corresponding parameter for a new independent study. Using a simple (‘leave‐one‐out’) cross‐validation technique, we demonstrate how we may test meta‐analysis estimates for statistical validity using a new validation statistic, Vn, and derive its distribution. We compare this with the usual approach of investigating heterogeneity in meta‐analyses and demonstrate the link between statistical validity and homogeneity. Using a simulation study, the properties of Vn and the Q statistic are compared for univariate random effects meta‐analysis and a tailored meta‐regression model, where information from the setting (included as model covariates) is used to calibrate the summary estimate to the setting of application. Their properties are found to be similar when there are 50 studies or more, but for fewer studies Vn has greater power but a higher type 1 error rate than Q. The power and type 1 error rate of Vn are also shown to depend on the within‐study variance, between‐study variance, study sample size, and the number of studies in the meta‐analysis. Finally, we apply Vn to two published meta‐analyses and conclude that it usefully augments standard methods when deciding upon the likely validity of summary meta‐analysis estimates in clinical practice. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:28620945
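
    The 'leave-one-out' idea can be illustrated with a generic sketch (this is not the Vn statistic itself): each study is left out of a DerSimonian-Laird random-effects meta-analysis of the remaining studies, and its standardized prediction error is computed. The effect estimates and variances below are hypothetical.

      import numpy as np

      def dersimonian_laird(y, v):
          """Random-effects summary (DerSimonian-Laird) for effects y with within-study variances v."""
          w = 1.0 / v
          mu_fixed = np.sum(w * y) / np.sum(w)
          q = np.sum(w * (y - mu_fixed) ** 2)
          c = np.sum(w) - np.sum(w**2) / np.sum(w)
          tau2 = max(0.0, (q - (len(y) - 1)) / c)
          w_star = 1.0 / (v + tau2)
          mu = np.sum(w_star * y) / np.sum(w_star)
          return mu, 1.0 / np.sum(w_star), tau2

      def loo_prediction_errors(y, v):
          """Standardized leave-one-out prediction error for each study."""
          z = []
          for i in range(len(y)):
              keep = np.arange(len(y)) != i
              mu, var_mu, tau2 = dersimonian_laird(y[keep], v[keep])
              z.append((y[i] - mu) / np.sqrt(v[i] + tau2 + var_mu))
          return np.array(z)

      # Hypothetical log-odds-ratio estimates and within-study variances
      y = np.array([0.22, 0.10, 0.35, -0.05, 0.18, 0.27])
      v = np.array([0.02, 0.05, 0.04, 0.03, 0.06, 0.02])
      print(loo_prediction_errors(y, v))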

  15. Evaluation of trace analyte identification in complex matrices by low-resolution gas chromatography--Mass spectrometry through signal simulation.

    PubMed

    Bettencourt da Silva, Ricardo J N

    2016-04-01

    The identification of trace levels of compounds in complex matrices by conventional low-resolution gas chromatography hyphenated with mass spectrometry is based on the comparison of retention times and abundance ratios of characteristic mass spectrum fragments of analyte peaks from calibrators with those of sample peaks. Statistically sound criteria for the comparison of these parameters were developed based on the normal distribution of retention times and the simulation of possibly non-normal distributions of correlated abundance ratios. The confidence level used to set the statistical maximum and minimum limits of the parameters defines the true positive rate of identifications. The false positive rate of identification was estimated from worst-case signal noise models. The estimated true and false positive identification rates from one retention time and two correlated ratios of three fragment abundances were combined using simple Bayes' statistics to estimate the probability of the compound identification being correct, designated the examination uncertainty. Models of the variation of examination uncertainty with analyte quantity allowed the estimation of the Limit of Examination as the lowest quantity that produces "Extremely strong" evidence of compound presence. User-friendly MS-Excel files are made available to allow the easy application of the developed approach in routine and research laboratories. The developed approach was successfully applied to the identification of chlorpyrifos-methyl and malathion in QuEChERS method extracts of vegetables with high water content, for which the estimated Limits of Examination are 0.14 mg/kg and 0.23 mg/kg, respectively. Copyright © 2015 Elsevier B.V. All rights reserved.
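
    A toy numerical sketch of the simple Bayes' combination step described above, with hypothetical true/false positive rates (not values from the paper): the prior odds of the compound being present are multiplied by the likelihood ratio of each passed check.

      # Combine a retention-time check and two abundance-ratio checks with Bayes' rule.
      # Each check passes with rate tp when the analyte is present and fp when it is absent.
      def posterior_presence(prior, checks):
          """checks: list of (tp, fp) pairs for tests that all 'passed'."""
          odds = prior / (1.0 - prior)
          for tp, fp in checks:
              odds *= tp / fp          # likelihood ratio of a passed check
          return odds / (1.0 + odds)

      # Hypothetical rates: retention time and two ion-abundance ratios
      checks = [(0.99, 0.05), (0.95, 0.10), (0.95, 0.10)]
      print(posterior_presence(prior=0.5, checks=checks))  # probability the identification is correct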

  16. Critical Fluctuations in Cortical Models Near Instability

    PubMed Central

    Aburn, Matthew J.; Holmes, C. A.; Roberts, James A.; Boonstra, Tjeerd W.; Breakspear, Michael

    2012-01-01

    Computational studies often proceed from the premise that cortical dynamics operate in a linearly stable domain, where fluctuations dissipate quickly and show only short memory. Studies of human electroencephalography (EEG), however, have shown significant autocorrelation at time lags on the scale of minutes, indicating the need to consider regimes where non-linearities influence the dynamics. Statistical properties such as increased autocorrelation length, increased variance, power law scaling, and bistable switching have been suggested as generic indicators of the approach to bifurcation in non-linear dynamical systems. We study temporal fluctuations in a widely-employed computational model (the Jansen–Rit model) of cortical activity, examining the statistical signatures that accompany bifurcations. Approaching supercritical Hopf bifurcations through tuning of the background excitatory input, we find a dramatic increase in the autocorrelation length that depends sensitively on the direction in phase space of the input fluctuations and hence on which neuronal subpopulation is stochastically perturbed. Similar dependence on the input direction is found in the distribution of fluctuation size and duration, which show power law scaling that extends over four orders of magnitude at the Hopf bifurcation. We conjecture that the alignment in phase space between the input noise vector and the center manifold of the Hopf bifurcation is directly linked to these changes. These results are consistent with the possibility of statistical indicators of linear instability being detectable in real EEG time series. However, even in a simple cortical model, we find that these indicators may not necessarily be visible even when bifurcations are present because their expression can depend sensitively on the neuronal pathway of incoming fluctuations. PMID:22952464
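
    A toy illustration of the critical-slowing-down signature discussed above (using a linear Ornstein-Uhlenbeck process rather than the Jansen-Rit model): as the decay rate approaches zero, i.e., as the system nears instability, both the lag-1 autocorrelation and the variance of the fluctuations grow.

      import numpy as np

      def simulate_ou(decay, n_steps=20_000, dt=0.01, noise=1.0, seed=0):
          """Euler-Maruyama simulation of dx = -decay * x dt + noise dW."""
          rng = np.random.default_rng(seed)
          x = np.zeros(n_steps)
          for t in range(1, n_steps):
              x[t] = x[t-1] - decay * x[t-1] * dt + noise * np.sqrt(dt) * rng.normal()
          return x

      def lag1_autocorr(x):
          return np.corrcoef(x[:-1], x[1:])[0, 1]

      for decay in (5.0, 1.0, 0.2):    # smaller decay = closer to the bifurcation
          x = simulate_ou(decay)
          print(decay, round(lag1_autocorr(x), 3), round(x.var(), 3))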

  17. gsSKAT: Rapid gene set analysis and multiple testing correction for rare-variant association studies using weighted linear kernels.

    PubMed

    Larson, Nicholas B; McDonnell, Shannon; Cannon Albright, Lisa; Teerlink, Craig; Stanford, Janet; Ostrander, Elaine A; Isaacs, William B; Xu, Jianfeng; Cooney, Kathleen A; Lange, Ethan; Schleutker, Johanna; Carpten, John D; Powell, Isaac; Bailey-Wilson, Joan E; Cussenot, Olivier; Cancel-Tassin, Geraldine; Giles, Graham G; MacInnis, Robert J; Maier, Christiane; Whittemore, Alice S; Hsieh, Chih-Lin; Wiklund, Fredrik; Catalona, William J; Foulkes, William; Mandal, Diptasri; Eeles, Rosalind; Kote-Jarai, Zsofia; Ackerman, Michael J; Olson, Timothy M; Klein, Christopher J; Thibodeau, Stephen N; Schaid, Daniel J

    2017-05-01

    Next-generation sequencing technologies have afforded unprecedented characterization of low-frequency and rare genetic variation. Due to low power for single-variant testing, aggregative methods are commonly used to combine observed rare variation within a single gene. Causal variation may also aggregate across multiple genes within relevant biomolecular pathways. Kernel-machine regression and adaptive testing methods for aggregative rare-variant association testing have been demonstrated to be powerful approaches for pathway-level analysis, although these methods tend to be computationally intensive at high-variant dimensionality and require access to complete data. An additional analytical issue in scans of large pathway definition sets is multiple testing correction. Gene set definitions may exhibit substantial genic overlap, and the impact of the resultant correlation in test statistics on Type I error rate control for large agnostic gene set scans has not been fully explored. Herein, we first outline a statistical strategy for aggregative rare-variant analysis using component gene-level linear kernel score test summary statistics as well as derive simple estimators of the effective number of tests for family-wise error rate control. We then conduct extensive simulation studies to characterize the behavior of our approach relative to direct application of kernel and adaptive methods under a variety of conditions. We also apply our method to two case-control studies, respectively, evaluating rare variation in hereditary prostate cancer and schizophrenia. Finally, we provide open-source R code for public use to facilitate easy application of our methods to existing rare-variant analysis results. © 2017 WILEY PERIODICALS, INC.

  18. Modeling the Development of Audiovisual Cue Integration in Speech Perception

    PubMed Central

    Getz, Laura M.; Nordeen, Elke R.; Vrabic, Sarah C.; Toscano, Joseph C.

    2017-01-01

    Adult speech perception is generally enhanced when information is provided from multiple modalities. In contrast, infants do not appear to benefit from combining auditory and visual speech information early in development. This is true despite the fact that both modalities are important to speech comprehension even at early stages of language acquisition. How then do listeners learn how to process auditory and visual information as part of a unified signal? In the auditory domain, statistical learning processes provide an excellent mechanism for acquiring phonological categories. Is this also true for the more complex problem of acquiring audiovisual correspondences, which require the learner to integrate information from multiple modalities? In this paper, we present simulations using Gaussian mixture models (GMMs) that learn cue weights and combine cues on the basis of their distributional statistics. First, we simulate the developmental process of acquiring phonological categories from auditory and visual cues, asking whether simple statistical learning approaches are sufficient for learning multi-modal representations. Second, we use this time course information to explain audiovisual speech perception in adult perceivers, including cases where auditory and visual input are mismatched. Overall, we find that domain-general statistical learning techniques allow us to model the developmental trajectory of audiovisual cue integration in speech, and in turn, allow us to better understand the mechanisms that give rise to unified percepts based on multiple cues. PMID:28335558
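
    A minimal, hypothetical sketch of the distributional-learning idea using scikit-learn's Gaussian mixture model (not the authors' simulation code): two phonological categories are recovered from unlabeled, two-dimensional audiovisual cue values, and the fitted model yields graded, cue-integrated category probabilities.

      import numpy as np
      from sklearn.mixture import GaussianMixture

      rng = np.random.default_rng(1)
      # Hypothetical auditory (e.g., VOT-like) and visual (e.g., lip-aperture-like) cues for two categories
      cat_a = rng.normal(loc=[10.0, 0.2], scale=[5.0, 0.05], size=(500, 2))
      cat_b = rng.normal(loc=[50.0, 0.6], scale=[8.0, 0.08], size=(500, 2))
      cues = np.vstack([cat_a, cat_b])            # unlabeled audiovisual input

      gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(cues)
      print(gmm.means_)                           # learned category centers in cue space
      print(gmm.predict_proba([[30.0, 0.4]]))     # graded, cue-integrated category percept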

  19. Heuristic Identification of Biological Architectures for Simulating Complex Hierarchical Genetic Interactions

    PubMed Central

    Moore, Jason H; Amos, Ryan; Kiralis, Jeff; Andrews, Peter C

    2015-01-01

    Simulation plays an essential role in the development of new computational and statistical methods for the genetic analysis of complex traits. Most simulations start with a statistical model using methods such as linear or logistic regression that specify the relationship between genotype and phenotype. This is appealing due to its simplicity and because these statistical methods are commonly used in genetic analysis. It is our working hypothesis that simulations need to move beyond simple statistical models to more realistically represent the biological complexity of genetic architecture. The goal of the present study was to develop a prototype genotype–phenotype simulation method and software that are capable of simulating complex genetic effects within the context of a hierarchical biology-based framework. Specifically, our goal is to simulate multilocus epistasis or gene–gene interaction where the genetic variants are organized within the framework of one or more genes, their regulatory regions and other regulatory loci. We introduce here the Heuristic Identification of Biological Architectures for simulating Complex Hierarchical Interactions (HIBACHI) method and prototype software for simulating data in this manner. This approach combines a biological hierarchy, a flexible mathematical framework, a liability threshold model for defining disease endpoints, and a heuristic search strategy for identifying high-order epistatic models of disease susceptibility. We provide several simulation examples using genetic models exhibiting independent main effects and three-way epistatic effects. PMID:25395175

  20. Modeling Cross-Situational Word–Referent Learning: Prior Questions

    PubMed Central

    Yu, Chen; Smith, Linda B.

    2013-01-01

    Both adults and young children possess powerful statistical computation capabilities—they can infer the referent of a word from highly ambiguous contexts involving many words and many referents by aggregating cross-situational statistical information across contexts. This ability has been explained by models of hypothesis testing and by models of associative learning. This article describes a series of simulation studies and analyses designed to understand the different learning mechanisms posited by the 2 classes of models and their relation to each other. Variants of a hypothesis-testing model and a simple or dumb associative mechanism were examined under different specifications of information selection, computation, and decision. Critically, these 3 components of the models interact in complex ways. The models illustrate a fundamental tradeoff between amount of data input and powerful computations: With the selection of more information, dumb associative models can mimic the powerful learning that is accomplished by hypothesis-testing models with fewer data. However, because of the interactions among the component parts of the models, the associative model can mimic various hypothesis-testing models, producing the same learning patterns but through different internal components. The simulations argue for the importance of a compositional approach to human statistical learning: the experimental decomposition of the processes that contribute to statistical learning in human learners and models with the internal components that can be evaluated independently and together. PMID:22229490
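
    A minimal sketch of the 'dumb' associative alternative discussed in the article (not the authors' models): word-referent associations are simply accumulated counts across ambiguous learning situations, and the referent with the strongest association wins. The mini-corpus is hypothetical.

      from collections import defaultdict

      # Each learning situation pairs several words with several candidate referents
      situations = [
          (["ball", "dog"], ["BALL", "DOG"]),
          (["ball", "cup"], ["BALL", "CUP"]),
          (["dog", "cup"],  ["DOG", "CUP"]),
      ]

      assoc = defaultdict(float)
      for words, referents in situations:
          for w in words:
              for r in referents:
                  assoc[(w, r)] += 1.0            # simple associative strengthening

      def best_referent(word):
          candidates = {r: s for (w, r), s in assoc.items() if w == word}
          return max(candidates, key=candidates.get)

      print(best_referent("ball"))   # "BALL" wins because it co-occurs most consistently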

  1. Modeling the Development of Audiovisual Cue Integration in Speech Perception.

    PubMed

    Getz, Laura M; Nordeen, Elke R; Vrabic, Sarah C; Toscano, Joseph C

    2017-03-21

    Adult speech perception is generally enhanced when information is provided from multiple modalities. In contrast, infants do not appear to benefit from combining auditory and visual speech information early in development. This is true despite the fact that both modalities are important to speech comprehension even at early stages of language acquisition. How then do listeners learn how to process auditory and visual information as part of a unified signal? In the auditory domain, statistical learning processes provide an excellent mechanism for acquiring phonological categories. Is this also true for the more complex problem of acquiring audiovisual correspondences, which require the learner to integrate information from multiple modalities? In this paper, we present simulations using Gaussian mixture models (GMMs) that learn cue weights and combine cues on the basis of their distributional statistics. First, we simulate the developmental process of acquiring phonological categories from auditory and visual cues, asking whether simple statistical learning approaches are sufficient for learning multi-modal representations. Second, we use this time course information to explain audiovisual speech perception in adult perceivers, including cases where auditory and visual input are mismatched. Overall, we find that domain-general statistical learning techniques allow us to model the developmental trajectory of audiovisual cue integration in speech, and in turn, allow us to better understand the mechanisms that give rise to unified percepts based on multiple cues.

  2. Temporal and spatial intermittencies within Newtonian turbulence

    NASA Astrophysics Data System (ADS)

    Kushwaha, Anubhav; Graham, Michael

    2015-11-01

    Direct numerical simulations of a pressure-driven turbulent flow are performed in a large rectangular channel. Intermittent high- and low-drag regimes within turbulence that had earlier been found to exist temporally in minimal channels are observed both spatially and temporally in full-size turbulent flows. These intermittent regimes, namely "active" and "hibernating" turbulence, display very different structural and statistical features. We adopt a very simple sampling technique to identify these intermittent intervals, both temporally and spatially, and present differences between them in terms of simple quantities like mean velocity, wall-shear stress, and flow structures. By conditionally sampling the low wall-shear stress events in particular, we show that the Maximum Drag Reduction (MDR) velocity profile, which occurs in viscoelastic flows, can also be approached in a Newtonian-fluid flow in the absence of any additives. This suggests that the properties of polymer drag reduction are inherent to all flows and their occurrence is merely enhanced by the addition of polymers. We also show how the intermittencies within turbulence vary with Reynolds number. The work was supported by AFOSR grant FA9550-15-1-0062.

  3. Learning to classify in large committee machines

    NASA Astrophysics Data System (ADS)

    O'kane, Dominic; Winther, Ole

    1994-10-01

    The ability of a two-layer neural network to learn a specific non-linearly-separable classification task, the proximity problem, is investigated using a statistical mechanics approach. Both the tree and fully connected architectures are investigated in the limit where the number K of hidden units is large, but still much smaller than the number N of inputs. Both have continuous weights. Within the replica symmetric ansatz, we find that for zero temperature training, the tree architecture exhibits a strong overtraining effect. For nonzero temperature the asymptotic error is lowered, but it is still higher than the corresponding value for the simple perceptron. The fully connected architecture is considered for two regimes. First, for a finite number of examples we find a symmetry among the hidden units as each performs equally well. The asymptotic generalization error is finite, and minimal for T → ∞, where it goes to the same value as for the simple perceptron. For a large number of examples we find a continuous transition to a phase with broken hidden-unit symmetry, which has an asymptotic generalization error equal to zero.

  4. BiDiBlast: comparative genomics pipeline for the PC.

    PubMed

    de Almeida, João M G C F

    2010-06-01

    Bi-directional BLAST is a simple approach to detect, annotate, and analyze candidate orthologous or paralogous sequences in a single go. This procedure is usually confined to the realm of customized Perl scripts, typically tuned for UNIX-like environments. Porting those scripts to other operating systems involves refactoring them, as well as installing the Perl programming environment with the required libraries. To overcome these limitations, a data pipeline was implemented in Java. This application submits two batches of sequences to local versions of the NCBI BLAST tool, manages result lists, and refines both bi-directional and simple hits. GO Slim terms are attached to hits, several statistics are derived, and molecular evolution rates are estimated through PAML. The results are written to a set of delimited text tables intended for further analysis. The provided graphic user interface allows a friendly interaction with this application, which is documented and available to download at http://moodle.fct.unl.pt/course/view.php?id=2079 or https://sourceforge.net/projects/bidiblast/ under the GNU GPL license. Copyright 2010 Beijing Genomics Institute. Published by Elsevier Ltd. All rights reserved.
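
    The core of a bi-directional BLAST analysis is the reciprocal-best-hit test; below is an assumed sketch of that logic in Python (BiDiBlast itself is written in Java), reading two hypothetical BLAST tabular (-outfmt 6) result files.

      import csv

      def best_hits(blast_tab_path):
          """Best hit per query from a BLAST -outfmt 6 table (highest bit score, column 12)."""
          best = {}
          with open(blast_tab_path, newline="") as fh:
              for row in csv.reader(fh, delimiter="\t"):
                  query, subject, bitscore = row[0], row[1], float(row[11])
                  if query not in best or bitscore > best[query][1]:
                      best[query] = (subject, bitscore)
          return {q: s for q, (s, _) in best.items()}

      def reciprocal_best_hits(a_vs_b_path, b_vs_a_path):
          ab, ba = best_hits(a_vs_b_path), best_hits(b_vs_a_path)
          return [(a, b) for a, b in ab.items() if ba.get(b) == a]

      # Hypothetical file names produced by two local BLAST runs
      print(reciprocal_best_hits("speciesA_vs_speciesB.tsv", "speciesB_vs_speciesA.tsv"))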

  5. DNA curtains for high-throughput single-molecule optical imaging.

    PubMed

    Greene, Eric C; Wind, Shalom; Fazio, Teresa; Gorman, Jason; Visnapuu, Mari-Liis

    2010-01-01

    Single-molecule approaches provide a valuable tool in the arsenal of the modern biologist, and new discoveries continue to be made possible through the use of these state-of-the-art technologies. However, it can be inherently difficult to obtain statistically relevant data from experimental approaches specifically designed to probe individual reactions. This problem is compounded with more complex biochemical reactions, heterogeneous systems, and/or reactions requiring the use of long DNA substrates. Here we give an overview of a technology developed in our laboratory, which relies upon simple micro- or nanofabricated structures in combination with "bio-friendly" lipid bilayers, to align thousands of long DNA molecules into defined patterns on the surface of a microfluidic sample chamber. We call these "DNA curtains," and we have developed several different versions varying in complexity and DNA substrate configuration, which are designed to meet different experimental needs. This novel approach to single-molecule imaging provides a powerful experimental platform that offers the potential for concurrent observation of hundreds or even thousands of protein-DNA interactions in real time. Copyright 2010 Elsevier Inc. All rights reserved.

  6. SPSS and SAS procedures for estimating indirect effects in simple mediation models.

    PubMed

    Preacher, Kristopher J; Hayes, Andrew F

    2004-11-01

    Researchers often conduct mediation analysis in order to indirectly assess the effect of a proposed cause on some outcome through a proposed mediator. The utility of mediation analysis stems from its ability to go beyond the merely descriptive to a more functional understanding of the relationships among variables. A necessary component of mediation is a statistically and practically significant indirect effect. Although mediation hypotheses are frequently explored in psychological research, formal significance tests of indirect effects are rarely conducted. After a brief overview of mediation, we argue the importance of directly testing the significance of indirect effects and provide SPSS and SAS macros that facilitate estimation of the indirect effect with a normal theory approach and a bootstrap approach to obtaining confidence intervals, as well as the traditional approach advocated by Baron and Kenny (1986). We hope that this discussion and the macros will enhance the frequency of formal mediation tests in the psychology literature. Electronic copies of these macros may be downloaded from the Psychonomic Society's Web archive at www.psychonomic.org/archive/.
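
    Outside SPSS and SAS, the same bootstrap logic for the indirect effect a*b can be sketched in a few lines of Python (hypothetical data and ordinary least squares, not the published macros):

      import numpy as np

      rng = np.random.default_rng(42)
      n = 200
      x = rng.normal(size=n)                       # proposed cause
      m = 0.5 * x + rng.normal(size=n)             # proposed mediator
      y = 0.4 * m + 0.1 * x + rng.normal(size=n)   # outcome

      def indirect_effect(x, m, y):
          a = np.polyfit(x, m, 1)[0]               # slope of M on X
          X = np.column_stack([np.ones_like(x), m, x])
          b = np.linalg.lstsq(X, y, rcond=None)[0][1]   # slope of Y on M, controlling for X
          return a * b

      boot = []
      for _ in range(2000):
          idx = rng.integers(0, n, n)              # resample cases with replacement
          boot.append(indirect_effect(x[idx], m[idx], y[idx]))
      print(indirect_effect(x, m, y), np.percentile(boot, [2.5, 97.5]))   # point estimate and bootstrap CI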

  7. A Bayesian approach to the characterization of electroencephalographic recordings in premature infants

    NASA Astrophysics Data System (ADS)

    Mitchell, Timothy J.

    Preterm infants are particularly susceptible to cerebral injury, and electroencephalographic (EEG) recordings provide an important diagnostic tool for determining cerebral health. However, interpreting these EEG recordings is challenging and requires the skills of a trained electroencephalographer. Because these EEG specialists are rare, an automated interpretation of newborn EEG recordings would increase access to an important diagnostic tool for physicians. To automate this procedure, we employ a novel Bayesian approach to compute the probability of EEG features (waveforms) including suppression, delta brushes, and delta waves. The power of this approach lies not only in its ability to closely mimic the techniques used by EEG specialists, but also its ability to be generalized to identify other waveforms that may be of interest for future work. The results of these calculations are used in a program designed to output simple statistics related to the presence or absence of such features. Direct comparison of the software with expert human readers has indicated satisfactory performance, and the algorithm has shown promise in its ability to distinguish between infants with normal neurodevelopmental outcome and those with poor neurodevelopmental outcome.

  8. Comparison of hydromorphological assessment methods: Application to the Boise River, USA

    NASA Astrophysics Data System (ADS)

    Benjankar, Rohan; Koenig, Frauke; Tonina, Daniele

    2013-06-01

    Recent national and international legislation (e.g., the European Water Framework Directive) has identified the need to quantify the ecological condition of river systems as a critical component of an integrated river management approach. An important defining driver of ecological condition is stream hydromorphology. Several methodologies have been proposed, from simple table-based approaches to complex hydraulics-based models. In this paper, three different methods for river hydromorphological assessment are applied to the Boise River, United States of America (USA): (1) the German LAWA overview method (Bund/Laender Arbeitsgemeinschaft Wasser/German Working Group on water issues of the Federal States and the Federal Government represented by the Federal Environment Ministry), (2) a special approach for the hydromorphological assessment of urban rivers, and (3) a hydraulic-based method. The hydraulic-based method assessed stream conditions from a statistical analysis of flow properties predicted with hydrodynamic modeling. The investigation focuses on comparing the three methods and defining their transferability among different contexts, Europe and the western United States. It also provides a comparison of the hydromorphological conditions of urban and rural reaches of the Boise River.

  9. Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems.

    PubMed

    Lê Cao, Kim-Anh; Boitard, Simon; Besse, Philippe

    2011-06-22

    Variable selection on high throughput biological data, such as gene expression or single nucleotide polymorphisms (SNPs), becomes inevitable to select relevant information and, therefore, to better characterize diseases or assess genetic structure. There are different ways to perform variable selection in large data sets. Statistical tests are commonly used to identify differentially expressed features for explanatory purposes, whereas Machine Learning wrapper approaches can be used for predictive purposes. In the case of multiple highly correlated variables, another option is to use multivariate exploratory approaches to give more insight into cell biology, biological pathways or complex traits. A simple extension of a sparse PLS exploratory approach is proposed to perform variable selection in a multiclass classification framework. sPLS-DA has a classification performance similar to other wrapper or sparse discriminant analysis approaches on public microarray and SNP data sets. More importantly, sPLS-DA is clearly competitive in terms of computational efficiency and superior in terms of interpretability of the results via valuable graphical outputs. sPLS-DA is available in the R package mixOmics, which is dedicated to the analysis of large biological data sets.

  10. Simple Statistics: - Summarized!

    ERIC Educational Resources Information Center

    Blai, Boris, Jr.

    Statistics are an essential tool for making proper judgement decisions. It is concerned with probability distribution models, testing of hypotheses, significance tests and other means of determining the correctness of deductions and the most likely outcome of decisions. Measures of central tendency include the mean, median and mode. A second…

  11. Contrast Analysis: A Tutorial

    ERIC Educational Resources Information Center

    Haans, Antal

    2018-01-01

    Contrast analysis is a relatively simple but effective statistical method for testing theoretical predictions about differences between group means against the empirical data. Despite its advantages, contrast analysis is hardly used to date, perhaps because it is not implemented in a convenient manner in many statistical software packages. This…

  12. Superordinate Shape Classification Using Natural Shape Statistics

    ERIC Educational Resources Information Center

    Wilder, John; Feldman, Jacob; Singh, Manish

    2011-01-01

    This paper investigates the classification of shapes into broad natural categories such as "animal" or "leaf". We asked whether such coarse classifications can be achieved by a simple statistical classification of the shape skeleton. We surveyed databases of natural shapes, extracting shape skeletons and tabulating their…

  13. Modelling unsupervised online-learning of artificial grammars: linking implicit and statistical learning.

    PubMed

    Rohrmeier, Martin A; Cross, Ian

    2014-07-01

    Humans rapidly learn complex structures in various domains. Findings of above-chance performance of some untrained control groups in artificial grammar learning studies raise questions about the extent to which learning can occur in an untrained, unsupervised testing situation with both correct and incorrect structures. The plausibility of unsupervised online-learning effects was modelled with n-gram, chunking and simple recurrent network models. A novel evaluation framework was applied, which alternates forced binary grammaticality judgments and subsequent learning of the same stimulus. Our results indicate a strong online learning effect for n-gram and chunking models and a weaker effect for simple recurrent network models. Such findings suggest that online learning is a plausible effect of statistical chunk learning that is possible when ungrammatical sequences contain a large proportion of grammatical chunks. Such common effects of continuous statistical learning may underlie statistical and implicit learning paradigms and raise implications for study design and testing methodologies. Copyright © 2014 Elsevier Inc. All rights reserved.
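
    A hypothetical miniature of the n-gram flavour of such an online learner: each string is scored by the familiarity of its bigrams (a stand-in for a grammaticality judgment) and the counts are then updated on that same string, mimicking unsupervised learning during test. The strings are made up for illustration.

      from collections import Counter

      class OnlineBigramLearner:
          def __init__(self):
              self.counts = Counter()

          def score(self, seq):
              """Mean familiarity of the string's bigrams (proxy for a grammaticality judgment)."""
              bigrams = list(zip(seq, seq[1:]))
              return sum(self.counts[b] for b in bigrams) / max(len(bigrams), 1)

          def learn(self, seq):
              self.counts.update(zip(seq, seq[1:]))   # unsupervised update after judging

      learner = OnlineBigramLearner()
      for string in ["MTVRX", "MTTVT", "VXVRX", "MVRXM"]:   # hypothetical grammar-like strings
          print(string, learner.score(string))
          learner.learn(string)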

  14. A statistical method for measuring activation of gene regulatory networks.

    PubMed

    Esteves, Gustavo H; Reis, Luiz F L

    2018-06-13

    Gene expression data analysis is of great importance for modern molecular biology, given our ability to measure the expression profiles of thousands of genes and to enable studies rooted in systems biology. In this work, we propose a simple statistical model for measuring the activation of gene regulatory networks, instead of the traditional gene co-expression networks. We present the mathematical construction of a statistical procedure for testing hypotheses regarding gene regulatory network activation. The real probability distribution of the test statistic is evaluated by a permutation-based study. To illustrate the functionality of the proposed methodology, we also present a simple example based on a small hypothetical network and the activation measurement of two KEGG networks, both based on gene expression data collected from gastric and esophageal samples. The two KEGG networks were also analyzed for a public database, available through NCBI-GEO, presented as Supplementary Material. This method was implemented in an R package that is available at the BioConductor project website under the name maigesPack.
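
    A generic permutation-test sketch in Python (not the maigesPack implementation): a network-level activation score, here the difference in mean expression between two sample groups over the genes of one network, is compared against a null distribution obtained by permuting the group labels. All data below are simulated.

      import numpy as np

      rng = np.random.default_rng(7)

      def network_activation_pvalue(expr, groups, n_perm=10_000):
          """expr: genes x samples matrix for one network; groups: 0/1 sample labels."""
          def score(labels):
              return expr[:, labels == 1].mean() - expr[:, labels == 0].mean()
          observed = score(groups)
          null = np.array([score(rng.permutation(groups)) for _ in range(n_perm)])
          return observed, (np.abs(null) >= abs(observed)).mean()

      # Hypothetical data: 15 genes, 20 samples (say, 10 gastric vs 10 esophageal)
      groups = np.array([0]*10 + [1]*10)
      expr = rng.normal(size=(15, 20)) + 0.4 * groups   # mild activation in group 1
      print(network_activation_pvalue(expr, groups))    # (activation score, permutation p-value)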

  15. Contingency and statistical laws in replicate microbial closed ecosystems.

    PubMed

    Hekstra, Doeke R; Leibler, Stanislas

    2012-05-25

    Contingency, the persistent influence of past random events, pervades biology. To what extent, then, is each course of ecological or evolutionary dynamics unique, and to what extent are these dynamics subject to a common statistical structure? Addressing this question requires replicate measurements to search for emergent statistical laws. We establish a readily replicated microbial closed ecosystem (CES), sustaining its three species for years. We precisely measure the local population density of each species in many CES replicates, started from the same initial conditions and kept under constant light and temperature. The covariation among replicates of the three species densities acquires a stable structure, which could be decomposed into discrete eigenvectors, or "ecomodes." The largest ecomode dominates population density fluctuations around the replicate-average dynamics. These fluctuations follow simple power laws consistent with a geometric random walk. Thus, variability in ecological dynamics can be studied with CES replicates and described by simple statistical laws. Copyright © 2012 Elsevier Inc. All rights reserved.
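
    A sketch of the kind of decomposition described, with entirely simulated replicate data: the covariance of species-density deviations across replicates is diagonalized, and its eigenvectors play the role of the 'ecomodes'.

      import numpy as np

      rng = np.random.default_rng(5)
      n_replicates, n_species = 40, 3
      # Hypothetical deviations of log densities from the replicate-average dynamics
      common = rng.normal(size=(n_replicates, 1)) * np.array([0.8, 0.5, 0.3])
      deviations = common + 0.1 * rng.normal(size=(n_replicates, n_species))

      cov = np.cov(deviations, rowvar=False)           # 3 x 3 covariance across replicates
      eigvals, eigvecs = np.linalg.eigh(cov)
      order = np.argsort(eigvals)[::-1]
      print(eigvals[order])                            # variance carried by each "ecomode"
      print(eigvecs[:, order[0]])                      # the dominant ecomode direction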

  16. Laparoscopic repair of perforated peptic ulcer: simple closure versus omentopexy.

    PubMed

    Lin, Being-Chuan; Liao, Chien-Hung; Wang, Shang-Yu; Hwang, Tsann-Long

    2017-12-01

    This report presents our experience with laparoscopic repair performed in 118 consecutive patients diagnosed with a perforated peptic ulcer (PPU). We compared the surgical outcome of simple closure with modified Cellan-Jones omentopexy and report the safety and benefit of simple closure. From January 2010 to December 2014, 118 patients with PPU underwent laparoscopic repair with simple closure (n = 27) or omentopexy (n = 91). Charts were retrospectively reviewed for demographic characteristics and outcome. The data were compared by Fisher's exact test, Mann-Whitney U test, Pearson's chi-square test, and the Kruskal-Wallis test. The results were considered statistically significant if P < 0.05. No patients died, whereas three incurred leakage. After matching, the simple closure and omentopexy groups were similar in sex, systolic blood pressure, pulse rate, respiratory rate, Boey score, Charlson comorbidity index, Mannheim peritonitis index, and leakage. There were statistically significant differences in age, length of hospital stay, perforation size, and operating time. Comparison of the operating time in the ≤4.0 mm and 5.0-12 mm groups revealed that simple closure took less time than omentopexy in both groups (≤4.0 mm, 76 versus 133 minutes, P < 0.0001; 5.0-12 mm, 97 versus 139.5 minutes, P = 0.006). Compared with omentopexy, laparoscopic simple closure is a safe procedure and shortens the operating time. Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.

  17. A novel approach to simulate gene-environment interactions in complex diseases.

    PubMed

    Amato, Roberto; Pinelli, Michele; D'Andrea, Daniel; Miele, Gennaro; Nicodemi, Mario; Raiconi, Giancarlo; Cocozza, Sergio

    2010-01-05

    Complex diseases are multifactorial traits caused by both genetic and environmental factors. They represent the major part of human diseases and include those with the largest prevalence and mortality (cancer, heart disease, obesity, etc.). Despite a large amount of information that has been collected about both genetic and environmental risk factors, there are few examples of studies on their interactions in the epidemiological literature. One reason may be incomplete knowledge of the power of the statistical methods designed to search for risk factors and their interactions in these data sets. An improvement in this direction would lead to a better understanding and description of gene-environment interactions. To this aim, a possible strategy is to challenge the different statistical methods against data sets where the underlying phenomenon is completely known and fully controllable, for example simulated ones. We present a mathematical approach that models gene-environment interactions. By this method, it is possible to generate simulated populations having gene-environment interactions of any form, involving any number of genetic and environmental factors and also allowing non-linear interactions such as epistasis. In particular, we implemented a simple version of this model in a Gene-Environment iNteraction Simulator (GENS), a tool designed to simulate case-control data sets where a one gene-one environment interaction influences the disease risk. The main aim has been to allow the input of population characteristics by using standard epidemiological measures and to implement constraints to make the simulator behaviour biologically meaningful. With the multi-logistic model implemented in GENS, it is possible to simulate case-control samples of complex disease where gene-environment interactions influence the disease risk. The user has full control of the main characteristics of the simulated population and a Monte Carlo process allows random variability. A knowledge-based approach reduces the complexity of the mathematical model by using reasonable biological constraints and makes the simulation more understandable in biological terms. Simulated data sets can be used for the assessment of novel statistical methods or for the evaluation of the statistical power when designing a study.
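
    A stripped-down, hypothetical analogue of the one gene-one environment simulation idea (not GENS itself): disease status is drawn from a multi-logistic model with genotype, exposure, and interaction terms, and a case-control sample is then drawn from the simulated population. All parameter values are assumptions.

      import numpy as np
      import pandas as pd

      rng = np.random.default_rng(3)
      n = 50_000
      maf = 0.3                                   # assumed minor allele frequency
      genotype = rng.binomial(2, maf, n)          # 0/1/2 risk-allele count
      exposure = rng.binomial(1, 0.4, n)          # binary environmental exposure

      # Log-odds with main effects and a gene-environment interaction
      beta0, beta_g, beta_e, beta_ge = -2.5, 0.15, 0.30, 0.40
      logit = beta0 + beta_g*genotype + beta_e*exposure + beta_ge*genotype*exposure
      disease = rng.binomial(1, 1/(1 + np.exp(-logit)))

      pop = pd.DataFrame({"genotype": genotype, "exposure": exposure, "disease": disease})
      cases = pop[pop.disease == 1].sample(1000, random_state=0)
      controls = pop[pop.disease == 0].sample(1000, random_state=0)
      case_control = pd.concat([cases, controls])        # data set for method assessment
      print(case_control.groupby(["genotype", "exposure"]).disease.mean())   # case enrichment per stratum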

  18. How different is the time-averaged field from that of a geocentric axial dipole? Making the best of paleomagnetic directional data using the statistical Giant Gaussian Process approach.

    NASA Astrophysics Data System (ADS)

    Hulot, G.; Khokhlov, A.; Johnson, C. L.

    2012-12-01

    It is well known that the geometry of the recent time-averaged paleomagnetic field (TAF) is very close to that of a geocentric axial dipole (GAD). Yet, numerous numerical dynamo simulations show that some departures from such a simple geometry are to be expected, not least because of the heterogeneous thermal core-mantle boundary conditions that the convecting mantle imposes on the geodynamo. Indeed, many TAF models recovered from averaging lava flow paleomagnetic directional data (the most numerous and reliable of all data) would suggest this is the case. However, assessing the significance of such minor departures from the GAD is particularly challenging, because non-linear directional data are sensitive not only to the time-averaged component of the field, but also to its time-fluctuating component, known as the paleosecular variation (PSV). This means that in addition to data errors, PSV also must be taken into account when assessing any claims of departures of the TAF from the GAD based on lava flow directional data. Furthermore, because of limited age information for these data, it is necessary to assess departures from the GAD by resorting to a statistical approach. We report recent progress using an approach we have suggested and further developed (Khokhlov et al., Geophysical Journal International, 2001, 2006) to test the compatibility of combined time-averaged (TAF) and paleosecular variation (PSV) field models against any lava flow paleomagnetic database, assuming that these TAF and PSV models are defined within the Giant Gaussian Process statistical framework. In particular, we will show how sensitive statistical measures of the compatibility of a combined set of TAF and PSV models with a given directional database can be defined. These measures can be used to test published TAF and PSV models with updated 0-5 Ma lava flow paleomagnetic data sets. They also lay the groundwork for designing inverse methods better suited to seek the minimum required departure of the TAF from the GAD.

  19. Intensity correlation-based calibration of FRET.

    PubMed

    Bene, László; Ungvári, Tamás; Fedor, Roland; Sasi Szabó, László; Damjanovich, László

    2013-11-05

    Dual-laser flow cytometric resonance energy transfer (FCET) is a statistically efficient and accurate way of determining proximity relationships for cellular molecules, even under living conditions. In the framework of this algorithm, absolute fluorescence resonance energy transfer (FRET) efficiency is determined by the simultaneous measurement of donor-quenching and sensitized emission. A crucial point is the determination of the scaling factor α responsible for balancing the different sensitivities of the donor and acceptor signal channels. The determination of α is not simple, requiring either the preparation of special samples that are generally different from the double-labeled FRET sample, or the use of sophisticated statistical estimation (least-squares) procedures. We present an alternative approach, free from spectral constants, for the determination of α and the absolute FRET efficiency, by extending the FCET algorithm with an analysis of the second moments (variances and covariances) of the detected intensity distributions. A quadratic equation for α is formulated from the intensity fluctuations, which proves sufficiently robust to give accurate α-values on a cell-by-cell basis under a wide range of conditions, using the same double-labeled sample from which the FRET efficiency itself is determined. The approach is illustrated by FRET measurements between epitopes of the MHCI receptor on the cell surface of two cell lines, FT and LS174T. The results show that whereas the conventional way of determining α fails at large dye-per-protein labeling ratios of mAbs, the new approach still gives accurate results. Although introduced in a flow cytometer, the new approach can also be used straightforwardly with fluorescence microscopes.

  20. Monte Carlo Bayesian Inference on a Statistical Model of Sub-Gridcolumn Moisture Variability using High-Resolution Cloud Observations

    NASA Astrophysics Data System (ADS)

    Norris, P. M.; da Silva, A. M., Jr.

    2016-12-01

    Norris and da Silva recently published a method to constrain a statistical model of sub-gridcolumn moisture variability using high-resolution satellite cloud data. The method can be used for large-scale model parameter estimation or cloud data assimilation (CDA). The gridcolumn model includes assumed-PDF intra-layer horizontal variability and a copula-based inter-layer correlation model. The observables used are MODIS cloud-top pressure, brightness temperature and cloud optical thickness, but the method should be extensible to direct cloudy radiance assimilation for a small number of channels. The algorithm is a form of Bayesian inference with a Markov chain Monte Carlo (MCMC) approach to characterizing the posterior distribution. This approach is especially useful in cases where the background state is clear but cloudy observations exist. In traditional linearized data assimilation methods, a subsaturated background cannot produce clouds via any infinitesimal equilibrium perturbation, but the Monte Carlo approach is not gradient-based and allows jumps into regions of non-zero cloud probability. In the example provided, the method is able to restore marine stratocumulus near the Californian coast where the background state has a clear swath. The new approach not only significantly reduces mean and standard deviation biases with respect to the assimilated observables, but also improves the simulated rotational-Raman scattering cloud optical centroid pressure against independent (non-assimilated) retrievals from the OMI instrument. One obvious difficulty for the method, and other CDA methods, is the lack of information content in passive cloud observables on cloud vertical structure, beyond cloud-top and thickness, thus necessitating strong dependence on the background vertical moisture structure. It is found that a simple flow-dependent correlation modification due to Riishojgaard is helpful, better honoring inversion structures in the background state.
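
    The gradient-free character of the MCMC step mentioned above can be illustrated with a minimal random-walk Metropolis sketch. The one-dimensional Gaussian target below is a hypothetical stand-in for the actual cloud-observation posterior, which involves the sub-gridcolumn moisture PDF and an observation operator not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy posterior: a 1-D Gaussian stand-in for the real cloud-observation posterior.
def log_posterior(x):
    return -0.5 * ((x - 0.7) / 0.1) ** 2     # hypothetical target density (unnormalized)

x = 0.0                                       # start from a "clear" background state
samples = []
for _ in range(20_000):
    prop = x + rng.normal(scale=0.05)         # random-walk proposal (no gradients needed)
    if np.log(rng.random()) < log_posterior(prop) - log_posterior(x):
        x = prop                              # accept the jump, possibly into cloudy states
    samples.append(x)

print("posterior mean ~", np.mean(samples[5000:]))   # discard burn-in
```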

  1. Analysis of femtosecond pump-probe photoelectron-photoion coincidence measurements applying Bayesian probability theory

    NASA Astrophysics Data System (ADS)

    Rumetshofer, M.; Heim, P.; Thaler, B.; Ernst, W. E.; Koch, M.; von der Linden, W.

    2018-06-01

    Ultrafast dynamical processes in photoexcited molecules can be observed with pump-probe measurements, in which information about the dynamics is obtained from the transient signal associated with the excited state. Background signals provoked by pump and/or probe pulses alone often obscure these excited-state signals. Simple subtraction of pump-only and/or probe-only measurements from the pump-probe measurement, as commonly applied, results in a degradation of the signal-to-noise ratio and, in the case of coincidence detection, the danger of overrated background subtraction. Coincidence measurements additionally suffer from false coincidences, requiring long data-acquisition times to keep erroneous signals at an acceptable level. Here we present a probabilistic approach based on Bayesian probability theory that overcomes these problems. For a pump-probe experiment with photoelectron-photoion coincidence detection, we reconstruct the interesting excited-state spectrum from pump-probe and pump-only measurements. This approach allows us to treat background and false coincidences consistently and on the same footing. We demonstrate that the Bayesian formalism has the following advantages over simple signal subtraction: (i) the signal-to-noise ratio is significantly increased, (ii) the pump-only contribution is not overestimated, (iii) false coincidences are excluded, (iv) prior knowledge, such as positivity, is consistently incorporated, (v) confidence intervals are provided for the reconstructed spectrum, and (vi) it is applicable to any experimental situation and noise statistics. Most importantly, by accounting for false coincidences, the Bayesian approach allows us to run experiments at higher ionization rates, resulting in a significant reduction of data acquisition times. The probabilistic approach is thoroughly scrutinized by challenging mock data. The application to pump-probe coincidence measurements on acetone molecules enables quantitative interpretations about the molecular decay dynamics and fragmentation behavior. All results underline the superiority of a consistent probabilistic approach over ad hoc estimations.
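
    The core idea of treating background and signal consistently can be sketched with a toy Poisson model: the pump-probe channel counts signal plus background, the pump-only channel counts background alone, and a grid posterior with a positivity constraint yields the signal rate with credible intervals. The counts and acquisition times below are hypothetical, and false coincidences are not modelled.

```python
import numpy as np
from scipy.stats import poisson

# Toy counts: pump-probe channel sees signal + background, pump-only sees background.
n_pp, n_po = 140, 90          # hypothetical detected counts
t_pp, t_po = 1.0, 1.0         # relative acquisition times

# Grid posterior over (signal rate s, background rate b) with flat positive priors.
s = np.linspace(0, 200, 401)[:, None]
b = np.linspace(0, 200, 401)[None, :]
log_post = poisson.logpmf(n_pp, t_pp * (s + b)) + poisson.logpmf(n_po, t_po * b)
post = np.exp(log_post - log_post.max())
post /= post.sum()

s_marg = post.sum(axis=1)                      # marginal posterior of the signal rate
s_mean = (s.ravel() * s_marg).sum()
cdf = np.cumsum(s_marg)
lo, hi = s.ravel()[np.searchsorted(cdf, [0.025, 0.975])]
print(f"signal rate ~ {s_mean:.1f}  (95% credible interval {lo:.1f}-{hi:.1f})")
```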

  2. Hood of the truck statistics for food animal practitioners.

    PubMed

    Slenning, Barrett D

    2006-03-01

    This article offers some tips on working with statistics and develops four relatively simple procedures to deal with most kinds of data with which veterinarians work. The criterion for a procedure to be a "Hood of the Truck Statistics" (HOT Stats) technique is that it must be simple enough to be done with pencil, paper, and a calculator. The goal of HOT Stats is to have the tools available to run quick analyses in only a few minutes so that decisions can be made in a timely fashion. The discipline allows us to move away from the all-too-common guesswork about effects and differences we perceive following a change in treatment or management. The techniques allow us to move toward making more defensible, credible, and more quantifiably "risk-aware" real-time recommendations to our clients.

  3. Multiple-labelling immunoEM using different sizes of colloidal gold: alternative approaches to test for differential distribution and colocalization in subcellular structures.

    PubMed

    Mayhew, Terry M; Lucocq, John M

    2011-03-01

    Various methods for quantifying cellular immunogold labelling on transmission electron microscope thin sections are currently available. All rely on sound random sampling principles and are applicable to single immunolabelling across compartments within a given cell type or between different experimental groups of cells. Although methods are also available to test for colocalization in double/triple immunogold labelling studies, so far, these have relied on making multiple measurements of gold particle densities in defined areas or of inter-particle nearest neighbour distances. Here, we present alternative two-step approaches to codistribution and colocalization assessment that merely require raw counts of gold particles in distinct cellular compartments. For assessing codistribution over aggregate compartments, initial statistical evaluation involves combining contingency table and chi-squared analyses to provide predicted gold particle distributions. The observed and predicted distributions allow testing of the appropriate null hypothesis, namely, that there is no difference in the distribution patterns of proteins labelled by different sizes of gold particle. In short, the null hypothesis is that of colocalization. The approach for assessing colabelling recognises that, on thin sections, a compartment is made up of a set of sectional images (profiles) of cognate structures. The approach involves identifying two groups of compartmental profiles that are unlabelled and labelled for one gold marker size. The proportions in each group that are also labelled for the second gold marker size are then compared. Statistical analysis now uses a 2 × 2 contingency table combined with the Fisher exact probability test. Having identified double labelling, the profiles can be analysed further in order to identify characteristic features that might account for the double labelling. In each case, the approach is illustrated using synthetic and/or experimental datasets and can be refined to correct observed labelling patterns to specific labelling patterns. These simple and efficient approaches should be of more immediate utility to those interested in codistribution and colocalization in multiple immunogold labelling investigations.
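
    The two statistical steps described above map directly onto standard contingency-table tests. The sketch below, with hypothetical gold-particle counts, shows the chi-squared comparison of distribution patterns and the 2 x 2 Fisher exact test for colabelling of compartmental profiles.

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# Step 1 (codistribution): gold-particle counts per compartment for two marker sizes
# (rows = marker size, columns = compartments; counts are hypothetical).
counts = np.array([[120, 45, 30],
                   [100, 60, 25]])
chi2, p, dof, expected = chi2_contingency(counts)
print(f"codistribution: chi2={chi2:.2f}, p={p:.3f}")   # H0: same distribution pattern

# Step 2 (colabelling): among profiles labelled/unlabelled for marker 1,
# how many are also labelled for marker 2?  (2 x 2 table, hypothetical counts)
table = np.array([[18, 42],    # marker-1 labelled:   marker-2 labelled / not
                  [ 7, 63]])   # marker-1 unlabelled: marker-2 labelled / not
odds, p2 = fisher_exact(table)
print(f"colabelling: odds ratio={odds:.2f}, Fisher exact p={p2:.3f}")
```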

  4. Proceedings, Seminar on Probabilistic Methods in Geotechnical Engineering

    NASA Astrophysics Data System (ADS)

    Hynes-Griffin, M. E.; Buege, L. L.

    1983-09-01

    Contents: Applications of Probabilistic Methods in Geotechnical Engineering; Probabilistic Seismic and Geotechnical Evaluation at a Dam Site; Probabilistic Slope Stability Methodology; Probability of Liquefaction in a 3-D Soil Deposit; Probabilistic Design of Flood Levees; Probabilistic and Statistical Methods for Determining Rock Mass Deformability Beneath Foundations: An Overview; Simple Statistical Methodology for Evaluating Rock Mechanics Exploration Data; New Developments in Statistical Techniques for Analyzing Rock Slope Stability.

  5. An instructive model of entropy

    NASA Astrophysics Data System (ADS)

    Zimmerman, Seth

    2010-09-01

    This article first notes the misinterpretation of a common thought experiment, and the misleading comment that 'systems tend to flow from less probable to more probable macrostates'. It analyses the experiment, generalizes it and introduces a new tool of investigation, the simplectic structure. A time-symmetric model is built upon this structure, yielding several non-intuitive results. The approach is combinatorial rather than statistical, and assumes that entropy is equivalent to 'missing information'. The intention of this article is not only to present interesting results, but also, by deliberately starting with a simple example and developing it through proof and computer simulation, to clarify the often confusing subject of entropy. The article should be particularly stimulating to students and instructors of discrete mathematics or undergraduate physics.

  6. Proposal for a new protein therapeutic immunogenicity titer assay cutpoint.

    PubMed

    Wakshull, Eric; Hendricks, Robert; Amaya, Caroline; Coleman, Daniel

    2011-12-01

    Generally, immunogenicity assessment strategies follow this assay triage schema: screen→confirm→titer. Each requires the determination of a threshold value (cutpoint) for decision making. No guidance documents exist for the determination of a specific titration assay cutpoint. The default practice is to use the screening assay cutpoint, frequently leading to controls or samples not reaching this cutpoint. We propose a method for determination of a titration cutpoint based upon the variance of the negative-control sample. Positive-control samples that did not cross a screening cutpoint did cross the titer cutpoint, albeit generating slightly lower titer values. Our approach is consistent with the statistical methods currently recommended for the screening and confirmatory assay cutpoints and is operationally simple and efficient.
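
    As a rough illustration of a variance-based cutpoint, the sketch below uses one common construction (mean plus a normal quantile times the standard deviation of log-transformed negative-control readings). This specific formula is an assumption for illustration, not necessarily the authors' proposed procedure, and the readings are hypothetical.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical negative-control signal readings (log-transformed).
neg = np.log10(np.array([102, 95, 110, 98, 105, 99, 101, 97, 108, 103]))

# One common construction (an assumption here, not necessarily the authors' exact
# formula): cutpoint = mean + z_{0.99} * SD of the negative-control distribution.
cutpoint = neg.mean() + norm.ppf(0.99) * neg.std(ddof=1)
print(f"titer cutpoint (log10 signal): {cutpoint:.3f}")
```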

  7. Simplicity and complexity

    NASA Astrophysics Data System (ADS)

    Crutchfield, James; Wiesner, Karoline

    2010-02-01

    Is anything ever simple? When confronted with a complicated system, scientists typically strive to identify underlying simplicity, which we articulate as natural laws and fundamental principles. This simplicity is what makes nature appear so organized. Atomic physics, for example, approached a solid theoretical foundation when Niels Bohr uncovered the organization of electronic energy levels, which only later were redescribed as quantum wavefunctions. Charles Darwin's revolutionary idea about the "origin" of species emerged by mapping how species are organized and discovering why they came to be that way. And James Watson and Francis Crick's interpretation of DNA diffraction spectra was a discovery of the structural organization of genetic information - it was neither about the molecule's disorder (thermodynamic entropy) nor about the statistical randomness of its base-pair sequences.

  8. Uncertainties of flood frequency estimation approaches based on continuous simulation using data resampling

    NASA Astrophysics Data System (ADS)

    Arnaud, Patrick; Cantet, Philippe; Odry, Jean

    2017-11-01

    Flood frequency analyses (FFAs) are needed for flood risk management. Many methods exist, ranging from classical purely statistical approaches to more complex approaches based on process simulation. The results of these methods are associated with uncertainties that are sometimes difficult to estimate due to the complexity of the approaches or the number of parameters, especially for process simulation. This is the case for the simulation-based FFA approach called SHYREG presented in this paper, in which a rainfall generator is coupled with a simple rainfall-runoff model; here we attempt to estimate the uncertainties arising from the estimation of the seven parameters needed to estimate flood frequencies. The six parameters of the rainfall generator are mean values, so their theoretical distribution is known and can be used to estimate the generator uncertainties. In contrast, the theoretical distribution of the single hydrological model parameter is unknown; consequently, a bootstrap method is applied to estimate the calibration uncertainties. The propagation of uncertainty from the rainfall generator to the hydrological model is also taken into account. This method is applied to 1112 basins throughout France. Uncertainties coming from the SHYREG method and from purely statistical approaches are compared, and the results are discussed according to the length of the recorded observations, basin size and basin location. Uncertainties of the SHYREG method decrease as the basin size increases or as the length of the recorded flow increases. Moreover, the results show that the confidence intervals of the SHYREG method are relatively small despite the complexity of the method and the number of parameters (seven). This is due to the stability of the parameters and to the fact that the method accounts for the dependence between the uncertainties of the rainfall model and of the hydrological calibration. Indeed, the uncertainties on the flow quantiles are of the same order of magnitude as those associated with the use of a statistical law with two parameters (here the generalised extreme value Type I distribution) and clearly lower than those associated with the use of a three-parameter law (here the generalised extreme value Type II distribution). For extreme flood quantiles, the uncertainties are mostly due to the rainfall generator because of the progressive saturation of the hydrological model.
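
    The bootstrap step used for the hydrological model parameter can be illustrated in isolation: resample the calibration data with replacement, refit, and read off the spread of the resulting flood quantile. The sketch below applies this to a hypothetical annual-maximum series with a Gumbel (GEV Type I) fit; it is not the SHYREG procedure itself.

```python
import numpy as np

rng = np.random.default_rng(2)
annual_max_flow = rng.gumbel(loc=300, scale=120, size=40)   # hypothetical record (m3/s)

def q100(sample):
    """100-year quantile from a Gumbel (GEV Type I) fit by the method of moments."""
    scale = np.sqrt(6) * sample.std(ddof=1) / np.pi
    loc = sample.mean() - 0.5772 * scale
    return loc - scale * np.log(-np.log(1 - 1 / 100.0))

# Bootstrap the calibration data to propagate sampling uncertainty to the quantile.
boot = np.array([q100(rng.choice(annual_max_flow, annual_max_flow.size, replace=True))
                 for _ in range(2000)])
lo, hi = np.percentile(boot, [5, 95])
print(f"Q100 ~ {q100(annual_max_flow):.0f} m3/s, 90% bootstrap interval [{lo:.0f}, {hi:.0f}]")
```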

  9. More Powerful Tests of Simple Interaction Contrasts in the Two-Way Factorial Design

    ERIC Educational Resources Information Center

    Hancock, Gregory R.; McNeish, Daniel M.

    2017-01-01

    For the two-way factorial design in analysis of variance, the current article explicates and compares three methods for controlling the Type I error rate for all possible simple interaction contrasts following a statistically significant interaction, including a proposed modification to the Bonferroni procedure that increases the power of…

  10. Quantitation & Case-Study-Driven Inquiry to Enhance Yeast Fermentation Studies

    ERIC Educational Resources Information Center

    Grammer, Robert T.

    2012-01-01

    We propose a procedure for the assay of fermentation in yeast in microcentrifuge tubes that is simple and rapid, permitting assay replicates, descriptive statistics, and the preparation of line graphs that indicate reproducibility. Using regression and simple derivatives to determine initial velocities, we suggest methods to compare the effects of…

  11. A Simple Statistical Thermodynamics Experiment

    ERIC Educational Resources Information Center

    LoPresto, Michael C.

    2010-01-01

    Comparing the predicted and actual rolls of combinations of both two and three dice can help to introduce many of the basic concepts of statistical thermodynamics, including multiplicity, probability, microstates, and macrostates, and demonstrate that entropy is indeed a measure of randomness, that disordered states (those of higher entropy) are…
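
    The dice experiment described above can be checked against exact enumeration. A short sketch counting microstates and macrostates for two and three dice:

```python
from collections import Counter
from itertools import product

# Enumerate microstates (ordered die faces) and group them into macrostates (sums).
for n_dice in (2, 3):
    micro = list(product(range(1, 7), repeat=n_dice))
    multiplicity = Counter(sum(m) for m in micro)
    total = len(micro)
    print(f"{n_dice} dice: {total} microstates")
    for s, w in sorted(multiplicity.items()):
        print(f"  sum {s:2d}: multiplicity {w:3d}, probability {w / total:.3f}")
```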

  12. Standard Entropy of Crystalline Iodine from Vapor Pressure Measurements: A Physical Chemistry Experiment.

    ERIC Educational Resources Information Center

    Harris, Ronald M.

    1978-01-01

    Presents material dealing with an application of statistical thermodynamics to the diatomic solid I-2(s). The objective is to enhance the student's appreciation of the power of the statistical formulation of thermodynamics. The Simple Einstein Model is used. (Author/MA)
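
    For reference, the molar entropy of the simple Einstein model mentioned above follows from a closed-form expression in x = theta_E/T. A minimal sketch, with a hypothetical Einstein temperature (not the fitted value for iodine):

```python
import numpy as np

R = 8.314          # gas constant, J / (mol K)
T = 298.15         # temperature, K
theta_E = 60.0     # Einstein temperature, K -- a hypothetical value, not iodine's

x = theta_E / T
# Molar entropy of an Einstein solid (3 oscillators per atom):
# S = 3R [ x / (e^x - 1) - ln(1 - e^{-x}) ]
S = 3 * R * (x / np.expm1(x) - np.log1p(-np.exp(-x)))
print(f"Einstein-model molar entropy at {T} K: {S:.1f} J/(mol K)")
```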

  13. Estimating daily climatologies for climate indices derived from climate model data and observations

    PubMed Central

    Mahlstein, Irina; Spirig, Christoph; Liniger, Mark A; Appenzeller, Christof

    2015-01-01

    Climate indices help to describe the past, present, and future climate. They are usually more closely related to possible impacts and are therefore more illustrative to users than simple climate means. Indices are often based on daily data series and thresholds. It is shown that percentile-based thresholds are sensitive to the method of computation, and so are the climatological daily mean and the daily standard deviation, which are used for bias corrections of daily climate model data. Sample size issues of either the observed reference period or the model data lead to uncertainties in these estimations. A large number of past ensemble seasonal forecasts, called hindcasts, is used to explore these sampling uncertainties and to compare two different approaches. Based on a perfect model approach, it is shown that a fitting approach can substantially improve the estimates of daily climatologies of percentile-based thresholds over land areas, as well as the mean and the variability. These improvements are relevant for bias removal in long-range forecasts or predictions of climate indices based on percentile thresholds. The method also shows potential for use in climate change studies. Key points: more robust estimates of daily climate characteristics; statistical fitting approach; based on a perfect model approach. PMID:26042192
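
    The fitting approach can be contrasted with raw day-by-day means in a small sketch: regress the daily climatology on a few annual harmonics and compare with the noisy sample means. The temperature series below is synthetic and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
doy = np.arange(1, 366)
# Hypothetical 30-year daily temperature series: annual cycle plus noise.
years = 30
data = (10 + 8 * np.sin(2 * np.pi * (doy - 110) / 365.25)
        + rng.normal(scale=3, size=(years, doy.size)))

raw_clim = data.mean(axis=0)                      # noisy day-by-day sample mean

# Fitting approach: regress the daily means on the first two annual harmonics,
# which smooths the sampling noise of the short reference period.
t = 2 * np.pi * doy / 365.25
X = np.column_stack([np.ones_like(t), np.sin(t), np.cos(t), np.sin(2 * t), np.cos(2 * t)])
coef, *_ = np.linalg.lstsq(X, raw_clim, rcond=None)
fitted_clim = X @ coef

print("RMS of raw minus fitted climatology:",
      np.sqrt(np.mean((raw_clim - fitted_clim) ** 2)))
```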

  14. Correlational approach to study interactions between dust Brownian particles in a plasma

    NASA Astrophysics Data System (ADS)

    Lisin, E. A.; Vaulina, O. S.; Petrov, O. F.

    2018-01-01

    A general approach to the correlational analysis of Brownian motion of strongly coupled particles in open dissipative systems is described. This approach can be applied to the theoretical description of various non-ideal statistically equilibrium systems (including non-Hamiltonian systems), as well as to the analysis of experimental data. In this paper, we consider an application of the correlational approach to the problem of experimentally exploring the wake-mediated nonreciprocal interactions in complex plasmas. We derive simple analytic equations, which allow one to calculate the gradients of the forces acting on a microparticle due to each of the other particles, as well as the gradients of the external field, knowing only the time-averaged correlations of particle displacements and velocities. We show the importance of taking dissipative and random processes into account; without them, treating a system with a nonreciprocal interparticle interaction as linearly coupled oscillators leads to significant errors in determining the characteristic frequencies of the system. In numerical simulation examples, we demonstrate that the proposed approach can be an effective instrument for exploring the longitudinal wake structure of a microparticle in a plasma. Unlike previous attempts to study wake-mediated interactions in complex plasmas, our method does not require any external perturbations and is based on Brownian motion analysis only.

  15. Statistics of Epidemics in Networks by Passing Messages

    NASA Astrophysics Data System (ADS)

    Shrestha, Munik Kumar

    Epidemic processes are common out-of-equilibrium phenomena of broad interdisciplinary interest. In this thesis, we show how a message-passing approach can be a helpful tool for simulating epidemic models in disordered media like networks, and in particular for estimating the probability that a given node will become infectious at a particular time. The dynamics we consider are stochastic, with randomness arising either from the stochastic events themselves or from the randomness of the network structure. As in belief propagation, variables or messages in the message-passing approach are defined on the directed edges of a network. However, unlike belief propagation, where the posterior distributions are updated according to Bayes' rule, in the message-passing approach we write differential equations for the messages over time. It takes correlations between neighboring nodes into account while preventing causal signals from backtracking to their immediate source, and thus avoids "echo chamber effects" where a pair of adjacent nodes each amplify the probability that the other is infectious. In our first results, we develop a message-passing approach to threshold models of behavior popular in sociology. These are models, first proposed by Granovetter, where individuals have to hear about a trend or behavior from some number of neighbors before adopting it themselves. In the thermodynamic limit of large random networks, we provide an exact analytic scheme for calculating the time dependence of the probabilities, and thus the whole dynamics of bootstrap percolation, a simple model known in statistical physics for exhibiting a discontinuous phase transition. As an application, we apply a similar model to financial networks, studying when bankruptcies spread due to the sudden devaluation of shared assets in overlapping portfolios. We predict that although diversification may be good for individual institutions, it can create dangerous systemic effects, and as a result financial contagion gets worse with too much diversification. We also predict that the financial system exhibits "robust yet fragile" behavior, with regions of the parameter space where contagion is rare but catastrophic whenever it occurs. In further results, we develop a message-passing approach to recurrent-state epidemics like susceptible-infectious-susceptible and susceptible-infectious-recovered-susceptible, where nodes can return to previously inhabited states and multiple waves of infection can pass through the population. Given that message-passing has previously been applied exclusively to models with one-way state changes like susceptible-infectious and susceptible-infectious-recovered, we develop message-passing for recurrent epidemics based on a new class of differential equations and demonstrate that our approach is simple, efficiently approximates results obtained from Monte Carlo simulation, and is often more accurate than the pair approximation (which also takes second-order correlations into account).
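
    The key structural idea - messages on directed edges that exclude the receiving neighbor, preventing backtracking - can be illustrated with a deterministic threshold (bootstrap percolation) model on a small random graph. This sketch uses a fixed seed set and a hypothetical threshold rather than the ensemble differential equations developed in the thesis.

```python
import random

random.seed(4)

# Small random graph (hypothetical) and a Granovetter-style adoption threshold of 2.
n, p, theta = 200, 0.03, 2
neighbors = {i: set() for i in range(n)}
for i in range(n):
    for j in range(i + 1, n):
        if random.random() < p:
            neighbors[i].add(j); neighbors[j].add(i)

seeds = set(random.sample(range(n), 10))   # initially active nodes

# Messages live on directed edges: m[(i, j)] = "i is active, ignoring j's influence".
m = {(i, j): (i in seeds) for i in neighbors for j in neighbors[i]}
for _ in range(50):                        # iterate to a fixed point
    new = {}
    for (i, j) in m:
        # i activates if seeded, or if >= theta neighbors other than j are active.
        support = sum(m[(k, i)] for k in neighbors[i] if k != j)
        new[(i, j)] = (i in seeds) or (support >= theta)
    if new == m:
        break
    m = new

# Node marginals: active if seeded or enough incoming active messages.
active = [i for i in neighbors
          if i in seeds or sum(m[(k, i)] for k in neighbors[i]) >= theta]
print("final active fraction:", len(active) / n)
```
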

  16. Statistical issues in the design and planning of proteomic profiling experiments.

    PubMed

    Cairns, David A

    2015-01-01

    The statistical design of a clinical proteomics experiment is a critical part of a well-undertaken investigation. Standard concepts from experimental design such as randomization, replication and blocking should be applied in all experiments, and this is possible when the experimental conditions are well understood by the investigator. The large number of proteins simultaneously considered in proteomic discovery experiments means that determining the number of replicates required for a powerful experiment is more complicated than in simple experiments. However, by using information about the nature of an experiment and making simple assumptions, this is achievable for a variety of experiments useful for biomarker discovery and initial validation.
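
    A minimal sketch of a replicate-number calculation under simple assumptions (a two-sample comparison per protein, a common effect size, and a Bonferroni-adjusted significance level); the planning numbers are hypothetical.

```python
from statsmodels.stats.power import TTestIndPower

# Hypothetical planning numbers: effect size 0.8 SD, 1000 proteins tested,
# Bonferroni-adjusted alpha, 80% power per protein.
n_proteins = 1000
alpha = 0.05 / n_proteins
n_per_group = TTestIndPower().solve_power(effect_size=0.8, alpha=alpha,
                                          power=0.8, alternative='two-sided')
print(f"replicates needed per group: {n_per_group:.1f}")
```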

  17. A simple statistical model for geomagnetic reversals

    NASA Technical Reports Server (NTRS)

    Constable, Catherine

    1990-01-01

    The diversity of paleomagnetic records of geomagnetic reversals now available indicates that the field configuration during transitions cannot be adequately described by simple zonal or standing field models. A new model described here is based on statistical properties inferred from the present field and is capable of simulating field transitions like those observed. Some insight is obtained into what one can hope to learn from paleomagnetic records. In particular, it is crucial that the effects of smoothing in the remanence acquisition process be separated from true geomagnetic field behavior. This might enable us to determine the time constants associated with the dominant field configuration during a reversal.

  18. Statistical Properties of Online Auctions

    NASA Astrophysics Data System (ADS)

    Namazi, Alireza; Schadschneider, Andreas

    We characterize the statistical properties of a large number of online auctions run on eBay. Both stationary and dynamic properties, like distributions of prices, number of bids etc., as well as relations between these quantities are studied. The analysis of the data reveals surprisingly simple distributions and relations, typically of power-law form. Based on these findings we introduce a simple method to identify suspicious auctions that could be influenced by a form of fraud known as shill bidding. Furthermore the influence of bidding strategies is discussed. The results indicate that the observed behavior is related to a mixture of agents using a variety of strategies.
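
    The power-law fits mentioned above are typically done by maximum likelihood. A minimal sketch with a synthetic, hypothetical price sample and the standard Hill-type estimator of the exponent:

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical final-price sample drawn from a Pareto (power-law) distribution.
x_min, alpha_true = 1.0, 2.5
prices = x_min * (1 - rng.random(5000)) ** (-1 / (alpha_true - 1))

# Maximum-likelihood (Hill) estimate of the power-law exponent above x_min.
tail = prices[prices >= x_min]
alpha_hat = 1 + tail.size / np.sum(np.log(tail / x_min))
print(f"estimated exponent: {alpha_hat:.2f} (true value {alpha_true})")
```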

  19. A Simple Graphical Method for Quantification of Disaster Management Surge Capacity Using Computer Simulation and Process-control Tools.

    PubMed

    Franc, Jeffrey Michael; Ingrassia, Pier Luigi; Verde, Manuela; Colombo, Davide; Della Corte, Francesco

    2015-02-01

    Surge capacity, or the ability to manage an extraordinary volume of patients, is fundamental for hospital management of mass-casualty incidents. However, quantification of surge capacity is difficult and no universal standard for its measurement has emerged, nor has a standardized statistical method been advocated. As mass-casualty incidents are rare, simulation may represent a viable alternative to measure surge capacity. Hypothesis/Problem: The objective of the current study was to develop a statistical method for the quantification of surge capacity using a combination of computer simulation and simple process-control statistical tools. Length-of-stay (LOS) and patient volume (PV) were used as metrics. The use of this method was then demonstrated on a subsequent computer simulation of an emergency department (ED) response to a mass-casualty incident. In the derivation phase, 357 participants in five countries performed 62 computer simulations of an ED response to a mass-casualty incident. Benchmarks for ED response were derived from these simulations, including LOS and PV metrics for triage, bed assignment, physician assessment, and disposition. In the application phase, 13 students of the European Master in Disaster Medicine (EMDM) program completed the same simulation scenario, and the results were compared to the standards obtained in the derivation phase. Patient-volume metrics included the number of patients to be triaged, assigned to rooms, assessed by a physician, and disposed. Length-of-stay metrics included median time to triage, room assignment, physician assessment, and disposition. Simple graphical methods were used to compare the application phase group to the derived benchmarks using process-control statistical tools. The group in the application phase failed to meet the indicated standard for LOS from admission to disposition decision. This study demonstrates how simulation software can be used to derive values for objective benchmarks of ED surge capacity using PV and LOS metrics. These objective metrics can then be applied to other simulation groups using simple graphical process-control tools to provide a numeric measure of surge capacity. Repeated use in simulations of actual EDs may represent a potential means of objectively quantifying disaster management surge capacity. It is hoped that the described statistical method, which is simple and reusable, will be useful for investigators in this field to apply to their own research.

  20. Recruitment of prefrontal-striatal circuit in response to skilled motor challenge.

    PubMed

    Guo, Yumei; Wang, Zhuo; Prathap, Sandhya; Holschneider, Daniel P

    2017-12-13

    A variety of physical fitness regimens have been shown to improve cognition, including executive function, yet our understanding of which parameters of motor training are important in optimizing outcomes remains limited. We used functional brain mapping to compare the ability of two motor challenges to acutely recruit the prefrontal-striatal circuit. The two motor tasks - walking in a complex running wheel with irregularly spaced rungs or walking in a running wheel with a smooth internal surface - differed only in the extent of skill required for their execution. Cerebral perfusion was mapped in rats by intravenous injection of [14C]-iodoantipyrine during walking in either a motorized complex wheel or a simple wheel. Regional cerebral blood flow (rCBF) was quantified by whole-brain autoradiography and analyzed in three-dimensional reconstructed brains by statistical parametric mapping and seed-based functional connectivity. Skilled or simple walking, compared with rest, increased rCBF in regions of the motor circuit, somatosensory and visual cortex, as well as the hippocampus. Significantly greater rCBF increases were noted during skilled walking than during simple walking. Skilled walking, unlike simple walking or the resting condition, was associated with significant positive functional connectivity in the prefrontal-striatal circuit (prelimbic cortex-dorsomedial striatum) and greater negative functional connectivity in the prefrontal-hippocampal circuit. Our findings suggest that the level of skill of a motor training task determines the extent of functional recruitment of the prefrontal-corticostriatal circuit, with implications for a new approach in neurorehabilitation that uses circuit-specific neuroplasticity to improve motor and cognitive functions.

  1. Biofilm development in fixed bed biofilm reactors: experiments and simple models for engineering design purposes.

    PubMed

    Szilágyi, N; Kovács, R; Kenyeres, I; Csikor, Zs

    2013-01-01

    Biofilm development in a fixed bed biofilm reactor system performing municipal wastewater treatment was monitored with the aim of accumulating colonization and maximum biofilm mass data usable for process design in engineering practice. A six-month experimental period was selected, during which biofilm formation and the performance of the reactors were monitored. The results were analyzed by two methods: for simple, steady-state process design purposes, the maximum biofilm mass on carriers versus influent load and a time constant of the biofilm growth were determined, whereas for design approaches using dynamic models, a simple biofilm mass prediction model including attachment and detachment mechanisms was selected and fitted to the experimental data. According to a detailed statistical analysis, the collected data did not allow us to determine both the time constant of biofilm growth and the maximum biofilm mass on carriers at the same time. The observed maximum biofilm mass could be determined with a reasonable error and ranged between 438 gTS/m2 and 843 gTS/m2 of carrier surface, depending on influent load and hydrodynamic conditions. The parallel analysis of the attachment-detachment model showed that the experimental data set allowed us to determine the attachment rate coefficient, which was in the range of 0.05-0.4 m d(-1), depending on influent load and hydrodynamic conditions.
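
    The attachment-detachment fit can be sketched with a simple saturating model, dM/dt = k (M_max - M), fitted by nonlinear least squares; the observations below are hypothetical and the model form is a simplified stand-in for the one used in the study.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical biofilm mass observations (gTS/m2 of carrier) over time (days).
t = np.array([0, 7, 14, 28, 42, 60, 90, 120, 150, 180])
m = np.array([0, 60, 140, 290, 380, 470, 540, 570, 590, 600])

def biofilm(t, m_max, k):
    """Saturating growth from a first-order attachment-detachment balance:
    dM/dt = k * (M_max - M)  =>  M(t) = M_max * (1 - exp(-k * t))."""
    return m_max * (1 - np.exp(-k * t))

(m_max, k), cov = curve_fit(biofilm, t, m, p0=(600, 0.02))
print(f"M_max ~ {m_max:.0f} gTS/m2, growth time constant ~ {1 / k:.0f} days")
```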

  2. Airtightness the simple(CS) way

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Andrews, S.

    Builders who might buck against such time-consuming air sealing methods as polyethylene wrap and the airtight drywall approach (ADA) may respond better to current strategies. One such method, called SimpleCS, has proven especially effective. SimpleCS, pronounced simplex, stands for simple caulk and seal. A modification of the ADA, SimpleCS is an air-sealing management tool, a simplified systems approach to building tight homes. The system addresses the crucial question of when and by whom various air sealing steps should be done. It avoids the problems that often occur when later contractors cut open polyethylene wrap to drill holes in the drywall. The author describes how SimpleCS works, and the cost and training involved.

  3. SU-F-J-217: Accurate Dose Volume Parameters Calculation for Revealing Rectum Dose-Toxicity Effect Using Deformable Registration in Cervical Cancer Brachytherapy: A Pilot Study

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhen, X; Chen, H; Liao, Y

    Purpose: To study the feasibility of employing deformable registration methods for accurate calculation of rectum dose volume parameters and their potential for revealing rectum dose-toxicity differences between complication and non-complication cervical cancer patients treated with brachytherapy. Method and Materials: Data from 60 patients treated with BT, including planning images, treatment plans, and follow-up clinical exams, were retrospectively collected. Among them, 12 patients who complained of hematochezia were further examined with colonoscopy and scored as Grade 1-3 complication (CP). Meanwhile, another 12 non-complication (NCP) patients were selected as a reference group. To seek potential gains in rectum toxicity prediction when fractional anatomical deformations are accounted for, the rectum dose volume parameters D0.1/1/2cc of the selected patients were retrospectively computed by three different approaches: the simple "worst-case scenario" (WS) addition method, an intensity-based deformable image registration (DIR) algorithm (Demons), and a more accurate, recently developed local topology preserving non-rigid point matching algorithm (TOP). Statistical significance of the differences between the rectum doses of the CP group and the NCP group was tested by a two-tailed t-test, and results were considered statistically significant if p < 0.05. Results: For the D0.1cc, no statistical differences are found between the CP and NCP groups for any of the three methods. For the D1cc, no dose difference is detected by the WS method; however, statistical differences between the two groups are observed with both Demons and TOP, and are more evident with TOP. For the D2cc, the difference between the CP and NCP groups is statistically significant for all three methods, but more pronounced with TOP. Conclusion: In this study, we calculated the rectum D0.1/1/2cc by the simple WS addition method and two DIR methods and sought gains in rectum toxicity prediction. The results favor the claim that accurate dose deformation and summation tend to be more sensitive in unveiling the dose-toxicity relationship. This work is supported in part by a grant from VARIAN MEDICAL SYSTEMS INC, the National Natural Science Foundation of China (no 81428019 and no 81301940), the Guangdong Natural Science Foundation (2015A030313302) and the 2015 Pearl River S&T Nova Program of Guangzhou (201506010096).
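
    The group comparison itself reduces to a two-tailed two-sample t-test on the dose volume parameter of interest. A minimal sketch with hypothetical D2cc values for the CP and NCP groups:

```python
import numpy as np
from scipy.stats import ttest_ind

# Hypothetical rectum D2cc values (Gy) for complication and non-complication groups.
d2cc_cp  = np.array([78.1, 81.4, 76.9, 83.2, 79.5, 80.8, 77.6, 82.0, 84.1, 78.9, 80.2, 81.7])
d2cc_ncp = np.array([71.3, 74.8, 69.9, 73.5, 72.1, 70.6, 75.2, 72.9, 71.8, 74.0, 73.3, 70.2])

t, p = ttest_ind(d2cc_cp, d2cc_ncp)        # two-tailed two-sample t-test
print(f"t = {t:.2f}, p = {p:.4f}  "
      f"({'significant' if p < 0.05 else 'not significant'} at the 0.05 level)")
```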

  4. A joint source-channel distortion model for JPEG compressed images.

    PubMed

    Sabir, Muhammad F; Sheikh, Hamid Rahim; Heath, Robert W; Bovik, Alan C

    2006-06-01

    The need for efficient joint source-channel coding (JSCC) is growing as new multimedia services are introduced in commercial wireless communication systems. An important component of practical JSCC schemes is a distortion model that can predict the quality of compressed digital multimedia such as images and videos. The usual approach in the JSCC literature for quantifying the distortion due to quantization and channel errors is to estimate it for each image using the statistics of the image for a given signal-to-noise ratio (SNR). This is not an efficient approach in the design of real-time systems because of the computational complexity. A more useful and practical approach would be to design JSCC techniques that minimize average distortion for a large set of images based on some distortion model rather than carrying out per-image optimizations. However, models for estimating average distortion due to quantization and channel bit errors in a combined fashion for a large set of images are not available for practical image or video coding standards employing entropy coding and differential coding. This paper presents a statistical model for estimating the distortion introduced in progressive JPEG compressed images due to quantization and channel bit errors in a joint manner. Statistical modeling of important compression techniques such as Huffman coding, differential pulse-coding modulation, and run-length coding are included in the model. Examples show that the distortion in terms of peak signal-to-noise ratio (PSNR) can be predicted within a 2-dB maximum error over a variety of compression ratios and bit-error rates. To illustrate the utility of the proposed model, we present an unequal power allocation scheme as a simple application of our model. Results show that it gives a PSNR gain of around 6.5 dB at low SNRs, as compared to equal power allocation.

  5. Statistical analysis for improving data precision in the SPME GC-MS analysis of blackberry (Rubus ulmifolius Schott) volatiles.

    PubMed

    D'Agostino, M F; Sanz, J; Martínez-Castro, I; Giuffrè, A M; Sicari, V; Soria, A C

    2014-07-01

    Statistical analysis has been used for the first time to evaluate the dispersion of quantitative data in the solid-phase microextraction (SPME) followed by gas chromatography-mass spectrometry (GC-MS) analysis of blackberry (Rubus ulmifolius Schott) volatiles, with the aim of improving their precision. Experimental and randomly simulated data were compared using different statistical parameters (correlation coefficients, Principal Component Analysis loadings and eigenvalues). Non-random factors were shown to contribute significantly to total dispersion; groups of volatile compounds could be associated with these factors. A significant improvement in precision was achieved when considering percent concentration ratios, rather than percent values, among those blackberry volatiles with a similar dispersion behavior. As a novelty over previous studies, and to complement this main objective, the presence of non-random dispersion trends in data from simple blackberry model systems was demonstrated. Although the influence of the type of matrix on data precision was shown, the model systems did not provide a better understanding of the dispersion patterns in real samples. The approach used here was validated for the first time through the multicomponent characterization of Italian blackberries from different harvest years.

  6. Data series embedding and scale invariant statistics.

    PubMed

    Michieli, I; Medved, B; Ristov, S

    2010-06-01

    Data sequences acquired from bio-systems such as human gait data, heart rate interbeat data, or DNA sequences exhibit complex dynamics that is frequently described by a long-memory or power-law decay of the autocorrelation function. One way of characterizing that dynamics is through scale invariant statistics or "fractal-like" behavior. Several methods have been proposed for quantifying scale invariant parameters of physiological signals. Among them the most common are detrended fluctuation analysis, sample mean variance analyses, power spectral density analysis, R/S analysis, and recently, in the realm of the multifractal approach, wavelet analysis. In this paper it is demonstrated that embedding the time series data in a high-dimensional pseudo-phase space reveals scale invariant statistics in a simple fashion. The procedure is applied to different stride interval data sets from human gait measurement time series (PhysioBank data library). Results show that the introduced mapping adequately separates long-memory from random behavior. Smaller gait data sets were analyzed and scale-free trends over limited scale intervals were successfully detected. The method was verified on artificially produced time series with known scaling behavior and with varying content of noise. The possibility for the method to falsely detect long-range dependence in artificially generated short-range-dependence series was investigated.

  7. Statistical image reconstruction from correlated data with applications to PET

    PubMed Central

    Alessio, Adam; Sauer, Ken; Kinahan, Paul

    2008-01-01

    Most statistical reconstruction methods for emission tomography are designed for data modeled as conditionally independent Poisson variates. In reality, due to scanner detectors, electronics and data processing, correlations are introduced into the data resulting in dependent variates. In general, these correlations are ignored because they are difficult to measure and lead to computationally challenging statistical reconstruction algorithms. This work addresses the second concern, seeking to simplify the reconstruction of correlated data and provide a more precise image estimate than the conventional independent methods. In general, correlated variates have a large non-diagonal covariance matrix that is computationally challenging to use as a weighting term in a reconstruction algorithm. This work proposes two methods to simplify the use of a non-diagonal covariance matrix as the weighting term by (a) limiting the number of dimensions in which the correlations are modeled and (b) adopting flexible, yet computationally tractable, models for correlation structure. We apply and test these methods with simple simulated PET data and data processed with the Fourier rebinning algorithm which include the one-dimensional correlations in the axial direction and the two-dimensional correlations in the transaxial directions. The methods are incorporated into a penalized weighted least-squares 2D reconstruction and compared with a conventional maximum a posteriori approach. PMID:17921576

  8. Manufacturing Squares: An Integrative Statistical Process Control Exercise

    ERIC Educational Resources Information Center

    Coy, Steven P.

    2016-01-01

    In the exercise, students in a junior-level operations management class are asked to manufacture a simple product. Given product specifications, they must design a production process, create roles and design jobs for each team member, and develop a statistical process control plan that efficiently and effectively controls quality during…

  9. PEOPLE IN PHYSICS: Nobel prize winners in physics from 1901 to 1990: simple statistics for physics teachers

    NASA Astrophysics Data System (ADS)

    Zhang, Weijia; Fuller, Robert G.

    1998-05-01

    A demographic database for the 139 Nobel prize winners in physics from 1901 to 1990 has been created from a variety of sources. The results of our statistical study are discussed in the light of the implications for physics teaching.

  10. Teaching Statistics with Minitab II.

    ERIC Educational Resources Information Center

    Ryan, T. A., Jr.; And Others

    Minitab is a statistical computing system which uses simple language, produces clear output, and keeps track of bookkeeping automatically. Error checking with English diagnostics and inclusion of several default options help to facilitate use of the system by students. Minitab II is an improved and expanded version of the original Minitab which…

  11. Applying Descriptive Statistics to Teaching the Regional Classification of Climate.

    ERIC Educational Resources Information Center

    Lindquist, Peter S.; Hammel, Daniel J.

    1998-01-01

    Describes an exercise for college and high school students that relates descriptive statistics to the regional climatic classification. The exercise introduces students to simple calculations of central tendency and dispersion, the construction and interpretation of scatterplots, and the definition of climatic regions. Forces students to engage…

  12. School District Enrollment Projections: A Comparison of Three Methods.

    ERIC Educational Resources Information Center

    Pettibone, Timothy J.; Bushan, Latha

    This study assesses three methods of forecasting school enrollments: the cohort-survival method (grade progression), the statistical forecasting procedure developed by the Statistical Analysis System (SAS) Institute, and a simple ratio computation. The three methods were used to forecast school enrollments for kindergarten through grade 12 in a…

  13. Replica and extreme-value analysis of the Jarzynski free-energy estimator

    NASA Astrophysics Data System (ADS)

    Palassini, Matteo; Ritort, Felix

    2008-03-01

    We analyze the Jarzynski estimator of free-energy differences from nonequilibrium work measurements. By a simple mapping onto Derrida's Random Energy Model, we obtain a scaling limit for the expectation of the bias of the estimator. We then derive analytical approximations in three different regimes of the scaling parameter x = log(N)/W, where N is the number of measurements and W the mean dissipated work. Our approach is valid for a generic distribution of the dissipated work, and is based on a replica symmetry breaking scheme for x >> 1, the asymptotic theory of extreme value statistics for x << 1, and a direct approach for x near one. The combination of the three analytic approximations describes well Monte Carlo data for the expectation value of the estimator, for a wide range of values of N, from N=1 to large N, and for different work distributions. Based on these results, we introduce improved free-energy estimators and discuss the application to the analysis of experimental data.
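
    The bias of the Jarzynski estimator at finite N is easy to reproduce numerically. The sketch below uses a Gaussian work distribution (for which the exact free-energy difference is known) with hypothetical parameters and shows the bias shrinking as N grows:

```python
import numpy as np

rng = np.random.default_rng(7)
kT = 1.0
mu_W, sigma_W = 2.0, 1.0          # hypothetical Gaussian work distribution (units of kT)
# For Gaussian work, the exact free-energy difference is mu_W - sigma_W^2 / (2 kT).
dF_exact = mu_W - sigma_W ** 2 / (2 * kT)

for N in (1, 10, 100, 1000):
    W = rng.normal(mu_W, sigma_W, size=(1000, N))             # 1000 repeated estimates
    dF_est = -kT * np.log(np.mean(np.exp(-W / kT), axis=1))   # Jarzynski estimator
    print(f"N={N:5d}: mean bias = {dF_est.mean() - dF_exact:+.3f} kT")
```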

  14. Towards the estimation of effect measures in studies using respondent-driven sampling.

    PubMed

    Rotondi, Michael A

    2014-06-01

    Respondent-driven sampling (RDS) is an increasingly common sampling technique to recruit hidden populations. Statistical methods for RDS are not straightforward due to the correlation between individual outcomes and subject weighting; thus, analyses are typically limited to estimation of population proportions. This manuscript applies the method of variance estimates recovery (MOVER) to construct confidence intervals for effect measures such as risk difference (difference of proportions) or relative risk in studies using RDS. To illustrate the approach, MOVER is used to construct confidence intervals for differences in the prevalence of demographic characteristics between an RDS study and convenience study of injection drug users. MOVER is then applied to obtain a confidence interval for the relative risk between education levels and HIV seropositivity and current infection with syphilis, respectively. This approach provides a simple method to construct confidence intervals for effect measures in RDS studies. Since it only relies on a proportion and appropriate confidence limits, it can also be applied to previously published manuscripts.
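
    A minimal sketch of MOVER for a risk difference, assuming simple Wilson intervals for each proportion (in an actual RDS analysis, design-adjusted intervals would be substituted); the counts are hypothetical.

```python
import numpy as np
from statsmodels.stats.proportion import proportion_confint

# Hypothetical prevalence data from two groups (e.g., RDS vs convenience sample).
x1, n1 = 45, 150      # events / sample size, group 1
x2, n2 = 30, 160      # events / sample size, group 2
p1, p2 = x1 / n1, x2 / n2

# Wilson intervals for each proportion (weighted RDS intervals could be substituted).
l1, u1 = proportion_confint(x1, n1, alpha=0.05, method='wilson')
l2, u2 = proportion_confint(x2, n2, alpha=0.05, method='wilson')

# MOVER interval for the risk difference p1 - p2.
d = p1 - p2
L = d - np.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
U = d + np.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
print(f"risk difference {d:.3f}, 95% MOVER CI ({L:.3f}, {U:.3f})")
```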

  15. Competing risk models in reliability systems, an exponential distribution model with Bayesian analysis approach

    NASA Astrophysics Data System (ADS)

    Iskandar, I.

    2018-03-01

    The exponential distribution is the most widely used distribution in reliability analysis. It is well suited to representing the lifetimes of many cases and has a simple statistical form. Its defining characteristic is a constant hazard rate. The exponential distribution is a special case of the Weibull distribution. In this paper we introduce the basic notions that constitute an exponential competing risks model in reliability analysis using a Bayesian analysis approach and present the associated analytic methods. The cases are limited to models with independent causes of failure. A non-informative prior distribution is used in our analysis. The model describes the likelihood function, followed by a description of the posterior function and the point, interval, hazard function, and reliability estimates. The net probability of failure if only one specific risk is present, the crude probability of failure due to a specific risk in the presence of other causes, and partial crude probabilities are also included.

  16. Measuring multivariate association and beyond

    PubMed Central

    Josse, Julie; Holmes, Susan

    2017-01-01

    Simple correlation coefficients between two variables have been generalized to measure association between two matrices in many ways. Coefficients such as the RV coefficient, the distance covariance (dCov) coefficient and kernel-based coefficients are being used by different research communities. Scientists use these coefficients to test whether two random vectors are linked. Once such an association has been ascertained through testing, a next step, often ignored, is to explore and uncover the association's underlying patterns. This article provides a survey of various measures of dependence between random vectors and tests of independence and emphasizes the connections and differences between the various approaches. After providing definitions of the coefficients and associated tests, we present the recent improvements that enhance their statistical properties and ease of interpretation. We summarize multi-table approaches and provide scenarios where the indices can provide useful summaries of heterogeneous multi-block data. We illustrate these different strategies on several examples of real data and suggest directions for future research. PMID:29081877
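
    As an illustration of one of the surveyed coefficients, a short sketch computing the RV coefficient between two centred data matrices on synthetic, hypothetical data:

```python
import numpy as np

def rv_coefficient(X, Y):
    """RV coefficient between two column-centred data matrices with the same rows."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxy = Xc.T @ Yc
    Sxx = Xc.T @ Xc
    Syy = Yc.T @ Yc
    return np.trace(Sxy @ Sxy.T) / np.sqrt(np.trace(Sxx @ Sxx) * np.trace(Syy @ Syy))

rng = np.random.default_rng(8)
X = rng.normal(size=(50, 4))
Y = X @ rng.normal(size=(4, 3)) + 0.5 * rng.normal(size=(50, 3))   # linked to X
print(f"RV(X, Y)     = {rv_coefficient(X, Y):.3f}")
print(f"RV(X, noise) = {rv_coefficient(X, rng.normal(size=(50, 3))):.3f}")
```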

  17. A maximum entropy thermodynamics of small systems.

    PubMed

    Dixit, Purushottam D

    2013-05-14

    We present a maximum entropy approach to analyze the state space of a small system in contact with a large bath, e.g., a solvated macromolecular system. For the solute, the fluctuations around the mean values of observables are not negligible and the probability distribution P(r) of the state space depends on the intricate details of the interaction of the solute with the solvent. Here, we employ a superstatistical approach: P(r) is expressed as a marginal distribution summed over the variation in β, the inverse temperature of the solute. The joint distribution P(β, r) is estimated by maximizing its entropy. We also calculate the first order system-size corrections to the canonical ensemble description of the state space. We test the development on a simple harmonic oscillator interacting with two baths with very different chemical identities, viz., (a) Lennard-Jones particles and (b) water molecules. In both cases, our method captures the state space of the oscillator sufficiently well. Future directions and connections with traditional statistical mechanics are discussed.

  18. A Patch-Based Approach for the Segmentation of Pathologies: Application to Glioma Labelling.

    PubMed

    Cordier, Nicolas; Delingette, Herve; Ayache, Nicholas

    2016-04-01

    In this paper, we describe a novel and generic approach to address fully-automatic segmentation of brain tumors by using multi-atlas patch-based voting techniques. In addition to avoiding the local search window assumption, the conventional patch-based framework is enhanced through several simple procedures: an improvement of the training dataset in terms of both label purity and intensity statistics, augmented features to implicitly guide the nearest-neighbor-search, multi-scale patches, invariance to cube isometries, stratification of the votes with respect to cases and labels. A probabilistic model automatically delineates regions of interest enclosing high-probability tumor volumes, which allows the algorithm to achieve highly competitive running time despite minimal processing power and resources. This method was evaluated on Multimodal Brain Tumor Image Segmentation challenge datasets. State-of-the-art results are achieved, with a limited learning stage thus restricting the risk of overfit. Moreover, segmentation smoothness does not involve any post-processing.

  19. Appropriate Statistics for Determining Chance-Removed Interpractitioner Agreement.

    PubMed

    Popplewell, Michael; Reizes, John; Zaslawski, Chris

    2018-05-31

    Fleiss' Kappa (FK) has been commonly, but incorrectly, employed as the "standard" for evaluating chance-removed inter-rater agreement with ordinal data. This practice may lead to misleading conclusions in inter-rater agreement research. An example is presented that demonstrates the conditions where FK produces inappropriate results, compared with Gwet's AC2, which is proposed as a more appropriate statistic. A novel format for recording Chinese Medicine (CM) diagnoses, called the Diagnostic System of Oriental Medicine (DSOM), was used to record and compare patient diagnostic data; unlike the contemporary CM diagnostic format, it allows agreement by chance to be considered when evaluating patient data obtained with unrestricted diagnostic options available to diagnosticians. Five CM practitioners diagnosed 42 subjects drawn from an open population. Subjects' diagnoses were recorded using the DSOM format. All the available data were initially used to evaluate agreement. Then, the subjects were sorted into three groups to demonstrate the effects of differing data marginality on the calculated chance-removed agreement. Agreement between the practitioners for each subject was evaluated with linearly weighted simple agreement, FK and Gwet's AC2. In all cases, overall agreement was much lower with FK than with Gwet's AC2. Larger differences occurred when the data were more free-marginal. Inter-rater agreement determined with FK statistics is unlikely to be correct unless it can be shown that the data from which agreement is determined are, in fact, fixed-marginal. It follows that results obtained on agreement between practitioners with FK are probably incorrect. It is shown that inter-rater agreement evaluated with the AC2 statistic is an appropriate measure when fixed-marginal data are neither expected nor guaranteed. The AC2 statistic should be used as the standard statistical approach for determining agreement between practitioners.
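
    Fleiss' Kappa itself is readily computed with standard tools; the sketch below uses hypothetical ratings. Gwet's AC2 is not part of statsmodels, so the comparison made in the paper would need a dedicated package or a hand-rolled implementation.

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical ordinal scores (0-2) given by 5 practitioners to 10 subjects.
rng = np.random.default_rng(9)
ratings = rng.integers(0, 3, size=(10, 5))        # rows = subjects, cols = raters

table, _ = aggregate_raters(ratings)              # subjects x categories count table
print(f"Fleiss' kappa: {fleiss_kappa(table):.3f}")
# Gwet's AC2 is not available in statsmodels; a dedicated package or a
# hand-rolled implementation would be needed for the paper's comparison.
```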

  20. Statistical fluctuations in pedestrian evacuation times and the effect of social contagion

    NASA Astrophysics Data System (ADS)

    Nicolas, Alexandre; Bouzat, Sebastián; Kuperman, Marcelo N.

    2016-08-01

    Mathematical models of pedestrian evacuation and the associated simulation software have become essential tools for the assessment of the safety of public facilities and buildings. While a variety of models is now available, their calibration and test against empirical data are generally restricted to global averaged quantities; the statistics compiled from the time series of individual escapes ("microscopic" statistics) measured in recent experiments are thus overlooked. In the same spirit, much research has primarily focused on the average global evacuation time, whereas the whole distribution of evacuation times over some set of realizations should matter. In the present paper we propose and discuss the validity of a simple relation between this distribution and the microscopic statistics, which is theoretically valid in the absence of correlations. To this purpose, we develop a minimal cellular automaton, with features that afford a semiquantitative reproduction of the experimental microscopic statistics. We then introduce a process of social contagion of impatient behavior in the model and show that the simple relation under test may dramatically fail at high contagion strengths, the latter being responsible for the emergence of strong correlations in the system. We conclude with comments on the potential practical relevance for safety science of calculations based on microscopic statistics.
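
    A minimal Monte Carlo sketch (with an illustrative, hypothetical gap distribution) of the correlation-free relation under test: if successive escape-time gaps are independent, the total evacuation time is simply their N-fold sum, so its statistics follow directly from the microscopic gap statistics:

      import numpy as np

      rng = np.random.default_rng(1)
      n_peds, n_runs = 60, 5000
      # Hypothetical microscopic statistics: log-normal escape-time gaps (seconds)
      gaps = rng.lognormal(mean=0.0, sigma=0.6, size=(n_runs, n_peds))
      total_time = gaps.sum(axis=1)                 # evacuation time of each realization

      # No-correlation prediction: moments of the N-fold sum follow from the gap distribution
      print(total_time.mean(), n_peds * gaps.mean())
      print(total_time.var(), n_peds * gaps.var())  # breaks down once gaps become correlated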

  1. Learning predictive statistics from temporal sequences: Dynamics and strategies.

    PubMed

    Wang, Rui; Shen, Yuan; Tino, Peter; Welchman, Andrew E; Kourtzi, Zoe

    2017-10-01

    Human behavior is guided by our expectations about the future. Often, we make predictions by monitoring how event sequences unfold, even though such sequences may appear incomprehensible. Event structures in the natural environment typically vary in complexity, from simple repetition to complex probabilistic combinations. How do we learn these structures? Here we investigate the dynamics of structure learning by tracking human responses to temporal sequences that change in structure unbeknownst to the participants. Participants were asked to predict the upcoming item following a probabilistic sequence of symbols. Using a Markov process, we created a family of sequences, from simple frequency statistics (e.g., some symbols are more probable than others) to context-based statistics (e.g., symbol probability is contingent on preceding symbols). We demonstrate the dynamics with which individuals adapt to changes in the environment's statistics; that is, they extract the behaviorally relevant structures to make predictions about upcoming events. Further, we show that this structure learning relates to individual decision strategy; faster learning of complex structures relates to selection of the most probable outcome in a given context (maximizing) rather than matching of the exact sequence statistics. Our findings provide evidence for alternate routes to learning of behaviorally relevant statistics that facilitate our ability to predict future events in variable environments.
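
    A small Python sketch of the two response strategies contrasted here, applied to a synthetic first-order Markov sequence (the transition matrix and sequence length are illustrative, not the experimental stimuli):

      import numpy as np

      rng = np.random.default_rng(2)
      T = np.array([[0.8, 0.1, 0.1],        # context-based statistics: P(next | current)
                    [0.2, 0.7, 0.1],
                    [0.3, 0.3, 0.4]])
      seq = [0]
      for _ in range(5000):
          seq.append(rng.choice(3, p=T[seq[-1]]))
      seq = np.array(seq)
      ctx, nxt = seq[:-1], seq[1:]

      maximizing = T[ctx].argmax(axis=1)                           # always pick the most probable symbol
      matching = np.array([rng.choice(3, p=T[c]) for c in ctx])    # reproduce the sequence statistics
      print("maximizing accuracy:", (maximizing == nxt).mean())
      print("matching accuracy:  ", (matching == nxt).mean())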

  2. An integrated logit model for contamination event detection in water distribution systems.

    PubMed

    Housh, Mashor; Ostfeld, Avi

    2015-05-15

    The problem of contamination event detection in water distribution systems has become one of the most challenging research topics in water distribution systems analysis. Current attempts for event detection utilize a variety of approaches including statistical, heuristic, machine learning, and optimization methods. Several existing event detection systems share a common feature in which alarms are obtained separately for each of the water quality indicators. Unifying those single alarms from different indicators is usually performed by means of simple heuristics. A salient feature of the approach developed here is the use of a statistically oriented model for discrete choice prediction, estimated by the maximum likelihood method, for integrating the single alarms. The discrete choice model is jointly calibrated with the other components of the event detection system framework on a training data set using genetic algorithms. The process of fusing the individual indicators' probabilities, which is overlooked in many existing event detection system models, is confirmed to be a crucial part of the system and can be modelled with a discrete choice model to improve performance. The developed methodology is tested on real water quality data, showing improved performance in decreasing the number of false positive alarms and in its ability to detect events with higher probabilities, compared to previous studies. Copyright © 2015 Elsevier Ltd. All rights reserved.
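
    A minimal sketch of the fusion idea, not the authors' calibrated system: per-indicator alarm probabilities are combined through a logistic (discrete choice) model fitted by maximum likelihood. The indicator data below are simulated and scikit-learn is assumed available:

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(3)
      n = 2000
      event = rng.random(n) < 0.05                  # true contamination events
      # Hypothetical single-indicator alarm probabilities (e.g. chlorine, turbidity, pH)
      p_single = np.clip(0.1 + 0.6 * event[:, None] + 0.2 * rng.standard_normal((n, 3)), 0, 1)

      fusion = LogisticRegression().fit(p_single, event)           # integrated (discrete choice) model
      p_fused = fusion.predict_proba(p_single)[:, 1]
      print("false alarms, fused rule:      ", ((p_fused > 0.5) & ~event).sum())
      print("false alarms, any single alarm:", ((p_single > 0.5).any(axis=1) & ~event).sum())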

  3. The Taguchi methodology as a statistical tool for biotechnological applications: a critical appraisal.

    PubMed

    Rao, Ravella Sreenivas; Kumar, C Ganesh; Prakasham, R Shetty; Hobbs, Phil J

    2008-04-01

    Success in experiments and/or technology mainly depends on a properly designed process or product. The traditional method of process optimization involves the study of one variable at a time, which requires a number of combinations of experiments that are time, cost and labor intensive. The Taguchi method of design of experiments is a simple statistical tool involving a system of tabulated designs (arrays) that allows a maximum number of main effects to be estimated in an unbiased (orthogonal) fashion with a minimum number of experimental runs. It has been applied to predict the significant contribution of the design variable(s) and the optimum combination of each variable by conducting experiments on a real-time basis. The modeling that is performed essentially relates signal-to-noise ratio to the control variables in a 'main effect only' approach. This approach enables both multiple response and dynamic problems to be studied by handling noise factors. Taguchi principles and concepts have made extensive contributions to industry by bringing focused awareness to robustness, noise and quality. This methodology has been widely applied in many industrial sectors; however, its application in biological sciences has been limited. In the present review, the application and comparison of the Taguchi methodology has been emphasized with specific case studies in the field of biotechnology, particularly in diverse areas like fermentation, food processing, molecular biology, wastewater treatment and bioremediation.
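
    A minimal sketch of the core Taguchi calculation, assuming a standard L9(3^4) orthogonal array and a larger-the-better signal-to-noise ratio; the replicated responses are hypothetical:

      import numpy as np

      # One standard L9(3^4) orthogonal array: 9 runs, 4 factors at 3 levels
      L9 = np.array([[1, 1, 1, 1], [1, 2, 2, 2], [1, 3, 3, 3],
                     [2, 1, 2, 3], [2, 2, 3, 1], [2, 3, 1, 2],
                     [3, 1, 3, 2], [3, 2, 1, 3], [3, 3, 2, 1]])
      # Hypothetical replicated responses for each run (e.g. product yield)
      y = np.array([[12, 13], [15, 14], [11, 12], [18, 17], [16, 18],
                    [14, 13], [20, 19], [17, 18], [15, 16]])

      sn = -10 * np.log10(np.mean(1.0 / y**2, axis=1))             # larger-the-better S/N ratio
      for factor in range(4):
          effects = [sn[L9[:, factor] == level].mean() for level in (1, 2, 3)]
          print(f"factor {factor}: S/N by level {np.round(effects, 2)}, range {max(effects) - min(effects):.2f}")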

  4. Linking climate projections to performance: A yield-based decision scaling assessment of a large urban water resources system

    NASA Astrophysics Data System (ADS)

    Turner, Sean W. D.; Marlow, David; Ekström, Marie; Rhodes, Bruce G.; Kularathna, Udaya; Jeffrey, Paul J.

    2014-04-01

    Despite a decade of research into climate change impacts on water resources, the scientific community has delivered relatively few practical methodological developments for integrating uncertainty into water resources system design. This paper presents an application of the "decision scaling" methodology for assessing climate change impacts on water resources system performance and asks how such an approach might inform planning decisions. The decision scaling method reverses the conventional ethos of climate impact assessment by first establishing the climate conditions that would compel planners to intervene. Climate model projections are introduced at the end of the process to characterize climate risk in such a way that avoids the process of propagating those projections through hydrological models. Here we simulated 1000 multisite synthetic monthly streamflow traces in a model of the Melbourne bulk supply system to test the sensitivity of system performance to variations in streamflow statistics. An empirical relation was derived to convert decision-critical flow statistics to climatic units, against which 138 alternative climate projections were plotted and compared. We defined the decision threshold in terms of a system yield metric constrained by multiple performance criteria. Our approach allows for fast and simple incorporation of demand forecast uncertainty and demonstrates the reach of the decision scaling method through successful execution in a large and complex water resources system. Scope for wider application in urban water resources planning is discussed.

  5. Distinguishing transient signals and instrumental disturbances in semi-coherent searches for continuous gravitational waves with line-robust statistics

    NASA Astrophysics Data System (ADS)

    Keitel, David

    2016-05-01

    Non-axisymmetries in rotating neutron stars emit quasi-monochromatic gravitational waves. These long-duration ‘continuous wave’ signals are among the main search targets of ground-based interferometric detectors. However, standard detection methods are susceptible to false alarms from instrumental artefacts that resemble a continuous-wave signal. Past work [Keitel, Prix, Papa, Leaci and Siddiqi 2014, Phys. Rev. D 89 064023] showed that a Bayesian approach, based on an explicit model of persistent single-detector disturbances, improves robustness against such artefacts. Since many strong outliers in semi-coherent searches of LIGO data are caused by transient disturbances that last only a few hours or days, I describe in a recent paper [Keitel D 2015, LIGO-P1500159] how to extend this approach to cover transient disturbances, and demonstrate increased sensitivity in realistic simulated data. Additionally, neutron stars could emit transient signals which, for a limited time, also follow the continuous-wave signal model. As a pragmatic alternative to specialized transient searches, I demonstrate how to make standard semi-coherent continuous-wave searches more sensitive to transient signals. Focusing on the time-scale of a single segment in the semi-coherent search, Bayesian model selection yields a simple detection statistic without a significant increase in computational cost. This proceedings contribution gives a brief overview of both works.

  6. 3D statistical shape models incorporating 3D random forest regression voting for robust CT liver segmentation

    NASA Astrophysics Data System (ADS)

    Norajitra, Tobias; Meinzer, Hans-Peter; Maier-Hein, Klaus H.

    2015-03-01

    During image segmentation, 3D Statistical Shape Models (SSM) usually conduct a limited search for target landmarks within one-dimensional search profiles perpendicular to the model surface. In addition, landmark appearance is modeled only locally based on linear profiles and weak learners, altogether leading to segmentation errors from landmark ambiguities and limited search coverage. We present a new method for 3D SSM segmentation based on 3D Random Forest Regression Voting. For each surface landmark, a Random Regression Forest is trained that learns a 3D spatial displacement function between the according reference landmark and a set of surrounding sample points, based on an infinite set of non-local randomized 3D Haar-like features. Landmark search is then conducted omni-directionally within 3D search spaces, where voxelwise forest predictions on landmark position contribute to a common voting map which reflects the overall position estimate. Segmentation experiments were conducted on a set of 45 CT volumes of the human liver, of which 40 images were randomly chosen for training and 5 for testing. Without parameter optimization, using a simple candidate selection and a single resolution approach, excellent results were achieved, while faster convergence and better concavity segmentation were observed, altogether underlining the potential of our approach in terms of increased robustness from distinct landmark detection and from better search coverage.

  7. Demonstrating microbial co-occurrence pattern analyses within and between ecosystems

    PubMed Central

    Williams, Ryan J.; Howe, Adina; Hofmockel, Kirsten S.

    2014-01-01

    Co-occurrence patterns are used in ecology to explore interactions between organisms and environmental effects on coexistence within biological communities. Analysis of co-occurrence patterns among microbial communities has ranged from simple pairwise comparisons between all community members to direct hypothesis testing between focal species. However, co-occurrence patterns are rarely studied across multiple ecosystems or multiple scales of biological organization within the same study. Here we outline an approach to produce co-occurrence analyses that are focused at three different scales: co-occurrence patterns between ecosystems at the community scale, modules of co-occurring microorganisms within communities, and co-occurring pairs within modules that are nested within microbial communities. To demonstrate our co-occurrence analysis approach, we gathered publicly available 16S rRNA amplicon datasets to compare and contrast microbial co-occurrence at different taxonomic levels across different ecosystems. We found differences in community composition and co-occurrence that reflect environmental filtering at the community scale and consistent pairwise occurrences that may be used to infer ecological traits about poorly understood microbial taxa. However, we also found that conclusions derived from applying network statistics to microbial relationships can vary depending on the taxonomic level chosen and criteria used to build co-occurrence networks. We present our statistical analysis and code for public use in analysis of co-occurrence patterns across microbial communities. PMID:25101065
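
    A minimal sketch of one common way to build such a co-occurrence network, pairwise Spearman correlations across samples thresholded into an adjacency matrix; the abundance table is simulated and the thresholds are illustrative:

      import numpy as np
      from scipy.stats import spearmanr

      rng = np.random.default_rng(4)
      abundance = rng.poisson(5, size=(30, 12))     # 30 samples x 12 hypothetical taxa

      rho, pval = spearmanr(abundance)              # pairwise correlations between taxa (columns)
      adjacency = (np.abs(rho) > 0.6) & (pval < 0.01)
      np.fill_diagonal(adjacency, False)
      print("co-occurring partners per taxon:", adjacency.sum(axis=0))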

  8. A Statistical Test for Identifying the Number of Creep Regimes When Using the Wilshire Equations for Creep Property Predictions

    NASA Astrophysics Data System (ADS)

    Evans, Mark

    2016-12-01

    A new parametric approach, termed the Wilshire equations, offers the realistic potential of accurately predicting the long-term creep properties of materials operating at in-service conditions from accelerated test results lasting no more than 5000 hours. The success of this approach can be attributed to a well-defined linear relationship that appears to exist between various creep properties and a log transformation of the normalized stress. However, these linear trends are subject to discontinuities, the number of which appears to differ from material to material. These discontinuities have until now been (1) treated as abrupt in nature and (2) identified by eye from an inspection of simple graphical plots of the data. This article puts forward a statistical test for determining the correct number of discontinuities present within a creep data set and a method for allowing these discontinuities to occur more gradually, so that the methodology is more in line with the accepted view as to how creep mechanisms evolve with changing test conditions. These two developments are fully illustrated using creep data sets on two steel alloys. When these new procedures are applied to these steel alloys, not only do they produce more accurate and realistic-looking long-term predictions of the minimum creep rate, but they also lead to different conclusions about the mechanisms determining the rates of creep from those originally put forward by Wilshire.

  9. Statistical strategies for averaging EC50 from multiple dose-response experiments.

    PubMed

    Jiang, Xiaoqi; Kopp-Schneider, Annette

    2015-11-01

    In most dose-response studies, repeated experiments are conducted to determine the EC50 value for a chemical, requiring the averaging of EC50 estimates from a series of experiments. Two statistical strategies, the mixed-effects modeling and the meta-analysis approach, can be applied to estimate the average behavior of EC50 values over all experiments by considering the variabilities within and among experiments. We investigated these two strategies for two common cases of multiple dose-response experiments, in which (a) complete and explicit dose-response relationships are observed in all experiments or (b) they are observed only in a subset of experiments. In case (a), the meta-analysis strategy is a simple and robust method to average EC50 estimates. In case (b), all experimental data sets can first be screened using the dose-response screening plot, which allows visualization and comparison of multiple dose-response experimental results. As long as more than three experiments provide information about complete dose-response relationships, the experiments that cover incomplete relationships can be excluded from the meta-analysis strategy of averaging EC50 estimates. If there are only two experiments containing complete dose-response information, the mixed-effects model approach is suggested. We subsequently provide a web application for non-statisticians to implement the proposed meta-analysis strategy of averaging EC50 estimates from multiple dose-response experiments.
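
    A minimal sketch of the fixed-effect meta-analysis step, averaging per-experiment log10 EC50 estimates with inverse-variance weights; the estimates and standard errors below are hypothetical:

      import numpy as np

      # Hypothetical per-experiment estimates: log10(EC50) and their standard errors
      log_ec50 = np.array([-5.8, -6.1, -5.9, -6.0])
      se = np.array([0.10, 0.25, 0.15, 0.20])

      w = 1.0 / se**2                               # inverse-variance (fixed-effect) weights
      pooled = np.sum(w * log_ec50) / np.sum(w)
      pooled_se = np.sqrt(1.0 / np.sum(w))
      print(f"pooled log10(EC50) = {pooled:.2f} +/- {pooled_se:.2f}, EC50 = {10**pooled:.2e}")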

  10. A New Polarimetric Classification Approach Evaluated for Agricultural Crops

    NASA Astrophysics Data System (ADS)

    Hoekman, D.

    2003-04-01

    Statistical properties of the polarimetric backscatter behaviour for a single homogeneous area are described by the Wishart distribution or its marginal distributions. These distributions do not necessarily describe well the statistics for a collection of homogeneous areas of the same class because of variation in, for example, biophysical parameters. Using Kolmogorov-Smirnov (K-S) tests of fit it is shown that, for example, the Beta distribution is a better descriptor for the coherence magnitude, and the log-normal distribution for the backscatter level. An evaluation is given for a number of agricultural crop classes, grasslands and fruit tree plantations at the Flevoland test site, using an AirSAR (C-, L- and P-band polarimetric) image of 3 July 1991. A new reversible transform of the covariance matrix into backscatter intensities will be introduced in order to describe the full polarimetric target properties in a mathematically alternative way, allowing for the development of simple, versatile and robust classifiers. Moreover, it allows for polarimetric image segmentation using conventional approaches. The effect of azimuthally asymmetric backscatter behaviour on the classification results is discussed. Several models are proposed and results are compared with results from literature for the same test site. It can be concluded that the introduced classifiers perform very well, with levels of accuracy for this test site of 90.4% for C-band, 88.7% for L-band and 96.3% for the combination of C- and L-band.
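
    A minimal sketch of the goodness-of-fit step, fitting candidate distributions to coherence magnitudes and comparing Kolmogorov-Smirnov statistics; the data are simulated, and the p-values are only indicative when parameters are estimated from the same sample:

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(5)
      coherence = rng.beta(8, 3, size=500)          # hypothetical per-field coherence magnitudes

      for name, dist in [("beta", stats.beta), ("norm", stats.norm)]:
          params = dist.fit(coherence)              # fit candidate distribution
          ks = stats.kstest(coherence, name, args=params)
          print(f"{name}: D = {ks.statistic:.3f}, p = {ks.pvalue:.3f}")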

  11. A simple blind placement of the left-sided double-lumen tubes.

    PubMed

    Zong, Zhi Jun; Shen, Qi Ying; Lu, Yao; Li, Yuan Hai

    2016-11-01

    One-lung ventilation (OLV) is commonly provided by using a double-lumen tube (DLT). Previous reports have indicated a high incidence of inappropriate DLT positioning with conventional maneuvers. After obtaining approval from the medical ethics committee of the First Affiliated Hospital of Anhui Medical University and written consent from patients, 88 adult patients of American Society of Anesthesiologists (ASA) physical status grade I or II, undergoing elective thoracic surgery requiring a left-sided DLT for OLV, were enrolled in this prospective, single-blind, randomized controlled study. Patients were randomly allocated to 1 of 2 groups: the simple maneuver group or the conventional maneuver group. The simple maneuver relies on partially inflating the bronchial balloon and recreating the effect of a carinal hook on the DLT to give an indication of orientation and depth. After the induction of anesthesia the patients were intubated with a left-sided Robertshaw DLT using one of the 2 intubation techniques. After intubation of each DLT, an anesthesiologist used flexible bronchoscopy to evaluate the patient while the patient lay in a supine position. The number of optimal positions and the time required to place the DLT in the correct position were recorded. Intubation with the DLT took 100 ± 16.2 seconds (mean ± SD) in the simple maneuver group and 95.1 ± 20.8 seconds in the conventional maneuver group; the difference was not statistically significant (P = 0.221). Fiberoptic bronchoscopy (FOB) took 22 ± 4.8 seconds in the simple maneuver group, statistically faster than in the conventional maneuver group (43.6 ± 23.7 seconds, P < 0.001). Nearly 98% of the 44 intubations in the simple maneuver group were in optimal position, compared with only 52% of the 44 intubations in the conventional maneuver group; the difference was statistically significant (P < 0.001). This simple maneuver positions left-sided DLTs more rapidly and more accurately, and it may substitute for FOB during positioning of a left-sided DLT when FOB is unavailable or inapplicable.

  12. Screening by imaging: scaling up single-DNA-molecule analysis with a novel parabolic VA-TIRF reflector and noise-reduction techniques.

    PubMed

    van 't Hoff, Marcel; Reuter, Marcel; Dryden, David T F; Oheim, Martin

    2009-09-21

    Bacteriophage lambda-DNA molecules are frequently used as a scaffold to characterize the action of single proteins unwinding, translocating, digesting or repairing DNA. However, scaling up such single-DNA-molecule experiments under identical conditions to attain statistically relevant sample sizes remains challenging. Additionally, the movies obtained are frequently noisy and difficult to analyse with any precision. We address these two problems here using, firstly, a novel variable-angle total internal reflection fluorescence (VA-TIRF) reflector composed of a minimal set of optical reflective elements, and secondly, singular value decomposition (SVD) to improve the signal-to-noise ratio prior to analysing time-lapse image stacks. As an example, we visualize under identical optical conditions hundreds of surface-tethered single lambda-DNA molecules, stained with the intercalating dye YOYO-1 iodide, and stretched out in a microcapillary flow. Another novelty of our approach is that we arrange on a mechanically driven stage several capillaries containing saline, calibration buffer and lambda-DNA, respectively, thus extending the approach to high-content, high-throughput screening of single molecules. Our length measurements of individual DNA molecules from noise-reduced kymograph images using SVD display a 6-fold enhanced precision compared to raw-data analysis, reaching approximately 1 kbp resolution. Combining these two methods, our approach provides a straightforward yet powerful way of collecting statistically relevant amounts of data in a semi-automated manner. We believe that our conceptually simple technique should be of interest for a broader range of single-molecule studies, well beyond the specific example of lambda-DNA shown here.
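
    A minimal sketch of the SVD noise-reduction step, keeping only the leading singular components of a kymograph-like image; the "DNA trace" below is synthetic:

      import numpy as np

      def svd_denoise(kymograph, rank=3):
          # Keep the leading singular components; the discarded remainder is treated as noise
          U, s, Vt = np.linalg.svd(kymograph, full_matrices=False)
          return (U[:, :rank] * s[:rank]) @ Vt[:rank]

      rng = np.random.default_rng(6)
      signal = np.outer(np.ones(200), np.exp(-np.linspace(0, 5, 300)))   # synthetic time x position trace
      noisy = signal + 0.5 * rng.standard_normal(signal.shape)
      print(np.abs(noisy - signal).mean(), np.abs(svd_denoise(noisy) - signal).mean())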

  13. Theoretical and Experimental Investigation of Random Gust Loads Part I: Aerodynamic Transfer Function of a Simple Wing Configuration in Incompressible Flow

    NASA Technical Reports Server (NTRS)

    Hakkinen, Raimo J; Richardson, A S , Jr

    1957-01-01

    Sinusoidally oscillating downwash and lift produced on a simple rigid airfoil were measured and compared with calculated values. Statistically stationary random downwash and the corresponding lift on a simple rigid airfoil were also measured and the transfer functions between their power spectra determined. The random experimental values are compared with theoretically approximated values. Limitations of the experimental technique and the need for more extensive experimental data are discussed.
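
    A minimal sketch of the spectral estimate implied here, the transfer function as the ratio of the downwash-lift cross-spectrum to the downwash auto-spectrum; the signals and the stand-in "airfoil" filter are synthetic:

      import numpy as np
      from scipy.signal import csd, lfilter, welch

      rng = np.random.default_rng(7)
      fs = 100.0
      downwash = rng.standard_normal(20000)                            # stationary random gust input
      lift = lfilter([0.05, 0.04, 0.02], [1.0, -0.6], downwash)        # stand-in "airfoil" response

      f, S_ww = welch(downwash, fs=fs, nperseg=1024)                   # downwash power spectrum
      _, S_wl = csd(downwash, lift, fs=fs, nperseg=1024)               # downwash-lift cross-spectrum
      H = S_wl / S_ww                                                  # estimated transfer function
      print(np.abs(H[:5]))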

  14. Role of diversity in ICA and IVA: theory and applications

    NASA Astrophysics Data System (ADS)

    Adalı, Tülay

    2016-05-01

    Independent component analysis (ICA) has been the most popular approach for solving the blind source separation problem. Starting from a simple linear mixing model and the assumption of statistical independence, ICA can recover a set of linearly mixed sources to within a scaling and permutation ambiguity. It has been successfully applied to numerous data analysis problems in areas as diverse as biomedicine, communications, finance, geophysics, and remote sensing. ICA can be achieved using different types of diversity, i.e., statistical properties, and can be posed to simultaneously account for multiple types of diversity such as higher-order statistics, sample dependence, non-circularity, and nonstationarity. A recent generalization of ICA, independent vector analysis (IVA), generalizes ICA to multiple data sets and adds the use of one more type of diversity, statistical dependence across the data sets, for jointly achieving independent decomposition of multiple data sets. With the addition of each new diversity type, identification of a broader class of signals becomes possible, and in the case of IVA, this includes sources that are independent and identically distributed Gaussians. We review the fundamentals and properties of ICA and IVA when multiple types of diversity are taken into account, and then ask whether diversity plays an important role in practical applications as well. Examples from various domains are presented to demonstrate that in many scenarios it might be worthwhile to jointly account for multiple statistical properties. This paper is submitted in conjunction with the talk delivered for the "Unsupervised Learning and ICA Pioneer Award" at the 2016 SPIE Conference on Sensing and Analysis Technologies for Biomedical and Cognitive Applications.
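
    A minimal sketch of ICA in practice, recovering independent sources from a linear mixture with scikit-learn's FastICA; the sources and mixing matrix are synthetic:

      import numpy as np
      from sklearn.decomposition import FastICA

      rng = np.random.default_rng(8)
      t = np.linspace(0, 8, 2000)
      sources = np.c_[np.sin(2 * t), np.sign(np.cos(3 * t)), rng.laplace(size=t.size)]
      mixed = sources @ rng.standard_normal((3, 3)).T       # unknown linear mixing

      recovered = FastICA(n_components=3, random_state=0).fit_transform(mixed)
      # Recovery is only up to scaling and permutation, so inspect cross-correlations
      print(np.round(np.corrcoef(sources.T, recovered.T)[:3, 3:], 2))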

  15. Statistics of Atmospheric Circulations from Cumulant Expansions

    NASA Astrophysics Data System (ADS)

    Marston, B.; Sabou, F.

    2010-12-01

    Large-scale atmospheric flows are not so nonlinear as to preclude their direct statistical simulation (DSS) by systematic expansions in equal-time cumulants. Such DSS offers a number of advantages: (i) Low-order statistics are smoother in space and stiffer in time than the underlying instantaneous flows, hence statistically stationary or slowly varying fixed points can be described with fewer degrees of freedom and can also be accessed rapidly. (ii) Convergence with increasing resolution can be demonstrated. (iii) Finally and most importantly, DSS leads more directly to understanding, by integrating out fast modes, leaving only the slow modes that contain the most interesting information. This makes the approach ideal for simulating and understanding modes of the climate system, including changes in these modes that are driven by climate change. The equations of motion for the cumulants form an infinite hierarchy. The simplest closure is to set the third and higher order cumulants to zero. We extend previous work (Marston, Conover, and Schneider 2008) along these lines to two-layer models of the general circulation which has previously been argued to be only weakly nonlinear (O'Gorman and Schneider, 2006). Equal-time statistics so obtained agree reasonably well with those accumulated by direct numerical simulation (DNS) reproducing efficiently the midlatitude westerlies and storm tracks, tropical easterlies, and non-local teleconnection patterns (Marston 2010). Low-frequency modes of variability can also be captured. The primitive equation model of Held & Suarez, with and without latent heat release, is investigated, providing a test of whether DSS accurately reproduces the responses to simple climate forcings as found by DNS.

  16. "Can Simple Metals Be Transmuted into Gold?" Teaching Science through a Historical Approach.

    ERIC Educational Resources Information Center

    Mamlok, Rachel; Ben-Zvi, Ruth; Menis, Joseph; Penick, John E.

    2000-01-01

    Describes the development and enactment of a new teaching unit, "Can simple metals be transmuted into gold?", through an historical approach as well as teacher preparation to teach this unit. (Contains 16 references.) (ASK)

  17. Results of the Sea Ice Model Intercomparison Project: Evaluation of sea ice rheology schemes for use in climate simulations

    NASA Astrophysics Data System (ADS)

    Kreyscher, Martin; Harder, Markus; Lemke, Peter; Flato, Gregory M.

    2000-05-01

    A hierarchy of sea ice rheologies is evaluated on the basis of a comprehensive set of observational data. The investigations are part of the Sea Ice Model Intercomparison Project (SIMIP). Four different sea ice rheology schemes are compared: a viscous-plastic rheology, a cavitating-fluid model, a compressible Newtonian fluid, and a simple free drift approach with velocity correction. The same grid, land boundaries, and forcing fields are applied to all models. As verification data, there are (1) ice thickness data from upward looking sonars (ULS), (2) ice concentration data from the passive microwave radiometers SMMR and SSM/I, (3) daily buoy drift data obtained by the International Arctic Buoy Program (IABP), and (4) satellite-derived ice drift fields based on the 85 GHz channel of SSM/I. All models are optimized individually with respect to mean drift speed and daily drift speed statistics. The impact of ice strength on the ice cover is best revealed by the spatial pattern of ice thickness, ice drift on different timescales, daily drift speed statistics, and the drift velocities in Fram Strait. Overall, the viscous-plastic rheology yields the most realistic simulation. In contrast, the results of the very simple free-drift model with velocity correction clearly show large errors in simulated ice drift as well as in ice thicknesses and ice export through Fram Strait compared to observation. The compressible Newtonian fluid cannot prevent excessive ice thickness buildup in the central Arctic and overestimates the internal forces in Fram Strait. Because of the lack of shear strength, the cavitating-fluid model shows marked differences to the statistics of observed ice drift and the observed spatial pattern of ice thickness. Comparison of required computer resources demonstrates that the additional cost for the viscous-plastic sea ice rheology is minor compared with the atmospheric and oceanic model components in global climate simulations.

  18. A Simple Approach to Improving Student Writing: An Example from Hydrology

    ERIC Educational Resources Information Center

    Carlson, Catherine A.

    2007-01-01

    Using the simple approach described in this article, college science instructors can help students become independent thinkers and writers in science. The unique character of this approach is that it shows students how to formulate the questions they need to answer in their writing, as well as how to answer them. Rather than using a cookbook…

  19. Statistical inferences for data from studies conducted with an aggregated multivariate outcome-dependent sample design

    PubMed Central

    Lu, Tsui-Shan; Longnecker, Matthew P.; Zhou, Haibo

    2016-01-01

    An outcome-dependent sampling (ODS) scheme is a cost-effective sampling scheme in which one observes the exposure with a probability that depends on the outcome. Well-known examples are the case-control design for a binary response, the case-cohort design for failure time data, and the general ODS design for a continuous response. While substantial work has been done for the univariate response case, statistical inference and design for ODS with multivariate responses remain under-developed. Motivated by the need in biological studies to take advantage of the available responses for subjects in a cluster, we propose a multivariate outcome-dependent sampling (Multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the Multivariate-ODS design is semiparametric, with all the underlying distributions of covariates modeled nonparametrically using empirical likelihood methods. We show that the proposed estimator is consistent and establish its asymptotic normality. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the Multivariate-ODS or the estimator from a simple random sample with the same sample size. The Multivariate-ODS design, together with the proposed estimator, provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of the association of PCB exposure with hearing loss in children from the Collaborative Perinatal Study. PMID:27966260

  20. On base station cooperation using statistical CSI in jointly correlated MIMO downlink channels

    NASA Astrophysics Data System (ADS)

    Zhang, Jun; Jiang, Bin; Jin, Shi; Gao, Xiqi; Wong, Kai-Kit

    2012-12-01

    This article studies the transmission of a single cell-edge user's signal using statistical channel state information at cooperative base stations (BSs) with a general jointly correlated multiple-input multiple-output (MIMO) channel model. We first present an optimal scheme to maximize the ergodic sum capacity with per-BS power constraints, revealing that the transmitted signals of all BSs are mutually independent and the optimum transmit directions for each BS align with the eigenvectors of the BS's own transmit correlation matrix of the channel. Then, we employ matrix permanents to derive a closed-form tight upper bound for the ergodic sum capacity. Based on these results, we develop a low-complexity power allocation solution using convex optimization techniques and a simple iterative water-filling algorithm (IWFA). Finally, we derive a necessary and sufficient condition under which a beamforming approach achieves capacity for all BSs. Simulation results demonstrate that the upper bound of ergodic sum capacity is tight and the proposed cooperative transmission scheme increases the downlink system sum capacity considerably.
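
    A minimal sketch of the water-filling step used in such power allocation, here for a single BS over hypothetical eigenvalue gains (the IWFA described above iterates this kind of allocation across BSs under per-BS constraints):

      import numpy as np

      def water_filling(gains, total_power, tol=1e-9):
          # Allocate p_i = max(mu - 1/g_i, 0) with sum(p_i) = total_power, via bisection on the water level mu
          lo, hi = 0.0, total_power + 1.0 / gains.min()
          while hi - lo > tol:
              mu = 0.5 * (lo + hi)
              if np.maximum(mu - 1.0 / gains, 0.0).sum() < total_power:
                  lo = mu
              else:
                  hi = mu
          return np.maximum(mu - 1.0 / gains, 0.0)

      gains = np.array([2.0, 1.0, 0.3, 0.05])       # hypothetical eigenvalue gains of one BS's correlation matrix
      p = water_filling(gains, total_power=4.0)
      print(p, np.log2(1 + gains * p).sum())        # allocated powers and the resulting rate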

  1. Alarms about structural alerts.

    PubMed

    Alves, Vinicius; Muratov, Eugene; Capuzzi, Stephen; Politi, Regina; Low, Yen; Braga, Rodolpho; Zakharov, Alexey V; Sedykh, Alexander; Mokshyna, Elena; Farag, Sherif; Andrade, Carolina; Kuz'min, Victor; Fourches, Denis; Tropsha, Alexander

    2016-08-21

    Structural alerts are widely accepted in chemical toxicology and regulatory decision support as a simple and transparent means to flag potential chemical hazards or group compounds into categories for read-across. However, there has been a growing concern that alerts disproportionally flag too many chemicals as toxic, which questions their reliability as toxicity markers. Conversely, the rigorously developed and properly validated statistical QSAR models can accurately and reliably predict the toxicity of a chemical; however, their use in regulatory toxicology has been hampered by the lack of transparency and interpretability. We demonstrate that contrary to the common perception of QSAR models as "black boxes" they can be used to identify statistically significant chemical substructures (QSAR-based alerts) that influence toxicity. We show through several case studies, however, that the mere presence of structural alerts in a chemical, irrespective of the derivation method (expert-based or QSAR-based), should be perceived only as hypotheses of possible toxicological effect. We propose a new approach that synergistically integrates structural alerts and rigorously validated QSAR models for a more transparent and accurate safety assessment of new chemicals.

  2. Unusual Childhood Waking as a Possible Precursor of the 1995 Kobe Earthquake

    PubMed Central

    Ikeya, Motoji; Whitehead, Neil E.

    2013-01-01

    Simple Summary The paper investigates whether young children may waken before earthquakes through a cause other than foreshocks. It concludes there is statistical evidence for this, but the mechanism best supported is anxiety produced by Ultra Low Frequency (ULF) electromagnetic waves. Abstract Nearly 1,100 young students living in Japan at a range of distances up to 500 km from the 1995 Kobe M7 earthquake were interviewed. A statistically significant abnormal rate of early wakening before the earthquake was found, having an exponential decrease with distance and a half value approaching 100 km, but decreasing much more slowly than would be expected from a point source such as an epicentre, instead originating from an extended area more than 100 km in diameter. Because an improbably high amount of variance is explained, this effect is unlikely to be simply psychological and must reflect another mechanism, perhaps Ultra-Low Frequency (ULF) electromagnetic waves creating anxiety, but probably not 222Rn excess. Other work reviewed suggests these conclusions may be valid for animals in general, not just children, but would be very difficult to apply for practical earthquake prediction. PMID:26487316

  3. Length bias correction in gene ontology enrichment analysis using logistic regression.

    PubMed

    Mi, Gu; Di, Yanming; Emerson, Sarah; Cumbie, Jason S; Chang, Jeff H

    2012-01-01

    When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called "length bias", will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily biologically more relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both the Gene Ontology membership and the significance of the differential expression test. The inclusion of the transcript length as a covariate allows one to investigate the direct correlation between the Gene Ontology membership and the significance of testing differential expression, conditional on the transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible.
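
    A minimal sketch of the proposed adjustment, a logistic regression of Gene Ontology membership on differential-expression significance with transcript length as a covariate; the data are simulated so that length drives both variables, and statsmodels is assumed available:

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(9)
      n_genes = 3000
      log_len = rng.normal(3.2, 0.4, n_genes)                       # log10 transcript length
      # Simulated confounding: length alone drives both DE significance and GO membership
      de_signif = rng.random(n_genes) < 1 / (1 + np.exp(-3 * (log_len - 3.2)))
      in_go_cat = rng.random(n_genes) < 1 / (1 + np.exp(-2 * (log_len - 3.2)))

      X = sm.add_constant(np.column_stack([de_signif, log_len]).astype(float))
      fit = sm.GLM(in_go_cat.astype(float), X, family=sm.families.Binomial()).fit()
      print(fit.params)   # conditional on length, the DE coefficient is no longer inflated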

  4. Adaptive design optimization: a mutual information-based approach to model discrimination in cognitive science.

    PubMed

    Cavagnaro, Daniel R; Myung, Jay I; Pitt, Mark A; Kujala, Janne V

    2010-04-01

    Discriminating among competing statistical models is a pressing issue for many experimentalists in the field of cognitive science. Resolving this issue begins with designing maximally informative experiments. To this end, the problem to be solved in adaptive design optimization is identifying experimental designs under which one can infer the underlying model in the fewest possible steps. When the models under consideration are nonlinear, as is often the case in cognitive science, this problem can be impossible to solve analytically without simplifying assumptions. However, as we show in this letter, a full solution can be found numerically with the help of a Bayesian computational trick derived from the statistics literature, which recasts the problem as a probability density simulation in which the optimal design is the mode of the density. We use a utility function based on mutual information and give three intuitive interpretations of the utility function in terms of Bayesian posterior estimates. As a proof of concept, we offer a simple example application to an experiment on memory retention.

  5. Analysis of swarm behaviors based on an inversion of the fluctuation theorem.

    PubMed

    Hamann, Heiko; Schmickl, Thomas; Crailsheim, Karl

    2014-01-01

    A grand challenge in the field of artificial life is to find a general theory of emergent self-organizing systems. In swarm systems most of the observed complexity is based on motion of simple entities. Similarly, statistical mechanics focuses on collective properties induced by the motion of many interacting particles. In this article we apply methods from statistical mechanics to swarm systems. We try to explain the emergent behavior of a simulated swarm by applying methods based on the fluctuation theorem. Empirical results indicate that swarms are able to produce negative entropy within an isolated subsystem due to frozen accidents. Individuals of a swarm are able to locally detect fluctuations of the global entropy measure and store them, if they are negative entropy productions. By accumulating these stored fluctuations over time the swarm as a whole is producing negative entropy and the system ends up in an ordered state. We claim that this indicates the existence of an inverted fluctuation theorem for emergent self-organizing dissipative systems. This approach bears the potential of general applicability.

  6. Comparison of Kasai Autocorrelation and Maximum Likelihood Estimators for Doppler Optical Coherence Tomography

    PubMed Central

    Chan, Aaron C.; Srinivasan, Vivek J.

    2013-01-01

    In optical coherence tomography (OCT) and ultrasound, unbiased Doppler frequency estimators with low variance are desirable for blood velocity estimation. Hardware improvements in OCT mean that ever higher acquisition rates are possible, which should also, in principle, improve estimation performance. Paradoxically, however, the widely used Kasai autocorrelation estimator’s performance worsens with increasing acquisition rate. We propose that parametric estimators based on accurate models of noise statistics can offer better performance. We derive a maximum likelihood estimator (MLE) based on a simple additive white Gaussian noise model, and show that it can outperform the Kasai autocorrelation estimator. In addition, we also derive the Cramer Rao lower bound (CRLB), and show that the variance of the MLE approaches the CRLB for moderate data lengths and noise levels. We note that the MLE performance improves with longer acquisition time, and remains constant or improves with higher acquisition rates. These qualities may make it a preferred technique as OCT imaging speed continues to improve. Finally, our work motivates the development of more general parametric estimators based on statistical models of decorrelation noise. PMID:23446044
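
    A minimal sketch contrasting the Kasai (lag-one autocorrelation) frequency estimate with a maximum-likelihood estimate under an additive white Gaussian noise model, implemented here as a periodogram peak search; the signal parameters are illustrative:

      import numpy as np

      rng = np.random.default_rng(10)
      fs, n, f_true = 50e3, 64, 3.2e3               # sampling rate, samples, Doppler shift (illustrative)
      t = np.arange(n) / fs
      z = np.exp(2j * np.pi * f_true * t) + 0.5 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

      # Kasai estimator: phase of the lag-one autocorrelation
      f_kasai = np.angle(np.sum(np.conj(z[:-1]) * z[1:])) * fs / (2 * np.pi)

      # MLE under additive white Gaussian noise: frequency maximizing the periodogram
      freqs = np.linspace(-fs / 2, fs / 2, 20001)
      power = np.abs(np.exp(-2j * np.pi * freqs[:, None] * t) @ z) ** 2
      print(f_kasai, freqs[np.argmax(power)])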

  7. An improved empirical dynamic control system model of global mean sea level rise and surface temperature change

    NASA Astrophysics Data System (ADS)

    Wu, Qing; Luu, Quang-Hung; Tkalich, Pavel; Chen, Ge

    2018-04-01

    Having great impacts on human lives, global warming and the associated sea level rise are believed to be strongly linked to anthropogenic causes. A statistical approach offers a simple yet conceptually verifiable way to combine remotely connected climate variables and indices, including sea level and surface temperature. We propose an improved statistical reconstruction model based on an empirical dynamic control system that takes into account climate variability and derives parameters from Monte Carlo cross-validation random experiments. For the historical data from 1880 to 2001, we obtained higher correlations than those from other empirical dynamic models. The averaged root mean square errors are reduced in both reconstructed fields, namely, the global mean surface temperature (by 24-37%) and the global mean sea level (by 5-25%). Our model is also more robust, as it notably reduces the instability associated with varying initial values. These results suggest that the model not only significantly enhances the global mean reconstructions of temperature and sea level but may also have the potential to improve future projections.

  8. Hydrologic Landscape Regionalisation Using Deductive Classification and Random Forests

    PubMed Central

    Brown, Stuart C.; Lester, Rebecca E.; Versace, Vincent L.; Fawcett, Jonathon; Laurenson, Laurie

    2014-01-01

    Landscape classification and hydrological regionalisation studies are being increasingly used in ecohydrology to aid in the management and research of aquatic resources. We present a methodology for classifying hydrologic landscapes based on spatial environmental variables by employing non-parametric statistics and hybrid image classification. Our approach differed from previous classifications which have required the use of an a priori spatial unit (e.g. a catchment) which necessarily results in the loss of variability that is known to exist within those units. The use of a simple statistical approach to identify an appropriate number of classes eliminated the need for large amounts of post-hoc testing with different number of groups, or the selection and justification of an arbitrary number. Using statistical clustering, we identified 23 distinct groups within our training dataset. The use of a hybrid classification employing random forests extended this statistical clustering to an area of approximately 228,000 km2 of south-eastern Australia without the need to rely on catchments, landscape units or stream sections. This extension resulted in a highly accurate regionalisation at both 30-m and 2.5-km resolution, and a less-accurate 10-km classification that would be more appropriate for use at a continental scale. A smaller case study, of an area covering 27,000 km2, demonstrated that the method preserved the intra- and inter-catchment variability that is known to exist in local hydrology, based on previous research. Preliminary analysis linking the regionalisation to streamflow indices is promising suggesting that the method could be used to predict streamflow behaviour in ungauged catchments. Our work therefore simplifies current classification frameworks that are becoming more popular in ecohydrology, while better retaining small-scale variability in hydrology, thus enabling future attempts to explain and visualise broad-scale hydrologic trends at the scale of catchments and continents. PMID:25396410
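
    A minimal sketch of the two-stage idea, statistical clustering of training cells followed by a random forest extension to unlabelled cells; the environmental variables are simulated and scikit-learn is assumed available:

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.ensemble import RandomForestClassifier

      rng = np.random.default_rng(11)
      train = rng.standard_normal((5000, 6))        # hypothetical environmental variables per training cell
      labels = KMeans(n_clusters=23, n_init=10, random_state=0).fit_predict(train)

      rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(train, labels)
      full_region = rng.standard_normal((20000, 6)) # cells outside the training area
      print(np.bincount(rf.predict(full_region)))   # hydrologic landscape class counts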

  10. Statistical palaeomagnetic field modelling and dynamo numerical simulation

    NASA Astrophysics Data System (ADS)

    Bouligand, C.; Hulot, G.; Khokhlov, A.; Glatzmaier, G. A.

    2005-06-01

    By relying on two numerical dynamo simulations for which such investigations are possible, we test the validity and sensitivity of a statistical palaeomagnetic field modelling approach known as the giant Gaussian process (GGP) modelling approach. This approach is currently used to analyse palaeomagnetic data at times of stable polarity and infer some information about the way the main magnetic field (MF) of the Earth has been behaving in the past and has possibly been influenced by core-mantle boundary (CMB) conditions. One simulation has been run with homogeneous CMB conditions, the other with more realistic non-homogeneous symmetry breaking CMB conditions. In both simulations, it is found that, as required by the GGP approach, the field behaves as a short-term memory process. Some severe non-stationarity is however found in the non-homogeneous case, leading to very significant departures of the Gauss coefficients from a Gaussian distribution, in contradiction with the assumptions underlying the GGP approach. A similar but less severe non-stationarity is found in the case of the homogeneous simulation, which happens to display a more Earth-like temporal behaviour than the non-homogeneous case. This suggests that a GGP modelling approach could nevertheless be applied to try to estimate the mean μ and covariance matrix γ(τ) (first- and second-order statistical moments) of the field produced by the geodynamo. A detailed study of both simulations is carried out to assess the possibility of detecting statistical symmetry breaking properties of the underlying dynamo process by inspection of estimates of μ and γ(τ). As expected (because of the role of the rotation of the Earth in the dynamo process), those estimates reveal spherical symmetry breaking properties. Equatorial symmetry breaking properties are also detected in both simulations, showing that such symmetry breaking properties can occur spontaneously under homogeneous CMB conditions. By contrast, axial symmetry breaking is detected only in the non-homogeneous simulation, testifying to the constraints imposed by the CMB conditions. The signature of this axial symmetry breaking is however found to be much weaker than the signature of equatorial symmetry breaking. We note that this could be the reason why only equatorial symmetry breaking properties (in the form of the well-known axial quadrupole term in the time-averaged field) have unambiguously been found so far by analysing the real data. However, this could also be because those analyses have all assumed too simple a form for γ(τ) when attempting to estimate μ. Suggestions are provided to ensure that future attempts at GGP modelling with real data are carried out in a more consistent and perhaps more efficient way.

  11. Automatic Generation of Algorithms for the Statistical Analysis of Planetary Nebulae Images

    NASA Technical Reports Server (NTRS)

    Fischer, Bernd

    2004-01-01

    Analyzing data sets collected in experiments or by observations is a core scientific activity. Typically, experimental and observational data are fraught with uncertainty, and the analysis is based on a statistical model of the conjectured underlying processes. The large data volumes collected by modern instruments make computer support indispensable for this. Consequently, scientists spend significant amounts of their time on the development and refinement of data analysis programs. AutoBayes [GF+02, FS03] is a fully automatic synthesis system for generating statistical data analysis programs. Externally, it looks like a compiler: it takes an abstract problem specification and translates it into executable code. Its input is a concise description of a data analysis problem in the form of a statistical model as shown in Figure 1; its output is optimized and fully documented C/C++ code which can be linked dynamically into the Matlab and Octave environments. Internally, however, it is quite different: AutoBayes derives a customized algorithm implementing the given model using a schema-based process, and then further refines and optimizes the algorithm into code. A schema is a parameterized code template with associated semantic constraints which define and restrict the template's applicability. The schema parameters are instantiated in a problem-specific way during synthesis as AutoBayes checks the constraints against the original model or, recursively, against emerging sub-problems. The AutoBayes schema library contains problem decomposition operators (which are justified by theorems in a formal logic in the domain of Bayesian networks) as well as machine learning algorithms (e.g., EM, k-Means) and numeric optimization methods (e.g., Nelder-Mead simplex, conjugate gradient). AutoBayes augments this schema-based approach by symbolic computation to derive closed-form solutions whenever possible. This is a major advantage over other statistical data analysis systems which use numerical approximations even in cases where closed-form solutions exist. AutoBayes is implemented in Prolog and comprises approximately 75,000 lines of code. In this paper, we take one typical scientific data analysis problem, analyzing planetary nebulae images taken by the Hubble Space Telescope, and show how AutoBayes can be used to automate the implementation of the necessary analysis programs. We initially follow the analysis described by Knuth and Hajian [KH02] and use AutoBayes to derive code for the published models. We show the details of the code derivation process, including the symbolic computations and automatic integration of library procedures, and compare the results of the automatically generated and manually implemented code. We then go beyond the original analysis and use AutoBayes to derive code for a simple image segmentation procedure based on a mixture model which can be used to automate a manual preprocessing step. Finally, we combine the original approach with the simple segmentation, which yields a more detailed analysis. This also demonstrates that AutoBayes makes it easy to combine different aspects of data analysis.

  12. Investigating decoherence in a simple system

    NASA Technical Reports Server (NTRS)

    Albrecht, Andreas

    1991-01-01

    The results of some simple calculations designed to study quantum decoherence are presented. The physics of quantum decoherence is briefly reviewed, and a very simple 'toy' model is analyzed. Exact solutions are found using numerical techniques. The type of decoherence exhibited by the model can be changed by varying a coupling strength. The author explains why the conventional approach to studying decoherence by checking the diagonality of the density matrix is not always adequate. Two other approaches, the decoherence functional and the Schmidt paths approach, are applied to the toy model and contrasted with each other. Possible problems with each are discussed.

  13. Statistical Methods for the Analysis of Discrete Choice Experiments: A Report of the ISPOR Conjoint Analysis Good Research Practices Task Force.

    PubMed

    Hauber, A Brett; González, Juan Marcos; Groothuis-Oudshoorn, Catharina G M; Prior, Thomas; Marshall, Deborah A; Cunningham, Charles; IJzerman, Maarten J; Bridges, John F P

    2016-06-01

    Conjoint analysis is a stated-preference survey method that can be used to elicit responses that reveal preferences, priorities, and the relative importance of individual features associated with health care interventions or services. Conjoint analysis methods, particularly discrete choice experiments (DCEs), have been increasingly used to quantify preferences of patients, caregivers, physicians, and other stakeholders. Recent consensus-based guidance on good research practices, including two recent task force reports from the International Society for Pharmacoeconomics and Outcomes Research, has aided in improving the quality of conjoint analyses and DCEs in outcomes research. Nevertheless, uncertainty regarding good research practices for the statistical analysis of data from DCEs persists. There are multiple methods for analyzing DCE data. Understanding the characteristics and appropriate use of different analysis methods is critical to conducting a well-designed DCE study. This report will assist researchers in evaluating and selecting among alternative approaches to conducting statistical analysis of DCE data. We first present a simplistic DCE example and a simple method for using the resulting data. We then present a pedagogical example of a DCE and one of the most common approaches to analyzing data from such a question format: conditional logit. We then describe some common alternative methods for analyzing these data and the strengths and weaknesses of each alternative. We present the ESTIMATE checklist, which includes a list of questions to consider when justifying the choice of analysis method, describing the analysis, and interpreting the results. Copyright © 2016 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
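
    A minimal sketch of a conditional logit fit for DCE-style data by direct maximum likelihood (not the task force's recommended software workflow); the choice sets and attributes are simulated:

      import numpy as np
      from scipy.optimize import minimize

      rng = np.random.default_rng(12)
      n_tasks, n_alts, n_attr = 400, 3, 4
      X = rng.standard_normal((n_tasks, n_alts, n_attr))            # attribute levels per alternative
      beta_true = np.array([1.0, -0.5, 0.8, 0.2])
      choice = (X @ beta_true + rng.gumbel(size=(n_tasks, n_alts))).argmax(axis=1)   # random-utility choices

      def neg_loglik(beta):
          v = X @ beta
          return -(v[np.arange(n_tasks), choice] - np.log(np.exp(v).sum(axis=1))).sum()

      fit = minimize(neg_loglik, np.zeros(n_attr), method="BFGS")
      print(np.round(fit.x, 2))                     # estimated part-worths, close to beta_true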

  14. Improving the Statistical Modeling of the TRMM Extreme Precipitation Monitoring System

    NASA Astrophysics Data System (ADS)

    Demirdjian, L.; Zhou, Y.; Huffman, G. J.

    2016-12-01

    This project improves upon an existing extreme precipitation monitoring system based on the Tropical Rainfall Measuring Mission (TRMM) daily product (3B42) using new statistical models. The proposed system utilizes a regional modeling approach, where data from similar grid locations are pooled to increase the quality and stability of the resulting model parameter estimates, compensating for the short data record. The regional frequency analysis is divided into two stages. In the first stage, the region defined by the TRMM measurements is partitioned into approximately 27,000 non-overlapping clusters using a recursive k-means clustering scheme. In the second stage, a statistical model is used to characterize the extreme precipitation events occurring in each cluster. Instead of utilizing the block-maxima approach used in the existing system, where annual maxima are fit to the Generalized Extreme Value (GEV) probability distribution at each cluster separately, the present work adopts the peaks-over-threshold (POT) method of classifying points as extreme if they exceed a pre-specified threshold. Theoretical considerations motivate the use of the Generalized Pareto (GP) distribution for fitting threshold exceedances. The fitted parameters can be used to construct simple and intuitive average recurrence interval (ARI) maps which reveal how rare a particular precipitation event is given its spatial location. The new methodology eliminates much of the random noise that was produced by the existing models due to a short data record, producing more reasonable ARI maps when compared with NOAA's long-term Climate Prediction Center (CPC) ground-based observations. The resulting ARI maps can be useful for disaster preparation, warning, and management, as well as increased public awareness of the severity of precipitation events. Furthermore, the proposed methodology can be applied to various other extreme climate records.
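
    The sketch below illustrates the peaks-over-threshold workflow described above on synthetic daily rainfall for a single grid cell: exceedances over a high threshold are fit with a Generalized Pareto distribution and converted into average recurrence interval levels. The threshold choice, record length, and rainfall distribution are assumptions made for illustration, not the monitoring system's configuration.

        # POT + Generalized Pareto sketch on synthetic daily rainfall (mm/day).
        import numpy as np
        from scipy.stats import genpareto

        rng = np.random.default_rng(7)
        years = 17                                            # roughly a TRMM-length record
        daily_rain = rng.gamma(shape=0.4, scale=8.0, size=365 * years)

        threshold = np.quantile(daily_rain, 0.99)             # pre-specified high threshold
        excess = daily_rain[daily_rain > threshold] - threshold
        rate = len(excess) / years                            # exceedances per year

        xi, _, sigma = genpareto.fit(excess, floc=0.0)        # GP shape and scale

        def return_level(T):
            """Rainfall amount with an average recurrence interval of T years."""
            if abs(xi) < 1e-6:
                return threshold + sigma * np.log(rate * T)
            return threshold + (sigma / xi) * ((rate * T) ** xi - 1.0)

        for T in (2, 10, 50):
            print(f"{T:>3}-year ARI level: {return_level(T):6.1f} mm/day")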

  15. Fun with maths: exploring implications of mathematical models for malaria eradication.

    PubMed

    Eckhoff, Philip A; Bever, Caitlin A; Gerardin, Jaline; Wenger, Edward A

    2014-12-11

    Mathematical analyses and modelling have an important role in informing malaria eradication strategies. Simple mathematical approaches can answer many questions, but it is important to investigate their assumptions and to test whether simple assumptions affect the results. In this note, four examples demonstrate both the effects of model structures and assumptions and the benefits of using a diversity of model approaches. These examples include the time to eradication, the impact of vaccine efficacy and coverage, drug programs and the effects of duration of infections and delays to treatment, and the influence of seasonality and migration coupling on disease fadeout. An excessively simple structure can miss key results, but simple mathematical approaches can still achieve key results for eradication strategy and define areas for investigation by more complex models.
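
    As one concrete example of the kind of simple model the authors advocate, the sketch below integrates a Ross-Macdonald-style transmission model and shows how reducing the mosquito biting rate pushes the basic reproduction number below one and drives prevalence toward fadeout. The parameter values and the specific intervention are assumptions for illustration; this is not one of the paper's four worked examples.

        # Ross-Macdonald-style sketch with assumed parameters (not the paper's models).
        from scipy.integrate import solve_ivp

        m, a, b, c, r, g = 5.0, 0.3, 0.3, 0.5, 1.0 / 100, 1.0 / 10  # classic parameter set

        def ross_macdonald(t, y, control):
            ih, iv = y                       # infected fractions: humans, mosquitoes
            a_eff = a * control              # e.g. bed nets scale down the biting rate
            dih = m * a_eff * b * iv * (1.0 - ih) - r * ih
            div = a_eff * c * ih * (1.0 - iv) - g * iv
            return [dih, div]

        for control in (1.0, 0.1):
            r0 = m * (a * control) ** 2 * b * c / (r * g)
            sol = solve_ivp(ross_macdonald, (0.0, 2000.0), [0.3, 0.05], args=(control,))
            print(f"control = {control:.1f}   R0 = {r0:6.2f}   "
                  f"human prevalence after ~5.5 years = {sol.y[0, -1]:.3f}")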

  16. Nitrous oxide emissions from agricultural landscapes: quantification tools, policy development, and opportunities for improved management

    NASA Astrophysics Data System (ADS)

    Tonitto, C.; Gurwick, N. P.

    2012-12-01

    Policy initiatives to reduce greenhouse gas (GHG) emissions have promoted the development of agricultural management protocols to increase SOC storage and reduce GHG emissions. We review approaches for quantifying N2O flux from agricultural landscapes. We summarize the temporal and spatial extent of observations across representative soil classes, climate zones, cropping systems, and management scenarios. We review applications of simulation and empirical modeling approaches and compare validation outcomes across modeling tools. Subsequently, we review current model application in agricultural management protocols. In particular, we compare approaches adapted for compliance with the California Global Warming Solutions Act and the Alberta Climate Change and Emissions Management Act, and approaches adopted by the American Carbon Registry. In the absence of regional data to drive model development, policies that require GHG quantification often use simple empirical models based on highly aggregated data of N2O flux as a function of applied N (Tier 1 models according to IPCC categorization). As participants in the development of protocols that could be used in carbon offset markets, we observed that stakeholders outside of the biogeochemistry community favored outcomes from simulation modeling (Tier 3) rather than empirical modeling (Tier 2). In contrast, scientific advisors were more accepting of outcomes based on statistical approaches that rely on local observations, and their views sometimes swayed policy practitioners over the course of policy development. Both Tier 2 and Tier 3 approaches have been implemented in current policy development, and it is important that the strengths and limitations of both approaches, in the face of available data, be well understood by those drafting and adopting policies and protocols. The reliability of all models is contingent on sufficient observations for model development and validation. Simulation models applied without site calibration generally produce poor validation results, and this point particularly needs to be emphasized during policy development. For cases where sufficient calibration data are available, simulation models have demonstrated the ability to capture seasonal patterns of N2O flux. The reliability of statistical models likewise depends on data availability. Because soil moisture is a significant driver of N2O flux, the best outcomes occur when empirical models are applied to systems with relevant soil classification and climate. The structure of current carbon offset protocols is not well-aligned with a budgetary approach to GHG accounting. Current protocols credit field-scale reductions in N2O flux as a result of reduced fertilizer use. Protocols do not award farmers credit for reductions in CO2 emissions resulting from reduced production of synthetic N fertilizer. Achieving the greatest GHG emission reductions through reduced synthetic N production and reduced landscape N saturation requires a re-envisioning of the agricultural landscape to include cropping systems with legume and manure N sources. The current focus on on-farm GHG sources concentrates credits on simple reductions of N applied in conventional systems rather than on developing cropping systems which promote higher recycling and retention of N.
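
    For readers unfamiliar with the Tier 1 approach referred to above, the sketch below applies the linear IPCC-style calculation: direct N2O emissions are the nitrogen applied multiplied by a default emission factor (EF1 = 0.01 kg N2O-N per kg N in the 2006 IPCC guidelines), then converted to N2O mass and CO2-equivalents. The fertilizer rate, field area, and GWP value are illustrative assumptions.

        # IPCC Tier 1-style direct N2O estimate (illustrative inputs).
        EF1 = 0.01                   # kg N2O-N emitted per kg N applied (2006 IPCC default)
        N2O_N_TO_N2O = 44.0 / 28.0   # convert N2O-N mass to N2O mass
        GWP_N2O = 265                # 100-year global warming potential (AR5 value)

        n_rate_kg_per_ha = 150       # hypothetical synthetic fertilizer rate
        area_ha = 100                # hypothetical field area

        n2o_kg = n_rate_kg_per_ha * area_ha * EF1 * N2O_N_TO_N2O
        co2e_tonnes = n2o_kg * GWP_N2O / 1000.0
        print(f"Tier 1 direct N2O: {n2o_kg:.0f} kg N2O (~{co2e_tonnes:.0f} t CO2e)")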

  17. The Feasibility of Real-Time Intraoperative Performance Assessment With SIMPL (System for Improving and Measuring Procedural Learning): Early Experience From a Multi-institutional Trial.

    PubMed

    Bohnen, Jordan D; George, Brian C; Williams, Reed G; Schuller, Mary C; DaRosa, Debra A; Torbeck, Laura; Mullen, John T; Meyerson, Shari L; Auyang, Edward D; Chipman, Jeffrey G; Choi, Jennifer N; Choti, Michael A; Endean, Eric D; Foley, Eugene F; Mandell, Samuel P; Meier, Andreas H; Smink, Douglas S; Terhune, Kyla P; Wise, Paul E; Soper, Nathaniel J; Zwischenberger, Joseph B; Lillemoe, Keith D; Dunnington, Gary L; Fryer, Jonathan P

    Intraoperative performance assessment of residents is of growing interest to trainees, faculty, and accreditors. Current approaches to collect such assessments are limited by low participation rates and long delays between procedure and evaluation. We deployed an innovative, smartphone-based tool, SIMPL (System for Improving and Measuring Procedural Learning), to make real-time intraoperative performance assessment feasible for every case in which surgical trainees participate, and hypothesized that SIMPL could be feasibly integrated into surgical training programs. Between September 1, 2015 and February 29, 2016, 15 U.S. general surgery residency programs were enrolled in an institutional review board-approved trial. SIMPL was made available after 70% of faculty and residents completed a 1-hour training session. Descriptive and univariate statistics were used to analyze multiple dimensions of feasibility, including training rates, volume of assessments, response rates/times, and dictation rates. The 20 most active residents and attendings were evaluated in greater detail. A total of 90% of eligible users (1267/1412) completed training. Further, 13/15 programs began using SIMPL. In total, 6024 assessments were completed by 254 categorical general surgery residents (n = 3555 assessments) and 259 attendings (n = 2469 assessments), and 3762 unique operations were assessed. There was significant heterogeneity in participation within and between programs. The mean percentages (ranges) of users who completed ≥1, 5, and 20 assessments were 62% (21%-96%), 34% (5%-75%), and 10% (0%-32%) across all programs, and 96%, 75%, and 32% in the most active program. Overall, the response rate was 70%, the dictation rate was 24%, and the mean response time was 12 hours. Assessments increased from 357 (September 2015) to 1146 (February 2016). The 20 most active residents each received a mean of 46 assessments from 10 attendings for 20 different procedures. SIMPL can be feasibly integrated into surgical training programs to enhance the frequency and timeliness of intraoperative performance assessment. We believe SIMPL could help facilitate a national competency-based surgical training system, although local and systemic challenges still need to be addressed. Copyright © 2016. Published by Elsevier Inc.

  18. Accounting for selection bias in association studies with complex survey data.

    PubMed

    Wirth, Kathleen E; Tchetgen Tchetgen, Eric J

    2014-05-01

    Obtaining representative information from hidden and hard-to-reach populations is fundamental to describe the epidemiology of many sexually transmitted diseases, including HIV. Unfortunately, simple random sampling is impractical in these settings, as no registry of names exists from which to sample the population at random. However, complex sampling designs can be used, as members of these populations tend to congregate at known locations, which can be enumerated and sampled at random. For example, female sex workers may be found at brothels and street corners, whereas injection drug users often come together at shooting galleries. Despite the logistical appeal, complex sampling schemes lead to unequal probabilities of selection, and failure to account for this differential selection can result in biased estimates of population averages and relative risks. However, standard techniques to account for selection can lead to substantial losses in efficiency. Consequently, researchers implement a variety of strategies in an effort to balance validity and efficiency. Some researchers fully or partially account for the survey design, whereas others do nothing and treat the sample as a realization of the population of interest. We use directed acyclic graphs to show how certain survey sampling designs, combined with subject-matter considerations unique to individual exposure-outcome associations, can induce selection bias. Finally, we present a novel yet simple maximum likelihood approach for analyzing complex survey data; this approach optimizes statistical efficiency at no cost to validity. We use simulated data to illustrate this method and compare it with other analytic techniques.
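
    To make the selection-bias problem described above tangible, the sketch below simulates a venue-based sample in which one venue type is far easier to reach; the naive sample prevalence is badly biased, while a standard inverse-probability (design-weighted) estimator recovers the population value. This is the textbook weighted estimator shown for contrast, not the authors' maximum likelihood approach, and all the numbers are invented.

        # Simulated venue-based sampling with unequal selection probabilities.
        import numpy as np

        rng = np.random.default_rng(3)
        n_pop = 100_000
        venue = rng.binomial(1, 0.3, n_pop)               # 1 = venue-based, 0 = street-based
        prevalence = np.where(venue == 1, 0.40, 0.10)     # outcome risk differs by venue
        outcome = rng.binomial(1, prevalence)

        p_select = np.where(venue == 1, 0.10, 0.01)       # venue 1 is heavily over-sampled
        sampled = rng.binomial(1, p_select).astype(bool)

        naive = outcome[sampled].mean()                   # ignores the design
        weights = 1.0 / p_select[sampled]                 # inverse probability of selection
        weighted = np.average(outcome[sampled], weights=weights)

        print(f"true prevalence       {outcome.mean():.3f}")
        print(f"naive sample mean     {naive:.3f}")
        print(f"IP-weighted estimate  {weighted:.3f}")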

  19. Outcomes of robot-assisted simple enucleation of renal masses: A single European center experience.

    PubMed

    Matei, Deliu Victor; Vartolomei, Mihai Dorin; Musi, Gennaro; Renne, Giuseppe; Tringali, Valeria Maria Lucia; Mistretta, Francesco Alessandro; Delor, Maurizio; Russo, Andrea; Cioffi, Antonio; Bianchi, Roberto; Cozzi, Gabriele; Di Trapani, Ettore; Bottero, Danilo; Cordima, Giovanni; Lucarelli, Giuseppe; Ferro, Matteo; de Cobelli, Ottavio

    2017-05-01

    The aim of this study was to assess the ability of pre- and intraoperative parameters to predict the risk of perioperative complications after robot-assisted laparoscopic simple enucleation (RASE) of renal masses, and to evaluate the rate of trifecta achievement of this approach, stratifying the cohort according to the use of ischemia during the enucleation. From April 2009 to June 2016, 129 patients underwent RASE at our Institution. We stratified the procedures into 2 groups: clamping and clamp-less RASE. After RASE, all specimens were retrospectively reviewed to assess the surface-intermediate-base (SIB) scoring system. Patients were followed up according to the European Association of Urology guidelines recommendations. All pre-, intra-, and postoperative outcomes were prospectively collected in a customized database and retrospectively analyzed. A total of 112 (86.8%) patients underwent a pure RASE and 17 (13.2%) a hybrid procedure according to the SIB classification system. The mean age was 61.17 years. Complications occurred in 21 patients (16.3%); 13 (61.9%) were Clavien 1 and 2, while 8 were Clavien 3a and b complications. A statistically significant association with complications was found in patients with American Society of Anesthesiologists (ASA) score 3 (44.5%, P = .04), longer mean operative time (OT) (195 versus 161.36 minutes, P = .03), lower mean postoperative hemoglobin (Hb) (10.1 versus 11.8, P < .001), and higher mean ΔHb (3.59 versus 2.18, P < .001). In multivariate logistic regression, only longer OT and ΔHb were statistically significant predictive factors for complications. In sub-group analysis, clamp-less RASE was safe in terms of complications (14.1%), positive surgical margins (1.3%), and mid-term local recurrence (1.3%). Although this approach had higher estimated blood loss (P = .01), this had no impact on ΔHb (P = .28). A clamp-less approach was associated with a higher rate of SIB 0 (71.8% vs 51%, P = .02), higher trifecta achievement (84.6% vs 62.7%, P = .004), and a better impact on serum creatinine (mean 0.83 vs 0.91, P = .01). RASE of renal tumors is a safe technique with very good postoperative outcomes. The complication rate is low and associated with ASA score >3, longer OT, and ΔHb. RASE is suitable for the clamp-less approach, which makes it easier to perform a pure enucleation (SIB 0) and to obtain higher rates of trifecta outcomes.

  20. Improve Ay101 teaching by single modifications versus unchanged controls: statistically supported examples

    NASA Astrophysics Data System (ADS)

    Byrd, Gene G.; Byrd, Dana

    2017-06-01

    The two main purposes of this paper on improving Ay101 courses are to present (1) some very effective single changes and (2) a method for improving teaching by making just one change at a time and evaluating it statistically against a control-group class. We show how a simple statistical comparison can be done even with Excel in Windows. Of course, other more sophisticated and powerful methods could be used if available. One of several examples to be discussed on our poster is our modification of an online introductory astronomy lab course evaluated by the multiple-choice final exam. We composed questions related to the learning objectives of the course modules (LOQs). Students could “talk to themselves” by discursively answering these for extra credit prior to the final. Results were compared with an otherwise identical previous unmodified class. Modified classes showed statistically significantly better final exam average scores (78% vs. 66%). This modification helped those students who most need help. Students in the lower third of the class preferentially answered the LOQs to improve their scores and the class average on the exam. These results also show the effectiveness of relevant extra-credit work. Other examples will be discussed as specific instances of evaluating improvement by making one change and then testing it versus a control. Essentially, this is an evolutionary approach in which single favorable “mutations” are retained and the unfavorable ones removed. The temptation to make more than one change each time must be resisted!
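
    The comparison the authors run in Excel amounts to a two-sample t-test on final exam scores; the sketch below shows the same calculation in Python with synthetic score distributions centered on the reported class averages. The class sizes and score spreads are assumptions, not the authors' data.

        # Control-versus-modified comparison with Welch's two-sample t-test
        # (synthetic scores; only the 66% and 78% class means come from the abstract).
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(2017)
        control_scores  = np.clip(rng.normal(66, 15, size=40), 0, 100)   # unmodified class
        modified_scores = np.clip(rng.normal(78, 15, size=40), 0, 100)   # class with LOQs

        t, p = stats.ttest_ind(modified_scores, control_scores, equal_var=False)
        print(f"means: modified {modified_scores.mean():.1f}% vs control {control_scores.mean():.1f}%")
        print(f"Welch t = {t:.2f}, p = {p:.4f}")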
