statistical analysis allowed: Topics by Science.gov

Sample records for statistical analysis allowed

General specifications for the development of a USL NASA PC R and D statistical analysis support package

NASA Technical Reports Server (NTRS)

Dominick, Wayne D. (Editor); Bassari, Jinous; Triantafyllopoulos, Spiros

1984-01-01

The University of Southwestern Louisiana (USL) NASA PC R and D statistical analysis support package is designed to be a three-level package to allow statistical analysis for a variety of applications within the USL Data Base Management System (DBMS) contract work. The design addresses usage of the statistical facilities as a library package, as an interactive statistical analysis system, and as a batch processing package.
DOE Office of Scientific and Technical Information (OSTI.GOV)

White, Amanda M.; Daly, Don S.; Willse, Alan R.

The Automated Microarray Image Analysis (AMIA) Toolbox for MATLAB is a flexible, open-source microarray image analysis tool that allows the user to customize analysis of sets of microarray images. This tool provides several methods of identifying and quantify spot statistics, as well as extensive diagnostic statistics and images to identify poor data quality or processing. The open nature of this software allows researchers to understand the algorithms used to provide intensity estimates and to modify them easily if desired.
Application of multivariable statistical techniques in plant-wide WWTP control strategies analysis.

PubMed

Flores, X; Comas, J; Roda, I R; Jiménez, L; Gernaey, K V

2007-01-01

The main objective of this paper is to present the application of selected multivariable statistical techniques in plant-wide wastewater treatment plant (WWTP) control strategies analysis. In this study, cluster analysis (CA), principal component analysis/factor analysis (PCA/FA) and discriminant analysis (DA) are applied to the evaluation matrix data set obtained by simulation of several control strategies applied to the plant-wide IWA Benchmark Simulation Model No 2 (BSM2). These techniques allow i) to determine natural groups or clusters of control strategies with a similar behaviour, ii) to find and interpret hidden, complex and casual relation features in the data set and iii) to identify important discriminant variables within the groups found by the cluster analysis. This study illustrates the usefulness of multivariable statistical techniques for both analysis and interpretation of the complex multicriteria data sets and allows an improved use of information for effective evaluation of control strategies.
A Review of Meta-Analysis Packages in R

ERIC Educational Resources Information Center

Polanin, Joshua R.; Hennessy, Emily A.; Tanner-Smith, Emily E.

2017-01-01

Meta-analysis is a statistical technique that allows an analyst to synthesize effect sizes from multiple primary studies. To estimate meta-analysis models, the open-source statistical environment R is quickly becoming a popular choice. The meta-analytic community has contributed to this growth by developing numerous packages specific to…
Single-case research design in pediatric psychology: considerations regarding data analysis.

PubMed

Cohen, Lindsey L; Feinstein, Amanda; Masuda, Akihiko; Vowles, Kevin E

2014-03-01

Single-case research allows for an examination of behavior and can demonstrate the functional relation between intervention and outcome in pediatric psychology. This review highlights key assumptions, methodological and design considerations, and options for data analysis. Single-case methodology and guidelines are reviewed with an in-depth focus on visual and statistical analyses. Guidelines allow for the careful evaluation of design quality and visual analysis. A number of statistical techniques have been introduced to supplement visual analysis, but to date, there is no consensus on their recommended use in single-case research design. Single-case methodology is invaluable for advancing pediatric psychology science and practice, and guidelines have been introduced to enhance the consistency, validity, and reliability of these studies. Experts generally agree that visual inspection is the optimal method of analysis in single-case design; however, statistical approaches are becoming increasingly evaluated and used to augment data interpretation.
PRANAS: A New Platform for Retinal Analysis and Simulation.

PubMed

Cessac, Bruno; Kornprobst, Pierre; Kraria, Selim; Nasser, Hassan; Pamplona, Daniela; Portelli, Geoffrey; Viéville, Thierry

2017-01-01

The retina encodes visual scenes by trains of action potentials that are sent to the brain via the optic nerve. In this paper, we describe a new free access user-end software allowing to better understand this coding. It is called PRANAS (https://pranas.inria.fr), standing for Platform for Retinal ANalysis And Simulation. PRANAS targets neuroscientists and modelers by providing a unique set of retina-related tools. PRANAS integrates a retina simulator allowing large scale simulations while keeping a strong biological plausibility and a toolbox for the analysis of spike train population statistics. The statistical method (entropy maximization under constraints) takes into account both spatial and temporal correlations as constraints, allowing to analyze the effects of memory on statistics. PRANAS also integrates a tool computing and representing in 3D (time-space) receptive fields. All these tools are accessible through a friendly graphical user interface. The most CPU-costly of them have been implemented to run in parallel.
Developing Sampling Frame for Case Study: Challenges and Conditions

ERIC Educational Resources Information Center

Ishak, Noriah Mohd; Abu Bakar, Abu Yazid

2014-01-01

Due to statistical analysis, the issue of random sampling is pertinent to any quantitative study. Unlike quantitative study, the elimination of inferential statistical analysis, allows qualitative researchers to be more creative in dealing with sampling issue. Since results from qualitative study cannot be generalized to the bigger population,…
A Bifactor Approach to Model Multifaceted Constructs in Statistical Mediation Analysis

ERIC Educational Resources Information Center

Gonzalez, Oscar; MacKinnon, David P.

2018-01-01

Statistical mediation analysis allows researchers to identify the most important mediating constructs in the causal process studied. Identifying specific mediators is especially relevant when the hypothesized mediating construct consists of multiple related facets. The general definition of the construct and its facets might relate differently to…
Research Update: Spatially resolved mapping of electronic structure on atomic level by multivariate statistical analysis

NASA Astrophysics Data System (ADS)

Belianinov, Alex; Ganesh, Panchapakesan; Lin, Wenzhi; Sales, Brian C.; Sefat, Athena S.; Jesse, Stephen; Pan, Minghu; Kalinin, Sergei V.

2014-12-01

Atomic level spatial variability of electronic structure in Fe-based superconductor FeTe0.55Se0.45 (Tc = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe1-xSex structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces.
Travelogue--a newcomer encounters statistics and the computer.

PubMed

Bruce, Peter

2011-11-01

Computer-intensive methods have revolutionized statistics, giving rise to new areas of analysis and expertise in predictive analytics, image processing, pattern recognition, machine learning, genomic analysis, and more. Interest naturally centers on the new capabilities the computer allows the analyst to bring to the table. This article, instead, focuses on the account of how computer-based resampling methods, with their relative simplicity and transparency, enticed one individual, untutored in statistics or mathematics, on a long journey into learning statistics, then teaching it, then starting an education institution.
Students' Appreciation of Expectation and Variation as a Foundation for Statistical Understanding

ERIC Educational Resources Information Center

Watson, Jane M.; Callingham, Rosemary A.; Kelly, Ben A.

2007-01-01

This study presents the results of a partial credit Rasch analysis of in-depth interview data exploring statistical understanding of 73 school students in 6 contextual settings. The use of Rasch analysis allowed the exploration of a single underlying variable across contexts, which included probability sampling, representation of temperature…
A Survey of Statistical Capstone Projects

ERIC Educational Resources Information Center

Martonosi, Susan E.; Williams, Talithia D.

2016-01-01

In this article, we highlight the advantages of incorporating a statistical capstone experience in the undergraduate curriculum, where students perform an in-depth analysis of real-world data. Capstone experiences develop statistical thinking by allowing students to engage in a consulting-like experience that requires skills outside the scope of…
RooStatsCms: A tool for analysis modelling, combination and statistical studies

NASA Astrophysics Data System (ADS)

Piparo, D.; Schott, G.; Quast, G.

2010-04-01

RooStatsCms is an object oriented statistical framework based on the RooFit technology. Its scope is to allow the modelling, statistical analysis and combination of multiple search channels for new phenomena in High Energy Physics. It provides a variety of methods described in literature implemented as classes, whose design is oriented to the execution of multiple CPU intensive jobs on batch systems or on the Grid.
Introduction of statistical information in a syntactic analyzer for document image recognition

NASA Astrophysics Data System (ADS)

Maroneze, André O.; Coüasnon, Bertrand; Lemaitre, Aurélie

2011-01-01

This paper presents an improvement to document layout analysis systems, offering a possible solution to Sayre's paradox (which states that an element "must be recognized before it can be segmented; and it must be segmented before it can be recognized"). This improvement, based on stochastic parsing, allows integration of statistical information, obtained from recognizers, during syntactic layout analysis. We present how this fusion of numeric and symbolic information in a feedback loop can be applied to syntactic methods to improve document description expressiveness. To limit combinatorial explosion during exploration of solutions, we devised an operator that allows optional activation of the stochastic parsing mechanism. Our evaluation on 1250 handwritten business letters shows this method allows the improvement of global recognition scores.
Research Update: Spatially resolved mapping of electronic structure on atomic level by multivariate statistical analysis

DOE PAGES

Belianinov, Alex; Panchapakesan, G.; Lin, Wenzhi; ...

2014-12-02

Atomic level spatial variability of electronic structure in Fe-based superconductor FeTe0.55Se0.45 (Tc = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe1 x Sex structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signaturemore » and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces.« less
Research Update: Spatially resolved mapping of electronic structure on atomic level by multivariate statistical analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Belianinov, Alex, E-mail: belianinova@ornl.gov; Ganesh, Panchapakesan; Lin, Wenzhi

2014-12-01

Atomic level spatial variability of electronic structure in Fe-based superconductor FeTe{sub 0.55}Se{sub 0.45} (T{sub c} = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe{sub 1−x}Se{sub x} structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified bymore » their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces.« less
Statistical process control methods allow the analysis and improvement of anesthesia care.

PubMed

Fasting, Sigurd; Gisvold, Sven E

2003-10-01

Quality aspects of the anesthetic process are reflected in the rate of intraoperative adverse events. The purpose of this report is to illustrate how the quality of the anesthesia process can be analyzed using statistical process control methods, and exemplify how this analysis can be used for quality improvement. We prospectively recorded anesthesia-related data from all anesthetics for five years. The data included intraoperative adverse events, which were graded into four levels, according to severity. We selected four adverse events, representing important quality and safety aspects, for statistical process control analysis. These were: inadequate regional anesthesia, difficult emergence from general anesthesia, intubation difficulties and drug errors. We analyzed the underlying process using 'p-charts' for statistical process control. In 65,170 anesthetics we recorded adverse events in 18.3%; mostly of lesser severity. Control charts were used to define statistically the predictable normal variation in problem rate, and then used as a basis for analysis of the selected problems with the following results: Inadequate plexus anesthesia: stable process, but unacceptably high failure rate; Difficult emergence: unstable process, because of quality improvement efforts; Intubation difficulties: stable process, rate acceptable; Medication errors: methodology not suited because of low rate of errors. By applying statistical process control methods to the analysis of adverse events, we have exemplified how this allows us to determine if a process is stable, whether an intervention is required, and if quality improvement efforts have the desired effect.
TRAPR: R Package for Statistical Analysis and Visualization of RNA-Seq Data.

PubMed

Lim, Jae Hyun; Lee, Soo Youn; Kim, Ju Han

2017-03-01

High-throughput transcriptome sequencing, also known as RNA sequencing (RNA-Seq), is a standard technology for measuring gene expression with unprecedented accuracy. Numerous bioconductor packages have been developed for the statistical analysis of RNA-Seq data. However, these tools focus on specific aspects of the data analysis pipeline, and are difficult to appropriately integrate with one another due to their disparate data structures and processing methods. They also lack visualization methods to confirm the integrity of the data and the process. In this paper, we propose an R-based RNA-Seq analysis pipeline called TRAPR, an integrated tool that facilitates the statistical analysis and visualization of RNA-Seq expression data. TRAPR provides various functions for data management, the filtering of low-quality data, normalization, transformation, statistical analysis, data visualization, and result visualization that allow researchers to build customized analysis pipelines.
Geomatic Methods for the Analysis of Data in the Earth Sciences: Lecture Notes in Earth Sciences, Vol. 95

NASA Astrophysics Data System (ADS)

Pavlis, Nikolaos K.

Geomatics is a trendy term that has been used in recent years to describe academic departments that teach and research theories, methods, algorithms, and practices used in processing and analyzing data related to the Earth and other planets. Naming trends aside, geomatics could be considered as the mathematical and statistical “toolbox” that allows Earth scientists to extract information about physically relevant parameters from the available data and accompany such information with some measure of its reliability. This book is an attempt to present the mathematical-statistical methods used in data analysis within various disciplines—geodesy, geophysics, photogrammetry and remote sensing—from a unifying perspective that inverse problem formalism permits. At the same time, it allows us to stretch the relevance of statistical methods in achieving an optimal solution.
STATWIZ - AN ELECTRONIC STATISTICAL TOOL (ABSTRACT)

EPA Science Inventory

StatWiz is a web-based, interactive, and dynamic statistical tool for researchers. It will allow researchers to input information and/or data and then receive experimental design options, or outputs from data analysis. StatWiz is envisioned as an expert system that will walk rese...

Hybrid statistics-simulations based method for atom-counting from ADF STEM images.

PubMed

De Wael, Annelies; De Backer, Annick; Jones, Lewys; Nellist, Peter D; Van Aert, Sandra

2017-06-01

A hybrid statistics-simulations based method for atom-counting from annular dark field scanning transmission electron microscopy (ADF STEM) images of monotype crystalline nanostructures is presented. Different atom-counting methods already exist for model-like systems. However, the increasing relevance of radiation damage in the study of nanostructures demands a method that allows atom-counting from low dose images with a low signal-to-noise ratio. Therefore, the hybrid method directly includes prior knowledge from image simulations into the existing statistics-based method for atom-counting, and accounts in this manner for possible discrepancies between actual and simulated experimental conditions. It is shown by means of simulations and experiments that this hybrid method outperforms the statistics-based method, especially for low electron doses and small nanoparticles. The analysis of a simulated low dose image of a small nanoparticle suggests that this method allows for far more reliable quantitative analysis of beam-sensitive materials. Copyright © 2017 Elsevier B.V. All rights reserved.
Enhancing efficiency and quality of statistical estimation of immunogenicity assay cut points through standardization and automation.

PubMed

Su, Cheng; Zhou, Lei; Hu, Zheng; Weng, Winnie; Subramani, Jayanthi; Tadkod, Vineet; Hamilton, Kortney; Bautista, Ami; Wu, Yu; Chirmule, Narendra; Zhong, Zhandong Don

2015-10-01

Biotherapeutics can elicit immune responses, which can alter the exposure, safety, and efficacy of the therapeutics. A well-designed and robust bioanalytical method is critical for the detection and characterization of relevant anti-drug antibody (ADA) and the success of an immunogenicity study. As a fundamental criterion in immunogenicity testing, assay cut points need to be statistically established with a risk-based approach to reduce subjectivity. This manuscript describes the development of a validated, web-based, multi-tier customized assay statistical tool (CAST) for assessing cut points of ADA assays. The tool provides an intuitive web interface that allows users to import experimental data generated from a standardized experimental design, select the assay factors, run the standardized analysis algorithms, and generate tables, figures, and listings (TFL). It allows bioanalytical scientists to perform complex statistical analysis at a click of the button to produce reliable assay parameters in support of immunogenicity studies. Copyright © 2015 Elsevier B.V. All rights reserved.
Method for data analysis in different institutions: example of image guidance of prostate cancer patients.

PubMed

Piotrowski, T; Rodrigues, G; Bajon, T; Yartsev, S

2014-03-01

Multi-institutional collaborations allow for more information to be analyzed but the data from different sources may vary in the subgroup sizes and/or conditions of measuring. Rigorous statistical analysis is required for pooling the data in a larger set. Careful comparison of all the components of the data acquisition is indispensable: identical conditions allow for enlargement of the database with improved statistical analysis, clearly defined differences provide opportunity for establishing a better practice. The optimal sequence of required normality, asymptotic normality, and independence tests is proposed. An example of analysis of six subgroups of position corrections in three directions obtained during image guidance procedures for 216 prostate cancer patients from two institutions is presented. Copyright © 2013 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.
Effect of multiple spin species on spherical shell neutron transmission analysis

NASA Technical Reports Server (NTRS)

Semler, T. T.

1972-01-01

A series of Monte Carlo calculations were performed in order to evaluate the effect of separated against merged spin statistics on the analysis of spherical shell neutron transmission experiments for gold. It is shown that the use of separated spin statistics results in larger average capture cross sections of gold at 24 KeV. This effect is explained by stronger windows in the total cross section caused by the interference between potential and J(+) resonances and by J(+) and J(-) resonance overlap allowed by the use of separated spin statistics.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Steed, Chad Allen

EDENx is a multivariate data visualization tool that allows interactive user driven analysis of large-scale data sets with high dimensionality. EDENx builds on our earlier system, called EDEN to enable analysis of more dimensions and larger scale data sets. EDENx provides an initial overview of summary statistics for each variable in the data set under investigation. EDENx allows the user to interact with graphical summary plots of the data to investigate subsets and their statistical associations. These plots include histograms, binned scatterplots, binned parallel coordinate plots, timeline plots, and graphical correlation indicators. From the EDENx interface, a user can selectmore » a subsample of interest and launch a more detailed data visualization via the EDEN system. EDENx is best suited for high-level, aggregate analysis tasks while EDEN is more appropriate for detail data investigations.« less
Obscure phenomena in statistical analysis of quantitative structure-activity relationships. Part 1: Multicollinearity of physicochemical descriptors.

PubMed

Mager, P P; Rothe, H

1990-10-01

Multicollinearity of physicochemical descriptors leads to serious consequences in quantitative structure-activity relationship (QSAR) analysis, such as incorrect estimators and test statistics of regression coefficients of the ordinary least-squares (OLS) model applied usually to QSARs. Beside the diagnosis of the known simple collinearity, principal component regression analysis (PCRA) also allows the diagnosis of various types of multicollinearity. Only if the absolute values of PCRA estimators are order statistics that decrease monotonically, the effects of multicollinearity can be circumvented. Otherwise, obscure phenomena may be observed, such as good data recognition but low predictive model power of a QSAR model.
Baseline estimation in flame's spectra by using neural networks and robust statistics

NASA Astrophysics Data System (ADS)

Garces, Hugo; Arias, Luis; Rojas, Alejandro

2014-09-01

This work presents a baseline estimation method in flame spectra based on artificial intelligence structure as a neural network, combining robust statistics with multivariate analysis to automatically discriminate measured wavelengths belonging to continuous feature for model adaptation, surpassing restriction of measuring target baseline for training. The main contributions of this paper are: to analyze a flame spectra database computing Jolliffe statistics from Principal Components Analysis detecting wavelengths not correlated with most of the measured data corresponding to baseline; to systematically determine the optimal number of neurons in hidden layers based on Akaike's Final Prediction Error; to estimate baseline in full wavelength range sampling measured spectra; and to train an artificial intelligence structure as a Neural Network which allows to generalize the relation between measured and baseline spectra. The main application of our research is to compute total radiation with baseline information, allowing to diagnose combustion process state for optimization in early stages.
[Statistical analysis using freely-available "EZR (Easy R)" software].

PubMed

Kanda, Yoshinobu

2015-10-01

Clinicians must often perform statistical analyses for purposes such evaluating preexisting evidence and designing or executing clinical studies. R is a free software environment for statistical computing. R supports many statistical analysis functions, but does not incorporate a statistical graphical user interface (GUI). The R commander provides an easy-to-use basic-statistics GUI for R. However, the statistical function of the R commander is limited, especially in the field of biostatistics. Therefore, the author added several important statistical functions to the R commander and named it "EZR (Easy R)", which is now being distributed on the following website: http://www.jichi.ac.jp/saitama-sct/. EZR allows the application of statistical functions that are frequently used in clinical studies, such as survival analyses, including competing risk analyses and the use of time-dependent covariates and so on, by point-and-click access. In addition, by saving the script automatically created by EZR, users can learn R script writing, maintain the traceability of the analysis, and assure that the statistical process is overseen by a supervisor.
SAP- FORTRAN STATIC SOURCE CODE ANALYZER PROGRAM (IBM VERSION)

NASA Technical Reports Server (NTRS)

Manteufel, R.

1994-01-01

The FORTRAN Static Source Code Analyzer program, SAP, was developed to automatically gather statistics on the occurrences of statements and structures within a FORTRAN program and to provide for the reporting of those statistics. Provisions have been made for weighting each statistic and to provide an overall figure of complexity. Statistics, as well as figures of complexity, are gathered on a module by module basis. Overall summed statistics are also accumulated for the complete input source file. SAP accepts as input syntactically correct FORTRAN source code written in the FORTRAN 77 standard language. In addition, code written using features in the following languages is also accepted: VAX-11 FORTRAN, IBM S/360 FORTRAN IV Level H Extended; and Structured FORTRAN. The SAP program utilizes two external files in its analysis procedure. A keyword file allows flexibility in classifying statements and in marking a statement as either executable or non-executable. A statistical weight file allows the user to assign weights to all output statistics, thus allowing the user flexibility in defining the figure of complexity. The SAP program is written in FORTRAN IV for batch execution and has been implemented on a DEC VAX series computer under VMS and on an IBM 370 series computer under MVS. The SAP program was developed in 1978 and last updated in 1985.
SAP- FORTRAN STATIC SOURCE CODE ANALYZER PROGRAM (DEC VAX VERSION)

NASA Technical Reports Server (NTRS)

Merwarth, P. D.

1994-01-01

The FORTRAN Static Source Code Analyzer program, SAP, was developed to automatically gather statistics on the occurrences of statements and structures within a FORTRAN program and to provide for the reporting of those statistics. Provisions have been made for weighting each statistic and to provide an overall figure of complexity. Statistics, as well as figures of complexity, are gathered on a module by module basis. Overall summed statistics are also accumulated for the complete input source file. SAP accepts as input syntactically correct FORTRAN source code written in the FORTRAN 77 standard language. In addition, code written using features in the following languages is also accepted: VAX-11 FORTRAN, IBM S/360 FORTRAN IV Level H Extended; and Structured FORTRAN. The SAP program utilizes two external files in its analysis procedure. A keyword file allows flexibility in classifying statements and in marking a statement as either executable or non-executable. A statistical weight file allows the user to assign weights to all output statistics, thus allowing the user flexibility in defining the figure of complexity. The SAP program is written in FORTRAN IV for batch execution and has been implemented on a DEC VAX series computer under VMS and on an IBM 370 series computer under MVS. The SAP program was developed in 1978 and last updated in 1985.
Measuring the Gas Constant "R": Propagation of Uncertainty and Statistics

ERIC Educational Resources Information Center

Olsen, Robert J.; Sattar, Simeen

2013-01-01

Determining the gas constant "R" by measuring the properties of hydrogen gas collected in a gas buret is well suited for comparing two approaches to uncertainty analysis using a single data set. The brevity of the experiment permits multiple determinations, allowing for statistical evaluation of the standard uncertainty u[subscript…
Bayesian Statistics in Educational Research: A Look at the Current State of Affairs

ERIC Educational Resources Information Center

König, Christoph; van de Schoot, Rens

2018-01-01

The ability of a scientific discipline to build cumulative knowledge depends on its predominant method of data analysis. A steady accumulation of knowledge requires approaches which allow researchers to consider results from comparable prior research. Bayesian statistics is especially relevant for establishing a cumulative scientific discipline,…
Statistical and Economic Techniques for Site-specific Nematode Management.

PubMed

Liu, Zheng; Griffin, Terry; Kirkpatrick, Terrence L

2014-03-01

Recent advances in precision agriculture technologies and spatial statistics allow realistic, site-specific estimation of nematode damage to field crops and provide a platform for the site-specific delivery of nematicides within individual fields. This paper reviews the spatial statistical techniques that model correlations among neighboring observations and develop a spatial economic analysis to determine the potential of site-specific nematicide application. The spatial econometric methodology applied in the context of site-specific crop yield response contributes to closing the gap between data analysis and realistic site-specific nematicide recommendations and helps to provide a practical method of site-specifically controlling nematodes.
Monte Carlo based statistical power analysis for mediation models: methods and software.

PubMed

Zhang, Zhiyong

2014-12-01

The existing literature on statistical power analysis for mediation models often assumes data normality and is based on a less powerful Sobel test instead of the more powerful bootstrap test. This study proposes to estimate statistical power to detect mediation effects on the basis of the bootstrap method through Monte Carlo simulation. Nonnormal data with excessive skewness and kurtosis are allowed in the proposed method. A free R package called bmem is developed to conduct the power analysis discussed in this study. Four examples, including a simple mediation model, a multiple-mediator model with a latent mediator, a multiple-group mediation model, and a longitudinal mediation model, are provided to illustrate the proposed method.
BCM: toolkit for Bayesian analysis of Computational Models using samplers.

PubMed

Thijssen, Bram; Dijkstra, Tjeerd M H; Heskes, Tom; Wessels, Lodewyk F A

2016-10-21

Computational models in biology are characterized by a large degree of uncertainty. This uncertainty can be analyzed with Bayesian statistics, however, the sampling algorithms that are frequently used for calculating Bayesian statistical estimates are computationally demanding, and each algorithm has unique advantages and disadvantages. It is typically unclear, before starting an analysis, which algorithm will perform well on a given computational model. We present BCM, a toolkit for the Bayesian analysis of Computational Models using samplers. It provides efficient, multithreaded implementations of eleven algorithms for sampling from posterior probability distributions and for calculating marginal likelihoods. BCM includes tools to simplify the process of model specification and scripts for visualizing the results. The flexible architecture allows it to be used on diverse types of biological computational models. In an example inference task using a model of the cell cycle based on ordinary differential equations, BCM is significantly more efficient than existing software packages, allowing more challenging inference problems to be solved. BCM represents an efficient one-stop-shop for computational modelers wishing to use sampler-based Bayesian statistics.
Meta-analysis of diagnostic test data: a bivariate Bayesian modeling approach.

PubMed

Verde, Pablo E

2010-12-30

In the last decades, the amount of published results on clinical diagnostic tests has expanded very rapidly. The counterpart to this development has been the formal evaluation and synthesis of diagnostic results. However, published results present substantial heterogeneity and they can be regarded as so far removed from the classical domain of meta-analysis, that they can provide a rather severe test of classical statistical methods. Recently, bivariate random effects meta-analytic methods, which model the pairs of sensitivities and specificities, have been presented from the classical point of view. In this work a bivariate Bayesian modeling approach is presented. This approach substantially extends the scope of classical bivariate methods by allowing the structural distribution of the random effects to depend on multiple sources of variability. Meta-analysis is summarized by the predictive posterior distributions for sensitivity and specificity. This new approach allows, also, to perform substantial model checking, model diagnostic and model selection. Statistical computations are implemented in the public domain statistical software (WinBUGS and R) and illustrated with real data examples. Copyright © 2010 John Wiley & Sons, Ltd.
Advances in Statistical Methods for Substance Abuse Prevention Research

PubMed Central

MacKinnon, David P.; Lockwood, Chondra M.

2010-01-01

The paper describes advances in statistical methods for prevention research with a particular focus on substance abuse prevention. Standard analysis methods are extended to the typical research designs and characteristics of the data collected in prevention research. Prevention research often includes longitudinal measurement, clustering of data in units such as schools or clinics, missing data, and categorical as well as continuous outcome variables. Statistical methods to handle these features of prevention data are outlined. Developments in mediation, moderation, and implementation analysis allow for the extraction of more detailed information from a prevention study. Advancements in the interpretation of prevention research results include more widespread calculation of effect size and statistical power, the use of confidence intervals as well as hypothesis testing, detailed causal analysis of research findings, and meta-analysis. The increased availability of statistical software has contributed greatly to the use of new methods in prevention research. It is likely that the Internet will continue to stimulate the development and application of new methods. PMID:12940467
DnaSAM: Software to perform neutrality testing for large datasets with complex null models.

PubMed

Eckert, Andrew J; Liechty, John D; Tearse, Brandon R; Pande, Barnaly; Neale, David B

2010-05-01

Patterns of DNA sequence polymorphisms can be used to understand the processes of demography and adaptation within natural populations. High-throughput generation of DNA sequence data has historically been the bottleneck with respect to data processing and experimental inference. Advances in marker technologies have largely solved this problem. Currently, the limiting step is computational, with most molecular population genetic software allowing a gene-by-gene analysis through a graphical user interface. An easy-to-use analysis program that allows both high-throughput processing of multiple sequence alignments along with the flexibility to simulate data under complex demographic scenarios is currently lacking. We introduce a new program, named DnaSAM, which allows high-throughput estimation of DNA sequence diversity and neutrality statistics from experimental data along with the ability to test those statistics via Monte Carlo coalescent simulations. These simulations are conducted using the ms program, which is able to incorporate several genetic parameters (e.g. recombination) and demographic scenarios (e.g. population bottlenecks). The output is a set of diversity and neutrality statistics with associated probability values under a user-specified null model that are stored in easy to manipulate text file. © 2009 Blackwell Publishing Ltd.
MethVisual - visualization and exploratory statistical analysis of DNA methylation profiles from bisulfite sequencing.

PubMed

Zackay, Arie; Steinhoff, Christine

2010-12-15

Exploration of DNA methylation and its impact on various regulatory mechanisms has become a very active field of research. Simultaneously there is an arising need for tools to process and analyse the data together with statistical investigation and visualisation. MethVisual is a new application that enables exploratory analysis and intuitive visualization of DNA methylation data as is typically generated by bisulfite sequencing. The package allows the import of DNA methylation sequences, aligns them and performs quality control comparison. It comprises basic analysis steps as lollipop visualization, co-occurrence display of methylation of neighbouring and distant CpG sites, summary statistics on methylation status, clustering and correspondence analysis. The package has been developed for methylation data but can be also used for other data types for which binary coding can be inferred. The application of the package, as well as a comparison to existing DNA methylation analysis tools and its workflow based on two datasets is presented in this paper. The R package MethVisual offers various analysis procedures for data that can be binarized, in particular for bisulfite sequenced methylation data. R/Bioconductor has become one of the most important environments for statistical analysis of various types of biological and medical data. Therefore, any data analysis within R that allows the integration of various data types as provided from different technological platforms is convenient. It is the first and so far the only specific package for DNA methylation analysis, in particular for bisulfite sequenced data available in R/Bioconductor enviroment. The package is available for free at http://methvisual.molgen.mpg.de/ and from the Bioconductor Consortium http://www.bioconductor.org.
MethVisual - visualization and exploratory statistical analysis of DNA methylation profiles from bisulfite sequencing

PubMed Central

2010-01-01

Background Exploration of DNA methylation and its impact on various regulatory mechanisms has become a very active field of research. Simultaneously there is an arising need for tools to process and analyse the data together with statistical investigation and visualisation. Findings MethVisual is a new application that enables exploratory analysis and intuitive visualization of DNA methylation data as is typically generated by bisulfite sequencing. The package allows the import of DNA methylation sequences, aligns them and performs quality control comparison. It comprises basic analysis steps as lollipop visualization, co-occurrence display of methylation of neighbouring and distant CpG sites, summary statistics on methylation status, clustering and correspondence analysis. The package has been developed for methylation data but can be also used for other data types for which binary coding can be inferred. The application of the package, as well as a comparison to existing DNA methylation analysis tools and its workflow based on two datasets is presented in this paper. Conclusions The R package MethVisual offers various analysis procedures for data that can be binarized, in particular for bisulfite sequenced methylation data. R/Bioconductor has become one of the most important environments for statistical analysis of various types of biological and medical data. Therefore, any data analysis within R that allows the integration of various data types as provided from different technological platforms is convenient. It is the first and so far the only specific package for DNA methylation analysis, in particular for bisulfite sequenced data available in R/Bioconductor enviroment. The package is available for free at http://methvisual.molgen.mpg.de/ and from the Bioconductor Consortium http://www.bioconductor.org. PMID:21159174

Ripening-dependent metabolic changes in the volatiles of pineapple (Ananas comosus (L.) Merr.) fruit: II. Multivariate statistical profiling of pineapple aroma compounds based on comprehensive two-dimensional gas chromatography-mass spectrometry.

PubMed

Steingass, Christof Björn; Jutzi, Manfred; Müller, Jenny; Carle, Reinhold; Schmarr, Hans-Georg

2015-03-01

Ripening-dependent changes of pineapple volatiles were studied in a nontargeted profiling analysis. Volatiles were isolated via headspace solid phase microextraction and analyzed by comprehensive 2D gas chromatography and mass spectrometry (HS-SPME-GC×GC-qMS). Profile patterns presented in the contour plots were evaluated applying image processing techniques and subsequent multivariate statistical data analysis. Statistical methods comprised unsupervised hierarchical cluster analysis (HCA) and principal component analysis (PCA) to classify the samples. Supervised partial least squares discriminant analysis (PLS-DA) and partial least squares (PLS) regression were applied to discriminate different ripening stages and describe the development of volatiles during postharvest storage, respectively. Hereby, substantial chemical markers allowing for class separation were revealed. The workflow permitted the rapid distinction between premature green-ripe pineapples and postharvest-ripened sea-freighted fruits. Volatile profiles of fully ripe air-freighted pineapples were similar to those of green-ripe fruits postharvest ripened for 6 days after simulated sea freight export, after PCA with only two principal components. However, PCA considering also the third principal component allowed differentiation between air-freighted fruits and the four progressing postharvest maturity stages of sea-freighted pineapples.
A statistical model of operational impacts on the framework of the bridge crane

NASA Astrophysics Data System (ADS)

Antsev, V. Yu; Tolokonnikov, A. S.; Gorynin, A. D.; Reutov, A. A.

2017-02-01

The technical regulations of the Customs Union demands implementation of the risk analysis of the bridge cranes operation at their design stage. The statistical model has been developed for performance of random calculations of risks, allowing us to model possible operational influences on the bridge crane metal structure in their various combination. The statistical model is practically actualized in the software product automated calculation of risks of failure occurrence of bridge cranes.
ICAP - An Interactive Cluster Analysis Procedure for analyzing remotely sensed data

NASA Technical Reports Server (NTRS)

Wharton, S. W.; Turner, B. J.

1981-01-01

An Interactive Cluster Analysis Procedure (ICAP) was developed to derive classifier training statistics from remotely sensed data. ICAP differs from conventional clustering algorithms by allowing the analyst to optimize the cluster configuration by inspection, rather than by manipulating process parameters. Control of the clustering process alternates between the algorithm, which creates new centroids and forms clusters, and the analyst, who can evaluate and elect to modify the cluster structure. Clusters can be deleted, or lumped together pairwise, or new centroids can be added. A summary of the cluster statistics can be requested to facilitate cluster manipulation. The principal advantage of this approach is that it allows prior information (when available) to be used directly in the analysis, since the analyst interacts with ICAP in a straightforward manner, using basic terms with which he is more likely to be familiar. Results from testing ICAP showed that an informed use of ICAP can improve classification, as compared to an existing cluster analysis procedure.
Point-by-point compositional analysis for atom probe tomography.

PubMed

Stephenson, Leigh T; Ceguerra, Anna V; Li, Tong; Rojhirunsakool, Tanaporn; Nag, Soumya; Banerjee, Rajarshi; Cairney, Julie M; Ringer, Simon P

2014-01-01

This new alternate approach to data processing for analyses that traditionally employed grid-based counting methods is necessary because it removes a user-imposed coordinate system that not only limits an analysis but also may introduce errors. We have modified the widely used "binomial" analysis for APT data by replacing grid-based counting with coordinate-independent nearest neighbour identification, improving the measurements and the statistics obtained, allowing quantitative analysis of smaller datasets, and datasets from non-dilute solid solutions. It also allows better visualisation of compositional fluctuations in the data. Our modifications include:.•using spherical k-atom blocks identified by each detected atom's first k nearest neighbours.•3D data visualisation of block composition and nearest neighbour anisotropy.•using z-statistics to directly compare experimental and expected composition curves. Similar modifications may be made to other grid-based counting analyses (contingency table, Langer-Bar-on-Miller, sinusoidal model) and could be instrumental in developing novel data visualisation options.
Wavelet Statistical Analysis of Low-Latitude Geomagnetic Measurements

NASA Astrophysics Data System (ADS)

Papa, A. R.; Akel, A. F.

2009-05-01

Following previous works by our group (Papa et al., JASTP, 2006), where we analyzed a series of records acquired at the Vassouras National Geomagnetic Observatory in Brazil for the month of October 2000, we introduced a wavelet analysis for the same type of data and for other periods. It is well known that wavelets allow a more detailed study in several senses: the time window for analysis can be drastically reduced if compared to other traditional methods (Fourier, for example) and at the same time allow an almost continuous accompaniment of both amplitude and frequency of signals as time goes by. This advantage brings some possibilities for potentially useful forecasting methods of the type also advanced by our group in previous works (see for example, Papa and Sosman, JASTP, 2008). However, the simultaneous statistical analysis of both time series (in our case amplitude and frequency) is a challenging matter and is in this sense that we have found what we consider our main goal. Some possible trends for future works are advanced.
Some Interesting Applications of Probabilistic Techiques in Structural Dynamic Analysis of Rocket Engines

NASA Technical Reports Server (NTRS)

Brown, Andrew M.

2014-01-01

Numerical and Analytical methods developed to determine damage accumulation in specific engine components when speed variation included. Dither Life Ratio shown to be well over factor of 2 for specific example. Steady-State assumption shown to be accurate for most turbopump cases, allowing rapid calculation of DLR. If hot-fire speed data unknown, Monte Carlo method developed that uses speed statistics for similar engines. Application of techniques allow analyst to reduce both uncertainty and excess conservatism. High values of DLR could allow previously unacceptable part to pass HCF criteria without redesign. Given benefit and ease of implementation, recommend that any finite life turbomachine component analysis adopt these techniques. Probability Values calculated, compared, and evaluated for several industry-proposed methods for combining random and harmonic loads. Two new excel macros written to calculate combined load for any specific probability level. Closed form Curve fits generated for widely used 3(sigma) and 2(sigma) probability levels. For design of lightweight aerospace components, obtaining accurate, reproducible, statistically meaningful answer critical.
All individuals are not created equal; accounting for interindividual variation in fitting life-history responses to toxicants.

PubMed

Jager, Tjalling

2013-02-05

The individuals of a species are not equal. These differences frustrate experimental biologists and ecotoxicologists who wish to study the response of a species (in general) to a treatment. In the analysis of data, differences between model predictions and observations on individual animals are usually treated as random measurement error around the true response. These deviations, however, are mainly caused by real differences between the individuals (e.g., differences in physiology and in initial conditions). Understanding these intraspecies differences, and accounting for them in the data analysis, will improve our understanding of the response to the treatment we are investigating and allow for a more powerful, less biased, statistical analysis. Here, I explore a basic scheme for statistical inference to estimate parameters governing stress that allows individuals to differ in their basic physiology. This scheme is illustrated using a simple toxicokinetic-toxicodynamic model and a data set for growth of the springtail Folsomia candida exposed to cadmium in food. This article should be seen as proof of concept; a first step in bringing more realism into the statistical inference for process-based models in ecotoxicology.
Event coincidence analysis for quantifying statistical interrelationships between event time series. On the role of flood events as triggers of epidemic outbreaks

NASA Astrophysics Data System (ADS)

Donges, J. F.; Schleussner, C.-F.; Siegmund, J. F.; Donner, R. V.

2016-05-01

Studying event time series is a powerful approach for analyzing the dynamics of complex dynamical systems in many fields of science. In this paper, we describe the method of event coincidence analysis to provide a framework for quantifying the strength, directionality and time lag of statistical interrelationships between event series. Event coincidence analysis allows to formulate and test null hypotheses on the origin of the observed interrelationships including tests based on Poisson processes or, more generally, stochastic point processes with a prescribed inter-event time distribution and other higher-order properties. Applying the framework to country-level observational data yields evidence that flood events have acted as triggers of epidemic outbreaks globally since the 1950s. Facing projected future changes in the statistics of climatic extreme events, statistical techniques such as event coincidence analysis will be relevant for investigating the impacts of anthropogenic climate change on human societies and ecosystems worldwide.
A Matlab user interface for the statistically assisted fluid registration algorithm and tensor-based morphometry

NASA Astrophysics Data System (ADS)

Yepes-Calderon, Fernando; Brun, Caroline; Sant, Nishita; Thompson, Paul; Lepore, Natasha

2015-01-01

Tensor-Based Morphometry (TBM) is an increasingly popular method for group analysis of brain MRI data. The main steps in the analysis consist of a nonlinear registration to align each individual scan to a common space, and a subsequent statistical analysis to determine morphometric differences, or difference in fiber structure between groups. Recently, we implemented the Statistically-Assisted Fluid Registration Algorithm or SAFIRA,1 which is designed for tracking morphometric differences among populations. To this end, SAFIRA allows the inclusion of statistical priors extracted from the populations being studied as regularizers in the registration. This flexibility and degree of sophistication limit the tool to expert use, even more so considering that SAFIRA was initially implemented in command line mode. Here, we introduce a new, intuitive, easy to use, Matlab-based graphical user interface for SAFIRA's multivariate TBM. The interface also generates different choices for the TBM statistics, including both the traditional univariate statistics on the Jacobian matrix, and comparison of the full deformation tensors.2 This software will be freely disseminated to the neuroimaging research community.
Robustness of Type I Error and Power in Set Correlation Analysis of Contingency Tables.

ERIC Educational Resources Information Center

Cohen, Jacob; Nee, John C. M.

1990-01-01

The analysis of contingency tables via set correlation allows the assessment of subhypotheses involving contrast functions of the categories of the nominal scales. The robustness of such methods with regard to Type I error and statistical power was studied via a Monte Carlo experiment. (TJH)
Statistical analysis plan for the family-led rehabilitation after stroke in India (ATTEND) trial: A multicenter randomized controlled trial of a new model of stroke rehabilitation compared to usual care.

PubMed

Billot, Laurent; Lindley, Richard I; Harvey, Lisa A; Maulik, Pallab K; Hackett, Maree L; Murthy, Gudlavalleti Vs; Anderson, Craig S; Shamanna, Bindiganavale R; Jan, Stephen; Walker, Marion; Forster, Anne; Langhorne, Peter; Verma, Shweta J; Felix, Cynthia; Alim, Mohammed; Gandhi, Dorcas Bc; Pandian, Jeyaraj Durai

2017-02-01

Background In low- and middle-income countries, few patients receive organized rehabilitation after stroke, yet the burden of chronic diseases such as stroke is increasing in these countries. Affordable models of effective rehabilitation could have a major impact. The ATTEND trial is evaluating a family-led caregiver delivered rehabilitation program after stroke. Objective To publish the detailed statistical analysis plan for the ATTEND trial prior to trial unblinding. Methods Based upon the published registration and protocol, the blinded steering committee and management team, led by the trial statistician, have developed a statistical analysis plan. The plan has been informed by the chosen outcome measures, the data collection forms and knowledge of key baseline data. Results The resulting statistical analysis plan is consistent with best practice and will allow open and transparent reporting. Conclusions Publication of the trial statistical analysis plan reduces potential bias in trial reporting, and clearly outlines pre-specified analyses. Clinical Trial Registrations India CTRI/2013/04/003557; Australian New Zealand Clinical Trials Registry ACTRN1261000078752; Universal Trial Number U1111-1138-6707.
Advanced functional network analysis in the geosciences: The pyunicorn package

NASA Astrophysics Data System (ADS)

Donges, Jonathan F.; Heitzig, Jobst; Runge, Jakob; Schultz, Hanna C. H.; Wiedermann, Marc; Zech, Alraune; Feldhoff, Jan; Rheinwalt, Aljoscha; Kutza, Hannes; Radebach, Alexander; Marwan, Norbert; Kurths, Jürgen

2013-04-01

Functional networks are a powerful tool for analyzing large geoscientific datasets such as global fields of climate time series originating from observations or model simulations. pyunicorn (pythonic unified complex network and recurrence analysis toolbox) is an open-source, fully object-oriented and easily parallelizable package written in the language Python. It allows for constructing functional networks (aka climate networks) representing the structure of statistical interrelationships in large datasets and, subsequently, investigating this structure using advanced methods of complex network theory such as measures for networks of interacting networks, node-weighted statistics or network surrogates. Additionally, pyunicorn allows to study the complex dynamics of geoscientific systems as recorded by time series by means of recurrence networks and visibility graphs. The range of possible applications of the package is outlined drawing on several examples from climatology.
Nonlinear estimation of parameters in biphasic Arrhenius plots.

PubMed

Puterman, M L; Hrboticky, N; Innis, S M

1988-05-01

This paper presents a formal procedure for the statistical analysis of data on the thermotropic behavior of membrane-bound enzymes generated using the Arrhenius equation and compares the analysis to several alternatives. Data is modeled by a bent hyperbola. Nonlinear regression is used to obtain estimates and standard errors of the intersection of line segments, defined as the transition temperature, and slopes, defined as energies of activation of the enzyme reaction. The methodology allows formal tests of the adequacy of a biphasic model rather than either a single straight line or a curvilinear model. Examples on data concerning the thermotropic behavior of pig brain synaptosomal acetylcholinesterase are given. The data support the biphasic temperature dependence of this enzyme. The methodology represents a formal procedure for statistical validation of any biphasic data and allows for calculation of all line parameters with estimates of precision.
Implementing a Web-Based Decision Support System to Spatially and Statistically Analyze Ecological Conditions of the Sierra Nevada

NASA Astrophysics Data System (ADS)

Nguyen, A.; Mueller, C.; Brooks, A. N.; Kislik, E. A.; Baney, O. N.; Ramirez, C.; Schmidt, C.; Torres-Perez, J. L.

2014-12-01

The Sierra Nevada is experiencing changes in hydrologic regimes, such as decreases in snowmelt and peak runoff, which affect forest health and the availability of water resources. Currently, the USDA Forest Service Region 5 is undergoing Forest Plan revisions to include climate change impacts into mitigation and adaptation strategies. However, there are few processes in place to conduct quantitative assessments of forest conditions in relation to mountain hydrology, while easily and effectively delivering that information to forest managers. To assist the USDA Forest Service, this study is the final phase of a three-term project to create a Decision Support System (DSS) to allow ease of access to historical and forecasted hydrologic, climatic, and terrestrial conditions for the entire Sierra Nevada. This data is featured within three components of the DSS: the Mapping Viewer, Statistical Analysis Portal, and Geospatial Data Gateway. Utilizing ArcGIS Online, the Sierra DSS Mapping Viewer enables users to visually analyze and locate areas of interest. Once the areas of interest are targeted, the Statistical Analysis Portal provides subbasin level statistics for each variable over time by utilizing a recently developed web-based data analysis and visualization tool called Plotly. This tool allows users to generate graphs and conduct statistical analyses for the Sierra Nevada without the need to download the dataset of interest. For more comprehensive analysis, users are also able to download datasets via the Geospatial Data Gateway. The third phase of this project focused on Python-based data processing, the adaptation of the multiple capabilities of ArcGIS Online and Plotly, and the integration of the three Sierra DSS components within a website designed specifically for the USDA Forest Service.
SimHap GUI: an intuitive graphical user interface for genetic association analysis.

PubMed

Carter, Kim W; McCaskie, Pamela A; Palmer, Lyle J

2008-12-25

Researchers wishing to conduct genetic association analysis involving single nucleotide polymorphisms (SNPs) or haplotypes are often confronted with the lack of user-friendly graphical analysis tools, requiring sophisticated statistical and informatics expertise to perform relatively straightforward tasks. Tools, such as the SimHap package for the R statistics language, provide the necessary statistical operations to conduct sophisticated genetic analysis, but lacks a graphical user interface that allows anyone but a professional statistician to effectively utilise the tool. We have developed SimHap GUI, a cross-platform integrated graphical analysis tool for conducting epidemiological, single SNP and haplotype-based association analysis. SimHap GUI features a novel workflow interface that guides the user through each logical step of the analysis process, making it accessible to both novice and advanced users. This tool provides a seamless interface to the SimHap R package, while providing enhanced functionality such as sophisticated data checking, automated data conversion, and real-time estimations of haplotype simulation progress. SimHap GUI provides a novel, easy-to-use, cross-platform solution for conducting a range of genetic and non-genetic association analyses. This provides a free alternative to commercial statistics packages that is specifically designed for genetic association analysis.
Method and system of Jones-matrix mapping of blood plasma films with "fuzzy" analysis in differentiation of breast pathology changes

NASA Astrophysics Data System (ADS)

Zabolotna, Natalia I.; Radchenko, Kostiantyn O.; Karas, Oleksandr V.

2018-01-01

A fibroadenoma diagnosing of breast using statistical analysis (determination and analysis of statistical moments of the 1st-4th order) of the obtained polarization images of Jones matrix imaginary elements of the optically thin (attenuation coefficient τ <= 0,1 ) blood plasma films with further intellectual differentiation based on the method of "fuzzy" logic and discriminant analysis were proposed. The accuracy of the intellectual differentiation of blood plasma samples to the "norm" and "fibroadenoma" of breast was 82.7% by the method of linear discriminant analysis, and by the "fuzzy" logic method is 95.3%. The obtained results allow to confirm the potentially high level of reliability of the method of differentiation by "fuzzy" analysis.
Statistical analysis for understanding and predicting battery degradations in real-life electric vehicle use

NASA Astrophysics Data System (ADS)

Barré, Anthony; Suard, Frédéric; Gérard, Mathias; Montaru, Maxime; Riu, Delphine

2014-01-01

This paper describes the statistical analysis of recorded data parameters of electrical battery ageing during electric vehicle use. These data permit traditional battery ageing investigation based on the evolution of the capacity fade and resistance raise. The measured variables are examined in order to explain the correlation between battery ageing and operating conditions during experiments. Such study enables us to identify the main ageing factors. Then, detailed statistical dependency explorations present the responsible factors on battery ageing phenomena. Predictive battery ageing models are built from this approach. Thereby results demonstrate and quantify a relationship between variables and battery ageing global observations, and also allow accurate battery ageing diagnosis through predictive models.
Genome Expression Pathway Analysis Tool – Analysis and visualization of microarray gene expression data under genomic, proteomic and metabolic context

PubMed Central

Weniger, Markus; Engelmann, Julia C; Schultz, Jörg

2007-01-01

Background Regulation of gene expression is relevant to many areas of biology and medicine, in the study of treatments, diseases, and developmental stages. Microarrays can be used to measure the expression level of thousands of mRNAs at the same time, allowing insight into or comparison of different cellular conditions. The data derived out of microarray experiments is highly dimensional and often noisy, and interpretation of the results can get intricate. Although programs for the statistical analysis of microarray data exist, most of them lack an integration of analysis results and biological interpretation. Results We have developed GEPAT, Genome Expression Pathway Analysis Tool, offering an analysis of gene expression data under genomic, proteomic and metabolic context. We provide an integration of statistical methods for data import and data analysis together with a biological interpretation for subsets of probes or single probes on the chip. GEPAT imports various types of oligonucleotide and cDNA array data formats. Different normalization methods can be applied to the data, afterwards data annotation is performed. After import, GEPAT offers various statistical data analysis methods, as hierarchical, k-means and PCA clustering, a linear model based t-test or chromosomal profile comparison. The results of the analysis can be interpreted by enrichment of biological terms, pathway analysis or interaction networks. Different biological databases are included, to give various information for each probe on the chip. GEPAT offers no linear work flow, but allows the usage of any subset of probes and samples as a start for a new data analysis. GEPAT relies on established data analysis packages, offers a modular approach for an easy extension, and can be run on a computer grid to allow a large number of users. It is freely available under the LGPL open source license for academic and commercial users at . Conclusion GEPAT is a modular, scalable and professional-grade software integrating analysis and interpretation of microarray gene expression data. An installation available for academic users can be found at . PMID:17543125
Two-Bin Kanban: Ordering Impact at Navy Medical Center San Diego

DTIC Science & Technology

2016-06-17

pretest (2013 data set) and posttest (2015 data set) analysis to avoid having the findings influenced by price changes. DMLSS does not track shipping...statistics based on those observations (Kabacoff, 2011, p. 112). Replacing the groups of observations with summary statistics allows the analyst...listed on the Acquisition Research Program website (www.acquisitionresearch.net). Acquisition Research Program Graduate School of Business & Public
Assessment of Reliable Change Using 95% Credible Intervals for the Differences in Proportions: A Statistical Analysis for Case-Study Methodology

ERIC Educational Resources Information Center

Unicomb, Rachael; Colyvas, Kim; Harrison, Elisabeth; Hewat, Sally

2015-01-01

Purpose: Case-study methodology studying change is often used in the field of speech-language pathology, but it can be criticized for not being statistically robust. Yet with the heterogeneous nature of many communication disorders, case studies allow clinicians and researchers to closely observe and report on change. Such information is valuable…

A critique of Rasch residual fit statistics.

PubMed

Karabatsos, G

2000-01-01

In test analysis involving the Rasch model, a large degree of importance is placed on the "objective" measurement of individual abilities and item difficulties. The degree to which the objectivity properties are attained, of course, depends on the degree to which the data fit the Rasch model. It is therefore important to utilize fit statistics that accurately and reliably detect the person-item response inconsistencies that threaten the measurement objectivity of persons and items. Given this argument, it is somewhat surprising that there is far more emphasis placed in the objective measurement of person and items than there is in the measurement quality of Rasch fit statistics. This paper provides a critical analysis of the residual fit statistics of the Rasch model, arguably the most often used fit statistics, in an effort to illustrate that the task of Rasch fit analysis is not as simple and straightforward as it appears to be. The faulty statistical properties of the residual fit statistics do not allow either a convenient or a straightforward approach to Rasch fit analysis. For instance, given a residual fit statistic, the use of a single minimum critical value for misfit diagnosis across different testing situations, where the situations vary in sample and test properties, leads to both the overdetection and underdetection of misfit. To improve this situation, it is argued that psychometricians need to implement residual-free Rasch fit statistics that are based on the number of Guttman response errors, or use indices that are statistically optimal in detecting measurement disturbances.
Data analysis of the benefits of an electronic registry of information in a neonatal intensive care unit in Greece.

PubMed

Skouroliakou, Maria; Soloupis, George; Gounaris, Antonis; Charitou, Antonia; Papasarantopoulos, Petros; Markantonis, Sophia L; Golna, Christina; Souliotis, Kyriakos

2008-07-28

This study assesses the results of implementation of a software program that allows for input of admission/discharge summary data (including cost) in a neonatal intensive care unit (NICU) in Greece, based on the establishment of a baseline statistical database for infants treated in a NICU and the statistical analysis of epidemiological and resource utilization data thus collected. A software tool was designed, developed, and implemented between April 2004 and March 2005 in the NICU of the LITO private maternity hospital in Athens, Greece, to allow for the first time for step-by-step collection and management of summary treatment data. Data collected over this period were subsequently analyzed using defined indicators as a basis to extract results related to treatment options, treatment duration, and relative resource utilization. Data for 499 babies were entered in the tool and processed. Information on medical costs (e.g., mean total cost +/- SD of treatment was euro310.44 +/- 249.17 and euro6704.27 +/- 4079.53 for babies weighing more than 2500 g and 1000-1500 g respectively), incidence of complications or disease (e.g., 4.3 percent and 14.3 percent of study babies weighing 1,000 to 1,500 g suffered from cerebral bleeding [grade I] and bronchopulmonary dysplasia, respectively, while overall 6.0 percent had microbial infections), and medical statistics (e.g., perinatal mortality was 6.8 percent) was obtained in a quick and robust manner. The software tool allowed for collection and analysis of data traditionally maintained in paper medical records in the NICU with greater ease and accuracy. Data codification and analysis led to significant findings at the epidemiological, medical resource utilization, and respective hospital cost levels that allowed comparisons with literature findings for the first time in Greece. The tool thus contributed to a clearer understanding of treatment practices in the NICU and set the baseline for the assessment of the impact of future interventions at the policy or hospital level.
Evaluation of Solid Rocket Motor Component Data Using a Commercially Available Statistical Software Package

NASA Technical Reports Server (NTRS)

Stefanski, Philip L.

2015-01-01

Commercially available software packages today allow users to quickly perform the routine evaluations of (1) descriptive statistics to numerically and graphically summarize both sample and population data, (2) inferential statistics that draws conclusions about a given population from samples taken of it, (3) probability determinations that can be used to generate estimates of reliability allowables, and finally (4) the setup of designed experiments and analysis of their data to identify significant material and process characteristics for application in both product manufacturing and performance enhancement. This paper presents examples of analysis and experimental design work that has been conducted using Statgraphics®(Registered Trademark) statistical software to obtain useful information with regard to solid rocket motor propellants and internal insulation material. Data were obtained from a number of programs (Shuttle, Constellation, and Space Launch System) and sources that include solid propellant burn rate strands, tensile specimens, sub-scale test motors, full-scale operational motors, rubber insulation specimens, and sub-scale rubber insulation analog samples. Besides facilitating the experimental design process to yield meaningful results, statistical software has demonstrated its ability to quickly perform complex data analyses and yield significant findings that might otherwise have gone unnoticed. One caveat to these successes is that useful results not only derive from the inherent power of the software package, but also from the skill and understanding of the data analyst.
Global Sensitivity Analysis of Environmental Systems via Multiple Indices based on Statistical Moments of Model Outputs

NASA Astrophysics Data System (ADS)

Guadagnini, A.; Riva, M.; Dell'Oca, A.

2017-12-01

We propose to ground sensitivity of uncertain parameters of environmental models on a set of indices based on the main (statistical) moments, i.e., mean, variance, skewness and kurtosis, of the probability density function (pdf) of a target model output. This enables us to perform Global Sensitivity Analysis (GSA) of a model in terms of multiple statistical moments and yields a quantification of the impact of model parameters on features driving the shape of the pdf of model output. Our GSA approach includes the possibility of being coupled with the construction of a reduced complexity model that allows approximating the full model response at a reduced computational cost. We demonstrate our approach through a variety of test cases. These include a commonly used analytical benchmark, a simplified model representing pumping in a coastal aquifer, a laboratory-scale tracer experiment, and the migration of fracturing fluid through a naturally fractured reservoir (source) to reach an overlying formation (target). Our strategy allows discriminating the relative importance of model parameters to the four statistical moments considered. We also provide an appraisal of the error associated with the evaluation of our sensitivity metrics by replacing the original system model through the selected surrogate model. Our results suggest that one might need to construct a surrogate model with increasing level of accuracy depending on the statistical moment considered in the GSA. The methodological framework we propose can assist the development of analysis techniques targeted to model calibration, design of experiment, uncertainty quantification and risk assessment.
Finding Groups Using Model-Based Cluster Analysis: Heterogeneous Emotional Self-Regulatory Processes and Heavy Alcohol Use Risk

ERIC Educational Resources Information Center

Mun, Eun Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.

2008-01-01

Model-based cluster analysis is a new clustering procedure to investigate population heterogeneity utilizing finite mixture multivariate normal densities. It is an inferentially based, statistically principled procedure that allows comparison of nonnested models using the Bayesian information criterion to compare multiple models and identify the…
Analysis Tools (AT)

Treesearch

Larry J. Gangi

2006-01-01

The FIREMON Analysis Tools program is designed to let the user perform grouped or ungrouped summary calculations of single measurement plot data, or statistical comparisons of grouped or ungrouped plot data taken at different sampling periods. The program allows the user to create reports and graphs, save and print them, or cut and paste them into a word processor....
Multilevel Structural Equation Models for the Analysis of Comparative Data on Educational Performance

ERIC Educational Resources Information Center

Goldstein, Harvey; Bonnet, Gerard; Rocher, Thierry

2007-01-01

The Programme for International Student Assessment comparative study of reading performance among 15-year-olds is reanalyzed using statistical procedures that allow the full complexity of the data structures to be explored. The article extends existing multilevel factor analysis and structural equation models and shows how this can extract richer…
Validating an Air Traffic Management Concept of Operation Using Statistical Modeling

NASA Technical Reports Server (NTRS)

He, Yuning; Davies, Misty Dawn

2013-01-01

Validating a concept of operation for a complex, safety-critical system (like the National Airspace System) is challenging because of the high dimensionality of the controllable parameters and the infinite number of states of the system. In this paper, we use statistical modeling techniques to explore the behavior of a conflict detection and resolution algorithm designed for the terminal airspace. These techniques predict the robustness of the system simulation to both nominal and off-nominal behaviors within the overall airspace. They also can be used to evaluate the output of the simulation against recorded airspace data. Additionally, the techniques carry with them a mathematical value of the worth of each prediction-a statistical uncertainty for any robustness estimate. Uncertainty Quantification (UQ) is the process of quantitative characterization and ultimately a reduction of uncertainties in complex systems. UQ is important for understanding the influence of uncertainties on the behavior of a system and therefore is valuable for design, analysis, and verification and validation. In this paper, we apply advanced statistical modeling methodologies and techniques on an advanced air traffic management system, namely the Terminal Tactical Separation Assured Flight Environment (T-TSAFE). We show initial results for a parameter analysis and safety boundary (envelope) detection in the high-dimensional parameter space. For our boundary analysis, we developed a new sequential approach based upon the design of computer experiments, allowing us to incorporate knowledge from domain experts into our modeling and to determine the most likely boundary shapes and its parameters. We carried out the analysis on system parameters and describe an initial approach that will allow us to include time-series inputs, such as the radar track data, into the analysis
A new u-statistic with superior design sensitivity in matched observational studies.

PubMed

Rosenbaum, Paul R

2011-09-01

In an observational or nonrandomized study of treatment effects, a sensitivity analysis indicates the magnitude of bias from unmeasured covariates that would need to be present to alter the conclusions of a naïve analysis that presumes adjustments for observed covariates suffice to remove all bias. The power of sensitivity analysis is the probability that it will reject a false hypothesis about treatment effects allowing for a departure from random assignment of a specified magnitude; in particular, if this specified magnitude is "no departure" then this is the same as the power of a randomization test in a randomized experiment. A new family of u-statistics is proposed that includes Wilcoxon's signed rank statistic but also includes other statistics with substantially higher power when a sensitivity analysis is performed in an observational study. Wilcoxon's statistic has high power to detect small effects in large randomized experiments-that is, it often has good Pitman efficiency-but small effects are invariably sensitive to small unobserved biases. Members of this family of u-statistics that emphasize medium to large effects can have substantially higher power in a sensitivity analysis. For example, in one situation with 250 pair differences that are Normal with expectation 1/2 and variance 1, the power of a sensitivity analysis that uses Wilcoxon's statistic is 0.08 while the power of another member of the family of u-statistics is 0.66. The topic is examined by performing a sensitivity analysis in three observational studies, using an asymptotic measure called the design sensitivity, and by simulating power in finite samples. The three examples are drawn from epidemiology, clinical medicine, and genetic toxicology. © 2010, The International Biometric Society.
Physics Teachers and Students: A Statistical and Historical Analysis of Women

NASA Astrophysics Data System (ADS)

Gregory, Amanda

2009-10-01

Historically, women have been denied an education comparable to that available to men. Since women have been allowed into institutions of higher learning, they have been studying and earning physics degrees. The aim of this poster is to discuss the statistical relationship between the number of women enrolled in university physics programs and the number of female physics faculty members. Special care has been given to examining the statistical data in the context of the social climate at the time that these women were teaching or pursuing their education.
Statistical Modeling for Radiation Hardness Assurance

NASA Technical Reports Server (NTRS)

Ladbury, Raymond L.

2014-01-01

We cover the models and statistics associated with single event effects (and total ionizing dose), why we need them, and how to use them: What models are used, what errors exist in real test data, and what the model allows us to say about the DUT will be discussed. In addition, how to use other sources of data such as historical, heritage, and similar part and how to apply experience, physics, and expert opinion to the analysis will be covered. Also included will be concepts of Bayesian statistics, data fitting, and bounding rates.
Biological Parametric Mapping: A Statistical Toolbox for Multi-Modality Brain Image Analysis

PubMed Central

Casanova, Ramon; Ryali, Srikanth; Baer, Aaron; Laurienti, Paul J.; Burdette, Jonathan H.; Hayasaka, Satoru; Flowers, Lynn; Wood, Frank; Maldjian, Joseph A.

2006-01-01

In recent years multiple brain MR imaging modalities have emerged; however, analysis methodologies have mainly remained modality specific. In addition, when comparing across imaging modalities, most researchers have been forced to rely on simple region-of-interest type analyses, which do not allow the voxel-by-voxel comparisons necessary to answer more sophisticated neuroscience questions. To overcome these limitations, we developed a toolbox for multimodal image analysis called biological parametric mapping (BPM), based on a voxel-wise use of the general linear model. The BPM toolbox incorporates information obtained from other modalities as regressors in a voxel-wise analysis, thereby permitting investigation of more sophisticated hypotheses. The BPM toolbox has been developed in MATLAB with a user friendly interface for performing analyses, including voxel-wise multimodal correlation, ANCOVA, and multiple regression. It has a high degree of integration with the SPM (statistical parametric mapping) software relying on it for visualization and statistical inference. Furthermore, statistical inference for a correlation field, rather than a widely-used T-field, has been implemented in the correlation analysis for more accurate results. An example with in-vivo data is presented demonstrating the potential of the BPM methodology as a tool for multimodal image analysis. PMID:17070709
Random-Effects Meta-Analysis of Time-to-Event Data Using the Expectation-Maximisation Algorithm and Shrinkage Estimators

ERIC Educational Resources Information Center

Simmonds, Mark C.; Higgins, Julian P. T.; Stewart, Lesley A.

2013-01-01

Meta-analysis of time-to-event data has proved difficult in the past because consistent summary statistics often cannot be extracted from published results. The use of individual patient data allows for the re-analysis of each study in a consistent fashion and thus makes meta-analysis of time-to-event data feasible. Time-to-event data can be…
Measurement and statistical analysis of single-molecule current-voltage characteristics, transition voltage spectroscopy, and tunneling barrier height.

PubMed

Guo, Shaoyin; Hihath, Joshua; Díez-Pérez, Ismael; Tao, Nongjian

2011-11-30

We report on the measurement and statistical study of thousands of current-voltage characteristics and transition voltage spectra (TVS) of single-molecule junctions with different contact geometries that are rapidly acquired using a new break junction method at room temperature. This capability allows one to obtain current-voltage, conductance voltage, and transition voltage histograms, thus adding a new dimension to the previous conductance histogram analysis at a fixed low-bias voltage for single molecules. This method confirms the low-bias conductance values of alkanedithiols and biphenyldithiol reported in literature. However, at high biases the current shows large nonlinearity and asymmetry, and TVS allows for the determination of a critically important parameter, the tunneling barrier height or energy level alignment between the molecule and the electrodes of single-molecule junctions. The energy level alignment is found to depend on the molecule and also on the contact geometry, revealing the role of contact geometry in both the contact resistance and energy level alignment of a molecular junction. Detailed statistical analysis further reveals that, despite the dependence of the energy level alignment on contact geometry, the variation in single-molecule conductance is primarily due to contact resistance rather than variations in the energy level alignment.
Implementation of novel statistical procedures and other advanced approaches to improve analysis of CASA data.

PubMed

Ramón, M; Martínez-Pastor, F

2018-04-23

Computer-aided sperm analysis (CASA) produces a wealth of data that is frequently ignored. The use of multiparametric statistical methods can help explore these datasets, unveiling the subpopulation structure of sperm samples. In this review we analyse the significance of the internal heterogeneity of sperm samples and its relevance. We also provide a brief description of the statistical tools used for extracting sperm subpopulations from the datasets, namely unsupervised clustering (with non-hierarchical, hierarchical and two-step methods) and the most advanced supervised methods, based on machine learning. The former method has allowed exploration of subpopulation patterns in many species, whereas the latter offering further possibilities, especially considering functional studies and the practical use of subpopulation analysis. We also consider novel approaches, such as the use of geometric morphometrics or imaging flow cytometry. Finally, although the data provided by CASA systems provides valuable information on sperm samples by applying clustering analyses, there are several caveats. Protocols for capturing and analysing motility or morphometry should be standardised and adapted to each experiment, and the algorithms should be open in order to allow comparison of results between laboratories. Moreover, we must be aware of new technology that could change the paradigm for studying sperm motility and morphology.
Statistical process control: separating signal from noise in emergency department operations.

PubMed

Pimentel, Laura; Barrueto, Fermin

2015-05-01

Statistical process control (SPC) is a visually appealing and statistically rigorous methodology very suitable to the analysis of emergency department (ED) operations. We demonstrate that the control chart is the primary tool of SPC; it is constructed by plotting data measuring the key quality indicators of operational processes in rationally ordered subgroups such as units of time. Control limits are calculated using formulas reflecting the variation in the data points from one another and from the mean. SPC allows managers to determine whether operational processes are controlled and predictable. We review why the moving range chart is most appropriate for use in the complex ED milieu, how to apply SPC to ED operations, and how to determine when performance improvement is needed. SPC is an excellent tool for operational analysis and quality improvement for these reasons: 1) control charts make large data sets intuitively coherent by integrating statistical and visual descriptions; 2) SPC provides analysis of process stability and capability rather than simple comparison with a benchmark; 3) SPC allows distinction between special cause variation (signal), indicating an unstable process requiring action, and common cause variation (noise), reflecting a stable process; and 4) SPC keeps the focus of quality improvement on process rather than individual performance. Because data have no meaning apart from their context, and every process generates information that can be used to improve it, we contend that SPC should be seriously considered for driving quality improvement in emergency medicine. Copyright © 2015 Elsevier Inc. All rights reserved.
Assessing statistical differences between parameters estimates in Partial Least Squares path modeling.

PubMed

Rodríguez-Entrena, Macario; Schuberth, Florian; Gelhard, Carsten

2018-01-01

Structural equation modeling using partial least squares (PLS-SEM) has become a main-stream modeling approach in various disciplines. Nevertheless, prior literature still lacks a practical guidance on how to properly test for differences between parameter estimates. Whereas existing techniques such as parametric and non-parametric approaches in PLS multi-group analysis solely allow to assess differences between parameters that are estimated for different subpopulations, the study at hand introduces a technique that allows to also assess whether two parameter estimates that are derived from the same sample are statistically different. To illustrate this advancement to PLS-SEM, we particularly refer to a reduced version of the well-established technology acceptance model.
Extracting chemical information from high-resolution Kβ X-ray emission spectroscopy

NASA Astrophysics Data System (ADS)

Limandri, S.; Robledo, J.; Tirao, G.

2018-06-01

High-resolution X-ray emission spectroscopy allows studying the chemical environment of a wide variety of materials. Chemical information can be obtained by fitting the X-ray spectra and observing the behavior of some spectral features. Spectral changes can also be quantified by means of statistical parameters calculated by considering the spectrum as a probability distribution. Another possibility is to perform statistical multivariate analysis, such as principal component analysis. In this work the performance of these procedures for extracting chemical information in X-ray emission spectroscopy spectra for mixtures of Mn2+ and Mn4+ oxides are studied. A detail analysis of the parameters obtained, as well as the associated uncertainties is shown. The methodologies are also applied for Mn oxidation state characterization of double perovskite oxides Ba1+xLa1-xMnSbO6 (with 0 ≤ x ≤ 0.7). The results show that statistical parameters and multivariate analysis are the most suitable for the analysis of this kind of spectra.
Big-Data RHEED analysis for understanding epitaxial film growth processes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vasudevan, Rama K; Tselev, Alexander; Baddorf, Arthur P

Reflection high energy electron diffraction (RHEED) has by now become a standard tool for in-situ monitoring of film growth by pulsed laser deposition and molecular beam epitaxy. Yet despite the widespread adoption and wealth of information in RHEED image, most applications are limited to observing intensity oscillations of the specular spot, and much additional information on growth is discarded. With ease of data acquisition and increased computation speeds, statistical methods to rapidly mine the dataset are now feasible. Here, we develop such an approach to the analysis of the fundamental growth processes through multivariate statistical analysis of RHEED image sequence.more » This approach is illustrated for growth of LaxCa1-xMnO3 films grown on etched (001) SrTiO3 substrates, but is universal. The multivariate methods including principal component analysis and k-means clustering provide insight into the relevant behaviors, the timing and nature of a disordered to ordered growth change, and highlight statistically significant patterns. Fourier analysis yields the harmonic components of the signal and allows separation of the relevant components and baselines, isolating the assymetric nature of the step density function and the transmission spots from the imperfect layer-by-layer (LBL) growth. These studies show the promise of big data approaches to obtaining more insight into film properties during and after epitaxial film growth. Furthermore, these studies open the pathway to use forward prediction methods to potentially allow significantly more control over growth process and hence final film quality.« less
SimHap GUI: An intuitive graphical user interface for genetic association analysis

PubMed Central

Carter, Kim W; McCaskie, Pamela A; Palmer, Lyle J

2008-01-01

Background Researchers wishing to conduct genetic association analysis involving single nucleotide polymorphisms (SNPs) or haplotypes are often confronted with the lack of user-friendly graphical analysis tools, requiring sophisticated statistical and informatics expertise to perform relatively straightforward tasks. Tools, such as the SimHap package for the R statistics language, provide the necessary statistical operations to conduct sophisticated genetic analysis, but lacks a graphical user interface that allows anyone but a professional statistician to effectively utilise the tool. Results We have developed SimHap GUI, a cross-platform integrated graphical analysis tool for conducting epidemiological, single SNP and haplotype-based association analysis. SimHap GUI features a novel workflow interface that guides the user through each logical step of the analysis process, making it accessible to both novice and advanced users. This tool provides a seamless interface to the SimHap R package, while providing enhanced functionality such as sophisticated data checking, automated data conversion, and real-time estimations of haplotype simulation progress. Conclusion SimHap GUI provides a novel, easy-to-use, cross-platform solution for conducting a range of genetic and non-genetic association analyses. This provides a free alternative to commercial statistics packages that is specifically designed for genetic association analysis. PMID:19109877

SWATH2stats: An R/Bioconductor Package to Process and Convert Quantitative SWATH-MS Proteomics Data for Downstream Analysis Tools.

PubMed

Blattmann, Peter; Heusel, Moritz; Aebersold, Ruedi

2016-01-01

SWATH-MS is an acquisition and analysis technique of targeted proteomics that enables measuring several thousand proteins with high reproducibility and accuracy across many samples. OpenSWATH is popular open-source software for peptide identification and quantification from SWATH-MS data. For downstream statistical and quantitative analysis there exist different tools such as MSstats, mapDIA and aLFQ. However, the transfer of data from OpenSWATH to the downstream statistical tools is currently technically challenging. Here we introduce the R/Bioconductor package SWATH2stats, which allows convenient processing of the data into a format directly readable by the downstream analysis tools. In addition, SWATH2stats allows annotation, analyzing the variation and the reproducibility of the measurements, FDR estimation, and advanced filtering before submitting the processed data to downstream tools. These functionalities are important to quickly analyze the quality of the SWATH-MS data. Hence, SWATH2stats is a new open-source tool that summarizes several practical functionalities for analyzing, processing, and converting SWATH-MS data and thus facilitates the efficient analysis of large-scale SWATH/DIA datasets.
An Exploratory Data Analysis System for Support in Medical Decision-Making

PubMed Central

Copeland, J. A.; Hamel, B.; Bourne, J. R.

1979-01-01

An experimental system was developed to allow retrieval and analysis of data collected during a study of neurobehavioral correlates of renal disease. After retrieving data organized in a relational data base, simple bivariate statistics of parametric and nonparametric nature could be conducted. An “exploratory” mode in which the system provided guidance in selection of appropriate statistical analyses was also available to the user. The system traversed a decision tree using the inherent qualities of the data (e.g., the identity and number of patients, tests, and time epochs) to search for the appropriate analyses to employ.
[Character of refractive errors in population study performed by the Area Military Medical Commission in Lodz].

PubMed

Nowak, Michał S; Goś, Roman; Smigielski, Janusz

2008-01-01

To determine the prevalence of refractive errors in population. A retrospective review of medical examinations for entry to the military service from The Area Military Medical Commission in Lodz. Ophthalmic examinations were performed. We used statistic analysis to review the results. Statistic analysis revealed that refractive errors occurred in 21.68% of the population. The most commen refractive error was myopia. 1) The most commen ocular diseases are refractive errors, especially myopia (21.68% in total). 2) Refractive surgery and contact lenses should be allowed as the possible correction of refractive errors for military service.
Multiple imputation for IPD meta-analysis: allowing for heterogeneity and studies with missing covariates.

PubMed

Quartagno, M; Carpenter, J R

2016-07-30

Recently, multiple imputation has been proposed as a tool for individual patient data meta-analysis with sporadically missing observations, and it has been suggested that within-study imputation is usually preferable. However, such within study imputation cannot handle variables that are completely missing within studies. Further, if some of the contributing studies are relatively small, it may be appropriate to share information across studies when imputing. In this paper, we develop and evaluate a joint modelling approach to multiple imputation of individual patient data in meta-analysis, with an across-study probability distribution for the study specific covariance matrices. This retains the flexibility to allow for between-study heterogeneity when imputing while allowing (i) sharing information on the covariance matrix across studies when this is appropriate, and (ii) imputing variables that are wholly missing from studies. Simulation results show both equivalent performance to the within-study imputation approach where this is valid, and good results in more general, practically relevant, scenarios with studies of very different sizes, non-negligible between-study heterogeneity and wholly missing variables. We illustrate our approach using data from an individual patient data meta-analysis of hypertension trials. © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
pyblocxs: Bayesian Low-Counts X-ray Spectral Analysis in Sherpa

NASA Astrophysics Data System (ADS)

Siemiginowska, A.; Kashyap, V.; Refsdal, B.; van Dyk, D.; Connors, A.; Park, T.

2011-07-01

Typical X-ray spectra have low counts and should be modeled using the Poisson distribution. However, χ2 statistic is often applied as an alternative and the data are assumed to follow the Gaussian distribution. A variety of weights to the statistic or a binning of the data is performed to overcome the low counts issues. However, such modifications introduce biases or/and a loss of information. Standard modeling packages such as XSPEC and Sherpa provide the Poisson likelihood and allow computation of rudimentary MCMC chains, but so far do not allow for setting a full Bayesian model. We have implemented a sophisticated Bayesian MCMC-based algorithm to carry out spectral fitting of low counts sources in the Sherpa environment. The code is a Python extension to Sherpa and allows to fit a predefined Sherpa model to high-energy X-ray spectral data and other generic data. We present the algorithm and discuss several issues related to the implementation, including flexible definition of priors and allowing for variations in the calibration information.
Probabilistic Modeling and Visualization of the Flexibility in Morphable Models

NASA Astrophysics Data System (ADS)

Lüthi, M.; Albrecht, T.; Vetter, T.

Statistical shape models, and in particular morphable models, have gained widespread use in computer vision, computer graphics and medical imaging. Researchers have started to build models of almost any anatomical structure in the human body. While these models provide a useful prior for many image analysis task, relatively little information about the shape represented by the morphable model is exploited. We propose a method for computing and visualizing the remaining flexibility, when a part of the shape is fixed. Our method, which is based on Probabilistic PCA, not only leads to an approach for reconstructing the full shape from partial information, but also allows us to investigate and visualize the uncertainty of a reconstruction. To show the feasibility of our approach we performed experiments on a statistical model of the human face and the femur bone. The visualization of the remaining flexibility allows for greater insight into the statistical properties of the shape.
Protocol Analysis of Man-Computer Languages: Design and Preliminary Findings

DTIC Science & Technology

1975-07-01

describes a statistical moael o propot of the exercise. It is one particular model used for analysis of variance which allows us to test the significance of... Body : Congressman Blake will be visiting Camp Smith to confer with J6, J612, and Col. Smith with regard to operation of the pilot project on
Using Multi-Group Confirmatory Factor Analysis to Evaluate Cross-Cultural Research: Identifying and Understanding Non-Invariance

ERIC Educational Resources Information Center

Brown, Gavin T. L.; Harris, Lois R.; O'Quin, Chrissie; Lane, Kenneth E.

2017-01-01

Multi-group confirmatory factor analysis (MGCFA) allows researchers to determine whether a research inventory elicits similar response patterns across samples. If statistical equivalence in responding is found, then scale score comparisons become possible and samples can be said to be from the same population. This paper illustrates the use of…
Mediation analysis allowing for exposure-mediator interactions and causal interpretation: theoretical assumptions and implementation with SAS and SPSS macros

PubMed Central

Valeri, Linda; VanderWeele, Tyler J.

2012-01-01

Mediation analysis is a useful and widely employed approach to studies in the field of psychology and in the social and biomedical sciences. The contributions of this paper are several-fold. First we seek to bring the developments in mediation analysis for non linear models within the counterfactual framework to the psychology audience in an accessible format and compare the sorts of inferences about mediation that are possible in the presence of exposure-mediator interaction when using a counterfactual versus the standard statistical approach. Second, the work by VanderWeele and Vansteelandt (2009, 2010) is extended here to allow for dichotomous mediators and count outcomes. Third, we provide SAS and SPSS macros to implement all of these mediation analysis techniques automatically and we compare the types of inferences about mediation that are allowed by a variety of software macros. PMID:23379553
Mediation analysis allowing for exposure-mediator interactions and causal interpretation: theoretical assumptions and implementation with SAS and SPSS macros.

PubMed

Valeri, Linda; Vanderweele, Tyler J

2013-06-01

Mediation analysis is a useful and widely employed approach to studies in the field of psychology and in the social and biomedical sciences. The contributions of this article are several-fold. First we seek to bring the developments in mediation analysis for nonlinear models within the counterfactual framework to the psychology audience in an accessible format and compare the sorts of inferences about mediation that are possible in the presence of exposure-mediator interaction when using a counterfactual versus the standard statistical approach. Second, the work by VanderWeele and Vansteelandt (2009, 2010) is extended here to allow for dichotomous mediators and count outcomes. Third, we provide SAS and SPSS macros to implement all of these mediation analysis techniques automatically, and we compare the types of inferences about mediation that are allowed by a variety of software macros. (PsycINFO Database Record (c) 2013 APA, all rights reserved).
Uncertainty Analysis of Seebeck Coefficient and Electrical Resistivity Characterization

NASA Technical Reports Server (NTRS)

Mackey, Jon; Sehirlioglu, Alp; Dynys, Fred

2014-01-01

In order to provide a complete description of a materials thermoelectric power factor, in addition to the measured nominal value, an uncertainty interval is required. The uncertainty may contain sources of measurement error including systematic bias error and precision error of a statistical nature. The work focuses specifically on the popular ZEM-3 (Ulvac Technologies) measurement system, but the methods apply to any measurement system. The analysis accounts for sources of systematic error including sample preparation tolerance, measurement probe placement, thermocouple cold-finger effect, and measurement parameters; in addition to including uncertainty of a statistical nature. Complete uncertainty analysis of a measurement system allows for more reliable comparison of measurement data between laboratories.
Application of modified profile analysis to function testing of the motion/no-motion issue in an aircraft ground-handling simulation. [statistical analysis procedure for man machine systems flight simulation

NASA Technical Reports Server (NTRS)

Parrish, R. V.; Mckissick, B. T.; Steinmetz, G. G.

1979-01-01

A recent modification of the methodology of profile analysis, which allows the testing for differences between two functions as a whole with a single test, rather than point by point with multiple tests is discussed. The modification is applied to the examination of the issue of motion/no motion conditions as shown by the lateral deviation curve as a function of engine cut speed of a piloted 737-100 simulator. The results of this application are presented along with those of more conventional statistical test procedures on the same simulator data.
NIRS-SPM: statistical parametric mapping for near infrared spectroscopy

NASA Astrophysics Data System (ADS)

Tak, Sungho; Jang, Kwang Eun; Jung, Jinwook; Jang, Jaeduck; Jeong, Yong; Ye, Jong Chul

2008-02-01

Even though there exists a powerful statistical parametric mapping (SPM) tool for fMRI, similar public domain tools are not available for near infrared spectroscopy (NIRS). In this paper, we describe a new public domain statistical toolbox called NIRS-SPM for quantitative analysis of NIRS signals. Specifically, NIRS-SPM statistically analyzes the NIRS data using GLM and makes inference as the excursion probability which comes from the random field that are interpolated from the sparse measurement. In order to obtain correct inference, NIRS-SPM offers the pre-coloring and pre-whitening method for temporal correlation estimation. For simultaneous recording NIRS signal with fMRI, the spatial mapping between fMRI image and real coordinate in 3-D digitizer is estimated using Horn's algorithm. These powerful tools allows us the super-resolution localization of the brain activation which is not possible using the conventional NIRS analysis tools.
PIVOT: platform for interactive analysis and visualization of transcriptomics data.

PubMed

Zhu, Qin; Fisher, Stephen A; Dueck, Hannah; Middleton, Sarah; Khaladkar, Mugdha; Kim, Junhyong

2018-01-05

Many R packages have been developed for transcriptome analysis but their use often requires familiarity with R and integrating results of different packages requires scripts to wrangle the datatypes. Furthermore, exploratory data analyses often generate multiple derived datasets such as data subsets or data transformations, which can be difficult to track. Here we present PIVOT, an R-based platform that wraps open source transcriptome analysis packages with a uniform user interface and graphical data management that allows non-programmers to interactively explore transcriptomics data. PIVOT supports more than 40 popular open source packages for transcriptome analysis and provides an extensive set of tools for statistical data manipulations. A graph-based visual interface is used to represent the links between derived datasets, allowing easy tracking of data versions. PIVOT further supports automatic report generation, publication-quality plots, and program/data state saving, such that all analysis can be saved, shared and reproduced. PIVOT will allow researchers with broad background to easily access sophisticated transcriptome analysis tools and interactively explore transcriptome datasets.
metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis.

PubMed

Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti

2016-07-01

A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness.Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Code is available at https://github.com/aalto-ics-kepaco anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis

PubMed Central

Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J.; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T.; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti

2016-01-01

Motivation: A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. Results: We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Availability and implementation: Code is available at https://github.com/aalto-ics-kepaco Contacts: anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153689
Detecting subtle hydrochemical anomalies with multivariate statistics: an example from homogeneous groundwaters in the Great Artesian Basin, Australia

NASA Astrophysics Data System (ADS)

O'Shea, Bethany; Jankowski, Jerzy

2006-12-01

The major ion composition of Great Artesian Basin groundwater in the lower Namoi River valley is relatively homogeneous in chemical composition. Traditional graphical techniques have been combined with multivariate statistical methods to determine whether subtle differences in the chemical composition of these waters can be delineated. Hierarchical cluster analysis and principal components analysis were successful in delineating minor variations within the groundwaters of the study area that were not visually identified in the graphical techniques applied. Hydrochemical interpretation allowed geochemical processes to be identified in each statistically defined water type and illustrated how these groundwaters differ from one another. Three main geochemical processes were identified in the groundwaters: ion exchange, precipitation, and mixing between waters from different sources. Both statistical methods delineated an anomalous sample suspected of being influenced by magmatic CO2 input. The use of statistical methods to complement traditional graphical techniques for waters appearing homogeneous is emphasized for all investigations of this type. Copyright
Comparative analysis of atmosphere temperature variability for Northern Eurasia based on the Reanalysis and in-situ observed data

NASA Astrophysics Data System (ADS)

Shulgina, T.; Genina, E.; Gordov, E.; Nikitchuk, K.

2009-04-01

At present numerous data archives which include meteorological observations as well as climate processes modeling data are available for Earth Science specialists. Methods of mathematical statistics are widely used for their processing and analysis. In many cases they represent the only way of quantitative assessment of the meteorological and climatic information. Unified set of analysis methods allows us to compare climatic characteristics calculated on the basis of different datasets with the purpose of performing more detailed analysis of climate dynamics for both regional and global levels. The report presents the results of comparative analysis of atmosphere temperature behavior for the Northern Eurasia territory for the period from 1979 to 2004 based on the NCEP/NCAR Reanalysis, NCEP/DOE Reanalysis AMIP II, JMA/CRIEPI JRA-25 Reanalysis, ECMWF ERA-40 Reanalysis data and observation data obtained from meteorological stations of the former Soviet Union. Statistical processing of atmosphere temperature data included analysis of time series homogeneity of climate indices approved by WMO, such as "Number of frost days", "Number of summer days", "Number of icing days", "Number of tropical nights", etc. by means of parametric methods of mathematical statistics (Fisher and Student tests). That allowed conducting comprehensive research of spatio-temporal features of the atmosphere temperature. Analysis of the atmosphere temperature dynamics revealed inhomogeneity of the data obtained for large observation intervals. Particularly, analysis performed for the period 1979 - 2004 showed the significant increase of the number of frost and icing days approximately by 1 day for every 2 years and decrease roughly by 1 day for 2 years for the number of summer days. Also it should be mentioned that the growth period mean temperature have increased by 1.5 - 2° C for the time period being considered. The usage of different Reanalysis datasets in conjunction with in-situ observed data allowed comparing of climate indices values calculated on the basis of different datasets that improves the reliability of the results obtained. Partial support of SB RAS Basic Research Program 4.5.2 (Project 2) is acknowledged.
Rare variant association analysis in case-parents studies by allowing for missing parental genotypes.

PubMed

Li, Yumei; Xiang, Yang; Xu, Chao; Shen, Hui; Deng, Hongwen

2018-01-15

The development of next-generation sequencing technologies has facilitated the identification of rare variants. Family-based design is commonly used to effectively control for population admixture and substructure, which is more prominent for rare variants. Case-parents studies, as typical strategies in family-based design, are widely used in rare variant-disease association analysis. Current methods in case-parents studies are based on complete case-parents data; however, parental genotypes may be missing in case-parents trios, and removing these data may lead to a loss in statistical power. The present study focuses on testing for rare variant-disease association in case-parents study by allowing for missing parental genotypes. In this report, we extended the collapsing method for rare variant association analysis in case-parents studies to allow for missing parental genotypes, and investigated the performance of two methods by using the difference of genotypes between affected offspring and their corresponding "complements" in case-parent trios and TDT framework. Using simulations, we showed that, compared with the methods just only using complete case-parents data, the proposed strategy allowing for missing parental genotypes, or even adding unrelated affected individuals, can greatly improve the statistical power and meanwhile is not affected by population stratification. We conclude that adding case-parents data with missing parental genotypes to complete case-parents data set can greatly improve the power of our strategy for rare variant-disease association.
Estimating individual benefits of medical or behavioral treatments in severely ill patients.

PubMed

Diaz, Francisco J

2017-01-01

There is a need for statistical methods appropriate for the analysis of clinical trials from a personalized-medicine viewpoint as opposed to the common statistical practice that simply examines average treatment effects. This article proposes an approach to quantifying, reporting and analyzing individual benefits of medical or behavioral treatments to severely ill patients with chronic conditions, using data from clinical trials. The approach is a new development of a published framework for measuring the severity of a chronic disease and the benefits treatments provide to individuals, which utilizes regression models with random coefficients. Here, a patient is considered to be severely ill if the patient's basal severity is close to one. This allows the derivation of a very flexible family of probability distributions of individual benefits that depend on treatment duration and the covariates included in the regression model. Our approach may enrich the statistical analysis of clinical trials of severely ill patients because it allows investigating the probability distribution of individual benefits in the patient population and the variables that influence it, and we can also measure the benefits achieved in specific patients including new patients. We illustrate our approach using data from a clinical trial of the anti-depressant imipramine.

Models of dyadic social interaction.

PubMed Central

Griffin, Dale; Gonzalez, Richard

2003-01-01

We discuss the logic of research designs for dyadic interaction and present statistical models with parameters that are tied to psychologically relevant constructs. Building on Karl Pearson's classic nineteenth-century statistical analysis of within-organism similarity, we describe several approaches to indexing dyadic interdependence and provide graphical methods for visualizing dyadic data. We also describe several statistical and conceptual solutions to the 'levels of analytic' problem in analysing dyadic data. These analytic strategies allow the researcher to examine and measure psychological questions of interdependence and social influence. We provide illustrative data from casually interacting and romantic dyads. PMID:12689382
Consequences of common data analysis inaccuracies in CNS trauma injury basic research.

PubMed

Burke, Darlene A; Whittemore, Scott R; Magnuson, David S K

2013-05-15

The development of successful treatments for humans after traumatic brain or spinal cord injuries (TBI and SCI, respectively) requires animal research. This effort can be hampered when promising experimental results cannot be replicated because of incorrect data analysis procedures. To identify and hopefully avoid these errors in future studies, the articles in seven journals with the highest number of basic science central nervous system TBI and SCI animal research studies published in 2010 (N=125 articles) were reviewed for their data analysis procedures. After identifying the most common statistical errors, the implications of those findings were demonstrated by reanalyzing previously published data from our laboratories using the identified inappropriate statistical procedures, then comparing the two sets of results. Overall, 70% of the articles contained at least one type of inappropriate statistical procedure. The highest percentage involved incorrect post hoc t-tests (56.4%), followed by inappropriate parametric statistics (analysis of variance and t-test; 37.6%). Repeated Measures analysis was inappropriately missing in 52.0% of all articles and, among those with behavioral assessments, 58% were analyzed incorrectly. Reanalysis of our published data using the most common inappropriate statistical procedures resulted in a 14.1% average increase in significant effects compared to the original results. Specifically, an increase of 15.5% occurred with Independent t-tests and 11.1% after incorrect post hoc t-tests. Utilizing proper statistical procedures can allow more-definitive conclusions, facilitate replicability of research results, and enable more accurate translation of those results to the clinic.
Analysis of BSRT Profiles in the LHC at Injection

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fitterer, M.; Stancari, G.; Papadopoulou, S.

The beam synchrotron radiation telescope (BSRT) at the LHC allows to take profiles of the transverse beam distribution, which can provide useful additional insight in the evolution of the transverse beam distribution. A python class has been developed [1], which allows to read in the BSRT profiles, usually stored in binary format, run different analysis tools and generate plots of the statistical parameters and profiles as well as videos of the the profiles. The detailed analysis will be described in this note. The analysis is based on the data obtained at injection energy (450 GeV) during MD1217 [2] and MD1415more » [3] which will be also used as illustrative example. A similar approach is also taken with a MATLAB based analysis described in [4].« less
Multilevel Latent Class Analysis for Large-Scale Educational Assessment Data: Exploring the Relation between the Curriculum and Students' Mathematical Strategies

ERIC Educational Resources Information Center

Fagginger Auer, Marije F.; Hickendorff, Marian; Van Putten, Cornelis M.; Béguin, Anton A.; Heiser, Willem J.

2016-01-01

A first application of multilevel latent class analysis (MLCA) to educational large-scale assessment data is demonstrated. This statistical technique addresses several of the challenges that assessment data offers. Importantly, MLCA allows modeling of the often ignored teacher effects and of the joint influence of teacher and student variables.…
A Tutorial in Bayesian Potential Outcomes Mediation Analysis.

PubMed

Miočević, Milica; Gonzalez, Oscar; Valente, Matthew J; MacKinnon, David P

2018-01-01

Statistical mediation analysis is used to investigate intermediate variables in the relation between independent and dependent variables. Causal interpretation of mediation analyses is challenging because randomization of subjects to levels of the independent variable does not rule out the possibility of unmeasured confounders of the mediator to outcome relation. Furthermore, commonly used frequentist methods for mediation analysis compute the probability of the data given the null hypothesis, which is not the probability of a hypothesis given the data as in Bayesian analysis. Under certain assumptions, applying the potential outcomes framework to mediation analysis allows for the computation of causal effects, and statistical mediation in the Bayesian framework gives indirect effects probabilistic interpretations. This tutorial combines causal inference and Bayesian methods for mediation analysis so the indirect and direct effects have both causal and probabilistic interpretations. Steps in Bayesian causal mediation analysis are shown in the application to an empirical example.
Game-Related Statistics that Discriminated Winning, Drawing and Losing Teams from the Spanish Soccer League

PubMed Central

Lago-Peñas, Carlos; Lago-Ballesteros, Joaquín; Dellal, Alexandre; Gómez, Maite

2010-01-01

The aim of the present study was to analyze men’s football competitions, trying to identify which game-related statistics allow to discriminate winning, drawing and losing teams. The sample used corresponded to 380 games from the 2008-2009 season of the Spanish Men’s Professional League. The game-related statistics gathered were: total shots, shots on goal, effectiveness, assists, crosses, offsides commited and received, corners, ball possession, crosses against, fouls committed and received, corners against, yellow and red cards, and venue. An univariate (t-test) and multivariate (discriminant) analysis of data was done. The results showed that winning teams had averages that were significantly higher for the following game statistics: total shots (p < 0.001), shots on goal (p < 0.01), effectiveness (p < 0.01), assists (p < 0.01), offsides committed (p < 0.01) and crosses against (p < 0.01). Losing teams had significantly higher averages in the variable crosses (p < 0.01), offsides received (p < 0. 01) and red cards (p < 0.01). Discriminant analysis allowed to conclude the following: the variables that discriminate between winning, drawing and losing teams were the total shots, shots on goal, crosses, crosses against, ball possession and venue. Coaches and players should be aware for these different profiles in order to increase knowledge about game cognitive and motor solicitation and, therefore, to evaluate specificity at the time of practice and game planning. Key points This paper increases the knowledge about soccer match analysis. Give normative values to establish practice and match objectives. Give applications ideas to connect research with coaches’ practice. PMID:24149698
Creation of a virtual cutaneous tissue bank

NASA Astrophysics Data System (ADS)

LaFramboise, William A.; Shah, Sujal; Hoy, R. W.; Letbetter, D.; Petrosko, P.; Vennare, R.; Johnson, Peter C.

2000-04-01

Cellular and non-cellular constituents of skin contain fundamental morphometric features and structural patterns that correlate with tissue function. High resolution digital image acquisitions performed using an automated system and proprietary software to assemble adjacent images and create a contiguous, lossless, digital representation of individual microscope slide specimens. Serial extraction, evaluation and statistical analysis of cutaneous feature is performed utilizing an automated analysis system, to derive normal cutaneous parameters comprising essential structural skin components. Automated digital cutaneous analysis allows for fast extraction of microanatomic dat with accuracy approximating manual measurement. The process provides rapid assessment of feature both within individual specimens and across sample populations. The images, component data, and statistical analysis comprise a bioinformatics database to serve as an architectural blueprint for skin tissue engineering and as a diagnostic standard of comparison for pathologic specimens.
Differentiation of chocolates according to the cocoa's geographical origin using chemometrics.

PubMed

Cambrai, Amandine; Marcic, Christophe; Morville, Stéphane; Sae Houer, Pierre; Bindler, Françoise; Marchioni, Eric

2010-02-10

The determination of the geographical origin of cocoa used to produce chocolate has been assessed through the analysis of the volatile compounds of chocolate samples. The analysis of the volatile content and their statistical processing by multivariate analyses tended to form independent groups for both Africa and Madagascar, even if some of the chocolate samples analyzed appeared in a mixed zone together with those from America. This analysis also allowed a clear separation between Caribbean chocolates and those from other origins. Height compounds (such as linalool or (E,E)-2,4-decadienal) characteristic of chocolate's different geographical origins were also identified. The method described in this work (hydrodistillation, GC analysis, and statistic treatment) may improve the control of the geographical origin of chocolate during its long production process.
Phenotypic mapping of metabolic profiles using self-organizing maps of high-dimensional mass spectrometry data.

PubMed

Goodwin, Cody R; Sherrod, Stacy D; Marasco, Christina C; Bachmann, Brian O; Schramm-Sapyta, Nicole; Wikswo, John P; McLean, John A

2014-07-01

A metabolic system is composed of inherently interconnected metabolic precursors, intermediates, and products. The analysis of untargeted metabolomics data has conventionally been performed through the use of comparative statistics or multivariate statistical analysis-based approaches; however, each falls short in representing the related nature of metabolic perturbations. Herein, we describe a complementary method for the analysis of large metabolite inventories using a data-driven approach based upon a self-organizing map algorithm. This workflow allows for the unsupervised clustering, and subsequent prioritization of, correlated features through Gestalt comparisons of metabolic heat maps. We describe this methodology in detail, including a comparison to conventional metabolomics approaches, and demonstrate the application of this method to the analysis of the metabolic repercussions of prolonged cocaine exposure in rat sera profiles.
mapDIA: Preprocessing and statistical analysis of quantitative proteomics data from data independent acquisition mass spectrometry.

PubMed

Teo, Guoshou; Kim, Sinae; Tsou, Chih-Chiang; Collins, Ben; Gingras, Anne-Claude; Nesvizhskii, Alexey I; Choi, Hyungwon

2015-11-03

Data independent acquisition (DIA) mass spectrometry is an emerging technique that offers more complete detection and quantification of peptides and proteins across multiple samples. DIA allows fragment-level quantification, which can be considered as repeated measurements of the abundance of the corresponding peptides and proteins in the downstream statistical analysis. However, few statistical approaches are available for aggregating these complex fragment-level data into peptide- or protein-level statistical summaries. In this work, we describe a software package, mapDIA, for statistical analysis of differential protein expression using DIA fragment-level intensities. The workflow consists of three major steps: intensity normalization, peptide/fragment selection, and statistical analysis. First, mapDIA offers normalization of fragment-level intensities by total intensity sums as well as a novel alternative normalization by local intensity sums in retention time space. Second, mapDIA removes outlier observations and selects peptides/fragments that preserve the major quantitative patterns across all samples for each protein. Last, using the selected fragments and peptides, mapDIA performs model-based statistical significance analysis of protein-level differential expression between specified groups of samples. Using a comprehensive set of simulation datasets, we show that mapDIA detects differentially expressed proteins with accurate control of the false discovery rates. We also describe the analysis procedure in detail using two recently published DIA datasets generated for 14-3-3β dynamic interaction network and prostate cancer glycoproteome. The software was written in C++ language and the source code is available for free through SourceForge website http://sourceforge.net/projects/mapdia/.This article is part of a Special Issue entitled: Computational Proteomics. Copyright © 2015 Elsevier B.V. All rights reserved.
Big-data reflection high energy electron diffraction analysis for understanding epitaxial film growth processes.

PubMed

Vasudevan, Rama K; Tselev, Alexander; Baddorf, Arthur P; Kalinin, Sergei V

2014-10-28

Reflection high energy electron diffraction (RHEED) has by now become a standard tool for in situ monitoring of film growth by pulsed laser deposition and molecular beam epitaxy. Yet despite the widespread adoption and wealth of information in RHEED images, most applications are limited to observing intensity oscillations of the specular spot, and much additional information on growth is discarded. With ease of data acquisition and increased computation speeds, statistical methods to rapidly mine the data set are now feasible. Here, we develop such an approach to the analysis of the fundamental growth processes through multivariate statistical analysis of a RHEED image sequence. This approach is illustrated for growth of La(x)Ca(1-x)MnO(3) films grown on etched (001) SrTiO(3) substrates, but is universal. The multivariate methods including principal component analysis and k-means clustering provide insight into the relevant behaviors, the timing and nature of a disordered to ordered growth change, and highlight statistically significant patterns. Fourier analysis yields the harmonic components of the signal and allows separation of the relevant components and baselines, isolating the asymmetric nature of the step density function and the transmission spots from the imperfect layer-by-layer (LBL) growth. These studies show the promise of big data approaches to obtaining more insight into film properties during and after epitaxial film growth. Furthermore, these studies open the pathway to use forward prediction methods to potentially allow significantly more control over growth process and hence final film quality.
Evaluating Cellular Polyfunctionality with a Novel Polyfunctionality Index

PubMed Central

Larsen, Martin; Sauce, Delphine; Arnaud, Laurent; Fastenackels, Solène; Appay, Victor; Gorochov, Guy

2012-01-01

Functional evaluation of naturally occurring or vaccination-induced T cell responses in mice, men and monkeys has in recent years advanced from single-parameter (e.g. IFN-γ-secretion) to much more complex multidimensional measurements. Co-secretion of multiple functional molecules (such as cytokines and chemokines) at the single-cell level is now measurable due primarily to major advances in multiparametric flow cytometry. The very extensive and complex datasets generated by this technology raise the demand for proper analytical tools that enable the analysis of combinatorial functional properties of T cells, hence polyfunctionality. Presently, multidimensional functional measures are analysed either by evaluating all combinations of parameters individually or by summing frequencies of combinations that include the same number of simultaneous functions. Often these evaluations are visualized as pie charts. Whereas pie charts effectively represent and compare average polyfunctionality profiles of particular T cell subsets or patient groups, they do not document the degree or variation of polyfunctionality within a group nor does it allow more sophisticated statistical analysis. Here we propose a novel polyfunctionality index that numerically evaluates the degree and variation of polyfuntionality, and enable comparative and correlative parametric and non-parametric statistical tests. Moreover, it allows the usage of more advanced statistical approaches, such as cluster analysis. We believe that the polyfunctionality index will render polyfunctionality an appropriate end-point measure in future studies of T cell responsiveness. PMID:22860124
Statistical Learning Analysis in Neuroscience: Aiming for Transparency

PubMed Central

Hanke, Michael; Halchenko, Yaroslav O.; Haxby, James V.; Pollmann, Stefan

2009-01-01

Encouraged by a rise of reciprocal interest between the machine learning and neuroscience communities, several recent studies have demonstrated the explanatory power of statistical learning techniques for the analysis of neural data. In order to facilitate a wider adoption of these methods, neuroscientific research needs to ensure a maximum of transparency to allow for comprehensive evaluation of the employed procedures. We argue that such transparency requires “neuroscience-aware” technology for the performance of multivariate pattern analyses of neural data that can be documented in a comprehensive, yet comprehensible way. Recently, we introduced PyMVPA, a specialized Python framework for machine learning based data analysis that addresses this demand. Here, we review its features and applicability to various neural data modalities. PMID:20582270
Analysis of one gravitational slope cycle

NASA Astrophysics Data System (ADS)

Palis, Edouard; Lebourg, Thomas; Vidal, Maurin; Tric, Emmanuel

2015-04-01

Since about twenty years of studies on landslides, we realized the role and subtle interactions that existed between the structural complexity, masses dynamics and complex internal circulation of fluids. The La Clapière DSL (Deep-Seated Landslide) is now very well known by the scientific community (volume, impact, challenges, observations...), but this mass of knowledge, has not yet been compiled nor looked through a coupled analysis of its spatial and temporal variability. Since 2007, a will to share and access to uniform data was set up by the Versant Instabilities Multidisciplinary Observatory (OMIV, National Service of French Observation (SNO)). This observatory (with associated laboratories) allowed the installation of permanent and autonomous measuring stations: GPS, meteorology, seismology, water chemistry sources. For two years now, a permanent electrical tomography device is installed at the bottom of the slope to complement the current monitoring system, and allowed a deeper understanding of the physical changes in the massif. The analysis of these data allows to observe different dynamic regimes, as well as different responses to external factors: instantaneous, delayed, long-term variability. The purpose of this synthesis study is to analyze the temporal and spatial evolution of the electrical resistivity, displacement and hydrometeors for one year cycle (November 2012 to November 2013). Thus, a qualitative and statistical approach by clusters, principal component analysis (PCA), and temporal pseudo-3D of these variables was established. This new statistical study also explains the major role of the fault and the base of the landslide, as well as the chronology of the water flow in the massif, allowing a better understanding of the complex and uneven in time dynamic in this area.
Computer program documentation for the pasture/range condition assessment processor

NASA Technical Reports Server (NTRS)

Mcintyre, K. S.; Miller, T. G. (Principal Investigator)

1982-01-01

The processor which drives for the RANGE software allows the user to analyze LANDSAT data containing pasture and rangeland. Analysis includes mapping, generating statistics, calculating vegetative indexes, and plotting vegetative indexes. Routines for using the processor are given. A flow diagram is included.
Automated Tracking of Cell Migration with Rapid Data Analysis.

PubMed

DuChez, Brian J

2017-09-01

Cell migration is essential for many biological processes including development, wound healing, and metastasis. However, studying cell migration often requires the time-consuming and labor-intensive task of manually tracking cells. To accelerate the task of obtaining coordinate positions of migrating cells, we have developed a graphical user interface (GUI) capable of automating the tracking of fluorescently labeled nuclei. This GUI provides an intuitive user interface that makes automated tracking accessible to researchers with no image-processing experience or familiarity with particle-tracking approaches. Using this GUI, users can interactively determine a minimum of four parameters to identify fluorescently labeled cells and automate acquisition of cell trajectories. Additional features allow for batch processing of numerous time-lapse images, curation of unwanted tracks, and subsequent statistical analysis of tracked cells. Statistical outputs allow users to evaluate migratory phenotypes, including cell speed, distance, displacement, and persistence, as well as measures of directional movement, such as forward migration index (FMI) and angular displacement. © 2017 by John Wiley & Sons, Inc. Copyright © 2017 John Wiley & Sons, Inc.
Rainfall Threshold Assessment Corresponding to the Maximum Allowable Turbidity for Source Water.

PubMed

Fan, Shu-Kai S; Kuan, Wen-Hui; Fan, Chihhao; Chen, Chiu-Yang

2016-12-01

This study aims to assess the upstream rainfall thresholds corresponding to the maximum allowable turbidity of source water, using monitoring data and artificial neural network computation. The Taipei Water Source Domain was selected as the study area, and the upstream rainfall records were collected for statistical analysis. Using analysis of variance (ANOVA), the cumulative rainfall records of one-day Ping-lin, two-day Ping-lin, two-day Tong-hou, one-day Guie-shan, and one-day Tai-ping (rainfall in the previous 24 or 48 hours at the named weather stations) were found to be the five most significant parameters for downstream turbidity development. An artificial neural network model was constructed to predict the downstream turbidity in the area investigated. The observed and model-calculated turbidity data were applied to assess the rainfall thresholds in the studied area. By setting preselected turbidity criteria, the upstream rainfall thresholds for these statistically determined rain gauge stations were calculated.
Time Series Expression Analyses Using RNA-seq: A Statistical Approach

PubMed Central

Oh, Sunghee; Song, Seongho; Grabowski, Gregory; Zhao, Hongyu; Noonan, James P.

2013-01-01

RNA-seq is becoming the de facto standard approach for transcriptome analysis with ever-reducing cost. It has considerable advantages over conventional technologies (microarrays) because it allows for direct identification and quantification of transcripts. Many time series RNA-seq datasets have been collected to study the dynamic regulations of transcripts. However, statistically rigorous and computationally efficient methods are needed to explore the time-dependent changes of gene expression in biological systems. These methods should explicitly account for the dependencies of expression patterns across time points. Here, we discuss several methods that can be applied to model timecourse RNA-seq data, including statistical evolutionary trajectory index (SETI), autoregressive time-lagged regression (AR(1)), and hidden Markov model (HMM) approaches. We use three real datasets and simulation studies to demonstrate the utility of these dynamic methods in temporal analysis. PMID:23586021
Mediation analysis in nursing research: a methodological review.

PubMed

Liu, Jianghong; Ulrich, Connie

2016-12-01

Mediation statistical models help clarify the relationship between independent predictor variables and dependent outcomes of interest by assessing the impact of third variables. This type of statistical analysis is applicable for many clinical nursing research questions, yet its use within nursing remains low. Indeed, mediational analyses may help nurse researchers develop more effective and accurate prevention and treatment programs as well as help bridge the gap between scientific knowledge and clinical practice. In addition, this statistical approach allows nurse researchers to ask - and answer - more meaningful and nuanced questions that extend beyond merely determining whether an outcome occurs. Therefore, the goal of this paper is to provide a brief tutorial on the use of mediational analyses in clinical nursing research by briefly introducing the technique and, through selected empirical examples from the nursing literature, demonstrating its applicability in advancing nursing science.
Numerical 3D flow simulation of attached cavitation structures at ultrasonic horn tips and statistical evaluation of flow aggressiveness via load collectives

NASA Astrophysics Data System (ADS)

Mottyll, S.; Skoda, R.

2015-12-01

A compressible inviscid flow solver with barotropic cavitation model is applied to two different ultrasonic horn set-ups and compared to hydrophone, shadowgraphy as well as erosion test data. The statistical analysis of single collapse events in wall-adjacent flow regions allows the determination of the flow aggressiveness via load collectives (cumulative event rate vs collapse pressure), which show an exponential decrease in agreement to studies on hydrodynamic cavitation [1]. A post-processing projection of event rate and collapse pressure on a reference grid reduces the grid dependency significantly. In order to evaluate the erosion-sensitive areas a statistical analysis of transient wall loads is utilised. Predicted erosion sensitive areas as well as temporal pressure and vapour volume evolution are in good agreement to the experimental data.

Time series expression analyses using RNA-seq: a statistical approach.

PubMed

Oh, Sunghee; Song, Seongho; Grabowski, Gregory; Zhao, Hongyu; Noonan, James P

2013-01-01

RNA-seq is becoming the de facto standard approach for transcriptome analysis with ever-reducing cost. It has considerable advantages over conventional technologies (microarrays) because it allows for direct identification and quantification of transcripts. Many time series RNA-seq datasets have been collected to study the dynamic regulations of transcripts. However, statistically rigorous and computationally efficient methods are needed to explore the time-dependent changes of gene expression in biological systems. These methods should explicitly account for the dependencies of expression patterns across time points. Here, we discuss several methods that can be applied to model timecourse RNA-seq data, including statistical evolutionary trajectory index (SETI), autoregressive time-lagged regression (AR(1)), and hidden Markov model (HMM) approaches. We use three real datasets and simulation studies to demonstrate the utility of these dynamic methods in temporal analysis.
Econometrics of joint production: another approach. [Petroleum refining and petrochemicals production

DOE Office of Scientific and Technical Information (OSTI.GOV)

Griffin, J.M.

1977-11-01

The pseudo data approach to the joint production of petroleum refining and chemicals is described as an alternative that avoids the multicollinearity of time series data and allows a complex technology to be characterized in a statistical price possibility frontier. Intended primarily for long-range analysis, the pseudo data method can be used as a source of elasticity estimate for policy analysis. 19 references.
Evaluation of adding item-response theory analysis for evaluation of the European Board of Ophthalmology Diploma examination.

PubMed

Mathysen, Danny G P; Aclimandos, Wagih; Roelant, Ella; Wouters, Kristien; Creuzot-Garcher, Catherine; Ringens, Peter J; Hawlina, Marko; Tassignon, Marie-José

2013-11-01

To investigate whether introduction of item-response theory (IRT) analysis, in parallel to the 'traditional' statistical analysis methods available for performance evaluation of multiple T/F items as used in the European Board of Ophthalmology Diploma (EBOD) examination, has proved beneficial, and secondly, to study whether the overall assessment performance of the current written part of EBOD is sufficiently high (KR-20≥ 0.90) to be kept as examination format in future EBOD editions. 'Traditional' analysis methods for individual MCQ item performance comprise P-statistics, Rit-statistics and item discrimination, while overall reliability is evaluated through KR-20 for multiple T/F items. The additional set of statistical analysis methods for the evaluation of EBOD comprises mainly IRT analysis. These analysis techniques are used to monitor whether the introduction of negative marking for incorrect answers (since EBOD 2010) has a positive influence on the statistical performance of EBOD as a whole and its individual test items in particular. Item-response theory analysis demonstrated that item performance parameters should not be evaluated individually, but should be related to one another. Before the introduction of negative marking, the overall EBOD reliability (KR-20) was good though with room for improvement (EBOD 2008: 0.81; EBOD 2009: 0.78). After the introduction of negative marking, the overall reliability of EBOD improved significantly (EBOD 2010: 0.92; EBOD 2011:0.91; EBOD 2012: 0.91). Although many statistical performance parameters are available to evaluate individual items, our study demonstrates that the overall reliability assessment remains the only crucial parameter to be evaluated allowing comparison. While individual item performance analysis is worthwhile to undertake as secondary analysis, drawing final conclusions seems to be more difficult. Performance parameters need to be related, as shown by IRT analysis. Therefore, IRT analysis has proved beneficial for the statistical analysis of EBOD. Introduction of negative marking has led to a significant increase in the reliability (KR-20 > 0.90), indicating that the current examination format can be kept for future EBOD examinations. © 2013 Acta Ophthalmologica Scandinavica Foundation. Published by John Wiley & Sons Ltd.
Disentangling methodological and biological sources of gene tree discordance on oryza (poaceae) chromosome 3

USDA-ARS?s Scientific Manuscript database

We describe new methods for characterizing gene tree discordance in phylogenomic datasets, which screen for deviations from neutral expectations, summarize variation in statistical support among gene trees, and allow comparison of the patterns of discordance induced by various analysis choices. Usin...
Modular reweighting software for statistical mechanical analysis of biased equilibrium data

NASA Astrophysics Data System (ADS)

Sindhikara, Daniel J.

2012-07-01

Here a simple, useful, modular approach and software suite designed for statistical reweighting and analysis of equilibrium ensembles is presented. Statistical reweighting is useful and sometimes necessary for analysis of equilibrium enhanced sampling methods, such as umbrella sampling or replica exchange, and also in experimental cases where biasing factors are explicitly known. Essentially, statistical reweighting allows extrapolation of data from one or more equilibrium ensembles to another. Here, the fundamental separable steps of statistical reweighting are broken up into modules - allowing for application to the general case and avoiding the black-box nature of some “all-inclusive” reweighting programs. Additionally, the programs included are, by-design, written with little dependencies. The compilers required are either pre-installed on most systems, or freely available for download with minimal trouble. Examples of the use of this suite applied to umbrella sampling and replica exchange molecular dynamics simulations will be shown along with advice on how to apply it in the general case. New version program summaryProgram title: Modular reweighting version 2 Catalogue identifier: AEJH_v2_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEJH_v2_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU General Public License, version 3 No. of lines in distributed program, including test data, etc.: 179 118 No. of bytes in distributed program, including test data, etc.: 8 518 178 Distribution format: tar.gz Programming language: C++, Python 2.6+, Perl 5+ Computer: Any Operating system: Any RAM: 50-500 MB Supplementary material: An updated version of the original manuscript (Comput. Phys. Commun. 182 (2011) 2227) is available Classification: 4.13 Catalogue identifier of previous version: AEJH_v1_0 Journal reference of previous version: Comput. Phys. Commun. 182 (2011) 2227 Does the new version supersede the previous version?: Yes Nature of problem: While equilibrium reweighting is ubiquitous, there are no public programs available to perform the reweighting in the general case. Further, specific programs often suffer from many library dependencies and numerical instability. Solution method: This package is written in a modular format that allows for easy applicability of reweighting in the general case. Modules are small, numerically stable, and require minimal libraries. Reasons for new version: Some minor bugs, some upgrades needed, error analysis added. analyzeweight.py/analyzeweight.py2 has been replaced by “multihist.py”. This new program performs all the functions of its predecessor while being versatile enough to handle other types of histograms and probability analysis. “bootstrap.py” was added. This script performs basic bootstrap resampling allowing for error analysis of data. “avg_dev_distribution.py” was added. This program computes the averages and standard deviations of multiple distributions, making error analysis (e.g. from bootstrap resampling) easier to visualize. WRE.cpp was slightly modified purely for cosmetic reasons. The manual was updated for clarity and to reflect version updates. Examples were removed from the manual in favor of online tutorials (packaged examples remain). Examples were updated to reflect the new format. An additional example is included to demonstrate error analysis. Running time: Preprocessing scripts 1-5 minutes, WHAM engine <1 minute, postprocess script ∼1-5 minutes.
Statistical methods for quantitative mass spectrometry proteomic experiments with labeling.

PubMed

Oberg, Ann L; Mahoney, Douglas W

2012-01-01

Mass Spectrometry utilizing labeling allows multiple specimens to be subjected to mass spectrometry simultaneously. As a result, between-experiment variability is reduced. Here we describe use of fundamental concepts of statistical experimental design in the labeling framework in order to minimize variability and avoid biases. We demonstrate how to export data in the format that is most efficient for statistical analysis. We demonstrate how to assess the need for normalization, perform normalization, and check whether it worked. We describe how to build a model explaining the observed values and test for differential protein abundance along with descriptive statistics and measures of reliability of the findings. Concepts are illustrated through the use of three case studies utilizing the iTRAQ 4-plex labeling protocol.
On computations of variance, covariance and correlation for interval data

NASA Astrophysics Data System (ADS)

Kishida, Masako

2017-02-01

In many practical situations, the data on which statistical analysis is to be performed is only known with interval uncertainty. Different combinations of values from the interval data usually lead to different values of variance, covariance, and correlation. Hence, it is desirable to compute the endpoints of possible values of these statistics. This problem is, however, NP-hard in general. This paper shows that the problem of computing the endpoints of possible values of these statistics can be rewritten as the problem of computing skewed structured singular values ν, for which there exist feasible (polynomial-time) algorithms that compute reasonably tight bounds in most practical cases. This allows one to find tight intervals of the aforementioned statistics for interval data.
Interpretation of statistical results.

PubMed

García Garmendia, J L; Maroto Monserrat, F

2018-02-21

The appropriate interpretation of the statistical results is crucial to understand the advances in medical science. The statistical tools allow us to transform the uncertainty and apparent chaos in nature to measurable parameters which are applicable to our clinical practice. The importance of understanding the meaning and actual extent of these instruments is essential for researchers, the funders of research and for professionals who require a permanent update based on good evidence and supports to decision making. Various aspects of the designs, results and statistical analysis are reviewed, trying to facilitate his comprehension from the basics to what is most common but no better understood, and bringing a constructive, non-exhaustive but realistic look. Copyright © 2018 Elsevier España, S.L.U. y SEMICYUC. All rights reserved.
Covariance approximation for fast and accurate computation of channelized Hotelling observer statistics

NASA Astrophysics Data System (ADS)

Bonetto, P.; Qi, Jinyi; Leahy, R. M.

2000-08-01

Describes a method for computing linear observer statistics for maximum a posteriori (MAP) reconstructions of PET images. The method is based on a theoretical approximation for the mean and covariance of MAP reconstructions. In particular, the authors derive here a closed form for the channelized Hotelling observer (CHO) statistic applied to 2D MAP images. The theoretical analysis models both the Poission statistics of PET data and the inhomogeneity of tracer uptake. The authors show reasonably good correspondence between these theoretical results and Monte Carlo studies. The accuracy and low computational cost of the approximation allow the authors to analyze the observer performance over a wide range of operating conditions and parameter settings for the MAP reconstruction algorithm.
Statistical crystallography of surface micelle spacing

NASA Technical Reports Server (NTRS)

Noever, David A.

1992-01-01

The aggregation of the recently reported surface micelles of block polyelectrolytes is analyzed using techniques of statistical crystallography. A polygonal lattice (Voronoi mosaic) connects center-to-center points, yielding statistical agreement with crystallographic predictions; Aboav-Weaire's law and Lewis's law are verified. This protocol supplements the standard analysis of surface micelles leading to aggregation number determination and, when compared to numerical simulations, allows further insight into the random partitioning of surface films. In particular, agreement with Lewis's law has been linked to the geometric packing requirements of filling two-dimensional space which compete with (or balance) physical forces such as interfacial tension, electrostatic repulsion, and van der Waals attraction.
QGene 4.0, an extensible Java QTL-analysis platform.

PubMed

Joehanes, Roby; Nelson, James C

2008-12-01

Of many statistical methods developed to date for quantitative trait locus (QTL) analysis, only a limited subset are available in public software allowing their exploration, comparison and practical application by researchers. We have developed QGene 4.0, a plug-in platform that allows execution and comparison of a variety of modern QTL-mapping methods and supports third-party addition of new ones. The software accommodates line-cross mating designs consisting of any arbitrary sequence of selfing, backcrossing, intercrossing and haploid-doubling steps that includes map, population, and trait simulators; and is scriptable. Software and documentation are available at http://coding.plantpath.ksu.edu/qgene. Source code is available on request.
Computer image analysis in obtaining characteristics of images: greenhouse tomatoes in the process of generating learning sets of artificial neural networks

NASA Astrophysics Data System (ADS)

Zaborowicz, M.; Przybył, J.; Koszela, K.; Boniecki, P.; Mueller, W.; Raba, B.; Lewicki, A.; Przybył, K.

2014-04-01

The aim of the project was to make the software which on the basis on image of greenhouse tomato allows for the extraction of its characteristics. Data gathered during the image analysis and processing were used to build learning sets of artificial neural networks. Program enables to process pictures in jpeg format, acquisition of statistical information of the picture and export them to an external file. Produced software is intended to batch analyze collected research material and obtained information saved as a csv file. Program allows for analysis of 33 independent parameters implicitly to describe tested image. The application is dedicated to processing and image analysis of greenhouse tomatoes. The program can be used for analysis of other fruits and vegetables of a spherical shape.
FieldTrip: Open source software for advanced analysis of MEG, EEG, and invasive electrophysiological data.

PubMed

Oostenveld, Robert; Fries, Pascal; Maris, Eric; Schoffelen, Jan-Mathijs

2011-01-01

This paper describes FieldTrip, an open source software package that we developed for the analysis of MEG, EEG, and other electrophysiological data. The software is implemented as a MATLAB toolbox and includes a complete set of consistent and user-friendly high-level functions that allow experimental neuroscientists to analyze experimental data. It includes algorithms for simple and advanced analysis, such as time-frequency analysis using multitapers, source reconstruction using dipoles, distributed sources and beamformers, connectivity analysis, and nonparametric statistical permutation tests at the channel and source level. The implementation as toolbox allows the user to perform elaborate and structured analyses of large data sets using the MATLAB command line and batch scripting. Furthermore, users and developers can easily extend the functionality and implement new algorithms. The modular design facilitates the reuse in other software packages.
Conducting Simulation Studies in the R Programming Environment.

PubMed

Hallgren, Kevin A

2013-10-12

Simulation studies allow researchers to answer specific questions about data analysis, statistical power, and best-practices for obtaining accurate results in empirical research. Despite the benefits that simulation research can provide, many researchers are unfamiliar with available tools for conducting their own simulation studies. The use of simulation studies need not be restricted to researchers with advanced skills in statistics and computer programming, and such methods can be implemented by researchers with a variety of abilities and interests. The present paper provides an introduction to methods used for running simulation studies using the R statistical programming environment and is written for individuals with minimal experience running simulation studies or using R. The paper describes the rationale and benefits of using simulations and introduces R functions relevant for many simulation studies. Three examples illustrate different applications for simulation studies, including (a) the use of simulations to answer a novel question about statistical analysis, (b) the use of simulations to estimate statistical power, and (c) the use of simulations to obtain confidence intervals of parameter estimates through bootstrapping. Results and fully annotated syntax from these examples are provided.
Statistical Surrogate Modeling of Atmospheric Dispersion Events Using Bayesian Adaptive Splines

NASA Astrophysics Data System (ADS)

Francom, D.; Sansó, B.; Bulaevskaya, V.; Lucas, D. D.

2016-12-01

Uncertainty in the inputs of complex computer models, including atmospheric dispersion and transport codes, is often assessed via statistical surrogate models. Surrogate models are computationally efficient statistical approximations of expensive computer models that enable uncertainty analysis. We introduce Bayesian adaptive spline methods for producing surrogate models that capture the major spatiotemporal patterns of the parent model, while satisfying all the necessities of flexibility, accuracy and computational feasibility. We present novel methodological and computational approaches motivated by a controlled atmospheric tracer release experiment conducted at the Diablo Canyon nuclear power plant in California. Traditional methods for building statistical surrogate models often do not scale well to experiments with large amounts of data. Our approach is well suited to experiments involving large numbers of model inputs, large numbers of simulations, and functional output for each simulation. Our approach allows us to perform global sensitivity analysis with ease. We also present an approach to calibration of simulators using field data.
Methods for Assessment of Memory Reactivation.

PubMed

Liu, Shizhao; Grosmark, Andres D; Chen, Zhe

2018-04-13

It has been suggested that reactivation of previously acquired experiences or stored information in declarative memories in the hippocampus and neocortex contributes to memory consolidation and learning. Understanding memory consolidation depends crucially on the development of robust statistical methods for assessing memory reactivation. To date, several statistical methods have seen established for assessing memory reactivation based on bursts of ensemble neural spike activity during offline states. Using population-decoding methods, we propose a new statistical metric, the weighted distance correlation, to assess hippocampal memory reactivation (i.e., spatial memory replay) during quiet wakefulness and slow-wave sleep. The new metric can be combined with an unsupervised population decoding analysis, which is invariant to latent state labeling and allows us to detect statistical dependency beyond linearity in memory traces. We validate the new metric using two rat hippocampal recordings in spatial navigation tasks. Our proposed analysis framework may have a broader impact on assessing memory reactivations in other brain regions under different behavioral tasks.
α -induced reactions on 115In: Cross section measurements and statistical model analysis

NASA Astrophysics Data System (ADS)

Kiss, G. G.; Szücs, T.; Mohr, P.; Török, Zs.; Huszánk, R.; Gyürky, Gy.; Fülöp, Zs.

2018-05-01

Background: α -nucleus optical potentials are basic ingredients of statistical model calculations used in nucleosynthesis simulations. While the nucleon+nucleus optical potential is fairly well known, for the α +nucleus optical potential several different parameter sets exist and large deviations, reaching sometimes even an order of magnitude, are found between the cross section predictions calculated using different parameter sets. Purpose: A measurement of the radiative α -capture and the α -induced reaction cross sections on the nucleus 115In at low energies allows a stringent test of statistical model predictions. Since experimental data are scarce in this mass region, this measurement can be an important input to test the global applicability of α +nucleus optical model potentials and further ingredients of the statistical model. Methods: The reaction cross sections were measured by means of the activation method. The produced activities were determined by off-line detection of the γ rays and characteristic x rays emitted during the electron capture decay of the produced Sb isotopes. The 115In(α ,γ )119Sb and 115In(α ,n )Sb118m reaction cross sections were measured between Ec .m .=8.83 and 15.58 MeV, and the 115In(α ,n )Sb118g reaction was studied between Ec .m .=11.10 and 15.58 MeV. The theoretical analysis was performed within the statistical model. Results: The simultaneous measurement of the (α ,γ ) and (α ,n ) cross sections allowed us to determine a best-fit combination of all parameters for the statistical model. The α +nucleus optical potential is identified as the most important input for the statistical model. The best fit is obtained for the new Atomki-V1 potential, and good reproduction of the experimental data is also achieved for the first version of the Demetriou potentials and the simple McFadden-Satchler potential. The nucleon optical potential, the γ -ray strength function, and the level density parametrization are also constrained by the data although there is no unique best-fit combination. Conclusions: The best-fit calculations allow us to extrapolate the low-energy (α ,γ ) cross section of 115In to the astrophysical Gamow window with reasonable uncertainties. However, still further improvements of the α -nucleus potential are required for a global description of elastic (α ,α ) scattering and α -induced reactions in a wide range of masses and energies.
Load Balancing Using Time Series Analysis for Soft Real Time Systems with Statistically Periodic Loads

NASA Technical Reports Server (NTRS)

Hailperin, Max

1993-01-01

This thesis provides design and analysis of techniques for global load balancing on ensemble architectures running soft-real-time object-oriented applications with statistically periodic loads. It focuses on estimating the instantaneous average load over all the processing elements. The major contribution is the use of explicit stochastic process models for both the loading and the averaging itself. These models are exploited via statistical time-series analysis and Bayesian inference to provide improved average load estimates, and thus to facilitate global load balancing. This thesis explains the distributed algorithms used and provides some optimality results. It also describes the algorithms' implementation and gives performance results from simulation. These results show that our techniques allow more accurate estimation of the global system load ing, resulting in fewer object migration than local methods. Our method is shown to provide superior performance, relative not only to static load-balancing schemes but also to many adaptive methods.
Bayesian approach for counting experiment statistics applied to a neutrino point source analysis

NASA Astrophysics Data System (ADS)

Bose, D.; Brayeur, L.; Casier, M.; de Vries, K. D.; Golup, G.; van Eijndhoven, N.

2013-12-01

In this paper we present a model independent analysis method following Bayesian statistics to analyse data from a generic counting experiment and apply it to the search for neutrinos from point sources. We discuss a test statistic defined following a Bayesian framework that will be used in the search for a signal. In case no signal is found, we derive an upper limit without the introduction of approximations. The Bayesian approach allows us to obtain the full probability density function for both the background and the signal rate. As such, we have direct access to any signal upper limit. The upper limit derivation directly compares with a frequentist approach and is robust in the case of low-counting observations. Furthermore, it allows also to account for previous upper limits obtained by other analyses via the concept of prior information without the need of the ad hoc application of trial factors. To investigate the validity of the presented Bayesian approach, we have applied this method to the public IceCube 40-string configuration data for 10 nearby blazars and we have obtained a flux upper limit, which is in agreement with the upper limits determined via a frequentist approach. Furthermore, the upper limit obtained compares well with the previously published result of IceCube, using the same data set.
GWAMA: software for genome-wide association meta-analysis.

PubMed

Mägi, Reedik; Morris, Andrew P

2010-05-28

Despite the recent success of genome-wide association studies in identifying novel loci contributing effects to complex human traits, such as type 2 diabetes and obesity, much of the genetic component of variation in these phenotypes remains unexplained. One way to improving power to detect further novel loci is through meta-analysis of studies from the same population, increasing the sample size over any individual study. Although statistical software analysis packages incorporate routines for meta-analysis, they are ill equipped to meet the challenges of the scale and complexity of data generated in genome-wide association studies. We have developed flexible, open-source software for the meta-analysis of genome-wide association studies. The software incorporates a variety of error trapping facilities, and provides a range of meta-analysis summary statistics. The software is distributed with scripts that allow simple formatting of files containing the results of each association study and generate graphical summaries of genome-wide meta-analysis results. The GWAMA (Genome-Wide Association Meta-Analysis) software has been developed to perform meta-analysis of summary statistics generated from genome-wide association studies of dichotomous phenotypes or quantitative traits. Software with source files, documentation and example data files are freely available online at http://www.well.ox.ac.uk/GWAMA.

Analysis of select Dalbergia and trade timber using direct analysis in real time and time-of-flight mass spectrometry for CITES enforcement.

PubMed

Lancaster, Cady; Espinoza, Edgard

2012-05-15

International trade of several Dalbergia wood species is regulated by The Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES). In order to supplement morphological identification of these species, a rapid chemical method of analysis was developed. Using Direct Analysis in Real Time (DART) ionization coupled with Time-of-Flight (TOF) Mass Spectrometry (MS), selected Dalbergia and common trade species were analyzed. Each of the 13 wood species was classified using principal component analysis and linear discriminant analysis (LDA). These statistical data clusters served as reliable anchors for species identification of unknowns. Analysis of 20 or more samples from the 13 species studied in this research indicates that the DART-TOFMS results are reproducible. Statistical analysis of the most abundant ions gave good classifications that were useful for identifying unknown wood samples. DART-TOFMS and LDA analysis of 13 species of selected timber samples and the statistical classification allowed for the correct assignment of unknown wood samples. This method is rapid and can be useful when anatomical identification is difficult but needed in order to support CITES enforcement. Published 2012. This article is a US Government work and is in the public domain in the USA.
Assessment of Reliable Change Using 95% Credible Intervals for the Differences in Proportions: A Statistical Analysis for Case-Study Methodology.

PubMed

Unicomb, Rachael; Colyvas, Kim; Harrison, Elisabeth; Hewat, Sally

2015-06-01

Case-study methodology studying change is often used in the field of speech-language pathology, but it can be criticized for not being statistically robust. Yet with the heterogeneous nature of many communication disorders, case studies allow clinicians and researchers to closely observe and report on change. Such information is valuable and can further inform large-scale experimental designs. In this research note, a statistical analysis for case-study data is outlined that employs a modification to the Reliable Change Index (Jacobson & Truax, 1991). The relationship between reliable change and clinical significance is discussed. Example data are used to guide the reader through the use and application of this analysis. A method of analysis is detailed that is suitable for assessing change in measures with binary categorical outcomes. The analysis is illustrated using data from one individual, measured before and after treatment for stuttering. The application of this approach to assess change in categorical, binary data has potential application in speech-language pathology. It enables clinicians and researchers to analyze results from case studies for their statistical and clinical significance. This new method addresses a gap in the research design literature, that is, the lack of analysis methods for noncontinuous data (such as counts, rates, proportions of events) that may be used in case-study designs.
Network Meta-Analysis Using R: A Review of Currently Available Automated Packages

PubMed Central

Neupane, Binod; Richer, Danielle; Bonner, Ashley Joel; Kibret, Taddele; Beyene, Joseph

2014-01-01

Network meta-analysis (NMA) – a statistical technique that allows comparison of multiple treatments in the same meta-analysis simultaneously – has become increasingly popular in the medical literature in recent years. The statistical methodology underpinning this technique and software tools for implementing the methods are evolving. Both commercial and freely available statistical software packages have been developed to facilitate the statistical computations using NMA with varying degrees of functionality and ease of use. This paper aims to introduce the reader to three R packages, namely, gemtc, pcnetmeta, and netmeta, which are freely available software tools implemented in R. Each automates the process of performing NMA so that users can perform the analysis with minimal computational effort. We present, compare and contrast the availability and functionality of different important features of NMA in these three packages so that clinical investigators and researchers can determine which R packages to implement depending on their analysis needs. Four summary tables detailing (i) data input and network plotting, (ii) modeling options, (iii) assumption checking and diagnostic testing, and (iv) inference and reporting tools, are provided, along with an analysis of a previously published dataset to illustrate the outputs available from each package. We demonstrate that each of the three packages provides a useful set of tools, and combined provide users with nearly all functionality that might be desired when conducting a NMA. PMID:25541687
Network meta-analysis using R: a review of currently available automated packages.

PubMed

Neupane, Binod; Richer, Danielle; Bonner, Ashley Joel; Kibret, Taddele; Beyene, Joseph

2014-01-01

Network meta-analysis (NMA)--a statistical technique that allows comparison of multiple treatments in the same meta-analysis simultaneously--has become increasingly popular in the medical literature in recent years. The statistical methodology underpinning this technique and software tools for implementing the methods are evolving. Both commercial and freely available statistical software packages have been developed to facilitate the statistical computations using NMA with varying degrees of functionality and ease of use. This paper aims to introduce the reader to three R packages, namely, gemtc, pcnetmeta, and netmeta, which are freely available software tools implemented in R. Each automates the process of performing NMA so that users can perform the analysis with minimal computational effort. We present, compare and contrast the availability and functionality of different important features of NMA in these three packages so that clinical investigators and researchers can determine which R packages to implement depending on their analysis needs. Four summary tables detailing (i) data input and network plotting, (ii) modeling options, (iii) assumption checking and diagnostic testing, and (iv) inference and reporting tools, are provided, along with an analysis of a previously published dataset to illustrate the outputs available from each package. We demonstrate that each of the three packages provides a useful set of tools, and combined provide users with nearly all functionality that might be desired when conducting a NMA.
Delay, change and bifurcation of the immunofluorescence distribution attractors in health statuses diagnostics and in medical treatment

NASA Astrophysics Data System (ADS)

Galich, Nikolay E.; Filatov, Michael V.

2008-07-01

Communication contains the description of the immunology experiments and the experimental data treatment. New nonlinear methods of immunofluorescence statistical analysis of peripheral blood neutrophils have been developed. We used technology of respiratory burst reaction of DNA fluorescence in the neutrophils cells nuclei due to oxidative activity. The histograms of photon count statistics the radiant neutrophils populations' in flow cytometry experiments are considered. Distributions of the fluorescence flashes frequency as functions of the fluorescence intensity are analyzed. Statistic peculiarities of histograms set for healthy and unhealthy donors allow dividing all histograms on the three classes. The classification is based on three different types of smoothing and long-range scale averaged immunofluorescence distributions and their bifurcation. Heterogeneity peculiarities of long-range scale immunofluorescence distributions allow dividing all histograms on three groups. First histograms group belongs to healthy donors. Two other groups belong to donors with autoimmune and inflammatory diseases. Some of the illnesses are not diagnosed by standards biochemical methods. Medical standards and statistical data of the immunofluorescence histograms for identifications of health and illnesses are interconnected. Possibilities and alterations of immunofluorescence statistics in registration, diagnostics and monitoring of different diseases in various medical treatments have been demonstrated. Health or illness criteria are connected with statistics features of immunofluorescence histograms. Neutrophils populations' fluorescence presents the sensitive clear indicator of health status.
Meta-analysis of correlated traits via summary statistics from GWASs with an application in hypertension.

PubMed

Zhu, Xiaofeng; Feng, Tao; Tayo, Bamidele O; Liang, Jingjing; Young, J Hunter; Franceschini, Nora; Smith, Jennifer A; Yanek, Lisa R; Sun, Yan V; Edwards, Todd L; Chen, Wei; Nalls, Mike; Fox, Ervin; Sale, Michele; Bottinger, Erwin; Rotimi, Charles; Liu, Yongmei; McKnight, Barbara; Liu, Kiang; Arnett, Donna K; Chakravati, Aravinda; Cooper, Richard S; Redline, Susan

2015-01-08

Genome-wide association studies (GWASs) have identified many genetic variants underlying complex traits. Many detected genetic loci harbor variants that associate with multiple-even distinct-traits. Most current analysis approaches focus on single traits, even though the final results from multiple traits are evaluated together. Such approaches miss the opportunity to systemically integrate the phenome-wide data available for genetic association analysis. In this study, we propose a general approach that can integrate association evidence from summary statistics of multiple traits, either correlated, independent, continuous, or binary traits, which might come from the same or different studies. We allow for trait heterogeneity effects. Population structure and cryptic relatedness can also be controlled. Our simulations suggest that the proposed method has improved statistical power over single-trait analysis in most of the cases we studied. We applied our method to the Continental Origins and Genetic Epidemiology Network (COGENT) African ancestry samples for three blood pressure traits and identified four loci (CHIC2, HOXA-EVX1, IGFBP1/IGFBP3, and CDH17; p < 5.0 × 10(-8)) associated with hypertension-related traits that were missed by a single-trait analysis in the original report. Six additional loci with suggestive association evidence (p < 5.0 × 10(-7)) were also observed, including CACNA1D and WNT3. Our study strongly suggests that analyzing multiple phenotypes can improve statistical power and that such analysis can be executed with the summary statistics from GWASs. Our method also provides a way to study a cross phenotype (CP) association by using summary statistics from GWASs of multiple phenotypes. Copyright © 2015 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Metaprop: a Stata command to perform meta-analysis of binomial data.

PubMed

Nyaga, Victoria N; Arbyn, Marc; Aerts, Marc

2014-01-01

Meta-analyses have become an essential tool in synthesizing evidence on clinical and epidemiological questions derived from a multitude of similar studies assessing the particular issue. Appropriate and accessible statistical software is needed to produce the summary statistic of interest. Metaprop is a statistical program implemented to perform meta-analyses of proportions in Stata. It builds further on the existing Stata procedure metan which is typically used to pool effects (risk ratios, odds ratios, differences of risks or means) but which is also used to pool proportions. Metaprop implements procedures which are specific to binomial data and allows computation of exact binomial and score test-based confidence intervals. It provides appropriate methods for dealing with proportions close to or at the margins where the normal approximation procedures often break down, by use of the binomial distribution to model the within-study variability or by allowing Freeman-Tukey double arcsine transformation to stabilize the variances. Metaprop was applied on two published meta-analyses: 1) prevalence of HPV-infection in women with a Pap smear showing ASC-US; 2) cure rate after treatment for cervical precancer using cold coagulation. The first meta-analysis showed a pooled HPV-prevalence of 43% (95% CI: 38%-48%). In the second meta-analysis, the pooled percentage of cured women was 94% (95% CI: 86%-97%). By using metaprop, no studies with 0% or 100% proportions were excluded from the meta-analysis. Furthermore, study specific and pooled confidence intervals always were within admissible values, contrary to the original publication, where metan was used.
Variations of attractors and wavelet spectra of the immunofluorescence distributions for women in the pregnant period

NASA Astrophysics Data System (ADS)

Galich, Nikolay E.

2008-07-01

Communication contains the description of the immunology data treatment. New nonlinear methods of immunofluorescence statistical analysis of peripheral blood neutrophils have been developed. We used technology of respiratory burst reaction of DNA fluorescence in the neutrophils cells nuclei due to oxidative activity. The histograms of photon count statistics the radiant neutrophils populations' in flow cytometry experiments are considered. Distributions of the fluorescence flashes frequency as functions of the fluorescence intensity are analyzed. Statistic peculiarities of histograms set for women in the pregnant period allow dividing all histograms on the three classes. The classification is based on three different types of smoothing and long-range scale averaged immunofluorescence distributions, their bifurcation and wavelet spectra. Heterogeneity peculiarities of long-range scale immunofluorescence distributions and peculiarities of wavelet spectra allow dividing all histograms on three groups. First histograms group belongs to healthy donors. Two other groups belong to donors with autoimmune and inflammatory diseases. Some of the illnesses are not diagnosed by standards biochemical methods. Medical standards and statistical data of the immunofluorescence histograms for identifications of health and illnesses are interconnected. Peculiarities of immunofluorescence for women in pregnant period are classified. Health or illness criteria are connected with statistics features of immunofluorescence histograms. Neutrophils populations' fluorescence presents the sensitive clear indicator of health status.
Component Models for Fuzzy Data

ERIC Educational Resources Information Center

Coppi, Renato; Giordani, Paolo; D'Urso, Pierpaolo

2006-01-01

The fuzzy perspective in statistical analysis is first illustrated with reference to the "Informational Paradigm" allowing us to deal with different types of uncertainties related to the various informational ingredients (data, model, assumptions). The fuzzy empirical data are then introduced, referring to "J" LR fuzzy variables as observed on "I"…
Fall 2013 International Comparisons

ERIC Educational Resources Information Center

Northwest Evaluation Association, 2014

2014-01-01

This Fall report is an aggregated statistical analysis of Measures of Academic Progress® (MAP®) data from international schools. The report provides a consistent means of comparisons of specific sub-groups by subject and grade, which allows partners to compare their MAP® results with other schools within their region or membership organization.…
77 FR 13326 - Carpenter Technology Corporation and Latrobe Specialty Metals, Inc.; Analysis of Proposed...

Federal Register 2010, 2011, 2012, 2013, 2014

2012-03-06

... established sales and marketing network in the United States that will allow it to be immediately competitive... making sure that your comment does not include any sensitive personal information, like anyone's Social... information such as costs, sales statistics, inventories, formulas, patterns, devices, manufacturing processes...
Analysis of counting data: Development of the SATLAS Python package

NASA Astrophysics Data System (ADS)

Gins, W.; de Groote, R. P.; Bissell, M. L.; Granados Buitrago, C.; Ferrer, R.; Lynch, K. M.; Neyens, G.; Sels, S.

2018-01-01

For the analysis of low-statistics counting experiments, a traditional nonlinear least squares minimization routine may not always provide correct parameter and uncertainty estimates due to the assumptions inherent in the algorithm(s). In response to this, a user-friendly Python package (SATLAS) was written to provide an easy interface between the data and a variety of minimization algorithms which are suited for analyzinglow, as well as high, statistics data. The advantage of this package is that it allows the user to define their own model function and then compare different minimization routines to determine the optimal parameter values and their respective (correlated) errors. Experimental validation of the different approaches in the package is done through analysis of hyperfine structure data of 203Fr gathered by the CRIS experiment at ISOLDE, CERN.
Quantity of dates trumps quality of dates for dense Bayesian radiocarbon sediment chronologies - Gas ion source 14C dating instructed by simultaneous Bayesian accumulation rate modeling

NASA Astrophysics Data System (ADS)

Rosenheim, B. E.; Firesinger, D.; Roberts, M. L.; Burton, J. R.; Khan, N.; Moyer, R. P.

2016-12-01

Radiocarbon (14C) sediment core chronologies benefit from a high density of dates, even when precision of individual dates is sacrificed. This is demonstrated by a combined approach of rapid 14C analysis of CO2 gas generated from carbonates and organic material coupled with Bayesian statistical modeling. Analysis of 14C is facilitated by the gas ion source on the Continuous Flow Accelerator Mass Spectrometry (CFAMS) system at the Woods Hole Oceanographic Institution's National Ocean Sciences Accelerator Mass Spectrometry facility. This instrument is capable of producing a 14C determination of +/- 100 14C y precision every 4-5 minutes, with limited sample handling (dissolution of carbonates and/or combustion of organic carbon in evacuated containers). Rapid analysis allows over-preparation of samples to include replicates at each depth and/or comparison of different sample types at particular depths in a sediment or peat core. Analysis priority is given to depths that have the least chronologic precision as determined by Bayesian modeling of the chronology of calibrated ages. Use of such a statistical approach to determine the order in which samples are run ensures that the chronology constantly improves so long as material is available for the analysis of chronologic weak points. Ultimately, accuracy of the chronology is determined by the material that is actually being dated, and our combined approach allows testing of different constituents of the organic carbon pool and the carbonate minerals within a core. We will present preliminary results from a deep-sea sediment core abundant in deep-sea foraminifera as well as coastal wetland peat cores to demonstrate statistical improvements in sediment- and peat-core chronologies obtained by increasing the quantity and decreasing the quality of individual dates.
Advanced Gear Alloys for Ultra High Strength Applications

NASA Technical Reports Server (NTRS)

Shen, Tony; Krantz, Timothy; Sebastian, Jason

2011-01-01

Single tooth bending fatigue (STBF) test data of UHS Ferrium C61 and C64 alloys are presented in comparison with historical test data of conventional gear steels (9310 and Pyrowear 53) with comparable statistical analysis methods. Pitting and scoring tests of C61 and C64 are works in progress. Boeing statistical analysis of STBF test data for the four gear steels (C61, C64, 9310 and Pyrowear 53) indicates that the UHS grades exhibit increases in fatigue strength in the low cycle fatigue (LCF) regime. In the high cycle fatigue (HCF) regime, the UHS steels exhibit better mean fatigue strength endurance limit behavior (particularly as compared to Pyrowear 53). However, due to considerable scatter in the UHS test data, the anticipated overall benefits of the UHS grades in bending fatigue have not been fully demonstrated. Based on all the test data and on Boeing s analysis, C61 has been selected by Boeing as the gear steel for the final ERDS demonstrator test gearboxes. In terms of potential follow-up work, detailed physics-based, micromechanical analysis and modeling of the fatigue data would allow for a better understanding of the causes of the experimental scatter, and of the transition from high-stress LCF (surface-dominated) to low-stress HCF (subsurface-dominated) fatigue failure. Additional STBF test data and failure analysis work, particularly in the HCF regime and around the endurance limit stress, could allow for better statistical confidence and could reduce the observed effects of experimental test scatter. Finally, the need for further optimization of the residual compressive stress profiles of the UHS steels (resulting from carburization and peening) is noted, particularly for the case of the higher hardness C64 material.
Load Balancing Using Time Series Analysis for Soft Real Time Systems with Statistically Periodic Loads

NASA Technical Reports Server (NTRS)

Hailperin, M.

1993-01-01

This thesis provides design and analysis of techniques for global load balancing on ensemble architectures running soft-real-time object-oriented applications with statistically periodic loads. It focuses on estimating the instantaneous average load over all the processing elements. The major contribution is the use of explicit stochastic process models for both the loading and the averaging itself. These models are exploited via statistical time-series analysis and Bayesian inference to provide improved average load estimates, and thus to facilitate global load balancing. This thesis explains the distributed algorithms used and provides some optimality results. It also describes the algorithms' implementation and gives performance results from simulation. These results show that the authors' techniques allow more accurate estimation of the global system loading, resulting in fewer object migrations than local methods. The authors' method is shown to provide superior performance, relative not only to static load-balancing schemes but also to many adaptive load-balancing methods. Results from a preliminary analysis of another system and from simulation with a synthetic load provide some evidence of more general applicability.
Attitude and Practice of Birth Attendants Regarding the Presence of Male Partner at Delivery in Nigeria.

PubMed

Adeniran, Abiodun; Adesina, Kikelomo; Aboyeji, Abiodun; Balogun, Olayinka; Adeniran, Peace; Fawole, Adegboyega

2017-03-01

Despite increasing request for the male partners' presence at delivery in developing countries, the view and practice of birth attendants remained poorly understood.This study aimed to evaluate the perception, attitude and practice of birth attendants concerning the requests in Nigeria. A prospective, cross-sectional survey involving consenting birth attendants was conducted in six public and six private health facilities in North Central Nigeria. Statistical analysis was done with SPSS-version 20.0; p-value <0.05 was considered statistically significant. Among 564 participants (24.8% male, 75.2% female), 465(82.4%) support the presence of male partners at delivery, 409(72.5%) desire to be with their partner at delivery, 434(77.0%) had previous request for male partner's presence at delivery while 225(51.8%) declined it due to perception that men will disturb. Among the male partners allowed at delivery, 92(44.0%) did not disturb the birth attendant while 5(2.4%) ended in litigation. Among birth attendants who allowed men at delivery in the past, 160(76.6%) will allow men in the future. There was no statistical significance regarding the age, gender, cadre or year of service of birth attendants and attitude to a protocol change to allow men at delivery. Birth attendants who support the presence of men at delivery showed positive attitude (OR33.178, 95%CI6.996-157.358; p<0.001) while those who opined that men would disturb at delivery had a negative attitude (OR0.306, 95%CI0.124-0.755); p0.010) to possible protocol change. Despite perceived negative effects of allowing male partners at delivery, many birth attendants are willing to allow them if necessary structural modifications are instituted.
Tips and Tricks for Successful Application of Statistical Methods to Biological Data.

PubMed

Schlenker, Evelyn

2016-01-01

This chapter discusses experimental design and use of statistics to describe characteristics of data (descriptive statistics) and inferential statistics that test the hypothesis posed by the investigator. Inferential statistics, based on probability distributions, depend upon the type and distribution of the data. For data that are continuous, randomly and independently selected, as well as normally distributed more powerful parametric tests such as Student's t test and analysis of variance (ANOVA) can be used. For non-normally distributed or skewed data, transformation of the data (using logarithms) may normalize the data allowing use of parametric tests. Alternatively, with skewed data nonparametric tests can be utilized, some of which rely on data that are ranked prior to statistical analysis. Experimental designs and analyses need to balance between committing type 1 errors (false positives) and type 2 errors (false negatives). For a variety of clinical studies that determine risk or benefit, relative risk ratios (random clinical trials and cohort studies) or odds ratios (case-control studies) are utilized. Although both use 2 × 2 tables, their premise and calculations differ. Finally, special statistical methods are applied to microarray and proteomics data, since the large number of genes or proteins evaluated increase the likelihood of false discoveries. Additional studies in separate samples are used to verify microarray and proteomic data. Examples in this chapter and references are available to help continued investigation of experimental designs and appropriate data analysis.
Adaptation of Lorke's method to determine and compare ED50 values: the cases of two anticonvulsants drugs.

PubMed

Garrido-Acosta, Osvaldo; Meza-Toledo, Sergio Enrique; Anguiano-Robledo, Liliana; Valencia-Hernández, Ignacio; Chamorro-Cevallos, Germán

2014-01-01

We determined the median effective dose (ED50) values for the anticonvulsants phenobarbital and sodium valproate using a modification of Lorke's method. This modification allowed appropriate statistical analysis and the use of a smaller number of mice per compound tested. The anticonvulsant activities of phenobarbital and sodium valproate were evaluated in male CD1 mice by maximal electroshock (MES) and intraperitoneal administration of pentylenetetrazole (PTZ). The anticonvulsant ED50 values were obtained through modifications of Lorke's method that involved changes in the selection of the three first doses in the initial test and the fourth dose in the second test. Furthermore, a test was added to evaluate the ED50 calculated by the modified Lorke's method, allowing statistical analysis of the data and determination of the confidence limits for ED50. The ED50 for phenobarbital against MES- and PTZ-induced seizures was 16.3mg/kg and 12.7mg/kg, respectively. The sodium valproate values were 261.2mg/kg and 159.7mg/kg, respectively. These results are similar to those found using the traditional methods of finding ED50, suggesting that the modifications made to Lorke's method generate equal results using fewer mice while increasing confidence in the statistical analysis. This adaptation of Lorke's method can be used to determine median letal dose (LD50) or ED50 for compounds with other pharmacological activities. Copyright © 2014 Elsevier Inc. All rights reserved.
Using Genome-Wide Expression Profiling to Define Gene Networks Relevant to the Study of Complex Traits: From RNA Integrity to Network Topology

PubMed Central

O'Brien, M.A.; Costin, B.N.; Miles, M.F.

2014-01-01

Postgenomic studies of the function of genes and their role in disease have now become an area of intense study since efforts to define the raw sequence material of the genome have largely been completed. The use of whole-genome approaches such as microarray expression profiling and, more recently, RNA-sequence analysis of transcript abundance has allowed an unprecedented look at the workings of the genome. However, the accurate derivation of such high-throughput data and their analysis in terms of biological function has been critical to truly leveraging the postgenomic revolution. This chapter will describe an approach that focuses on the use of gene networks to both organize and interpret genomic expression data. Such networks, derived from statistical analysis of large genomic datasets and the application of multiple bioinformatics data resources, poten-tially allow the identification of key control elements for networks associated with human disease, and thus may lead to derivation of novel therapeutic approaches. However, as discussed in this chapter, the leveraging of such networks cannot occur without a thorough understanding of the technical and statistical factors influencing the derivation of genomic expression data. Thus, while the catch phrase may be “it's the network … stupid,” the understanding of factors extending from RNA isolation to genomic profiling technique, multivariate statistics, and bioinformatics are all critical to defining fully useful gene networks for study of complex biology. PMID:23195313
A Bifactor Approach to Model Multifaceted Constructs in Statistical Mediation Analysis.

PubMed

Gonzalez, Oscar; MacKinnon, David P

Statistical mediation analysis allows researchers to identify the most important mediating constructs in the causal process studied. Identifying specific mediators is especially relevant when the hypothesized mediating construct consists of multiple related facets. The general definition of the construct and its facets might relate differently to an outcome. However, current methods do not allow researchers to study the relationships between general and specific aspects of a construct to an outcome simultaneously. This study proposes a bifactor measurement model for the mediating construct as a way to parse variance and represent the general aspect and specific facets of a construct simultaneously. Monte Carlo simulation results are presented to help determine the properties of mediated effect estimation when the mediator has a bifactor structure and a specific facet of a construct is the true mediator. This study also investigates the conditions when researchers can detect the mediated effect when the multidimensionality of the mediator is ignored and treated as unidimensional. Simulation results indicated that the mediation model with a bifactor mediator measurement model had unbiased and adequate power to detect the mediated effect with a sample size greater than 500 and medium a - and b -paths. Also, results indicate that parameter bias and detection of the mediated effect in both the data-generating model and the misspecified model varies as a function of the amount of facet variance represented in the mediation model. This study contributes to the largely unexplored area of measurement issues in statistical mediation analysis.

Bringing Clouds into Our Lab! - The Influence of Turbulence on the Early Stage Rain Droplets

NASA Astrophysics Data System (ADS)

Yavuz, Mehmet Altug; Kunnen, Rudie; Heijst, Gertjan; Clercx, Herman

2015-11-01

We are investigating a droplet-laden flow in an air-filled turbulence chamber, forced by speaker-driven air jets. The speakers are running in a random manner; yet they allow us to control and define the statistics of the turbulence. We study the motion of droplets with tunable size (Stokes numbers ~ 0.13 - 9) in a turbulent flow, mimicking the early stages of raindrop formation. 3D Particle Tracking Velocimetry (PTV) together with Laser Induced Fluorescence (LIF) methods are chosen as the experimental method to track the droplets and collect data for statistical analysis. Thereby it is possible to study the spatial distribution of the droplets in turbulence using the so-called Radial Distribution Function (RDF), a statistical measure to quantify the clustering of particles. Additionally, 3D-PTV technique allows us to measure velocity statistics of the droplets and the influence of the turbulence on droplet trajectories, both individually and collectively. In this contribution, we will present the clustering probability quantified by the RDF for different Stokes numbers. We will explain the physics underlying the influence of turbulence on droplet cluster behavior. This study supported by FOM/NWO Netherlands.
Singular spectrum analysis in nonlinear dynamics, with applications to paleoclimatic time series

NASA Technical Reports Server (NTRS)

Vautard, R.; Ghil, M.

1989-01-01

Two dimensions of a dynamical system given by experimental time series are distinguished. Statistical dimension gives a theoretical upper bound for the minimal number of degrees of freedom required to describe the attractor up to the accuracy of the data, taking into account sampling and noise problems. The dynamical dimension is the intrinsic dimension of the attractor and does not depend on the quality of the data. Singular Spectrum Analysis (SSA) provides estimates of the statistical dimension. SSA also describes the main physical phenomena reflected by the data. It gives adaptive spectral filters associated with the dominant oscillations of the system and clarifies the noise characteristics of the data. SSA is applied to four paleoclimatic records. The principal climatic oscillations and the regime changes in their amplitude are detected. About 10 degrees of freedom are statistically significant in the data. Large noise and insufficient sample length do not allow reliable estimates of the dynamical dimension.
A phylogenetic transform enhances analysis of compositional microbiota data.

PubMed

Silverman, Justin D; Washburne, Alex D; Mukherjee, Sayan; David, Lawrence A

2017-02-15

Surveys of microbial communities (microbiota), typically measured as relative abundance of species, have illustrated the importance of these communities in human health and disease. Yet, statistical artifacts commonly plague the analysis of relative abundance data. Here, we introduce the PhILR transform, which incorporates microbial evolutionary models with the isometric log-ratio transform to allow off-the-shelf statistical tools to be safely applied to microbiota surveys. We demonstrate that analyses of community-level structure can be applied to PhILR transformed data with performance on benchmarks rivaling or surpassing standard tools. Additionally, by decomposing distance in the PhILR transformed space, we identified neighboring clades that may have adapted to distinct human body sites. Decomposing variance revealed that covariation of bacterial clades within human body sites increases with phylogenetic relatedness. Together, these findings illustrate how the PhILR transform combines statistical and phylogenetic models to overcome compositional data challenges and enable evolutionary insights relevant to microbial communities.
Mediation analysis in nursing research: a methodological review

PubMed Central

Liu, Jianghong; Ulrich, Connie

2017-01-01

Mediation statistical models help clarify the relationship between independent predictor variables and dependent outcomes of interest by assessing the impact of third variables. This type of statistical analysis is applicable for many clinical nursing research questions, yet its use within nursing remains low. Indeed, mediational analyses may help nurse researchers develop more effective and accurate prevention and treatment programs as well as help bridge the gap between scientific knowledge and clinical practice. In addition, this statistical approach allows nurse researchers to ask – and answer – more meaningful and nuanced questions that extend beyond merely determining whether an outcome occurs. Therefore, the goal of this paper is to provide a brief tutorial on the use of mediational analyses in clinical nursing research by briefly introducing the technique and, through selected empirical examples from the nursing literature, demonstrating its applicability in advancing nursing science. PMID:26176804
Integration of modern statistical tools for the analysis of climate extremes into the web-GIS “CLIMATE”

NASA Astrophysics Data System (ADS)

Ryazanova, A. A.; Okladnikov, I. G.; Gordov, E. P.

2017-11-01

The frequency of occurrence and magnitude of precipitation and temperature extreme events show positive trends in several geographical regions. These events must be analyzed and studied in order to better understand their impact on the environment, predict their occurrences, and mitigate their effects. For this purpose, we augmented web-GIS called “CLIMATE” to include a dedicated statistical package developed in the R language. The web-GIS “CLIMATE” is a software platform for cloud storage processing and visualization of distributed archives of spatial datasets. It is based on a combined use of web and GIS technologies with reliable procedures for searching, extracting, processing, and visualizing the spatial data archives. The system provides a set of thematic online tools for the complex analysis of current and future climate changes and their effects on the environment. The package includes new powerful methods of time-dependent statistics of extremes, quantile regression and copula approach for the detailed analysis of various climate extreme events. Specifically, the very promising copula approach allows obtaining the structural connections between the extremes and the various environmental characteristics. The new statistical methods integrated into the web-GIS “CLIMATE” can significantly facilitate and accelerate the complex analysis of climate extremes using only a desktop PC connected to the Internet.
TableViewer for Herschel Data Processing

NASA Astrophysics Data System (ADS)

Zhang, L.; Schulz, B.

2006-07-01

The TableViewer utility is a GUI tool written in Java to support interactive data processing and analysis for the Herschel Space Observatory (Pilbratt et al. 2001). The idea was inherited from a prototype written in IDL (Schulz et al. 2005). It allows to graphically view and analyze tabular data organized in columns with equal numbers of rows. It can be run either as a standalone application, where data access is restricted to FITS (FITS 1999) files only, or it can be run from the Quick Look Analysis(QLA) or Interactive Analysis(IA) command line, from where also objects are accessible. The graphic display is very versatile, allowing plots in either linear or log scales. Zooming, panning, and changing data columns is performed rapidly using a group of navigation buttons. Selecting and de-selecting of fields of data points controls the input to simple analysis tasks like building a statistics table, or generating power spectra. The binary data stored in a TableDataset^1, a Product or in FITS files can also be displayed as tabular data, where values in individual cells can be modified. TableViewer provides several processing utilities which, besides calculation of statistics either for all channels or for selected channels, and calculation of power spectra, allows to convert/repair datasets by changing the unit name of data columns, and by modifying data values in columns with a simple calculator tool. Interactively selected data can be separated out, and modified data sets can be saved to FITS files. The tool will be very helpful especially in the early phases of Herschel data analysis when a quick access to contents of data products is important. TableDataset and Product are Java classes defined in herschel.ia.dataset.
LV software support for supersonic flow analysis

NASA Technical Reports Server (NTRS)

Bell, W. A.; Lepicovsky, J.

1992-01-01

The software for configuring an LV counter processor system has been developed using structured design. The LV system includes up to three counter processors and a rotary encoder. The software for configuring and testing the LV system has been developed, tested, and included in an overall software package for data acquisition, analysis, and reduction. Error handling routines respond to both operator and instrument errors which often arise in the course of measuring complex, high-speed flows. The use of networking capabilities greatly facilitates the software development process by allowing software development and testing from a remote site. In addition, high-speed transfers allow graphics files or commands to provide viewing of the data from a remote site. Further advances in data analysis require corresponding advances in procedures for statistical and time series analysis of nonuniformly sampled data.
LV software support for supersonic flow analysis

NASA Technical Reports Server (NTRS)

Bell, William A.

1992-01-01

The software for configuring a Laser Velocimeter (LV) counter processor system was developed using structured design. The LV system includes up to three counter processors and a rotary encoder. The software for configuring and testing the LV system was developed, tested, and included in an overall software package for data acquisition, analysis, and reduction. Error handling routines respond to both operator and instrument errors which often arise in the course of measuring complex, high-speed flows. The use of networking capabilities greatly facilitates the software development process by allowing software development and testing from a remote site. In addition, high-speed transfers allow graphics files or commands to provide viewing of the data from a remote site. Further advances in data analysis require corresponding advances in procedures for statistical and time series analysis of nonuniformly sampled data.
Geosocial process and its regularities

NASA Astrophysics Data System (ADS)

Vikulina, Marina; Vikulin, Alexander; Dolgaya, Anna

2015-04-01

Natural disasters and social events (wars, revolutions, genocides, epidemics, fires, etc.) accompany each other throughout human civilization, thus reflecting the close relationship of these phenomena that are seemingly of different nature. In order to study this relationship authors compiled and analyzed the list of the 2,400 natural disasters and social phenomena weighted by their magnitude that occurred during the last XXXVI centuries of our history. Statistical analysis was performed separately for each aggregate (natural disasters and social phenomena), and for particular statistically representative types of events. There was 5 + 5 = 10 types. It is shown that the numbers of events in the list are distributed by logarithmic law: the bigger the event, the less likely it happens. For each type of events and each aggregate the existence of periodicities with periods of 280 ± 60 years was established. Statistical analysis of the time intervals between adjacent events for both aggregates showed good agreement with Weibull-Gnedenko distribution with shape parameter less than 1, which is equivalent to the conclusion about the grouping of events at small time intervals. Modeling of statistics of time intervals with Pareto distribution allowed to identify the emergent property for all events in the aggregate. This result allowed the authors to make conclusion about interaction between natural disasters and social phenomena. The list of events compiled by authors and first identified properties of cyclicity, grouping and interaction process reflected by this list is the basis of modeling essentially unified geosocial process at high enough statistical level. Proof of interaction between "lifeless" Nature and Society is fundamental and provided a new approach to forecasting demographic crises with taking into account both natural disasters and social phenomena.
FieldTrip: Open Source Software for Advanced Analysis of MEG, EEG, and Invasive Electrophysiological Data

PubMed Central

Oostenveld, Robert; Fries, Pascal; Maris, Eric; Schoffelen, Jan-Mathijs

2011-01-01

This paper describes FieldTrip, an open source software package that we developed for the analysis of MEG, EEG, and other electrophysiological data. The software is implemented as a MATLAB toolbox and includes a complete set of consistent and user-friendly high-level functions that allow experimental neuroscientists to analyze experimental data. It includes algorithms for simple and advanced analysis, such as time-frequency analysis using multitapers, source reconstruction using dipoles, distributed sources and beamformers, connectivity analysis, and nonparametric statistical permutation tests at the channel and source level. The implementation as toolbox allows the user to perform elaborate and structured analyses of large data sets using the MATLAB command line and batch scripting. Furthermore, users and developers can easily extend the functionality and implement new algorithms. The modular design facilitates the reuse in other software packages. PMID:21253357
Finite-data-size study on practical universal blind quantum computation

NASA Astrophysics Data System (ADS)

Zhao, Qiang; Li, Qiong

2018-07-01

The universal blind quantum computation with weak coherent pulses protocol is a practical scheme to allow a client to delegate a computation to a remote server while the computation hidden. However, in the practical protocol, a finite data size will influence the preparation efficiency in the remote blind qubit state preparation (RBSP). In this paper, a modified RBSP protocol with two decoy states is studied in the finite data size. The issue of its statistical fluctuations is analyzed thoroughly. The theoretical analysis and simulation results show that two-decoy-state case with statistical fluctuation is closer to the asymptotic case than the one-decoy-state case with statistical fluctuation. Particularly, the two-decoy-state protocol can achieve a longer communication distance than the one-decoy-state case in this statistical fluctuation situation.
Why we should use simpler models if the data allow this: relevance for ANOVA designs in experimental biology.

PubMed

Lazic, Stanley E

2008-07-21

Analysis of variance (ANOVA) is a common statistical technique in physiological research, and often one or more of the independent/predictor variables such as dose, time, or age, can be treated as a continuous, rather than a categorical variable during analysis - even if subjects were randomly assigned to treatment groups. While this is not common, there are a number of advantages of such an approach, including greater statistical power due to increased precision, a simpler and more informative interpretation of the results, greater parsimony, and transformation of the predictor variable is possible. An example is given from an experiment where rats were randomly assigned to receive either 0, 60, 180, or 240 mg/L of fluoxetine in their drinking water, with performance on the forced swim test as the outcome measure. Dose was treated as either a categorical or continuous variable during analysis, with the latter analysis leading to a more powerful test (p = 0.021 vs. p = 0.159). This will be true in general, and the reasons for this are discussed. There are many advantages to treating variables as continuous numeric variables if the data allow this, and this should be employed more often in experimental biology. Failure to use the optimal analysis runs the risk of missing significant effects or relationships.
Reusable, extensible, and modifiable R scripts and Kepler workflows for comprehensive single set ChIP-seq analysis.

PubMed

Cormier, Nathan; Kolisnik, Tyler; Bieda, Mark

2016-07-05

There has been an enormous expansion of use of chromatin immunoprecipitation followed by sequencing (ChIP-seq) technologies. Analysis of large-scale ChIP-seq datasets involves a complex series of steps and production of several specialized graphical outputs. A number of systems have emphasized custom development of ChIP-seq pipelines. These systems are primarily based on custom programming of a single, complex pipeline or supply libraries of modules and do not produce the full range of outputs commonly produced for ChIP-seq datasets. It is desirable to have more comprehensive pipelines, in particular ones addressing common metadata tasks, such as pathway analysis, and pipelines producing standard complex graphical outputs. It is advantageous if these are highly modular systems, available as both turnkey pipelines and individual modules, that are easily comprehensible, modifiable and extensible to allow rapid alteration in response to new analysis developments in this growing area. Furthermore, it is advantageous if these pipelines allow data provenance tracking. We present a set of 20 ChIP-seq analysis software modules implemented in the Kepler workflow system; most (18/20) were also implemented as standalone, fully functional R scripts. The set consists of four full turnkey pipelines and 16 component modules. The turnkey pipelines in Kepler allow data provenance tracking. Implementation emphasized use of common R packages and widely-used external tools (e.g., MACS for peak finding), along with custom programming. This software presents comprehensive solutions and easily repurposed code blocks for ChIP-seq analysis and pipeline creation. Tasks include mapping raw reads, peakfinding via MACS, summary statistics, peak location statistics, summary plots centered on the transcription start site (TSS), gene ontology, pathway analysis, and de novo motif finding, among others. These pipelines range from those performing a single task to those performing full analyses of ChIP-seq data. The pipelines are supplied as both Kepler workflows, which allow data provenance tracking, and, in the majority of cases, as standalone R scripts. These pipelines are designed for ease of modification and repurposing.
EuroPhenome: a repository for high-throughput mouse phenotyping data

PubMed Central

Morgan, Hugh; Beck, Tim; Blake, Andrew; Gates, Hilary; Adams, Niels; Debouzy, Guillaume; Leblanc, Sophie; Lengger, Christoph; Maier, Holger; Melvin, David; Meziane, Hamid; Richardson, Dave; Wells, Sara; White, Jacqui; Wood, Joe; de Angelis, Martin Hrabé; Brown, Steve D. M.; Hancock, John M.; Mallon, Ann-Marie

2010-01-01

The broad aim of biomedical science in the postgenomic era is to link genomic and phenotype information to allow deeper understanding of the processes leading from genomic changes to altered phenotype and disease. The EuroPhenome project (http://www.EuroPhenome.org) is a comprehensive resource for raw and annotated high-throughput phenotyping data arising from projects such as EUMODIC. EUMODIC is gathering data from the EMPReSSslim pipeline (http://www.empress.har.mrc.ac.uk/) which is performed on inbred mouse strains and knock-out lines arising from the EUCOMM project. The EuroPhenome interface allows the user to access the data via the phenotype or genotype. It also allows the user to access the data in a variety of ways, including graphical display, statistical analysis and access to the raw data via web services. The raw phenotyping data captured in EuroPhenome is annotated by an annotation pipeline which automatically identifies statistically different mutants from the appropriate baseline and assigns ontology terms for that specific test. Mutant phenotypes can be quickly identified using two EuroPhenome tools: PhenoMap, a graphical representation of statistically relevant phenotypes, and mining for a mutant using ontology terms. To assist with data definition and cross-database comparisons, phenotype data is annotated using combinations of terms from biological ontologies. PMID:19933761
Four modes of optical parametric operation for squeezed state generation

NASA Astrophysics Data System (ADS)

Andersen, U. L.; Buchler, B. C.; Lam, P. K.; Wu, J. W.; Gao, J. R.; Bachor, H.-A.

2003-11-01

We report a versatile instrument, based on a monolithic optical parametric amplifier, which reliably generates four different types of squeezed light. We obtained vacuum squeezing, low power amplitude squeezing, phase squeezing and bright amplitude squeezing. We show a complete analysis of this light, including a full quantum state tomography. In addition we demonstrate the direct detection of the squeezed state statistics without the aid of a spectrum analyser. This technique makes the nonclassical properties directly visible and allows complete measurement of the statistical moments of the squeezed quadrature.
The analysis of the statistical and historical information gathered during the development of the Shuttle Orbiter Primary Flight Software

NASA Technical Reports Server (NTRS)

Simmons, D. B.; Marchbanks, M. P., Jr.; Quick, M. J.

1982-01-01

The results of an effort to thoroughly and objectively analyze the statistical and historical information gathered during the development of the Shuttle Orbiter Primary Flight Software are given. The particular areas of interest include cost of the software, reliability of the software, requirements for the software and how the requirements changed during development of the system. Data related to the current version of the software system produced some interesting results. Suggestions are made for the saving of additional data which will allow additional investigation.
LP-search and its use in analysis of the accuracy of control systems with acoustical models

NASA Technical Reports Server (NTRS)

Sergeyev, V. I.; Sobol, I. M.; Statnikov, R. B.; Statnikov, I. N.

1973-01-01

The LP-search is proposed as an analog of the Monte Carlo method for finding values in nonlinear statistical systems. It is concluded that: To attain the required accuracy in solution to the problem of control for a statistical system in the LP-search, a considerably smaller number of tests is required than in the Monte Carlo method. The LP-search allows the possibility of multiple repetitions of tests under identical conditions and observability of the output variables of the system.
Multivariate statistical analysis of the polyphenolic constituents in kiwifruit juices to trace fruit varieties and geographical origins.

PubMed

Guo, Jing; Yuan, Yahong; Dou, Pei; Yue, Tianli

2017-10-01

Fifty-one kiwifruit juice samples of seven kiwifruit varieties from five regions in China were analyzed to determine their polyphenols contents and to trace fruit varieties and geographical origins by multivariate statistical analysis. Twenty-one polyphenols belonging to four compound classes were determined by ultra-high-performance liquid chromatography coupled with ultra-high-resolution TOF mass spectrometry. (-)-Epicatechin, (+)-catechin, procyanidin B1 and caffeic acid derivatives were the predominant phenolic compounds in the juices. Principal component analysis (PCA) allowed a clear separation of the juices according to kiwifruit varieties. Stepwise linear discriminant analysis (SLDA) yielded satisfactory categorization of samples, provided 100% success rate according to kiwifruit varieties and 92.2% success rate according to geographical origins. The result showed that polyphenolic profiles of kiwifruit juices contain enough information to trace fruit varieties and geographical origins. Copyright © 2017 Elsevier Ltd. All rights reserved.
Ion induced electron emission statistics under Agm- cluster bombardment of Ag

NASA Astrophysics Data System (ADS)

Breuers, A.; Penning, R.; Wucher, A.

2018-05-01

The electron emission from a polycrystalline silver surface under bombardment with Agm- cluster ions (m = 1, 2, 3) is investigated in terms of ion induced kinetic excitation. The electron yield γ is determined directly by a current measurement method on the one hand and implicitly by the analysis of the electron emission statistics on the other hand. Successful measurements of the electron emission spectra ensure a deeper understanding of the ion induced kinetic electron emission process, with particular emphasis on the effect of the projectile cluster size to the yield as well as to emission statistics. The results allow a quantitative comparison to computer simulations performed for silver atoms and clusters impinging onto a silver surface.
Statistical Reporting Errors and Collaboration on Statistical Analyses in Psychological Science.

PubMed

Veldkamp, Coosje L S; Nuijten, Michèle B; Dominguez-Alvarez, Linda; van Assen, Marcel A L M; Wicherts, Jelte M

2014-01-01

Statistical analysis is error prone. A best practice for researchers using statistics would therefore be to share data among co-authors, allowing double-checking of executed tasks just as co-pilots do in aviation. To document the extent to which this 'co-piloting' currently occurs in psychology, we surveyed the authors of 697 articles published in six top psychology journals and asked them whether they had collaborated on four aspects of analyzing data and reporting results, and whether the described data had been shared between the authors. We acquired responses for 49.6% of the articles and found that co-piloting on statistical analysis and reporting results is quite uncommon among psychologists, while data sharing among co-authors seems reasonably but not completely standard. We then used an automated procedure to study the prevalence of statistical reporting errors in the articles in our sample and examined the relationship between reporting errors and co-piloting. Overall, 63% of the articles contained at least one p-value that was inconsistent with the reported test statistic and the accompanying degrees of freedom, and 20% of the articles contained at least one p-value that was inconsistent to such a degree that it may have affected decisions about statistical significance. Overall, the probability that a given p-value was inconsistent was over 10%. Co-piloting was not found to be associated with reporting errors.

Statistical Reporting Errors and Collaboration on Statistical Analyses in Psychological Science

PubMed Central

Veldkamp, Coosje L. S.; Nuijten, Michèle B.; Dominguez-Alvarez, Linda; van Assen, Marcel A. L. M.; Wicherts, Jelte M.

2014-01-01

Statistical analysis is error prone. A best practice for researchers using statistics would therefore be to share data among co-authors, allowing double-checking of executed tasks just as co-pilots do in aviation. To document the extent to which this ‘co-piloting’ currently occurs in psychology, we surveyed the authors of 697 articles published in six top psychology journals and asked them whether they had collaborated on four aspects of analyzing data and reporting results, and whether the described data had been shared between the authors. We acquired responses for 49.6% of the articles and found that co-piloting on statistical analysis and reporting results is quite uncommon among psychologists, while data sharing among co-authors seems reasonably but not completely standard. We then used an automated procedure to study the prevalence of statistical reporting errors in the articles in our sample and examined the relationship between reporting errors and co-piloting. Overall, 63% of the articles contained at least one p-value that was inconsistent with the reported test statistic and the accompanying degrees of freedom, and 20% of the articles contained at least one p-value that was inconsistent to such a degree that it may have affected decisions about statistical significance. Overall, the probability that a given p-value was inconsistent was over 10%. Co-piloting was not found to be associated with reporting errors. PMID:25493918
The Statistical Analysis Techniques to Support the NGNP Fuel Performance Experiments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bihn T. Pham; Jeffrey J. Einerson

2010-06-01

This paper describes the development and application of statistical analysis techniques to support the AGR experimental program on NGNP fuel performance. The experiments conducted in the Idaho National Laboratory’s Advanced Test Reactor employ fuel compacts placed in a graphite cylinder shrouded by a steel capsule. The tests are instrumented with thermocouples embedded in graphite blocks and the target quantity (fuel/graphite temperature) is regulated by the He-Ne gas mixture that fills the gap volume. Three techniques for statistical analysis, namely control charting, correlation analysis, and regression analysis, are implemented in the SAS-based NGNP Data Management and Analysis System (NDMAS) for automatedmore » processing and qualification of the AGR measured data. The NDMAS also stores daily neutronic (power) and thermal (heat transfer) code simulation results along with the measurement data, allowing for their combined use and comparative scrutiny. The ultimate objective of this work includes (a) a multi-faceted system for data monitoring and data accuracy testing, (b) identification of possible modes of diagnostics deterioration and changes in experimental conditions, (c) qualification of data for use in code validation, and (d) identification and use of data trends to support effective control of test conditions with respect to the test target. Analysis results and examples given in the paper show the three statistical analysis techniques providing a complementary capability to warn of thermocouple failures. It also suggests that the regression analysis models relating calculated fuel temperatures and thermocouple readings can enable online regulation of experimental parameters (i.e. gas mixture content), to effectively maintain the target quantity (fuel temperature) within a given range.« less
ViPAR: a software platform for the Virtual Pooling and Analysis of Research Data.

PubMed

Carter, Kim W; Francis, Richard W; Carter, K W; Francis, R W; Bresnahan, M; Gissler, M; Grønborg, T K; Gross, R; Gunnes, N; Hammond, G; Hornig, M; Hultman, C M; Huttunen, J; Langridge, A; Leonard, H; Newman, S; Parner, E T; Petersson, G; Reichenberg, A; Sandin, S; Schendel, D E; Schalkwyk, L; Sourander, A; Steadman, C; Stoltenberg, C; Suominen, A; Surén, P; Susser, E; Sylvester Vethanayagam, A; Yusof, Z

2016-04-01

Research studies exploring the determinants of disease require sufficient statistical power to detect meaningful effects. Sample size is often increased through centralized pooling of disparately located datasets, though ethical, privacy and data ownership issues can often hamper this process. Methods that facilitate the sharing of research data that are sympathetic with these issues and which allow flexible and detailed statistical analyses are therefore in critical need. We have created a software platform for the Virtual Pooling and Analysis of Research data (ViPAR), which employs free and open source methods to provide researchers with a web-based platform to analyse datasets housed in disparate locations. Database federation permits controlled access to remotely located datasets from a central location. The Secure Shell protocol allows data to be securely exchanged between devices over an insecure network. ViPAR combines these free technologies into a solution that facilitates 'virtual pooling' where data can be temporarily pooled into computer memory and made available for analysis without the need for permanent central storage. Within the ViPAR infrastructure, remote sites manage their own harmonized research dataset in a database hosted at their site, while a central server hosts the data federation component and a secure analysis portal. When an analysis is initiated, requested data are retrieved from each remote site and virtually pooled at the central site. The data are then analysed by statistical software and, on completion, results of the analysis are returned to the user and the virtually pooled data are removed from memory. ViPAR is a secure, flexible and powerful analysis platform built on open source technology that is currently in use by large international consortia, and is made publicly available at [http://bioinformatics.childhealthresearch.org.au/software/vipar/]. © The Author 2015. Published by Oxford University Press on behalf of the International Epidemiological Association.
Atmospheric Tracer Inverse Modeling Using Markov Chain Monte Carlo (MCMC)

NASA Astrophysics Data System (ADS)

Kasibhatla, P.

2004-12-01

In recent years, there has been an increasing emphasis on the use of Bayesian statistical estimation techniques to characterize the temporal and spatial variability of atmospheric trace gas sources and sinks. The applications have been varied in terms of the particular species of interest, as well as in terms of the spatial and temporal resolution of the estimated fluxes. However, one common characteristic has been the use of relatively simple statistical models for describing the measurement and chemical transport model error statistics and prior source statistics. For example, multivariate normal probability distribution functions (pdfs) are commonly used to model these quantities and inverse source estimates are derived for fixed values of pdf paramaters. While the advantage of this approach is that closed form analytical solutions for the a posteriori pdfs of interest are available, it is worth exploring Bayesian analysis approaches which allow for a more general treatment of error and prior source statistics. Here, we present an application of the Markov Chain Monte Carlo (MCMC) methodology to an atmospheric tracer inversion problem to demonstrate how more gereral statistical models for errors can be incorporated into the analysis in a relatively straightforward manner. The MCMC approach to Bayesian analysis, which has found wide application in a variety of fields, is a statistical simulation approach that involves computing moments of interest of the a posteriori pdf by efficiently sampling this pdf. The specific inverse problem that we focus on is the annual mean CO2 source/sink estimation problem considered by the TransCom3 project. TransCom3 was a collaborative effort involving various modeling groups and followed a common modeling and analysis protocoal. As such, this problem provides a convenient case study to demonstrate the applicability of the MCMC methodology to atmospheric tracer source/sink estimation problems.
Analysis of Open-Ended Statistics Questions with Many Facet Rasch Model

ERIC Educational Resources Information Center

Güler, Nese

2014-01-01

Problem Statement: The most significant disadvantage of open-ended items that allow the valid measurement of upper level cognitive behaviours, such as synthesis and evaluation, is scoring. The difficulty associated with objectively scoring the answers to the items contributes to the reduction of the reliability of the scores. Moreover, other…
Big Data, Big Opportunities, and Big Challenges.

PubMed

Frelinger, Jeffrey A

2015-11-01

High-throughput assays have begun to revolutionize modern biology and medicine. The advent of cheap next-generation sequencing (NGS) has made it possible to interrogate cells and human populations as never before. Although this has allowed us to investigate the genetics, gene expression, and impacts of the microbiome, there remain both practical and conceptual challenges. These include data handling, storage, and statistical analysis, as well as an inherent problem of the analysis of heterogeneous cell populations.
Systems and methods for knowledge discovery in spatial data

DOEpatents

Obradovic, Zoran; Fiez, Timothy E.; Vucetic, Slobodan; Lazarevic, Aleksandar; Pokrajac, Dragoljub; Hoskinson, Reed L.

2005-03-08

Systems and methods are provided for knowledge discovery in spatial data as well as to systems and methods for optimizing recipes used in spatial environments such as may be found in precision agriculture. A spatial data analysis and modeling module is provided which allows users to interactively and flexibly analyze and mine spatial data. The spatial data analysis and modeling module applies spatial data mining algorithms through a number of steps. The data loading and generation module obtains or generates spatial data and allows for basic partitioning. The inspection module provides basic statistical analysis. The preprocessing module smoothes and cleans the data and allows for basic manipulation of the data. The partitioning module provides for more advanced data partitioning. The prediction module applies regression and classification algorithms on the spatial data. The integration module enhances prediction methods by combining and integrating models. The recommendation module provides the user with site-specific recommendations as to how to optimize a recipe for a spatial environment such as a fertilizer recipe for an agricultural field.
Calypso: a user-friendly web-server for mining and visualizing microbiome-environment interactions.

PubMed

Zakrzewski, Martha; Proietti, Carla; Ellis, Jonathan J; Hasan, Shihab; Brion, Marie-Jo; Berger, Bernard; Krause, Lutz

2017-03-01

Calypso is an easy-to-use online software suite that allows non-expert users to mine, interpret and compare taxonomic information from metagenomic or 16S rDNA datasets. Calypso has a focus on multivariate statistical approaches that can identify complex environment-microbiome associations. The software enables quantitative visualizations, statistical testing, multivariate analysis, supervised learning, factor analysis, multivariable regression, network analysis and diversity estimates. Comprehensive help pages, tutorials and videos are provided via a wiki page. The web-interface is accessible via http://cgenome.net/calypso/ . The software is programmed in Java, PERL and R and the source code is available from Zenodo ( https://zenodo.org/record/50931 ). The software is freely available for non-commercial users. l.krause@uq.edu.au. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Statistical Analysis of Small-Scale Magnetic Flux Emergence Patterns: A Useful Subsurface Diagnostic?

NASA Astrophysics Data System (ADS)

Lamb, Derek A.

2016-10-01

While sunspots follow a well-defined pattern of emergence in space and time, small-scale flux emergence is assumed to occur randomly at all times in the quiet Sun. HMI's full-disk coverage, high cadence, spatial resolution, and duty cycle allow us to probe that basic assumption. Some case studies of emergence suggest that temporal clustering on spatial scales of 50-150 Mm may occur. If clustering is present, it could serve as a diagnostic of large-scale subsurface magnetic field structures. We present the results of a manual survey of small-scale flux emergence events over a short time period, and a statistical analysis addressing the question of whether these events show spatio-temporal behavior that is anything other than random.
Meta-analysis of prediction model performance across multiple studies: Which scale helps ensure between-study normality for the C-statistic and calibration measures?

PubMed

Snell, Kym Ie; Ensor, Joie; Debray, Thomas Pa; Moons, Karel Gm; Riley, Richard D

2017-01-01

If individual participant data are available from multiple studies or clusters, then a prediction model can be externally validated multiple times. This allows the model's discrimination and calibration performance to be examined across different settings. Random-effects meta-analysis can then be used to quantify overall (average) performance and heterogeneity in performance. This typically assumes a normal distribution of 'true' performance across studies. We conducted a simulation study to examine this normality assumption for various performance measures relating to a logistic regression prediction model. We simulated data across multiple studies with varying degrees of variability in baseline risk or predictor effects and then evaluated the shape of the between-study distribution in the C-statistic, calibration slope, calibration-in-the-large, and E/O statistic, and possible transformations thereof. We found that a normal between-study distribution was usually reasonable for the calibration slope and calibration-in-the-large; however, the distributions of the C-statistic and E/O were often skewed across studies, particularly in settings with large variability in the predictor effects. Normality was vastly improved when using the logit transformation for the C-statistic and the log transformation for E/O, and therefore we recommend these scales to be used for meta-analysis. An illustrated example is given using a random-effects meta-analysis of the performance of QRISK2 across 25 general practices.
DASS-GUI: a user interface for identification and analysis of significant patterns in non-sequential data.

PubMed

Hollunder, Jens; Friedel, Maik; Kuiper, Martin; Wilhelm, Thomas

2010-04-01

Many large 'omics' datasets have been published and many more are expected in the near future. New analysis methods are needed for best exploitation. We have developed a graphical user interface (GUI) for easy data analysis. Our discovery of all significant substructures (DASS) approach elucidates the underlying modularity, a typical feature of complex biological data. It is related to biclustering and other data mining approaches. Importantly, DASS-GUI also allows handling of multi-sets and calculation of statistical significances. DASS-GUI contains tools for further analysis of the identified patterns: analysis of the pattern hierarchy, enrichment analysis, module validation, analysis of additional numerical data, easy handling of synonymous names, clustering, filtering and merging. Different export options allow easy usage of additional tools such as Cytoscape. Source code, pre-compiled binaries for different systems, a comprehensive tutorial, case studies and many additional datasets are freely available at http://www.ifr.ac.uk/dass/gui/. DASS-GUI is implemented in Qt.
Improved score statistics for meta-analysis in single-variant and gene-level association studies.

PubMed

Yang, Jingjing; Chen, Sai; Abecasis, Gonçalo

2018-06-01

Meta-analysis is now an essential tool for genetic association studies, allowing them to combine large studies and greatly accelerating the pace of genetic discovery. Although the standard meta-analysis methods perform equivalently as the more cumbersome joint analysis under ideal settings, they result in substantial power loss under unbalanced settings with various case-control ratios. Here, we investigate the power loss problem by the standard meta-analysis methods for unbalanced studies, and further propose novel meta-analysis methods performing equivalently to the joint analysis under both balanced and unbalanced settings. We derive improved meta-score-statistics that can accurately approximate the joint-score-statistics with combined individual-level data, for both linear and logistic regression models, with and without covariates. In addition, we propose a novel approach to adjust for population stratification by correcting for known population structures through minor allele frequencies. In the simulated gene-level association studies under unbalanced settings, our method recovered up to 85% power loss caused by the standard methods. We further showed the power gain of our methods in gene-level tests with 26 unbalanced studies of age-related macular degeneration . In addition, we took the meta-analysis of three unbalanced studies of type 2 diabetes as an example to discuss the challenges of meta-analyzing multi-ethnic samples. In summary, our improved meta-score-statistics with corrections for population stratification can be used to construct both single-variant and gene-level association studies, providing a useful framework for ensuring well-powered, convenient, cross-study analyses. © 2018 WILEY PERIODICALS, INC.
FADTTSter: accelerating hypothesis testing with functional analysis of diffusion tensor tract statistics

NASA Astrophysics Data System (ADS)

Noel, Jean; Prieto, Juan C.; Styner, Martin

2017-03-01

Functional Analysis of Diffusion Tensor Tract Statistics (FADTTS) is a toolbox for analysis of white matter (WM) fiber tracts. It allows associating diffusion properties along major WM bundles with a set of covariates of interest, such as age, diagnostic status and gender, and the structure of the variability of these WM tract properties. However, to use this toolbox, a user must have an intermediate knowledge in scripting languages (MATLAB). FADTTSter was created to overcome this issue and make the statistical analysis accessible to any non-technical researcher. FADTTSter is actively being used by researchers at the University of North Carolina. FADTTSter guides non-technical users through a series of steps including quality control of subjects and fibers in order to setup the necessary parameters to run FADTTS. Additionally, FADTTSter implements interactive charts for FADTTS' outputs. This interactive chart enhances the researcher experience and facilitates the analysis of the results. FADTTSter's motivation is to improve usability and provide a new analysis tool to the community that complements FADTTS. Ultimately, by enabling FADTTS to a broader audience, FADTTSter seeks to accelerate hypothesis testing in neuroimaging studies involving heterogeneous clinical data and diffusion tensor imaging. This work is submitted to the Biomedical Applications in Molecular, Structural, and Functional Imaging conference. The source code of this application is available in NITRC.
Damage modeling and statistical analysis of optics damage performance in MJ-class laser systems.

PubMed

Liao, Zhi M; Raymond, B; Gaylord, J; Fallejo, R; Bude, J; Wegner, P

2014-11-17

Modeling the lifetime of a fused silica optic is described for a multiple beam, MJ-class laser system. This entails combining optic processing data along with laser shot data to account for complete history of optic processing and shot exposure. Integrating with online inspection data allows for the construction of a performance metric to describe how an optic performs with respect to the model. This methodology helps to validate the damage model as well as allows strategic planning and identifying potential hidden parameters that are affecting the optic's performance.
DOE Office of Scientific and Technical Information (OSTI.GOV)

P-Mart was designed specifically to allow cancer researchers to perform robust statistical processing of publicly available cancer proteomic datasets. To date an online statistical processing suite for proteomics does not exist. The P-Mart software is designed to allow statistical programmers to utilize these algorithms through packages in the R programming language as well as offering a web-based interface using the Azure cloud technology. The Azure cloud technology also allows the release of the software via Docker containers.
Ontology-based content analysis of US patent applications from 2001-2010.

PubMed

Weber, Lutz; Böhme, Timo; Irmer, Matthias

2013-01-01

Ontology-based semantic text analysis methods allow to automatically extract knowledge relationships and data from text documents. In this review, we have applied these technologies for the systematic analysis of pharmaceutical patents. Hierarchical concepts from the knowledge domains of chemical compounds, diseases and proteins were used to annotate full-text US patent applications that deal with pharmacological activities of chemical compounds and filed in the years 2001-2010. Compounds claimed in these applications have been classified into their respective compound classes to review the distribution of scaffold types or general compound classes such as natural products in a time-dependent manner. Similarly, the target proteins and claimed utility of the compounds have been classified and the most relevant were extracted. The method presented allows the discovery of the main areas of innovation as well as emerging fields of patenting activities - providing a broad statistical basis for competitor analysis and decision-making efforts.
Study design and statistical analysis of data in human population studies with the micronucleus assay.

PubMed

Ceppi, Marcello; Gallo, Fabio; Bonassi, Stefano

2011-01-01

The most common study design performed in population studies based on the micronucleus (MN) assay, is the cross-sectional study, which is largely performed to evaluate the DNA damaging effects of exposure to genotoxic agents in the workplace, in the environment, as well as from diet or lifestyle factors. Sample size is still a critical issue in the design of MN studies since most recent studies considering gene-environment interaction, often require a sample size of several hundred subjects, which is in many cases difficult to achieve. The control of confounding is another major threat to the validity of causal inference. The most popular confounders considered in population studies using MN are age, gender and smoking habit. Extensive attention is given to the assessment of effect modification, given the increasing inclusion of biomarkers of genetic susceptibility in the study design. Selected issues concerning the statistical treatment of data have been addressed in this mini-review, starting from data description, which is a critical step of statistical analysis, since it allows to detect possible errors in the dataset to be analysed and to check the validity of assumptions required for more complex analyses. Basic issues dealing with statistical analysis of biomarkers are extensively evaluated, including methods to explore the dose-response relationship among two continuous variables and inferential analysis. A critical approach to the use of parametric and non-parametric methods is presented, before addressing the issue of most suitable multivariate models to fit MN data. In the last decade, the quality of statistical analysis of MN data has certainly evolved, although even nowadays only a small number of studies apply the Poisson model, which is the most suitable method for the analysis of MN data.
ROOT: A C++ framework for petabyte data storage, statistical analysis and visualization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Antcheva, I.; /CERN; Ballintijn, M.

2009-01-01

ROOT is an object-oriented C++ framework conceived in the high-energy physics (HEP) community, designed for storing and analyzing petabytes of data in an efficient way. Any instance of a C++ class can be stored into a ROOT file in a machine-independent compressed binary format. In ROOT the TTree object container is optimized for statistical data analysis over very large data sets by using vertical data storage techniques. These containers can span a large number of files on local disks, the web or a number of different shared file systems. In order to analyze this data, the user can chose outmore » of a wide set of mathematical and statistical functions, including linear algebra classes, numerical algorithms such as integration and minimization, and various methods for performing regression analysis (fitting). In particular, the RooFit package allows the user to perform complex data modeling and fitting while the RooStats library provides abstractions and implementations for advanced statistical tools. Multivariate classification methods based on machine learning techniques are available via the TMVA package. A central piece in these analysis tools are the histogram classes which provide binning of one- and multi-dimensional data. Results can be saved in high-quality graphical formats like Postscript and PDF or in bitmap formats like JPG or GIF. The result can also be stored into ROOT macros that allow a full recreation and rework of the graphics. Users typically create their analysis macros step by step, making use of the interactive C++ interpreter CINT, while running over small data samples. Once the development is finished, they can run these macros at full compiled speed over large data sets, using on-the-fly compilation, or by creating a stand-alone batch program. Finally, if processing farms are available, the user can reduce the execution time of intrinsically parallel tasks - e.g. data mining in HEP - by using PROOF, which will take care of optimally distributing the work over the available resources in a transparent way.« less
Influence of nonlinear effects on statistical properties of the radiation from SASE FEL

NASA Astrophysics Data System (ADS)

Saldin, E. L.; Schneidmiller, E. A.; Yurkov, M. V.

1998-02-01

The paper presents analysis of statistical properties of the radiation from self-amplified spontaneous emission (SASE) free-electron laser operating in nonlinear mode. The present approach allows one to calculate the following statistical properties of the SASE FEL radiation: time and spectral field correlation functions, distribution of the fluctuations of the instantaneous radiation power, distribution of the energy in the electron bunch, distribution of the radiation energy after monochromator installed at the FEL amplifier exit and the radiation spectrum. It has been observed that the statistics of the instantaneous radiation power from SASE FEL operating in the nonlinear regime changes significantly with respect to the linear regime. All numerical results presented in the paper have been calculated for the 70 nm SASE FEL at the TESLA Test Facility under construction at DESY.
Quantitative Analysis of Venus Radar Backscatter Data in ArcGIS

NASA Technical Reports Server (NTRS)

Long, S. M.; Grosfils, E. B.

2005-01-01

Ongoing mapping of the Ganiki Planitia (V14) quadrangle of Venus and definition of material units has involved an integrated but qualitative analysis of Magellan radar backscatter images and topography using standard geomorphological mapping techniques. However, such analyses do not take full advantage of the quantitative information contained within the images. Analysis of the backscatter coefficient allows a much more rigorous statistical comparison between mapped units, permitting first order selfsimilarity tests of geographically separated materials assigned identical geomorphological labels. Such analyses cannot be performed directly on pixel (DN) values from Magellan backscatter images, because the pixels are scaled to the Muhleman law for radar echoes on Venus and are not corrected for latitudinal variations in incidence angle. Therefore, DN values must be converted based on pixel latitude back to their backscatter coefficient values before accurate statistical analysis can occur. Here we present a method for performing the conversions and analysis of Magellan backscatter data using commonly available ArcGIS software and illustrate the advantages of the process for geological mapping.

The role of ensemble-based statistics in variational assimilation of cloud-affected observations from infrared imagers

NASA Astrophysics Data System (ADS)

Hacker, Joshua; Vandenberghe, Francois; Jung, Byoung-Jo; Snyder, Chris

2017-04-01

Effective assimilation of cloud-affected radiance observations from space-borne imagers, with the aim of improving cloud analysis and forecasting, has proven to be difficult. Large observation biases, nonlinear observation operators, and non-Gaussian innovation statistics present many challenges. Ensemble-variational data assimilation (EnVar) systems offer the benefits of flow-dependent background error statistics from an ensemble, and the ability of variational minimization to handle nonlinearity. The specific benefits of ensemble statistics, relative to static background errors more commonly used in variational systems, have not been quantified for the problem of assimilating cloudy radiances. A simple experiment framework is constructed with a regional NWP model and operational variational data assimilation system, to provide the basis understanding the importance of ensemble statistics in cloudy radiance assimilation. Restricting the observations to those corresponding to clouds in the background forecast leads to innovations that are more Gaussian. The number of large innovations is reduced compared to the more general case of all observations, but not eliminated. The Huber norm is investigated to handle the fat tails of the distributions, and allow more observations to be assimilated without the need for strict background checks that eliminate them. Comparing assimilation using only ensemble background error statistics with assimilation using only static background error statistics elucidates the importance of the ensemble statistics. Although the cost functions in both experiments converge to similar values after sufficient outer-loop iterations, the resulting cloud water, ice, and snow content are greater in the ensemble-based analysis. The subsequent forecasts from the ensemble-based analysis also retain more condensed water species, indicating that the local environment is more supportive of clouds. In this presentation we provide details that explain the apparent benefit from using ensembles for cloudy radiance assimilation in an EnVar context.
Physical and Hydrological Meaning of the Spectral Information from Hydrodynamic Signals at Karst Springs

NASA Astrophysics Data System (ADS)

Dufoyer, A.; Lecoq, N.; Massei, N.; Marechal, J. C.

2017-12-01

Physics-based modeling of karst systems remains almost impossible without enough accurate information about the inner physical characteristics. Usually, the only available hydrodynamic information is the flow rate at the karst outlet. Numerous works in the past decades have used and proven the usefulness of time-series analysis and spectral techniques applied to spring flow, precipitations or even physico-chemical parameters, for interpreting karst hydrological functioning. However, identifying or interpreting the karst systems physical features that control statistical or spectral characteristics of spring flow variations is still challenging, not to say sometimes controversial. The main objective of this work is to determine how the statistical and spectral characteristics of the hydrodynamic signal at karst springs can be related to inner physical and hydraulic properties. In order to address this issue, we undertake an empirical approach based on the use of both distributed and physics-based models, and on synthetic systems responses. The first step of the research is to conduct a sensitivity analysis of time-series/spectral methods to karst hydraulic and physical properties. For this purpose, forward modeling of flow through several simple, constrained and synthetic cases in response to precipitations is undertaken. It allows us to quantify how the statistical and spectral characteristics of flow at the outlet are sensitive to changes (i) in conduit geometries, and (ii) in hydraulic parameters of the system (matrix/conduit exchange rate, matrix hydraulic conductivity and storativity). The flow differential equations resolved by MARTHE, a computer code developed by the BRGM, allows karst conduits modeling. From signal processing on simulated spring responses, we hope to determine if specific frequencies are always modified, thanks to Fourier series and multi-resolution analysis. We also hope to quantify which parameters are the most variable with auto-correlation analysis: first results seem to show higher variations due to conduit conductivity than the ones due to matrix/conduit exchange rate. Future steps will be using another computer code, based on double-continuum approach and allowing turbulent conduit flow, and modeling a natural system.
MAGMA: analysis of two-channel microarrays made easy.

PubMed

Rehrauer, Hubert; Zoller, Stefan; Schlapbach, Ralph

2007-07-01

The web application MAGMA provides a simple and intuitive interface to identify differentially expressed genes from two-channel microarray data. While the underlying algorithms are not superior to those of similar web applications, MAGMA is particularly user friendly and can be used without prior training. The user interface guides the novice user through the most typical microarray analysis workflow consisting of data upload, annotation, normalization and statistical analysis. It automatically generates R-scripts that document MAGMA's entire data processing steps, thereby allowing the user to regenerate all results in his local R installation. The implementation of MAGMA follows the model-view-controller design pattern that strictly separates the R-based statistical data processing, the web-representation and the application logic. This modular design makes the application flexible and easily extendible by experts in one of the fields: statistical microarray analysis, web design or software development. State-of-the-art Java Server Faces technology was used to generate the web interface and to perform user input processing. MAGMA's object-oriented modular framework makes it easily extendible and applicable to other fields and demonstrates that modern Java technology is also suitable for rather small and concise academic projects. MAGMA is freely available at www.magma-fgcz.uzh.ch.
Improved dynamical scaling analysis using the kernel method for nonequilibrium relaxation.

PubMed

Echinaka, Yuki; Ozeki, Yukiyasu

2016-10-01

The dynamical scaling analysis for the Kosterlitz-Thouless transition in the nonequilibrium relaxation method is improved by the use of Bayesian statistics and the kernel method. This allows data to be fitted to a scaling function without using any parametric model function, which makes the results more reliable and reproducible and enables automatic and faster parameter estimation. Applying this method, the bootstrap method is introduced and a numerical discrimination for the transition type is proposed.
A new item response theory model to adjust data allowing examinee choice

PubMed Central

Costa, Marcelo Azevedo; Braga Oliveira, Rivert Paulo

2018-01-01

In a typical questionnaire testing situation, examinees are not allowed to choose which items they answer because of a technical issue in obtaining satisfactory statistical estimates of examinee ability and item difficulty. This paper introduces a new item response theory (IRT) model that incorporates information from a novel representation of questionnaire data using network analysis. Three scenarios in which examinees select a subset of items were simulated. In the first scenario, the assumptions required to apply the standard Rasch model are met, thus establishing a reference for parameter accuracy. The second and third scenarios include five increasing levels of violating those assumptions. The results show substantial improvements over the standard model in item parameter recovery. Furthermore, the accuracy was closer to the reference in almost every evaluated scenario. To the best of our knowledge, this is the first proposal to obtain satisfactory IRT statistical estimates in the last two scenarios. PMID:29389996
77 FR 74224 - OSHA Data Initiative (ODI); Extension of the Office of Management and Budget's (OMB) Approval of...

Federal Register 2010, 2011, 2012, 2013, 2014

2012-12-13

... Schmidt, Office of Statistical Analysis, Occupational Safety and Health Administration, U.S. Department of... 1904. These data will allow OSHA to calculate occupational injury and illness rates and to focus its... and dates of birth. Although all submissions are listed in the http://www.regulations.gov index, some...
An In-Depth Analysis of Adult Learning Policies and Their Effectiveness in Europe

ERIC Educational Resources Information Center

European Union, 2015

2015-01-01

Adult learning policies, like any other policies, need to be effective: they need to reach their objectives and attain the desired impacts, which should be carefully defined. Understanding the performance of policies allows policy makers to change and improve them. A growing body of research and statistics provides important insights into how…
From genes to ecosystems: Measuring evolutionary diversity and community structure with Forest Inventory and Analysis (FIA) data

Treesearch

Kevin M. Potter

2009-01-01

Forest genetic sustainability is an important component of forest health because genetic diversity and evolutionary processes allow for the adaptation of species and for the maintenance of ecosystem functionality and resilience. Phylogenetic community analyses, a set of new statistical methods for describing the evolutionary relationships among species, offer an...
Public Educational Expenditures in Industrialized Countries: An Analytical Comparison. MacArthur/Spencer Series Number 19.

ERIC Educational Resources Information Center

Ram, Rati

Educational expenditures in 18 Organisation for Economic Cooperation and Development (OECD) countries for the years 1975 and 1985 are investigated in this report. Data collection is based on analysis of UNESCO's 1989 "Statistical Yearbook" and OECD data. Although data deficiencies allow only a broad assessment, a conclusion is that…
Quantitative comparison of the absorption spectra of the gas mixtures in analogy to the criterion of Pearson

NASA Astrophysics Data System (ADS)

Kistenev, Yu. V.; Kuzmin, D. A.; Sandykova, E. A.; Shapovalov, A. V.

2015-11-01

An approach to the reduction of the space of the absorption spectra, based on the original criterion for profile analysis of the spectra, was proposed. This criterion dates back to the known statistics chi-square test of Pearson. Introduced criterion allows to quantify the differences of spectral curves.
From Principal Component to Direct Coupling Analysis of Coevolution in Proteins: Low-Eigenvalue Modes are Needed for Structure Prediction

PubMed Central

Cocco, Simona; Monasson, Remi; Weigt, Martin

2013-01-01

Various approaches have explored the covariation of residues in multiple-sequence alignments of homologous proteins to extract functional and structural information. Among those are principal component analysis (PCA), which identifies the most correlated groups of residues, and direct coupling analysis (DCA), a global inference method based on the maximum entropy principle, which aims at predicting residue-residue contacts. In this paper, inspired by the statistical physics of disordered systems, we introduce the Hopfield-Potts model to naturally interpolate between these two approaches. The Hopfield-Potts model allows us to identify relevant ‘patterns’ of residues from the knowledge of the eigenmodes and eigenvalues of the residue-residue correlation matrix. We show how the computation of such statistical patterns makes it possible to accurately predict residue-residue contacts with a much smaller number of parameters than DCA. This dimensional reduction allows us to avoid overfitting and to extract contact information from multiple-sequence alignments of reduced size. In addition, we show that low-eigenvalue correlation modes, discarded by PCA, are important to recover structural information: the corresponding patterns are highly localized, that is, they are concentrated in few sites, which we find to be in close contact in the three-dimensional protein fold. PMID:23990764
Computational and Statistical Analyses of Amino Acid Usage and Physico-Chemical Properties of the Twelve Late Embryogenesis Abundant Protein Classes

PubMed Central

Jaspard, Emmanuel; Macherel, David; Hunault, Gilles

2012-01-01

Late Embryogenesis Abundant Proteins (LEAPs) are ubiquitous proteins expected to play major roles in desiccation tolerance. Little is known about their structure - function relationships because of the scarcity of 3-D structures for LEAPs. The previous building of LEAPdb, a database dedicated to LEAPs from plants and other organisms, led to the classification of 710 LEAPs into 12 non-overlapping classes with distinct properties. Using this resource, numerous physico-chemical properties of LEAPs and amino acid usage by LEAPs have been computed and statistically analyzed, revealing distinctive features for each class. This unprecedented analysis allowed a rigorous characterization of the 12 LEAP classes, which differed also in multiple structural and physico-chemical features. Although most LEAPs can be predicted as intrinsically disordered proteins, the analysis indicates that LEAP class 7 (PF03168) and probably LEAP class 11 (PF04927) are natively folded proteins. This study thus provides a detailed description of the structural properties of this protein family opening the path toward further LEAP structure - function analysis. Finally, since each LEAP class can be clearly characterized by a unique set of physico-chemical properties, this will allow development of software to predict proteins as LEAPs. PMID:22615859
Research in Theoretical High Energy Nuclear Physics at the University of Arizona

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rafelski, Johann

In the past decade (2004-2015) we addressed the quest for the understanding of how quark confinement works, how it can be dissolved in a limited space-time domain, and what this means: i) for the paradigm of the laws of physics of present day; and, ii) for our understanding of cosmology. The focus of our in laboratory matter formation work has been centered on the understanding of the less frequently produced hadronic particles (e.g. strange antibaryons, charmed and beauty hadrons, massive resonances, charmonium, B c). We have developed a public analysis tool, SHARE (Statistical HAdronization with REsonances) which allows a precisemore » model description of experimental particle yield and fluctuation data. We have developed a charm recombination model to allow for off-equilibrium rate of charmonium production. We have developed methods and techniques which allowed us to study the hadron resonance yield evolution by kinetic theory. We explored entropy, strangeness and charm as signature of QGP addressing the wide range of reaction energy for AGS, SPS, RHIC and LHC energy range. In analysis of experimental data, we obtained both statistical parameters as well as physical properties of the hadron source. The following pages present listings of our primary writing on these questions. The abstracts are included in lieu of more detailed discussion of our research accomplishments in each of the publications.« less
An improved multiple linear regression and data analysis computer program package

NASA Technical Reports Server (NTRS)

Sidik, S. M.

1972-01-01

NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.
Application of factor analysis of infrared spectra for quantitative determination of beta-tricalcium phosphate in calcium hydroxylapatite.

PubMed

Arsenyev, P A; Trezvov, V V; Saratovskaya, N V

1997-01-01

This work represents a method, which allows to determine phase composition of calcium hydroxylapatite basing on its infrared spectrum. The method uses factor analysis of the spectral data of calibration set of samples to determine minimal number of factors required to reproduce the spectra within experimental error. Multiple linear regression is applied to establish correlation between factor scores of calibration standards and their properties. The regression equations can be used to predict the property value of unknown sample. The regression model was built for determination of beta-tricalcium phosphate content in hydroxylapatite. Statistical estimation of quality of the model was carried out. Application of the factor analysis on spectral data allows to increase accuracy of beta-tricalcium phosphate determination and expand the range of determination towards its less concentration. Reproducibility of results is retained.
Statistical characterization of short wind waves from stereo images of the sea surface

NASA Astrophysics Data System (ADS)

Mironov, Alexey; Yurovskaya, Maria; Dulov, Vladimir; Hauser, Danièle; Guérin, Charles-Antoine

2013-04-01

We propose a methodology to extract short-scale statistical characteristics of the sea surface topography by means of stereo image reconstruction. The possibilities and limitations of the technique are discussed and tested on a data set acquired from an oceanographic platform at the Black Sea. The analysis shows that reconstruction of the topography based on stereo method is an efficient way to derive non-trivial statistical properties of surface short- and intermediate-waves (say from 1 centimer to 1 meter). Most technical issues pertaining to this type of datasets (limited range of scales, lacunarity of data or irregular sampling) can be partially overcome by appropriate processing of the available points. The proposed technique also allows one to avoid linear interpolation which dramatically corrupts properties of retrieved surfaces. The processing technique imposes that the field of elevation be polynomially detrended, which has the effect of filtering out the large scales. Hence the statistical analysis can only address the small-scale components of the sea surface. The precise cut-off wavelength, which is approximatively half the patch size, can be obtained by applying a high-pass frequency filter on the reference gauge time records. The results obtained for the one- and two-points statistics of small-scale elevations are shown consistent, at least in order of magnitude, with the corresponding gauge measurements as well as other experimental measurements available in the literature. The calculation of the structure functions provides a powerful tool to investigate spectral and statistical properties of the field of elevations. Experimental parametrization of the third-order structure function, the so-called skewness function, is one of the most important and original outcomes of this study. This function is of primary importance in analytical scattering models from the sea surface and was up to now unavailable in field conditions. Due to the lack of precise reference measurements for the small-scale wave field, we could not quantify exactly the accuracy of the retrieval technique. However, it appeared clearly that the obtained accuracy is good enough for the estimation of second-order statistical quantities (such as the correlation function), acceptable for third-order quantities (such as the skwewness function) and insufficient for fourth-order quantities (such as kurtosis). Therefore, the stereo technique in the present stage should not be thought as a self-contained universal tool to characterize the surface statistics. Instead, it should be used in conjunction with other well calibrated but sparse reference measurement (such as wave gauges) for cross-validation and calibration. It then completes the statistical analysis in as much as it provides a snapshot of the three-dimensional field and allows for the evaluation of higher-order spatial statistics.
Advanced Statistics for Exotic Animal Practitioners.

PubMed

Hodsoll, John; Hellier, Jennifer M; Ryan, Elizabeth G

2017-09-01

Correlation and regression assess the association between 2 or more variables. This article reviews the core knowledge needed to understand these analyses, moving from visual analysis in scatter plots through correlation, simple and multiple linear regression, and logistic regression. Correlation estimates the strength and direction of a relationship between 2 variables. Regression can be considered more general and quantifies the numerical relationships between an outcome and 1 or multiple variables in terms of a best-fit line, allowing predictions to be made. Each technique is discussed with examples and the statistical assumptions underlying their correct application. Copyright © 2017 Elsevier Inc. All rights reserved.
A Prototype System for Retrieval of Gene Functional Information

PubMed Central

Folk, Lillian C.; Patrick, Timothy B.; Pattison, James S.; Wolfinger, Russell D.; Mitchell, Joyce A.

2003-01-01

Microarrays allow researchers to gather data about the expression patterns of thousands of genes simultaneously. Statistical analysis can reveal which genes show statistically significant results. Making biological sense of those results requires the retrieval of functional information about the genes thus identified, typically a manual gene-by-gene retrieval of information from various on-line databases. For experiments generating thousands of genes of interest, retrieval of functional information can become a significant bottleneck. To address this issue, we are currently developing a prototype system to automate the process of retrieval of functional information from multiple on-line sources. PMID:14728346
An Investigation of the Fine Spatial Structure of Meteor Streams Using the Relational Database ``Meteor''

NASA Astrophysics Data System (ADS)

Karpov, A. V.; Yumagulov, E. Z.

2003-05-01

We have restored and ordered the archive of meteor observations carried out with a meteor radar complex ``KGU-M5'' since 1986. A relational database has been formed under the control of the Database Management System (DBMS) Oracle 8. We also improved and tested a statistical method for studying the fine spatial structure of meteor streams with allowance for the specific features of application of the DBMS. Statistical analysis of the results of observations made it possible to obtain information about the substance distribution in the Quadrantid, Geminid, and Perseid meteor streams.
Mild cognitive impairment and fMRI studies of brain functional connectivity: the state of the art

PubMed Central

Farràs-Permanyer, Laia; Guàrdia-Olmos, Joan; Peró-Cebollero, Maribel

2015-01-01

In the last 15 years, many articles have studied brain connectivity in Mild Cognitive Impairment patients with fMRI techniques, seemingly using different connectivity statistical models in each investigation to identify complex connectivity structures so as to recognize typical behavior in this type of patient. This diversity in statistical approaches may cause problems in results comparison. This paper seeks to describe how researchers approached the study of brain connectivity in MCI patients using fMRI techniques from 2002 to 2014. The focus is on the statistical analysis proposed by each research group in reference to the limitations and possibilities of those techniques to identify some recommendations to improve the study of functional connectivity. The included articles came from a search of Web of Science and PsycINFO using the following keywords: f MRI, MCI, and functional connectivity. Eighty-one papers were found, but two of them were discarded because of the lack of statistical analysis. Accordingly, 79 articles were included in this review. We summarized some parts of the articles, including the goal of every investigation, the cognitive paradigm and methods used, brain regions involved, use of ROI analysis and statistical analysis, emphasizing on the connectivity estimation model used in each investigation. The present analysis allowed us to confirm the remarkable variability of the statistical analysis methods found. Additionally, the study of brain connectivity in this type of population is not providing, at the moment, any significant information or results related to clinical aspects relevant for prediction and treatment. We propose to follow guidelines for publishing fMRI data that would be a good solution to the problem of study replication. The latter aspect could be important for future publications because a higher homogeneity would benefit the comparison between publications and the generalization of results. PMID:26300802

Quasi-experimental study designs series-paper 10: synthesizing evidence for effects collected from quasi-experimental studies presents surmountable challenges.

PubMed

Becker, Betsy Jane; Aloe, Ariel M; Duvendack, Maren; Stanley, T D; Valentine, Jeffrey C; Fretheim, Atle; Tugwell, Peter

2017-09-01

To outline issues of importance to analytic approaches to the synthesis of quasi-experiments (QEs) and to provide a statistical model for use in analysis. We drew on studies of statistics, epidemiology, and social-science methodology to outline methods for synthesis of QE studies. The design and conduct of QEs, effect sizes from QEs, and moderator variables for the analysis of those effect sizes were discussed. Biases, confounding, design complexities, and comparisons across designs offer serious challenges to syntheses of QEs. Key components of meta-analyses of QEs were identified, including the aspects of QE study design to be coded and analyzed. Of utmost importance are the design and statistical controls implemented in the QEs. Such controls and any potential sources of bias and confounding must be modeled in analyses, along with aspects of the interventions and populations studied. Because of such controls, effect sizes from QEs are more complex than those from randomized experiments. A statistical meta-regression model that incorporates important features of the QEs under review was presented. Meta-analyses of QEs provide particular challenges, but thorough coding of intervention characteristics and study methods, along with careful analysis, should allow for sound inferences. Copyright © 2017 Elsevier Inc. All rights reserved.
[Development of an Excel spreadsheet for meta-analysis of indirect and mixed treatment comparisons].

PubMed

Tobías, Aurelio; Catalá-López, Ferrán; Roqué, Marta

2014-01-01

Meta-analyses in clinical research usually aimed to evaluate treatment efficacy and safety in direct comparison with a unique comparator. Indirect comparisons, using the Bucher's method, can summarize primary data when information from direct comparisons is limited or nonexistent. Mixed comparisons allow combining estimates from direct and indirect comparisons, increasing statistical power. There is a need for simple applications for meta-analysis of indirect and mixed comparisons. These can easily be conducted using a Microsoft Office Excel spreadsheet. We developed a spreadsheet for indirect and mixed effects comparisons of friendly use for clinical researchers interested in systematic reviews, but non-familiarized with the use of more advanced statistical packages. The use of the proposed Excel spreadsheet for indirect and mixed comparisons can be of great use in clinical epidemiology to extend the knowledge provided by traditional meta-analysis when evidence from direct comparisons is limited or nonexistent.
Impact of ontology evolution on functional analyses.

PubMed

Groß, Anika; Hartung, Michael; Prüfer, Kay; Kelso, Janet; Rahm, Erhard

2012-10-15

Ontologies are used in the annotation and analysis of biological data. As knowledge accumulates, ontologies and annotation undergo constant modifications to reflect this new knowledge. These modifications may influence the results of statistical applications such as functional enrichment analyses that describe experimental data in terms of ontological groupings. Here, we investigate to what degree modifications of the Gene Ontology (GO) impact these statistical analyses for both experimental and simulated data. The analysis is based on new measures for the stability of result sets and considers different ontology and annotation changes. Our results show that past changes in the GO are non-uniformly distributed over different branches of the ontology. Considering the semantic relatedness of significant categories in analysis results allows a more realistic stability assessment for functional enrichment studies. We observe that the results of term-enrichment analyses tend to be surprisingly stable despite changes in ontology and annotation.
RAD-ADAPT: Software for modelling clonogenic assay data in radiation biology.

PubMed

Zhang, Yaping; Hu, Kaiqiang; Beumer, Jan H; Bakkenist, Christopher J; D'Argenio, David Z

2017-04-01

We present a comprehensive software program, RAD-ADAPT, for the quantitative analysis of clonogenic assays in radiation biology. Two commonly used models for clonogenic assay analysis, the linear-quadratic model and single-hit multi-target model, are included in the software. RAD-ADAPT uses maximum likelihood estimation method to obtain parameter estimates with the assumption that cell colony count data follow a Poisson distribution. The program has an intuitive interface, generates model prediction plots, tabulates model parameter estimates, and allows automatic statistical comparison of parameters between different groups. The RAD-ADAPT interface is written using the statistical software R and the underlying computations are accomplished by the ADAPT software system for pharmacokinetic/pharmacodynamic systems analysis. The use of RAD-ADAPT is demonstrated using an example that examines the impact of pharmacologic ATM and ATR kinase inhibition on human lung cancer cell line A549 after ionizing radiation. Copyright © 2017 Elsevier B.V. All rights reserved.
Handling nonnormality and variance heterogeneity for quantitative sublethal toxicity tests.

PubMed

Ritz, Christian; Van der Vliet, Leana

2009-09-01

The advantages of using regression-based techniques to derive endpoints from environmental toxicity data are clear, and slowly, this superior analytical technique is gaining acceptance. As use of regression-based analysis becomes more widespread, some of the associated nuances and potential problems come into sharper focus. Looking at data sets that cover a broad spectrum of standard test species, we noticed that some model fits to data failed to meet two key assumptions-variance homogeneity and normality-that are necessary for correct statistical analysis via regression-based techniques. Failure to meet these assumptions often is caused by reduced variance at the concentrations showing severe adverse effects. Although commonly used with linear regression analysis, transformation of the response variable only is not appropriate when fitting data using nonlinear regression techniques. Through analysis of sample data sets, including Lemna minor, Eisenia andrei (terrestrial earthworm), and algae, we show that both the so-called Box-Cox transformation and use of the Poisson distribution can help to correct variance heterogeneity and nonnormality and so allow nonlinear regression analysis to be implemented. Both the Box-Cox transformation and the Poisson distribution can be readily implemented into existing protocols for statistical analysis. By correcting for nonnormality and variance heterogeneity, these two statistical tools can be used to encourage the transition to regression-based analysis and the depreciation of less-desirable and less-flexible analytical techniques, such as linear interpolation.
Seismicity map tools for earthquake studies

NASA Astrophysics Data System (ADS)

Boucouvalas, Anthony; Kaskebes, Athanasios; Tselikas, Nikos

2014-05-01

We report on the development of new and online set of tools for use within Google Maps, for earthquake research. We demonstrate this server based and online platform (developped with PHP, Javascript, MySQL) with the new tools using a database system with earthquake data. The platform allows us to carry out statistical and deterministic analysis on earthquake data use of Google Maps and plot various seismicity graphs. The tool box has been extended to draw on the map line segments, multiple straight lines horizontally and vertically as well as multiple circles, including geodesic lines. The application is demonstrated using localized seismic data from the geographic region of Greece as well as other global earthquake data. The application also offers regional segmentation (NxN) which allows the studying earthquake clustering, and earthquake cluster shift within the segments in space. The platform offers many filters such for plotting selected magnitude ranges or time periods. The plotting facility allows statistically based plots such as cumulative earthquake magnitude plots and earthquake magnitude histograms, calculation of 'b' etc. What is novel for the platform is the additional deterministic tools. Using the newly developed horizontal and vertical line and circle tools we have studied the spatial distribution trends of many earthquakes and we here show for the first time the link between Fibonacci Numbers and spatiotemporal location of some earthquakes. The new tools are valuable for examining visualizing trends in earthquake research as it allows calculation of statistics as well as deterministic precursors. We plan to show many new results based on our newly developed platform.
Statistical Analysis of a Large Sample Size Pyroshock Test Data Set Including Post Flight Data Assessment. Revision 1

NASA Technical Reports Server (NTRS)

Hughes, William O.; McNelis, Anne M.

2010-01-01

The Earth Observing System (EOS) Terra spacecraft was launched on an Atlas IIAS launch vehicle on its mission to observe planet Earth in late 1999. Prior to launch, the new design of the spacecraft's pyroshock separation system was characterized by a series of 13 separation ground tests. The analysis methods used to evaluate this unusually large amount of shock data will be discussed in this paper, with particular emphasis on population distributions and finding statistically significant families of data, leading to an overall shock separation interface level. The wealth of ground test data also allowed a derivation of a Mission Assurance level for the flight. All of the flight shock measurements were below the EOS Terra Mission Assurance level thus contributing to the overall success of the EOS Terra mission. The effectiveness of the statistical methodology for characterizing the shock interface level and for developing a flight Mission Assurance level from a large sample size of shock data is demonstrated in this paper.
A phylogenetic transform enhances analysis of compositional microbiota data

PubMed Central

Silverman, Justin D; Washburne, Alex D; Mukherjee, Sayan; David, Lawrence A

2017-01-01

Surveys of microbial communities (microbiota), typically measured as relative abundance of species, have illustrated the importance of these communities in human health and disease. Yet, statistical artifacts commonly plague the analysis of relative abundance data. Here, we introduce the PhILR transform, which incorporates microbial evolutionary models with the isometric log-ratio transform to allow off-the-shelf statistical tools to be safely applied to microbiota surveys. We demonstrate that analyses of community-level structure can be applied to PhILR transformed data with performance on benchmarks rivaling or surpassing standard tools. Additionally, by decomposing distance in the PhILR transformed space, we identified neighboring clades that may have adapted to distinct human body sites. Decomposing variance revealed that covariation of bacterial clades within human body sites increases with phylogenetic relatedness. Together, these findings illustrate how the PhILR transform combines statistical and phylogenetic models to overcome compositional data challenges and enable evolutionary insights relevant to microbial communities. DOI: http://dx.doi.org/10.7554/eLife.21887.001 PMID:28198697
Structural texture similarity metrics for image analysis and retrieval.

PubMed

Zujovic, Jana; Pappas, Thrasyvoulos N; Neuhoff, David L

2013-07-01

We develop new metrics for texture similarity that accounts for human visual perception and the stochastic nature of textures. The metrics rely entirely on local image statistics and allow substantial point-by-point deviations between textures that according to human judgment are essentially identical. The proposed metrics extend the ideas of structural similarity and are guided by research in texture analysis-synthesis. They are implemented using a steerable filter decomposition and incorporate a concise set of subband statistics, computed globally or in sliding windows. We conduct systematic tests to investigate metric performance in the context of "known-item search," the retrieval of textures that are "identical" to the query texture. This eliminates the need for cumbersome subjective tests, thus enabling comparisons with human performance on a large database. Our experimental results indicate that the proposed metrics outperform peak signal-to-noise ratio (PSNR), structural similarity metric (SSIM) and its variations, as well as state-of-the-art texture classification metrics, using standard statistical measures.
An Adaptive Buddy Check for Observational Quality Control

NASA Technical Reports Server (NTRS)

Dee, Dick P.; Rukhovets, Leonid; Todling, Ricardo; DaSilva, Arlindo M.; Larson, Jay W.; Einaudi, Franco (Technical Monitor)

2000-01-01

An adaptive buddy check algorithm is presented that adjusts tolerances for outlier observations based on the variability of surrounding data. The algorithm derives from a statistical hypothesis test combined with maximum-likelihood covariance estimation. Its stability is shown to depend on the initial identification of outliers by a simple background check. The adaptive feature ensures that the final quality control decisions are not very sensitive to prescribed statistics of first-guess and observation errors, nor on other approximations introduced into the algorithm. The implementation of the algorithm in a global atmospheric data assimilation is described. Its performance is contrasted with that of a non-adaptive buddy check, for the surface analysis of an extreme storm that took place in Europe on 27 December 1999. The adaptive algorithm allowed the inclusion of many important observations that differed greatly from the first guess and that would have been excluded on the basis of prescribed statistics. The analysis of the storm development was much improved as a result of these additional observations.
Remote Sensing/gis Integration for Site Planning and Resource Management

NASA Technical Reports Server (NTRS)

Fellows, J. D.

1982-01-01

The development of an interactive/batch gridded information system (array of cells georeferenced to USGS quad sheets) and interfacing application programs (e.g., hydrologic models) is discussed. This system allows non-programer users to request any data set(s) stored in the data base by inputing any random polygon's (watershed, political zone) boundary points. The data base information contained within this polygon can be used to produce maps, statistics, and define model parameters for the area. Present/proposed conditions for the area may be compared by inputing future usage (land cover, soils, slope, etc.). This system, known as the Hydrologic Analysis Program (HAP), is especially effective in the real time analysis of proposed land cover changes on runoff hydrographs and graphics/statistics resource inventories of random study area/watersheds.
SDAR 1.0 a New Quantitative Toolkit for Analyze Stratigraphic Data

NASA Astrophysics Data System (ADS)

Ortiz, John; Moreno, Carlos; Cardenas, Andres; Jaramillo, Carlos

2015-04-01

Since the foundation of stratigraphy geoscientists have recognized that data obtained from stratigraphic columns (SC), two dimensional schemes recording descriptions of both geological and paleontological features (e.g., thickness of rock packages, grain size, fossil and lithological components, and sedimentary structures), are key elements for establishing reliable hypotheses about the distribution in space and time of rock sequences, and ancient sedimentary environmental and paleobiological dynamics. Despite the tremendous advances on the way geoscientists store, plot, and quantitatively analyze sedimentological and paleontological data (e.g., Macrostrat [http://www.macrostrat.org/], Paleobiology Database [http://www.paleodb.org/], respectively), there is still a lack of computational methodologies designed to quantitatively examine data from a highly detailed SCs. Moreover, frequently the stratigraphic information is plotted "manually" using vector graphics editors (e.g., Corel Draw, Illustrator), however, this information although store on a digital format, cannot be used readily for any quantitative analysis. Therefore, any attempt to examine the stratigraphic data in an analytical fashion necessarily takes further steps. Given these issues, we have developed the sofware 'Stratigraphic Data Analysis in R' (SDAR), which stores in a database all sedimentological, stratigraphic, and paleontological information collected from a SC, allowing users to generate high-quality graphic plots (including one or multiple features stored in the database). SDAR also encompasses quantitative analyses helping users to quantify stratigraphic information (e.g. grain size, sorting and rounding, proportion of sand/shale). Finally, given that the SDAR analysis module, has been written in the open-source high-level computer language "R graphics/statistics language" [R Development Core Team, 2014], it is already loaded with many of the crucial features required to accomplish basic and complex tasks of statistical analysis (i.e., R language provide more than hundred spatial libraries that allow users to explore various Geostatistics and spatial analysis). Consequently, SDAR allows a deeper exploration of the stratigraphic data collected in the field, it will allow the geoscientific community in the near future to develop complex analyses related with the distribution in space and time of rock sequences, such as lithofacial correlations, by a multivariate comparison between empirical SCs with quantitative lithofacial models established from modern sedimentary environments.
Order statistics applied to the most massive and most distant galaxy clusters

NASA Astrophysics Data System (ADS)

Waizmann, J.-C.; Ettori, S.; Bartelmann, M.

2013-06-01

In this work, we present an analytic framework for calculating the individual and joint distributions of the nth most massive or nth highest redshift galaxy cluster for a given survey characteristic allowing us to formulate Λ cold dark matter (ΛCDM) exclusion criteria. We show that the cumulative distribution functions steepen with increasing order, giving them a higher constraining power with respect to the extreme value statistics. Additionally, we find that the order statistics in mass (being dominated by clusters at lower redshifts) is sensitive to the matter density and the normalization of the matter fluctuations, whereas the order statistics in redshift is particularly sensitive to the geometric evolution of the Universe. For a fixed cosmology, both order statistics are efficient probes of the functional shape of the mass function at the high-mass end. To allow a quick assessment of both order statistics, we provide fits as a function of the survey area that allow percentile estimation with an accuracy better than 2 per cent. Furthermore, we discuss the joint distributions in the two-dimensional case and find that for the combination of the largest and the second largest observation, it is most likely to find them to be realized with similar values with a broadly peaked distribution. When combining the largest observation with higher orders, it is more likely to find a larger gap between the observations and when combining higher orders in general, the joint probability density function peaks more strongly. Having introduced the theory, we apply the order statistical analysis to the Southpole Telescope (SPT) massive cluster sample and metacatalogue of X-ray detected clusters of galaxies catalogue and find that the 10 most massive clusters in the sample are consistent with ΛCDM and the Tinker mass function. For the order statistics in redshift, we find a discrepancy between the data and the theoretical distributions, which could in principle indicate a deviation from the standard cosmology. However, we attribute this deviation to the uncertainty in the modelling of the SPT survey selection function. In turn, by assuming the ΛCDM reference cosmology, order statistics can also be utilized for consistency checks of the completeness of the observed sample and of the modelling of the survey selection function.
An experiment on the impact of a neonicotinoid pesticide on honeybees: the value of a formal analysis of the data.

PubMed

Schick, Robert S; Greenwood, Jeremy J D; Buckland, Stephen T

2017-01-01

We assess the analysis of the data resulting from a field experiment conducted by Pilling et al. (PLoS ONE. doi: 10.1371/journal.pone.0077193, 5) on the potential effects of thiamethoxam on honeybees. The experiment had low levels of replication, so Pilling et al. concluded that formal statistical analysis would be misleading. This would be true if such an analysis merely comprised tests of statistical significance and if the investigators concluded that lack of significance meant little or no effect. However, an analysis that includes estimation of the size of any effects-with confidence limits-allows one to reach conclusions that are not misleading and that produce useful insights. For the data of Pilling et al., we use straightforward statistical analysis to show that the confidence limits are generally so wide that any effects of thiamethoxam could have been large without being statistically significant. Instead of formal analysis, Pilling et al. simply inspected the data and concluded that they provided no evidence of detrimental effects and from this that thiamethoxam poses a "low risk" to bees. Conclusions derived from the inspection of the data were not just misleading in this case but also are unacceptable in principle, for if data are inadequate for a formal analysis (or only good enough to provide estimates with wide confidence intervals), then they are bound to be inadequate as a basis for reaching any sound conclusions. Given that the data in this case are largely uninformative with respect to the treatment effect, any conclusions reached from such informal approaches can do little more than reflect the prior beliefs of those involved.
Seasonal assessment and apportionment of surface water pollution using multivariate statistical methods: Sinos River, southern Brazil.

PubMed

Alves, Darlan Daniel; Riegel, Roberta Plangg; de Quevedo, Daniela Müller; Osório, Daniela Montanari Migliavacca; da Costa, Gustavo Marques; do Nascimento, Carlos Augusto; Telöken, Franko

2018-06-08

Assessment of surface water quality is an issue of currently high importance, especially in polluted rivers which provide water for treatment and distribution as drinking water, as is the case of the Sinos River, southern Brazil. Multivariate statistical techniques allow a better understanding of the seasonal variations in water quality, as well as the source identification and source apportionment of water pollution. In this study, the multivariate statistical techniques of cluster analysis (CA), principal component analysis (PCA), and positive matrix factorization (PMF) were used, along with the Kruskal-Wallis test and Spearman's correlation analysis in order to interpret a water quality data set resulting from a monitoring program conducted over a period of almost two years (May 2013 to April 2015). The water samples were collected from the raw water inlet of the municipal water treatment plant (WTP) operated by the Water and Sewage Services of Novo Hamburgo (COMUSA). CA allowed the data to be grouped into three periods (autumn and summer (AUT-SUM); winter (WIN); spring (SPR)). Through the PCA, it was possible to identify that the most important parameters in contribution to water quality variations are total coliforms (TCOLI) in SUM-AUT, water level (WL), water temperature (WT), and electrical conductivity (EC) in WIN and color (COLOR) and turbidity (TURB) in SPR. PMF was applied to the complete data set and enabled the source apportionment water pollution through three factors, which are related to anthropogenic sources, such as the discharge of domestic sewage (mostly represented by Escherichia coli (ECOLI)), industrial wastewaters, and agriculture runoff. The results provided by this study demonstrate the contribution provided by the use of integrated statistical techniques in the interpretation and understanding of large data sets of water quality, showing also that this approach can be used as an efficient methodology to optimize indicators for water quality assessment.
Single photon detection and signal analysis for high sensitivity dosimetry based on optically stimulated luminescence with beryllium oxide

NASA Astrophysics Data System (ADS)

Radtke, J.; Sponner, J.; Jakobi, C.; Schneider, J.; Sommer, M.; Teichmann, T.; Ullrich, W.; Henniger, J.; Kormoll, T.

2018-01-01

Single photon detection applied to optically stimulated luminescence (OSL) dosimetry is a promising approach due to the low level of luminescence light and the known statistical behavior of single photon events. Time resolved detection allows to apply a variety of different and independent data analysis methods. Furthermore, using amplitude modulated stimulation impresses time- and frequency information into the OSL light and therefore allows for additional means of analysis. Considering the impressed frequency information, data analysis by using Fourier transform algorithms or other digital filters can be used for separating the OSL signal from unwanted light or events generated by other phenomena. This potentially lowers the detection limits of low dose measurements and might improve the reproducibility and stability of obtained data. In this work, an OSL system based on a single photon detector, a fast and accurate stimulation unit and an FPGA is presented. Different analysis algorithms which are applied to the single photon data are discussed.
The statistical big bang of 1911: ideology, technological innovation and the production of medical statistics.

PubMed

Higgs, W

1996-12-01

This paper examines the relationship between intellectual debate, technologies for analysing information, and the production of statistics in the General Register Office (GRO) in London in the early twentieth century. It argues that controversy between eugenicists and public health officials respecting the cause and effect of class-specific variations in fertility led to the introduction of questions in the 1911 census on marital fertility. The increasing complexity of the census necessitated a shift from manual to mechanised forms of data processing within the GRO. The subsequent increase in processing power allowed the GRO to make important changes to the medical and demographic statistics it published in the annual Reports of the Registrar General. These included substituting administrative sanitary districts for registration districts as units of analysis, consistently transferring deaths in institutions back to place of residence, and abstracting deaths according to the International List of Causes of Death.
Robust inference for group sequential trials.

PubMed

Ganju, Jitendra; Lin, Yunzhi; Zhou, Kefei

2017-03-01

For ethical reasons, group sequential trials were introduced to allow trials to stop early in the event of extreme results. Endpoints in such trials are usually mortality or irreversible morbidity. For a given endpoint, the norm is to use a single test statistic and to use that same statistic for each analysis. This approach is risky because the test statistic has to be specified before the study is unblinded, and there is loss in power if the assumptions that ensure optimality for each analysis are not met. To minimize the risk of moderate to substantial loss in power due to a suboptimal choice of a statistic, a robust method was developed for nonsequential trials. The concept is analogous to diversification of financial investments to minimize risk. The method is based on combining P values from multiple test statistics for formal inference while controlling the type I error rate at its designated value.This article evaluates the performance of 2 P value combining methods for group sequential trials. The emphasis is on time to event trials although results from less complex trials are also included. The gain or loss in power with the combination method relative to a single statistic is asymmetric in its favor. Depending on the power of each individual test, the combination method can give more power than any single test or give power that is closer to the test with the most power. The versatility of the method is that it can combine P values from different test statistics for analysis at different times. The robustness of results suggests that inference from group sequential trials can be strengthened with the use of combined tests. Copyright © 2017 John Wiley & Sons, Ltd.
Detailed Uncertainty Analysis of the ZEM-3 Measurement System

NASA Technical Reports Server (NTRS)

Mackey, Jon; Sehirlioglu, Alp; Dynys, Fred

2014-01-01

The measurement of Seebeck coefficient and electrical resistivity are critical to the investigation of all thermoelectric systems. Therefore, it stands that the measurement uncertainty must be well understood to report ZT values which are accurate and trustworthy. A detailed uncertainty analysis of the ZEM-3 measurement system has been performed. The uncertainty analysis calculates error in the electrical resistivity measurement as a result of sample geometry tolerance, probe geometry tolerance, statistical error, and multi-meter uncertainty. The uncertainty on Seebeck coefficient includes probe wire correction factors, statistical error, multi-meter uncertainty, and most importantly the cold-finger effect. The cold-finger effect plagues all potentiometric (four-probe) Seebeck measurement systems, as heat parasitically transfers through thermocouple probes. The effect leads to an asymmetric over-estimation of the Seebeck coefficient. A thermal finite element analysis allows for quantification of the phenomenon, and provides an estimate on the uncertainty of the Seebeck coefficient. The thermoelectric power factor has been found to have an uncertainty of +9-14 at high temperature and 9 near room temperature.
BEAT: Bioinformatics Exon Array Tool to store, analyze and visualize Affymetrix GeneChip Human Exon Array data from disease experiments

PubMed Central

2012-01-01

Background It is known from recent studies that more than 90% of human multi-exon genes are subject to Alternative Splicing (AS), a key molecular mechanism in which multiple transcripts may be generated from a single gene. It is widely recognized that a breakdown in AS mechanisms plays an important role in cellular differentiation and pathologies. Polymerase Chain Reactions, microarrays and sequencing technologies have been applied to the study of transcript diversity arising from alternative expression. Last generation Affymetrix GeneChip Human Exon 1.0 ST Arrays offer a more detailed view of the gene expression profile providing information on the AS patterns. The exon array technology, with more than five million data points, can detect approximately one million exons, and it allows performing analyses at both gene and exon level. In this paper we describe BEAT, an integrated user-friendly bioinformatics framework to store, analyze and visualize exon arrays datasets. It combines a data warehouse approach with some rigorous statistical methods for assessing the AS of genes involved in diseases. Meta statistics are proposed as a novel approach to explore the analysis results. BEAT is available at http://beat.ba.itb.cnr.it. Results BEAT is a web tool which allows uploading and analyzing exon array datasets using standard statistical methods and an easy-to-use graphical web front-end. BEAT has been tested on a dataset with 173 samples and tuned using new datasets of exon array experiments from 28 colorectal cancer and 26 renal cell cancer samples produced at the Medical Genetics Unit of IRCCS Casa Sollievo della Sofferenza. To highlight all possible AS events, alternative names, accession Ids, Gene Ontology terms and biochemical pathways annotations are integrated with exon and gene level expression plots. The user can customize the results choosing custom thresholds for the statistical parameters and exploiting the available clinical data of the samples for a multivariate AS analysis. Conclusions Despite exon array chips being widely used for transcriptomics studies, there is a lack of analysis tools offering advanced statistical features and requiring no programming knowledge. BEAT provides a user-friendly platform for a comprehensive study of AS events in human diseases, displaying the analysis results with easily interpretable and interactive tables and graphics. PMID:22536968

Coloc-stats: a unified web interface to perform colocalization analysis of genomic features.

PubMed

Simovski, Boris; Kanduri, Chakravarthi; Gundersen, Sveinung; Titov, Dmytro; Domanska, Diana; Bock, Christoph; Bossini-Castillo, Lara; Chikina, Maria; Favorov, Alexander; Layer, Ryan M; Mironov, Andrey A; Quinlan, Aaron R; Sheffield, Nathan C; Trynka, Gosia; Sandve, Geir K

2018-06-05

Functional genomics assays produce sets of genomic regions as one of their main outputs. To biologically interpret such region-sets, researchers often use colocalization analysis, where the statistical significance of colocalization (overlap, spatial proximity) between two or more region-sets is tested. Existing colocalization analysis tools vary in the statistical methodology and analysis approaches, thus potentially providing different conclusions for the same research question. As the findings of colocalization analysis are often the basis for follow-up experiments, it is helpful to use several tools in parallel and to compare the results. We developed the Coloc-stats web service to facilitate such analyses. Coloc-stats provides a unified interface to perform colocalization analysis across various analytical methods and method-specific options (e.g. colocalization measures, resolution, null models). Coloc-stats helps the user to find a method that supports their experimental requirements and allows for a straightforward comparison across methods. Coloc-stats is implemented as a web server with a graphical user interface that assists users with configuring their colocalization analyses. Coloc-stats is freely available at https://hyperbrowser.uio.no/coloc-stats/.
How can my research paper be useful for future meta-analyses on forest restoration practices?

Treesearch

Enrique Andivia; Pedro Villar‑Salvador; Juan A. Oliet; Jaime Puertolas; R. Kasten Dumroese

2018-01-01

Statistical meta-analysis is a powerful and useful tool to quantitatively synthesize the information conveyed in published studies on a particular topic. It allows identifying and quantifying overall patterns and exploring causes of variation. The inclusion of published works in meta-analyses requires, however, a minimum quality standard of the reported data and...
Parallel Climate Data Assimilation PSAS Package Achieves 18 GFLOPs on 512-Node Intel Paragon

NASA Technical Reports Server (NTRS)

Ding, H. Q.; Chan, C.; Gennery, D. B.; Ferraro, R. D.

1995-01-01

Several algorithms were added to the Physical-space Statistical Analysis System (PSAS) from Goddard, which assimilates observational weather data by correcting for different levels of uncertainty about the data and different locations for mobile observation platforms. The new algorithms and use of the 512-node Intel Paragon allowed a hundred-fold decrease in processing time.
Estimating clandestine activities from partially observed processes: illegal transhipment of fish catches among vessels

NASA Astrophysics Data System (ADS)

Wilcox, C.; Ford, J.

2016-12-01

Crimes involving fishers impose significant costs on fisheries, managers and national governments. These crimes also lead to unsustainable harvesting practices, as they undermine both knowledge of the status of fisheries stocks and limits on their harvesting. One of the greatest contributors to fisheries crimes globally is transfer of fish catch among vessels, otherwise known as transshipment. While legal transshipment provides economic advantages to vessels by increasing their efficiency, illegal transshipment can allow them to avoid regulations, catch prohibited species, and fish with impunity in prohibited locations such as waters of foreign countries. Despite the presence of a number of monitoring technologies for tracking fishing vessels, transshipment is frequently done clandestinely. Here we present a statistical model for transshipment in a Southeast Asian tuna fishery. We utilize both spatial and temporal information on vessel movement patterns in a statistical model to infer unobserved transshipment events among vessels. We provide a risk analysis framework for forecasting likely transshipment events, based on our analysis of vessel movement patterns. The tools we present are widely applicable to a variety of fisheries and types of tracking data, allowing managers to more effectively screen the large volume of data tracking systems create and quickly identify suspicious behavior.
A user-friendly workflow for analysis of Illumina gene expression bead array data available at the arrayanalysis.org portal.

PubMed

Eijssen, Lars M T; Goelela, Varshna S; Kelder, Thomas; Adriaens, Michiel E; Evelo, Chris T; Radonjic, Marijana

2015-06-30

Illumina whole-genome expression bead arrays are a widely used platform for transcriptomics. Most of the tools available for the analysis of the resulting data are not easily applicable by less experienced users. ArrayAnalysis.org provides researchers with an easy-to-use and comprehensive interface to the functionality of R and Bioconductor packages for microarray data analysis. As a modular open source project, it allows developers to contribute modules that provide support for additional types of data or extend workflows. To enable data analysis of Illumina bead arrays for a broad user community, we have developed a module for ArrayAnalysis.org that provides a free and user-friendly web interface for quality control and pre-processing for these arrays. This module can be used together with existing modules for statistical and pathway analysis to provide a full workflow for Illumina gene expression data analysis. The module accepts data exported from Illumina's GenomeStudio, and provides the user with quality control plots and normalized data. The outputs are directly linked to the existing statistics module of ArrayAnalysis.org, but can also be downloaded for further downstream analysis in third-party tools. The Illumina bead arrays analysis module is available at http://www.arrayanalysis.org . A user guide, a tutorial demonstrating the analysis of an example dataset, and R scripts are available. The module can be used as a starting point for statistical evaluation and pathway analysis provided on the website or to generate processed input data for a broad range of applications in life sciences research.
Using Bayes' theorem for free energy calculations

NASA Astrophysics Data System (ADS)

Rogers, David M.

Statistical mechanics is fundamentally based on calculating the probabilities of molecular-scale events. Although Bayes' theorem has generally been recognized as providing key guiding principals for setup and analysis of statistical experiments [83], classical frequentist models still predominate in the world of computational experimentation. As a starting point for widespread application of Bayesian methods in statistical mechanics, we investigate the central quantity of free energies from this perspective. This dissertation thus reviews the basics of Bayes' view of probability theory, and the maximum entropy formulation of statistical mechanics before providing examples of its application to several advanced research areas. We first apply Bayes' theorem to a multinomial counting problem in order to determine inner shell and hard sphere solvation free energy components of Quasi-Chemical Theory [140]. We proceed to consider the general problem of free energy calculations from samples of interaction energy distributions. From there, we turn to spline-based estimation of the potential of mean force [142], and empirical modeling of observed dynamics using integrator matching. The results of this research are expected to advance the state of the art in coarse-graining methods, as they allow a systematic connection from high-resolution (atomic) to low-resolution (coarse) structure and dynamics. In total, our work on these problems constitutes a critical starting point for further application of Bayes' theorem in all areas of statistical mechanics. It is hoped that the understanding so gained will allow for improvements in comparisons between theory and experiment.
Trace element fingerprinting of jewellery rubies by external beam PIXE

NASA Astrophysics Data System (ADS)

Calligaro, T.; Poirot, J.-P.; Querré, G.

1999-04-01

External beam PIXE analysis allows the non-destructive in situ characterisation of gemstones mounted on jewellery pieces. This technique was used for the determination of the geographical origin of 64 rubies set on a high-valued necklace. The trace element content of these gemstones was measured and compared to that of a set of rubies of known sources. Multivariate statistical processing of the results allowed us to infer the provenance of rubies : one comes from Thailand/Cambodia deposit while the remaining are attributed to Burma. This highlights the complementary capabilities of PIXE and conventional gemological observations.
Reliability and statistical power analysis of cortical and subcortical FreeSurfer metrics in a large sample of healthy elderly.

PubMed

Liem, Franziskus; Mérillat, Susan; Bezzola, Ladina; Hirsiger, Sarah; Philipp, Michel; Madhyastha, Tara; Jäncke, Lutz

2015-03-01

FreeSurfer is a tool to quantify cortical and subcortical brain anatomy automatically and noninvasively. Previous studies have reported reliability and statistical power analyses in relatively small samples or only selected one aspect of brain anatomy. Here, we investigated reliability and statistical power of cortical thickness, surface area, volume, and the volume of subcortical structures in a large sample (N=189) of healthy elderly subjects (64+ years). Reliability (intraclass correlation coefficient) of cortical and subcortical parameters is generally high (cortical: ICCs>0.87, subcortical: ICCs>0.95). Surface-based smoothing increases reliability of cortical thickness maps, while it decreases reliability of cortical surface area and volume. Nevertheless, statistical power of all measures benefits from smoothing. When aiming to detect a 10% difference between groups, the number of subjects required to test effects with sufficient power over the entire cortex varies between cortical measures (cortical thickness: N=39, surface area: N=21, volume: N=81; 10mm smoothing, power=0.8, α=0.05). For subcortical regions this number is between 16 and 76 subjects, depending on the region. We also demonstrate the advantage of within-subject designs over between-subject designs. Furthermore, we publicly provide a tool that allows researchers to perform a priori power analysis and sensitivity analysis to help evaluate previously published studies and to design future studies with sufficient statistical power. Copyright © 2014 Elsevier Inc. All rights reserved.
In vivo evaluation of the effect of stimulus distribution on FIR statistical efficiency in event-related fMRI

PubMed Central

Jansma, J Martijn; de Zwart, Jacco A; van Gelderen, Peter; Duyn, Jeff H; Drevets, Wayne C; Furey, Maura L

2013-01-01

Technical developments in MRI have improved signal to noise, allowing use of analysis methods such as Finite impulse response (FIR) of rapid event related functional MRI (er-fMRI). FIR is one of the most informative analysis methods as it determines onset and full shape of the hemodynamic response function (HRF) without any a-priori assumptions. FIR is however vulnerable to multicollinearity, which is directly related to the distribution of stimuli over time. Efficiency can be optimized by simplifying a design, and restricting stimuli distribution to specific sequences, while more design flexibility necessarily reduces efficiency. However, the actual effect of efficiency on fMRI results has never been tested in vivo. Thus, it is currently difficult to make an informed choice between protocol flexibility and statistical efficiency. The main goal of this study was to assign concrete fMRI signal to noise values to the abstract scale of FIR statistical efficiency. Ten subjects repeated a perception task with five random and m-sequence based protocol, with varying but, according to literature, acceptable levels of multicollinearity. Results indicated substantial differences in signal standard deviation, while the level was a function of multicollinearity. Experiment protocols varied up to 55.4% in standard deviation. Results confirm that quality of fMRI in an FIR analysis can significantly and substantially vary with statistical efficiency. Our in vivo measurements can be used to aid in making an informed decision between freedom in protocol design and statistical efficiency. PMID:23473798
Visual analytics in cheminformatics: user-supervised descriptor selection for QSAR methods.

PubMed

Martínez, María Jimena; Ponzoni, Ignacio; Díaz, Mónica F; Vazquez, Gustavo E; Soto, Axel J

2015-01-01

The design of QSAR/QSPR models is a challenging problem, where the selection of the most relevant descriptors constitutes a key step of the process. Several feature selection methods that address this step are concentrated on statistical associations among descriptors and target properties, whereas the chemical knowledge is left out of the analysis. For this reason, the interpretability and generality of the QSAR/QSPR models obtained by these feature selection methods are drastically affected. Therefore, an approach for integrating domain expert's knowledge in the selection process is needed for increase the confidence in the final set of descriptors. In this paper a software tool, which we named Visual and Interactive DEscriptor ANalysis (VIDEAN), that combines statistical methods with interactive visualizations for choosing a set of descriptors for predicting a target property is proposed. Domain expertise can be added to the feature selection process by means of an interactive visual exploration of data, and aided by statistical tools and metrics based on information theory. Coordinated visual representations are presented for capturing different relationships and interactions among descriptors, target properties and candidate subsets of descriptors. The competencies of the proposed software were assessed through different scenarios. These scenarios reveal how an expert can use this tool to choose one subset of descriptors from a group of candidate subsets or how to modify existing descriptor subsets and even incorporate new descriptors according to his or her own knowledge of the target property. The reported experiences showed the suitability of our software for selecting sets of descriptors with low cardinality, high interpretability, low redundancy and high statistical performance in a visual exploratory way. Therefore, it is possible to conclude that the resulting tool allows the integration of a chemist's expertise in the descriptor selection process with a low cognitive effort in contrast with the alternative of using an ad-hoc manual analysis of the selected descriptors. Graphical abstractVIDEAN allows the visual analysis of candidate subsets of descriptors for QSAR/QSPR. In the two panels on the top, users can interactively explore numerical correlations as well as co-occurrences in the candidate subsets through two interactive graphs.
Relating appendicular skeletal variation of sigmodontine rodents to locomotion modes in a phylogenetic context.

PubMed

Carvalho Coutinho, Ludmilla; Alves de Oliveira, João

2017-10-01

Sigmodontinae rodents constitute the second-largest subfamily among mammals. Alongside the taxonomic diversity, they are also ecologically diverse, exhibiting a wide array of locomotion modes, with semifossorial, terrestrial, semiaquatic, scansorial, arboreal, and saltatorial forms. To understand the ecomorphologic aspects that allow these rodents to display such locomotion diversity, we analyzed 35 qualitative characters of the appendicular skeleton (humerus, ulna, radius, scapula, femur, tibia, ilium, ischium and pubis) in 795 specimens belonging to 64 species, 34 genera and 10 tribes, representing all locomotion modes assigned to this subfamily. We performed a statistical analysis based upon the coefficient of trait differentiation to test the congruence of character states and the different locomotion modes. We also mapped characters states in a molecular phylogeny in order to reconstruct ancestral states and to evaluate how appendicular characters evolved within main lineages of Sigmodontinae radiation under a phylogenetic framework. The statistical analyses revealed six characters related to specific locomotion modes, except terrestrial. The mapping and parsimony ancestral states reconstruction identified two characters with phylogenetical signal and eight characters that are exclusively or more frequently recorded in certain modes of locomotion, four of them also detected by the statistical analysis. Notwithstanding the documented morphological variation, few changes characterize the transition to each of the locomotion modes, at least regarding the appendicular skeleton. This finding corroborates previous results that showed that sigmodontines exhibit an all-purpose appendicular morphology that allows them to use and explore a great variety of habitats. © 2017 Anatomical Society.
Logistic regression applied to natural hazards: rare event logistic regression with replications

NASA Astrophysics Data System (ADS)

Guns, M.; Vanacker, V.

2012-06-01

Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, so-called rare event logistic regression with replications, combines the strength of probabilistic and statistical methods, and allows overcoming some of the limitations of previous developments through robust variable selection. This technique was here developed for the analyses of landslide controlling factors, but the concept is widely applicable for statistical analyses of natural hazards.
Feature-Based Statistical Analysis of Combustion Simulation Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bennett, J; Krishnamoorthy, V; Liu, S

2011-11-18

We present a new framework for feature-based statistical analysis of large-scale scientific data and demonstrate its effectiveness by analyzing features from Direct Numerical Simulations (DNS) of turbulent combustion. Turbulent flows are ubiquitous and account for transport and mixing processes in combustion, astrophysics, fusion, and climate modeling among other disciplines. They are also characterized by coherent structure or organized motion, i.e. nonlocal entities whose geometrical features can directly impact molecular mixing and reactive processes. While traditional multi-point statistics provide correlative information, they lack nonlocal structural information, and hence, fail to provide mechanistic causality information between organized fluid motion and mixing andmore » reactive processes. Hence, it is of great interest to capture and track flow features and their statistics together with their correlation with relevant scalar quantities, e.g. temperature or species concentrations. In our approach we encode the set of all possible flow features by pre-computing merge trees augmented with attributes, such as statistical moments of various scalar fields, e.g. temperature, as well as length-scales computed via spectral analysis. The computation is performed in an efficient streaming manner in a pre-processing step and results in a collection of meta-data that is orders of magnitude smaller than the original simulation data. This meta-data is sufficient to support a fully flexible and interactive analysis of the features, allowing for arbitrary thresholds, providing per-feature statistics, and creating various global diagnostics such as Cumulative Density Functions (CDFs), histograms, or time-series. We combine the analysis with a rendering of the features in a linked-view browser that enables scientists to interactively explore, visualize, and analyze the equivalent of one terabyte of simulation data. We highlight the utility of this new framework for combustion science; however, it is applicable to many other science domains.« less
Illicit and pharmaceutical drug consumption estimated via wastewater analysis. Part B: placing back-calculations in a formal statistical framework.

PubMed

Jones, Hayley E; Hickman, Matthew; Kasprzyk-Hordern, Barbara; Welton, Nicky J; Baker, David R; Ades, A E

2014-07-15

Concentrations of metabolites of illicit drugs in sewage water can be measured with great accuracy and precision, thanks to the development of sensitive and robust analytical methods. Based on assumptions about factors including the excretion profile of the parent drug, routes of administration and the number of individuals using the wastewater system, the level of consumption of a drug can be estimated from such measured concentrations. When presenting results from these 'back-calculations', the multiple sources of uncertainty are often discussed, but are not usually explicitly taken into account in the estimation process. In this paper we demonstrate how these calculations can be placed in a more formal statistical framework by assuming a distribution for each parameter involved, based on a review of the evidence underpinning it. Using a Monte Carlo simulations approach, it is then straightforward to propagate uncertainty in each parameter through the back-calculations, producing a distribution for instead of a single estimate of daily or average consumption. This can be summarised for example by a median and credible interval. To demonstrate this approach, we estimate cocaine consumption in a large urban UK population, using measured concentrations of two of its metabolites, benzoylecgonine and norbenzoylecgonine. We also demonstrate a more sophisticated analysis, implemented within a Bayesian statistical framework using Markov chain Monte Carlo simulation. Our model allows the two metabolites to simultaneously inform estimates of daily cocaine consumption and explicitly allows for variability between days. After accounting for this variability, the resulting credible interval for average daily consumption is appropriately wider, representing additional uncertainty. We discuss possibilities for extensions to the model, and whether analysis of wastewater samples has potential to contribute to a prevalence model for illicit drug use. Copyright © 2014. Published by Elsevier B.V.
Illicit and pharmaceutical drug consumption estimated via wastewater analysis. Part B: Placing back-calculations in a formal statistical framework

PubMed Central

Jones, Hayley E.; Hickman, Matthew; Kasprzyk-Hordern, Barbara; Welton, Nicky J.; Baker, David R.; Ades, A.E.

2014-01-01

Concentrations of metabolites of illicit drugs in sewage water can be measured with great accuracy and precision, thanks to the development of sensitive and robust analytical methods. Based on assumptions about factors including the excretion profile of the parent drug, routes of administration and the number of individuals using the wastewater system, the level of consumption of a drug can be estimated from such measured concentrations. When presenting results from these ‘back-calculations’, the multiple sources of uncertainty are often discussed, but are not usually explicitly taken into account in the estimation process. In this paper we demonstrate how these calculations can be placed in a more formal statistical framework by assuming a distribution for each parameter involved, based on a review of the evidence underpinning it. Using a Monte Carlo simulations approach, it is then straightforward to propagate uncertainty in each parameter through the back-calculations, producing a distribution for instead of a single estimate of daily or average consumption. This can be summarised for example by a median and credible interval. To demonstrate this approach, we estimate cocaine consumption in a large urban UK population, using measured concentrations of two of its metabolites, benzoylecgonine and norbenzoylecgonine. We also demonstrate a more sophisticated analysis, implemented within a Bayesian statistical framework using Markov chain Monte Carlo simulation. Our model allows the two metabolites to simultaneously inform estimates of daily cocaine consumption and explicitly allows for variability between days. After accounting for this variability, the resulting credible interval for average daily consumption is appropriately wider, representing additional uncertainty. We discuss possibilities for extensions to the model, and whether analysis of wastewater samples has potential to contribute to a prevalence model for illicit drug use. PMID:24636801
Statistical tools for transgene copy number estimation based on real-time PCR.

PubMed

Yuan, Joshua S; Burris, Jason; Stewart, Nathan R; Mentewab, Ayalew; Stewart, C Neal

2007-11-01

As compared with traditional transgene copy number detection technologies such as Southern blot analysis, real-time PCR provides a fast, inexpensive and high-throughput alternative. However, the real-time PCR based transgene copy number estimation tends to be ambiguous and subjective stemming from the lack of proper statistical analysis and data quality control to render a reliable estimation of copy number with a prediction value. Despite the recent progresses in statistical analysis of real-time PCR, few publications have integrated these advancements in real-time PCR based transgene copy number determination. Three experimental designs and four data quality control integrated statistical models are presented. For the first method, external calibration curves are established for the transgene based on serially-diluted templates. The Ct number from a control transgenic event and putative transgenic event are compared to derive the transgene copy number or zygosity estimation. Simple linear regression and two group T-test procedures were combined to model the data from this design. For the second experimental design, standard curves were generated for both an internal reference gene and the transgene, and the copy number of transgene was compared with that of internal reference gene. Multiple regression models and ANOVA models can be employed to analyze the data and perform quality control for this approach. In the third experimental design, transgene copy number is compared with reference gene without a standard curve, but rather, is based directly on fluorescence data. Two different multiple regression models were proposed to analyze the data based on two different approaches of amplification efficiency integration. Our results highlight the importance of proper statistical treatment and quality control integration in real-time PCR-based transgene copy number determination. These statistical methods allow the real-time PCR-based transgene copy number estimation to be more reliable and precise with a proper statistical estimation. Proper confidence intervals are necessary for unambiguous prediction of trangene copy number. The four different statistical methods are compared for their advantages and disadvantages. Moreover, the statistical methods can also be applied for other real-time PCR-based quantification assays including transfection efficiency analysis and pathogen quantification.
DETECTING UNSPECIFIED STRUCTURE IN LOW-COUNT IMAGES

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stein, Nathan M.; Dyk, David A. van; Kashyap, Vinay L.

Unexpected structure in images of astronomical sources often presents itself upon visual inspection of the image, but such apparent structure may either correspond to true features in the source or be due to noise in the data. This paper presents a method for testing whether inferred structure in an image with Poisson noise represents a significant departure from a baseline (null) model of the image. To infer image structure, we conduct a Bayesian analysis of a full model that uses a multiscale component to allow flexible departures from the posited null model. As a test statistic, we use a tailmore » probability of the posterior distribution under the full model. This choice of test statistic allows us to estimate a computationally efficient upper bound on a p-value that enables us to draw strong conclusions even when there are limited computational resources that can be devoted to simulations under the null model. We demonstrate the statistical performance of our method on simulated images. Applying our method to an X-ray image of the quasar 0730+257, we find significant evidence against the null model of a single point source and uniform background, lending support to the claim of an X-ray jet.« less
A statistical study of EMIC waves observed by Cluster: 1. Wave properties

NASA Astrophysics Data System (ADS)

Allen, R. C.; Zhang, J.-C.; Kistler, L. M.; Spence, H. E.; Lin, R.-L.; Klecker, B.; Dunlop, M. W.; André, M.; Jordanova, V. K.

2015-07-01

Electromagnetic ion cyclotron (EMIC) waves are an important mechanism for particle energization and losses inside the magnetosphere. In order to better understand the effects of these waves on particle dynamics, detailed information about the occurrence rate, wave power, ellipticity, normal angle, energy propagation angle distributions, and local plasma parameters are required. Previous statistical studies have used in situ observations to investigate the distribution of these parameters in the magnetic local time versus L-shell (MLT-L) frame within a limited magnetic latitude (MLAT) range. In this study, we present a statistical analysis of EMIC wave properties using 10 years (2001-2010) of data from Cluster, totaling 25,431 min of wave activity. Due to the polar orbit of Cluster, we are able to investigate EMIC waves at all MLATs and MLTs. This allows us to further investigate the MLAT dependence of various wave properties inside different MLT sectors and further explore the effects of Shabansky orbits on EMIC wave generation and propagation. The statistical analysis is presented in two papers. This paper focuses on the wave occurrence distribution as well as the distribution of wave properties. The companion paper focuses on local plasma parameters during wave observations as well as wave generation proxies.
Statistical characterization of planar two-dimensional Rayleigh-Taylor mixing layers

NASA Astrophysics Data System (ADS)

Sendersky, Dmitry

2000-10-01

The statistical evolution of a planar, randomly perturbed fluid interface subject to Rayleigh-Taylor instability is explored through numerical simulation in two space dimensions. The data set, generated by the front-tracking code FronTier, is highly resolved and covers a large ensemble of initial perturbations, allowing a more refined analysis of closure issues pertinent to the stochastic modeling of chaotic fluid mixing. We closely approach a two-fold convergence of the mean two-phase flow: convergence of the numerical solution under computational mesh refinement, and statistical convergence under increasing ensemble size. Quantities that appear in the two-phase averaged Euler equations are computed directly and analyzed for numerical and statistical convergence. Bulk averages show a high degree of convergence, while interfacial averages are convergent only in the outer portions of the mixing zone, where there is a coherent array of bubble and spike tips. Comparison with the familiar bubble/spike penetration law h = alphaAgt 2 is complicated by the lack of scale invariance, inability to carry the simulations to late time, the increasing Mach numbers of the bubble/spike tips, and sensitivity to the method of data analysis. Finally, we use the simulation data to analyze some constitutive properties of the mixing process.
Scaling Laws in Canopy Flows: A Wind-Tunnel Analysis

NASA Astrophysics Data System (ADS)

Segalini, Antonio; Fransson, Jens H. M.; Alfredsson, P. Henrik

2013-08-01

An analysis of velocity statistics and spectra measured above a wind-tunnel forest model is reported. Several measurement stations downstream of the forest edge have been investigated and it is observed that, while the mean velocity profile adjusts quickly to the new canopy boundary condition, the turbulence lags behind and shows a continuous penetration towards the free stream along the canopy model. The statistical profiles illustrate this growth and do not collapse when plotted as a function of the vertical coordinate. However, when the statistics are plotted as function of the local mean velocity (normalized with a characteristic velocity scale), they do collapse, independently of the streamwise position and freestream velocity. A new scaling for the spectra of all three velocity components is proposed based on the velocity variance and integral time scale. This normalization improves the collapse of the spectra compared to existing scalings adopted in atmospheric measurements, and allows the determination of a universal function that provides the velocity spectrum. Furthermore, a comparison of the proposed scaling laws for two different canopy densities is shown, demonstrating that the vertical velocity variance is the most sensible statistical quantity to the characteristics of the canopy roughness.

Digital correlation detector for low-cost Omega navigation

NASA Technical Reports Server (NTRS)

Chamberlin, K. A.

1976-01-01

Techniques to lower the cost of using the Omega global navigation network with phase-locked loops (PLL) were developed. The technique that was accepted as being "optimal" is called the memory-aided phase-locked loop (MAPLL) since it allows operation on all eight Omega time slots with one PLL through the implementation of a random access memory. The receiver front-end and the signals that it transmits to the PLL were first described. A brief statistical analysis of these signals was then made to allow a rough comparison between the front-end presented in this work and a commercially available front-end to be made. The hardware and theory of application of the MAPLL were described, ending with an analysis of data taken with the MAPLL. Some conclusions and recommendations were also given.
Algorithm for computing descriptive statistics for very large data sets and the exa-scale era

NASA Astrophysics Data System (ADS)

Beekman, Izaak

2017-11-01

An algorithm for Single-point, Parallel, Online, Converging Statistics (SPOCS) is presented. It is suited for in situ analysis that traditionally would be relegated to post-processing, and can be used to monitor the statistical convergence and estimate the error/residual in the quantity-useful for uncertainty quantification too. Today, data may be generated at an overwhelming rate by numerical simulations and proliferating sensing apparatuses in experiments and engineering applications. Monitoring descriptive statistics in real time lets costly computations and experiments be gracefully aborted if an error has occurred, and monitoring the level of statistical convergence allows them to be run for the shortest amount of time required to obtain good results. This algorithm extends work by Pébay (Sandia Report SAND2008-6212). Pébay's algorithms are recast into a converging delta formulation, with provably favorable properties. The mean, variance, covariances and arbitrary higher order statistical moments are computed in one pass. The algorithm is tested using Sillero, Jiménez, & Moser's (2013, 2014) publicly available UPM high Reynolds number turbulent boundary layer data set, demonstrating numerical robustness, efficiency and other favorable properties.
Capturing rogue waves by multi-point statistics

NASA Astrophysics Data System (ADS)

Hadjihosseini, A.; Wächter, Matthias; Hoffmann, N. P.; Peinke, J.

2016-01-01

As an example of a complex system with extreme events, we investigate ocean wave states exhibiting rogue waves. We present a statistical method of data analysis based on multi-point statistics which for the first time allows the grasping of extreme rogue wave events in a highly satisfactory statistical manner. The key to the success of the approach is mapping the complexity of multi-point data onto the statistics of hierarchically ordered height increments for different time scales, for which we can show that a stochastic cascade process with Markov properties is governed by a Fokker-Planck equation. Conditional probabilities as well as the Fokker-Planck equation itself can be estimated directly from the available observational data. With this stochastic description surrogate data sets can in turn be generated, which makes it possible to work out arbitrary statistical features of the complex sea state in general, and extreme rogue wave events in particular. The results also open up new perspectives for forecasting the occurrence probability of extreme rogue wave events, and even for forecasting the occurrence of individual rogue waves based on precursory dynamics.
Statistics on continuous IBD data: Exact distribution evaluation for a pair of full(half)-sibs and a pair of a (great-) grandchild with a (great-) grandparent

PubMed Central

Stefanov, Valeri T

2002-01-01

Background Pairs of related individuals are widely used in linkage analysis. Most of the tests for linkage analysis are based on statistics associated with identity by descent (IBD) data. The current biotechnology provides data on very densely packed loci, and therefore, it may provide almost continuous IBD data for pairs of closely related individuals. Therefore, the distribution theory for statistics on continuous IBD data is of interest. In particular, distributional results which allow the evaluation of p-values for relevant tests are of importance. Results A technology is provided for numerical evaluation, with any given accuracy, of the cumulative probabilities of some statistics on continuous genome data for pairs of closely related individuals. In the case of a pair of full-sibs, the following statistics are considered: (i) the proportion of genome with 2 (at least 1) haplotypes shared identical-by-descent (IBD) on a chromosomal segment, (ii) the number of distinct pieces (subsegments) of a chromosomal segment, on each of which exactly 2 (at least 1) haplotypes are shared IBD. The natural counterparts of these statistics for the other relationships are also considered. Relevant Maple codes are provided for a rapid evaluation of the cumulative probabilities of such statistics. The genomic continuum model, with Haldane's model for the crossover process, is assumed. Conclusions A technology, together with relevant software codes for its automated implementation, are provided for exact evaluation of the distributions of relevant statistics associated with continuous genome data on closely related individuals. PMID:11996673
pROC: an open-source package for R and S+ to analyze and compare ROC curves.

PubMed

Robin, Xavier; Turck, Natacha; Hainard, Alexandre; Tiberti, Natalia; Lisacek, Frédérique; Sanchez, Jean-Charles; Müller, Markus

2011-03-17

Receiver operating characteristic (ROC) curves are useful tools to evaluate classifiers in biomedical and bioinformatics applications. However, conclusions are often reached through inconsistent use or insufficient statistical analysis. To support researchers in their ROC curves analysis we developed pROC, a package for R and S+ that contains a set of tools displaying, analyzing, smoothing and comparing ROC curves in a user-friendly, object-oriented and flexible interface. With data previously imported into the R or S+ environment, the pROC package builds ROC curves and includes functions for computing confidence intervals, statistical tests for comparing total or partial area under the curve or the operating points of different classifiers, and methods for smoothing ROC curves. Intermediary and final results are visualised in user-friendly interfaces. A case study based on published clinical and biomarker data shows how to perform a typical ROC analysis with pROC. pROC is a package for R and S+ specifically dedicated to ROC analysis. It proposes multiple statistical tests to compare ROC curves, and in particular partial areas under the curve, allowing proper ROC interpretation. pROC is available in two versions: in the R programming language or with a graphical user interface in the S+ statistical software. It is accessible at http://expasy.org/tools/pROC/ under the GNU General Public License. It is also distributed through the CRAN and CSAN public repositories, facilitating its installation.
An Introduction to MAMA (Meta-Analysis of MicroArray data) System.

PubMed

Zhang, Zhe; Fenstermacher, David

2005-01-01

Analyzing microarray data across multiple experiments has been proven advantageous. To support this kind of analysis, we are developing a software system called MAMA (Meta-Analysis of MicroArray data). MAMA utilizes a client-server architecture with a relational database on the server-side for the storage of microarray datasets collected from various resources. The client-side is an application running on the end user's computer that allows the user to manipulate microarray data and analytical results locally. MAMA implementation will integrate several analytical methods, including meta-analysis within an open-source framework offering other developers the flexibility to plug in additional statistical algorithms.
A common base method for analysis of qPCR data and the application of simple blocking in qPCR experiments.

PubMed

Ganger, Michael T; Dietz, Geoffrey D; Ewing, Sarah J

2017-12-01

qPCR has established itself as the technique of choice for the quantification of gene expression. Procedures for conducting qPCR have received significant attention; however, more rigorous approaches to the statistical analysis of qPCR data are needed. Here we develop a mathematical model, termed the Common Base Method, for analysis of qPCR data based on threshold cycle values (C q ) and efficiencies of reactions (E). The Common Base Method keeps all calculations in the logscale as long as possible by working with log 10 (E) ∙ C q , which we call the efficiency-weighted C q value; subsequent statistical analyses are then applied in the logscale. We show how efficiency-weighted C q values may be analyzed using a simple paired or unpaired experimental design and develop blocking methods to help reduce unexplained variation. The Common Base Method has several advantages. It allows for the incorporation of well-specific efficiencies and multiple reference genes. The method does not necessitate the pairing of samples that must be performed using traditional analysis methods in order to calculate relative expression ratios. Our method is also simple enough to be implemented in any spreadsheet or statistical software without additional scripts or proprietary components.
Climate Considerations Of The Electricity Supply Systems In Industries

NASA Astrophysics Data System (ADS)

Asset, Khabdullin; Zauresh, Khabdullina

2014-12-01

The study is focused on analysis of climate considerations of electricity supply systems in a pellet industry. The developed analysis model consists of two modules: statistical data of active power losses evaluation module and climate aspects evaluation module. The statistical data module is presented as a universal mathematical model of electrical systems and components of industrial load. It forms a basis for detailed accounting of power loss from the voltage levels. On the basis of the universal model, a set of programs is designed to perform the calculation and experimental research. It helps to obtain the statistical characteristics of the power losses and loads of the electricity supply systems and to define the nature of changes in these characteristics. Within the module, several methods and algorithms for calculating parameters of equivalent circuits of low- and high-voltage ADC and SD with a massive smooth rotor with laminated poles are developed. The climate aspects module includes an analysis of the experimental data of power supply system in pellet production. It allows identification of GHG emission reduction parameters: operation hours, type of electrical motors, values of load factor and deviation of standard value of voltage.
Applications of Ergodic Theory to Coverage Analysis

NASA Technical Reports Server (NTRS)

Lo, Martin W.

2003-01-01

The study of differential equations, or dynamical systems in general, has two fundamentally different approaches. We are most familiar with the construction of solutions to differential equations. Another approach is to study the statistical behavior of the solutions. Ergodic Theory is one of the most developed methods to study the statistical behavior of the solutions of differential equations. In the theory of satellite orbits, the statistical behavior of the orbits is used to produce 'Coverage Analysis' or how often a spacecraft is in view of a site on the ground. In this paper, we consider the use of Ergodic Theory for Coverage Analysis. This allows us to greatly simplify the computation of quantities such as the total time for which a ground station can see a satellite without ever integrating the trajectory, see Lo 1,2. More over, for any quantity which is an integrable function of the ground track, its average may be computed similarly without the integration of the trajectory. For example, the data rate for a simple telecom system is a function of the distance between the satellite and the ground station. We show that such a function may be averaged using the Ergodic Theorem.
Compositional data analysis for physical activity, sedentary time and sleep research.

PubMed

Dumuid, Dorothea; Stanford, Tyman E; Martin-Fernández, Josep-Antoni; Pedišić, Željko; Maher, Carol A; Lewis, Lucy K; Hron, Karel; Katzmarzyk, Peter T; Chaput, Jean-Philippe; Fogelholm, Mikael; Hu, Gang; Lambert, Estelle V; Maia, José; Sarmiento, Olga L; Standage, Martyn; Barreira, Tiago V; Broyles, Stephanie T; Tudor-Locke, Catrine; Tremblay, Mark S; Olds, Timothy

2017-01-01

The health effects of daily activity behaviours (physical activity, sedentary time and sleep) are widely studied. While previous research has largely examined activity behaviours in isolation, recent studies have adjusted for multiple behaviours. However, the inclusion of all activity behaviours in traditional multivariate analyses has not been possible due to the perfect multicollinearity of 24-h time budget data. The ensuing lack of adjustment for known effects on the outcome undermines the validity of study findings. We describe a statistical approach that enables the inclusion of all daily activity behaviours, based on the principles of compositional data analysis. Using data from the International Study of Childhood Obesity, Lifestyle and the Environment, we demonstrate the application of compositional multiple linear regression to estimate adiposity from children's daily activity behaviours expressed as isometric log-ratio coordinates. We present a novel method for predicting change in a continuous outcome based on relative changes within a composition, and for calculating associated confidence intervals to allow for statistical inference. The compositional data analysis presented overcomes the lack of adjustment that has plagued traditional statistical methods in the field, and provides robust and reliable insights into the health effects of daily activity behaviours.
Using Social Network Analysis to Better Understand Compulsive Exercise Behavior Among a Sample of Sorority Members.

PubMed

Patterson, Megan S; Goodson, Patricia

2017-05-01

Compulsive exercise, a form of unhealthy exercise often associated with prioritizing exercise and feeling guilty when exercise is missed, is a common precursor to and symptom of eating disorders. College-aged women are at high risk of exercising compulsively compared with other groups. Social network analysis (SNA) is a theoretical perspective and methodology allowing researchers to observe the effects of relational dynamics on the behaviors of people. SNA was used to assess the relationship between compulsive exercise and body dissatisfaction, physical activity, and network variables. Descriptive statistics were conducted using SPSS, and quadratic assignment procedure (QAP) analyses were conducted using UCINET. QAP regression analysis revealed a statistically significant model (R 2 = .375, P < .0001) predicting compulsive exercise behavior. Physical activity, body dissatisfaction, and network variables were statistically significant predictor variables in the QAP regression model. In our sample, women who are connected to "important" or "powerful" people in their network are likely to have higher compulsive exercise scores. This result provides healthcare practitioners key target points for intervention within similar groups of women. For scholars researching eating disorders and associated behaviors, this study supports looking into group dynamics and network structure in conjunction with body dissatisfaction and exercise frequency.
Bubble statistics in aged wet foams and the Fokker-Planck equation

NASA Astrophysics Data System (ADS)

Zimnyakov, D. A.; Yuvchenko, S. A.; Tzyipin, D. V.; Samorodina, T. V.

2018-04-01

Results of the experimental study of changes in the bubble size statistics during aging of wet foams are discussed. It is proposed that the evolution of the bubble radii distributions can be described in terms of the one dimensional Fokker- Planck equation. The empirical distributions of the bubble radii exhibit a self-similarity of their shapes and can be transformed to a time-independent form using the radius renormalization. Analysis of obtained data allows us to suggest that the drift term of the Fokker-Planck equation dominates in comparison with the diffusion term in the case of aging of isolated quasi-stable wet foams.
Unified functional network and nonlinear time series analysis for complex systems science: The pyunicorn package

NASA Astrophysics Data System (ADS)

Donges, Jonathan F.; Heitzig, Jobst; Beronov, Boyan; Wiedermann, Marc; Runge, Jakob; Feng, Qing Yi; Tupikina, Liubov; Stolbova, Veronika; Donner, Reik V.; Marwan, Norbert; Dijkstra, Henk A.; Kurths, Jürgen

2015-11-01

We introduce the pyunicorn (Pythonic unified complex network and recurrence analysis toolbox) open source software package for applying and combining modern methods of data analysis and modeling from complex network theory and nonlinear time series analysis. pyunicorn is a fully object-oriented and easily parallelizable package written in the language Python. It allows for the construction of functional networks such as climate networks in climatology or functional brain networks in neuroscience representing the structure of statistical interrelationships in large data sets of time series and, subsequently, investigating this structure using advanced methods of complex network theory such as measures and models for spatial networks, networks of interacting networks, node-weighted statistics, or network surrogates. Additionally, pyunicorn provides insights into the nonlinear dynamics of complex systems as recorded in uni- and multivariate time series from a non-traditional perspective by means of recurrence quantification analysis, recurrence networks, visibility graphs, and construction of surrogate time series. The range of possible applications of the library is outlined, drawing on several examples mainly from the field of climatology.
Power in randomized group comparisons: the value of adding a single intermediate time point to a traditional pretest-posttest design.

PubMed

Venter, Anre; Maxwell, Scott E; Bolig, Erika

2002-06-01

Adding a pretest as a covariate to a randomized posttest-only design increases statistical power, as does the addition of intermediate time points to a randomized pretest-posttest design. Although typically 5 waves of data are required in this instance to produce meaningful gains in power, a 3-wave intensive design allows the evaluation of the straight-line growth model and may reduce the effect of missing data. The authors identify the statistically most powerful method of data analysis in the 3-wave intensive design. If straight-line growth is assumed, the pretest-posttest slope must assume fairly extreme values for the intermediate time point to increase power beyond the standard analysis of covariance on the posttest with the pretest as covariate, ignoring the intermediate time point.
On an Additive Semigraphoid Model for Statistical Networks With Application to Pathway Analysis.

PubMed

Li, Bing; Chun, Hyonho; Zhao, Hongyu

2014-09-01

We introduce a nonparametric method for estimating non-gaussian graphical models based on a new statistical relation called additive conditional independence, which is a three-way relation among random vectors that resembles the logical structure of conditional independence. Additive conditional independence allows us to use one-dimensional kernel regardless of the dimension of the graph, which not only avoids the curse of dimensionality but also simplifies computation. It also gives rise to a parallel structure to the gaussian graphical model that replaces the precision matrix by an additive precision operator. The estimators derived from additive conditional independence cover the recently introduced nonparanormal graphical model as a special case, but outperform it when the gaussian copula assumption is violated. We compare the new method with existing ones by simulations and in genetic pathway analysis.
EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis.

PubMed

Delorme, Arnaud; Makeig, Scott

2004-03-15

We have developed a toolbox and graphic user interface, EEGLAB, running under the crossplatform MATLAB environment (The Mathworks, Inc.) for processing collections of single-trial and/or averaged EEG data of any number of channels. Available functions include EEG data, channel and event information importing, data visualization (scrolling, scalp map and dipole model plotting, plus multi-trial ERP-image plots), preprocessing (including artifact rejection, filtering, epoch selection, and averaging), independent component analysis (ICA) and time/frequency decompositions including channel and component cross-coherence supported by bootstrap statistical methods based on data resampling. EEGLAB functions are organized into three layers. Top-layer functions allow users to interact with the data through the graphic interface without needing to use MATLAB syntax. Menu options allow users to tune the behavior of EEGLAB to available memory. Middle-layer functions allow users to customize data processing using command history and interactive 'pop' functions. Experienced MATLAB users can use EEGLAB data structures and stand-alone signal processing functions to write custom and/or batch analysis scripts. Extensive function help and tutorial information are included. A 'plug-in' facility allows easy incorporation of new EEG modules into the main menu. EEGLAB is freely available (http://www.sccn.ucsd.edu/eeglab/) under the GNU public license for noncommercial use and open source development, together with sample data, user tutorial and extensive documentation.
SU-F-T-227: A Comprehensive Patient Specific, Structure Specific, Pre-Treatment 3D QA Protocol for IMRT, SBRT and VMAT - Clinical Experience

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gueorguiev, G; Cotter, C; Young, M

2016-06-15

Purpose: To present a 3D QA method and clinical results for 550 patients. Methods: Five hundred and fifty patient treatment deliveries (400 IMRT, 75 SBRT and 75 VMAT) from various treatment sites, planned on Raystation treatment planning system (TPS), were measured on three beam-matched Elekta linear accelerators using IBA’s COMPASS system. The difference between TPS computed and delivered dose was evaluated in 3D by applying three statistical parameters to each structure of interest: absolute average dose difference (AADD, 6% allowed difference), absolute dose difference greater than 6% (ADD6, 4% structure volume allowed to fail) and 3D gamma test (3%/3mm DTA,more » 4% structure volume allowed to fail). If the allowed value was not met for a given structure, manual review was performed. The review consisted of overlaying dose difference or gamma results with the patient CT, scrolling through the slices. For QA to pass, areas of high dose difference or gamma must be small and not on consecutive slices. For AADD to manually pass QA, the average dose difference in cGy must be less than 50cGy. The QA protocol also includes DVH analysis based on QUANTEC and TG-101 recommended dose constraints. Results: Figures 1–3 show the results for the three parameters per treatment modality. Manual review was performed on 67 deliveries (27 IMRT, 22 SBRT and 18 VMAT), for which all passed QA. Results show that statistical parameter AADD may be overly sensitive for structures receiving low dose, especially for the SBRT deliveries (Fig.1). The TPS computed and measured DVH values were in excellent agreement and with minimum difference. Conclusion: Applying DVH analysis and different statistical parameters to any structure of interest, as part of the 3D QA protocol, provides a comprehensive treatment plan evaluation. Author G. Gueorguiev discloses receiving travel and research funding from IBA for unrelated to this project work. Author B. Crawford discloses receiving travel funding from IBA for unrelated to this project work.« less
Identification of phases, symmetries and defects through local crystallography

DOE PAGES

Belianinov, Alex; He, Qian; Kravchenko, Mikhail; ...

2015-07-20

Here we report that advances in electron and probe microscopies allow 10 pm or higher precision in measurements of atomic positions. This level of fidelity is sufficient to correlate the length (and hence energy) of bonds, as well as bond angles to functional properties of materials. Traditionally, this relied on mapping locally measured parameters to macroscopic variables, for example, average unit cell. This description effectively ignores the information contained in the microscopic degrees of freedom available in a high-resolution image. Here we introduce an approach for local analysis of material structure based on statistical analysis of individual atomic neighbourhoods. Clusteringmore » and multivariate algorithms such as principal component analysis explore the connectivity of lattice and bond structure, as well as identify minute structural distortions, thus allowing for chemical description and identification of phases. This analysis lays the framework for building image genomes and structure–property libraries, based on conjoining structural and spectral realms through local atomic behaviour.« less
Combined acute ecotoxicity of malathion and deltamethrin to Daphnia magna (Crustacea, Cladocera): comparison of different data analysis approaches.

PubMed

Toumi, Héla; Boumaiza, Moncef; Millet, Maurice; Radetski, Claudemir Marcos; Camara, Baba Issa; Felten, Vincent; Masfaraud, Jean-François; Férard, Jean-François

2018-04-19

We studied the combined acute effect (i.e., after 48 h) of deltamethrin (a pyrethroid insecticide) and malathion (an organophosphate insecticide) on Daphnia magna. Two approaches were used to examine the potential interaction effects of eight mixtures of deltamethrin and malathion: (i) calculation of mixture toxicity index (MTI) and safety factor index (SFI) and (ii) response surface methodology coupled with isobole-based statistical model (using generalized linear model). According to the calculation of MTI and SFI, one tested mixture was found additive while the two other tested mixtures were found no additive (MTI) or antagonistic (SFI), but these differences between index responses are only due to differences in terminology related to these two indexes. Through the surface response approach and isobologram analysis, we concluded that there was a significant antagonistic effect of the binary mixtures of deltamethrin and malathion that occurs on D. magna immobilization, after 48 h of exposure. Index approaches and surface response approach with isobologram analysis are complementary. Calculation of mixture toxicity index and safety factor index allows identifying punctually the type of interaction for several tested mixtures, while the surface response approach with isobologram analysis integrates all the data providing a global outcome about the type of interactive effect. Only the surface response approach and isobologram analysis allowed the statistical assessment of the ecotoxicological interaction. Nevertheless, we recommend the use of both approaches (i) to identify the combined effects of contaminants and (ii) to improve risk assessment and environmental management.
MGAS: a powerful tool for multivariate gene-based genome-wide association analysis.

PubMed

Van der Sluis, Sophie; Dolan, Conor V; Li, Jiang; Song, Youqiang; Sham, Pak; Posthuma, Danielle; Li, Miao-Xin

2015-04-01

Standard genome-wide association studies, testing the association between one phenotype and a large number of single nucleotide polymorphisms (SNPs), are limited in two ways: (i) traits are often multivariate, and analysis of composite scores entails loss in statistical power and (ii) gene-based analyses may be preferred, e.g. to decrease the multiple testing problem. Here we present a new method, multivariate gene-based association test by extended Simes procedure (MGAS), that allows gene-based testing of multivariate phenotypes in unrelated individuals. Through extensive simulation, we show that under most trait-generating genotype-phenotype models MGAS has superior statistical power to detect associated genes compared with gene-based analyses of univariate phenotypic composite scores (i.e. GATES, multiple regression), and multivariate analysis of variance (MANOVA). Re-analysis of metabolic data revealed 32 False Discovery Rate controlled genome-wide significant genes, and 12 regions harboring multiple genes; of these 44 regions, 30 were not reported in the original analysis. MGAS allows researchers to conduct their multivariate gene-based analyses efficiently, and without the loss of power that is often associated with an incorrectly specified genotype-phenotype models. MGAS is freely available in KGG v3.0 (http://statgenpro.psychiatry.hku.hk/limx/kgg/download.php). Access to the metabolic dataset can be requested at dbGaP (https://dbgap.ncbi.nlm.nih.gov/). The R-simulation code is available from http://ctglab.nl/people/sophie_van_der_sluis. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.

GARNET--gene set analysis with exploration of annotation relations.

PubMed

Rho, Kyoohyoung; Kim, Bumjin; Jang, Youngjun; Lee, Sanghyun; Bae, Taejeong; Seo, Jihae; Seo, Chaehwa; Lee, Jihyun; Kang, Hyunjung; Yu, Ungsik; Kim, Sunghoon; Lee, Sanghyuk; Kim, Wan Kyu

2011-02-15

Gene set analysis is a powerful method of deducing biological meaning for an a priori defined set of genes. Numerous tools have been developed to test statistical enrichment or depletion in specific pathways or gene ontology (GO) terms. Major difficulties towards biological interpretation are integrating diverse types of annotation categories and exploring the relationships between annotation terms of similar information. GARNET (Gene Annotation Relationship NEtwork Tools) is an integrative platform for gene set analysis with many novel features. It includes tools for retrieval of genes from annotation database, statistical analysis & visualization of annotation relationships, and managing gene sets. In an effort to allow access to a full spectrum of amassed biological knowledge, we have integrated a variety of annotation data that include the GO, domain, disease, drug, chromosomal location, and custom-defined annotations. Diverse types of molecular networks (pathways, transcription and microRNA regulations, protein-protein interaction) are also included. The pair-wise relationship between annotation gene sets was calculated using kappa statistics. GARNET consists of three modules--gene set manager, gene set analysis and gene set retrieval, which are tightly integrated to provide virtually automatic analysis for gene sets. A dedicated viewer for annotation network has been developed to facilitate exploration of the related annotations. GARNET (gene annotation relationship network tools) is an integrative platform for diverse types of gene set analysis, where complex relationships among gene annotations can be easily explored with an intuitive network visualization tool (http://garnet.isysbio.org/ or http://ercsb.ewha.ac.kr/garnet/).
Persistence of discrimination: Revisiting Axtell, Epstein and Young

NASA Astrophysics Data System (ADS)

Weisbuch, Gérard

2018-02-01

We reformulate an earlier model of the "Emergence of classes..." proposed by Axtell et al. (2001) using more elaborate cognitive processes allowing a statistical physics approach. The thorough analysis of the phase space and of the basins of attraction leads to a reconsideration of the previous social interpretations: our model predicts the reinforcement of discrimination biases and their long term stability rather than the emergence of classes.
GeneTools--application for functional annotation and statistical hypothesis testing.

PubMed

Beisvag, Vidar; Jünge, Frode K R; Bergum, Hallgeir; Jølsum, Lars; Lydersen, Stian; Günther, Clara-Cecilie; Ramampiaro, Heri; Langaas, Mette; Sandvik, Arne K; Laegreid, Astrid

2006-10-24

Modern biology has shifted from "one gene" approaches to methods for genomic-scale analysis like microarray technology, which allow simultaneous measurement of thousands of genes. This has created a need for tools facilitating interpretation of biological data in "batch" mode. However, such tools often leave the investigator with large volumes of apparently unorganized information. To meet this interpretation challenge, gene-set, or cluster testing has become a popular analytical tool. Many gene-set testing methods and software packages are now available, most of which use a variety of statistical tests to assess the genes in a set for biological information. However, the field is still evolving, and there is a great need for "integrated" solutions. GeneTools is a web-service providing access to a database that brings together information from a broad range of resources. The annotation data are updated weekly, guaranteeing that users get data most recently available. Data submitted by the user are stored in the database, where it can easily be updated, shared between users and exported in various formats. GeneTools provides three different tools: i) NMC Annotation Tool, which offers annotations from several databases like UniGene, Entrez Gene, SwissProt and GeneOntology, in both single- and batch search mode. ii) GO Annotator Tool, where users can add new gene ontology (GO) annotations to genes of interest. These user defined GO annotations can be used in further analysis or exported for public distribution. iii) eGOn, a tool for visualization and statistical hypothesis testing of GO category representation. As the first GO tool, eGOn supports hypothesis testing for three different situations (master-target situation, mutually exclusive target-target situation and intersecting target-target situation). An important additional function is an evidence-code filter that allows users, to select the GO annotations for the analysis. GeneTools is the first "all in one" annotation tool, providing users with a rapid extraction of highly relevant gene annotation data for e.g. thousands of genes or clones at once. It allows a user to define and archive new GO annotations and it supports hypothesis testing related to GO category representations. GeneTools is freely available through www.genetools.no
Exploratory study on a statistical method to analyse time resolved data obtained during nanomaterial exposure measurements

NASA Astrophysics Data System (ADS)

Clerc, F.; Njiki-Menga, G.-H.; Witschger, O.

2013-04-01

Most of the measurement strategies that are suggested at the international level to assess workplace exposure to nanomaterials rely on devices measuring, in real time, airborne particles concentrations (according different metrics). Since none of the instruments to measure aerosols can distinguish a particle of interest to the background aerosol, the statistical analysis of time resolved data requires special attention. So far, very few approaches have been used for statistical analysis in the literature. This ranges from simple qualitative analysis of graphs to the implementation of more complex statistical models. To date, there is still no consensus on a particular approach and the current period is always looking for an appropriate and robust method. In this context, this exploratory study investigates a statistical method to analyse time resolved data based on a Bayesian probabilistic approach. To investigate and illustrate the use of the this statistical method, particle number concentration data from a workplace study that investigated the potential for exposure via inhalation from cleanout operations by sandpapering of a reactor producing nanocomposite thin films have been used. In this workplace study, the background issue has been addressed through the near-field and far-field approaches and several size integrated and time resolved devices have been used. The analysis of the results presented here focuses only on data obtained with two handheld condensation particle counters. While one was measuring at the source of the released particles, the other one was measuring in parallel far-field. The Bayesian probabilistic approach allows a probabilistic modelling of data series, and the observed task is modelled in the form of probability distributions. The probability distributions issuing from time resolved data obtained at the source can be compared with the probability distributions issuing from the time resolved data obtained far-field, leading in a quantitative estimation of the airborne particles released at the source when the task is performed. Beyond obtained results, this exploratory study indicates that the analysis of the results requires specific experience in statistics.
Wavelet analysis in ecology and epidemiology: impact of statistical tests

PubMed Central

Cazelles, Bernard; Cazelles, Kévin; Chavez, Mario

2014-01-01

Wavelet analysis is now frequently used to extract information from ecological and epidemiological time series. Statistical hypothesis tests are conducted on associated wavelet quantities to assess the likelihood that they are due to a random process. Such random processes represent null models and are generally based on synthetic data that share some statistical characteristics with the original time series. This allows the comparison of null statistics with those obtained from original time series. When creating synthetic datasets, different techniques of resampling result in different characteristics shared by the synthetic time series. Therefore, it becomes crucial to consider the impact of the resampling method on the results. We have addressed this point by comparing seven different statistical testing methods applied with different real and simulated data. Our results show that statistical assessment of periodic patterns is strongly affected by the choice of the resampling method, so two different resampling techniques could lead to two different conclusions about the same time series. Moreover, our results clearly show the inadequacy of resampling series generated by white noise and red noise that are nevertheless the methods currently used in the wide majority of wavelets applications. Our results highlight that the characteristics of a time series, namely its Fourier spectrum and autocorrelation, are important to consider when choosing the resampling technique. Results suggest that data-driven resampling methods should be used such as the hidden Markov model algorithm and the ‘beta-surrogate’ method. PMID:24284892
Wavelet analysis in ecology and epidemiology: impact of statistical tests.

PubMed

Cazelles, Bernard; Cazelles, Kévin; Chavez, Mario

2014-02-06

Wavelet analysis is now frequently used to extract information from ecological and epidemiological time series. Statistical hypothesis tests are conducted on associated wavelet quantities to assess the likelihood that they are due to a random process. Such random processes represent null models and are generally based on synthetic data that share some statistical characteristics with the original time series. This allows the comparison of null statistics with those obtained from original time series. When creating synthetic datasets, different techniques of resampling result in different characteristics shared by the synthetic time series. Therefore, it becomes crucial to consider the impact of the resampling method on the results. We have addressed this point by comparing seven different statistical testing methods applied with different real and simulated data. Our results show that statistical assessment of periodic patterns is strongly affected by the choice of the resampling method, so two different resampling techniques could lead to two different conclusions about the same time series. Moreover, our results clearly show the inadequacy of resampling series generated by white noise and red noise that are nevertheless the methods currently used in the wide majority of wavelets applications. Our results highlight that the characteristics of a time series, namely its Fourier spectrum and autocorrelation, are important to consider when choosing the resampling technique. Results suggest that data-driven resampling methods should be used such as the hidden Markov model algorithm and the 'beta-surrogate' method.
Statistical testing of association between menstruation and migraine.

PubMed

Barra, Mathias; Dahl, Fredrik A; Vetvik, Kjersti G

2015-02-01

To repair and refine a previously proposed method for statistical analysis of association between migraine and menstruation. Menstrually related migraine (MRM) affects about 20% of female migraineurs in the general population. The exact pathophysiological link from menstruation to migraine is hypothesized to be through fluctuations in female reproductive hormones, but the exact mechanisms remain unknown. Therefore, the main diagnostic criterion today is concurrency of migraine attacks with menstruation. Methods aiming to exclude spurious associations are wanted, so that further research into these mechanisms can be performed on a population with a true association. The statistical method is based on a simple two-parameter null model of MRM (which allows for simulation modeling), and Fisher's exact test (with mid-p correction) applied to standard 2 × 2 contingency tables derived from the patients' headache diaries. Our method is a corrected version of a previously published flawed framework. To our best knowledge, no other published methods for establishing a menstruation-migraine association by statistical means exist today. The probabilistic methodology shows good performance when subjected to receiver operator characteristic curve analysis. Quick reference cutoff values for the clinical setting were tabulated for assessing association given a patient's headache history. In this paper, we correct a proposed method for establishing association between menstruation and migraine by statistical methods. We conclude that the proposed standard of 3-cycle observations prior to setting an MRM diagnosis should be extended with at least one perimenstrual window to obtain sufficient information for statistical processing. © 2014 American Headache Society.
Health research and systems' governance are at risk: should the right to data protection override health?

PubMed

Di Iorio, C T; Carinci, F; Oderkirk, J

2014-07-01

The European Union (EU) Data Protection Regulation will have profound implications for public health, health services research and statistics in Europe. The EU Commission's Proposal was a breakthrough in balancing privacy rights and rights to health and healthcare. The European Parliament, however, has proposed extensive amendments. This paper reviews the amendments proposed by the European Parliament Committee on Civil Liberties, Justice and Home Affairs and their implications for health research and statistics. The amendments eliminate most innovations brought by the Proposal. Notably, derogation to the general prohibition of processing sensitive data shall be allowed for public interests such as the management of healthcare services,but not health research, monitoring, surveillance and governance. The processing of personal health data for historical, statistical or scientific purposes shall be allowed only with the consent of the data subject or if the processing serves an exceptionally high public interest, cannot be performed otherwise and is legally authorised. Research, be it academic, government,corporate or market research, falls under the same rule.The proposed amendments will make difficult or render impossible research and statistics involving the linkage and analysis of the wealth of data from clinical,administrative, insurance and survey sources, which have contributed to improving health outcomes and health systems performance and governance; and may illegitimise efforts that have been made in some European countries to enable privacy-respectful data use for research and statistical purposes. If the amendments stand as written, the right to privacy is likely to override the right to health and healthcare in Europe.
Multielement analysis of Canadian wines by inductively coupled plasma mass spectrometry (ICP-MS) and multivariate statistics.

PubMed

Taylor, Vivien F; Longerich, Henry P; Greenough, John D

2003-02-12

Trace element fingerprints were deciphered for wines from Canada's two major wine-producing regions, the Okanagan Valley and the Niagara Peninsula, for the purpose of examining differences in wine element composition with region of origin and identifying elements important to determining provenance. Analysis by ICP-MS allowed simultaneous determination of 34 trace elements in wine (Li, Be, Mg, Al, P, Cl, Ca, Ti, V, Mn, Fe, Co, Ni, Cu, Zn, As, Se, Br, Rb, Sr, Mo, Ag, Cd, Sb, I, Cs, Ba, La, Ce, Tl, Pb, Bi, Th, and U) at low levels of detection, and patterns in trace element concentrations were deciphered by multivariate statistical analysis. The two regions were discriminated with 100% accuracy using 10 of these elements. Differences in soil chemistry between the Niagara and Okanagan vineyards were evident, without a good correlation between soil and wine composition. The element Sr was found to be a good indicator of provenance and has been reported in fingerprinting studies of other regions.
Relevant principal component analysis applied to the characterisation of Portuguese heather honey.

PubMed

Martins, Rui C; Lopes, Victor V; Valentão, Patrícia; Carvalho, João C M F; Isabel, Paulo; Amaral, Maria T; Batista, Maria T; Andrade, Paula B; Silva, Branca M

2008-01-01

The main purpose of this study was the characterisation of 'Serra da Lousã' heather honey by using novel statistical methodology, relevant principal component analysis, in order to assess the correlations between production year, locality and composition. Herein, we also report its chemical composition in terms of sugars, glycerol and ethanol, and physicochemical parameters. Sugars profiles from 'Serra da Lousã' heather and 'Terra Quente de Trás-os-Montes' lavender honeys were compared and allowed the discrimination: 'Serra da Lousã' honeys do not contain sucrose, generally exhibit lower contents of turanose, trehalose and maltose and higher contents of fructose and glucose. Different localities from 'Serra da Lousã' provided groups of samples with high and low glycerol contents. Glycerol and ethanol contents were revealed to be independent of the sugars profiles. These data and statistical models can be very useful in the comparison and detection of adulterations during the quality control analysis of 'Serra da Lousã' honey.
Estimating size and scope economies in the Portuguese water sector using the Bayesian stochastic frontier analysis.

PubMed

Carvalho, Pedro; Marques, Rui Cunha

2016-02-15

This study aims to search for economies of size and scope in the Portuguese water sector applying Bayesian and classical statistics to make inference in stochastic frontier analysis (SFA). This study proves the usefulness and advantages of the application of Bayesian statistics for making inference in SFA over traditional SFA which just uses classical statistics. The resulting Bayesian methods allow overcoming some problems that arise in the application of the traditional SFA, such as the bias in small samples and skewness of residuals. In the present case study of the water sector in Portugal, these Bayesian methods provide more plausible and acceptable results. Based on the results obtained we found that there are important economies of output density, economies of size, economies of vertical integration and economies of scope in the Portuguese water sector, pointing out to the huge advantages in undertaking mergers by joining the retail and wholesale components and by joining the drinking water and wastewater services. Copyright © 2015 Elsevier B.V. All rights reserved.
A guide to statistical analysis in microbial ecology: a community-focused, living review of multivariate data analyses.

PubMed

Buttigieg, Pier Luigi; Ramette, Alban

2014-12-01

The application of multivariate statistical analyses has become a consistent feature in microbial ecology. However, many microbial ecologists are still in the process of developing a deep understanding of these methods and appreciating their limitations. As a consequence, staying abreast of progress and debate in this arena poses an additional challenge to many microbial ecologists. To address these issues, we present the GUide to STatistical Analysis in Microbial Ecology (GUSTA ME): a dynamic, web-based resource providing accessible descriptions of numerous multivariate techniques relevant to microbial ecologists. A combination of interactive elements allows users to discover and navigate between methods relevant to their needs and examine how they have been used by others in the field. We have designed GUSTA ME to become a community-led and -curated service, which we hope will provide a common reference and forum to discuss and disseminate analytical techniques relevant to the microbial ecology community. © 2014 The Authors. FEMS Microbiology Ecology published by John Wiley & Sons Ltd on behalf of Federation of European Microbiological Societies.
Generalization of Entropy Based Divergence Measures for Symbolic Sequence Analysis

PubMed Central

Ré, Miguel A.; Azad, Rajeev K.

2014-01-01

Entropy based measures have been frequently used in symbolic sequence analysis. A symmetrized and smoothed form of Kullback-Leibler divergence or relative entropy, the Jensen-Shannon divergence (JSD), is of particular interest because of its sharing properties with families of other divergence measures and its interpretability in different domains including statistical physics, information theory and mathematical statistics. The uniqueness and versatility of this measure arise because of a number of attributes including generalization to any number of probability distributions and association of weights to the distributions. Furthermore, its entropic formulation allows its generalization in different statistical frameworks, such as, non-extensive Tsallis statistics and higher order Markovian statistics. We revisit these generalizations and propose a new generalization of JSD in the integrated Tsallis and Markovian statistical framework. We show that this generalization can be interpreted in terms of mutual information. We also investigate the performance of different JSD generalizations in deconstructing chimeric DNA sequences assembled from bacterial genomes including that of E. coli, S. enterica typhi, Y. pestis and H. influenzae. Our results show that the JSD generalizations bring in more pronounced improvements when the sequences being compared are from phylogenetically proximal organisms, which are often difficult to distinguish because of their compositional similarity. While small but noticeable improvements were observed with the Tsallis statistical JSD generalization, relatively large improvements were observed with the Markovian generalization. In contrast, the proposed Tsallis-Markovian generalization yielded more pronounced improvements relative to the Tsallis and Markovian generalizations, specifically when the sequences being compared arose from phylogenetically proximal organisms. PMID:24728338
Generalization of entropy based divergence measures for symbolic sequence analysis.

PubMed

Ré, Miguel A; Azad, Rajeev K

2014-01-01

Entropy based measures have been frequently used in symbolic sequence analysis. A symmetrized and smoothed form of Kullback-Leibler divergence or relative entropy, the Jensen-Shannon divergence (JSD), is of particular interest because of its sharing properties with families of other divergence measures and its interpretability in different domains including statistical physics, information theory and mathematical statistics. The uniqueness and versatility of this measure arise because of a number of attributes including generalization to any number of probability distributions and association of weights to the distributions. Furthermore, its entropic formulation allows its generalization in different statistical frameworks, such as, non-extensive Tsallis statistics and higher order Markovian statistics. We revisit these generalizations and propose a new generalization of JSD in the integrated Tsallis and Markovian statistical framework. We show that this generalization can be interpreted in terms of mutual information. We also investigate the performance of different JSD generalizations in deconstructing chimeric DNA sequences assembled from bacterial genomes including that of E. coli, S. enterica typhi, Y. pestis and H. influenzae. Our results show that the JSD generalizations bring in more pronounced improvements when the sequences being compared are from phylogenetically proximal organisms, which are often difficult to distinguish because of their compositional similarity. While small but noticeable improvements were observed with the Tsallis statistical JSD generalization, relatively large improvements were observed with the Markovian generalization. In contrast, the proposed Tsallis-Markovian generalization yielded more pronounced improvements relative to the Tsallis and Markovian generalizations, specifically when the sequences being compared arose from phylogenetically proximal organisms.
In vivo evaluation of the effect of stimulus distribution on FIR statistical efficiency in event-related fMRI.

PubMed

Jansma, J Martijn; de Zwart, Jacco A; van Gelderen, Peter; Duyn, Jeff H; Drevets, Wayne C; Furey, Maura L

2013-05-15

Technical developments in MRI have improved signal to noise, allowing use of analysis methods such as Finite impulse response (FIR) of rapid event related functional MRI (er-fMRI). FIR is one of the most informative analysis methods as it determines onset and full shape of the hemodynamic response function (HRF) without any a priori assumptions. FIR is however vulnerable to multicollinearity, which is directly related to the distribution of stimuli over time. Efficiency can be optimized by simplifying a design, and restricting stimuli distribution to specific sequences, while more design flexibility necessarily reduces efficiency. However, the actual effect of efficiency on fMRI results has never been tested in vivo. Thus, it is currently difficult to make an informed choice between protocol flexibility and statistical efficiency. The main goal of this study was to assign concrete fMRI signal to noise values to the abstract scale of FIR statistical efficiency. Ten subjects repeated a perception task with five random and m-sequence based protocol, with varying but, according to literature, acceptable levels of multicollinearity. Results indicated substantial differences in signal standard deviation, while the level was a function of multicollinearity. Experiment protocols varied up to 55.4% in standard deviation. Results confirm that quality of fMRI in an FIR analysis can significantly and substantially vary with statistical efficiency. Our in vivo measurements can be used to aid in making an informed decision between freedom in protocol design and statistical efficiency. Published by Elsevier B.V.
Fully Bayesian tests of neutrality using genealogical summary statistics.

PubMed

Drummond, Alexei J; Suchard, Marc A

2008-10-31

Many data summary statistics have been developed to detect departures from neutral expectations of evolutionary models. However questions about the neutrality of the evolution of genetic loci within natural populations remain difficult to assess. One critical cause of this difficulty is that most methods for testing neutrality make simplifying assumptions simultaneously about the mutational model and the population size model. Consequentially, rejecting the null hypothesis of neutrality under these methods could result from violations of either or both assumptions, making interpretation troublesome. Here we harness posterior predictive simulation to exploit summary statistics of both the data and model parameters to test the goodness-of-fit of standard models of evolution. We apply the method to test the selective neutrality of molecular evolution in non-recombining gene genealogies and we demonstrate the utility of our method on four real data sets, identifying significant departures of neutrality in human influenza A virus, even after controlling for variation in population size. Importantly, by employing a full model-based Bayesian analysis, our method separates the effects of demography from the effects of selection. The method also allows multiple summary statistics to be used in concert, thus potentially increasing sensitivity. Furthermore, our method remains useful in situations where analytical expectations and variances of summary statistics are not available. This aspect has great potential for the analysis of temporally spaced data, an expanding area previously ignored for limited availability of theory and methods.
Some challenges with statistical inference in adaptive designs.

PubMed

Hung, H M James; Wang, Sue-Jane; Yang, Peiling

2014-01-01

Adaptive designs have generated a great deal of attention to clinical trial communities. The literature contains many statistical methods to deal with added statistical uncertainties concerning the adaptations. Increasingly encountered in regulatory applications are adaptive statistical information designs that allow modification of sample size or related statistical information and adaptive selection designs that allow selection of doses or patient populations during the course of a clinical trial. For adaptive statistical information designs, a few statistical testing methods are mathematically equivalent, as a number of articles have stipulated, but arguably there are large differences in their practical ramifications. We pinpoint some undesirable features of these methods in this work. For adaptive selection designs, the selection based on biomarker data for testing the correlated clinical endpoints may increase statistical uncertainty in terms of type I error probability, and most importantly the increased statistical uncertainty may be impossible to assess.
SpecViz: Interactive Spectral Data Analysis

NASA Astrophysics Data System (ADS)

Earl, Nicholas Michael; STScI

2016-06-01

The astronomical community is about to enter a new generation of scientific enterprise. With next-generation instrumentation and advanced capabilities, the need has arisen to equip astronomers with the necessary tools to deal with large, multi-faceted data. The Space Telescope Science Institute has initiated a data analysis forum for the creation, development, and maintenance of software tools for the interpretation of these new data sets. SpecViz is a spectral 1-D interactive visualization and analysis application built with Python in an open source development environment. A user-friendly GUI allows for a fast, interactive approach to spectral analysis. SpecViz supports handling of unique and instrument-specific data, incorporation of advanced spectral unit handling and conversions in a flexible, high-performance interactive plotting environment. Active spectral feature analysis is possible through interactive measurement and statistical tools. It can be used to build wide-band SEDs, with the capability of combining or overplotting data products from various instruments. SpecViz sports advanced toolsets for filtering and detrending spectral lines; identifying, isolating, and manipulating spectral features; as well as utilizing spectral templates for renormalizing data in an interactive way. SpecViz also includes a flexible model fitting toolset that allows for multi-component models, as well as custom models, to be used with various fitting and decomposition routines. SpecViz also features robust extension via custom data loaders and connection to the central communication system underneath the interface for more advanced control. Incorporation with Jupyter notebooks via connection with the active iPython kernel allows for SpecViz to be used in addition to a user’s normal workflow without demanding the user drastically alter their method of data analysis. In addition, SpecViz allows the interactive analysis of multi-object spectroscopy in the same straight-forward, consistent way. Through the development of such tools, STScI hopes to unify astronomical data analysis software for JWST and other instruments, allowing for efficient, reliable, and consistent scientific results.
Annotating spatio-temporal datasets for meaningful analysis in the Web

NASA Astrophysics Data System (ADS)

Stasch, Christoph; Pebesma, Edzer; Scheider, Simon

2014-05-01

More and more environmental datasets that vary in space and time are available in the Web. This comes along with an advantage of using the data for other purposes than originally foreseen, but also with the danger that users may apply inappropriate analysis procedures due to lack of important assumptions made during the data collection process. In order to guide towards a meaningful (statistical) analysis of spatio-temporal datasets available in the Web, we have developed a Higher-Order-Logic formalism that captures some relevant assumptions in our previous work [1]. It allows to proof on meaningful spatial prediction and aggregation in a semi-automated fashion. In this poster presentation, we will present a concept for annotating spatio-temporal datasets available in the Web with concepts defined in our formalism. Therefore, we have defined a subset of the formalism as a Web Ontology Language (OWL) pattern. It allows capturing the distinction between the different spatio-temporal variable types, i.e. point patterns, fields, lattices and trajectories, that in turn determine whether a particular dataset can be interpolated or aggregated in a meaningful way using a certain procedure. The actual annotations that link spatio-temporal datasets with the concepts in the ontology pattern are provided as Linked Data. In order to allow data producers to add the annotations to their datasets, we have implemented a Web portal that uses a triple store at the backend to store the annotations and to make them available in the Linked Data cloud. Furthermore, we have implemented functions in the statistical environment R to retrieve the RDF annotations and, based on these annotations, to support a stronger typing of spatio-temporal datatypes guiding towards a meaningful analysis in R. [1] Stasch, C., Scheider, S., Pebesma, E., Kuhn, W. (2014): "Meaningful spatial prediction and aggregation", Environmental Modelling & Software, 51, 149-165.
PageMan: an interactive ontology tool to generate, display, and annotate overview graphs for profiling experiments.

PubMed

Usadel, Björn; Nagel, Axel; Steinhauser, Dirk; Gibon, Yves; Bläsing, Oliver E; Redestig, Henning; Sreenivasulu, Nese; Krall, Leonard; Hannah, Matthew A; Poree, Fabien; Fernie, Alisdair R; Stitt, Mark

2006-12-18

Microarray technology has become a widely accepted and standardized tool in biology. The first microarray data analysis programs were developed to support pair-wise comparison. However, as microarray experiments have become more routine, large scale experiments have become more common, which investigate multiple time points or sets of mutants or transgenics. To extract biological information from such high-throughput expression data, it is necessary to develop efficient analytical platforms, which combine manually curated gene ontologies with efficient visualization and navigation tools. Currently, most tools focus on a few limited biological aspects, rather than offering a holistic, integrated analysis. Here we introduce PageMan, a multiplatform, user-friendly, and stand-alone software tool that annotates, investigates, and condenses high-throughput microarray data in the context of functional ontologies. It includes a GUI tool to transform different ontologies into a suitable format, enabling the user to compare and choose between different ontologies. It is equipped with several statistical modules for data analysis, including over-representation analysis and Wilcoxon statistical testing. Results are exported in a graphical format for direct use, or for further editing in graphics programs.PageMan provides a fast overview of single treatments, allows genome-level responses to be compared across several microarray experiments covering, for example, stress responses at multiple time points. This aids in searching for trait-specific changes in pathways using mutants or transgenics, analyzing development time-courses, and comparison between species. In a case study, we analyze the results of publicly available microarrays of multiple cold stress experiments using PageMan, and compare the results to a previously published meta-analysis.PageMan offers a complete user's guide, a web-based over-representation analysis as well as a tutorial, and is freely available at http://mapman.mpimp-golm.mpg.de/pageman/. PageMan allows multiple microarray experiments to be efficiently condensed into a single page graphical display. The flexible interface allows data to be quickly and easily visualized, facilitating comparisons within experiments and to published experiments, thus enabling researchers to gain a rapid overview of the biological responses in the experiments.

In Situ Distribution Guided Analysis and Visualization of Transonic Jet Engine Simulations.

PubMed

Dutta, Soumya; Chen, Chun-Ming; Heinlein, Gregory; Shen, Han-Wei; Chen, Jen-Ping

2017-01-01

Study of flow instability in turbine engine compressors is crucial to understand the inception and evolution of engine stall. Aerodynamics experts have been working on detecting the early signs of stall in order to devise novel stall suppression technologies. A state-of-the-art Navier-Stokes based, time-accurate computational fluid dynamics simulator, TURBO, has been developed in NASA to enhance the understanding of flow phenomena undergoing rotating stall. Despite the proven high modeling accuracy of TURBO, the excessive simulation data prohibits post-hoc analysis in both storage and I/O time. To address these issues and allow the expert to perform scalable stall analysis, we have designed an in situ distribution guided stall analysis technique. Our method summarizes statistics of important properties of the simulation data in situ using a probabilistic data modeling scheme. This data summarization enables statistical anomaly detection for flow instability in post analysis, which reveals the spatiotemporal trends of rotating stall for the expert to conceive new hypotheses. Furthermore, the verification of the hypotheses and exploratory visualization using the summarized data are realized using probabilistic visualization techniques such as uncertain isocontouring. Positive feedback from the domain scientist has indicated the efficacy of our system in exploratory stall analysis.
Application of statistical experimental design to study the formulation variables influencing the coating process of lidocaine liposomes.

PubMed

González-Rodríguez, M L; Barros, L B; Palma, J; González-Rodríguez, P L; Rabasco, A M

2007-06-07

In this paper, we have used statistical experimental design to investigate the effect of several factors in coating process of lidocaine hydrochloride (LID) liposomes by a biodegradable polymer (chitosan, CH). These variables were the concentration of CH coating solution, the dripping rate of this solution on the liposome colloidal dispersion, the stirring rate, the time since the liposome production to the liposome coating and finally the amount of drug entrapped into liposomes. The selected response variables were drug encapsulation efficiency (EE, %), coating efficiency (CE, %) and zeta potential. Liposomes were obtained by thin-layer evaporation method. They were subsequently coated with CH according the experimental plan provided by a fractional factorial (2(5-1)) screening matrix. We have used spectroscopic methods to determine the zeta potential values. The EE (%) assay was carried out in dialysis bags and the brilliant red probe was used to determine CE (%) due to its property of forming molecular complexes with CH. The graphic analysis of the effects allowed the identification of the main formulation and technological factors by the analysis of the selected responses and permitted the determination of the proper level of these factors for the response improvement. Moreover, fractional design allowed quantifying the interactions between the factors, which will consider in next experiments. The results obtained pointed out that LID amount was the predominant factor that increased the drug entrapment capacity (EE). The CE (%) response was mainly affected by the concentration of the CH solution and the stirring rate, although all the interactions between the main factors have statistical significance.
Data Warehousing at the Marine Corps Institute

DTIC Science & Technology

2003-09-01

applications exists for several reasons. It allows for data to be extracted from many sources, by “cleaned”, and stored into one large data facility ...exists. Key individuals at MCI, or the so called “knowledge workers” will be educated , and try to brainstorm possible data relationships that can...They include querying and reporting, On-Line Analytical Processing (OLAP) and statistical analysis, and data mining. 1. Queries and Reports The
DOE Office of Scientific and Technical Information (OSTI.GOV)

Jarocki, John Charles; Zage, David John; Fisher, Andrew N.

LinkShop is a software tool for applying the method of Linkography to the analysis time-sequence data. LinkShop provides command line, web, and application programming interfaces (API) for input and processing of time-sequence data, abstraction models, and ontologies. The software creates graph representations of the abstraction model, ontology, and derived linkograph. Finally, the tool allows the user to perform statistical measurements of the linkograph and refine the ontology through direct manipulation of the linkograph.
A Classification and Analysis of National Contract Management Journal Articles from 1966 Through 1989

DTIC Science & Technology

1991-06-01

THEORETICAL, NORMATIVE, EMPIRICAL, INDUCTIVE [24] "New Approaches for Quantifying Risk and Determining Sharing Arrangements," Raymond S. Lieber, pp...the negotiation process can be enhanced by quantifying risk using statistical methods. The article discusses two approaches which allow the...in Incentive Contracting," Melvin W. Lifson, pp. 59-80. The purpose to this article is to suggest a general approach for defining and quantifying
Development of Intelligent Unmanned Systems

DTIC Science & Technology

2011-05-01

statistical analysis on the terrain map. The data points are stored in the corresponding cells of the Traversability Grid as a linked list of 3-D Cartesian...allowed for multiple configurations of specified data to be as flexible as possible. For example, when an object is being created the knowledge store ...library was also used for querying and storing spatial data . It provided many geometric abstractions necessary such as overlap and intersects
Robust detection of multiple sclerosis lesions from intensity-normalized multi-channel MRI

NASA Astrophysics Data System (ADS)

Karpate, Yogesh; Commowick, Olivier; Barillot, Christian

2015-03-01

Multiple sclerosis (MS) is a disease with heterogeneous evolution among the patients. Quantitative analysis of longitudinal Magnetic Resonance Images (MRI) provides a spatial analysis of the brain tissues which may lead to the discovery of biomarkers of disease evolution. Better understanding of the disease will lead to a better discovery of pathogenic mechanisms, allowing for patient-adapted therapeutic strategies. To characterize MS lesions, we propose a novel paradigm to detect white matter lesions based on a statistical framework. It aims at studying the benefits of using multi-channel MRI to detect statistically significant differences between each individual MS patient and a database of control subjects. This framework consists in two components. First, intensity standardization is conducted to minimize the inter-subject intensity difference arising from variability of the acquisition process and different scanners. The intensity normalization maps parameters obtained using a robust Gaussian Mixture Model (GMM) estimation not affected by the presence of MS lesions. The second part studies the comparison of multi-channel MRI of MS patients with respect to an atlas built from the control subjects, thereby allowing us to look for differences in normal appearing white matter, in and around the lesions of each patient. Experimental results demonstrate that our technique accurately detects significant differences in lesions consequently improving the results of MS lesion detection.
Participatory Sensing Marine Debris: Current Trends and Future Opportunities

NASA Astrophysics Data System (ADS)

Jambeck, J.; Johnsen, K.

2016-02-01

The monitoring of litter and debris is challenging at the global scale because of spatial and temporal variability, disconnected local organizations and the use of paper and pen for documentation. The Marine Debris Tracker mobile app and citizen science program allows for the collection of global standardized data at a scale, speed and efficiency that was not previously possible. The app itself also serves as an outreach and education tool, creating an engaged participatory sensing instrument. This instrument is characterized by several aspects including range and frequency, accuracy and precision, accessibility, measurement dimensions, participant performance, and statistical analysis. Also, important to Marine Debris Tracker is open data and transparency. A web portal provides data that users have logged allowing immediate feedback to users and additional education opportunities. The engagement of users through a top tracker competition and social media keeps participants interested in the Marine Debris Tracker community. Over half a million items have been tracked globally, and maps provide both global and local distribution of data. The Marine Debris Tracker community and dataset continues to grow daily. We will present current usage and engagement, participatory sensing data distributions, choropleth maps of areas of active tracking, and discuss future technologies and platforms to expand data collection and conduct statistical analysis.
The influence of dopants on the nucleation of semiconductor nanocrystals from homogeneous solution.

PubMed

Bryan, J Daniel; Schwartz, Dana A; Gamelin, Daniel R

2005-09-01

The influence of Co2+ ions on the homogeneous nucleation of ZnO is examined. Using electronic absorption spectroscopy as a dopant-specific in-situ spectroscopic probe, Co2+ ions are found to be quantitatively excluded from the ZnO critical nuclei but incorporated nearly statistically in the subsequent growth layers, resulting in crystallites with pure ZnO cores and Zn(1-x)Co(x)O shells. Strong inhibition of ZnO nucleation by Co2+ ions is also observed. These results are explained using the classical nucleation model. Statistical analysis of nucleation inhibition data allows estimation of the critical nucleus size as 25 +/- 4 Zn2+ ions. Bulk calorimetric data allow the activation barrier for ZnO nucleation containing a single Co2+ impurity to be estimated as 5.75 kcal/mol cluster greater than that of pure ZnO, corresponding to a 1.5 x 10(4)-fold reduction in the ZnO nucleation rate constant upon introduction of a single Co2+ impurity. These data and analysis offer a rare view into the role of composition in homogeneous nucleation processes, and specifically address recent experiments targeting formation of semiconductor quantum dots containing single magnetic impurity ions at their precise centers.
Statistical benchmark for BosonSampling

NASA Astrophysics Data System (ADS)

Walschaers, Mattia; Kuipers, Jack; Urbina, Juan-Diego; Mayer, Klaus; Tichy, Malte Christopher; Richter, Klaus; Buchleitner, Andreas

2016-03-01

Boson samplers—set-ups that generate complex many-particle output states through the transmission of elementary many-particle input states across a multitude of mutually coupled modes—promise the efficient quantum simulation of a classically intractable computational task, and challenge the extended Church-Turing thesis, one of the fundamental dogmas of computer science. However, as in all experimental quantum simulations of truly complex systems, one crucial problem remains: how to certify that a given experimental measurement record unambiguously results from enforcing the claimed dynamics, on bosons, fermions or distinguishable particles? Here we offer a statistical solution to the certification problem, identifying an unambiguous statistical signature of many-body quantum interference upon transmission across a multimode, random scattering device. We show that statistical analysis of only partial information on the output state allows to characterise the imparted dynamics through particle type-specific features of the emerging interference patterns. The relevant statistical quantifiers are classically computable, define a falsifiable benchmark for BosonSampling, and reveal distinctive features of many-particle quantum dynamics, which go much beyond mere bunching or anti-bunching effects.
Assigning statistical significance to proteotypic peptides via database searches

PubMed Central

Alves, Gelio; Ogurtsov, Aleksey Y.; Yu, Yi-Kuo

2011-01-01

Querying MS/MS spectra against a database containing only proteotypic peptides reduces data analysis time due to reduction of database size. Despite the speed advantage, this search strategy is challenged by issues of statistical significance and coverage. The former requires separating systematically significant identifications from less confident identifications, while the latter arises when the underlying peptide is not present, due to single amino acid polymorphisms (SAPs) or post-translational modifications (PTMs), in the proteotypic peptide libraries searched. To address both issues simultaneously, we have extended RAId’s knowledge database to include proteotypic information, utilized RAId’s statistical strategy to assign statistical significance to proteotypic peptides, and modified RAId’s programs to allow for consideration of proteotypic information during database searches. The extended database alleviates the coverage problem since all annotated modifications, even those occurred within proteotypic peptides, may be considered. Taking into account the likelihoods of observation, the statistical strategy of RAId provides accurate E-value assignments regardless whether a candidate peptide is proteotypic or not. The advantage of including proteotypic information is evidenced by its superior retrieval performance when compared to regular database searches. PMID:21055489
Estimation of integral curves from high angular resolution diffusion imaging (HARDI) data.

PubMed

Carmichael, Owen; Sakhanenko, Lyudmila

2015-05-15

We develop statistical methodology for a popular brain imaging technique HARDI based on the high order tensor model by Özarslan and Mareci [10]. We investigate how uncertainty in the imaging procedure propagates through all levels of the model: signals, tensor fields, vector fields, and fibers. We construct asymptotically normal estimators of the integral curves or fibers which allow us to trace the fibers together with confidence ellipsoids. The procedure is computationally intense as it blends linear algebra concepts from high order tensors with asymptotical statistical analysis. The theoretical results are illustrated on simulated and real datasets. This work generalizes the statistical methodology proposed for low angular resolution diffusion tensor imaging by Carmichael and Sakhanenko [3], to several fibers per voxel. It is also a pioneering statistical work on tractography from HARDI data. It avoids all the typical limitations of the deterministic tractography methods and it delivers the same information as probabilistic tractography methods. Our method is computationally cheap and it provides well-founded mathematical and statistical framework where diverse functionals on fibers, directions and tensors can be studied in a systematic and rigorous way.
Estimation of integral curves from high angular resolution diffusion imaging (HARDI) data

PubMed Central

Carmichael, Owen; Sakhanenko, Lyudmila

2015-01-01

We develop statistical methodology for a popular brain imaging technique HARDI based on the high order tensor model by Özarslan and Mareci [10]. We investigate how uncertainty in the imaging procedure propagates through all levels of the model: signals, tensor fields, vector fields, and fibers. We construct asymptotically normal estimators of the integral curves or fibers which allow us to trace the fibers together with confidence ellipsoids. The procedure is computationally intense as it blends linear algebra concepts from high order tensors with asymptotical statistical analysis. The theoretical results are illustrated on simulated and real datasets. This work generalizes the statistical methodology proposed for low angular resolution diffusion tensor imaging by Carmichael and Sakhanenko [3], to several fibers per voxel. It is also a pioneering statistical work on tractography from HARDI data. It avoids all the typical limitations of the deterministic tractography methods and it delivers the same information as probabilistic tractography methods. Our method is computationally cheap and it provides well-founded mathematical and statistical framework where diverse functionals on fibers, directions and tensors can be studied in a systematic and rigorous way. PMID:25937674
Multidimensional Analysis of Linguistic Networks

NASA Astrophysics Data System (ADS)

Araújo, Tanya; Banisch, Sven

Network-based approaches play an increasingly important role in the analysis of data even in systems in which a network representation is not immediately apparent. This is particularly true for linguistic networks, which use to be induced from a linguistic data set for which a network perspective is only one out of several options for representation. Here we introduce a multidimensional framework for network construction and analysis with special focus on linguistic networks. Such a framework is used to show that the higher is the abstraction level of network induction, the harder is the interpretation of the topological indicators used in network analysis. Several examples are provided allowing for the comparison of different linguistic networks as well as to networks in other fields of application of network theory. The computation and the intelligibility of some statistical indicators frequently used in linguistic networks are discussed. It suggests that the field of linguistic networks, by applying statistical tools inspired by network studies in other domains, may, in its current state, have only a limited contribution to the development of linguistic theory.
Importance of the Correlation between Width and Length in the Shape Analysis of Nanorods: Use of a 2D Size Plot To Probe Such a Correlation.

PubMed

Zhao, Zhihua; Zheng, Zhiqin; Roux, Clément; Delmas, Céline; Marty, Jean-Daniel; Kahn, Myrtil L; Mingotaud, Christophe

2016-08-22

Analysis of nanoparticle size through a simple 2D plot is proposed in order to extract the correlation between length and width in a collection or a mixture of anisotropic particles. Compared to the usual statistics on the length associated with a second and independent statistical analysis of the width, this simple plot easily points out the various types of nanoparticles and their (an)isotropy. For each class of nano-objects, the relationship between width and length (i.e., the strong or weak correlations between these two parameters) may suggest information concerning the nucleation/growth processes. It allows one to follow the effect on the shape and size distribution of physical or chemical processes such as simple ripening. Various electron microscopy pictures from the literature or from the authors' own syntheses are used as examples to demonstrate the efficiency and simplicity of the proposed 2D plot combined with a multivariate analysis. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Factor analysis in optimization of formulation of high content uniformity tablets containing low dose active substance.

PubMed

Lukášová, Ivana; Muselík, Jan; Franc, Aleš; Goněc, Roman; Mika, Filip; Vetchý, David

2017-11-15

Warfarin is intensively discussed drug with narrow therapeutic range. There have been cases of bleeding attributed to varying content or altered quality of the active substance. Factor analysis is useful for finding suitable technological parameters leading to high content uniformity of tablets containing low amount of active substance. The composition of tabletting blend and technological procedure were set with respect to factor analysis of previously published results. The correctness of set parameters was checked by manufacturing and evaluation of tablets containing 1-10mg of warfarin sodium. The robustness of suggested technology was checked by using "worst case scenario" and statistical evaluation of European Pharmacopoeia (EP) content uniformity limits with respect to Bergum division and process capability index (Cpk). To evaluate the quality of active substance and tablets, dissolution method was developed (water; EP apparatus II; 25rpm), allowing for statistical comparison of dissolution profiles. Obtained results prove the suitability of factor analysis to optimize the composition with respect to batches manufactured previously and thus the use of metaanalysis under industrial conditions is feasible. Copyright © 2017 Elsevier B.V. All rights reserved.
Metrological traceability in education: A practical online system for measuring and managing middle school mathematics instruction

NASA Astrophysics Data System (ADS)

Torres Irribarra, D.; Freund, R.; Fisher, W.; Wilson, M.

2015-02-01

Computer-based, online assessments modelled, designed, and evaluated for adaptively administered invariant measurement are uniquely suited to defining and maintaining traceability to standardized units in education. An assessment of this kind is embedded in the Assessing Data Modeling and Statistical Reasoning (ADM) middle school mathematics curriculum. Diagnostic information about middle school students' learning of statistics and modeling is provided via computer-based formative assessments for seven constructs that comprise a learning progression for statistics and modeling from late elementary through the middle school grades. The seven constructs are: Data Display, Meta-Representational Competence, Conceptions of Statistics, Chance, Modeling Variability, Theory of Measurement, and Informal Inference. The end product is a web-delivered system built with Ruby on Rails for use by curriculum development teams working with classroom teachers in designing, developing, and delivering formative assessments. The online accessible system allows teachers to accurately diagnose students' unique comprehension and learning needs in a common language of real-time assessment, logging, analysis, feedback, and reporting.
PIXE analysis of cystic fibrosis sweat samples with an external proton beam

NASA Astrophysics Data System (ADS)

Sommer, F.; Massonnet, B.

1987-03-01

PIXE analysis with an external proton beam is used to study, in four control and five cystic fibrosis children, the elemental composition of sweat samples collected from different parts of the body during entire body hyperthermia. We observe no significant difference of sweat rates and of temperature variations between the two groups during sweat test. The statistical study of results obtained by PIXE analysis allows us to pick out amongst 8 elements studied, 6 elements (Na, Cl, Ca, Mn, Cu, Br) significatively different between the two groups of subjects. Using regression analysis, Na, Cl and Br concentrations could be used in a predictive equation of the state of health.
Publication Bias in Meta-Analysis: Confidence Intervals for Rosenthal's Fail-Safe Number.

PubMed

Fragkos, Konstantinos C; Tsagris, Michail; Frangos, Christos C

2014-01-01

The purpose of the present paper is to assess the efficacy of confidence intervals for Rosenthal's fail-safe number. Although Rosenthal's estimator is highly used by researchers, its statistical properties are largely unexplored. First of all, we developed statistical theory which allowed us to produce confidence intervals for Rosenthal's fail-safe number. This was produced by discerning whether the number of studies analysed in a meta-analysis is fixed or random. Each case produces different variance estimators. For a given number of studies and a given distribution, we provided five variance estimators. Confidence intervals are examined with a normal approximation and a nonparametric bootstrap. The accuracy of the different confidence interval estimates was then tested by methods of simulation under different distributional assumptions. The half normal distribution variance estimator has the best probability coverage. Finally, we provide a table of lower confidence intervals for Rosenthal's estimator.
Publication Bias in Meta-Analysis: Confidence Intervals for Rosenthal's Fail-Safe Number

PubMed Central

Fragkos, Konstantinos C.; Tsagris, Michail; Frangos, Christos C.

2014-01-01

The purpose of the present paper is to assess the efficacy of confidence intervals for Rosenthal's fail-safe number. Although Rosenthal's estimator is highly used by researchers, its statistical properties are largely unexplored. First of all, we developed statistical theory which allowed us to produce confidence intervals for Rosenthal's fail-safe number. This was produced by discerning whether the number of studies analysed in a meta-analysis is fixed or random. Each case produces different variance estimators. For a given number of studies and a given distribution, we provided five variance estimators. Confidence intervals are examined with a normal approximation and a nonparametric bootstrap. The accuracy of the different confidence interval estimates was then tested by methods of simulation under different distributional assumptions. The half normal distribution variance estimator has the best probability coverage. Finally, we provide a table of lower confidence intervals for Rosenthal's estimator. PMID:27437470

Low-level processing for real-time image analysis

NASA Technical Reports Server (NTRS)

Eskenazi, R.; Wilf, J. M.

1979-01-01

A system that detects object outlines in television images in real time is described. A high-speed pipeline processor transforms the raw image into an edge map and a microprocessor, which is integrated into the system, clusters the edges, and represents them as chain codes. Image statistics, useful for higher level tasks such as pattern recognition, are computed by the microprocessor. Peak intensity and peak gradient values are extracted within a programmable window and are used for iris and focus control. The algorithms implemented in hardware and the pipeline processor architecture are described. The strategy for partitioning functions in the pipeline was chosen to make the implementation modular. The microprocessor interface allows flexible and adaptive control of the feature extraction process. The software algorithms for clustering edge segments, creating chain codes, and computing image statistics are also discussed. A strategy for real time image analysis that uses this system is given.
Molecular dynamics simulations and statistical coupling analysis reveal functional coevolution network of oncogenic mutations in the CDKN2A-CDK6 complex.

PubMed

Wang, Jingwen; Zhao, Yuqi; Wang, Yanjie; Huang, Jingfei

2013-01-16

Coevolution between proteins is crucial for understanding protein-protein interaction. Simultaneous changes allow a protein complex to maintain its overall structural-functional integrity. In this study, we combined statistical coupling analysis (SCA) and molecular dynamics simulations on the CDK6-CDKN2A protein complex to evaluate coevolution between proteins. We reconstructed an inter-protein residue coevolution network, consisting of 37 residues and 37 interactions. It shows that most of the coevolved residue pairs are spatially proximal. When the mutations happened, the stable local structures were broken up and thus the protein interaction was decreased or inhibited, with a following increased risk of melanoma. The identification of inter-protein coevolved residues in the CDK6-CDKN2A complex can be helpful for designing protein engineering experiments. Copyright © 2012 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.
Fatal falls in the US construction industry, 1990 to 1999.

PubMed

Derr, J; Forst, L; Chen, H Y; Conroy, L

2001-10-01

The Occupational Safety and Health Administration's (OSHA's) Integrated Management Information System (IMIS) database allows for the detailed analysis of risk factors surrounding fatal occupational events. This study used IMIS data to (1) perform a risk factor analysis of fatal construction falls, and (2) assess the impact of the February 1995 29 CFR Part 1926 Subpart M OSHA fall protection regulations for construction by calculating trends in fatal fall rates. In addition, IMIS data on fatal construction falls were compared with data from other occupational fatality surveillance systems. For falls in construction, the study identified several demographic factors that may indicate increased risk. A statistically significant downward trend in fatal falls was evident in all construction and within several construction categories during the decade. Although the study failed to show a statistically significant intervention effect from the new OSHA regulations, it may have lacked the power to do so.
Visual analysis of geocoded twin data puts nature and nurture on the map.

PubMed

Davis, O S P; Haworth, C M A; Lewis, C M; Plomin, R

2012-09-01

Twin studies allow us to estimate the relative contributions of nature and nurture to human phenotypes by comparing the resemblance of identical and fraternal twins. Variation in complex traits is a balance of genetic and environmental influences; these influences are typically estimated at a population level. However, what if the balance of nature and nurture varies depending on where we grow up? Here we use statistical and visual analysis of geocoded data from over 6700 families to show that genetic and environmental contributions to 45 childhood cognitive and behavioral phenotypes vary geographically in the United Kingdom. This has implications for detecting environmental exposures that may interact with the genetic influences on complex traits, and for the statistical power of samples recruited for genetic association studies. More broadly, our experience demonstrates the potential for collaborative exploratory visualization to act as a lingua franca for large-scale interdisciplinary research.
Adaptive distributed source coding.

PubMed

Varodayan, David; Lin, Yao-Chung; Girod, Bernd

2012-05-01

We consider distributed source coding in the presence of hidden variables that parameterize the statistical dependence among sources. We derive the Slepian-Wolf bound and devise coding algorithms for a block-candidate model of this problem. The encoder sends, in addition to syndrome bits, a portion of the source to the decoder uncoded as doping bits. The decoder uses the sum-product algorithm to simultaneously recover the source symbols and the hidden statistical dependence variables. We also develop novel techniques based on density evolution (DE) to analyze the coding algorithms. We experimentally confirm that our DE analysis closely approximates practical performance. This result allows us to efficiently optimize parameters of the algorithms. In particular, we show that the system performs close to the Slepian-Wolf bound when an appropriate doping rate is selected. We then apply our coding and analysis techniques to a reduced-reference video quality monitoring system and show a bit rate saving of about 75% compared with fixed-length coding.
Application of random match probability calculations to mixed STR profiles.

PubMed

Bille, Todd; Bright, Jo-Anne; Buckleton, John

2013-03-01

Mixed DNA profiles are being encountered more frequently as laboratories analyze increasing amounts of touch evidence. If it is determined that an individual could be a possible contributor to the mixture, it is necessary to perform a statistical analysis to allow an assignment of weight to the evidence. Currently, the combined probability of inclusion (CPI) and the likelihood ratio (LR) are the most commonly used methods to perform the statistical analysis. A third method, random match probability (RMP), is available. This article compares the advantages and disadvantages of the CPI and LR methods to the RMP method. We demonstrate that although the LR method is still considered the most powerful of the binary methods, the RMP and LR methods make similar use of the observed data such as peak height, assumed number of contributors, and known contributors where the CPI calculation tends to waste information and be less informative. © 2013 American Academy of Forensic Sciences.
Langmuir waveforms at interplanetary shocks: STEREO statistical analysis

NASA Astrophysics Data System (ADS)

Briand, C.

2016-12-01

Wave-particle interactions and particle acceleration are the two main processes allowing energy dissipation at non collisional shocks. Ion acceleration has been deeply studied for many years, also for their central role in the shock front reformation. Electron dynamics is also important in the shock dynamics through the instabilities they can generate which may impact the ion dynamics.Particle measurements can be efficiently completed by wave measurements to determine the characteristics of the electron beams and study the turbulence of the medium. Electric waveforms obtained from the S/WAVES instrument of the STEREO mission between 2007 to 2014 are analyzed. Thus, clear signature of Langmuir waves are observed on 41 interplanetary shocks. These data enable a statistical analysis and to deduce some characteristics of the electron dynamics on different shocks sources (SIR or ICME) and types (quasi-perpendicular or quasi-parallel). The conversion process between electrostatic to electromagnetic waves has also been tested in several cases.
Flares, ejections, proton events

NASA Astrophysics Data System (ADS)

Belov, A. V.

2017-11-01

Statistical analysis is performed for the relationship of coronal mass ejections (CMEs) and X-ray flares with the fluxes of solar protons with energies >10 and >100 MeV observed near the Earth. The basis for this analysis was the events that took place in 1976-2015, for which there are reliable observations of X-ray flares on GOES satellites and CME observations with SOHO/LASCO coronagraphs. A fairly good correlation has been revealed between the magnitude of proton enhancements and the power and duration of flares, as well as the initial CME speed. The statistics do not give a clear advantage either to CMEs or the flares concerning their relation with proton events, but the characteristics of the flares and ejections complement each other well and are reasonable to use together in the forecast models. Numerical dependences are obtained that allow estimation of the proton fluxes to the Earth expected from solar observations; possibilities for improving the model are discussed.
Good analytical practice: statistics and handling data in biomedical science. A primer and directions for authors. Part 1: Introduction. Data within and between one or two sets of individuals.

PubMed

Blann, A D; Nation, B R

2008-01-01

The biomedical scientist is bombarded on a daily basis by information, almost all of which refers to the health status of an individual or groups of individuals. This review is the first of a two-part article written to explain some of the issues related to the presentation and analysis of data. The first part focuses on types of data and how to present and analyse data from an individual or from one or two groups of persons. The second part will examine data from three or more sets of persons, what methods are available to allow this analysis (i.e., statistical software packages), and will conclude with a statement on appropriate descriptors of data, their analyses, and presentation for authors considering submission of their data to this journal.
Physical restraints in an Italian psychiatric ward: clinical reasons and staff organization problems.

PubMed

Di Lorenzo, Rosaria; Baraldi, Sara; Ferrara, Maria; Mimmi, Stefano; Rigatelli, Marco

2012-04-01

To analyze physical restraint use in an Italian acute psychiatric ward, where mechanical restraint by belt is highly discouraged but allowed. Data were retrospectively collected from medical and nursing charts, from January 1, 2005, to December 31, 2008. Physical restraint rate and relationships between restraints and selected variables were statistically analyzed. Restraints were statistically significantly more frequent in compulsory or voluntary admissions of patients with an altered state of consciousness, at night, to control aggressive behavior, and in patients with "Schizophrenia and other Psychotic Disorders" during the first 72 hr of hospitalization. Analysis of clinical and organizational factors conditioning restraints may limit its use. © 2011 Wiley Periodicals, Inc.
A statistical model for radar images of agricultural scenes

NASA Technical Reports Server (NTRS)

Frost, V. S.; Shanmugan, K. S.; Holtzman, J. C.; Stiles, J. A.

1982-01-01

The presently derived and validated statistical model for radar images containing many different homogeneous fields predicts the probability density functions of radar images of entire agricultural scenes, thereby allowing histograms of large scenes composed of a variety of crops to be described. Seasat-A SAR images of agricultural scenes are accurately predicted by the model on the basis of three assumptions: each field has the same SNR, all target classes cover approximately the same area, and the true reflectivity characterizing each individual target class is a uniformly distributed random variable. The model is expected to be useful in the design of data processing algorithms and for scene analysis using radar images.
Experimental assessment of the spatial variability of porosity, permeability and sorption isotherms in an ordinary building concrete

NASA Astrophysics Data System (ADS)

Issaadi, N.; Hamami, A. A.; Belarbi, R.; Aït-Mokhtar, A.

2017-10-01

In this paper, spatial variabilities of some transfer and storage properties of a concrete wall were assessed. The studied parameters deal with water porosity, water vapor permeability, intrinsic permeability and water vapor sorption isotherms. For this purpose, a concrete wall was built in the laboratory and specimens were periodically taken and tested. The obtained results allow highlighting a statistical estimation of the mean value, the standard deviation and the spatial correlation length of the studied fields for each parameter. These results were discussed and a statistical analysis was performed in order to assess for each of these parameters the appropriate probability density function.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Marekova, Elisaveta

Series of relatively large earthquakes in different regions of the Earth are studied. The regions chooses are of a high seismic activity and has a good contemporary network for recording of the seismic events along them. The main purpose of this investigation is the attempt to describe analytically the seismic process in the space and time. We are considering the statistical distributions the distances and the times between consecutive earthquakes (so called pair analysis). Studies conducted on approximating the statistical distribution of the parameters of consecutive seismic events indicate the existence of characteristic functions that describe them best. Such amore » mathematical description allows the distributions of the examined parameters to be compared to other model distributions.« less
Andreev Bound States Formation and Quasiparticle Trapping in Quench Dynamics Revealed by Time-Dependent Counting Statistics.

PubMed

Souto, R Seoane; Martín-Rodero, A; Yeyati, A Levy

2016-12-23

We analyze the quantum quench dynamics in the formation of a phase-biased superconducting nanojunction. We find that in the absence of an external relaxation mechanism and for very general conditions the system gets trapped in a metastable state, corresponding to a nonequilibrium population of the Andreev bound states. The use of the time-dependent full counting statistics analysis allows us to extract information on the asymptotic population of even and odd many-body states, demonstrating that a universal behavior, dependent only on the Andreev state energy, is reached in the quantum point contact limit. These results shed light on recent experimental observations on quasiparticle trapping in superconducting atomic contacts.
Ecological Momentary Assessments and Automated Time Series Analysis to Promote Tailored Health Care: A Proof-of-Principle Study.

PubMed

van der Krieke, Lian; Emerencia, Ando C; Bos, Elisabeth H; Rosmalen, Judith Gm; Riese, Harriëtte; Aiello, Marco; Sytema, Sjoerd; de Jonge, Peter

2015-08-07

Health promotion can be tailored by combining ecological momentary assessments (EMA) with time series analysis. This combined method allows for studying the temporal order of dynamic relationships among variables, which may provide concrete indications for intervention. However, application of this method in health care practice is hampered because analyses are conducted manually and advanced statistical expertise is required. This study aims to show how this limitation can be overcome by introducing automated vector autoregressive modeling (VAR) of EMA data and to evaluate its feasibility through comparisons with results of previously published manual analyses. We developed a Web-based open source application, called AutoVAR, which automates time series analyses of EMA data and provides output that is intended to be interpretable by nonexperts. The statistical technique we used was VAR. AutoVAR tests and evaluates all possible VAR models within a given combinatorial search space and summarizes their results, thereby replacing the researcher's tasks of conducting the analysis, making an informed selection of models, and choosing the best model. We compared the output of AutoVAR to the output of a previously published manual analysis (n=4). An illustrative example consisting of 4 analyses was provided. Compared to the manual output, the AutoVAR output presents similar model characteristics and statistical results in terms of the Akaike information criterion, the Bayesian information criterion, and the test statistic of the Granger causality test. Results suggest that automated analysis and interpretation of times series is feasible. Compared to a manual procedure, the automated procedure is more robust and can save days of time. These findings may pave the way for using time series analysis for health promotion on a larger scale. AutoVAR was evaluated using the results of a previously conducted manual analysis. Analysis of additional datasets is needed in order to validate and refine the application for general use.
Nonparametric Residue Analysis of Dynamic PET Data With Application to Cerebral FDG Studies in Normals.

PubMed

O'Sullivan, Finbarr; Muzi, Mark; Spence, Alexander M; Mankoff, David M; O'Sullivan, Janet N; Fitzgerald, Niall; Newman, George C; Krohn, Kenneth A

2009-06-01

Kinetic analysis is used to extract metabolic information from dynamic positron emission tomography (PET) uptake data. The theory of indicator dilutions, developed in the seminal work of Meier and Zierler (1954), provides a probabilistic framework for representation of PET tracer uptake data in terms of a convolution between an arterial input function and a tissue residue. The residue is a scaled survival function associated with tracer residence in the tissue. Nonparametric inference for the residue, a deconvolution problem, provides a novel approach to kinetic analysis-critically one that is not reliant on specific compartmental modeling assumptions. A practical computational technique based on regularized cubic B-spline approximation of the residence time distribution is proposed. Nonparametric residue analysis allows formal statistical evaluation of specific parametric models to be considered. This analysis needs to properly account for the increased flexibility of the nonparametric estimator. The methodology is illustrated using data from a series of cerebral studies with PET and fluorodeoxyglucose (FDG) in normal subjects. Comparisons are made between key functionals of the residue, tracer flux, flow, etc., resulting from a parametric (the standard two-compartment of Phelps et al. 1979) and a nonparametric analysis. Strong statistical evidence against the compartment model is found. Primarily these differences relate to the representation of the early temporal structure of the tracer residence-largely a function of the vascular supply network. There are convincing physiological arguments against the representations implied by the compartmental approach but this is the first time that a rigorous statistical confirmation using PET data has been reported. The compartmental analysis produces suspect values for flow but, notably, the impact on the metabolic flux, though statistically significant, is limited to deviations on the order of 3%-4%. The general advantage of the nonparametric residue analysis is the ability to provide a valid kinetic quantitation in the context of studies where there may be heterogeneity or other uncertainty about the accuracy of a compartmental model approximation of the tissue residue.
Ecological Momentary Assessments and Automated Time Series Analysis to Promote Tailored Health Care: A Proof-of-Principle Study

PubMed Central

Emerencia, Ando C; Bos, Elisabeth H; Rosmalen, Judith GM; Riese, Harriëtte; Aiello, Marco; Sytema, Sjoerd; de Jonge, Peter

2015-01-01

Background Health promotion can be tailored by combining ecological momentary assessments (EMA) with time series analysis. This combined method allows for studying the temporal order of dynamic relationships among variables, which may provide concrete indications for intervention. However, application of this method in health care practice is hampered because analyses are conducted manually and advanced statistical expertise is required. Objective This study aims to show how this limitation can be overcome by introducing automated vector autoregressive modeling (VAR) of EMA data and to evaluate its feasibility through comparisons with results of previously published manual analyses. Methods We developed a Web-based open source application, called AutoVAR, which automates time series analyses of EMA data and provides output that is intended to be interpretable by nonexperts. The statistical technique we used was VAR. AutoVAR tests and evaluates all possible VAR models within a given combinatorial search space and summarizes their results, thereby replacing the researcher’s tasks of conducting the analysis, making an informed selection of models, and choosing the best model. We compared the output of AutoVAR to the output of a previously published manual analysis (n=4). Results An illustrative example consisting of 4 analyses was provided. Compared to the manual output, the AutoVAR output presents similar model characteristics and statistical results in terms of the Akaike information criterion, the Bayesian information criterion, and the test statistic of the Granger causality test. Conclusions Results suggest that automated analysis and interpretation of times series is feasible. Compared to a manual procedure, the automated procedure is more robust and can save days of time. These findings may pave the way for using time series analysis for health promotion on a larger scale. AutoVAR was evaluated using the results of a previously conducted manual analysis. Analysis of additional datasets is needed in order to validate and refine the application for general use. PMID:26254160
A statistical study of EMIC waves observed by Cluster. 1. Wave properties. EMIC Wave Properties

DOE PAGES

Allen, R. C.; Zhang, J. -C.; Kistler, L. M.; ...

2015-07-23

Electromagnetic ion cyclotron (EMIC) waves are an important mechanism for particle energization and losses inside the magnetosphere. In order to better understand the effects of these waves on particle dynamics, detailed information about the occurrence rate, wave power, ellipticity, normal angle, energy propagation angle distributions, and local plasma parameters are required. Previous statistical studies have used in situ observations to investigate the distribution of these parameters in the magnetic local time versus L-shell (MLT-L) frame within a limited magnetic latitude (MLAT) range. In our study, we present a statistical analysis of EMIC wave properties using 10 years (2001–2010) of datamore » from Cluster, totaling 25,431 min of wave activity. Due to the polar orbit of Cluster, we are able to investigate EMIC waves at all MLATs and MLTs. This allows us to further investigate the MLAT dependence of various wave properties inside different MLT sectors and further explore the effects of Shabansky orbits on EMIC wave generation and propagation. Thus, the statistical analysis is presented in two papers. OUr paper focuses on the wave occurrence distribution as well as the distribution of wave properties. The companion paper focuses on local plasma parameters during wave observations as well as wave generation proxies.« less
GPUs for statistical data analysis in HEP: a performance study of GooFit on GPUs vs. RooFit on CPUs

NASA Astrophysics Data System (ADS)

Pompili, Alexis; Di Florio, Adriano; CMS Collaboration

2016-10-01

In order to test the computing capabilities of GPUs with respect to traditional CPU cores a high-statistics toy Monte Carlo technique has been implemented both in ROOT/RooFit and GooFit frameworks with the purpose to estimate the statistical significance of the structure observed by CMS close to the kinematical boundary of the Jψϕ invariant mass in the three-body decay B +→JψϕK +. GooFit is a data analysis open tool under development that interfaces ROOT/RooFit to CUDA platform on nVidia GPU. The optimized GooFit application running on GPUs hosted by servers in the Bari Tier2 provides striking speed-up performances with respect to the RooFit application parallelised on multiple CPUs by means of PROOF-Lite tool. The considerably resulting speed-up, while comparing concurrent GooFit processes allowed by CUDA Multi Process Service and a RooFit/PROOF-Lite process with multiple CPU workers, is presented and discussed in detail. By means of GooFit it has also been possible to explore the behaviour of a likelihood ratio test statistic in different situations in which the Wilks Theorem may apply or does not apply because its regularity conditions are not satisfied.
Developing Statistical Evaluation Model of Introduction Effect of MSW Thermal Recycling

NASA Astrophysics Data System (ADS)

Aoyama, Makoto; Kato, Takeyoshi; Suzuoki, Yasuo

For the effective utilization of municipal solid waste (MSW) through a thermal recycling, new technologies, such as an incineration plant using a Molten Carbonate Fuel Cell (MCFC), are being developed. The impact of new technologies should be evaluated statistically for various municipalities, so that the target of technological development or potential cost reduction due to the increased cumulative number of installed system can be discussed. For this purpose, we developed a model for discussing the impact of new technologies, where a statistical mesh data set was utilized to estimate the heat demand around the incineration plant. This paper examines a case study by using a developed model, where a conventional type and a MCFC type MSW incineration plant is compared in terms of the reduction in primary energy and the revenue by both electricity and heat supply. Based on the difference in annual revenue, we calculate the allowable investment in MCFC-type MSW incineration plant in addition to conventional plant. The results suggest that allowable investment can be about 30 millions yen/(t/day) in small municipalities, while it is only 10 millions yen/(t/day) in large municipalities. The sensitive analysis shows the model can be useful for discussing the difference of impact of material recycling of plastics on thermal recycling technologies.

Statistical quantifiers of memory for an analysis of human brain and neuro-system diseases

NASA Astrophysics Data System (ADS)

Demin, S. A.; Yulmetyev, R. M.; Panischev, O. Yu.; Hänggi, Peter

2008-03-01

On the basis of a memory function formalism for correlation functions of time series we investigate statistical memory effects by the use of appropriate spectral and relaxation parameters of measured stochastic data for neuro-system diseases. In particular, we study the dynamics of the walk of a patient who suffers from Parkinson's disease (PD), Huntington's disease (HD), amyotrophic lateral sclerosis (ALS), and compare against the data of healthy people (CO - control group). We employ an analytical method which is able to characterize the stochastic properties of stride-to-stride variations of gait cycle timing. Our results allow us to estimate quantitatively a few human locomotion function abnormalities occurring in the human brain and in the central nervous system (CNS). Particularly, the patient's gait dynamics are characterized by an increased memory behavior together with sizable fluctuations as compared with the locomotion dynamics of healthy patients. Moreover, we complement our findings with peculiar features as detected in phase-space portraits and spectral characteristics for the different data sets (PD, HD, ALS and healthy people). The evaluation of statistical quantifiers of the memory function is shown to provide a useful toolkit which can be put to work to identify various abnormalities of locomotion dynamics. Moreover, it allows one to diagnose qualitatively and quantitatively serious brain and central nervous system diseases.
Verify MesoNAM Performance

NASA Technical Reports Server (NTRS)

Bauman, William H., III

2010-01-01

The AMU conducted an objective analysis of the MesoNAM forecasts compared to observed values from sensors at specified KSC/CCAFS wind towers by calculating the following statistics to verify the performance of the model: 1) Bias (mean difference), 2) Standard deviation of Bias, 3) Root Mean Square Error (RMSE), and 4) Hypothesis test for Bias = O. The 45 WS LWOs use the MesoNAM to support launch weather operations. However, the actual performance of the model at KSC and CCAFS had not been measured objectively. The analysis compared the MesoNAM forecast winds, temperature and dew point to the observed values from the sensors on wind towers. The data were stratified by tower sensor, month and onshore/offshore wind direction based on the orientation of the coastline to each tower's location. The model's performance statistics were then calculated for each wind tower based on sensor height and model initialization time. The period of record for the data used in this task was based on the operational start of the current MesoNAM in mid-August 2006 and so the task began with the first full month of data, September 2006, through May 2010. The analysis of model performance indicated: a) The accuracy decreased as the forecast valid time from the model initialization increased, b) There was a diurnal signal in T with a cool bias during the late night and a warm bias during the afternoon, c) There was a diurnal signal in Td with a low bias during the afternoon and a high bias during the late night, and d) The model parameters at each vertical level most closely matched the observed parameters at heights closest to those vertical levels. The AMU developed a GUI that consists of a multi-level drop-down menu written in JavaScript embedded within the HTML code. This tool allows the LWO to easily and efficiently navigate among the charts and spreadsheet files containing the model performance statistics. The objective statistics give the LWOs knowledge of the model's strengths and weaknesses and the GUI allows quick access to the data which will result in improved forecasts for operations.
ProteoSign: an end-user online differential proteomics statistical analysis platform.

PubMed

Efstathiou, Georgios; Antonakis, Andreas N; Pavlopoulos, Georgios A; Theodosiou, Theodosios; Divanach, Peter; Trudgian, David C; Thomas, Benjamin; Papanikolaou, Nikolas; Aivaliotis, Michalis; Acuto, Oreste; Iliopoulos, Ioannis

2017-07-03

Profiling of proteome dynamics is crucial for understanding cellular behavior in response to intrinsic and extrinsic stimuli and maintenance of homeostasis. Over the last 20 years, mass spectrometry (MS) has emerged as the most powerful tool for large-scale identification and characterization of proteins. Bottom-up proteomics, the most common MS-based proteomics approach, has always been challenging in terms of data management, processing, analysis and visualization, with modern instruments capable of producing several gigabytes of data out of a single experiment. Here, we present ProteoSign, a freely available web application, dedicated in allowing users to perform proteomics differential expression/abundance analysis in a user-friendly and self-explanatory way. Although several non-commercial standalone tools have been developed for post-quantification statistical analysis of proteomics data, most of them are not end-user appealing as they often require very stringent installation of programming environments, third-party software packages and sometimes further scripting or computer programming. To avoid this bottleneck, we have developed a user-friendly software platform accessible via a web interface in order to enable proteomics laboratories and core facilities to statistically analyse quantitative proteomics data sets in a resource-efficient manner. ProteoSign is available at http://bioinformatics.med.uoc.gr/ProteoSign and the source code at https://github.com/yorgodillo/ProteoSign. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Identifying drought response of semi-arid aeolian systems using near-surface luminescence profiles and changepoint analysis, Nebraska Sandhills.

NASA Astrophysics Data System (ADS)

Buckland, Catherine; Bailey, Richard; Thomas, David

2017-04-01

Two billion people living in drylands are affected by land degradation. Sediment erosion by wind and water removes fertile soil and destabilises landscapes. Vegetation disturbance is a key driver of dryland erosion caused by both natural and human forcings: drought, fire, land use, grazing pressure. A quantified understanding of vegetation cover sensitivities and resultant surface change to forcing factors is needed if the vegetation and landscape response to future climate change and human pressure are to be better predicted. Using quartz luminescence dating and statistical changepoint analysis (Killick & Eckley, 2014) this study demonstrates the ability to identify step-changes in depositional age of near-surface sediments. Lx/Tx luminescence profiles coupled with statistical analysis show the use of near-surface sediments in providing a high-resolution record of recent system response and aeolian system thresholds. This research determines how the environment has recorded and retained sedimentary evidence of drought response and land use disturbances over the last two hundred years across both individual landforms and the wider Nebraska Sandhills. Identifying surface deposition and comparing with records of climate, fire and land use changes allows us to assess the sensitivity and stability of the surface sediment to a range of forcing factors. Killick, R and Eckley, IA. (2014) "changepoint: An R Package for Changepoint Analysis." Journal of Statistical Software, (58) 1-19.
Clustering analysis for muon tomography data elaboration in the Muon Portal project

NASA Astrophysics Data System (ADS)

Bandieramonte, M.; Antonuccio-Delogu, V.; Becciani, U.; Costa, A.; La Rocca, P.; Massimino, P.; Petta, C.; Pistagna, C.; Riggi, F.; Riggi, S.; Sciacca, E.; Vitello, F.

2015-05-01

Clustering analysis is one of multivariate data analysis techniques which allows to gather statistical data units into groups, in order to minimize the logical distance within each group and to maximize the one between different groups. In these proceedings, the authors present a novel approach to the muontomography data analysis based on clustering algorithms. As a case study we present the Muon Portal project that aims to build and operate a dedicated particle detector for the inspection of harbor containers to hinder the smuggling of nuclear materials. Clustering techniques, working directly on scattering points, help to detect the presence of suspicious items inside the container, acting, as it will be shown, as a filter for a preliminary analysis of the data.
A behavior-analytic view of psychological health

PubMed Central

Follette, William C.; Bach, Patricia A.; Follette, Victoria M.

1993-01-01

This paper argues that a behavioral analysis of psychological health is useful and appropriate. Such an analysis will allow us to better evaluate intervention outcomes without resorting only to the assessment of pathological behavior, thus providing an alternative to the Diagnostic and Statistical Manual system of conceptualizing behavior. The goals of such an analysis are to distinguish between people and outcomes using each term of the three-term contingency as a dimension to consider. A brief review of other efforts to define psychological health is provided. Laboratory approaches to a behavioral analysis of healthy behavior are presented along with shortcomings in our science that impede our analysis. Finally, we present some of the functional characteristics of psychological health that we value. PMID:22478160
Laser scanning cytometry as a tool for biomarker validation

NASA Astrophysics Data System (ADS)

Mittag, Anja; Füldner, Christiane; Lehmann, Jörg; Tarnok, Attila

2013-03-01

Biomarkers are essential for diagnosis, prognosis, and therapy. As diverse is the range of diseases the broad is the range of biomarkers and the material used for analysis. Whereas body fluids can be relatively easily obtained and analyzed, the investigation of tissue is in most cases more complicated. The same applies for the screening and the evaluation of new biomarkers and the estimation of the binding of biomarkers found in animal models which need to be transferred into applications in humans. The latter in particular is difficult if it recognizes proteins or cells in tissue. A better way to find suitable cellular biomarkers for immunoscintigraphy or PET analyses may be therefore the in situ analysis of the cells in the respective tissue. In this study we present a method for biomarker validation using Laser Scanning Cytometry which allows the emulation of future in vivo analysis. The biomarker validation is exemplarily shown for rheumatoid arthritis (RA) on synovial membrane. Cryosections were scanned and analyzed by phantom contouring. Adequate statistical methods allowed the identification of suitable markers and combinations. The fluorescence analysis of the phantoms allowed the discrimination between synovial membrane of RA patients and non-RA control sections by using median fluorescence intensity and the "affected area". As intensity and area are relevant parameters of in vivo imaging (e.g. PET scan) too, the presented method allows emulation of a probable outcome of in vivo imaging, i.e. the binding of the target protein and hence, the validation of the potential of the respective biomarker.
Effects of Consecutive Basketball Games on the Game-Related Statistics that Discriminate Winner and Losing Teams

PubMed Central

Ibáñez, Sergio J.; García, Javier; Feu, Sebastian; Lorenzo, Alberto; Sampaio, Jaime

2009-01-01

The aim of the present study was to identify the game-related statistics that discriminated basketball winning and losing teams in each of the three consecutive games played in a condensed tournament format. The data were obtained from the Spanish Basketball Federation and included game-related statistics from the Under-20 league (2005-2006 and 2006-2007 seasons). A total of 223 games were analyzed with the following game-related statistics: two and three-point field goal (made and missed), free-throws (made and missed), offensive and defensive rebounds, assists, steals, turnovers, blocks (made and received), fouls committed, ball possessions and offensive rating. Results showed that winning teams in this competition had better values in all game-related statistics, with the exception of three point field goals made, free-throws missed and turnovers (p ≥ 0.05). The main effect of game number was only identified in turnovers, with a statistical significant decrease between the second and third game. No interaction was found in the analysed variables. A discriminant analysis allowed identifying the two-point field goals made, the defensive rebounds and the assists as discriminators between winning and losing teams in all three games. Additionally to these, only the three-point field goals made contributed to discriminate teams in game three, suggesting a moderate effect of fatigue. Coaches may benefit from being aware of this variation in game determinant related statistics and, also, from using offensive and defensive strategies in the third game, allowing to explore or hide the three point field-goals performance. Key points Overall team performances along the three consecutive games were very similar, not confirming an accumulated fatigue effect. The results from the three-point field goals in the third game suggested that winning teams were able to shoot better from longer distances and this could be the result of exhibiting higher conditioning status and/or the losing teams’ exhibiting low conditioning in defense. PMID:24150011
GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis.

PubMed

Zheng, Qi; Wang, Xiu-Jie

2008-07-01

Gene Ontology (GO) analysis has become a commonly used approach for functional studies of large-scale genomic or transcriptomic data. Although there have been a lot of software with GO-related analysis functions, new tools are still needed to meet the requirements for data generated by newly developed technologies or for advanced analysis purpose. Here, we present a Gene Ontology Enrichment Analysis Software Toolkit (GOEAST), an easy-to-use web-based toolkit that identifies statistically overrepresented GO terms within given gene sets. Compared with available GO analysis tools, GOEAST has the following improved features: (i) GOEAST displays enriched GO terms in graphical format according to their relationships in the hierarchical tree of each GO category (biological process, molecular function and cellular component), therefore, provides better understanding of the correlations among enriched GO terms; (ii) GOEAST supports analysis for data from various sources (probe or probe set IDs of Affymetrix, Illumina, Agilent or customized microarrays, as well as different gene identifiers) and multiple species (about 60 prokaryote and eukaryote species); (iii) One unique feature of GOEAST is to allow cross comparison of the GO enrichment status of multiple experiments to identify functional correlations among them. GOEAST also provides rigorous statistical tests to enhance the reliability of analysis results. GOEAST is freely accessible at http://omicslab.genetics.ac.cn/GOEAST/
Separate-channel analysis of two-channel microarrays: recovering inter-spot information.

PubMed

Smyth, Gordon K; Altman, Naomi S

2013-05-26

Two-channel (or two-color) microarrays are cost-effective platforms for comparative analysis of gene expression. They are traditionally analysed in terms of the log-ratios (M-values) of the two channel intensities at each spot, but this analysis does not use all the information available in the separate channel observations. Mixed models have been proposed to analyse intensities from the two channels as separate observations, but such models can be complex to use and the gain in efficiency over the log-ratio analysis is difficult to quantify. Mixed models yield test statistics for the null distributions can be specified only approximately, and some approaches do not borrow strength between genes. This article reformulates the mixed model to clarify the relationship with the traditional log-ratio analysis, to facilitate information borrowing between genes, and to obtain an exact distributional theory for the resulting test statistics. The mixed model is transformed to operate on the M-values and A-values (average log-expression for each spot) instead of on the log-expression values. The log-ratio analysis is shown to ignore information contained in the A-values. The relative efficiency of the log-ratio analysis is shown to depend on the size of the intraspot correlation. A new separate channel analysis method is proposed that assumes a constant intra-spot correlation coefficient across all genes. This approach permits the mixed model to be transformed into an ordinary linear model, allowing the data analysis to use a well-understood empirical Bayes analysis pipeline for linear modeling of microarray data. This yields statistically powerful test statistics that have an exact distributional theory. The log-ratio, mixed model and common correlation methods are compared using three case studies. The results show that separate channel analyses that borrow strength between genes are more powerful than log-ratio analyses. The common correlation analysis is the most powerful of all. The common correlation method proposed in this article for separate-channel analysis of two-channel microarray data is no more difficult to apply in practice than the traditional log-ratio analysis. It provides an intuitive and powerful means to conduct analyses and make comparisons that might otherwise not be possible.
Something old, something new, something borrowed, something blue: a framework for the marriage of health econometrics and cost-effectiveness analysis.

PubMed

Hoch, Jeffrey S; Briggs, Andrew H; Willan, Andrew R

2002-07-01

Economic evaluation is often seen as a branch of health economics divorced from mainstream econometric techniques. Instead, it is perceived as relying on statistical methods for clinical trials. Furthermore, the statistic of interest in cost-effectiveness analysis, the incremental cost-effectiveness ratio is not amenable to regression-based methods, hence the traditional reliance on comparing aggregate measures across the arms of a clinical trial. In this paper, we explore the potential for health economists undertaking cost-effectiveness analysis to exploit the plethora of established econometric techniques through the use of the net-benefit framework - a recently suggested reformulation of the cost-effectiveness problem that avoids the reliance on cost-effectiveness ratios and their associated statistical problems. This allows the formulation of the cost-effectiveness problem within a standard regression type framework. We provide an example with empirical data to illustrate how a regression type framework can enhance the net-benefit method. We go on to suggest that practical advantages of the net-benefit regression approach include being able to use established econometric techniques, adjust for imperfect randomisation, and identify important subgroups in order to estimate the marginal cost-effectiveness of an intervention. Copyright 2002 John Wiley & Sons, Ltd.
Empirical validation of statistical parametric mapping for group imaging of fast neural activity using electrical impedance tomography.

PubMed

Packham, B; Barnes, G; Dos Santos, G Sato; Aristovich, K; Gilad, O; Ghosh, A; Oh, T; Holder, D

2016-06-01

Electrical impedance tomography (EIT) allows for the reconstruction of internal conductivity from surface measurements. A change in conductivity occurs as ion channels open during neural activity, making EIT a potential tool for functional brain imaging. EIT images can have >10 000 voxels, which means statistical analysis of such images presents a substantial multiple testing problem. One way to optimally correct for these issues and still maintain the flexibility of complicated experimental designs is to use random field theory. This parametric method estimates the distribution of peaks one would expect by chance in a smooth random field of a given size. Random field theory has been used in several other neuroimaging techniques but never validated for EIT images of fast neural activity, such validation can be achieved using non-parametric techniques. Both parametric and non-parametric techniques were used to analyze a set of 22 images collected from 8 rats. Significant group activations were detected using both techniques (corrected p < 0.05). Both parametric and non-parametric analyses yielded similar results, although the latter was less conservative. These results demonstrate the first statistical analysis of such an image set and indicate that such an analysis is an approach for EIT images of neural activity.
Low energy peripheral scaling in nucleon-nucleon scattering and uncertainty quantification

NASA Astrophysics Data System (ADS)

Ruiz Simo, I.; Amaro, J. E.; Ruiz Arriola, E.; Navarro Pérez, R.

2018-03-01

We analyze the peripheral structure of the nucleon-nucleon interaction for LAB energies below 350 MeV. To this end we transform the scattering matrix into the impact parameter representation by analyzing the scaled phase shifts (L + 1/2) δ JLS (p) and the scaled mixing parameters (L + 1/2)ɛ JLS (p) in terms of the impact parameter b = (L + 1/2)/p. According to the eikonal approximation, at large angular momentum L these functions should become an universal function of b, independent on L. This allows to discuss in a rather transparent way the role of statistical and systematic uncertainties in the different long range components of the two-body potential. Implications for peripheral waves obtained in chiral perturbation theory interactions to fifth order (N5LO) or from the large body of NN data considered in the SAID partial wave analysis are also drawn from comparing them with other phenomenological high-quality interactions, constructed to fit scattering data as well. We find that both N5LO and SAID peripheral waves disagree more than 5σ with the Granada-2013 statistical analysis, more than 2σ with the 6 statistically equivalent potentials fitting the Granada-2013 database and about 1σ with the historical set of 13 high-quality potentials developed since the 1993 Nijmegen analysis.
Empirical validation of statistical parametric mapping for group imaging of fast neural activity using electrical impedance tomography

PubMed Central

Packham, B; Barnes, G; dos Santos, G Sato; Aristovich, K; Gilad, O; Ghosh, A; Oh, T; Holder, D

2016-01-01

Abstract Electrical impedance tomography (EIT) allows for the reconstruction of internal conductivity from surface measurements. A change in conductivity occurs as ion channels open during neural activity, making EIT a potential tool for functional brain imaging. EIT images can have >10 000 voxels, which means statistical analysis of such images presents a substantial multiple testing problem. One way to optimally correct for these issues and still maintain the flexibility of complicated experimental designs is to use random field theory. This parametric method estimates the distribution of peaks one would expect by chance in a smooth random field of a given size. Random field theory has been used in several other neuroimaging techniques but never validated for EIT images of fast neural activity, such validation can be achieved using non-parametric techniques. Both parametric and non-parametric techniques were used to analyze a set of 22 images collected from 8 rats. Significant group activations were detected using both techniques (corrected p < 0.05). Both parametric and non-parametric analyses yielded similar results, although the latter was less conservative. These results demonstrate the first statistical analysis of such an image set and indicate that such an analysis is an approach for EIT images of neural activity. PMID:27203477
Nearfield Summary and Statistical Analysis of the Second AIAA Sonic Boom Prediction Workshop

NASA Technical Reports Server (NTRS)

Park, Michael A.; Nemec, Marian

2017-01-01

A summary is provided for the Second AIAA Sonic Boom Workshop held 8-9 January 2017 in conjunction with AIAA SciTech 2017. The workshop used three required models of increasing complexity: an axisymmetric body, a wing body, and a complete configuration with flow-through nacelle. An optional complete configuration with propulsion boundary conditions is also provided. These models are designed with similar nearfield signatures to isolate geometry and shock/expansion interaction effects. Eleven international participant groups submitted nearfield signatures with forces, pitching moment, and iterative convergence norms. Statistics and grid convergence of these nearfield signatures are presented. These submissions are propagated to the ground, and noise levels are computed. This allows the grid convergence and the statistical distribution of a noise level to be computed. While progress is documented since the first workshop, improvement to the analysis methods for a possible subsequent workshop are provided. The complete configuration with flow-through nacelle showed the most dramatic improvement between the two workshops. The current workshop cases are more relevant to vehicles with lower loudness and have the potential for lower annoyance than the first workshop cases. The models for this workshop with quieter ground noise levels than the first workshop exposed weaknesses in analysis, particularly in convective discretization.
COV2HTML: a visualization and analysis tool of bacterial next generation sequencing (NGS) data for postgenomics life scientists.

PubMed

Monot, Marc; Orgeur, Mickael; Camiade, Emilie; Brehier, Clément; Dupuy, Bruno

2014-03-01

COV2HTML is an interactive web interface, which is addressed to biologists, and allows performing both coverage visualization and analysis of NGS alignments performed on prokaryotic organisms (bacteria and phages). It combines two processes: a tool that converts the huge NGS mapping or coverage files into light specific coverage files containing information on genetic elements; and a visualization interface allowing a real-time analysis of data with optional integration of statistical results. To demonstrate the scope of COV2HTML, the program was tested with data from two published studies. The first data were from RNA-seq analysis of Campylobacter jejuni, based on comparison of two conditions with two replicates. We were able to recover 26 out of 27 genes highlighted in the publication using COV2HTML. The second data comprised of stranded TSS and RNA-seq data sets on the Archaea Sulfolobus solfataricus. COV2HTML was able to highlight most of the TSSs from the article and allows biologists to visualize both TSS and RNA-seq on the same screen. The strength of the COV2HTML interface is making possible NGS data analysis without software installation, login, or a long training period. A web version is accessible at https://mmonot.eu/COV2HTML/ . This website is free and open to users without any login requirement.
Project risk management in the construction of high-rise buildings

NASA Astrophysics Data System (ADS)

Titarenko, Boris; Hasnaoui, Amir; Titarenko, Roman; Buzuk, Liliya

2018-03-01

This paper shows the project risk management methods, which allow to better identify risks in the construction of high-rise buildings and to manage them throughout the life cycle of the project. One of the project risk management processes is a quantitative analysis of risks. The quantitative analysis usually includes the assessment of the potential impact of project risks and their probabilities. This paper shows the most popular methods of risk probability assessment and tries to indicate the advantages of the robust approach over the traditional methods. Within the framework of the project risk management model a robust approach of P. Huber is applied and expanded for the tasks of regression analysis of project data. The suggested algorithms used to assess the parameters in statistical models allow to obtain reliable estimates. A review of the theoretical problems of the development of robust models built on the methodology of the minimax estimates was done and the algorithm for the situation of asymmetric "contamination" was developed.
A computer analysis of ERTS data of the Lake Gregory area of South Australia with particular emphasis on its role in terrain classification for engineering. M.S. Thesis

NASA Technical Reports Server (NTRS)

Lodwick, G. D. (Principal Investigator)

1976-01-01

A digital computer and multivariate statistical techniques were used to analyze 4-band multispectral data. A representation of the original data for each of the four bands allows a certain degree of terrain interpretation; however, variations in appearance of sites within and between bands, without additional criteria for deciding which representation should be preferred, create difficulties for classification. Investigation of the video data groups produced by principal components analysis and cluster analysis techniques shows that effective correlations with classifications of terrain produced by conventional methods could be carried out. The analyses also highlighted underlying relationships between the various elements. The approach used allows large areas (185 cm by 185 cm) to be classified into fundamental units within a matter of hours and can be applied to those parts of the Earth where facilities for conventional studies are poor or lacking.
Moment-based metrics for global sensitivity analysis of hydrological systems

NASA Astrophysics Data System (ADS)

Dell'Oca, Aronne; Riva, Monica; Guadagnini, Alberto

2017-12-01

We propose new metrics to assist global sensitivity analysis, GSA, of hydrological and Earth systems. Our approach allows assessing the impact of uncertain parameters on main features of the probability density function, pdf, of a target model output, y. These include the expected value of y, the spread around the mean and the degree of symmetry and tailedness of the pdf of y. Since reliable assessment of higher-order statistical moments can be computationally demanding, we couple our GSA approach with a surrogate model, approximating the full model response at a reduced computational cost. Here, we consider the generalized polynomial chaos expansion (gPCE), other model reduction techniques being fully compatible with our theoretical framework. We demonstrate our approach through three test cases, including an analytical benchmark, a simplified scenario mimicking pumping in a coastal aquifer and a laboratory-scale conservative transport experiment. Our results allow ascertaining which parameters can impact some moments of the model output pdf while being uninfluential to others. We also investigate the error associated with the evaluation of our sensitivity metrics by replacing the original system model through a gPCE. Our results indicate that the construction of a surrogate model with increasing level of accuracy might be required depending on the statistical moment considered in the GSA. The approach is fully compatible with (and can assist the development of) analysis techniques employed in the context of reduction of model complexity, model calibration, design of experiment, uncertainty quantification and risk assessment.
Poster - Thur Eve - 54: A software solution for ongoing DVH quality assurance in radiation therapy.

PubMed

Annis, S-L; Zeng, G; Wu, X; Macpherson, M

2012-07-01

A program has been developed in MATLAB for use in quality assurance of treatment planning of radiation therapy. It analyzes patient DVH files and compiles dose volume data for review, trending, comparison and analysis. Patient DVH files are exported from the Eclipse treatment planning system and saved according to treatment sites and date. Currently analysis is available for 4 treatment sites; Prostate, Prostate Bed, Lung, and Upper GI, with two functions for data report and analysis: patient-specific and organ-specific. The patient-specific function loads one patient DVH file and reports the user-specified dose volume data of organs and targets. These data can be compiled to an external file for a third party analysis. The organ-specific function extracts a requested dose volume of an organ from the DVH files of a patient group and reports the statistics over this population. A graphical user interface is utilized to select clinical sites, function and structures, and input user's requests. We have implemented this program in planning quality assurance at our center. The program has tracked the dosimetric improvement in GU sites after VMAT was implemented clinically. It has generated dose volume statistics for different groups of patients associated with technique or time range. This program allows reporting and statistical analysis of DVH files. It is an efficient tool for the planning quality control in radiation therapy. © 2012 American Association of Physicists in Medicine.

Centralized Analysis of Local Data, With Dollars and Lives on the Line: Lessons From The Home Radon Experience

DOE Office of Scientific and Technical Information (OSTI.GOV)

Price, PhillipN.; Gelman, Andrew

2014-11-24

In this chapter we elucidate four main themes. The first is that modern data analyses, including "Big Data" analyses, often rely on data from different sources, which can present challenges in constructing statistical models that can make effective use of all of the data. The second theme is that although data analysis is usually centralized, frequently the final outcome is to provide information or allow decision-making for individuals. Third, data analyses often have multiple uses by design: the outcomes of the analysis are intended to be used by more than one person or group, for more than one purpose. Finally,more » issues of privacy and confidentiality can cause problems in more subtle ways than are usually considered; we will illustrate this point by discussing a case in which there is substantial and effective political opposition to simply acknowledging the geographic distribution of a health hazard. A researcher analyzes some data and learns something important. What happens next? What does it take for the results to make a difference in people's lives? In this chapter we tell a story - a true story - about a statistical analysis that should have changed government policy, but didn't. The project was a research success that did not make its way into policy, and we think it provides some useful insights into the interplay between locally-collected data, statistical analysis, and individual decision making.« less
The MetabolomeExpress Project: enabling web-based processing, analysis and transparent dissemination of GC/MS metabolomics datasets.

PubMed

Carroll, Adam J; Badger, Murray R; Harvey Millar, A

2010-07-14

Standardization of analytical approaches and reporting methods via community-wide collaboration can work synergistically with web-tool development to result in rapid community-driven expansion of online data repositories suitable for data mining and meta-analysis. In metabolomics, the inter-laboratory reproducibility of gas-chromatography/mass-spectrometry (GC/MS) makes it an obvious target for such development. While a number of web-tools offer access to datasets and/or tools for raw data processing and statistical analysis, none of these systems are currently set up to act as a public repository by easily accepting, processing and presenting publicly submitted GC/MS metabolomics datasets for public re-analysis. Here, we present MetabolomeExpress, a new File Transfer Protocol (FTP) server and web-tool for the online storage, processing, visualisation and statistical re-analysis of publicly submitted GC/MS metabolomics datasets. Users may search a quality-controlled database of metabolite response statistics from publicly submitted datasets by a number of parameters (eg. metabolite, species, organ/biofluid etc.). Users may also perform meta-analysis comparisons of multiple independent experiments or re-analyse public primary datasets via user-friendly tools for t-test, principal components analysis, hierarchical cluster analysis and correlation analysis. They may interact with chromatograms, mass spectra and peak detection results via an integrated raw data viewer. Researchers who register for a free account may upload (via FTP) their own data to the server for online processing via a novel raw data processing pipeline. MetabolomeExpress https://www.metabolome-express.org provides a new opportunity for the general metabolomics community to transparently present online the raw and processed GC/MS data underlying their metabolomics publications. Transparent sharing of these data will allow researchers to assess data quality and draw their own insights from published metabolomics datasets.
Saturation recovery EPR spin-labeling method for quantification of lipids in biological membrane domains.

PubMed

Mainali, Laxman; Camenisch, Theodore G; Hyde, James S; Subczynski, Witold K

2017-12-01

The presence of integral membrane proteins induces the formation of distinct domains in the lipid bilayer portion of biological membranes. Qualitative application of both continuous wave (CW) and saturation recovery (SR) electron paramagnetic resonance (EPR) spin-labeling methods allowed discrimination of the bulk, boundary, and trapped lipid domains. A recently developed method, which is based on the CW EPR spectra of phospholipid (PL) and cholesterol (Chol) analog spin labels, allows evaluation of the relative amount of PLs (% of total PLs) in the boundary plus trapped lipid domain and the relative amount of Chol (% of total Chol) in the trapped lipid domain [ M. Raguz, L. Mainali, W. J. O'Brien, and W. K. Subczynski (2015), Exp. Eye Res., 140:179-186 ]. Here, a new method is presented that, based on SR EPR spin-labeling, allows quantitative evaluation of the relative amounts of PLs and Chol in the trapped lipid domain of intact membranes. This new method complements the existing one, allowing acquisition of more detailed information about the distribution of lipids between domains in intact membranes. The methodological transition of the SR EPR spin-labeling approach from qualitative to quantitative is demonstrated. The abilities of this method are illustrated for intact cortical and nuclear fiber cell plasma membranes from porcine eye lenses. Statistical analysis (Student's t -test) of the data allowed determination of the separations of mean values above which differences can be treated as statistically significant ( P ≤ 0.05) and can be attributed to sources other than preparation/technique.
Interactions and triggering in a 3D rate and state asperity model

NASA Astrophysics Data System (ADS)

Dublanchet, P.; Bernard, P.

2012-12-01

Precise relocation of micro-seismicity and careful analysis of seismic source parameters have progressively imposed the concept of seismic asperities embedded in a creeping fault segment as being one of the most important aspect that should appear in a realistic representation of micro-seismic sources. Another important issue concerning micro-seismic activity is the existence of robust empirical laws describing the temporal and magnitude distribution of earthquakes, such as the Omori law, the distribution of inter-event time and the Gutenberg-Richter law. In this framework, this study aims at understanding statistical properties of earthquakes, by generating synthetic catalogs with a 3D, quasi-dynamic continuous rate and state asperity model, that takes into account a realistic geometry of asperities. Our approach contrasts with ETAS models (Kagan and Knopoff, 1981) usually implemented to produce earthquake catalogs, in the sense that the non linearity observed in rock friction experiments (Dieterich, 1979) is fully taken into account by the use of rate and state friction law. Furthermore, our model differs from discrete models of faults (Ziv and Cochard, 2006) because the continuity allows us to define realistic geometries and distributions of asperities by the assembling of sub-critical computational cells that always fail in a single event. Moreover, this model allows us to adress the question of the influence of barriers and distribution of asperities on the event statistics. After recalling the main observations of asperities in the specific case of Parkfield segment of San-Andreas Fault, we analyse earthquake statistical properties computed for this area. Then, we present synthetic statistics obtained by our model that allow us to discuss the role of barriers on clustering and triggering phenomena among a population of sources. It appears that an effective size of barrier, that depends on its frictional strength, controls the presence or the absence, in the synthetic catalog, of statistical laws that are similar to what is observed for real earthquakes. As an application, we attempt to draw a comparison between synthetic statistics and the observed statistics of Parkfield in order to characterize what could be a realistic frictional model of Parkfield area. More generally, we obtained synthetic statistical properties that are in agreement with power-law decays characterized by exponents that match the observations at a global scale, showing that our mechanical model is able to provide new insights into the understanding of earthquake interaction processes in general.
Load balancing for massively-parallel soft-real-time systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hailperin, M.

1988-09-01

Global load balancing, if practical, would allow the effective use of massively-parallel ensemble architectures for large soft-real-problems. The challenge is to replace quick global communications, which is impractical in a massively-parallel system, with statistical techniques. In this vein, the author proposes a novel approach to decentralized load balancing based on statistical time-series analysis. Each site estimates the system-wide average load using information about past loads of individual sites and attempts to equal that average. This estimation process is practical because the soft-real-time systems of interest naturally exhibit loads that are periodic, in a statistical sense akin to seasonality in econometrics.more » It is shown how this load-characterization technique can be the foundation for a load-balancing system in an architecture employing cut-through routing and an efficient multicast protocol.« less
Batch Statistical Process Monitoring Approach to a Cocrystallization Process.

PubMed

Sarraguça, Mafalda C; Ribeiro, Paulo R S; Dos Santos, Adenilson O; Lopes, João A

2015-12-01

Cocrystals are defined as crystalline structures composed of two or more compounds that are solid at room temperature held together by noncovalent bonds. Their main advantages are the increase of solubility, bioavailability, permeability, stability, and at the same time retaining active pharmaceutical ingredient bioactivity. The cocrystallization between furosemide and nicotinamide by solvent evaporation was monitored on-line using near-infrared spectroscopy (NIRS) as a process analytical technology tool. The near-infrared spectra were analyzed using principal component analysis. Batch statistical process monitoring was used to create control charts to perceive the process trajectory and define control limits. Normal and non-normal operating condition batches were performed and monitored with NIRS. The use of NIRS associated with batch statistical process models allowed the detection of abnormal variations in critical process parameters, like the amount of solvent or amount of initial components present in the cocrystallization. © 2015 Wiley Periodicals, Inc. and the American Pharmacists Association.
Computational methods to extract meaning from text and advance theories of human cognition.

PubMed

McNamara, Danielle S

2011-01-01

Over the past two decades, researchers have made great advances in the area of computational methods for extracting meaning from text. This research has to a large extent been spurred by the development of latent semantic analysis (LSA), a method for extracting and representing the meaning of words using statistical computations applied to large corpora of text. Since the advent of LSA, researchers have developed and tested alternative statistical methods designed to detect and analyze meaning in text corpora. This research exemplifies how statistical models of semantics play an important role in our understanding of cognition and contribute to the field of cognitive science. Importantly, these models afford large-scale representations of human knowledge and allow researchers to explore various questions regarding knowledge, discourse processing, text comprehension, and language. This topic includes the latest progress by the leading researchers in the endeavor to go beyond LSA. Copyright © 2010 Cognitive Science Society, Inc.
Dangerous "spin": the probability myth of evidence-based prescribing - a Merleau-Pontyian approach.

PubMed

Morstyn, Ron

2011-08-01

The aim of this study was to examine logical positivist statistical probability statements used to support and justify "evidence-based" prescribing rules in psychiatry when viewed from the major philosophical theories of probability, and to propose "phenomenological probability" based on Maurice Merleau-Ponty's philosophy of "phenomenological positivism" as a better clinical and ethical basis for psychiatric prescribing. The logical positivist statistical probability statements which are currently used to support "evidence-based" prescribing rules in psychiatry have little clinical or ethical justification when subjected to critical analysis from any of the major theories of probability and represent dangerous "spin" because they necessarily exclude the individual , intersubjective and ambiguous meaning of mental illness. A concept of "phenomenological probability" founded on Merleau-Ponty's philosophy of "phenomenological positivism" overcomes the clinically destructive "objectivist" and "subjectivist" consequences of logical positivist statistical probability and allows psychopharmacological treatments to be appropriately integrated into psychiatric treatment.
Online Updating of Statistical Inference in the Big Data Setting.

PubMed

Schifano, Elizabeth D; Wu, Jing; Wang, Chun; Yan, Jun; Chen, Ming-Hui

2016-01-01

We present statistical methods for big data arising from online analytical processing, where large amounts of data arrive in streams and require fast analysis without storage/access to the historical data. In particular, we develop iterative estimating algorithms and statistical inferences for linear models and estimating equations that update as new data arrive. These algorithms are computationally efficient, minimally storage-intensive, and allow for possible rank deficiencies in the subset design matrices due to rare-event covariates. Within the linear model setting, the proposed online-updating framework leads to predictive residual tests that can be used to assess the goodness-of-fit of the hypothesized model. We also propose a new online-updating estimator under the estimating equation setting. Theoretical properties of the goodness-of-fit tests and proposed estimators are examined in detail. In simulation studies and real data applications, our estimator compares favorably with competing approaches under the estimating equation setting.
Statistical analysis of QC data and estimation of fuel rod behaviour

NASA Astrophysics Data System (ADS)

Heins, L.; Groβ, H.; Nissen, K.; Wunderlich, F.

1991-02-01

The behaviour of fuel rods while in reactor is influenced by many parameters. As far as fabrication is concerned, fuel pellet diameter and density, and inner cladding diameter are important examples. Statistical analyses of quality control data show a scatter of these parameters within the specified tolerances. At present it is common practice to use a combination of superimposed unfavorable tolerance limits (worst case dataset) in fuel rod design calculations. Distributions are not considered. The results obtained in this way are very conservative but the degree of conservatism is difficult to quantify. Probabilistic calculations based on distributions allow the replacement of the worst case dataset by a dataset leading to results with known, defined conservatism. This is achieved by response surface methods and Monte Carlo calculations on the basis of statistical distributions of the important input parameters. The procedure is illustrated by means of two examples.
Comparing meta-analysis and ecological-longitudinal analysis in time-series studies. A case study of the effects of air pollution on mortality in three Spanish cities.

PubMed

Saez, M; Figueiras, A; Ballester, F; Pérez-Hoyos, S; Ocaña, R; Tobías, A

2001-06-01

The objective of this paper is to introduce a different approach, called the ecological-longitudinal, to carrying out pooled analysis in time series ecological studies. Because it gives a larger number of data points and, hence, increases the statistical power of the analysis, this approach, unlike conventional ones, allows the complementation of aspects such as accommodation of random effect models, of lags, of interaction between pollutants and between pollutants and meteorological variables, that are hardly implemented in conventional approaches. The approach is illustrated by providing quantitative estimates of the short-term effects of air pollution on mortality in three Spanish cities, Barcelona, Valencia and Vigo, for the period 1992-1994. Because the dependent variable was a count, a Poisson generalised linear model was first specified. Several modelling issues are worth mentioning. Firstly, because the relations between mortality and explanatory variables were non-linear, cubic splines were used for covariate control, leading to a generalised additive model, GAM. Secondly, the effects of the predictors on the response were allowed to occur with some lag. Thirdly, the residual autocorrelation, because of imperfect control, was controlled for by means of an autoregressive Poisson GAM. Finally, the longitudinal design demanded the consideration of the existence of individual heterogeneity, requiring the consideration of mixed models. The estimates of the relative risks obtained from the individual analyses varied across cities, particularly those associated with sulphur dioxide. The highest relative risks corresponded to black smoke in Valencia. These estimates were higher than those obtained from the ecological-longitudinal analysis. Relative risks estimated from this latter analysis were practically identical across cities, 1.00638 (95% confidence intervals 1.0002, 1.0011) for a black smoke increase of 10 microg/m(3) and 1.00415 (95% CI 1.0001, 1.0007) for a increase of 10 microg/m(3) of sulphur dioxide. Because the statistical power is higher than in the individual analysis more interactions were statistically significant, especially those among air pollutants and meteorological variables. Air pollutant levels were related to mortality in the three cities of the study, Barcelona, Valencia and Vigo. These results were consistent with similar studies in other cities, with other multicentric studies and coherent with both, previous individual, for each city, and multicentric studies for all three cities.
Modeling the Space Debris Environment with MASTER-2009 and ORDEM2010

NASA Technical Reports Server (NTRS)

Flegel, S.; Gelhaus, J.; Wiedemann, C.; Mockel, M.; Vorsmann, P.; Krisko, P.; Xu, Y. -L.; Horstman, M. F.; Opiela, J. N.; Matney, M.;

2010-01-01

Spacecraft analysis using ORDEM2010 uses a high-fidelity population model to compute risk to on-orbit assets. The ORDEM2010 GUI allows visualization of spacecraft flux in 2-D and 1-D. The population was produced using a Bayesian statistical approach with measured and modeled environment data. Validation of sizes < 1mm were performed using Shuttle window and radiator impact measurements. Validation of sizes > 1mm is on-going.

CalFitter: a web server for analysis of protein thermal denaturation data.

PubMed

Mazurenko, Stanislav; Stourac, Jan; Kunka, Antonin; Nedeljkovic, Sava; Bednar, David; Prokop, Zbynek; Damborsky, Jiri

2018-05-14

Despite significant advances in the understanding of protein structure-function relationships, revealing protein folding pathways still poses a challenge due to a limited number of relevant experimental tools. Widely-used experimental techniques, such as calorimetry or spectroscopy, critically depend on a proper data analysis. Currently, there are only separate data analysis tools available for each type of experiment with a limited model selection. To address this problem, we have developed the CalFitter web server to be a unified platform for comprehensive data fitting and analysis of protein thermal denaturation data. The server allows simultaneous global data fitting using any combination of input data types and offers 12 protein unfolding pathway models for selection, including irreversible transitions often missing from other tools. The data fitting produces optimal parameter values, their confidence intervals, and statistical information to define unfolding pathways. The server provides an interactive and easy-to-use interface that allows users to directly analyse input datasets and simulate modelled output based on the model parameters. CalFitter web server is available free at https://loschmidt.chemi.muni.cz/calfitter/.
Diverse types of genetic variation converge on functional gene networks involved in schizophrenia.

PubMed

Gilman, Sarah R; Chang, Jonathan; Xu, Bin; Bawa, Tejdeep S; Gogos, Joseph A; Karayiorgou, Maria; Vitkup, Dennis

2012-12-01

Despite the successful identification of several relevant genomic loci, the underlying molecular mechanisms of schizophrenia remain largely unclear. We developed a computational approach (NETBAG+) that allows an integrated analysis of diverse disease-related genetic data using a unified statistical framework. The application of this approach to schizophrenia-associated genetic variations, obtained using unbiased whole-genome methods, allowed us to identify several cohesive gene networks related to axon guidance, neuronal cell mobility, synaptic function and chromosomal remodeling. The genes forming the networks are highly expressed in the brain, with higher brain expression during prenatal development. The identified networks are functionally related to genes previously implicated in schizophrenia, autism and intellectual disability. A comparative analysis of copy number variants associated with autism and schizophrenia suggests that although the molecular networks implicated in these distinct disorders may be related, the mutations associated with each disease are likely to lead, at least on average, to different functional consequences.
Using volcano plots and regularized-chi statistics in genetic association studies.

PubMed

Li, Wentian; Freudenberg, Jan; Suh, Young Ju; Yang, Yaning

2014-02-01

Labor intensive experiments are typically required to identify the causal disease variants from a list of disease associated variants in the genome. For designing such experiments, candidate variants are ranked by their strength of genetic association with the disease. However, the two commonly used measures of genetic association, the odds-ratio (OR) and p-value may rank variants in different order. To integrate these two measures into a single analysis, here we transfer the volcano plot methodology from gene expression analysis to genetic association studies. In its original setting, volcano plots are scatter plots of fold-change and t-test statistic (or -log of the p-value), with the latter being more sensitive to sample size. In genetic association studies, the OR and Pearson's chi-square statistic (or equivalently its square root, chi; or the standardized log(OR)) can be analogously used in a volcano plot, allowing for their visual inspection. Moreover, the geometric interpretation of these plots leads to an intuitive method for filtering results by a combination of both OR and chi-square statistic, which we term "regularized-chi". This method selects associated markers by a smooth curve in the volcano plot instead of the right-angled lines which corresponds to independent cutoffs for OR and chi-square statistic. The regularized-chi incorporates relatively more signals from variants with lower minor-allele-frequencies than chi-square test statistic. As rare variants tend to have stronger functional effects, regularized-chi is better suited to the task of prioritization of candidate genes. Copyright © 2013 Elsevier Ltd. All rights reserved.
Comparative efficacy of golimumab, infliximab, and adalimumab for moderately to severely active ulcerative colitis: a network meta-analysis accounting for differences in trial designs.

PubMed

Thorlund, Kristian; Druyts, Eric; Toor, Kabirraaj; Mills, Edward J

2015-05-01

To conduct a network meta-analysis (NMA) to establish the comparative efficacy of infliximab, adalimumab and golimumab for the treatment of moderately to severely active ulcerative colitis (UC). A systematic literature search identified five randomized controlled trials for inclusion in the NMA. One trial assessed golimumab, two assessed infliximab and two assessed adalimumab. Outcomes included clinical response, clinical remission, mucosal healing, sustained clinical response and sustained clinical remission. Innovative methods were used to allow inclusion of the golimumab trial data given the alternative design of this trial (i.e., two-stage re-randomization). After induction, no statistically significant differences were found between golimumab and adalimumab or between golimumab and infliximab. Infliximab was statistically superior to adalimumab after induction for all outcomes and treatment ranking suggested infliximab as the superior treatment for induction. Golimumab and infliximab were associated with similar efficacy for achieving maintained clinical remission and sustained clinical remission, whereas adalimumab was not significantly better than placebo for sustained clinical remission. Golimumab and infliximab were also associated with similar efficacy for achieving maintained clinical response, sustained clinical response and mucosal healing. Finally, golimumab 50 and 100 mg was statistically superior to adalimumab for clinical response and sustained clinical response, and golimumab 100 mg was also statistically superior to adalimumab for mucosal healing. The results of our NMA suggest that infliximab was statistically superior to adalimumab after induction, and that golimumab was statistically superior to adalimumab for sustained outcomes. Golimumab and infliximab appeared comparable in efficacy.
Water Polo Game-Related Statistics in Women’s International Championships: Differences and Discriminatory Power

PubMed Central

Escalante, Yolanda; Saavedra, Jose M.; Tella, Victor; Mansilla, Mirella; García-Hermoso, Antonio; Dominguez, Ana M.

2012-01-01

The aims of this study were (i) to compare women’s water polo game-related statistics by match outcome (winning and losing teams) and phase (preliminary, classificatory, and semi-final/bronze medal/gold medal), and (ii) identify characteristics that discriminate performances for each phase. The game-related statistics of the 124 women’s matches played in five International Championships (World and European Championships) were analyzed. Differences between winning and losing teams in each phase were determined using the chi-squared. A discriminant analysis was then performed according to context in each of the three phases. It was found that the game-related statistics differentiate the winning from the losing teams in each phase of an international championship. The differentiating variables were both offensive (centre goals, power-play goals, counterattack goal, assists, offensive fouls, steals, blocked shots, and won sprints) and defensive (goalkeeper-blocked shots, goalkeeper-blocked inferiority shots, and goalkeeper-blocked 5-m shots). The discriminant analysis showed the game-related statistics to discriminate performance in all phases: preliminary, classificatory, and final phases (92%, 90%, and 83%, respectively). Two variables were discriminatory by match outcome (winning or losing teams) in all three phases: goals and goalkeeper-blocked shots. Key pointsThe preliminary phase that more than one variable was involved in this differentiation, including both offensive and defensive aspects of the game.The game-related statistics were found to have a high discriminatory power in predicting the result of matches with shots and goalkeeper-blocked shots being discriminatory variables in all three phases.Knowledge of the characteristics of women’s water polo game-related statistics of the winning teams and their power to predict match outcomes will allow coaches to take these characteristics into account when planning training and match preparation. PMID:24149356
Evolutionary computing for the design search and optimization of space vehicle power subsystems

NASA Technical Reports Server (NTRS)

Kordon, Mark; Klimeck, Gerhard; Hanks, David; Hua, Hook

2004-01-01

Evolutionary computing has proven to be a straightforward and robust approach for optimizing a wide range of difficult analysis and design problems. This paper discusses the application of these techniques to an existing space vehicle power subsystem resource and performance analysis simulation in a parallel processing environment. Out preliminary results demonstrate that this approach has the potential to improve the space system trade study process by allowing engineers to statistically weight subsystem goals of mass, cost and performance then automatically size power elements based on anticipated performance of the subsystem rather than on worst-case estimates.
Analysis of Yb3+/Er3+-codoped microring resonator cross-grid matrices

NASA Astrophysics Data System (ADS)

Vallés, Juan A.; Gǎlǎtuş, Ramona

2014-09-01

An analytic model of the scattering response of a highly Yb3+/Er3+-codoped phosphate glass microring resonator matrix is considered to obtain the transfer functions of an M x N cross-grid microring resonator structure. Then a detailed model is used to calculate the pump and signal propagation, including a microscopic statistical formalism to describe the high-concentration induced energy-transfer mechanisms and passive and active features are combined to realistically simulate the performance as a wavelength-selective amplifier or laser. This analysis allows the optimization of these structures for telecom or sensing applications.
Impact of spurious shear on cosmological parameter estimates from weak lensing observables

DOE PAGES

Petri, Andrea; May, Morgan; Haiman, Zoltán; ...

2014-12-30

We research, residual errors in shear measurements, after corrections for instrument systematics and atmospheric effects, can impact cosmological parameters derived from weak lensing observations. Here we combine convergence maps from our suite of ray-tracing simulations with random realizations of spurious shear. This allows us to quantify the errors and biases of the triplet (Ω m,w,σ 8) derived from the power spectrum (PS), as well as from three different sets of non-Gaussian statistics of the lensing convergence field: Minkowski functionals (MFs), low-order moments (LMs), and peak counts (PKs). Our main results are as follows: (i) We find an order of magnitudemore » smaller biases from the PS than in previous work. (ii) The PS and LM yield biases much smaller than the morphological statistics (MF, PK). (iii) For strictly Gaussian spurious shear with integrated amplitude as low as its current estimate of σ sys 2 ≈ 10 -7, biases from the PS and LM would be unimportant even for a survey with the statistical power of Large Synoptic Survey Telescope. However, we find that for surveys larger than ≈ 100 deg 2, non-Gaussianity in the noise (not included in our analysis) will likely be important and must be quantified to assess the biases. (iv) The morphological statistics (MF, PK) introduce important biases even for Gaussian noise, which must be corrected in large surveys. The biases are in different directions in (Ωm,w,σ8) parameter space, allowing self-calibration by combining multiple statistics. Our results warrant follow-up studies with more extensive lensing simulations and more accurate spurious shear estimates.« less

Universal statistics of terminal dynamics before collapse

NASA Astrophysics Data System (ADS)

Lenner, Nicolas; Eule, Stephan; Wolf, Fred

Recent biological developments have both drastically increased the precision as well as amount of generated data, allowing for a switching from pure mean value characterization of the process under consideration to an analysis of the whole ensemble, exploiting the stochastic nature of biology. We focus on the general class of non-equilibrium processes with distinguished terminal points as can be found in cell fate decision, check points or cognitive neuroscience. Aligning the data to a terminal point (e.g. represented as an absorbing boundary) allows to device a general methodology to characterize and reverse engineer the terminating history. Using a small noise approximation we derive mean variance and covariance of the aligned data for general finite time singularities.
An efficient method for removing point sources from full-sky radio interferometric maps

NASA Astrophysics Data System (ADS)

Berger, Philippe; Oppermann, Niels; Pen, Ue-Li; Shaw, J. Richard

2017-12-01

A new generation of wide-field radio interferometers designed for 21-cm surveys is being built as drift scan instruments allowing them to observe large fractions of the sky. With large numbers of antennas and frequency channels, the enormous instantaneous data rates of these telescopes require novel, efficient, data management and analysis techniques. The m-mode formalism exploits the periodicity of such data with the sidereal day, combined with the assumption of statistical isotropy of the sky, to achieve large computational savings and render optimal analysis methods computationally tractable. We present an extension to that work that allows us to adopt a more realistic sky model and treat objects such as bright point sources. We develop a linear procedure for deconvolving maps, using a Wiener filter reconstruction technique, which simultaneously allows filtering of these unwanted components. We construct an algorithm, based on the Sherman-Morrison-Woodbury formula, to efficiently invert the data covariance matrix, as required for any optimal signal-to-noise ratio weighting. The performance of our algorithm is demonstrated using simulations of a cylindrical transit telescope.
A new generation scanning system for the high-speed analysis of nuclear emulsions

NASA Astrophysics Data System (ADS)

Alexandrov, A.; Buonaura, A.; Consiglio, L.; D'Ambrosio, N.; De Lellis, G.; Di Crescenzo, A.; Galati, G.; Lauria, A.; Montesi, M. C.; Tioukov, V.; Vladymyrov, M.

2016-06-01

The development of automatic scanning systems was a fundamental issue for large scale neutrino detectors exploiting nuclear emulsions as particle trackers. Such systems speed up significantly the event analysis in emulsion, allowing the feasibility of experiments with unprecedented statistics. In the early 1990s, R&D programs were carried out by Japanese and European laboratories leading to automatic scanning systems more and more efficient. The recent progress in the technology of digital signal processing and of image acquisition allows the fulfillment of new systems with higher performances. In this paper we report the description and the performance of a new generation scanning system able to operate at the record speed of 84 cm2/hour and based on the Large Angle Scanning System for OPERA (LASSO) software infrastructure developed by the Naples scanning group. Such improvement, reduces the scanning time by a factor 4 with respect to the available systems, allowing the readout of huge amount of nuclear emulsions in reasonable time. This opens new perspectives for the employment of such detectors in a wider variety of applications.
A statistical simulation model for field testing of non-target organisms in environmental risk assessment of genetically modified plants.

PubMed

Goedhart, Paul W; van der Voet, Hilko; Baldacchino, Ferdinando; Arpaia, Salvatore

2014-04-01

Genetic modification of plants may result in unintended effects causing potentially adverse effects on the environment. A comparative safety assessment is therefore required by authorities, such as the European Food Safety Authority, in which the genetically modified plant is compared with its conventional counterpart. Part of the environmental risk assessment is a comparative field experiment in which the effect on non-target organisms is compared. Statistical analysis of such trials come in two flavors: difference testing and equivalence testing. It is important to know the statistical properties of these, for example, the power to detect environmental change of a given magnitude, before the start of an experiment. Such prospective power analysis can best be studied by means of a statistical simulation model. This paper describes a general framework for simulating data typically encountered in environmental risk assessment of genetically modified plants. The simulation model, available as Supplementary Material, can be used to generate count data having different statistical distributions possibly with excess-zeros. In addition the model employs completely randomized or randomized block experiments, can be used to simulate single or multiple trials across environments, enables genotype by environment interaction by adding random variety effects, and finally includes repeated measures in time following a constant, linear or quadratic pattern in time possibly with some form of autocorrelation. The model also allows to add a set of reference varieties to the GM plants and its comparator to assess the natural variation which can then be used to set limits of concern for equivalence testing. The different count distributions are described in some detail and some examples of how to use the simulation model to study various aspects, including a prospective power analysis, are provided.
A statistical simulation model for field testing of non-target organisms in environmental risk assessment of genetically modified plants

PubMed Central

Goedhart, Paul W; van der Voet, Hilko; Baldacchino, Ferdinando; Arpaia, Salvatore

2014-01-01

Genetic modification of plants may result in unintended effects causing potentially adverse effects on the environment. A comparative safety assessment is therefore required by authorities, such as the European Food Safety Authority, in which the genetically modified plant is compared with its conventional counterpart. Part of the environmental risk assessment is a comparative field experiment in which the effect on non-target organisms is compared. Statistical analysis of such trials come in two flavors: difference testing and equivalence testing. It is important to know the statistical properties of these, for example, the power to detect environmental change of a given magnitude, before the start of an experiment. Such prospective power analysis can best be studied by means of a statistical simulation model. This paper describes a general framework for simulating data typically encountered in environmental risk assessment of genetically modified plants. The simulation model, available as Supplementary Material, can be used to generate count data having different statistical distributions possibly with excess-zeros. In addition the model employs completely randomized or randomized block experiments, can be used to simulate single or multiple trials across environments, enables genotype by environment interaction by adding random variety effects, and finally includes repeated measures in time following a constant, linear or quadratic pattern in time possibly with some form of autocorrelation. The model also allows to add a set of reference varieties to the GM plants and its comparator to assess the natural variation which can then be used to set limits of concern for equivalence testing. The different count distributions are described in some detail and some examples of how to use the simulation model to study various aspects, including a prospective power analysis, are provided. PMID:24834325
EUCLID: an outcome analysis tool for high-dimensional clinical studies

NASA Astrophysics Data System (ADS)

Gayou, Olivier; Parda, David S.; Miften, Moyed

2007-03-01

Treatment management decisions in three-dimensional conformal radiation therapy (3DCRT) and intensity-modulated radiation therapy (IMRT) are usually made based on the dose distributions in the target and surrounding normal tissue. These decisions may include, for example, the choice of one treatment over another and the level of tumour dose escalation. Furthermore, biological predictors such as tumour control probability (TCP) and normal tissue complication probability (NTCP), whose parameters available in the literature are only population-based estimates, are often used to assess and compare plans. However, a number of other clinical, biological and physiological factors also affect the outcome of radiotherapy treatment and are often not considered in the treatment planning and evaluation process. A statistical outcome analysis tool, EUCLID, for direct use by radiation oncologists and medical physicists was developed. The tool builds a mathematical model to predict an outcome probability based on a large number of clinical, biological, physiological and dosimetric factors. EUCLID can first analyse a large set of patients, such as from a clinical trial, to derive regression correlation coefficients between these factors and a given outcome. It can then apply such a model to an individual patient at the time of treatment to derive the probability of that outcome, allowing the physician to individualize the treatment based on medical evidence that encompasses a wide range of factors. The software's flexibility allows the clinicians to explore several avenues to select the best predictors of a given outcome. Its link to record-and-verify systems and data spreadsheets allows for a rapid and practical data collection and manipulation. A wide range of statistical information about the study population, including demographics and correlations between different factors, is available. A large number of one- and two-dimensional plots, histograms and survival curves allow for an easy visual analysis of the population. Several visual and analytical methods are available to quantify the predictive power of the multivariate regression model. The EUCLID tool can be readily integrated with treatment planning and record-and-verify systems.
EUCLID: an outcome analysis tool for high-dimensional clinical studies.

PubMed

Gayou, Olivier; Parda, David S; Miften, Moyed

2007-03-21

Treatment management decisions in three-dimensional conformal radiation therapy (3DCRT) and intensity-modulated radiation therapy (IMRT) are usually made based on the dose distributions in the target and surrounding normal tissue. These decisions may include, for example, the choice of one treatment over another and the level of tumour dose escalation. Furthermore, biological predictors such as tumour control probability (TCP) and normal tissue complication probability (NTCP), whose parameters available in the literature are only population-based estimates, are often used to assess and compare plans. However, a number of other clinical, biological and physiological factors also affect the outcome of radiotherapy treatment and are often not considered in the treatment planning and evaluation process. A statistical outcome analysis tool, EUCLID, for direct use by radiation oncologists and medical physicists was developed. The tool builds a mathematical model to predict an outcome probability based on a large number of clinical, biological, physiological and dosimetric factors. EUCLID can first analyse a large set of patients, such as from a clinical trial, to derive regression correlation coefficients between these factors and a given outcome. It can then apply such a model to an individual patient at the time of treatment to derive the probability of that outcome, allowing the physician to individualize the treatment based on medical evidence that encompasses a wide range of factors. The software's flexibility allows the clinicians to explore several avenues to select the best predictors of a given outcome. Its link to record-and-verify systems and data spreadsheets allows for a rapid and practical data collection and manipulation. A wide range of statistical information about the study population, including demographics and correlations between different factors, is available. A large number of one- and two-dimensional plots, histograms and survival curves allow for an easy visual analysis of the population. Several visual and analytical methods are available to quantify the predictive power of the multivariate regression model. The EUCLID tool can be readily integrated with treatment planning and record-and-verify systems.
SNiPlay: a web-based tool for detection, management and analysis of SNPs. Application to grapevine diversity projects.

PubMed

Dereeper, Alexis; Nicolas, Stéphane; Le Cunff, Loïc; Bacilieri, Roberto; Doligez, Agnès; Peros, Jean-Pierre; Ruiz, Manuel; This, Patrice

2011-05-05

High-throughput re-sequencing, new genotyping technologies and the availability of reference genomes allow the extensive characterization of Single Nucleotide Polymorphisms (SNPs) and insertion/deletion events (indels) in many plant species. The rapidly increasing amount of re-sequencing and genotyping data generated by large-scale genetic diversity projects requires the development of integrated bioinformatics tools able to efficiently manage, analyze, and combine these genetic data with genome structure and external data. In this context, we developed SNiPlay, a flexible, user-friendly and integrative web-based tool dedicated to polymorphism discovery and analysis. It integrates:1) a pipeline, freely accessible through the internet, combining existing softwares with new tools to detect SNPs and to compute different types of statistical indices and graphical layouts for SNP data. From standard sequence alignments, genotyping data or Sanger sequencing traces given as input, SNiPlay detects SNPs and indels events and outputs submission files for the design of Illumina's SNP chips. Subsequently, it sends sequences and genotyping data into a series of modules in charge of various processes: physical mapping to a reference genome, annotation (genomic position, intron/exon location, synonymous/non-synonymous substitutions), SNP frequency determination in user-defined groups, haplotype reconstruction and network, linkage disequilibrium evaluation, and diversity analysis (Pi, Watterson's Theta, Tajima's D).Furthermore, the pipeline allows the use of external data (such as phenotype, geographic origin, taxa, stratification) to define groups and compare statistical indices.2) a database storing polymorphisms, genotyping data and grapevine sequences released by public and private projects. It allows the user to retrieve SNPs using various filters (such as genomic position, missing data, polymorphism type, allele frequency), to compare SNP patterns between populations, and to export genotyping data or sequences in various formats. Our experiments on grapevine genetic projects showed that SNiPlay allows geneticists to rapidly obtain advanced results in several key research areas of plant genetic diversity. Both the management and treatment of large amounts of SNP data are rendered considerably easier for end-users through automation and integration. Current developments are taking into account new advances in high-throughput technologies.SNiPlay is available at: http://sniplay.cirad.fr/.
A statistical framework for neuroimaging data analysis based on mutual information estimated via a gaussian copula

PubMed Central

Giordano, Bruno L.; Kayser, Christoph; Rousselet, Guillaume A.; Gross, Joachim; Schyns, Philippe G.

2016-01-01

Abstract We begin by reviewing the statistical framework of information theory as applicable to neuroimaging data analysis. A major factor hindering wider adoption of this framework in neuroimaging is the difficulty of estimating information theoretic quantities in practice. We present a novel estimation technique that combines the statistical theory of copulas with the closed form solution for the entropy of Gaussian variables. This results in a general, computationally efficient, flexible, and robust multivariate statistical framework that provides effect sizes on a common meaningful scale, allows for unified treatment of discrete, continuous, unidimensional and multidimensional variables, and enables direct comparisons of representations from behavioral and brain responses across any recording modality. We validate the use of this estimate as a statistical test within a neuroimaging context, considering both discrete stimulus classes and continuous stimulus features. We also present examples of analyses facilitated by these developments, including application of multivariate analyses to MEG planar magnetic field gradients, and pairwise temporal interactions in evoked EEG responses. We show the benefit of considering the instantaneous temporal derivative together with the raw values of M/EEG signals as a multivariate response, how we can separately quantify modulations of amplitude and direction for vector quantities, and how we can measure the emergence of novel information over time in evoked responses. Open‐source Matlab and Python code implementing the new methods accompanies this article. Hum Brain Mapp 38:1541–1573, 2017. © 2016 Wiley Periodicals, Inc. PMID:27860095
Detecting Disease Specific Pathway Substructures through an Integrated Systems Biology Approach

PubMed Central

Alaimo, Salvatore; Marceca, Gioacchino Paolo; Ferro, Alfredo; Pulvirenti, Alfredo

2017-01-01

In the era of network medicine, pathway analysis methods play a central role in the prediction of phenotype from high throughput experiments. In this paper, we present a network-based systems biology approach capable of extracting disease-perturbed subpathways within pathway networks in connection with expression data taken from The Cancer Genome Atlas (TCGA). Our system extends pathways with missing regulatory elements, such as microRNAs, and their interactions with genes. The framework enables the extraction, visualization, and analysis of statistically significant disease-specific subpathways through an easy to use web interface. Our analysis shows that the methodology is able to fill the gap in current techniques, allowing a more comprehensive analysis of the phenomena underlying disease states. PMID:29657291
MAGMA: Generalized Gene-Set Analysis of GWAS Data

PubMed Central

de Leeuw, Christiaan A.; Mooij, Joris M.; Heskes, Tom; Posthuma, Danielle

2015-01-01

By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn’s Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn’s Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn’s Disease data was found to be considerably faster as well. PMID:25885710
MAGMA: generalized gene-set analysis of GWAS data.

PubMed

de Leeuw, Christiaan A; Mooij, Joris M; Heskes, Tom; Posthuma, Danielle

2015-04-01

By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn's Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn's Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn's Disease data was found to be considerably faster as well.
The mare reproductive loss syndrome and the eastern tent caterpillar: a toxicokinetic/statistical analysis with clinical, epidemiologic, and mechanistic implications.

PubMed

Sebastian, Manu; Gantz, Marie G; Tobin, Thomas; Harkins, J Daniel; Bosken, Jeffrey M; Hughes, Charlie; Harrison, Lenn R; Bernard, William V; Richter, Dana L; Fitzgerald, Terrence D

2003-01-01

During 2001, central Kentucky experienced acute transient epidemics of early and late fetal losses, pericarditis, and unilateral endophthalmitis, collectively referred to as mare reproductive loss syndrome (MRLS). A toxicokinetic/statistical analysis of experimental and field MRLS data was conducted using accelerated failure time (AFT) analysis of abortions following administration of Eastern tent caterpillars (ETCs; 100 or 50 g/day or 100 g of irradiated caterpillars/day) to late-term pregnant mares. In addition, 2001 late-term fetal loss field data were used in the analysis. Experimental data were fitted by AFT analysis at a high (P <.0001) significance. Times to first abortion ("lag time") and abortion rates were dose dependent. Lag times decreased and abortion rates increased exponentially with dose. Calculated dose x response data curves allow interpretation of abortion data in terms of "intubated ETC equivalents." Analysis suggested that field exposure to ETCs in 2001 in central Kentucky commenced on approximately April 27, was initially equivalent to approximately 5 g of intubated ETCs/day, and increased to approximately 30 g/day at the outbreak peak. This analysis accounts for many aspects of the epidemiology, clinical presentations, and manifestations of MRLS. It allows quantitative interpretation of experimental and field MRLS data and has implications for the basic mechanisms underlying MRLS. The results support suggestions that MRLS is caused by exposure to or ingestion of ETCs. The results also show that high levels of ETC exposure produce intense, focused outbreaks of MRLS, closely linked in time and place to dispersing ETCs, as occurred in central Kentucky in 2001. With less intense exposure, lag time is longer and abortions tend to spread out over time and may occur out of phase with ETC exposure, obscuring both diagnosis of this syndrome and the role of the caterpillars.
Multi-species Identification of Polymorphic Peptide Variants via Propagation in Spectral Networks

DOE Office of Scientific and Technical Information (OSTI.GOV)

Na, Seungjin; Payne, Samuel H.; Bandeira, Nuno

The spectral networks approach enables the detection of pairs of spectra from related peptides and thus allows for the propagation of annotations from identified peptides to unidentified spectra. Beyond allowing for unbiased discovery of unexpected post-translational modifications, spectral networks are also applicable to multi-species comparative proteomics or metaproteomics to identify numerous orthologous versions of a protein. We present algorithmic and statistical advances in spectral networks that have made it possible to rigorously assess the statistical significance of spectral pairs and accurately estimate the error rate of identifications via propagation. In the analysis of three related Cyanothece species, a model organismmore » for biohydrogen production, spectral networks identified peptides with highly divergent sequences with up to dozens of variants per peptide, including many novel peptides in species that lack a sequenced genome. Furthermore, spectral networks strongly suggested the presence of novel peptides even in genomically characterized species (i.e. missing from databases) in that a significant portion of unidentified multi-species networks included at least two polymorphic peptide variants.« less
Bi-PROF

PubMed Central

Gries, Jasmin; Schumacher, Dirk; Arand, Julia; Lutsik, Pavlo; Markelova, Maria Rivera; Fichtner, Iduna; Walter, Jörn; Sers, Christine; Tierling, Sascha

2013-01-01

The use of next generation sequencing has expanded our view on whole mammalian methylome patterns. In particular, it provides a genome-wide insight of local DNA methylation diversity at single nucleotide level and enables the examination of single chromosome sequence sections at a sufficient statistical power. We describe a bisulfite-based sequence profiling pipeline, Bi-PROF, which is based on the 454 GS-FLX Titanium technology that allows to obtain up to one million sequence stretches at single base pair resolution without laborious subcloning. To illustrate the performance of the experimental workflow connected to a bioinformatics program pipeline (BiQ Analyzer HT) we present a test analysis set of 68 different epigenetic marker regions (amplicons) in five individual patient-derived xenograft tissue samples of colorectal cancer and one healthy colon epithelium sample as a control. After the 454 GS-FLX Titanium run, sequence read processing and sample decoding, the obtained alignments are quality controlled and statistically evaluated. Comprehensive methylation pattern interpretation (profiling) assessed by analyzing 102-104 sequence reads per amplicon allows an unprecedented deep view on pattern formation and methylation marker heterogeneity in tissues concerned by complex diseases like cancer. PMID:23803588
Genetic Simulation Resources: a website for the registration and discovery of genetic data simulators

PubMed Central

Peng, Bo; Chen, Huann-Sheng; Mechanic, Leah E.; Racine, Ben; Clarke, John; Clarke, Lauren; Gillanders, Elizabeth; Feuer, Eric J.

2013-01-01

Summary: Many simulation methods and programs have been developed to simulate genetic data of the human genome. These data have been widely used, for example, to predict properties of populations retrospectively or prospectively according to mathematically intractable genetic models, and to assist the validation, statistical inference and power analysis of a variety of statistical models. However, owing to the differences in type of genetic data of interest, simulation methods, evolutionary features, input and output formats, terminologies and assumptions for different applications, choosing the right tool for a particular study can be a resource-intensive process that usually involves searching, downloading and testing many different simulation programs. Genetic Simulation Resources (GSR) is a website provided by the National Cancer Institute (NCI) that aims to help researchers compare and choose the appropriate simulation tools for their studies. This website allows authors of simulation software to register their applications and describe them with well-defined attributes, thus allowing site users to search and compare simulators according to specified features. Availability: http://popmodels.cancercontrol.cancer.gov/gsr. Contact: gsr@mail.nih.gov PMID:23435068
SEDA: A software package for the Statistical Earthquake Data Analysis

NASA Astrophysics Data System (ADS)

Lombardi, A. M.

2017-03-01

In this paper, the first version of the software SEDA (SEDAv1.0), designed to help seismologists statistically analyze earthquake data, is presented. The package consists of a user-friendly Matlab-based interface, which allows the user to easily interact with the application, and a computational core of Fortran codes, to guarantee the maximum speed. The primary factor driving the development of SEDA is to guarantee the research reproducibility, which is a growing movement among scientists and highly recommended by the most important scientific journals. SEDAv1.0 is mainly devoted to produce accurate and fast outputs. Less care has been taken for the graphic appeal, which will be improved in the future. The main part of SEDAv1.0 is devoted to the ETAS modeling. SEDAv1.0 contains a set of consistent tools on ETAS, allowing the estimation of parameters, the testing of model on data, the simulation of catalogs, the identification of sequences and forecasts calculation. The peculiarities of routines inside SEDAv1.0 are discussed in this paper. More specific details on the software are presented in the manual accompanying the program package.
Blueprint XAS: a Matlab-based toolbox for the fitting and analysis of XAS spectra.

PubMed

Delgado-Jaime, Mario Ulises; Mewis, Craig Philip; Kennepohl, Pierre

2010-01-01

Blueprint XAS is a new Matlab-based program developed to fit and analyse X-ray absorption spectroscopy (XAS) data, most specifically in the near-edge region of the spectrum. The program is based on a methodology that introduces a novel background model into the complete fit model and that is capable of generating any number of independent fits with minimal introduction of user bias [Delgado-Jaime & Kennepohl (2010), J. Synchrotron Rad. 17, 119-128]. The functions and settings on the five panels of its graphical user interface are designed to suit the needs of near-edge XAS data analyzers. A batch function allows for the setting of multiple jobs to be run with Matlab in the background. A unique statistics panel allows the user to analyse a family of independent fits, to evaluate fit models and to draw statistically supported conclusions. The version introduced here (v0.2) is currently a toolbox for Matlab. Future stand-alone versions of the program will also incorporate several other new features to create a full package of tools for XAS data processing.
Interchangeability of Procalcitonin Measurements Using the Point of Care Testing i-CHROMATM Reader and the Automated Liaison XL.

PubMed

Stenner, Elisabetta; Barbati, Giulia; West, Nicole; Ben, Fabia Del; Martin, Francesca; Ruscio, Maurizio

2018-06-01

Our aim was to verify if procalcitonin (PCT) measurements using the new point-of-care testing i-CHROMATM are interchangeable with those of Liaison XL. One hundred seventeen serum samples were processed sequentially on a Liaison XL and i-CHROMATM. Statistical analysis was done using the Passing-Bablok regression, Bland-Altman test, and Cohen's Kappa statistic. Proportional and constant differences were observed between i-CHROMATM and Liaison XL. The 95% CI of the mean bias% was very large, exceeding the maximum allowable TE% and the clinical reference change value. However, the concordance between methods at the clinical relevant cutoffs was strong, with the exception of the 0.25 ng/mL cutoff which was moderate. Our data suggest that i-CHROMATM is not interchangeable with Liaison XL. However, while the strong concordance at the clinical relevant cutoffs allows us to consider i-CHROMATM a suitable option to Liaison XL to support clinicians' decision-making; nevertheless, the moderate agreement at the 0.25 ng/mL cutoff recommends caution in interpreting the data around this cutoff.
SEDA: A software package for the Statistical Earthquake Data Analysis

PubMed Central

Lombardi, A. M.

2017-01-01

In this paper, the first version of the software SEDA (SEDAv1.0), designed to help seismologists statistically analyze earthquake data, is presented. The package consists of a user-friendly Matlab-based interface, which allows the user to easily interact with the application, and a computational core of Fortran codes, to guarantee the maximum speed. The primary factor driving the development of SEDA is to guarantee the research reproducibility, which is a growing movement among scientists and highly recommended by the most important scientific journals. SEDAv1.0 is mainly devoted to produce accurate and fast outputs. Less care has been taken for the graphic appeal, which will be improved in the future. The main part of SEDAv1.0 is devoted to the ETAS modeling. SEDAv1.0 contains a set of consistent tools on ETAS, allowing the estimation of parameters, the testing of model on data, the simulation of catalogs, the identification of sequences and forecasts calculation. The peculiarities of routines inside SEDAv1.0 are discussed in this paper. More specific details on the software are presented in the manual accompanying the program package. PMID:28290482

Condenser: a statistical aggregation tool for multi-sample quantitative proteomic data from Matrix Science Mascot Distiller™.

PubMed

Knudsen, Anders Dahl; Bennike, Tue; Kjeldal, Henrik; Birkelund, Svend; Otzen, Daniel Erik; Stensballe, Allan

2014-05-30

We describe Condenser, a freely available, comprehensive open-source tool for merging multidimensional quantitative proteomics data from the Matrix Science Mascot Distiller Quantitation Toolbox into a common format ready for subsequent bioinformatic analysis. A number of different relative quantitation technologies, such as metabolic (15)N and amino acid stable isotope incorporation, label-free and chemical-label quantitation are supported. The program features multiple options for curative filtering of the quantified peptides, allowing the user to choose data quality thresholds appropriate for the current dataset, and ensure the quality of the calculated relative protein abundances. Condenser also features optional global normalization, peptide outlier removal, multiple testing and calculation of t-test statistics for highlighting and evaluating proteins with significantly altered relative protein abundances. Condenser provides an attractive addition to the gold-standard quantitative workflow of Mascot Distiller, allowing easy handling of larger multi-dimensional experiments. Source code, binaries, test data set and documentation are available at http://condenser.googlecode.com/. Copyright © 2014 Elsevier B.V. All rights reserved.
Constraining sterile neutrinos with AMANDA and IceCube atmospheric neutrino data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Esmaili, Arman; Peres, O.L.G.; Halzen, Francis, E-mail: aesmaili@ifi.unicamp.br, E-mail: halzen@icecube.wisc.edu, E-mail: orlando@ifi.unicamp.br

2012-11-01

We demonstrate that atmospheric neutrino data accumulated with the AMANDA and the partially deployed IceCube experiments constrain the allowed parameter space for a hypothesized fourth sterile neutrino beyond the reach of a combined analysis of all other experiments, for Δm{sup 2}{sub 41}∼<1 eV{sup 2}. Although the IceCube data wins the statistics in the analysis, the advantage of a combined analysis of AMANDA and IceCube data is the partial remedy of yet unknown instrumental systematic uncertainties. We also illustrate the sensitivity of the completed IceCube detector, that is now taking data, to the parameter space of 3+1 model.
FADTTS: functional analysis of diffusion tensor tract statistics.

PubMed

Zhu, Hongtu; Kong, Linglong; Li, Runze; Styner, Martin; Gerig, Guido; Lin, Weili; Gilmore, John H

2011-06-01

The aim of this paper is to present a functional analysis of a diffusion tensor tract statistics (FADTTS) pipeline for delineating the association between multiple diffusion properties along major white matter fiber bundles with a set of covariates of interest, such as age, diagnostic status and gender, and the structure of the variability of these white matter tract properties in various diffusion tensor imaging studies. The FADTTS integrates five statistical tools: (i) a multivariate varying coefficient model for allowing the varying coefficient functions in terms of arc length to characterize the varying associations between fiber bundle diffusion properties and a set of covariates, (ii) a weighted least squares estimation of the varying coefficient functions, (iii) a functional principal component analysis to delineate the structure of the variability in fiber bundle diffusion properties, (iv) a global test statistic to test hypotheses of interest, and (v) a simultaneous confidence band to quantify the uncertainty in the estimated coefficient functions. Simulated data are used to evaluate the finite sample performance of FADTTS. We apply FADTTS to investigate the development of white matter diffusivities along the splenium of the corpus callosum tract and the right internal capsule tract in a clinical study of neurodevelopment. FADTTS can be used to facilitate the understanding of normal brain development, the neural bases of neuropsychiatric disorders, and the joint effects of environmental and genetic factors on white matter fiber bundles. The advantages of FADTTS compared with the other existing approaches are that they are capable of modeling the structured inter-subject variability, testing the joint effects, and constructing their simultaneous confidence bands. However, FADTTS is not crucial for estimation and reduces to the functional analysis method for the single measure. Copyright © 2011 Elsevier Inc. All rights reserved.
Methods of Information Geometry to model complex shapes

NASA Astrophysics Data System (ADS)

De Sanctis, A.; Gattone, S. A.

2016-09-01

In this paper, a new statistical method to model patterns emerging in complex systems is proposed. A framework for shape analysis of 2- dimensional landmark data is introduced, in which each landmark is represented by a bivariate Gaussian distribution. From Information Geometry we know that Fisher-Rao metric endows the statistical manifold of parameters of a family of probability distributions with a Riemannian metric. Thus this approach allows to reconstruct the intermediate steps in the evolution between observed shapes by computing the geodesic, with respect to the Fisher-Rao metric, between the corresponding distributions. Furthermore, the geodesic path can be used for shape predictions. As application, we study the evolution of the rat skull shape. A future application in Ophthalmology is introduced.
Laser ektacytometry and evaluation of statistical characteristics of inhomogeneous ensembles of red blood cells

NASA Astrophysics Data System (ADS)

Nikitin, S. Yu.; Priezzhev, A. V.; Lugovtsov, A. E.; Ustinov, V. D.; Razgulin, A. V.

2014-10-01

The paper is devoted to development of the laser ektacytometry technique for evaluation of the statistical characteristics of inhomogeneous ensembles of red blood cells (RBCs). We have analyzed theoretically laser beam scattering by the inhomogeneous ensembles of elliptical discs, modeling red blood cells in the ektacytometer. The analysis shows that the laser ektacytometry technique allows for quantitative evaluation of such population characteristics of RBCs as the cells mean shape, the cells deformability variance and asymmetry of the cells distribution in the deformability. Moreover, we show that the deformability distribution itself can be retrieved by solving a specific Fredholm integral equation of the first kind. At this stage we do not take into account the scatter in the RBC sizes.
Investigating the collision energy dependence of η /s in the beam energy scan at the BNL Relativistic Heavy Ion Collider using Bayesian statistics

NASA Astrophysics Data System (ADS)

Auvinen, Jussi; Bernhard, Jonah E.; Bass, Steffen A.; Karpenko, Iurii

2018-04-01

We determine the probability distributions of the shear viscosity over the entropy density ratio η /s in the quark-gluon plasma formed in Au + Au collisions at √{sN N}=19.6 ,39 , and 62.4 GeV , using Bayesian inference and Gaussian process emulators for a model-to-data statistical analysis that probes the full input parameter space of a transport + viscous hydrodynamics hybrid model. We find the most likely value of η /s to be larger at smaller √{sN N}, although the uncertainties still allow for a constant value between 0.10 and 0.15 for the investigated collision energy range.
How precise can atoms of a nanocluster be located in 3D using a tilt series of scanning transmission electron microscopy images?

PubMed

Alania, M; De Backer, A; Lobato, I; Krause, F F; Van Dyck, D; Rosenauer, A; Van Aert, S

2017-10-01

In this paper, we investigate how precise atoms of a small nanocluster can ultimately be located in three dimensions (3D) from a tilt series of images acquired using annular dark field (ADF) scanning transmission electron microscopy (STEM). Therefore, we derive an expression for the statistical precision with which the 3D atomic position coordinates can be estimated in a quantitative analysis. Evaluating this statistical precision as a function of the microscope settings also allows us to derive the optimal experimental design. In this manner, the optimal angular tilt range, required electron dose, optimal detector angles, and number of projection images can be determined. Copyright © 2016 Elsevier B.V. All rights reserved.
Statistical Emulator for Expensive Classification Simulators

NASA Technical Reports Server (NTRS)

Ross, Jerret; Samareh, Jamshid A.

2016-01-01

Expensive simulators prevent any kind of meaningful analysis to be performed on the phenomena they model. To get around this problem the concept of using a statistical emulator as a surrogate representation of the simulator was introduced in the 1980's. Presently, simulators have become more and more complex and as a result running a single example on these simulators is very expensive and can take days to weeks or even months. Many new techniques have been introduced, termed criteria, which sequentially select the next best (most informative to the emulator) point that should be run on the simulator. These criteria methods allow for the creation of an emulator with only a small number of simulator runs. We follow and extend this framework to expensive classification simulators.
Extending local canonical correlation analysis to handle general linear contrasts for FMRI data.

PubMed

Jin, Mingwu; Nandy, Rajesh; Curran, Tim; Cordes, Dietmar

2012-01-01

Local canonical correlation analysis (CCA) is a multivariate method that has been proposed to more accurately determine activation patterns in fMRI data. In its conventional formulation, CCA has several drawbacks that limit its usefulness in fMRI. A major drawback is that, unlike the general linear model (GLM), a test of general linear contrasts of the temporal regressors has not been incorporated into the CCA formalism. To overcome this drawback, a novel directional test statistic was derived using the equivalence of multivariate multiple regression (MVMR) and CCA. This extension will allow CCA to be used for inference of general linear contrasts in more complicated fMRI designs without reparameterization of the design matrix and without reestimating the CCA solutions for each particular contrast of interest. With the proper constraints on the spatial coefficients of CCA, this test statistic can yield a more powerful test on the inference of evoked brain regional activations from noisy fMRI data than the conventional t-test in the GLM. The quantitative results from simulated and pseudoreal data and activation maps from fMRI data were used to demonstrate the advantage of this novel test statistic.
Extending Local Canonical Correlation Analysis to Handle General Linear Contrasts for fMRI Data

PubMed Central

Jin, Mingwu; Nandy, Rajesh; Curran, Tim; Cordes, Dietmar

2012-01-01

Local canonical correlation analysis (CCA) is a multivariate method that has been proposed to more accurately determine activation patterns in fMRI data. In its conventional formulation, CCA has several drawbacks that limit its usefulness in fMRI. A major drawback is that, unlike the general linear model (GLM), a test of general linear contrasts of the temporal regressors has not been incorporated into the CCA formalism. To overcome this drawback, a novel directional test statistic was derived using the equivalence of multivariate multiple regression (MVMR) and CCA. This extension will allow CCA to be used for inference of general linear contrasts in more complicated fMRI designs without reparameterization of the design matrix and without reestimating the CCA solutions for each particular contrast of interest. With the proper constraints on the spatial coefficients of CCA, this test statistic can yield a more powerful test on the inference of evoked brain regional activations from noisy fMRI data than the conventional t-test in the GLM. The quantitative results from simulated and pseudoreal data and activation maps from fMRI data were used to demonstrate the advantage of this novel test statistic. PMID:22461786
Student Achievement in Undergraduate Statistics: The Potential Value of Allowing Failure

ERIC Educational Resources Information Center

Ferrandino, Joseph A.

2016-01-01

This article details what resulted when I re-designed my undergraduate statistics course to allow failure as a learning strategy and focused on achievement rather than performance. A variety of within and between sample t-tests are utilized to determine the impact of unlimited test and quiz opportunities on student learning on both quizzes and…
Breath Analysis as a Potential and Non-Invasive Frontier in Disease Diagnosis: An Overview

PubMed Central

Pereira, Jorge; Porto-Figueira, Priscilla; Cavaco, Carina; Taunk, Khushman; Rapole, Srikanth; Dhakne, Rahul; Nagarajaram, Hampapathalu; Câmara, José S.

2015-01-01

Currently, a small number of diseases, particularly cardiovascular (CVDs), oncologic (ODs), neurodegenerative (NDDs), chronic respiratory diseases, as well as diabetes, form a severe burden to most of the countries worldwide. Hence, there is an urgent need for development of efficient diagnostic tools, particularly those enabling reliable detection of diseases, at their early stages, preferably using non-invasive approaches. Breath analysis is a non-invasive approach relying only on the characterisation of volatile composition of the exhaled breath (EB) that in turn reflects the volatile composition of the bloodstream and airways and therefore the status and condition of the whole organism metabolism. Advanced sampling procedures (solid-phase and needle traps microextraction) coupled with modern analytical technologies (proton transfer reaction mass spectrometry, selected ion flow tube mass spectrometry, ion mobility spectrometry, e-noses, etc.) allow the characterisation of EB composition to an unprecedented level. However, a key challenge in EB analysis is the proper statistical analysis and interpretation of the large and heterogeneous datasets obtained from EB research. There is no standard statistical framework/protocol yet available in literature that can be used for EB data analysis towards discovery of biomarkers for use in a typical clinical setup. Nevertheless, EB analysis has immense potential towards development of biomarkers for the early disease diagnosis of diseases. PMID:25584743
The Statistical Value of Raw Fluorescence Signal in Luminex xMAP Based Multiplex Immunoassays

PubMed Central

Breen, Edmond J.; Tan, Woei; Khan, Alamgir

2016-01-01

Tissue samples (plasma, saliva, serum or urine) from 169 patients classified as either normal or having one of seven possible diseases are analysed across three 96-well plates for the presences of 37 analytes using cytokine inflammation multiplexed immunoassay panels. Censoring for concentration data caused problems for analysis of the low abundant analytes. Using fluorescence analysis over concentration based analysis allowed analysis of these low abundant analytes. Mixed-effects analysis on the resulting fluorescence and concentration responses reveals a combination of censoring and mapping the fluorescence responses to concentration values, through a 5PL curve, changed observed analyte concentrations. Simulation verifies this, by showing a dependence on the mean florescence response and its distribution on the observed analyte concentration levels. Differences from normality, in the fluorescence responses, can lead to differences in concentration estimates and unreliable probabilities for treatment effects. It is seen that when fluorescence responses are normally distributed, probabilities of treatment effects for fluorescence based t-tests has greater statistical power than the same probabilities from concentration based t-tests. We add evidence that the fluorescence response, unlike concentration values, doesn’t require censoring and we show with respect to differential analysis on the fluorescence responses that background correction is not required. PMID:27243383
Breath analysis as a potential and non-invasive frontier in disease diagnosis: an overview.

PubMed

Pereira, Jorge; Porto-Figueira, Priscilla; Cavaco, Carina; Taunk, Khushman; Rapole, Srikanth; Dhakne, Rahul; Nagarajaram, Hampapathalu; Câmara, José S

2015-01-09

Currently, a small number of diseases, particularly cardiovascular (CVDs), oncologic (ODs), neurodegenerative (NDDs), chronic respiratory diseases, as well as diabetes, form a severe burden to most of the countries worldwide. Hence, there is an urgent need for development of efficient diagnostic tools, particularly those enabling reliable detection of diseases, at their early stages, preferably using non-invasive approaches. Breath analysis is a non-invasive approach relying only on the characterisation of volatile composition of the exhaled breath (EB) that in turn reflects the volatile composition of the bloodstream and airways and therefore the status and condition of the whole organism metabolism. Advanced sampling procedures (solid-phase and needle traps microextraction) coupled with modern analytical technologies (proton transfer reaction mass spectrometry, selected ion flow tube mass spectrometry, ion mobility spectrometry, e-noses, etc.) allow the characterisation of EB composition to an unprecedented level. However, a key challenge in EB analysis is the proper statistical analysis and interpretation of the large and heterogeneous datasets obtained from EB research. There is no standard statistical framework/protocol yet available in literature that can be used for EB data analysis towards discovery of biomarkers for use in a typical clinical setup. Nevertheless, EB analysis has immense potential towards development of biomarkers for the early disease diagnosis of diseases.
Time-Gated Raman Spectroscopy for Quantitative Determination of Solid-State Forms of Fluorescent Pharmaceuticals.

PubMed

Lipiäinen, Tiina; Pessi, Jenni; Movahedi, Parisa; Koivistoinen, Juha; Kurki, Lauri; Tenhunen, Mari; Yliruusi, Jouko; Juppo, Anne M; Heikkonen, Jukka; Pahikkala, Tapio; Strachan, Clare J

2018-04-03

Raman spectroscopy is widely used for quantitative pharmaceutical analysis, but a common obstacle to its use is sample fluorescence masking the Raman signal. Time-gating provides an instrument-based method for rejecting fluorescence through temporal resolution of the spectral signal and allows Raman spectra of fluorescent materials to be obtained. An additional practical advantage is that analysis is possible in ambient lighting. This study assesses the efficacy of time-gated Raman spectroscopy for the quantitative measurement of fluorescent pharmaceuticals. Time-gated Raman spectroscopy with a 128 × (2) × 4 CMOS SPAD detector was applied for quantitative analysis of ternary mixtures of solid-state forms of the model drug, piroxicam (PRX). Partial least-squares (PLS) regression allowed quantification, with Raman-active time domain selection (based on visual inspection) improving performance. Model performance was further improved by using kernel-based regularized least-squares (RLS) regression with greedy feature selection in which the data use in both the Raman shift and time dimensions was statistically optimized. Overall, time-gated Raman spectroscopy, especially with optimized data analysis in both the spectral and time dimensions, shows potential for sensitive and relatively routine quantitative analysis of photoluminescent pharmaceuticals during drug development and manufacturing.
Fast Quantitative Analysis Of Museum Objects Using Laser-Induced Breakdown Spectroscopy And Multiple Regression Algorithms

NASA Astrophysics Data System (ADS)

Lorenzetti, G.; Foresta, A.; Palleschi, V.; Legnaioli, S.

2009-09-01

The recent development of mobile instrumentation, specifically devoted to in situ analysis and study of museum objects, allows the acquisition of many LIBS spectra in very short time. However, such large amount of data calls for new analytical approaches which would guarantee a prompt analysis of the results obtained. In this communication, we will present and discuss the advantages of statistical analytical methods, such as Partial Least Squares Multiple Regression algorithms vs. the classical calibration curve approach. PLS algorithms allows to obtain in real time the information on the composition of the objects under study; this feature of the method, compared to the traditional off-line analysis of the data, is extremely useful for the optimization of the measurement times and number of points associated with the analysis. In fact, the real time availability of the compositional information gives the possibility of concentrating the attention on the most `interesting' parts of the object, without over-sampling the zones which would not provide useful information for the scholars or the conservators. Some example on the applications of this method will be presented, including the studies recently performed by the researcher of the Applied Laser Spectroscopy Laboratory on museum bronze objects.
Hyperspectral imaging coupled with chemometric analysis for non-invasive differentiation of black pens

NASA Astrophysics Data System (ADS)

Chlebda, Damian K.; Majda, Alicja; Łojewski, Tomasz; Łojewska, Joanna

2016-11-01

Differentiation of the written text can be performed with a non-invasive and non-contact tool that connects conventional imaging methods with spectroscopy. Hyperspectral imaging (HSI) is a relatively new and rapid analytical technique that can be applied in forensic science disciplines. It allows an image of the sample to be acquired, with full spectral information within every pixel. For this paper, HSI and three statistical methods (hierarchical cluster analysis, principal component analysis, and spectral angle mapper) were used to distinguish between traces of modern black gel pen inks. Non-invasiveness and high efficiency are among the unquestionable advantages of ink differentiation using HSI. It is also less time-consuming than traditional methods such as chromatography. In this study, a set of 45 modern gel pen ink marks deposited on a paper sheet were registered. The spectral characteristics embodied in every pixel were extracted from an image and analysed using statistical methods, externally and directly on the hypercube. As a result, different black gel inks deposited on paper can be distinguished and classified into several groups, in a non-invasive manner.
A Dimensionally Reduced Clustering Methodology for Heterogeneous Occupational Medicine Data Mining.

PubMed

Saâdaoui, Foued; Bertrand, Pierre R; Boudet, Gil; Rouffiac, Karine; Dutheil, Frédéric; Chamoux, Alain

2015-10-01

Clustering is a set of techniques of the statistical learning aimed at finding structures of heterogeneous partitions grouping homogenous data called clusters. There are several fields in which clustering was successfully applied, such as medicine, biology, finance, economics, etc. In this paper, we introduce the notion of clustering in multifactorial data analysis problems. A case study is conducted for an occupational medicine problem with the purpose of analyzing patterns in a population of 813 individuals. To reduce the data set dimensionality, we base our approach on the Principal Component Analysis (PCA), which is the statistical tool most commonly used in factorial analysis. However, the problems in nature, especially in medicine, are often based on heterogeneous-type qualitative-quantitative measurements, whereas PCA only processes quantitative ones. Besides, qualitative data are originally unobservable quantitative responses that are usually binary-coded. Hence, we propose a new set of strategies allowing to simultaneously handle quantitative and qualitative data. The principle of this approach is to perform a projection of the qualitative variables on the subspaces spanned by quantitative ones. Subsequently, an optimal model is allocated to the resulting PCA-regressed subspaces.
Analysis of categorical moderators in mixed-effects meta-analysis: Consequences of using pooled versus separate estimates of the residual between-studies variances.

PubMed

Rubio-Aparicio, María; Sánchez-Meca, Julio; López-López, José Antonio; Botella, Juan; Marín-Martínez, Fulgencio

2017-11-01

Subgroup analyses allow us to examine the influence of a categorical moderator on the effect size in meta-analysis. We conducted a simulation study using a dichotomous moderator, and compared the impact of pooled versus separate estimates of the residual between-studies variance on the statistical performance of the Q B (P) and Q B (S) tests for subgroup analyses assuming a mixed-effects model. Our results suggested that similar performance can be expected as long as there are at least 20 studies and these are approximately balanced across categories. Conversely, when subgroups were unbalanced, the practical consequences of having heterogeneous residual between-studies variances were more evident, with both tests leading to the wrong statistical conclusion more often than in the conditions with balanced subgroups. A pooled estimate should be preferred for most scenarios, unless the residual between-studies variances are clearly different and there are enough studies in each category to obtain precise separate estimates. © 2017 The British Psychological Society.
Data Analysis & Statistical Methods for Command File Errors

NASA Technical Reports Server (NTRS)

Meshkat, Leila; Waggoner, Bruce; Bryant, Larry

2014-01-01

This paper explains current work on modeling for managing the risk of command file errors. It is focused on analyzing actual data from a JPL spaceflight mission to build models for evaluating and predicting error rates as a function of several key variables. We constructed a rich dataset by considering the number of errors, the number of files radiated, including the number commands and blocks in each file, as well as subjective estimates of workload and operational novelty. We have assessed these data using different curve fitting and distribution fitting techniques, such as multiple regression analysis, and maximum likelihood estimation to see how much of the variability in the error rates can be explained with these. We have also used goodness of fit testing strategies and principal component analysis to further assess our data. Finally, we constructed a model of expected error rates based on the what these statistics bore out as critical drivers to the error rate. This model allows project management to evaluate the error rate against a theoretically expected rate as well as anticipate future error rates.

A graphical user interface for RAId, a knowledge integrated proteomics analysis suite with accurate statistics.

PubMed

Joyce, Brendan; Lee, Danny; Rubio, Alex; Ogurtsov, Aleksey; Alves, Gelio; Yu, Yi-Kuo

2018-03-15

RAId is a software package that has been actively developed for the past 10 years for computationally and visually analyzing MS/MS data. Founded on rigorous statistical methods, RAId's core program computes accurate E-values for peptides and proteins identified during database searches. Making this robust tool readily accessible for the proteomics community by developing a graphical user interface (GUI) is our main goal here. We have constructed a graphical user interface to facilitate the use of RAId on users' local machines. Written in Java, RAId_GUI not only makes easy executions of RAId but also provides tools for data/spectra visualization, MS-product analysis, molecular isotopic distribution analysis, and graphing the retrieval versus the proportion of false discoveries. The results viewer displays and allows the users to download the analyses results. Both the knowledge-integrated organismal databases and the code package (containing source code, the graphical user interface, and a user manual) are available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads/raid.html .
Topographic ERP analyses: a step-by-step tutorial review.

PubMed

Murray, Micah M; Brunet, Denis; Michel, Christoph M

2008-06-01

In this tutorial review, we detail both the rationale for as well as the implementation of a set of analyses of surface-recorded event-related potentials (ERPs) that uses the reference-free spatial (i.e. topographic) information available from high-density electrode montages to render statistical information concerning modulations in response strength, latency, and topography both between and within experimental conditions. In these and other ways these topographic analysis methods allow the experimenter to glean additional information and neurophysiologic interpretability beyond what is available from canonical waveform analyses. In this tutorial we present the example of somatosensory evoked potentials (SEPs) in response to stimulation of each hand to illustrate these points. For each step of these analyses, we provide the reader with both a conceptual and mathematical description of how the analysis is carried out, what it yields, and how to interpret its statistical outcome. We show that these topographic analysis methods are intuitive and easy-to-use approaches that can remove much of the guesswork often confronting ERP researchers and also assist in identifying the information contained within high-density ERP datasets.
Length bias correction in gene ontology enrichment analysis using logistic regression.

PubMed

Mi, Gu; Di, Yanming; Emerson, Sarah; Cumbie, Jason S; Chang, Jeff H

2012-01-01

When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called "length bias", will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily biologically more relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both the Gene Ontology membership and the significance of the differential expression test. The inclusion of the transcript length as a covariate allows one to investigate the direct correlation between the Gene Ontology membership and the significance of testing differential expression, conditional on the transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible.
Monte Carlo investigation of thrust imbalance of solid rocket motor pairs

NASA Technical Reports Server (NTRS)

Sforzini, R. H.; Foster, W. A., Jr.

1976-01-01

The Monte Carlo method of statistical analysis is used to investigate the theoretical thrust imbalance of pairs of solid rocket motors (SRMs) firing in parallel. Sets of the significant variables are selected using a random sampling technique and the imbalance calculated for a large number of motor pairs using a simplified, but comprehensive, model of the internal ballistics. The treatment of burning surface geometry allows for the variations in the ovality and alignment of the motor case and mandrel as well as those arising from differences in the basic size dimensions and propellant properties. The analysis is used to predict the thrust-time characteristics of 130 randomly selected pairs of Titan IIIC SRMs. A statistical comparison of the results with test data for 20 pairs shows the theory underpredicts the standard deviation in maximum thrust imbalance by 20% with variability in burning times matched within 2%. The range in thrust imbalance of Space Shuttle type SRM pairs is also estimated using applicable tolerances and variabilities and a correction factor based on the Titan IIIC analysis.
Evaluation of centrifuged bone marrow on bone regeneration around implants in rabbit tibia.

PubMed

Betoni, Walter; Queiroz, Thallita P; Luvizuto, Eloá R; Valentini-Neto, Rodolpho; Garcia-Júnior, Idelmo R; Bernabé, Pedro F E

2012-12-01

To evaluate the bone regeneration of cervical defects produced around titanium implants filled with blood clot and filled with centrifuged bone marrow (CBM) by means of histomorphometric analysis. Twelve rabbits received 2 titanium implants in each right tibia, with the upper cortical prepared with a 5-mm drill and the lower cortex with a 3-mm-diameter drill. Euthanasia was performed to allow analysis at 7, 21, and 60 days after operation. The samples were embedded in light curing resin, cut and stained with alizarin red and Stevenel blue for a histomorphometric analysis of the bone-to-implant contact (BIC) and the bone area around implant (BA). The values obtained were statistically analyzed using the nonparametric Kruskal-Wallis test (P = 0.05). At 60 days postoperation, the groups had their cervical defects completely filled by neoformed bone tissue. There was no statistically significant difference between the groups regarding BIC and BA during the analyzed periods. There was no difference in the bone repair of periimplant cervical defects with or without the use of CBM.
[Changes in cerebrospinal fluid in patients with tuberculosis of the central nervous system].

PubMed

Jedrychowski, Michał; Garlicki, Aleksander

2008-01-01

The aim of the study was to analyze the parameters of the cerebrospinal fluid in patients with tuberculosis of the central nervous system confirmed by culture or molecular methods, in comparison to patients without such confirmation. The analysis of medical documentation of 13 patients with CNS tuberculosis, 10 male and 3 female who were hospitalized at the Clinic of Infectious Diseases in Kraków in years 2001-2006 was performed. Following parameters of the cerebrospinal fluid were taken into account in both groups of patients: cytologic analysis, protein, glucose and chloride concentration. Statistical analysis was done using the non-parametric Mann-Whitney U test. The only parameter for which statistically significant difference between the two groups of patients was found was the level of glucose in CSF (p<0.05). Lower glucose concentration was observed in the group with etiologically confirmed CNS tuberculosis. Moreover additional localisation of tuberculosis was observed in this group of patients. Introduction of the molecular biology methods in diagnosis allowed to detect the etiologic factor more often.
Analysis of Climatic and Environmental Changes Using CLEARS Web-GIS Information-Computational System: Siberia Case Study

NASA Astrophysics Data System (ADS)

Titov, A. G.; Gordov, E. P.; Okladnikov, I.; Shulgina, T. M.

2011-12-01

Analysis of recent climatic and environmental changes in Siberia performed on the basis of the CLEARS (CLimate and Environment Analysis and Research System) information-computational system is presented. The system was developed using the specialized software framework for rapid development of thematic information-computational systems based on Web-GIS technologies. It comprises structured environmental datasets, computational kernel, specialized web portal implementing web mapping application logic, and graphical user interface. Functional capabilities of the system include a number of procedures for mathematical and statistical analysis, data processing and visualization. At present a number of georeferenced datasets is available for processing including two editions of NCEP/NCAR Reanalysis, JMA/CRIEPI JRA-25 Reanalysis, ECMWF ERA-40 and ERA Interim Reanalysis, meteorological observation data for the territory of the former USSR, and others. Firstly, using functionality of the computational kernel employing approved statistical methods it was shown that the most reliable spatio-temporal characteristics of surface temperature and precipitation in Siberia in the second half of 20th and beginning of 21st centuries are provided by ERA-40/ERA Interim Reanalysis and APHRODITE JMA Reanalysis, respectively. Namely those Reanalyses are statistically consistent with reliable in situ meteorological observations. Analysis of surface temperature and precipitation dynamics for the territory of Siberia performed on the base of the developed information-computational system reveals fine spatial and temporal details in heterogeneous patterns obtained for the region earlier. Dynamics of bioclimatic indices determining climate change impact on structure and functioning of regional vegetation cover was investigated as well. Analysis shows significant positive trends of growing season length accompanied by statistically significant increase of sum of growing degree days and total annual precipitation over the south of Western Siberia. In particular, we conclude that analysis of trends of growing season length, sum of growing degree-days and total precipitation during the growing season reveals a tendency to an increase of vegetation ecosystems productivity across the south of Western Siberia (55°-60°N, 59°-84°E) in the past several decades. The developed system functionality providing instruments for comparison of modeling and observational data and for reliable climatological analysis allowed us to obtain new results characterizing regional manifestations of global change. It should be added that each analysis performed using the system leads also to generation of the archive of spatio-temporal data fields ready for subsequent usage by other specialists. In particular, the archive of bioclimatic indices obtained will allow performing further detailed studies of interrelations between local climate and vegetation cover changes, including changes of carbon uptake related to variations of types and amount of vegetation and spatial shift of vegetation zones. This work is partially supported by RFBR grants #10-07-00547 and #11-05-01190-a, SB RAS Basic Program Projects 4.31.1.5 and 4.31.2.7.
PubMed Central

PASSALI, D.; CARUSO, G.; ARIGLIANO, L.C.; PASSALI, F.M.; BELLUSSI, L.

2012-01-01

SUMMARY Obstructive sleep apnoea syndrome (OSAS) results from upper airway collapse during sleep. It represents an increasingly recognized pathology associated with many diseases. Herein, we describe a database for patients with OSAS. This has different goals: to facilitate good uniformity in clinical assessment, to allow the use of the application even by non-ENT specialists, to evaluate the results of medical and/or surgical treatments and to enable a statistical meta-analysis derived from the data collected in many OSAS medical centres. PMID:23093815
DOE Office of Scientific and Technical Information (OSTI.GOV)

Chertkov, Michael; Turitsyn, Konstantin; Sulc, Petr

The anticipated increase in the number of plug-in electric vehicles (EV) will put additional strain on electrical distribution circuits. Many control schemes have been proposed to control EV charging. Here, we develop control algorithms based on randomized EV charging start times and simple one-way broadcast communication allowing for a time delay between communication events. Using arguments from queuing theory and statistical analysis, we seek to maximize the utilization of excess distribution circuit capacity while keeping the probability of a circuit overload negligible.
Statistical Analysis of Physiological Signals

NASA Astrophysics Data System (ADS)

Ruiz, María G.; Pérez, Leticia

2003-07-01

In spite of two hundred years of clinical practice, Homeopathy still lacks of scientific basis. Its fundamental laws, similia principle and the activity of the denominated ultra-high dilutions are controversial issues that do not fit into the mainstream medicine or current physical-chemistry field as well. Aside its clinical efficacy, the identification of physical - chemistry parameters, as markers of the homeopathic effect, would allow to construct mathematic models [1], which in turn, could provide clues regarding the involved mechanism.
StatsDB: platform-agnostic storage and understanding of next generation sequencing run metrics

PubMed Central

Ramirez-Gonzalez, Ricardo H.; Leggett, Richard M.; Waite, Darren; Thanki, Anil; Drou, Nizar; Caccamo, Mario; Davey, Robert

2014-01-01

Modern sequencing platforms generate enormous quantities of data in ever-decreasing amounts of time. Additionally, techniques such as multiplex sequencing allow one run to contain hundreds of different samples. With such data comes a significant challenge to understand its quality and to understand how the quality and yield are changing across instruments and over time. As well as the desire to understand historical data, sequencing centres often have a duty to provide clear summaries of individual run performance to collaborators or customers. We present StatsDB, an open-source software package for storage and analysis of next generation sequencing run metrics. The system has been designed for incorporation into a primary analysis pipeline, either at the programmatic level or via integration into existing user interfaces. Statistics are stored in an SQL database and APIs provide the ability to store and access the data while abstracting the underlying database design. This abstraction allows simpler, wider querying across multiple fields than is possible by the manual steps and calculation required to dissect individual reports, e.g. ”provide metrics about nucleotide bias in libraries using adaptor barcode X, across all runs on sequencer A, within the last month”. The software is supplied with modules for storage of statistics from FastQC, a commonly used tool for analysis of sequence reads, but the open nature of the database schema means it can be easily adapted to other tools. Currently at The Genome Analysis Centre (TGAC), reports are accessed through our LIMS system or through a standalone GUI tool, but the API and supplied examples make it easy to develop custom reports and to interface with other packages. PMID:24627795
Meta-analysis of haplotype-association studies: comparison of methods and empirical evaluation of the literature

PubMed Central

2011-01-01

Background Meta-analysis is a popular methodology in several fields of medical research, including genetic association studies. However, the methods used for meta-analysis of association studies that report haplotypes have not been studied in detail. In this work, methods for performing meta-analysis of haplotype association studies are summarized, compared and presented in a unified framework along with an empirical evaluation of the literature. Results We present multivariate methods that use summary-based data as well as methods that use binary and count data in a generalized linear mixed model framework (logistic regression, multinomial regression and Poisson regression). The methods presented here avoid the inflation of the type I error rate that could be the result of the traditional approach of comparing a haplotype against the remaining ones, whereas, they can be fitted using standard software. Moreover, formal global tests are presented for assessing the statistical significance of the overall association. Although the methods presented here assume that the haplotypes are directly observed, they can be easily extended to allow for such an uncertainty by weighting the haplotypes by their probability. Conclusions An empirical evaluation of the published literature and a comparison against the meta-analyses that use single nucleotide polymorphisms, suggests that the studies reporting meta-analysis of haplotypes contain approximately half of the included studies and produce significant results twice more often. We show that this excess of statistically significant results, stems from the sub-optimal method of analysis used and, in approximately half of the cases, the statistical significance is refuted if the data are properly re-analyzed. Illustrative examples of code are given in Stata and it is anticipated that the methods developed in this work will be widely applied in the meta-analysis of haplotype association studies. PMID:21247440
Network meta-analysis: a technique to gather evidence from direct and indirect comparisons

PubMed Central

2017-01-01

Systematic reviews and pairwise meta-analyses of randomized controlled trials, at the intersection of clinical medicine, epidemiology and statistics, are positioned at the top of evidence-based practice hierarchy. These are important tools to base drugs approval, clinical protocols and guidelines formulation and for decision-making. However, this traditional technique only partially yield information that clinicians, patients and policy-makers need to make informed decisions, since it usually compares only two interventions at the time. In the market, regardless the clinical condition under evaluation, usually many interventions are available and few of them have been studied in head-to-head studies. This scenario precludes conclusions to be drawn from comparisons of all interventions profile (e.g. efficacy and safety). The recent development and introduction of a new technique – usually referred as network meta-analysis, indirect meta-analysis, multiple or mixed treatment comparisons – has allowed the estimation of metrics for all possible comparisons in the same model, simultaneously gathering direct and indirect evidence. Over the last years this statistical tool has matured as technique with models available for all types of raw data, producing different pooled effect measures, using both Frequentist and Bayesian frameworks, with different software packages. However, the conduction, report and interpretation of network meta-analysis still poses multiple challenges that should be carefully considered, especially because this technique inherits all assumptions from pairwise meta-analysis but with increased complexity. Thus, we aim to provide a basic explanation of network meta-analysis conduction, highlighting its risks and benefits for evidence-based practice, including information on statistical methods evolution, assumptions and steps for performing the analysis. PMID:28503228
Improving the analysis of slug tests

USGS Publications Warehouse

McElwee, C.D.

2002-01-01

This paper examines several techniques that have the potential to improve the quality of slug test analysis. These techniques are applicable in the range from low hydraulic conductivities with overdamped responses to high hydraulic conductivities with nonlinear oscillatory responses. Four techniques for improving slug test analysis will be discussed: use of an extended capability nonlinear model, sensitivity analysis, correction for acceleration and velocity effects, and use of multiple slug tests. The four-parameter nonlinear slug test model used in this work is shown to allow accurate analysis of slug tests with widely differing character. The parameter ?? represents a correction to the water column length caused primarily by radius variations in the wellbore and is most useful in matching the oscillation frequency and amplitude. The water column velocity at slug initiation (V0) is an additional model parameter, which would ideally be zero but may not be due to the initiation mechanism. The remaining two model parameters are A (parameter for nonlinear effects) and K (hydraulic conductivity). Sensitivity analysis shows that in general ?? and V0 have the lowest sensitivity and K usually has the highest. However, for very high K values the sensitivity to A may surpass the sensitivity to K. Oscillatory slug tests involve higher accelerations and velocities of the water column; thus, the pressure transducer responses are affected by these factors and the model response must be corrected to allow maximum accuracy for the analysis. The performance of multiple slug tests will allow some statistical measure of the experimental accuracy and of the reliability of the resulting aquifer parameters. ?? 2002 Elsevier Science B.V. All rights reserved.
Game Related Statistics Which Discriminate Between Winning and Losing Under-16 Male Basketball Games

PubMed Central

Lorenzo, Alberto; Gómez, Miguel Ángel; Ortega, Enrique; Ibáñez, Sergio José; Sampaio, Jaime

2010-01-01

The aim of the present study was to identify the game-related statistics which discriminate between winning and losing teams in under-16 years old male basketball games. The sample gathered all 122 games in the 2004 and 2005 Under-16 European Championships. The game-related statistics analysed were the free-throws (both successful and unsuccessful), 2- and 3-points field-goals (both successful and unsuccessful) offensive and defensive rebounds, blocks, assists, fouls, turnovers and steals. The winning teams exhibited lower ball possessions per game and better offensive and defensive efficacy coefficients than the losing teams. Results from discriminant analysis were statistically significant and allowed to emphasize several structure coefficients (SC). In close games (final score differences below 9 points), the discriminant variables were the turnovers (SC = -0.47) and the assists (SC = 0.33). In balanced games (final score differences between 10 and 29 points), the variables that discriminated between the groups were the successful 2-point field-goals (SC = -0.34) and defensive rebounds (SC = -0. 36); and in unbalanced games (final score differences above 30 points) the variables that best discriminated both groups were the successful 2-point field-goals (SC = 0.37). These results allowed understanding that these players' specific characteristics result in a different game-related statistical profile and helped to point out the importance of the perceptive and decision making process in practice and in competition. Key points The players' game-related statistical profile varied according to game type, game outcome and in formative categories in basketball. The results of this work help to point out the different player's performance described in U-16 men's basketball teams compared with senior and professional men's basketball teams. The results obtained enhance the importance of the perceptive and decision making process in practice and in competition. PMID:24149794
A new metaphor for projection-based visual analysis and data exploration

NASA Astrophysics Data System (ADS)

Schreck, Tobias; Panse, Christian

2007-01-01

In many important application domains such as Business and Finance, Process Monitoring, and Security, huge and quickly increasing volumes of complex data are collected. Strong efforts are underway developing automatic and interactive analysis tools for mining useful information from these data repositories. Many data analysis algorithms require an appropriate definition of similarity (or distance) between data instances to allow meaningful clustering, classification, and retrieval, among other analysis tasks. Projection-based data visualization is highly interesting (a) for visual discrimination analysis of a data set within a given similarity definition, and (b) for comparative analysis of similarity characteristics of a given data set represented by different similarity definitions. We introduce an intuitive and effective novel approach for projection-based similarity visualization for interactive discrimination analysis, data exploration, and visual evaluation of metric space effectiveness. The approach is based on the convex hull metaphor for visually aggregating sets of points in projected space, and it can be used with a variety of different projection techniques. The effectiveness of the approach is demonstrated by application on two well-known data sets. Statistical evidence supporting the validity of the hull metaphor is presented. We advocate the hull-based approach over the standard symbol-based approach to projection visualization, as it allows a more effective perception of similarity relationships and class distribution characteristics.
Benchmarking quantitative label-free LC-MS data processing workflows using a complex spiked proteomic standard dataset.

PubMed

Ramus, Claire; Hovasse, Agnès; Marcellin, Marlène; Hesse, Anne-Marie; Mouton-Barbosa, Emmanuelle; Bouyssié, David; Vaca, Sebastian; Carapito, Christine; Chaoui, Karima; Bruley, Christophe; Garin, Jérôme; Cianférani, Sarah; Ferro, Myriam; Van Dorssaeler, Alain; Burlet-Schiltz, Odile; Schaeffer, Christine; Couté, Yohann; Gonzalez de Peredo, Anne

2016-01-30

Proteomic workflows based on nanoLC-MS/MS data-dependent-acquisition analysis have progressed tremendously in recent years. High-resolution and fast sequencing instruments have enabled the use of label-free quantitative methods, based either on spectral counting or on MS signal analysis, which appear as an attractive way to analyze differential protein expression in complex biological samples. However, the computational processing of the data for label-free quantification still remains a challenge. Here, we used a proteomic standard composed of an equimolar mixture of 48 human proteins (Sigma UPS1) spiked at different concentrations into a background of yeast cell lysate to benchmark several label-free quantitative workflows, involving different software packages developed in recent years. This experimental design allowed to finely assess their performances in terms of sensitivity and false discovery rate, by measuring the number of true and false-positive (respectively UPS1 or yeast background proteins found as differential). The spiked standard dataset has been deposited to the ProteomeXchange repository with the identifier PXD001819 and can be used to benchmark other label-free workflows, adjust software parameter settings, improve algorithms for extraction of the quantitative metrics from raw MS data, or evaluate downstream statistical methods. Bioinformatic pipelines for label-free quantitative analysis must be objectively evaluated in their ability to detect variant proteins with good sensitivity and low false discovery rate in large-scale proteomic studies. This can be done through the use of complex spiked samples, for which the "ground truth" of variant proteins is known, allowing a statistical evaluation of the performances of the data processing workflow. We provide here such a controlled standard dataset and used it to evaluate the performances of several label-free bioinformatics tools (including MaxQuant, Skyline, MFPaQ, IRMa-hEIDI and Scaffold) in different workflows, for detection of variant proteins with different absolute expression levels and fold change values. The dataset presented here can be useful for tuning software tool parameters, and also testing new algorithms for label-free quantitative analysis, or for evaluation of downstream statistical methods. Copyright © 2015 Elsevier B.V. All rights reserved.
RADSS: an integration of GIS, spatial statistics, and network service for regional data mining

NASA Astrophysics Data System (ADS)

Hu, Haitang; Bao, Shuming; Lin, Hui; Zhu, Qing

2005-10-01

Regional data mining, which aims at the discovery of knowledge about spatial patterns, clusters or association between regions, has widely applications nowadays in social science, such as sociology, economics, epidemiology, crime, and so on. Many applications in the regional or other social sciences are more concerned with the spatial relationship, rather than the precise geographical location. Based on the spatial continuity rule derived from Tobler's first law of geography: observations at two sites tend to be more similar to each other if the sites are close together than if far apart, spatial statistics, as an important means for spatial data mining, allow the users to extract the interesting and useful information like spatial pattern, spatial structure, spatial association, spatial outlier and spatial interaction, from the vast amount of spatial data or non-spatial data. Therefore, by integrating with the spatial statistical methods, the geographical information systems will become more powerful in gaining further insights into the nature of spatial structure of regional system, and help the researchers to be more careful when selecting appropriate models. However, the lack of such tools holds back the application of spatial data analysis techniques and development of new methods and models (e.g., spatio-temporal models). Herein, we make an attempt to develop such an integrated software and apply it into the complex system analysis for the Poyang Lake Basin. This paper presents a framework for integrating GIS, spatial statistics and network service in regional data mining, as well as their implementation. After discussing the spatial statistics methods involved in regional complex system analysis, we introduce RADSS (Regional Analysis and Decision Support System), our new regional data mining tool, by integrating GIS, spatial statistics and network service. RADSS includes the functions of spatial data visualization, exploratory spatial data analysis, and spatial statistics. The tool also includes some fundamental spatial and non-spatial database in regional population and environment, which can be updated by external database via CD or network. Utilizing this data mining and exploratory analytical tool, the users can easily and quickly analyse the huge mount of the interrelated regional data, and better understand the spatial patterns and trends of the regional development, so as to make a credible and scientific decision. Moreover, it can be used as an educational tool for spatial data analysis and environmental studies. In this paper, we also present a case study on Poyang Lake Basin as an application of the tool and spatial data mining in complex environmental studies. At last, several concluding remarks are discussed.
Ensemble of Thermostatically Controlled Loads: Statistical Physics Approach.

PubMed

Chertkov, Michael; Chernyak, Vladimir

2017-08-17

Thermostatically controlled loads, e.g., air conditioners and heaters, are by far the most widespread consumers of electricity. Normally the devices are calibrated to provide the so-called bang-bang control - changing from on to off, and vice versa, depending on temperature. We considered aggregation of a large group of similar devices into a statistical ensemble, where the devices operate following the same dynamics, subject to stochastic perturbations and randomized, Poisson on/off switching policy. Using theoretical and computational tools of statistical physics, we analyzed how the ensemble relaxes to a stationary distribution and established a relationship between the relaxation and the statistics of the probability flux associated with devices' cycling in the mixed (discrete, switch on/off, and continuous temperature) phase space. This allowed us to derive the spectrum of the non-equilibrium (detailed balance broken) statistical system and uncover how switching policy affects oscillatory trends and the speed of the relaxation. Relaxation of the ensemble is of practical interest because it describes how the ensemble recovers from significant perturbations, e.g., forced temporary switching off aimed at utilizing the flexibility of the ensemble to provide "demand response" services to change consumption temporarily to balance a larger power grid. We discuss how the statistical analysis can guide further development of the emerging demand response technology.
Ensemble of Thermostatically Controlled Loads: Statistical Physics Approach

DOE PAGES

Chertkov, Michael; Chernyak, Vladimir

2017-01-17

Thermostatically Controlled Loads (TCL), e.g. air-conditioners and heaters, are by far the most wide-spread consumers of electricity. Normally the devices are calibrated to provide the so-called bang-bang control of temperature - changing from on to off , and vice versa, depending on temperature. Aggregation of a large group of similar devices into a statistical ensemble is considered, where the devices operate following the same dynamics subject to stochastic perturbations and randomized, Poisson on/off switching policy. We analyze, using theoretical and computational tools of statistical physics, how the ensemble relaxes to a stationary distribution and establish relation between the re- laxationmore » and statistics of the probability flux, associated with devices' cycling in the mixed (discrete, switch on/off , and continuous, temperature) phase space. This allowed us to derive and analyze spec- trum of the non-equilibrium (detailed balance broken) statistical system. and uncover how switching policy affects oscillatory trend and speed of the relaxation. Relaxation of the ensemble is of a practical interest because it describes how the ensemble recovers from significant perturbations, e.g. forceful temporary switching o aimed at utilizing flexibility of the ensemble in providing "demand response" services relieving consumption temporarily to balance larger power grid. We discuss how the statistical analysis can guide further development of the emerging demand response technology.« less

Ensemble of Thermostatically Controlled Loads: Statistical Physics Approach

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chertkov, Michael; Chernyak, Vladimir

Thermostatically Controlled Loads (TCL), e.g. air-conditioners and heaters, are by far the most wide-spread consumers of electricity. Normally the devices are calibrated to provide the so-called bang-bang control of temperature - changing from on to off , and vice versa, depending on temperature. Aggregation of a large group of similar devices into a statistical ensemble is considered, where the devices operate following the same dynamics subject to stochastic perturbations and randomized, Poisson on/off switching policy. We analyze, using theoretical and computational tools of statistical physics, how the ensemble relaxes to a stationary distribution and establish relation between the re- laxationmore » and statistics of the probability flux, associated with devices' cycling in the mixed (discrete, switch on/off , and continuous, temperature) phase space. This allowed us to derive and analyze spec- trum of the non-equilibrium (detailed balance broken) statistical system. and uncover how switching policy affects oscillatory trend and speed of the relaxation. Relaxation of the ensemble is of a practical interest because it describes how the ensemble recovers from significant perturbations, e.g. forceful temporary switching o aimed at utilizing flexibility of the ensemble in providing "demand response" services relieving consumption temporarily to balance larger power grid. We discuss how the statistical analysis can guide further development of the emerging demand response technology.« less
Method of analysis of local neuronal circuits in the vertebrate central nervous system.

PubMed

Reinis, S; Weiss, D S; McGaraughty, S; Tsoukatos, J

1992-06-01

Although a considerable amount of knowledge has been accumulated about the activity of individual nerve cells in the brain, little is known about their mutual interactions at the local level. The method presented in this paper allows the reconstruction of functional relations within a group of neurons as recorded by a single microelectrode. Data are sampled at 10 or 13 kHz. Prominent spikes produced by one or more single cells are selected and sorted by K-means cluster analysis. The activities of single cells are then related to the background firing of neurons in their vicinity. Auto-correlograms of the leading cells, auto-correlograms of the background cells (mass correlograms) and cross-correlograms between these two levels of firing are computed and evaluated. The statistical probability of mutual interactions is determined, and the statistically significant, most common interspike intervals are stored and attributed to real pairs of spikes in the original record. Selected pairs of spikes, characterized by statistically significant intervals between them, are then assembled into a working model of the system. This method has revealed substantial differences between the information processing in the visual cortex, the inferior colliculus, the rostral ventromedial medulla and the ventrobasal complex of the thalamus. Even short 1-s records of the multiple neuronal activity may provide meaningful and statistically significant results.
Highly Efficient Design-of-Experiments Methods for Combining CFD Analysis and Experimental Data

NASA Technical Reports Server (NTRS)

Anderson, Bernhard H.; Haller, Harold S.

2009-01-01

It is the purpose of this study to examine the impact of "highly efficient" Design-of-Experiments (DOE) methods for combining sets of CFD generated analysis data with smaller sets of Experimental test data in order to accurately predict performance results where experimental test data were not obtained. The study examines the impact of micro-ramp flow control on the shock wave boundary layer (SWBL) interaction where a complete paired set of data exist from both CFD analysis and Experimental measurements By combining the complete set of CFD analysis data composed of fifteen (15) cases with a smaller subset of experimental test data containing four/five (4/5) cases, compound data sets (CFD/EXP) were generated which allows the prediction of the complete set of Experimental results No statistical difference were found to exist between the combined (CFD/EXP) generated data sets and the complete Experimental data set composed of fifteen (15) cases. The same optimal micro-ramp configuration was obtained using the (CFD/EXP) generated data as obtained with the complete set of Experimental data, and the DOE response surfaces generated by the two data sets were also not statistically different.
Improving Accuracy and Temporal Resolution of Learning Curve Estimation for within- and across-Session Analysis

PubMed Central

Tabelow, Karsten; König, Reinhard; Polzehl, Jörg

2016-01-01

Estimation of learning curves is ubiquitously based on proportions of correct responses within moving trial windows. Thereby, it is tacitly assumed that learning performance is constant within the moving windows, which, however, is often not the case. In the present study we demonstrate that violations of this assumption lead to systematic errors in the analysis of learning curves, and we explored the dependency of these errors on window size, different statistical models, and learning phase. To reduce these errors in the analysis of single-subject data as well as on the population level, we propose adequate statistical methods for the estimation of learning curves and the construction of confidence intervals, trial by trial. Applied to data from an avoidance learning experiment with rodents, these methods revealed performance changes occurring at multiple time scales within and across training sessions which were otherwise obscured in the conventional analysis. Our work shows that the proper assessment of the behavioral dynamics of learning at high temporal resolution can shed new light on specific learning processes, and, thus, allows to refine existing learning concepts. It further disambiguates the interpretation of neurophysiological signal changes recorded during training in relation to learning. PMID:27303809
A statistical analysis of cervical auscultation signals from adults with unsafe airway protection.

PubMed

Dudik, Joshua M; Kurosu, Atsuko; Coyle, James L; Sejdić, Ervin

2016-01-22

Aspiration, where food or liquid is allowed to enter the larynx during a swallow, is recognized as the most clinically salient feature of oropharyngeal dysphagia. This event can lead to short-term harm via airway obstruction or more long-term effects such as pneumonia. In order to non-invasively identify this event using high resolution cervical auscultation there is a need to characterize cervical auscultation signals from subjects with dysphagia who aspirate. In this study, we collected swallowing sound and vibration data from 76 adults (50 men, 26 women, mean age 62) who underwent a routine videofluoroscopy swallowing examination. The analysis was limited to swallows of liquid with either thin (<5 cps) or viscous (≈300 cps) consistency and was divided into those with deep laryngeal penetration or aspiration (unsafe airway protection), and those with either shallow or no laryngeal penetration (safe airway protection), using a standardized scale. After calculating a selection of time, frequency, and time-frequency features for each swallow, the safe and unsafe categories were compared using Wilcoxon rank-sum statistical tests. Our analysis found that few of our chosen features varied in magnitude between safe and unsafe swallows with thin swallows demonstrating no statistical variation. We also supported our past findings with regard to the effects of sex and the presence or absence of stroke on cervical ausculation signals, but noticed certain discrepancies with regards to bolus viscosity. Overall, our results support the necessity of using multiple statistical features concurrently to identify laryngeal penetration of swallowed boluses in future work with high resolution cervical auscultation.
Analysis of the economic structure of the eating-out sector: The case of Spain.

PubMed

Cabiedes-Miragaya, Laura

2017-12-01

The objective of this article is to analyse the structure of the Spanish eating-out sector from an economic point of view, and more specifically, from the supply perspective. This aspect has been studied less than the demand side, almost certainly due to the gaps which exist in available official statistics in Spain, and which have been filled basically with consumer surveys. For this reason, focus is also placed on the economic relevance of the sector and attention is drawn to the serious shortcomings regarding official statistics in this domain, in contrast to the priority that hotel industry statistics have traditionally received in Spain. Based on official statistics, a descriptive analysis was carried out, focused mainly, though not exclusively, on diverse structural aspects of the sector. Special emphasis was placed on issues such as business demography (for instance, number and types of enterprises, survival rates, size distribution, and age structure), market concentration and structure of costs. Among other conclusions, the analysis allowed us to conclude that: part of the sector is more concentrated than it may at first appear to be; the dual structure of the sector described by the literature in relation to other countries is also present in the Spanish case; and the impact of ICTs (Information and Communication Technologies) on the sector are, and will foreseeably continue to be, particularly relevant. The main conclusion of this study refers to the fact that consumers have gained prominence in their contribution to shaping the structure of the sector. Copyright © 2017 Elsevier Ltd. All rights reserved.
Development of a Front Tracking Method for Two-Phase Micromixing of Incompressible Viscous Fluids with Interfacial Tension in Solvent Extraction

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhou, Yijie; Lim, Hyun-Kyung; de Almeida, Valmor F

2012-06-01

This progress report describes the development of a front tracking method for the solution of the governing equations of motion for two-phase micromixing of incompressible, viscous, liquid-liquid solvent extraction processes. The ability to compute the detailed local interfacial structure of the mixture allows characterization of the statistical properties of the two-phase mixture in terms of droplets, filaments, and other structures which emerge as a dispersed phase embedded into a continuous phase. Such a statistical picture provides the information needed for building a consistent coarsened model applicable to the entire mixing device. Coarsening is an undertaking for a future mathematical developmentmore » and is outside the scope of the present work. We present here a method for accurate simulation of the micromixing dynamics of an aqueous and an organic phase exposed to intense centrifugal force and shearing stress. The onset of mixing is the result of the combination of the classical Rayleigh- Taylor and Kelvin-Helmholtz instabilities. A mixing environment that emulates a sector of the annular mixing zone of a centrifugal contactor is used for the mathematical domain. The domain is small enough to allow for resolution of the individual interfacial structures and large enough to allow for an analysis of their statistical distribution of sizes and shapes. A set of accurate algorithms for this application requires an advanced front tracking approach constrained by the incompressibility condition. This research is aimed at designing and implementing these algorithms. We demonstrate verification and convergence results for one-phase and unmixed, two-phase flows. In addition we report on preliminary results for mixed, two-phase flow for realistic operating flow parameters.« less
Geostatistics - a tool applied to the distribution of Legionella pneumophila in a hospital water system.

PubMed

Laganà, Pasqualina; Moscato, Umberto; Poscia, Andrea; La Milia, Daniele Ignazio; Boccia, Stefania; Avventuroso, Emanuela; Delia, Santi

2015-01-01

Legionnaires' disease is normally acquired by inhalation of legionellae from a contaminated environmental source. Water systems of large buildings, such as hospitals, are often contaminated with legionellae and therefore represent a potential risk for the hospital population. The aim of this study was to evaluate the potential contamination of Legionella pneumophila (LP) in a large hospital in Italy through georeferential statistical analysis to assess the possible sources of dispersion and, consequently, the risk of exposure for both health care staff and patients. LP serogroups 1 and 2-14 distribution was considered in the wards housed on two consecutive floors of the hospital building. On the basis of information provided by 53 bacteriological analysis, a 'random' grid of points was chosen and spatial geostatistics or FAIk Kriging was applied and compared with the results of classical statistical analysis. Over 50% of the examined samples were positive for Legionella pneumophila. LP 1 was isolated in 69% of samples from the ground floor and in 60% of sample from the first floor; LP 2-14 in 36% of sample from the ground floor and 24% from the first. The iso-estimation maps show clearly the most contaminated pipe and the difference in the diffusion of the different L. pneumophila serogroups. Experimental work has demonstrated that geostatistical methods applied to the microbiological analysis of water matrices allows a better modeling of the phenomenon under study, a greater potential for risk management and a greater choice of methods of prevention and environmental recovery to be put in place with respect to the classical statistical analysis.
A functional U-statistic method for association analysis of sequencing data.

PubMed

Jadhav, Sneha; Tong, Xiaoran; Lu, Qing

2017-11-01

Although sequencing studies hold great promise for uncovering novel variants predisposing to human diseases, the high dimensionality of the sequencing data brings tremendous challenges to data analysis. Moreover, for many complex diseases (e.g., psychiatric disorders) multiple related phenotypes are collected. These phenotypes can be different measurements of an underlying disease, or measurements characterizing multiple related diseases for studying common genetic mechanism. Although jointly analyzing these phenotypes could potentially increase the power of identifying disease-associated genes, the different types of phenotypes pose challenges for association analysis. To address these challenges, we propose a nonparametric method, functional U-statistic method (FU), for multivariate analysis of sequencing data. It first constructs smooth functions from individuals' sequencing data, and then tests the association of these functions with multiple phenotypes by using a U-statistic. The method provides a general framework for analyzing various types of phenotypes (e.g., binary and continuous phenotypes) with unknown distributions. Fitting the genetic variants within a gene using a smoothing function also allows us to capture complexities of gene structure (e.g., linkage disequilibrium, LD), which could potentially increase the power of association analysis. Through simulations, we compared our method to the multivariate outcome score test (MOST), and found that our test attained better performance than MOST. In a real data application, we apply our method to the sequencing data from Minnesota Twin Study (MTS) and found potential associations of several nicotine receptor subunit (CHRN) genes, including CHRNB3, associated with nicotine dependence and/or alcohol dependence. © 2017 WILEY PERIODICALS, INC.
Only pick the right grains: Modelling the bias due to subjective grain-size interval selection for chronometric and fingerprinting approaches.

NASA Astrophysics Data System (ADS)

Dietze, Michael; Fuchs, Margret; Kreutzer, Sebastian

2016-04-01

Many modern approaches of radiometric dating or geochemical fingerprinting rely on sampling sedimentary deposits. A key assumption of most concepts is that the extracted grain-size fraction of the sampled sediment adequately represents the actual process to be dated or the source area to be fingerprinted. However, these assumptions are not always well constrained. Rather, they have to align with arbitrary, method-determined size intervals, such as "coarse grain" or "fine grain" with partly even different definitions. Such arbitrary intervals violate principal process-based concepts of sediment transport and can thus introduce significant bias to the analysis outcome (i.e., a deviation of the measured from the true value). We present a flexible numerical framework (numOlum) for the statistical programming language R that allows quantifying the bias due to any given analysis size interval for different types of sediment deposits. This framework is applied to synthetic samples from the realms of luminescence dating and geochemical fingerprinting, i.e. a virtual reworked loess section. We show independent validation data from artificially dosed and subsequently mixed grain-size proportions and we present a statistical approach (end-member modelling analysis, EMMA) that allows accounting for the effect of measuring the compound dosimetric history or geochemical composition of a sample. EMMA separates polymodal grain-size distributions into the underlying transport process-related distributions and their contribution to each sample. These underlying distributions can then be used to adjust grain-size preparation intervals to minimise the incorporation of "undesired" grain-size fractions.
Statistical Modeling for Radiation Hardness Assurance: Toward Bigger Data

NASA Technical Reports Server (NTRS)

Ladbury, R.; Campola, M. J.

2015-01-01

New approaches to statistical modeling in radiation hardness assurance are discussed. These approaches yield quantitative bounds on flight-part radiation performance even in the absence of conventional data sources. This allows the analyst to bound radiation risk at all stages and for all decisions in the RHA process. It also allows optimization of RHA procedures for the project's risk tolerance.
Longitudinal Commercial Claims-Based Cost Analysis of Diabetic Retinopathy Screening Patterns.

PubMed

Fitch, Kathryn; Weisman, Thomas; Engel, Tyler; Turpcu, Adam; Blumen, Helen; Rajput, Yamina; Dave, Purav

2015-09-01

Diabetic retinopathy is one of the most common complications of diabetes. The screening of patients with diabetes to detect retinopathy is recommended by several professional guidelines but is an underutilized service. To analyze the relationship between the frequency of retinopathy screening and the cost of care in adult patients with diabetes. Truven Health MarketScan commercial databases (2000-2013) were used to identify the diabetic population aged 18 to 64 years for the performance of a 2001-2013 annual trend analysis of patients with type 1 and type 2 diabetes and a 10-year longitudinal analysis of patients with newly diagnosed type 2 diabetes. In the trend analysis, the prevalence of diabetes, screening rate, and allowed cost per member per month (PMPM) were calculated. In the longitudinal analysis, data from 4 index years (2001-2004) of patients newly diagnosed with type 2 diabetes were combined, and the costs were adjusted to be comparable to the 2004 index year cohort, using the annual diabetes population cost trends calculated in the trend analysis. The longitudinal population was segmented into the number of years of diabetic retinopathy screening (ie, 0, 1-4, 5-7, and 8-10), and the relationship between the years of screening and the PMPM allowed costs was analyzed. The difference in mean incremental cost between years 1 and 10 in each of the 4 cohorts was compared after adjusting for explanatory variables. In the trend analysis, between 2001 and 2013, the prevalence of diabetes increased from 3.93% to 5.08%, retinal screening increased from 26.27% to 29.58%, and the average total unadjusted allowed cost of care for each patient with diabetes increased from $822 to $1395 PMPM. In the longitudinal analysis, the difference between the screening cohorts' mean incremental cost increase was $185 between the 0- and 1-4-year cohorts (P <.003) and $202 between the 0- and 5-7-year cohorts (P <.023). The cost differences between the other cohorts, including $217 between the 0- and 8-10-year cohorts (P <.066), were not statistically significant. Based on our analysis, the annual retinopathy screening rate for patients with diabetes has remained low since 2001, and has been well below the guideline-recommended screening levels. For patients with type 2 diabetes, the mean increase in healthcare expenditures over a 10-year period after diagnosis is not statistically different among those with various retinopathy screening rates, although the increase in healthcare spending is lower for patients with diabetes who were not screened for retinopathy compared with patients who did get screened.
iTTVis: Interactive Visualization of Table Tennis Data.

PubMed

Wu, Yingcai; Lan, Ji; Shu, Xinhuan; Ji, Chenyang; Zhao, Kejian; Wang, Jiachen; Zhang, Hui

2018-01-01

The rapid development of information technology paved the way for the recording of fine-grained data, such as stroke techniques and stroke placements, during a table tennis match. This data recording creates opportunities to analyze and evaluate matches from new perspectives. Nevertheless, the increasingly complex data poses a significant challenge to make sense of and gain insights into. Analysts usually employ tedious and cumbersome methods which are limited to watching videos and reading statistical tables. However, existing sports visualization methods cannot be applied to visualizing table tennis competitions due to different competition rules and particular data attributes. In this work, we collaborate with data analysts to understand and characterize the sophisticated domain problem of analysis of table tennis data. We propose iTTVis, a novel interactive table tennis visualization system, which to our knowledge, is the first visual analysis system for analyzing and exploring table tennis data. iTTVis provides a holistic visualization of an entire match from three main perspectives, namely, time-oriented, statistical, and tactical analyses. The proposed system with several well-coordinated views not only supports correlation identification through statistics and pattern detection of tactics with a score timeline but also allows cross analysis to gain insights. Data analysts have obtained several new insights by using iTTVis. The effectiveness and usability of the proposed system are demonstrated with four case studies.
Evaluating the efficiency of environmental monitoring programs

USGS Publications Warehouse

Levine, Carrie R.; Yanai, Ruth D.; Lampman, Gregory G.; Burns, Douglas A.; Driscoll, Charles T.; Lawrence, Gregory B.; Lynch, Jason; Schoch, Nina

2014-01-01

Statistical uncertainty analyses can be used to improve the efficiency of environmental monitoring, allowing sampling designs to maximize information gained relative to resources required for data collection and analysis. In this paper, we illustrate four methods of data analysis appropriate to four types of environmental monitoring designs. To analyze a long-term record from a single site, we applied a general linear model to weekly stream chemistry data at Biscuit Brook, NY, to simulate the effects of reducing sampling effort and to evaluate statistical confidence in the detection of change over time. To illustrate a detectable difference analysis, we analyzed a one-time survey of mercury concentrations in loon tissues in lakes in the Adirondack Park, NY, demonstrating the effects of sampling intensity on statistical power and the selection of a resampling interval. To illustrate a bootstrapping method, we analyzed the plot-level sampling intensity of forest inventory at the Hubbard Brook Experimental Forest, NH, to quantify the sampling regime needed to achieve a desired confidence interval. Finally, to analyze time-series data from multiple sites, we assessed the number of lakes and the number of samples per year needed to monitor change over time in Adirondack lake chemistry using a repeated-measures mixed-effects model. Evaluations of time series and synoptic long-term monitoring data can help determine whether sampling should be re-allocated in space or time to optimize the use of financial and human resources.
Soft Decision Analyzer

NASA Technical Reports Server (NTRS)

Steele, Glen; Lansdowne, Chatwin; Zucha, Joan; Schlensinger, Adam

2013-01-01

The Soft Decision Analyzer (SDA) is an instrument that combines hardware, firmware, and software to perform realtime closed-loop end-to-end statistical analysis of single- or dual- channel serial digital RF communications systems operating in very low signal-to-noise conditions. As an innovation, the unique SDA capabilities allow it to perform analysis of situations where the receiving communication system slips bits due to low signal-to-noise conditions or experiences constellation rotations resulting in channel polarity in versions or channel assignment swaps. SDA s closed-loop detection allows it to instrument a live system and correlate observations with frame, codeword, and packet losses, as well as Quality of Service (QoS) and Quality of Experience (QoE) events. The SDA s abilities are not confined to performing analysis in low signal-to-noise conditions. Its analysis provides in-depth insight of a communication system s receiver performance in a variety of operating conditions. The SDA incorporates two techniques for identifying slips. The first is an examination of content of the received data stream s relation to the transmitted data content and the second is a direct examination of the receiver s recovered clock signals relative to a reference. Both techniques provide benefits in different ways and allow the communication engineer evaluating test results increased confidence and understanding of receiver performance. Direct examination of data contents is performed by two different data techniques, power correlation or a modified Massey correlation, and can be applied to soft decision data widths 1 to 12 bits wide over a correlation depth ranging from 16 to 512 samples. The SDA detects receiver bit slips within a 4 bits window and can handle systems with up to four quadrants (QPSK, SQPSK, and BPSK systems). The SDA continuously monitors correlation results to characterize slips and quadrant change and is capable of performing analysis even when the receiver under test is subjected to conditions where its performance degrades to high error rates (30 percent or beyond). The design incorporates a number of features, such as watchdog triggers that permit the SDA system to recover from large receiver upsets automatically and continue accumulating performance analysis unaided by operator intervention. This accommodates tests that can last in the order of days in order to gain statistical confidence in results and is also useful for capturing snapshots of rare events.
General solution of the chemical master equation and modality of marginal distributions for hierarchic first-order reaction networks.

PubMed

Reis, Matthias; Kromer, Justus A; Klipp, Edda

2018-01-20

Multimodality is a phenomenon which complicates the analysis of statistical data based exclusively on mean and variance. Here, we present criteria for multimodality in hierarchic first-order reaction networks, consisting of catalytic and splitting reactions. Those networks are characterized by independent and dependent subnetworks. First, we prove the general solvability of the Chemical Master Equation (CME) for this type of reaction network and thereby extend the class of solvable CME's. Our general solution is analytical in the sense that it allows for a detailed analysis of its statistical properties. Given Poisson/deterministic initial conditions, we then prove the independent species to be Poisson/binomially distributed, while the dependent species exhibit generalized Poisson/Khatri Type B distributions. Generalized Poisson/Khatri Type B distributions are multimodal for an appropriate choice of parameters. We illustrate our criteria for multimodality by several basic models, as well as the well-known two-stage transcription-translation network and Bateman's model from nuclear physics. For both examples, multimodality was previously not reported.
Bonneville Power Administration Communication Alarm Processor expert system:

DOE Office of Scientific and Technical Information (OSTI.GOV)

Goeltz, R.; Purucker, S.; Tonn, B.

This report describes the Communications Alarm Processor (CAP), a prototype expert system developed for the Bonneville Power Administration by Oak Ridge National Laboratory. The system is designed to receive and diagnose alarms from Bonneville's Microwave Communications System (MCS). The prototype encompasses one of seven branches of the communications network and a subset of alarm systems and alarm types from each system. The expert system employs a backward chaining approach to diagnosing alarms. Alarms are fed into the expert system directly from the communication system via RS232 ports and sophisticated alarm filtering and mailbox software. Alarm diagnoses are presented to operatorsmore » for their review and concurrence before the diagnoses are archived. Statistical software is incorporated to allow analysis of archived data for report generation and maintenance studies. The delivered system resides on a Digital Equipment Corporation VAX 3200 workstation and utilizes Nexpert Object and SAS for the expert system and statistical analysis, respectively. 11 refs., 23 figs., 7 tabs.« less
Nonclassical point of view of the Brownian motion generation via fractional deterministic model

NASA Astrophysics Data System (ADS)

Gilardi-Velázquez, H. E.; Campos-Cantón, E.

In this paper, we present a dynamical system based on the Langevin equation without stochastic term and using fractional derivatives that exhibit properties of Brownian motion, i.e. a deterministic model to generate Brownian motion is proposed. The stochastic process is replaced by considering an additional degree of freedom in the second-order Langevin equation. Thus, it is transformed into a system of three first-order linear differential equations, additionally α-fractional derivative are considered which allow us to obtain better statistical properties. Switching surfaces are established as a part of fluctuating acceleration. The final system of three α-order linear differential equations does not contain a stochastic term, so the system generates motion in a deterministic way. Nevertheless, from the time series analysis, we found that the behavior of the system exhibits statistics properties of Brownian motion, such as, a linear growth in time of mean square displacement, a Gaussian distribution. Furthermore, we use the detrended fluctuation analysis to prove the Brownian character of this motion.
Unbiased estimation in seamless phase II/III trials with unequal treatment effect variances and hypothesis-driven selection rules.

PubMed

Robertson, David S; Prevost, A Toby; Bowden, Jack

2016-09-30

Seamless phase II/III clinical trials offer an efficient way to select an experimental treatment and perform confirmatory analysis within a single trial. However, combining the data from both stages in the final analysis can induce bias into the estimates of treatment effects. Methods for bias adjustment developed thus far have made restrictive assumptions about the design and selection rules followed. In order to address these shortcomings, we apply recent methodological advances to derive the uniformly minimum variance conditionally unbiased estimator for two-stage seamless phase II/III trials. Our framework allows for the precision of the treatment arm estimates to take arbitrary values, can be utilised for all treatments that are taken forward to phase III and is applicable when the decision to select or drop treatment arms is driven by a multiplicity-adjusted hypothesis testing procedure. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Algorithms for constructing optimal paths and statistical analysis of passenger traffic

NASA Astrophysics Data System (ADS)

Trofimov, S. P.; Druzhinina, N. G.; Trofimova, O. G.

2018-01-01

Several existing information systems of urban passenger transport (UPT) are considered. Author’s UPT network model is presented. To a passenger a new service is offered that is the best path from one stop to another stop at a specified time. The algorithm and software implementation for finding the optimal path are presented. The algorithm uses the current UPT schedule. The article also describes the algorithm of statistical analysis of trip payments by the electronic E-cards. The algorithm allows obtaining the density of passenger traffic during the day. This density is independent of the network topology and UPT schedules. The resulting density of the traffic flow can solve a number of practical problems. In particular, the forecast for the overflow of passenger transport in the «rush» hours, the quantitative comparison of different topologies transport networks, constructing of the best UPT timetable. The efficiency of the proposed integrated approach is demonstrated by the example of the model town with arbitrary dimensions.

A Monte Carlo–Based Bayesian Approach for Measuring Agreement in a Qualitative Scale

PubMed Central

Pérez Sánchez, Carlos Javier

2014-01-01

Agreement analysis has been an active research area whose techniques have been widely applied in psychology and other fields. However, statistical agreement among raters has been mainly considered from a classical statistics point of view. Bayesian methodology is a viable alternative that allows the inclusion of subjective initial information coming from expert opinions, personal judgments, or historical data. A Bayesian approach is proposed by providing a unified Monte Carlo–based framework to estimate all types of measures of agreement in a qualitative scale of response. The approach is conceptually simple and it has a low computational cost. Both informative and non-informative scenarios are considered. In case no initial information is available, the results are in line with the classical methodology, but providing more information on the measures of agreement. For the informative case, some guidelines are presented to elicitate the prior distribution. The approach has been applied to two applications related to schizophrenia diagnosis and sensory analysis. PMID:29881002
Bioconductor Workflow for Microbiome Data Analysis: from raw reads to community analyses

PubMed Central

Callahan, Ben J.; Sankaran, Kris; Fukuyama, Julia A.; McMurdie, Paul J.; Holmes, Susan P.

2016-01-01

High-throughput sequencing of PCR-amplified taxonomic markers (like the 16S rRNA gene) has enabled a new level of analysis of complex bacterial communities known as microbiomes. Many tools exist to quantify and compare abundance levels or OTU composition of communities in different conditions. The sequencing reads have to be denoised and assigned to the closest taxa from a reference database. Common approaches use a notion of 97% similarity and normalize the data by subsampling to equalize library sizes. In this paper, we show that statistical models allow more accurate abundance estimates. By providing a complete workflow in R, we enable the user to do sophisticated downstream statistical analyses, whether parametric or nonparametric. We provide examples of using the R packages dada2, phyloseq, DESeq2, ggplot2 and vegan to filter, visualize and test microbiome data. We also provide examples of supervised analyses using random forests and nonparametric testing using community networks and the ggnetwork package. PMID:27508062
Scheduler software for tracking and data relay satellite system loading analysis: User manual and programmer guide

NASA Technical Reports Server (NTRS)

Craft, R.; Dunn, C.; Mccord, J.; Simeone, L.

1980-01-01

A user guide and programmer documentation is provided for a system of PRIME 400 minicomputer programs. The system was designed to support loading analyses on the Tracking Data Relay Satellite System (TDRSS). The system is a scheduler for various types of data relays (including tape recorder dumps and real time relays) from orbiting payloads to the TDRSS. Several model options are available to statistically generate data relay requirements. TDRSS time lines (representing resources available for scheduling) and payload/TDRSS acquisition and loss of sight time lines are input to the scheduler from disk. Tabulated output from the interactive system includes a summary of the scheduler activities over time intervals specified by the user and overall summary of scheduler input and output information. A history file, which records every event generated by the scheduler, is written to disk to allow further scheduling on remaining resources and to provide data for graphic displays or additional statistical analysis.
Statistical methodology: II. Reliability and validity assessment in study design, Part B.

PubMed

Karras, D J

1997-02-01

Validity measures the correspondence between a test and other purported measures of the same or similar qualities. When a reference standard exists, a criterion-based validity coefficient can be calculated. If no such standard is available, the concepts of content and construct validity may be used, but quantitative analysis may not be possible. The Pearson and Spearman tests of correlation are often used to assess the correspondence between tests, but do not account for measurement biases and may yield misleading results. Techniques that measure interest differences may be more meaningful in validity assessment, and the kappa statistic is useful for analyzing categorical variables. Questionnaires often can be designed to allow quantitative assessment of reliability and validity, although this may be difficult. Inclusion of homogeneous questions is necessary to assess reliability. Analysis is enhanced by using Likert scales or similar techniques that yield ordinal data. Validity assessment of questionnaires requires careful definition of the scope of the test and comparison with previously validated tools.
Recognition of speaker-dependent continuous speech with KEAL

NASA Astrophysics Data System (ADS)

Mercier, G.; Bigorgne, D.; Miclet, L.; Le Guennec, L.; Querre, M.

1989-04-01

A description of the speaker-dependent continuous speech recognition system KEAL is given. An unknown utterance, is recognized by means of the followng procedures: acoustic analysis, phonetic segmentation and identification, word and sentence analysis. The combination of feature-based, speaker-independent coarse phonetic segmentation with speaker-dependent statistical classification techniques is one of the main design features of the acoustic-phonetic decoder. The lexical access component is essentially based on a statistical dynamic programming technique which aims at matching a phonemic lexical entry containing various phonological forms, against a phonetic lattice. Sentence recognition is achieved by use of a context-free grammar and a parsing algorithm derived from Earley's parser. A speaker adaptation module allows some of the system parameters to be adjusted by matching known utterances with their acoustical representation. The task to be performed, described by its vocabulary and its grammar, is given as a parameter of the system. Continuously spoken sentences extracted from a 'pseudo-Logo' language are analyzed and results are presented.
Cored density profiles in the DARKexp model

NASA Astrophysics Data System (ADS)

Destri, Claudio

2018-05-01

The DARKexp model represents a novel and promising attempt to solve a long standing problem of statistical mechanics, that of explaining from first principles the quasi-stationary states at the end of the collisionless gravitational collapse. The model, which yields good fits to observation and simulation data on several scales, was originally conceived to provide a theoretical basis for the 1/r cusp of the Navarro-Frenk-White profile. In this note we show that it also allows for cored density profiles that, when viewed in three dimensions, in the r→0 limit have the conical shape characteristic of the Burkert profile. It remains to be established whether both cusps and cores, or only one of the two types, are allowed beyond the asymptotic analysis of this work.
Angular reconstitution-based 3D reconstructions of nanomolecular structures from superresolution light-microscopy images

PubMed Central

Salas, Desirée; Le Gall, Antoine; Fiche, Jean-Bernard; Valeri, Alessandro; Ke, Yonggang; Bron, Patrick; Bellot, Gaetan

2017-01-01

Superresolution light microscopy allows the imaging of labeled supramolecular assemblies at a resolution surpassing the classical diffraction limit. A serious limitation of the superresolution approach is sample heterogeneity and the stochastic character of the labeling procedure. To increase the reproducibility and the resolution of the superresolution results, we apply multivariate statistical analysis methods and 3D reconstruction approaches originally developed for cryogenic electron microscopy of single particles. These methods allow for the reference-free 3D reconstruction of nanomolecular structures from two-dimensional superresolution projection images. Since these 2D projection images all show the structure in high-resolution directions of the optical microscope, the resulting 3D reconstructions have the best possible isotropic resolution in all directions. PMID:28811371
Introducing Statistical Research to Undergraduate Mathematical Statistics Students Using the Guitar Hero Video Game Series

ERIC Educational Resources Information Center

Ramler, Ivan P.; Chapman, Jessica L.

2011-01-01

In this article we describe a semester-long project, based on the popular video game series Guitar Hero, designed to introduce upper-level undergraduate statistics students to statistical research. Some of the goals of this project are to help students develop statistical thinking that allows them to approach and answer open-ended research…
Characterization of x-ray framing cameras for the National Ignition Facility using single photon pulse height analysis.

PubMed

Holder, J P; Benedetti, L R; Bradley, D K

2016-11-01

Single hit pulse height analysis is applied to National Ignition Facility x-ray framing cameras to quantify gain and gain variation in a single micro-channel plate-based instrument. This method allows the separation of gain from detectability in these photon-detecting devices. While pulse heights measured by standard-DC calibration methods follow the expected exponential distribution at the limit of a compound-Poisson process, gain-gated pulse heights follow a more complex distribution that may be approximated as a weighted sum of a few exponentials. We can reproduce this behavior with a simple statistical-sampling model.
Classification of edible oils by employing 31P and 1H NMR spectroscopy in combination with multivariate statistical analysis. A proposal for the detection of seed oil adulteration in virgin olive oils.

PubMed

Vigli, Georgia; Philippidis, Angelos; Spyros, Apostolos; Dais, Photis

2003-09-10

A combination of (1)H NMR and (31)P NMR spectroscopy and multivariate statistical analysis was used to classify 192 samples from 13 types of vegetable oils, namely, hazelnut, sunflower, corn, soybean, sesame, walnut, rapeseed, almond, palm, groundnut, safflower, coconut, and virgin olive oils from various regions of Greece. 1,2-Diglycerides, 1,3-diglycerides, the ratio of 1,2-diglycerides to total diglycerides, acidity, iodine value, and fatty acid composition determined upon analysis of the respective (1)H NMR and (31)P NMR spectra were selected as variables to establish a classification/prediction model by employing discriminant analysis. This model, obtained from the training set of 128 samples, resulted in a significant discrimination among the different classes of oils, whereas 100% of correct validated assignments for 64 samples were obtained. Different artificial mixtures of olive-hazelnut, olive-corn, olive-sunflower, and olive-soybean oils were prepared and analyzed by (1)H NMR and (31)P NMR spectroscopy. Subsequent discriminant analysis of the data allowed detection of adulteration as low as 5% w/w, provided that fresh virgin olive oil samples were used, as reflected by their high 1,2-diglycerides to total diglycerides ratio (D > or = 0.90).
A national streamflow network gap analysis

USGS Publications Warehouse

Kiang, Julie E.; Stewart, David W.; Archfield, Stacey A.; Osborne, Emily B.; Eng, Ken

2013-01-01

The U.S. Geological Survey (USGS) conducted a gap analysis to evaluate how well the USGS streamgage network meets a variety of needs, focusing on the ability to calculate various statistics at locations that have streamgages (gaged) and that do not have streamgages (ungaged). This report presents the results of analysis to determine where there are gaps in the network of gaged locations, how accurately desired statistics can be calculated with a given length of record, and whether the current network allows for estimation of these statistics at ungaged locations. The analysis indicated that there is variability across the Nation’s streamflow data-collection network in terms of the spatial and temporal coverage of streamgages. In general, the Eastern United States has better coverage than the Western United States. The arid Southwestern United States, Alaska, and Hawaii were observed to have the poorest spatial coverage, using the dataset assembled for this study. Except in Hawaii, these areas also tended to have short streamflow records. Differences in hydrology lead to differences in the uncertainty of statistics calculated in different regions of the country. Arid and semiarid areas of the Central and Southwestern United States generally exhibited the highest levels of interannual variability in flow, leading to larger uncertainty in flow statistics. At ungaged locations, information can be transferred from nearby streamgages if there is sufficient similarity between the gaged watersheds and the ungaged watersheds of interest. Areas where streamgages exhibit high correlation are most likely to be suitable for this type of information transfer. The areas with the most highly correlated streamgages appear to coincide with mountainous areas of the United States. Lower correlations are found in the Central United States and coastal areas of the Southeastern United States. Information transfer from gaged basins to ungaged basins is also most likely to be successful when basin attributes show high similarity. At the scale of the analysis completed in this study, the attributes of basins upstream of USGS streamgages cover the full range of basin attributes observed at potential locations of interest fairly well. Some exceptions included very high or very low elevation areas and very arid areas.
Application of computer-aided diagnosis (CAD) in MR-mammography (MRM): do we really need whole lesion time curve distribution analysis?

PubMed

Baltzer, Pascal Andreas Thomas; Renz, Diane M; Kullnig, Petra E; Gajda, Mieczyslaw; Camara, Oumar; Kaiser, Werner A

2009-04-01

The identification of the most suspect enhancing part of a lesion is regarded as a major diagnostic criterion in dynamic magnetic resonance mammography. Computer-aided diagnosis (CAD) software allows the semi-automatic analysis of the kinetic characteristics of complete enhancing lesions, providing additional information about lesion vasculature. The diagnostic value of this information has not yet been quantified. Consecutive patients from routine diagnostic studies (1.5 T, 0.1 mmol gadopentetate dimeglumine, dynamic gradient-echo sequences at 1-minute intervals) were analyzed prospectively using CAD. Dynamic sequences were processed and reduced to a parametric map. Curve types were classified by initial signal increase (not significant, intermediate, and strong) and the delayed time course of signal intensity (continuous, plateau, and washout). Lesion enhancement was measured using CAD. The most suspect curve, the curve-type distribution percentage, and combined dynamic data were compared. Statistical analysis included logistic regression analysis and receiver-operating characteristic analysis. Fifty-one patients with 46 malignant and 44 benign lesions were enrolled. On receiver-operating characteristic analysis, the most suspect curve showed diagnostic accuracy of 76.7 +/- 5%. In comparison, the curve-type distribution percentage demonstrated accuracy of 80.2 +/- 4.9%. Combined dynamic data had the highest diagnostic accuracy (84.3 +/- 4.2%). These differences did not achieve statistical significance. With appropriate cutoff values, sensitivity and specificity, respectively, were found to be 80.4% and 72.7% for the most suspect curve, 76.1% and 83.6% for the curve-type distribution percentage, and 78.3% and 84.5% for both parameters. The integration of whole-lesion dynamic data tends to improve specificity. However, no statistical significance backs up this finding.
A Mokken scale analysis of the peer physical examination questionnaire.

PubMed

Vaughan, Brett; Grace, Sandra

2018-01-01

Peer physical examination (PPE) is a teaching and learning strategy utilised in most health profession education programs. Perceptions of participating in PPE have been described in the literature, focusing on areas of the body students are willing, or unwilling, to examine. A small number of questionnaires exist to evaluate these perceptions, however none have described the measurement properties that may allow them to be used longitudinally. The present study undertook a Mokken scale analysis of the Peer Physical Examination Questionnaire (PPEQ) to evaluate its dimensionality and structure when used with Australian osteopathy students. Students enrolled in Year 1 of the osteopathy programs at Victoria University (Melbourne, Australia) and Southern Cross University (Lismore, Australia) were invited to complete the PPEQ prior to their first practical skills examination class. R, an open-source statistics program, was used to generate the descriptive statistics and perform a Mokken scale analysis. Mokken scale analysis is a non-parametric item response theory approach that is used to cluster items measuring a latent construct. Initial analysis suggested the PPEQ did not form a single scale. Further analysis identified three subscales: 'comfort', 'concern', and 'professionalism and education'. The properties of each subscale suggested they were unidimensional with variable internal structures. The 'comfort' subscale was the strongest of the three identified. All subscales demonstrated acceptable reliability estimation statistics (McDonald's omega > 0.75) supporting the calculation of a sum score for each subscale. The subscales identified are consistent with the literature. The 'comfort' subscale may be useful to longitudinally evaluate student perceptions of PPE. Further research is required to evaluate changes with PPE and the utility of the questionnaire with other health profession education programs.
Integration and global analysis of isothermal titration calorimetry data for studying macromolecular interactions.

PubMed

Brautigam, Chad A; Zhao, Huaying; Vargas, Carolyn; Keller, Sandro; Schuck, Peter

2016-05-01

Isothermal titration calorimetry (ITC) is a powerful and widely used method to measure the energetics of macromolecular interactions by recording a thermogram of differential heating power during a titration. However, traditional ITC analysis is limited by stochastic thermogram noise and by the limited information content of a single titration experiment. Here we present a protocol for bias-free thermogram integration based on automated shape analysis of the injection peaks, followed by combination of isotherms from different calorimetric titration experiments into a global analysis, statistical analysis of binding parameters and graphical presentation of the results. This is performed using the integrated public-domain software packages NITPIC, SEDPHAT and GUSSI. The recently developed low-noise thermogram integration approach and global analysis allow for more precise parameter estimates and more reliable quantification of multisite and multicomponent cooperative and competitive interactions. Titration experiments typically take 1-2.5 h each, and global analysis usually takes 10-20 min.
Feasibility and Requirements Analysis of MIS for Operational Patrol Squadrons in the United States Navy.

DTIC Science & Technology

1981-12-01

with statistical data on the -est questions themselves and allows upgrading of the test ques-ion bank or changes in the method of presentation. (5...accessed to meet on-line inquires from users at the OMA, INA, and SSC levels, utilizing a number of different to access similar data. A query capability...technical directive and configured item capability * nonthly Maintanance Plan *Individual material Readiness List (I21RL) SPIE calibration Expand JCN4 Tracki
Fatigue criterion to system design, life and reliability

NASA Technical Reports Server (NTRS)

Zaretsky, E. V.

1985-01-01

A generalized methodology to structural life prediction, design, and reliability based upon a fatigue criterion is advanced. The life prediction methodology is based in part on work of W. Weibull and G. Lundberg and A. Palmgren. The approach incorporates the computed life of elemental stress volumes of a complex machine element to predict system life. The results of coupon fatigue testing can be incorporated into the analysis allowing for life prediction and component or structural renewal rates with reasonable statistical certainty.
Unlikely Fluctuations and Non-Equilibrium Work Theorems-A Simple Example.

PubMed

Muzikar, Paul

2016-06-30

An exciting development in statistical mechanics has been the elucidation of a series of surprising equalities involving the work done during a nonequilibrium process. Astumian has presented an elegant example of such an equality, involving a colloidal particle undergoing Brownian motion in the presence of gravity. We analyze this example; its simplicity, and its link to geometric Brownian motion, allows us to clarify the inner workings of the equality. Our analysis explicitly shows the important role played by large, unlikely fluctuations.
A Regional Analysis of Non-Methane Hydrocarbons And Meteorology of The Rural Southeast United States

DTIC Science & Technology

1996-01-01

Zt is an ARIMA time series. This is a typical regression model , except that it allows for autocorrelation in the error term Z. In this work, an ARMA...data=folder; var residual; run; II Statistical output of 1992 regression model on 1993 ozone data ARIMA Procedure Maximum Likelihood Estimation Approx...at each of the sites, and to show the effect of synoptic meteorology on high ozone by examining NOAA daily weather maps and climatic data
ShortRead: a bioconductor package for input, quality assessment and exploration of high-throughput sequence data

PubMed Central

Morgan, Martin; Anders, Simon; Lawrence, Michael; Aboyoun, Patrick; Pagès, Hervé; Gentleman, Robert

2009-01-01

Summary: ShortRead is a package for input, quality assessment, manipulation and output of high-throughput sequencing data. ShortRead is provided in the R and Bioconductor environments, allowing ready access to additional facilities for advanced statistical analysis, data transformation, visualization and integration with diverse genomic resources. Availability and Implementation: This package is implemented in R and available at the Bioconductor web site; the package contains a ‘vignette’ outlining typical work flows. Contact: mtmorgan@fhcrc.org PMID:19654119
Assessing criticality in seismicity by entropy

NASA Astrophysics Data System (ADS)

Goltz, C.

2003-04-01

There is an ongoing discussion whether the Earth's crust is in a critical state and whether this state is permanent or intermittent. Intermittent criticality would allow specification of time-dependent hazard in principle. Analysis of a spatio-temporally evolving synthetic critical point phenomenon and of real seismicity using configurational entropy shows that the method is a suitable approach for the characterisation of critical point dynamics. Results obtained rather support the notion of intermittent criticality in earthquakes. Statistical significance of the findings is assessed by the method of surrogate data.

Transistor-like behavior of single metalloprotein junctions.

PubMed

Artés, Juan M; Díez-Pérez, Ismael; Gorostiza, Pau

2012-06-13

Single protein junctions consisting of azurin bridged between a gold substrate and the probe of an electrochemical tunneling microscope (ECSTM) have been obtained by two independent methods that allowed statistical analysis over a large number of measured junctions. Conductance measurements yield (7.3 ± 1.5) × 10(-6)G(0) in agreement with reported estimates using other techniques. Redox gating of the protein with an on/off ratio of 20 was demonstrated and constitutes a proof-of-principle of a single redox protein field-effect transistor.
Facilitating Student Experimentation with Statistical Concepts.

ERIC Educational Resources Information Center

Smith, Patricia K.

2002-01-01

Offers a Web page with seven Java applets allowing students to experiment with key concepts in an introductory statistics course. Indicates the applets can be used in three ways: to place links to the applets, to create in-class demonstrations of statistical concepts, and to lead students through experiments and discover statistical relationships.…
A statistical framework for neuroimaging data analysis based on mutual information estimated via a gaussian copula.

PubMed

Ince, Robin A A; Giordano, Bruno L; Kayser, Christoph; Rousselet, Guillaume A; Gross, Joachim; Schyns, Philippe G

2017-03-01

We begin by reviewing the statistical framework of information theory as applicable to neuroimaging data analysis. A major factor hindering wider adoption of this framework in neuroimaging is the difficulty of estimating information theoretic quantities in practice. We present a novel estimation technique that combines the statistical theory of copulas with the closed form solution for the entropy of Gaussian variables. This results in a general, computationally efficient, flexible, and robust multivariate statistical framework that provides effect sizes on a common meaningful scale, allows for unified treatment of discrete, continuous, unidimensional and multidimensional variables, and enables direct comparisons of representations from behavioral and brain responses across any recording modality. We validate the use of this estimate as a statistical test within a neuroimaging context, considering both discrete stimulus classes and continuous stimulus features. We also present examples of analyses facilitated by these developments, including application of multivariate analyses to MEG planar magnetic field gradients, and pairwise temporal interactions in evoked EEG responses. We show the benefit of considering the instantaneous temporal derivative together with the raw values of M/EEG signals as a multivariate response, how we can separately quantify modulations of amplitude and direction for vector quantities, and how we can measure the emergence of novel information over time in evoked responses. Open-source Matlab and Python code implementing the new methods accompanies this article. Hum Brain Mapp 38:1541-1573, 2017. © 2016 Wiley Periodicals, Inc. 2016 The Authors Human Brain Mapping Published by Wiley Periodicals, Inc.
Cancer Statistics Animator

Cancer.gov

This tool allows users to animate cancer trends over time by cancer site and cause of death, race, and sex. Provides access to incidence, mortality, and survival. Select the type of statistic, variables, format, and then extract the statistics in a delimited format for further analyses.
Accounting for standard errors of vision-specific latent trait in regression models.

PubMed

Wong, Wan Ling; Li, Xiang; Li, Jialiang; Wong, Tien Yin; Cheng, Ching-Yu; Lamoureux, Ecosse L

2014-07-11

To demonstrate the effectiveness of Hierarchical Bayesian (HB) approach in a modeling framework for association effects that accounts for SEs of vision-specific latent traits assessed using Rasch analysis. A systematic literature review was conducted in four major ophthalmic journals to evaluate Rasch analysis performed on vision-specific instruments. The HB approach was used to synthesize the Rasch model and multiple linear regression model for the assessment of the association effects related to vision-specific latent traits. The effectiveness of this novel HB one-stage "joint-analysis" approach allows all model parameters to be estimated simultaneously and was compared with the frequently used two-stage "separate-analysis" approach in our simulation study (Rasch analysis followed by traditional statistical analyses without adjustment for SE of latent trait). Sixty-six reviewed articles performed evaluation and validation of vision-specific instruments using Rasch analysis, and 86.4% (n = 57) performed further statistical analyses on the Rasch-scaled data using traditional statistical methods; none took into consideration SEs of the estimated Rasch-scaled scores. The two models on real data differed for effect size estimations and the identification of "independent risk factors." Simulation results showed that our proposed HB one-stage "joint-analysis" approach produces greater accuracy (average of 5-fold decrease in bias) with comparable power and precision in estimation of associations when compared with the frequently used two-stage "separate-analysis" procedure despite accounting for greater uncertainty due to the latent trait. Patient-reported data, using Rasch analysis techniques, do not take into account the SE of latent trait in association analyses. The HB one-stage "joint-analysis" is a better approach, producing accurate effect size estimations and information about the independent association of exposure variables with vision-specific latent traits. Copyright 2014 The Association for Research in Vision and Ophthalmology, Inc.
Space-time patterns in ignimbrite compositions revealed by GIS and R based statistical analysis

NASA Astrophysics Data System (ADS)

Brandmeier, Melanie; Wörner, Gerhard

2017-04-01

GIS-based multivariate statistical and geospatial analysis of a compilation of 890 geochemical and ca. 1,200 geochronological data for 194 mapped ignimbrites from Central Andes documents the compositional and temporal pattern of large volume ignimbrites (so-called "ignimbrite flare-ups") during Neogene times. Rapid advances in computational sciences during the past decade lead to a growing pool of algorithms for multivariate statistics on big datasets with many predictor variables. This study uses the potential of R and ArcGIS and applies cluster (CA) and linear discriminant analysis (LDA) on log-ratio transformed spatial data. CA on major and trace element data allows to group ignimbrites according to their geochemical characteristics into rhyolitic and a dacitic "end-members" and differentiates characteristic trace element signatures with respect to Eu anomaly, depletion of MREEs and variable enrichment in LREE. To highlight these distinct compositional signatures, we applied LDA to selected ignimbrites for which comprehensive data sets were available. The most important predictors for discriminating ignimbrites are La (LREE), Yb (HREE), Eu, Al2O3, K2O, P2O5, MgO, FeOt and TiO2. However, other REEs such as Gd, Pr, Tm, Sm and Er also contribute to the discrimination functions. Significant compositional differences were found between the older (>14 Ma) large-volume plateau-forming ignimbrites in northernmost Chile and southern Peru and the younger (< 10 Ma) Altiplano-Puna-Volcanic-Complex ignimbrites that are of similar volumes. Older ignimbrites are less depleted in HREEs and less radiogenic in Sr isotopes, indicating smaller crustal contributions during evolution in thinner and thermally less evolved crust. These compositional variations indicate a relation to crustal thickening with a "transition" from plagioclase to amphibole and garnet residual mineralogy between 13 to 9 Ma. We correlate compositional and volumetric variations to the N-S passage of the Juan-Fernandéz ridge and crustal shortening and thickening during the past 26 Ma. The value of GIS and multivariate statistics in comparison to traditional geochemical parameters are highlighted working with large datasets with many predictors in a spatial and temporal context. Algorithms implemented in R allow taking advantage of an n-dimensional space and, thus, of subtle compositional differences contained in the data, while space-time patterns can be analyzed easily in GIS.
PHAST: Protein-like heteropolymer analysis by statistical thermodynamics

NASA Astrophysics Data System (ADS)

Frigori, Rafael B.

2017-06-01

PHAST is a software package written in standard Fortran, with MPI and CUDA extensions, able to efficiently perform parallel multicanonical Monte Carlo simulations of single or multiple heteropolymeric chains, as coarse-grained models for proteins. The outcome data can be straightforwardly analyzed within its microcanonical Statistical Thermodynamics module, which allows for computing the entropy, caloric curve, specific heat and free energies. As a case study, we investigate the aggregation of heteropolymers bioinspired on Aβ25-33 fragments and their cross-seeding with IAPP20-29 isoforms. Excellent parallel scaling is observed, even under numerically difficult first-order like phase transitions, which are properly described by the built-in fully reconfigurable force fields. Still, the package is free and open source, this shall motivate users to readily adapt it to specific purposes.
Transitions between superstatistical regimes: Validity, breakdown and applications

NASA Astrophysics Data System (ADS)

Jizba, Petr; Korbel, Jan; Lavička, Hynek; Prokš, Martin; Svoboda, Václav; Beck, Christian

2018-03-01

Superstatistics is a widely employed tool of non-equilibrium statistical physics which plays an important rôle in analysis of hierarchical complex dynamical systems. Yet, its "canonical" formulation in terms of a single nuisance parameter is often too restrictive when applied to complex empirical data. Here we show that a multi-scale generalization of the superstatistics paradigm is more versatile, allowing to address such pertinent issues as transmutation of statistics or inter-scale stochastic behavior. To put some flesh on the bare bones, we provide a numerical evidence for a transition between two superstatistics regimes, by analyzing high-frequency (minute-tick) data for share-price returns of seven selected companies. Salient issues, such as breakdown of superstatistics in fractional diffusion processes or connection with Brownian subordination are also briefly discussed.
Barcoding T Cell Calcium Response Diversity with Methods for Automated and Accurate Analysis of Cell Signals (MAAACS)

PubMed Central

Sergé, Arnauld; Bernard, Anne-Marie; Phélipot, Marie-Claire; Bertaux, Nicolas; Fallet, Mathieu; Grenot, Pierre; Marguet, Didier; He, Hai-Tao; Hamon, Yannick

2013-01-01

We introduce a series of experimental procedures enabling sensitive calcium monitoring in T cell populations by confocal video-microscopy. Tracking and post-acquisition analysis was performed using Methods for Automated and Accurate Analysis of Cell Signals (MAAACS), a fully customized program that associates a high throughput tracking algorithm, an intuitive reconnection routine and a statistical platform to provide, at a glance, the calcium barcode of a population of individual T-cells. Combined with a sensitive calcium probe, this method allowed us to unravel the heterogeneity in shape and intensity of the calcium response in T cell populations and especially in naive T cells, which display intracellular calcium oscillations upon stimulation by antigen presenting cells. PMID:24086124
Description of a Portable Wireless Device for High-Frequency Body Temperature Acquisition and Analysis

PubMed Central

Cuesta-Frau, David; Varela, Manuel; Aboy, Mateo; Miró-Martínez, Pau

2009-01-01

We describe a device for dual channel body temperature monitoring. The device can operate as a real time monitor or as a data logger, and has Bluetooth capabilities to enable for wireless data download to the computer used for data analysis. The proposed device is capable of sampling temperature at a rate of 1 sample per minute with a resolution of 0.01 °C . The internal memory allows for stand-alone data logging of up to 10 days. The device has a battery life of 50 hours in continuous real-time mode. In addition to describing the proposed device in detail, we report the results of a statistical analysis conducted to assess its accuracy and reproducibility. PMID:22408473
Description of a portable wireless device for high-frequency body temperature acquisition and analysis.

PubMed

Cuesta-Frau, David; Varela, Manuel; Aboy, Mateo; Miró-Martínez, Pau

2009-01-01

We describe a device for dual channel body temperature monitoring. The device can operate as a real time monitor or as a data logger, and has Bluetooth capabilities to enable for wireless data download to the computer used for data analysis. The proposed device is capable of sampling temperature at a rate of 1 sample per minute with a resolution of 0.01 °C . The internal memory allows for stand-alone data logging of up to 10 days. The device has a battery life of 50 hours in continuous real-time mode. In addition to describing the proposed device in detail, we report the results of a statistical analysis conducted to assess its accuracy and reproducibility.
Cryobiopsy: should this be used in place of endobronchial forceps biopsies?

PubMed

Rubio, Edmundo R; le, Susanti R; Whatley, Ralph E; Boyd, Michael B

2013-01-01

Forceps biopsies of airway lesions have variable yields. The yield increases when combining techniques in order to collect more material. With the use of cryotherapy probes (cryobiopsy) larger specimens can be obtained, resulting in an increase in the diagnostic yield. However, the utility and safety of cryobiopsy with all types of lesions, including flat mucosal lesions, is not established. Demonstrate the utility/safety of cryobiopsy versus forceps biopsy to sample exophytic and flat airway lesions. Teaching hospital-based retrospective analysis. Retrospective analysis of patients undergoing cryobiopsies (singly or combined with forceps biopsies) from August 2008 through August 2010. Statistical Analysis. Wilcoxon signed-rank test. The comparative analysis of 22 patients with cryobiopsy and forceps biopsy of the same lesion showed the mean volumes of material obtained with cryobiopsy were significantly larger (0.696 cm(3) versus 0.0373 cm(3), P = 0.0014). Of 31 cryobiopsies performed, one had minor bleeding. Cryopbiopsy allowed sampling of exophytic and flat lesions that were located centrally or distally. Cryobiopsies were shown to be safe, free of artifact, and provided a diagnostic yield of 96.77%. Cryobiopsy allows safe sampling of exophytic and flat airway lesions, with larger specimens, excellent tissue preservation and high diagnostic accuracy.
Multiple Component Event-Related Potential (mcERP) Estimation

NASA Technical Reports Server (NTRS)

Knuth, K. H.; Clanton, S. T.; Shah, A. S.; Truccolo, W. A.; Ding, M.; Bressler, S. L.; Trejo, L. J.; Schroeder, C. E.; Clancy, Daniel (Technical Monitor)

2002-01-01

We show how model-based estimation of the neural sources responsible for transient neuroelectric signals can be improved by the analysis of single trial data. Previously, we showed that a multiple component event-related potential (mcERP) algorithm can extract the responses of individual sources from recordings of a mixture of multiple, possibly interacting, neural ensembles. McERP also estimated single-trial amplitudes and onset latencies, thus allowing more accurate estimation of ongoing neural activity during an experimental trial. The mcERP algorithm is related to informax independent component analysis (ICA); however, the underlying signal model is more physiologically realistic in that a component is modeled as a stereotypic waveshape varying both in amplitude and onset latency from trial to trial. The result is a model that reflects quantities of interest to the neuroscientist. Here we demonstrate that the mcERP algorithm provides more accurate results than more traditional methods such as factor analysis and the more recent ICA. Whereas factor analysis assumes the sources are orthogonal and ICA assumes the sources are statistically independent, the mcERP algorithm makes no such assumptions thus allowing investigators to examine interactions among components by estimating the properties of single-trial responses.
GenomeGraphs: integrated genomic data visualization with R.

PubMed

Durinck, Steffen; Bullard, James; Spellman, Paul T; Dudoit, Sandrine

2009-01-06

Biological studies involve a growing number of distinct high-throughput experiments to characterize samples of interest. There is a lack of methods to visualize these different genomic datasets in a versatile manner. In addition, genomic data analysis requires integrated visualization of experimental data along with constantly changing genomic annotation and statistical analyses. We developed GenomeGraphs, as an add-on software package for the statistical programming environment R, to facilitate integrated visualization of genomic datasets. GenomeGraphs uses the biomaRt package to perform on-line annotation queries to Ensembl and translates these to gene/transcript structures in viewports of the grid graphics package. This allows genomic annotation to be plotted together with experimental data. GenomeGraphs can also be used to plot custom annotation tracks in combination with different experimental data types together in one plot using the same genomic coordinate system. GenomeGraphs is a flexible and extensible software package which can be used to visualize a multitude of genomic datasets within the statistical programming environment R.
Score tests for independence in semiparametric competing risks models.

PubMed

Saïd, Mériem; Ghazzali, Nadia; Rivest, Louis-Paul

2009-12-01

A popular model for competing risks postulates the existence of a latent unobserved failure time for each risk. Assuming that these underlying failure times are independent is attractive since it allows standard statistical tools for right-censored lifetime data to be used in the analysis. This paper proposes simple independence score tests for the validity of this assumption when the individual risks are modeled using semiparametric proportional hazards regressions. It assumes that covariates are available, making the model identifiable. The score tests are derived for alternatives that specify that copulas are responsible for a possible dependency between the competing risks. The test statistics are constructed by adding to the partial likelihoods for the individual risks an explanatory variable for the dependency between the risks. A variance estimator is derived by writing the score function and the Fisher information matrix for the marginal models as stochastic integrals. Pitman efficiencies are used to compare test statistics. A simulation study and a numerical example illustrate the methodology proposed in this paper.
Statistical theory and applications of lock-in carrierographic image pixel brightness dependence on multi-crystalline Si solar cell efficiency and photovoltage

NASA Astrophysics Data System (ADS)

Mandelis, Andreas; Zhang, Yu; Melnikov, Alexander

2012-09-01

A solar cell lock-in carrierographic image generation theory based on the concept of non-equilibrium radiation chemical potential was developed. An optoelectronic diode expression was derived linking the emitted radiative recombination photon flux (current density), the solar conversion efficiency, and the external load resistance via the closed- and/or open-circuit photovoltage. The expression was shown to be of a structure similar to the conventional electrical photovoltaic I-V equation, thereby allowing the carrierographic image to be used in a quantitative statistical pixel brightness distribution analysis with outcome being the non-contacting measurement of mean values of these important parameters averaged over the entire illuminated solar cell surface. This is the optoelectronic equivalent of the electrical (contacting) measurement method using an external resistor circuit and the outputs of the solar cell electrode grid, the latter acting as an averaging distribution network over the surface. The statistical theory was confirmed using multi-crystalline Si solar cells.
The implementation of the Strategy Europe 2020 objectives in European Union countries: the concept analysis and statistical evaluation.

PubMed

Stec, Małgorzata; Grzebyk, Mariola

2018-01-01

The European Union (EU), striving to create economic dominance on the global market, has prepared a comprehensive development programme, which initially was the Lisbon Strategy and then the Strategy Europe 2020. The attainment of the strategic goals included in the prospective development programmes shall transform the EU into the most competitive economy in the world based on knowledge. This paper presents a statistical evaluation of progress being made by EU member states in meeting Europe 2020. For the basis of the assessment, the authors proposed a general synthetic measure in dynamic terms, which allows to objectively compare EU member states by 10 major statistical indicators. The results indicate that most of EU countries show average progress in realisation of Europe's development programme which may suggest that the goals may not be achieved in the prescribed time. It is particularly important to monitor the implementation of Europe 2020 to arrive at the right decisions which will guarantee the accomplishment of the EU's development strategy.
Non-rigid image registration using a statistical spline deformation model.

PubMed

Loeckx, Dirk; Maes, Frederik; Vandermeulen, Dirk; Suetens, Paul

2003-07-01

We propose a statistical spline deformation model (SSDM) as a method to solve non-rigid image registration. Within this model, the deformation is expressed using a statistically trained B-spline deformation mesh. The model is trained by principal component analysis of a training set. This approach allows to reduce the number of degrees of freedom needed for non-rigid registration by only retaining the most significant modes of variation observed in the training set. User-defined transformation components, like affine modes, are merged with the principal components into a unified framework. Optimization proceeds along the transformation components rather then along the individual spline coefficients. The concept of SSDM's is applied to the temporal registration of thorax CR-images using pattern intensity as the registration measure. Our results show that, using 30 training pairs, a reduction of 33% is possible in the number of degrees of freedom without deterioration of the result. The same accuracy as without SSDM's is still achieved after a reduction up to 66% of the degrees of freedom.
Organizational heterogeneity of vertebrate genomes.

PubMed

Frenkel, Svetlana; Kirzhner, Valery; Korol, Abraham

2012-01-01

Genomes of higher eukaryotes are mosaics of segments with various structural, functional, and evolutionary properties. The availability of whole-genome sequences allows the investigation of their structure as "texts" using different statistical and computational methods. One such method, referred to as Compositional Spectra (CS) analysis, is based on scoring the occurrences of fixed-length oligonucleotides (k-mers) in the target DNA sequence. CS analysis allows generating species- or region-specific characteristics of the genome, regardless of their length and the presence of coding DNA. In this study, we consider the heterogeneity of vertebrate genomes as a joint effect of regional variation in sequence organization superimposed on the differences in nucleotide composition. We estimated compositional and organizational heterogeneity of genome and chromosome sequences separately and found that both heterogeneity types vary widely among genomes as well as among chromosomes in all investigated taxonomic groups. The high correspondence of heterogeneity scores obtained on three genome fractions, coding, repetitive, and the remaining part of the noncoding DNA (the genome dark matter--GDM) allows the assumption that CS-heterogeneity may have functional relevance to genome regulation. Of special interest for such interpretation is the fact that natural GDM sequences display the highest deviation from the corresponding reshuffled sequences.
Texture and composition of the Rosa Marina beach sands (Adriatic coast, southern Italy): a sedimentological/ecological approach

NASA Astrophysics Data System (ADS)

Moretti, Massimo; Tropeano, Marcello; Loon, A. J. (Tom) van; Acquafredda, Pasquale; Baldacconi, Rossella; Festa, Vincenzo; Lisco, Stefania; Mastronuzzi, Giuseppe; Moretti, Vincenzo; Scotti, Rosa

2016-06-01

Beach sands from the Rosa Marina locality (Adriatic coast, southern Italy) were analysed mainly microscopically in order to trace the source areas of their lithoclastic and bioclastic components. The main cropping out sedimentary units were also studied with the objective to identify the potential source areas of lithoclasts. This allowed to establish how the various rock units contribute to the formation of beach sands. The analysis of the bioclastic components allows to estimate the actual role of organisms regarding the supply of this material to the beach. Identification of taxa that are present in the beach sands as shell fragments or other remains was carried out at the genus or family level. Ecological investigation of the same beach and the recognition of sub-environments (mainly distinguished on the basis of the nature of the substrate and of the water depth) was the key topic that allowed to establish the actual source areas of bioclasts in the Rosa Marina beach sands. The sedimentological analysis (including a physical study of the beach and the calculation of some statistical parameters concerning the grain-size curves) shows that the Rosa Marina beach is nowadays subject to erosion.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.