Impulse Response Operators for Structural Complexes
1990-05-12
systems of the complex. Statistical energy analysis (SEA) is one such device [13, 14]. The rendering of SEA from equation (21) and/or (25) lies... 13. L. Cremer, M. Heckl and E. E. Ungar 1973 Structure-Borne Sound (Springer-Verlag). 14. R. H. Lyon 1975 Statistical Energy Analysis of
Statistical Analysis of Big Data on Pharmacogenomics
Fan, Jianqing; Liu, Han
2013-01-01
This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively review several prominent statistical methods for estimating large covariance matrices for understanding correlation structure, inverse covariance matrices for network modeling, large-scale simultaneous tests for selecting significantly differentially expressed genes, proteins and genetic markers for complex diseases, and high-dimensional variable selection for identifying important molecules for understanding molecular mechanisms in pharmacogenomics. Their applications to gene network estimation and biomarker selection are used to illustrate the methodological power. Several new challenges of Big Data analysis, including complex data distribution, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed. PMID:23602905
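Large covariance estimation of the kind reviewed above is often done by regularizing the sample covariance matrix. The following Python sketch applies entrywise soft-thresholding in the style of Bickel and Levina; it is an illustration, not the authors' method, and the function name, the threshold value, and the choice to leave variances untouched are our own assumptions:

```python
import numpy as np

def soft_threshold_cov(X, lam):
    """Entrywise soft-thresholded sample covariance (Bickel-Levina style):
    off-diagonal entries are shrunk toward zero by lam; variances are kept."""
    S = np.cov(X, rowvar=False)
    T = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)
    np.fill_diagonal(T, np.diag(S))  # do not threshold the diagonal
    return T
```

With many more variables than observations, small spurious off-diagonal correlations are set exactly to zero, which is the point of the estimator.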
Rossi, Pierre; Gillet, François; Rohrbach, Emmanuelle; Diaby, Nouhou; Holliger, Christof
2009-01-01
The variability of terminal restriction fragment length polymorphism (T-RFLP) analysis applied to complex microbial communities was assessed statistically. Recent technological improvements were implemented in the successive steps of the procedure, resulting in a standardized procedure which provided a high level of reproducibility. PMID:19749066
Weck, P J; Schaffner, D A; Brown, M R; Wicks, R T
2015-02-01
The Bandt-Pompe permutation entropy and the Jensen-Shannon statistical complexity are used to analyze fluctuating time series of three different turbulent plasmas: the magnetohydrodynamic (MHD) turbulence in the plasma wind tunnel of the Swarthmore Spheromak Experiment (SSX), drift-wave turbulence of ion saturation current fluctuations in the edge of the Large Plasma Device (LAPD), and fully developed turbulent magnetic fluctuations of the solar wind taken from the Wind spacecraft. The entropy and complexity values are presented as coordinates on the CH plane for comparison among the different plasma environments and other fluctuation models. The solar wind is found to have the highest permutation entropy and lowest statistical complexity of the three data sets analyzed. Both laboratory data sets have larger values of statistical complexity, suggesting that these systems have fewer degrees of freedom in their fluctuations, with SSX magnetic fluctuations having slightly less complexity than the LAPD edge I(sat). The CH plane coordinates are compared to the shape and distribution of a spectral decomposition of the wave forms. These results suggest that fully developed turbulence (solar wind) occupies the lower-right region of the CH plane, and that other plasma systems considered to be turbulent have less permutation entropy and more statistical complexity. This paper presents use of this statistical analysis tool on solar wind plasma, as well as on an MHD turbulent experimental plasma.
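The CH-plane coordinates above combine the Bandt-Pompe permutation entropy H with the Jensen-Shannon statistical complexity C. A minimal pure-Python sketch follows (not the authors' code; the embedding dimension d = 3 and the standard normalization constant are assumptions drawn from the general CH-plane literature):

```python
import math
from itertools import permutations

def ordinal_distribution(series, d=3):
    """Probability of each length-d ordinal pattern (Bandt-Pompe)."""
    patterns = {p: 0 for p in permutations(range(d))}
    for i in range(len(series) - d + 1):
        window = series[i:i + d]
        pat = tuple(sorted(range(d), key=lambda k: window[k]))  # argsort pattern
        patterns[pat] += 1
    total = sum(patterns.values())
    return [c / total for c in patterns.values()]

def shannon(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def ch_plane(series, d=3):
    """Return (H, C): normalized permutation entropy and Jensen-Shannon
    statistical complexity, the two CH-plane coordinates."""
    p = ordinal_distribution(series, d)
    n = len(p)                      # d! possible patterns
    H = shannon(p) / math.log(n)    # normalized entropy
    u = [1.0 / n] * n               # uniform reference distribution
    m = [(pi + ui) / 2 for pi, ui in zip(p, u)]
    js = shannon(m) - 0.5 * shannon(p) - 0.5 * shannon(u)
    # standard normalization so that 0 <= C <= 1
    q0 = -2.0 / ((n + 1) / n * math.log(n + 1)
                 - 2 * math.log(2 * n) + math.log(n))
    return H, q0 * js * H
```

A monotone series collapses onto a single ordinal pattern (H = C = 0), while white noise approaches the lower-right corner of the plane (H near 1, C near 0), consistent with the solar-wind result described above.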
Deconstructing Statistical Analysis
ERIC Educational Resources Information Center
Snell, Joel
2014-01-01
Using a very complex statistical analysis and research method for the sake of enhancing the prestige of an article, or of making a new product or service appear legitimate, needs to be monitored and questioned for accuracy. The more complicated the statistical analysis and research design, the fewer learned readers can understand it. This adds a…
ERIC Educational Resources Information Center
Smith, Rachel A.; Levine, Timothy R.; Lachlan, Kenneth A.; Fediuk, Thomas A.
2002-01-01
Notes that the availability of statistical software packages has led to a sharp increase in use of complex research designs and complex statistical analyses in communication research. Reports a series of Monte Carlo simulations which demonstrate that this complexity may come at a heavier cost than many communication researchers realize. Warns…
Application of multivariable statistical techniques in plant-wide WWTP control strategies analysis.
Flores, X; Comas, J; Roda, I R; Jiménez, L; Gernaey, K V
2007-01-01
The main objective of this paper is to present the application of selected multivariable statistical techniques to plant-wide wastewater treatment plant (WWTP) control strategy analysis. In this study, cluster analysis (CA), principal component analysis/factor analysis (PCA/FA) and discriminant analysis (DA) are applied to the evaluation matrix data set obtained by simulation of several control strategies applied to the plant-wide IWA Benchmark Simulation Model No. 2 (BSM2). These techniques make it possible i) to determine natural groups or clusters of control strategies with similar behaviour, ii) to find and interpret hidden, complex and causal relationships in the data set and iii) to identify important discriminant variables within the groups found by the cluster analysis. This study illustrates the usefulness of multivariable statistical techniques for both analysis and interpretation of complex multicriteria data sets and allows an improved use of information for effective evaluation of control strategies.
Complexity quantification of dense array EEG using sample entropy analysis.
Ramanand, Pravitha; Nampoori, V P N; Sreenivasan, R
2004-09-01
In this paper, a time series complexity analysis of dense array electroencephalogram signals is carried out using the recently introduced Sample Entropy (SampEn) measure. This statistic quantifies the regularity in signals recorded from systems that can vary from the purely deterministic to the purely stochastic realm. The present analysis is conducted with the objective of gaining insight into complexity variations related to changing brain dynamics for EEG recorded in three conditions: a passive, eyes-closed condition; a mental arithmetic task; and the same mental task carried out after a physical exertion task. It is observed that the statistic is a robust quantifier of complexity suited for short physiological signals such as the EEG, and it points to the specific brain regions that exhibit lowered complexity during the mental task state as compared to a passive, relaxed state. In the case of mental tasks carried out before and after the performance of a physical exercise, the statistic can detect the variations brought in by the intermediate fatigue-inducing exercise period. This enhances its utility in detecting subtle changes in the brain state that can find wider scope for applications in EEG-based brain studies.
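SampEn itself is straightforward to compute. The following pure-Python sketch follows the common definition (template length m, Chebyshev tolerance r, self-matches excluded); the default r equal to 0.2 times the standard deviation is a conventional choice, not necessarily the one used in this study:

```python
import math

def sample_entropy(x, m=2, r=None):
    """Sample entropy SampEn(m, r) of a 1-D sequence x.

    Counts pairs of templates of length m (and m + 1) matching within
    tolerance r under the Chebyshev distance, excluding self-matches,
    and returns -log of the ratio of the two counts."""
    n = len(x)
    if r is None:
        mean = sum(x) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
        r = 0.2 * sd                      # conventional default tolerance
    def count_matches(m):
        templates = [x[i:i + m] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i],
                                                  templates[j])) <= r:
                    count += 1
        return count
    b = count_matches(m)       # matches at length m
    a = count_matches(m + 1)   # matches at length m + 1
    if a == 0 or b == 0:
        return float("inf")    # undefined: too few matches
    return -math.log(a / b)
```

Regular signals (e.g. a sine wave) yield low SampEn; irregular signals yield higher values, which is the contrast the study exploits between task and rest states.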
Statistical assessment on a combined analysis of GRYN-ROMN-UCBN upland vegetation vital signs
Irvine, Kathryn M.; Rodhouse, Thomas J.
2014-01-01
As of 2013, Rocky Mountain and Upper Columbia Basin Inventory and Monitoring Networks have multiple years of vegetation data and Greater Yellowstone Network has three years of vegetation data, and monitoring is ongoing in all three networks. Our primary objective is to assess whether a combined analysis of these data aimed at exploring correlations with climate and weather data is feasible. We summarize the core survey design elements across protocols and point out the major statistical challenges for a combined analysis at present. The dissimilarity in response designs between the ROMN and UCBN-GRYN network protocols presents a statistical challenge that has not been resolved yet. However, the UCBN and GRYN data are compatible as they implement a similar response design; therefore, a combined analysis is feasible and will be pursued in the future. When data collected by different networks are combined, the merged dataset is likely best described by a complex survey design, i.e. one that results from combining datasets from different sampling designs and is characterized by unequal probability sampling, varying stratification, and clustering (see Lohr 2010, Chapter 7, for a general overview). Statistical analysis of complex survey data requires modifications to standard methods, one of which is to include survey design weights within a statistical model. We focus on this issue for a combined analysis of upland vegetation from these networks, leaving other topics for future research. We conduct a simulation study on the possible effects of equal versus unequal probability selection of points on parameter estimates of temporal trend using available packages within the R statistical computing environment.
We find that, as written, using lmer or lm for trend detection in a continuous response, and clm and clmm for visually estimated cover classes, with “raw” GRTS design weights specified for the weight argument leads to substantially different results and/or computational instability. However, when only fixed effects are of interest, the survey package (svyglm and svyolr) may be suitable for a model-assisted analysis of trend. We provide possible directions for future research into combined analysis of ordinal and continuous vital sign indicators.
NASA Astrophysics Data System (ADS)
Donges, Jonathan F.; Heitzig, Jobst; Beronov, Boyan; Wiedermann, Marc; Runge, Jakob; Feng, Qing Yi; Tupikina, Liubov; Stolbova, Veronika; Donner, Reik V.; Marwan, Norbert; Dijkstra, Henk A.; Kurths, Jürgen
2015-11-01
We introduce the pyunicorn (Pythonic unified complex network and recurrence analysis toolbox) open source software package for applying and combining modern methods of data analysis and modeling from complex network theory and nonlinear time series analysis. pyunicorn is a fully object-oriented and easily parallelizable package written in the language Python. It allows for the construction of functional networks such as climate networks in climatology or functional brain networks in neuroscience representing the structure of statistical interrelationships in large data sets of time series and, subsequently, investigating this structure using advanced methods of complex network theory such as measures and models for spatial networks, networks of interacting networks, node-weighted statistics, or network surrogates. Additionally, pyunicorn provides insights into the nonlinear dynamics of complex systems as recorded in uni- and multivariate time series from a non-traditional perspective by means of recurrence quantification analysis, recurrence networks, visibility graphs, and construction of surrogate time series. The range of possible applications of the library is outlined, drawing on several examples mainly from the field of climatology.
Fu, Wenjiang J.; Stromberg, Arnold J.; Viele, Kert; Carroll, Raymond J.; Wu, Guoyao
2009-01-01
Over the past two decades, there have been revolutionary developments in life science technologies characterized by high throughput, high efficiency, and rapid computation. Nutritionists now have advanced methodologies for the analysis of DNA, RNA, proteins, and low-molecular-weight metabolites, as well as access to bioinformatics databases. Statistics, which can be defined as the process of making scientific inferences from data that contain variability, has historically played an integral role in advancing nutritional sciences. Currently, in the era of systems biology, statistics has become an increasingly important tool to quantitatively analyze information about biological macromolecules. This article describes general terms used in the statistical analysis of large, complex experimental data. These terms include experimental design, power analysis, sample size calculation, and experimental errors (type I and II errors) for nutritional studies at the population, tissue, cellular, and molecular levels. In addition, we highlight various sources of experimental variation in studies involving microarray gene expression, real-time polymerase chain reaction, proteomics, and other bioinformatics technologies. Moreover, we provide guidelines for nutritionists and other biomedical scientists to plan and conduct studies and to analyze the complex data. Appropriate statistical analyses are expected to make an important contribution to solving major nutrition-associated problems in humans and animals (including obesity, diabetes, cardiovascular disease, cancer, ageing, and intrauterine growth retardation). PMID:20233650
Bayesian Statistics and Uncertainty Quantification for Safety Boundary Analysis in Complex Systems
NASA Technical Reports Server (NTRS)
He, Yuning; Davies, Misty Dawn
2014-01-01
The analysis of a safety-critical system often requires detailed knowledge of safe regions and their high-dimensional non-linear boundaries. We present a statistical approach to iteratively detect and characterize the boundaries, which are provided as parameterized shape candidates. Using methods from uncertainty quantification and active learning, we incrementally construct a statistical model from only a few simulation runs and obtain statistically sound estimates of the shape parameters for safety boundaries.
Multivariate analysis: greater insights into complex systems
USDA-ARS?s Scientific Manuscript database
Many agronomic researchers measure and collect multiple response variables in an effort to understand the more complex nature of the system being studied. Multivariate (MV) statistical methods encompass the simultaneous analysis of all random variables (RV) measured on each experimental or sampling ...
Statistics without Tears: Complex Statistics with Simple Arithmetic
ERIC Educational Resources Information Center
Smith, Brian
2011-01-01
One of the often overlooked aspects of modern statistics is the analysis of time series data. Modern introductory statistics courses tend to rush to probabilistic applications involving risk and confidence. Rarely does the first level course linger on such useful and fascinating topics as time series decomposition, with its practical applications…
NASA Astrophysics Data System (ADS)
Korenchenko, Anna E.; Vorontsov, Alexander G.; Gelchinski, Boris R.; Sannikov, Grigorii P.
2018-04-01
We discuss the problem of dimer formation during the homogeneous nucleation of atomic metal vapor in an inert gas environment. We simulated nucleation with molecular dynamics and carried out a statistical analysis of double- and triple-atomic collisions as the two ways a long-lived diatomic complex can form. A close pair of atoms with a lifetime greater than the mean time interval between atom-atom collisions is called a long-lived diatomic complex. We found that double- and triple-atomic collisions gave approximately the same probability of long-lived diatomic complex formation, but the internal energy of the resulting state was essentially lower in the second case. Some diatomic complexes formed in three-particle collisions are stable enough to be critical nuclei.
NASA Astrophysics Data System (ADS)
Kassem, M.; Soize, C.; Gagliardini, L.
2009-06-01
In this paper, an energy-density field approach applied to the vibroacoustic analysis of complex industrial structures in the low- and medium-frequency ranges is presented. This approach uses a statistical computational model. The analyzed system consists of an automotive vehicle structure coupled with its internal acoustic cavity. The objective of this paper is to make use of the statistical properties of the frequency response functions of the vibroacoustic system observed from previous experimental and numerical work. The frequency response functions are expressed in terms of a dimensionless matrix which is estimated using the proposed energy approach. Using this dimensionless matrix, a simplified vibroacoustic model is proposed.
NASA Astrophysics Data System (ADS)
Bruña, Ricardo; Poza, Jesús; Gómez, Carlos; García, María; Fernández, Alberto; Hornero, Roberto
2012-06-01
Alzheimer's disease (AD) is the most common cause of dementia. Over the last few years, a considerable effort has been devoted to exploring new biomarkers. Nevertheless, a better understanding of brain dynamics is still required to optimize therapeutic strategies. In this regard, the characterization of mild cognitive impairment (MCI) is crucial, due to the high conversion rate from MCI to AD. However, only a few studies have focused on the analysis of magnetoencephalographic (MEG) rhythms to characterize AD and MCI. In this study, we assess the ability of several parameters derived from information theory to describe spontaneous MEG activity from 36 AD patients, 18 MCI subjects and 26 controls. Three entropies (Shannon, Tsallis and Rényi entropies), one disequilibrium measure (based on Euclidean distance, ED) and three statistical complexities (based on the López-Ruiz-Mancini-Calbet (LMC) complexity) were used to estimate the irregularity and statistical complexity of MEG activity. Statistically significant differences between AD patients and controls were obtained with all parameters (p < 0.01). In addition, statistically significant differences between MCI subjects and controls were achieved by ED and LMC (p < 0.05). In order to assess the diagnostic ability of the parameters, a linear discriminant analysis with a leave-one-out cross-validation procedure was applied. The accuracies reached 83.9% and 65.9% to discriminate AD and MCI subjects from controls, respectively. Our findings suggest that MCI subjects exhibit an intermediate pattern of abnormalities between normal aging and AD. Furthermore, the proposed parameters provide a new description of brain dynamics in AD and MCI.
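The information-theoretic parameters used here are all simple functionals of a probability distribution. A hedged NumPy sketch follows (illustrative only; the entropic index q, the Euclidean form of the disequilibrium, and the product form of the LMC complexity are assumptions based on the standard definitions, not the authors' exact implementation):

```python
import numpy as np

def shannon_entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def tsallis_entropy(p, q=2.0):
    """Non-additive Tsallis entropy S_q; recovers Shannon as q -> 1."""
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

def renyi_entropy(p, q=2.0):
    return np.log(np.sum(p ** q)) / (1.0 - q)

def lmc_complexity(p):
    """LMC statistical complexity: normalized Shannon entropy times the
    Euclidean disequilibrium from the uniform distribution."""
    n = len(p)
    h = shannon_entropy(p) / np.log(n)   # normalized entropy H
    d = np.sum((p - 1.0 / n) ** 2)       # disequilibrium D (Euclidean)
    return h * d
```

The LMC complexity vanishes for both a uniform distribution (D = 0) and a delta distribution (H = 0), and is positive in between, which is what makes it useful for separating intermediate regimes such as MCI.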
Statistical Field Estimation for Complex Coastal Regions and Archipelagos (PREPRINT)
2011-04-09
and study the computational properties of these schemes. Specifically, we extend a multiscale Objective Analysis (OA) approach to complex coastal regions and... The multiscale free-surface code builds on the primitive-equation model of the Harvard Ocean Prediction System (HOPS, Haley et al. (2009)). Additionally
Moore, Jason H; Amos, Ryan; Kiralis, Jeff; Andrews, Peter C
2015-01-01
Simulation plays an essential role in the development of new computational and statistical methods for the genetic analysis of complex traits. Most simulations start with a statistical model using methods such as linear or logistic regression that specify the relationship between genotype and phenotype. This is appealing due to its simplicity and because these statistical methods are commonly used in genetic analysis. It is our working hypothesis that simulations need to move beyond simple statistical models to more realistically represent the biological complexity of genetic architecture. The goal of the present study was to develop a prototype genotype–phenotype simulation method and software that are capable of simulating complex genetic effects within the context of a hierarchical biology-based framework. Specifically, our goal is to simulate multilocus epistasis or gene–gene interaction where the genetic variants are organized within the framework of one or more genes, their regulatory regions and other regulatory loci. We introduce here the Heuristic Identification of Biological Architectures for simulating Complex Hierarchical Interactions (HIBACHI) method and prototype software for simulating data in this manner. This approach combines a biological hierarchy, a flexible mathematical framework, a liability threshold model for defining disease endpoints, and a heuristic search strategy for identifying high-order epistatic models of disease susceptibility. We provide several simulation examples using genetic models exhibiting independent main effects and three-way epistatic effects. PMID:25395175
NASA Astrophysics Data System (ADS)
Vallianatos, Filippos; Kouli, Maria
2013-08-01
The Digital Elevation Model (DEM) of the island of Crete, with a resolution of approximately 20 meters, was used to delineate watersheds by computing the flow direction and using it in the Watershed function. The Watershed function uses a raster of flow direction to determine the contributing area. The routine Geographic Information Systems procedure was applied, and the watersheds as well as the stream network (using a threshold of 2000 cells, i.e. the minimum number of cells that constitute a stream) were extracted from the hydrologically corrected (free of sinks) DEM. A few thousand watersheds were delineated and their areal extent was calculated. Of these, 300 watersheds were finally selected for further analysis, as watersheds of extremely small area were excluded in order to avoid possible artifacts. Our analysis approach is based on the basic principles of complexity theory and the Tsallis entropy introduced in the frame of non-extensive statistical physics. This concept has been successfully used for the analysis of a variety of complex dynamic systems, including natural hazards, where fractality and long-range interactions are important. The analysis indicates that the statistical distribution of watersheds can be successfully described with the theoretical estimations of non-extensive statistical physics, implying the complexity that characterizes their occurrence.
Non-extensivity and complexity in the earthquake activity at the West Corinth rift (Greece)
NASA Astrophysics Data System (ADS)
Michas, Georgios; Vallianatos, Filippos; Sammonds, Peter
2013-04-01
Earthquakes exhibit complex phenomenology that is revealed in the fractal structure in space, time and magnitude. For that reason, tools other than simple Poissonian statistics seem more appropriate to describe the statistical properties of the phenomenon. Here we use Non-Extensive Statistical Physics (NESP) to investigate the inter-event time distribution of the earthquake activity at the west Corinth rift (central Greece). This area is one of the most seismotectonically active areas in Europe, with an important continental N-S extension and high seismicity rates. The NESP concept refers to the non-additive Tsallis entropy Sq, which includes the Boltzmann-Gibbs entropy as a particular case. This concept has been successfully used for the analysis of a variety of complex dynamic systems, including earthquakes, where fractality and long-range interactions are important. The analysis indicates that the cumulative inter-event time distribution can be successfully described with NESP, implying the complexity that characterizes the temporal occurrence of earthquakes. Further on, we use the Tsallis entropy (Sq) and the Fisher Information Measure (FIM) to investigate the complexity that characterizes the inter-event time distribution through different time windows along the evolution of the seismic activity at the west Corinth rift. The results of this analysis reveal different levels of organization and clustering of the seismic activity in time. Acknowledgments. GM wishes to acknowledge the partial support of the Greek State Scholarships Foundation (IKY).
Glavatskiĭ, A Ia; Guzhovskaia, N V; Lysenko, S N; Kulik, A V
2005-12-01
The authors propose a possible preoperative diagnostic of the degree of supratentorial brain glioma anaplasia using statistical analysis methods. It relies on a complex examination of 934 patients with degree I-IV anaplasia who had been treated at the Institute of Neurosurgery from 1990 to 2004. The use of statistical analysis methods for differential diagnostics of the degree of brain glioma anaplasia may optimize the diagnostic algorithm, increase the reliability of the obtained data and, in some cases, avoid unwarranted surgical interventions. Clinically important signs for statistical analysis methods directed at preoperative diagnostics of brain glioma anaplasia have been defined.
Role of microstructure on twin nucleation and growth in HCP titanium: A statistical study
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arul Kumar, M.; Wroński, M.; McCabe, Rodney James
2018-02-01
In this study, a detailed statistical analysis is performed using Electron Back Scatter Diffraction (EBSD) to establish the effect of microstructure on twin nucleation and growth in deformed commercial purity hexagonal close packed (HCP) titanium. Rolled titanium samples are compressed along the rolling, transverse and normal directions to establish statistical correlations for {10–12}, {11–21}, and {11–22} twins. A recently developed automated EBSD-twinning analysis software is employed for the statistical analysis. Finally, the analysis provides the following key findings: (I) grain size and strain dependence is different for twin nucleation and growth; (II) twinning statistics can be generalized for the HCP metals magnesium, zirconium and titanium; and (III) complex microstructures, where grain shape and size distribution are heterogeneous, require multi-point statistical correlations.
Markov Logic Networks in the Analysis of Genetic Data
Sakhanenko, Nikita A.
2010-01-01
Complex, non-additive genetic interactions are common and can be critical in determining phenotypes. Genome-wide association studies (GWAS) and similar statistical studies of linkage data, however, assume additive models of gene interactions in looking for genotype-phenotype associations. These statistical methods view the compound effects of multiple genes on a phenotype as a sum of influences of each gene and often miss a substantial part of the heritable effect. Such methods do not use any biological knowledge about underlying mechanisms. Modeling approaches from the artificial intelligence (AI) field that incorporate deterministic knowledge into models to perform statistical analysis can be applied to include prior knowledge in genetic analysis. We chose to use the most general such approach, Markov Logic Networks (MLNs), for combining deterministic knowledge with statistical analysis. Using simple, logistic regression-type MLNs we can replicate the results of traditional statistical methods, but we also show that we are able to go beyond finding independent markers linked to a phenotype by using joint inference without an independence assumption. The method is applied to genetic data on yeast sporulation, a complex phenotype with gene interactions. In addition to detecting all of the previously identified loci associated with sporulation, our method identifies four loci with smaller effects. Since their effect on sporulation is small, these four loci were not detected with methods that do not account for dependence between markers due to gene interactions. We show how gene interactions can be detected using more complex models, which can be used as a general framework for incorporating systems biology with genetics. PMID:20958249
Statistical methodology for the analysis of dye-switch microarray experiments
Mary-Huard, Tristan; Aubert, Julie; Mansouri-Attia, Nadera; Sandra, Olivier; Daudin, Jean-Jacques
2008-01-01
Background In individually dye-balanced microarray designs, each biological sample is hybridized on two different slides, once with Cy3 and once with Cy5. While this strategy ensures an automatic correction of the gene-specific labelling bias, it also induces dependencies between log-ratio measurements that must be taken into account in the statistical analysis. Results We present two original procedures for the statistical analysis of individually dye-balanced designs. These procedures are compared with the usual ML and REML mixed model procedures proposed in most statistical toolboxes, on both simulated and real data. Conclusion The UP procedure we propose as an alternative to the usual mixed model procedures is more efficient and significantly faster to compute. This result provides some useful guidelines for the analysis of complex designs. PMID:18271965
Wang, Jingwen; Zhao, Yuqi; Wang, Yanjie; Huang, Jingfei
2013-01-16
Coevolution between proteins is crucial for understanding protein-protein interaction. Simultaneous changes allow a protein complex to maintain its overall structural-functional integrity. In this study, we combined statistical coupling analysis (SCA) and molecular dynamics simulations on the CDK6-CDKN2A protein complex to evaluate coevolution between proteins. We reconstructed an inter-protein residue coevolution network, consisting of 37 residues and 37 interactions. It shows that most of the coevolved residue pairs are spatially proximal. When mutations occur, these stable local structures are broken up, decreasing or inhibiting the protein interaction, with a consequent increased risk of melanoma. The identification of inter-protein coevolved residues in the CDK6-CDKN2A complex can be helpful for designing protein engineering experiments.
Exploiting Complexity Information for Brain Activation Detection
Zhang, Yan; Liang, Jiali; Lin, Qiang; Hu, Zhenghui
2016-01-01
We present a complexity-based approach for the analysis of fMRI time series, in which sample entropy (SampEn) is introduced as a quantification of voxel complexity. Under this hypothesis, voxel complexity is modulated by pertinent cognitive tasks and changes across experimental paradigms. We calculate the complexity of sequential fMRI data for each voxel in two distinct experimental paradigms and use a nonparametric statistical strategy, the Wilcoxon signed rank test, to evaluate the difference in complexity between them. The results are compared with the well-known general linear model based Statistical Parametric Mapping package (SPM12), and a marked difference is observed. This is because the SampEn method detects brain complexity changes between the two experimental conditions, and as a data-driven method it evaluates just the complexity of the sequential fMRI data itself. Also, larger and smaller SampEn values correspond to different meanings, and the neutral-blank design produces higher predictability than threat-neutral. Complexity information can be considered a complementary method to existing fMRI analysis strategies, and it may help improve the understanding of human brain function from a different perspective. PMID:27045838
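The voxel-wise comparison described above pairs SampEn values across conditions and applies the Wilcoxon signed rank test. Below is a self-contained pure-Python version using the large-sample normal approximation (a sketch, not the SPM12 or any library implementation; it assumes at least one nonzero paired difference, drops zero differences, and averages ranks over ties):

```python
import math

def wilcoxon_signed_rank(a, b):
    """Wilcoxon signed-rank test for paired samples a, b.

    Zero differences are dropped, tied absolute differences receive
    average ranks, and a two-sided p-value is computed from the
    normal approximation. Returns (W_plus, p)."""
    d = [x - y for x, y in zip(a, b) if x != y]
    n = len(d)
    ranked = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:                       # average ranks over ties in |d|
        j = i
        while j + 1 < n and abs(d[ranked[j + 1]]) == abs(d[ranked[i]]):
            j += 1
        avg = (i + j) / 2 + 1          # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[ranked[k]] = avg
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided
    return w_plus, p
```

Applied per voxel to the two paradigms' SampEn maps, small p-values flag voxels whose complexity differs systematically between conditions.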
NASA Astrophysics Data System (ADS)
Donges, Jonathan; Heitzig, Jobst; Beronov, Boyan; Wiedermann, Marc; Runge, Jakob; Feng, Qing Yi; Tupikina, Liubov; Stolbova, Veronika; Donner, Reik; Marwan, Norbert; Dijkstra, Henk; Kurths, Jürgen
2016-04-01
We introduce the pyunicorn (Pythonic unified complex network and recurrence analysis toolbox) open source software package for applying and combining modern methods of data analysis and modeling from complex network theory and nonlinear time series analysis. pyunicorn is a fully object-oriented and easily parallelizable package written in the language Python. It allows for the construction of functional networks such as climate networks in climatology or functional brain networks in neuroscience representing the structure of statistical interrelationships in large data sets of time series and, subsequently, investigating this structure using advanced methods of complex network theory such as measures and models for spatial networks, networks of interacting networks, node-weighted statistics, or network surrogates. Additionally, pyunicorn provides insights into the nonlinear dynamics of complex systems as recorded in uni- and multivariate time series from a non-traditional perspective by means of recurrence quantification analysis, recurrence networks, visibility graphs, and construction of surrogate time series. The range of possible applications of the library is outlined, drawing on several examples mainly from the field of climatology. pyunicorn is available online at https://github.com/pik-copan/pyunicorn. Reference: J.F. Donges, J. Heitzig, B. Beronov, M. Wiedermann, J. Runge, Q.-Y. Feng, L. Tupikina, V. Stolbova, R.V. Donner, N. Marwan, H.A. Dijkstra, and J. Kurths, Unified functional network and nonlinear time series analysis for complex systems science: The pyunicorn package, Chaos 25, 113101 (2015), DOI: 10.1063/1.4934554, Preprint: arxiv.org:1507.01571 [physics.data-an].
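Among the methods pyunicorn bundles, visibility graphs are easy to illustrate without the package itself. The sketch below implements the natural visibility criterion of Lacasa et al. in plain Python; it illustrates the concept only and does not reproduce the pyunicorn API:

```python
def visibility_edges(x):
    """Natural visibility graph edges of a time series: points (i, x[i])
    and (j, x[j]) are connected if every intermediate sample lies
    strictly below the straight line joining them."""
    n = len(x)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            visible = all(
                x[k] < x[i] + (x[j] - x[i]) * (k - i) / (j - i)
                for k in range(i + 1, j)
            )
            if visible:
                edges.append((i, j))
    return edges
```

The resulting edge list can be handed to any graph library for the network-theoretic measures (degree distributions, clustering, etc.) that packages like pyunicorn compute.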
Zeng, Irene Sui Lan; Lumley, Thomas
2018-01-01
Integrated omics is becoming a new channel for investigating the complex molecular system in modern biological science and sets a foundation for systematic learning for precision medicine. The statistical/machine learning methods that have emerged in the past decade for integrated omics are not only innovative but also multidisciplinary, with integrated knowledge in biology, medicine, statistics, machine learning, and artificial intelligence. Here, we review the nontrivial classes of learning methods from the statistical aspects and streamline these learning methods within the statistical learning framework. The intriguing findings from the review are that the methods used are generalizable to other disciplines with complex systematic structure, and that integrated omics is part of an integrated information science which has collated and integrated different types of information for inferences and decision making. We review the statistical learning methods of exploratory and supervised learning from 42 publications. We also discuss the strengths and limitations of the extended principal component analysis, cluster analysis, network analysis, and regression methods. Statistical techniques such as penalization for sparsity induction when there are fewer observations than features, and the use of a Bayesian approach when there is prior knowledge to be integrated, are also included in the commentary. For completeness of the review, a table of currently available software and packages from 23 publications for omics is summarized in the appendix.
Capturing rogue waves by multi-point statistics
NASA Astrophysics Data System (ADS)
Hadjihosseini, A.; Wächter, Matthias; Hoffmann, N. P.; Peinke, J.
2016-01-01
As an example of a complex system with extreme events, we investigate ocean wave states exhibiting rogue waves. We present a statistical method of data analysis based on multi-point statistics which for the first time allows extreme rogue wave events to be captured in a statistically satisfactory manner. The key to the success of the approach is mapping the complexity of multi-point data onto the statistics of hierarchically ordered height increments for different time scales, for which we can show that a stochastic cascade process with Markov properties is governed by a Fokker-Planck equation. Conditional probabilities as well as the Fokker-Planck equation itself can be estimated directly from the available observational data. With this stochastic description, surrogate data sets can in turn be generated, which makes it possible to work out arbitrary statistical features of the complex sea state in general, and extreme rogue wave events in particular. The results also open up new perspectives for forecasting the occurrence probability of extreme rogue wave events, and even for forecasting the occurrence of individual rogue waves based on precursory dynamics.
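The first step of such a multi-point analysis — forming height increments at several time scales and estimating conditional probabilities directly from data — can be sketched with a histogram estimator. This is a generic illustration on a surrogate random-walk series, not the authors' ocean data; the scales 4 and 32 and the 12-bin grid are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
# Surrogate "sea surface height" record (a random walk stands in for data).
eta = np.cumsum(rng.standard_normal(5000))

def increments(x, tau):
    """Height increments h_tau(t) = x(t + tau) - x(t)."""
    return x[tau:] - x[:-tau]

def conditional_prob(h_small, h_large, bins=12):
    """Histogram estimate of p(h_small | h_large): joint counts divided by
    the marginal counts of the conditioning (large-scale) increment."""
    n = min(len(h_small), len(h_large))
    joint, _, _ = np.histogram2d(h_small[:n], h_large[:n], bins=bins)
    marginal = joint.sum(axis=0)            # counts per conditioning bin
    cond = np.divide(joint, marginal, out=np.zeros_like(joint),
                     where=marginal > 0)
    return cond, marginal

h4, h32 = increments(eta, 4), increments(eta, 32)
cond, marginal = conditional_prob(h4, h32)
```

In the paper's framework, such conditional densities across a hierarchy of scales are what feed the Markov test and the estimation of the Fokker-Planck drift and diffusion coefficients.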
Applied statistics in agricultural, biological, and environmental sciences.
USDA-ARS?s Scientific Manuscript database
Agronomic research often involves measurement and collection of multiple response variables in an effort to understand the more complex nature of the system being studied. Multivariate statistical methods encompass the simultaneous analysis of all random variables measured on each experimental or s...
Kalegowda, Yogesh; Harmer, Sarah L
2012-03-20
Time-of-flight secondary ion mass spectrometry (TOF-SIMS) spectra of mineral samples are complex, comprised of large mass ranges and many peaks. Consequently, characterization and classification analysis of these systems is challenging. In this study, different chemometric and statistical data evaluation methods, based on monolayer sensitive TOF-SIMS data, have been tested for the characterization and classification of copper-iron sulfide minerals (chalcopyrite, chalcocite, bornite, and pyrite) at different flotation pulp conditions (feed, conditioned feed, and Eh modified). The complex mass spectral data sets were analyzed using the following chemometric and statistical techniques: principal component analysis (PCA); principal component-discriminant functional analysis (PC-DFA); soft independent modeling of class analogy (SIMCA); and k-Nearest Neighbor (k-NN) classification. PCA was found to be an important first step in multivariate analysis, providing insight into both the relative grouping of samples and the elemental/molecular basis for those groupings. For samples exposed to oxidative conditions (at Eh ~430 mV), each technique (PCA, PC-DFA, SIMCA, and k-NN) was found to produce excellent classification. For samples at reductive conditions (at Eh ~ -200 mV SHE), k-NN and SIMCA produced the most accurate classification. Phase identification of particles that contain the same elements but a different crystal structure in a mixed multimetal mineral system has been achieved.
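The PCA-then-classify workflow described above can be sketched with NumPy alone: project the spectra onto the leading principal components, then classify in that reduced space with a nearest-neighbour rule. The synthetic "spectra", the two-component projection, and the leave-one-out scoring are illustrative assumptions, not the study's actual TOF-SIMS data or settings:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy stand-in for TOF-SIMS spectra: two mineral classes, 40 "peak"
# channels each; class 1 is shifted upward in a few informative channels.
n, p = 30, 40
class0 = rng.normal(0.0, 1.0, size=(n, p))
class1 = rng.normal(0.0, 1.0, size=(n, p))
class1[:, :5] += 3.0                       # informative peak intensities
X = np.vstack([class0, class1])
y = np.array([0] * n + [1] * n)

# PCA as the exploratory first step: project onto the top 2 components.
Xc = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ vt[:2].T

def one_nn_loo(scores, y):
    """1-nearest-neighbour classification in PCA space, scored by
    leave-one-out over the sample set."""
    correct = 0
    for i in range(len(y)):
        d = np.linalg.norm(scores - scores[i], axis=1)
        d[i] = np.inf                       # exclude the sample itself
        correct += y[d.argmin()] == y[i]
    return correct / len(y)

accuracy = one_nn_loo(scores, y)
```

With well-separated classes, as under the oxidative conditions the authors describe, even this minimal pipeline classifies nearly perfectly; the harder reductive-condition samples are where the richer methods (SIMCA, PC-DFA) earn their keep.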
Geo-statistical analysis of Culicoides spp. distribution and abundance in Sicily, Italy.
Blanda, Valeria; Blanda, Marcellocalogero; La Russa, Francesco; Scimeca, Rossella; Scimeca, Salvatore; D'Agostino, Rosalia; Auteri, Michelangelo; Torina, Alessandra
2018-02-01
Biting midges belonging to Culicoides imicola, the Culicoides obsoletus complex and the Culicoides pulicaris complex (Diptera: Ceratopogonidae) are increasingly implicated as vectors of bluetongue virus in Palaearctic regions. The Culicoides obsoletus complex includes C. obsoletus (sensu stricto), C. scoticus, C. dewulfi and C. chiopterus. Culicoides pulicaris and C. lupicaris belong to the Culicoides pulicaris complex. The aim of this study was a geo-statistical analysis of the abundance and spatial distribution of Culicoides spp. involved in bluetongue virus transmission. As part of the national bluetongue surveillance plan, 7081 catches were collected in 897 Sicilian farms from 2000 to 2013. Onderstepoort-type blacklight traps were used for sample collection and each catch was analysed for the presence of Culicoides spp. and for the presence and abundance of Culicoides vector species (C. imicola, C. pulicaris / C. obsoletus complexes). A geo-statistical analysis was carried out monthly via the interpolation of measured values based on the Inverse Distance Weighted method, using a GIS tool. Raster maps were reclassified into seven classes according to the presence and abundance of Culicoides, in order to obtain suitable maps for Map Algebra operations. Sicilian provinces showing a very high abundance of Culicoides vector species were Messina (80% of the whole area), Palermo (20%) and Catania (12%). A total of 5654 farms fell within the very high risk area for bluetongue (21% of the 26,676 farms active in Sicily); of these, 3483 farms were in Messina, 1567 in Palermo and 604 in Catania. Culicoides imicola was prevalent in Palermo, C. pulicaris in Messina and the C. obsoletus complex was very abundant over the whole island with the highest abundance value in Messina. Our study reports the results of a geo-statistical analysis concerning the abundance and spatial distribution of Culicoides spp. in Sicily throughout the fourteen-year study.
It provides useful decision support in the field of epidemiology, allowing the identification of areas to be monitored as bases for improved surveillance plans. Moreover, this knowledge can become a tool for the evaluation of virus transmission risks, especially if related to vector competence.
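The Inverse Distance Weighted method used for the monthly maps is simple enough to sketch directly. The trap coordinates and counts below are hypothetical, and the power parameter of 2 is the common default, not necessarily the study's setting:

```python
import numpy as np

def idw(xy_known, values, xy_query, power=2, eps=1e-12):
    """Inverse Distance Weighted interpolation: each query point receives a
    weighted mean of the measured values, with weights 1 / distance**power,
    so nearby traps dominate and a query on a trap reproduces its value."""
    diff = xy_query[:, None, :] - xy_known[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    weights = 1.0 / np.maximum(dist, eps) ** power
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values

# Hypothetical trap coordinates (km) and Culicoides catch counts.
traps = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
counts = np.array([500.0, 20.0, 20.0, 0.0])

grid = np.array([[0.0, 0.0],    # on a trap: should reproduce its count
                 [5.0, 5.0]])   # centre: equidistant, so a plain average
estimate = idw(traps, counts, grid)
```

Interpolated surfaces like this, computed over a raster grid, are what get reclassified into the seven abundance classes for the Map Algebra step.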
Chung, Dongjun; Kim, Hang J; Zhao, Hongyu
2017-02-01
Genome-wide association studies (GWAS) have identified tens of thousands of genetic variants associated with hundreds of phenotypes and diseases, which have provided clinical and medical benefits to patients through novel biomarkers and therapeutic targets. However, identification of risk variants associated with complex diseases remains challenging as they are often affected by many genetic variants with small or moderate effects. There has been accumulating evidence suggesting that different complex traits share a common risk basis, namely pleiotropy. Recently, several statistical methods have been developed to improve statistical power to identify risk variants for complex traits through a joint analysis of multiple GWAS datasets by leveraging pleiotropy. While these methods were shown to improve statistical power for association mapping compared to separate analyses, they are still limited in the number of phenotypes that can be integrated. In order to address this challenge, in this paper, we propose a novel statistical framework, graph-GPA, to integrate a large number of GWAS datasets for multiple phenotypes using a hidden Markov random field approach. Application of graph-GPA to a joint analysis of GWAS datasets for 12 phenotypes shows that graph-GPA improves statistical power to identify risk variants compared to statistical methods based on a smaller number of GWAS datasets. In addition, graph-GPA also promotes better understanding of genetic mechanisms shared among phenotypes, which can potentially be useful for the development of improved diagnosis and therapeutics. The R implementation of graph-GPA is currently available at https://dongjunchung.github.io/GGPA/.
GWAMA: software for genome-wide association meta-analysis.
Mägi, Reedik; Morris, Andrew P
2010-05-28
Despite the recent success of genome-wide association studies in identifying novel loci contributing effects to complex human traits, such as type 2 diabetes and obesity, much of the genetic component of variation in these phenotypes remains unexplained. One way to improve power to detect further novel loci is through meta-analysis of studies from the same population, increasing the sample size over any individual study. Although statistical software analysis packages incorporate routines for meta-analysis, they are ill-equipped to meet the challenges of the scale and complexity of data generated in genome-wide association studies. We have developed flexible, open-source software for the meta-analysis of genome-wide association studies. The software incorporates a variety of error trapping facilities, and provides a range of meta-analysis summary statistics. The software is distributed with scripts that allow simple formatting of files containing the results of each association study and generate graphical summaries of genome-wide meta-analysis results. The GWAMA (Genome-Wide Association Meta-Analysis) software has been developed to perform meta-analysis of summary statistics generated from genome-wide association studies of dichotomous phenotypes or quantitative traits. Software with source files, documentation and example data files are freely available online at http://www.well.ox.ac.uk/GWAMA.
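The workhorse of GWAS meta-analysis is fixed-effect inverse-variance weighting of per-study summary statistics. The sketch below shows the arithmetic for a single SNP with made-up effect sizes; it is a generic illustration of the method GWAMA implements, not GWAMA's own code:

```python
import math

def inverse_variance_meta(betas, ses):
    """Fixed-effect meta-analysis of per-study effect sizes and standard
    errors: weight each study by 1/SE^2, so the pooled estimate is more
    precise than any single study's."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    return beta, se, z

# Three hypothetical studies reporting the same SNP.
beta, se, z = inverse_variance_meta(
    betas=[0.12, 0.10, 0.15], ses=[0.05, 0.04, 0.08])
```

Note how the pooled standard error falls below the smallest per-study one; that gain in precision is the entire rationale for combining cohorts from the same population.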
Zikou, Anastasia K; Xydis, Vasileios G; Astrakas, Loukas G; Nakou, Iliada; Tzarouchi, Loukia C; Tzoufi, Meropi; Argyropoulou, Maria I
2016-07-01
There is evidence of microstructural changes in normal-appearing white matter of patients with tuberous sclerosis complex. The aim of this study was to evaluate major white matter tracts in children with tuberous sclerosis complex using tract-based spatial statistics diffusion tensor imaging (DTI) analysis. Eight children (mean age ± standard deviation: 8.5 ± 5.5 years) with an established diagnosis of tuberous sclerosis complex and 8 age-matched controls were studied. The imaging protocol consisted of a T1-weighted high-resolution 3-D spoiled gradient-echo sequence and a spin-echo, echo-planar diffusion-weighted sequence. Differences in the diffusion indices were evaluated using tract-based spatial statistics. Tract-based spatial statistics showed increased axial diffusivity in the children with tuberous sclerosis complex in the superior and anterior corona radiata, the superior longitudinal fascicle, the inferior fronto-occipital fascicle, the uncinate fascicle and the anterior thalamic radiation. No significant differences were observed in fractional anisotropy, mean diffusivity and radial diffusivity between patients and control subjects. No difference was found in the diffusion indices between the baseline and follow-up examination in the patient group. Patients with tuberous sclerosis complex have increased axial diffusivity in major white matter tracts, probably related to reduced axonal integrity.
NASA Astrophysics Data System (ADS)
Yan, Wang-Ji; Ren, Wei-Xin
2018-01-01
This study applies the theoretical findings on the circularly-symmetric complex normal ratio distribution of Yan and Ren (2016) [1,2] to transmissibility-based modal analysis from a statistical viewpoint. A probabilistic model of the transmissibility function in the vicinity of the resonant frequency is formulated in the modal domain, and some insightful comments are offered. It theoretically reveals that the statistics of the transmissibility function around the resonant frequency depend solely on the 'noise-to-signal' ratio and the mode shapes. As a sequel to the development of the probabilistic model of the transmissibility function in the modal domain, this study poses the process of modal identification in a Bayesian framework by borrowing a novel paradigm. Implementation issues unique to the proposed approach are resolved by a Lagrange multiplier approach. This study also explores the possibility of applying Bayesian analysis to distinguishing harmonic components from structural ones. The approaches are verified through simulated data and experimental test data. The uncertainty behavior due to variation of different factors is also discussed in detail.
Advanced functional network analysis in the geosciences: The pyunicorn package
NASA Astrophysics Data System (ADS)
Donges, Jonathan F.; Heitzig, Jobst; Runge, Jakob; Schultz, Hanna C. H.; Wiedermann, Marc; Zech, Alraune; Feldhoff, Jan; Rheinwalt, Aljoscha; Kutza, Hannes; Radebach, Alexander; Marwan, Norbert; Kurths, Jürgen
2013-04-01
Functional networks are a powerful tool for analyzing large geoscientific datasets such as global fields of climate time series originating from observations or model simulations. pyunicorn (pythonic unified complex network and recurrence analysis toolbox) is an open-source, fully object-oriented and easily parallelizable package written in the language Python. It allows for constructing functional networks (aka climate networks) representing the structure of statistical interrelationships in large datasets and, subsequently, for investigating this structure using advanced methods of complex network theory such as measures for networks of interacting networks, node-weighted statistics or network surrogates. Additionally, pyunicorn makes it possible to study the complex dynamics of geoscientific systems as recorded by time series by means of recurrence networks and visibility graphs. The range of possible applications of the package is outlined drawing on several examples from climatology.
Parsons, Nick R; Price, Charlotte L; Hiskens, Richard; Achten, Juul; Costa, Matthew L
2012-04-25
The application of statistics in reported research in trauma and orthopaedic surgery has become ever more important and complex. Despite the extensive use of statistical analysis, it is still a subject which is often not conceptually well understood, resulting in clear methodological flaws and inadequate reporting in many papers. A detailed statistical survey sampled 100 representative orthopaedic papers using a validated questionnaire that assessed the quality of the trial design and statistical analysis methods. The survey found evidence of failings in study design, statistical methodology and presentation of the results. Overall, in 17% (95% confidence interval; 10-26%) of the studies investigated the conclusions were not clearly justified by the results, in 39% (30-49%) of studies a different analysis should have been undertaken and in 17% (10-26%) a different analysis could have made a difference to the overall conclusions. It is only by an improved dialogue between statistician, clinician, reviewer and journal editor that the failings in design methodology and analysis highlighted by this survey can be addressed.
Statistical Analysis of Complexity Generators for Cost Estimation
NASA Technical Reports Server (NTRS)
Rowell, Ginger Holmes
1999-01-01
Predicting the cost of cutting edge new technologies involved with spacecraft hardware can be quite complicated. A new feature of the NASA Air Force Cost Model (NAFCOM), called the Complexity Generator, is being developed to model the complexity factors that drive the cost of space hardware. This parametric approach is also designed to account for the differences in cost, based on factors that are unique to each system and subsystem. The cost driver categories included in this model are weight, inheritance from previous missions, technical complexity, and management factors. This paper explains the Complexity Generator framework, the statistical methods used to select the best model within this framework, and the procedures used to find the region of predictability and the prediction intervals for the cost of a mission.
NASA Astrophysics Data System (ADS)
Ryazanova, A. A.; Okladnikov, I. G.; Gordov, E. P.
2017-11-01
The frequency of occurrence and magnitude of precipitation and temperature extreme events show positive trends in several geographical regions. These events must be analyzed and studied in order to better understand their impact on the environment, predict their occurrences, and mitigate their effects. For this purpose, we augmented the web-GIS “CLIMATE” with a dedicated statistical package developed in the R language. The web-GIS “CLIMATE” is a software platform for cloud storage, processing and visualization of distributed archives of spatial datasets. It is based on a combined use of web and GIS technologies with reliable procedures for searching, extracting, processing, and visualizing the spatial data archives. The system provides a set of thematic online tools for the complex analysis of current and future climate changes and their effects on the environment. The package includes new powerful methods of time-dependent statistics of extremes, quantile regression and the copula approach for the detailed analysis of various climate extreme events. Specifically, the very promising copula approach allows obtaining the structural connections between the extremes and various environmental characteristics. The new statistical methods integrated into the web-GIS “CLIMATE” can significantly facilitate and accelerate the complex analysis of climate extremes using only a desktop PC connected to the Internet.
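As a flavour of the "statistics of extremes" such a toolbox exposes, the sketch below computes an empirical return level from annual maxima. The toy daily-anomaly data and the nonparametric quantile estimator are illustrative assumptions; operational packages would typically fit a GEV distribution instead:

```python
import numpy as np

rng = np.random.default_rng(4)
# Toy climate record: 50 "years" of 365 daily anomalies each.
daily = rng.standard_normal((50, 365))
annual_max = daily.max(axis=1)

def return_level(annual_max, period):
    """Empirical T-year return level: the value exceeded on average once
    every `period` years, i.e. the (1 - 1/period) quantile of annual maxima."""
    return np.quantile(annual_max, 1.0 - 1.0 / period)

level_10 = return_level(annual_max, 10)   # 10-year return level
level_2 = return_level(annual_max, 2)     # median annual maximum
```

Longer return periods necessarily give higher levels, which is the basic monotonicity any extreme-value tool in such a system relies on.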
Complex networks as a unified framework for descriptive analysis and predictive modeling in climate
DOE Office of Scientific and Technical Information (OSTI.GOV)
Steinhaeuser, Karsten J K; Chawla, Nitesh; Ganguly, Auroop R
The analysis of climate data has relied heavily on hypothesis-driven statistical methods, while projections of future climate are based primarily on physics-based computational models. However, in recent years a wealth of new datasets has become available. Therefore, we take a more data-centric approach and propose a unified framework for studying climate, with an aim towards characterizing observed phenomena as well as discovering new knowledge in the climate domain. Specifically, we posit that complex networks are well-suited for both descriptive analysis and predictive modeling tasks. We show that the structural properties of climate networks have useful interpretation within the domain. Further, we extract clusters from these networks and demonstrate their predictive power as climate indices. Our experimental results establish that the network clusters are statistically significantly better predictors than clusters derived using a more traditional clustering approach. Using complex networks as data representation thus enables the unique opportunity for descriptive and predictive modeling to inform each other.
Impact of distributions on the archetypes and prototypes in heterogeneous nanoparticle ensembles.
Fernandez, Michael; Wilson, Hugh F; Barnard, Amanda S
2017-01-05
The magnitude and complexity of the structural and functional data available on nanomaterials requires data analytics, statistical analysis and information technology to drive discovery. We demonstrate that multivariate statistical analysis can recognise the sets of truly significant nanostructures and their most relevant properties in heterogeneous ensembles with different probability distributions. The prototypical and archetypal nanostructures of five virtual ensembles of Si quantum dots (SiQDs) with Boltzmann, frequency, normal, Poisson and random distributions are identified using clustering and archetypal analysis, where we find that their diversity is defined by size and shape, regardless of the type of distribution. At the convex hull of the SiQD ensembles, simple configuration archetypes can efficiently describe a large number of SiQDs, whereas more complex shapes are needed to represent the average ordering of the ensembles. This approach provides a route towards the characterisation of computationally intractable virtual nanomaterial spaces, which can convert big data into smart data, and significantly reduce the workload to simulate experimentally relevant virtual samples.
SPICE: exploration and analysis of post-cytometric complex multivariate datasets.
Roederer, Mario; Nozzi, Joshua L; Nason, Martha C
2011-02-01
Polychromatic flow cytometry results in complex, multivariate datasets. To date, tools for the aggregate analysis of these datasets across multiple specimens grouped by different categorical variables, such as demographic information, have not been optimized. Often, the exploration of such datasets is accomplished by visualization of patterns with pie charts or bar charts, without easy access to statistical comparisons of measurements that comprise multiple components. Here we report on algorithms and a graphical interface we developed for these purposes. In particular, we discuss thresholding necessary for accurate representation of data in pie charts, the implications for display and comparison of normalized versus unnormalized data, and the effects of averaging when samples with significant background noise are present. Finally, we define a statistic for the nonparametric comparison of complex distributions to test for difference between groups of samples based on multi-component measurements. While originally developed to support the analysis of T cell functional profiles, these techniques are amenable to a broad range of datatypes. Published 2011 Wiley-Liss, Inc.
SU-E-J-261: Statistical Analysis and Chaotic Dynamics of Respiratory Signal of Patients in BodyFix
DOE Office of Scientific and Technical Information (OSTI.GOV)
Michalski, D; Huq, M; Bednarz, G
Purpose: To quantify the respiratory signal of patients in BodyFix undergoing 4DCT scans with and without the immobilization cover. Methods: 20 pairs of respiratory tracks recorded with the RPM system during 4DCT scans were analyzed. Descriptive statistics were applied to selected parameters of the exhale-inhale decomposition. Standardized signals were used with the delay method to build orbits in embedded space. Nonlinear behavior was tested with surrogate data. Sample entropy (SE), Lempel-Ziv complexity (LZC) and the largest Lyapunov exponents (LLE) were compared. Results: Statistical tests show a difference between scans for inspiration time and its variability, which is larger for scans without the cover. The same holds for the variability of the end of exhalation and inhalation. Other parameters fail to show a difference. For both scans the respiratory signals show determinism and nonlinear stationarity. Statistical tests on surrogate data reveal their nonlinearity. The LLEs show the signals' chaotic nature and its correlation with the breathing period and its embedding delay time. SE, LZC and LLE measure respiratory signal complexity. Nonlinear characteristics do not differ between scans. Conclusion: Contrary to expectation, the cover applied to patients in BodyFix appears to have limited effect on signal parameters. Analysis based on trajectories of delay vectors shows the respiratory system's nonlinear character and its sensitive dependence on initial conditions. Reproducibility of the respiratory signal can be evaluated with measures of signal complexity and its predictability window. A longer respiratory period is conducive to signal reproducibility as shown by these gauges. Statistical independence of the exhale and inhale times is also supported by the magnitude of the LLE. The nonlinear parameters seem more appropriate for gauging respiratory signal complexity, given its deterministic chaotic nature. This contrasts with measures based on harmonic analysis, which are blind to nonlinear features.
Dynamics of breathing, so crucial for 4D-based clinical technologies, can be better controlled if a nonlinear-based methodology, which reflects the characteristics of respiration, is applied. Funding provided by Varian Medical Systems via an Investigator Initiated Research Project.
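The "delay method to build orbits in embedded space" mentioned in the Methods is the standard time-delay embedding, easily sketched in pure Python. The dimension and delay below are illustrative; the study's embedding parameters are not given in the abstract:

```python
def delay_embed(x, dim, tau):
    """Time-delay embedding: reconstructs orbits in a dim-dimensional
    phase space from a scalar series, using delay vectors
    (x[t], x[t + tau], ..., x[t + (dim - 1) * tau])."""
    n = len(x) - (dim - 1) * tau
    return [[x[t + k * tau] for k in range(dim)] for t in range(n)]

# A short example series and its 3-D orbit with delay 2.
orbit = delay_embed([0, 1, 2, 3, 4, 5], dim=3, tau=2)
```

All the nonlinear diagnostics the abstract cites — surrogate-data tests, largest Lyapunov exponents, determinism measures — operate on trajectories of delay vectors like these rather than on the raw trace.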
DOE Office of Scientific and Technical Information (OSTI.GOV)
Willse, Alan R.; Belcher, Ann; Preti, George
2005-04-15
Gas chromatography (GC), combined with mass spectrometry (MS) detection, is a powerful analytical technique that can be used to separate, quantify, and identify volatile compounds in complex mixtures. This paper examines the application of GC-MS in a comparative experiment to identify volatiles that differ in concentration between two groups. A complex mixture might comprise several hundred or even thousands of volatile compounds. Because their number and location in a chromatogram generally are unknown, and because components overlap in populous chromatograms, the statistical problems offer significant challenges beyond traditional two-group screening procedures. We describe a statistical procedure to compare two-dimensional GC-MS profiles between groups, which entails (1) signal processing: baseline correction and peak detection in single ion chromatograms; (2) aligning chromatograms in time; (3) normalizing differences in overall signal intensities; and (4) detecting chromatographic regions that differ between groups. Compared to existing approaches, the proposed method is robust to errors made at earlier stages of analysis, such as missed peaks or slightly misaligned chromatograms. To illustrate the method, we identify differences in GC-MS chromatograms of ether-extracted urine collected from two nearly identical inbred groups of mice, to investigate the relationship between odor and genetics of the major histocompatibility complex.
Primer of statistics in dental research: part I.
Shintani, Ayumi
2014-01-01
Statistics play essential roles in evidence-based dentistry (EBD) practice and research. Their role ranges widely, from formulating scientific questions, designing studies, and collecting and analyzing data to interpreting, reporting, and presenting study findings. Mastering statistical concepts appears to be an unreachable goal for many dental researchers, in part because statistical authorities often cannot explain statistical principles to health researchers without invoking complex mathematical concepts. This series of 2 articles aims to introduce dental researchers to 9 essential topics in statistics for conducting EBD, with intuitive examples. Part I of the series covers the first 5 topics: (1) statistical graphs, (2) how to deal with outliers, (3) p-values and confidence intervals, (4) testing equivalence, and (5) multiplicity adjustment. Part II will follow to cover the remaining topics: (6) selecting the proper statistical tests, (7) repeated measures analysis, (8) epidemiological considerations for causal association, and (9) analysis of agreement. Copyright © 2014. Published by Elsevier Ltd.
SAP- FORTRAN STATIC SOURCE CODE ANALYZER PROGRAM (IBM VERSION)
NASA Technical Reports Server (NTRS)
Manteufel, R.
1994-01-01
The FORTRAN Static Source Code Analyzer program, SAP, was developed to automatically gather statistics on the occurrences of statements and structures within a FORTRAN program and to provide for the reporting of those statistics. Provisions have been made for weighting each statistic and to provide an overall figure of complexity. Statistics, as well as figures of complexity, are gathered on a module by module basis. Overall summed statistics are also accumulated for the complete input source file. SAP accepts as input syntactically correct FORTRAN source code written in the FORTRAN 77 standard language. In addition, code written using features in the following languages is also accepted: VAX-11 FORTRAN, IBM S/360 FORTRAN IV Level H Extended; and Structured FORTRAN. The SAP program utilizes two external files in its analysis procedure. A keyword file allows flexibility in classifying statements and in marking a statement as either executable or non-executable. A statistical weight file allows the user to assign weights to all output statistics, thus allowing the user flexibility in defining the figure of complexity. The SAP program is written in FORTRAN IV for batch execution and has been implemented on a DEC VAX series computer under VMS and on an IBM 370 series computer under MVS. The SAP program was developed in 1978 and last updated in 1985.
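SAP's core idea — classify each statement via a keyword table, skip non-executable lines, and sum user-assigned weights into a figure of complexity — can be sketched in miniature. The keyword set, the weights, and the tiny FORTRAN fragment below are all hypothetical; SAP's actual keyword and weight files are far richer:

```python
# Hypothetical per-keyword weights standing in for SAP's weight file.
WEIGHTS = {"IF": 3, "DO": 2, "GOTO": 4, "CALL": 1}

def module_complexity(source):
    """Count statements beginning with each flow-control keyword and
    combine the counts into a single weighted figure of complexity,
    echoing SAP's weighted statement statistics (toy version)."""
    stats = {kw: 0 for kw in WEIGHTS}
    for line in source.splitlines():
        # Fixed-form FORTRAN comments carry 'C' or '*' in column 1.
        if line[:1].upper() in ("C", "*") or not line.strip():
            continue
        stmt = line.strip().upper()
        for kw in WEIGHTS:
            if stmt.startswith(kw + " ") or stmt.startswith(kw + "("):
                stats[kw] += 1
    figure = sum(WEIGHTS[k] * v for k, v in stats.items())
    return stats, figure

SOURCE = """\
C     COMPUTE A RUNNING SUM
      DO 10 I = 1, N
      IF (A(I) .GT. 0) CALL ACCUM(A(I), S)
   10 CONTINUE
      GOTO 20
"""
stats, figure = module_complexity(SOURCE)
```

Note the sketch only matches keywords in statement-initial position, so the inline CALL above goes uncounted; a real analyzer like SAP parses the full statement.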
SAP- FORTRAN STATIC SOURCE CODE ANALYZER PROGRAM (DEC VAX VERSION)
NASA Technical Reports Server (NTRS)
Merwarth, P. D.
1994-01-01
The FORTRAN Static Source Code Analyzer program, SAP, was developed to automatically gather statistics on the occurrences of statements and structures within a FORTRAN program and to provide for the reporting of those statistics. Provisions have been made for weighting each statistic and to provide an overall figure of complexity. Statistics, as well as figures of complexity, are gathered on a module by module basis. Overall summed statistics are also accumulated for the complete input source file. SAP accepts as input syntactically correct FORTRAN source code written in the FORTRAN 77 standard language. In addition, code written using features in the following languages is also accepted: VAX-11 FORTRAN, IBM S/360 FORTRAN IV Level H Extended; and Structured FORTRAN. The SAP program utilizes two external files in its analysis procedure. A keyword file allows flexibility in classifying statements and in marking a statement as either executable or non-executable. A statistical weight file allows the user to assign weights to all output statistics, thus allowing the user flexibility in defining the figure of complexity. The SAP program is written in FORTRAN IV for batch execution and has been implemented on a DEC VAX series computer under VMS and on an IBM 370 series computer under MVS. The SAP program was developed in 1978 and last updated in 1985.
Nariai, N; Kim, S; Imoto, S; Miyano, S
2004-01-01
We propose a statistical method to estimate gene networks from DNA microarray data and protein-protein interactions. Because physical interactions between proteins or multiprotein complexes are likely to regulate biological processes, using only mRNA expression data is not sufficient for estimating a gene network accurately. Our method adds knowledge about protein-protein interactions to the estimation method of gene networks under a Bayesian statistical framework. In the estimated gene network, a protein complex is modeled as a virtual node based on principal component analysis. We show the effectiveness of the proposed method through the analysis of Saccharomyces cerevisiae cell cycle data. The proposed method improves the accuracy of the estimated gene networks, and successfully identifies some biological facts.
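The "protein complex as a virtual node based on principal component analysis" idea can be illustrated directly: summarize the member genes' expression profiles by their first principal component. The toy expression matrix and noise level below are assumptions for illustration, not the paper's yeast data:

```python
import numpy as np

rng = np.random.default_rng(5)
# Toy expression data: 100 arrays x 3 genes forming one protein complex,
# all driven by a shared underlying complex activity plus noise.
activity = rng.standard_normal(100)
members = np.stack(
    [activity + 0.2 * rng.standard_normal(100) for _ in range(3)], axis=1)

# Virtual node = first principal component score of the member genes.
centered = members - members.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
virtual_node = centered @ vt[0]

# The virtual node should track the shared activity closely (up to sign).
corr = abs(np.corrcoef(virtual_node, activity)[0, 1])
```

Collapsing a complex to one such node lets the Bayesian network treat the complex as a single regulator rather than several redundant, noisy ones.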
Quasi-Experimental Analysis: A Mixture of Methods and Judgment.
ERIC Educational Resources Information Center
Cordray, David S.
1986-01-01
The role of human judgment in the development and synthesis of evidence has not been adequately developed or acknowledged within quasi-experimental analysis. Corrective solutions need to confront the fact that causal analysis within complex environments will require a more active assessment that entails reasoning and statistical modeling.…
McDermott, Jason E.; Wang, Jing; Mitchell, Hugh; Webb-Robertson, Bobbie-Jo; Hafen, Ryan; Ramey, John; Rodland, Karin D.
2012-01-01
Introduction: The advent of high throughput technologies capable of comprehensive analysis of genes, transcripts, proteins and other significant biological molecules has provided an unprecedented opportunity for the identification of molecular markers of disease processes. However, it has simultaneously complicated the problem of extracting meaningful molecular signatures of biological processes from these complex datasets. The process of biomarker discovery and characterization provides opportunities for more sophisticated approaches to integrating purely statistical and expert knowledge-based approaches. Areas covered: In this review we will present examples of current practices for biomarker discovery from complex omic datasets and the challenges that have been encountered in deriving valid and useful signatures of disease. We will then present a high-level review of data-driven (statistical) and knowledge-based methods applied to biomarker discovery, highlighting some current efforts to combine the two distinct approaches. Expert opinion: Effective, reproducible and objective tools for combining data-driven and knowledge-based approaches to identify predictive signatures of disease are key to future success in the biomarker field. We will describe our recommendations for possible approaches to this problem including metrics for the evaluation of biomarkers. PMID:23335946
DOE Office of Scientific and Technical Information (OSTI.GOV)
McDermott, Jason E.; Wang, Jing; Mitchell, Hugh D.
2013-01-01
The advent of high throughput technologies capable of comprehensive analysis of genes, transcripts, proteins and other significant biological molecules has provided an unprecedented opportunity for the identification of molecular markers of disease processes. However, it has simultaneously complicated the problem of extracting meaningful signatures of biological processes from these complex datasets. The process of biomarker discovery and characterization provides opportunities both for purely statistical and expert knowledge-based approaches and would benefit from improved integration of the two. Areas covered: In this review we will present examples of current practices for biomarker discovery from complex omic datasets and the challenges that have been encountered. We will then present a high-level review of data-driven (statistical) and knowledge-based methods applied to biomarker discovery, highlighting some current efforts to combine the two distinct approaches. Expert opinion: Effective, reproducible and objective tools for combining data-driven and knowledge-based approaches to biomarker discovery and characterization are key to future success in the biomarker field. We will describe our recommendations for possible approaches to this problem including metrics for the evaluation of biomarkers.
RADSS: an integration of GIS, spatial statistics, and network service for regional data mining
NASA Astrophysics Data System (ADS)
Hu, Haitang; Bao, Shuming; Lin, Hui; Zhu, Qing
2005-10-01
Regional data mining, which aims at the discovery of knowledge about spatial patterns, clusters or associations between regions, now has wide application in the social sciences, such as sociology, economics, epidemiology, and criminology. Many applications in the regional or other social sciences are more concerned with spatial relationships than with precise geographical location. Based on the spatial continuity rule derived from Tobler's first law of geography (observations at two sites tend to be more similar to each other if the sites are close together than if far apart), spatial statistics, as an important means of spatial data mining, allow users to extract interesting and useful information, such as spatial pattern, spatial structure, spatial association, spatial outliers and spatial interaction, from vast amounts of spatial and non-spatial data. Therefore, by integrating spatial statistical methods, geographical information systems become more powerful for gaining insight into the nature of the spatial structure of regional systems, and help researchers to be more careful when selecting appropriate models. However, the lack of such tools holds back the application of spatial data analysis techniques and the development of new methods and models (e.g., spatio-temporal models). Here, we attempt to develop such integrated software and apply it to complex system analysis for the Poyang Lake Basin. This paper presents a framework for integrating GIS, spatial statistics and network service in regional data mining, as well as its implementation. After discussing the spatial statistics methods involved in regional complex system analysis, we introduce RADSS (Regional Analysis and Decision Support System), our new regional data mining tool, which integrates GIS, spatial statistics and network service.
RADSS includes functions for spatial data visualization, exploratory spatial data analysis, and spatial statistics. The tool also includes fundamental spatial and non-spatial databases on regional population and environment, which can be updated from external databases via CD or network. Using this data mining and exploratory analytical tool, users can easily and quickly analyse the huge amount of interrelated regional data and better understand the spatial patterns and trends of regional development, so as to make credible, scientifically grounded decisions. Moreover, it can be used as an educational tool for spatial data analysis and environmental studies. In this paper, we also present a case study on the Poyang Lake Basin as an application of the tool and of spatial data mining in complex environmental studies. Finally, several concluding remarks are discussed.
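The spatial-continuity rule invoked above is commonly quantified with Moran's I, the classic spatial-autocorrelation statistic; a minimal numpy sketch (toy weight matrix and values, not RADSS code):

```python
import numpy as np

# Sketch of Moran's I: positive values mean nearby regions have similar
# values (Tobler's "near things are more related"), negative values mean
# neighbors are dissimilar. Toy data; illustrative only.

def morans_i(values, w):
    """values: n-vector of regional observations; w: n x n spatial weight
    matrix with zero diagonal (e.g., contiguity)."""
    z = values - values.mean()
    n = len(values)
    return (n / w.sum()) * (z @ w @ z) / (z @ z)

# 4 regions on a line, rook contiguity
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
clustered = np.array([1.0, 1.2, 5.0, 5.3])    # similar neighbors -> I > 0
alternating = np.array([1.0, 5.0, 1.0, 5.0])  # dissimilar neighbors -> I < 0
print(morans_i(clustered, w) > 0, morans_i(alternating, w) < 0)  # True True
```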
Becker, Betsy Jane; Aloe, Ariel M; Duvendack, Maren; Stanley, T D; Valentine, Jeffrey C; Fretheim, Atle; Tugwell, Peter
2017-09-01
To outline issues of importance to analytic approaches to the synthesis of quasi-experiments (QEs) and to provide a statistical model for use in analysis. We drew on studies of statistics, epidemiology, and social-science methodology to outline methods for synthesis of QE studies. The design and conduct of QEs, effect sizes from QEs, and moderator variables for the analysis of those effect sizes were discussed. Biases, confounding, design complexities, and comparisons across designs offer serious challenges to syntheses of QEs. Key components of meta-analyses of QEs were identified, including the aspects of QE study design to be coded and analyzed. Of utmost importance are the design and statistical controls implemented in the QEs. Such controls and any potential sources of bias and confounding must be modeled in analyses, along with aspects of the interventions and populations studied. Because of such controls, effect sizes from QEs are more complex than those from randomized experiments. A statistical meta-regression model that incorporates important features of the QEs under review was presented. Meta-analyses of QEs provide particular challenges, but thorough coding of intervention characteristics and study methods, along with careful analysis, should allow for sound inferences. Copyright © 2017 Elsevier Inc. All rights reserved.
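The meta-regression model described above can be sketched as inverse-variance weighted least squares; the effect sizes and the single "design control" moderator below are invented for illustration and do not represent the authors' full model:

```python
import numpy as np

# Minimal fixed-effect meta-regression sketch: effect sizes regressed on a
# study-level moderator, weighted by inverse sampling variances.

def meta_regression(effects, variances, X):
    """Return WLS coefficients for design matrix X (intercept included in X)."""
    w = 1.0 / np.asarray(variances)
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ np.asarray(effects))

# toy data: effects grow with a 'design control' dummy (0 = weak, 1 = strong)
effects = np.array([0.10, 0.15, 0.42, 0.38])
variances = np.array([0.02, 0.03, 0.02, 0.04])
X = np.column_stack([np.ones(4), np.array([0.0, 0.0, 1.0, 1.0])])
intercept, slope = meta_regression(effects, variances, X)
print(round(float(intercept), 3), round(float(slope), 3))
```

With a single dummy moderator the fit reduces to the inverse-variance weighted mean of each group, so the slope is the weighted between-group difference.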
Vajawat, Mayuri; Deepika, P. C.; Kumar, Vijay; Rajeshwari, P.
2015-01-01
Aim: To compare the efficacy of powered toothbrushes with manual toothbrushes in improving gingival health and reducing salivary red complex counts among autistic individuals. Materials and Methods: Forty autistic individuals were selected. The test group received powered toothbrushes, and the control group received manual toothbrushes. Plaque index and gingival index were recorded. Unstimulated saliva was collected for analysis of red complex organisms using polymerase chain reaction. Results: A statistically significant reduction in plaque scores was seen over a period of 12 weeks in both groups (P < 0.001 for tests and P = 0.002 for controls). This reduction was statistically more significant in the test group (P = 0.024). A statistically significant reduction in gingival scores was seen over a period of 12 weeks in both groups (P < 0.001 for tests and P = 0.001 for controls). This reduction was statistically more significant in the test group (P = 0.042). No statistically significant reduction in the detection rate of red complex organisms was seen at 4 weeks in either group. Conclusion: Powered toothbrushes result in a significant overall improvement in gingival health when constant reinforcement of oral hygiene instructions is given. PMID:26681855
Han, Seong Kyu; Lee, Dongyeop; Lee, Heetak; Kim, Donghyo; Son, Heehwa G; Yang, Jae-Seong; Lee, Seung-Jae V; Kim, Sanguk
2016-08-30
Online application for survival analysis (OASIS) has served as a popular and convenient platform for the statistical analysis of various survival data, particularly in the field of aging research. With the recent advances in the fields of aging research that deal with complex survival data, we noticed a need for updates to the current version of OASIS. Here, we report OASIS 2 (http://sbi.postech.ac.kr/oasis2), which provides extended statistical tools for survival data and an enhanced user interface. In particular, OASIS 2 enables the statistical comparison of maximal lifespans, which is potentially useful for determining key factors that limit the lifespan of a population. Furthermore, OASIS 2 provides statistical and graphical tools that compare values in different conditions and times. That feature is useful for comparing age-associated changes in physiological activities, which can be used as indicators of "healthspan." We believe that OASIS 2 will serve as a standard platform for survival analysis with advanced and user-friendly statistical tools for experimental biologists in the field of aging research.
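A core computation behind survival-analysis platforms of this kind is the Kaplan-Meier estimate; the following is a minimal, illustrative Python version (toy worm lifespans, not OASIS 2 code):

```python
import numpy as np

# Illustrative Kaplan-Meier estimator: survival probability from (time, event)
# pairs, where event=1 marks a death and event=0 a censored observation.

def kaplan_meier(times, events):
    order = np.argsort(times)
    t, e = np.asarray(times)[order], np.asarray(events)[order]
    at_risk, surv, curve = len(t), 1.0, []
    for ti in np.unique(t):
        deaths = int(((t == ti) & (e == 1)).sum())
        if deaths:
            surv *= 1.0 - deaths / at_risk   # product-limit update
        curve.append((float(ti), surv))
        at_risk -= int((t == ti).sum())      # deaths and censorings leave risk set
    return curve

# 10 animals: deaths on days 2..6, one censored at day 4 (invented data)
times = [2, 3, 3, 4, 4, 5, 5, 6, 6, 6]
events = [1, 1, 1, 1, 0, 1, 1, 1, 1, 1]
for day, s in kaplan_meier(times, events):
    print(day, round(s, 3))
```

Note how the censored animal at day 4 leaves the risk set without forcing the survival curve down, which is the whole point of the product-limit form.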
Comparisons of non-Gaussian statistical models in DNA methylation analysis.
Ma, Zhanyu; Teschendorff, Andrew E; Yu, Hong; Taghia, Jalil; Guo, Jun
2014-06-16
As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance.
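The bounded-support point can be illustrated by fitting a beta distribution, one standard non-Gaussian choice for data on [0, 1]; this method-of-moments sketch is illustrative and is not the paper's estimator:

```python
import numpy as np

# Methylation beta-values live in [0, 1], so a beta distribution is a more
# natural model than a Gaussian. Here: method-of-moments fit on simulated
# data (parameters 2 and 8 are arbitrary illustrative choices).

def beta_mom(x):
    """Method-of-moments estimates of beta parameters (alpha, beta)."""
    m, v = x.mean(), x.var()
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common

rng = np.random.default_rng(1)
x = rng.beta(2.0, 8.0, size=20000)
a, b = beta_mom(x)
print(round(float(a), 1), round(float(b), 1))  # should be near 2.0 and 8.0
```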
Comparisons of Non-Gaussian Statistical Models in DNA Methylation Analysis
Ma, Zhanyu; Teschendorff, Andrew E.; Yu, Hong; Taghia, Jalil; Guo, Jun
2014-01-01
As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance. PMID:24937687
Constructing and Modifying Sequence Statistics for relevent Using informR in R
Marcum, Christopher Steven; Butts, Carter T.
2015-01-01
The informR package greatly simplifies the analysis of complex event histories in R by providing user-friendly tools to build sufficient statistics for the relevent package. Historically, building sufficient statistics to model event sequences (of the form a→b) using the egocentric generalization of Butts' (2008) relational event framework for modeling social action has been cumbersome. The informR package simplifies the construction of the complex list of arrays needed by the rem() model fitting function for a variety of cases involving egocentric event data, multiple event types, and/or support constraints. This paper introduces these tools using examples from real data extracted from the American Time Use Survey. PMID:26185488
NASA Astrophysics Data System (ADS)
Palozzi, Jason; Pantopoulos, George; Maravelis, Angelos G.; Nordsvan, Adam; Zelilidis, Avraam
2018-02-01
This investigation presents an outcrop-based integrated study of internal division analysis and statistical treatment of turbidite bed thickness applied to a Carboniferous deep-water channel-levee complex in the Myall Trough, southeast Australia. Turbidite beds of the studied succession are characterized by a range of sedimentary structures grouped into two main associations, a thick-bedded and a thin-bedded one, that reflect channel-fill and overbank/levee deposits, respectively. Three vertically stacked channel-levee cycles have been identified. Results of statistical analysis of bed thickness, grain size and internal division patterns applied to the studied channel-levee succession indicate that turbidite bed thickness is well characterized by a bimodal lognormal distribution, possibly reflecting the difference between deposition from lower-density flows (in a levee/overbank setting) and very high-density flows (in a channel-fill setting). Power-law and exponential distributions were observed to hold only for the thick-bedded parts of the succession and cannot characterize the whole bed thickness range of the studied sediments. The succession also exhibits non-random clustering of bed thickness and grain-size measurements. The studied sediments are also characterized by the presence of statistically detected fining-upward sandstone packets. A novel quantitative approach (change-point analysis) is proposed for the detection of those packets. Markov permutation statistics also revealed the existence of order in the alternation of internal divisions in the succession, expressed by an optimal internal division cycle reflecting two main types of gravity flow events deposited within both the thick-bedded conglomeratic and the thin-bedded sandstone associations.
The analytical methods presented in this study can be used as additional tools for quantitative analysis and recognition of depositional environments in hydrocarbon-bearing research of ancient deep-water channel-levee settings.
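The change-point idea used to detect packet boundaries can be sketched with a least-squares single-break detector; the bed thicknesses below are invented and the method is a toy version, not the study's exact procedure:

```python
import numpy as np

# Minimal single change-point detector: find the split index that minimizes
# the within-segment sum of squares of bed thickness, i.e. the best
# two-segment piecewise-constant fit.

def change_point(x):
    """Index k (1 <= k < n) minimizing within-segment sum of squares."""
    x = np.asarray(x, dtype=float)
    best_k, best_cost = 1, np.inf
    for k in range(1, len(x)):
        left, right = x[:k], x[k:]
        cost = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

# thick channel-fill beds followed by thin levee beds (invented values, cm)
beds = [95, 110, 102, 98, 105, 12, 9, 15, 11, 10]
print(change_point(beds))  # 5: splits the thick packet from the thin packet
```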
Monitoring the soil degradation by Metastatistical Analysis
NASA Astrophysics Data System (ADS)
Oleschko, K.; Gaona, C.; Tarquis, A.
2009-04-01
The effectiveness of the fractal toolbox in capturing the critical behavior of soil structural patterns during chemical and physical degradation was documented by our numerous experiments (Oleschko et al., 2008 a; 2008 b). The spatio-temporal dynamics of these patterns was measured and mapped with high precision in terms of fractal descriptors. All tested fractal techniques were able to detect the statistically significant differences in structure between the perfect spongy and massive patterns of uncultivated and sodium-saline agricultural soils, respectively. For instance, the Hurst exponent, extracted from Chernozem micromorphological images and from time series of its physical and mechanical properties measured in situ, detected the roughness decrease (and therefore the increase in H from 0.17 to 0.30 for images) derived from the loss of original structure complexity. The combined use of different fractal descriptors brings statistical precision to the quantification of natural system degradation and provides a means for objective soil structure comparison (Oleschko et al., 2000). The ability of fractal parameters to capture critical behavior and phase transitions was documented for contrasting situations, ranging from Andosol deforestation and erosion to Vertisol fracturing and consolidation. The Hurst exponent is used to measure the type of persistence and the degree of complexity of structure dynamics. We conclude that there is an urgent need to select and adopt a standardized toolbox for fractal analysis and complexity measures in the Earth Sciences. We propose to use second-order (meta-) statistics as subtle measures of complexity (Atmanspacher et al., 1997). A high degree of correlation was documented between the fractal and the higher-order statistical descriptors (the four central moments of a stochastic variable's distribution) used for the analysis of system heterogeneity and variability.
We propose to call this combined fractal/statistical toolbox Metastatistical Analysis and recommend it for projects directed at soil degradation monitoring. References: 1. Oleschko, K., B.S. Figueroa, M.E. Miranda, M.A. Vuelvas and E.R. Solleiro, Soil & Till. Res. 55, 43 (2000). 2. Oleschko, K., Korvin, G., Figueroa, S.B., Vuelvas, M.A., Balankin, A., Flores, L., Carreño, D. Fractal radar scattering from soil. Physical Review E 67, 041403, 2003. 3. Zamora-Castro, S., Oleschko, K., Flores, L., Ventura, E. Jr., Parrot, J.-F., 2008. Fractal mapping of pore and solid attributes. Vadose Zone Journal 7(2): 473-492. 4. Oleschko, K., Korvin, G., Muñoz, A., Velásquez, J., Miranda, M.E., Carreon, D., Flores, L., Martínez, M., Velásquez-Valle, M., Brambilla, F., Parrot, J.-F., Ronquillo, G., 2008. Fractal mapping of soil moisture content from remote sensed multi-scale data. Nonlinear Processes in Geophysics 15: 711-725. 5. Atmanspacher, H., Räth, Ch., Wiedenmann, G., 1997. Statistics and meta-statistics in the concept of complexity. Physica A 234: 819-829.
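Hurst-exponent estimation of the kind cited above is often done by rescaled-range (R/S) analysis; the following is a rough, illustrative numpy sketch (window sizes and data are arbitrary choices, and R/S is known to be biased for short windows):

```python
import numpy as np

# Rough rescaled-range (R/S) Hurst estimate: for each window size n, compute
# the range of the mean-adjusted cumulative sum divided by the window's
# standard deviation, then fit log(R/S) against log(n); the slope ~ H.

def hurst_rs(x, window_sizes=(8, 16, 32, 64, 128)):
    x = np.asarray(x, dtype=float)
    logs_n, logs_rs = [], []
    for n in window_sizes:
        rs_vals = []
        for start in range(0, len(x) - n + 1, n):
            w = x[start:start + n]
            z = np.cumsum(w - w.mean())
            r, s = z.max() - z.min(), w.std()
            if s > 0:
                rs_vals.append(r / s)
        logs_n.append(np.log(n))
        logs_rs.append(np.log(np.mean(rs_vals)))
    return np.polyfit(logs_n, logs_rs, 1)[0]

rng = np.random.default_rng(2)
h_white = hurst_rs(rng.normal(size=4096))             # uncorrelated noise: H near 0.5
h_walk = hurst_rs(np.cumsum(rng.normal(size=4096)))   # persistent walk: H near 1
print(h_white < h_walk)  # True: rougher series has the lower Hurst exponent
```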
Maiorov, R V; Derbenov, D P
2014-01-01
The article presents the results of a clinical economic analysis of the effect of different immune-correcting preparations on the rate of respiratory infections in 548 frequently ill children of early school age. It is established that preventive immune correction with lysates of bacteria or glucosaminyl muramyl dipeptide, together with a vitamin-mineral complex, results in a statistically significant decrease in the rate of respiratory infections and a dramatic decrease in the direct and indirect costs of treating infectious diseases of the respiratory ways. The preventive application of juice of cone-flower herb or interferon together with a vitamin-mineral complex statistically significantly decreases the rate of respiratory infections and negligibly decreases the direct and indirect costs of their treatment.
Docking studies on NSAID/COX-2 isozyme complexes using Contact Statistics analysis
NASA Astrophysics Data System (ADS)
Ermondi, Giuseppe; Caron, Giulia; Lawrence, Raelene; Longo, Dario
2004-11-01
The selective inhibition of COX-2 isozymes should lead to a new generation of NSAIDs with significantly reduced side effects; e.g. celecoxib (Celebrex®) and rofecoxib (Vioxx®). To obtain inhibitors with higher selectivity it has become essential to gain additional insight into the details of the interactions between COX isozymes and NSAIDs. Although X-ray structures of COX-2 complexed with a small number of ligands are available, experimental data are missing for two well-known selective COX-2 inhibitors (rofecoxib and nimesulide) and docking results reported are controversial. We use a combination of a traditional docking procedure with a new computational tool (Contact Statistics analysis) that identifies the best orientation among a number of solutions to shed some light on this topic.
Online Statistical Modeling (Regression Analysis) for Independent Responses
NASA Astrophysics Data System (ADS)
Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus
2017-06-01
Regression analysis (statistical modelling) is among the statistical methods most frequently needed in analyzing quantitative data, especially to model the relationship between response and explanatory variables. Nowadays, statistical models have been developed in various directions to model various types of data and complex relationships. Rich varieties of advanced and recent statistical models are available in open source software (one of them being R). However, these advanced statistical modelling tools are not very friendly to novice R users, since they are based on programming scripts or a command line interface. Our research aims to develop a web interface (based on R and shiny), so that the most recent and advanced statistical modelling is readily available, accessible and applicable on the web. We have previously made an interface in the form of an e-tutorial for several modern and advanced statistical models in R, especially for independent responses (including linear models/LM, generalized linear models/GLM, generalized additive models/GAM and generalized additive models for location scale and shape/GAMLSS). In this research we unified them in the form of data analysis, including models using computer-intensive statistics (bootstrap and Markov chain Monte Carlo/MCMC). All are readily accessible in our online Virtual Statistics Laboratory. The web interface makes statistical modelling easier to apply and easier to compare, in order to find the most appropriate model for the data.
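The simplest member of the model family listed above (LM) reduces to least squares; this is an illustrative numpy sketch rather than the R/shiny implementation the abstract describes:

```python
import numpy as np

# Bare-bones linear model fit: prepend an intercept column and solve the
# least-squares problem y ~ X. Toy noiseless data, so the fit is exact.

def fit_lm(X, y):
    """Return least-squares coefficients (intercept first)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta

x = np.arange(10, dtype=float)
y = 3.0 + 2.0 * x            # exact line, no noise
intercept, slope = fit_lm(x, y)
print(round(float(intercept), 6), round(float(slope), 6))  # 3.0 2.0
```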
EEG analysis using wavelet-based information tools.
Rosso, O A; Martin, M T; Figliola, A; Keller, K; Plastino, A
2006-06-15
Wavelet-based informational tools for quantitative electroencephalogram (EEG) record analysis are reviewed. Relative wavelet energies, wavelet entropies and wavelet statistical complexities are used in the characterization of scalp EEG records corresponding to secondary generalized tonic-clonic epileptic seizures. In particular, we show that the epileptic recruitment rhythm observed during seizure development is well described in terms of the relative wavelet energies. In addition, during the concomitant time-period the entropy diminishes while complexity grows. This is construed as evidence supporting the conjecture that an epileptic focus, for this kind of seizures, triggers a self-organized brain state characterized by both order and maximal complexity.
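The relative wavelet energies and the wavelet entropy can be sketched with a plain Haar decomposition; this is illustrative only, since the authors' analysis uses a full multiresolution wavelet family on real EEG records:

```python
import numpy as np

# Sketch of wavelet information tools: repeated Haar splits give per-band
# energies; normalized energies feed a Shannon-type wavelet entropy. A signal
# whose energy is concentrated in few bands has low entropy.

def haar_step(x):
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # approximation (low-pass)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # detail (high-pass)
    return a, d

def wavelet_entropy(x, levels=4):
    energies, a = [], np.asarray(x, dtype=float)
    for _ in range(levels):
        a, d = haar_step(a)
        energies.append((d**2).sum())
    energies.append((a**2).sum())
    p = np.array(energies) / sum(energies)   # relative wavelet energies
    p = p[p > 0]
    return -(p * np.log(p)).sum()

t = np.arange(1024)
pure = np.sin(2 * np.pi * t / 64)                   # narrow-band -> low entropy
noise = np.random.default_rng(3).normal(size=1024)  # broad-band -> high entropy
print(wavelet_entropy(pure) < wavelet_entropy(noise))  # True
```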
Complex degree of mutual anisotropy in diagnostics of biological tissues physiological changes
NASA Astrophysics Data System (ADS)
Ushenko, Yu. A.; Dubolazov, O. V.; Karachevtcev, A. O.; Zabolotna, N. I.
2011-05-01
To characterize the degree of consistency of the parameters of the optically uniaxial birefringent protein nets of blood plasma, a new parameter, the complex degree of mutual anisotropy, is suggested. A technique for polarization measurement of the coordinate distributions of the complex degree of mutual anisotropy of blood plasma is developed. It is shown that a statistical approach to the analysis of the distributions of the complex degree of mutual anisotropy of blood plasma is effective in the diagnosis and differentiation of acute inflammation: acute and gangrenous appendicitis.
Complex degree of mutual anisotropy in diagnostics of biological tissues physiological changes
NASA Astrophysics Data System (ADS)
Ushenko, Yu. A.; Dubolazov, A. V.; Karachevtcev, A. O.; Zabolotna, N. I.
2011-09-01
To characterize the degree of consistency of the parameters of the optically uniaxial birefringent protein nets of blood plasma, a new parameter, the complex degree of mutual anisotropy, is suggested. A technique for polarization measurement of the coordinate distributions of the complex degree of mutual anisotropy of blood plasma is developed. It is shown that a statistical approach to the analysis of the distributions of the complex degree of mutual anisotropy of blood plasma is effective in the diagnosis and differentiation of acute inflammation: acute and gangrenous appendicitis.
Steganalysis based on reducing the differences of image statistical characteristics
NASA Astrophysics Data System (ADS)
Wang, Ran; Niu, Shaozhang; Ping, Xijian; Zhang, Tao
2018-04-01
Compared with the embedding process, image content makes a more significant impact on the differences in image statistical characteristics. This makes image steganalysis a classification problem with larger within-class scatter distances and smaller between-class scatter distances. As a result, the steganalysis features become inseparable owing to the differences in image statistical characteristics. In this paper, a new steganalysis framework is proposed that can reduce the differences in image statistical characteristics caused by various contents and processing methods. The given images are segmented into several sub-images according to texture complexity. Steganalysis features are separately extracted from each subset with the same or similar texture complexity to build a classifier. The final steganalysis result is obtained through a weighted fusion process. Theoretical analysis and experimental results demonstrate the validity of the framework.
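The weighted fusion step can be sketched as a reliability-weighted average of per-subset decisions; the scores and weights below are hypothetical, not values from the paper:

```python
import numpy as np

# Sketch of weighted fusion: images are binned by texture complexity, each
# bin's classifier emits a stego score, and the final decision is a
# normalized weighted average (weights might encode per-bin reliability).

def fused_decision(scores, weights):
    """scores: per-bin stego probabilities; weights: per-bin reliabilities."""
    w = np.asarray(weights, dtype=float)
    return float(np.dot(scores, w / w.sum()))

scores = [0.9, 0.7, 0.4]   # smooth, medium, and highly textured sub-images
weights = [0.5, 0.3, 0.2]  # trust smoother bins more (invented weights)
print(fused_decision(scores, weights))  # 0.9*0.5 + 0.7*0.3 + 0.4*0.2 = 0.74
```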
Supe, S; Milicić, J; Pavićević, R
1997-06-01
Recent studies on the etiopathogenesis of multiple sclerosis (MS) all point out that there is a polygenetic predisposition for this illness. The so-called "MS trait" determines the reactivity of the immunological system to ecological factors. The development of glyphological science and the study of the characteristics of the digito-palmar dermatoglyphic complex (which have been established to be polygenetically determined characteristics) enable a better insight into genetic development during early embryogenesis. The aim of this study was to estimate differences in the dermatoglyphics of the digito-palmar complex between a group with multiple sclerosis and comparable, phenotypically healthy groups of both sexes. The study is based on the analysis of 18 quantitative characteristics of the digito-palmar complex in 125 patients with multiple sclerosis (41 males and 84 females) in comparison to a group of 400 phenotypically healthy subjects (200 males and 200 females). The analysis pointed towards a statistically significant decrease in the number of digital and palmar ridges, as well as lower values of atd angles, in the group of MS patients of both sexes. The main discriminators were the characteristic palmar dermatoglyphics; discriminant analysis correctly classified over 80% of the examinees, exceeding statistical significance. The results of this study suggest a possible discrimination of patients with MS from the phenotypically healthy population through analysis of dermatoglyphic status, and therefore the possibility that multiple sclerosis is a genetically predisposed disease.
Identification of Nanoparticle Prototypes and Archetypes.
Fernandez, Michael; Barnard, Amanda S
2015-12-22
High-throughput (HT) computational characterization of nanomaterials is poised to accelerate novel material breakthroughs. The number of possible nanomaterials is increasing exponentially along with their complexity, and so statistical and information technology will play a fundamental role in rationalizing nanomaterials HT data. We demonstrate that multivariate statistical analysis of heterogeneous ensembles can identify the truly significant nanoparticles and their most relevant properties. Virtual samples of diamond nanoparticles and graphene nanoflakes are characterized using clustering and archetypal analysis, where we find that saturated particles are defined by their geometry, while nonsaturated nanoparticles are defined by their carbon chemistry. At the convex hull of the nanostructure spaces, a combination of complex archetypes can efficiently describe a large number of members of the ensembles, whereas the regular shapes that are typically assumed to be representative can only describe a small set of the most regular morphologies. This approach provides a route toward the characterization of computationally intractable virtual nanomaterial spaces, which can aid nanomaterials discovery in the foreseen big data scenario.
Regional Morphology Empirical Analysis Package (RMAP): Orthogonal Function Analysis, Background and Examples
ERDC TN-SWWRP-07-9
2007-10-01
A critique of the usefulness of inferential statistics in applied behavior analysis
Hopkins, B. L.; Cole, Brian L.; Mason, Tina L.
1998-01-01
Researchers continue to recommend that applied behavior analysts use inferential statistics in making decisions about effects of independent variables on dependent variables. In many other approaches to behavioral science, inferential statistics are the primary means for deciding the importance of effects. Several possible uses of inferential statistics are considered. Rather than being an objective means for making decisions about effects, as is often claimed, inferential statistics are shown to be subjective. It is argued that the use of inferential statistics adds nothing to the complex and admittedly subjective nonstatistical methods that are often employed in applied behavior analysis. Attacks on inferential statistics that are being made, perhaps with increasing frequency, by those who are not behavior analysts, are discussed. These attackers are calling for banning the use of inferential statistics in research publications and commonly recommend that behavioral scientists should switch to using statistics aimed at interval estimation or the method of confidence intervals. Interval estimation is shown to be contrary to the fundamental assumption of behavior analysis that only individuals behave. It is recommended that authors who wish to publish the results of inferential statistics be asked to justify them as a means for helping us to identify any ways in which they may be useful. PMID:22478304
Alkarkhi, Abbas F M; Ramli, Saifullah Bin; Easa, Azhar Mat
2009-01-01
Major elements (sodium, potassium, calcium, magnesium), minor elements (iron, copper, zinc, manganese) and one heavy metal (lead) in Cavendish banana flour and Dream banana flour were determined, and the data were analyzed using the multivariate statistical techniques of factor analysis and discriminant analysis. Factor analysis yielded four factors explaining more than 81% of the total variance: the first factor explained 28.73%, comprising magnesium, sodium, and iron; the second factor explained 21.47%, comprising manganese and copper; the third factor explained 15.66%, comprising zinc and lead; while the fourth factor explained 15.50%, comprising potassium. Discriminant analysis showed that magnesium and sodium exhibited a strong contribution in discriminating the two types of banana flour, affording 100% correct assignment. This study presents the usefulness of multivariate statistical techniques for the analysis and interpretation of complex mineral content data from banana flour of different varieties.
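The discriminant-analysis step can be illustrated with a two-class Fisher LDA on synthetic data; the "magnesium/sodium" values below are invented for illustration and this numpy-only sketch is not the study's software:

```python
import numpy as np

# Toy Fisher linear discriminant: the projection w = Sw^-1 (m1 - m2)
# maximizes between-class over within-class scatter; classify by the sign of
# the projection relative to the midpoint of the class means.

def fisher_lda(X1, X2):
    """Return the Fisher discriminant direction for two classes."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    Sw = (np.cov(X1, rowvar=False) * (len(X1) - 1)
          + np.cov(X2, rowvar=False) * (len(X2) - 1))   # pooled scatter
    return np.linalg.solve(Sw, m1 - m2)

rng = np.random.default_rng(4)
# two flour "varieties" separated mainly on two invented element axes
flour_a = rng.normal([30.0, 5.0], 1.0, size=(50, 2))
flour_b = rng.normal([24.0, 9.0], 1.0, size=(50, 2))
w = fisher_lda(flour_a, flour_b)
midpoint = (flour_a.mean(axis=0) + flour_b.mean(axis=0)) / 2
accuracy = (((flour_a - midpoint) @ w > 0).mean()
            + ((flour_b - midpoint) @ w < 0).mean()) / 2
print(accuracy)  # well-separated toy classes: essentially perfect assignment
```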
Potentiation Following Ballistic and Nonballistic Complexes: The Effect of Strength Level.
Suchomel, Timothy J; Sato, Kimitake; DeWeese, Brad H; Ebben, William P; Stone, Michael H
2016-07-01
Suchomel, TJ, Sato, K, DeWeese, BH, Ebben, WP, and Stone, MH. Potentiation following ballistic and nonballistic complexes: the effect of strength level. J Strength Cond Res 30(7): 1825-1833, 2016. The purpose of this study was to compare the temporal profile of strong and weak subjects during ballistic and nonballistic potentiation complexes. Eight strong (relative back squat = 2.1 ± 0.1 times body mass) and 8 weak (relative back squat = 1.6 ± 0.2 times body mass) males performed squat jumps immediately and every minute up to 10 minutes following potentiation complexes that included ballistic or nonballistic concentric-only half-squats (COHS) performed at 90% of their 1 repetition maximum COHS. Jump height (JH) and allometrically scaled peak power (PPa) were compared using a series of 2 × 12 repeated measures analyses of variance. No statistically significant strength level main effects for JH (p = 0.442) or PPa (p = 0.078) existed during the ballistic condition. In contrast, statistically significant main effects for time existed for both JH (p = 0.014) and PPa (p < 0.001); however, no statistically significant pairwise comparisons were present (p > 0.05). Statistically significant strength level main effects existed for PPa (p = 0.039) but not for JH (p = 0.137) during the nonballistic condition. Post hoc analysis revealed that the strong subjects produced statistically greater PPa than the weaker subjects (p = 0.039). Statistically significant time main effects existed for PPa (p = 0.015), but not for JH (p = 0.178). No statistically significant strength level × time interaction effects for JH (p = 0.319) or PPa (p = 0.203) were present for the ballistic or nonballistic conditions.
Practical significance indicated by effect sizes and the relationships between maximum potentiation and relative strength suggest that stronger subjects potentiate earlier and to a greater extent than weaker subjects during ballistic and nonballistic potentiation complexes.
NASA Astrophysics Data System (ADS)
Aragonès, Àngels; Maxit, Laurent; Guasch, Oriol
2015-08-01
Statistical modal energy distribution analysis (SmEdA) extends classical statistical energy analysis (SEA) to the mid-frequency range by establishing power balance equations between modes in different subsystems. This circumvents the SEA requirement of modal energy equipartition and enables applying SmEdA to cases of low modal overlap and locally excited subsystems, as well as to complex heterogeneous subsystems. Yet, widening SEA's range of application comes at the price of large models, because the number of modes per subsystem can become considerable as frequency increases. Therefore, it would be worthwhile to have at one's disposal tools for quickly identifying and ranking the resonant and non-resonant paths involved in modal energy transmission between subsystems. It will be shown that previously developed graph theory algorithms for transmission path analysis (TPA) in SEA can be adapted to SmEdA and prove useful for that purpose. The case of airborne transmission between two cavities separated by homogeneous and ribbed plates will first be addressed to illustrate the potential of the graph approach. A more complex case representing transmission between non-contiguous cavities in a shipbuilding structure will also be presented.
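The graph-theoretic idea behind transmission path ranking can be illustrated with a toy network. The subsystem names and coupling factors below are invented, and a shortest-path search on -log-transformed factors stands in for the paper's TPA algorithms: multiplicative transmission factors along a path become additive costs, so the dominant path is a shortest path.

```python
# Sketch: rank transmission paths by modeling subsystems as nodes and
# couplings as weighted edges; Dijkstra on -log(factor) finds the
# dominant (highest-transmission) path. All values are toy numbers.
import heapq
import math

# transmission factors between subsystems (0 < factor <= 1), invented
edges = {
    "cavity1": {"plate": 0.10, "frame": 0.02},
    "plate":   {"cavity2": 0.30},
    "frame":   {"cavity2": 0.50},
}

def dominant_path(graph, src, dst):
    """Dijkstra on -log(factor); returns (total_factor, path)."""
    heap = [(0.0, src, [src])]
    seen = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return math.exp(-cost), path
        if node in seen:
            continue
        seen.add(node)
        for nxt, f in graph.get(node, {}).items():
            heapq.heappush(heap, (cost - math.log(f), nxt, path + [nxt]))
    return 0.0, []

factor, path = dominant_path(edges, "cavity1", "cavity2")
# path -> ['cavity1', 'plate', 'cavity2'] with total factor 0.03
```

Here the plate path (0.10 × 0.30 = 0.03) dominates the frame path (0.02 × 0.50 = 0.01), which is the kind of ranking a TPA graph algorithm automates at scale.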
NASA Astrophysics Data System (ADS)
Donges, J. F.; Schleussner, C.-F.; Siegmund, J. F.; Donner, R. V.
2016-05-01
Studying event time series is a powerful approach for analyzing the dynamics of complex systems in many fields of science. In this paper, we describe the method of event coincidence analysis, which provides a framework for quantifying the strength, directionality and time lag of statistical interrelationships between event series. Event coincidence analysis allows one to formulate and test null hypotheses on the origin of the observed interrelationships, including tests based on Poisson processes or, more generally, stochastic point processes with a prescribed inter-event time distribution and other higher-order properties. Applying the framework to country-level observational data yields evidence that flood events have acted as triggers of epidemic outbreaks globally since the 1950s. Facing projected future changes in the statistics of climatic extreme events, statistical techniques such as event coincidence analysis will be relevant for investigating the impacts of anthropogenic climate change on human societies and ecosystems worldwide.
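The core precursor-coincidence statistic of event coincidence analysis can be sketched in a few lines. The event times and tolerance window below are invented for illustration, not the country-level data analyzed in the paper.

```python
# Sketch: the "precursor" coincidence rate is the fraction of events in one
# series that are followed within a tolerance window by an event in the other.
def coincidence_rate(a_times, b_times, delta_t):
    """Fraction of events in a_times with at least one event of b_times
    occurring in the interval (t, t + delta_t]."""
    hits = 0
    for t in a_times:
        if any(t < s <= t + delta_t for s in b_times):
            hits += 1
    return hits / len(a_times) if a_times else 0.0

floods = [2.0, 5.0, 9.0]       # hypothetical flood event times
outbreaks = [2.5, 9.4, 12.0]   # hypothetical epidemic-onset times
rate = coincidence_rate(floods, outbreaks, delta_t=1.0)   # -> 2/3
```

Significance is then assessed by comparing the observed rate with its distribution under a null model, such as independent Poisson event series.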
Soto-Gordoa, Myriam; Arrospide, Arantzazu; Merino Hernández, Marisa; Mora Amengual, Joana; Fullaondo Zabala, Ane; Larrañaga, Igor; de Manuel, Esteban; Mar, Javier
2017-01-01
To develop a framework for the management of complex health care interventions within the Deming continuous improvement cycle and to test the framework in the case of an integrated intervention for multimorbid patients in the Basque Country within the CareWell project. Statistical analysis alone, although necessary, may not always represent the practical significance of the intervention. Thus, to ascertain the true economic impact of the intervention, the statistical results can be integrated into the budget impact analysis. The intervention of the case study consisted of a comprehensive approach that integrated new provider roles and new technological infrastructure for multimorbid patients, with the aim of reducing patient decompensations by 10% over 5 years. The study period was 2012 to 2020. Given the aging of the general population, the conventional scenario predicts an increase of 21% in the health care budget for care of multimorbid patients during the study period. With a successful intervention, this figure should drop to 18%. The statistical analysis, however, showed no significant differences in costs either in primary care or in hospital care between 2012 and 2014. The real costs in 2014 were by far closer to those in the conventional scenario than to the reductions expected in the objective scenario. The present implementation should be reappraised, because the present expenditure did not move closer to the objective budget. This work demonstrates the capacity of budget impact analysis to enhance the implementation of complex interventions. Its integration in the context of the continuous improvement cycle is transferable to other contexts in which implementation depth and time are important. Copyright © 2017 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
Statistical assessment of the learning curves of health technologies.
Ramsay, C R; Grant, A M; Wallace, S A; Garthwaite, P H; Monk, A F; Russell, I T
2001-01-01
(1) To describe systematically studies that directly assessed the learning curve effect of health technologies. (2) Systematically to identify 'novel' statistical techniques applied to learning curve data in other fields, such as psychology and manufacturing. (3) To test these statistical techniques in data sets from studies of varying designs to assess health technologies in which learning curve effects are known to exist. METHODS - STUDY SELECTION (HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW): For a study to be included, it had to include a formal analysis of the learning curve of a health technology using a graphical, tabular or statistical technique. METHODS - STUDY SELECTION (NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH): For a study to be included, it had to include a formal assessment of a learning curve using a statistical technique that had not been identified in the previous search. METHODS - DATA SOURCES: Six clinical and 16 non-clinical biomedical databases were searched. A limited amount of handsearching and scanning of reference lists was also undertaken. METHODS - DATA EXTRACTION (HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW): A number of study characteristics were abstracted from the papers such as study design, study size, number of operators and the statistical method used. METHODS - DATA EXTRACTION (NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH): The new statistical techniques identified were categorised into four subgroups of increasing complexity: exploratory data analysis; simple series data analysis; complex data structure analysis, generic techniques. METHODS - TESTING OF STATISTICAL METHODS: Some of the statistical methods identified in the systematic searches for single (simple) operator series data and for multiple (complex) operator series data were illustrated and explored using three data sets. 
The first was a case series of 190 consecutive laparoscopic fundoplication procedures performed by a single surgeon; the second was a case series of consecutive laparoscopic cholecystectomy procedures performed by ten surgeons; the third was randomised trial data derived from the laparoscopic procedure arm of a multicentre trial of groin hernia repair, supplemented by data from non-randomised operations performed during the trial. RESULTS - HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW: Of 4571 abstracts identified, 272 (6%) were later included in the study after review of the full paper. Some 51% of studies assessed a surgical minimal access technique and 95% were case series. The statistical method used most often (60%) was splitting the data into consecutive parts (such as halves or thirds), with only 14% attempting a more formal statistical analysis. The reporting of the studies was poor, with 31% giving no details of data collection methods. RESULTS - NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH: Of 9431 abstracts assessed, 115 (1%) were deemed appropriate for further investigation and, of these, 18 were included in the study. All of the methods for complex data sets were identified in the non-clinical literature. These were discriminant analysis, two-stage estimation of learning rates, generalised estimating equations, multilevel models, latent curve models, time series models and stochastic parameter models. In addition, eight new shapes of learning curves were identified. RESULTS - TESTING OF STATISTICAL METHODS: No one particular shape of learning curve performed significantly better than another. The performance of 'operation time' as a proxy for learning differed between the three procedures. Multilevel modelling using the laparoscopic cholecystectomy data demonstrated and measured surgeon-specific and confounding effects. 
The inclusion of non-randomised cases, despite the possible limitations of the method, enhanced the interpretation of learning effects. CONCLUSIONS - HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW: The statistical methods used for assessing learning effects in health technology assessment have been crude and the reporting of studies poor. CONCLUSIONS - NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH: A number of statistical methods for assessing learning effects were identified that had not hitherto been used in health technology assessment. There was a hierarchy of methods for the identification and measurement of learning, and the more sophisticated methods for both have had little if any use in health technology assessment. This demonstrated the value of considering fields outside clinical research when addressing methodological issues in health technology assessment. CONCLUSIONS - TESTING OF STATISTICAL METHODS: It has been demonstrated that the portfolio of techniques identified can enhance investigations of learning curve effects. (ABSTRACT TRUNCATED)
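As a rough illustration of the contrast between the crude split-sample comparison and a formal curve fit discussed above, here is a sketch on synthetic operation-time data; the power-law learning-curve form and its parameters are assumptions for demonstration, not results from the review's data sets.

```python
# Sketch: compare the crude "split into halves" method with fitting a
# power-law learning curve t = a * n^b via log-log least squares.
# The operation times are synthetic (true learning rate b = -0.2).
import numpy as np

n = np.arange(1, 51)                      # case number in a single-operator series
rng = np.random.default_rng(1)
times = 120 * n ** -0.2 * np.exp(rng.normal(0, 0.05, size=n.size))

# Crude approach: compare mean operation time in the first vs. second half.
first_half, second_half = times[:25].mean(), times[25:].mean()

# Formal approach: estimate the learning rate b from a log-log linear fit.
b, log_a = np.polyfit(np.log(n), np.log(times), 1)
```

The split-halves comparison only shows that later cases are faster; the fitted exponent quantifies the learning rate and supports inference, which is the kind of step up in sophistication the review advocates.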
Lajus, Dmitry; Sukhikh, Natalia; Alekseev, Victor
2015-01-01
Interest in cryptic species has increased significantly with current progress in genetic methods. The large number of cryptic species suggests that the resolution of traditional morphological techniques may be insufficient for taxonomical research. However, some species now considered to be cryptic may, in fact, be designated pseudocryptic after close morphological examination. Thus the “cryptic or pseudocryptic” dilemma speaks to the resolution of morphological analysis and its utility for identifying species. We address this dilemma first by systematically reviewing data published from 1980 to 2013 on cryptic species of Copepoda and then by performing an in-depth morphological study of the former Eurytemora affinis complex of cryptic species. Analyzing the published data showed that, in 5 of 24 revisions eligible for systematic review, cryptic species assignment was based solely on the genetic variation of forms without detailed morphological analysis to confirm the assignment. Therefore, some newly described cryptic species might be designated pseudocryptic under more detailed morphological analysis, as happened with the Eurytemora affinis complex. Recent genetic analyses of the complex found high levels of heterogeneity without morphological differences, and the complex was argued to be cryptic. However, subsequent detailed morphological analyses allowed a number of valid species to be described. Our study of this species complex, using in-depth statistical analyses not usually applied in species description, confirmed considerable differences between the former cryptic species. In particular, fluctuating asymmetry (FA), the random variation of left and right structures, was significantly different between forms and provided independent information about their status. Our work showed that multivariate statistical approaches, such as principal component analysis, can be powerful techniques for the morphological discrimination of cryptic taxa.
Despite increasing cryptic species designations, morphological techniques have great potential in determining copepod taxonomy. PMID:26120427
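A common fluctuating-asymmetry index of the kind referred to above can be sketched as follows; the trait measurements are invented toy values, not the copepod data.

```python
# Sketch: a size-corrected fluctuating-asymmetry (FA) index, the mean of
# |L - R| divided by the trait's mean size across individuals.
def fa_index(left, right):
    """FA as mean of |L - R| / ((L + R) / 2) across individuals."""
    return sum(abs(l - r) / ((l + r) / 2) for l, r in zip(left, right)) / len(left)

left = [10.2, 9.8, 10.5, 10.0]    # hypothetical left-side measurements
right = [10.0, 10.1, 10.4, 9.7]   # matching right-side measurements
fa = fa_index(left, right)
```

Comparing such an index between putative forms, alongside multivariate techniques like principal component analysis, provides the kind of independent morphological evidence the study used.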
Lu, Qiongshi; Li, Boyang; Ou, Derek; Erlendsdottir, Margret; Powles, Ryan L; Jiang, Tony; Hu, Yiming; Chang, David; Jin, Chentian; Dai, Wei; He, Qidu; Liu, Zefeng; Mukherjee, Shubhabrata; Crane, Paul K; Zhao, Hongyu
2017-12-07
Despite the success of large-scale genome-wide association studies (GWASs) on complex traits, our understanding of their genetic architecture is far from complete. Jointly modeling multiple traits' genetic profiles has provided insights into the shared genetic basis of many complex traits. However, large-scale inference sets a high bar for both statistical power and biological interpretability. Here we introduce a principled framework to estimate annotation-stratified genetic covariance between traits using GWAS summary statistics. Through theoretical and numerical analyses, we demonstrate that our method provides accurate covariance estimates, thereby enabling researchers to dissect both the shared and distinct genetic architecture across traits to better understand their etiologies. Among 50 complex traits with publicly accessible GWAS summary statistics (N total ≈ 4.5 million), we identified more than 170 pairs with statistically significant genetic covariance. In particular, we found strong genetic covariance between late-onset Alzheimer disease (LOAD) and amyotrophic lateral sclerosis (ALS), two major neurodegenerative diseases, in single-nucleotide polymorphisms (SNPs) with high minor allele frequencies and in SNPs located in the predicted functional genome. Joint analysis of LOAD, ALS, and other traits highlights LOAD's correlation with cognitive traits and hints at an autoimmune component for ALS. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Mathematics and statistics research progress report, period ending June 30, 1983
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beauchamp, J. J.; Denson, M. V.; Heath, M. T.
1983-08-01
This report is the twenty-sixth in the series of progress reports of Mathematics and Statistics Research of the Computer Sciences organization, Union Carbide Corporation Nuclear Division. Part A records research progress in analysis of large data sets, applied analysis, biometrics research, computational statistics, materials science applications, numerical linear algebra, and risk analysis. Collaboration and consulting with others throughout the Oak Ridge Department of Energy complex are recorded in Part B. Included are sections on biological sciences, energy, engineering, environmental sciences, health and safety, and safeguards. Part C summarizes the various educational activities in which the staff was engaged. Part D lists the presentations of research results, and Part E records the staff's other professional activities during the report period.
Phase locking route behind complex periodic windows in a forced oscillator
NASA Astrophysics Data System (ADS)
Jan, Hengtai; Tsai, Kuo-Ting; Kuo, Li-wei
2013-09-01
Chaotic systems have complex reactions to an external driving force; even in cases with low-dimension oscillators, the routes to synchronization are diverse. We proposed a stroboscope-based method for analyzing driven chaotic systems in their phase space. Using two statistical quantities generated from the time series, we could characterize the system state and the driving behavior simultaneously. We demonstrated our method in a driven bi-stable system, which showed complex periodic windows under a proper driving force. With increasing periodic driving force, a route from interior periodic oscillation to phase synchronization through the chaos state could be found. Periodic windows could also be identified and the circumstances under which they occurred distinguished. Statistical results were supported by conditional Lyapunov exponent analysis, demonstrating the method's power in analyzing unknown time series.
Detection of crossover time scales in multifractal detrended fluctuation analysis
NASA Astrophysics Data System (ADS)
Ge, Erjia; Leung, Yee
2013-04-01
Fractal analysis is employed in this paper as a scale-based method for identifying the scaling behavior of time series. Many spatial and temporal processes exhibiting complex multi(mono)-scaling behaviors are fractals. One of the important concepts in fractals is the crossover time scale(s) that separates distinct regimes having different fractal scaling behaviors. A common method is multifractal detrended fluctuation analysis (MF-DFA). The detection of crossover time scale(s) is, however, relatively subjective, since it has been made without rigorous statistical procedures and has generally been determined by eyeballing or subjective observation. Crossover time scales so determined may be spurious and problematic and may not reflect the genuine underlying scaling behavior of a time series. The purpose of this paper is to propose a statistical procedure to model complex fractal scaling behaviors and reliably identify the crossover time scales under MF-DFA. The scaling-identification regression model, grounded on a solid statistical foundation, is first proposed to describe multi-scaling behaviors of fractals. Through the regression analysis and statistical inference, we can (1) identify the crossover time scales that cannot be detected by eyeballing observation, (2) determine the number and locations of the genuine crossover time scales, (3) give confidence intervals for the crossover time scales, and (4) establish the statistically significant regression model depicting the underlying scaling behavior of a time series. To substantiate our argument, the regression model is applied to analyze the multi-scaling behaviors of avian-influenza outbreaks, water consumption, daily mean temperature, and rainfall of Hong Kong. Through the proposed model, we can have a deeper understanding of fractals in general and a statistical approach to identify multi-scaling behavior under MF-DFA in particular.
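The breakpoint-selection idea can be sketched as a two-segment regression that scans candidate crossover scales and keeps the one minimizing the residual sum of squares. The log-log data below are synthetic with a known crossover at log-scale 3.0; this is an illustration of the principle, not the paper's full scaling-identification model.

```python
# Sketch: detect a crossover scale in a log-log fluctuation function by
# fitting two line segments and choosing the breakpoint with minimal SSE.
import numpy as np

log_s = np.linspace(1, 5, 41)
# piecewise-linear "fluctuation function", continuous at log-scale 3.0
log_f = np.where(log_s < 3.0, 0.9 * log_s, 0.5 * log_s + 1.2)

def best_crossover(x, y):
    best_sse, best_x = np.inf, None
    for i in range(3, len(x) - 3):           # candidate breakpoints
        sse = 0.0
        for xs, ys in ((x[:i], y[:i]), (x[i:], y[i:])):
            coef = np.polyfit(xs, ys, 1)     # straight-line fit per segment
            sse += np.sum((ys - np.polyval(coef, xs)) ** 2)
        if sse < best_sse:
            best_sse, best_x = sse, x[i]
    return best_x

crossover = best_crossover(log_s, log_f)     # close to 3.0
```

The paper's contribution goes further, attaching confidence intervals and significance tests to the estimated crossover locations rather than relying on visual inspection.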
Ignjatović, Aleksandra; Stojanović, Miodrag; Milošević, Zoran; Anđelković Apostolović, Marija
2017-12-02
Developing risk models in medicine is not only appealing but also associated with many obstacles in different aspects of predictive model development. Initially, the association of one or more biomarkers with a specific outcome was established by statistical significance, but novel and demanding questions required the development of new and more complex statistical techniques. The progress of statistical analysis in biomedical research can best be observed through the history of the Framingham study and the development of the Framingham score. Evaluation of predictive models rests on a combination of results from several metrics. When logistic regression or Cox proportional hazards regression is used, the calibration test and ROC curve analysis should be mandatory and eliminatory, with the central place taken by newer statistical techniques. To obtain complete information about a new marker in a model, it is now recommended to use reclassification tables, calculating the net reclassification index and the integrated discrimination improvement. Decision curve analysis is a novel method for evaluating the clinical usefulness of a predictive model. It may be noted that customizing and fine-tuning of the Framingham risk score initiated the development of statistical analysis. A clinically applicable predictive model should be a trade-off between all the abovementioned statistical metrics: between calibration and discrimination, accuracy and decision-making, costs and benefits, and the quality and quantity of the patient's life.
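The net reclassification index mentioned above can be computed from reclassification tables in a few lines. The risk categories and outcomes below are toy values chosen for illustration.

```python
# Sketch: categorical NRI for adding a marker to a risk model. Events
# reclassified upward count positively and downward negatively; the
# signs are reversed for non-events.
def nri(old_cat, new_cat, event):
    up_e = down_e = up_ne = down_ne = n_e = n_ne = 0
    for old, new, ev in zip(old_cat, new_cat, event):
        if ev:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_ne += 1
            up_ne += new > old
            down_ne += new < old
    return (up_e - down_e) / n_e + (down_ne - up_ne) / n_ne

old = [0, 1, 1, 2, 0, 2]   # risk category under the old model
new = [1, 2, 1, 1, 0, 2]   # risk category after adding the marker
ev = [1, 1, 0, 0, 0, 1]    # 1 = event occurred
nri_value = nri(old, new, ev)   # -> 1.0 in this toy example
```

A positive NRI indicates that the new marker moves events up and non-events down in risk category more often than the reverse.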
Yao, Hua; Ma, Jinqi
2018-01-01
The present paper investigates the enhancement of the therapeutic effect of Paclitaxel (a potent anticancer drug) by increasing its cellular uptake in cancerous cells, with a subsequent reduction in its cytotoxic effects. To fulfill these goals, Paclitaxel (PTX)-biotinylated PAMAM dendrimer complexes were prepared using a biotinylation method. The primary parameter of biotinylated PAMAM with a terminal NH2 group, the degree of biotinylation, was evaluated using the HABA assay. The basic integrity of the complex was studied using DSC. The drug loading (DL) and drug release (DR) parameters of the biotinylated PAMAM dendrimer-PTX complexes were also examined. A cellular uptake study was performed in OVCAR-3 and HEK293T cells using a fluorescence technique. Statistical analysis was also performed to support the experimental data. The results obtained from the HABA assay showed complete biotinylation of the PAMAM dendrimer. The DSC study confirmed the integrity of the complex as compared with the pure drug, the biotinylated complex and their physical mixture. Batch 9 showed the highest DL (12.09%) and DR (70%) over 72 h as compared to different concentrations of drug and biotinylated complex. The OVCAR-3 (cancerous) cells were characterized by more intensive cellular uptake of the complexes than the HEK293T (normal) cells. The experimental results were supported by the statistical data. The results obtained from both experimental and statistical evaluation confirmed that the biotinylated PAMAM NH2 dendrimer-PTX complex not only displays increased cellular uptake but also enhanced release for up to 72 h with a reduction in cytotoxicity.
Cost-Effectiveness Analysis: a proposal of new reporting standards in statistical analysis
Bang, Heejung; Zhao, Hongwei
2014-01-01
Cost-effectiveness analysis (CEA) is a method for evaluating the outcomes and costs of competing strategies designed to improve health, and has been applied to a variety of different scientific fields. Yet, there are inherent complexities in cost estimation and CEA from statistical perspectives (e.g., skewness, bi-dimensionality, and censoring). The incremental cost-effectiveness ratio that represents the additional cost per one unit of outcome gained by a new strategy has served as the most widely accepted methodology in the CEA. In this article, we call for expanded perspectives and reporting standards reflecting a more comprehensive analysis that can elucidate different aspects of available data. Specifically, we propose that mean and median-based incremental cost-effectiveness ratios and average cost-effectiveness ratios be reported together, along with relevant summary and inferential statistics as complementary measures for informed decision making. PMID:24605979
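The proposal to report mean- and median-based ICERs side by side can be sketched as follows. The cost and effectiveness samples are synthetic, with one deliberately skewed cost to show why the two measures can diverge.

```python
# Sketch: incremental cost-effectiveness ratios computed from both means
# and medians, as complementary summaries. All samples are invented.
import statistics

cost_new, cost_old = [1200, 1500, 5000, 1300], [1000, 1100, 1200, 1050]
eff_new, eff_old = [0.80, 0.85, 0.90, 0.82], [0.70, 0.72, 0.75, 0.71]

def icer(c_new, c_old, e_new, e_old, center):
    """Incremental cost per unit of effectiveness gained."""
    return (center(c_new) - center(c_old)) / (center(e_new) - center(e_old))

icer_mean = icer(cost_new, cost_old, eff_new, eff_old, statistics.mean)
icer_median = icer(cost_new, cost_old, eff_new, eff_old, statistics.median)
```

Because cost data are typically right-skewed, the mean-based ICER here is pulled well above the median-based one; reporting both conveys information a single ratio hides.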
Measuring Circulation Desk Activities Using a Random Alarm Mechanism.
ERIC Educational Resources Information Center
Mosborg, Stella Frank
1980-01-01
Reports a job analysis methodology to gather meaningful data related to circulation desk activity. The technique is designed to give librarians statistical data on actual time expenditures for complex and varying activities. (Author/RAA)
Hoyle, R H
1991-02-01
Indirect measures of psychological constructs are vital to clinical research. On occasion, however, the meaning of indirect measures of psychological constructs is obfuscated by statistical procedures that do not account for the complex relations between items and latent variables and among latent variables. Covariance structure analysis (CSA) is a statistical procedure for testing hypotheses about the relations among items that indirectly measure a psychological construct and relations among psychological constructs. This article introduces clinical researchers to the strengths and limitations of CSA as a statistical procedure for conceiving and testing structural hypotheses that are not tested adequately with other statistical procedures. The article is organized around two empirical examples that illustrate the use of CSA for evaluating measurement models with correlated error terms, higher-order factors, and measured and latent variables.
Statistical benchmark for BosonSampling
NASA Astrophysics Data System (ADS)
Walschaers, Mattia; Kuipers, Jack; Urbina, Juan-Diego; Mayer, Klaus; Tichy, Malte Christopher; Richter, Klaus; Buchleitner, Andreas
2016-03-01
Boson samplers—set-ups that generate complex many-particle output states through the transmission of elementary many-particle input states across a multitude of mutually coupled modes—promise the efficient quantum simulation of a classically intractable computational task, and challenge the extended Church-Turing thesis, one of the fundamental dogmas of computer science. However, as in all experimental quantum simulations of truly complex systems, one crucial problem remains: how to certify that a given experimental measurement record unambiguously results from enforcing the claimed dynamics, on bosons, fermions or distinguishable particles? Here we offer a statistical solution to the certification problem, identifying an unambiguous statistical signature of many-body quantum interference upon transmission across a multimode, random scattering device. We show that statistical analysis of only partial information on the output state allows one to characterise the imparted dynamics through particle type-specific features of the emerging interference patterns. The relevant statistical quantifiers are classically computable, define a falsifiable benchmark for BosonSampling, and reveal distinctive features of many-particle quantum dynamics, which go much beyond mere bunching or anti-bunching effects.
ERIC Educational Resources Information Center
Goldstein, Harvey; Bonnet, Gerard; Rocher, Thierry
2007-01-01
The Programme for International Student Assessment comparative study of reading performance among 15-year-olds is reanalyzed using statistical procedures that allow the full complexity of the data structures to be explored. The article extends existing multilevel factor analysis and structural equation models and shows how this can extract richer…
Zhu, Yun; Fan, Ruzong; Xiong, Momiao
2017-01-01
Investigating the pleiotropic effects of genetic variants can increase statistical power, provide important information to achieve deep understanding of the complex genetic structures of disease, and offer powerful tools for designing effective treatments with fewer side effects. However, the current multiple phenotype association analysis paradigm lacks breadth (number of phenotypes and genetic variants jointly analyzed at the same time) and depth (hierarchical structure of phenotypes and genotypes). A key issue for high dimensional pleiotropic analysis is to effectively extract informative internal representation and features from high dimensional genotype and phenotype data. To explore correlation information of genetic variants, effectively reduce data dimensions, and overcome critical barriers in advancing the development of novel statistical methods and computational algorithms for genetic pleiotropic analysis, we proposed a new statistical method referred to as quadratically regularized functional CCA (QRFCCA) for association analysis, which combines three approaches: (1) quadratically regularized matrix factorization, (2) functional data analysis and (3) canonical correlation analysis (CCA). Large-scale simulations show that the QRFCCA has a much higher power than that of the ten competing statistics while retaining the appropriate type I error rates. To further evaluate performance, the QRFCCA and ten other statistics are applied to the whole genome sequencing dataset from the TwinsUK study. We identify a total of 79 genes with rare variants and 67 genes with common variants significantly associated with the 46 traits using QRFCCA. The results show that the QRFCCA substantially outperforms the ten other statistics. PMID:29040274
Validating an Air Traffic Management Concept of Operation Using Statistical Modeling
NASA Technical Reports Server (NTRS)
He, Yuning; Davies, Misty Dawn
2013-01-01
Validating a concept of operation for a complex, safety-critical system (like the National Airspace System) is challenging because of the high dimensionality of the controllable parameters and the infinite number of states of the system. In this paper, we use statistical modeling techniques to explore the behavior of a conflict detection and resolution algorithm designed for the terminal airspace. These techniques predict the robustness of the system simulation to both nominal and off-nominal behaviors within the overall airspace. They also can be used to evaluate the output of the simulation against recorded airspace data. Additionally, the techniques carry with them a mathematical value of the worth of each prediction: a statistical uncertainty for any robustness estimate. Uncertainty Quantification (UQ) is the process of quantitative characterization and ultimately reduction of uncertainties in complex systems. UQ is important for understanding the influence of uncertainties on the behavior of a system and therefore is valuable for design, analysis, and verification and validation. In this paper, we apply advanced statistical modeling methodologies and techniques to an advanced air traffic management system, namely the Terminal Tactical Separation Assured Flight Environment (T-TSAFE). We show initial results for a parameter analysis and safety boundary (envelope) detection in the high-dimensional parameter space. For our boundary analysis, we developed a new sequential approach based upon the design of computer experiments, allowing us to incorporate knowledge from domain experts into our modeling and to determine the most likely boundary shapes and their parameters. We carried out the analysis on system parameters and describe an initial approach that will allow us to include time-series inputs, such as the radar track data, into the analysis.
Reif, David M.; Israel, Mark A.; Moore, Jason H.
2007-01-01
The biological interpretation of gene expression microarray results is a daunting challenge. For complex diseases such as cancer, wherein the body of published research is extensive, the incorporation of expert knowledge provides a useful analytical framework. We have previously developed the Exploratory Visual Analysis (EVA) software for exploring data analysis results in the context of annotation information about each gene, as well as biologically relevant groups of genes. We present EVA as a flexible combination of statistics and biological annotation that provides a straightforward visual interface for the interpretation of microarray analyses of gene expression in the most commonly occurring class of brain tumors, glioma. We demonstrate the utility of EVA for the biological interpretation of statistical results by analyzing publicly available gene expression profiles of two important glial tumors. The results of a statistical comparison between 21 malignant, high-grade glioblastoma multiforme (GBM) tumors and 19 indolent, low-grade pilocytic astrocytomas were analyzed using EVA. By using EVA to examine the results of a relatively simple statistical analysis, we were able to identify tumor class-specific gene expression patterns having both statistical and biological significance. Our interactive analysis highlighted the potential importance of genes involved in cell cycle progression, proliferation, signaling, adhesion, migration, motility, and structure, as well as candidate gene loci on a region of Chromosome 7 that has been implicated in glioma. Because EVA does not require statistical or computational expertise and has the flexibility to accommodate any type of statistical analysis, we anticipate EVA will prove a useful addition to the repertoire of computational methods used for microarray data analysis. EVA is available at no charge to academic users and can be found at http://www.epistasis.org. PMID:19390666
Success rates of a skeletal anchorage system in orthodontics: A retrospective analysis.
Lam, Raymond; Goonewardene, Mithran S; Allan, Brent P; Sugawara, Junji
2018-01-01
To evaluate the premise that skeletal anchorage with SAS miniplates is highly successful and predictable for a range of complex orthodontic movements. This retrospective cross-sectional analysis consisted of 421 bone plates placed by one clinician in 163 patients (95 female, 68 male, mean age 29.4 ± 12.02 years). Simple descriptive statistics were performed for a wide range of malocclusions and desired movements to obtain success, complication, and failure rates. The success rate of skeletal anchorage system miniplates was 98.6%, although approximately 40% of cases experienced mild complications. The most common complication was soft tissue inflammation, which was amenable to focused oral hygiene and antiseptic rinses. Infection occurred in approximately 15% of patients, in whom there was a statistically significant correlation with poor oral hygiene. The most common movements were distalization and intrusion of teeth. More than a third of the cases involved complex movements in more than one plane of space. The success rate of skeletal anchorage system miniplates is high and predictable for a wide range of complex orthodontic movements.
Reproducible research in vadose zone sciences
USDA-ARS?s Scientific Manuscript database
A significant portion of present-day soil and Earth science research is computational, involving complex data analysis pipelines, advanced mathematical and statistical models, and sophisticated computer codes. Opportunities for scientific progress are greatly diminished if reproducing and building o...
Visual analysis of geocoded twin data puts nature and nurture on the map.
Davis, O S P; Haworth, C M A; Lewis, C M; Plomin, R
2012-09-01
Twin studies allow us to estimate the relative contributions of nature and nurture to human phenotypes by comparing the resemblance of identical and fraternal twins. Variation in complex traits is a balance of genetic and environmental influences; these influences are typically estimated at a population level. However, what if the balance of nature and nurture varies depending on where we grow up? Here we use statistical and visual analysis of geocoded data from over 6700 families to show that genetic and environmental contributions to 45 childhood cognitive and behavioral phenotypes vary geographically in the United Kingdom. This has implications for detecting environmental exposures that may interact with the genetic influences on complex traits, and for the statistical power of samples recruited for genetic association studies. More broadly, our experience demonstrates the potential for collaborative exploratory visualization to act as a lingua franca for large-scale interdisciplinary research.
NASA Technical Reports Server (NTRS)
Djorgovski, George
1993-01-01
The existing and forthcoming data bases from NASA missions contain an abundance of information whose complexity cannot be efficiently tapped with simple statistical techniques. Powerful multivariate statistical methods already exist which can be used to harness much of the richness of these data. Automatic classification techniques have been developed to solve the problem of identifying known types of objects in multiparameter data sets, in addition to leading to the discovery of new physical phenomena and classes of objects. We propose an exploratory study and integration of promising techniques in the development of a general and modular classification/analysis system for very large data bases, which would enhance and optimize data management and the use of human research resources.
Statistical strategy for anisotropic adventitia modelling in IVUS.
Gil, Debora; Hernández, Aura; Rodriguez, Oriol; Mauri, Josepa; Radeva, Petia
2006-06-01
Vessel plaque assessment by analysis of intravascular ultrasound sequences is a useful tool for cardiac disease diagnosis and intervention. Manual detection of luminal (inner) and media-adventitia (external) vessel borders is the main activity of physicians in the process of lumen narrowing (plaque) quantification. The difficulty of defining vessel border descriptors, together with shadows, artifacts, and the blurred signal response caused by the physical properties of ultrasound, hampers automated adventitia segmentation. In order to efficiently approach such a complex problem, we propose blending advanced anisotropic filtering operators and statistical classification techniques into a vessel border modelling strategy. Our systematic statistical analysis shows that the reported adventitia detection achieves an accuracy in the range of interobserver variability regardless of plaque nature, vessel geometry, and incomplete vessel borders.
NASA Technical Reports Server (NTRS)
Djorgovski, Stanislav
1992-01-01
The existing and forthcoming data bases from NASA missions contain an abundance of information whose complexity cannot be efficiently tapped with simple statistical techniques. Powerful multivariate statistical methods already exist which can be used to harness much of the richness of these data. Automatic classification techniques have been developed to solve the problem of identifying known types of objects in multiparameter data sets, in addition to leading to the discovery of new physical phenomena and classes of objects. We propose an exploratory study and integration of promising techniques in the development of a general and modular classification/analysis system for very large data bases, which would enhance and optimize data management and the use of human research resources.
DICON: interactive visual analysis of multidimensional clusters.
Cao, Nan; Gotz, David; Sun, Jimeng; Qu, Huamin
2011-12-01
Clustering as a fundamental data analysis technique has been widely used in many analytic applications. However, it is often difficult for users to understand and evaluate multidimensional clustering results, especially the quality of clusters and their semantics. For large and complex data, high-level statistical information about the clusters is often needed for users to evaluate cluster quality while a detailed display of multidimensional attributes of the data is necessary to understand the meaning of clusters. In this paper, we introduce DICON, an icon-based cluster visualization that embeds statistical information into a multi-attribute display to facilitate cluster interpretation, evaluation, and comparison. We design a treemap-like icon to represent a multidimensional cluster, and the quality of the cluster can be conveniently evaluated with the embedded statistical information. We further develop a novel layout algorithm which can generate similar icons for similar clusters, making comparisons of clusters easier. User interaction and clutter reduction are integrated into the system to help users more effectively analyze and refine clustering results for large datasets. We demonstrate the power of DICON through a user study and a case study in the healthcare domain. Our evaluation shows the benefits of the technique, especially in support of complex multidimensional cluster analysis. © 2011 IEEE
Software for the Integration of Multiomics Experiments in Bioconductor.
Ramos, Marcel; Schiffer, Lucas; Re, Angela; Azhar, Rimsha; Basunia, Azfar; Rodriguez, Carmen; Chan, Tiffany; Chapman, Phil; Davis, Sean R; Gomez-Cabrero, David; Culhane, Aedin C; Haibe-Kains, Benjamin; Hansen, Kasper D; Kodali, Hanish; Louis, Marie S; Mer, Arvind S; Riester, Markus; Morgan, Martin; Carey, Vince; Waldron, Levi
2017-11-01
Multiomics experiments are increasingly commonplace in biomedical research and add layers of complexity to experimental design, data integration, and analysis. R and Bioconductor provide a generic framework for statistical analysis and visualization, as well as specialized data classes for a variety of high-throughput data types, but methods are lacking for integrative analysis of multiomics experiments. The MultiAssayExperiment software package, implemented in R and leveraging Bioconductor software and design principles, provides for the coordinated representation of, storage of, and operation on multiple diverse genomics data. We provide the unrestricted multiple 'omics data for each cancer tissue in The Cancer Genome Atlas as ready-to-analyze MultiAssayExperiment objects and demonstrate in these and other datasets how the software simplifies data representation, statistical analysis, and visualization. The MultiAssayExperiment Bioconductor package reduces major obstacles to efficient, scalable, and reproducible statistical analysis of multiomics data and enhances data science applications of multiple omics datasets. Cancer Res; 77(21); e39-42. ©2017 American Association for Cancer Research (AACR).
DOE Office of Scientific and Technical Information (OSTI.GOV)
McCord, R.A.; Olson, R.J.
1988-01-01
Environmental research and assessment activities at Oak Ridge National Laboratory (ORNL) include the analysis of spatial and temporal patterns of ecosystem response at a landscape scale. Analysis through use of a geographic information system (GIS) involves an interaction between the user and thematic data sets frequently expressed as maps. A portion of GIS analysis has a mathematical or statistical aspect, especially for the analysis of temporal patterns. ARC/INFO is an excellent tool for manipulating GIS data and producing the appropriate map graphics. INFO also has some limited ability to produce statistical tabulation. At ORNL we have extended our capabilities by graphically interfacing ARC/INFO and SAS/GRAPH to provide a combined mapping and statistical graphics environment. With the data management, statistical, and graphics capabilities of SAS added to ARC/INFO, we have expanded the analytical and graphical dimensions of the GIS environment. Pie or bar charts, frequency curves, hydrographs, or scatter plots as produced by SAS can be added to maps from attribute data associated with ARC/INFO coverages. Numerous, small, simplified graphs can also become a source of complex map "symbols." These additions extend the dimensions of GIS graphics to include time, details of the thematic composition, distribution, and interrelationships.
Local kernel nonparametric discriminant analysis for adaptive extraction of complex structures
NASA Astrophysics Data System (ADS)
Li, Quanbao; Wei, Fajie; Zhou, Shenghan
2017-05-01
Linear discriminant analysis (LDA) is one of the most popular methods for linear feature extraction. It usually performs well when the global data structure is consistent with the local data structure. Other frequently used feature extraction approaches typically require linearity, independence, or large-sample conditions. However, in real world applications, these assumptions are not always satisfied or cannot be tested. In this paper, we introduce an adaptive method, local kernel nonparametric discriminant analysis (LKNDA), which integrates conventional discriminant analysis with nonparametric statistics. LKNDA is adept at identifying both complex nonlinear structures and the ad hoc rule. Six simulation cases demonstrate that LKNDA combines the advantages of parametric and nonparametric algorithms and achieves higher classification accuracy. A quartic unilateral kernel function may provide better prediction robustness than other kernels. LKNDA gives an alternative solution for discriminant cases of complex nonlinear feature extraction or unknown feature extraction. Finally, the application of LKNDA to the complex feature extraction of financial market activities is proposed.
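As a hedged illustration of the kernel-density idea behind nonparametric discriminant analysis (this is not the LKNDA algorithm itself, whose details are in the paper), a one-dimensional Bayes classifier built from per-class Gaussian kernel density estimates can be sketched as:

```python
import numpy as np

def kde_discriminant(train_x, train_y, x, bandwidth=0.5):
    """Toy nonparametric discriminant: assign x to the class whose Gaussian
    kernel density estimate, weighted by the class prior, is largest.
    Illustrative sketch only; LKNDA itself uses local kernels and other
    machinery described in the paper."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y)
    scores = {}
    for c in np.unique(train_y):
        pts = train_x[train_y == c]
        # mean Gaussian kernel evaluated at x = KDE of class c at x (up to a constant)
        dens = np.mean(np.exp(-0.5 * ((x - pts) / bandwidth) ** 2))
        scores[c] = dens * len(pts) / len(train_x)   # density times class prior
    return max(scores, key=scores.get)
```

With two well-separated classes, a query point near either cluster center is assigned to that cluster; the bandwidth of 0.5 is an arbitrary illustrative choice.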
NASA Astrophysics Data System (ADS)
Vallianatos, F.; Tzanis, A.; Michas, G.; Papadakis, G.
2012-04-01
Since the middle of summer 2011, an increase in the seismicity rates of the volcanic complex system of Santorini Island, Greece, was observed. In the present work, the temporal distribution of seismicity, as well as the magnitude distribution of earthquakes, have been studied using the concept of Non-Extensive Statistical Physics (NESP; Tsallis, 2009) along with the evolution of Shannon entropy H (also called information entropy). The analysis is based on the earthquake catalogue of the Geodynamic Institute of the National Observatory of Athens for the period July 2011-January 2012 (http://www.gein.noa.gr/). Non-Extensive Statistical Physics, which is a generalization of Boltzmann-Gibbs statistical physics, seems a suitable framework for studying complex systems. The observed distributions of seismicity rates at Santorini can be fitted exceptionally well with NESP models. This implies the inherent complexity of the Santorini volcanic seismicity, the applicability of NESP concepts to volcanic earthquake activity and the usefulness of NESP in investigating phenomena exhibiting multifractality and long-range coupling effects. Acknowledgments. This work was supported in part by the THALES Program of the Ministry of Education of Greece and the European Union in the framework of the project entitled "Integrated understanding of Seismicity, using innovative Methodologies of Fracture mechanics along with Earthquake and non extensive statistical physics - Application to the geodynamic system of the Hellenic Arc. SEISMO FEAR HELLARC". GM and GP wish to acknowledge the partial support of the Greek State Scholarships Foundation (ΙΚΥ).
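The NESP models referred to above are built on the Tsallis q-exponential, which generalizes the ordinary exponential and recovers it as q approaches 1. A minimal sketch of the function (illustrative only, not the authors' fitting code):

```python
import math

def q_exp(x, q):
    """Tsallis q-exponential: [1 + (1 - q) x]^(1 / (1 - q)).

    Reduces to exp(x) in the limit q -> 1, and is defined as 0 where
    the base 1 + (1 - q) x becomes non-positive (the q-exponential cutoff).
    """
    if abs(q - 1.0) < 1e-12:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    return base ** (1.0 / (1.0 - q)) if base > 0 else 0.0
```

Fitting such a curve to inter-event time or magnitude distributions (e.g. with least squares) yields the entropic index q that NESP studies report; the fitting step itself is omitted here.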
Characterization of time series via Rényi complexity-entropy curves
NASA Astrophysics Data System (ADS)
Jauregui, M.; Zunino, L.; Lenzi, E. K.; Mendes, R. S.; Ribeiro, H. V.
2018-05-01
One of the most useful tools for distinguishing between chaotic and stochastic time series is the so-called complexity-entropy causality plane. This diagram involves two complexity measures: the Shannon entropy and the statistical complexity. Recently, this idea has been generalized by considering the Tsallis monoparametric generalization of the Shannon entropy, yielding complexity-entropy curves. These curves have proven to enhance the discrimination among different time series related to stochastic and chaotic processes of numerical and experimental nature. Here we further explore these complexity-entropy curves in the context of the Rényi entropy, which is another monoparametric generalization of the Shannon entropy. By combining the Rényi entropy with the proper generalization of the statistical complexity, we associate a parametric curve (the Rényi complexity-entropy curve) with a given time series. We explore this approach in a series of numerical and experimental applications, demonstrating the usefulness of this new technique for time series analysis. We show that the Rényi complexity-entropy curves enable the differentiation among time series of chaotic, stochastic, and periodic nature. In particular, time series of stochastic nature are associated with curves displaying positive curvature in a neighborhood of their initial points, whereas curves related to chaotic phenomena have a negative curvature; finally, periodic time series are represented by vertical straight lines.
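The two coordinates of the complexity-entropy causality plane discussed here can be computed, in the Shannon case, from the Bandt-Pompe distribution of ordinal patterns. The following is an illustrative sketch of that standard procedure, not code from the paper:

```python
import math
from itertools import permutations

def ordinal_distribution(series, d=3):
    """Bandt-Pompe probability distribution of ordinal patterns
    of embedding dimension d (d! possible patterns)."""
    patterns = {p: 0 for p in permutations(range(d))}
    for i in range(len(series) - d + 1):
        window = series[i:i + d]
        patterns[tuple(sorted(range(d), key=lambda k: window[k]))] += 1
    total = sum(patterns.values())
    return [c / total for c in patterns.values()]

def shannon(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def ch_coordinates(series, d=3):
    """Return (H, C): normalized permutation entropy and Jensen-Shannon
    statistical complexity, i.e. the coordinates of the CH plane."""
    P = ordinal_distribution(series, d)
    n = len(P)                      # number of possible patterns, d!
    U = [1.0 / n] * n               # uniform reference distribution
    H = shannon(P) / math.log(n)    # normalized entropy in [0, 1]
    M = [(p + u) / 2 for p, u in zip(P, U)]
    jsd = shannon(M) - 0.5 * shannon(P) - 0.5 * shannon(U)
    # maximum possible Jensen-Shannon divergence, used for normalization
    q0 = -0.5 * (((n + 1) / n) * math.log(n + 1)
                 - 2 * math.log(2 * n) + math.log(n))
    return H, (jsd / q0) * H
```

A monotone series has a single ordinal pattern, so both coordinates vanish; noisy series move toward high H and low C, which is the lower-right region the solar wind occupies in the study above.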
Use of Statistical Analyses in the Ophthalmic Literature
Lisboa, Renato; Meira-Freitas, Daniel; Tatham, Andrew J.; Marvasti, Amir H.; Sharpsten, Lucie; Medeiros, Felipe A.
2014-01-01
Purpose: To identify the most commonly used statistical analyses in the ophthalmic literature and to determine the likely gain in comprehension of the literature that readers could expect if they were to sequentially add knowledge of more advanced techniques to their statistical repertoire. Design: Cross-sectional study. Methods: All articles published from January 2012 to December 2012 in Ophthalmology, American Journal of Ophthalmology and Archives of Ophthalmology were reviewed. A total of 780 peer-reviewed articles were included. Two reviewers examined each article and assigned categories to each one depending on the type of statistical analyses used. Discrepancies between reviewers were resolved by consensus. Main Outcome Measures: Total number and percentage of articles containing each category of statistical analysis were obtained. Additionally, we estimated the accumulated number and percentage of articles that a reader would be expected to be able to interpret depending on their statistical repertoire. Results: Readers with little or no statistical knowledge would be expected to be able to interpret the statistical methods presented in only 20.8% of articles. In order to understand more than half (51.4%) of the articles published, readers were expected to be familiar with at least 15 different statistical methods. Knowledge of 21 categories of statistical methods was necessary to comprehend 70.9% of articles, while knowledge of more than 29 categories was necessary to comprehend more than 90% of articles. Articles in the retina and glaucoma subspecialties showed a tendency to use more complex analyses when compared to cornea. Conclusions: Readers of clinical journals in ophthalmology need to have substantial knowledge of statistical methodology to understand the results of published studies in the literature. The frequency of use of complex statistical analyses also indicates that those involved in the editorial peer-review process must have sound statistical knowledge in order to critically appraise articles submitted for publication. The results of this study could provide guidance to direct the statistical learning of clinical ophthalmologists, researchers and educators involved in the design of courses for residents and medical students. PMID:24612977
ERIC Educational Resources Information Center
Hsieh, Chueh-An; Maier, Kimberly S.
2009-01-01
The capacity of Bayesian methods in estimating complex statistical models is undeniable. Bayesian data analysis is seen as having a range of advantages, such as an intuitive probabilistic interpretation of the parameters of interest, the efficient incorporation of prior information to empirical data analysis, model averaging and model selection.…
Temporal Comparisons of Internet Topology
2014-06-01
CAIDA Cooperative Association of Internet Data Analysis; CDN Content Delivery Network; CI Confidence Interval; DoS denial of service; GMT Greenwich... the CAIDA data. Our methods include analysis of graph theoretical measures as well as complex network and statistical measures that will quantify the... tool that probes the Internet for topology analysis and performance [26]. Scamper uses network diagnostic tools, such as traceroute and ping, to probe
Application of a data-mining method based on Bayesian networks to lesion-deficit analysis
NASA Technical Reports Server (NTRS)
Herskovits, Edward H.; Gerring, Joan P.
2003-01-01
Although lesion-deficit analysis (LDA) has provided extensive information about structure-function associations in the human brain, LDA has suffered from the difficulties inherent to the analysis of spatial data, i.e., there are many more variables than subjects, and data may be difficult to model using standard distributions, such as the normal distribution. We herein describe a Bayesian method for LDA; this method is based on data-mining techniques that employ Bayesian networks to represent structure-function associations. These methods are computationally tractable, and can represent complex, nonlinear structure-function associations. When applied to the evaluation of data obtained from a study of the psychiatric sequelae of traumatic brain injury in children, this method generates a Bayesian network that demonstrates complex, nonlinear associations among lesions in the left caudate, right globus pallidus, right side of the corpus callosum, right caudate, and left thalamus, and subsequent development of attention-deficit hyperactivity disorder, confirming and extending our previous statistical analysis of these data. Furthermore, analysis of simulated data indicates that methods based on Bayesian networks may be more sensitive and specific for detecting associations among categorical variables than methods based on chi-square and Fisher exact statistics.
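For context, the chi-square and Fisher exact statistics against which the Bayesian-network approach is compared test association in a 2x2 lesion-deficit contingency table. A one-sided Fisher exact test can be sketched with the standard library alone (an illustrative sketch, not the authors' code; the table layout is an assumption):

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]],
    e.g. a = subjects with both lesion and deficit: the hypergeometric
    probability of observing a co-occurrence count of a or greater
    under the null hypothesis of no association."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    p = 0.0
    for k in range(a, min(row1, col1) + 1):
        # P(X = k) for X ~ Hypergeometric(n, col1, row1)
        p += comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)
    return p
```

A strongly associated table such as [[8, 2], [1, 9]] yields a small p-value, while a balanced table does not; the Bayesian-network method in the abstract is claimed to detect associations such tests can miss.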
Analysis and meta-analysis of single-case designs: an introduction.
Shadish, William R
2014-04-01
The last 10 years have seen great progress in the analysis and meta-analysis of single-case designs (SCDs). This special issue includes five articles that provide an overview of current work on that topic, including standardized mean difference statistics, multilevel models, Bayesian statistics, and generalized additive models. Each article analyzes a common example across articles and presents syntax or macros for how to do them. These articles are followed by commentaries from single-case design researchers and journal editors. This introduction briefly describes each article and then discusses several issues that must be addressed before we can know what analyses will eventually be best to use in SCD research. These issues include modeling trend, modeling error covariances, computing standardized effect size estimates, assessing statistical power, incorporating more accurate models of outcome distributions, exploring whether Bayesian statistics can improve estimation given the small samples common in SCDs, and the need for annotated syntax and graphical user interfaces that make complex statistics accessible to SCD researchers. The article then discusses reasons why SCD researchers are likely to incorporate statistical analyses into their research more often in the future, including changing expectations and contingencies regarding SCD research from outside SCD communities, changes and diversity within SCD communities, corrections of erroneous beliefs about the relationship between SCD research and statistics, and demonstrations of how statistics can help SCD researchers better meet their goals. Copyright © 2013 Society for the Study of School Psychology. Published by Elsevier Ltd. All rights reserved.
Habitat Complexity in Aquatic Microcosms Affects Processes Driven by Detritivores
Flores, Lorea; Bailey, R. A.; Elosegi, Arturo; Larrañaga, Aitor; Reiss, Julia
2016-01-01
Habitat complexity can influence predation rates (e.g. by providing refuge) but other ecosystem processes and species interactions might also be modulated by the properties of habitat structure. Here, we focussed on how complexity of artificial habitat (plastic plants), in microcosms, influenced short-term processes driven by three aquatic detritivores. The effects of habitat complexity on leaf decomposition, production of fine organic matter and pH levels were explored by measuring complexity in three ways: 1. as the presence vs. absence of habitat structure; 2. as the amount of structure (3 or 4.5 g of plastic plants); and 3. as the spatial configuration of structures (measured as fractal dimension). The experiment also addressed potential interactions among the consumers by running all possible species combinations. In the experimental microcosms, habitat complexity influenced how species performed, especially when comparing structure present vs. structure absent. Treatments with structure showed higher fine particulate matter production and lower pH compared to treatments without structures and this was probably due to higher digestion and respiration when structures were present. When we explored the effects of the different complexity levels, we found that the amount of structure added explained more than the fractal dimension of the structures. We give a detailed overview of the experimental design, statistical models and R codes, because our statistical analysis can be applied to other study systems (and disciplines such as restoration ecology). We further make suggestions of how to optimise statistical power when artificially assembling, and analysing, ‘habitat complexity’ by not confounding complexity with the amount of structure added. In summary, this study highlights the importance of habitat complexity for energy flow and the maintenance of ecosystem processes in aquatic ecosystems. PMID:27802267
Power-law statistics of neurophysiological processes analyzed using short signals
NASA Astrophysics Data System (ADS)
Pavlova, Olga N.; Runnova, Anastasiya E.; Pavlov, Alexey N.
2018-04-01
We discuss the problem of quantifying power-law statistics of complex processes from short signals. Based on the analysis of electroencephalograms (EEG), we compare three interrelated approaches which enable characterization of the power spectral density (PSD) and show that applying the detrended fluctuation analysis (DFA) or the wavelet-transform modulus maxima (WTMM) method represents a useful way of indirectly characterizing PSD features from short data sets. We conclude that although DFA- and WTMM-based measures can be obtained from the estimated PSD, these tools outperform standard spectral analysis when the analyzed regime must be characterized from a very limited amount of data.
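A minimal sketch of the detrended fluctuation analysis mentioned above (illustrative only; the window sizes and first-order detrending are assumptions, not the authors' settings):

```python
import numpy as np

def dfa_exponent(x, scales=(4, 8, 16, 32, 64)):
    """Estimate the DFA scaling exponent alpha of a 1-D signal.

    The signal is integrated (cumulative sum of deviations from the mean),
    split into non-overlapping windows at each scale s, linearly detrended
    within each window, and the RMS fluctuation F(s) is fit to F ~ s^alpha
    on log-log axes. White noise gives alpha ~ 0.5, a random walk ~ 1.5.
    """
    y = np.cumsum(np.asarray(x, dtype=float) - np.mean(x))
    F = []
    for s in scales:
        n = len(y) // s
        segments = y[:n * s].reshape(n, s)
        t = np.arange(s)
        ms = []
        for seg in segments:
            coef = np.polyfit(t, seg, 1)          # least-squares linear trend
            ms.append(np.mean((seg - np.polyval(coef, t)) ** 2))
        F.append(np.sqrt(np.mean(ms)))
    return np.polyfit(np.log(scales), np.log(F), 1)[0]
```

For power-law processes, alpha relates to the PSD exponent beta via beta = 2*alpha - 1, which is the indirect PSD characterization the abstract refers to.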
Zhu, Li; Bharadwaj, Hari; Xia, Jing; Shinn-Cunningham, Barbara
2013-01-01
Two experiments, both presenting diotic, harmonic tone complexes (100 Hz fundamental), were conducted to explore the envelope-related component of the frequency-following response (FFRENV), a measure of synchronous, subcortical neural activity evoked by a periodic acoustic input. Experiment 1 directly compared two common analysis methods, computing the magnitude spectrum and the phase-locking value (PLV). Bootstrapping identified which FFRENV frequency components were statistically above the noise floor for each metric and quantified the statistical power of the approaches. Across listeners and conditions, the two methods produced highly correlated results. However, PLV analysis required fewer processing stages to produce readily interpretable results. Moreover, at the fundamental frequency of the input, PLVs were farther above the metric's noise floor than spectral magnitudes. Having established the advantages of PLV analysis, the efficacy of the approach was further demonstrated by investigating how different acoustic frequencies contribute to FFRENV, analyzing responses to complex tones composed of different acoustic harmonics of 100 Hz (Experiment 2). Results show that the FFRENV response is dominated by peripheral auditory channels responding to unresolved harmonics, although low-frequency channels driven by resolved harmonics also contribute. These results demonstrate the utility of the PLV for quantifying the strength of FFRENV across conditions. PMID:23862815
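The phase-locking value analysis favored in Experiment 1 can be sketched as follows; the simulation parameters in the usage note are assumptions for illustration, not the study's stimulus settings:

```python
import numpy as np

def plv_spectrum(trials, fs):
    """Phase-locking value (PLV) across trials for each FFT frequency bin.

    trials: (n_trials, n_samples) array of epoched responses.
    Each trial contributes a unit phase vector per bin; the PLV is the
    magnitude of their mean: 1 = perfect phase locking across trials,
    values near 1/sqrt(n_trials) = no consistent phase (noise floor).
    """
    trials = np.asarray(trials, dtype=float)
    spec = np.fft.rfft(trials, axis=1)
    unit = spec / np.maximum(np.abs(spec), 1e-30)  # keep phase, drop magnitude
    plv = np.abs(unit.mean(axis=0))
    freqs = np.fft.rfftfreq(trials.shape[1], d=1.0 / fs)
    return freqs, plv
```

With, say, 50 simulated one-second trials at 1 kHz containing a phase-locked 100 Hz tone in noise, the PLV peaks near 1 at the 100 Hz bin and stays near the noise floor elsewhere, mirroring how FFRENV components were identified relative to a bootstrap noise floor.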
Finite Element Analysis of Reverberation Chambers
NASA Technical Reports Server (NTRS)
Bunting, Charles F.; Nguyen, Duc T.
2000-01-01
The primary motivating factor behind the initiation of this work was to provide a deterministic means of establishing the validity of the statistical methods that are recommended for the determination of fields that interact in an avionics system. The application of finite element analysis to reverberation chambers is the initial step required to establish a reasonable course of inquiry in this particularly data-intensive study. The use of computational electromagnetics provides a high degree of control of the "experimental" parameters that can be utilized in a simulation of reverberating structures. As the work evolved, there were four primary focus areas: 1. The eigenvalue problem for the source-free case. 2. The development of a complex efficient eigensolver. 3. The application of a source for the TE and TM fields for statistical characterization. 4. The examination of shielding effectiveness in a reverberating environment. One early purpose of this work was to establish the utility of finite element techniques in the development of an extended low frequency statistical model for reverberation phenomena. By employing finite element techniques, structures of arbitrary complexity can be analyzed due to the use of triangular shape functions in the spatial discretization. The effects of both frequency stirring and mechanical stirring are presented. It is suggested that for low frequency operation the typical tuner size is inadequate to provide a sufficiently random field and that frequency stirring should be used. The results of the finite element analysis of the reverberation chamber illustrate the potential utility of a 2D representation for enhancing the basic statistical characteristics of the chamber when operating in a low frequency regime. The basic field statistics are verified for frequency stirring over a wide range of frequencies. Mechanical stirring is shown to provide an effective frequency deviation.
Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti
2016-07-01
A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Code is available at https://github.com/aalto-ics-kepaco Contact: anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J.; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T.; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti
2016-01-01
Motivation: A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. Results: We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Availability and implementation: Code is available at https://github.com/aalto-ics-kepaco Contacts: anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153689
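The canonical correlation analysis that metaCCA extends can be sketched for the ordinary individual-level case (an illustrative sketch of classical CCA, not the metaCCA summary-statistics algorithm):

```python
import numpy as np

def first_canonical_correlation(X, Y):
    """Largest canonical correlation between two multivariate data sets.

    Rows of X and Y are shared samples; columns are variables (e.g.
    genotypes in X, phenotypes in Y). After column-centering, a QR
    decomposition gives an orthonormal (whitened) basis for each block,
    and the canonical correlations are the singular values of Qx^T Qy.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0]
```

metaCCA's contribution is recovering these correlations when only univariate summary statistics, not the rows of X and Y, are available, with covariance shrinkage for robustness.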
NASA Astrophysics Data System (ADS)
Forootan, Ehsan; Kusche, Jürgen
2016-04-01
Geodetic/geophysical observations, such as time series of global terrestrial water storage change or of sea level and temperature change, represent samples of physical processes and therefore contain information about complex physical interactions with many inherent time scales. Extracting relevant information from these samples, for example quantifying the seasonality of a physical process or its variability due to large-scale ocean-atmosphere interactions, is not possible with simple time series approaches. In recent decades, decomposition techniques have attracted increasing interest for extracting patterns from geophysical observations. Traditionally, principal component analysis (PCA) and, more recently, independent component analysis (ICA) are common techniques for extracting statistically orthogonal (uncorrelated) and independent modes that represent the maximum variance of observations, respectively. PCA and ICA can be classified as stationary signal decomposition techniques, since they are based on decomposing the auto-covariance matrix or on diagonalizing higher (than two)-order statistical tensors of centered time series. However, the stationarity assumption is not justifiable for many geophysical and climate variables, even after removing cyclic components such as the seasonal cycles. In this paper, we present a new decomposition method, complex independent component analysis (CICA, Forootan, PhD-2014), which can be applied to extract non-stationary (changing in space and time) patterns from geophysical time series. Here, CICA is derived as an extension of real-valued ICA (Forootan and Kusche, JoG-2012), where we (i) define a new complex data set using a Hilbert transformation; the complex time series contain the observed values in their real part and the temporal rate of variability in their imaginary part. (ii) An ICA algorithm based on diagonalization of fourth-order cumulants is then applied to decompose the new complex data set in (i).
(iii) Dominant non-stationary patterns are recognized as independent complex patterns that can be used to represent amplitude and phase propagation in space and time. We present results of CICA for simulated and real cases, e.g., for quantifying the impact of large-scale ocean-atmosphere interaction on global mass changes. Forootan (PhD-2014) Statistical signal decomposition techniques for analyzing time-variable satellite gravimetry data, PhD Thesis, University of Bonn, http://hss.ulb.uni-bonn.de/2014/3766/3766.htm; Forootan and Kusche (JoG-2012) Separation of global time-variable gravity signals into maximally independent components, Journal of Geodesy 86(7), 477-497, doi:10.1007/s00190-011-0532-5.
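Step (i) of the CICA construction can be sketched directly. The analytic signal produced by a Hilbert transform has the centered observations in its real part and a 90-degree phase-shifted copy in its imaginary part; the function name below is a hypothetical illustration, not the authors' code, and it assumes `scipy` is available.

```python
import numpy as np
from scipy.signal import hilbert

def complexify(series):
    """Step (i) of CICA (sketch): augment a real-valued, centered time
    series with its Hilbert transform, so the real part holds the observed
    values and the imaginary part a phase-shifted rate-of-variability term."""
    centered = series - series.mean()
    return hilbert(centered)  # analytic signal: centered + 1j * H(centered)

# Example: for a pure cosine over full periods, the imaginary part is the
# matching sine.
t = np.linspace(0, 1, 1024, endpoint=False)
z = complexify(np.cos(2 * np.pi * 8 * t))
```

The resulting complex matrix (one such series per location) would then be passed to a complex ICA based on fourth-order cumulants, as the abstract describes.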
Classical Statistics and Statistical Learning in Imaging Neuroscience
Bzdok, Danilo
2017-01-01
Brain-imaging research has predominantly generated insight by means of classical statistics, including regression-type analyses and null-hypothesis testing using the t-test and ANOVA. In recent years, statistical learning methods have enjoyed increasing popularity, especially for applications to rich and complex data, including cross-validated out-of-sample prediction using pattern classification and sparsity-inducing regression. This concept paper discusses the implications of inferential justifications and algorithmic methodologies in common data analysis scenarios in neuroimaging. It retraces how classical statistics and statistical learning originated from different historical contexts, build on different theoretical foundations, make different assumptions, and evaluate different outcome metrics to permit differently nuanced conclusions. The present considerations should help reduce current confusion between model-driven classical hypothesis testing and data-driven learning algorithms for investigating the brain with imaging techniques. PMID:29056896
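The contrast the paper draws can be made concrete on simulated data (this toy example is mine, not the paper's): a null-hypothesis test asks whether group means differ, while cross-validated out-of-sample prediction asks how well unseen subjects can be classified. The nearest-centroid classifier is hand-rolled to keep the sketch dependency-free.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
# Simulated scalar "voxel" values for two groups of subjects.
a = rng.normal(0.0, 1.0, 100)
b = rng.normal(1.5, 1.0, 100)

# Classical statistics: null-hypothesis test on the group means.
t_stat, p_value = ttest_ind(a, b)

# Statistical learning: leave-one-out cross-validated prediction with a
# nearest-centroid classifier.
x = np.concatenate([a, b])
y = np.concatenate([np.zeros(100), np.ones(100)])
correct = 0
for i in range(len(x)):
    mask = np.arange(len(x)) != i          # hold out subject i
    c0 = x[mask][y[mask] == 0].mean()
    c1 = x[mask][y[mask] == 1].mean()
    pred = 0 if abs(x[i] - c0) < abs(x[i] - c1) else 1
    correct += pred == y[i]
accuracy = correct / len(x)
```

The two numbers answer different questions: a tiny p-value does not by itself imply high out-of-sample accuracy, which is the paper's central point.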
Data mining and statistical inference in selective laser melting
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kamath, Chandrika
2016-01-11
Selective laser melting (SLM) is an additive manufacturing process that builds a complex three-dimensional part, layer by layer, using a laser beam to fuse fine metal powder together. The design freedom afforded by SLM comes associated with complexity. As the physical phenomena occur over a broad range of length and time scales, the computational cost of modeling the process is high. At the same time, the large number of parameters that control the quality of a part makes experiments expensive. In this paper, we describe ways in which we can use data mining and statistical inference techniques to intelligently combine simulations and experiments to build parts with desired properties. We start with a brief summary of prior work in finding process parameters for high-density parts. We then expand on this work to show how we can improve the approach by using feature selection techniques to identify important variables, data-driven surrogate models to reduce computational costs, improved sampling techniques to cover the design space adequately, and uncertainty analysis for statistical inference. Our results indicate that techniques from data mining and statistics can complement those from physical modeling to provide greater insight into complex processes such as selective laser melting.
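The "improved sampling plus data-driven surrogate" workflow can be sketched in miniature. Everything below is illustrative: the response function is a made-up linear stand-in for an expensive SLM simulation, not a physical model, and the design uses `scipy.stats.qmc` (assumes SciPy ≥ 1.7).

```python
import numpy as np
from scipy.stats import qmc

# Toy stand-in for an expensive SLM simulation: part "density" as a
# function of normalized laser power and scan speed (purely illustrative).
def simulate_density(power, speed):
    return 99.0 + 0.8 * power - 0.5 * speed

# Space-filling Latin hypercube design over the normalized process window.
sampler = qmc.LatinHypercube(d=2, seed=1)
X = sampler.random(n=32)                 # columns: power, speed in [0, 1]
y = simulate_density(X[:, 0], X[:, 1])

# Cheap linear surrogate fitted by least squares; in practice one would use
# a richer model and cross-validate it.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ coef
```

Because the toy response is exactly linear, the surrogate recovers it; the point is the pattern (design, run, fit, reuse), not this particular model.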
Managing Complexity in Evidence Analysis: A Worked Example in Pediatric Weight Management.
Parrott, James Scott; Henry, Beverly; Thompson, Kyle L; Ziegler, Jane; Handu, Deepa
2018-05-02
Nutrition interventions are often complex and multicomponent. Typical approaches to meta-analyses that focus on individual causal relationships to provide guideline recommendations are not sufficient to capture this complexity. The objective of this study is to describe the method of meta-analysis used for the Pediatric Weight Management (PWM) Guidelines update and provide a worked example that can be applied in other areas of dietetics practice. The effects of PWM interventions were examined for body mass index (BMI), body mass index z-score (BMIZ), and waist circumference at four different time periods. For intervention-level effects, intervention types were identified empirically using multiple correspondence analysis paired with cluster analysis. Pooled effects of identified types were examined using random effects meta-analysis models. Differences in effects among types were examined using meta-regression. Context-level effects were examined using qualitative comparative analysis. Three distinct types (or families) of PWM interventions were identified: medical nutrition, behavioral, and missing components. Medical nutrition and behavioral types showed statistically significant improvements in BMIZ across all time points. Results were less consistent for BMI and waist circumference, although four distinct patterns of weight status change were identified. These varied by intervention type as well as outcome measure. Meta-regression indicated statistically significant differences between the medical nutrition and behavioral types vs the missing component type for both BMIZ and BMI, although the pattern varied by time period and intervention type. Qualitative comparative analysis identified distinct configurations of context characteristics at each time point that were consistent with positive outcomes among the intervention types.
Although analysis of individual causal relationships is invaluable, this approach is inadequate to capture the complexity of dietetics practice. An alternative approach that integrates intervention-level with context-level meta-analyses may provide deeper understanding in the development of practice guidelines. Copyright © 2018 Academy of Nutrition and Dietetics. Published by Elsevier Inc. All rights reserved.
Ji, Jun; Ling, Jeffrey; Jiang, Helen; Wen, Qiaojun; Whitin, John C; Tian, Lu; Cohen, Harvey J; Ling, Xuefeng B
2013-03-23
Mass spectrometry (MS) has evolved to become the primary high-throughput tool for proteomics-based biomarker discovery. Multiple challenges in protein MS data analysis remain: management of large and complex data sets; MS peak identification and indexing; and high-dimensional differential analysis of peaks with false discovery rate (FDR) control across concurrent statistical tests. "Turnkey" solutions are needed for biomarker investigations to rapidly process MS data sets and identify statistically significant peaks for subsequent validation. Here we present an efficient and effective solution, which provides experimental biologists easy access to "cloud" computing capabilities to analyze MS data. The web portal can be accessed at http://transmed.stanford.edu/ssa/. The web application supports online uploading and analysis of large-scale MS data through a simple user interface. This bioinformatic tool will facilitate the discovery of potential protein biomarkers using MS.
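The FDR control mentioned for the concurrent peak tests is conventionally done with the Benjamini-Hochberg step-up procedure; a minimal sketch of that standard procedure follows (it is not claimed to be the portal's actual implementation).

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of hypotheses rejected at FDR level q using the
    standard Benjamini-Hochberg step-up procedure."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m      # i*q/m for sorted p_(i)
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])          # largest i with p_(i) <= i*q/m
        reject[order[: k + 1]] = True             # reject all smaller p-values too
    return reject

mask = benjamini_hochberg([0.01, 0.5, 0.03, 0.02], q=0.05)
```

For these four p-values the thresholds are 0.0125, 0.025, 0.0375, 0.05, so the three smallest p-values are rejected and 0.5 is not.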
Compositional Solution Space Quantification for Probabilistic Software Analysis
NASA Technical Reports Server (NTRS)
Borges, Mateus; Pasareanu, Corina S.; Filieri, Antonio; d'Amorim, Marcelo; Visser, Willem
2014-01-01
Probabilistic software analysis aims at quantifying how likely a target event is to occur during program execution. Current approaches rely on symbolic execution to identify the conditions to reach the target event and try to quantify the fraction of the input domain satisfying these conditions. Precise quantification is usually limited to linear constraints, while only approximate solutions can be provided in general through statistical approaches. However, statistical approaches may fail to converge to an acceptable accuracy within a reasonable time. We present a compositional statistical approach for the efficient quantification of solution spaces for arbitrarily complex constraints over bounded floating-point domains. The approach leverages interval constraint propagation to improve the accuracy of the estimation by focusing the sampling on the regions of the input domain containing the sought solutions. Preliminary experiments show significant improvement on previous approaches both in results accuracy and analysis time.
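The core idea, using interval bounds to settle as much of the domain as possible and focusing sampling on the undecided remainder, can be illustrated on a toy constraint (my example, not the paper's benchmark): estimate the fraction of the unit box satisfying x² + y² < 1, whose true value is π/4. Here "interval constraint propagation" reduces to monotone bound evaluation on each box.

```python
import numpy as np

rng = np.random.default_rng(2)
k = 16                          # k x k grid of boxes over [0,1]^2
cell = 1.0 / k
volume = 0.0
for i in range(k):
    for j in range(k):
        x0, x1 = i * cell, (i + 1) * cell
        y0, y1 = j * cell, (j + 1) * cell
        lo = x0**2 + y0**2      # interval lower bound of x^2 + y^2 on the box
        hi = x1**2 + y1**2      # interval upper bound
        if hi < 1.0:            # box certainly satisfies: count its full area
            volume += cell * cell
        elif lo >= 1.0:         # box certainly violates: contributes nothing
            continue
        else:                   # undecided boundary box: estimate by sampling
            xs = rng.uniform(x0, x1, 2000)
            ys = rng.uniform(y0, y1, 2000)
            volume += cell * cell * np.mean(xs**2 + ys**2 < 1.0)
estimate = volume
```

Only the thin layer of boundary boxes is sampled, so the variance of the estimate is far smaller than for plain Monte Carlo with the same budget.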
D'Suze, Gina; Sandoval, Moisés; Sevcik, Carlos
2015-12-15
A characteristic of venom elution patterns, shared with many other complex systems, is that many of their features cannot be properly described with statistical or Euclidean concepts. The understanding of such systems became possible with Mandelbrot's fractal analysis. Venom elution patterns were produced using reversed-phase high-performance liquid chromatography (HPLC) with 1 mg of venom. One reason for the lack of quantitative analyses of the sources of venom variability is the difficulty of parametrizing the complexity of venom chromatograms. We quantify this complexity by means of an algorithm that estimates the contortedness (Q) of a waveform. Fractal analysis was used to compare venoms and to measure inter- and intra-specific venom variability. We studied variations in venom complexity associated with gender, seasonal and environmental factors, duration of captivity in the laboratory, and the technique used to milk venom. Copyright © 2015 Elsevier Ltd. All rights reserved.
Analysis of Immune Complex Structure by Statistical Mechanics and Light Scattering Techniques.
NASA Astrophysics Data System (ADS)
Busch, Nathan Adams
1995-01-01
The size and structure of immune complexes determine their behavior in the immune system. The chemical physics of complex formation is not well understood; this is due in part to inadequate characterization of the proteins involved, and in part to the lack of sufficiently well developed theoretical techniques. Understanding complex formation will permit rational design of strategies for inhibiting tissue deposition of the complexes. A statistical mechanical model of the proteins based upon the theory of associating fluids was developed. The multipole electrostatic potential for each protein used in this study was characterized for net protein charge, dipole moment magnitude, and dipole moment direction. The binding sites between the model antigen and antibodies were characterized for their net surface area, energy, and position relative to the dipole moment of the protein. The equilibrium binding graphs generated with the protein statistical mechanical model compare favorably with experimental data obtained from radioimmunoassay results. The isothermal compressibility predicted by the model agrees with results obtained from dynamic light scattering. The statistical mechanics model was used to investigate association between the model antigen and selected pairs of antibodies. It was found that, in accordance with expectations from thermodynamic arguments, the highest total binding energy yielded complex distributions skewed to higher complex size. From examination of the simulated formation of ring structures from linear chain complexes, and from the joint shape probability surfaces, it was found that ring configurations were formed by the "folding" of linear chains until the ends are within binding distance. By comparing single antigen/two antibody systems which differ only in their respective binding site locations, it was found that binding site location influences complex size and shape distributions only when ring formation occurs.
The internal potential energy of a ring complex is considerably less than that of the non-associating system; the ring complexes are therefore quite stable and show no evidence of breaking up and collapsing into smaller complexes. Ring formation will occur only in systems where the total free energy of each complex can be minimized. Thus, rings will form, even though entropically unfavorable conformations result, if the total free energy can be minimized by doing so.
On the Measurement of Morphology and Its Change.
1982-03-01
…that transitions within the ceratopsian dinosaurs necessitated overly complex or impossible grid deformations. THETA-RHO ANALYSIS: an alternative… 1982, A robust comparison of the three-dimensional configurations of protein molecules. Tech. Rpt. 224, Ser. 2, Dept. of Statistics, Princeton Univ., 18 p.
Physics-based statistical learning approach to mesoscopic model selection.
Taverniers, Søren; Haut, Terry S; Barros, Kipton; Alexander, Francis J; Lookman, Turab
2015-11-01
In materials science and many other research areas, models are frequently inferred without considering their generalization to unseen data. We apply statistical learning using cross-validation to obtain an optimally predictive coarse-grained description of a two-dimensional kinetic nearest-neighbor Ising model with Glauber dynamics (GD) based on the stochastic Ginzburg-Landau equation (sGLE). The latter is learned from GD "training" data using a log-likelihood analysis, and its predictive ability for various model complexities is tested on GD "test" data independent of the data used for training. Using two different error metrics, we perform a detailed analysis of the error between magnetization time trajectories simulated using the learned sGLE coarse-grained description and those obtained using the GD model. We show that both for equilibrium and out-of-equilibrium GD training trajectories, the standard phenomenological description using a quartic free energy does not always yield the most predictive coarse-grained model. Moreover, increasing the amount of training data can shift the optimal model complexity to higher values. Our results are promising in that they pave the way for the use of statistical learning as a general tool for materials modeling and discovery.
DnaSAM: Software to perform neutrality testing for large datasets with complex null models.
Eckert, Andrew J; Liechty, John D; Tearse, Brandon R; Pande, Barnaly; Neale, David B
2010-05-01
Patterns of DNA sequence polymorphisms can be used to understand the processes of demography and adaptation within natural populations. High-throughput generation of DNA sequence data has historically been the bottleneck with respect to data processing and experimental inference. Advances in marker technologies have largely solved this problem. Currently, the limiting step is computational, with most molecular population genetic software allowing only gene-by-gene analysis through a graphical user interface. An easy-to-use analysis program that allows both high-throughput processing of multiple sequence alignments and the flexibility to simulate data under complex demographic scenarios is currently lacking. We introduce a new program, named DnaSAM, which allows high-throughput estimation of DNA sequence diversity and neutrality statistics from experimental data, along with the ability to test those statistics via Monte Carlo coalescent simulations. These simulations are conducted using the ms program, which is able to incorporate several genetic parameters (e.g. recombination) and demographic scenarios (e.g. population bottlenecks). The output is a set of diversity and neutrality statistics, with associated probability values under a user-specified null model, stored in an easy-to-manipulate text file. © 2009 Blackwell Publishing Ltd.
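The testing step, comparing an observed statistic against coalescent-simulated null values, amounts to a standard Monte Carlo p-value. The sketch below stubs the simulated values with a small array rather than invoking ms itself, and uses the usual +1 correction so the p-value is never exactly zero.

```python
import numpy as np

def mc_pvalue(observed, simulated, tail="lower"):
    """Empirical Monte Carlo p-value of an observed statistic (e.g.
    Tajima's D) against null-model simulations."""
    sims = np.asarray(simulated, dtype=float)
    if tail == "lower":
        hits = np.sum(sims <= observed)
    else:
        hits = np.sum(sims >= observed)
    return (hits + 1) / (len(sims) + 1)

# Stub for statistics that would come from ms coalescent simulations.
null_stats = np.array([-1.2, -0.4, 0.1, 0.3, 0.8, 1.5, 1.9, 2.2, 2.4])
p = mc_pvalue(-0.5, null_stats, tail="lower")
```

With nine simulated values and one of them at or below the observed value, the lower-tail p-value is (1 + 1) / (9 + 1) = 0.2.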
Valid Statistical Analysis for Logistic Regression with Multiple Sources
NASA Astrophysics Data System (ADS)
Fienberg, Stephen E.; Nardi, Yuval; Slavković, Aleksandra B.
Considerable effort has gone into understanding issues of privacy protection of individual information in single databases, and various solutions have been proposed depending on the nature of the data, the ways in which the database will be used, and the precise nature of the privacy protection being offered. Once data are merged across sources, however, the problem becomes far more complex, and a number of privacy issues arise for the linked individual files that go well beyond those considered for the data within individual sources. In this paper, we propose an approach that provides a full statistical analysis of the combined database without actually combining it. We focus mainly on logistic regression, but the method and tools described may be applied to essentially any other statistical model as well.
Rare-Variant Association Analysis: Study Designs and Statistical Tests
Lee, Seunggeung; Abecasis, Gonçalo R.; Boehnke, Michael; Lin, Xihong
2014-01-01
Despite the extensive discovery of trait- and disease-associated common variants, much of the genetic contribution to complex traits remains unexplained. Rare variants can explain additional disease risk or trait variability. An increasing number of studies are underway to identify trait- and disease-associated rare variants. In this review, we provide an overview of statistical issues in rare-variant association studies with a focus on study designs and statistical tests. We present the design and analysis pipeline of rare-variant studies and review cost-effective sequencing designs and genotyping platforms. We compare various gene- or region-based association tests, including burden tests, variance-component tests, and combined omnibus tests, in terms of their assumptions and performance. Also discussed are the related topics of meta-analysis, population-stratification adjustment, genotype imputation, follow-up studies, and heritability due to rare variants. We provide guidelines for analysis and discuss some of the challenges inherent in these studies and future research directions. PMID:24995866
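Of the test families reviewed, the burden test is the simplest to sketch: collapse an individual's rare-variant genotypes into a single (optionally weighted) score and regress the phenotype on it. The simulation below is illustrative only; real analyses would use more careful weights and covariate adjustment.

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(3)
n, m = 500, 20                          # individuals, rare variants
maf = rng.uniform(0.005, 0.02, m)       # rare minor-allele frequencies
G = rng.binomial(2, maf, size=(n, m))   # genotype matrix (allele counts)

# Burden score: weighted sum of rare alleles carried by each individual.
weights = np.ones(m)
burden = G @ weights

# Simulated quantitative trait with a true burden effect of 0.8.
y = 0.8 * burden + rng.normal(0.0, 1.0, n)

# The burden test is then a single regression of the trait on the score.
fit = linregress(burden, y)
```

Because all variants are collapsed into one regressor, a burden test is powerful when effects share a direction but loses power when effects are mixed, which is exactly the gap variance-component tests such as SKAT address.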
Probabilistic Graphical Model Representation in Phylogenetics
Höhna, Sebastian; Heath, Tracy A.; Boussau, Bastien; Landis, Michael J.; Ronquist, Fredrik; Huelsenbeck, John P.
2014-01-01
Recent years have seen a rapid expansion of the model space explored in statistical phylogenetics, emphasizing the need for new approaches to statistical model representation and software development. Clear communication and representation of the chosen model is crucial for: (i) reproducibility of an analysis, (ii) model development, and (iii) software design. Moreover, a unified, clear and understandable framework for model representation lowers the barrier for beginners and nonspecialists to grasp complex phylogenetic models, including their assumptions and parameter/variable dependencies. Graphical modeling is a unifying framework that has gained in popularity in the statistical literature in recent years. The core idea is to break complex models into conditionally independent distributions. The strength lies in the comprehensibility, flexibility, and adaptability of this formalism, and the large body of computational work based on it. Graphical models are well-suited to teach statistical models, to facilitate communication among phylogeneticists and in the development of generic software for simulation and statistical inference. Here, we provide an introduction to graphical models for phylogeneticists and extend the standard graphical model representation to the realm of phylogenetics. We introduce a new graphical model component, tree plates, to capture the changing structure of the subgraph corresponding to a phylogenetic tree. We describe a range of phylogenetic models using the graphical model framework and introduce modules to simplify the representation of standard components in large and complex models. Phylogenetic model graphs can be readily used in simulation, maximum likelihood inference, and Bayesian inference using, for example, Metropolis–Hastings or Gibbs sampling of the posterior distribution. [Computation; graphical models; inference; modularization; statistical phylogenetics; tree plate.] PMID:24951559
Temporal scaling and spatial statistical analyses of groundwater level fluctuations
NASA Astrophysics Data System (ADS)
Sun, H.; Yuan, L., Sr.; Zhang, Y.
2017-12-01
Natural dynamics such as groundwater level fluctuations can exhibit multifractionality and/or multifractality, likely due to multi-scale aquifer heterogeneity and controlling factors, whose statistics require efficient quantification methods. This study explores multifractionality and non-Gaussian properties in groundwater dynamics, expressed by time series of daily level fluctuations at three wells in the lower Mississippi valley, after removing the seasonal cycle, using temporal scaling and spatial statistical analyses. First, using time-scale multifractional analysis, a systematic statistical method is developed to analyze groundwater level fluctuations quantified by the time-scale local Hurst exponent (TS-LHE). Results show that the TS-LHE does not remain constant, implying fractal-scaling behavior that changes with time and location. Hence, we can distinguish the potentially location-dependent scaling feature, which may characterize the hydrologic dynamic system. Second, spatial statistical analysis shows that the increment of groundwater level fluctuations exhibits a heavy-tailed, non-Gaussian distribution, which can be better quantified by a Lévy stable distribution. Monte Carlo simulations of the fluctuation process also show that the linear fractional stable motion model can depict the transient dynamics (i.e., the fractal non-Gaussian property) of groundwater levels well, while fractional Brownian motion is inadequate to describe natural processes with anomalous dynamics. Analysis of temporal scaling and spatial statistics may therefore provide useful information and quantification for further understanding the nature of complex dynamics in hydrology.
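A simpler, global cousin of the TS-LHE (this sketch is not the paper's time-scale local estimator) is to estimate a single Hurst exponent from the scaling of increment standard deviations, std(x[t+k] − x[t]) ∝ k^H, fitted on a log-log scale. For simulated Brownian motion the estimate should be close to H = 0.5.

```python
import numpy as np

def hurst_from_increments(x, lags=range(1, 33)):
    """Global Hurst exponent estimate from std(x[t+k] - x[t]) ~ k**H,
    via a log-log least-squares fit over the given lags."""
    lags = np.asarray(list(lags))
    sds = np.array([np.std(x[k:] - x[:-k]) for k in lags])
    slope, _ = np.polyfit(np.log(lags), np.log(sds), 1)
    return slope

rng = np.random.default_rng(4)
bm = np.cumsum(rng.normal(0.0, 1.0, 20000))  # Brownian motion: H = 0.5
H = hurst_from_increments(bm)
```

The multifractional analyses in the paper generalize this by letting the exponent vary with time and scale rather than fitting one global slope.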
Statistical Analysis of the First Passage Path Ensemble of Jump Processes
NASA Astrophysics Data System (ADS)
von Kleist, Max; Schütte, Christof; Zhang, Wei
2018-02-01
The transition mechanism of jump processes between two different subsets in state space reveals important dynamical information of the processes and therefore has attracted considerable attention in the past years. In this paper, we study the first passage path ensemble of both discrete-time and continuous-time jump processes on a finite state space. The main approach is to divide each first passage path into nonreactive and reactive segments and to study them separately. The analysis can be applied to jump processes which are non-ergodic, as well as continuous-time jump processes where the waiting time distributions are non-exponential. In the particular case that the jump processes are both Markovian and ergodic, our analysis elucidates the relations between the study of the first passage paths and the study of the transition paths in transition path theory. We provide algorithms to numerically compute statistics of the first passage path ensemble. The computational complexity of these algorithms scales with the complexity of solving a linear system, for which efficient methods are available. Several examples demonstrate the wide applicability of the derived results across research areas.
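The remark that the computational cost scales with solving a linear system can be illustrated with the classic special case of mean first-passage times (a textbook computation, not the paper's full ensemble statistics): for a Markov chain with target (absorbing) set A, the expected hitting times from the transient states solve (I − Q)t = 1, where Q is the transient-to-transient transition block.

```python
import numpy as np

# Discrete-time symmetric random walk on {0, 1, 2, 3, 4}, absorbed at
# the target set {0, 4}. Q is the transition block among states 1, 2, 3.
Q = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.0]])

# Mean first-passage times from states 1, 2, 3 solve (I - Q) t = 1.
t = np.linalg.solve(np.eye(3) - Q, np.ones(3))
```

The result matches the known closed form t_i = i(L − i) for this walk with L = 4, namely [3, 4, 3].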
Narayanan, Roshni; Nugent, Rebecca; Nugent, Kenneth
2015-10-01
Accreditation Council for Graduate Medical Education guidelines require internal medicine residents to develop skills in the interpretation of medical literature and to understand the principles of research. A necessary component is the ability to understand the statistical methods used and their results, material that is not an in-depth focus of most medical school curricula and residency programs. Given the breadth and depth of the current medical literature and an increasing emphasis on complex, sophisticated statistical analyses, the statistical foundation and education necessary for residents are uncertain. We reviewed the statistical methods and terms used in 49 articles discussed at the journal club in the Department of Internal Medicine residency program at Texas Tech University between January 1, 2013 and June 30, 2013. We collected information on the study type and on the statistical methods used for summarizing and comparing samples, determining the relations between independent variables and dependent variables, and estimating models. We then identified the typical statistics education level at which each term or method is learned. A total of 14 articles came from the Journal of the American Medical Association Internal Medicine, 11 from the New England Journal of Medicine, 6 from the Annals of Internal Medicine, 5 from the Journal of the American Medical Association, and 13 from other journals. Twenty reported randomized controlled trials. Summary statistics included mean values (39 articles), category counts (38), and medians (28). Group comparisons were based on t tests (14 articles), χ2 tests (21), and nonparametric ranking tests (10). The relations between dependent and independent variables were analyzed with simple regression (6 articles), multivariate regression (11), and logistic regression (8). Nine studies reported odds ratios with 95% confidence intervals, and seven analyzed test performance using sensitivity and specificity calculations. 
These papers used 128 statistical terms and context-defined concepts, including some from data analysis (56), epidemiology-biostatistics (31), modeling (24), data collection (12), and meta-analysis (5). Ten different software programs were used in these articles. Based on usual undergraduate and graduate statistics curricula, 64.3% of the concepts and methods used in these papers required at least a master's degree-level statistics education. The interpretation of the current medical literature can require an extensive background in statistical methods at an education level exceeding the material and resources provided to most medical students and residents. Given the complexity and time pressure of medical education, these deficiencies will be hard to correct, but this project can serve as a basis for developing a curriculum in study design and statistical methods needed by physicians-in-training.
A simple and fast representation space for classifying complex time series
NASA Astrophysics Data System (ADS)
Zunino, Luciano; Olivares, Felipe; Bariviera, Aurelio F.; Rosso, Osvaldo A.
2017-03-01
In the context of time series analysis, considerable effort has been directed towards the implementation of efficient discriminating statistical quantifiers. Very recently, a simple and fast representation space has been introduced, namely the number of turning points versus the Abbe value. It is able to separate time series from stationary and non-stationary processes with long-range dependence. In this work we show that this bidimensional approach is useful for distinguishing complex time series: different sets of financial and physiological data are efficiently discriminated. Additionally, a multiscale generalization that takes into account the multiple time scales often involved in complex systems is proposed. This multiscale analysis is essential for reaching a higher discriminative power between physiological time series in health and disease.
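Both coordinates of this representation space are cheap to compute. The sketch below assumes the common normalization of the Abbe value as the mean squared successive difference over twice the variance (≈ 1 for white noise, near 0 for smooth trends); the exact normalization used by the authors may differ.

```python
import numpy as np

def turning_points(x):
    """Count interior points where the series changes direction."""
    d = np.diff(x)
    return int(np.sum(d[:-1] * d[1:] < 0))

def abbe_value(x):
    """Mean squared successive difference over twice the variance:
    about 1 for white noise, near 0 for smooth trends."""
    x = np.asarray(x, dtype=float)
    return np.mean(np.diff(x) ** 2) / (2.0 * np.var(x))

trend = np.arange(10.0)          # smooth: no turning points, small Abbe value
zigzag = np.tile([0.0, 1.0], 5)  # maximally rough: every interior point turns
```

A time series is then placed in the plane at (Abbe value, number of turning points), and different dynamical classes occupy different regions.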
Recurrence Density Enhanced Complex Networks for Nonlinear Time Series Analysis
NASA Astrophysics Data System (ADS)
Costa, Diego G. De B.; Reis, Barbara M. Da F.; Zou, Yong; Quiles, Marcos G.; Macau, Elbert E. N.
We introduce a new method, entitled Recurrence Density Enhanced Complex Network (RDE-CN), to analyze nonlinear time series. Our method first transforms a recurrence plot into a figure with a reduced number of points that preserves the main and fundamental recurrence properties of the original plot. This resulting figure is then reinterpreted as a complex network, which is further characterized by network statistical measures. We illustrate the computational power of the RDE-CN approach with time series from both the logistic map and experimental fluid flows, showing that our method distinguishes different dynamics as well as traditional recurrence analysis does. The proposed methodology therefore characterizes the recurrence matrix adequately while using a reduced set of points from the original recurrence plots.
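The first step shared by recurrence-network methods (this sketch shows only that common step, not the RDE-CN point-reduction itself) is to threshold pairwise distances between states into a recurrence matrix and reinterpret it as an adjacency matrix, whose degree statistics then characterize the dynamics.

```python
import numpy as np

def recurrence_network(x, eps):
    """Build an unweighted network from a scalar time series: nodes i and j
    are linked when |x[i] - x[j]| < eps (self-loops removed)."""
    x = np.asarray(x, dtype=float)
    R = (np.abs(x[:, None] - x[None, :]) < eps).astype(int)
    np.fill_diagonal(R, 0)     # the trivial i == j recurrences are dropped
    return R

t = np.arange(200)
x = np.sin(2 * np.pi * t / 50.0)   # periodic signal: states recur each period
A = recurrence_network(x, eps=0.05)
degrees = A.sum(axis=1)
density = A.sum() / (len(x) * (len(x) - 1))
```

For longer or vector-valued series one would embed the states first and use a vector norm, but the thresholding idea is identical.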
Socioscape: Real-Time Analysis of Dynamic Heterogeneous Networks In Complex Socio-Cultural Systems
2015-10-22
Cluster Mixed-Membership Blockmodel for Time-Evolving Networks, Proceedings of the 14th International Conference on Artificial Intelligence and…Learning With Simultaneous Orthogonal Matching Pursuit, Proceedings of the 13th International Conference on Artificial Intelligence and Statistics
NASA Astrophysics Data System (ADS)
Aoyama, Hideaki; Fujiwara, Yoshi; Ikeda, Yuichi; Iyetomi, Hiroshi; Souma, Wataru; Yoshikawa, Hiroshi
2017-07-01
Preface; Foreword; Acknowledgements; List of tables; List of figures; Prologue; 1. Introduction: reconstructing macroeconomics; 2. Basic concepts in statistical physics and stochastic models; 3. Income and firm-size distributions; 4. Productivity distribution and related topics; 5. Multivariate time-series analysis; 6. Business cycles; 7. Price dynamics and inflation/deflation; 8. Complex network, community analysis, visualization; 9. Systemic risks; Appendix A: Computer program for beginners; Epilogue; Bibliography; Index.
Methods of Information Geometry to model complex shapes
NASA Astrophysics Data System (ADS)
De Sanctis, A.; Gattone, S. A.
2016-09-01
In this paper, a new statistical method to model patterns emerging in complex systems is proposed. A framework for shape analysis of 2-dimensional landmark data is introduced, in which each landmark is represented by a bivariate Gaussian distribution. From Information Geometry we know that the Fisher-Rao metric endows the statistical manifold of parameters of a family of probability distributions with a Riemannian metric. This approach therefore allows us to reconstruct the intermediate steps in the evolution between observed shapes by computing the geodesic, with respect to the Fisher-Rao metric, between the corresponding distributions. Furthermore, the geodesic path can be used for shape prediction. As an application, we study the evolution of the rat skull shape. A future application in ophthalmology is introduced.
Teo, Guoshou; Kim, Sinae; Tsou, Chih-Chiang; Collins, Ben; Gingras, Anne-Claude; Nesvizhskii, Alexey I; Choi, Hyungwon
2015-11-03
Data independent acquisition (DIA) mass spectrometry is an emerging technique that offers more complete detection and quantification of peptides and proteins across multiple samples. DIA allows fragment-level quantification, which can be considered as repeated measurements of the abundance of the corresponding peptides and proteins in the downstream statistical analysis. However, few statistical approaches are available for aggregating these complex fragment-level data into peptide- or protein-level statistical summaries. In this work, we describe a software package, mapDIA, for statistical analysis of differential protein expression using DIA fragment-level intensities. The workflow consists of three major steps: intensity normalization, peptide/fragment selection, and statistical analysis. First, mapDIA offers normalization of fragment-level intensities by total intensity sums as well as a novel alternative normalization by local intensity sums in retention time space. Second, mapDIA removes outlier observations and selects peptides/fragments that preserve the major quantitative patterns across all samples for each protein. Last, using the selected fragments and peptides, mapDIA performs model-based statistical significance analysis of protein-level differential expression between specified groups of samples. Using a comprehensive set of simulation datasets, we show that mapDIA detects differentially expressed proteins with accurate control of the false discovery rates. We also describe the analysis procedure in detail using two recently published DIA datasets generated for the 14-3-3β dynamic interaction network and the prostate cancer glycoproteome. The software was written in C++, and the source code is freely available through the SourceForge website (http://sourceforge.net/projects/mapdia/). This article is part of a Special Issue entitled: Computational Proteomics. Copyright © 2015 Elsevier B.V. All rights reserved.
Technical Note: The Initial Stages of Statistical Data Analysis
Tandy, Richard D.
1998-01-01
Objective: To provide an overview of several important data-related considerations in the design stage of a research project and to review the levels of measurement and their relationship to the statistical technique chosen for the data analysis. Background: When planning a study, the researcher must clearly define the research problem and narrow it down to specific, testable questions. The next steps are to identify the variables in the study, decide how to group and treat subjects, and determine how to measure the dependent variables and their underlying level of measurement. Then the appropriate statistical technique can be selected for data analysis. Description: The four levels of measurement in increasing complexity are nominal, ordinal, interval, and ratio. Nominal data are categorical or “count” data, and the numbers are treated as labels. Ordinal data can be ranked in a meaningful order by magnitude. Interval data possess the characteristics of ordinal data and also have equal distances between levels. Ratio data additionally have a natural zero point. Nominal and ordinal data are analyzed with nonparametric statistical techniques, and interval and ratio data with parametric statistical techniques. Advantages: Understanding the four levels of measurement and when it is appropriate to use each is important in determining which statistical technique to use when analyzing data. PMID:16558489
Semantic Annotation of Complex Text Structures in Problem Reports
NASA Technical Reports Server (NTRS)
Malin, Jane T.; Throop, David R.; Fleming, Land D.
2011-01-01
Text analysis is important for effective information retrieval from databases where the critical information is embedded in text fields. Aerospace safety depends on effective retrieval of relevant and related problem reports for the purpose of trend analysis. The complex text syntax in problem descriptions has limited statistical text mining of problem reports. The presentation describes an intelligent tagging approach that applies syntactic and then semantic analysis to overcome this problem. The tags identify types of problems and equipment that are embedded in the text descriptions. The power of these tags is illustrated in a faceted searching and browsing interface for problem report trending that combines automatically generated tags with database code fields and temporal information.
Use of Multiscale Entropy to Facilitate Artifact Detection in Electroencephalographic Signals
Mariani, Sara; Borges, Ana F. T.; Henriques, Teresa; Goldberger, Ary L.; Costa, Madalena D.
2016-01-01
Electroencephalographic (EEG) signals present a myriad of challenges to analysis, beginning with the detection of artifacts. Prior approaches to noise detection have utilized multiple techniques, including visual methods, independent component analysis, and wavelets. However, no single method is broadly accepted, inviting alternative ways to address this problem. Here, we introduce a novel approach based on a statistical physics method, multiscale entropy (MSE) analysis, which quantifies the complexity of a signal. We postulate that noise-corrupted EEG signals have lower information content and, therefore, reduced complexity compared with their noise-free counterparts. We test the new method on an open-access database of EEG signals with and without added artifacts due to electrode motion. PMID:26738116
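MSE combines coarse-graining with sample entropy; a compact illustrative sketch of both steps follows (not the authors' implementation; in practice the tolerance r is usually set to about 0.2 times the signal's standard deviation):

```python
import math

def coarse_grain(x, scale):
    """Step 1 of MSE: non-overlapping window averages at the given scale."""
    return [
        sum(x[i * scale:(i + 1) * scale]) / scale
        for i in range(len(x) // scale)
    ]

def sample_entropy(x, m=2, r=0.2):
    """Step 2 of MSE: SampEn, the negative log of the conditional probability
    that sequences matching for m points (within tolerance r) also match
    for m + 1 points."""
    n = len(x)
    def matches(k):
        # Count template pairs (over a fixed index range) matching for k points.
        c = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if max(abs(x[i + t] - x[j + t]) for t in range(k)) <= r:
                    c += 1
        return c
    b, a = matches(m), matches(m + 1)
    return -math.log(a / b) if a > 0 and b > 0 else float("inf")

# A constant (fully predictable) signal has zero entropy at every scale,
# consistent with the paper's premise that low complexity means low information.
flat = [1.0] * 40
print(sample_entropy(coarse_grain(flat, 2)))
```

Plotting SampEn against scale gives the MSE curve used to separate clean from artifact-laden epochs.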
A functional U-statistic method for association analysis of sequencing data.
Jadhav, Sneha; Tong, Xiaoran; Lu, Qing
2017-11-01
Although sequencing studies hold great promise for uncovering novel variants predisposing to human diseases, the high dimensionality of sequencing data brings tremendous challenges to data analysis. Moreover, for many complex diseases (e.g., psychiatric disorders) multiple related phenotypes are collected. These phenotypes can be different measurements of an underlying disease, or measurements characterizing multiple related diseases for studying common genetic mechanisms. Although jointly analyzing these phenotypes could potentially increase the power of identifying disease-associated genes, the different types of phenotypes pose challenges for association analysis. To address these challenges, we propose a nonparametric method, the functional U-statistic method (FU), for multivariate analysis of sequencing data. It first constructs smooth functions from individuals' sequencing data, and then tests the association of these functions with multiple phenotypes by using a U-statistic. The method provides a general framework for analyzing various types of phenotypes (e.g., binary and continuous phenotypes) with unknown distributions. Fitting the genetic variants within a gene using a smoothing function also allows us to capture complexities of gene structure (e.g., linkage disequilibrium, LD), which could potentially increase the power of association analysis. Through simulations, we compared our method to the multivariate outcome score test (MOST) and found that our test attained better performance. In a real data application, we applied our method to sequencing data from the Minnesota Twin Study (MTS) and found potential associations of several nicotine receptor subunit (CHRN) genes, including CHRNB3, with nicotine dependence and/or alcohol dependence. © 2017 WILEY PERIODICALS, INC.
Barua, Nabanita; Sitaraman, Chitra; Goel, Sonu; Chakraborti, Chandana; Mukherjee, Sonai; Parashar, Hemandra
2016-01-01
Context: Analysis of the diagnostic ability of the macular ganglion cell complex and retinal nerve fiber layer (RNFL) in glaucoma. Aim: To correlate functional and structural parameters and compare the predictive value of each of the structural parameters using Fourier-domain (FD) optical coherence tomography (OCT) among primary open angle glaucoma (POAG) and ocular hypertension (OHT) versus a normal population. Setting and Design: Single-center, cross-sectional study done in 234 eyes. Materials and Methods: Patients were enrolled in three groups: POAG, ocular hypertensive, and normal (40 patients in each group). After comprehensive ophthalmological examination, patients underwent standard automated perimetry and FD-OCT scans in optic nerve head and ganglion cell modes. The relationship was assessed by correlating ganglion cell complex (GCC) parameters with mean deviation. Results were compared with RNFL parameters. Statistical Analysis: Data were analyzed with SPSS, using analysis of variance, t-tests, Pearson's coefficient, and receiver operating curves. Results: All parameters showed strong correlation with visual field (P < 0.001). Inferior GCC had the highest area under the curve (AUC) for detecting glaucoma (0.827) in POAG versus the normal population. However, the difference was not statistically significant (P > 0.5) when compared with other parameters. None of the parameters showed significant diagnostic capability to distinguish OHT from the normal population. In diagnosing early glaucoma versus the OHT and normal populations, only inferior GCC had a statistically significant AUC value (0.715). Conclusion: In this study, GCC and RNFL parameters showed equal predictive capability in the perimetric versus normal group. In the early stage, inferior GCC was the best parameter. In the OHT population, single-day cross-sectional imaging was not valuable. PMID:27221682
Dalgin, Rebecca Spirito; Dalgin, M Halim; Metzger, Scott J
2018-05-01
This article focuses on the impact of a peer-run warm line as part of the psychiatric recovery process. It utilized data including the Recovery Assessment Scale (RAS), community integration measures, and crisis service usage. Longitudinal statistical analysis was completed on 48 sets of data from 2011, 2012, and 2013. Although no statistically significant differences were observed for the RAS score, community integration data showed increases in visits to primary care doctors, leisure/recreation activities, and socialization with others. This study highlights the complexity of psychiatric recovery and suggests that nonclinical peer services like peer-run warm lines may be critical to the process.
Evaluating Cellular Polyfunctionality with a Novel Polyfunctionality Index
Larsen, Martin; Sauce, Delphine; Arnaud, Laurent; Fastenackels, Solène; Appay, Victor; Gorochov, Guy
2012-01-01
Functional evaluation of naturally occurring or vaccination-induced T cell responses in mice, men, and monkeys has in recent years advanced from single-parameter (e.g., IFN-γ secretion) to much more complex multidimensional measurements. Co-secretion of multiple functional molecules (such as cytokines and chemokines) at the single-cell level is now measurable, due primarily to major advances in multiparametric flow cytometry. The very extensive and complex datasets generated by this technology raise the demand for proper analytical tools that enable the analysis of combinatorial functional properties of T cells, hence polyfunctionality. Presently, multidimensional functional measures are analysed either by evaluating all combinations of parameters individually or by summing frequencies of combinations that include the same number of simultaneous functions. Often these evaluations are visualized as pie charts. Whereas pie charts effectively represent and compare average polyfunctionality profiles of particular T cell subsets or patient groups, they do not document the degree or variation of polyfunctionality within a group, nor do they allow more sophisticated statistical analysis. Here we propose a novel polyfunctionality index that numerically evaluates the degree and variation of polyfunctionality, and enables comparative and correlative parametric and non-parametric statistical tests. Moreover, it allows the usage of more advanced statistical approaches, such as cluster analysis. We believe that the polyfunctionality index will render polyfunctionality an appropriate end-point measure in future studies of T cell responsiveness. PMID:22860124
Use of Multivariate Linkage Analysis for Dissection of a Complex Cognitive Trait
Marlow, Angela J.; Fisher, Simon E.; Francks, Clyde; MacPhie, I. Laurence; Cherny, Stacey S.; Richardson, Alex J.; Talcott, Joel B.; Stein, John F.; Monaco, Anthony P.; Cardon, Lon R.
2003-01-01
Replication of linkage results for complex traits has been exceedingly difficult, owing in part to the inability to measure the precise underlying phenotype, small sample sizes, genetic heterogeneity, and statistical methods employed in analysis. Often, in any particular study, multiple correlated traits have been collected, yet these have been analyzed independently or, at most, in bivariate analyses. Theoretical arguments suggest that full multivariate analysis of all available traits should offer more power to detect linkage; however, this has not yet been evaluated on a genomewide scale. Here, we conduct multivariate genomewide analyses of quantitative-trait loci that influence reading- and language-related measures in families affected with developmental dyslexia. The results of these analyses are substantially clearer than those of previous univariate analyses of the same data set, helping to resolve a number of key issues. These outcomes highlight the relevance of multivariate analysis for complex disorders for dissection of linkage results in correlated traits. The approach employed here may aid positional cloning of susceptibility genes in a wide spectrum of complex traits. PMID:12587094
Complex Network Analysis for Characterizing Global Value Chains in Equipment Manufacturing.
Xiao, Hao; Sun, Tianyang; Meng, Bo; Cheng, Lihong
2017-01-01
The rise of global value chains (GVCs), characterized by so-called "outsourcing", "fragmentation of production", and "trade in tasks", has been considered one of the most important phenomena of 21st-century trade. GVCs can also play a decisive role in trade policy making. However, due to the increasing complexity and sophistication of international production networks, especially in the equipment manufacturing industry, conventional trade statistics and the corresponding trade indicators may give us a distorted picture of trade. This paper applies various network analysis tools to the new GVC accounting system proposed by Koopman et al. (2014) and Wang et al. (2013), in which gross exports can be decomposed into value-added terms through various routes along GVCs. This helps to divide the equipment manufacturing-related GVCs into sub-networks with clear visualization. The empirical results of this paper significantly improve our understanding of the topology of equipment manufacturing-related GVCs as well as the interdependency of countries in these GVCs, which is generally invisible from traditional trade statistics.
Meta-analysis of gene-level associations for rare variants based on single-variant statistics.
Hu, Yi-Juan; Berndt, Sonja I; Gustafsson, Stefan; Ganna, Andrea; Hirschhorn, Joel; North, Kari E; Ingelsson, Erik; Lin, Dan-Yu
2013-08-08
Meta-analysis of genome-wide association studies (GWASs) has led to the discoveries of many common variants associated with complex human diseases. There is a growing recognition that identifying "causal" rare variants also requires large-scale meta-analysis. The fact that association tests with rare variants are performed at the gene level rather than at the variant level poses unprecedented challenges in the meta-analysis. First, different studies may adopt different gene-level tests, so the results are not compatible. Second, gene-level tests require multivariate statistics (i.e., components of the test statistic and their covariance matrix), which are difficult to obtain. To overcome these challenges, we propose to perform gene-level tests for rare variants by combining the results of single-variant analysis (i.e., p values of association tests and effect estimates) from participating studies. This simple strategy is possible because of an insight that multivariate statistics can be recovered from single-variant statistics, together with the correlation matrix of the single-variant test statistics, which can be estimated from one of the participating studies or from a publicly available database. We show both theoretically and numerically that the proposed meta-analysis approach provides accurate control of the type I error and is as powerful as joint analysis of individual participant data. This approach accommodates any disease phenotype and any study design and produces all commonly used gene-level tests. An application to the GWAS summary results of the Genetic Investigation of ANthropometric Traits (GIANT) consortium reveals rare and low-frequency variants associated with human height. The relevant software is freely available. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
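The key insight, that a multivariate gene-level statistic can be recovered from single-variant statistics plus their correlation matrix, can be sketched for a simple burden-style test (a hedged illustration, not the authors' exact procedure; function names are hypothetical):

```python
import math

def burden_meta(z, R, w=None):
    """Gene-level burden statistic from single-variant z-scores:
    T = w'z with Var(T) = w'Rw, where R is the correlation matrix of the
    single-variant test statistics (estimable from one participating study
    or a public reference panel). Returns the standardized statistic."""
    p = len(z)
    w = w or [1.0] * p
    t = sum(wi * zi for wi, zi in zip(w, z))
    var = sum(w[i] * R[i][j] * w[j] for i in range(p) for j in range(p))
    return t / math.sqrt(var)

def two_sided_p(zstat):
    """Two-sided normal p-value: erfc(|z| / sqrt(2)) = 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(zstat) / math.sqrt(2))

# Two uncorrelated variants (R = I), each with a single-variant z-score of 2:
z = [2.0, 2.0]
R = [[1.0, 0.0], [0.0, 1.0]]
stat = burden_meta(z, R)
print(round(stat, 3), round(two_sided_p(stat), 4))  # 2.828 0.0047
```

Variance-component tests (e.g. SKAT-type) follow the same recipe but combine the same ingredients into a quadratic form rather than a linear one.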
NASA Astrophysics Data System (ADS)
Lee, An-Sheng; Lu, Wei-Li; Huang, Jyh-Jaan; Chang, Queenie; Wei, Kuo-Yen; Lin, Chin-Jung; Liou, Sofia Ya Hsuan
2016-04-01
Owing to Taiwan's geology and climate, its rivers generally carry large loads of suspended particles. After these particles settle, they become sediments, which are good sorbents for heavy metals in river systems. Consequently, sediments record the contamination footprint in low-flow-energy regions such as estuaries. Seven sediment cores were collected along the Nankan River, northern Taiwan, which is seriously contaminated by industrial, household, and agricultural inputs. Physico-chemical properties of these cores were derived from an Itrax XRF core scanner and grain-size analysis. In order to interpret these complex data matrices, multivariate statistical techniques (cluster analysis, factor analysis, and discriminant analysis) were applied in this study. The statistical results indicate four types of sediment. One of them represents a contamination event, showing high concentrations of Cu, Zn, Pb, Ni, and Fe, and low concentrations of Si and Zr. Furthermore, three possible contamination sources for this type of sediment were revealed by factor analysis. The combination of sediment analysis and multivariate statistical techniques provides new insights into the contamination depositional history of the Nankan River and could be similarly applied to other river systems to determine the scale of anthropogenic contamination.
Vernon, Ian; Liu, Junli; Goldstein, Michael; Rowe, James; Topping, Jen; Lindsey, Keith
2018-01-02
Many mathematical models have now been employed across every area of systems biology. These models increasingly involve large numbers of unknown parameters, have complex structure which can result in substantial evaluation time relative to the needs of the analysis, and need to be compared to observed data of various forms. The correct analysis of such models usually requires a global parameter search, over a high dimensional parameter space, that incorporates and respects the most important sources of uncertainty. This can be an extremely difficult task, but it is essential for any meaningful inference or prediction to be made about any biological system. It hence represents a fundamental challenge for the whole of systems biology. Bayesian statistical methodology for the uncertainty analysis of complex models is introduced, designed to address this high dimensional global parameter search problem. Bayesian emulators that mimic the systems biology model, but which are extremely fast to evaluate, are embedded within an iterative history match: an efficient method to search high dimensional spaces within a more formal statistical setting, while incorporating major sources of uncertainty. The approach is demonstrated via application to a model of hormonal crosstalk in Arabidopsis root development, which has 32 rate parameters, for which we identify the sets of rate parameter values that lead to acceptable matches between model output and observed trend data. The multiple insights into the model's structure that this analysis provides are discussed. The methodology is applied to a second related model, and the biological consequences of the resulting comparison, including the evaluation of gene functions, are described. Bayesian uncertainty analysis for complex models using both emulators and history matching is shown to be a powerful technique that can greatly aid the study of a large class of systems biology models. It both provides insight into model behaviour and identifies the sets of rate parameters of interest.
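The workhorse of history matching (not spelled out in the abstract) is the implausibility measure, which pools the emulator, observation-error, and model-discrepancy variances; a minimal sketch for a univariate output follows (assumed form, standard in the history-matching literature):

```python
import math

def implausibility(emul_mean, emul_var, z_obs, var_obs=0.0, var_disc=0.0):
    """History-matching implausibility at a parameter point x:
    I(x) = |z - E[f(x)]| / sqrt(Var_emulator + Var_observation + Var_discrepancy).
    The emulator supplies emul_mean = E[f(x)] and emul_var cheaply, without
    running the full systems biology model."""
    return abs(z_obs - emul_mean) / math.sqrt(emul_var + var_obs + var_disc)

# Points with I(x) above a cutoff (commonly 3) are ruled out; the search then
# iterates ("waves") over the surviving, non-implausible region.
I = implausibility(emul_mean=10.0, emul_var=1.0, z_obs=16.0,
                   var_obs=2.0, var_disc=1.0)
print(I)  # 3.0
```

Each wave refits the emulators on the reduced region, which is what makes a 32-parameter global search tractable.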
Visual wetness perception based on image color statistics.
Sawayama, Masataka; Adelson, Edward H; Nishida, Shin'ya
2017-05-01
Color vision provides humans and animals with the abilities to discriminate colors based on the wavelength composition of light and to determine the location and identity of objects of interest in cluttered scenes (e.g., ripe fruit among foliage). However, we argue that color vision can inform us about much more than color alone. Since a trichromatic image carries more information about the optical properties of a scene than a monochromatic image does, color can help us recognize complex material qualities. Here we show that human vision uses color statistics of an image for the perception of an ecologically important surface condition (i.e., wetness). Psychophysical experiments showed that overall enhancement of chromatic saturation, combined with a luminance tone change that increases the darkness and glossiness of the image, tended to make dry scenes look wetter. Theoretical analysis along with image analysis of real objects indicated that our image transformation, which we call the wetness enhancing transformation, is consistent with actual optical changes produced by surface wetting. Furthermore, we found that the wetness enhancing transformation operator was more effective for the images with many colors (large hue entropy) than for those with few colors (small hue entropy). The hue entropy may be used to separate surface wetness from other surface states having similar optical properties. While surface wetness and surface color might seem to be independent, there are higher order color statistics that can influence wetness judgments, in accord with the ecological statistics. The present findings indicate that the visual system uses color image statistics in an elegant way to help estimate the complex physical status of a scene.
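A rough sketch of what such a "wetness enhancing transformation" could look like, assuming simple HSV saturation scaling plus a gamma-type tone curve that darkens the luminance (illustrative only; the paper's actual operator may differ):

```python
import colorsys

def wet_transform(rgb, sat_gain=1.5, tone_gamma=1.8):
    """Hypothetical wetness-enhancing transform: boost chromatic saturation
    and darken/steepen the tone curve, mimicking the optical effect of a
    thin water film on a surface. rgb components are floats in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    s = min(1.0, s * sat_gain)   # overall enhancement of chromatic saturation
    v = v ** tone_gamma          # darker, glossier luminance tone mapping
    return colorsys.hsv_to_rgb(h, s, v)

dry = (0.6, 0.5, 0.3)            # a desaturated, light "dry" pixel
wet = wet_transform(dry)         # saturation rises, luminance drops
```

The paper's hue-entropy observation suggests the effect is strongest for scenes with many distinct hues, which a per-pixel transform like this one leaves unchanged.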
A clinical research analytics toolkit for cohort study.
Yu, Yiqin; Zhu, Yu; Sun, Xingzhi; Tao, Ying; Zhang, Shuo; Xu, Linhao; Pan, Yue
2012-01-01
This paper presents a clinical informatics toolkit that can assist physicians to conduct cohort studies effectively and efficiently. The toolkit has three key features: 1) support of procedures defined in epidemiology, 2) recommendation of statistical methods in data analysis, and 3) automatic generation of research reports. On one hand, our system can help physicians control research quality by leveraging the integrated knowledge of epidemiology and medical statistics; on the other hand, it can improve productivity by reducing the complexities for physicians during their cohort studies.
GLIMMPSE Lite: Calculating Power and Sample Size on Smartphone Devices
Munjal, Aarti; Sakhadeo, Uttara R.; Muller, Keith E.; Glueck, Deborah H.; Kreidler, Sarah M.
2014-01-01
Researchers seeking to develop complex statistical applications for mobile devices face a common set of difficult implementation issues. In this work, we discuss general solutions to the design challenges. We demonstrate the utility of the solutions for a free mobile application designed to provide power and sample size calculations for univariate, one-way analysis of variance (ANOVA), GLIMMPSE Lite. Our design decisions provide a guide for other scientists seeking to produce statistical software for mobile platforms. PMID:25541688
Mathematics and Statistics Research Department progress report, period ending June 30, 1982
DOE Office of Scientific and Technical Information (OSTI.GOV)
Denson, M.V.; Funderlic, R.E.; Gosslee, D.G.
1982-08-01
This report is the twenty-fifth in the series of progress reports of the Mathematics and Statistics Research Department of the Computer Sciences Division, Union Carbide Corporation Nuclear Division (UCC-ND). Part A records research progress in analysis of large data sets, biometrics research, computational statistics, materials science applications, moving boundary problems, numerical linear algebra, and risk analysis. Collaboration and consulting with others throughout the UCC-ND complex are recorded in Part B. Included are sections on biology, chemistry, energy, engineering, environmental sciences, health and safety, materials science, safeguards, surveys, and the waste storage program. Part C summarizes the various educational activities in which the staff was engaged. Part D lists the presentations of research results, and Part E records the staff's other professional activities during the report period.
NASA Technical Reports Server (NTRS)
George, Kerry; Wu, Honglu; Willingham, Veronica; Cucinotta, Francis A.
2002-01-01
High-LET radiation is more efficient in producing complex-type chromosome exchanges than sparsely ionizing radiation, and this can potentially be used as a biomarker of radiation quality. To investigate whether complex chromosome exchanges are induced by the high-LET component of space radiation exposure, damage was assessed in astronauts' blood lymphocytes before and after long-duration missions of 3-4 months. The frequency of simple translocations increased significantly for most of the crewmembers studied. However, few complex exchanges were detected, and only one crewmember had a significant increase after flight. It has been suggested that the yield of complex chromosome damage could be underestimated when analyzing metaphase cells collected at a single time point after irradiation, and that analysis of chemically induced PCC may be more accurate since complications from cell-cycle delays are avoided. However, in this case the yields of chromosome damage were similar for metaphase and PCC analysis of astronauts' lymphocytes. It appears that the use of complex-type exchanges as a biomarker of radiation quality in vivo after low-dose chronic exposure in mixed radiation fields is hampered by statistical uncertainties.
Complexity and dynamics of topological and community structure in complex networks
NASA Astrophysics Data System (ADS)
Berec, Vesna
2017-07-01
Complexity is highly susceptible to variations in network dynamics, reflected in the underlying architecture, where the topological organization of cohesive subsets into clusters, the system's modular structure, and the resulting hierarchical patterns are cross-linked with the functional dynamics of the system. Here we study the connection between hierarchical topological scales of simplicial complexes and the organization of functional clusters (communities) in complex networks. The analysis reveals the full dynamics of different combinatorial structures of q-dimensional simplicial complexes and their Laplacian spectra, presenting spectral properties of the resulting symmetric and positive semidefinite matrices. The emergence of the system's collective behavior from an inhomogeneous statistical distribution is induced by the hierarchically ordered topological structure, which is mapped to a simplicial complex in which local interactions between nodes clustered into subcomplexes generate a flow of information that characterizes the complexity and dynamics of the full system.
Visibility graph analysis on heartbeat dynamics of meditation training
NASA Astrophysics Data System (ADS)
Jiang, Sen; Bian, Chunhua; Ning, Xinbao; Ma, Qianli D. Y.
2013-06-01
We apply visibility graph analysis to human heartbeat dynamics by constructing complex networks from heartbeat interval time series and investigating the statistical properties of the networks before and during chi and yoga meditation. The experimental results show that visibility graph analysis can reveal the dynamical changes caused by meditation training, manifested as a more regular heartbeat, which is closely related to the adjustment of the autonomic nervous system, and that visibility graph analysis is effective for evaluating the effect of meditation.
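The natural visibility graph construction that this kind of analysis relies on can be sketched in a few lines (a straightforward reference implementation, not the authors' code):

```python
def visibility_graph(y):
    """Natural visibility graph: nodes are samples; i and j are linked when the
    straight line between (i, y_i) and (j, y_j) passes above every sample
    strictly between them."""
    n = len(y)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if all(
                # Height of the i-j sight line at position k, strict inequality.
                y[k] < y[j] + (y[i] - y[j]) * (j - k) / (j - i)
                for k in range(i + 1, j)
            ):
                edges.add((i, j))
    return edges

# Convexity blocks long-range links; peaks become hubs.
print(visibility_graph([1.0, 2.0, 3.0]))  # only adjacent links: {(0, 1), (1, 2)}
print(visibility_graph([3.0, 1.0, 2.0]))  # the dip at index 1 hides nothing
```

On heartbeat interval (RR) series, network statistics such as the degree distribution of this graph are then compared between conditions (e.g. before vs. during meditation).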
Multiobjective Optimal Control Methodology for the Analysis of Certain Sociodynamic Problems
2009-03-01
NASA Astrophysics Data System (ADS)
Balasis, Georgios; Potirakis, Stelios M.; Papadimitriou, Constantinos; Zitis, Pavlos I.; Eftaxias, Konstantinos
2015-04-01
The field of complex systems holds that the dynamics of complex systems are founded on universal principles that may be used to describe a great variety of scientific and technological approaches to different types of natural, artificial, and social systems. We apply concepts of nonextensive statistical physics to time-series data of observable manifestations of the underlying complex processes leading up to different extreme events, in order to support the suggestion that a common dynamical analogy characterizes the generation of a single magnetic storm, solar flare, earthquake (in terms of pre-seismic electromagnetic signals), epileptic seizure, and economic crisis. The analysis reveals that all of the above extreme events can be analyzed within a similar mathematical framework. More precisely, we show that the populations of magnitudes of fluctuations included in all of the above pulse-like time series follow the traditional Gutenberg-Richter law as well as a nonextensive model for earthquake dynamics, with similar nonextensive q-parameter values. Moreover, based on a multidisciplinary statistical analysis, we show that the extreme events are characterized by crucial common symptoms, namely: (i) high organization, high compressibility, low complexity, and high information content; (ii) strong persistency; and (iii) the existence of a clear preferred direction of emergent activities. These symptoms clearly discriminate the appearance of the extreme events under study from the corresponding background noise.
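For the Gutenberg-Richter part of such an analysis, a standard estimator of the G-R b-value from a magnitude sample is Aki's maximum-likelihood formula (shown here as a hedged illustration of the fitting step; the authors' exact procedure may differ):

```python
import math

def aki_b_value(mags, m_min):
    """Aki's maximum-likelihood estimate of the Gutenberg-Richter b-value:
    b = log10(e) / (mean(M) - M_min), for magnitudes M >= M_min
    (the completeness threshold)."""
    m = [x for x in mags if x >= m_min]
    return math.log10(math.e) / (sum(m) / len(m) - m_min)

# Toy sample with mean magnitude 2.5 above a completeness threshold of 2.0:
print(round(aki_b_value([2.2, 2.4, 2.6, 2.8], 2.0), 3))  # 0.869
```

The same magnitude populations can then be fit with the nonextensive (q-parameter) model for comparison across the different event types.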
DOE Office of Scientific and Technical Information (OSTI.GOV)
Griffin, J.M.
1977-11-01
The pseudo data approach to the joint production of petroleum refining and chemicals is described as an alternative that avoids the multicollinearity of time-series data and allows a complex technology to be characterized by a statistical price possibility frontier. Intended primarily for long-range analysis, the pseudo data method can serve as a source of elasticity estimates for policy analysis. 19 references.
Performance analysis of Integrated Communication and Control System networks
NASA Technical Reports Server (NTRS)
Halevi, Y.; Ray, A.
1990-01-01
This paper presents statistical analysis of delays in Integrated Communication and Control System (ICCS) networks that are based on asynchronous time-division multiplexing. The models are obtained in closed form for analyzing control systems with randomly varying delays. The results of this research are applicable to ICCS design for complex dynamical processes like advanced aircraft and spacecraft, autonomous manufacturing plants, and chemical and processing plants.
The Statistical Consulting Center for Astronomy (SCCA)
NASA Technical Reports Server (NTRS)
Akritas, Michael
2001-01-01
The process by which raw astronomical data are transformed into scientifically meaningful results and interpretations typically involves many statistical steps. Traditional astronomy limits itself to a narrow range of old and familiar statistical methods: means and standard deviations; least-squares methods like chi(sup 2) minimization; and simple nonparametric procedures such as the Kolmogorov-Smirnov tests. These tools are often inadequate for the complex problems and datasets under investigation, and recent years have witnessed increased use of maximum-likelihood, survival analysis, multivariate analysis, wavelet, and advanced time-series methods. The Statistical Consulting Center for Astronomy (SCCA) assisted astronomers with the use of sophisticated tools and matched these tools with specific problems. The SCCA operated with two professors of statistics and a professor of astronomy working together. Questions were received by e-mail and discussed in detail with the questioner. Summaries of those questions and answers leading to new approaches were posted on the Web (www.state.psu.edu/ mga/SCCA). In addition to serving individual astronomers, the SCCA established a Web site for general use that provides hypertext links to selected on-line public-domain statistical software and services. The StatCodes site (www.astro.psu.edu/statcodes) provides over 200 links in the areas of: Bayesian statistics; censored and truncated data; correlation and regression; density estimation and smoothing; general statistics packages and information; image analysis; interactive Web tools; multivariate analysis; multivariate clustering and classification; nonparametric analysis; software written by astronomers; spatial statistics; statistical distributions; time series analysis; and visualization tools. StatCodes has received a remarkably high and constant hit rate of 250 hits/week (over 10,000/year) since its inception in mid-1997.
It is of interest to scientists both within and outside of astronomy. The most popular sections are multivariate techniques, image analysis, and time series analysis. Hundreds of copies of the ASURV, SLOPES, and CENS-TAU codes developed by SCCA scientists were also downloaded from the StatCodes site. In addition to formal SCCA duties, SCCA scientists continued a variety of related activities in astrostatistics, including refereeing statistically oriented papers submitted to the Astrophysical Journal, giving talks at meetings, including Feigelson's talk to science journalists entitled "The reemergence of astrostatistics" at the American Association for the Advancement of Science meeting, and publishing papers of astrostatistical content.
NASA Astrophysics Data System (ADS)
McGregor, Stephen J.; Busa, Michael A.; Skufca, Joseph; Yaggie, James A.; Bollt, Erik M.
2009-06-01
Regularity statistics have previously been applied to walking gait measures in the hope of gaining insight into the complexity of gait under different conditions and in different populations. Traditional regularity statistics are subject to the requirement of stationarity, a limitation for examining changes in complexity under dynamic conditions such as exhaustive exercise. Using a novel measure, control entropy (CE), applied to triaxial continuous accelerometry, we report changes in the complexity of walking and running at increasing speeds up to exhaustion in highly trained runners. We further apply Karhunen-Loeve analysis in a novel way to the patterns of CE responses in each of the three axes to identify dominant modes of CE responses in the vertical, mediolateral, and anterior/posterior planes. The differential CE responses observed between the different axes in this select population provide insight into the constraints of walking and running in those who may have optimized locomotion. Future comparisons between athletes, healthy untrained, and clinical populations using this approach may help elucidate differences between optimized and diseased locomotor control.
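Karhunen-Loeve analysis of a family of response curves amounts, in practice, to an eigendecomposition of their sample covariance: the leading eigenvectors are the dominant modes. A minimal sketch on hypothetical synthetic curves (not the study's accelerometry data):

```python
import numpy as np

def kl_modes(responses):
    """Karhunen-Loeve (principal component) decomposition of a set of
    response curves: rows are observations (e.g. one CE response per
    axis per subject), columns are samples along the curve.
    Returns eigenvalues (variance per mode) and orthonormal modes."""
    X = responses - responses.mean(axis=0)   # center the ensemble
    cov = X.T @ X / (X.shape[0] - 1)         # sample covariance
    vals, vecs = np.linalg.eigh(cov)         # symmetric eigendecomposition
    order = np.argsort(vals)[::-1]           # sort by variance, descending
    return vals[order], vecs[:, order]

# Synthetic ensemble dominated by a single sinusoidal mode plus noise
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
mode = np.sin(2.0 * np.pi * t)
data = np.outer(rng.normal(size=30), mode) + 0.01 * rng.normal(size=(30, 50))
vals, vecs = kl_modes(data)
```

On such data the first eigenvalue dominates all others, which is the sense in which KL analysis identifies "dominant modes" of the response patterns.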
Faria, Eliney F; Caputo, Peter A; Wood, Christopher G; Karam, Jose A; Nogueras-González, Graciela M; Matin, Surena F
2014-02-01
Laparoscopic and robotic partial nephrectomy (LPN and RPN) are strongly influenced by tumor complexity and the learning curve. We analyzed a consecutive experience with RPN and LPN to discern whether warm ischemia time (WIT) is in fact improved while accounting for these two confounding variables and, if so, by which particular component of WIT. This is a retrospective analysis of consecutive procedures performed by a single surgeon between 2002-2008 (LPN) and 2008-2012 (RPN). The durations of individual steps, including tumor excision and suturing of the intrarenal defect and parenchyma, were recorded at the time of surgery. Univariate and multivariate analyses were used to evaluate the influence of the learning curve, tumor complexity, and the time kinetics of the individual steps on WIT. Additionally, we considered the effect of RPN on the learning curve. A total of 146 LPNs and 137 RPNs were included. Renal function, WIT, suturing time, and renorrhaphy time showed statistically significant differences in favor of RPN (p < 0.05). In the univariate analysis, surgical procedure, learning curve, clinical tumor size, and RENAL nephrometry score were statistically significant predictors of WIT (p < 0.05). RPN decreased WIT on average by approximately 7 min compared to LPN, even when adjusting for learning curve, tumor complexity, and both together (p < 0.001). We found RPN was associated with a shorter WIT when controlling for the influence of the learning curve and tumor complexity. The time required for tumor excision was not shortened, but the time required for the suturing steps was significantly shortened.
NASA Astrophysics Data System (ADS)
Angelsky, O. V.; Ushenko, Yu. A.; Balanetska, V. O.
2011-09-01
To characterize the degree of consistency of the parameters of the optically uniaxial birefringent protein nets of blood plasma, a new parameter, the complex degree of mutual anisotropy, is suggested. A technique for polarization measurement of the coordinate distributions of the complex degree of mutual anisotropy of blood plasma is developed. It is shown that a statistical approach to the analysis of the complex degree of mutual anisotropy distributions of blood plasma is effective for the diagnostics and differentiation of acute inflammatory processes, as well as acute and gangrenous appendicitis.
Alignment of RNA molecules: Binding energy and statistical properties of random sequences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Valba, O. V., E-mail: valbaolga@gmail.com; Nechaev, S. K., E-mail: sergei.nechaev@gmail.com; Tamm, M. V., E-mail: thumm.m@gmail.com
2012-02-15
A new statistical approach to the problem of pairwise alignment of RNA sequences is proposed. The problem is analyzed for a pair of interacting polymers forming RNA-like hierarchical cloverleaf structures. An alignment is characterized by the numbers of matches, mismatches, and gaps. A weight function is assigned to each alignment; this function is interpreted as a free energy taking into account both direct monomer-monomer interactions and a combinatorial contribution due to the formation of various cloverleaf secondary structures. The binding free energy is determined for a pair of RNA molecules. Statistical properties are discussed, including fluctuations of the binding energy between a pair of RNA molecules and the loop length distribution in a complex. Based on an analysis of the free energy per nucleotide in pair complexes of random RNAs as a function of the number of nucleotide types c, a hypothesis is put forward about the exclusivity of the alphabet c = 4 used by nature.
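The weight over matches, mismatches, and gaps described above is, at its simplest, the standard global-alignment score computed by dynamic programming. A minimal Needleman-Wunsch sketch with illustrative scores (the paper's weight additionally includes a secondary-structure free-energy term, which is omitted here):

```python
import numpy as np

def alignment_score(a, b, match=1.0, mismatch=-1.0, gap=-0.5):
    """Needleman-Wunsch global alignment: the maximal total weight over
    all alignments built from matches, mismatches and gaps."""
    n, m = len(a), len(b)
    F = np.zeros((n + 1, m + 1))
    F[:, 0] = gap * np.arange(n + 1)   # align a prefix of `a` to gaps
    F[0, :] = gap * np.arange(m + 1)   # align a prefix of `b` to gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i, j] = max(F[i - 1, j - 1] + s,   # match / mismatch
                          F[i - 1, j] + gap,     # gap in b
                          F[i, j - 1] + gap)     # gap in a
    return F[n, m]
```

Replacing the fixed scores with sequence- and structure-dependent free energies turns this recursion into the kind of statistical weight the abstract describes.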
Evaluation of Lightning Incidence to Elements of a Complex Structure: A Monte Carlo Approach
NASA Technical Reports Server (NTRS)
Mata, Carlos T.; Rakov, V. A.
2008-01-01
There are complex structures for which the installation and positioning of the lightning protection system (LPS) cannot be done using the lightning protection standard guidelines. As a result, there are some "unprotected" or "exposed" areas. In an effort to quantify the lightning threat to these areas, a Monte Carlo statistical tool has been developed. This statistical tool uses two random number generators: a uniform distribution to generate origins of downward propagating leaders and a lognormal distribution to generate return stroke peak currents. Leaders propagate vertically downward, and their striking distances are defined by the polarity and peak current. Following the electrogeometrical concept, we assume that the leader attaches to the closest object within its striking distance. The statistical analysis is run for 10,000 years with assumed ground flash density and peak current distributions, and the output of the program is the probability of direct attachment to the objects of interest, with the corresponding peak current distribution.
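The electrogeometrical procedure described above can be sketched in a few lines. All numeric choices below (site size, lognormal parameters, and the striking-distance relation r = 10 I^0.65, an IEC-style formula) are illustrative assumptions, not the paper's values:

```python
import numpy as np

def lightning_attachment(objects, n_flashes=5000, side=1000.0, seed=1):
    """Monte Carlo sketch of electrogeometrical attachment: leader
    origins uniform over a square site, peak currents lognormal,
    striking distance r = 10 * I**0.65.  `objects` maps a name to an
    (x, y, height) tuple in meters."""
    rng = np.random.default_rng(seed)
    xs = rng.uniform(0.0, side, n_flashes)              # leader origin x (m)
    ys = rng.uniform(0.0, side, n_flashes)              # leader origin y (m)
    peak = rng.lognormal(np.log(30.0), 0.7, n_flashes)  # peak current (kA)
    r_s = 10.0 * peak ** 0.65                           # striking distance (m)
    counts = {name: 0 for name in objects}
    counts["ground"] = 0
    for k in range(n_flashes):
        # A vertically descending leader attaches at the highest altitude
        # where some object (or the flat ground) enters its striking sphere.
        best, best_z = "ground", r_s[k]
        for name, (ox, oy, oh) in objects.items():
            dh = np.hypot(xs[k] - ox, ys[k] - oy)
            if dh <= r_s[k]:
                z = oh + np.sqrt(r_s[k] ** 2 - dh ** 2)
                if z > best_z:
                    best, best_z = name, z
        counts[best] += 1
    return counts
```

Dividing each count by the simulated number of years times the assumed ground flash density converts these tallies into strike frequencies for the objects of interest.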
Arenas, Miguel
2015-04-01
NGS technologies enable fast and inexpensive generation of genomic data. Nevertheless, ancestral genome inference is not straightforward, owing to complex evolutionary processes acting on this material, such as inversions, translocations, and other genome rearrangements that, in addition to their implicit complexity, can co-occur and confound ancestral inferences. Recently, models of genome evolution that accommodate such complex genomic events have begun to emerge. This letter explores these novel evolutionary models and proposes their incorporation into robust statistical approaches based on computer simulations, such as approximate Bayesian computation, which may produce a more realistic evolutionary analysis of genomic data. Advantages and pitfalls in using these analytical methods are discussed. Potential applications of these ancestral genomic inferences are also pointed out.
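Approximate Bayesian computation, mentioned above, is easiest to see in its rejection form: draw parameters from the prior, simulate data, and keep the parameters whose summary statistic lands near the observed one. A toy sketch (the model, summary statistic, and tolerance are hypothetical stand-ins for a genome-evolution simulator):

```python
import numpy as np

def abc_rejection(observed_stat, simulate, prior_draw, n=20000, tol=0.1, seed=0):
    """ABC by rejection: accept a prior draw `theta` whenever the
    simulated summary statistic falls within `tol` of the observed one.
    The accepted draws approximate the posterior distribution."""
    rng = np.random.default_rng(seed)
    kept = []
    for _ in range(n):
        theta = prior_draw(rng)
        if abs(simulate(theta, rng) - observed_stat) < tol:
            kept.append(theta)
    return np.array(kept)

# Toy model: the data summary is the mean of 50 Normal(theta, 1) draws;
# the observed summary is 2.0 and the prior on theta is Uniform(0, 4).
posterior = abc_rejection(
    2.0,
    simulate=lambda th, rng: rng.normal(th, 1.0, 50).mean(),
    prior_draw=lambda rng: rng.uniform(0.0, 4.0))
```

In the genomic setting the simulator would generate genomes under a rearrangement model and the summary statistics would describe the observed genome structure; the acceptance logic is unchanged.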
Statistical Surrogate Modeling of Atmospheric Dispersion Events Using Bayesian Adaptive Splines
NASA Astrophysics Data System (ADS)
Francom, D.; Sansó, B.; Bulaevskaya, V.; Lucas, D. D.
2016-12-01
Uncertainty in the inputs of complex computer models, including atmospheric dispersion and transport codes, is often assessed via statistical surrogate models. Surrogate models are computationally efficient statistical approximations of expensive computer models that enable uncertainty analysis. We introduce Bayesian adaptive spline methods for producing surrogate models that capture the major spatiotemporal patterns of the parent model while remaining flexible, accurate, and computationally feasible. We present novel methodological and computational approaches motivated by a controlled atmospheric tracer release experiment conducted at the Diablo Canyon nuclear power plant in California. Traditional methods for building statistical surrogate models often do not scale well to experiments with large amounts of data. Our approach is well suited to experiments involving large numbers of model inputs, large numbers of simulations, and functional output for each simulation, and it allows us to perform global sensitivity analysis with ease. We also present an approach to calibrating simulators using field data.
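The surrogate-modeling idea, replacing an expensive simulator with a cheap statistical fit trained on a handful of runs, can be sketched with ordinary polynomial least squares standing in for the paper's Bayesian adaptive splines (the "simulator" here is a cheap stand-in function):

```python
import numpy as np

def fit_surrogate(x, y, degree=7):
    """Fit a cheap polynomial surrogate to expensive simulator runs.
    A stand-in for adaptive splines: same idea, replace the simulator
    with a fast statistical approximation trained on design points."""
    return np.poly1d(np.polyfit(x, y, degree))

# Pretend this is an expensive dispersion code evaluated at 25 points
def simulator(x):
    return np.sin(3.0 * x) + 0.5 * x

x_train = np.linspace(0.0, 2.0, 25)
surrogate = fit_surrogate(x_train, simulator(x_train))

# The surrogate can now be evaluated cheaply thousands of times for UQ
x_test = np.linspace(0.0, 2.0, 200)
max_err = np.max(np.abs(surrogate(x_test) - simulator(x_test)))
```

Once fit, Monte Carlo uncertainty and sensitivity analyses run against the surrogate instead of the simulator, which is what makes them tractable.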
Su, Cheng; Zhou, Lei; Hu, Zheng; Weng, Winnie; Subramani, Jayanthi; Tadkod, Vineet; Hamilton, Kortney; Bautista, Ami; Wu, Yu; Chirmule, Narendra; Zhong, Zhandong Don
2015-10-01
Biotherapeutics can elicit immune responses, which can alter the exposure, safety, and efficacy of the therapeutics. A well-designed and robust bioanalytical method is critical for the detection and characterization of relevant anti-drug antibodies (ADA) and for the success of an immunogenicity study. As a fundamental criterion in immunogenicity testing, assay cut points need to be statistically established with a risk-based approach to reduce subjectivity. This manuscript describes the development of a validated, web-based, multi-tier customized assay statistical tool (CAST) for assessing cut points of ADA assays. The tool provides an intuitive web interface that allows users to import experimental data generated from a standardized experimental design, select the assay factors, run the standardized analysis algorithms, and generate tables, figures, and listings (TFL). It allows bioanalytical scientists to perform complex statistical analysis at the click of a button and to produce reliable assay parameters in support of immunogenicity studies. Copyright © 2015 Elsevier B.V. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Robinson, D.G.; Eubanks, L.
1998-03-01
This software assists the engineering designer in characterizing the statistical uncertainty in the performance of complex systems as a result of variations in manufacturing processes, material properties, system geometry, or operating environment. The software is composed of a graphical user interface that provides the user with easy access to Cassandra uncertainty analysis routines. Together, this interface and the Cassandra routines are referred to as CRAX (CassandRA eXoskeleton). The software is flexible enough that, with minor modification, it is able to interface with large modeling and analysis codes such as heat transfer or finite element analysis software. The current version permits the user to manually input a performance function, the number of random variables, and their associated statistical characteristics: density function, mean, and coefficient of variation. Additional uncertainty analysis modules are continuously being added to the Cassandra core.
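The workflow such a tool supports, propagating distributions on the input variables through a performance function to obtain output statistics, can be sketched with plain Monte Carlo (the performance function and the distributions below are illustrative, and inputs are assumed normal for simplicity):

```python
import numpy as np

def propagate_uncertainty(performance, variables, n=100_000, seed=0):
    """Monte Carlo uncertainty propagation: sample each random variable
    from its stated distribution, evaluate the performance function on
    the samples, and summarize the output.  `variables` maps a name to
    a (mean, coefficient_of_variation) pair; normality is assumed."""
    rng = np.random.default_rng(seed)
    samples = {name: rng.normal(mu, cv * mu, n)
               for name, (mu, cv) in variables.items()}
    out = performance(**samples)
    return out.mean(), out.std()

# Illustrative performance function: stress margin = strength - load
mean, std = propagate_uncertainty(
    lambda strength, load: strength - load,
    {"strength": (10.0, 0.05), "load": (6.0, 0.10)})
```

For independent normal inputs the result can be checked analytically: the margin has mean 10 - 6 = 4 and standard deviation sqrt(0.5^2 + 0.6^2) ≈ 0.78.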
DOE Office of Scientific and Technical Information (OSTI.GOV)
Robinson, David G.
1999-02-20
This software assists the engineering designer in characterizing the statistical uncertainty in the performance of complex systems as a result of variations in manufacturing processes, material properties, system geometry, or operating environment. The software is composed of a graphical user interface that provides the user with easy access to Cassandra uncertainty analysis routines. Together, this interface and the Cassandra routines are referred to as CRAX (CassandRA eXoskeleton). The software is flexible enough that, with minor modification, it is able to interface with large modeling and analysis codes such as heat transfer or finite element analysis software. The current version permits the user to manually input a performance function, the number of random variables, and their associated statistical characteristics: density function, mean, and coefficient of variation. Additional uncertainty analysis modules are continuously being added to the Cassandra core.
Transitions between superstatistical regimes: Validity, breakdown and applications
NASA Astrophysics Data System (ADS)
Jizba, Petr; Korbel, Jan; Lavička, Hynek; Prokš, Martin; Svoboda, Václav; Beck, Christian
2018-03-01
Superstatistics is a widely employed tool of non-equilibrium statistical physics which plays an important role in the analysis of hierarchical complex dynamical systems. Yet its "canonical" formulation in terms of a single nuisance parameter is often too restrictive when applied to complex empirical data. Here we show that a multi-scale generalization of the superstatistics paradigm is more versatile, allowing us to address such pertinent issues as the transmutation of statistics or inter-scale stochastic behavior. To put some flesh on the bare bones, we provide numerical evidence for a transition between two superstatistical regimes by analyzing high-frequency (minute-tick) data for share-price returns of seven selected companies. Salient issues, such as the breakdown of superstatistics in fractional diffusion processes and the connection with Brownian subordination, are also briefly discussed.
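The superstatistical picture treats a long series as locally Gaussian with a slowly fluctuating inverse variance ("beta"); the distribution of the local betas determines which statistics emerge overall (for instance, chi-squared-distributed betas yield Tsallis statistics). A minimal sketch of extracting the beta distribution from windowed data, on a synthetic series with an assumed window length:

```python
import numpy as np

def local_beta(series, window):
    """Superstatistics sketch: split a fluctuating series into windows
    and estimate the inverse variance ("beta") within each one."""
    n = len(series) // window
    chunks = series[:n * window].reshape(n, window)
    return 1.0 / chunks.var(axis=1)

rng = np.random.default_rng(42)
# Synthetic superstatistical series: a new beta is drawn every 200 steps
betas_true = rng.gamma(shape=2.0, scale=1.0, size=50)
series = np.concatenate(
    [rng.normal(0.0, 1.0 / np.sqrt(b), 200) for b in betas_true])
betas_est = local_beta(series, 200)
```

In practice the window length must be chosen long enough to estimate each local variance but short enough that beta is roughly constant within a window; the multi-scale generalization in the abstract relaxes exactly this single-scale assumption.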
Graphical tools for network meta-analysis in STATA.
Chaimani, Anna; Higgins, Julian P T; Mavridis, Dimitris; Spyridonos, Panagiota; Salanti, Georgia
2013-01-01
Network meta-analysis synthesizes direct and indirect evidence in a network of trials that compare multiple interventions and has the potential to rank the competing treatments according to the studied outcome. Despite its usefulness, network meta-analysis is often criticized for its complexity and for being accessible only to researchers with strong statistical and computational skills. The evaluation of the underlying model assumptions, the statistical technicalities, and the presentation of the results in a concise and understandable way are all challenging aspects of network meta-analysis methodology. In this paper we aim to make the methodology accessible to non-statisticians by presenting and explaining a series of graphical tools via worked examples. To this end, we provide a set of STATA routines that can be easily employed to present the evidence base, evaluate the assumptions, fit the network meta-analysis model, and interpret its results.
Kasapinova, K; Kamiloski, V
2016-06-01
Our purpose was to determine the correlation of initial radiographic parameters of a distal radius fracture with injury of the triangular fibrocartilage complex. In a prospective study, 85 patients with surgically treated distal radius fractures were included. Wrist arthroscopy was used to identify and classify triangular fibrocartilage complex lesions. The initial radial length and angulation, dorsal angulation, ulnar variance, and distal radioulnar distance were measured. Wrist arthroscopy identified a triangular fibrocartilage complex lesion in 45 patients. Statistical analysis did not identify a correlation of any single radiographic parameter of the distal radius fracture with an associated triangular fibrocartilage complex injury. The initial radiograph of a distal radius fracture does not predict a triangular fibrocartilage complex injury. III. © The Author(s) 2016.
The Role of IQGAP1 in Breast Carcinoma
2012-10-01
and β-tubulin expression was measured as described above. Statistical Analysis—All experiments were repeated independently at least three times...IQGAP1 Binds HER2—In vitro analysis with pure proteins was used to examine a possible interaction between IQGAP1 and HER2. GST alone or GST-HER2 was...incubated with purified IQGAP1, and complexes were isolated with glutathione-Sepharose. Analysis by Western blotting reveals that IQGAP1 binds HER2
Planning representation for automated exploratory data analysis
NASA Astrophysics Data System (ADS)
St. Amant, Robert; Cohen, Paul R.
1994-03-01
Igor is a knowledge-based system for exploratory statistical analysis of complex systems and environments. Igor has two related goals: to help automate the search for interesting patterns in data sets, and to help develop models that capture significant relationships in the data. We outline a language for Igor, based on techniques of opportunistic planning, which balances control and opportunism. We describe the application of Igor to the analysis of the behavior of Phoenix, an artificial intelligence planning system.
Quantifying the Energy Landscape Statistics in Proteins - a Relaxation Mode Analysis
NASA Astrophysics Data System (ADS)
Cai, Zhikun; Zhang, Yang
The energy landscape, the hypersurface in configuration space, has been a useful concept in describing complex processes that occur over very long time scales, such as the multistep slow relaxations of supercooled liquids and the folding of polypeptide chains into structured proteins. Despite extensive simulation studies, its experimental characterization remains a challenge. To address this challenge, we developed a relaxation mode analysis (RMA) for liquids under a framework analogous to the normal mode analysis for solids. Using RMA, important statistics of the activation barriers of the energy landscape become accessible from experimentally measurable two-point correlation functions, e.g. using quasi-elastic and inelastic scattering experiments. We observed a prominent coarsening effect of the energy landscape. The results were further confirmed by direct sampling of the energy landscape using a metadynamics-like adaptive autonomous basin-climbing computation. We first demonstrate RMA in a supercooled liquid when dynamical cooperativity emerges in the landscape-influenced regime. We then show that this framework reveals encouraging energy landscape statistics when applied to proteins.
Talati, Ronak; Vanderpoel, Andrew; Eladdadi, Amina; Anderson, Kate; Abe, Ken; Barroso, Margarida
2013-01-01
The overexpression of certain membrane-bound receptors is a hallmark of cancer progression, and it has been suggested to affect the organization, activation, recycling, and down-regulation of receptor-ligand complexes in human cancer cells. Thus, comparing receptor trafficking pathways in normal vs. cancer cells requires the ability to image cells expressing dramatically different receptor expression levels. Here, we present a significant technical advance in the analysis and processing of images collected using intensity-based Förster resonance energy transfer (FRET) confocal microscopy. An automated ImageJ macro was developed to select regions of interest (ROIs) based on intensity and statistical thresholds within cellular images with reduced FRET signal. Furthermore, SSMD (strictly standardized mean difference), a statistical signal-to-noise ratio (SNR) evaluation parameter, was used to validate the quality of the FRET analysis, in particular of the ROI database selection. The ImageJ ROI selection macro, together with SSMD as an evaluation parameter of SNR levels, was used to investigate the endocytic recycling of Tfn-TFR complexes at nanometer-range resolution in human normal vs. breast cancer cells expressing significantly different levels of endogenous TFR. Here, the FRET-based assay demonstrates that Tfn-TFR complexes in normal epithelial vs. breast cancer cells show significantly different E% behavior during their endocytic recycling pathway. Since E% is a relative measure of distance, we propose that these changes in E% levels represent conformational changes in Tfn-TFR complexes during the endocytic pathway. Thus, our results indicate that Tfn-TFR complexes undergo different conformational changes in normal vs. cancer cells, indicating that the organization of Tfn-TFR complexes at the nanometer range is significantly altered during the endocytic recycling pathway in cancer cells.
In summary, improvements in the automated selection of FRET ROI datasets allowed us to detect significant changes in E% with potential biological significance in human normal vs. cancer cells. PMID:23994873
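SSMD, the signal-to-noise criterion used above, is simply the difference of two group means scaled by the standard deviation of that difference. A minimal sketch:

```python
import numpy as np

def ssmd(sample, reference):
    """Strictly standardized mean difference between two groups:
    (mean1 - mean2) / sqrt(var1 + var2).  Large |SSMD| means the
    difference is large relative to the combined spread, which is
    how it serves as a signal-to-noise acceptance criterion."""
    sample = np.asarray(sample, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return (sample.mean() - reference.mean()) / np.sqrt(
        sample.var(ddof=1) + reference.var(ddof=1))
```

Applied to ROI selection, an ROI population would be accepted when its SSMD against the background population exceeds a chosen threshold.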
Rodríguez-Arias, Miquel Angel; Rodó, Xavier
2004-03-01
Here we describe a practical, step-by-step primer on scale-dependent correlation (SDC) analysis. The analysis of transitory processes is an important but often neglected topic in ecological studies, because few statistical techniques detect temporary features accurately enough. We introduce the SDC analysis, a statistical and graphical method to study transitory processes at any temporal or spatial scale. By combining conventional procedures with simple, well-known statistical techniques, SDC analysis becomes an improved time-domain analogue of wavelet analysis. We use several simple synthetic series to describe the method, a more complex example full of transitory features to compare SDC and wavelet analysis, and finally some selected ecological series to illustrate the methodology. The SDC analysis of time series of copepod abundances in the North Sea indicates that ENSO is the main climatic driver of short-term changes in population dynamics. SDC also uncovers some unexpected long-term features in the population. Similarly, the SDC analysis of Nicholson's blowfly data locates where the proposed models fail and provides new insights into the mechanism that drives the apparent vanishing of the population cycle during the second half of the series.
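At its core, SDC correlates short aligned windows of two series at a chosen scale, so that correlation that exists only during a transitory episode shows up at the window positions covering that episode. A simplified lag-zero sketch (the full method also scans lagged window pairs and assesses significance):

```python
import numpy as np

def sdc(x, y, scale):
    """Scale-dependent correlation sketch: Pearson correlation of every
    pair of length-`scale` windows of x and y at the same position,
    returning the profile of local correlations along the series."""
    r = []
    for start in range(len(x) - scale + 1):
        xw = x[start:start + scale]
        yw = y[start:start + scale]
        r.append(np.corrcoef(xw, yw)[0, 1])
    return np.array(r)

rng = np.random.default_rng(3)
x = rng.normal(size=300)
y = rng.normal(size=300)
y[100:160] = x[100:160]          # the two series co-vary only here
profile = sdc(x, y, 20)
```

The correlation profile is near zero outside the coupled segment and rises to one inside it, which is exactly the kind of transitory association global correlation would average away.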
Zhu, Xiaofeng; Feng, Tao; Tayo, Bamidele O; Liang, Jingjing; Young, J Hunter; Franceschini, Nora; Smith, Jennifer A; Yanek, Lisa R; Sun, Yan V; Edwards, Todd L; Chen, Wei; Nalls, Mike; Fox, Ervin; Sale, Michele; Bottinger, Erwin; Rotimi, Charles; Liu, Yongmei; McKnight, Barbara; Liu, Kiang; Arnett, Donna K; Chakravati, Aravinda; Cooper, Richard S; Redline, Susan
2015-01-08
Genome-wide association studies (GWASs) have identified many genetic variants underlying complex traits. Many detected genetic loci harbor variants that associate with multiple, even distinct, traits. Most current analysis approaches focus on single traits, even though the final results from multiple traits are evaluated together. Such approaches miss the opportunity to systematically integrate the phenome-wide data available for genetic association analysis. In this study, we propose a general approach that can integrate association evidence from summary statistics of multiple traits, whether correlated, independent, continuous, or binary, which might come from the same or different studies. We allow for trait heterogeneity effects. Population structure and cryptic relatedness can also be controlled. Our simulations suggest that the proposed method has improved statistical power over single-trait analysis in most of the cases we studied. We applied our method to the Continental Origins and Genetic Epidemiology Network (COGENT) African-ancestry samples for three blood pressure traits and identified four loci (CHIC2, HOXA-EVX1, IGFBP1/IGFBP3, and CDH17; p < 5.0 × 10^-8) associated with hypertension-related traits that were missed by the single-trait analysis in the original report. Six additional loci with suggestive association evidence (p < 5.0 × 10^-7) were also observed, including CACNA1D and WNT3. Our study strongly suggests that analyzing multiple phenotypes can improve statistical power and that such analysis can be executed with the summary statistics from GWASs. Our method also provides a way to study cross-phenotype (CP) associations by using summary statistics from GWASs of multiple phenotypes. Copyright © 2015 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
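One generic way to combine per-trait summary statistics, in the spirit of (but not identical to) the paper's method, is an omnibus quadratic form on the z-scores that accounts for the correlation between trait statistics:

```python
import numpy as np

def omnibus_stat(z, R):
    """Combine per-trait GWAS z-scores at one variant into a single
    statistic T = z' R^{-1} z, where R is the correlation matrix of
    the trait statistics.  Under the null, T is chi-squared with
    len(z) degrees of freedom."""
    z = np.asarray(z, dtype=float)
    return float(z @ np.linalg.solve(R, z))
```

With independent traits (R = I) this reduces to the sum of squared z-scores; with positively correlated traits the quadratic form discounts the shared evidence, which is why a correlation estimate for R is needed when the traits come from overlapping samples.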
Statistical mechanics of complex neural systems and high dimensional data
NASA Astrophysics Data System (ADS)
Advani, Madhu; Lahiri, Subhaneil; Ganguli, Surya
2013-03-01
Recent experimental advances in neuroscience have opened new vistas into the immense complexity of neuronal networks. This proliferation of data challenges us on two parallel fronts. First, how can we form adequate theoretical frameworks for understanding how dynamical network processes cooperate across widely disparate spatiotemporal scales to solve important computational problems? Second, how can we extract meaningful models of neuronal systems from high dimensional datasets? To aid in these challenges, we give a pedagogical review of a collection of ideas and theoretical methods arising at the intersection of statistical physics, computer science and neurobiology. We introduce the interrelated replica and cavity methods, which originated in statistical physics as powerful ways to quantitatively analyze large highly heterogeneous systems of many interacting degrees of freedom. We also introduce the closely related notion of message passing in graphical models, which originated in computer science as a distributed algorithm capable of solving large inference and optimization problems involving many coupled variables. We then show how both the statistical physics and computer science perspectives can be applied in a wide diversity of contexts to problems arising in theoretical neuroscience and data analysis. Along the way we discuss spin glasses, learning theory, illusions of structure in noise, random matrices, dimensionality reduction and compressed sensing, all within the unified formalism of the replica method. Moreover, we review recent conceptual connections between message passing in graphical models, and neural computation and learning. Overall, these ideas illustrate how statistical physics and computer science might provide a lens through which we can uncover emergent computational functions buried deep within the dynamical complexities of neuronal networks.
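Message passing in graphical models, one of the techniques reviewed above, can be made concrete with sum-product on a small chain, where forward and backward messages yield exact marginals (the factors below are arbitrary illustrative values):

```python
import numpy as np

def chain_marginals(unary, pair):
    """Sum-product message passing on a chain of binary variables.
    `unary` is an (n, 2) array of node factors; `pair` is a (2, 2)
    coupling factor shared by every edge.  On a tree (here, a chain)
    the forward/backward messages give exact marginals."""
    n = len(unary)
    fwd = np.ones((n, 2))
    bwd = np.ones((n, 2))
    for i in range(1, n):                         # forward pass
        m = (fwd[i - 1] * unary[i - 1]) @ pair
        fwd[i] = m / m.sum()                      # normalize for stability
    for i in range(n - 2, -1, -1):                # backward pass
        m = pair @ (bwd[i + 1] * unary[i + 1])
        bwd[i] = m / m.sum()
    marg = fwd * bwd * unary
    return marg / marg.sum(axis=1, keepdims=True)

unary = np.array([[0.7, 0.3], [0.4, 0.6], [0.5, 0.5], [0.2, 0.8]])
pair = np.array([[1.2, 0.8], [0.8, 1.2]])
marg = chain_marginals(unary, pair)
```

The cost is linear in the chain length, versus exponential for brute-force enumeration; on loopy graphs the same updates run iteratively and give the approximate "loopy belief propagation" the review connects to neural computation.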
Air-flow distortion and turbulence statistics near an animal facility
NASA Astrophysics Data System (ADS)
Prueger, J. H.; Eichinger, W. E.; Hipps, L. E.; Hatfield, J. L.; Cooper, D. I.
The emission and dispersion of particulates and gases from concentrated animal feeding operations (CAFO) at local to regional scales is a current issue in science and society. The transport of particulates, odors, and toxic chemical species from the source into the local and eventually regional atmosphere is largely determined by turbulence. Any model that attempts to simulate the dispersion of particles must either specify or assume various statistical properties of the turbulence field. Statistical properties of turbulence are well documented for idealized boundary layers above uniform surfaces. However, an animal production facility is a complex surface with structures that act as bluff bodies that distort the turbulence intensity near the buildings. As a result, the initial release and subsequent dispersion of effluents in the region near a facility will be affected by the complex nature of the surface. Previous lidar studies of plume dispersion over the facility used in this study indicated that plumes move in complex yet organized patterns that could not be explained by the properties of turbulence generally assumed in models. The objective of this study was to characterize the near-surface turbulence statistics in the flow field around an array of animal confinement buildings. Eddy covariance towers were erected in the upwind, within-array, and downwind regions of the flow field. Substantial changes in turbulence intensity statistics and turbulence kinetic energy (TKE) were observed as the mean wind flow encountered the building structures. Spectral analysis demonstrated a unique distribution of the spectral energy in the vertical profile above the buildings.
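The turbulence kinetic energy reported above has a simple definition that can be sketched directly; the velocity series below are synthetic stand-ins for eddy-covariance data (the variances chosen are illustrative, not field values):

```python
import numpy as np

def turbulence_kinetic_energy(u, v, w):
    """TKE = 0.5 * (var(u') + var(v') + var(w')), where the primes denote
    deviations from the mean wind components (np.var subtracts the mean)."""
    return 0.5 * (np.var(u) + np.var(v) + np.var(w))

# Synthetic fluctuating wind components in m/s (illustrative variances).
rng = np.random.default_rng(0)
u = 3.0 + 0.8 * rng.standard_normal(10_000)
v = 0.5 + 0.6 * rng.standard_normal(10_000)
w = 0.3 * rng.standard_normal(10_000)
tke = turbulence_kinetic_energy(u, v, w)   # ~0.5 * (0.64 + 0.36 + 0.09)
```

In a field deployment the same quantity would be computed from detrended sonic-anemometer records over each averaging interval.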
Trivedi, Prinal; Edwards, Jode W; Wang, Jelai; Gadbury, Gary L; Srinivasasainagendra, Vinodh; Zakharkin, Stanislav O; Kim, Kyoungmi; Mehta, Tapan; Brand, Jacob P L; Patki, Amit; Page, Grier P; Allison, David B
2005-04-06
Many efforts in microarray data analysis are focused on providing tools and methods for the qualitative analysis of microarray data. HDBStat! (High-Dimensional Biology-Statistics) is a software package designed for the analysis of high-dimensional biology data such as microarray data. It was initially developed for the analysis of microarray gene expression data, but it can also be used for some applications in proteomics and other aspects of genomics. HDBStat! provides statisticians and biologists a flexible and easy-to-use interface for analyzing complex microarray data using a variety of methods for data preprocessing, quality control analysis, and hypothesis testing. Results generated from the data preprocessing, quality control, and hypothesis testing methods are output in the form of Excel CSV tables, graphs, and an HTML report summarizing the data analysis. HDBStat! is platform-independent software that is freely available to academic institutions and non-profit organizations. It can be downloaded from our website http://www.soph.uab.edu/ssg_content.asp?id=1164.
Harrysson, Iliana J; Cook, Jonathan; Sirimanna, Pramudith; Feldman, Liane S; Darzi, Ara; Aggarwal, Rajesh
2014-07-01
To determine how minimally invasive surgical learning curves are assessed and to define an ideal framework for this assessment. Learning curves have implications for training and for the adoption of new procedures and devices. In 2000, Ramsay et al. reviewed the learning curve literature and called for improved reporting and statistical evaluation of learning curves. Since then, a body of literature on learning curves has emerged, but its presentation and analysis vary. A systematic search was performed of MEDLINE, EMBASE, ISI Web of Science, ERIC, and the Cochrane Library from 1985 to August 2012. The inclusion criteria were minimally invasive abdominal surgery, formal analysis of the learning curve, and English language. A total of 592 (11.1%) of the identified studies met the selection criteria. Time was the most commonly used proxy for the learning curve (508, 86%). Intraoperative outcomes were used in 316 (53%) of the articles, postoperative outcomes in 306 (52%), technical skills in 102 (17%), and patient-oriented outcomes in 38 (6%). Over time, there was evidence of an increase in the relative number of laparoscopic and robotic studies (P < 0.001) without statistical evidence of a change in the complexity of analysis (P = 0.121). Assessment of learning curves is needed to inform surgical training and to evaluate new clinical procedures. An ideal analysis would account for the degree of complexity of individual cases and the inherent differences between surgeons. No single proxy best represents the success of surgery, and hence multiple outcomes should be collected.
NASA Astrophysics Data System (ADS)
Majorana-Fermi-Segre, E.-L.; Antonoff-Overhauser-Salam, Marvin-Albert-Abdus; Siegel, Edward Carl-Ludwig
2013-03-01
Majorana fermions, being their own antiparticles, follow non-Abelian anyon/semion quantum statistics: in Zhang et al.-...-Detwiler et al.-...``Worlds-in-Collision'': solid-state/condensed-matter physics spin-orbit-coupled topological excitations in superconductors and/or superfluids -to- particle-physics neutrinos: ``When `Worlds' Collide'', analysis via Siegel [Schrodinger Centenary Symp., Imperial College, London (1987); in The Copenhagen Interpretation Fifty Years After the Como Lecture, Symp. Fdns. Mod. Phys., Joensuu (1987); Symp. on Fractals, MRS Fall Mtg., Boston (1989), 5 papers] ``complex quantum-statistics in fractal-dimensions'', which explains hidden dark matter (HDM) in the Siegel ``Sephirot'' scenario for The Creation, uses Takagi [Prog. Theor. Phys. Suppl. 88, 1 (1986)]-Ooguri [PR D33, 357 (1985)]-Picard-Lefschetz-Arnol'd-Vassil'ev [``Principia Read After 300 Years'', Not. AMS (1989); quantum-theory caveats comment-letters (1990); Applied Picard-Lefschetz Theory, AMS (2006)]-theorem quantum statistics, which via the Euler formula becomes, which via the de Moivre formula further becomes, which on the unit circle is real only for, i.e., for, versus complex with an imaginary-damping denominator for, i.e., for, such that Fermi-Dirac quantum statistics for
Sun, Gang; Hoff, Steven J; Zelle, Brian C; Nelson, Minda A
2008-12-01
It is vital to forecast gas and particulate matter concentrations and emission rates (GPCER) from livestock production facilities to assess the impact of airborne pollutants on human health, ecological environment, and global warming. Modeling source air quality is a complex process because of abundant nonlinear interactions between GPCER and other factors. The objective of this study was to introduce statistical methods and radial basis function (RBF) neural networks to predict daily source air quality in Iowa swine deep-pit finishing buildings. The results show that four variables (outdoor and indoor temperature, animal units, and ventilation rates) were identified as relatively important model inputs using statistical methods. It can be further demonstrated that only two factors, the environment factor and the animal factor, were capable of explaining more than 94% of the total variability after performing principal component analysis. The introduction of fewer uncorrelated variables to the neural network would reduce the model structure complexity, minimize computation cost, and eliminate model overfitting problems. The obtained results of RBF network prediction were in good agreement with the actual measurements, with values of the correlation coefficient between 0.741 and 0.995 and very low values of systemic performance indexes for all the models. These good results indicated the RBF network could be trained to model these highly nonlinear relationships. Thus, RBF neural network technology combined with multivariate statistical methods is a promising tool for air pollutant emissions modeling.
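The principal component step described above (two factors explaining more than 94% of the variability) can be sketched with a plain SVD-based PCA; the four correlated predictors here are synthetic stand-ins for the study's inputs, not the actual measurements:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-ins for four correlated predictors driven by two
# latent factors (an "environment" and an "animal" factor, loosely).
n = 500
latent = rng.standard_normal((n, 2))
loadings = np.array([[1.0, 0.1],
                     [0.9, 0.2],
                     [0.1, 1.0],
                     [0.2, 0.8]])
X = latent @ loadings.T + 0.05 * rng.standard_normal((n, 4))

# PCA via SVD of the centered data matrix.
Xc = X - X.mean(axis=0)
_, s, _ = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)          # variance share per component
two_component_share = explained[:2].sum()
```

Because the synthetic data have exactly two latent drivers, the first two components dominate, which is the situation the abstract describes.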
The Role of Probability-Based Inference in an Intelligent Tutoring System.
ERIC Educational Resources Information Center
Mislevy, Robert J.; Gitomer, Drew H.
Probability-based inference in complex networks of interdependent variables is an active topic in statistical research, spurred by such diverse applications as forecasting, pedigree analysis, troubleshooting, and medical diagnosis. This paper concerns the role of Bayesian inference networks for updating student models in intelligent tutoring…
ERIC Educational Resources Information Center
Montgomery, Catherine
2016-01-01
Transnational partnerships between universities can illustrate the changing political, social, and cultural terrain of global higher education. Drawing on secondary data analysis of government educational statistics, university web pages, and a comprehensive literature review, this article focuses on transnational partnerships with particular…
Joint QTL linkage mapping for multiple-cross mating design sharing one common parent
USDA-ARS?s Scientific Manuscript database
Nested association mapping (NAM) is a novel genetic mating design that combines the advantages of linkage analysis and association mapping. This design provides opportunities to study the inheritance of complex traits, but also requires more advanced statistical methods. In this paper, we present th...
Averaging Models: Parameters Estimation with the R-Average Procedure
ERIC Educational Resources Information Center
Vidotto, G.; Massidda, D.; Noventa, S.
2010-01-01
The Functional Measurement approach, proposed within the theoretical framework of Information Integration Theory (Anderson, 1981, 1982), can be a useful multi-attribute analysis tool. Compared to the majority of statistical models, the averaging model can account for interaction effects without adding complexity. The R-Average method (Vidotto &…
Engaging Business Students in Quantitative Skills Development
ERIC Educational Resources Information Center
Cronin, Anthony; Carroll, Paula
2015-01-01
In this paper the complex problems of developing quantitative and analytical skills in undergraduate first year, first semester business students are addressed. An action research project, detailing how first year business students perceive the relevance of data analysis and inferential statistics in light of the economic downturn and the…
Derivative Free Optimization of Complex Systems with the Use of Statistical Machine Learning Models
2015-09-12
AFRL-AFOSR-VA-TR-2015-0278. Derivative Free Optimization of Complex Systems with the Use of Statistical Machine Learning Models. Katya Scheinberg. Grant FA9550-11-1-0239. Subject terms: optimization, derivative-free optimization, statistical machine learning.
A simulator for evaluating methods for the detection of lesion-deficit associations
NASA Technical Reports Server (NTRS)
Megalooikonomou, V.; Davatzikos, C.; Herskovits, E. H.
2000-01-01
Although much has been learned about the functional organization of the human brain through lesion-deficit analysis, the variety of statistical and image-processing methods developed for this purpose precludes a closed-form analysis of the statistical power of these systems. Therefore, we developed a lesion-deficit simulator (LDS), which generates artificial subjects, each of which consists of a set of functional deficits, and a brain image with lesions; the deficits and lesions conform to predefined distributions. We used probability distributions to model the number, sizes, and spatial distribution of lesions, to model the structure-function associations, and to model registration error. We used the LDS to evaluate, as examples, the effects of the complexities and strengths of lesion-deficit associations, and of registration error, on the power of lesion-deficit analysis. We measured the numbers of recovered associations from these simulated data, as a function of the number of subjects analyzed, the strengths and number of associations in the statistical model, the number of structures associated with a particular function, and the prior probabilities of structures being abnormal. The number of subjects required to recover the simulated lesion-deficit associations was found to have an inverse relationship to the strength of associations, and to the smallest probability in the structure-function model. The number of structures associated with a particular function (i.e., the complexity of associations) had a much greater effect on the performance of the analysis method than did the total number of associations. We also found that registration error of 5 mm or less reduces the number of associations discovered by approximately 13% compared to perfect registration. The LDS provides a flexible framework for evaluating many aspects of lesion-deficit analysis.
Functional Regression Models for Epistasis Analysis of Multiple Quantitative Traits.
Zhang, Futao; Xie, Dan; Liang, Meimei; Xiong, Momiao
2016-04-01
To date, most genetic analyses of phenotypes have focused on analyzing single traits or analyzing each phenotype independently. However, joint epistasis analysis of multiple complementary traits will increase statistical power and improve our understanding of the complicated genetic structure of complex diseases. Despite their importance in uncovering the genetic structure of complex traits, the statistical methods for identifying epistasis in multiple phenotypes remain fundamentally unexplored. To fill this gap, we formulate a test for interaction between two genes in multiple quantitative trait analysis as a multiple functional regression (MFRG) in which the genotype functions (genetic variant profiles) are defined as a function of the genomic position of the genetic variants. We use large-scale simulations to calculate Type I error rates for testing interaction between two genes with multiple phenotypes and to compare the power with multivariate pairwise interaction analysis and with single-trait interaction analysis by a univariate functional regression model. To further evaluate performance, the MFRG for epistasis analysis is applied to five phenotypes of exome sequence data from the NHLBI's Exome Sequencing Project (ESP) to detect pleiotropic epistasis. A total of 267 pairs of genes that formed a genetic interaction network showed significant evidence of epistasis influencing five traits. The results demonstrate that the joint interaction analysis of multiple phenotypes has much higher power to detect interaction than the interaction analysis of a single trait and may open a new direction toward fully uncovering the genetic structure of multiple phenotypes.
Systems Engineering Metrics: Organizational Complexity and Product Quality Modeling
NASA Technical Reports Server (NTRS)
Mog, Robert A.
1997-01-01
Innovative organizational complexity and product quality models applicable to performance metrics for NASA-MSFC's Systems Analysis and Integration Laboratory (SAIL) missions and objectives are presented. An intensive research effort focuses on the synergistic combination of stochastic process modeling, nodal and spatial decomposition techniques, organizational and computational complexity, systems science and metrics, chaos, and proprietary statistical tools for accelerated risk assessment. This is followed by the development of a preliminary model, which is uniquely applicable and robust for quantitative purposes. Exercise of the preliminary model using a generic system hierarchy and the AXAF-I architectural hierarchy is provided. The Kendall test for positive dependence provides an initial verification and validation of the model. Finally, the research and development of the innovation is revisited, prior to peer review. This research and development effort results in near-term, measurable SAIL organizational and product quality methodologies, enhanced organizational risk assessment and evolutionary modeling results, and improved statistical quantification of SAIL productivity interests.
Navigating complex sample analysis using national survey data.
Saylor, Jennifer; Friedmann, Erika; Lee, Hyeon Joo
2012-01-01
The National Center for Health Statistics conducts the National Health and Nutrition Examination Survey and other national surveys with probability-based complex sample designs. Goals of national surveys are to provide valid data for the population of the United States. Analyses of data from population surveys present unique challenges in the research process but are valuable avenues to study the health of the United States population. The aim of this study was to demonstrate the importance of using complex data analysis techniques for data obtained with a complex multistage sampling design and to provide an example of analysis using the SPSS Complex Samples procedure. Challenges and solutions specific to secondary data analysis of national databases are illustrated using the National Health and Nutrition Examination Survey as the exemplar. Oversampling of small or sensitive groups provides necessary estimates of variability within small groups. Use of weights without complex samples accurately estimates population means and frequencies from the sample after accounting for over- or undersampling of specific groups. Weighting alone leads to inappropriate population estimates of variability, because they are computed as if the measures were from the entire population rather than from a sample in the data set. The SPSS Complex Samples procedure allows inclusion of all sampling design elements: stratification, clusters, and weights. Use of national data sets allows use of extensive, expensive, and well-documented survey data for exploratory questions but limits analysis to those variables included in the data set. The large sample permits examination of multiple predictors and interactive relationships. Merging data files, the availability of data in several waves of surveys, and complex sampling are techniques used to provide a representative sample but present unique challenges. Sophisticated data analysis techniques optimize the use of these data.
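The point about weights can be illustrated with a toy oversampling example: design weights recover the population mean that the raw sample mean misses. The group sizes and means below are invented for illustration, and this sketch deliberately omits the design-based variance estimation that the SPSS Complex Samples procedure handles:

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy population: 90% group A (mean 10) and 10% group B (mean 20).
pop_a = rng.normal(10.0, 2.0, 90_000)
pop_b = rng.normal(20.0, 2.0, 10_000)
pop_mean = np.concatenate([pop_a, pop_b]).mean()

# Oversample the small group: equal sample sizes from unequal groups.
sample_a = rng.choice(pop_a, 500)
sample_b = rng.choice(pop_b, 500)
sample = np.concatenate([sample_a, sample_b])
# Design weight = population units represented by each sampled unit.
weights = np.concatenate([np.full(500, 90_000 / 500),
                          np.full(500, 10_000 / 500)])

naive_mean = sample.mean()                       # pulled toward group B
weighted_mean = np.average(sample, weights=weights)
```

As the abstract warns, applying the same weights inside a standard variance formula would still be wrong; stratification and clustering must also enter the variance calculation.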
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gilbert, Richard O.
The application of statistics to environmental pollution monitoring studies requires a knowledge of statistical analysis methods particularly well suited to pollution data. This book fills that need by providing sampling plans, statistical tests, parameter estimation procedures, and references to pertinent publications. Most of the statistical techniques are relatively simple, and examples, exercises, and case studies are provided to illustrate procedures. The book is logically divided into three parts. Chapters 1, 2, and 3 are introductory chapters. Chapters 4 through 10 discuss field sampling designs and Chapters 11 through 18 deal with a broad range of statistical analysis procedures. Some statistical techniques given here are not commonly seen in statistics books. For example, see methods for handling correlated data (Sections 4.5 and 11.12), for detecting hot spots (Chapter 10), and for estimating a confidence interval for the mean of a lognormal distribution (Section 13.2). Also, Appendix B lists a computer code that estimates and tests for trends over time at one or more monitoring stations using nonparametric methods (Chapters 16 and 17). Unfortunately, some important topics could not be included because of their complexity and the need to limit the length of the book. For example, only brief mention could be made of time series analysis using Box-Jenkins methods and of kriging techniques for estimating spatial and spatial-time patterns of pollution, although multiple references on these topics are provided. Also, no discussion of methods for assessing risks from environmental pollution could be included.
Jonsen, Ian D; Myers, Ransom A; James, Michael C
2006-09-01
1. Biological and statistical complexity are features common to most ecological data that hinder our ability to extract meaningful patterns using conventional tools. Recent work on implementing modern statistical methods for analysis of such ecological data has focused primarily on population dynamics but other types of data, such as animal movement pathways obtained from satellite telemetry, can also benefit from the application of modern statistical tools. 2. We develop a robust hierarchical state-space approach for analysis of multiple satellite telemetry pathways obtained via the Argos system. State-space models are time-series methods that allow unobserved states and biological parameters to be estimated from data observed with error. We show that the approach can reveal important patterns in complex, noisy data where conventional methods cannot. 3. Using the largest Atlantic satellite telemetry data set for critically endangered leatherback turtles, we show that the diel pattern in travel rates of these turtles changes over different phases of their migratory cycle. While foraging in northern waters the turtles show similar travel rates during day and night, but on their southward migration to tropical waters travel rates are markedly faster during the day. These patterns are generally consistent with diving data, and may be related to changes in foraging behaviour. Interestingly, individuals that migrate southward to breed generally show higher daytime travel rates than individuals that migrate southward in a non-breeding year. 4. Our approach is extremely flexible and can be applied to many ecological analyses that use complex, sequential data.
An online sleep apnea detection method based on recurrence quantification analysis.
Nguyen, Hoa Dinh; Wilkins, Brek A; Cheng, Qi; Benjamin, Bruce Allen
2014-07-01
This paper introduces an online sleep apnea detection method based on heart rate complexity as measured by recurrence quantification analysis (RQA) statistics of heart rate variability (HRV) data. RQA statistics can capture the nonlinear dynamics of a complex cardiorespiratory system during obstructive sleep apnea. In order to obtain a more robust measurement of the nonstationarity of the cardiorespiratory system, we use several values of the fixed-amount-of-neighbors threshold for the recurrence plot calculation. We integrate a feature selection algorithm based on conditional mutual information to select the most informative RQA features for classification and, hence, to speed up the real-time classification process without degrading the performance of the system. Two types of binary classifiers, i.e., support vector machines and neural networks, are used to differentiate apnea from normal sleep. A soft decision fusion rule is developed to combine the results of these classifiers in order to improve the classification performance of the whole system. Experimental results show that our proposed method achieves better classification results compared with the previous recurrence analysis-based approach. We also show that our method is flexible and a strong candidate for a truly efficient sleep apnea detection system.
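A minimal sketch of a recurrence plot built with a fixed amount of nearest neighbors, the thresholding rule mentioned above; the sine series stands in for an RR-interval sequence, and the function names are mine, not the authors':

```python
import numpy as np

def recurrence_matrix_fan(x, k):
    """Recurrence plot using a fixed amount of nearest neighbors: point j
    is marked as a recurrence of point i if it is among the k nearest
    neighbors of i (self-matches excluded)."""
    d = np.abs(x[:, None] - x[None, :])   # distance matrix of a scalar series
    np.fill_diagonal(d, np.inf)           # exclude self-matches
    order = np.argsort(d, axis=1)         # neighbors of i sorted by distance
    R = np.zeros(d.shape, dtype=bool)
    R[np.arange(len(x))[:, None], order[:, :k]] = True
    return R

# A sine series stands in for an RR-interval sequence.
x = np.sin(np.linspace(0.0, 8.0 * np.pi, 200))
R = recurrence_matrix_fan(x, k=10)
rr = R.mean()    # recurrence rate; exactly k / N by construction
```

The appeal of this thresholding rule is visible here: the recurrence rate is fixed at k/N regardless of the signal's amplitude, which makes RQA statistics comparable across nonstationary segments.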
Seeking a fingerprint: analysis of point processes in actigraphy recording
NASA Astrophysics Data System (ADS)
Gudowska-Nowak, Ewa; Ochab, Jeremi K.; Oleś, Katarzyna; Beldzik, Ewa; Chialvo, Dante R.; Domagalik, Aleksandra; Fąfrowicz, Magdalena; Marek, Tadeusz; Nowak, Maciej A.; Ogińska, Halszka; Szwed, Jerzy; Tyburczyk, Jacek
2016-05-01
Motor activity of humans displays complex temporal fluctuations which can be characterised by scale-invariant statistics, thus demonstrating that structure and fluctuations of such kinetics remain similar over a broad range of time scales. Previous studies on humans regularly deprived of sleep or suffering from sleep disorders predicted a change in the invariant scale parameters with respect to those for healthy subjects. In this study we investigate the signal patterns from actigraphy recordings by means of characteristic measures of fractional point processes. We analyse spontaneous locomotor activity of healthy individuals recorded during a week of regular sleep and a week of chronic partial sleep deprivation. Behavioural symptoms of lack of sleep can be evaluated by analysing statistics of duration times during active and resting states, and alteration of behavioural organisation can be assessed by analysis of power laws detected in the event count distribution, distribution of waiting times between consecutive movements and detrended fluctuation analysis of recorded time series. We claim that among different measures characterising complexity of the actigraphy recordings and their variations implied by chronic sleep distress, the exponents characterising slopes of survival functions in resting states are the most effective biomarkers distinguishing between healthy and sleep-deprived groups.
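The survival-function exponents used as biomarkers above can be estimated in several ways; one standard choice (an assumption here, not necessarily the authors' procedure) is the Hill maximum-likelihood estimator for a power-law tail, sketched on deterministic synthetic durations rather than actigraphy data:

```python
import numpy as np

def hill_exponent(durations, tmin=1.0):
    """Maximum-likelihood (Hill) estimate of alpha for a power-law
    survival function S(t) = (t / tmin) ** -alpha, for t >= tmin."""
    t = np.asarray(durations, dtype=float)
    t = t[t >= tmin]
    return t.size / np.sum(np.log(t / tmin))

# Deterministic synthetic durations via the inverse CDF on an even
# probability grid (illustrative, not actigraphy data).
alpha_true = 1.5
u = (np.arange(1, 2001) - 0.5) / 2000.0
durations = (1.0 - u) ** (-1.0 / alpha_true)
alpha_hat = hill_exponent(durations)
```

Unlike a least-squares fit of the log-log survival curve, the Hill estimator is not biased by the heavy leverage of the few extreme tail points.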
Spectroscopic studies on Solvatochromism of mixed-chelate copper(II) complexes using MLR technique
NASA Astrophysics Data System (ADS)
Golchoubian, Hamid; Moayyedi, Golasa; Fazilati, Hakimeh
2012-01-01
Mixed-chelate copper(II) complexes with the general formula [Cu(acac)(diamine)]X, where acac = acetylacetonate ion, diamine = N,N-dimethyl-N'-benzyl-1,2-diaminoethane, and X = BPh4-, PF6-, ClO4-, or BF4-, have been prepared. The complexes were characterized on the basis of elemental analysis, molar conductance, and UV-vis and IR spectroscopies. The complexes are solvatochromic, and their solvatochromism was investigated by visible spectroscopy. All complexes demonstrated positive solvatochromism, and among them [Cu(acac)(diamine)]BPh4·H2O showed the highest Δνmax value. To explore the mechanism of interaction between solvent molecules and the complexes, different solvent parameters such as DN, AN, α and β were employed in a multiple linear regression (MLR) method. The statistical results suggested that the DN parameter of the solvent makes the dominant contribution to the shift of the d-d absorption band of the complexes.
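The MLR step can be sketched as an ordinary least-squares fit of the band maximum against solvent parameters; the DN/AN values and regression coefficients below are invented for illustration and are not the paper's data:

```python
import numpy as np

# Hypothetical donor numbers (DN) and acceptor numbers (AN) for four
# solvents, with band maxima generated from an assumed linear model
# nu_max = b0 + b_dn * DN + b_an * AN (all values invented).
DN = np.array([14.1, 20.0, 26.6, 38.8])
AN = np.array([18.9, 12.5, 19.3, 16.0])
b0, b_dn, b_an = 17_000.0, -25.0, 4.0
nu_max = b0 + b_dn * DN + b_an * AN       # band maximum, cm^-1

# Multiple linear regression by ordinary least squares.
X = np.column_stack([np.ones_like(DN), DN, AN])
coef, *_ = np.linalg.lstsq(X, nu_max, rcond=None)   # recovers b0, b_dn, b_an
```

The relative magnitudes of the fitted coefficients are what support a statement like "DN makes the dominant contribution"; with real data one would also inspect their standard errors.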
O'Brien, M.A.; Costin, B.N.; Miles, M.F.
2014-01-01
Postgenomic studies of the function of genes and their role in disease have now become an area of intense study since efforts to define the raw sequence material of the genome have largely been completed. The use of whole-genome approaches such as microarray expression profiling and, more recently, RNA-sequence analysis of transcript abundance has allowed an unprecedented look at the workings of the genome. However, the accurate derivation of such high-throughput data and their analysis in terms of biological function has been critical to truly leveraging the postgenomic revolution. This chapter will describe an approach that focuses on the use of gene networks to both organize and interpret genomic expression data. Such networks, derived from statistical analysis of large genomic datasets and the application of multiple bioinformatics data resources, potentially allow the identification of key control elements for networks associated with human disease, and thus may lead to derivation of novel therapeutic approaches. However, as discussed in this chapter, the leveraging of such networks cannot occur without a thorough understanding of the technical and statistical factors influencing the derivation of genomic expression data. Thus, while the catch phrase may be "it's the network … stupid," an understanding of factors extending from RNA isolation to genomic profiling techniques, multivariate statistics, and bioinformatics is critical to defining fully useful gene networks for the study of complex biology. PMID:23195313
DOE Office of Scientific and Technical Information (OSTI.GOV)
Halligan, Matthew
Radiated power calculation approaches for practical scenarios of incomplete high-density interface characterization information and incomplete incident power information are presented. The suggested approaches build upon a method that characterizes power losses through the definition of power loss constant matrices. Potential radiated power estimates include using total power loss information, partial radiated power loss information, worst-case analysis, and statistical bounding analysis. A method is also proposed to calculate radiated power when incident power information is not fully known for non-periodic signals at the interface. Incident data signals are modeled from a two-state Markov chain from which bit state probabilities are derived. The total spectrum for windowed signals is postulated as the superposition of spectra from individual pulses in a data sequence. Statistical bounding methods are proposed as a basis for the radiated power calculation due to the statistical complexity of finding a radiated power probability density function.
Predicting Physical Interactions between Protein Complexes*
Clancy, Trevor; Rødland, Einar Andreas; Nygard, Ståle; Hovig, Eivind
2013-01-01
Protein complexes enact most biochemical functions in the cell. Dynamic interactions between protein complexes are frequent in many cellular processes. As they are often of a transient nature, they may be difficult to detect using current genome-wide screens. Here, we describe a method to computationally predict physical interactions between protein complexes, applied to both humans and yeast. We integrated manually curated protein complexes and physical protein interaction networks, and we designed a statistical method to identify pairs of protein complexes where the number of protein interactions between a complex pair is due to an actual physical interaction between the complexes. An evaluation against manually curated physical complex-complex interactions in yeast revealed that 50% of these interactions could be predicted in this manner. A community network analysis of the highest scoring pairs revealed a biologically sensible organization of physical complex-complex interactions in the cell. Such analyses of proteomes may serve as a guide to the discovery of novel functional cellular relationships. PMID:23438732
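A statistical test of whether the interaction count between a complex pair exceeds chance can be sketched with a one-sided hypergeometric test; this particular test and the toy numbers are my assumptions, not necessarily the authors' exact formulation:

```python
from math import comb

def hypergeom_sf(x, M, n, N):
    """P(X >= x) for X ~ Hypergeometric(M, n, N): draw N protein pairs
    from M possible pairs, of which n are interacting in the network."""
    total = comb(M, N)
    return sum(comb(n, k) * comb(M - n, N - k)
               for k in range(x, min(n, N) + 1)) / total

# Toy numbers (invented): 1000 possible protein pairs, 40 of them
# interacting network-wide, 50 pairs spanning the two complexes, and
# 6 cross-complex interactions observed.
p = hypergeom_sf(6, M=1000, n=40, N=50)
```

A small p-value indicates more cross-complex interactions than expected by chance, the kind of signal the authors use to call a physical complex-complex interaction.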
Schmidt, Paul; Schmid, Volker J; Gaser, Christian; Buck, Dorothea; Bührlen, Susanne; Förschler, Annette; Mühlau, Mark
2013-01-01
Aiming at iron-related T2-hypointensity, which is related to normal aging and neurodegenerative processes, we here present two practicable approaches, based on Bayesian inference, for preprocessing and statistical analysis of a complex set of structural MRI data. In particular, Markov Chain Monte Carlo methods were used to simulate posterior distributions. First, we rendered a segmentation algorithm that uses outlier detection based on model checking techniques within a Bayesian mixture model. Second, we rendered an analytical tool comprising a Bayesian regression model with smoothness priors (in the form of Gaussian Markov random fields) mitigating the necessity to smooth data prior to statistical analysis. For validation, we used simulated data and MRI data of 27 healthy controls (age: [Formula: see text]; range, [Formula: see text]). We first observed robust segmentation of both simulated T2-hypointensities and gray-matter regions known to be T2-hypointense. Second, simulated data and images of segmented T2-hypointensity were analyzed. We found not only robust identification of simulated effects but also a biologically plausible age-related increase of T2-hypointensity primarily within the dentate nucleus but also within the globus pallidus, substantia nigra, and red nucleus. Our results indicate that fully Bayesian inference can successfully be applied for preprocessing and statistical analysis of structural MRI data.
De Spiegelaere, Ward; Malatinkova, Eva; Lynch, Lindsay; Van Nieuwerburgh, Filip; Messiaen, Peter; O'Doherty, Una; Vandekerckhove, Linos
2014-06-01
Quantification of integrated proviral HIV DNA by repetitive-sampling Alu-HIV PCR is a candidate virological tool to monitor the HIV reservoir in patients. However, the experimental procedures and data analysis of the assay are complex and hinder its widespread use. Here, we provide an improved and simplified data analysis method by adopting binomial and Poisson statistics. A modified analysis method on the basis of Poisson statistics was used to analyze the binomial data of positive and negative reactions from a 42-replicate Alu-HIV PCR by use of dilutions of an integration standard and on samples of 57 HIV-infected patients. Results were compared with the quantitative output of the previously described Alu-HIV PCR method. Poisson-based quantification of the Alu-HIV PCR was linearly correlated with the standard dilution series, indicating that absolute quantification with the Poisson method is a valid alternative for data analysis of repetitive-sampling Alu-HIV PCR data. Quantitative outputs of patient samples assessed by the Poisson method correlated with the previously described Alu-HIV PCR analysis, indicating that this method is a valid alternative for quantifying integrated HIV DNA. Poisson-based analysis of the Alu-HIV PCR data enables absolute quantification without the need of a standard dilution curve. Implementation of the CI estimation permits improved qualitative analysis of the data and provides a statistical basis for the required minimal number of technical replicates. © 2014 The American Association for Clinical Chemistry.
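The Poisson analysis of the binomial replicate data reduces to a one-line estimator: if a fraction f of the replicate reactions is negative, the mean number of amplifiable templates per reaction is -ln f. A minimal sketch (the function name is mine):

```python
from math import log

def poisson_estimate(n_negative, n_replicates):
    """Mean template copies per reaction, assuming template molecules are
    Poisson distributed across replicate reactions:
    P(reaction negative) = exp(-lam), so lam = -ln(n_negative / n_replicates)."""
    if n_negative == 0:
        raise ValueError("all reactions positive: the estimate is unbounded")
    return -log(n_negative / n_replicates)

# A 42-replicate run (the replicate count used in the assay) in which
# 21 reactions were negative:
lam = poisson_estimate(21, 42)   # ln 2, about 0.693 templates per reaction
```

Because the estimate depends only on the negative count, a binomial confidence interval on that count translates directly into a confidence interval on the copy number, which is the basis for the CI estimation the abstract mentions.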
Effects of complex aural stimuli on mental performance.
Vij, Mohit; Aghazadeh, Fereydoun; Ray, Thomas G; Hatipkarasulu, Selen
2003-06-01
The objective of this study is to investigate the effect of complex aural stimuli on mental performance. A series of experiments was designed to obtain data for two different analyses. The first analysis is a "Stimulus" versus "No-stimulus" comparison for each of the four dependent variables, i.e. quantitative ability, reasoning ability, spatial ability and memory of an individual, by comparing the control treatment with the rest of the treatments. The second is a multivariate analysis of variance for component-level main effects and interactions. The two component factors are tempo of the complex aural stimuli and sound volume level, each administered at three discrete levels for all four dependent variables. Ten experiments were conducted on eleven subjects. It was found that complex aural stimuli influence the quantitative and spatial aspects of the mind, while reasoning ability was unaffected by the stimuli. Although memory showed a trend to be worse in the presence of complex aural stimuli, the effect was not statistically significant. Variation in tempo and sound volume level of an aural stimulus did not significantly affect the mental performance of an individual. The results of these experiments can be effectively used in designing work environments.
Nanda, Anil; Sonig, Ashish; Banerjee, Anirban Deep; Javalkar, Vijay Kumar
2014-01-01
Basilar artery apex aneurysms continue to generate technical challenges and management controversy. Endovascular intervention is becoming the mainstay in the management of these formidable aneurysms, but it has limitations, especially with large/giant or wide-neck basilar apex aneurysms. There is a paucity of data in the available literature pertaining to the successful management of large/giant, wide-neck, and calcified/thrombosed basilar apex aneurysms. We present our experience with consecutively operated complex basilar apex aneurysms to demonstrate the role of microneurosurgery as a viable management option for these aneurysms. Ours is a retrospective analysis of case records for operated cases of basilar artery aneurysms spanning 18 years. Basilar apex aneurysms >10 mm, calcified or thrombosed, with neck ≥4 mm, posterior direction, and retro/subsellar location were considered as complex anatomy aneurysms. Basilar apex aneurysms with favorable anatomy were included in the study as a reference group for statistical analysis. Patient demographics, complex features of aneurysms, clinical grade, and outcomes were analyzed. A total of 33 (53.2%) patients had complex anatomy: large (>10 mm) in eight (24.2%); giant aneurysms (>25 mm) in seven (21.2%); wide-neck in 22 (66.7%); and calcified/thrombosed morphology in five (15.1%). The mean age was 48.5 years, and 22 (66.67%) were women. All aneurysms were clipped by the use of various skull base approaches. A total of 71.9% of patients harboring complex aneurysms had good outcomes. If only unruptured and good-grade complex aneurysms are considered, then 86.9% (n = 20) of patients had good outcomes. Statistically there was no significant difference in the outcomes of complex and noncomplex aneurysms. Although challenging, the management of large/giant, wide-neck, and calcified/thrombosed aneurysms with microneurosurgery is still a competitive alternative to endovascular therapy. After careful selection of appropriate skull base approaches based on the complexity of the basilar apex aneurysm, microneurosurgery can achieve acceptable results. Copyright © 2014 Elsevier Inc. All rights reserved.
The power to detect linkage in complex disease by means of simple LOD-score analyses.
Greenberg, D A; Abreu, P; Hodge, S E
1998-01-01
Maximum-likelihood analysis (via LOD score) provides the most powerful method for finding linkage when the mode of inheritance (MOI) is known. However, because one must assume an MOI, the application of LOD-score analysis to complex disease has been questioned. Although it is known that one can legitimately maximize the maximum LOD score with respect to genetic parameters, this approach raises three concerns: (1) multiple testing, (2) effect on power to detect linkage, and (3) adequacy of the approximate MOI for the true MOI. We evaluated the power of LOD scores to detect linkage when the true MOI was complex but a LOD score analysis assumed simple models. We simulated data from 14 different genetic models, including dominant and recessive at high (80%) and low (20%) penetrances, intermediate models, and several additive two-locus models. We calculated LOD scores by assuming two simple models, dominant and recessive, each with 50% penetrance, then took the higher of the two LOD scores as the raw test statistic and corrected for multiple tests. We call this test statistic "MMLS-C." We found that the ELODs for MMLS-C are >=80% of the ELOD under the true model when the ELOD for the true model is >=3. Similarly, the power to reach a given LOD score was usually >=80% that of the true model, when the power under the true model was >=60%. These results underscore that a critical factor in LOD-score analysis is the MOI at the linked locus, not that of the disease or trait per se. Thus, a limited set of simple genetic models in LOD-score analysis can work well in testing for linkage. PMID:9718328
Ionescu, Crina-Maria; Sehnal, David; Falginella, Francesco L; Pant, Purbaj; Pravda, Lukáš; Bouchal, Tomáš; Svobodová Vařeková, Radka; Geidl, Stanislav; Koča, Jaroslav
2015-01-01
Partial atomic charges are a well-established concept, useful in understanding and modeling the chemical behavior of molecules, from simple compounds, to large biomolecular complexes with many reactive sites. This paper introduces AtomicChargeCalculator (ACC), a web-based application for the calculation and analysis of atomic charges which respond to changes in molecular conformation and chemical environment. ACC relies on an empirical method to rapidly compute atomic charges with accuracy comparable to quantum mechanical approaches. Due to its efficient implementation, ACC can handle any type of molecular system, regardless of size and chemical complexity, from drug-like molecules to biomacromolecular complexes with hundreds of thousands of atoms. ACC writes out atomic charges into common molecular structure files, and offers interactive facilities for statistical analysis and comparison of the results, in both tabular and graphical form. Due to high customizability and speed, easy streamlining and the unified platform for calculation and analysis, ACC caters to all fields of life sciences, from drug design to nanocarriers. ACC is freely available via the Internet at http://ncbr.muni.cz/ACC.
Data exploration, quality control and statistical analysis of ChIP-exo/nexus experiments
Welch, Rene; Chung, Dongjun; Grass, Jeffrey; Landick, Robert; Keles, Sündüz
2017-01-01
ChIP-exo/nexus experiments rely on innovative modifications of the commonly used ChIP-seq protocol for high resolution mapping of transcription factor binding sites. Although many aspects of the ChIP-exo data analysis are similar to those of ChIP-seq, these high throughput experiments pose a number of unique quality control and analysis challenges. We develop a novel statistical quality control pipeline and accompanying R/Bioconductor package, ChIPexoQual, to enable exploration and analysis of ChIP-exo and related experiments. ChIPexoQual evaluates a number of key issues including strand imbalance, library complexity, and signal enrichment of data. Assessment of these features is facilitated through diagnostic plots and summary statistics computed over regions of the genome with varying levels of coverage. We evaluated our QC pipeline with both large collections of public ChIP-exo/nexus data and multiple, new ChIP-exo datasets from Escherichia coli. ChIPexoQual analysis of these datasets resulted in guidelines for using these QC metrics across a wide range of sequencing depths and provided further insights for modelling ChIP-exo data. PMID:28911122
Moretti, Stefano; van Leeuwen, Danitsja; Gmuender, Hans; Bonassi, Stefano; van Delft, Joost; Kleinjans, Jos; Patrone, Fioravante; Merlo, Domenico Franco
2008-01-01
Background: In gene expression analysis, statistical tests for differential gene expression provide lists of candidate genes having, individually, a sufficiently low p-value. However, the interpretation of each single p-value within complex systems involving several interacting genes is problematic. In parallel, in the last sixty years, game theory has been applied to political and social problems to assess the power of interacting agents in forcing a decision and, more recently, to represent the relevance of genes in response to certain conditions. Results: In this paper we introduce a Bootstrap procedure to test the null hypothesis that each gene has the same relevance between two conditions, where the relevance is represented by the Shapley value of a particular coalitional game defined on a microarray data-set. This method, which is called Comparative Analysis of Shapley value (shortly, CASh), is applied to data concerning the gene expression in children differentially exposed to air pollution. The results provided by CASh are compared with the results from a parametric statistical test for testing differential gene expression. Both lists of genes provided by CASh and t-test are informative enough to discriminate exposed subjects on the basis of their gene expression profiles. While many genes are selected in common by CASh and the parametric test, it turns out that the biological interpretation of the differences between these two selections is more interesting, suggesting a different interpretation of the main biological pathways in gene expression regulation for exposed individuals. A simulation study suggests that CASh offers more power than t-test for the detection of differential gene expression variability. Conclusion: CASh is successfully applied to gene expression analysis of a data-set where the joint expression behavior of genes may be critical to characterize the expression response to air pollution. We demonstrate a synergistic effect between coalitional games and statistics that resulted in a selection of genes with a potential impact in the regulation of complex pathways. PMID:18764936
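The Shapley value at the heart of CASh can be computed exactly for small coalitional games by averaging marginal contributions over all player orderings. The toy "majority" game below is my own illustration, not the microarray game the paper defines (CASh layers a bootstrap on top of game-specific computations).

```python
import itertools
import math

def shapley_values(players, v):
    """Exact Shapley values: average each player's marginal contribution
    to v over all orderings (tractable only for small games)."""
    phi = {p: 0.0 for p in players}
    for order in itertools.permutations(players):
        coalition = set()
        for p in order:
            before = v(frozenset(coalition))
            coalition.add(p)
            phi[p] += v(frozenset(coalition)) - before
    n_fact = math.factorial(len(players))
    return {p: phi[p] / n_fact for p in players}

# toy game on three genes: a coalition is "relevant" iff it has >= 2 members
vals = shapley_values(["g1", "g2", "g3"], lambda s: 1.0 if len(s) >= 2 else 0.0)
```

By symmetry each gene receives 1/3, and the values sum to the grand coalition's worth (the efficiency property).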
Discrimination of complex mixtures by a colorimetric sensor array: coffee aromas.
Suslick, Benjamin A; Feng, Liang; Suslick, Kenneth S
2010-03-01
The analysis of complex mixtures presents a difficult challenge even for modern analytical techniques, and the ability to discriminate among closely similar such mixtures often remains problematic. Coffee provides a readily available archetype of such highly multicomponent systems. The use of a low-cost, sensitive colorimetric sensor array for the detection and identification of coffee aromas is reported. The color changes of the sensor array were used as a digital representation of the array response and analyzed with standard statistical methods, including principal component analysis (PCA) and hierarchical clustering analysis (HCA). PCA revealed that the sensor array has exceptionally high dimensionality with 18 dimensions required to define 90% of the total variance. In quintuplicate runs of 10 commercial coffees and controls, no confusions or errors in classification by HCA were observed in 55 trials. In addition, the effects of temperature and time in the roasting of green coffee beans were readily observed and distinguishable with a resolution better than 10 degrees C and 5 min, respectively. Colorimetric sensor arrays demonstrate excellent potential for complex systems analysis in real-world applications and provide a novel method for discrimination among closely similar complex mixtures.
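The hierarchical clustering of array-response vectors used above can be sketched with a naive single-linkage agglomeration. The study relied on standard statistical software; this stdlib-only toy, with made-up "aroma" vectors, only illustrates the merge logic.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def single_linkage(points):
    """Naive agglomerative clustering: repeatedly merge the two clusters
    with the smallest minimum inter-point distance, recording each merge."""
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    while len(clusters) > 1:
        best = None
        for i in clusters:
            for j in clusters:
                if i < j:
                    d = min(euclidean(points[a], points[b])
                            for a in clusters[i] for b in clusters[j])
                    if best is None or d < best[0]:
                        best = (d, i, j)
        d, i, j = best
        merges.append((i, j, d))
        clusters[i] = clusters[i] + clusters.pop(j)
    return merges

# four hypothetical color-change vectors: two replicate pairs, well separated
merges = single_linkage([(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.5, 5.0)])
```

Replicate measurements of the same sample merge first, which is exactly the behavior that makes HCA useful for checking classification of closely similar mixtures.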
Discrimination of Complex Mixtures by a Colorimetric Sensor Array: Coffee Aromas
Suslick, Benjamin A.; Feng, Liang; Suslick, Kenneth S.
2010-01-01
The analysis of complex mixtures presents a difficult challenge even for modern analytical techniques, and the ability to discriminate among closely similar such mixtures often remains problematic. Coffee provides a readily available archetype of such highly multicomponent systems. The use of a low-cost, sensitive colorimetric sensor array for the detection and identification of coffee aromas is reported. The color changes of the sensor array were used as a digital representation of the array response and analyzed with standard statistical methods, including principal component analysis (PCA) and hierarchical clustering analysis (HCA). PCA revealed that the sensor array has exceptionally high dimensionality with 18 dimensions required to define 90% of the total variance. In quintuplicate runs of 10 commercial coffees and controls, no confusions or errors in classification by HCA were observed in 55 trials. In addition, the effects of temperature and time in the roasting of green coffee beans were readily observed and distinguishable with a resolution better than 10 °C and 5 min, respectively. Colorimetric sensor arrays demonstrate excellent potential for complex systems analysis in real-world applications and provide a novel method for discrimination among closely similar complex mixtures. PMID:20143838
Statistical Hypothesis Testing in Intraspecific Phylogeography: NCPA versus ABC
Templeton, Alan R.
2009-01-01
Nested clade phylogeographic analysis (NCPA) and approximate Bayesian computation (ABC) have been used to test phylogeographic hypotheses. Multilocus NCPA tests null hypotheses, whereas ABC discriminates among a finite set of alternatives. The interpretive criteria of NCPA are explicit and allow complex models to be built from simple components. The interpretive criteria of ABC are ad hoc and require the specification of a complete phylogeographic model. The conclusions from ABC are often influenced by implicit assumptions arising from the many parameters needed to specify a complex model. These complex models confound many assumptions so that biological interpretations are difficult. Sampling error is accounted for in NCPA, but ABC ignores important sources of sampling error that create pseudo-statistical power. NCPA generates the full sampling distribution of its statistics, but ABC only yields local probabilities, which in turn make it impossible to distinguish between a good fitting model, a non-informative model, and an over-determined model. Both NCPA and ABC use approximations, but convergences of the approximations used in NCPA are well defined whereas those in ABC are not. NCPA can analyze a large number of locations, but ABC cannot. Finally, the dimensionality of the tested hypothesis is known in NCPA, but not for ABC. As a consequence, the “probabilities” generated by ABC are not true probabilities and are statistically non-interpretable. Accordingly, ABC should not be used for hypothesis testing, but simulation approaches are valuable when used in conjunction with NCPA or other methods that do not rely on highly parameterized models. PMID:19192182
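For readers unfamiliar with ABC, its simplest form, rejection ABC, draws parameters from the prior, simulates a summary statistic, and keeps draws that land close to the observed value. The toy problem and all settings below are illustrative, not any model from the debate above.

```python
import random

def abc_rejection(observed, prior_draw, simulate, eps, n_draws=20000):
    """Rejection ABC: keep prior draws whose simulated summary statistic
    lands within eps of the observed summary."""
    kept = []
    for _ in range(n_draws):
        theta = prior_draw()
        if abs(simulate(theta) - observed) < eps:
            kept.append(theta)
    return kept

random.seed(0)
# toy: infer a normal mean from an observed sample mean of 20 observations
posterior = abc_rejection(
    observed=1.0,
    prior_draw=lambda: random.uniform(-5.0, 5.0),
    simulate=lambda t: sum(random.gauss(t, 1.0) for _ in range(20)) / 20.0,
    eps=0.1,
)
posterior_mean = sum(posterior) / len(posterior)
```

The tolerance eps and the choice of summary statistic drive the approximation quality, which is one root of the interpretability concerns raised in the abstract.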
Small sample estimation of the reliability function for technical products
NASA Astrophysics Data System (ADS)
Lyamets, L. L.; Yakimenko, I. V.; Kanishchev, O. A.; Bliznyuk, O. A.
2017-12-01
It is demonstrated that, in the absence of the large statistical samples obtained from failure testing of complex technical products, a statistical estimate of the reliability function of initial elements can be made by the moments method. A formal description of the moments method is given and its advantages in the analysis of small censored samples are discussed. A modified algorithm is proposed for the implementation of the moments method with the use of only the moments at which the failures of initial elements occur.
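As a deliberately simplified illustration of moment matching for a reliability function, one can assume an exponential life model, equate the first sample moment to the mean life 1/λ, and evaluate R(t) = exp(−λt). The exponential assumption and all numbers are mine; the paper's method additionally handles censored samples, which this sketch does not.

```python
import math

def exp_reliability_mm(failure_times, t):
    """Method of moments under an (assumed) exponential life model:
    match the first sample moment to 1/lambda, then R(t) = exp(-lambda * t)."""
    mean_life = sum(failure_times) / len(failure_times)
    return math.exp(-t / mean_life)

# hypothetical uncensored failure times (hours); mean life is 150 h
r = exp_reliability_mm([120.0, 95.0, 180.0, 205.0], t=150.0)
```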
Parallel processing of genomics data
NASA Astrophysics Data System (ADS)
Agapito, Giuseppe; Guzzi, Pietro Hiram; Cannataro, Mario
2016-10-01
The availability of high-throughput experimental platforms for the analysis of biological samples, such as mass spectrometry, microarrays and Next Generation Sequencing, has made it possible to analyze a whole genome in a single experiment. Such platforms produce an enormous volume of data per experiment, and the analysis of this flow of data poses several challenges in terms of data storage, preprocessing, and analysis. To face those issues, efficient, possibly parallel, bioinformatics software needs to be used to preprocess and analyze data, for instance to highlight genetic variation associated with complex diseases. In this paper we present a parallel algorithm for the preprocessing and statistical analysis of genomics data that copes with high-dimensional data and achieves good response times. The proposed system can find statistically significant biological markers that discriminate classes of patients who respond to drugs in different ways. Experiments performed on real and synthetic genomic datasets show good speed-up and scalability.
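The paper's algorithm is not reproduced here, but the general pattern of parallel per-marker preprocessing can be sketched with a pool that maps one task per SNP. All function names and the tiny genotype matrix are illustrative; for genuinely CPU-bound work a process pool with chunking would be the better fit.

```python
from concurrent.futures import ThreadPoolExecutor

def allele_frequency(genotypes):
    """Alternate-allele frequency from 0/1/2 genotype calls at one SNP."""
    return sum(genotypes) / (2 * len(genotypes))

def parallel_frequencies(snp_matrix, workers=4):
    """One preprocessing task per SNP, mapped across the pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(allele_frequency, snp_matrix))

# two hypothetical SNPs genotyped in four samples
freqs = parallel_frequencies([[0, 1, 2, 1], [2, 2, 1, 0]])
```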
Patounakis, George; Hill, Micah J
2018-06-01
The purpose of the current review is to describe the common pitfalls in design and statistical analysis of reproductive medicine studies. It serves to guide both authors and reviewers toward reducing the incidence of spurious statistical results and erroneous conclusions. The large amount of data gathered in IVF cycles leads to problems with multiplicity, multicollinearity, and overfitting of regression models. Furthermore, the use of the word 'trend' to describe nonsignificant results has increased in recent years. Finally, methods to accurately account for female age in infertility research models are becoming more common and necessary. The pitfalls of study design and analysis reviewed provide a framework for authors and reviewers to approach clinical research in the field of reproductive medicine. By providing a more rigorous approach to study design and analysis, the literature in reproductive medicine will have more reliable conclusions that can stand the test of time.
An Overview of R in Health Decision Sciences.
Jalal, Hawre; Pechlivanoglou, Petros; Krijkamp, Eline; Alarid-Escudero, Fernando; Enns, Eva; Hunink, M G Myriam
2017-10-01
As the complexity of health decision science applications increases, high-level programming languages are increasingly adopted for statistical analyses and numerical computations. These programming languages facilitate sophisticated modeling, model documentation, and analysis reproducibility. Among the high-level programming languages, the statistical programming framework R is gaining increased recognition. R is freely available, cross-platform compatible, and open source, and it is supported by a large community of users who have generated an extensive collection of well-documented packages and functions. These functions facilitate applications of health decision science methodology as well as the visualization and communication of results. Although R's popularity is increasing among health decision scientists, methodological extensions of R in the field of decision analysis remain isolated. The purpose of this article is to provide an overview of existing R functionality that is applicable to the various stages of decision analysis, including model design, input parameter estimation, and analysis of model outputs.
Application of higher-order cepstral techniques in problems of fetal heart signal extraction
NASA Astrophysics Data System (ADS)
Sabry-Rizk, Madiha; Zgallai, Walid; Hardiman, P.; O'Riordan, J.
1996-10-01
Recently, cepstral analysis based on second-order statistics and homomorphic filtering techniques has been used in the adaptive decomposition of overlapping (or otherwise) and noise-contaminated ECG complexes of mothers and fetuses, obtained by transabdominal surface electrodes connected to a monitoring instrument, an interface card, and a PC. Differential time delays of fetal heart beats, measured from a reference point located on the mother complex after transformation to cepstra domains, are first obtained; this is followed by fetal heart rate variability computations. Homomorphic filtering in the complex cepstral domain and the subsequent transformation to the time domain result in fetal complex recovery. However, three problems have been identified with second-order-based cepstral techniques that are rectified in this paper: (1) errors resulting from the phase unwrapping algorithms, leading to fetal complex perturbation; (2) the unavoidable conversion of noise statistics from Gaussianness to non-Gaussianness, due to the highly nonlinear nature of the homomorphic transform, which warrants stringent noise cancellation routines; (3) owing to the problems in (1) and (2), the difficulty of adaptively optimizing windows to include all individual fetal complexes in the time domain based on amplitude thresholding routines in the complex cepstral domain (i.e., the task of 'zooming' in on weak fetal complexes requires more processing time). The use of a third-order-based high-resolution differential cepstrum technique results in recovery of delays of the order of 120 milliseconds.
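For readers unfamiliar with cepstra, the real cepstrum (inverse transform of the log spectral magnitude) sidesteps the phase-unwrapping problem noted above, at the cost of discarding phase. A stdlib-only sketch using a naive DFT; the test signal is arbitrary and this is not the paper's third-order differential cepstrum.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform (fine for a sketch)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def real_cepstrum(x):
    """Inverse DFT of the log magnitude spectrum.  Unlike the complex
    cepstrum, no phase unwrapping is needed, but phase is discarded,
    so exact waveform recovery is impossible."""
    n = len(x)
    log_mag = [math.log(abs(c)) for c in dft(x)]
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]

ceps = real_cepstrum([1.0, 0.5, 0.25, 0.125, 0.0, 0.0, 0.0, 0.0])
```

Because the log magnitude spectrum of a real signal is symmetric, its real cepstrum is symmetric as well, which the test below checks.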
[Evaluation of the quality of Anales Españoles de Pediatría versus Medicina Clínica].
Bonillo Perales, A
2002-08-01
To compare the scientific methodology and quality of articles published in Anales Españoles de Pediatría and Medicina Clínica. A stratified and randomized selection of 40 original articles published in 2001 in Anales Españoles de Pediatría and Medicina Clínica was made. Methodological errors in the critical analysis of original articles (21 items), epidemiological design, sample size, statistical complexity and levels of scientific evidence in both journals were compared using the chi-squared and/or Student's t-test. No differences were found between Anales Españoles de Pediatría and Medicina Clínica in the critical evaluation of original articles (p > 0.2). In original articles published in Anales Españoles de Pediatría, the designs were of lower scientific evidence (a lower proportion of clinical trials, cohort and case-control studies) (17.5% vs. 42.5%, p = 0.05), sample sizes were smaller (p = 0.003) and there was less statistical complexity in the results section (p = 0.03). To improve the scientific quality of Anales Españoles de Pediatría, improved study designs, larger sample sizes and greater statistical complexity are required in its articles.
Statistical learning and selective inference.
Taylor, Jonathan; Tibshirani, Robert J
2015-06-23
We describe the problem of "selective inference." This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have "cherry-picked"--searched for the strongest associations--means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis.
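The cherry-picking effect described above is easy to demonstrate: the largest of k null z-statistics clears the nominal 5% threshold far more often than 5% of the time. A small simulation, with every setting illustrative:

```python
import random

def max_z_exceeds(k, n_sims=20000, threshold=1.96, seed=1):
    """Fraction of simulations in which the largest of k independent
    null z-statistics clears the nominal two-sided 5% threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        if max(abs(rng.gauss(0.0, 1.0)) for _ in range(k)) > threshold:
            hits += 1
    return hits / n_sims

one_test = max_z_exceeds(1)    # close to the nominal 0.05
ten_tests = max_z_exceeds(10)  # close to 1 - 0.95**10, about 0.40
```

Selecting the strongest of ten associations inflates the false-positive rate roughly eightfold, which is why selective inference must set a higher bar for the associations it declares significant.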
Statistical Features of Complex Systems ---Toward Establishing Sociological Physics---
NASA Astrophysics Data System (ADS)
Kobayashi, Naoki; Kuninaka, Hiroto; Wakita, Jun-ichi; Matsushita, Mitsugu
2011-07-01
Complex systems have recently attracted much attention, both in natural sciences and in sociological sciences. Members constituting a complex system evolve through nonlinear interactions among each other. This means that in a complex system the multiplicative experience or, so to speak, the history of each member produces its present characteristics. If attention is paid to any statistical property in any complex system, the lognormal distribution is the most natural and appropriate among the standard or "normal" statistics to overview the whole system. In fact, the lognormality emerges rather conspicuously when we examine, as familiar and typical examples of statistical aspects in complex systems, the nursing-care period for the aged, populations of prefectures and municipalities, and our body height and weight. Many other examples are found in nature and society. On the basis of these observations, we discuss the possibility of sociological physics.
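The emergence of lognormality from multiplicative histories can be checked in a few lines: sizes built as products of random growth factors have approximately normal logs and the right-skew signature of the lognormal (mean above median). The growth-factor range and all parameters below are arbitrary illustrations.

```python
import math
import random
import statistics

def multiplicative_sizes(n_steps=200, n_members=4000, seed=2):
    """Each member's size is a product of independent random growth
    factors, so log-size is a sum and tends to a normal distribution."""
    rng = random.Random(seed)
    return [math.exp(sum(math.log(rng.uniform(0.9, 1.1))
                         for _ in range(n_steps)))
            for _ in range(n_members)]

sizes = multiplicative_sizes()
mean_size = statistics.mean(sizes)
median_size = statistics.median(sizes)
mean_log = statistics.mean(math.log(s) for s in sizes)
```

The mean exceeding the median is the hallmark of the right-skewed lognormal, even though the underlying log-sizes are symmetric.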
Robust Strategy for Rocket Engine Health Monitoring
NASA Technical Reports Server (NTRS)
Santi, L. Michael
2001-01-01
Monitoring the health of rocket engine systems is essentially a two-phase process. The acquisition phase involves sensing physical conditions at selected locations, converting physical inputs to electrical signals, conditioning the signals as appropriate to establish scale or filter interference, and recording results in a form that is easy to interpret. The inference phase involves analysis of results from the acquisition phase, comparison of analysis results to established health measures, and assessment of health indications. A variety of analytical tools may be employed in the inference phase of health monitoring. These tools can be separated into three broad categories: statistical, rule-based, and model-based. Statistical methods can provide excellent comparative measures of engine operating health. They require well-characterized data from an ensemble of "typical" engines, or "golden" data from a specific test assumed to define the operating norm, in order to establish reliable comparative measures. Statistical methods are generally suitable for real-time health monitoring because they do not deal with the physical complexities of engine operation. The utility of statistical methods in rocket engine health monitoring is hindered by practical limits on the quantity and quality of available data. This is due to the difficulty and high cost of data acquisition, the limited number of available test engines, and the problem of simulating flight conditions in ground test facilities. In addition, statistical methods incur a penalty for disregarding flow complexity and are therefore limited in their ability to define performance shift causality. Rule-based methods infer the health state of the engine system based on comparison of individual measurements, or combinations of measurements, with defined health norms or rules. This does not mean that rule-based methods are necessarily simple. Although binary yes-no health assessment can sometimes be established by relatively simple rules, the causality assignment needed for refined health monitoring often requires an exceptionally complex rule base involving complicated logical maps. Structuring the rule system to be clear and unambiguous can be difficult, and the expert input required to maintain a large logic network and associated rule base can be prohibitive.
Rosen, G D
2006-06-01
Meta-analysis is a vague descriptor used to encompass very diverse methods of data collection and analysis, ranging from simple averages to more complex statistical methods. Holo-analysis is a fully comprehensive statistical analysis of all available data and all available variables in a specified topic, with results expressed in a holistic factual empirical model. The objectives and applications of holo-analysis include software production for prediction of responses with confidence limits, translation of research conditions to praxis (field) circumstances, exposure of key missing variables, discovery of theoretically unpredictable variables and interactions, and planning of future research. Holo-analyses of the effects of exogenous phytases on broiler feed intake and live weight gain are cited as examples; they account for 70% of the variation in responses in terms of 20 highly significant chronological, dietary, environmental, genetic, managemental, and nutrient variables. Even better future accounting of variation will be facilitated if and when authors of papers routinely provide key data for currently neglected variables, such as temperatures, complete feed formulations, and mortalities.
The Equity of New York State's System of Financing Schools: An Update.
ERIC Educational Resources Information Center
Scheuer, Joan
1983-01-01
This statistical analysis of the equity and efficiency of New York's complex school finance system concludes that legislation since 1975 has neither significantly reduced wide disparities in local spending nor weakened the link between wealth and expenditure because the system cannot be improved without a substantial funding increase. (MJL)
NASA Technical Reports Server (NTRS)
Varnai, Tamas; Yang, Weidong; Marshak, Alexander
2016-01-01
CALIOP shows stronger near-cloud changes in aerosol properties at higher cloud fractions. Cloud fraction variations explain a third of near-cloud changes in overall aerosol statistics. Cloud fraction and aerosol particle size distribution have a complex relationship.
Power Analysis for Complex Mediational Designs Using Monte Carlo Methods
ERIC Educational Resources Information Center
Thoemmes, Felix; MacKinnon, David P.; Reiser, Mark R.
2010-01-01
Applied researchers often include mediation effects in applications of advanced methods such as latent variable models and linear growth curve models. Guidance on how to estimate statistical power to detect mediation for these models has not yet been addressed in the literature. We describe a general framework for power analyses for complex…
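The Monte Carlo logic behind such power analyses can be illustrated for the simplest single-mediator case: simulate data under assumed effect sizes, fit the mediator and outcome regressions, and count how often the mediated path is detected. This is only a sketch under hypothetical effect sizes (a = b = 0.3, n = 100) using the joint-significance test, not the latent-variable or growth-curve framework the authors develop:

```python
import numpy as np

def fit_slope(x, y):
    """OLS slope, standard error, and |t| for y ~ x (with intercept)."""
    n = len(x)
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - 2)
    cov = s2 * np.linalg.inv(X.T @ X)
    se = np.sqrt(cov[1, 1])
    return beta[1], se, abs(beta[1] / se)

def mediation_power(a=0.3, b=0.3, n=100, reps=2000, tcrit=1.96, seed=0):
    """Monte Carlo power for the joint-significance test of the a*b path.
    Effect sizes and n are hypothetical illustration values."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        x = rng.standard_normal(n)
        m = a * x + rng.standard_normal(n)     # mediator model: M ~ X
        y = b * m + rng.standard_normal(n)     # outcome model: Y ~ M
        _, _, ta = fit_slope(x, m)
        _, _, tb = fit_slope(m, y)
        hits += (ta > tcrit) and (tb > tcrit)  # both paths significant
    return hits / reps

print(round(mediation_power(), 3))
```

Repeating the simulation over a grid of sample sizes gives the smallest n achieving a target power, which is the practical use of the approach.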
PUNCHED CARD SYSTEM NEEDN'T BE COMPLEX TO GIVE COMPLETE CONTROL.
ERIC Educational Resources Information Center
BEMIS, HAZEL T.
At Worcester Junior College, Massachusetts, use of a manually operated punched card system has resulted in (1) simplified registration procedures, (2) quick analysis of conflicts and problems in class scheduling, (3) ready access to statistical information, (4) directory information in a wide range of classifications, (5) easy verification of…
Modeling Conditional Probabilities in Complex Educational Assessments. CSE Technical Report.
ERIC Educational Resources Information Center
Mislevy, Robert J.; Almond, Russell; Dibello, Lou; Jenkins, Frank; Steinberg, Linda; Yan, Duanli; Senturk, Deniz
An active area in psychometric research is coordinated task design and statistical analysis built around cognitive models. Compared with classical test theory and item response theory, there is often less information from observed data about the measurement-model parameters. On the other hand, there is more information from the grounding…
Complex Network Analysis for Characterizing Global Value Chains in Equipment Manufacturing
Meng, Bo; Cheng, Lihong
2017-01-01
The rise of global value chains (GVCs), characterized by so-called “outsourcing”, “fragmentation production”, and “trade in tasks”, has been considered one of the most important phenomena in 21st-century trade. GVCs can also play a decisive role in trade policy making. However, due to the increasing complexity and sophistication of international production networks, especially in the equipment manufacturing industry, conventional trade statistics and the corresponding trade indicators may give us a distorted picture of trade. This paper applies various network analysis tools to the new GVC accounting system proposed by Koopman et al. (2014) and Wang et al. (2013), in which gross exports can be decomposed into value-added terms through various routes along GVCs. This helps divide the equipment manufacturing-related GVCs into sub-networks with clear visualization. The empirical results of this paper significantly improve our understanding of the topology of equipment manufacturing-related GVCs, as well as the interdependency of countries in these GVCs, which is generally invisible from traditional trade statistics. PMID:28081201
Niu, Yiming; Wang, Jiayi; Zhang, Chi; Chen, Yiqiang
2017-04-15
The objective of this study was to develop a micro-plate based colorimetric assay for rapid and high-throughput detection of copper in animal feed. Copper ion in animal feed was extracted by trichloroacetic acid solution and reduced to cuprous ion by hydroxylamine. The cuprous ion can chelate with 2,2'-bicinchoninic acid to form a Cu-BCA complex, which was detected with high sensitivity by a micro-plate reader at 354 nm. The whole assay procedure can be completed within 20 min. To eliminate matrix interference, a statistical partitioning correction approach was proposed, which makes the detection of copper in complex samples possible. The limit of detection was 0.035 μg/mL and the detection range was 0.1-10 μg/mL of copper in buffer solution. Actual sample analysis indicated that this colorimetric assay produced results consistent with atomic absorption spectrometry analysis. These results demonstrated that the developed assay can be used for rapid determination of copper in animal feed. Copyright © 2016 Elsevier Ltd. All rights reserved.
Chang, Pao-Erh Paul; Yang, Jen-Chih Rena; Den, Walter; Wu, Chang-Fu
2014-09-01
Emissions of volatile organic compounds (VOCs) are among the most frequent sources of environmental nuisance complaints in urban areas, especially where industrial districts are nearby. Unfortunately, identifying the emission sources responsible for VOCs is an inherently difficult task. In this study, we proposed a dynamic approach to gradually confine the location of potential VOC emission sources in an industrial complex by combining multi-path open-path Fourier transform infrared spectrometry (OP-FTIR) measurement with the statistical method of principal component analysis (PCA). Closed-cell FTIR was further used to verify the VOC emission sources by measuring emitted VOCs from selected exhaust stacks at factories in the confined areas. Multiple open-path monitoring lines were deployed during a 3-month monitoring campaign in a complex industrial district. The emission patterns were identified and the locations of emissions were confined using wind data collected simultaneously. N,N-Dimethyl formamide (DMF), 2-butanone, toluene, and ethyl acetate, with mean concentrations of 80.0 ± 1.8, 34.5 ± 0.8, 103.7 ± 2.8, and 26.6 ± 0.7 ppbv, respectively, were identified as the major VOC mixture at all times of the day around the receptor site. The concentrations of the toxic air pollutant DMF in air samples were found to exceed the ambient standard despite the path-averaging effect of OP-FTIR on concentration levels. The PCA data identified three major emission sources, comprising the PU coating, chemical packaging, and lithographic printing industries. Applying instrumental measurement and statistical modeling, this study has established a systematic approach for locating emission sources. Statistical modeling (PCA) plays an important role in reducing the dimensionality of a large measured dataset and identifying underlying emission sources. Instrumental measurement, however, helps verify the outcomes of the statistical modeling.
The field study has demonstrated the feasibility of using multi-path OP-FTIR measurement. Wind data incorporated with the statistical modeling (PCA) successfully identified the major emission sources in a complex industrial district.
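The way PCA separates co-emitted species into putative sources can be sketched on synthetic data: species driven by the same source load on the same component. The species groupings and concentration scales below are hypothetical stand-ins, not the study's measurements:

```python
import numpy as np

def pca(X):
    """PCA via SVD on standardized columns; returns loadings and
    explained-variance ratios."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    var = s ** 2 / (len(X) - 1)
    return Vt, var / var.sum()

rng = np.random.default_rng(1)
# Hypothetical hourly path-averaged concentrations (ppbv) for four VOCs;
# two latent "sources" each drive a correlated pair of species.
src1, src2 = rng.gamma(2.0, 1.0, 200), rng.gamma(2.0, 1.0, 200)
X = np.column_stack([
    80 * src1, 35 * src1,      # e.g. DMF and 2-butanone co-emitted
    100 * src2, 27 * src2,     # e.g. toluene and ethyl acetate co-emitted
]) + rng.normal(0, 5, (200, 4))

loadings, evr = pca(X)
print(np.round(evr, 2))  # two components carry almost all the variance
```

Inspecting the loading signs and magnitudes then attributes each species to a component, which is how underlying emission sources are read off in practice.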
ERIC Educational Resources Information Center
Underwood, Sonia M.; Reyes-Gastelum, David; Cooper, Melanie M.
2015-01-01
Longitudinal studies can provide significant insights into how students develop competence in a topic or subject area over time. However, there are many barriers, such as retention of students in the study and the complexity of data analysis, that make these studies rare. Here, we present how a statistical framework, discrete-time survival…
Calypso: a user-friendly web-server for mining and visualizing microbiome-environment interactions.
Zakrzewski, Martha; Proietti, Carla; Ellis, Jonathan J; Hasan, Shihab; Brion, Marie-Jo; Berger, Bernard; Krause, Lutz
2017-03-01
Calypso is an easy-to-use online software suite that allows non-expert users to mine, interpret and compare taxonomic information from metagenomic or 16S rDNA datasets. Calypso has a focus on multivariate statistical approaches that can identify complex environment-microbiome associations. The software enables quantitative visualizations, statistical testing, multivariate analysis, supervised learning, factor analysis, multivariable regression, network analysis and diversity estimates. Comprehensive help pages, tutorials and videos are provided via a wiki page. The web-interface is accessible via http://cgenome.net/calypso/ . The software is programmed in Java, PERL and R and the source code is available from Zenodo ( https://zenodo.org/record/50931 ). The software is freely available for non-commercial users. l.krause@uq.edu.au. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Husbands, Aman Y; Aggarwal, Vasudha; Ha, Taekjip; Timmermans, Marja C P
2016-08-01
Deciphering complex biological processes markedly benefits from approaches that directly assess the underlying biomolecular interactions. Most commonly used approaches to monitor protein-protein interactions typically provide nonquantitative readouts that lack statistical power and do not yield information on the heterogeneity or stoichiometry of protein complexes. Single-molecule pull-down (SiMPull) uses single-molecule fluorescence detection to mitigate these disadvantages and can quantitatively interrogate interactions between proteins and other compounds, such as nucleic acids, small molecule ligands, and lipids. Here, we establish SiMPull in plants using the HOMEODOMAIN LEUCINE ZIPPER III (HD-ZIPIII) and LITTLE ZIPPER (ZPR) interaction as proof-of-principle. Colocalization analysis of fluorophore-tagged HD-ZIPIII and ZPR proteins provides strong statistical evidence of complex formation. In addition, we use SiMPull to directly quantify YFP and mCherry maturation probabilities, showing these differ substantially from values obtained in mammalian systems. Leveraging these probabilities, in conjunction with fluorophore photobleaching assays on over 2000 individual complexes, we determined HD-ZIPIII:ZPR stoichiometry. Intriguingly, these complexes appear as heterotetramers, comprising two HD-ZIPIII and two ZPR molecules, rather than heterodimers as described in the current model. This surprising result raises new questions about the regulation of these key developmental factors and is illustrative of the unique contribution SiMPull is poised to make to in planta protein interaction studies. © 2016 American Society of Plant Biologists. All rights reserved.
Random forests for classification in ecology
Cutler, D.R.; Edwards, T.C.; Beard, K.H.; Cutler, A.; Hess, K.T.; Gibson, J.; Lawler, J.J.
2007-01-01
Classification procedures are some of the most widely used statistical methods in ecology. Random forests (RF) is a new and powerful statistical classifier that is well established in other disciplines but is relatively unknown in ecology. Advantages of RF compared to other statistical classifiers include (1) very high classification accuracy; (2) a novel method of determining variable importance; (3) ability to model complex interactions among predictor variables; (4) flexibility to perform several types of statistical data analysis, including regression, classification, survival analysis, and unsupervised learning; and (5) an algorithm for imputing missing values. We compared the accuracies of RF and four other commonly used statistical classifiers using data on invasive plant species presence in Lava Beds National Monument, California, USA, rare lichen species presence in the Pacific Northwest, USA, and nest sites for cavity nesting birds in the Uinta Mountains, Utah, USA. We observed high classification accuracy in all applications as measured by cross-validation and, in the case of the lichen data, by independent test data, when comparing RF to other common classification methods. We also observed that the variables that RF identified as most important for classifying invasive plant species coincided with expectations based on the literature. ?? 2007 by the Ecological Society of America.
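The workflow the authors describe (cross-validated accuracy plus variable importance) can be sketched with scikit-learn on synthetic presence/absence data; the covariates and effect structure here are invented for illustration, not drawn from the study's datasets:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Hypothetical species-presence data: two informative habitat covariates
# (say, elevation and canopy cover) plus three pure-noise predictors.
n = 400
X = rng.standard_normal((n, 5))
y = (X[:, 0] + X[:, 1] + 0.5 * rng.standard_normal(n) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
acc = cross_val_score(rf, X, y, cv=5).mean()   # cross-validated accuracy
rf.fit(X, y)
imp = rf.feature_importances_                  # impurity-based importance
print(round(acc, 2), np.argsort(imp)[::-1][:2])
```

The two informative covariates should receive the top importance scores, mirroring the paper's finding that RF-ranked variables matched ecological expectations.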
Tsallis statistics and neurodegenerative disorders
NASA Astrophysics Data System (ADS)
Iliopoulos, Aggelos C.; Tsolaki, Magdalini; Aifantis, Elias C.
2016-08-01
In this paper, we perform statistical analysis of time series deriving from four neurodegenerative disorders, namely epilepsy, amyotrophic lateral sclerosis (ALS), Parkinson's disease (PD), and Huntington's disease (HD). The time series are concerned with electroencephalograms (EEGs) of healthy and epileptic states, as well as gait dynamics (in particular stride intervals) of the ALS, PD and HD subjects. We study data concerning one subject for each neurodegenerative disorder and one healthy control. The analysis is based on Tsallis non-extensive statistical mechanics and in particular on the estimation of the Tsallis q-triplet, namely {qstat, qsen, qrel}. The deviation of the Tsallis q-triplet from unity indicates non-Gaussian statistics and long-range dependencies for all time series considered. In addition, the results reveal the efficiency of Tsallis statistics in capturing differences in brain dynamics between healthy and epileptic states, as well as differences between ALS, PD and HD subjects and healthy controls. The results indicate that estimates of the Tsallis q-indices could be used as possible biomarkers, along with others, for improving classification and prediction of epileptic seizures, as well as for studying the complex gait dynamics of various diseases, providing new insights into severity, medications and fall risk, and improving therapeutic interventions.
Decoding the spatial signatures of multi-scale climate variability - a climate network perspective
NASA Astrophysics Data System (ADS)
Donner, R. V.; Jajcay, N.; Wiedermann, M.; Ekhtiari, N.; Palus, M.
2017-12-01
During the last years, the application of complex networks as a versatile tool for analyzing complex spatio-temporal data has gained increasing interest. Establishing this approach as a new paradigm in climatology has already provided valuable insights into key spatio-temporal climate variability patterns across scales, including novel perspectives on the dynamics of the El Nino Southern Oscillation or the emergence of extreme precipitation patterns in monsoonal regions. In this work, we report first attempts to employ network analysis for disentangling multi-scale climate variability. Specifically, we introduce the concept of scale-specific climate networks, which comprises a sequence of networks representing the statistical association structure between variations at distinct time scales. For this purpose, we consider global surface air temperature reanalysis data and subject the corresponding time series at each grid point to a complex-valued continuous wavelet transform. From this time-scale decomposition, we obtain three types of signals per grid point and scale - amplitude, phase and reconstructed signal, the statistical similarity of which is then represented by three complex networks associated with each scale. We provide a detailed analysis of the resulting connectivity patterns reflecting the spatial organization of climate variability at each chosen time-scale. Global network characteristics like transitivity or network entropy are shown to provide a new view on the (global average) relevance of different time scales in climate dynamics. Beyond expected trends originating from the increasing smoothness of fluctuations at longer scales, network-based statistics reveal different degrees of fragmentation of spatial co-variability patterns at different scales and zonal shifts among the key players of climate variability from tropically to extra-tropically dominated patterns when moving from inter-annual to decadal scales and beyond. 
The obtained results demonstrate the potential usefulness of systematically exploiting scale-specific climate networks, whose general patterns are in line with existing climatological knowledge, but provide vast opportunities for further quantifications at local, regional and global scales that are yet to be explored.
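The core construction, turning pairwise statistical association between time series into a network, can be sketched as follows. This uses synthetic data and a plain correlation threshold, not the complex-valued wavelet decomposition and scale-specific pipeline of the study:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical gridded temperature anomalies: 30 "grid points" over 200 time
# steps, with one shared climate mode driving the first ten series.
t = rng.standard_normal(200)
series = np.vstack([t + 0.5 * rng.standard_normal(200) for _ in range(10)] +
                   [rng.standard_normal(200) for _ in range(20)])

C = np.corrcoef(series)                            # association structure
A = (np.abs(C) > 0.5) & ~np.eye(30, dtype=bool)    # threshold -> adjacency
degree = A.sum(axis=1)                             # connectivity per point
print(degree[:10].min(), degree[10:].max())
```

In the scale-specific variant, this construction is simply repeated once per wavelet scale (for amplitude, phase, and reconstructed signals), and network statistics such as transitivity are tracked across the resulting sequence of networks.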
The statistics of identifying differentially expressed genes in Expresso and TM4: a comparison
Sioson, Allan A; Mane, Shrinivasrao P; Li, Pinghua; Sha, Wei; Heath, Lenwood S; Bohnert, Hans J; Grene, Ruth
2006-01-01
Background Analysis of DNA microarray data takes as input spot intensity measurements from scanner software and returns differential expression of genes between two conditions, together with a statistical significance assessment. This process typically consists of two steps: data normalization and identification of differentially expressed genes through statistical analysis. The Expresso microarray experiment management system implements these steps with a two-stage, log-linear ANOVA mixed model technique, tailored to individual experimental designs. The complement of tools in TM4, on the other hand, is based on a number of preset design choices that limit its flexibility. In the TM4 microarray analysis suite, normalization, filter, and analysis methods form an analysis pipeline. TM4 computes integrated intensity values (IIV) from the average intensities and spot pixel counts returned by the scanner software as input to its normalization steps. By contrast, Expresso can use either IIV data or median intensity values (MIV). Here, we compare Expresso and TM4 analysis of two experiments and assess the results against qRT-PCR data. Results The Expresso analysis using MIV data consistently identifies more genes as differentially expressed, when compared to Expresso analysis with IIV data. The typical TM4 normalization and filtering pipeline corrects systematic intensity-specific bias on a per microarray basis. Subsequent statistical analysis with Expresso or a TM4 t-test can effectively identify differentially expressed genes. The best agreement with qRT-PCR data is obtained through the use of Expresso analysis and MIV data. Conclusion The results of this research are of practical value to biologists who analyze microarray data sets. The TM4 normalization and filtering pipeline corrects microarray-specific systematic bias and complements the normalization stage in Expresso analysis. The results of Expresso using MIV data have the best agreement with qRT-PCR results. 
In one experiment, MIV is a better choice than IIV as input to data normalization and statistical analysis methods, as it yields a greater number of statistically significant differentially expressed genes; TM4 does not support the choice of MIV input data. Overall, the more flexible and extensive statistical models of Expresso achieve more accurate analytical results, when judged by the yardstick of qRT-PCR data, in the context of an experimental design of modest complexity. PMID:16626497
Multivariate meta-analysis for non-linear and other multi-parameter associations
Gasparrini, A; Armstrong, B; Kenward, M G
2012-01-01
In this paper, we formalize the application of multivariate meta-analysis and meta-regression to synthesize estimates of multi-parameter associations obtained from different studies. This modelling approach extends the standard two-stage analysis used to combine results across different sub-groups or populations. The most straightforward application is for the meta-analysis of non-linear relationships, described for example by regression coefficients of splines or other functions, but the methodology easily generalizes to any setting where complex associations are described by multiple correlated parameters. The modelling framework of multivariate meta-analysis is implemented in the package mvmeta within the statistical environment R. As an illustrative example, we propose a two-stage analysis for investigating the non-linear exposure–response relationship between temperature and non-accidental mortality using time-series data from multiple cities. Multivariate meta-analysis represents a useful analytical tool for studying complex associations through a two-stage procedure. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22807043
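The mvmeta package implements full random-effects models in R; as a minimal illustration of the underlying second-stage idea, the following Python sketch pools multi-parameter estimates by fixed-effect inverse-covariance weighting (the per-study coefficient vectors and covariances are made up):

```python
import numpy as np

def mv_fixed_effect(betas, covs):
    """Fixed-effect multivariate meta-analysis: pool per-study coefficient
    vectors (e.g. spline coefficients) by inverse-covariance weighting."""
    W = [np.linalg.inv(S) for S in covs]
    V = np.linalg.inv(sum(W))                       # pooled covariance
    b = V @ sum(Wi @ bi for Wi, bi in zip(W, betas))
    return b, V

# Hypothetical two-parameter exposure-response estimates from three studies.
betas = [np.array([0.10, 0.05]), np.array([0.14, 0.02]), np.array([0.08, 0.06])]
covs = [0.01 * np.eye(2)] * 3
b, V = mv_fixed_effect(betas, covs)
print(np.round(b, 3))
```

With equal study covariances, the pooled vector reduces to the simple mean of the study estimates, which makes the weighting easy to sanity-check; the random-effects extension adds a between-study covariance component to each weight.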
Study/experimental/research design: much more than statistics.
Knight, Kenneth L
2010-01-01
The purpose of study, experimental, or research design in scientific manuscripts has changed significantly over the years. It has evolved from an explanation of the design of the experiment (ie, data gathering or acquisition) to an explanation of the statistical analysis. This practice makes "Methods" sections hard to read and understand. To clarify the difference between study design and statistical analysis, to show the advantages of a properly written study design on article comprehension, and to encourage authors to correctly describe study designs. The role of study design is explored from the introduction of the concept by Fisher through modern-day scientists and the AMA Manual of Style. At one time, when experiments were simpler, the study design and statistical design were identical or very similar. With the complex research that is common today, which often includes manipulating variables to create new variables and the multiple (and different) analyses of a single data set, data collection is very different than statistical design. Thus, both a study design and a statistical design are necessary. Scientific manuscripts will be much easier to read and comprehend. A proper experimental design serves as a road map to the study methods, helping readers to understand more clearly how the data were obtained and, therefore, assisting them in properly analyzing the results.
Statistical genetics concepts and approaches in schizophrenia and related neuropsychiatric research.
Schork, Nicholas J; Greenwood, Tiffany A; Braff, David L
2007-01-01
Statistical genetics is a research field that focuses on mathematical models and statistical inference methodologies that relate genetic variations (ie, naturally occurring human DNA sequence variations or "polymorphisms") to particular traits or diseases (phenotypes) usually from data collected on large samples of families or individuals. The ultimate goal of such analysis is the identification of genes and genetic variations that influence disease susceptibility. Although of extreme interest and importance, the fact that many genes and environmental factors contribute to neuropsychiatric diseases of public health importance (eg, schizophrenia, bipolar disorder, and depression) complicates relevant studies and suggests that very sophisticated mathematical and statistical modeling may be required. In addition, large-scale contemporary human DNA sequencing and related projects, such as the Human Genome Project and the International HapMap Project, as well as the development of high-throughput DNA sequencing and genotyping technologies have provided statistical geneticists with a great deal of very relevant and appropriate information and resources. Unfortunately, the use of these resources and their interpretation are not straightforward when applied to complex, multifactorial diseases such as schizophrenia. In this brief and largely nonmathematical review of the field of statistical genetics, we describe many of the main concepts, definitions, and issues that motivate contemporary research. We also provide a discussion of the most pressing contemporary problems that demand further research if progress is to be made in the identification of genes and genetic variations that predispose to complex neuropsychiatric diseases.
Sulis, William H
2017-10-01
Walter Freeman III pioneered the application of nonlinear dynamical systems theories and methodologies in his work on mesoscopic brain dynamics. Sadly, mainstream psychology and psychiatry still cling to linear, correlation-based data analysis techniques, which threaten to subvert the process of experimentation and theory building. In order to progress, it is necessary to develop tools capable of managing the stochastic complexity of complex biopsychosocial systems, which includes multilevel feedback relationships, nonlinear interactions, chaotic dynamics and adaptability. In addition, however, these systems exhibit intrinsic randomness, non-Gaussian probability distributions, non-stationarity, contextuality, and non-Kolmogorov probabilities, as well as the absence of means and/or variances and conditional probabilities. These properties and their implications for statistical analysis are discussed. An alternative approach, the Process Algebra approach, is described. It is a generative model, capable of generating non-Kolmogorov probabilities. It has proven useful in addressing fundamental problems in quantum mechanics and in the modeling of developing psychosocial systems.
Local image statistics: maximum-entropy constructions and perceptual salience
Victor, Jonathan D.; Conte, Mary M.
2012-01-01
The space of visual signals is high-dimensional and natural visual images have a highly complex statistical structure. While many studies suggest that only a limited number of image statistics are used for perceptual judgments, a full understanding of visual function requires analysis not only of the impact of individual image statistics, but also, how they interact. In natural images, these statistical elements (luminance distributions, correlations of low and high order, edges, occlusions, etc.) are intermixed, and their effects are difficult to disentangle. Thus, there is a need for construction of stimuli in which one or more statistical elements are introduced in a controlled fashion, so that their individual and joint contributions can be analyzed. With this as motivation, we present algorithms to construct synthetic images in which local image statistics—including luminance distributions, pair-wise correlations, and higher-order correlations—are explicitly specified and all other statistics are determined implicitly by maximum-entropy. We then apply this approach to measure the sensitivity of the human visual system to local image statistics and to sample their interactions. PMID:22751397
NASA Astrophysics Data System (ADS)
Friedel, M. J.; Daughney, C.
2016-12-01
The development of a successful surface-groundwater management strategy depends on the quality of data provided for analysis. This study evaluates the statistical robustness when using a modified self-organizing map (MSOM) technique to estimate missing values for three hypersurface models: synoptic groundwater-surface water hydrochemistry, time-series of groundwater-surface water hydrochemistry, and mixed-survey (combination of groundwater-surface water hydrochemistry and lithologies) hydrostratigraphic unit data. These models of increasing complexity are developed and validated based on observations from the Southland region of New Zealand. In each case, the estimation method is sufficiently robust to cope with groundwater-surface water hydrochemistry vagaries due to sample size and extreme data insufficiency, even when >80% of the data are missing. The estimation of surface water hydrochemistry time series values enabled the evaluation of seasonal variation, and the imputation of lithologies facilitated the evaluation of hydrostratigraphic controls on groundwater-surface water interaction. The robust statistical results for groundwater-surface water models of increasing data complexity provide justification to apply the MSOM technique in other regions of New Zealand and abroad.
Ceppi, Marcello; Gallo, Fabio; Bonassi, Stefano
2011-01-01
The most common study design performed in population studies based on the micronucleus (MN) assay, is the cross-sectional study, which is largely performed to evaluate the DNA damaging effects of exposure to genotoxic agents in the workplace, in the environment, as well as from diet or lifestyle factors. Sample size is still a critical issue in the design of MN studies since most recent studies considering gene-environment interaction, often require a sample size of several hundred subjects, which is in many cases difficult to achieve. The control of confounding is another major threat to the validity of causal inference. The most popular confounders considered in population studies using MN are age, gender and smoking habit. Extensive attention is given to the assessment of effect modification, given the increasing inclusion of biomarkers of genetic susceptibility in the study design. Selected issues concerning the statistical treatment of data have been addressed in this mini-review, starting from data description, which is a critical step of statistical analysis, since it allows to detect possible errors in the dataset to be analysed and to check the validity of assumptions required for more complex analyses. Basic issues dealing with statistical analysis of biomarkers are extensively evaluated, including methods to explore the dose-response relationship among two continuous variables and inferential analysis. A critical approach to the use of parametric and non-parametric methods is presented, before addressing the issue of most suitable multivariate models to fit MN data. In the last decade, the quality of statistical analysis of MN data has certainly evolved, although even nowadays only a small number of studies apply the Poisson model, which is the most suitable method for the analysis of MN data.
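Since the review singles out the Poisson model as the most suitable for MN count data, a minimal sketch of a Poisson regression (log link) fitted by iteratively reweighted least squares may help; the exposure covariate and coefficients below are hypothetical, and real analyses would also adjust for confounders such as age, gender, and smoking:

```python
import numpy as np

def poisson_irls(X, y, n_iter=25):
    """Poisson regression with log link fitted by IRLS; a minimal GLM sketch."""
    X = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)            # fitted means
        W = mu                           # Poisson working weights
        z = X @ beta + (y - mu) / mu     # working response
        beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    return beta

rng = np.random.default_rng(2)
# Hypothetical MN counts per scored cells vs. a continuous exposure covariate,
# generated with true intercept 0.5 and slope 1.0 on the log scale.
x = rng.uniform(0, 1, 500)
y = rng.poisson(np.exp(0.5 + 1.0 * x))
print(np.round(poisson_irls(x[:, None], y), 2))
```

The exponentiated slope is interpretable as a frequency ratio per unit exposure, which is the quantity usually reported in MN biomonitoring studies.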
Zarafshan, Hadi; Khaleghi, Ali; Mohammadi, Mohammad Reza; Moeini, Mahdi; Malmir, Nastaran
2016-01-01
The aim of this study was to investigate electroencephalogram (EEG) dynamics using complexity analysis in children with attention-deficit/hyperactivity disorder (ADHD) compared with healthy control children performing a cognitive task. Thirty 7-12-year-old children meeting Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) criteria for ADHD and 30 healthy control children underwent an EEG evaluation during a cognitive task, and Lempel-Ziv complexity (LZC) values were computed. There were no significant differences between the ADHD and control groups in age and gender. The mean LZC of the ADHD children was significantly larger than that of the healthy children over the right anterior and right posterior regions during cognitive performance. In the ADHD group, the complexity of the right hemisphere was higher than that of the left hemisphere, whereas in the normal group the complexity of the left hemisphere was higher than that of the right. Although fronto-striatal dysfunction is considered conclusive evidence for the pathophysiology of ADHD, our mental arithmetic task has provided evidence of structural and functional changes in the posterior regions and probably the cerebellum in ADHD.
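LZC in such EEG studies is typically computed on a median-binarized signal by counting the phrases in the Lempel-Ziv (1976) parsing; regular signals yield few phrases, irregular ones many. A minimal sketch (not the authors' implementation, and with synthetic signals in place of EEG):

```python
import numpy as np

def lz_complexity(s):
    """Count phrases in the Lempel-Ziv (1976) parsing of a symbol string."""
    i, c, n = 0, 0, len(s)
    while i < n:
        l = 1
        # grow the phrase while it already occurs in the prefix seen so far
        while i + l <= n and s[i:i + l] in s[:i + l - 1]:
            l += 1
        c += 1
        i += l
    return c

rng = np.random.default_rng(0)
noise = rng.standard_normal(1000)                # irregular, high-complexity
tone = np.sin(np.linspace(0, 20 * np.pi, 1000))  # regular, low-complexity
to_bits = lambda x: ''.join('1' if v > np.median(x) else '0' for v in x)
print(lz_complexity(to_bits(tone)), lz_complexity(to_bits(noise)))
```

In practice the raw count is usually normalized by n / log2(n) so that values are comparable across recordings of different lengths.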
NASA Astrophysics Data System (ADS)
Loftus, K.; Saar, S. H.
2017-12-01
NOAA's Space Weather Prediction Center publishes the current definitive public soft X-ray flare catalog, derived using data from the X-ray Sensor (XRS) on the Geostationary Operational Environmental Satellites (GOES) series. However, this flare list has shortcomings for use in scientific analysis. Its detection algorithm has drawbacks (missing smaller flux events and poorly characterizing complex ones), and its event timing is imprecise (peak and end times are frequently marked incorrectly, and hence peak fluxes are underestimated). It also lacks explicit and regular spatial location data. We present a new database, "The Where of the Flare" catalog, which improves upon the precision of NOAA's current version, with more consistent and accurate spatial locations, timings, and peak fluxes. Our catalog also offers several new parameters per flare (e.g. background flux, integrated flux). We use data from the GOES Solar X-ray Imager (SXI) for spatial flare locating. Our detection algorithm is more sensitive to smaller flux events close to the background level and more precisely marks flare start/peak/end times so that integrated flux can be accurately calculated. It also decomposes complex events (with multiple overlapping flares) by constituent peaks. The catalog dates from the operation of the first SXI instrument in 2003 until the present. We give an overview of the detection algorithm's design, review the catalog's features, and discuss preliminary statistical analyses of light curve morphology, complex event decomposition, and integrated flux distribution. The Where of the Flare catalog will be useful in studying X-ray flare statistics and correlating X-ray flare properties with other observations. This work was supported by Contract #8100002705 from Lockheed-Martin to SAO in support of the science of NASA's IRIS mission.
Dissortativity and duplications in oral cancer
NASA Astrophysics Data System (ADS)
Shinde, Pramod; Yadav, Alok; Rai, Aparna; Jalan, Sarika
2015-08-01
More than 300,000 new cases of oral cancer are diagnosed worldwide annually. The complexity of oral cancer makes designing drug targets very difficult. We analyse the protein-protein interaction networks of normal and oral cancer tissue and detect crucial changes in the structural properties of the networks in terms of the interactions of the hub proteins and the degree-degree correlations. Further analysis of the spectra of both networks, while exhibiting universal statistical behaviour, reveals a distinction in terms of zero degeneracy, providing insight into the complexity of the underlying system.
Galfalvy, Hanga C; Erraji-Benchekroun, Loubna; Smyrniotopoulos, Peggy; Pavlidis, Paul; Ellis, Steven P; Mann, J John; Sibille, Etienne; Arango, Victoria
2003-01-01
Background Genomic studies of complex tissues pose unique analytical challenges for assessment of data quality, performance of statistical methods used for data extraction, and detection of differentially expressed genes. Ideally, to assess the accuracy of gene expression analysis methods, one needs a set of genes which are known to be differentially expressed in the samples and which can be used as a "gold standard". We introduce the idea of using sex-chromosome genes as an alternative to spiked-in control genes or simulations for assessment of microarray data and analysis methods. Results Expression of sex-chromosome genes was used as a set of true internal biological controls to compare alternate probe-level data extraction algorithms (Microarray Suite 5.0 [MAS5.0], Model Based Expression Index [MBEI] and Robust Multi-array Average [RMA]), to assess microarray data quality, and to establish some statistical guidelines for analyzing large-scale gene expression. These approaches were implemented on a large new dataset of human brain samples. RMA-generated gene expression values were markedly less variable and more reliable than MAS5.0- and MBEI-derived values. A statistical technique controlling the false discovery rate was applied to adjust for multiple testing, as an alternative to the Bonferroni method, and showed no evidence of false negative results. Fourteen probesets, representing nine Y- and two X-chromosome linked genes, displayed significant sex differences in brain prefrontal cortex gene expression. Conclusion In this study, we have demonstrated the use of sex genes as true biological internal controls for genomic analysis of complex tissues, and suggested analytical guidelines for testing alternate oligonucleotide microarray data extraction protocols and for adjusting multiple statistical analyses of differentially expressed genes.
Our results also provided evidence for sex differences in gene expression in the brain prefrontal cortex, supporting the notion of a putative direct role of sex-chromosome genes in differentiation and maintenance of sexual dimorphism of the central nervous system. Importantly, these analytical approaches are applicable to all microarray studies that include male and female human or animal subjects. PMID:12962547
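The false-discovery-rate control mentioned above as an alternative to Bonferroni is typically the Benjamini-Hochberg step-up procedure. A minimal sketch (an illustration, not the authors' code):

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of hypotheses rejected by the Benjamini-Hochberg
    step-up procedure at FDR level q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Compare sorted p-values to their rank-scaled thresholds i*q/m.
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest rank passing its threshold
        reject[order[: k + 1]] = True      # reject it and all smaller p-values
    return reject
```

Unlike Bonferroni (reject only p < q/m), BH rejects every p-value up to the largest one under its rank-scaled threshold, which is why it is less conservative in large-scale expression screens.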
Stewart, Gavin B.; Altman, Douglas G.; Askie, Lisa M.; Duley, Lelia; Simmonds, Mark C.; Stewart, Lesley A.
2012-01-01
Background Individual participant data (IPD) meta-analyses that obtain “raw” data from studies rather than summary data typically adopt a “two-stage” approach to analysis whereby IPD within trials generate summary measures, which are combined using standard meta-analytical methods. Recently, a range of “one-stage” approaches which combine all individual participant data in a single meta-analysis have been suggested as providing a more powerful and flexible approach. However, they are more complex to implement and require statistical support. This study uses a dataset to compare “two-stage” and “one-stage” models of varying complexity, to ascertain whether results obtained from the approaches differ in a clinically meaningful way. Methods and Findings We included data from 24 randomised controlled trials, evaluating antiplatelet agents, for the prevention of pre-eclampsia in pregnancy. We performed two-stage and one-stage IPD meta-analyses to estimate overall treatment effect and to explore potential treatment interactions whereby particular types of women and their babies might benefit differentially from receiving antiplatelets. Two-stage and one-stage approaches gave similar results, showing a benefit of using anti-platelets (Relative risk 0.90, 95% CI 0.84 to 0.97). Neither approach suggested that any particular type of women benefited more or less from antiplatelets. There were no material differences in results between different types of one-stage model. Conclusions For these data, two-stage and one-stage approaches to analysis produce similar results. Although one-stage models offer a flexible environment for exploring model structure and are useful where across study patterns relating to types of participant, intervention and outcome mask similar relationships within trials, the additional insights provided by their usage may not outweigh the costs of statistical support for routine application in syntheses of randomised controlled trials. 
Researchers considering undertaking an IPD meta-analysis should not necessarily be deterred by a perceived need for sophisticated statistical methods when combining information from large randomised trials. PMID:23056232
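The "two-stage" approach described above reduces each trial to a summary effect and then pools with standard inverse-variance weights. A sketch of fixed-effect pooling of log relative risks (the trial values below are hypothetical, not the antiplatelet data):

```python
import numpy as np

def pool_fixed_effect(log_effects, ses):
    """Inverse-variance fixed-effect pooled estimate with a 95% CI,
    computed on the log scale."""
    log_effects, ses = np.asarray(log_effects, float), np.asarray(ses, float)
    w = 1.0 / ses**2                     # weight = 1 / variance
    est = np.sum(w * log_effects) / np.sum(w)
    se = 1.0 / np.sqrt(np.sum(w))
    return est, est - 1.96 * se, est + 1.96 * se

# Three hypothetical trials: log relative risks with standard errors.
est, lo, hi = pool_fixed_effect([-0.12, -0.08, -0.11], [0.05, 0.04, 0.06])
rr, rr_lo, rr_hi = np.exp([est, lo, hi])   # back to the relative-risk scale
```

The one-stage alternative would instead fit a single (generalized) mixed model to all participant rows at once; the abstract's point is that for these data the two give materially similar answers.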
The Relationship between Zinc Levels and Autism: A Systematic Review and Meta-analysis.
Babaknejad, Nasim; Sayehmiri, Fatemeh; Sayehmiri, Kourosh; Mohamadkhani, Ashraf; Bahrami, Somaye
2016-01-01
Autism is a complex, behaviorally defined disorder. A relationship between zinc (Zn) levels in autistic patients and the development of pathogenesis has been suggested, but the conclusion is not definitive. The present study was conducted to estimate this probability using meta-analysis. Using a fixed-effect model, twelve articles published from 1978 to 2012 were selected by searching Google Scholar, PubMed, ISI Web of Science, and Scopus, and the data were analyzed. I² statistics were calculated to examine heterogeneity. The data were analyzed using R and STATA ver. 12.2. There was no statistically significant difference in hair, nail, and teeth Zn levels between controls and autistic patients: -0.471 [95% confidence interval (95% CI): -1.172 to 0.231]. There was a statistically significant difference in plasma Zn concentration between autistic patients and healthy controls: -0.253 (95% CI: -0.498 to -0.007). Using a random-effects model, the overall pooled estimate across the two groups was -0.414 (95% CI: -0.878 to -0.051). Based on sensitivity analysis, zinc supplements can be used as nutritional therapy for autistic patients.
Kuhlmey, J; Lautsch, E
1980-01-01
In our second report on the investigation of the need for cultural entertainment among inhabitants of geriatric nursing homes, we tested the influence of the factors age, sex, kind of work, and duration of stay in the home, singly and successively, for each single indicator of this complex need. In this third report, the influence of these four factors on the indicators was investigated under simultaneous consideration of their mutual dependency. This dependency structure was represented by typification (cluster analysis). The cluster analysis produced classes in which similarly disposed inhabitants were grouped together. The average profile of each class was obtained, and differences were analysed by multivariate statistical methods (multidimensional analysis of variance and discriminant analysis).
Multiple comparison analysis testing in ANOVA.
McHugh, Mary L
2011-01-01
The Analysis of Variance (ANOVA) test has long been an important tool for researchers conducting studies on multiple experimental groups and one or more control groups. However, ANOVA cannot provide detailed information on differences among the various study groups, or on complex combinations of study groups. To fully understand group differences in an ANOVA, researchers must conduct tests of the differences between particular pairs of experimental and control groups. Tests conducted on subsets of data tested previously in another analysis are called post hoc tests. A class of post hoc tests that provide this type of detailed information for ANOVA results are called "multiple comparison analysis" tests. The most commonly used multiple comparison analysis statistics include the following tests: Tukey, Newman-Keuls, Scheffé, Bonferroni and Dunnett. These statistical tools each have specific uses, advantages and disadvantages. Some are best used for testing theory while others are useful in generating new theory. Selection of the appropriate post hoc test will provide researchers with the most detailed information while limiting Type I errors due to alpha inflation.
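Of the named procedures, the Bonferroni idea is the simplest to sketch: divide alpha by the number of pairwise comparisons. The illustration below tests each pair with a permutation test on the mean difference so it needs only NumPy; it is a stand-in for, not an implementation of, the Tukey or Newman-Keuls computations.

```python
from itertools import combinations
import numpy as np

def pairwise_bonferroni(groups, n_perm=2000, alpha=0.05, seed=0):
    """All pairwise two-sided permutation tests on mean differences,
    flagged against a Bonferroni-adjusted alpha."""
    rng = np.random.default_rng(seed)
    pairs = list(combinations(sorted(groups), 2))
    alpha_adj = alpha / len(pairs)          # Bonferroni correction
    results = {}
    for a, b in pairs:
        x, y = np.asarray(groups[a], float), np.asarray(groups[b], float)
        obs = abs(x.mean() - y.mean())
        pooled = np.concatenate([x, y])
        hits = 0
        for _ in range(n_perm):             # shuffle group labels
            rng.shuffle(pooled)
            if abs(pooled[:x.size].mean() - pooled[x.size:].mean()) >= obs:
                hits += 1
        p = (hits + 1) / (n_perm + 1)       # add-one permutation p-value
        results[(a, b)] = (p, p < alpha_adj)
    return results

data = {"control": [1.0, 1.2, 0.9, 1.1, 1.0, 1.05],
        "drug_a": [2.9, 3.1, 3.0, 3.2, 2.8, 3.05],
        "drug_b": [1.1, 0.95, 1.05, 1.0, 1.15, 0.9]}
res = pairwise_bonferroni(data)
```

With three groups there are three comparisons, so each is judged at alpha/3; this is the "alpha inflation" control the abstract refers to.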
Sample size and power considerations in network meta-analysis
2012-01-01
Background Network meta-analysis is becoming increasingly popular for establishing comparative effectiveness among multiple interventions for the same disease. Network meta-analysis inherits all methodological challenges of standard pairwise meta-analysis, but with increased complexity due to the multitude of intervention comparisons. One issue that is now widely recognized in pairwise meta-analysis is the issue of sample size and statistical power. This issue, however, has so far only received little attention in network meta-analysis. To date, no approaches have been proposed for evaluating the adequacy of the sample size, and thus power, in a treatment network. Findings In this article, we develop easy-to-use flexible methods for estimating the ‘effective sample size’ in indirect comparison meta-analysis and network meta-analysis. The effective sample size for a particular treatment comparison can be interpreted as the number of patients in a pairwise meta-analysis that would provide the same degree and strength of evidence as that which is provided in the indirect comparison or network meta-analysis. We further develop methods for retrospectively estimating the statistical power for each comparison in a network meta-analysis. We illustrate the performance of the proposed methods for estimating effective sample size and statistical power using data from a network meta-analysis on interventions for smoking cessation including over 100 trials. Conclusion The proposed methods are easy to use and will be of high value to regulatory agencies and decision makers who must assess the strength of the evidence supporting comparative effectiveness estimates. PMID:22992327
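For the simplest case, an indirect comparison of treatments A and C through a common comparator B, the effective-sample-size idea can be sketched with the heuristic n_eff = n_AB·n_BC / (n_AB + n_BC). Treat this formula as an illustration of the concept rather than the paper's full method:

```python
def effective_sample_size(n_ab: float, n_bc: float) -> float:
    """Heuristic effective sample size of an indirect A-vs-C comparison
    formed from direct A-vs-B and B-vs-C evidence: the harmonic-mean-like
    combination n_ab * n_bc / (n_ab + n_bc)."""
    return n_ab * n_bc / (n_ab + n_bc)

# Two 1000-patient bodies of direct evidence yield indirect evidence worth
# roughly a 500-patient head-to-head meta-analysis.
n_eff = effective_sample_size(1000, 1000)
```

The formula makes the paper's central point concrete: indirect evidence is always weaker than either of the direct comparisons it is built from, so power calculations based on raw patient counts overstate the strength of a network estimate.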
Multivariate analysis in thoracic research.
Mengual-Macenlle, Noemí; Marcos, Pedro J; Golpe, Rafael; González-Rivas, Diego
2015-03-01
Multivariate analysis is based on the observation and analysis of more than one statistical outcome variable at a time. In design and analysis, the technique is used to perform trade studies across multiple dimensions while taking into account the effects of all variables on the responses of interest. Multivariate methods emerged to analyze large databases and increasingly complex data. Since modeling is the best way to represent knowledge of reality, we should use multivariate statistical methods. Multivariate methods are designed to simultaneously analyze data sets, i.e., to analyze different variables for each person or object studied. Keep in mind at all times that all variables must be treated in a way that accurately reflects the reality of the problem addressed. There are different types of multivariate analysis, and each should be employed according to the type of variables to be analyzed: dependence, interdependence and structural methods. In conclusion, multivariate methods are ideal for the analysis of large data sets and for finding cause-and-effect relationships between variables; there is a wide range of analysis types that we can use.
Association analysis of multiple traits by an approach of combining P values.
Chen, Lili; Wang, Yong; Zhou, Yajing
2018-03-01
Increasing evidence shows that one variant can affect multiple traits, a widespread phenomenon in complex diseases. Joint analysis of multiple traits can increase the statistical power of association analysis and uncover the underlying genetic mechanism. Although there are many statistical methods to analyse multiple traits, most are suitable mainly for detecting common variants associated with multiple traits. However, because of the low minor allele frequencies of rare variants, these methods are not optimal for rare variant association analysis. In this paper, we extend an adaptive combination of P values method (termed ADA) for a single trait to test association between multiple traits and rare variants in a given region. For a given region, we use a reverse regression model to test each rare variant for association with multiple traits and obtain the P value of the single-variant test. We then take the weighted combination of these P values as the test statistic. Extensive simulation studies show that our approach is more powerful than several comparison methods in most cases and is robust to the inclusion of a high proportion of neutral variants and to differing directions of effect of the causal variants.
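The ADA statistic itself is an adaptive weighted combination of per-variant P values; as a generic illustration of weighted P-value combination (not the ADA method), here is Stouffer's Z-score method using only the standard library:

```python
from statistics import NormalDist

def stouffer_combine(pvals, weights=None):
    """Weighted Stouffer Z-score combination of one-sided p-values:
    convert each p to a z-score, take the weighted sum, renormalize,
    and convert back to a p-value."""
    nd = NormalDist()
    z = [nd.inv_cdf(1.0 - p) for p in pvals]
    if weights is None:
        weights = [1.0] * len(pvals)
    z_comb = (sum(w * zi for w, zi in zip(weights, z))
              / sum(w * w for w in weights) ** 0.5)
    return 1.0 - nd.cdf(z_comb)
```

Weighting lets variants with higher prior plausibility (e.g. larger minor allele frequency or functional annotation) contribute more, which is the spirit of the weighted combination described in the abstract.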
Water quality analysis of the Rapur area, Andhra Pradesh, South India using multivariate techniques
NASA Astrophysics Data System (ADS)
Nagaraju, A.; Sreedhar, Y.; Thejaswi, A.; Sayadi, Mohammad Hossein
2017-10-01
The groundwater samples from the Rapur area were collected from different sites to evaluate the major ion chemistry. The large volume of data can lead to difficulties in the integration, interpretation, and representation of the results. Two multivariate statistical methods, hierarchical cluster analysis (HCA) and factor analysis (FA), were applied to evaluate their usefulness for classifying and identifying the geochemical processes controlling groundwater geochemistry. Four statistically significant clusters were obtained from 30 sampling stations. This resulted in two important clusters, viz., cluster 1 (pH, Si, CO3, Mg, SO4, Ca, K, HCO3, alkalinity, Na, Na + K, Cl, and hardness) and cluster 2 (EC and TDS), which are released to the study area from different sources. The application of multivariate statistical techniques such as principal component analysis (PCA) assists in the interpretation of complex data matrices for a better understanding of the water quality of a study area. From the PCA, it is clear that the first factor (factor 1), which accounted for 36.2% of the total variance, had high positive loadings on EC, Mg, Cl, TDS, and hardness. Based on the PCA scores, four significant cluster groups of sampling locations were detected on the basis of the similarity of their water quality.
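The PCA step can be sketched as an SVD of the column-centered data matrix, where each component's variance share corresponds to figures like the 36.2% reported for factor 1. The data below are synthetic stand-ins, not the Rapur measurements:

```python
import numpy as np

def pca(X):
    """PCA via SVD of the column-centered matrix: returns component
    loadings (rows of Vt) and percent variance explained."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = 100.0 * s**2 / np.sum(s**2)
    return Vt, explained

# Synthetic "samples x parameters" matrix with one dominant direction,
# standing in for a strongly correlated block such as EC/TDS/hardness.
rng = np.random.default_rng(1)
t = rng.standard_normal(30)
X = np.column_stack([3 * t, 4 * t, 0.05 * rng.standard_normal(30)])
loadings, explained = pca(X)
```

In practice each column is standardized first so that parameters with large units (EC, TDS) do not dominate the variance decomposition.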
Testing Hypotheses about Sun-Climate Complexity Linking
NASA Astrophysics Data System (ADS)
Rypdal, M.; Rypdal, K.
2010-03-01
We reexamine observational evidence presented in support of the hypothesis of a sun-climate complexity linking by N. Scafetta and B. J. West, Phys. Rev. Lett. 90, 248701 (2003), which contended that the integrated solar flare index (SFI) and the global temperature anomaly (GTA) both follow Lévy walk statistics with the same waiting-time exponent μ≈2.1. However, their analysis does not account for trends in the signal, cannot deal correctly with infinite variance processes (Lévy flights), and suffers from considering only the second moment. Our analysis shows that, properly detrended, the integrated SFI is well described as a Lévy flight, and the integrated GTA as a persistent fractional Brownian motion. These very different stochastic properties of the solar and climate records do not support the hypothesis of a sun-climate complexity linking.
Ramus, Claire; Hovasse, Agnès; Marcellin, Marlène; Hesse, Anne-Marie; Mouton-Barbosa, Emmanuelle; Bouyssié, David; Vaca, Sebastian; Carapito, Christine; Chaoui, Karima; Bruley, Christophe; Garin, Jérôme; Cianférani, Sarah; Ferro, Myriam; Van Dorssaeler, Alain; Burlet-Schiltz, Odile; Schaeffer, Christine; Couté, Yohann; Gonzalez de Peredo, Anne
2016-01-30
Proteomic workflows based on nanoLC-MS/MS data-dependent-acquisition analysis have progressed tremendously in recent years. High-resolution and fast-sequencing instruments have enabled the use of label-free quantitative methods, based either on spectral counting or on MS signal analysis, which appear to be an attractive way to analyze differential protein expression in complex biological samples. However, the computational processing of the data for label-free quantification still remains a challenge. Here, we used a proteomic standard composed of an equimolar mixture of 48 human proteins (Sigma UPS1) spiked at different concentrations into a background of yeast cell lysate to benchmark several label-free quantitative workflows, involving different software packages developed in recent years. This experimental design allowed a fine assessment of their performance in terms of sensitivity and false discovery rate, by measuring the numbers of true and false positives (UPS1 and yeast background proteins, respectively, found as differential). The spiked standard dataset has been deposited in the ProteomeXchange repository with the identifier PXD001819 and can be used to benchmark other label-free workflows, adjust software parameter settings, improve algorithms for extraction of the quantitative metrics from raw MS data, or evaluate downstream statistical methods. Bioinformatic pipelines for label-free quantitative analysis must be objectively evaluated in their ability to detect variant proteins with good sensitivity and a low false discovery rate in large-scale proteomic studies. This can be done through the use of complex spiked samples, for which the "ground truth" of variant proteins is known, allowing a statistical evaluation of the performance of the data processing workflow.
We provide here such a controlled standard dataset and used it to evaluate the performances of several label-free bioinformatics tools (including MaxQuant, Skyline, MFPaQ, IRMa-hEIDI and Scaffold) in different workflows, for detection of variant proteins with different absolute expression levels and fold change values. The dataset presented here can be useful for tuning software tool parameters, and also testing new algorithms for label-free quantitative analysis, or for evaluation of downstream statistical methods. Copyright © 2015 Elsevier B.V. All rights reserved.
Data Treatment for LC-MS Untargeted Analysis.
Riccadonna, Samantha; Franceschi, Pietro
2018-01-01
Liquid chromatography-mass spectrometry (LC-MS) untargeted experiments require complex chemometrics strategies to extract information from the experimental data. Here we discuss "data preprocessing", the set of procedures performed on the raw data to produce a data matrix which will be the starting point for the subsequent statistical analysis. Data preprocessing is a crucial step on the path to knowledge extraction, which should be carefully controlled and optimized in order to maximize the output of any untargeted metabolomics investigation.
Nonlinear Analysis of Time Series in Genome-Wide Linkage Disequilibrium Data
NASA Astrophysics Data System (ADS)
Hernández-Lemus, Enrique; Estrada-Gil, Jesús K.; Silva-Zolezzi, Irma; Fernández-López, J. Carlos; Hidalgo-Miranda, Alfredo; Jiménez-Sánchez, Gerardo
2008-02-01
The statistical study of large scale genomic data has turned out to be a very important tool in population genetics. Quantitative methods are essential to understand and implement association studies in the biomedical and health sciences. Nevertheless, the characterization of recently admixed populations has been an elusive problem due to the presence of a number of complex phenomena. For example, linkage disequilibrium structures are thought to be more complex than their non-recently admixed population counterparts, presenting the so-called ancestry blocks, admixed regions that are not yet smoothed by the effect of genetic recombination. In order to distinguish characteristic features of various populations we have implemented several methods, some of them borrowed or adapted from the analysis of nonlinear time series in statistical physics and quantitative physiology. We calculate the main fractal dimensions (Kolmogorov's capacity, information dimension and correlation dimension, usually denoted D0, D1 and D2). We have also performed detrended fluctuation analysis and information-based similarity index calculations for the probability distribution of correlations of the linkage disequilibrium coefficient for six recently admixed (mestizo) populations within the Mexican Genome Diversity Project [1] and for the non-recently admixed populations in the International HapMap Project [2]. Nonlinear correlations showed up as a consequence of internal structure within the haplotype distributions. The analysis of these correlations as well as the scope and limitations of these procedures within the biomedical sciences are discussed.
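Of the tools listed, detrended fluctuation analysis is compact enough to sketch. The following is a generic first-order DFA, not the authors' implementation:

```python
import numpy as np

def dfa(x, scales):
    """DFA-1: RMS fluctuation F(n) of the integrated, per-window
    linearly detrended series, for each window size n."""
    y = np.cumsum(np.asarray(x, float) - np.mean(x))   # integrated profile
    F = []
    for n in scales:
        n_win = y.size // n
        windows = y[: n_win * n].reshape(n_win, n)
        t = np.arange(n)
        resid_sq = [
            np.mean((w - np.polyval(np.polyfit(t, w, 1), t)) ** 2)
            for w in windows
        ]
        F.append(np.sqrt(np.mean(resid_sq)))
    return np.asarray(F)

# The scaling exponent alpha is the slope of log F(n) vs log n:
# alpha ~ 0.5 for uncorrelated noise, alpha > 0.5 for persistence.
rng = np.random.default_rng(0)
noise = rng.standard_normal(4096)
scales = np.array([16, 32, 64, 128, 256])
alpha = np.polyfit(np.log(scales), np.log(dfa(noise, scales)), 1)[0]
```

Applied to linkage-disequilibrium correlation series, a single clean scaling exponent would indicate self-similar structure, while crossovers between scaling regimes can mark the block-like structure discussed in the abstract.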
Analysis of one gravitational slope cycle
NASA Astrophysics Data System (ADS)
Palis, Edouard; Lebourg, Thomas; Vidal, Maurin; Tric, Emmanuel
2015-04-01
After some twenty years of studies on landslides, we have come to appreciate the role of, and the subtle interactions among, structural complexity, mass dynamics, and the complex internal circulation of fluids. The La Clapière DSL (Deep-Seated Landslide) is now very well known to the scientific community (volume, impact, challenges, observations...), but this mass of knowledge has not yet been compiled or examined through a coupled analysis of its spatial and temporal variability. Since 2007, an effort to share and provide access to uniform data has been led by the Versant Instabilities Multidisciplinary Observatory (OMIV, a French National Observation Service (SNO)). This observatory (with its associated laboratories) allowed the installation of permanent and autonomous measuring stations: GPS, meteorology, seismology, and water chemistry of springs. For two years now, a permanent electrical tomography device has been installed at the bottom of the slope to complement the current monitoring system, allowing a deeper understanding of the physical changes in the massif. The analysis of these data reveals different dynamic regimes, as well as different responses to external factors: instantaneous, delayed, and long-term variability. The purpose of this synthesis study is to analyze the temporal and spatial evolution of the electrical resistivity, displacement, and hydrometeors over a one-year cycle (November 2012 to November 2013). A qualitative and statistical approach using clustering, principal component analysis (PCA), and temporal pseudo-3D representations of these variables was established. This new statistical study also clarifies the major role of the fault and the base of the landslide, as well as the chronology of water flow in the massif, allowing a better understanding of the complex, temporally uneven dynamics of this area.
NASA Astrophysics Data System (ADS)
Siokis, Fotios M.
2014-02-01
We analyze the complexity of rare economic events in troubled European economies. The economic crisis, initiated at the end of 2009, forced a number of European economies to request financial assistance from world organizations. By employing the stock market index as a leading indicator of economic activity, we test whether the financial assistance programs altered the statistical properties of the index. The effects of major financial program agreements on the economies can be best illustrated by comparing the multifractal spectra of the time series before and after the agreement. We reveal that the returns of the time series exhibit strong multifractal properties for all periods under investigation. In two of the three investigated economies, financial assistance along with governments' initiatives appears to have altered the statistical properties of the stock market indexes, increasing the width of the multifractal spectra and thus the complexity of the market.
Universality classes of fluctuation dynamics in hierarchical complex systems
NASA Astrophysics Data System (ADS)
Macêdo, A. M. S.; González, Iván R. Roa; Salazar, D. S. P.; Vasconcelos, G. L.
2017-03-01
A unified approach is proposed to describe the statistics of the short-time dynamics of multiscale complex systems. The probability density function of the relevant time series (signal) is represented as a statistical superposition of a large time-scale distribution weighted by the distribution of certain internal variables that characterize the slowly changing background. The dynamics of the background is formulated as a hierarchical stochastic model whose form is derived from simple physical constraints, which in turn restrict the dynamics to only two possible classes. The probability distributions of both the signal and the background have simple representations in terms of Meijer G functions. The two universality classes for the background dynamics manifest themselves in the signal distribution as two types of tails: power law and stretched exponential, respectively. A detailed analysis of empirical data from classical turbulence and financial markets shows excellent agreement with the theory.
Study nonlinear dynamics of stratospheric ozone concentration at Pakistan Terrestrial region
NASA Astrophysics Data System (ADS)
Jan, Bulbul; Zai, Muhammad Ayub Khan Yousuf; Afradi, Faisal Khan; Aziz, Zohaib
2018-03-01
This study investigates the nonlinear dynamics of the stratospheric ozone layer over the Pakistan atmospheric region. Ozone is now considered one of the most important issues in the world because of its diverse effects on the earth's biosphere, including human health, ecosystems, marine life, agricultural yield, and climate change. This paper therefore deals with monthly total-ozone time series data over the Pakistan atmospheric region from 1970 to 2013. Two approaches, basic statistical analysis and fractal dimension (D), were adopted to study the nonlinear dynamics of the stratospheric ozone level. Results show that the Hurst exponent values from both fractal-dimension methods reveal anti-persistent behavior (negative correlation), i.e. a decreasing trend for all lags, and that rescaled range analysis is more appropriate here than detrended fluctuation analysis. In the seasonal time series, every month follows anti-persistent behavior except November, which shows persistent behavior, i.e. an independent, increasing trend. The normality test statistics also confirm the nonlinear behavior of ozone, and the rejection of the normality hypothesis provides strong evidence of the complexity of the data. This study will be useful to researchers working in the same field who wish to verify the complex nature of stratospheric ozone.
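The rescaled-range estimate of the Hurst exponent preferred in this study can be sketched as follows (a textbook R/S procedure, not the study's exact code); H < 0.5 corresponds to the anti-persistent behaviour reported:

```python
import numpy as np

def hurst_rescaled_range(x, min_win=16):
    """Hurst exponent from the slope of log mean(R/S) versus log window
    size, over dyadic window sizes."""
    x = np.asarray(x, float)
    sizes, rs = [], []
    n = min_win
    while n <= x.size // 2:
        ratios = []
        for i in range(x.size // n):
            seg = x[i * n:(i + 1) * n]
            z = np.cumsum(seg - seg.mean())      # cumulative deviations
            s = seg.std()
            if s > 0:
                ratios.append((z.max() - z.min()) / s)   # range / std
        sizes.append(n)
        rs.append(np.mean(ratios))
        n *= 2
    return np.polyfit(np.log(sizes), np.log(rs), 1)[0]

# Uncorrelated noise should give H near 0.5 (finite-sample R/S is known
# to be biased slightly upward for short windows).
rng = np.random.default_rng(2)
h_white = hurst_rescaled_range(rng.standard_normal(4096))
```

That finite-sample bias is one reason studies like this one cross-check R/S against DFA before drawing conclusions about persistence.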
Raymond L. Czaplewski
1989-01-01
It is difficult to design systems for national and global resource inventory and analysis that efficiently satisfy changing and increasingly complex objectives. It is proposed that individual inventory, monitoring, modeling, and remote sensing systems be specialized to achieve portions of the objectives. These separate systems can be statistically linked to accomplish...
Toward Theological Inclusivism: The Effects of a World Religions Course in a Mormon University
ERIC Educational Resources Information Center
Properzi, Mauro
2017-01-01
Inclusivist, exclusivist, and pluralist attitudes toward other religions interact in complex ways within the Mormon faith. Hence, a course on the world's religions at LDS-sponsored Brigham Young University presents an interesting case study in this context. Through survey data and statistical analysis this article attempts to examine the effect of…
Best PhD thesis Prize: Statistical analysis of ALFALFA galaxies: insights in galaxy
NASA Astrophysics Data System (ADS)
Papastergis, E.
2013-09-01
We use the rich dataset of local universe galaxies detected by the ALFALFA 21cm survey to study the statistical properties of gas-bearing galaxies. In particular, we measure the number density of galaxies as a function of their baryonic mass ("baryonic mass function") and rotational velocity ("velocity width function"), and we characterize their clustering properties ("two-point correlation function"). These statistical distributions are determined by both the properties of dark matter on small scales, as well as by the complex baryonic processes through which galaxies form over cosmic time. We interpret the ALFALFA measurements with the aid of publicly available cosmological N-body simulations and we present some key results related to galaxy formation and small-scale cosmology.
A practical approach to language complexity: a Wikipedia case study.
Yasseri, Taha; Kornai, András; Kertész, János
2012-01-01
In this paper we present statistical analysis of English texts from Wikipedia. We try to address the issue of language complexity empirically by comparing the simple English Wikipedia (Simple) to comparable samples of the main English Wikipedia (Main). Simple is supposed to use a more simplified language with a limited vocabulary, and editors are explicitly requested to follow this guideline, yet in practice the vocabulary richness of both samples are at the same level. Detailed analysis of longer units (n-grams of words and part of speech tags) shows that the language of Simple is less complex than that of Main primarily due to the use of shorter sentences, as opposed to drastically simplified syntax or vocabulary. Comparing the two language varieties by the Gunning readability index supports this conclusion. We also report on the topical dependence of language complexity, that is, that the language is more advanced in conceptual articles compared to person-based (biographical) and object-based articles. Finally, we investigate the relation between conflict and language complexity by analyzing the content of the talk pages associated to controversial and peacefully developing articles, concluding that controversy has the effect of reducing language complexity.
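The Gunning readability index used for the Simple/Main comparison combines average sentence length with the share of "complex" (three-or-more-syllable) words. A rough sketch with a crude vowel-group syllable proxy (an illustration, not the paper's text-processing pipeline):

```python
import re

def gunning_fog(text: str) -> float:
    """Gunning fog index: 0.4 * (words/sentence + 100 * complex/words).
    Syllables are approximated by counting vowel groups, which is crude
    but adequate for comparing corpora of similar style."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    def syllables(w):
        return len(re.findall(r"[aeiouy]+", w.lower()))
    complex_words = [w for w in words if syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences)
                  + 100.0 * len(complex_words) / len(words))
```

Because the index is driven by sentence length as well as vocabulary, it is consistent with the paper's finding that Simple's lower complexity comes mainly from shorter sentences rather than a simplified vocabulary.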
NASA Astrophysics Data System (ADS)
Salman Shahid, Syed; Bikson, Marom; Salman, Humaira; Wen, Peng; Ahfock, Tony
2014-06-01
Objectives. Computational methods are increasingly used to optimize transcranial direct current stimulation (tDCS) dose strategies and yet complexities of existing approaches limit their clinical access. Since predictive modelling indicates the relevance of subject- and pathology-specific data, and hence the need for subject-specific modelling, the incremental clinical value of increasingly complex modelling methods must be balanced against the computational and clinical time and costs. For example, the incorporation of multiple tissue layers and measured diffusion tensor (DTI) based conductivity estimates increases model precision, but at the cost of clinical and computational resources. Costs related to such complexities aggregate when considering individual optimization and the myriad of potential montages. Here, rather than considering if additional details change current-flow prediction, we consider when added complexities influence clinical decisions. Approach. Towards developing quantitative and qualitative metrics of value/cost associated with computational model complexity, we considered field distributions generated by two 4 × 1 high-definition montages (m1 = 4 × 1 HD montage with anode at C3 and m2 = 4 × 1 HD montage with anode at C1) and a single conventional (m3 = C3-Fp2) tDCS electrode montage. We evaluated statistical methods, including residual error (RE) and relative difference measure (RDM), to consider the clinical impact and utility of increased complexities, namely the influence of skull, muscle and brain anisotropic conductivities in a volume conductor model. Main results. Anisotropy modulated current-flow in a montage- and region-dependent manner. However, significant statistical changes, produced within montage by anisotropy, did not change qualitative peak and topographic comparisons across montages. Thus, for the examples analysed, the clinical decision of which dose to select would not be altered by the omission of anisotropic brain conductivity.
Significance. Results illustrate the need to rationally balance the role of model complexity, such as anisotropy in detailed current-flow analysis, against its value in clinical dose design. However, when extending our analysis to include axonal polarization, the results presumably provide clinically meaningful information. Hence the importance of model complexity may be more relevant for cellular-level predictions of neuromodulation.
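The RE and RDM metrics named above are standard in volume-conductor model comparison. A minimal sketch, assuming the usual definitions (RDM compares field topographies after normalizing out magnitude; RE is magnitude-sensitive):

```python
from math import sqrt

def norm(v):
    return sqrt(sum(x * x for x in v))

def rdm(e1, e2):
    """Relative difference measure: topography mismatch between two
    field vectors, with overall magnitude normalized away."""
    n1, n2 = norm(e1), norm(e2)
    return norm([a / n1 - b / n2 for a, b in zip(e1, e2)])

def relative_error(e1, e2):
    """Residual/relative error: magnitude-sensitive mismatch w.r.t. e2."""
    return norm([a - b for a, b in zip(e1, e2)]) / norm(e2)
```

The key property is that RDM is zero for two fields that differ only by a scale factor, while RE is not, which is why the two are reported together.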
2017-01-01
Recent advances in understanding protein folding have benefitted from coarse-grained representations of protein structures. Empirical energy functions derived from these techniques occasionally succeed in distinguishing native structures from their corresponding ensembles of nonnative folds or decoys, which display varying degrees of structural dissimilarity to the native proteins. Here we utilized atomic coordinates of single protein chains, comprising a large diverse training set, to develop and evaluate twelve all-atom four-body statistical potentials obtained by exploring alternative values for a pair of inherent parameters. Delaunay tessellation was performed on the atomic coordinates of each protein to objectively identify all quadruplets of interacting atoms, and atomic potentials were generated via statistical analysis of the data and implementation of the inverted Boltzmann principle. Our potentials were evaluated using benchmarking datasets from Decoys-'R'-Us, and comparisons were made with twelve other physics- and knowledge-based potentials. Ranking third, our best potential tied CHARMM19 and surpassed the AMBER force-field potentials. We illustrate how a generalized version of our potential can be used to empirically calculate binding energies for target-ligand complexes, using HIV-1 protease-inhibitor complexes for a practical application. The combined results suggest an accurate and efficient atomic four-body statistical potential for protein structure prediction and assessment. PMID:29119109
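The inverted Boltzmann step can be sketched independently of the tessellation: given observed counts of each quadruplet type and reference (expected) frequencies, the score is E_q = -kT ln(f_obs(q)/f_exp(q)). The frequencies below are illustrative placeholders, not values from the paper:

```python
from math import log

def inverted_boltzmann(observed, expected, kT=1.0, eps=1e-9):
    """Knowledge-based potential via the inverted Boltzmann principle:
    E_q = -kT * ln(f_obs(q) / f_exp(q)) per quadruplet type q.
    `observed` maps type -> raw count; `expected` maps type -> frequency."""
    total_obs = sum(observed.values())
    scores = {}
    for q, n in observed.items():
        f_obs = n / total_obs
        f_exp = expected.get(q, eps)  # floor avoids division by zero
        scores[q] = -kT * log(f_obs / f_exp)
    return scores
```

Types observed more often than the reference predicts get negative (favorable) energies; depleted types get positive ones, which is the sign convention used when summing a decoy's quadruplet scores.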
Statistics and Informatics in Space Astrophysics
NASA Astrophysics Data System (ADS)
Feigelson, E.
2017-12-01
The interest in statistical and computational methodology has seen rapid growth in space-based astrophysics, parallel to the growth seen in Earth remote sensing. There is widespread agreement that scientific interpretation of the cosmic microwave background, discovery of exoplanets, and classifying multiwavelength surveys is too complex to be accomplished with traditional techniques. NASA operates several well-functioning Science Archive Research Centers providing 0.5 PBy datasets to the research community. These databases are integrated with full-text journal articles in the NASA Astrophysics Data System (200K pageviews/day). Data products use interoperable formats and protocols established by the International Virtual Observatory Alliance. NASA supercomputers also support complex astrophysical models of systems such as accretion disks and planet formation. Academic researcher interest in methodology has significantly grown in areas such as Bayesian inference and machine learning, and statistical research is underway to treat problems such as irregularly spaced time series and astrophysical model uncertainties. Several scholarly societies have created interest groups in astrostatistics and astroinformatics. Improvements are needed on several fronts. Community education in advanced methodology is not sufficiently rapid to meet the research needs. Statistical procedures within NASA science analysis software are sometimes not optimal, and pipeline development may not use modern software engineering techniques. NASA offers few grant opportunities supporting research in astroinformatics and astrostatistics.
Applications of modern statistical methods to analysis of data in physical science
NASA Astrophysics Data System (ADS)
Wicker, James Eric
Modern methods of statistical and computational analysis offer solutions to dilemmas confronting researchers in physical science. Although the ideas behind modern statistical and computational analysis methods were originally introduced in the 1970's, most scientists still rely on methods developed during the early era of computing. These researchers, who analyze increasingly voluminous and multivariate data sets, need modern analysis methods to extract the best results from their studies. The first section of this work showcases applications of modern linear regression. Since the 1960's, many researchers in spectroscopy have used classical stepwise regression techniques to derive molecular constants. However, problems with thresholds of entry and exit for model variables plague this analysis method. Other criticisms of this kind of stepwise procedure include its inefficient searching method, the order in which variables enter or leave the model, and problems with overfitting the data. We implement an information scoring technique that overcomes the assumptions inherent in the stepwise regression process to calculate molecular model parameters. We believe that this kind of information-based model evaluation can be applied to more general analysis situations in physical science. The second section proposes new methods of multivariate cluster analysis. The K-means algorithm and the EM algorithm, introduced in the 1960's and 1970's respectively, formed the basis of multivariate cluster analysis methodology for many years. However, several shortcomings of these methods include strong dependence on initial seed values and inaccurate results when the data seriously depart from hypersphericity. We propose new cluster analysis methods based on genetic algorithms that overcome the strong dependence on initial seed values.
In addition, we propose a generalization of the Genetic K-means algorithm which can accurately identify clusters with complex hyperellipsoidal covariance structures. We then use this new algorithm in a genetic algorithm based Expectation-Maximization process that can accurately calculate parameters describing complex clusters in a mixture model routine. Using the accuracy of this GEM algorithm, we assign information scores to cluster calculations in order to best identify the number of mixture components in a multivariate data set. We will showcase how these algorithms can be used to process multivariate data from astronomical observations.
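The idea of assigning information scores to cluster calculations to choose the number of components can be illustrated with a BIC criterion over a toy 1-D k-means fit. This is a deliberate simplification: the GEM algorithm described above fits full hyperellipsoidal mixtures, whereas this sketch assumes a shared spherical variance:

```python
from math import log, pi

def kmeans_1d(xs, k, iters=50):
    # Simple 1-D k-means, seeded with evenly spaced initial centers.
    lo, hi = min(xs), max(xs)
    centers = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            j = min(range(k), key=lambda j: abs(x - centers[j]))
            groups[j].append(x)
        centers = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers, groups

def bic_score(xs, k):
    """BIC = -2 log L + p ln n for a k-center, shared-variance model."""
    n = len(xs)
    centers, groups = kmeans_1d(xs, k)
    sse = sum((x - c) ** 2 for c, g in zip(centers, groups) for x in g)
    var = max(sse / n, 1e-9)
    loglik = -0.5 * n * (log(2 * pi * var) + 1)
    p = 2 * k  # k means + k mixing weights; variance is shared
    return -2 * loglik + p * log(n)

def pick_k(xs, kmax=4):
    """Choose the component count that minimizes the information score."""
    return min(range(1, kmax + 1), key=lambda k: bic_score(xs, k))
```

On data with two well-separated groups, the likelihood gain from a third center no longer pays for the extra parameters, so the score correctly selects two components.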
Mild cognitive impairment and fMRI studies of brain functional connectivity: the state of the art
Farràs-Permanyer, Laia; Guàrdia-Olmos, Joan; Peró-Cebollero, Maribel
2015-01-01
In the last 15 years, many articles have studied brain connectivity in Mild Cognitive Impairment patients with fMRI techniques, seemingly using different connectivity statistical models in each investigation to identify complex connectivity structures so as to recognize typical behavior in this type of patient. This diversity in statistical approaches may cause problems in results comparison. This paper seeks to describe how researchers approached the study of brain connectivity in MCI patients using fMRI techniques from 2002 to 2014. The focus is on the statistical analysis proposed by each research group in reference to the limitations and possibilities of those techniques, to identify some recommendations to improve the study of functional connectivity. The included articles came from a search of Web of Science and PsycINFO using the following keywords: fMRI, MCI, and functional connectivity. Eighty-one papers were found, but two of them were discarded because of the lack of statistical analysis. Accordingly, 79 articles were included in this review. We summarized some parts of the articles, including the goal of every investigation, the cognitive paradigm and methods used, brain regions involved, use of ROI analysis and statistical analysis, with emphasis on the connectivity estimation model used in each investigation. The present analysis allowed us to confirm the remarkable variability of the statistical analysis methods found. Additionally, the study of brain connectivity in this type of population does not, at present, provide significant information relevant to clinical prediction and treatment. We propose following guidelines for publishing fMRI data as a solution to the problem of study replication. Greater homogeneity would benefit the comparison between publications and the generalization of results. PMID:26300802
Statistical methods used in articles published by the Journal of Periodontal and Implant Science.
Choi, Eunsil; Lyu, Jiyoung; Park, Jinyoung; Kim, Hae-Young
2014-12-01
The purposes of this study were to assess the trend of use of statistical methods, including parametric and nonparametric methods, and to evaluate the use of complex statistical methodology in recent periodontal studies. This study analyzed 123 articles published in the Journal of Periodontal & Implant Science (JPIS) between 2010 and 2014. Frequencies and percentages were calculated according to the number of statistical methods used, the type of statistical method applied, and the type of statistical software used. Most of the published articles considered (64.4%) used statistical methods. Since 2011, the percentage of JPIS articles using statistics has increased. On the basis of multiple counting, we found that the percentage of studies in JPIS using parametric methods was 61.1%. Further, complex statistical methods were applied in only 6 of the published studies (5.0%), and nonparametric statistical methods were applied in 77 of the published studies (38.9% of a total of 198 studies considered). We found an increasing trend towards the application of statistical methods and nonparametric methods in recent periodontal studies, and thus concluded that researchers in the fields covered by JPIS may increasingly prefer complex statistical methodology.
Minimized state complexity of quantum-encoded cryptic processes
NASA Astrophysics Data System (ADS)
Riechers, Paul M.; Mahoney, John R.; Aghamohammadi, Cina; Crutchfield, James P.
2016-05-01
The predictive information required for proper trajectory sampling of a stochastic process can be more efficiently transmitted via a quantum channel than a classical one. This recent discovery allows quantum information processing to drastically reduce the memory necessary to simulate complex classical stochastic processes. It also points to a new perspective on the intrinsic complexity that nature must employ in generating the processes we observe. The quantum advantage increases with codeword length: the length of process sequences used in constructing the quantum communication scheme. In analogy with the classical complexity measure, statistical complexity, we use this reduced communication cost as an entropic measure of state complexity in the quantum representation. Previously difficult to compute, the quantum advantage is expressed here in closed form using spectral decomposition. This allows for efficient numerical computation of the quantum-reduced state complexity at all encoding lengths, including infinite. Additionally, it makes clear how finite-codeword reduction in state complexity is controlled by the classical process's cryptic order, and it allows asymptotic analysis of infinite-cryptic-order processes.
A generalized association test based on U statistics.
Wei, Changshuai; Lu, Qing
2017-07-01
Second-generation sequencing technologies are being increasingly used for genetic association studies, where the main research interest is to identify sets of genetic variants that contribute to various phenotypes. The phenotype can be univariate disease status, multivariate responses, and even high-dimensional outcomes. Considering the genotype and phenotype as two complex objects, this also poses a general statistical problem of testing association between complex objects. Here we proposed a similarity-based test, generalized similarity U (GSU), that can test the association between complex objects. We first studied the theoretical properties of the test in a general setting and then focused on the application of the test to sequencing association studies. Based on theoretical analysis, we proposed to use Laplacian kernel-based similarity for GSU to boost power and enhance robustness. Through simulation, we found that GSU did have advantages over existing methods in terms of power and robustness. We further performed a whole genome sequencing (WGS) scan for Alzheimer's Disease Neuroimaging Initiative data, identifying three genes, APOE, APOC1 and TOMM40, associated with imaging phenotype. We developed a C++ package for analysis of WGS data using GSU. The source codes can be downloaded at https://github.com/changshuaiwei/gsu . weichangshuai@gmail.com ; qlu@epi.msu.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
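A schematic version of a similarity-based U statistic with a Laplacian kernel might look like the following. This is a sketch of the general idea only; the actual GSU test involves additional weighting, centering, and asymptotic theory not shown here:

```python
from math import exp
from itertools import permutations

def laplacian_kernel(x, y, sigma=1.0):
    # Laplacian kernel on vectors: exp(-||x - y||_1 / sigma)
    return exp(-sum(abs(a - b) for a, b in zip(x, y)) / sigma)

def similarity_u(genotypes, phenotypes, sigma=1.0):
    """Schematic similarity-based U statistic: the average, over all
    ordered pairs of subjects, of genotype similarity times phenotype
    similarity. Large values suggest genotype-phenotype association."""
    n = len(genotypes)
    total = sum(laplacian_kernel(genotypes[i], genotypes[j], sigma) *
                laplacian_kernel(phenotypes[i], phenotypes[j], sigma)
                for i, j in permutations(range(n), 2))
    return total / (n * (n - 1))
```

When all phenotypes are identical, the phenotype kernel is 1 for every pair and the statistic reduces to the average genotype similarity, which makes the pairing structure of the test easy to see.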
Effects of additional data on Bayesian clustering.
Yamazaki, Keisuke
2017-10-01
Hierarchical probabilistic models, such as mixture models, are used for cluster analysis. These models have two types of variables: observable and latent. In cluster analysis, the latent variable is estimated, and it is expected that additional information will improve the accuracy of the estimation of the latent variable. Many proposed learning methods are able to use additional data; these include semi-supervised learning and transfer learning. However, from a statistical point of view, a complex probabilistic model that encompasses both the initial and additional data might be less accurate due to having a higher-dimensional parameter. The present paper presents a theoretical analysis of the accuracy of such a model and clarifies which factor has the greatest effect on its accuracy, the advantages of obtaining additional data, and the disadvantages of increasing the complexity. Copyright © 2017 Elsevier Ltd. All rights reserved.
Bieber, Frederick R; Buckleton, John S; Budowle, Bruce; Butler, John M; Coble, Michael D
2016-08-31
The evaluation and interpretation of forensic DNA mixture evidence faces growing challenges as mixture evidence becomes increasingly complex. Such challenges include: casework involving low quantity or degraded evidence leading to allele and locus dropout; allele sharing of contributors leading to allele stacking; and differentiation of PCR stutter artifacts from true alleles. There is variation in the statistical approaches used to evaluate the strength of the evidence when inclusion of a specific known individual(s) is determined, and the approaches used must be supportable. There are concerns that methods utilized for interpretation of complex forensic DNA mixtures may not be implemented properly in some casework. Similar questions are being raised in a number of U.S. jurisdictions, leading to some confusion about mixture interpretation for current and previous casework. Key elements necessary for the interpretation and statistical evaluation of forensic DNA mixtures are described. The most common method for statistical evaluation of DNA mixtures in many parts of the world, including the USA, is the Combined Probability of Inclusion/Exclusion (CPI/CPE); exposition and elucidation of this method and a protocol for its use are the focus of this article. Formulae and other supporting materials are provided. Guidance and details of a DNA mixture interpretation protocol are provided for application of the CPI/CPE method in the analysis of more complex forensic DNA mixtures. This description, in turn, should help reduce the variability of interpretation with application of this methodology and thereby improve the quality of DNA mixture interpretation throughout the forensic community.
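The CPI computation itself is short: at each locus, square the summed population frequencies of all alleles detected in the mixture, then multiply across loci; CPE = 1 - CPI. A sketch with hypothetical allele frequencies (the protocol details in the article, such as dropout handling, are not modeled here):

```python
def cpi(mixture_profile, allele_freqs):
    """Combined Probability of Inclusion: the product over loci of
    (sum of frequencies of the alleles detected at that locus)^2.
    `mixture_profile` maps locus -> detected alleles;
    `allele_freqs` maps locus -> {allele: population frequency}."""
    combined = 1.0
    for locus, alleles in mixture_profile.items():
        p = sum(allele_freqs[locus][a] for a in alleles)
        combined *= p * p
    return combined
```

Because each locus contributes a factor at most 1, adding informative loci can only decrease the CPI, i.e. strengthen the exclusionary power 1 - CPI.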
THE MEASUREMENT OF BONE QUALITY USING GRAY LEVEL CO-OCCURRENCE MATRIX TEXTURAL FEATURES.
Shirvaikar, Mukul; Huang, Ning; Dong, Xuanliang Neil
2016-10-01
In this paper, statistical methods for the estimation of bone quality to predict the risk of fracture are reported. Bone mineral density and bone architecture properties are the main contributors to bone quality. Dual-energy X-ray Absorptiometry (DXA) is the traditional clinical measurement technique for bone mineral density, but does not include architectural information to enhance the prediction of bone fragility. Other modalities are not practical due to cost and access considerations. This study investigates statistical parameters based on the Gray Level Co-occurrence Matrix (GLCM) extracted from two-dimensional projection images and explores links with architectural properties and bone mechanics. Data analysis was conducted on Micro-CT images of 13 trabecular bones (with an in-plane spatial resolution of about 50 μm). Ground truth data for bone volume fraction (BV/TV), bone strength and modulus were available based on complex 3D analysis and mechanical tests. Correlation between the statistical parameters and biomechanical test results was studied using regression analysis. The results showed Cluster Shade was strongly correlated with the microarchitecture of the trabecular bone and related to mechanical properties. Once the principal thesis of utilizing second-order statistics is established, it can be extended to other modalities, providing cost and convenience advantages for patients and doctors.
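Cluster shade, the GLCM feature singled out above, is the third moment of the co-occurrence distribution about its marginal means. A dependency-free sketch follows (gray levels are assumed to be quantized to a small integer range already; the offset and level count are illustrative choices):

```python
def glcm(image, dx=1, dy=0, levels=4, symmetric=True):
    """Normalized gray-level co-occurrence matrix for pixel offset (dx, dy)."""
    P = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                P[image[r][c]][image[r2][c2]] += 1
                if symmetric:
                    P[image[r2][c2]][image[r][c]] += 1
    total = sum(sum(row) for row in P)
    return [[v / total for v in row] for row in P]

def cluster_shade(P):
    """Cluster shade: sum_ij (i + j - mu_i - mu_j)^3 * P[i][j],
    the skewness-like third moment of the co-occurrence distribution."""
    n = len(P)
    mu_i = sum(i * P[i][j] for i in range(n) for j in range(n))
    mu_j = sum(j * P[i][j] for i in range(n) for j in range(n))
    return sum((i + j - mu_i - mu_j) ** 3 * P[i][j]
               for i in range(n) for j in range(n))
```

A perfectly balanced texture yields a cluster shade of zero; asymmetry between dark-dark and bright-bright co-occurrences pushes it negative or positive, which is what makes it sensitive to trabecular microarchitecture.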
NASA Astrophysics Data System (ADS)
Karpov, A. V.; Yumagulov, E. Z.
2003-05-01
We have restored and ordered the archive of meteor observations carried out with the meteor radar complex "KGU-M5" since 1986. A relational database has been formed under the control of the Database Management System (DBMS) Oracle 8. We also improved and tested a statistical method for studying the fine spatial structure of meteor streams, taking into account the specific features of working with the DBMS. Statistical analysis of the results of observations made it possible to obtain information about the substance distribution in the Quadrantid, Geminid, and Perseid meteor streams.
Linguistic Analysis of the Human Heartbeat Using Frequency and Rank Order Statistics
NASA Astrophysics Data System (ADS)
Yang, Albert C.-C.; Hseu, Shu-Shya; Yien, Huey-Wen; Goldberger, Ary L.; Peng, C.-K.
2003-03-01
Complex physiologic signals may carry unique dynamical signatures that are related to their underlying mechanisms. We present a method based on rank order statistics of symbolic sequences to investigate the profile of different types of physiologic dynamics. We apply this method to heart rate fluctuations, the output of a central physiologic control system. The method robustly discriminates patterns generated from healthy and pathologic states, as well as aging. Furthermore, we observe increased randomness in the heartbeat time series with physiologic aging and pathologic states and also uncover nonrandom patterns in the ventricular response to atrial fibrillation.
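The general recipe above (map beat-to-beat changes to symbols, count fixed-length words, and rank their frequencies) can be sketched as follows. The three-symbol alphabet and the word length are illustrative choices, not necessarily the paper's exact parameters:

```python
from collections import Counter

def symbolize(rr_intervals):
    # 0: decrease, 1: no change, 2: increase between successive beats
    return [0 if b < a else (1 if b == a else 2)
            for a, b in zip(rr_intervals, rr_intervals[1:])]

def word_rank_distribution(symbols, m=3):
    """Relative frequency of each m-symbol word, sorted into rank order.
    A steep rank-frequency curve indicates strongly patterned dynamics;
    a flat one indicates randomness."""
    words = Counter(tuple(symbols[i:i + m])
                    for i in range(len(symbols) - m + 1))
    total = sum(words.values())
    return sorted((n / total for n in words.values()), reverse=True)
```

A strictly alternating heartbeat produces only two distinct words, each occupying half the distribution; a random series would instead spread mass over many words, which is the contrast the method exploits between healthy, aged, and pathologic states.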
Statistical complexity without explicit reference to underlying probabilities
NASA Astrophysics Data System (ADS)
Pennini, F.; Plastino, A.
2018-06-01
We show that extremely simple systems of a not too large number of particles can be simultaneously thermally stable and complex. To such an end, we extend the statistical complexity's notion to simple configurations of non-interacting particles, without appeal to probabilities, and discuss configurational properties.
NASA Technical Reports Server (NTRS)
Moore, B., III; Kaufmann, R.; Reinhold, C.
1981-01-01
Systems analysis and control theory considerations are given to simulations of both individual components and total systems, in order to develop a reliable control strategy for a Controlled Ecological Life Support System (CELSS) which includes complex biological components. Because of the numerous nonlinearities and tight coupling within the biological component, classical control theory may be inadequate and the statistical analysis of factorial experiments more useful. The range in control characteristics of particular species may simplify the overall task by providing an appropriate balance of stability and controllability to match species function in the overall design. The ultimate goal of this research is the coordination of biological and mechanical subsystems in order to achieve a self-supporting environment.
Late-paleozoic granitoid complexes of the southwest Primorye: geochemistry, age and typification
NASA Astrophysics Data System (ADS)
Veldemar, A. A.; Vovna, G. M.
2017-12-01
The article presents the first data from geochemical studies of the Late Permian granitoids of the Gamov Complex, located in the southwestern part of the Voznesenskiy terrane. The purpose of the study was to identify the main geochemical features of the Late Paleozoic granitoids of the southwestern Primorye, which in the future will allow us to draw conclusions about the petrogenesis of these granitoids. Elemental analysis of 20 samples was carried out, statistical and mathematical processing of the data was conducted, and representative diagrams and graphs were constructed for this group of rocks. Elemental analysis was performed by atomic emission (ICP-AES) and inductively-coupled-plasma (ICP-MS) mass spectrometry at the Analytical Center FEGI FEB RAS.
Effective control of complex turbulent dynamical systems through statistical functionals.
Majda, Andrew J; Qi, Di
2017-05-30
Turbulent dynamical systems characterized by both a high-dimensional phase space and a large number of instabilities are ubiquitous among complex systems in science and engineering, including climate, material, and neural science. Control of these complex systems is a grand challenge, for example, in mitigating the effects of climate change or safe design of technology with fully developed shear turbulence. Control of flows in the transition to turbulence, where there is a small dimension of instabilities about a basic mean state, is an important and successful discipline. In complex turbulent dynamical systems, it is impossible to track and control the large dimension of instabilities, which strongly interact and exchange energy, and new control strategies are needed. The goal of this paper is to propose an effective statistical control strategy for complex turbulent dynamical systems based on a recent statistical energy principle and statistical linear response theory. We illustrate the potential practical efficiency and verify this effective statistical control strategy on the 40D Lorenz 1996 model in forcing regimes with various types of fully turbulent dynamics with nearly one-half of the phase space unstable.
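The 40-dimensional Lorenz 1996 model used as the test bed above has a standard form, dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F with cyclic indices. A minimal RK4 integrator (the control strategy itself is not reproduced here):

```python
def lorenz96_step(x, F=8.0, dt=0.01):
    """One RK4 step of the Lorenz 1996 model:
    dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F, indices cyclic."""
    n = len(x)

    def f(y):
        # Python's negative indexing handles i-1 and i-2 wrap-around.
        return [(y[(i + 1) % n] - y[i - 2]) * y[i - 1] - y[i] + F
                for i in range(n)]

    k1 = f(x)
    k2 = f([xi + 0.5 * dt * k for xi, k in zip(x, k1)])
    k3 = f([xi + 0.5 * dt * k for xi, k in zip(x, k2)])
    k4 = f([xi + dt * k for xi, k in zip(x, k3)])
    return [xi + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
```

The uniform state x_i = F is an exact (unstable) equilibrium of the model; small perturbations around it grow into the fully turbulent regimes the paper controls statistically.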
UNCERTAINTY ON RADIATION DOSES ESTIMATED BY BIOLOGICAL AND RETROSPECTIVE PHYSICAL METHODS.
Ainsbury, Elizabeth A; Samaga, Daniel; Della Monaca, Sara; Marrale, Maurizio; Bassinet, Celine; Burbidge, Christopher I; Correcher, Virgilio; Discher, Michael; Eakins, Jon; Fattibene, Paola; Güçlü, Inci; Higueras, Manuel; Lund, Eva; Maltar-Strmecki, Nadica; McKeever, Stephen; Rääf, Christopher L; Sholom, Sergey; Veronese, Ivan; Wieser, Albrecht; Woda, Clemens; Trompier, Francois
2018-03-01
Biological and physical retrospective dosimetry are recognised as key techniques to provide individual estimates of dose following unplanned exposures to ionising radiation. Whilst there has been a relatively large amount of recent development in the biological and physical procedures, development of statistical analysis techniques has failed to keep pace. The aim of this paper is to review the current state of the art in uncertainty analysis techniques across the 'EURADOS Working Group 10-Retrospective dosimetry' members, to give concrete examples of implementation of the techniques recommended in the international standards, and to further promote the use of Monte Carlo techniques to support characterisation of uncertainties. It is concluded that sufficient techniques are available and in use by most laboratories for acute, whole body exposures to highly penetrating radiation, but further work will be required to ensure that statistical analysis is always wholly sufficient for the more complex exposure scenarios.
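As an example of the Monte Carlo uncertainty techniques the paper promotes, calibration-curve uncertainty can be propagated into a dose interval by repeated sampling. The quadratic dose-response form Y = alpha*D + beta*D^2 is standard for dicentric assays, but the coefficients and uncertainties below are purely illustrative:

```python
import random
from math import sqrt

def invert_yield(y, alpha, beta):
    # Solve beta*D^2 + alpha*D - y = 0 for the positive root.
    return (-alpha + sqrt(alpha * alpha + 4 * beta * y)) / (2 * beta)

def mc_dose_interval(yield_obs, alpha, sa, beta, sb, n=20000, seed=1):
    """Monte Carlo propagation of calibration-curve uncertainty
    (Y = alpha*D + beta*D^2) into a 95% dose uncertainty interval."""
    rng = random.Random(seed)
    doses = []
    for _ in range(n):
        a = max(rng.gauss(alpha, sa), 1e-6)  # clamp to keep curve physical
        b = max(rng.gauss(beta, sb), 1e-6)
        doses.append(invert_yield(yield_obs, a, b))
    doses.sort()
    return doses[int(0.025 * n)], doses[int(0.975 * n)]
```

With zero coefficient uncertainty the interval collapses to the point estimate, and widening sa or sb widens it, which is the behaviour a characterised uncertainty budget should show. A full treatment would also sample the Poisson scatter of the observed aberration count.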
Liu, Wei; Ding, Jinhui
2018-04-01
The application of the principle of the intention-to-treat (ITT) to the analysis of clinical trials is challenged in the presence of missing outcome data. The consequences of stopping an assigned treatment in a withdrawn subject are unknown. It is difficult to make a single assumption about missingness mechanisms for all clinical trials, because the body's reactions to drugs, mediated by complex biological networks, can cause data to be missing either randomly or non-randomly. Currently there is no statistical method that can tell whether a difference between two treatments in the ITT population of a randomized clinical trial with missing data is significant at a pre-specified level. Making no assumptions about the missing mechanisms, we propose a generalized complete-case (GCC) analysis based on the data of completers. An evaluation of the impact of missing data on the ITT analysis reveals that a statistically significant GCC result implies a significant treatment effect in the ITT population at a pre-specified significance level unless, relative to the comparator, the test drug is poisonous to the non-completers as documented in their medical records. Applications of the GCC analysis are illustrated using literature data, and its properties and limits are discussed.
CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks.
Li, Min; Li, Dongyan; Tang, Yu; Wu, Fangxiang; Wang, Jianxin
2017-08-31
Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a Cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), and IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), together with the BinGO (Biological Networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster.
Dai, Mingwei; Ming, Jingsi; Cai, Mingxuan; Liu, Jin; Yang, Can; Wan, Xiang; Xu, Zongben
2017-09-15
Results from genome-wide association studies (GWAS) suggest that a complex phenotype is often affected by many variants with small effects, known as 'polygenicity'. Tens of thousands of samples are often required to ensure statistical power of identifying these variants with small effects. However, it is often the case that a research group can only get approval for access to individual-level genotype data with a limited sample size (e.g. a few hundred or a few thousand). Meanwhile, summary statistics generated using single-variant-based analysis are becoming publicly available. The sample sizes associated with the summary statistics datasets are usually quite large. How to make the most efficient use of existing abundant data resources largely remains an open question. In this study, we propose a statistical approach, IGESS, to increasing statistical power of identifying risk variants and improving accuracy of risk prediction by integrating individual-level genotype data and summary statistics. An efficient algorithm based on variational inference is developed to handle the genome-wide analysis. Through comprehensive simulation studies, we demonstrated the advantages of IGESS over the methods which take either individual-level data or summary statistics data as input. We applied IGESS to perform integrative analysis of Crohn's disease data from WTCCC and summary statistics from other studies. IGESS was able to significantly increase the statistical power of identifying risk variants and improve the risk prediction accuracy from 63.2% (±0.4%) to 69.4% (±0.1%) using about 240,000 variants. The IGESS software is available at https://github.com/daviddaigithub/IGESS . zbxu@xjtu.edu.cn or xwan@comp.hkbu.edu.hk or eeyang@hkbu.edu.hk. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
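The full IGESS algorithm relies on variational inference over genome-wide data, but the underlying intuition of borrowing strength from large summary-statistics resources can be sketched with a far simpler fixed-effect (inverse-variance) combination of two estimates of the same effect. The numbers below are illustrative assumptions, not values from the paper:

```python
# Toy illustration (not the IGESS algorithm): combine an effect estimate
# from limited individual-level data with a published summary statistic
# by inverse-variance (precision) weighting, as in fixed-effect meta-analysis.

def combine_estimates(beta_ind, se_ind, beta_sum, se_sum):
    """Precision-weighted combination of two estimates of the same effect."""
    w_ind = 1.0 / se_ind ** 2   # precision of the individual-level estimate
    w_sum = 1.0 / se_sum ** 2   # precision of the summary-statistics estimate
    beta = (w_ind * beta_ind + w_sum * beta_sum) / (w_ind + w_sum)
    se = (w_ind + w_sum) ** -0.5
    return beta, se

# A small cohort gives a noisy estimate; the summary statistic from a
# large meta-analysis is far more precise and dominates the combination.
beta, se = combine_estimates(0.30, 0.10, 0.20, 0.02)
print(round(beta, 4), round(se, 4))
```

The combined standard error is always smaller than either input, which is the sense in which the summary data "adds power" to the small individual-level sample.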
Schörner, Mario; Beyer, Sebastian Reinhardt; Southall, June; Cogdell, Richard J; Köhler, Jürgen
2015-11-05
The light harvesting complex LH2 is a chromoprotein that is an ideal system for studying protein dynamics via the spectral fluctuations of the emission of its intrinsic chromophores. We have immobilized these complexes in a polymer film and studied the fluctuations of the fluorescence intensity from individual complexes over 9 orders of magnitude in time. Combining time-tagged detection of single photons with a change-point analysis has allowed the unambiguous identification of the various intensity levels due to the huge statistical basis of the data set. We propose that the observed intensity level fluctuations reflect conformational changes of the protein backbone that might be a precursor of the mechanism from which nonphotochemical quenching of higher plants has evolved.
NASA Astrophysics Data System (ADS)
Wen, Shaobo; An, Haizhong; Chen, Zhihua; Liu, Xueyong
2017-08-01
In traditional econometrics, a time series must be stationary. However, it usually shows time-varying fluctuations, and it remains a challenge to execute a multiscale analysis of the data and discover the topological characteristics of conduction in different scales. Wavelet analysis and complex networks from statistical physics have special advantages in solving these problems. We select the exchange rate variable from the Chinese market and the commodity price index variable from the world market as the time series of our study. We explore the driving factors behind the behavior of the two markets and their topological characteristics in three steps. First, we use the Kalman filter to find the optimal estimation of the relationship between the two markets. Second, wavelet analysis is used to extract the scales of the relationship that are driven by different frequency wavelets. Meanwhile, we search for the actual economic variables corresponding to different frequency wavelets. Finally, a complex network is used to search for the transfer characteristics of the combination of states driven by different frequency wavelets. The results show that statistical physics has a unique advantage over traditional econometrics. The Chinese market has time-varying impacts on the world market: it has greater influence when the world economy is stable and less influence in times of turmoil. The process of forming the state combination is random. Transitions between state combinations have a clustering feature. Based on these characteristics, we can effectively reduce the information burden on investors and correctly respond to the government's policy mix.
Lamara, Mebarek; Raherison, Elie; Lenz, Patrick; Beaulieu, Jean; Bousquet, Jean; MacKay, John
2016-04-01
Association studies are widely utilized to analyze complex traits but their ability to disclose genetic architectures is often limited by statistical constraints, and functional insights are usually minimal in nonmodel organisms like forest trees. We developed an approach to integrate association mapping results with co-expression networks. We tested single nucleotide polymorphisms (SNPs) in 2652 candidate genes for statistical associations with wood density, stiffness, microfibril angle and ring width in a population of 1694 white spruce trees (Picea glauca). Association mapping identified 229-292 genes per wood trait using a statistical significance level of P < 0.05 to maximize discovery. Over-representation of genes associated with nearly all traits was found in a xylem preferential co-expression group developed in independent experiments. A xylem co-expression network was reconstructed with 180 wood associated genes and several known MYB and NAC regulators were identified as network hubs. The network revealed a link between the gene PgNAC8, wood stiffness and microfibril angle, as well as considerable within-season variation for both genetic control of wood traits and gene expression. Trait associations were distributed throughout the network suggesting complex interactions and pleiotropic effects. Our findings indicate that integration of association mapping and co-expression networks enhances our understanding of complex wood traits. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.
Right-sizing statistical models for longitudinal data.
Wood, Phillip K; Steinley, Douglas; Jackson, Kristina M
2015-12-01
Arguments are proposed that researchers using longitudinal data should consider more and less complex statistical model alternatives to their initially chosen techniques in an effort to "right-size" the model to the data at hand. Such model comparisons may alert researchers who use poorly fitting, overly parsimonious models to more complex, better-fitting alternatives and, alternatively, may identify more parsimonious alternatives to overly complex (and perhaps empirically underidentified and/or less powerful) statistical models. A general framework is proposed for considering (often nested) relationships between a variety of psychometric and growth curve models. A 3-step approach is proposed in which models are evaluated based on the number and patterning of variance components prior to selection of better-fitting growth models that explain both mean and variation-covariation patterns. The orthogonal free curve slope intercept (FCSI) growth model is considered a general model that includes, as special cases, many models, including the factor mean (FM) model (McArdle & Epstein, 1987), McDonald's (1967) linearly constrained factor model, hierarchical linear models (HLMs), repeated-measures multivariate analysis of variance (MANOVA), and the linear slope intercept (linearSI) growth model. The FCSI model, in turn, is nested within the Tuckerized factor model. The approach is illustrated by comparing alternative models in a longitudinal study of children's vocabulary and by comparing several candidate parametric growth and chronometric models in a Monte Carlo study. (c) 2015 APA, all rights reserved.
Multi-level emulation of complex climate model responses to boundary forcing data
NASA Astrophysics Data System (ADS)
Tran, Giang T.; Oliver, Kevin I. C.; Holden, Philip B.; Edwards, Neil R.; Sóbester, András; Challenor, Peter
2018-04-01
Climate model components involve both high-dimensional input and output fields. It is desirable to efficiently generate spatio-temporal outputs of these models for applications in integrated assessment modelling or to assess the statistical relationship between such sets of inputs and outputs, for example, in uncertainty analysis. However, the need for efficiency often compromises the fidelity of output through the use of low complexity models. Here, we develop a technique which combines statistical emulation with a dimensionality reduction technique to emulate a wide range of outputs from an atmospheric general circulation model, PLASIM, as functions of the boundary forcing prescribed by the ocean component of a lower complexity climate model, GENIE-1. Although accurate and detailed spatial information on atmospheric variables such as precipitation and wind speed is well beyond the capability of GENIE-1's energy-moisture balance model of the atmosphere, this study demonstrates that the output of this model is useful in predicting PLASIM's spatio-temporal fields through multi-level emulation. Meaningful information from the fast model, GENIE-1, was extracted by utilising the correlation between variables of the same type in the two models and between variables of different types in PLASIM. We present here the construction and validation of several PLASIM variable emulators and discuss their potential use in developing a hybrid model with statistical components.
Sources of difficulty in assessment: example of PISA science items
NASA Astrophysics Data System (ADS)
Le Hebel, Florence; Montpied, Pascale; Tiberghien, Andrée; Fontanieu, Valérie
2017-03-01
The understanding of what makes a question difficult is a crucial concern in assessment. To study the difficulty of test questions, we focus on the case of PISA, which assesses to what degree 15-year-old students have acquired knowledge and skills essential for full participation in society. Our research question is to identify PISA science item characteristics that could influence the item's proficiency level. It is based on an a priori item analysis and a statistical analysis. Results show that, of the different characteristics of PISA science items determined in our a priori analysis, only cognitive complexity and format have explanatory power for an item's proficiency level. The proficiency level cannot be explained by the dependence/independence of the information provided in the unit and/or item introduction and the competence. We conclude that in PISA, it appears possible to anticipate a high proficiency level, that is, students' low scores, for items displaying a high cognitive complexity. In the case of a middle or low cognitive complexity level item, the cognitive complexity level is not sufficient to predict item difficulty. Other characteristics play a crucial role in item difficulty. We discuss anticipating the difficulties in assessment in a broader perspective.
NASA Astrophysics Data System (ADS)
Lagerwall, Gareth; Kiker, Gregory; Muñoz-Carpena, Rafael; Wang, Naiming
2017-01-01
The coupled regional simulation model, and the transport and reaction simulation engine were recently adapted to simulate ecology, specifically Typha domingensis (Cattail) dynamics in the Everglades. While Cattail is a native Everglades species, it has become invasive over the years due to an altered habitat over the last few decades, taking over historically Cladium jamaicense (Sawgrass) areas. Two models of different levels of algorithmic complexity were developed in previous studies, and are used here to determine the impact of various management decisions on the average Cattail density within Water Conservation Area 2A in the Everglades. A Global Uncertainty and Sensitivity Analysis was conducted to test the importance of these management scenarios, as well as the effectiveness of using zonal statistics. Management scenarios included high, medium and low initial water depths, soil phosphorus concentrations, initial Cattail and Sawgrass densities, as well as annually alternating water depths and soil phosphorus concentrations, and a steadily decreasing soil phosphorus concentration. Analysis suggests that zonal statistics are good indicators of regional trends, and that high soil phosphorus concentration is a pre-requisite for expansive Cattail growth. It is a complex task to manage Cattail expansion in this region, requiring the close management and monitoring of water depth and soil phosphorus concentration, and possibly other factors not considered in the model complexities. However, this modeling framework with user-definable complexities and management scenarios, can be considered a useful tool in analyzing many more alternatives, which could be used to aid management decisions in the future.
Marshall, Najja; Timme, Nicholas M; Bennett, Nicholas; Ripp, Monica; Lautzenhiser, Edward; Beggs, John M
2016-01-01
Neural systems include interactions that occur across many scales. Two divergent methods for characterizing such interactions have drawn on the physical analysis of critical phenomena and the mathematical study of information. Inferring criticality in neural systems has traditionally rested on fitting power laws to the property distributions of "neural avalanches" (contiguous bursts of activity), but the fractal nature of avalanche shapes has recently emerged as another signature of criticality. On the other hand, neural complexity, an information theoretic measure, has been used to capture the interplay between the functional localization of brain regions and their integration for higher cognitive functions. Unfortunately, treatments of all three methods-power-law fitting, avalanche shape collapse, and neural complexity-have suffered from shortcomings. Empirical data often contain biases that introduce deviations from a true power law in the tail and head of the distribution, but deviations in the tail have often gone unconsidered; avalanche shape collapse has required manual parameter tuning; and the estimation of neural complexity has relied on small data sets or statistical assumptions for the sake of computational efficiency. In this paper we present technical advancements in the analysis of criticality and complexity in neural systems. We use maximum-likelihood estimation to automatically fit power laws with left and right cutoffs, present the first automated shape collapse algorithm, and describe new techniques to account for large numbers of neural variables and small data sets in the calculation of neural complexity. In order to facilitate future research in criticality and complexity, we have made the software utilized in this analysis freely available online in the MATLAB NCC (Neural Complexity and Criticality) Toolbox.
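The maximum-likelihood power-law fit mentioned above can be illustrated in its simplest form. The sketch below implements only the textbook continuous MLE with a lower cutoff; the toolbox additionally fits a right cutoff and handles discrete data:

```python
import math
import random

def fit_powerlaw_alpha(data, xmin):
    """Textbook MLE for a continuous power law p(x) ~ x^(-alpha), x >= xmin:
    alpha_hat = 1 + n / sum(ln(x_i / xmin))."""
    tail = [x for x in data if x >= xmin]
    n = len(tail)
    return 1.0 + n / sum(math.log(x / xmin) for x in tail)

# Generate exact power-law samples by inverse transform sampling:
# x = xmin * u^(-1/(alpha-1)) for u uniform on (0, 1).
random.seed(0)
alpha_true, xmin = 2.5, 1.0
data = [xmin * random.random() ** (-1.0 / (alpha_true - 1.0))
        for _ in range(50000)]
alpha_hat = fit_powerlaw_alpha(data, xmin)
print(round(alpha_hat, 2))
```

With 50,000 samples the estimator's standard error is roughly (alpha-1)/sqrt(n) ≈ 0.007, so the recovered exponent lands very close to the true 2.5.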
Adams, K M; Brown, G G; Grant, I
1985-08-01
Analysis of Covariance (ANCOVA) is often used in neuropsychological studies to effect ex-post-facto adjustment of performance variables amongst groups of subjects mismatched on some relevant demographic variable. This paper reviews some of the statistical assumptions underlying this usage. In an attempt to illustrate the complexities of this statistical technique, three sham studies using actual patient data are presented. These staged simulations have varying relationships between group test performance differences and levels of covariate discrepancy. The results were robust and consistent in their nature, and were held to support the wisdom of previous cautions by statisticians concerning the employment of ANCOVA to justify comparisons between incomparable groups. ANCOVA should not be used in neuropsychological research to equate groups unequal on variables such as age and education or to exert statistical control whose objective is to eliminate consideration of the covariate as an explanation for results. Finally, the report advocates by example the use of simulation to further our understanding of neuropsychological variables.
A Flexible Approach for the Statistical Visualization of Ensemble Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Potter, K.; Wilson, A.; Bremer, P.
2009-09-29
Scientists are increasingly moving towards ensemble data sets to explore relationships present in dynamic systems. Ensemble data sets combine spatio-temporal simulation results generated using multiple numerical models, sampled input conditions and perturbed parameters. While ensemble data sets are a powerful tool for mitigating uncertainty, they pose significant visualization and analysis challenges due to their complexity. We present a collection of overview and statistical displays linked through a high level of interactivity to provide a framework for gaining key scientific insight into the distribution of the simulation results as well as the uncertainty associated with the data. In contrast to methods that present large amounts of diverse information in a single display, we argue that combining multiple linked statistical displays yields a clearer presentation of the data and facilitates a greater level of visual data analysis. We demonstrate this approach using driving problems from climate modeling and meteorology and discuss generalizations to other fields.
A RESEARCH DATABASE FOR IMPROVED DATA MANAGEMENT AND ANALYSIS IN LONGITUDINAL STUDIES
BIELEFELD, ROGER A.; YAMASHITA, TOYOKO S.; KEREKES, EDWARD F.; ERCANLI, EHAT; SINGER, LYNN T.
2014-01-01
We developed a research database for a five-year prospective investigation of the medical, social, and developmental correlates of chronic lung disease during the first three years of life. We used the Ingres database management system and the Statit statistical software package. The database includes records containing 1300 variables each, the results of 35 psychological tests, each repeated five times (providing longitudinal data on the child, the parents, and behavioral interactions), both raw and calculated variables, and both missing and deferred values. The four-layer menu-driven user interface incorporates automatic activation of complex functions to handle data verification, missing and deferred values, static and dynamic backup, determination of calculated values, display of database status, reports, bulk data extraction, and statistical analysis. PMID:7596250
Esteve-Altava, Borja; Boughner, Julia C.; Diogo, Rui; Villmoare, Brian A.; Rasskin-Gutman, Diego
2015-01-01
Modularity and complexity go hand in hand in the evolution of the skull of primates. Because analyses of these two parameters often use different approaches, we do not know yet how modularity evolves within, or as a consequence of, an also-evolving complex organization. Here we use a novel network theory-based approach (Anatomical Network Analysis) to assess how the organization of skull bones constrains the co-evolution of modularity and complexity among primates. We used the pattern of bone contacts modeled as networks to identify connectivity modules and quantify morphological complexity. We analyzed whether modularity and complexity evolved coordinately in the skull of primates. Specifically, we tested Herbert Simon’s general theory of near-decomposability, which states that modularity promotes the evolution of complexity. We found that the skulls of extant primates divide into one conserved cranial module and up to three labile facial modules, whose composition varies among primates. Despite changes in modularity, statistical analyses reject a positive feedback between modularity and complexity. Our results suggest a decoupling of complexity and modularity that translates to varying levels of constraint on the morphological evolvability of the primate skull. This study has methodological and conceptual implications for grasping the constraints that underlie the developmental and functional integration of the skull of humans and other primates. PMID:25992690
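Connectivity modules and their strength can be illustrated with Newman's modularity Q for a fixed partition of a bone-contact graph. This is a generic graph statistic, not the paper's full Anatomical Network Analysis pipeline, and the toy graph below is purely hypothetical:

```python
# Sketch of Newman's modularity Q for a given partition of an
# undirected graph: Q = sum_c (e_cc/m - (d_c/2m)^2), where e_cc is the
# number of edges inside community c, d_c its total degree, m all edges.

def modularity(edges, partition):
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for c in set(partition.values()):
        within = sum(1 for u, v in edges
                     if partition[u] == c and partition[v] == c)
        d_c = sum(d for node, d in degree.items() if partition[node] == c)
        q += within / m - (d_c / (2.0 * m)) ** 2
    return q

# Two triangles joined by a single bridge edge: a natural 2-module split.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
partition = {0: 'a', 1: 'a', 2: 'a', 3: 'b', 4: 'b', 5: 'b'}
print(round(modularity(edges, partition), 3))
```

Higher Q means more edges fall inside modules than expected by chance; comparing Q across skull networks is one way to track changes in modular organization.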
Stulberg, Jonah J; Pavey, Emily S; Cohen, Mark E; Ko, Clifford Y; Hoyt, David B; Bilimoria, Karl Y
2017-02-01
Changes to resident duty hour policies in the Flexibility in Duty Hour Requirements for Surgical Trainees (FIRST) trial could impact hospitalized patients' length of stay (LOS) by altering care coordination. Length of stay can also serve as a reflection of all complications, particularly those not captured in the FIRST trial (eg pneumothorax from central line). Programs were randomized to either maintaining current ACGME duty hour policies (Standard arm) or more flexible policies waiving rules on maximum shift lengths and time off between shifts (Flexible arm). Our objective was to determine whether flexibility in resident duty hours affected LOS in patients undergoing high-risk surgical operations. Patients were identified who underwent hepatectomy, pancreatectomy, laparoscopic colectomy, open colectomy, or ventral hernia repair (2014-2015 academic year) at 154 hospitals participating in the FIRST trial. Two procedure-stratified evaluations of LOS were undertaken: multivariable negative binomial regression analysis on LOS and a multivariable logistic regression analysis on the likelihood of a prolonged LOS (>75th percentile). Before any adjustments, there was no statistically significant difference in overall mean LOS between study arms (Flexible Policy: mean [SD] LOS 6.03 [5.78] days vs Standard Policy: mean LOS 6.21 [5.82] days; p = 0.74). In adjusted analyses, there was no statistically significant difference in LOS between study arms overall (incidence rate ratio for Flexible vs Standard: 0.982; 95% CI, 0.939-1.026; p = 0.41) or for any individual procedures. In addition, there was no statistically significant difference in the proportion of patients with prolonged LOS between study arms overall (Flexible vs Standard: odds ratio = 1.028; 95% CI, 0.871-1.212) or for any individual procedures. Duty hour flexibility had no statistically significant effect on LOS in patients undergoing complex intra-abdominal operations.
Copyright © 2016 American College of Surgeons. Published by Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Eduardo Virgilio Silva, Luiz; Otavio Murta, Luiz
2012-12-01
Complexity in time series is an intriguing feature of living dynamical systems, with potential use for identification of system state. Although various methods have been proposed for measuring physiologic complexity, uncorrelated time series are often assigned high values of complexity, erroneously classifying them as complex physiological signals. Here, we propose and discuss a method for complex system analysis based on generalized statistical formalism and surrogate time series. Sample entropy (SampEn) was rewritten, inspired by Tsallis generalized entropy, as a function of the parameter q (qSampEn). qSDiff curves were calculated, which consist of differences between original and surrogate series qSampEn. We evaluated qSDiff for 125 real heart rate variability (HRV) dynamics, divided into groups of 70 healthy, 44 congestive heart failure (CHF), and 11 atrial fibrillation (AF) subjects, and for simulated series of stochastic and chaotic processes. The evaluations showed that, for nonperiodic signals, qSDiff curves have a maximum point (qSDiffmax) for q ≠ 1. Values of q where the maximum point occurs and where qSDiff is zero were also evaluated. Only qSDiffmax values were capable of distinguishing the HRV groups (p-values 5.10×10⁻³, 1.11×10⁻⁷, and 5.50×10⁻⁷ for healthy vs. CHF, healthy vs. AF, and CHF vs. AF, respectively), consistently with the concept of physiologic complexity, suggesting a potential use for chaotic system analysis.
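qSampEn generalizes the standard sample entropy, which the paper recasts as a function of the Tsallis parameter q. A minimal, dependency-free sketch of the standard SampEn (the conventional natural-log form, before the paper's q-generalization) is:

```python
import math

def sampen(series, m=2, r=0.2):
    """Sample entropy: -ln(A/B), where B counts template pairs of length m
    and A pairs of length m+1 matching within tolerance r (Chebyshev
    distance), excluding self-matches."""
    def count_matches(length):
        templates = [series[i:i + length]
                     for i in range(len(series) - length + 1)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b)
                       for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    return -math.log(a / b)

# A strictly periodic series is maximally predictable: A ~ B, SampEn near 0.
periodic = [0.0, 1.0] * 30
print(round(sampen(periodic, m=2, r=0.2), 3))
```

For the periodic series almost every length-2 match extends to a length-3 match, so the entropy is close to zero; irregular signals lose many extensions and score higher.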
Spectral Entropies as Information-Theoretic Tools for Complex Network Comparison
NASA Astrophysics Data System (ADS)
De Domenico, Manlio; Biamonte, Jacob
2016-10-01
Any physical system can be viewed from the perspective that information is implicitly represented in its state. However, the quantification of this information when it comes to complex networks has remained largely elusive. In this work, we use techniques inspired by quantum statistical mechanics to define an entropy measure for complex networks and to develop a set of information-theoretic tools, based on network spectral properties, such as Rényi q entropy, generalized Kullback-Leibler and Jensen-Shannon divergences, the latter allowing us to define a natural distance measure between complex networks. First, we show that by minimizing the Kullback-Leibler divergence between an observed network and a parametric network model, inference of model parameter(s) by means of maximum-likelihood estimation can be achieved and model selection can be performed with appropriate information criteria. Second, we show that the information-theoretic metric quantifies the distance between pairs of networks and we can use it, for instance, to cluster the layers of a multilayer system. By applying this framework to networks corresponding to sites of the human microbiome, we perform hierarchical cluster analysis and recover with high accuracy existing community-based associations. Our results imply that spectral-based statistical inference in complex networks results in demonstrably superior performance as well as a conceptual backbone, filling a gap towards a network information theory.
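As a simplified sketch of the spectral entropies above, a von Neumann-style entropy can be computed from the Laplacian eigenvalues rescaled to unit trace. This is a common simplification; the paper develops a more general density-matrix framework with Rényi entropies and divergences. Eigenvalues are obtained here with a plain Jacobi iteration so the example stays dependency-free:

```python
import math

def jacobi_eigenvalues(a, sweeps=50):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
    return [a[i][i] for i in range(n)]

def von_neumann_entropy(laplacian):
    """S = -sum(l_i ln l_i) over Laplacian eigenvalues rescaled to sum to 1."""
    eig = jacobi_eigenvalues(laplacian)
    t = sum(eig)
    return -sum((l / t) * math.log(l / t) for l in eig if l / t > 1e-12)

# Laplacian of the triangle graph K3 (eigenvalues 0, 3, 3) -> entropy ln 2.
L3 = [[2, -1, -1], [-1, 2, -1], [-1, -1, 2]]
print(round(von_neumann_entropy(L3), 4))
```

Entropies of this kind can then feed Kullback-Leibler or Jensen-Shannon divergences between networks, which is the basis of the distance measure described in the abstract.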
Coupled disease-behavior dynamics on complex networks: A review
NASA Astrophysics Data System (ADS)
Wang, Zhen; Andrews, Michael A.; Wu, Zhi-Xi; Wang, Lin; Bauch, Chris T.
2015-12-01
It is increasingly recognized that a key component of successful infection control efforts is understanding the complex, two-way interaction between disease dynamics and human behavioral and social dynamics. Human behavior such as contact precautions and social distancing clearly influence disease prevalence, but disease prevalence can in turn alter human behavior, forming a coupled, nonlinear system. Moreover, in many cases, the spatial structure of the population cannot be ignored, such that social and behavioral processes and/or transmission of infection must be represented with complex networks. Research on studying coupled disease-behavior dynamics in complex networks in particular is growing rapidly, and frequently makes use of analysis methods and concepts from statistical physics. Here, we review some of the growing literature in this area. We contrast network-based approaches to homogeneous-mixing approaches, point out how their predictions differ, and describe the rich and often surprising behavior of disease-behavior dynamics on complex networks, and compare them to processes in statistical physics. We discuss how these models can capture the dynamics that characterize many real-world scenarios, thereby suggesting ways that policy makers can better design effective prevention strategies. We also describe the growing sources of digital data that are facilitating research in this area. Finally, we suggest pitfalls which might be faced by researchers in the field, and we suggest several ways in which the field could move forward in the coming years.
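A minimal flavor of such coupled dynamics: a discrete-time SIR process on a contact network in which the per-contact transmission probability falls as prevalence rises, a crude stand-in for contact precautions. The network, parameter values, and the prevalence-feedback form are illustrative assumptions, not taken from the review:

```python
import random

def simulate(adj, beta0=0.3, gamma=0.1, fear=5.0, steps=200, seed=1):
    """Discrete-time SIR on a network with behavioral feedback: the
    transmission probability beta shrinks as prevalence grows."""
    random.seed(seed)
    n = len(adj)
    state = ['S'] * n
    state[0] = 'I'                      # single index case
    for _ in range(steps):
        prevalence = state.count('I') / n
        beta = beta0 / (1.0 + fear * prevalence)  # behavior reduces contact risk
        nxt = state[:]
        for i in range(n):
            if state[i] == 'I':
                if random.random() < gamma:
                    nxt[i] = 'R'        # recovery
                for j in adj[i]:
                    if state[j] == 'S' and random.random() < beta:
                        nxt[j] = 'I'    # transmission along a network edge
        state = nxt
    return state.count('R') + state.count('I')  # final epidemic size

# Ring lattice of 100 nodes, each linked to two neighbors on either side.
n = 100
adj = [[(i + d) % n for d in (-2, -1, 1, 2)] for i in range(n)]
print(simulate(adj))
```

Raising the `fear` parameter suppresses the final epidemic size, a toy version of the two-way coupling the review surveys; network-based models like this can diverge sharply from homogeneous-mixing predictions.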
Testing for independence in J×K contingency tables with complex sample survey data.
Lipsitz, Stuart R; Fitzmaurice, Garrett M; Sinha, Debajyoti; Hevelone, Nathanael; Giovannucci, Edward; Hu, Jim C
2015-09-01
The test of independence of row and column variables in a (J×K) contingency table is a widely used statistical test in many areas of application. For complex survey samples, use of the standard Pearson chi-squared test is inappropriate due to correlation among units within the same cluster. Rao and Scott (1981, Journal of the American Statistical Association 76, 221-230) proposed an approach in which the standard Pearson chi-squared statistic is multiplied by a design effect to adjust for the complex survey design. Unfortunately, this test fails to exist when one of the observed cell counts equals zero. Even with the large samples typical of many complex surveys, zero cell counts can occur for rare events, small domains, or contingency tables with a large number of cells. Here, we propose Wald and score test statistics for independence based on weighted least squares estimating equations. In contrast to the Rao-Scott test statistic, the proposed Wald and score test statistics always exist. In simulations, the score test is found to perform best with respect to type I error. The proposed method is motivated by, and applied to, post-surgical complication data from the United States' Nationwide Inpatient Sample (NIS) complex survey of hospitals in 2008. © 2015, The International Biometric Society.
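The first-order Rao-Scott correction discussed above deflates the Pearson statistic by an average design effect before referring it to a reference distribution. A sketch, with an assumed rather than estimated design effect, is below; note that the expected-count division fails when a cell is empty, which is exactly the limitation that motivates the authors' weighted-least-squares Wald and score tests:

```python
# Pearson X^2 for a J x K contingency table, then a first-order
# Rao-Scott-style adjustment by an (assumed known) average design effect.

def pearson_chi2(table):
    """X^2 = sum (obs - exp)^2 / exp over all cells."""
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    x2 = 0.0
    for j, row in enumerate(table):
        for k, obs in enumerate(row):
            exp = row_sums[j] * col_sums[k] / total
            x2 += (obs - exp) ** 2 / exp
    return x2

table = [[30, 10], [20, 40]]      # illustrative 2x2 survey table
x2 = pearson_chi2(table)
deff = 1.8                        # illustrative average design effect
x2_adjusted = x2 / deff           # first-order adjusted statistic
print(round(x2, 2), round(x2_adjusted, 2))
```

Dividing by the design effect widens the effective confidence interval, reflecting the loss of information from within-cluster correlation.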
Research in Computational Astrobiology
NASA Technical Reports Server (NTRS)
Chaban, Galina; Colombano, Silvano; Scargle, Jeff; New, Michael H.; Pohorille, Andrew; Wilson, Michael A.
2003-01-01
We report on several projects in the field of computational astrobiology, which is devoted to advancing our understanding of the origin, evolution and distribution of life in the Universe using theoretical and computational tools. Research projects included modifying existing computer simulation codes to use efficient, multiple time step algorithms, statistical methods for analysis of astrophysical data via optimal partitioning methods, electronic structure calculations on water-nuclei acid complexes, incorporation of structural information into genomic sequence analysis methods and calculations of shock-induced formation of polycylic aromatic hydrocarbon compounds.
Complex Analysis of Combat in Afghanistan
2014-12-01
analysis we have E(f) ~ f^(-β), where β = 2H - 1 = 1 - γ, with H being the Hurst exponent, related to the correlation exponent γ. Usually, real-world data are...statistical nature. In every instance we found strong power law correlations in the data, and were able to extract accurate scaling exponents. On the... exponents, α. The case α < 0.5 corresponds to long-term anti-correlations, meaning that large values are most likely to be followed by small values and
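Scaling exponents α of this kind are commonly estimated with detrended fluctuation analysis (DFA). The report's exact pipeline is not given in this excerpt, so the sketch below is generic first-order DFA, under which α < 0.5 indicates anti-correlations, α ≈ 0.5 uncorrelated noise, and α > 0.5 persistence:

```python
import math
import random

def dfa_alpha(x, scales=(4, 8, 16, 32, 64)):
    """First-order DFA: slope of log F(s) vs log s."""
    mean = sum(x) / len(x)
    y, acc = [], 0.0
    for v in x:                       # integrate the centered series
        acc += v - mean
        y.append(acc)
    logs, logf = [], []
    for s in scales:
        n_win = len(y) // s
        sq = 0.0
        for w in range(n_win):
            seg = y[w * s:(w + 1) * s]
            t = list(range(s))
            tbar, sbar = (s - 1) / 2.0, sum(seg) / s
            # least-squares linear detrend within the window
            beta = (sum(ti * yi for ti, yi in zip(t, seg)) - s * tbar * sbar) \
                / (sum(ti * ti for ti in t) - s * tbar ** 2)
            a0 = sbar - beta * tbar
            sq += sum((yi - (a0 + beta * ti)) ** 2 for ti, yi in zip(t, seg))
        logs.append(math.log(s))
        logf.append(math.log(math.sqrt(sq / (n_win * s))))
    n = len(logs)
    lbar, fbar = sum(logs) / n, sum(logf) / n
    # slope of the log-log fit is the scaling exponent alpha
    return sum((l - lbar) * (f - fbar) for l, f in zip(logs, logf)) \
        / sum((l - lbar) ** 2 for l in logs)

random.seed(42)
white = [random.gauss(0, 1) for _ in range(4096)]
alpha = dfa_alpha(white)   # uncorrelated noise: alpha near 0.5
print(round(alpha, 2))
```

Applying the same estimator to cumulative or smoothed series shifts α upward, which is how persistent (correlated) fluctuations announce themselves in data of this kind.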
Fusion of multiscale wavelet-based fractal analysis on retina image for stroke prediction.
Che Azemin, M Z; Kumar, Dinesh K; Wong, T Y; Wang, J J; Kawasaki, R; Mitchell, P; Arjunan, Sridhar P
2010-01-01
In this paper, we present a novel method of analyzing retinal vasculature using the Fourier fractal dimension to extract the complexity of the retinal vasculature enhanced at different wavelet scales. Logistic regression was used as a fusion method to model the classifier for 5-year stroke prediction. The efficacy of this technique has been tested using standard pattern recognition performance evaluation, Receiver Operating Characteristic (ROC) analysis, and a medical prediction statistic, the odds ratio. A stroke prediction model was developed using the proposed system.
Robust inference for group sequential trials.
Ganju, Jitendra; Lin, Yunzhi; Zhou, Kefei
2017-03-01
For ethical reasons, group sequential trials were introduced to allow trials to stop early in the event of extreme results. Endpoints in such trials are usually mortality or irreversible morbidity. For a given endpoint, the norm is to use a single test statistic and to use that same statistic for each analysis. This approach is risky because the test statistic has to be specified before the study is unblinded, and there is loss in power if the assumptions that ensure optimality for each analysis are not met. To minimize the risk of moderate to substantial loss in power due to a suboptimal choice of a statistic, a robust method was developed for nonsequential trials. The concept is analogous to diversification of financial investments to minimize risk. The method is based on combining P values from multiple test statistics for formal inference while controlling the type I error rate at its designated value. This article evaluates the performance of 2 P value combining methods for group sequential trials. The emphasis is on time-to-event trials, although results from less complex trials are also included. The gain or loss in power with the combination method relative to a single statistic is asymmetric in its favor. Depending on the power of each individual test, the combination method can give more power than any single test or give power that is closer to the test with the most power. The versatility of the method is that it can combine P values from different test statistics for analysis at different times. The robustness of results suggests that inference from group sequential trials can be strengthened with the use of combined tests. Copyright © 2017 John Wiley & Sons, Ltd.
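The abstract does not name the two P value combining rules it evaluates; as one classical example of the general idea, Fisher's method combines k independent P values via −2 Σ log pᵢ, which is chi-squared distributed with 2k degrees of freedom under the global null:

```python
# Fisher's combined probability test, shown only as a classical
# instance of P value combination; the article's specific methods are
# not reproduced here.
import math

def fisher_combined_stat(pvalues):
    """-2 * sum(log p_i), chi-squared with 2k df under the null."""
    return -2.0 * sum(math.log(p) for p in pvalues)

def fisher_combined_p(pvalues):
    """Combined P value, using the closed-form chi-squared survival
    function for even degrees of freedom (df = 2k)."""
    k = len(pvalues)
    x = fisher_combined_stat(pvalues)
    term, s = 1.0, 1.0
    for i in range(1, k):                  # sum_{i<k} (x/2)^i / i!
        term *= (x / 2) / i
        s += term
    return math.exp(-x / 2) * s

# Two individually marginal results combine into a stronger one.
print(round(fisher_combined_p([0.04, 0.10]), 4))  # → 0.0261
```

Note how the combined P value (0.026) is smaller than either input, illustrating why combination can outperform a single suboptimal statistic.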
Mai, Lan-Yin; Li, Yi-Xuan; Chen, Yong; Xie, Zhen; Li, Jie; Zhong, Ming-Yu
2014-05-01
The compatibility of traditional Chinese medicine (TCM) formulae, which contain enormous amounts of information, constitutes a complex component system. Applying mathematical statistics methods to research on the compatibility of TCM formulae has great significance for promoting the modernization of traditional Chinese medicines and for improving clinical efficacy and the optimization of formulae. As a tool for quantitative analysis, data inference and the exploration of inherent rules of substances, mathematical statistics methods can be used to reveal the working mechanisms of the compatibility of TCM formulae both qualitatively and quantitatively. By reviewing studies based on the applications of mathematical statistics methods, this paper summarizes the field from the perspectives of dosage optimization, efficacy and changes of chemical components, as well as the rules of incompatibility and contraindication of formulae, and provides references for further studying and revealing the working mechanisms and connotations of traditional Chinese medicines.
NASA Technical Reports Server (NTRS)
Zimmerman, G. A.; Olsen, E. T.
1992-01-01
Noise power estimation in the High-Resolution Microwave Survey (HRMS) sky survey element is considered as an example of a constant false alarm rate (CFAR) signal detection problem. Order-statistic-based noise power estimators for CFAR detection are considered in terms of required estimator accuracy and estimator dynamic range. By limiting the dynamic range of the value to be estimated, the performance of an order-statistic estimator can be achieved by simpler techniques requiring only a single pass of the data. Simple threshold-and-count techniques are examined, and it is shown how several parallel threshold-and-count estimation devices can be used to expand the dynamic range to meet HRMS system requirements with minimal hardware complexity. An input/output (I/O) efficient limited-precision order-statistic estimator with wide but limited dynamic range is also examined.
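A hedged sketch of the two estimator families this abstract compares, on synthetic exponentially distributed noise power samples: an order-statistic (median-based) estimator, and a single-pass threshold-and-count estimator that inverts the exponential CDF. The threshold and sample size are illustrative, not HRMS system values.

```python
# Two noise power estimators for exponentially distributed samples
# (synthetic data; parameters chosen for illustration only).
import math
import random

random.seed(1)
mu_true = 2.0
data = [random.expovariate(1.0 / mu_true) for _ in range(5000)]

def median_estimate(samples):
    """Order-statistic estimator: for an exponential distribution the
    median equals mu * ln 2, so invert that relation."""
    s = sorted(samples)
    return s[len(s) // 2] / math.log(2)

def threshold_and_count_estimate(samples, threshold):
    """Single-pass estimator: count the fraction below a threshold T
    and invert P(x < T) = 1 - exp(-T/mu)."""
    frac = sum(1 for x in samples if x < threshold) / len(samples)
    return -threshold / math.log(1.0 - frac)

print(median_estimate(data), threshold_and_count_estimate(data, 1.5))
```

The count-based estimator needs no sorting and only one pass over the data, which is the hardware advantage the abstract describes; its dynamic range is limited by the fixed threshold, hence the use of several such devices in parallel.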
Bayesian models based on test statistics for multiple hypothesis testing problems.
Ji, Yuan; Lu, Yiling; Mills, Gordon B
2008-04-01
We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check whether our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool. Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. Finally, we apply the proposed methodology to an siRNA screening and a gene expression experiment.
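One common Bayesian FDR rule (the abstract's specific model for the test statistics is not reproduced here) rejects the hypotheses with the smallest posterior null probabilities for as long as their running mean stays below the target rate:

```python
# Posterior-probability-based FDR control sketch: sort the posterior
# probabilities of the null, reject in increasing order, and stop when
# the running average (the estimated Bayesian FDR of the rejection
# set) exceeds the target q. Probabilities below are made up.

def bayesian_fdr_reject(post_null_probs, q=0.10):
    """Return indices of rejected hypotheses at Bayesian FDR level q."""
    order = sorted(range(len(post_null_probs)),
                   key=lambda i: post_null_probs[i])
    rejected, running = [], 0.0
    for k, i in enumerate(order, start=1):
        running += post_null_probs[i]
        if running / k <= q:       # mean posterior null prob of the set
            rejected.append(i)
        else:
            break
    return rejected

probs = [0.01, 0.40, 0.03, 0.95, 0.08]
print(sorted(bayesian_fdr_reject(probs, q=0.10)))  # → [0, 2, 4]
```

The rejection set's average posterior null probability is a direct estimate of the proportion of false discoveries it contains.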
Higgs, W
1996-12-01
This paper examines the relationship between intellectual debate, technologies for analysing information, and the production of statistics in the General Register Office (GRO) in London in the early twentieth century. It argues that controversy between eugenicists and public health officials respecting the cause and effect of class-specific variations in fertility led to the introduction of questions in the 1911 census on marital fertility. The increasing complexity of the census necessitated a shift from manual to mechanised forms of data processing within the GRO. The subsequent increase in processing power allowed the GRO to make important changes to the medical and demographic statistics it published in the annual Reports of the Registrar General. These included substituting administrative sanitary districts for registration districts as units of analysis, consistently transferring deaths in institutions back to place of residence, and abstracting deaths according to the International List of Causes of Death.
Dynamic heterogeneity and non-Gaussian statistics for acetylcholine receptors on live cell membrane
NASA Astrophysics Data System (ADS)
He, W.; Song, H.; Su, Y.; Geng, L.; Ackerson, B. J.; Peng, H. B.; Tong, P.
2016-05-01
The Brownian motion of molecules at thermal equilibrium usually has a finite correlation time and will eventually be randomized after a long delay time, so that their displacement follows Gaussian statistics. This is true even when the molecules have experienced a complex environment with a finite correlation time. Here, we report that the lateral motion of the acetylcholine receptors on live muscle cell membranes does not follow the Gaussian statistics of normal Brownian diffusion. From a careful analysis of a large volume of the protein trajectories obtained over a wide range of sampling rates and long durations, we find that the normalized histogram of the protein displacements shows an exponential tail, which is robust and universal for cells under different conditions. The experiment indicates that the observed non-Gaussian statistics and dynamic heterogeneity are inherently linked to the slow, active remodelling of the underlying cortical actin network.
Site Suitability Analysis for Beekeeping via Analytical Hierarchy Process, Konya Example
NASA Astrophysics Data System (ADS)
Sarı, F.; Ceylan, D. A.
2017-11-01
Over the past decade, the importance of beekeeping activities has been emphasized in the fields of biodiversity, ecosystems, agriculture and human health. Efficient management and correct decisions about beekeeping activities thus seem essential to maintain and improve productivity and efficiency. Given this importance, and considering the economic contributions to rural areas, the need for a suitability analysis concept has become apparent. At this point, the integration of Multi Criteria Decision Analysis (MCDA) and Geographical Information Systems (GIS) provides efficient solutions to the complex structure of the decision-making process for beekeeping activities. In this study, site suitability analysis via the Analytical Hierarchy Process (AHP) was carried out for Konya city in Turkey. Slope, elevation, aspect, distance to water resources, roads and settlements, precipitation and flora criteria are included to determine suitability. The requirements, expectations and limitations of beekeeping activities were specified with the participation of experts and stakeholders. The final suitability map was validated with 117 existing beekeeping locations and Turkish Statistical Institute 2016 beekeeping statistics for Konya province.
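For readers unfamiliar with AHP, criterion weights are derived from a pairwise comparison matrix; a minimal sketch using the row geometric-mean approximation follows. The 3×3 matrix is hypothetical and does not reproduce the study's expert judgments or its full criterion set.

```python
# AHP weight derivation via the row geometric-mean approximation to
# the principal eigenvector. The comparison matrix is a made-up
# 3-criterion example (e.g. flora vs. water distance vs. slope), not
# the study's actual judgments.
import math

def ahp_weights(matrix):
    """Normalize the geometric mean of each row of a reciprocal
    pairwise comparison matrix into priority weights."""
    gm = [math.prod(row) ** (1.0 / len(row)) for row in matrix]
    total = sum(gm)
    return [g / total for g in gm]

m = [[1,   3,   5],    # criterion 1 judged 3x criterion 2, 5x criterion 3
     [1/3, 1,   2],
     [1/5, 1/2, 1]]
print([round(w, 2) for w in ahp_weights(m)])  # → [0.65, 0.23, 0.12]
```

In a GIS workflow these weights multiply the reclassified criterion rasters to produce the final suitability surface.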
Willems, Sander; Fraiture, Marie-Alice; Deforce, Dieter; De Keersmaecker, Sigrid C J; De Loose, Marc; Ruttink, Tom; Herman, Philippe; Van Nieuwerburgh, Filip; Roosens, Nancy
2016-02-01
Because the number and diversity of genetically modified (GM) crops has significantly increased, their analysis based on real-time PCR (qPCR) methods is becoming increasingly complex and laborious. While several pioneers have already investigated Next Generation Sequencing (NGS) as an alternative to qPCR, its practical use has not been assessed for routine analysis. In this study a statistical framework was developed to predict the number of NGS reads needed to detect transgene sequences, to prove their integration into the host genome and to identify the specific transgene event in a sample with known composition. This framework was validated by applying it to experimental data from food matrices composed of pure GM rice, processed GM rice (noodles) or a 10% GM/non-GM rice mixture, revealing some influential factors. Finally, the feasibility of NGS for routine analysis of GM crops was investigated by applying the framework to samples commonly encountered in routine analysis of GM crops. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
A Survey of Logic Formalisms to Support Mishap Analysis
NASA Technical Reports Server (NTRS)
Johnson, Chris; Holloway, C. M.
2003-01-01
Mishap investigations provide important information about adverse events and near miss incidents. They are intended to help avoid any recurrence of previous failures. Over time, they can also yield statistical information about incident frequencies that helps to detect patterns of failure and can validate risk assessments. However, the increasing complexity of many safety critical systems is posing new challenges for mishap analysis. Similarly, the recognition that many failures have complex, systemic causes has helped to widen the scope of many mishap investigations. These two factors have combined to pose new challenges for the analysis of adverse events. A new generation of formal and semi-formal techniques has been proposed to help investigators address these problems. We introduce the term mishap logics to collectively describe these notations that might be applied to support the analysis of mishaps. The proponents of these notations have argued that they can be used to formally prove that certain events created the necessary and sufficient causes for a mishap to occur. These proofs can be used to reduce the bias that is often perceived to affect the interpretation of adverse events. Others have argued that one cannot use logic formalisms to prove causes in the same way that one might prove propositions or theorems. Such mechanisms cannot accurately capture the wealth of inductive, deductive and statistical forms of inference that investigators must use in their analysis of adverse events. This paper provides an overview of these mishap logics. It also identifies several additional classes of logic that might also be used to support mishap analysis.
Karain, Wael I
2017-11-28
Proteins undergo conformational transitions over different time scales. These transitions are closely intertwined with the protein's function. Numerous standard techniques such as principal component analysis are used to detect these transitions in molecular dynamics simulations. In this work, we add a new method that has the ability to detect transitions in dynamics based on the recurrences in the dynamical system. It combines bootstrapping and recurrence quantification analysis. We start from the assumption that a protein has a "baseline" recurrence structure over a given period of time. Any statistically significant deviation from this recurrence structure, as inferred from complexity measures provided by recurrence quantification analysis, is considered a transition in the dynamics of the protein. We apply this technique to a 132 ns long molecular dynamics simulation of the β-Lactamase Inhibitory Protein BLIP. We are able to detect conformational transitions in the nanosecond range in the recurrence dynamics of the BLIP protein during the simulation. The results compare favorably to those extracted using the principal component analysis technique. The recurrence quantification analysis based bootstrap technique is able to detect transitions between different dynamics states for a protein over different time scales. It is not limited to linear dynamics regimes, and can be generalized to any time scale. It also has the potential to be used to cluster frames in molecular dynamics trajectories according to the nature of their recurrence dynamics. One shortcoming of this method is the need to have large enough time windows to ensure good statistical quality for the recurrence complexity measures needed to detect the transitions.
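As a minimal illustration of a recurrence quantification measure of the kind this abstract relies on, the recurrence rate is the density of a thresholded pairwise distance matrix; the bootstrap deviation test itself is not reproduced.

```python
# Recurrence rate sketch for a scalar time series: the fraction of
# time-point pairs (i, j) whose values lie within a distance eps of
# each other. The periodic test series is synthetic.

def recurrence_rate(series, eps):
    """Density of the recurrence matrix R[i][j] = |x_i - x_j| <= eps."""
    n = len(series)
    hits = sum(1 for i in range(n) for j in range(n)
               if abs(series[i] - series[j]) <= eps)
    return hits / (n * n)

periodic = [i % 4 for i in range(40)]   # period-4 signal, 10 of each value
print(recurrence_rate(periodic, eps=0.5))  # → 0.25
```

A perfectly periodic signal with four distinct values recurs with exactly one quarter of all pairs; deviations of such measures from a baseline window are what the bootstrap test flags as dynamical transitions.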
Voltage-Gated Potassium Channel Antibodies in Slow-Progression Motor Neuron Disease.
Godani, Massimiliano; Zoccarato, Marco; Beronio, Alessandro; Zuliani, Luigi; Benedetti, Luana; Giometto, Bruno; Del Sette, Massimo; Raggio, Elisa; Baldi, Roberta; Vincent, Angela
2017-01-01
The spectrum of autoimmune neurological diseases associated with voltage-gated potassium channel (VGKC)-complex antibodies (Abs) ranges from peripheral nerve disorders to limbic encephalitis. Recently, low titers of VGKC-complex Abs have also been reported in neurodegenerative disorders, but their clinical relevance is unknown. The aim of the study was to explore the prevalence of VGKC-complex Abs in slow-progression motor neuron disease (MND). We compared 11 patients affected by slow-progression MND with 9 patients presenting typical progression illness. Sera were tested for VGKC-complex Abs by radioimmunoassay. The distribution of VGKC-complex Abs was analyzed with the Mann-Whitney U test. The statistical analysis showed a significant difference between the mean values in the study and control groups. A case with long-survival MND harboring VGKC-complex Abs and treated with intravenous immunoglobulins is described. Although VGKC-complex Abs are not likely to be pathogenic, these results could reflect the coexistence of an immunological activation in patients with slow disease progression. © 2016 S. Karger AG, Basel.
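The group comparison above uses the Mann-Whitney U test; a minimal computation of the U statistic (ties counted as half, no normal approximation or P value) looks like this. The example values are made up, not the study's antibody titers.

```python
# Mann-Whitney U statistic sketch: count the pairs from the two
# samples in which the first sample's value exceeds the second's,
# counting ties as one half. Illustrative data only.

def mann_whitney_u(a, b):
    """U statistic for sample a versus sample b."""
    u = sum(1 for x in a for y in b if x > y)
    u += 0.5 * sum(1 for x in a for y in b if x == y)
    return u

print(mann_whitney_u([5, 7, 8], [1, 2, 6]))  # → 8.0
```

For small groups like the 11-versus-9 comparison in the abstract, U is referred to exact tables rather than the normal approximation.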
NASA Astrophysics Data System (ADS)
Zhang, Wei; Wang, Jun
2018-05-01
A novel nonlinear stochastic interacting price dynamics is proposed and investigated using bond percolation on a Sierpinski gasket fractal-like lattice, with the aim of providing a new approach to reproducing and studying the complex dynamics of real security markets. Fractal-like lattices correspond to finite graphs with vertices and edges, which are similar to fractals, and the Sierpinski gasket is a well-known example of a fractal. Fractional ordinal array entropy and fractional ordinal array complexity are introduced to analyze the complexity behaviors of financial signals. To better understand the fluctuation characteristics of the stochastic price evolution, a complexity analysis of random logarithmic returns and volatility is performed, including power-law distribution, fractional sample entropy and fractional ordinal array complexity. To further verify the rationality and validity of the developed stochastic price evolution, actual security market datasets are also studied with the same statistical methods for comparison. The empirical results show that this stochastic price dynamics can reconstruct the complexity behaviors of actual security markets to some extent.
Wagner, Jan; Schoene-Bake, Jan-Christoph; Witt, Juri-Alexander; Helmstaedter, Christoph; Malter, Michael P; Stoecker, Winfried; Probst, Christian; Weber, Bernd; Elger, Christian E
2016-03-01
Autoantibodies against glutamic acid decarboxylase (GAD) and the voltage-gated potassium channel (VGKC) complex are associated with distinct subtypes of limbic encephalitis regarding clinical presentation, response to therapy, and outcome. The aim of this study was to investigate white matter changes in these two limbic encephalitis subtypes by means of diffusion tensor imaging (DTI). Diffusion data were obtained in 14 patients with GAD antibodies and 16 patients with VGKC-complex antibodies and compared with age- and gender-matched control groups. Voxelwise statistical analysis was carried out using tract-based spatial statistics. The results were furthermore compared with those of 15 patients with unilateral histologically confirmed hippocampal sclerosis and correlated with verbal and figural memory performance. We found widespread changes of fractional anisotropy and all diffusivity parameters in GAD-associated limbic encephalitis, whereas no changes were found in VGKC-complex-associated limbic encephalitis. The changes observed in the GAD group were even more extensive when compared against those of the hippocampal sclerosis group, although the disease duration was markedly shorter in patients with GAD antibodies. Correlation analysis revealed areas with a trend toward a negative correlation of diffusivity parameters with figural memory performance located mainly in the right temporal lobe in the GAD group as well. The present study provides further evidence that, depending on the associated antibody, limbic encephalitis features clearly distinct imaging characteristics by showing widespread white matter changes in GAD-associated limbic encephalitis and preserved white matter integrity in VGKC-complex-associated limbic encephalitis. 
Furthermore, our results contribute to a better understanding of the specific pathophysiologic properties in these two subforms of limbic encephalitis by revealing that patients with GAD antibodies show widespread affections of white matter across various regions of the brain. In contrast to this, the inflammatory process seems to be more localized in VGKC-complex-associated limbic encephalitis, primarily affecting mesiotemporal gray matter. Wiley Periodicals, Inc. © 2016 International League Against Epilepsy.
Analyzing Immunoglobulin Repertoires
Chaudhary, Neha; Wesemann, Duane R.
2018-01-01
Somatic assembly of T cell receptor and B cell receptor (BCR) genes produces a vast diversity of lymphocyte antigen recognition capacity. The advent of efficient high-throughput sequencing of lymphocyte antigen receptor genes has recently generated unprecedented opportunities for exploration of adaptive immune responses. With these opportunities have come significant challenges in understanding the analysis techniques that most accurately reflect underlying biological phenomena. In this regard, sample preparation and sequence analysis techniques, which have largely been borrowed and adapted from other fields, continue to evolve. Here, we review current methods and challenges of library preparation, sequencing and statistical analysis of lymphocyte receptor repertoire studies. We discuss the general steps in the process of immune repertoire generation including sample preparation, platforms available for sequencing, processing of sequencing data, measurable features of the immune repertoire, and the statistical tools that can be used for analysis and interpretation of the data. Because BCR analysis harbors additional complexities, such as immunoglobulin (Ig) (i.e., antibody) gene somatic hypermutation and class switch recombination, the emphasis of this review is on Ig/BCR sequence analysis. PMID:29593723
Study/Experimental/Research Design: Much More Than Statistics
Knight, Kenneth L.
2010-01-01
Context: The purpose of study, experimental, or research design in scientific manuscripts has changed significantly over the years. It has evolved from an explanation of the design of the experiment (ie, data gathering or acquisition) to an explanation of the statistical analysis. This practice makes “Methods” sections hard to read and understand. Objective: To clarify the difference between study design and statistical analysis, to show the advantages of a properly written study design on article comprehension, and to encourage authors to correctly describe study designs. Description: The role of study design is explored from the introduction of the concept by Fisher through modern-day scientists and the AMA Manual of Style. At one time, when experiments were simpler, the study design and statistical design were identical or very similar. With the complex research that is common today, which often includes manipulating variables to create new variables and the multiple (and different) analyses of a single data set, data collection is very different from statistical design. Thus, both a study design and a statistical design are necessary. Advantages: Scientific manuscripts will be much easier to read and comprehend. A proper experimental design serves as a road map to the study methods, helping readers to understand more clearly how the data were obtained and, therefore, assisting them in properly analyzing the results. PMID:20064054
Ma, Junshui; Wang, Shubing; Raubertas, Richard; Svetnik, Vladimir
2010-07-15
With the increasing popularity of using electroencephalography (EEG) to reveal the treatment effect in drug development clinical trials, the vast volume and complex nature of EEG data compose an intriguing, but challenging, topic. In this paper the statistical analysis methods recommended by the EEG community, along with methods frequently used in the published literature, are first reviewed. A straightforward adjustment of the existing methods to handle multichannel EEG data is then introduced. In addition, based on the spatial smoothness property of EEG data, a new category of statistical methods is proposed. The new methods use a linear combination of low-degree spherical harmonic (SPHARM) basis functions to represent a spatially smoothed version of the EEG data on the scalp, which is close to a sphere in shape. In total, seven statistical methods, including both the existing and the newly proposed methods, are applied to two clinical datasets to compare their power to detect a drug effect. Contrary to the EEG community's recommendation, our results suggest that (1) the nonparametric method does not outperform its parametric counterpart; and (2) including baseline data in the analysis does not always improve the statistical power. In addition, our results recommend that (3) simple paired statistical tests should be avoided due to their poor power; and (4) the proposed spatially smoothed methods perform better than their unsmoothed versions. Copyright 2010 Elsevier B.V. All rights reserved.
Sparse approximation of currents for statistics on curves and surfaces.
Durrleman, Stanley; Pennec, Xavier; Trouvé, Alain; Ayache, Nicholas
2008-01-01
Computing, processing, and visualizing statistics on shapes such as curves or surfaces is a real challenge with many applications, ranging from medical image analysis to computational geometry. Modelling such geometrical primitives with currents avoids the feature-based approach as well as point-correspondence methods. This framework has been proved to be powerful for registering brain surfaces or measuring geometrical invariants. However, while state-of-the-art methods perform pairwise registrations efficiently, new numerical schemes are required to process groupwise statistics due to the increasing complexity as the size of the database grows. Statistics such as the mean and principal modes of a set of shapes often have a heavy and highly redundant representation. We therefore propose to find an adapted basis on which the mean and principal modes have a sparse decomposition. Besides the computational improvement, this sparse representation offers a way to visualize and interpret statistics on currents. Experiments show the relevance of the approach on 34 sets of 70 sulcal lines and on 50 sets of 10 meshes of deep brain structures.
The Problem of Auto-Correlation in Parasitology
Pollitt, Laura C.; Reece, Sarah E.; Mideo, Nicole; Nussey, Daniel H.; Colegrave, Nick
2012-01-01
Explaining the contribution of host and pathogen factors in driving infection dynamics is a major ambition in parasitology. There is increasing recognition that analyses based on single summary measures of an infection (e.g., peak parasitaemia) do not adequately capture infection dynamics and so, the appropriate use of statistical techniques to analyse dynamics is necessary to understand infections and, ultimately, control parasites. However, the complexities of within-host environments mean that tracking and analysing pathogen dynamics within infections and among hosts poses considerable statistical challenges. Simple statistical models make assumptions that will rarely be satisfied in data collected on host and parasite parameters. In particular, model residuals (unexplained variance in the data) should not be correlated in time or space. Here we demonstrate how failure to account for such correlations can result in incorrect biological inference from statistical analysis. We then show how mixed effects models can be used as a powerful tool to analyse such repeated measures data in the hope that this will encourage better statistical practices in parasitology. PMID:22511865
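A quick numerical check in the spirit of this abstract: the lag-1 autocorrelation of model residuals should be near zero under the independence assumption of simple statistical models, while smooth drift in residuals (as with repeated measures on one host) produces strong positive autocorrelation. Both synthetic series below are illustrative.

```python
# Lag-1 autocorrelation of residuals as a diagnostic for violated
# independence assumptions. Synthetic residual series for illustration.

def lag1_autocorr(resid):
    """Sample lag-1 autocorrelation coefficient."""
    n = len(resid)
    m = sum(resid) / n
    c0 = sum((r - m) ** 2 for r in resid) / n
    c1 = sum((resid[i] - m) * (resid[i + 1] - m) for i in range(n - 1)) / n
    return c1 / c0

alternating = [(-1) ** i for i in range(100)]   # flip-flop: negative lag-1 corr
drifting = [i - 49.5 for i in range(100)]       # smooth trend: strong positive corr
print(lag1_autocorr(alternating) < 0, lag1_autocorr(drifting) > 0.9)
```

When such correlation is present, mixed effects models with host-level random effects, as the authors advocate, are the appropriate remedy rather than simple models that assume independent residuals.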
NASA Astrophysics Data System (ADS)
Bookstein, Fred L.
1995-08-01
Recent advances in computational geometry have greatly extended the range of neuroanatomical questions that can be approached by rigorous quantitative methods. One of the major current challenges in this area is to describe the variability of human cortical surface form and its implications for individual differences in neurophysiological functioning. Existing techniques for representation of stochastically invaginated surfaces do not conduce to the necessary parametric statistical summaries. In this paper, following a hint from David Van Essen and Heather Drury, I sketch a statistical method customized for the constraints of this complex data type. Cortical surface form is represented by its Riemannian metric tensor and averaged according to parameters of a smooth averaged surface. Sulci are represented by integral trajectories of the smaller principal strains of this metric, and their statistics follow the statistics of that relative metric. The diagrams visualizing this tensor analysis look like alligator leather but summarize all aspects of cortical surface form in between the principal sulci, the reliable ones; no flattening is required.
Complex Adaptive System Models and the Genetic Analysis of Plasma HDL-Cholesterol Concentration
Rea, Thomas J.; Brown, Christine M.; Sing, Charles F.
2006-01-01
Despite remarkable advances in diagnosis and therapy, ischemic heart disease (IHD) remains a leading cause of morbidity and mortality in industrialized countries. Recent efforts to estimate the influence of genetic variation on IHD risk have focused on predicting individual plasma high-density lipoprotein cholesterol (HDL-C) concentration. Plasma HDL-C concentration (mg/dl), a quantitative risk factor for IHD, has a complex multifactorial etiology that involves the actions of many genes. Single gene variations may be necessary but are not individually sufficient to predict a statistically significant increase in risk of disease. The complexity of phenotype-genotype-environment relationships involved in determining plasma HDL-C concentration has challenged commonly held assumptions about genetic causation and has led to the question of which combination of variations, in which subset of genes, in which environmental strata of a particular population significantly improves our ability to predict high or low risk phenotypes. We document the limitations of inferences from genetic research based on commonly accepted biological models, consider how evidence for real-world dynamical interactions between HDL-C determinants challenges the simplifying assumptions implicit in traditional linear statistical genetic models, and conclude by considering research options for evaluating the utility of genetic information in predicting traits with complex etiologies. PMID:17146134
The Labor Market in the Regions of Belarus: An Analysis of Employment Tendencies
ERIC Educational Resources Information Center
Sokolova, G. N.
2013-01-01
In Belarus, the ways in which statistics are compiled, the complex rules for registering as unemployed, and the segmentation of the labor market and job-seeking activities, all combine to hide the actual levels of employment and unemployment. This in turn makes it difficult to develop appropriate and effective labor policies, and to have support…
NASA Astrophysics Data System (ADS)
Ye, M.; Pacheco Castro, R. B.; Pacheco Avila, J.; Cabrera Sansores, A.
2014-12-01
The karstic aquifer of Yucatan is a vulnerable and complex system. The first fifteen meters of this aquifer have been polluted; protecting this resource is therefore important because it is the only source of potable water for the entire state. Through the assessment of groundwater quality we can gain knowledge about the main processes governing water chemistry, as well as spatial patterns that are important for establishing protection zones. In this work multivariate statistical techniques are used to assess the groundwater quality of the supply wells (30 to 40 meters deep) in the hydrogeologic region of the Ring of Cenotes, located in Yucatan, Mexico. Cluster analysis and principal component analysis are applied to groundwater chemistry data from the study area. Results of principal component analysis show that the main sources of variation in the data are due to seawater intrusion, the interaction of the water with the carbonate rocks of the system, and some pollution processes. The cluster analysis shows that the data can be divided into four clusters. The spatial distribution of the clusters seems to be random, but is consistent with seawater intrusion and pollution with nitrates. The overall results show that multivariate statistical analysis can be successfully applied in the groundwater quality assessment of this karstic aquifer.
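A minimal PCA sketch on two standardized, synthetic hydrochemical variables (the study's data are not reproduced): for a 2×2 correlation matrix the leading eigenvalue and eigenvector have closed forms, so strongly correlated ions load on a single component, as with the seawater intrusion signal described above.

```python
# First principal component of two standardized variables. For the
# correlation matrix [[1, c], [c, 1]] the leading eigenvalue is
# 1 + |c| with eigenvector (1, sign(c)) / sqrt(2). Data are synthetic.
import math

def standardize(xs):
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return [(x - m) / sd for x in xs]

def first_pc_2d(x, y):
    """Leading eigenvalue and eigenvector of the 2x2 correlation matrix."""
    x, y = standardize(x), standardize(y)
    c = sum(a * b for a, b in zip(x, y)) / len(x)   # correlation coefficient
    lam = 1 + abs(c)
    vec = (1 / math.sqrt(2), math.copysign(1 / math.sqrt(2), c))
    return lam, vec

chloride = [10, 20, 30, 40, 50]   # made-up concentrations
sodium = [12, 18, 33, 39, 52]
lam, vec = first_pc_2d(chloride, sodium)
print(lam > 1.9)  # near-collinear ions: one component carries ~all variance
```

In the full study the same idea extends to many ions at once, with clusters of wells then formed on the component scores.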
Mapping of epistatic quantitative trait loci in four-way crosses.
He, Xiao-Hong; Qin, Hongde; Hu, Zhongli; Zhang, Tianzhen; Zhang, Yuan-Ming
2011-01-01
Four-way crosses (4WC) involving four different inbred lines often appear in plant and animal commercial breeding programs. Direct mapping of quantitative trait loci (QTL) in these commercial populations is both economical and practical. However, the existing statistical methods for mapping QTL in a 4WC population are built on the single-QTL genetic model. This simple genetic model fails to take into account QTL interactions, which play an important role in the genetic architecture of complex traits. In this paper, therefore, we attempted to develop a statistical method to detect epistatic QTL in 4WC populations. Conditional probabilities of QTL genotypes, computed by the multi-point single locus method, were used to sample the genotypes of all putative QTL in the entire genome. The sampled genotypes were used to construct the design matrix for QTL effects. All QTL effects, including main and epistatic effects, were simultaneously estimated by the penalized maximum likelihood method. The proposed method was confirmed by a series of Monte Carlo simulation studies and real data analysis of cotton. The new method will provide novel tools for the genetic dissection of complex traits, construction of QTL networks, and analysis of heterosis.
The fossil record of evolution: Data on diversification and extinction
NASA Technical Reports Server (NTRS)
Sepkoski, J. John, Jr.
1990-01-01
The two principal efforts include: (1) a compilation of a synoptic, mesoscale data base on times of origination and extinction of animal genera in the oceans over the last 600 million years of geologic time; and (2) an analysis of statistical patterns in these data that relate to the diversification of complex life and to the occurrence of mass extinctions, especially those that might be associated with extraterrestrial phenomena. The data base is unique in its taxonomic scope and detail and in its temporal resolution. It is a valuable resource for investigating evolutionary expansions and extinctions of complex life.
Verzotto, Davide; M Teo, Audrey S; Hillmer, Axel M; Nagarajan, Niranjan
2016-01-01
Resolution of complex repeat structures and rearrangements in the assembly and analysis of large eukaryotic genomes is often aided by a combination of high-throughput sequencing and genome-mapping technologies (for example, optical restriction mapping). In particular, mapping technologies can generate sparse maps of large DNA fragments (150 kilo base pairs (kbp) to 2 Mbp) and thus provide a unique source of information for disambiguating complex rearrangements in cancer genomes. Despite their utility, combining high-throughput sequencing and mapping technologies has been challenging because of the lack of efficient and sensitive map-alignment algorithms for robustly aligning error-prone maps to sequences. We introduce a novel seed-and-extend glocal (short for global-local) alignment method, OPTIMA (and a sliding-window extension for overlap alignment, OPTIMA-Overlap), which is the first to create indexes for continuous-valued mapping data while accounting for mapping errors. We also present a novel statistical model, agnostic with respect to technology-dependent error rates, for conservatively evaluating the significance of alignments without relying on expensive permutation-based tests. We show that OPTIMA and OPTIMA-Overlap outperform other state-of-the-art approaches (1.6-2 times more sensitive) and are more efficient (170-200 %) and precise in their alignments (nearly 99 % precision). These advantages are independent of the quality of the data, suggesting that our indexing approach and statistical evaluation are robust, provide improved sensitivity and guarantee high precision.
Adaptation in Coding by Large Populations of Neurons in the Retina
NASA Astrophysics Data System (ADS)
Ioffe, Mark L.
A comprehensive theory of neural computation requires an understanding of the statistical properties of the neural population code. The focus of this work is the experimental study and theoretical analysis of the statistical properties of neural activity in the tiger salamander retina. This is an accessible yet complex system, for which we control the visual input and record from a substantial portion--greater than a half--of the ganglion cell population generating the spiking output. Our experiments probe adaptation of the retina to visual statistics: a central feature of sensory systems which have to adjust their limited dynamic range to a far larger space of possible inputs. In Chapter 1 we place our work in context with a brief overview of the relevant background. In Chapter 2 we describe the experimental methodology of recording from 100+ ganglion cells in the tiger salamander retina. In Chapter 3 we first present the measurements of adaptation of individual cells to changes in stimulation statistics and then investigate whether pairwise correlations in fluctuations of ganglion cell activity change across different stimulation conditions. We then transition to a study of the population-level probability distribution of the retinal response captured with maximum-entropy models. Convergence of the model inference is presented in Chapter 4. In Chapter 5 we first test the empirical presence of a phase transition in such models fitting the retinal response to different experimental conditions, and then proceed to develop other characterizations which are sensitive to complexity in the interaction matrix. This includes an analysis of the dynamics of sampling at finite temperature, which demonstrates a range of subtle attractor-like properties in the energy landscape. These are largely conserved when ambient illumination is varied 1000-fold, a result not necessarily apparent from the measured low-order statistics of the distribution. 
Our results form a consistent picture which is discussed at the end of Chapter 5. We conclude with a few future directions related to this thesis.
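The maximum-entropy models and finite-temperature sampling dynamics mentioned above can be illustrated with a minimal Metropolis sampler for a pairwise (Ising-like) model. The five units, uniform biases, and uniform couplings are illustrative assumptions, not fitted retinal parameters:

```python
import math
import random

# Minimal Metropolis sampler for a pairwise maximum-entropy (Ising-like) model
# over a few binary units s_i in {-1, +1}:
#   P(s) ~ exp( sum_i h_i s_i + sum_{i<j} J_ij s_i s_j )
rng = random.Random(1)
n = 5
h = [0.2] * n                                                  # hypothetical biases
J = {(i, j): 0.3 for i in range(n) for j in range(i + 1, n)}   # hypothetical couplings

def energy(s):
    e = -sum(h[i] * s[i] for i in range(n))
    e -= sum(Jij * s[i] * s[j] for (i, j), Jij in J.items())
    return e

def metropolis(steps=20000):
    """Single-spin-flip Metropolis dynamics; returns the time-averaged mean unit state."""
    s = [rng.choice((-1, 1)) for _ in range(n)]
    mean_state = 0.0
    for _ in range(steps):
        i = rng.randrange(n)
        s2 = list(s)
        s2[i] = -s2[i]
        d_e = energy(s2) - energy(s)
        if d_e <= 0 or rng.random() < math.exp(-d_e):
            s = s2
        mean_state += sum(s) / n
    return mean_state / steps

m = metropolis()
```

With positive biases and couplings the sampler spends most of its time near the all-(+1) configuration; in the attractor-landscape analysis it is exactly this kind of dwell behavior, under couplings fitted to data, that is being characterized.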
TASI: A software tool for spatial-temporal quantification of tumor spheroid dynamics.
Hou, Yue; Konen, Jessica; Brat, Daniel J; Marcus, Adam I; Cooper, Lee A D
2018-05-08
Spheroid cultures derived from explanted cancer specimens are an increasingly utilized resource for studying complex biological processes like tumor cell invasion and metastasis, representing an important bridge between the simplicity and practicality of 2-dimensional monolayer cultures and the complexity and realism of in vivo animal models. Temporal imaging of spheroids can capture the dynamics of cell behaviors and microenvironments, and when combined with quantitative image analysis methods, enables deep interrogation of biological mechanisms. This paper presents a comprehensive open-source software framework for Temporal Analysis of Spheroid Imaging (TASI) that allows investigators to objectively characterize spheroid growth and invasion dynamics. TASI performs spatiotemporal segmentation of spheroid cultures, extraction of features describing spheroid morpho-phenotypes, mathematical modeling of spheroid dynamics, and statistical comparisons of experimental conditions. We demonstrate the utility of this tool in an analysis of non-small cell lung cancer spheroids that exhibit variability in metastatic and proliferative behaviors.
NASA Astrophysics Data System (ADS)
Kunert, Anna Theresa; Scheel, Jan Frederik; Helleis, Frank; Klimach, Thomas; Pöschl, Ulrich; Fröhlich-Nowoisky, Janine
2016-04-01
Freezing of water at temperatures above the homogeneous freezing limit is catalyzed by ice nucleation active (INA) particles called ice nuclei (IN), which can be of various inorganic or biological origins. The freezing temperatures reach up to -1 °C for some biological samples and depend on the chemical composition of the IN. The standard method to analyze IN in solution is the droplet freezing assay (DFA) established by Gabor Vali in 1970. Several modifications and improvements have been made within the last decades, but they are still limited by small droplet numbers, large droplet volumes, or inadequate separation of the single droplets, resulting in mutual interference and therefore improper measurements. The probability that miscellaneous IN are concentrated together in one droplet increases with the volume of the droplet, which can be described by the Poisson distribution. At a given concentration, partitioning a droplet into several smaller droplets leads to finely dispersed IN, resulting in better statistics and therefore in a better resolution of the nucleation spectrum. We designed a new customized high-performance droplet freezing assay (HP-DFA), which represents an upgrade of previously existing DFAs in terms of temperature range and statistics. The necessity of observing freezing events at temperatures below the homogeneous freezing limit, due to freezing point depression, requires high-performance thermostats combined with optimal insulation. Furthermore, we developed a cooling setup that allows both large and very small temperature changes within a short period of time. In addition, the new DFA provides the analysis of more than 750 droplets per run with a small droplet volume of 5 μL. This enables fast and more precise analysis of biological samples with complex IN composition, as well as better statistics for every sample at the same time.
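The Poisson argument above can be written down directly: the chance that a droplet holds at least one nucleus is 1 - exp(-cV), and the cumulative nucleus concentration follows from the frozen fraction in Vali's manner. The concentration value is an illustrative assumption; only the 5 μL droplet volume comes from the text:

```python
import math

def frac_with_nucleus(conc_per_ul, droplet_ul):
    """Poisson probability that a droplet of volume V holds >= 1 ice nucleus:
    1 - exp(-c*V), for nucleus concentration c."""
    return 1.0 - math.exp(-conc_per_ul * droplet_ul)

def cumulative_in_conc(n_frozen, n_total, droplet_ul):
    """Vali-style cumulative IN concentration per unit volume, inferred from
    the fraction of droplets still unfrozen at a given temperature."""
    f_unfrozen = (n_total - n_frozen) / n_total
    return -math.log(f_unfrozen) / droplet_ul

# Splitting a sample into smaller droplets lowers the chance that several
# different nuclei share one droplet (illustrative concentration of 0.1 IN/uL):
p_large = frac_with_nucleus(0.1, 50.0)  # 50 uL droplets
p_small = frac_with_nucleus(0.1, 5.0)   # 5 uL droplets, as in the HP-DFA
```

The smaller-droplet probability is far below the larger-droplet one, which is exactly why partitioning improves the resolution of the nucleation spectrum.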
How Big of a Problem is Analytic Error in Secondary Analyses of Survey Data?
West, Brady T; Sakshaug, Joseph W; Aurelien, Guy Alain S
2016-01-01
Secondary analyses of survey data collected from large probability samples of persons or establishments further scientific progress in many fields. The complex design features of these samples improve data collection efficiency, but also require analysts to account for these features when conducting analysis. Unfortunately, many secondary analysts from fields outside of statistics, biostatistics, and survey methodology do not have adequate training in this area, and as a result may apply incorrect statistical methods when analyzing these survey data sets. This in turn could lead to the publication of incorrect inferences based on the survey data that effectively negate the resources dedicated to these surveys. In this article, we build on the results of a preliminary meta-analysis of 100 peer-reviewed journal articles presenting analyses of data from a variety of national health surveys, which suggested that analytic errors may be extremely prevalent in these types of investigations. We first perform a meta-analysis of a stratified random sample of 145 additional research products analyzing survey data from the Scientists and Engineers Statistical Data System (SESTAT), which describes features of the U.S. Science and Engineering workforce, and examine trends in the prevalence of analytic error across the decades used to stratify the sample. We once again find that analytic errors appear to be quite prevalent in these studies. Next, we present several example analyses of real SESTAT data, and demonstrate that a failure to perform these analyses correctly can result in substantially biased estimates with standard errors that do not adequately reflect complex sample design features. Collectively, the results of this investigation suggest that reviewers of this type of research need to pay much closer attention to the analytic methods employed by researchers attempting to publish or present secondary analyses of survey data.
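One of the analytic errors discussed above, ignoring the sampling weights of a complex design, can be sketched with a toy oversampled stratum. The values and weights are illustrative assumptions:

```python
# Toy stratified sample: (value, sampling_weight). Stratum B is oversampled,
# so its cases carry small weights; ignoring the weights pulls the estimated
# mean toward the oversampled stratum.
sample = [(10.0, 100), (12.0, 100), (11.0, 100),            # stratum A, few cases, big weights
          (30.0, 10), (31.0, 10), (29.0, 10), (30.0, 10)]   # stratum B, oversampled

unweighted_mean = sum(v for v, _ in sample) / len(sample)
weighted_mean = sum(v * w for v, w in sample) / sum(w for _, w in sample)
```

Here the naive mean is roughly 22 while the design-weighted mean is about 13; standard errors are similarly distorted when clustering and stratification are ignored, which is the bias the article demonstrates with real SESTAT data.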
Kourgialas, Nektarios N; Dokou, Zoi; Karatzas, George P
2015-05-01
The purpose of this study was to create a modeling management tool for the simulation of extreme flow events under current and future climatic conditions. This tool is a combination of different components and can be applied in complex hydrogeological river basins, where frequent flood and drought phenomena occur. The first component is the statistical analysis of the available hydro-meteorological data. Specifically, principal components analysis was performed in order to quantify the importance of the hydro-meteorological parameters that affect the generation of extreme events. The second component is a prediction-forecasting artificial neural network (ANN) model that simulates, accurately and efficiently, river flow on an hourly basis. This model is based on a methodology that attempts to resolve a very difficult problem related to the accurate estimation of extreme flows. For this purpose, the available measurements (5 years of hourly data) were divided into two subsets: one for the dry and one for the wet periods of the hydrological year. This way, two ANNs were created, trained, tested and validated for a complex Mediterranean river basin in Crete, Greece. As part of the second management component, a statistical downscaling tool was used for the creation of meteorological data according to the higher and lower emission climate change scenarios A2 and B1. These data are used as input to the ANN for the forecasting of river flow for the next two decades. The final component is the application of a meteorological index on the measured and forecasted precipitation and flow data, in order to assess the severity and duration of extreme events.
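The dry/wet split of the hydrological record can be sketched as below, with a least-squares line standing in for each season's ANN. The month boundaries and the records are illustrative assumptions, not the basin's data:

```python
# Hypothetical hourly records: (month, rainfall_mm, flow_m3s). Splitting by
# hydrological season lets a separate model be fit to each flow regime.
records = [(1, 12.0, 30.0), (2, 8.0, 22.0), (12, 15.0, 35.0),
           (6, 0.5, 2.0), (7, 0.0, 1.5), (8, 0.2, 1.8)]

WET_MONTHS = {10, 11, 12, 1, 2, 3}  # assumed wet half of the hydrological year

wet = [r for r in records if r[0] in WET_MONTHS]
dry = [r for r in records if r[0] not in WET_MONTHS]

def fit_line(pts):
    """Least-squares (slope, intercept) of flow on rainfall for one regime."""
    n = len(pts)
    mx = sum(p[1] for p in pts) / n
    my = sum(p[2] for p in pts) / n
    sxx = sum((p[1] - mx) ** 2 for p in pts)
    sxy = sum((p[1] - mx) * (p[2] - my) for p in pts)
    slope = sxy / sxx
    return slope, my - slope * mx

wet_model, dry_model = fit_line(wet), fit_line(dry)
```

Fitting each regime separately is the design choice that lets the extreme (wet-period) behavior be learned without being averaged away by the long dry-period baseline.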
Rehfuess, Eva A; Best, Nicky; Briggs, David J; Joffe, Mike
2013-12-06
Effective interventions require evidence on how individual causal pathways jointly determine disease. Based on the concept of systems epidemiology, this paper develops Diagram-based Analysis of Causal Systems (DACS) as an approach to analyze complex systems, and applies it by examining the contributions of proximal and distal determinants of childhood acute lower respiratory infections (ALRI) in sub-Saharan Africa. Diagram-based Analysis of Causal Systems combines the use of causal diagrams with multiple routinely available data sources, using a variety of statistical techniques. In a step-by-step process, the causal diagram evolves from conceptual (based on a priori knowledge and assumptions), through operational (informed by data availability and then subjected to empirical testing), to integrated (synthesizing information from multiple datasets). In our application, we apply different regression techniques to Demographic and Health Survey (DHS) datasets for Benin, Ethiopia, Kenya and Namibia and a pooled World Health Survey (WHS) dataset for sixteen African countries. Explicit strategies are employed to make decisions transparent about the inclusion/omission of arrows, the sign and strength of the relationships, and homogeneity/heterogeneity across settings. Findings about the current state of evidence on the complex web of socio-economic, environmental, behavioral and healthcare factors influencing childhood ALRI, based on DHS and WHS data, are summarized in an integrated causal diagram. Notably, solid fuel use is structured by socio-economic factors and increases the risk of childhood ALRI mortality. Diagram-based Analysis of Causal Systems is a means of organizing the current state of knowledge about a specific area of research, and a framework for integrating statistical analyses across a whole system. 
This partly a priori approach is explicit about causal assumptions guiding the analysis and about researcher judgment, and wrong assumptions can be reversed following empirical testing. This approach is well-suited to dealing with complex systems, in particular where data are scarce.
A System-Level Pathway-Phenotype Association Analysis Using Synthetic Feature Random Forest
Pan, Qinxin; Hu, Ting; Malley, James D.; Andrew, Angeline S.; Karagas, Margaret R.; Moore, Jason H.
2015-01-01
As the cost of genome-wide genotyping decreases, the number of genome-wide association studies (GWAS) has increased considerably. However, the transition from GWAS findings to the underlying biology of various phenotypes remains challenging. As a result, due to its system-level interpretability, pathway analysis has become a popular tool for gaining insights on the underlying biology from high-throughput genetic association data. In pathway analyses, gene sets representing particular biological processes are tested for significant associations with a given phenotype. Most existing pathway analysis approaches rely on single-marker statistics and assume that pathways are independent of each other. As biological systems are driven by complex biomolecular interactions, embracing the complex relationships between single-nucleotide polymorphisms (SNPs) and pathways needs to be addressed. To incorporate the complexity of gene-gene interactions and pathway-pathway relationships, we propose a system-level pathway analysis approach, synthetic feature random forest (SF-RF), which is designed to detect pathway-phenotype associations without making assumptions about the relationships among SNPs or pathways. In our approach, the genotypes of SNPs in a particular pathway are aggregated into a synthetic feature representing that pathway via Random Forest (RF). Multiple synthetic features are analyzed using RF simultaneously and the significance of a synthetic feature indicates the significance of the corresponding pathway. We further complement SF-RF with pathway-based Statistical Epistasis Network (SEN) analysis that evaluates interactions among pathways. By investigating the pathway SEN, we hope to gain additional insights into the genetic mechanisms contributing to the pathway-phenotype association. We apply SF-RF to a population-based genetic study of bladder cancer and further investigate the mechanisms that help explain the pathway-phenotype associations using SEN. 
The bladder cancer associated pathways we found are both consistent with existing biological knowledge and reveal novel and plausible hypotheses for future biological validations. PMID:24535726
NASA Astrophysics Data System (ADS)
Asadollahi, Parisa; Li, Jian
2016-04-01
Understanding the dynamic behavior of complex structures such as long-span bridges requires dense deployment of sensors. Traditional wired sensor systems are generally expensive and time-consuming to install due to cabling. With wireless communication and on-board computation capabilities, wireless smart sensor networks have the advantages of being low cost, easy to deploy and maintain, and therefore facilitate dense instrumentation for structural health monitoring. A long-term monitoring project was recently carried out for a cable-stayed bridge in South Korea with a dense array of 113 smart sensors, forming the world's largest wireless smart sensor network for civil structural monitoring. This paper presents a comprehensive statistical analysis of the modal properties, including natural frequencies, damping ratios and mode shapes, of the monitored cable-stayed bridge. Data analyzed in this paper are composed of structural vibration signals monitored during a 12-month period under ambient excitations. The correlation between environmental temperature and the modal frequencies is also investigated. The results showed the long-term statistical structural behavior of the bridge, which serves as the basis for Bayesian statistical updating of the numerical model.
Statistical Analyses of Scatterplots to Identify Important Factors in Large-Scale Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kleijnen, J.P.C.; Helton, J.C.
1999-04-01
The robustness of procedures for identifying patterns in scatterplots generated in Monte Carlo sensitivity analyses is investigated. These procedures are based on attempts to detect increasingly complex patterns in the scatterplots under consideration and involve the identification of (1) linear relationships with correlation coefficients, (2) monotonic relationships with rank correlation coefficients, (3) trends in central tendency as defined by means, medians and the Kruskal-Wallis statistic, (4) trends in variability as defined by variances and interquartile ranges, and (5) deviations from randomness as defined by the chi-square statistic. The following two topics related to the robustness of these procedures are considered for a sequence of example analyses with a large model for two-phase fluid flow: the presence of Type I and Type II errors, and the stability of results obtained with independent Latin hypercube samples. Observations from the analysis include: (1) Type I errors are unavoidable, (2) Type II errors can occur when inappropriate analysis procedures are used, (3) physical explanations should always be sought for why statistical procedures identify variables as being important, and (4) the identification of important variables tends to be stable for independent Latin hypercube samples.
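Steps (1) and (2) of the detection sequence can be sketched with plain Pearson and rank (Spearman) correlations; a monotonic but nonlinear relation is caught fully by the rank statistic while the linear coefficient understates it. The sample points are illustrative (ties are ignored in the rank routine for brevity):

```python
import math

def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(v):
    """Rank positions 1..n (no tie handling, for brevity)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    for rank, i in enumerate(order, 1):
        r[i] = float(rank)
    return r

def spearman(x, y):
    # Rank correlation: Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_monotone = [1.0, 4.0, 9.0, 16.0, 25.0]  # nonlinear but perfectly monotonic
```

On these points `spearman` is exactly 1 while `pearson` is below 1, which is why the procedure escalates from linear to monotonic pattern tests before turning to the distribution-based statistics.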
auf dem Keller, Ulrich; Prudova, Anna; Gioia, Magda; Butler, Georgina S.; Overall, Christopher M.
2010-01-01
Terminal amine isotopic labeling of substrates (TAILS), our recently introduced platform for quantitative N-terminome analysis, enables wide dynamic range identification of original mature protein N-termini and protease cleavage products. Modifying TAILS by use of isobaric tag for relative and absolute quantification (iTRAQ)-like labels for quantification together with a robust statistical classifier derived from experimental protease cleavage data, we report reliable and statistically valid identification of proteolytic events in complex biological systems in MS2 mode. The statistical classifier is supported by a novel parameter evaluating ion intensity-dependent quantification confidences of single peptide quantifications, the quantification confidence factor (QCF). Furthermore, the isoform assignment score (IAS) is introduced, a new scoring system for the evaluation of single peptide-to-protein assignments based on high confidence protein identifications in the same sample prior to negative selection enrichment of N-terminal peptides. By these approaches, we identified and validated, in addition to known substrates, low abundance novel bioactive MMP-2 targets including the plasminogen receptor S100A10 (p11) and the proinflammatory cytokine proEMAP/p43 that were previously undescribed. PMID:20305283
Marshall, Najja; Timme, Nicholas M.; Bennett, Nicholas; Ripp, Monica; Lautzenhiser, Edward; Beggs, John M.
2016-01-01
Neural systems include interactions that occur across many scales. Two divergent methods for characterizing such interactions have drawn on the physical analysis of critical phenomena and the mathematical study of information. Inferring criticality in neural systems has traditionally rested on fitting power laws to the property distributions of “neural avalanches” (contiguous bursts of activity), but the fractal nature of avalanche shapes has recently emerged as another signature of criticality. On the other hand, neural complexity, an information theoretic measure, has been used to capture the interplay between the functional localization of brain regions and their integration for higher cognitive functions. Unfortunately, treatments of all three methods—power-law fitting, avalanche shape collapse, and neural complexity—have suffered from shortcomings. Empirical data often contain biases that introduce deviations from true power law in the tail and head of the distribution, but deviations in the tail have often been unconsidered; avalanche shape collapse has required manual parameter tuning; and the estimation of neural complexity has relied on small data sets or statistical assumptions for the sake of computational efficiency. In this paper we present technical advancements in the analysis of criticality and complexity in neural systems. We use maximum-likelihood estimation to automatically fit power laws with left and right cutoffs, present the first automated shape collapse algorithm, and describe new techniques to account for large numbers of neural variables and small data sets in the calculation of neural complexity. In order to facilitate future research in criticality and complexity, we have made the software utilized in this analysis freely available online in the MATLAB NCC (Neural Complexity and Criticality) Toolbox. PMID:27445842
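The maximum-likelihood power-law fit mentioned above can be sketched for the continuous case. The NCC Toolbox fits discrete laws with left and right cutoffs; this simplified Hill-type estimator with only a lower cutoff, on synthetic data, is an assumption-laden stand-in:

```python
import math
import random

def fit_power_law_alpha(data, xmin):
    """Maximum-likelihood exponent for a continuous power law p(x) ~ x^(-alpha)
    over x >= xmin:  alpha_hat = 1 + n / sum(ln(x_i / xmin))."""
    tail = [x for x in data if x >= xmin]
    n = len(tail)
    return 1.0 + n / sum(math.log(x / xmin) for x in tail)

# Synthetic power-law data via inverse-transform sampling:
# x = xmin * (1 - u)^(-1/(alpha - 1)) has density ~ x^(-alpha) for x >= xmin.
rng = random.Random(42)
alpha_true, xmin = 2.5, 1.0
data = [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha_true - 1.0))
        for _ in range(20000)]
alpha_hat = fit_power_law_alpha(data, xmin)
```

The estimator recovers the generating exponent to within its standard error of roughly (alpha - 1)/sqrt(n); the biases in empirical avalanche data are what make the additional right-cutoff handling in the toolbox necessary.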
Preictal dynamics of EEG complexity in intracranially recorded epileptic seizure: a case report.
Bob, Petr; Roman, Robert; Svetlak, Miroslav; Kukleta, Miloslav; Chladek, Jan; Brazdil, Milan
2014-11-01
Recent findings suggest that neural complexity, reflecting the number of independent processes in the brain, may characterize typical changes during epileptic seizures and may enable description of preictal dynamics. In light of previously reported findings suggesting specific changes in neural complexity during the preictal period, we used the pointwise correlation dimension (PD2) as a sensitive indicator of nonstationary changes in the complexity of the electroencephalogram (EEG) signal. Although this measure of complexity in epileptic patients was previously reported by Feucht et al (Applications of correlation dimension and pointwise dimension for non-linear topographical analysis of focal onset seizures. Med Biol Comput. 1999;37:208-217), it was not used to study changes in preictal dynamics. With the aim of studying preictal changes in EEG complexity, we examined signals from 11 multicontact depth (intracerebral) EEG electrodes located in 108 cortical and subcortical brain sites, and from 3 scalp EEG electrodes, in a patient with intractable epilepsy who underwent preoperative evaluation before epilepsy surgery. Of those 108 EEG contacts, records from 44 electrode contacts implanted in lesional structures and white matter were excluded from the analysis. The results show that, in comparison to the interictal period (about 8-6 minutes before seizure onset), there was a statistically significant decrease in PD2 complexity in the preictal period about 2 minutes before seizure onset, in all 64 intracranial channels included in the analysis, localized in various brain sites, as well as in the 3 scalp EEG channels. These results suggest that using PD2 in EEG analysis may have significant implications for research on preictal dynamics and the prediction of epileptic seizures.
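A simplified relative of PD2, the global Grassberger-Procaccia correlation dimension, can be sketched as below. PD2 itself is a pointwise variant designed for nonstationary signals; this global estimate on a toy one-dimensional signal is only an illustration of the underlying correlation-sum idea:

```python
import math

def correlation_sum(series, r):
    """Grassberger-Procaccia C(r): fraction of point pairs closer than r."""
    n = len(series)
    close = sum(1 for i in range(n) for j in range(i + 1, n)
                if abs(series[i] - series[j]) < r)
    return close / (n * (n - 1) / 2)

def dimension_estimate(series, r1, r2):
    """Scaling exponent of C(r) ~ r^D estimated between two radii."""
    c1, c2 = correlation_sum(series, r1), correlation_sum(series, r2)
    return math.log(c2 / c1) / math.log(r2 / r1)

# A signal uniformly filling a 1-D interval should give a dimension near 1;
# a drop in the estimated dimension signals reduced complexity, as reported
# for the preictal period.
signal = [i / 500 for i in range(500)]
d_hat = dimension_estimate(signal, 0.01, 0.1)
```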
Managing complex research datasets using electronic tools: A meta-analysis exemplar
Brown, Sharon A.; Martin, Ellen E.; Garcia, Theresa J.; Winter, Mary A.; García, Alexandra A.; Brown, Adama; Cuevas, Heather E.; Sumlin, Lisa L.
2013-01-01
Meta-analyses of broad scope and complexity require investigators to organize many study documents and manage communication among several research staff. Commercially available electronic tools, e.g., EndNote, Adobe Acrobat Pro, Blackboard, Excel, and IBM SPSS Statistics (SPSS), are useful for organizing and tracking the meta-analytic process, as well as enhancing communication among research team members. The purpose of this paper is to describe the electronic processes we designed, using commercially available software, for an extensive quantitative model-testing meta-analysis we are conducting. Specific electronic tools improved the efficiency of (a) locating and screening studies, (b) screening and organizing studies and other project documents, (c) extracting data from primary studies, (d) checking data accuracy and analyses, and (e) communication among team members. The major limitation in designing and implementing a fully electronic system for meta-analysis was the requisite upfront time to: decide on which electronic tools to use, determine how these tools would be employed, develop clear guidelines for their use, and train members of the research team. The electronic process described here has been useful in streamlining the process of conducting this complex meta-analysis and enhancing communication and sharing documents among research team members. PMID:23681256
Wang, Lu-Yong; Fasulo, D
2006-01-01
Genome-wide association studies of complex diseases generate massive amounts of single nucleotide polymorphism (SNP) data. Univariate statistical tests (e.g., Fisher's exact test) are commonly used to filter out non-associated SNPs. However, disease-susceptibility SNPs may have little marginal effect in the population and are unlikely to survive univariate testing. Model-based methods, meanwhile, are impractical for large-scale datasets, and genetic heterogeneity makes it harder for traditional methods to identify the genetic causes of disease. The more recent random forest method provides a robust way to screen SNPs on the scale of thousands. For still larger data, such as Affymetrix Human Mapping 100K GeneChip data, a faster method is required to screen SNPs in whole-genome association analysis under genetic heterogeneity. We propose a boosting-based method for rapid screening in large-scale analysis of complex traits in the presence of genetic heterogeneity. It provides a relatively fast and fairly good tool for screening and limiting the candidate SNPs for further, more complex computational modeling.
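As context for the univariate baseline this abstract mentions, a one-sided Fisher exact test screen over SNPs can be sketched in a few lines of stdlib Python; the SNP names and 2x2 allele counts below are invented for illustration, not taken from the study:

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact test for a 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, i.e. the probability
    of observing an association at least as strong as the one seen.
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c
    denom = comb(n, col1)
    p = 0.0
    for x in range(a, min(row1, col1) + 1):
        p += comb(row1, x) * comb(n - row1, col1 - x) / denom
    return p

# Toy screen: per-SNP counts (carrier cases, carrier controls,
# non-carrier cases, non-carrier controls) -- invented numbers.
snps = {"rs_a": (30, 10, 20, 40), "rs_b": (25, 25, 25, 25)}
pvals = {s: fisher_one_sided(*t) for s, t in snps.items()}
```

SNPs whose p-value survives a chosen threshold would pass to the next modeling stage; as the abstract notes, this baseline misses SNPs with weak marginal effects.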
Greco, Todd M.; Guise, Amanda J.; Cristea, Ileana M.
2016-01-01
In biological systems, proteins catalyze the fundamental reactions that underlie all cellular functions, including metabolic processes and cell survival and death pathways. These biochemical reactions are rarely accomplished alone. Rather, they involve a concerted effect from many proteins that may operate in a directed signaling pathway and/or may physically associate in a complex to achieve a specific enzymatic activity. Therefore, defining the composition and regulation of protein complexes is critical for understanding cellular functions. In this chapter, we describe an approach that uses quantitative mass spectrometry (MS) to assess the specificity and the relative stability of protein interactions. Isolation of protein complexes from mammalian cells is performed by rapid immunoaffinity purification, and followed by in-solution digestion and high-resolution mass spectrometry analysis. We employ complementary quantitative MS workflows to assess the specificity of protein interactions using label-free MS and statistical analysis, and the relative stability of the interactions using a metabolic labeling technique. For each candidate protein interaction, scores from the two workflows can be correlated to minimize nonspecific background and profile protein complex composition and relative stability. PMID:26867737
Armour, Sean M; Bennett, Eric J; Braun, Craig R; Zhang, Xiao-Yong; McMahon, Steven B; Gygi, Steven P; Harper, J Wade; Sinclair, David A
2013-04-01
Although many functions and targets have been attributed to the histone and protein deacetylase SIRT1, a comprehensive analysis of SIRT1 binding proteins yielding a high-confidence interaction map has not been established. Using a comparative statistical analysis of binding partners, we have assembled a high-confidence SIRT1 interactome. Employing this method, we identified the deubiquitinating enzyme ubiquitin-specific protease 22 (USP22), a component of the deubiquitinating module (DUBm) of the SAGA transcriptional coactivating complex, as a SIRT1-interacting partner. We found that this interaction is highly specific, requires the ZnF-UBP domain of USP22, and is disrupted by the inactivating H363Y mutation within SIRT1. Moreover, we show that USP22 is acetylated on multiple lysine residues and that alteration of a single lysine (K129) within the ZnF-UBP domain is sufficient to alter interaction of the DUBm with the core SAGA complex. Furthermore, USP22-mediated recruitment of SIRT1 activity promotes the deacetylation of individual SAGA complex components. Our results indicate an important role of SIRT1-mediated deacetylation in regulating the formation of DUBm subcomplexes within the larger SAGA complex.
SINGH, G. D.; McNAMARA JR, J. A.; LOZANOFF, S.
1997-01-01
This study determines deformations of the midface that contribute to a class III appearance, employing thin-plate spline analysis. A total of 135 lateral cephalographs of prepubertal children of European-American descent with either class III malocclusions or a class I molar occlusion were compared. The cephalographs were traced and checked, and 7 homologous landmarks of the midface were identified and digitised. The data sets were scaled to an equivalent size and subjected to Procrustes analysis. These statistical tests indicated significant differences (P<0.05) between the averaged class I and class III morphologies. Thin-plate spline analysis indicated that both affine and nonaffine transformations contribute towards the total spline for the averaged midfacial configuration. For nonaffine transformations, partial warp 3 had the highest magnitude, indicating the large scale deformations of the midfacial configuration. These deformations affected the palatal landmarks, and were associated with compression of the midfacial complex in the anteroposterior plane predominantly. Partial warp 4 produced some vertical compression of the posterior aspect of the midfacial complex whereas partial warps 1 and 2 indicated localised shape changes of the maxillary alveolus region. Large spatial-scale deformations therefore affect the midfacial complex in an anteroposterior axis, in combination with vertical compression and localised distortions. These deformations may represent a developmental diminution of the palatal complex anteroposteriorly that, allied with vertical shortening of midfacial height posteriorly, results in class III malocclusions with a retrusive midfacial profile. PMID:9449078
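The size-scaling and Procrustes alignment steps described above have a compact generic sketch when 2-D landmarks are encoded as complex numbers (a standard device in shape statistics); this is an illustration of the alignment idea only, not the authors' software, and the landmark values in the test are invented:

```python
def _centre_and_scale(pts):
    """Remove translation (centre on centroid) and size (unit centroid size)."""
    c = sum(pts) / len(pts)
    pts = [p - c for p in pts]
    size = sum(abs(p) ** 2 for p in pts) ** 0.5
    return [p / size for p in pts]

def procrustes_2d(x, y):
    """Full Procrustes distance between two 2-D landmark configurations,
    each given as an equal-length list of complex numbers (x + 1j*y)."""
    x, y = _centre_and_scale(x), _centre_and_scale(y)
    # The optimal rotation of y onto x is the phase of the complex inner product.
    inner = sum(q.conjugate() * p for p, q in zip(x, y))
    y = [p * inner / abs(inner) for p in y]
    return sum(abs(p - q) ** 2 for p, q in zip(x, y)) ** 0.5

# A unit square and a rotated, translated copy should align to distance ~0.
square = [0 + 0j, 1 + 0j, 1 + 1j, 0 + 1j]
moved = [p * (0.6 + 0.8j) + (2 + 3j) for p in square]
d_same = procrustes_2d(square, moved)
d_diff = procrustes_2d(square, [0 + 0j, 2 + 0j, 2 + 1j, 0 + 1j])
```

After such alignment, residual differences between configurations are what thin-plate spline decomposition then partitions into affine and nonaffine warps.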
Multi-region statistical shape model for cochlear implantation
NASA Astrophysics Data System (ADS)
Romera, Jordi; Kjer, H. Martin; Piella, Gemma; Ceresa, Mario; González Ballester, Miguel A.
2016-03-01
Statistical shape models are commonly used to analyze the variability between similar anatomical structures, and their use is established as a tool for the analysis and segmentation of medical images. However, using a global model to capture the variability of complex structures is not enough to achieve the best results. The complexity of a proper global model increases even more when the amount of data available is limited to a small number of datasets. Typically, the anatomical variability between structures is associated with the variability of their physiological regions. In this paper, a complete pipeline is proposed for building a multi-region statistical shape model to study the entire variability from locally identified physiological regions of the inner ear. The proposed model, which is based on an extension of the Point Distribution Model (PDM), is built from a training set of 17 high-resolution images (24.5 μm voxels) of the inner ear. The model is evaluated according to its generalization ability and specificity. The results are compared with those of a global model built directly using the standard PDM approach. The evaluation results suggest that better accuracy can be achieved using a regional modeling of the inner ear.
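A Point Distribution Model is essentially PCA on aligned landmark vectors. As a hedged, stdlib-only sketch (not the authors' pipeline), the leading variation mode can be extracted by power iteration on the sample covariance; the toy shapes in the example vary along a single coordinate:

```python
def leading_mode(shapes, iters=100):
    """Leading PCA mode of a set of aligned shape vectors (lists of floats),
    computed by power iteration on the sample covariance matrix.
    Assumes the data have nonzero variance along some direction."""
    n, d = len(shapes), len(shapes[0])
    mean = [sum(s[j] for s in shapes) / n for j in range(d)]
    dev = [[s[j] - mean[j] for j in range(d)] for s in shapes]
    v = [1.0] * d
    for _ in range(iters):
        # Apply C = (1/n) * dev^T dev to v without forming C explicitly.
        proj = [sum(row[j] * v[j] for j in range(d)) for row in dev]
        w = [sum(proj[i] * dev[i][j] for i in range(n)) / n for j in range(d)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return mean, v

# Toy data: five 3-D shape vectors varying only along the first coordinate.
toy = [[t, 0.0, 0.0] for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
mean_shape, mode1 = leading_mode(toy)
```

A multi-region model as described above would fit such a decomposition per physiological region rather than one global model.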
Advancing Clinical Proteomics via Analysis Based on Biological Complexes: A Tale of Five Paradigms.
Goh, Wilson Wen Bin; Wong, Limsoon
2016-09-02
Despite advances in proteomic technologies, idiosyncratic data issues, for example, incomplete coverage and inconsistency, resulting in large data holes, persist. Moreover, because of naïve reliance on statistical testing and its accompanying p values, differential protein signatures identified from such proteomics data have little diagnostic power. Thus, deploying conventional analytics on proteomics data is insufficient for identifying novel drug targets or precise yet sensitive biomarkers. Complex-based analysis is a new analytical approach that has potential to resolve these issues but requires formalization. We categorize complex-based analysis into five method classes or paradigms and propose an even-handed yet comprehensive evaluation rubric based on both simulated and real data. The first four paradigms are well represented in the literature. The fifth and newest paradigm, the network-paired (NP) paradigm, represented by a method called Extremely Small SubNET (ESSNET), dominates in precision-recall and reproducibility, maintains strong performance in small sample sizes, and sensitively detects low-abundance complexes. In contrast, the commonly used over-representation analysis (ORA) and direct-group (DG) test paradigms maintain good overall precision but have severe reproducibility issues. The other two paradigms considered here are the hit-rate and rank-based network analysis paradigms; both of these have good precision-recall and reproducibility, but they do not consider low-abundance complexes. Therefore, given its strong performance, NP/ESSNET may prove to be a useful approach for improving the analytical resolution of proteomics data. Additionally, given its stability, it may also be a powerful new approach toward functional enrichment tests, much like its ORA and DG counterparts.
Reducing the Complexity of an Agent-Based Local Heroin Market Model
Heard, Daniel; Bobashev, Georgiy V.; Morris, Robert J.
2014-01-01
This project explores techniques for reducing the complexity of an agent-based model (ABM). The analysis involved a model developed from the ethnographic research of Dr. Lee Hoffer in the Larimer area heroin market, which involved drug users, drug sellers, homeless individuals and police. The authors used statistical techniques to create a reduced version of the original model which maintained simulation fidelity while reducing computational complexity. This involved identifying key summary quantities of individual customer behavior as well as overall market activity and replacing some agents with probability distributions and regressions. The model was then extended to allow external market interventions in the form of police busts. Extensions of this research perspective, as well as its strengths and limitations, are discussed. PMID:25025132
NASA Astrophysics Data System (ADS)
Xu, Kaixuan; Wang, Jun
2017-02-01
In this paper, the recently introduced permutation entropy and sample entropy are extended to the fractional case, yielding weighted fractional permutation entropy (WFPE) and fractional sample entropy (FSE). The fractional-order generalization of information entropy is utilized in both complexity approaches to detect the statistical characteristics of fractional-order information in complex systems. Analysis of the proposed methods on synthetic and real-world data reveals that tuning the fractional order allows higher sensitivity and more accurate characterization of the signal evolution, which is useful in describing the dynamics of complex systems. Moreover, nonlinear complexity behaviors are compared between the return series of the Potts financial model and actual stock markets, and the empirical results confirm the feasibility of the proposed model.
Complexity-entropy causality plane: A useful approach for distinguishing songs
NASA Astrophysics Data System (ADS)
Ribeiro, Haroldo V.; Zunino, Luciano; Mendes, Renio S.; Lenzi, Ervin K.
2012-04-01
Nowadays we are often faced with huge databases resulting from the rapid growth of data storage technologies. This is particularly true when dealing with music databases. In this context, it is essential to have techniques and tools able to discriminate properties from these massive sets. In this work, we report on a statistical analysis of more than ten thousand songs aiming to obtain a complexity hierarchy. Our approach is based on the estimation of the permutation entropy combined with an intensive complexity measure, building up the complexity-entropy causality plane. The results obtained indicate that this representation space is very promising to discriminate songs as well as to allow a relative quantitative comparison among songs. Additionally, we believe that the here-reported method may be applied in practical situations since it is simple, robust and has a fast numerical implementation.
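The complexity-entropy causality plane used here (and in the plasma study above) pairs the normalized Bandt-Pompe permutation entropy with the Jensen-Shannon statistical complexity. A minimal sketch, assuming the standard normalization of the Jensen-Shannon disequilibrium against the uniform ordinal-pattern distribution:

```python
from collections import Counter
from itertools import permutations
from math import factorial, log

def ordinal_probs(series, m):
    """Relative frequencies of Bandt-Pompe ordinal patterns of order m."""
    counts = Counter(
        tuple(sorted(range(m), key=lambda k: series[i + k]))
        for i in range(len(series) - m + 1)
    )
    total = sum(counts.values())
    return [counts.get(p, 0) / total for p in permutations(range(m))]

def shannon(p):
    return -sum(q * log(q) for q in p if q > 0)

def ch_plane(series, m=4):
    """Coordinates (H, C): normalized permutation entropy and
    Jensen-Shannon statistical complexity."""
    p = ordinal_probs(series, m)
    n = factorial(m)
    u = [1 / n] * n                      # uniform reference distribution
    H = shannon(p) / log(n)
    mix = [(a + b) / 2 for a, b in zip(p, u)]
    js = shannon(mix) - shannon(p) / 2 - shannon(u) / 2
    # Maximum JS divergence (delta vs uniform), used for normalization.
    js_max = -0.5 * ((n + 1) / n * log(n + 1) + log(n) - 2 * log(2 * n))
    return H, js / js_max * H

H_line, C_line = ch_plane(list(range(100)), m=3)  # monotone: one pattern only
H_alt, C_alt = ch_plane([0, 1] * 50, m=3)         # period-2 alternation
```

A monotone series sits at the origin of the plane; periodic and stochastic signals spread along characteristic curves, which is what allows songs (or plasmas) to be discriminated.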
Data series embedding and scale invariant statistics.
Michieli, I; Medved, B; Ristov, S
2010-06-01
Data sequences acquired from bio-systems, such as human gait data, heart rate interbeat data, or DNA sequences, exhibit complex dynamics that is frequently described by a long-memory or power-law decay of the autocorrelation function. One way of characterizing that dynamics is through scale invariant statistics or "fractal-like" behavior. Several methods have been proposed for quantifying scale invariant parameters of physiological signals. Among them the most common are detrended fluctuation analysis, sample mean variance analyses, power spectral density analysis, R/S analysis, and, more recently in the realm of the multifractal approach, wavelet analysis. In this paper it is demonstrated that embedding the time series data in a high-dimensional pseudo-phase space reveals scale invariant statistics in a simple fashion. The procedure is applied to different stride interval data sets from human gait measurement time series (PhysioBank data library). Results show that the introduced mapping adequately separates long-memory from random behavior. Smaller gait data sets were analyzed and scale-free trends for limited scale intervals were successfully detected. The method was verified on artificially produced time series with known scaling behavior and with varying content of noise. The possibility of the method falsely detecting long-range dependence in artificially generated short-range dependence series was also investigated. (c) 2009 Elsevier B.V. All rights reserved.
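The pseudo-phase-space embedding at the heart of this approach is a standard time-delay reconstruction; a minimal sketch (dimension and lag are free parameters chosen by the analyst):

```python
def embed(series, dim, lag):
    """Time-delay embedding of a scalar series into dim-dimensional
    pseudo-phase-space vectors with the given lag."""
    n = len(series) - (dim - 1) * lag
    return [tuple(series[i + k * lag] for k in range(dim)) for i in range(n)]

vecs = embed([1, 2, 3, 4, 5], dim=3, lag=1)
```

Scale-invariant statistics would then be read off from how these embedded points fill the high-dimensional space at different scales.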
A new statistical PCA-ICA algorithm for location of R-peaks in ECG.
Chawla, M P S; Verma, H K; Kumar, Vinod
2008-09-16
The success of ICA in separating independent components from a mixture depends on the properties of the electrocardiogram (ECG) recordings. This paper discusses some of the conditions of independent component analysis (ICA) that could affect the reliability of the separation, and evaluates issues related to the properties of the signals and the number of sources. Principal component analysis (PCA) scatter plots are plotted to indicate the diagnostic features in the presence and absence of baseline wander when interpreting the ECG signals. In this analysis, a newly developed statistical algorithm, based on combined PCA-ICA applied to two correlated channels of 12-channel ECG data, is proposed. The ICA technique has been successfully implemented to identify and remove noise and artifacts from ECG signals. Cleaned ECG signals are obtained using statistical measures such as kurtosis and variance-of-variance after ICA processing. The paper also deals with the detection of QRS complexes in electrocardiograms using the combined PCA-ICA algorithm. The efficacy of the combined PCA-ICA algorithm lies in the fact that the location of the R-peaks is bounded from above and below by the location of the cross-over points, hence none of the peaks are ignored or missed.
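The last step, locating R-peaks in a cleaned signal, can be illustrated with a simple threshold-and-local-maximum rule. This is a generic sketch on an invented synthetic trace, not the paper's method (which bounds peak locations by cross-over points from the PCA-ICA output):

```python
def find_r_peaks(signal, threshold, min_gap):
    """Locate R-peaks as local maxima above threshold, keeping only the
    larger of any two candidates closer than min_gap samples."""
    peaks = []
    for i in range(1, len(signal) - 1):
        if signal[i] >= threshold and signal[i - 1] < signal[i] >= signal[i + 1]:
            if peaks and i - peaks[-1] < min_gap:
                if signal[i] > signal[peaks[-1]]:
                    peaks[-1] = i       # keep the taller nearby candidate
            else:
                peaks.append(i)
    return peaks

# Synthetic flat trace with two spikes standing in for R-waves.
ecg = [0.0] * 50
ecg[10], ecg[30] = 1.2, 0.9
peaks = find_r_peaks(ecg, threshold=0.5, min_gap=5)
```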
Kanda, Junya
2016-01-01
The Transplant Registry Unified Management Program (TRUMP) made it possible for members of the Japan Society for Hematopoietic Cell Transplantation (JSHCT) to analyze large sets of national registry data on autologous and allogeneic hematopoietic stem cell transplantation. However, as the processes used to collect transplantation information are complex and differed over time, the background of these processes should be understood when using TRUMP data. Previously, information on the HLA locus of patients and donors had been collected using a questionnaire-based free-description method, resulting in some input errors. To correct minor but significant errors and provide accurate HLA matching data, the use of a Stata or EZR/R script offered by the JSHCT is strongly recommended when analyzing HLA data in the TRUMP dataset. The HLA mismatch direction, mismatch counting method, and different impacts of HLA mismatches by stem cell source are other important factors in the analysis of HLA data. Additionally, researchers should understand the statistical analyses specific for hematopoietic stem cell transplantation, such as competing risk, landmark analysis, and time-dependent analysis, to correctly analyze transplant data. The data center of the JSHCT can be contacted if statistical assistance is required.
Qiao, Xue; Lin, Xiong-hao; Ji, Shuai; Zhang, Zheng-xiang; Bo, Tao; Guo, De-an; Ye, Min
2016-01-05
To fully understand the chemical diversity of an herbal medicine is challenging. In this work, we describe a new approach to globally profile and discover novel compounds from an herbal extract using multiple neutral loss/precursor ion scanning combined with substructure recognition and statistical analysis. Turmeric (the rhizomes of Curcuma longa L.) was used as an example. This approach consists of three steps: (i) multiple neutral loss/precursor ion scanning to obtain substructure information; (ii) targeted identification of new compounds by extracted ion current and substructure recognition; and (iii) untargeted identification using total ion current and multivariate statistical analysis to discover novel structures. Using this approach, 846 terpecurcumins (terpene-conjugated curcuminoids) were discovered from turmeric, including a number of potentially novel compounds. Furthermore, two unprecedented compounds (terpecurcumins X and Y) were purified, and their structures were identified by NMR spectroscopy. This study extended the application of mass spectrometry to global profiling of natural products in herbal medicines and could help chemists to rapidly discover novel compounds from a complex matrix.
On some stochastic formulations and related statistical moments of pharmacokinetic models.
Matis, J H; Wehrly, T E; Metzler, C M
1983-02-01
This paper presents the deterministic and stochastic model for a linear compartment system with constant coefficients, and it develops expressions for the mean residence times (MRT) and the variances of the residence times (VRT) for the stochastic model. The expressions are relatively simple computationally, involving primarily matrix inversion, and they are elegant mathematically, in avoiding eigenvalue analysis and the complex domain. The MRT and VRT provide a set of new meaningful response measures for pharmacokinetic analysis and they give added insight into the system kinetics. The new analysis is illustrated with an example involving the cholesterol turnover in rats.
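For a linear compartment system dx/dt = Ax with constant rate matrix A, the residence-time expressions indeed reduce to matrix inversion: the entries of -A^(-1) give expected times spent in each compartment. A toy two-compartment sketch with invented rate constants (hand-coded 2x2 inverse, stdlib only):

```python
def mrt_2x2(a):
    """Mean residence time matrix -A^{-1} for a 2x2 compartmental rate
    matrix A (dx/dt = A x). Entry [i][j] is the expected total time spent
    in compartment i by material that starts in compartment j."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    inv = [[a22 / det, -a12 / det], [-a21 / det, a11 / det]]
    return [[-v for v in row] for row in inv]

# Catenary model 1 -> 2 -> elimination, with k1 = 0.5, k2 = 0.25 (invented):
# time in compartment 1 should be 1/k1 = 2, time in 2 should be 1/k2 = 4.
M = mrt_2x2([[-0.5, 0.0], [0.5, -0.25]])
```

The variances of residence times follow from similar matrix expressions, which is why the approach avoids eigenvalue analysis entirely.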
Haranas, Ioannis; Gkigkitzis, Ioannis; Kotsireas, Ilias; Austerlitz, Carlos
2017-01-01
Understanding how the brain encodes information and performs computation requires statistical and functional analysis. Given the complexity of the human brain, simple methods that facilitate the interpretation of statistical correlations among different brain regions can be very useful. In this report we introduce a numerical correlation measure that may serve the interpretation of correlational neuronal data and may assist in the evaluation of different brain states. The description of the dynamical brain system through a global numerical measure may indicate the presence of an action principle, which could facilitate the application of physics principles to the study of the human brain and cognition.
Stokes-correlometry of polarization-inhomogeneous objects
NASA Astrophysics Data System (ADS)
Ushenko, O. G.; Dubolazov, A.; Bodnar, G. B.; Bachynskiy, V. T.; Vanchulyak, O.
2018-01-01
The paper consists of two parts. The first part presents the theoretical basics of the Stokes-correlometry description of the optical anisotropy of biological tissues. Experimentally measured coordinate distributions of the modulus (MSV) and phase (PhSV) of the complex Stokes vector of skeletal muscle tissue are provided, and the values and ranges of change of the statistical moments of the 1st-4th orders, which characterize the distributions of MSV and PhSV values, are determined. The second part presents a statistical analysis of the distributions of the MSV and PhSV, from which objective criteria for the differentiation of samples with urinary incontinence are defined.
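The statistical moments of the 1st-4th orders used here to characterize the MSV and PhSV distributions are the mean, variance, skewness, and kurtosis; a minimal stdlib sketch (the input values in the test are invented):

```python
def first_four_moments(values):
    """Sample mean, variance, skewness, and kurtosis (statistical moments
    of the 1st-4th orders) of a distribution of values."""
    n = len(values)
    mean = sum(values) / n
    cm = lambda k: sum((v - mean) ** k for v in values) / n  # central moment
    var = cm(2)
    sd = var ** 0.5
    return mean, var, cm(3) / sd ** 3, cm(4) / var ** 2

m1, m2, m3, m4 = first_four_moments([1.0, 2.0, 3.0, 4.0, 5.0])
```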
The clinical value of large neuroimaging data sets in Alzheimer's disease.
Toga, Arthur W
2012-02-01
Rapid advances in neuroimaging and cyberinfrastructure technologies have brought explosive growth in the Web-based warehousing, availability, and accessibility of imaging data on a variety of neurodegenerative and neuropsychiatric disorders and conditions. There has been a prolific development and emergence of complex computational infrastructures that serve as repositories of databases and provide critical functionalities such as sophisticated image analysis algorithm pipelines and powerful three-dimensional visualization and statistical tools. The statistical and operational advantages of collaborative, distributed team science in the form of multisite consortia drive this approach across a diverse range of population-based investigations. Copyright © 2012 Elsevier Inc. All rights reserved.
Teaching Statistics--Despite Its Applications
ERIC Educational Resources Information Center
Ridgway, Jim; Nicholson, James; McCusker, Sean
2007-01-01
Evidence-based policy requires sophisticated modelling and reasoning about complex social data. The current UK statistics curricula do not equip tomorrow's citizens to understand such reasoning. We advocate radical curriculum reform, designed to require students to reason from complex data.
Modeling stock price dynamics by continuum percolation system and relevant complex systems analysis
NASA Astrophysics Data System (ADS)
Xiao, Di; Wang, Jun
2012-10-01
The continuum percolation system is developed to model a random stock price process in this work. Recent empirical research has demonstrated various statistical features of stock price changes; a financial model aiming at understanding price fluctuations needs to define a mechanism for the formation of the price, in an attempt to reproduce and explain this set of empirical facts. The continuum percolation model is usually referred to as a random coverage process or a Boolean model; the local interaction or influence among traders is constructed by the continuum percolation, and a cluster of continuum percolation is applied to define the cluster of traders sharing the same opinion about the market. We investigate and analyze the statistical behaviors of normalized returns of the price model by several analysis methods, including power-law tail distribution analysis, chaotic behavior analysis, and Zipf analysis. Moreover, we consider the daily returns of the Shanghai Stock Exchange Composite Index from January 1997 to July 2011, and comparisons of return behaviors between the actual data and the simulation data are exhibited.
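Of the analysis methods listed, Zipf analysis has the shortest sketch: discretize the normalized returns into bins and rank the bin frequencies in decreasing order (the bin count and the toy return values below are arbitrary illustrative choices):

```python
from collections import Counter

def zipf_ranks(values, bins=20):
    """Zipf-style rank-frequency analysis: discretize values into equal-width
    bins, then return bin frequencies sorted in decreasing order."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0      # guard against a constant series
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    return sorted(counts.values(), reverse=True)

# Toy "returns" concentrated on three values.
returns = [0.0] * 50 + [0.5] * 20 + [1.0] * 30
freqs = zipf_ranks(returns, bins=4)
```

A log-log plot of frequency against rank would then reveal whether the returns follow a Zipf-type power law.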
Learning Predictive Statistics: Strategies and Brain Mechanisms.
Wang, Rui; Shen, Yuan; Tino, Peter; Welchman, Andrew E; Kourtzi, Zoe
2017-08-30
When immersed in a new environment, we are challenged to decipher initially incomprehensible streams of sensory information. However, quite rapidly, the brain finds structure and meaning in these incoming signals, helping us to predict and prepare ourselves for future actions. This skill relies on extracting the statistics of event streams in the environment that contain regularities of variable complexity from simple repetitive patterns to complex probabilistic combinations. Here, we test the brain mechanisms that mediate our ability to adapt to the environment's statistics and predict upcoming events. By combining behavioral training and multisession fMRI in human participants (male and female), we track the corticostriatal mechanisms that mediate learning of temporal sequences as they change in structure complexity. We show that learning of predictive structures relates to individual decision strategy; that is, selecting the most probable outcome in a given context (maximizing) versus matching the exact sequence statistics. These strategies engage distinct human brain regions: maximizing engages dorsolateral prefrontal, cingulate, sensory-motor regions, and basal ganglia (dorsal caudate, putamen), whereas matching engages occipitotemporal regions (including the hippocampus) and basal ganglia (ventral caudate). Our findings provide evidence for distinct corticostriatal mechanisms that facilitate our ability to extract behaviorally relevant statistics to make predictions. SIGNIFICANCE STATEMENT Making predictions about future events relies on interpreting streams of information that may initially appear incomprehensible. Past work has studied how humans identify repetitive patterns and associative pairings. However, the natural environment contains regularities that vary in complexity from simple repetition to complex probabilistic combinations. 
Here, we combine behavior and multisession fMRI to track the brain mechanisms that mediate our ability to adapt to changes in the environment's statistics. We provide evidence for an alternate route for learning complex temporal statistics: extracting the most probable outcome in a given context is implemented by interactions between executive and motor corticostriatal mechanisms compared with visual corticostriatal circuits (including hippocampal cortex) that support learning of the exact temporal statistics. Copyright © 2017 Wang et al.
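The behavioral distinction between maximizing and matching can be illustrated with a toy i.i.d. Bernoulli stream (an invented stand-in for the study's structured temporal sequences): the maximizer's expected accuracy approaches p, the matcher's p² + (1-p)²:

```python
import random

def simulate(p=0.8, trials=10000, seed=1):
    """Accuracy of a 'maximizer' (always predicts the more likely outcome)
    versus a 'matcher' (predicts with the outcome probabilities) on an
    i.i.d. binary stream where one outcome has probability p."""
    rng = random.Random(seed)
    max_hits = match_hits = 0
    for _ in range(trials):
        outcome = rng.random() < p           # True with probability p
        max_hits += outcome                   # maximizer always predicts True
        match_hits += outcome == (rng.random() < p)  # matcher samples a guess
    return max_hits / trials, match_hits / trials

acc_max, acc_match = simulate()  # with p = 0.8: roughly 0.80 vs 0.68
```

Even in this toy setting the maximizer dominates, which is why the strategy choice the study identifies has behavioral consequences.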
Mathematics and statistics research department. Progress report, period ending June 30, 1981
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lever, W.E.; Kane, V.E.; Scott, D.S.
1981-09-01
This report is the twenty-fourth in the series of progress reports of the Mathematics and Statistics Research Department of the Computer Sciences Division, Union Carbide Corporation - Nuclear Division (UCC-ND). Part A records research progress in biometrics research, materials science applications, model evaluation, moving boundary problems, multivariate analysis, numerical linear algebra, risk analysis, and complementary areas. Collaboration and consulting with others throughout the UCC-ND complex are recorded in Part B. Included are sections on biology and health sciences, chemistry, energy, engineering, environmental sciences, health and safety research, materials sciences, safeguards, surveys, and uranium resource evaluation. Part C summarizes the various educational activities in which the staff was engaged. Part D lists the presentations of research results, and Part E records the staff's other professional activities during the report period.
Characterizing chaotic melodies in automatic music composition
NASA Astrophysics Data System (ADS)
Coca, Andrés E.; Tost, Gerard O.; Zhao, Liang
2010-09-01
In this paper, we initially present an algorithm for automatic composition of melodies using chaotic dynamical systems. Afterward, we characterize chaotic music in a comprehensive way as comprising three perspectives: musical discrimination, dynamical influence on musical features, and musical perception. With respect to the first perspective, the coherence between generated chaotic melodies (continuous as well as discrete chaotic melodies) and a set of classical reference melodies is characterized by statistical descriptors and melodic measures. The significant differences among the three types of melodies are determined by discriminant analysis. Regarding the second perspective, the influence of dynamical features of chaotic attractors, e.g., Lyapunov exponent, Hurst coefficient, and correlation dimension, on melodic features is determined by canonical correlation analysis. The last perspective is related to perception of originality, complexity, and degree of melodiousness (Euler's gradus suavitatis) of chaotic and classical melodies by nonparametric statistical tests.
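A minimal example of the first step, generating a melody from a chaotic map: here the logistic map drives MIDI pitch selection. The map, parameter values, and pitch range are invented illustrative choices; the paper's dynamical systems and melodic measures are far richer:

```python
def chaotic_melody(n, x0=0.2, r=3.99, pitch_range=(60, 72)):
    """Generate n MIDI pitches by mapping logistic-map iterates
    x -> r*x*(1-x) onto the interval [low, high]."""
    low, high = pitch_range
    x, notes = x0, []
    for _ in range(n):
        x = r * x * (1 - x)               # chaotic for r near 4
        notes.append(low + round(x * (high - low)))
    return notes

notes = chaotic_melody(32)
```

Dynamical descriptors of the underlying attractor (Lyapunov exponent, correlation dimension) can then be correlated against melodic features of the resulting note sequence, as the abstract describes.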
Study of pre-seismic kHz EM emissions by means of complex systems
NASA Astrophysics Data System (ADS)
Balasis, Georgios; Papadimitriou, Constantinos; Eftaxias, Konstantinos
2010-05-01
The field of study of complex systems holds that the dynamics of complex systems are founded on universal principles that may be used to describe disparate problems ranging from particle physics to the economies of societies. A corollary is that transferring ideas and results between investigators in hitherto disparate areas will cross-fertilize and lead to important new results. It is well known that Boltzmann-Gibbs statistical mechanics works best in dealing with systems composed of subsystems that are either independent or interacting via short-range forces, and whose subsystems can access all of the available phase space. For systems exhibiting long-range correlations, memory, or fractal properties, non-extensive Tsallis statistical mechanics becomes the most appropriate mathematical framework. A central property of the magnetic storm, solar flare, and earthquake preparation process is the possible occurrence of coherent large-scale collective behavior with a very rich structure, resulting from repeated nonlinear interactions among the constituents of the system. Consequently, non-extensive statistical mechanics is an appropriate framework in which to investigate universality, if any, in magnetic storm, solar flare, earthquake, and pre-failure EM emission occurrence. A model for earthquake dynamics based on a non-extensive Tsallis formulation, starting from first principles, has recently been introduced. This approach leads to a Gutenberg-Richter-type law for the magnitude distribution of earthquakes which provides an excellent fit to seismicities generated in various large geographic areas usually identified as "seismic regions". We examine whether the Gutenberg-Richter law corresponding to non-extensive Tsallis statistics is able to describe the distribution of amplitudes of earthquakes, pre-seismic kHz EM emissions (electromagnetic earthquakes), solar flares, and magnetic storms.
The analysis shows that the introduced non-extensive model provides an excellent fit to the experimental data, incorporating the characteristics of universality by means of non-extensive statistics into the extreme events under study.
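Distributions in the non-extensive Tsallis framework are built from the q-exponential, which deforms exp(x) and recovers it in the limit q -> 1. A minimal generic helper (not the specific non-extensive Gutenberg-Richter fit used in the study):

```python
from math import exp

def q_exp(x, q):
    """Tsallis q-exponential exp_q(x) = [1 + (1-q)x]^(1/(1-q)),
    defined as exp(x) at q = 1 and as 0 outside its support."""
    if abs(q - 1.0) < 1e-12:
        return exp(x)
    base = 1.0 + (1.0 - q) * x
    return base ** (1.0 / (1.0 - q)) if base > 0.0 else 0.0
```

For q > 1 and negative arguments this decays as a power law rather than exponentially, which is what produces the heavy tails fitted to the magnitude and amplitude distributions discussed above.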
NASA Astrophysics Data System (ADS)
Bouhaj, M.; von Estorff, O.; Peiffer, A.
2017-09-01
In the application of Statistical Energy Analysis (SEA) to complex assembled structures, a purely predictive model often exhibits errors. These errors are mainly due to inaccurate modelling of the power transmission mechanisms described through the Coupling Loss Factors (CLFs). Experimental SEA (ESEA) is used in practice by the automotive and aerospace industries to verify and update the model, or to derive the CLFs for use in an SEA predictive model when analytical estimates cannot be made. This work is particularly motivated by the lack of procedures for estimating the variance and confidence intervals of the statistical quantities obtained with the ESEA technique. The aim of this paper is to introduce procedures enabling a statistical description of measured power input, vibration energies and the derived SEA parameters. Particular emphasis is placed on the identification of structural CLFs of complex built-up structures, comparing different methods. By adopting a Stochastic Energy Model (SEM), the ensemble average in ESEA is also addressed. For this purpose, expressions are obtained to randomly perturb the energy matrix elements and generate individual samples for the Monte Carlo (MC) technique applied to derive the ensemble-averaged CLF. Results of ESEA tests conducted on an aircraft fuselage section show that the SEM approach yields better CLF estimates than classical matrix inversion methods. The expected range of CLF values and the synthesized energy are used as quality criteria for the matrix inversion, allowing critical SEA subsystems to be identified that might require a more refined statistical description of the excitation and the response fields. Moreover, the impact of the variance of the normalized vibration energy on the uncertainty of the derived CLFs is outlined.
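A minimal sketch of the two ingredients named above: the classical ESEA matrix inversion and a Monte Carlo perturbation of the energy matrix for ensemble averaging. The two-subsystem energy matrix, unit input powers, sign convention, and 5% noise level are all assumptions for illustration, not the authors' SEM.

```python
import numpy as np

rng = np.random.default_rng(0)
omega = 2.0 * np.pi * 1000.0        # band centre frequency [rad/s], assumed

# Hypothetical measured data for a two-subsystem assembly:
# E[i, j] = vibration energy of subsystem i when subsystem j is excited,
# P = diagonal matrix of measured input powers (one excitation per column).
E = np.array([[2.0e-3, 4.0e-4],
              [3.0e-4, 1.5e-3]])
P = np.diag([1.0, 1.0])

def esea_clf(E, P, omega):
    """Classical ESEA matrix inversion, omega * L * E = P; the off-diagonal
    entries of the loss-factor matrix L carry the CLFs (sign convention
    assumed here)."""
    L = P @ np.linalg.inv(E) / omega
    eta = -L
    np.fill_diagonal(eta, 0.0)      # keep only the coupling terms
    return eta

def mc_ensemble_clf(E, P, omega, sigma=0.05, n=2000):
    """Monte Carlo ensemble average: perturb the energy matrix entries with
    relative noise sigma and average the derived CLFs over the samples."""
    draws = np.array([esea_clf(E * (1.0 + sigma * rng.standard_normal(E.shape)),
                               P, omega) for _ in range(n)])
    return draws.mean(axis=0), draws.std(axis=0)

eta = esea_clf(E, P, omega)
eta_mean, eta_std = mc_ensemble_clf(E, P, omega)
```

The spread `eta_std` is the kind of variance estimate the paper argues is missing from standard ESEA practice.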
GARNET--gene set analysis with exploration of annotation relations.
Rho, Kyoohyoung; Kim, Bumjin; Jang, Youngjun; Lee, Sanghyun; Bae, Taejeong; Seo, Jihae; Seo, Chaehwa; Lee, Jihyun; Kang, Hyunjung; Yu, Ungsik; Kim, Sunghoon; Lee, Sanghyuk; Kim, Wan Kyu
2011-02-15
Gene set analysis is a powerful method of deducing biological meaning for an a priori defined set of genes. Numerous tools have been developed to test statistical enrichment or depletion in specific pathways or gene ontology (GO) terms. Major difficulties for biological interpretation are integrating diverse types of annotation categories and exploring the relationships between annotation terms carrying similar information. GARNET (Gene Annotation Relationship NEtwork Tools) is an integrative platform for gene set analysis with many novel features. It includes tools for retrieving genes from annotation databases, statistical analysis and visualization of annotation relationships, and managing gene sets. In an effort to allow access to a full spectrum of amassed biological knowledge, we have integrated a variety of annotation data that include GO, domain, disease, drug, chromosomal location, and custom-defined annotations. Diverse types of molecular networks (pathways, transcription and microRNA regulation, protein-protein interaction) are also included. The pair-wise relationship between annotation gene sets was calculated using kappa statistics. GARNET consists of three modules--gene set manager, gene set analysis and gene set retrieval--which are tightly integrated to provide virtually automatic analysis for gene sets. A dedicated viewer for the annotation network has been developed to facilitate exploration of the related annotations. GARNET is an integrative platform for diverse types of gene set analysis, where complex relationships among gene annotations can be easily explored with an intuitive network visualization tool (http://garnet.isysbio.org/ or http://ercsb.ewha.ac.kr/garnet/).
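The kappa statistic mentioned above measures agreement between two annotation gene sets beyond chance, computed over binary membership vectors on a common gene universe. A small self-contained sketch (the gene names are made up):

```python
def kappa(set_a, set_b, universe):
    """Cohen's kappa between two annotation gene sets, computed from the
    binary membership vectors over a shared gene universe."""
    in_a = [g in set_a for g in universe]
    in_b = [g in set_b for g in universe]
    n = len(universe)
    p_obs = sum(a == b for a, b in zip(in_a, in_b)) / n   # observed agreement
    pa, pb = sum(in_a) / n, sum(in_b) / n
    p_exp = pa * pb + (1 - pa) * (1 - pb)                 # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)                  # undefined if p_exp == 1

genes = [f"g{i}" for i in range(10)]
overlap = kappa({"g0", "g1", "g2", "g3"}, {"g0", "g1", "g2", "g4"}, genes)
```

Pairs of annotation terms with high kappa are the "relations" a network viewer like GARNET's would draw as edges.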
Haramija, Marko; Peter-Katalinić, Jasna
2017-10-30
Affinity mass spectrometry (AMS) is an emerging tool for the study of protein•carbohydrate complexes. However, experimental obstacles and data analysis requirements are slowing the integration of AMS methods into the glycoscience field. Here we show how analysis of direct electrospray ionization mass spectrometry (ESI-MS) AMS data can be simplified for screening purposes, even for complex AMS spectra. A direct ESI-MS assay was tested in this study and binding data for the galectin-3C•lactose complex were analyzed using a comprehensive and a simplified data analysis approach. In the comprehensive data analysis approach, noise, all protein charge states, alkali ion adducts and signal overlap were taken into account. In the simplified approach, only the intensities of the fully protonated free protein and the protein•carbohydrate complex for the main protein charge state were taken into account. In our study, for high intensity signals, noise was negligible, sodiated protein and sodiated complex signals cancelled each other out when calculating the Kd value, and signal overlap influenced the Kd value only to a minor extent. The influence of these parameters on low-intensity signals was much higher. However, low intensity protein charge states should be avoided in quantitative AMS analyses due to poor ion statistics. The results indicate that noise, alkali ion adducts, signal overlap, as well as low intensity protein charge states, can be neglected for preliminary experiments, as well as in screening assays. One comprehensive data analysis performed as a control should be sufficient to validate this hypothesis for other binding systems as well. Copyright © 2017 John Wiley & Sons, Ltd.
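The simplified approach described above reduces to a one-line Kd calculation if the intensity ratio of complex to free protein is taken as the solution ratio [PL]/[P] (equal response factors assumed). The intensities and concentrations below are hypothetical illustration values, not data from the paper:

```python
def kd_direct_esi(i_free, i_complex, p0, l0):
    """Simplified direct ESI-MS Kd estimate from a single charge state,
    using only the fully protonated free-protein and complex signals and
    assuming equal response factors for both species.

    With r = I(PL)/I(P) taken as [PL]/[P] in solution:
        [PL] = r/(1+r) * P0,  Kd = [P][L]free/[PL] = (L0 - [PL]) / r
    """
    r = i_complex / i_free
    pl = (r / (1.0 + r)) * p0
    return (l0 - pl) / r

# Hypothetical titration point (concentrations in mol/L) for a
# lectin-sugar pair such as galectin-3C with lactose:
kd = kd_direct_esi(i_free=100.0, i_complex=100.0, p0=10e-6, l0=100e-6)
```

Equal free-protein and complex intensities at a tenfold ligand excess give a Kd on the order of the ligand concentration, as expected for a weak complex.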
Mawocha, Samkeliso C; Fetters, Michael D; Legocki, Laurie J; Guetterman, Timothy C; Frederiksen, Shirley; Barsan, William G; Lewis, Roger J; Berry, Donald A; Meurer, William J
2017-06-01
Adaptive clinical trials use accumulating data from enrolled subjects to alter trial conduct in pre-specified ways based on quantitative decision rules. In this research, we sought to characterize the perspectives of key stakeholders during the development process of confirmatory-phase adaptive clinical trials within an emergency clinical trials network and to build a model to guide future development of adaptive clinical trials. We used an ethnographic, qualitative approach to evaluate key stakeholders' views about the adaptive clinical trial development process. Stakeholders participated in a series of multidisciplinary meetings during the development of five adaptive clinical trials and completed a Strengths-Weaknesses-Opportunities-Threats questionnaire. In the analysis, we elucidated overarching themes across the stakeholders' responses to develop a conceptual model. Four major overarching themes emerged during the analysis of stakeholders' responses to questioning: the perceived statistical complexity of adaptive clinical trials and the roles of collaboration, communication, and time during the development process. Frequent and open communication and collaboration were viewed by stakeholders as critical during the development process, as were the careful management of time and logistical issues related to the complexity of planning adaptive clinical trials. The Adaptive Design Development Model illustrates how statistical complexity, time, communication, and collaboration are moderating factors in the adaptive design development process. The intensity and iterative nature of this process underscores the need for funding mechanisms for the development of novel trial proposals in academic settings.
Gene- and pathway-based association tests for multiple traits with GWAS summary statistics.
Kwak, Il-Youp; Pan, Wei
2017-01-01
To identify novel genetic variants associated with complex traits and to shed new insights on underlying biology, in addition to the most popular single SNP-single trait association analysis, it would be useful to explore multiple correlated (intermediate) traits at the gene- or pathway-level by mining existing single GWAS or meta-analyzed GWAS data. For this purpose, we present an adaptive gene-based test and a pathway-based test for association analysis of multiple traits with GWAS summary statistics. The proposed tests are adaptive at both the SNP- and trait-levels; that is, they account for possibly varying association patterns (e.g. signal sparsity levels) across SNPs and traits, thus maintaining high power across a wide range of situations. Furthermore, the proposed methods are general: they can be applied to mixed types of traits, and to Z-statistics or P-values as summary statistics obtained from either a single GWAS or a meta-analysis of multiple GWAS. Our numerical studies with simulated and real data demonstrated the promising performance of the proposed methods. The methods are implemented in the R package aSPU, freely and publicly available at https://cran.r-project.org/web/packages/aSPU/. Contact: weip@biostat.umn.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
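The adaptivity described above comes from combining a family of sum-of-powered-score statistics and taking the minimum p-value over the power parameters. The sketch below shows the idea for one trait; it is a simplification, since it treats the null Z-statistics as independent N(0,1) rather than using the estimated correlation among SNPs as the published aSPU method does.

```python
import numpy as np

rng = np.random.default_rng(1)

def aspu(z, gammas=(1, 2, 4, 8), n_null=2000):
    """Adaptive sum-of-powered-score (aSPU) style test on SNP-level
    Z-statistics; null distribution simulated as independent N(0,1)
    (a simplifying assumption for illustration)."""
    z = np.asarray(z, dtype=float)
    obs = np.abs([np.sum(z ** g) for g in gammas])
    null = rng.standard_normal((n_null, len(z)))
    null_t = np.abs(np.stack([(null ** g).sum(axis=1) for g in gammas], axis=1))
    # per-gamma p-values, then adaptively combine by taking the minimum
    p_obs = ((null_t >= obs).sum(axis=0) + 1) / (n_null + 1)
    ranks = null_t.argsort(axis=0).argsort(axis=0)   # 0-based, ascending
    p_null = (n_null - ranks) / n_null               # larger stat -> smaller p
    min_p = p_obs.min()
    return (1 + (p_null.min(axis=1) <= min_p).sum()) / (n_null + 1)

p_signal = aspu([4.0, 4.2, 3.8, 4.1, 3.9])    # dense, strong signal
p_noise = aspu([0.1, -0.2, 0.05, 0.0, 0.1])   # no signal
```

Small gamma values favor dense weak signals, large gamma values favor sparse strong ones; the min-p step is what keeps power high across both regimes.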
DOE Office of Scientific and Technical Information (OSTI.GOV)
Due to the increase in the use of Coordinate Measuring Machines (CMMs) to measure fine details and complex geometries in manufacturing, many programs have been made to compile and analyze the data. These programs typically require extensive setup to determine the expected results in order to not only track the pass/fail of a dimension, but also to use statistical process control (SPC). These extra steps and setup times have been addressed through the CMM Data Analysis Tool, which only requires the output of the CMM to provide both pass/fail analysis for all parts run against the same inspection program and graphs that help visualize where each part measures within the allowed tolerances. This provides feedback not only to the customer for approval of a part during development, but also to machining process engineers to identify when any dimension is drifting towards an out-of-tolerance condition during production. This program can handle hundreds of parts with complex dimensions and will provide an analysis within minutes.
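The core of such a tool is small: a tolerance check per dimension plus the standard SPC capability indices. A minimal sketch, with hypothetical measurements and a symmetric tolerance band assumed (the actual tool's report format is not described in the record):

```python
def inspect_dimension(measurements, nominal, tol):
    """Pass/fail against a symmetric tolerance band plus the basic SPC
    capability statistics (Cp, Cpk) computed from a run of parts."""
    lo, hi = nominal - tol, nominal + tol
    n = len(measurements)
    mean = sum(measurements) / n
    var = sum((m - mean) ** 2 for m in measurements) / (n - 1)  # sample variance
    sd = var ** 0.5
    return {
        "all_pass": all(lo <= m <= hi for m in measurements),
        "mean": mean,
        "cp": (hi - lo) / (6 * sd),                    # potential capability
        "cpk": min(hi - mean, mean - lo) / (3 * sd),   # penalizes off-centre drift
    }

# Hypothetical CMM output for one dimension across five parts:
report = inspect_dimension([9.98, 10.01, 10.00, 9.99, 10.02],
                           nominal=10.00, tol=0.05)
```

A falling Cpk across successive runs is exactly the "drifting towards out-of-tolerance" signal the abstract describes.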
Wu, Johnny C; Gardner, David P; Ozer, Stuart; Gutell, Robin R; Ren, Pengyu
2009-08-28
The accurate prediction of the secondary and tertiary structure of an RNA with different folding algorithms is dependent on several factors, including the energy functions. However, an RNA higher-order structure cannot be predicted accurately from its sequence based on a limited set of energy parameters. The inter- and intramolecular forces between this RNA and other small molecules and macromolecules, in addition to other factors in the cell such as pH, ionic strength, and temperature, influence the complex dynamics associated with transition of a single-stranded RNA to its secondary and tertiary structure. Since all of the factors that affect the formation of an RNA's 3D structure cannot be determined experimentally, statistically derived potential energies have been used in the prediction of protein structure. In the current work, we evaluate the statistical free energy of various secondary structure motifs, including base-pair stacks, hairpin loops, and internal loops, using their statistical frequency obtained from the comparative analysis of more than 50,000 RNA sequences stored in the RNA Comparative Analysis Database (rCAD) at the Comparative RNA Web (CRW) Site. Statistical energy was computed from the structural statistics for several datasets. While the statistical energy for a base-pair stack correlates with experimentally derived free energy values, suggesting a Boltzmann-like distribution, variation is observed between different molecules and their location on the phylogenetic tree of life. Our statistical energy values calculated for several structural elements were utilized in the Mfold RNA-folding algorithm. The combined statistical energy values for base-pair stacks, hairpins and internal loop flanks result in a significant improvement in the accuracy of secondary structure prediction; the hairpin flanks contribute the most.
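The Boltzmann-like conversion from motif frequency to statistical energy mentioned above takes the standard knowledge-based-potential form, E = -RT ln(f_observed / f_reference). The counts and reference state below are hypothetical; the paper's actual datasets and reference definitions are not reproduced here.

```python
import math

def statistical_energy(motif_count, total, ref_count, ref_total, rt=0.616):
    """Boltzmann-like statistical energy of a structural motif from its
    frequency in an alignment, relative to a reference state. rt is RT
    in kcal/mol near 37 C; all counts here are made-up illustrations."""
    f_obs = motif_count / total
    f_ref = ref_count / ref_total
    return -rt * math.log(f_obs / f_ref)

# A base-pair stack seen five times more often than the reference
# expectation comes out favourable (negative energy):
e_stack = statistical_energy(500, 1000, 100, 1000)
```

Values derived this way can be dropped into a folding algorithm's energy tables, which is essentially how the abstract describes using them within Mfold.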
Ganju, Jitendra; Yu, Xinxin; Ma, Guoguang Julie
2013-01-01
Formal inference in randomized clinical trials is based on controlling the type I error rate associated with a single pre-specified statistic. The deficiency of using just one method of analysis is that it depends on assumptions that may not be met. For robust inference, we propose pre-specifying multiple test statistics and relying on the minimum p-value for testing the null hypothesis of no treatment effect. The null hypothesis associated with the various test statistics is that the treatment groups are indistinguishable. The critical value for hypothesis testing comes from permutation distributions. Rejection of the null hypothesis when the smallest p-value is less than the critical value controls the type I error rate at its designated value. Even if one of the candidate test statistics has low power, the adverse effect on the power of the minimum p-value statistic is small. Its use is illustrated with examples. We conclude that it is better to rely on the minimum p-value rather than a single statistic, particularly when that single statistic is the logrank test, because of the cost and complexity of many survival trials. Copyright © 2013 John Wiley & Sons, Ltd.
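The procedure above can be sketched directly: compute several pre-specified statistics, get each one's permutation p-value, and calibrate the minimum p-value against its own permutation distribution. The two example statistics and the data are illustrative choices, not the paper's examples.

```python
import numpy as np

rng = np.random.default_rng(42)

def min_p_test(x, y, stats, n_perm=2000):
    """Two-sample test combining several pre-specified statistics via the
    minimum p-value, with the critical value from the permutation
    distribution (which is what controls the type I error rate)."""
    pooled = np.concatenate([x, y])
    n = len(x)
    obs = np.array([abs(s(x, y)) for s in stats])
    perm = np.empty((n_perm, len(stats)))
    for b in range(n_perm):
        idx = rng.permutation(len(pooled))
        perm[b] = [abs(s(pooled[idx[:n]], pooled[idx[n:]])) for s in stats]
    p_obs = ((perm >= obs).sum(axis=0) + 1) / (n_perm + 1)
    ranks = perm.argsort(axis=0).argsort(axis=0)
    p_perm = (n_perm - ranks) / n_perm
    min_p = p_obs.min()
    adj_p = (1 + (p_perm.min(axis=1) <= min_p).sum()) / (n_perm + 1)
    return min_p, adj_p

stats = [lambda a, b: a.mean() - b.mean(),
         lambda a, b: np.median(a) - np.median(b)]
x = np.array([5.1, 5.3, 4.9, 5.2, 5.0, 5.4, 5.2, 5.1])   # treatment (hypothetical)
y = np.array([4.0, 4.2, 3.9, 4.1, 4.0, 4.3, 4.1, 3.8])   # control (hypothetical)
min_p, adj_p = min_p_test(x, y, stats)
```

Calibrating the minimum against the permutation distribution (rather than, say, Bonferroni) is what keeps the power cost of adding candidate statistics small.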
NASA Astrophysics Data System (ADS)
Clerc, F.; Njiki-Menga, G.-H.; Witschger, O.
2013-04-01
Most of the measurement strategies suggested at the international level to assess workplace exposure to nanomaterials rely on devices measuring airborne particle concentrations in real time (according to different metrics). Since none of the instruments used to measure aerosols can distinguish a particle of interest from the background aerosol, the statistical analysis of time resolved data requires special attention. So far, very few approaches have been used for statistical analysis in the literature. These range from simple qualitative analysis of graphs to the implementation of more complex statistical models. To date, there is still no consensus on a particular approach, and the search for an appropriate and robust method continues. In this context, this exploratory study investigates a statistical method to analyse time resolved data based on a Bayesian probabilistic approach. To investigate and illustrate the use of this statistical method, particle number concentration data were used from a workplace study that investigated the potential for inhalation exposure during cleanout operations by sandpapering of a reactor producing nanocomposite thin films. In this workplace study, the background issue was addressed through the near-field and far-field approaches, and several size-integrated and time-resolved devices were used. The analysis of the results presented here focuses only on data obtained with two handheld condensation particle counters: one measured at the source of the released particles, while the other measured in parallel in the far field. The Bayesian probabilistic approach allows a probabilistic modelling of the data series, and the observed task is modelled in the form of probability distributions.
The probability distributions issuing from the time resolved data obtained at the source can be compared with those issuing from the time resolved data obtained in the far field, leading to a quantitative estimate of the airborne particles released at the source when the task is performed. Beyond the results obtained, this exploratory study indicates that the analysis requires specific experience in statistics.
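One simple Bayesian treatment in this spirit models each counter's readings as Poisson counts with a conjugate Gamma posterior on the rate, and compares near-field and far-field posteriors by subtraction. The counts, prior, and model choice below are illustrative assumptions; the study's actual model is not specified in the abstract.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical 1-s particle number counts from two handheld CPCs during
# the sanding task: one near-field (at the source), one far-field.
near = np.array([120, 135, 128, 140, 150, 145, 138, 132])
far = np.array([60, 55, 58, 62, 59, 61, 57, 60])

def rate_posterior(counts, n_draws=20000):
    """Conjugate Gamma posterior draws for a Poisson count rate,
    with a Jeffreys-type Gamma(0.5, 0) prior."""
    return rng.gamma(0.5 + counts.sum(), 1.0 / len(counts), size=n_draws)

lam_near = rate_posterior(near)
lam_far = rate_posterior(far)
released = lam_near - lam_far          # posterior of the source contribution
p_release = (released > 0).mean()      # posterior probability the task emits
```

The posterior of `released` is the "quantitative estimation of the airborne particles released at the source" in distributional form, rather than a single point estimate.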
Microarray R-based analysis of complex lysate experiments with MIRACLE
List, Markus; Block, Ines; Pedersen, Marlene Lemvig; Christiansen, Helle; Schmidt, Steffen; Thomassen, Mads; Tan, Qihua; Baumbach, Jan; Mollenhauer, Jan
2014-01-01
Motivation: Reverse-phase protein arrays (RPPAs) allow sensitive quantification of relative protein abundance in thousands of samples in parallel. Typical challenges involved in this technology are antibody selection, sample preparation and optimization of staining conditions. The issue of combining effective sample management and data analysis, however, has been widely neglected. Results: This motivated us to develop MIRACLE, a comprehensive and user-friendly web application bridging the gap between spotting and array analysis by conveniently keeping track of sample information. Data processing includes correction of staining bias, estimation of protein concentration from response curves, normalization for total protein amount per sample and statistical evaluation. Established analysis methods have been integrated with MIRACLE, offering experimental scientists an end-to-end solution for sample management and for carrying out data analysis. In addition, experienced users have the possibility to export data to R for more complex analyses. MIRACLE thus has the potential to further spread utilization of RPPAs as an emerging technology for high-throughput protein analysis. Availability: Project URL: http://www.nanocan.org/miracle/ Contact: mlist@health.sdu.dk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25161257
The applications of Complexity Theory and Tsallis Non-extensive Statistics at Solar Plasma Dynamics
NASA Astrophysics Data System (ADS)
Pavlos, George
2015-04-01
As the solar plasma lives far from equilibrium, it is an excellent laboratory for testing complexity theory and non-equilibrium statistical mechanics. In this study, we present the highlights of complexity theory and Tsallis non-extensive statistical mechanics as concerns their applications to solar plasma dynamics, especially to sunspot, solar flare and solar wind phenomena. Generally, when a physical system is driven far from equilibrium states, some novel characteristics can be observed related to the nonlinear character of the dynamics. In particular, the nonlinearity in space plasma dynamics can generate intermittent turbulence with the typical characteristics of the anomalous diffusion process and strange topologies of stochastic space plasma fields (velocity and magnetic fields) caused by the strange dynamics and strange kinetics (Zaslavsky, 2002). In addition, according to Zelenyi and Milovanov (2004), the complex character of the space plasma system includes the existence of non-equilibrium (quasi)-stationary states (NESS) having the topology of a percolating fractal set. The stabilization of a system near the NESS is perceived as a transition into a turbulent state determined by self-organization processes. The long-range correlation effects manifest themselves as a strange non-Gaussian behavior of kinetic processes near the NESS plasma state. The complex character of space plasma can also be described by the non-extensive statistical thermodynamics pioneered by Tsallis, which offers a consistent and effective theoretical framework, based on a generalization of Boltzmann-Gibbs (BG) entropy, to describe far from equilibrium nonlinear complex dynamics (Tsallis, 2009). In a series of recent papers, the hypothesis of Tsallis non-extensive statistics in magnetosphere, sunspot dynamics, solar flares, solar wind and space plasma in general was tested and verified (Karakatsanis et al., 2013; Pavlos et al., 2014; 2015).
Our study includes the analysis of solar plasma time series in three cases: sunspot index, solar flare and solar wind data. The non-linear analysis of the sunspot index is embedded in the non-extensive statistical theory of Tsallis (1988; 2004; 2009). The q-triplet of Tsallis, as well as the correlation dimension and the Lyapunov exponent spectrum, were estimated for the SVD components of the sunspot index time series. Also, the multifractal scaling exponent spectrum f(a), the generalized Renyi dimension spectrum D(q) and the spectrum J(p) of the structure function exponents were estimated experimentally and theoretically by using the q-entropy principle included in Tsallis non-extensive statistical theory, following Arimitsu and Arimitsu (2000, 2001). Our analysis showed clearly the following: (a) a phase transition process in the solar dynamics from a high dimensional non-Gaussian SOC state to a low dimensional non-Gaussian chaotic state, (b) strong intermittent solar turbulence and an anomalous (multifractal) diffusion solar process, which is strengthened as the solar dynamics makes a phase transition to low dimensional chaos, in accordance with the studies of Ruzmaikin, Zelenyi and Milovanov (Zelenyi and Milovanov, 1991; Milovanov and Zelenyi, 1993; Ruzmaikin et al., 1996), (c) faithful agreement of Tsallis non-equilibrium statistical theory with the experimental estimations of: (i) the non-Gaussian probability distribution function P(x), (ii) the multifractal scaling exponent spectrum f(a) and generalized Renyi dimension spectrum Dq, (iii) the exponent spectrum J(p) of the structure functions estimated for the sunspot index and its underlying non-equilibrium solar dynamics. Also, the q-triplet of Tsallis as well as the correlation dimension and the Lyapunov exponent spectrum were estimated for the singular value decomposition (SVD) components of the solar flares time series.
Also, the multifractal scaling exponent spectrum f(a), the generalized Renyi dimension spectrum D(q) and the spectrum J(p) of the structure function exponents were estimated experimentally and theoretically by using the q-entropy principle included in Tsallis non-extensive statistical theory, following Arimitsu and Arimitsu (2000). Our analysis showed clearly the following: (a) a phase transition process in the solar flare dynamics from a high dimensional non-Gaussian self-organized critical (SOC) state to a low dimensional, also non-Gaussian, chaotic state, (b) strong intermittent solar corona turbulence and an anomalous (multifractal) diffusion solar corona process, which is strengthened as the solar corona dynamics makes a phase transition to low dimensional chaos, (c) faithful agreement of Tsallis non-equilibrium statistical theory with the experimental estimations of the functions: (i) the non-Gaussian probability distribution function P(x), (ii) f(a) and D(q), and (iii) J(p) for the solar flares time series and its underlying non-equilibrium solar dynamics, and (d) the solar flare dynamical profile is revealed to be similar to the dynamical profile of the solar corona zone as far as the phase transition process from self-organized criticality (SOC) to a chaotic state is concerned. However, the solar low corona (solar flare) dynamical characteristics can be clearly discriminated from the dynamical characteristics of the solar convection zone. At last, we present novel results revealing non-equilibrium phase transition processes in the solar wind plasma during a strong shock event, which can take place in the solar wind plasma system. The solar wind plasma, as well as the entire solar plasma system, is a typical case of stochastic spatiotemporal distribution of physical state variables such as force fields ( ) and matter fields (particle and current densities or bulk plasma distributions).
This study shows clearly the non-extensive and non-Gaussian character of the solar wind plasma and the existence of multi-scale strong correlations from the microscopic to the macroscopic level. It also underlines the inefficiency of classical magneto-hydro-dynamic (MHD) or plasma statistical theories, based on the classical central limit theorem (CLT), to explain the complexity of the solar wind dynamics, since these theories include smooth and differentiable spatial-temporal functions (MHD theory) or Gaussian statistics (Boltzmann-Maxwell statistical mechanics). On the contrary, the results of this study indicate the presence of non-Gaussian non-extensive statistics with heavy-tailed probability distribution functions, which are related to the q-extension of the CLT. Finally, the results of this study can be understood in the framework of modern theoretical concepts such as non-extensive statistical mechanics (Tsallis, 2009), fractal topology (Zelenyi and Milovanov, 2004), turbulence theory (Frisch, 1996), strange dynamics (Zaslavsky, 2002), percolation theory (Milovanov, 1997), anomalous diffusion theory and anomalous transport theory (Milovanov, 2001), fractional dynamics (Tarasov, 2013) and non-equilibrium phase transition theory (Chang, 1992). References 1. T. Arimitsu, N. Arimitsu, Tsallis statistics and fully developed turbulence, J. Phys. A: Math. Gen. 33 (2000) L235. 2. T. Arimitsu, N. Arimitsu, Analysis of turbulence by statistics based on generalized entropies, Physica A 295 (2001) 177-194. 3. T. Chang, Low-dimensional behavior and symmetry breaking of stochastic systems near criticality: can these effects be observed in space and in the laboratory?, IEEE Trans. Plasma Sci. 20 (6) (1992) 691-694. 4. U. Frisch, Turbulence, Cambridge University Press, Cambridge, UK, 1996, p. 310. 5. L.P. Karakatsanis, G.P. Pavlos, M.N. Xenakis, Tsallis non-extensive statistics, intermittent turbulence, SOC and chaos in the solar plasma.
Part two: Solar flares dynamics, Physica A 392 (2013) 3920-3944. 6. A.V. Milovanov, Topological proof for the Alexander-Orbach conjecture, Phys. Rev. E 56 (3) (1997) 2437-2446. 7. A.V. Milovanov, L.M. Zelenyi, Fracton excitations as a driving mechanism for the self-organized dynamical structuring in the solar wind, Astrophys. Space Sci. 264 (1-4) (1999) 317-345. 8. A.V. Milovanov, Stochastic dynamics from the fractional Fokker-Planck-Kolmogorov equation: large-scale behavior of the turbulent transport coefficient, Phys. Rev. E 63 (2001) 047301. 9. G.P. Pavlos, et al., Universality of non-extensive Tsallis statistics and time series analysis: Theory and applications, Physica A 395 (2014) 58-95. 10. G.P. Pavlos, et al., Tsallis non-extensive statistics and solar wind plasma complexity, Physica A 422 (2015) 113-135. 11. A.A. Ruzmaikin, et al., Spectral properties of solar convection and diffusion, ApJ 471 (1996) 1022. 12. V.E. Tarasov, Review of some promising fractional physical models, Internat. J. Modern Phys. B 27 (9) (2013) 1330005. 13. C. Tsallis, Possible generalization of Boltzmann-Gibbs statistics, J. Stat. Phys. 52 (1-2) (1988) 479-487. 14. C. Tsallis, Nonextensive statistical mechanics: construction and physical interpretation, in: M. Gell-Mann, C. Tsallis (Eds.), Nonextensive Entropy - Interdisciplinary Applications, Oxford Univ. Press, 2004, pp. 1-53. 15. C. Tsallis, Introduction to Non-Extensive Statistical Mechanics, Springer, 2009. 16. G.M. Zaslavsky, Chaos, fractional kinetics, and anomalous transport, Physics Reports 371 (2002) 461-580. 17. L.M. Zelenyi, A.V. Milovanov, Fractal properties of sunspots, Sov. Astron. Lett. 17 (6) (1991) 425. 18. L.M. Zelenyi, A.V. Milovanov, Fractal topology and strange kinetics: from percolation theory to problems in cosmic electrodynamics, Phys.-Usp. 47 (8) (2004) 749-788.
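The generalized Boltzmann-Gibbs entropy underlying the Tsallis framework discussed throughout this abstract can be stated and checked in a few lines. The block below evaluates the Tsallis entropy S_q and verifies its defining non-extensivity property for independent subsystems; the distributions are toy examples.

```python
import numpy as np

def tsallis_entropy(p, q):
    """Tsallis entropy S_q = (1 - sum p_i^q)/(q - 1), in units of k_B = 1;
    the q -> 1 limit recovers the Boltzmann-Gibbs-Shannon entropy."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if abs(q - 1.0) < 1e-12:
        return float(-(p * np.log(p)).sum())
    return float((1.0 - (p ** q).sum()) / (q - 1.0))

# Non-extensivity for independent subsystems A and B:
#   S_q(A+B) = S_q(A) + S_q(B) + (1 - q) * S_q(A) * S_q(B)
pa = np.full(2, 0.5)
pb = np.full(4, 0.25)
joint = np.outer(pa, pb).ravel()
q = 2.0
lhs = tsallis_entropy(joint, q)
rhs = (tsallis_entropy(pa, q) + tsallis_entropy(pb, q)
       + (1 - q) * tsallis_entropy(pa, q) * tsallis_entropy(pb, q))
```

The deviation of the cross term from zero, controlled by (1 - q), is what lets the formalism accommodate the long-range correlations the abstract attributes to solar plasma.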
Cormier, Nathan; Kolisnik, Tyler; Bieda, Mark
2016-07-05
There has been an enormous expansion of use of chromatin immunoprecipitation followed by sequencing (ChIP-seq) technologies. Analysis of large-scale ChIP-seq datasets involves a complex series of steps and production of several specialized graphical outputs. A number of systems have emphasized custom development of ChIP-seq pipelines. These systems are primarily based on custom programming of a single, complex pipeline or supply libraries of modules and do not produce the full range of outputs commonly produced for ChIP-seq datasets. It is desirable to have more comprehensive pipelines, in particular ones addressing common metadata tasks, such as pathway analysis, and pipelines producing standard complex graphical outputs. It is advantageous if these are highly modular systems, available as both turnkey pipelines and individual modules, that are easily comprehensible, modifiable and extensible to allow rapid alteration in response to new analysis developments in this growing area. Furthermore, it is advantageous if these pipelines allow data provenance tracking. We present a set of 20 ChIP-seq analysis software modules implemented in the Kepler workflow system; most (18/20) were also implemented as standalone, fully functional R scripts. The set consists of four full turnkey pipelines and 16 component modules. The turnkey pipelines in Kepler allow data provenance tracking. Implementation emphasized use of common R packages and widely-used external tools (e.g., MACS for peak finding), along with custom programming. This software presents comprehensive solutions and easily repurposed code blocks for ChIP-seq analysis and pipeline creation. Tasks include mapping raw reads, peak finding via MACS, summary statistics, peak location statistics, summary plots centered on the transcription start site (TSS), gene ontology, pathway analysis, and de novo motif finding, among others.
These pipelines range from those performing a single task to those performing full analyses of ChIP-seq data. The pipelines are supplied as both Kepler workflows, which allow data provenance tracking, and, in the majority of cases, as standalone R scripts. These pipelines are designed for ease of modification and repurposing.
Experimental Analysis and Measurement of Situation Awareness
1995-11-01
the participant is interacting that can be characterized uniquely by a set of information, knowledge and response options. However, the concept of a...should receive attention is when the interruption or the surprise creates a statistical interaction between two or more of the other variables of...Awareness in Complex Systems. Daytona Beach, Fl: Embry-Riddle Aeronautical University Press. Sarter, N.B., and Woods, D.D. (1994). Pilot interaction
Development of a Comprehensive Digital Avionics Curriculum for the Aeronautical Engineer
2006-03-01
able to analyze and design aircraft and missile guidance and control systems, including feedback stabilization schemes and stochastic processes, using ...Uncertainty modeling for robust control; Robust closed-loop stability and performance; Robust H- infinity control; Robustness check using mu-analysis...Controlled feedback (reduces noise) 3. Statistical group response (reduce pressure toward conformity) When used as a tool to study a complex problem
Multispecies, Integrative GWAS for Focal Segmental Glomerulosclerosis
2017-09-01
is a frequent cause of end-stage renal disease (ESRD). We investigated the genetic basis of FSGS and recruited a heterogeneous population of...understanding the complex genetic mechanisms of FSGS...disease (MCD). Using a variety of statistical and genetic approaches, including genome wide association analysis and rare copy number variations (CNVs
NASA Astrophysics Data System (ADS)
Yan, Ying; Zhang, Shen; Tang, Jinjun; Wang, Xiaofei
2017-07-01
Discovering dynamic characteristics in traffic flow is a significant step in designing effective traffic management and control strategies for relieving congestion in urban cities. A new method based on complex network theory is proposed to study multivariate traffic flow time series. The data were collected from loop detectors on a freeway over one year. To construct a complex network from the original traffic flow, a weighted Frobenius norm is adopted to estimate similarity between multivariate time series, and Principal Component Analysis is used to determine the weights. We discuss how to select the optimal critical threshold for networks at different hours in terms of the cumulative probability distribution of degree. Furthermore, two statistical properties of the networks, normalized network structure entropy and cumulative probability of degree, are utilized to explore hourly variation in traffic flow. The results demonstrate that these two statistical quantities follow patterns similar to those of traffic flow parameters, with morning and evening peak hours. Accordingly, we detect three traffic states, trough, peak and transitional hours, according to the correlation between the two aforementioned properties. The classification of states can represent hourly fluctuation in traffic flow, as shown by analyzing annual average hourly values of traffic volume, occupancy and speed in the corresponding hours.
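The construction described above can be sketched compactly. In the hypothetical Python sketch below, each node is one multivariate traffic window, similarity is a weighted Frobenius distance (with fixed weights standing in for the PCA-derived ones), edges connect windows whose distance falls below a critical threshold, and the normalized network structure entropy is computed from the degree distribution, which is one common definition. All names and parameter values are illustrative.

```python
import numpy as np

def weighted_frobenius(X, Y, w):
    # X, Y: (T, m) multivariate windows; w: per-variable weights
    # (standing in for the PCA-derived weights of the paper)
    return np.sqrt(np.sum(w * (X - Y) ** 2))

def build_network(windows, w, threshold):
    # connect two windows when their weighted distance is below the threshold
    n = len(windows)
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            if weighted_frobenius(windows[i], windows[j], w) < threshold:
                A[i, j] = A[j, i] = 1
    return A

def normalized_structure_entropy(A):
    # Shannon entropy of the degree-based node "importance" distribution,
    # normalized by ln N (one common definition of structure entropy)
    k = A.sum(axis=1).astype(float)
    p = k / k.sum()
    p = p[p > 0]
    return -(p * np.log(p)).sum() / np.log(len(A))

rng = np.random.default_rng(0)
windows = [rng.normal(size=(12, 3)) for _ in range(6)]  # 6 hourly windows
w = np.array([0.5, 0.3, 0.2])                           # illustrative weights
A = build_network(windows, w, threshold=6.0)
```

For a homogeneous network (all degrees equal) the normalized entropy is 1; more heterogeneous degree distributions push it below 1, which is what makes it usable as an hourly traffic-state indicator.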
Lungu, Claudiu N; Diudea, Mircea V
2018-01-01
Lipid II, a peptidoglycan precursor in bacterial cell wall synthesis, has both hydrophilic and lipophilic properties. The molecule translocates across the bacterial membrane to deliver and incorporate disaccharide-pentapeptide "building blocks" into the peptidoglycan wall. Lipid II is a validated antibiotic target. A receptor binding pocket may be occupied by a ligand in various plausible conformations, among which only a few are energetically related to a biological activity in the physiological efficiency domain. This paper reports the mapping of the conformational space of Lipid II in its interaction with Teixobactin and other Lipid II ligands. In order to study the complexes between Lipid II and its ligands computationally, a docking study was first carried out; the docking site was retrieved from the literature. After docking, 5 ligand conformations and a further 5 complexes (denoted 00 to 04) for each molecule were taken into account. For each structure, conformational studies were performed. Statistical analysis, conformational analysis and molecular dynamics based clustering were used to predict the potency of these compounds, and a score for potency prediction was developed. Applying a classification according to Lipid II conformational energy, a conformation of Teixobactin proved to be energetically favorable, followed by Oritavancin, Dalbavancin, Telavancin, Teicoplanin and Vancomycin, respectively. Scoring of molecules according to cluster band and PCA produced the same result. Molecules classified according to standard deviations showed Dalbavancin as the most favorable conformation, followed by Teicoplanin, Telavancin, Teixobactin, Oritavancin and Vancomycin, respectively. The total score, reflecting the energetic efficiency of complex formation, shows Teixobactin to have the best conformation (a score of 15 points), followed by Dalbavancin (14 points), Oritavancin (12 points), Telavancin (10 points), Teicoplanin (9 points) and Vancomycin (3 points).
Statistical analysis of conformations can be used to predict the efficiency of ligand-target interaction and, in turn, to gain insight into ligand potency and to postulate favorable conformations of the ligand and binding site. In this study it was shown that Teixobactin binds Lipid II more efficiently than Vancomycin, a result confirmed by experimental data reported in the literature. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Complex network theory for the identification and assessment of candidate protein targets.
McGarry, Ken; McDonald, Sharon
2018-06-01
In this work we use complex network theory to provide a statistical model of the connectivity patterns of human proteins and their interaction partners. Our intention is to identify important proteins that may be predisposed to be potential candidates as drug targets for therapeutic interventions. Target proteins usually have more interaction partners than non-target proteins, but there are no hard-and-fast rules for defining the actual number of interactions. We devise a statistical measure for identifying hub proteins and score our target proteins with gene ontology annotations. The important druggable protein targets are likely to have similar biological functions that can be assessed for their potential therapeutic value. Our system provides a statistical analysis of the local and distant neighborhood protein interactions of the potential targets using complex network measures. This approach builds a more accurate model of drug-to-target activity and therefore of the likely impact on treating diseases. We integrate high-quality protein interaction data from the HINT database and disease-associated proteins from the DrugTarget database. Other sources include biological knowledge from Gene Ontology and drug information from DrugBank. The problem is a very challenging one, since the data are highly imbalanced between target proteins and the more numerous non-targets. We use undersampling on the training data and build Random Forest classifier models, which are used to identify previously unclassified target proteins. We validate and corroborate these findings from the available literature. Copyright © 2018 Elsevier Ltd. All rights reserved.
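The abstract does not spell out the hub criterion, so the sketch below assumes one plausible statistical cutoff: a protein is flagged as a hub when its degree exceeds the mean degree by two standard deviations. All protein names are toy data.

```python
from collections import defaultdict
from statistics import mean, stdev

# Sketch of a degree-based hub criterion (an assumption: the abstract does
# not give the authors' exact measure; a common choice flags proteins whose
# degree exceeds the mean by z standard deviations).

def degrees(edges):
    deg = defaultdict(int)
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

def hub_proteins(edges, z=2.0):
    deg = degrees(edges)
    ks = list(deg.values())
    mu, sd = mean(ks), stdev(ks)
    return {p for p, k in deg.items() if k > mu + z * sd}

# Toy interaction list: P1 interacts with many partners, others sparsely.
edges = [("P1", f"Q{i}") for i in range(10)] + [("Q1", "Q2"), ("Q3", "Q4")]
hubs = hub_proteins(edges)
```

A statistical cutoff like this sidesteps the "no hard-and-fast number of interactions" problem: the threshold adapts to the degree distribution of the network at hand.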
An example of complex modelling in dentistry using Markov chain Monte Carlo (MCMC) simulation.
Helfenstein, Ulrich; Menghini, Giorgio; Steiner, Marcel; Murati, Francesca
2002-09-01
In the usual regression setting one regression line is computed for a whole data set. In a more complex situation, each person may be observed for example at several points in time and thus a regression line might be calculated for each person. Additional complexities, such as various forms of errors in covariables may make a straightforward statistical evaluation difficult or even impossible. During recent years methods have been developed allowing convenient analysis of problems where the data and the corresponding models show these and many other forms of complexity. The methodology makes use of a Bayesian approach and Markov chain Monte Carlo (MCMC) simulations. The methods allow the construction of increasingly elaborate models by building them up from local sub-models. The essential structure of the models can be represented visually by directed acyclic graphs (DAG). This attractive property allows communication and discussion of the essential structure and the substantial meaning of a complex model without needing algebra. After presentation of the statistical methods an example from dentistry is presented in order to demonstrate their application and use. The dataset of the example had a complex structure; each of a set of children was followed up over several years. The number of new fillings in permanent teeth had been recorded at several ages. The dependent variables were markedly different from the normal distribution and could not be transformed to normality. In addition, explanatory variables were assumed to be measured with different forms of error. Illustration of how the corresponding models can be estimated conveniently via MCMC simulation, in particular, 'Gibbs sampling', using the freely available software BUGS is presented. In addition, how the measurement error may influence the estimates of the corresponding coefficients is explored. 
It is demonstrated that the effect of the independent variable on the dependent variable may be markedly underestimated if the measurement error is not taken into account ('regression dilution bias'). Markov chain Monte Carlo methods may be of great value to dentists in allowing analysis of data sets which exhibit a wide range of different forms of complexity.
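The 'regression dilution bias' mentioned above is easy to demonstrate by simulation. The sketch below (illustrative, not the paper's model) regresses an outcome on a covariate observed with error: the ordinary least-squares slope is attenuated by the reliability ratio var(x) / (var(x) + var(error)), and can be corrected when the error variance is known.

```python
import numpy as np

# Sketch illustrating regression dilution bias: when the covariate is
# measured with error, the OLS slope is attenuated toward zero by the
# reliability ratio var(x) / (var(x) + var(error)).

rng = np.random.default_rng(42)
n, true_slope = 100_000, 2.0
x = rng.normal(0.0, 1.0, n)          # true covariate, variance 1
y = 1.0 + true_slope * x + rng.normal(0.0, 0.5, n)
x_obs = x + rng.normal(0.0, 1.0, n)  # observed with error variance 1

def ols_slope(x, y):
    x_c, y_c = x - x.mean(), y - y.mean()
    return (x_c @ y_c) / (x_c @ x_c)

naive = ols_slope(x_obs, y)          # attenuated: ~ true_slope * 1/(1+1)
corrected = naive / (x.var() / (x.var() + 1.0))  # if error variance is known
```

With equal true-covariate and error variances the reliability ratio is 0.5, so the naive slope lands near half the true value, which is exactly the underestimation the abstract warns about.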
Smith, Joseph M.; Mather, Martha E.
2012-01-01
Ecological indicators are science-based tools used to assess how human activities have impacted environmental resources. For monitoring and environmental assessment, existing species assemblage data can be used to make these comparisons through time or across sites. An impediment to using assemblage data, however, is that these data are complex and need to be simplified in an ecologically meaningful way. Because multivariate statistics are mathematical relationships, statistical groupings may not make ecological sense and will not have utility as indicators. Our goal was to define a process to select defensible and ecologically interpretable statistical simplifications of assemblage data in which researchers and managers can have confidence. For this, we chose a suite of statistical methods, compared the groupings that resulted from these analyses, identified convergence among groupings, then we interpreted the groupings using species and ecological guilds. When we tested this approach using a statewide stream fish dataset, not all statistical methods worked equally well. For our dataset, logistic regression (Log), detrended correspondence analysis (DCA), cluster analysis (CL), and non-metric multidimensional scaling (NMDS) provided consistent, simplified output. Specifically, the Log, DCA, CL-1, and NMDS-1 groupings were ≥60% similar to each other, overlapped with the fluvial-specialist ecological guild, and contained a common subset of species. Groupings based on number of species (e.g., Log, DCA, CL and NMDS) outperformed groupings based on abundance [e.g., principal components analysis (PCA) and Poisson regression]. Although the specific methods that worked on our test dataset have generality, here we are advocating a process (e.g., identifying convergent groupings with redundant species composition that are ecologically interpretable) rather than the automatic use of any single statistical tool. 
We summarize this process in step-by-step guidance for the future use of these commonly available ecological and statistical methods in preparing assemblage data for use in ecological indicators.
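The convergence step above (groupings at least 60% similar to each other) can be sketched as a pairwise Jaccard comparison of the species sets produced by the different methods. Method labels follow the abstract; the species groups themselves are hypothetical.

```python
# Sketch of the convergence check: pairwise Jaccard similarity between the
# species groupings produced by different statistical methods, flagging
# pairs that are at least 60% similar.

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def convergent_pairs(groupings, cutoff=0.6):
    names = sorted(groupings)
    return [
        (m1, m2, jaccard(groupings[m1], groupings[m2]))
        for i, m1 in enumerate(names)
        for m2 in names[i + 1:]
        if jaccard(groupings[m1], groupings[m2]) >= cutoff
    ]

# Hypothetical species groups from four methods (illustrative names only).
groupings = {
    "Log":  {"creek chub", "blacknose dace", "white sucker", "longnose dace"},
    "DCA":  {"creek chub", "blacknose dace", "white sucker", "fantail darter"},
    "CL-1": {"creek chub", "blacknose dace", "white sucker", "longnose dace"},
    "PCA":  {"common shiner", "bluegill"},
}
pairs = convergent_pairs(groupings)
```

Here the Log, DCA and CL-1 groups converge on a common species subset while the PCA-based group does not, mirroring the paper's finding that presence-based methods agreed while abundance-based ones diverged.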
Association Analysis in Rice: From Application to Utilization
Zhang, Peng; Zhong, Kaizhen; Shahid, Muhammad Qasim; Tong, Hanhua
2016-01-01
Association analysis based on linkage disequilibrium (LD) is an efficient way to dissect complex traits and to identify gene functions in rice. Although association analysis is an effective way to construct fine maps for quantitative traits, there are a few issues which need to be addressed. In this review, we will first summarize type, structure, and LD level of populations used for association analysis of rice, and then discuss the genotyping methods and statistical approaches used for association analysis in rice. Moreover, we will review current shortcomings and benefits of association analysis as well as specific types of future research to overcome these shortcomings. Furthermore, we will analyze the reasons for the underutilization of the results within association analysis in rice breeding. PMID:27582745
Refined generalized multiscale entropy analysis for physiological signals
NASA Astrophysics Data System (ADS)
Liu, Yunxiao; Lin, Youfang; Wang, Jing; Shang, Pengjian
2018-01-01
Multiscale entropy analysis has become a prevalent complexity measure and has been successfully applied in various fields. However, it only takes into account the mean values (first moment) in the coarse-graining procedure. Generalized multiscale entropy (MSEn), which considers higher moments when coarse-graining a time series, was subsequently proposed, and MSEσ2 has been implemented. However, MSEσ2 may sometimes yield an imprecise or undefined estimate of entropy, and the statistical reliability of the sample entropy estimate degrades as the scale factor increases. For this reason, we developed a refined model, RMSEσ2, to improve MSEσ2. Simulations on both white noise and 1/f noise show that RMSEσ2 provides higher entropy reliability and reduces the occurrence of undefined entropy, making it especially suitable for short time series. We also discuss the effect on RMSEσ2 analysis of outliers, data loss and other issues in signal processing. We apply the proposed model to evaluate the complexity of heartbeat interval time series derived from healthy young and elderly subjects, patients with congestive heart failure and patients with atrial fibrillation, and compare it to several popular complexity metrics. The results demonstrate that RMSEσ2-measured complexity (a) decreases with aging and disease, and (b) gives significant discrimination between different physiological/pathological states, which may facilitate clinical application.
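The coarse-graining idea behind MSEσ2 can be sketched directly: replace each non-overlapping window by its variance rather than its mean, then apply sample entropy at each scale. The sketch below assumes the standard SampEn(m, r) definition (up to edge conventions) and does not reproduce the refinements of RMSEσ2.

```python
import numpy as np

def coarse_grain_variance(x, scale):
    # MSE-sigma^2-style coarse-graining: replace each non-overlapping window
    # of length `scale` by its variance (second moment) instead of its mean.
    n = len(x) // scale
    return x[: n * scale].reshape(n, scale).var(axis=1)

def sample_entropy(x, m=2, r=None):
    # Standard SampEn(m, r), up to edge conventions: negative log of the
    # conditional probability that sequences matching for m points
    # (Chebyshev distance <= r) also match for m + 1 points.
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.15 * x.std()
    def count(mm):
        templ = np.array([x[i : i + mm] for i in range(len(x) - mm)])
        c = 0
        for i in range(len(templ)):
            d = np.max(np.abs(templ[i + 1 :] - templ[i]), axis=1)
            c += int((d <= r).sum())
        return c
    a, b = count(m + 1), count(m)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

rng = np.random.default_rng(1)
noise = rng.normal(size=2000)
mse_sigma2 = [sample_entropy(coarse_grain_variance(noise, s)) for s in (2, 5)]
```

The undefined-entropy case the abstract mentions corresponds to the `a == 0` branch: at large scales the coarse-grained series becomes short, template matches vanish, and the estimate degenerates, which is what the refined RMSEσ2 is designed to mitigate.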
NASA Astrophysics Data System (ADS)
Nielsen, Allan A.; Conradsen, Knut; Skriver, Henning
2016-10-01
Test statistics for comparison of real (as opposed to complex) variance-covariance matrices exist in the statistics literature [1]. In earlier publications we have described a test statistic for the equality of two variance-covariance matrices following the complex Wishart distribution, with an associated p-value [2]. We showed their application to bitemporal change detection and to edge detection [3] in multilook, polarimetric synthetic aperture radar (SAR) data in the covariance matrix representation [4]. The test statistic and the associated p-value are also described in [5]. In [6] we focussed on the block-diagonal case, elaborated on some computer implementation issues, and gave examples of the application to change detection in both full and dual polarization bitemporal, bifrequency, multilook SAR data. In [7] we described an omnibus test statistic Q for the equality of k variance-covariance matrices following the complex Wishart distribution. We also described a factorization Q = R2 R3 … Rk, where Q and the Rj determine if and when a difference occurs, and gave p-values for Q and the Rj. Finally, we demonstrated the use of Q, the Rj, and the p-values for change detection in truly multitemporal, full polarization SAR data. Here we illustrate the methods by means of airborne L-band SAR data (EMISAR) [8,9]. The methods may also be applied to other polarimetric SAR data, such as data from Sentinel-1, COSMO-SkyMed, TerraSAR-X, ALOS, and RadarSat-2, and to single-pol data. The account given here closely follows that given in our recent IEEE TGRS paper [7]. Selected References [1] Anderson, T. W., An Introduction to Multivariate Statistical Analysis, John Wiley, New York, third ed. (2003). [2] Conradsen, K., Nielsen, A. A., Schou, J., and Skriver, H., "A test statistic in the complex Wishart distribution and its application to change detection in polarimetric SAR data," IEEE Transactions on Geoscience and Remote Sensing 41(1): 4-19, 2003.
[3] Schou, J., Skriver, H., Nielsen, A. A., and Conradsen, K., "CFAR edge detector for polarimetric SAR images," IEEE Transactions on Geoscience and Remote Sensing 41(1): 20-32, 2003. [4] van Zyl, J. J. and Ulaby, F. T., "Scattering matrix representation for simple targets," in Radar Polarimetry for Geoscience Applications, Ulaby, F. T. and Elachi, C., eds., Artech, Norwood, MA (1990). [5] Canty, M. J., Image Analysis, Classification and Change Detection in Remote Sensing, with Algorithms for ENVI/IDL and Python, Taylor & Francis, CRC Press, third revised ed. (2014). [6] Nielsen, A. A., Conradsen, K., and Skriver, H., "Change detection in full and dual polarization, single- and multi-frequency SAR data," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 8(8): 4041-4048, 2015. [7] Conradsen, K., Nielsen, A. A., and Skriver, H., "Determining the points of change in time series of polarimetric SAR data," IEEE Transactions on Geoscience and Remote Sensing 54(5), 3007-3024, 2016. [9] Christensen, E. L., Skou, N., Dall, J., Woelders, K., Jørgensen, J. H., Granholm, J., and Madsen, S. N., "EMISAR: An absolutely calibrated polarimetric L- and C-band SAR," IEEE Transactions on Geoscience and Remote Sensing 36: 1852-1865 (1998).
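The bitemporal test statistic of [2] is compact enough to sketch. Assuming two p x p complex covariance matrices X and Y, each formed from the same number of looks n, ln Q = n(2p ln 2 + ln|X| + ln|Y| - 2 ln|X + Y|), and -2 ln Q is asymptotically chi-squared with p^2 degrees of freedom. The sketch below is written from that description, not from the authors' code, so treat it as a sketch.

```python
import numpy as np

# Sketch of the bitemporal test statistic of [2] for the equality of two
# complex Wishart matrices X and Y (p x p polarimetric covariance matrices,
# each formed from n looks, assumed equal for both):
#   ln Q = n (2 p ln 2 + ln|X| + ln|Y| - 2 ln|X + Y|),
# with -2 ln Q asymptotically chi-squared with p^2 degrees of freedom.

def ln_q(X, Y, n):
    p = X.shape[0]
    def ld(M):
        return np.linalg.slogdet(M)[1]  # log-determinant (Hermitian PD)
    return n * (2 * p * np.log(2) + ld(X) + ld(Y) - 2 * ld(X + Y))

p, n = 3, 16
X = np.eye(p, dtype=complex)
same = ln_q(X, X, n)              # identical matrices give the maximum, ln Q = 0

Y = 4.0 * np.eye(p, dtype=complex)
diff = ln_q(X, Y, n)              # ln Q < 0; -2 ln Q compared against chi2(p^2)
```

Equal matrices give ln Q = 0 (no evidence of change); the more the two covariance matrices differ, the more negative ln Q becomes, and the p-value follows from the chi-squared approximation.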
Genser, Bernd; Fischer, Joachim E; Figueiredo, Camila A; Alcântara-Neves, Neuza; Barreto, Mauricio L; Cooper, Philip J; Amorim, Leila D; Saemann, Marcus D; Weichhart, Thomas; Rodrigues, Laura C
2016-05-20
Immunologists often measure several correlated immunological markers, such as concentrations of different cytokines produced by different immune cells and/or measured under different conditions, to draw insights from complex immunological mechanisms. Although there have been recent methodological efforts to improve the statistical analysis of immunological data, a framework is still needed for the simultaneous analysis of multiple, often correlated, immune markers. This framework would allow the immunologists' hypotheses about the underlying biological mechanisms to be integrated. We present an analytical approach for statistical analysis of correlated immune markers, such as those commonly collected in modern immuno-epidemiological studies. We demonstrate i) how to deal with interdependencies among multiple measurements of the same immune marker, ii) how to analyse association patterns among different markers, iii) how to aggregate different measures and/or markers to immunological summary scores, iv) how to model the inter-relationships among these scores, and v) how to use these scores in epidemiological association analyses. We illustrate the application of our approach to multiple cytokine measurements from 818 children enrolled in a large immuno-epidemiological study (SCAALA Salvador), which aimed to quantify the major immunological mechanisms underlying atopic diseases or asthma. We demonstrate how to aggregate systematically the information captured in multiple cytokine measurements to immunological summary scores aimed at reflecting the presumed underlying immunological mechanisms (Th1/Th2 balance and immune regulatory network). We show how these aggregated immune scores can be used as predictors in regression models with outcomes of immunological studies (e.g. specific IgE) and compare the results to those obtained by a traditional multivariate regression approach. 
The proposed analytical approach may be especially useful to quantify complex immune responses in immuno-epidemiological studies, where investigators examine the relationship among epidemiological patterns, immune response, and disease outcomes.
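Step (iii) above, aggregating measures into immunological summary scores, is not fully specified in the abstract, so the sketch below assumes one simple, commonly used choice: z-standardize each marker across subjects and average the z-scores of the markers assigned to one presumed mechanism. The marker names (a hypothetical "Th2" score from IL-4, IL-5 and IL-13) and values are illustrative.

```python
import numpy as np

# Sketch of one simple aggregation scheme (an assumption, not necessarily
# the authors' scoring): z-standardize each marker across subjects, then
# average the z-scores of the markers assigned to the same mechanism.

def summary_score(data, markers):
    # data: dict marker -> 1-D array of per-subject measurements
    zs = []
    for m in markers:
        x = np.asarray(data[m], dtype=float)
        zs.append((x - x.mean()) / x.std())
    return np.mean(zs, axis=0)

# Hypothetical cytokine levels for four subjects.
data = {
    "IL4":  [1.0, 2.0, 3.0, 4.0],
    "IL5":  [2.0, 2.5, 3.5, 4.0],
    "IL13": [0.5, 1.5, 2.0, 4.0],
}
th2 = summary_score(data, ["IL4", "IL5", "IL13"])  # one score per subject
```

A score built this way can then enter a regression model as a single predictor in place of several collinear cytokine measurements, which is the use made of the aggregated scores in the study.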
Magnetorotational dynamo chimeras. The missing link to turbulent accretion disk dynamo models?
NASA Astrophysics Data System (ADS)
Riols, A.; Rincon, F.; Cossu, C.; Lesur, G.; Ogilvie, G. I.; Longaretti, P.-Y.
2017-02-01
In Keplerian accretion disks, turbulence and magnetic fields may be jointly excited through a subcritical dynamo mechanism involving the magnetorotational instability (MRI). This dynamo may notably contribute to explaining the time-variability of various accreting systems, as high-resolution simulations of MRI dynamo turbulence exhibit statistical self-organization into large-scale cyclic dynamics. However, understanding the physics underlying these statistical states and assessing their exact astrophysical relevance is theoretically challenging. The study of simple periodic nonlinear MRI dynamo solutions has recently proven useful in this respect, and has highlighted the role of turbulent magnetic diffusion in the seeming impossibility of a dynamo at low magnetic Prandtl number (Pm), a common regime in disks. Arguably though, these simple laminar structures may not be fully representative of the complex, statistically self-organized states expected in astrophysical regimes. Here, we aim at closing this apparent discrepancy by reporting the numerical discovery of exactly periodic, yet semi-statistical "chimeral MRI dynamo states", which are the organized outcome of a succession of MRI-unstable, non-axisymmetric dynamical stages of different forms and amplitudes. Interestingly, these states, while reminiscent of the statistical complexity of turbulent simulations, involve the same physical principles as simpler laminar cycles, and their analysis further confirms the theory that subcritical turbulent magnetic diffusion impedes the sustainment of an MRI dynamo at low Pm. Overall, chimera dynamo cycles therefore offer an unprecedented dual physical and statistical perspective on dynamos in rotating shear flows, which may prove useful in devising more accurate, yet intuitive mean-field models of time-dependent turbulent disk dynamos. Movies associated with Fig. 1 are available at http://www.aanda.org
Coupled disease-behavior dynamics on complex networks: A review.
Wang, Zhen; Andrews, Michael A; Wu, Zhi-Xi; Wang, Lin; Bauch, Chris T
2015-12-01
It is increasingly recognized that a key component of successful infection control efforts is understanding the complex, two-way interaction between disease dynamics and human behavioral and social dynamics. Human behavior such as contact precautions and social distancing clearly influences disease prevalence, but disease prevalence can in turn alter human behavior, forming a coupled, nonlinear system. Moreover, in many cases, the spatial structure of the population cannot be ignored, such that social and behavioral processes and/or transmission of infection must be represented with complex networks. Research on coupled disease-behavior dynamics in complex networks in particular is growing rapidly, and frequently makes use of analysis methods and concepts from statistical physics. Here, we review some of the growing literature in this area. We contrast network-based approaches with homogeneous-mixing approaches, point out how their predictions differ, describe the rich and often surprising behavior of disease-behavior dynamics on complex networks, and compare them to processes in statistical physics. We discuss how these models can capture the dynamics that characterize many real-world scenarios, thereby suggesting ways that policy makers can better design effective prevention strategies. We also describe the growing sources of digital data that are facilitating research in this area. Finally, we suggest pitfalls which might be faced by researchers in the field, and several ways in which the field could move forward in the coming years. Copyright © 2015 Elsevier B.V. All rights reserved.
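A minimal member of the model class reviewed here can be sketched in a few lines: an SIS epidemic on a network coupled to behavior, where nodes adopt protection in response to infected neighbors and protection lowers their infection probability. This is a generic illustration, not any specific model from the review; all rates are arbitrary.

```python
import random

# Minimal sketch of a coupled disease-behavior model on a network
# (illustrative only): a discrete-time SIS epidemic in which a node that
# sees infected neighbors may adopt protection, which in turn lowers its
# per-contact probability of becoming infected.

def step(adj, infected, protected, beta=0.3, gamma=0.1, adopt=0.5, shield=0.5):
    new_inf = set(infected)
    new_prot = set(protected)
    for node, nbrs in adj.items():
        sick_nbrs = sum(n in infected for n in nbrs)
        # behavior dynamics: adopt protection in response to local prevalence
        if sick_nbrs and node not in protected and random.random() < adopt:
            new_prot.add(node)
        # disease dynamics: per-contact infection, reduced by protection
        if node not in infected:
            p = beta * (shield if node in new_prot else 1.0)
            if any(n in infected and random.random() < p for n in nbrs):
                new_inf.add(node)
        elif random.random() < gamma:
            new_inf.discard(node)  # recovery back to susceptible
    return new_inf, new_prot

# Ring network of 50 nodes, one initial infection.
N = 50
adj = {i: [(i - 1) % N, (i + 1) % N] for i in range(N)}
infected, protected = {0}, set()
random.seed(3)
for _ in range(100):
    infected, protected = step(adj, infected, protected)
```

The coupling is the point: prevalence drives protection uptake, and protection feeds back on prevalence, so the two sets co-evolve in a way a homogeneous-mixing model cannot capture.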
Protein and gene model inference based on statistical modeling in k-partite graphs.
Gerster, Sarah; Qeli, Ermir; Ahrens, Christian H; Bühlmann, Peter
2010-07-06
One of the major goals of proteomics is the comprehensive and accurate description of a proteome. Shotgun proteomics, the method of choice for the analysis of complex protein mixtures, requires that experimentally observed peptides are mapped back to the proteins they were derived from. This process is also known as protein inference. We present Markovian Inference of Proteins and Gene Models (MIPGEM), a statistical model based on clearly stated assumptions to address the problem of protein and gene model inference for shotgun proteomics data. In particular, we are dealing with dependencies among peptides and proteins using a Markovian assumption on k-partite graphs. We are also addressing the problems of shared peptides and ambiguous proteins by scoring the encoding gene models. Empirical results on two control datasets with synthetic mixtures of proteins and on complex protein samples of Saccharomyces cerevisiae, Drosophila melanogaster, and Arabidopsis thaliana suggest that the results with MIPGEM are competitive with existing tools for protein inference.
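The k-partite structure MIPGEM builds on can be sketched as a bipartite peptide-protein graph (the construction only, not the Markovian scoring model): shared peptides map to several proteins, and a protein supported only by shared peptides is ambiguous. Peptide and protein names are toy data.

```python
from collections import defaultdict

# Sketch of the bipartite peptide-protein graph underlying protein
# inference (construction only; the Markovian scoring of MIPGEM is not
# reproduced here).

def build_graph(peptide_to_proteins):
    prot_to_peps = defaultdict(set)
    for pep, prots in peptide_to_proteins.items():
        for prot in prots:
            prot_to_peps[prot].add(pep)
    # a peptide is shared if it maps to more than one protein
    shared = {p for p, prots in peptide_to_proteins.items() if len(prots) > 1}
    # a protein is ambiguous if every peptide supporting it is shared
    ambiguous = {prot for prot, peps in prot_to_peps.items() if peps <= shared}
    return prot_to_peps, shared, ambiguous

# Toy observed peptides and the proteins they map back to.
observed = {
    "PEPTIDEA": {"ProtX"},
    "PEPTIDEB": {"ProtX", "ProtY"},   # shared peptide
    "PEPTIDEC": {"ProtY", "ProtZ"},   # shared peptide
}
prot_to_peps, shared, ambiguous = build_graph(observed)
```

In this toy graph ProtX has a unique supporting peptide while ProtY and ProtZ are supported only by shared peptides; resolving exactly such ambiguity is what the probabilistic model is for.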
Partitioning heritability by functional annotation using genome-wide association summary statistics.
Finucane, Hilary K; Bulik-Sullivan, Brendan; Gusev, Alexander; Trynka, Gosia; Reshef, Yakir; Loh, Po-Ru; Anttila, Verneri; Xu, Han; Zang, Chongzhi; Farh, Kyle; Ripke, Stephan; Day, Felix R; Purcell, Shaun; Stahl, Eli; Lindstrom, Sara; Perry, John R B; Okada, Yukinori; Raychaudhuri, Soumya; Daly, Mark J; Patterson, Nick; Neale, Benjamin M; Price, Alkes L
2015-11-01
Recent work has demonstrated that some functional categories of the genome contribute disproportionately to the heritability of complex diseases. Here we analyze a broad set of functional elements, including cell type-specific elements, to estimate their polygenic contributions to heritability in genome-wide association studies (GWAS) of 17 complex diseases and traits with an average sample size of 73,599. To enable this analysis, we introduce a new method, stratified LD score regression, for partitioning heritability from GWAS summary statistics while accounting for linked markers. This new method is computationally tractable at very large sample sizes and leverages genome-wide information. Our findings include a large enrichment of heritability in conserved regions across many traits, a very large immunological disease-specific enrichment of heritability in FANTOM5 enhancers and many cell type-specific enrichments, including significant enrichment of central nervous system cell types in the heritability of body mass index, age at menarche, educational attainment and smoking behavior.
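The core regression of stratified LD score regression can be sketched on simulated data: under the model E[chi2_j] = N * sum_c tau_c * l(j, c) + 1, regressing SNP chi-squared statistics on category-specific LD scores recovers the per-category coefficients tau_c. The sketch uses plain least squares on synthetic data; the actual method uses a weighted regression and a block jackknife for standard errors.

```python
import numpy as np

# Sketch of the core regression in stratified LD score regression:
# E[chi2_j] = N * sum_c tau_c * l(j, c) + 1, so regressing chi2 - 1 on
# N * (category-specific LD scores) estimates the per-category tau_c.
# (Simple least squares on synthetic data; the method itself uses a
# weighted regression and a block jackknife.)

rng = np.random.default_rng(7)
n_snps, n_cat, N = 5000, 3, 50_000
ld_scores = rng.uniform(1.0, 100.0, size=(n_snps, n_cat))
tau_true = np.array([2e-6, 5e-6, 0.0])   # per-SNP heritability per category
chi2 = N * ld_scores @ tau_true + 1 + rng.normal(0, 0.5, n_snps)

# Regress chi2 - 1 on N * ld_scores to recover tau
tau_hat, *_ = np.linalg.lstsq(N * ld_scores, chi2 - 1, rcond=None)
```

The enrichment statements in the abstract correspond to comparing a category's estimated share of heritability (driven by its tau_c and LD scores) with the share of SNPs it contains.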
Carreno-Quintero, Natalia; Acharjee, Animesh; Maliepaard, Chris; Bachem, Christian W.B.; Mumm, Roland; Bouwmeester, Harro; Visser, Richard G.F.; Keurentjes, Joost J.B.
2012-01-01
Recent advances in -omics technologies such as transcriptomics, metabolomics, and proteomics along with genotypic profiling have permitted dissection of the genetics of complex traits represented by molecular phenotypes in nonmodel species. To identify the genetic factors underlying variation in primary metabolism in potato (Solanum tuberosum), we have profiled primary metabolite content in a diploid potato mapping population, derived from crosses between S. tuberosum and wild relatives, using gas chromatography-time of flight-mass spectrometry. In total, 139 polar metabolites were detected, of which we identified metabolite quantitative trait loci for approximately 72% of the detected compounds. In order to obtain an insight into the relationships between metabolic traits and classical phenotypic traits, we also analyzed statistical associations between them. The combined analysis of genetic information through quantitative trait locus coincidence and the application of statistical learning methods provide information on putative indicators associated with the alterations in metabolic networks that affect complex phenotypic traits. PMID:22223596
Dissecting the genetics of complex traits using summary association statistics.
Pasaniuc, Bogdan; Price, Alkes L
2017-02-01
During the past decade, genome-wide association studies (GWAS) have been used to successfully identify tens of thousands of genetic variants associated with complex traits and diseases. These studies have produced extensive repositories of genetic variation and trait measurements across large numbers of individuals, providing tremendous opportunities for further analyses. However, privacy concerns and other logistical considerations often limit access to individual-level genetic data, motivating the development of methods that analyse summary association statistics. Here, we review recent progress on statistical methods that leverage summary association data to gain insights into the genetic basis of complex traits and diseases.
A Not-So-Fundamental Limitation on Studying Complex Systems with Statistics: Comment on Rabin (2011)
NASA Astrophysics Data System (ADS)
Thomas, Drew M.
2012-12-01
Although living organisms are affected by many interrelated and unidentified variables, this complexity does not automatically impose a fundamental limitation on statistical inference. Nor need one invoke such complexity as an explanation of the "Truth Wears Off" or "decline" effect; similar "decline" effects occur with far simpler systems studied in physics. Selective reporting and publication bias, and scientists' biases in favor of reporting eye-catching results (in general) or conforming to others' results (in physics) better explain this feature of the "Truth Wears Off" effect than Rabin's suggested limitation on statistical inference.
NASA Astrophysics Data System (ADS)
Shafii, M.; Tolson, B.; Matott, L. S.
2012-04-01
Hydrologic modeling has benefited from significant developments over the past two decades. This has resulted in building of higher levels of complexity into hydrologic models, which eventually makes the model evaluation process (parameter estimation via calibration and uncertainty analysis) more challenging. In order to avoid unreasonable parameter estimates, many researchers have suggested implementation of multi-criteria calibration schemes. Furthermore, for predictive hydrologic models to be useful, proper consideration of uncertainty is essential. Consequently, recent research has emphasized comprehensive model assessment procedures in which multi-criteria parameter estimation is combined with statistically-based uncertainty analysis routines such as Bayesian inference using Markov Chain Monte Carlo (MCMC) sampling. Such a procedure relies on the use of formal likelihood functions based on statistical assumptions, and moreover, the Bayesian inference structured on MCMC samplers requires a considerably large number of simulations. Due to these issues, especially in complex non-linear hydrological models, a variety of alternative informal approaches have been proposed for uncertainty analysis in the multi-criteria context. This study aims at exploring a number of such informal uncertainty analysis techniques in multi-criteria calibration of hydrological models. The informal methods addressed in this study are (i) Pareto optimality which quantifies the parameter uncertainty using the Pareto solutions, (ii) DDS-AU which uses the weighted sum of objective functions to derive the prediction limits, and (iii) GLUE which describes the total uncertainty through identification of behavioral solutions. The main objective is to compare such methods with MCMC-based Bayesian inference with respect to factors such as computational burden, and predictive capacity, which are evaluated based on multiple comparative measures. 
The measures for comparison are calculated both for calibration and evaluation periods. The uncertainty analysis methodologies are applied to a simple 5-parameter rainfall-runoff model, called HYMOD.
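Of the three informal methods, GLUE is the simplest to sketch: sample parameter sets, score each with an informal likelihood (Nash-Sutcliffe efficiency here), retain the 'behavioral' sets above a threshold, and form prediction limits from the spread of the behavioral simulations. The one-parameter recession 'model' and the 0.7 cutoff are illustrative assumptions; GLUE proper weights the quantiles by the normalized likelihoods.

```python
import numpy as np

# Minimal GLUE sketch with a toy one-parameter "model" (illustrative, not
# HYMOD): sample parameters, score with an informal likelihood, keep the
# behavioral sets, and derive prediction limits from their spread.

def nse(sim, obs):
    # Nash-Sutcliffe efficiency: 1 is a perfect fit
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
obs = np.exp(-3.0 * t) + rng.normal(0, 0.02, t.size)  # synthetic "observed flow"

params = rng.uniform(1.0, 6.0, 500)                   # sampled recession rates
sims = np.array([np.exp(-k * t) for k in params])
scores = np.array([nse(s, obs) for s in sims])

behavioral = scores > 0.7                             # informal cutoff
# 90% prediction band from the behavioral simulations (plain quantiles here;
# GLUE proper weights the quantiles by the normalized likelihoods)
lo, hi = np.quantile(sims[behavioral], [0.05, 0.95], axis=0)
```

The contrast with MCMC-based Bayesian inference is visible even in this sketch: there is no formal error model, only an informal likelihood and a behavioral threshold, which is why computational burden and predictive capacity are the axes on which the study compares the methods.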
Variability in Rheumatology day care hospitals in Spain: VALORA study.
Hernández Miguel, María Victoria; Martín Martínez, María Auxiliadora; Corominas, Héctor; Sanchez-Piedra, Carlos; Sanmartí, Raimon; Fernandez Martinez, Carmen; García-Vicuña, Rosario
To describe the variability of Rheumatology day care hospital units (DCHUs) in Spain, in terms of structural resources and operating processes. Multicenter descriptive study with data from a self-completed DCHU self-assessment questionnaire based on the DCHU quality standards of the Spanish Society of Rheumatology. Structural resources and operating processes were analyzed and stratified by hospital complexity (regional, general, major and complex). Variability was determined using the coefficient of variation (CV) of the clinically relevant variable that presented statistically significant differences when compared across centers. A total of 89 hospitals (16 autonomous regions and Melilla) were included in the analysis: 11.2% of hospitals were regional, 22.5% general, 27% major and 39.3% complex. A total of 92% of DCHUs were polyvalent. The number of treatments applied, the coordination between DCHUs and the hospital pharmacy, and the postgraduate training process were the variables that showed statistically significant differences depending on hospital complexity. The highest rate of rheumatologic treatments was found in complex hospitals (2.97 per 1,000 population) and the lowest in general hospitals (2.01 per 1,000 population). The CV was 0.88 in major hospitals, 0.86 in regional, 0.76 in general, and 0.72 in complex hospitals. There was variability in the number of treatments delivered in DCHUs, greatest in major hospitals followed by regional centers. Nonetheless, the variability in terms of structure and function does not seem to be due to differences in center complexity. Copyright © 2016 Elsevier España, S.L.U. and Sociedad Española de Reumatología y Colegio Mexicano de Reumatología. All rights reserved.
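The variability metric used in the study above is the coefficient of variation, the sample standard deviation divided by the mean. A minimal sketch; the treatment rates below are illustrative, not the VALORA data.

```python
import statistics

def coefficient_of_variation(values):
    """CV = sample standard deviation / mean (undefined for zero-mean data)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined for zero-mean data")
    return statistics.stdev(values) / mean

# Hypothetical treatments-per-1,000-population rates for four hospitals
# within one complexity stratum:
rates = [2.97, 2.01, 2.45, 2.60]
cv = coefficient_of_variation(rates)
print(f"CV = {cv:.2f}")
```

Because the CV is dimensionless, it allows dispersion to be compared across hospital strata whose mean treatment rates differ.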
NASA Astrophysics Data System (ADS)
Bakker, Arthur; Ben-Zvi, Dani; Makar, Katie
2017-12-01
To understand how statistical and other types of reasoning are coordinated with actions to reduce uncertainty, we conducted a case study in vocational education that involved statistical hypothesis testing. We analyzed an intern's research project in a hospital laboratory in which reducing uncertainties was crucial to make a valid statistical inference. In his project, the intern, Sam, investigated whether patients' blood could be sent through pneumatic post without influencing the measurement of particular blood components. We asked, in the process of making a statistical inference, how are reasons and actions coordinated to reduce uncertainty? For the analysis, we used the semantic theory of inferentialism, specifically, the concept of webs of reasons and actions—complexes of interconnected reasons for facts and actions; these reasons include premises and conclusions, inferential relations, implications, motives for action, and utility of tools for specific purposes in a particular context. Analysis of interviews with Sam, his supervisor and teacher as well as video data of Sam in the classroom showed that many of Sam's actions aimed to reduce variability, rule out errors, and thus reduce uncertainties so as to arrive at a valid inference. Interestingly, the decisive factor was not the outcome of a t test but of the reference change value, a clinical chemical measure of analytic and biological variability. With insights from this case study, we expect that students can be better supported in connecting statistics with context and in dealing with uncertainty.
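The decisive criterion in the case study above, the reference change value (RCV), is a standard clinical-chemistry formula combining analytical and within-subject biological variation: RCV = √2 · z · √(CV_A² + CV_I²). A small sketch with illustrative CV values, not the intern's actual data.

```python
import math

def reference_change_value(cv_analytical, cv_biological, z=1.96):
    """RCV (%) = sqrt(2) * z * sqrt(CV_A^2 + CV_I^2), z = 1.96 for 95%."""
    return math.sqrt(2.0) * z * math.hypot(cv_analytical, cv_biological)

# Hypothetical CVs (%) for a blood component:
rcv = reference_change_value(cv_analytical=2.0, cv_biological=5.0)

# Two serial measurements (e.g. tube sent with vs. without pneumatic post):
before, after = 140.0, 146.0
pct_change = abs(after - before) / before * 100.0
significant = pct_change > rcv
print(f"RCV = {rcv:.1f}%, observed change = {pct_change:.1f}%, "
      f"significant: {significant}")
```

Only a difference exceeding the RCV is considered clinically meaningful, which is why this measure, rather than a t test alone, could settle the pneumatic-post question.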
Bettenbühl, Mario; Rusconi, Marco; Engbert, Ralf; Holschneider, Matthias
2012-01-01
Complex biological dynamics often generate sequences of discrete events which can be described as a Markov process. The order of the underlying Markovian stochastic process is fundamental for characterizing statistical dependencies within sequences. As an example for this class of biological systems, we investigate the Markov order of sequences of microsaccadic eye movements from human observers. We calculate the integrated likelihood of a given sequence for various orders of the Markov process and use this in a Bayesian framework for statistical inference on the Markov order. Our analysis shows that data from most participants are best explained by a first-order Markov process. This is compatible with recent findings of a statistical coupling of subsequent microsaccade orientations. Our method might prove to be useful for a broad class of biological systems.
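The integrated (marginal) likelihood of a discrete sequence under competing Markov orders can be computed in closed form with a symmetric Dirichlet prior over transition probabilities. A sketch of the general idea, not the authors' exact implementation; the two-symbol sequence is simulated, not microsaccade data.

```python
import math
import random

def log_marginal_likelihood(seq, order, alphabet):
    """Dirichlet(1,...,1)-multinomial marginal likelihood of a Markov chain."""
    K = len(alphabet)
    counts = {}
    for i in range(order, len(seq)):
        context = tuple(seq[i - order:i])
        counts.setdefault(context, {})
        counts[context][seq[i]] = counts[context].get(seq[i], 0) + 1
    logml = 0.0
    for row in counts.values():
        n = sum(row.values())
        logml += math.lgamma(K) - math.lgamma(K + n)
        for c in row.values():
            logml += math.lgamma(1 + c)  # lgamma(1) == 0
    return logml

random.seed(1)
alphabet = ["L", "R"]
# Simulate a first-order chain with strong persistence (p_stay = 0.8):
seq = ["L"]
for _ in range(2000):
    seq.append(seq[-1] if random.random() < 0.8 else
               ("R" if seq[-1] == "L" else "L"))

scores = {k: log_marginal_likelihood(seq, k, alphabet) for k in (0, 1, 2)}
best_order = max(scores, key=scores.get)
print("log marginal likelihoods:", scores, "-> best order:", best_order)
```

The marginal likelihood penalizes higher orders automatically (each extra context costs prior mass), so the first-order model wins here without any explicit complexity term.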
Application of spatial technology in malaria research & control: some new insights.
Saxena, Rekha; Nagpal, B N; Srivastava, Aruna; Gupta, S K; Dash, A P
2009-08-01
Geographical Information System (GIS) has emerged as the core of spatial technology, integrating a wide range of datasets available from different sources, including Remote Sensing (RS) and the Global Positioning System (GPS). Literature published during the decade 1998-2007 has been compiled and grouped into six categories according to the usage of the technology in malaria epidemiology. Different GIS modules, such as spatial data sources, mapping and geo-processing tools, distance calculation, digital elevation models (DEM), buffer zones and geo-statistical analysis, have been investigated in detail and illustrated with examples of the derived results. These GIS tools have contributed immensely to understanding the epidemiological processes of malaria, and the examples drawn show that GIS is now widely used for research and decision making in malaria control. Statistical data analysis is currently the most consistent and established set of tools for analyzing spatial datasets. The desired future development of GIS lies in the utilization of geo-statistical tools which, combined with high-quality data, can provide new insight into malaria epidemiology and the complexity of its transmission potential in endemic areas.
Hydration sites of unpaired RNA bases: a statistical analysis of the PDB structures.
Kirillova, Svetlana; Carugo, Oliviero
2011-10-19
Hydration is crucial for RNA structure and function. X-ray crystallography is the most commonly used method to determine RNA structures and hydration and, therefore, statistical surveys are based on crystallographic results, the number of which is quickly increasing. A statistical analysis of the water molecule distribution in high-resolution X-ray structures of unpaired RNA nucleotides showed that: different bases have the same penchant to be surrounded by water molecules; clusters of water molecules indicate possible hydration sites, which, in some cases, match those of the major and minor grooves of RNA and DNA double helices; complex hydrogen bond networks characterize the solvation of the nucleotides, resulting in a significant rigidity of the base and its surrounding water molecules. Interestingly, the hydration sites around unpaired RNA bases do not match, in general, the positions that are occupied by the second nucleotide when the base-pair is formed. The hydration sites around unpaired RNA bases were found. They do not replicate the atom positions of complementary bases in the Watson-Crick pairs.
Hydration sites of unpaired RNA bases: a statistical analysis of the PDB structures
2011-01-01
Background Hydration is crucial for RNA structure and function. X-ray crystallography is the most commonly used method to determine RNA structures and hydration and, therefore, statistical surveys are based on crystallographic results, the number of which is quickly increasing. Results A statistical analysis of the water molecule distribution in high-resolution X-ray structures of unpaired RNA nucleotides showed that: different bases have the same penchant to be surrounded by water molecules; clusters of water molecules indicate possible hydration sites, which, in some cases, match those of the major and minor grooves of RNA and DNA double helices; complex hydrogen bond networks characterize the solvation of the nucleotides, resulting in a significant rigidity of the base and its surrounding water molecules. Interestingly, the hydration sites around unpaired RNA bases do not match, in general, the positions that are occupied by the second nucleotide when the base-pair is formed. Conclusions The hydration sites around unpaired RNA bases were found. They do not replicate the atom positions of complementary bases in the Watson-Crick pairs. PMID:22011380
NASA Astrophysics Data System (ADS)
Guadagnini, A.; Riva, M.; Dell'Oca, A.
2017-12-01
We propose to ground sensitivity of uncertain parameters of environmental models on a set of indices based on the main (statistical) moments, i.e., mean, variance, skewness and kurtosis, of the probability density function (pdf) of a target model output. This enables us to perform Global Sensitivity Analysis (GSA) of a model in terms of multiple statistical moments and yields a quantification of the impact of model parameters on features driving the shape of the pdf of model output. Our GSA approach includes the possibility of being coupled with the construction of a reduced complexity model that allows approximating the full model response at a reduced computational cost. We demonstrate our approach through a variety of test cases. These include a commonly used analytical benchmark, a simplified model representing pumping in a coastal aquifer, a laboratory-scale tracer experiment, and the migration of fracturing fluid through a naturally fractured reservoir (source) to reach an overlying formation (target). Our strategy allows discriminating the relative importance of model parameters to the four statistical moments considered. We also provide an appraisal of the error associated with the evaluation of our sensitivity metrics by replacing the original system model through the selected surrogate model. Our results suggest that one might need to construct a surrogate model with increasing level of accuracy depending on the statistical moment considered in the GSA. The methodological framework we propose can assist the development of analysis techniques targeted to model calibration, design of experiment, uncertainty quantification and risk assessment.
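The idea of moment-based global sensitivity can be illustrated by measuring how strongly conditioning on each parameter shifts a statistical moment of the output. The sketch below is a simplified illustration in the spirit of the approach, not the authors' exact metric; the benchmark function and parameter ranges are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

def model(x1, x2, x3):
    # Hypothetical analytical benchmark: x2 enters nonlinearly.
    return x1 + 5.0 * x2 ** 2 + 0.1 * x3

n = 20000
params = {name: rng.uniform(-1.0, 1.0, n) for name in ("x1", "x2", "x3")}
y = model(params["x1"], params["x2"], params["x3"])

def moment_sensitivity(theta, y, n_bins=20, moment=np.mean):
    """Average absolute shift of a conditional moment across quantile bins
    of theta, normalised by the unconditional moment's magnitude."""
    edges = np.quantile(theta, np.linspace(0.0, 1.0, n_bins + 1))
    which = np.clip(np.searchsorted(edges, theta, side="right") - 1,
                    0, n_bins - 1)
    m_full = moment(y)
    shifts = [abs(moment(y[which == b]) - m_full) for b in range(n_bins)]
    return np.mean(shifts) / (abs(m_full) + 1e-12)

for name in params:
    s_mean = moment_sensitivity(params[name], y, moment=np.mean)
    s_var = moment_sensitivity(params[name], y, moment=np.var)
    print(f"{name}: S_mean={s_mean:.3f}, S_var={s_var:.3f}")
```

Because separate indices are computed per moment, a parameter can be unimportant for the mean yet dominant for the variance or tail shape, which is the feature the GSA framework above exploits.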
Almeida, Tiago P; Chu, Gavin S; Li, Xin; Dastagir, Nawshin; Tuan, Jiun H; Stafford, Peter J; Schlindwein, Fernando S; Ng, G André
2017-01-01
Purpose: Complex fractionated atrial electrogram (CFAE)-guided ablation after pulmonary vein isolation (PVI) has been used for persistent atrial fibrillation (persAF) therapy. This strategy has shown suboptimal outcomes due to, among other factors, undetected changes in the atrial tissue following PVI. In the present work, we investigate CFAE distribution before and after PVI in patients with persAF using a multivariate statistical model. Methods: 207 pairs of atrial electrograms (AEGs) were collected before and after PVI from corresponding left atrial (LA) regions in 18 persAF patients. Twelve attributes were measured from the AEGs before and after PVI. Statistical models based on multivariate analysis of variance (MANOVA) and linear discriminant analysis (LDA) were used to characterize the atrial regions and AEGs. Results: PVI significantly reduced CFAEs in the LA (70 vs. 40%; P < 0.0001). Four types of LA regions were identified, based on the AEG characteristics: (i) fractionated before PVI that remained fractionated after PVI (31% of the collected points); (ii) fractionated that converted to normal (39%); (iii) normal prior to PVI that became fractionated (9%); and (iv) normal that remained normal (21%). Individually, the attributes failed to distinguish these LA regions, but multivariate statistical models were effective in their discrimination (P < 0.0001). Conclusion: Our results reveal that some LA regions are resistant to PVI while others are affected by it. Although traditional methods were unable to identify these different regions, the proposed multivariate statistical model discriminated LA regions resistant to PVI from those affected by it without prior ablation information.
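The multivariate discrimination step above can be sketched with a two-class Fisher linear discriminant on feature vectors. Synthetic 12-dimensional features stand in for the twelve AEG attributes; the class separation below is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic 12-attribute feature vectors for two classes of atrial regions.
n, d = 200, 12
mu_frac = np.zeros(d)
mu_frac[:3] = 1.2  # fractionated AEGs shift three attributes (hypothetical)
X_frac = rng.normal(mu_frac, 1.0, (n, d))
X_norm = rng.normal(np.zeros(d), 1.0, (n, d))

def fisher_lda(X0, X1):
    """Return the Fisher discriminant direction w and a midpoint threshold."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix:
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1))
    w = np.linalg.solve(Sw, m1 - m0)
    threshold = 0.5 * (X0 @ w).mean() + 0.5 * (X1 @ w).mean()
    return w, threshold

w, thr = fisher_lda(X_norm, X_frac)
pred_frac = X_frac @ w > thr
pred_norm = X_norm @ w <= thr
accuracy = 0.5 * pred_frac.mean() + 0.5 * pred_norm.mean()
print(f"LDA accuracy on training data: {accuracy:.2f}")
```

The point mirrored by the study: no single attribute separates the classes, but the linear combination found by LDA pools weak evidence across all twelve dimensions.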
Evidence of the non-extensive character of Earth's ambient noise.
NASA Astrophysics Data System (ADS)
Koutalonis, Ioannis; Vallianatos, Filippos
2017-04-01
Investigation of the dynamical features of ambient seismic noise is an important scientific and practical research challenge. At the same time, there is growing interest in an approach to studying Earth physics based on the science of complex systems and non-extensive statistical mechanics, a generalization of Boltzmann-Gibbs statistical physics (Vallianatos et al., 2016). This seems to be a promising framework for studying complex systems exhibiting phenomena such as long-range interactions and memory effects. In this work we use non-extensive statistical mechanics and signal analysis methods to explore the nature of ambient noise as measured at the stations of the HSNC in the South Aegean (Chatzopoulos et al., 2016). We analyzed the de-trended increment time series of ambient seismic noise, X(t), in time windows of 20 minutes to 10 seconds within "calm time zones" where human-induced noise is at a minimum. Following the non-extensive statistical physics approach, the probability distribution function of the increments of ambient noise is investigated. Analysis of the probability density function (PDF) p(X), normalized to zero mean and unit variance, shows that the fluctuations of Earth's ambient noise follow a q-Gaussian distribution as defined in the framework of non-extensive statistical mechanics, indicating the possible existence of memory effects in Earth's ambient noise. References: F. Vallianatos, G. Papadakis, G. Michas, Generalized statistical mechanics approaches to earthquakes and tectonics. Proc. R. Soc. A, 472, 20160497, 2016. G. Chatzopoulos, I. Papadopoulos, F. Vallianatos, The Hellenic Seismological Network of Crete (HSNC): Validation and results of the 2013 aftershock, Advances in Geosciences, 41, 65-72, 2016.
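The q-Gaussian that the abstract above refers to is the maximum-entropy distribution of Tsallis (non-extensive) statistics: for q → 1 it reduces to the ordinary Gaussian, while q > 1 produces heavy tails. A minimal numerical sketch; the q and β values are illustrative, not fitted to seismic data.

```python
import numpy as np

def q_exponential(x, q):
    """e_q(x) = [1 + (1 - q) x]_+^{1/(1-q)}; ordinary exp(x) as q -> 1."""
    if abs(q - 1.0) < 1e-9:
        return np.exp(x)
    base = np.maximum(1.0 + (1.0 - q) * x, 0.0)
    return base ** (1.0 / (1.0 - q))

def q_gaussian_pdf(x, q, beta=1.0):
    """q-Gaussian shape e_q(-beta x^2), normalised numerically on a grid."""
    grid = np.linspace(-50.0, 50.0, 200001)
    norm = q_exponential(-beta * grid ** 2, q).sum() * (grid[1] - grid[0])
    return q_exponential(-beta * x ** 2, q) / norm

x = np.linspace(-5.0, 5.0, 1001)
gauss = np.exp(-x ** 2) / np.sqrt(np.pi)   # the q -> 1 limit, beta = 1
near_gauss = q_gaussian_pdf(x, q=1.0001)
heavy = q_gaussian_pdf(x, q=1.8)

print("max |q=1.0001 - Gaussian|:", np.abs(near_gauss - gauss).max())
print("tail ratio at x=5 (q=1.8 vs Gaussian):", heavy[-1] / gauss[-1])
```

The fat power-law tail at q > 1 is what makes the q-Gaussian a candidate model for increment distributions with long-range correlations or memory.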
NASA Technical Reports Server (NTRS)
Stefanski, Philip L.
2015-01-01
Commercially available software packages today allow users to quickly perform the routine evaluations of (1) descriptive statistics to numerically and graphically summarize both sample and population data, (2) inferential statistics that draws conclusions about a given population from samples taken of it, (3) probability determinations that can be used to generate estimates of reliability allowables, and finally (4) the setup of designed experiments and analysis of their data to identify significant material and process characteristics for application in both product manufacturing and performance enhancement. This paper presents examples of analysis and experimental design work that has been conducted using Statgraphics® statistical software to obtain useful information with regard to solid rocket motor propellants and internal insulation material. Data were obtained from a number of programs (Shuttle, Constellation, and Space Launch System) and sources that include solid propellant burn rate strands, tensile specimens, sub-scale test motors, full-scale operational motors, rubber insulation specimens, and sub-scale rubber insulation analog samples. Besides facilitating the experimental design process to yield meaningful results, statistical software has demonstrated its ability to quickly perform complex data analyses and yield significant findings that might otherwise have gone unnoticed. One caveat to these successes is that useful results not only derive from the inherent power of the software package, but also from the skill and understanding of the data analyst.
A perceptual space of local image statistics.
Victor, Jonathan D; Thengone, Daniel J; Rizvi, Syed M; Conte, Mary M
2015-12-01
Local image statistics are important for visual analysis of textures, surfaces, and form. There are many kinds of local statistics, including those that capture luminance distributions, spatial contrast, oriented segments, and corners. While sensitivity to each of these kinds of statistics has been well studied, much less is known about visual processing when multiple kinds of statistics are relevant, in large part because the dimensionality of the problem is high and different kinds of statistics interact. To approach this problem, we focused on binary images on a square lattice - a reduced set of stimuli which nevertheless taps many kinds of local statistics. In this 10-parameter space, we determined psychophysical thresholds to each kind of statistic (16 observers) and all of their pairwise combinations (4 observers). Sensitivities and isodiscrimination contours were consistent across observers. Isodiscrimination contours were elliptical, implying a quadratic interaction rule, which in turn determined ellipsoidal isodiscrimination surfaces in the full 10-dimensional space and made predictions for sensitivities to complex combinations of statistics. These predictions, including the prediction of a combination of statistics that was metameric to random, were verified experimentally. Finally, check size had only a mild effect on sensitivities over the range from 2.8 to 14 min, but sensitivities to second- and higher-order statistics were substantially lower at 1.4 min. In sum, local image statistics form a perceptual space that is highly stereotyped across observers, in which different kinds of statistics interact according to simple rules. Copyright © 2015 Elsevier Ltd. All rights reserved.
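The quadratic interaction rule implied by elliptical isodiscrimination contours can be written as a perceptual distance d(c) = √(cᵀQc), so the threshold along any unit direction u is 1/√(uᵀQu). A small sketch; the on-axis thresholds and coupling term below are illustrative numbers, not fitted data.

```python
import numpy as np

# Hypothetical thresholds along two statistic axes and a pairwise coupling:
t1, t2, rho = 0.2, 0.4, -0.5
Q = np.array([[1.0 / t1 ** 2, rho / (t1 * t2)],
              [rho / (t1 * t2), 1.0 / t2 ** 2]])  # positive definite for |rho| < 1

def threshold(direction, Q):
    """Predicted threshold along a direction under the quadratic rule."""
    u = np.asarray(direction, float)
    u = u / np.linalg.norm(u)
    return 1.0 / np.sqrt(u @ Q @ u)

# The rule reproduces the on-axis thresholds...
print(threshold([1, 0], Q), threshold([0, 1], Q))
# ...and predicts the threshold for an oblique (combined) stimulus direction:
print("combined:", threshold([1, 1], Q))
```

This is the sense in which pairwise measurements determine the full ellipsoidal surface: once Q is fixed by on-axis and pairwise thresholds, sensitivities to arbitrary combinations follow.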
A perceptual space of local image statistics
Victor, Jonathan D.; Thengone, Daniel J.; Rizvi, Syed M.; Conte, Mary M.
2015-01-01
Local image statistics are important for visual analysis of textures, surfaces, and form. There are many kinds of local statistics, including those that capture luminance distributions, spatial contrast, oriented segments, and corners. While sensitivity to each of these kinds of statistics has been well studied, much less is known about visual processing when multiple kinds of statistics are relevant, in large part because the dimensionality of the problem is high and different kinds of statistics interact. To approach this problem, we focused on binary images on a square lattice – a reduced set of stimuli which nevertheless taps many kinds of local statistics. In this 10-parameter space, we determined psychophysical thresholds to each kind of statistic (16 observers) and all of their pairwise combinations (4 observers). Sensitivities and isodiscrimination contours were consistent across observers. Isodiscrimination contours were elliptical, implying a quadratic interaction rule, which in turn determined ellipsoidal isodiscrimination surfaces in the full 10-dimensional space and made predictions for sensitivities to complex combinations of statistics. These predictions, including the prediction of a combination of statistics that was metameric to random, were verified experimentally. Finally, check size had only a mild effect on sensitivities over the range from 2.8 to 14 min, but sensitivities to second- and higher-order statistics were substantially lower at 1.4 min. In sum, local image statistics form a perceptual space that is highly stereotyped across observers, in which different kinds of statistics interact according to simple rules. PMID:26130606
Spectroscopic and Statistical Techniques for Information Recovery in Metabonomics and Metabolomics
NASA Astrophysics Data System (ADS)
Lindon, John C.; Nicholson, Jeremy K.
2008-07-01
Methods for generating and interpreting metabolic profiles based on nuclear magnetic resonance (NMR) spectroscopy, mass spectrometry (MS), and chemometric analysis methods are summarized and the relative strengths and weaknesses of NMR and chromatography-coupled MS approaches are discussed. Given that all data sets measured to date only probe subsets of complex metabolic profiles, we describe recent developments for enhanced information recovery from the resulting complex data sets, including integration of NMR- and MS-based metabonomic results and combination of metabonomic data with data from proteomics, transcriptomics, and genomics. We summarize the breadth of applications, highlight some current activities, discuss the issues relating to metabonomics, and identify future trends.
Spectroscopic and statistical techniques for information recovery in metabonomics and metabolomics.
Lindon, John C; Nicholson, Jeremy K
2008-01-01
Methods for generating and interpreting metabolic profiles based on nuclear magnetic resonance (NMR) spectroscopy, mass spectrometry (MS), and chemometric analysis methods are summarized and the relative strengths and weaknesses of NMR and chromatography-coupled MS approaches are discussed. Given that all data sets measured to date only probe subsets of complex metabolic profiles, we describe recent developments for enhanced information recovery from the resulting complex data sets, including integration of NMR- and MS-based metabonomic results and combination of metabonomic data with data from proteomics, transcriptomics, and genomics. We summarize the breadth of applications, highlight some current activities, discuss the issues relating to metabonomics, and identify future trends.
The New Maia Detector System: Methods For High Definition Trace Element Imaging Of Natural Material
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ryan, C. G.; School of Physics, University of Melbourne, Parkville VIC; CODES Centre of Excellence, University of Tasmania, Hobart TAS
2010-04-06
Motivated by the need for megapixel high definition trace element imaging to capture intricate detail in natural material, together with faster acquisition and improved counting statistics in elemental imaging, a large energy-dispersive detector array called Maia has been developed by CSIRO and BNL for SXRF imaging on the XFM beamline at the Australian Synchrotron. A 96 detector prototype demonstrated the capacity of the system for real-time deconvolution of complex spectral data using an embedded implementation of the Dynamic Analysis method and acquiring highly detailed images up to 77 M pixels spanning large areas of complex mineral sample sections.
The New Maia Detector System: Methods For High Definition Trace Element Imaging Of Natural Material
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ryan, C.G.; Siddons, D.P.; Kirkham, R.
2010-05-25
Motivated by the need for megapixel high definition trace element imaging to capture intricate detail in natural material, together with faster acquisition and improved counting statistics in elemental imaging, a large energy-dispersive detector array called Maia has been developed by CSIRO and BNL for SXRF imaging on the XFM beamline at the Australian Synchrotron. A 96 detector prototype demonstrated the capacity of the system for real-time deconvolution of complex spectral data using an embedded implementation of the Dynamic Analysis method and acquiring highly detailed images up to 77 M pixels spanning large areas of complex mineral sample sections.
NASA Astrophysics Data System (ADS)
Somogyi, Andrea; Medjoubi, Kadda; Sancho-Tomas, Maria; Visscher, P. T.; Baranton, Gil; Philippot, Pascal
2017-09-01
The understanding of real complex geological, environmental and geo-biological processes depends increasingly on in-depth non-invasive study of chemical composition and morphology. In this paper we used scanning hard X-ray nanoprobe techniques to study the elemental composition, morphology and As speciation in complex, highly heterogeneous geological samples. Multivariate statistical analytical techniques, such as principal component analysis and clustering, were used for data interpretation. These measurements revealed the quantitative and valence-state inhomogeneity of As and its relation to the total compositional and morphological variation of the sample at sub-μm scales.
Controls of multi-modal wave conditions in a complex coastal setting
Hegermiller, Christie; Rueda, Ana C.; Erikson, Li H.; Barnard, Patrick L.; Antolinez, J.A.A.; Mendez, Fernando J.
2017-01-01
Coastal hazards emerge from the combined effect of wave conditions and sea level anomalies associated with storms or low-frequency atmosphere-ocean oscillations. Rigorous characterization of wave climate is limited by the availability of spectral wave observations, the computational cost of dynamical simulations, and the ability to link wave-generating atmospheric patterns with coastal conditions. We present a hybrid statistical-dynamical approach to simulating nearshore wave climate in complex coastal settings, demonstrated in the Southern California Bight, where waves arriving from distant, disparate locations are refracted over complex bathymetry and shadowed by offshore islands. Contributions of wave families and large-scale atmospheric drivers to nearshore wave energy flux are analyzed. Results highlight the variability of influences controlling wave conditions along neighboring coastlines. The universal method demonstrated here can be applied to complex coastal settings worldwide, facilitating analysis of the effects of climate change on nearshore wave climate.
Controls of Multimodal Wave Conditions in a Complex Coastal Setting
NASA Astrophysics Data System (ADS)
Hegermiller, C. A.; Rueda, A.; Erikson, L. H.; Barnard, P. L.; Antolinez, J. A. A.; Mendez, F. J.
2017-12-01
Coastal hazards emerge from the combined effect of wave conditions and sea level anomalies associated with storms or low-frequency atmosphere-ocean oscillations. Rigorous characterization of wave climate is limited by the availability of spectral wave observations, the computational cost of dynamical simulations, and the ability to link wave-generating atmospheric patterns with coastal conditions. We present a hybrid statistical-dynamical approach to simulating nearshore wave climate in complex coastal settings, demonstrated in the Southern California Bight, where waves arriving from distant, disparate locations are refracted over complex bathymetry and shadowed by offshore islands. Contributions of wave families and large-scale atmospheric drivers to nearshore wave energy flux are analyzed. Results highlight the variability of influences controlling wave conditions along neighboring coastlines. The universal method demonstrated here can be applied to complex coastal settings worldwide, facilitating analysis of the effects of climate change on nearshore wave climate.
Nasso, Sara; Goetze, Sandra; Martens, Lennart
2015-09-04
Selected reaction monitoring (SRM) MS is a highly selective and sensitive technique to quantify protein abundances in complex biological samples. To enhance the pace of large SRM studies, a validated, robust method to fully automate absolute quantification and substitute for interactive evaluation would be valuable. To address this demand, we present Ariadne, a Matlab software tool. To quantify monitored targets, Ariadne exploits metadata imported from the transition lists, and targets can be filtered according to mProphet output. Signal processing and statistical learning approaches are combined to compute peptide quantifications. To robustly estimate absolute abundances, the external calibration curve method is applied, ensuring linearity over the measured dynamic range. Ariadne was benchmarked against mProphet and Skyline by comparing its quantification performance on three different dilution series, featuring either noisy/smooth traces without background or smooth traces with complex background. Results, evaluated as efficiency, linearity, accuracy, and precision of quantification, showed that Ariadne's performance is independent of data smoothness and the presence of complex background, that Ariadne outperforms mProphet on the noisier data set, and that it improves Skyline's accuracy and precision 2-fold for the lowest abundant dilution with complex background. Remarkably, Ariadne could statistically distinguish all the different abundances from each other, discriminating dilutions as low as 0.1 and 0.2 fmol. These results suggest that Ariadne offers reliable and automated analysis of large-scale SRM differential expression studies.
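The external calibration curve method mentioned above amounts to fitting a line to standards of known amount and inverting it for unknowns. A minimal sketch with synthetic peak areas; the dilution series, response slope and noise level are illustrative assumptions, not Ariadne's data.

```python
import numpy as np

rng = np.random.default_rng(3)

# Known dilution series (fmol) and simulated SRM peak areas (arbitrary units):
known_fmol = np.array([0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0])
areas = 1200.0 * known_fmol + 50.0 + rng.normal(0.0, 20.0, known_fmol.size)

# Least-squares calibration line: area = slope * amount + intercept.
slope, intercept = np.polyfit(known_fmol, areas, 1)

def quantify(area):
    """Invert the calibration line to get an absolute amount (fmol)."""
    return (area - intercept) / slope

# R^2 as a linearity check over the measured dynamic range:
residuals = areas - (slope * known_fmol + intercept)
r2 = 1.0 - residuals.var() / areas.var()
unknown = quantify(2450.0)
print(f"slope={slope:.0f}, r2={r2:.4f}, unknown ~= {unknown:.2f} fmol")
```

Checking R² (or residual structure) over the full series is what "ensuring linearity over the measured dynamic range" refers to: extrapolating the line outside the calibrated range is not safe.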
Analyzing the causation of a railway accident based on a complex network
NASA Astrophysics Data System (ADS)
Ma, Xin; Li, Ke-Ping; Luo, Zi-Yan; Zhou, Jin
2014-02-01
In this paper, a new model is constructed for the causation analysis of railway accidents based on complex network theory. In the model, the nodes are defined as the various manifest or latent accident causal factors. By employing complex network theory, especially its statistical indicators, the railway accident as well as its key causations can be analyzed from an overall perspective. As a case study, the "7.23" China Yongwen railway accident is illustrated based on this model. The results show that the inspection of signals and the checking of line conditions before trains run played an important role in this railway accident. In conclusion, the constructed model gives a theoretical clue for railway accident prediction and, hence, for reducing the occurrence of railway accidents.
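The modeling idea above can be sketched as a directed graph of causal factors scored by simple statistical indicators such as node degree. The factor names and edges below are illustrative stand-ins, not the published accident model.

```python
from collections import defaultdict

# Hypothetical "contributes to" relations between accident causal factors:
edges = [
    ("lightning strike", "signal equipment failure"),
    ("signal equipment failure", "wrong signal displayed"),
    ("inadequate signal inspection", "wrong signal displayed"),
    ("inadequate line-condition checks", "dispatcher unaware of stopped train"),
    ("wrong signal displayed", "rear-end collision"),
    ("dispatcher unaware of stopped train", "rear-end collision"),
]

out_degree = defaultdict(int)
in_degree = defaultdict(int)
for src, dst in edges:
    out_degree[src] += 1
    in_degree[dst] += 1

nodes = set(out_degree) | set(in_degree)
# Total degree as a crude importance indicator for causal factors:
ranking = sorted(nodes, key=lambda v: in_degree[v] + out_degree[v], reverse=True)
for v in ranking:
    print(f"{v}: in={in_degree[v]}, out={out_degree[v]}")
```

In a full analysis, richer indicators (betweenness, clustering, path counts) would replace raw degree, but the principle is the same: the network's statistics single out the causal factors through which most accident paths flow.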
Functional Genomics Assistant (FUGA): a toolbox for the analysis of complex biological networks
2011-01-01
Background Cellular constituents such as proteins, DNA, and RNA form a complex web of interactions that regulate biochemical homeostasis and determine the dynamic cellular response to external stimuli. It follows that detailed understanding of these patterns is critical for the assessment of fundamental processes in cell biology and pathology. Representation and analysis of cellular constituents through network principles is a promising and popular analytical avenue towards a deeper understanding of molecular mechanisms in a system-wide context. Findings We present Functional Genomics Assistant (FUGA) - an extensible and portable MATLAB toolbox for the inference of biological relationships, graph topology analysis, random network simulation, network clustering, and functional enrichment statistics. In contrast to conventional differential expression analysis of individual genes, FUGA offers a framework for the study of system-wide properties of biological networks and highlights putative molecular targets using concepts of systems biology. Conclusion FUGA offers a simple and customizable framework for network analysis in a variety of systems biology applications. It is freely available for individual or academic use at http://code.google.com/p/fuga. PMID:22035155
Sorting of Streptomyces Cell Pellets Using a Complex Object Parametric Analyzer and Sorter
Petrus, Marloes L. C.; van Veluw, G. Jerre; Wösten, Han A. B.; Claessen, Dennis
2014-01-01
Streptomycetes are filamentous soil bacteria that are used in industry for the production of enzymes and antibiotics. When grown in bioreactors, these organisms form networks of interconnected hyphae, known as pellets, which are heterogeneous in size. Here we describe a method to analyze and sort mycelial pellets using a Complex Object Parametric Analyzer and Sorter (COPAS). Detailed instructions are given for the use of the instrument and the basic statistical analysis of the data. We furthermore describe how pellets can be sorted according to user-defined settings, which enables downstream processing such as the analysis of the RNA or protein content. Using this methodology the mechanism underlying heterogeneous growth can be tackled. This will be instrumental for improving streptomycetes as a cell factory, considering the fact that productivity correlates with pellet size. PMID:24561666
Trends in Mediation Analysis in Nursing Research: Improving Current Practice.
Hertzog, Melody
2018-06-01
The purpose of this study was to describe common approaches used by nursing researchers to test mediation models and evaluate them within the context of current methodological advances. MEDLINE was used to locate studies testing a mediation model and published from 2004 to 2015 in nursing journals. Design (experimental/correlation, cross-sectional/longitudinal, model complexity) and analysis (method, inclusion of test of mediated effect, violations/discussion of assumptions, sample size/power) characteristics were coded for 456 studies. General trends were identified using descriptive statistics. Consistent with findings of reviews in other disciplines, evidence was found that nursing researchers may not be aware of the strong assumptions and serious limitations of their analyses. Suggestions for strengthening the rigor of such studies and an overview of current methods for testing more complex models, including longitudinal mediation processes, are presented.
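One of the methodological advances the review above points toward is bootstrap testing of the mediated (indirect) effect a·b, rather than the older causal-steps approach. A minimal percentile-CI sketch on synthetic data; the effect sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 300

# Generate X -> M -> Y with a true indirect effect a*b = 0.5 * 0.6 = 0.3.
X = rng.normal(size=n)
M = 0.5 * X + rng.normal(scale=1.0, size=n)
Y = 0.6 * M + 0.2 * X + rng.normal(scale=1.0, size=n)

def indirect_effect(X, M, Y):
    """a: slope of M on X; b: slope of Y on M controlling for X."""
    a = np.polyfit(X, M, 1)[0]
    design = np.column_stack([M, X, np.ones_like(X)])
    b = np.linalg.lstsq(design, Y, rcond=None)[0][0]
    return a * b

# Percentile bootstrap: resample cases, recompute a*b, take the 95% CI.
boot = np.array([
    indirect_effect(X[idx], M[idx], Y[idx])
    for idx in (rng.integers(0, n, n) for _ in range(2000))
])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"indirect effect = {indirect_effect(X, M, Y):.3f}, "
      f"95% CI [{lo:.3f}, {hi:.3f}]")
```

Because the sampling distribution of a·b is skewed, the bootstrap CI is generally preferred over normal-theory tests such as the Sobel test, which is one of the practice improvements the review recommends.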
Trade Studies of Space Launch Architectures using Modular Probabilistic Risk Analysis
NASA Technical Reports Server (NTRS)
Mathias, Donovan L.; Go, Susie
2006-01-01
A top-down risk assessment in the early phases of space exploration architecture development can provide understanding and intuition of the potential risks associated with new designs and technologies. In this approach, risk analysts draw from their past experience and the heritage of similar existing systems as a source for reliability data. This top-down approach captures the complex interactions of the risk-driving parts of the integrated system without requiring detailed knowledge of the parts themselves, which is often unavailable in the early design stages. Traditional probabilistic risk analysis (PRA) technologies, however, suffer several drawbacks that limit their timely application to complex technology development programs. The most restrictive of these is a dependence on static planning scenarios, expressed through fault and event trees. Fault trees incorporating comprehensive mission scenarios are routinely constructed for complex space systems, and several commercial software products are available for evaluating fault statistics. These static representations cannot capture the dynamic behavior of system failures without substantial modification of the initial tree. Consequently, the development of dynamic models using fault tree analysis has been an active area of research in recent years. This paper discusses the implementation and demonstration of dynamic, modular scenario modeling for integration of subsystem fault evaluation modules using the Space Architecture Failure Evaluation (SAFE) tool. SAFE is a C++ code that was originally developed to support NASA's Space Launch Initiative. It provides a flexible framework for system architecture definition and trade studies. SAFE supports extensible modeling of dynamic, time-dependent risk drivers of the system and functions at the level of fidelity for which design and failure data exists. The approach is scalable, allowing inclusion of additional information as detailed data becomes available.
The tool performs a Monte Carlo analysis to provide statistical estimates. Example results of an architecture system reliability study are summarized for an exploration system concept using heritage data from liquid-fueled expendable Saturn V/Apollo launch vehicles.
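The Monte Carlo step described above can be sketched as follows. This is a minimal illustration with hypothetical per-subsystem failure probabilities, not the SAFE tool itself and not heritage Saturn V/Apollo data:

```python
import random

# Hypothetical per-mission failure probabilities for three subsystems
# (illustrative values only, not heritage Saturn V/Apollo data).
SUBSYSTEM_FAILURE_P = {"propulsion": 0.02, "avionics": 0.005, "structures": 0.001}

def mission_succeeds(rng):
    """One Monte Carlo trial: the mission succeeds only if every subsystem survives."""
    return all(rng.random() >= p for p in SUBSYSTEM_FAILURE_P.values())

def estimate_reliability(n_trials=100_000, seed=1):
    """Statistical estimate of system reliability from repeated mission trials."""
    rng = random.Random(seed)
    successes = sum(mission_succeeds(rng) for _ in range(n_trials))
    return successes / n_trials
```

For this simple series system the exact reliability is 0.98 × 0.995 × 0.999 ≈ 0.974; the Monte Carlo estimate converges to it as the trial count grows, and the same machinery extends to time-dependent failure models that static fault trees cannot express.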
Kirkwood, Kathryn J.; Ahmad, Yasmeen; Larance, Mark; Lamond, Angus I.
2013-01-01
Proteins form a diverse array of complexes that mediate cellular function and regulation. A largely unexplored feature of such protein complexes is the selective participation of specific protein isoforms and/or post-translationally modified forms. In this study, we combined native size-exclusion chromatography (SEC) with high-throughput proteomic analysis to characterize soluble protein complexes isolated from human osteosarcoma (U2OS) cells. Using this approach, we have identified over 71,500 peptides and 1,600 phosphosites, corresponding to over 8,000 proteins, distributed across 40 SEC fractions. This represents >50% of the predicted U2OS cell proteome, identified with a mean peptide sequence coverage of 27% per protein. Three biological replicates were performed, allowing statistical evaluation of the data and demonstrating a high degree of reproducibility in the SEC fractionation procedure. Specific proteins were detected interacting with multiple independent complexes, as typified by the separation of distinct complexes for the MRFAP1-MORF4L1-MRGBP interaction network. The data also revealed protein isoforms and post-translational modifications that selectively associated with distinct subsets of protein complexes. Surprisingly, there was clear enrichment for specific Gene Ontology terms associated with differential size classes of protein complexes. This study demonstrates that combined SEC/MS analysis can be used for the system-wide annotation of protein complexes and to predict potential isoform-specific interactions. All of these SEC data on the native separation of protein complexes have been integrated within the Encyclopedia of Proteome Dynamics, an online, multidimensional data-sharing resource available to the community. PMID:24043423
Burguete-Argueta, Nelsi; Martínez De la Cruz, Braulio; Camacho-Mejorado, Rafael; Santana, Carla; Noris, Gino; López-Bayghen, Esther; Arellano-Galindo, José; Majluf-Cruz, Abraham; Antonio Meraz-Ríos, Marco; Gómez, Rocío
2016-11-01
STRs are powerful tools intensively used in forensic and kinship studies. In order to assess the effectiveness of non-CODIS genetic markers in forensic and paternity tests, the genetic composition of six mini short tandem repeats (mini-STRs: D1S1656, D2S441, D6S1043, D10S1248, D12S391, D22S1045) and the microsatellite SE33 in Mestizo and Amerindian populations from Mexico was studied. Using multiplex polymerase chain reactions and capillary electrophoresis, this study genotyped all loci from 870 chromosomes and evaluated the statistical genetic parameters. All mini-STRs studied were in agreement with Hardy-Weinberg (HW) and linkage equilibrium; however, an important HW departure for SE33 was found in the Mestizo population (p ≤ 0.0001). Regarding paternity and forensic statistical parameters, high values of combined power of discrimination and mean power of exclusion were found using these seven markers. The principal co-ordinate analysis based on allele frequencies of three mini-STRs showed the complex genetic architecture of the Mestizo population. The results indicate that this set of loci is suitable to genetically identify individuals in the Mexican population, supporting its effectiveness in human identification casework. In addition, these findings add new statistical values and emphasise the importance of the use of non-CODIS markers in complex populations in order to avoid erroneous assumptions.
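The power of discrimination cited above follows directly from allele frequencies under Hardy-Weinberg equilibrium. A minimal sketch with hypothetical frequency sets (not the study's Mexican population data):

```python
from itertools import combinations

def power_of_discrimination(allele_freqs):
    """Single-locus power of discrimination under Hardy-Weinberg equilibrium:
    1 minus the probability that two random individuals share a genotype."""
    match = sum((p * p) ** 2 for p in allele_freqs)  # homozygote genotypes
    match += sum((2 * p * q) ** 2 for p, q in combinations(allele_freqs, 2))  # heterozygotes
    return 1.0 - match

def combined_pd(loci_freqs):
    """Combined PD across independent loci: 1 - product of per-locus match probabilities."""
    residual = 1.0
    for freqs in loci_freqs:
        residual *= 1.0 - power_of_discrimination(freqs)
    return 1.0 - residual

# Hypothetical allele-frequency sets for two loci (illustrative only).
loci = [[0.4, 0.3, 0.2, 0.1], [0.5, 0.3, 0.2]]
```

Because each locus multiplies down the residual match probability, the combined PD over a marker set is always higher than any single locus provides, which is why panels of several mini-STRs reach the high combined values reported.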
Prasifka, J R; Hellmich, R L; Dively, G P; Higgins, L S; Dixon, P M; Duan, J J
2008-02-01
One of the possible adverse effects of transgenic insecticidal crops is the unintended decline in the abundance of nontarget arthropods. Field trials designed to evaluate potential nontarget effects can be more complex than expected because decisions to conduct field trials and the selection of taxa to include are not always guided by the results of laboratory tests. Also, recent studies emphasize the potential for indirect effects (adverse impacts to nontarget arthropods without feeding directly on plant tissues), which are difficult to predict because of interactions among nontarget arthropods, target pests, and transgenic crops. As a consequence, field studies may attempt to monitor expansive lists of arthropod taxa, making the design of such broad studies more difficult and reducing the likelihood of detecting any negative effects that might be present. To improve the taxonomic focus and statistical rigor of future studies, existing field data and corresponding power analysis may provide useful guidance. Analysis of control data from several nontarget field trials using repeated-measures designs suggests that while detection of small effects may require considerable increases in replication, there are taxa from different ecological roles that are sampled effectively using standard methods. The use of statistical power to guide selection of taxa for nontarget trials reflects scientists' inability to predict the complex interactions among arthropod taxa, particularly when laboratory trials fail to provide guidance on which groups are more likely to be affected. However, scientists still may exercise judgment, including taxa that are not included in or supported by power analyses.
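The kind of power analysis described above, relating replication to the chance of detecting an effect of a given size, can be approximated by simulation. The sketch below assumes normally distributed measurements and uses a rough critical value of 2 for the Welch t statistic (an approximation to the 5% two-sided cutoff at moderate sample sizes), not the repeated-measures designs of the actual field trials:

```python
import math
import random
import statistics

def welch_t(x, y):
    """Welch t statistic for two independent samples."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    return (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(vx / len(x) + vy / len(y))

def simulated_power(effect, n_reps, n_sims=2000, crit=2.0, seed=7):
    """Fraction of simulated trials in which |t| exceeds the critical value,
    i.e. a Monte Carlo estimate of power at the given replication level."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_reps)]
        treated = [rng.gauss(effect, 1.0) for _ in range(n_reps)]
        if abs(welch_t(control, treated)) > crit:
            hits += 1
    return hits / n_sims
```

Comparing, say, `simulated_power(1.0, 5)` with `simulated_power(1.0, 20)` makes the paper's point concrete: detecting small effects demands considerable increases in replication, while large effects in well-sampled taxa are detectable with standard designs.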
General Blending Models for Data From Mixture Experiments
Brown, L.; Donev, A. N.; Bissett, A. C.
2015-01-01
We propose a new class of models providing a powerful unification and extension of existing statistical methodology for analysis of data obtained in mixture experiments. These models, which integrate models proposed by Scheffé and Becker, extend considerably the range of mixture component effects that may be described. They become complex when the studied phenomenon requires it, but remain simple whenever possible. This article has supplementary material online. PMID:26681812
Statistical physics of the symmetric group.
Williams, Mobolaji
2017-04-01
Ordered chains (such as chains of amino acids) are ubiquitous in biological cells, and these chains perform specific functions contingent on the sequence of their components. Using the existence and general properties of such sequences as a theoretical motivation, we study the statistical physics of systems whose state space is defined by the possible permutations of an ordered list, i.e., the symmetric group, and whose energy is a function of how certain permutations deviate from some chosen correct ordering. Such a nonfactorizable state space is quite different from the state spaces typically considered in statistical physics systems and consequently has novel behavior in systems with interacting and even noninteracting Hamiltonians. Various parameter choices of a mean-field model reveal the system to contain five different physical regimes defined by two transition temperatures, a triple point, and a quadruple point. Finally, we conclude by discussing how the general analysis can be extended to state spaces with more complex combinatorial properties and to other standard questions of statistical mechanics models.
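The setup described above, a state space of permutations with an energy that penalizes deviation from a chosen correct ordering, can be illustrated by brute force for small n. The inversion-count Hamiltonian used here is a hypothetical choice for illustration, not the paper's mean-field model:

```python
import math
from itertools import permutations

def energy(perm):
    """Hypothetical Hamiltonian: the number of inversions, i.e. pairs ordered
    contrary to the chosen 'correct' ordering (0, 1, ..., n-1)."""
    return sum(perm[i] > perm[j]
               for i in range(len(perm)) for j in range(i + 1, len(perm)))

def partition_function(n, beta):
    """Exact Z over the symmetric group S_n by brute-force enumeration (small n only)."""
    return sum(math.exp(-beta * energy(p)) for p in permutations(range(n)))

def mean_energy(n, beta):
    """Thermal average <E>; it falls toward 0 as beta grows and the ordering freezes in."""
    z = partition_function(n, beta)
    return sum(energy(p) * math.exp(-beta * energy(p))
               for p in permutations(range(n))) / z
```

At infinite temperature (beta = 0) every one of the n! permutations is equally likely and the mean inversion count is n(n-1)/4; cooling the system drives it toward the correct ordering, the simplest version of the ordering transitions the paper analyzes.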
A weighted U statistic for association analyses considering genetic heterogeneity.
Wei, Changshuai; Elston, Robert C; Lu, Qing
2016-07-20
Converging evidence suggests that common complex diseases with the same or similar clinical manifestations could have different underlying genetic etiologies. While current research interests have shifted toward uncovering rare variants and structural variations predisposing to human diseases, the impact of heterogeneity in genetic studies of complex diseases has been largely overlooked. Most of the existing statistical methods assume the disease under investigation has a homogeneous genetic effect and could, therefore, have low power if the disease undergoes heterogeneous pathophysiological and etiological processes. In this paper, we propose a heterogeneity-weighted U (HWU) method for association analyses considering genetic heterogeneity. HWU can be applied to various types of phenotypes (e.g., binary and continuous) and is computationally efficient for high-dimensional genetic data. Through simulations, we showed the advantage of HWU when the underlying genetic etiology of a disease was heterogeneous, as well as the robustness of HWU against different model assumptions (e.g., phenotype distributions). Using HWU, we conducted a genome-wide analysis of nicotine dependence from the Study of Addiction: Genetics and Environments dataset. The genome-wide analysis of nearly one million genetic markers took 7 h, identifying heterogeneous effects of two new genes (i.e., CYP3A5 and IKBKB) on nicotine dependence. Copyright © 2016 John Wiley & Sons, Ltd.
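The general shape of a similarity-weighted U statistic can be sketched as below. This is a schematic form, pairwise phenotype products weighted by genetic similarity, not the exact HWU weighting scheme:

```python
def weighted_u(phenotypes, similarity):
    """Schematic similarity-weighted U statistic: average over distinct subject
    pairs of genetic similarity times the product of centered phenotypes.
    (Illustrative form only; the published HWU uses its own weighting.)"""
    n = len(phenotypes)
    mean = sum(phenotypes) / n
    c = [y - mean for y in phenotypes]
    total = sum(similarity[i][j] * c[i] * c[j]
                for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))
```

A positive value indicates that genetically similar pairs tend to share phenotype values, which is the pattern such a statistic accumulates evidence for; because no likelihood is specified, the construction is robust to the phenotype-distribution assumptions the abstract mentions.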
Alcañiz-Zanón, Manuela; Mompart-Penina, Anna; Guillén-Estany, Montserrat; Medina-Bustos, Antonia; Aragay-Barbany, Josep M; Brugulat-Guiteras, Pilar; Tresserras-Gaju, Ricard
2014-01-01
This article presents the genesis of the Health Survey of Catalonia (Spain, 2010-2014) with its semiannual subsamples and explains the basic characteristics of its multistage sampling design. In comparison with previous surveys, the organizational advantages of this new statistical operation include rapid data availability and the ability to continuously monitor the population. The main benefits are timeliness in the production of indicators and the possibility of introducing new topics through the supplemental questionnaire as a function of needs. Limitations consist of the complexity of the sample design and the lack of longitudinal follow-up of the sample. Suitable sampling weights for each specific subsample are necessary for any statistical analysis of micro-data. Accuracy in the analysis of territorial disaggregation or population subgroups increases if annual samples are accumulated. Copyright © 2013 SESPAS. Published by Elsevier Espana. All rights reserved.
NASA Astrophysics Data System (ADS)
Trifoniuk, L. I.; Ushenko, Yu. A.; Sidor, M. I.; Minzer, O. P.; Gritsyuk, M. V.; Novakovskaya, O. Y.
2014-08-01
This work presents the results of an investigation of the diagnostic efficiency of a new azimuthally stable Mueller-matrix method for analyzing the coordinate distributions of laser autofluorescence in histological sections of biological tissues. A new model of generalized optical anisotropy of the protein networks of biological tissues is proposed in order to describe the processes of laser autofluorescence. The influence of the complex mechanisms of both phase anisotropy (linear birefringence and optical activity) and linear (circular) dichroism is taken into account. The interconnections between the azimuthally stable Mueller-matrix elements characterizing laser autofluorescence and the different mechanisms of optical anisotropy are determined. A statistical analysis of the coordinate distributions of these Mueller-matrix rotation invariants is proposed. On this basis, quantitative criteria (statistical moments of the 1st to the 4th order) are estimated for differentiating histological sections of uterine wall tumors: group 1 (dysplasia) and group 2 (adenocarcinoma).
Mwakanyamale, Kisa; Day-Lewis, Frederick D.; Slater, Lee D.
2013-01-01
Fiber-optic distributed temperature sensing (FO-DTS) is increasingly used to map zones of focused groundwater/surface-water exchange (GWSWE). Previous studies of GWSWE using FO-DTS involved identification of zones of focused GWSWE based on arbitrary cutoffs of FO-DTS time-series statistics (e.g., variance, cross-correlation between temperature and stage, or spectral power). New approaches are needed to extract more quantitative information from large, complex FO-DTS data sets while concurrently providing an assessment of uncertainty associated with mapping zones of focused GWSWE. Toward this end, we present a strategy combining discriminant analysis (DA) and spectral analysis (SA). We demonstrate the approach using field experimental data from a reach of the Columbia River adjacent to the Hanford 300 Area site. Results of the combined SA/DA approach are shown to be superior to previous results from qualitative interpretation of FO-DTS spectra alone.
NASA Astrophysics Data System (ADS)
Ushenko, Yu. O.; Pashkovskaya, N. V.; Marchuk, Y. F.; Dubolazov, O. V.; Savich, V. O.
2015-08-01
This work presents the results of an investigation of the diagnostic efficiency of a new azimuthally stable Mueller-matrix method for analyzing the coordinate distributions of laser autofluorescence in layers of biological liquids. A new model of generalized optical anisotropy of the protein networks of biological tissues is proposed in order to describe the processes of laser autofluorescence. The influence of the complex mechanisms of both phase anisotropy (linear birefringence and optical activity) and linear (circular) dichroism is taken into account. The interconnections between the azimuthally stable Mueller-matrix elements characterizing laser autofluorescence and the different mechanisms of optical anisotropy are determined. A statistical analysis of the coordinate distributions of these Mueller-matrix rotation invariants is proposed. On this basis, quantitative criteria (statistical moments of the 1st to the 4th order) are estimated for differentiating polycrystalline layers of human urine, in order to diagnose and differentiate cholelithiasis with underlying chronic cholecystitis (group 1) and type II diabetes mellitus (group 2).
Enhanced Higgs boson to τ(+)τ(-) search with deep learning.
Baldi, P; Sadowski, P; Whiteson, D
2015-03-20
The Higgs boson is thought to provide the interaction that imparts mass to the fundamental fermions, but while measurements at the Large Hadron Collider (LHC) are consistent with this hypothesis, current analysis techniques lack the statistical power to cross the traditional 5σ significance barrier without more data. Deep learning techniques have the potential to increase the statistical power of this analysis by automatically learning complex, high-level data representations. In this work, deep neural networks are used to detect the decay of the Higgs boson to a pair of tau leptons. A Bayesian optimization algorithm is used to tune the network architecture and training algorithm hyperparameters, resulting in a deep network of eight nonlinear processing layers that improves upon the performance of shallow classifiers even without the use of features specifically engineered by physicists for this application. The improvement in discovery significance is equivalent to an increase in the accumulated data set of 25%.
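The stated equivalence, an improvement worth 25% more accumulated data, translates into a significance gain under the common assumption that discovery significance grows as the square root of the accumulated data set:

```python
import math

def significance_gain(extra_data_fraction):
    """Multiplicative gain in discovery significance, assuming the common
    scaling of significance with the square root of the accumulated data."""
    return math.sqrt(1.0 + extra_data_fraction)

# The reported improvement is stated to be equivalent to 25% more data:
gain = significance_gain(0.25)  # about a 12% larger significance
```

Under this scaling, a deep network that matches 25% more data multiplies an existing significance by roughly 1.12, e.g. pushing a 4.5-sigma result to about 5 sigma without collecting additional collisions.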
NASA Astrophysics Data System (ADS)
Ushenko, Yuriy A.; Koval, Galina D.; Ushenko, Alexander G.; Dubolazov, Olexander V.; Ushenko, Vladimir A.; Novakovskaia, Olga Yu.
2016-07-01
This research presents investigation results of the diagnostic efficiency of an azimuthally stable Mueller-matrix method of analysis of laser autofluorescence of polycrystalline films of dried uterine cavity peritoneal fluid. A model of the generalized optical anisotropy of films of dried peritoneal fluid is proposed in order to define the processes of laser autofluorescence. The influence of complex mechanisms of both phase (linear and circular birefringence) and amplitude (linear and circular dichroism) anisotropies is taken into consideration. The interconnections between the azimuthally stable Mueller-matrix elements characterizing laser autofluorescence and different mechanisms of optical anisotropy are determined. The statistical analysis of coordinate distributions of such Mueller-matrix rotation invariants is proposed. On this basis, quantitative criteria (statistical moments of the first to the fourth order) are determined for differentiating polycrystalline films of dried peritoneal fluid: group 1 (healthy donors) and group 2 (uterine endometriosis patients).
NASA Astrophysics Data System (ADS)
Ushenko, A. G.; Dubolazov, O. V.; Ushenko, Vladimir A.; Ushenko, Yu. A.; Sakhnovskiy, M. Yu.; Prydiy, O. G.; Lakusta, I. I.; Novakovskaya, O. Yu.; Melenko, S. R.
2016-12-01
This research presents the results of an investigation of the diagnostic efficiency of a new azimuthally stable Mueller-matrix method for analyzing the coordinate distributions of laser autofluorescence in dried polycrystalline films of uterine cavity peritoneal fluid. A new model of generalized optical anisotropy of the protein networks of biological tissues is proposed in order to describe the processes of laser autofluorescence. The influence of the complex mechanisms of both phase anisotropy (linear birefringence and optical activity) and linear (circular) dichroism is taken into account. The interconnections between the azimuthally stable Mueller-matrix elements characterizing laser autofluorescence and the different mechanisms of optical anisotropy are determined. A statistical analysis of the coordinate distributions of these Mueller-matrix rotation invariants is proposed. On this basis, quantitative criteria (statistical moments of the 1st to the 4th order) are estimated for differentiating dried polycrystalline films of peritoneal fluid: group 1 (healthy donors) and group 2 (uterine endometriosis patients).
Sampling and Data Analysis for Environmental Microbiology
DOE Office of Scientific and Technical Information (OSTI.GOV)
Murray, Christopher J.
2001-06-01
A brief review of the literature indicates the importance of statistical analysis in applied and environmental microbiology. Sampling designs are particularly important for successful studies, and it is highly recommended that researchers review their sampling design before heading to the laboratory or the field. Most statisticians have numerous stories of scientists who approached them after their study was complete only to have to tell them that the data they gathered could not be used to test the hypothesis they wanted to address. Once the data are gathered, a large and complex body of statistical techniques is available for analysis of the data. Those methods include both numerical and graphical techniques for exploratory characterization of the data. Hypothesis testing and analysis of variance (ANOVA) are techniques that can be used to compare the mean and variance of two or more groups of samples. Regression can be used to examine the relationships between sets of variables and is often used to examine the dependence of microbiological populations on microbiological parameters. Multivariate statistics provides several methods that can be used for interpretation of datasets with a large number of variables and to partition samples into similar groups, a task that is very common in taxonomy, but also has applications in other fields of microbiology. Geostatistics and other techniques have been used to examine the spatial distribution of microorganisms. The objectives of this chapter are to provide a brief survey of some of the statistical techniques that can be used for sample design and data analysis of microbiological data in environmental studies, and to provide some examples of their use from the literature.
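As an example of the ANOVA technique mentioned above, a one-way F statistic comparing measurements across sampling groups can be computed from scratch. The data below are hypothetical microbial counts, not from the chapter:

```python
import statistics

def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square divided by
    within-group mean square, for two or more groups of measurements."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = statistics.fmean([x for g in groups for x in g])
    ss_between = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups)
    ss_within = 0.0
    for g in groups:
        m = statistics.fmean(g)
        ss_within += sum((x - m) ** 2 for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical microbial counts (log CFU) at three sampling sites.
sites = [[5.1, 5.3, 5.0, 5.2], [6.0, 6.2, 6.1, 5.9], [5.5, 5.4, 5.6, 5.7]]
```

A large F means the site means differ by more than the within-site scatter would explain; in a full analysis it would be compared against an F-distribution critical value for the appropriate degrees of freedom.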
[Factor Analysis: Principles to Evaluate Measurement Tools for Mental Health].
Campo-Arias, Adalberto; Herazo, Edwin; Oviedo, Heidi Celina
2012-09-01
The validation of a measurement tool in mental health is a complex process that usually starts by estimating reliability, to later approach its validity. Factor analysis is a way to know the number of dimensions, domains or factors of a measuring tool, generally related to the construct validity of the scale. The analysis could be exploratory or confirmatory, and helps in the selection of the items with better performance. For an acceptable factor analysis, it is necessary to follow some steps and recommendations, conduct some statistical tests, and rely on a proper sample of participants. Copyright © 2012 Asociación Colombiana de Psiquiatría. Publicado por Elsevier España. All rights reserved.
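One common step in choosing the number of factors is the Kaiser criterion: retain factors whose correlation-matrix eigenvalues exceed 1. This is one of several conventional options, not a rule taken from the article, and the correlation matrix below is hypothetical:

```python
import numpy as np

def kaiser_count(corr):
    """Number of factors to retain by the Kaiser criterion: count the
    eigenvalues of the item correlation matrix that exceed 1."""
    eigvals = np.linalg.eigvalsh(np.asarray(corr, dtype=float))
    return int((eigvals > 1.0).sum())

# Hypothetical correlation matrix for a 4-item scale: items 1-2 and items 3-4
# correlate strongly within pairs and weakly across pairs.
R = [[1.0, 0.8, 0.1, 0.1],
     [0.8, 1.0, 0.1, 0.1],
     [0.1, 0.1, 1.0, 0.8],
     [0.1, 0.1, 0.8, 1.0]]
```

For this matrix the eigenvalues work out to {2.0, 1.6, 0.2, 0.2}, so the criterion suggests two factors, matching the two correlated item pairs; an exploratory factor analysis would then estimate loadings for those two dimensions.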
A Guided Tour of Mathematical Methods for the Physical Sciences
NASA Astrophysics Data System (ADS)
Snieder, Roel; van Wijk, Kasper
2015-05-01
1. Introduction; 2. Dimensional analysis; 3. Power series; 4. Spherical and cylindrical coordinates; 5. Gradient; 6. Divergence of a vector field; 7. Curl of a vector field; 8. Theorem of Gauss; 9. Theorem of Stokes; 10. The Laplacian; 11. Scale analysis; 12. Linear algebra; 13. Dirac delta function; 14. Fourier analysis; 15. Analytic functions; 16. Complex integration; 17. Green's functions: principles; 18. Green's functions: examples; 19. Normal modes; 20. Potential-field theory; 21. Probability and statistics; 22. Inverse problems; 23. Perturbation theory; 24. Asymptotic evaluation of integrals; 25. Conservation laws; 26. Cartesian tensors; 27. Variational calculus; 28. Epilogue on power and knowledge.
An architecture for a brain-image database
NASA Technical Reports Server (NTRS)
Herskovits, E. H.
2000-01-01
The widespread availability of methods for noninvasive assessment of brain structure has enabled researchers to investigate neuroimaging correlates of normal aging, cerebrovascular disease, and other processes; we designate such studies as image-based clinical trials (IBCTs). We propose an architecture for a brain-image database, which integrates image processing and statistical operators, and thus supports the implementation and analysis of IBCTs. The implementation of this architecture is described and results from the analysis of image and clinical data from two IBCTs are presented. We expect that systems such as this will play a central role in the management and analysis of complex research data sets.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Jing; Ackerman, David M.; Lin, Victor S.-Y.
2013-04-02
Statistical mechanical modeling is performed of a catalytic conversion reaction within a functionalized nanoporous material to assess the effect of varying the reaction product-pore interior interaction from attractive to repulsive. A strong enhancement in reactivity is observed not just due to the shift in reaction equilibrium towards completion but also due to enhanced transport within the pore resulting from reduced loading. The latter effect is strongest for highly restricted transport (single-file diffusion), and applies even for irreversible reactions. The analysis is performed utilizing a generalized hydrodynamic formulation of the reaction-diffusion equations which can reliably capture the complex interplay between reaction and restricted transport.
Statistical analysis of EGFR structures' performance in virtual screening
NASA Astrophysics Data System (ADS)
Li, Yan; Li, Xiang; Dong, Zigang
2015-11-01
In this work the ability of EGFR structures to distinguish true inhibitors from decoys in docking and MM-PBSA is assessed by statistical procedures. The docking performance depends critically on the receptor conformation and bound state. The enrichment of known inhibitors is well correlated with the difference between EGFR structures rather than the bound-ligand property. The optimal structures for virtual screening can be selected based purely on the complex information. And the mixed combination of distinct EGFR conformations is recommended for ensemble docking. In MM-PBSA, a variety of EGFR structures have identically good performance in the scoring and ranking of known inhibitors, indicating that the choice of the receptor structure has little effect on the screening.
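Enrichment of known inhibitors over decoys in such screens is commonly summarized by the ROC AUC. A minimal sketch using the rank-sum identity, with hypothetical scores oriented so that higher means better (raw docking energies would first be sign-flipped):

```python
def roc_auc(active_scores, decoy_scores):
    """ROC AUC via the rank-sum identity: the probability that a randomly
    chosen active scores higher than a randomly chosen decoy (ties count half)."""
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))

# Hypothetical screening scores (higher = better); not from the EGFR study.
actives = [9.2, 8.7, 8.1, 7.9]
decoys = [8.0, 7.5, 7.2, 6.9, 6.4]
```

An AUC of 0.5 is chance-level separation and 1.0 is perfect ranking of actives above decoys; computing this per receptor structure is one way to quantify the structure-dependent docking performance the abstract describes.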
[Evaluation of using statistical methods in selected national medical journals].
Sych, Z
1996-01-01
The paper evaluates the frequency with which statistical methods were applied in works published in six selected national medical journals in the years 1988-1992. The following journals were chosen for analysis: Klinika Oczna, Medycyna Pracy, Pediatria Polska, Polski Tygodnik Lekarski, Roczniki Państwowego Zakładu Higieny, and Zdrowie Publiczne. A number of works matching the average of the remaining journals was randomly selected from the respective volumes of Pol. Tyg. Lek. Works in which no statistical analysis was implemented were excluded, for both national and international publications; the exclusion also covered review papers, case reports, reviews of books, handbooks, monographs, reports from scientific congresses, and papers on historical topics. The number of works was determined in each volume. Next, the mode of obtaining a suitable sample in the respective studies was established, distinguishing two categories: random and targeted selection. Attention was also paid to the presence of a control sample in the individual works, and to the completeness of the sample characteristics, using three categories: complete, partial, and lacking. The results of the evaluation are presented in tables and figures (Tab. 1, 3). The analysis established the rate at which statistical methods were employed in the relevant volumes of the six journals for the years 1988-1992, while also determining the number of works in which no statistical methods were used and the frequency with which the individual statistical methods were applied.
Prominence was given to fundamental methods of descriptive statistics (measures of position, measures of dispersion) as well as to the most important methods of mathematical statistics, such as parametric tests of significance, analysis of variance (in single and dual classifications), non-parametric tests of significance, correlation, and regression. Works that used multiple correlation, multiple regression, or more complex methods of studying the relationship between two or more variables were counted among the works whose statistical methods comprised correlation and regression, as were other methods, e.g., statistical methods used in epidemiology (coefficients of incidence and morbidity, standardization of coefficients, survival tables), factor analysis by the Jacobi-Hotelling method, taxonomic methods, and others. On the basis of the performed studies it was established that statistical methods were employed in 61.1-66.0% of the analyzed works in the six selected national medical journals in the years 1988-1992 (Tab. 3), generally similar to the frequency reported for English-language medical journals. On the whole, no significant differences were found in the frequency of the applied statistical methods (Tab. 4) or in the frequency of random sample selection (Tab. 3) across the respective years 1988-1992. The most frequently used statistical methods in the analyzed works for 1988-1992 were measures of position (44.2-55.6%), measures of dispersion (32.5-38.5%), and parametric tests of significance (26.3-33.1% of the works analyzed) (Tab. 4). To increase the frequency and reliability of the statistical methods used, the teaching of biostatistics should be expanded in medical studies and in postgraduate training for physicians and scientific-didactic staff.
Preparing and Presenting Effective Research Posters
Miller, Jane E
2007-01-01
Objectives Posters are a common way to present results of a statistical analysis, program evaluation, or other project at professional conferences. Often, researchers fail to recognize the unique nature of the format, which is a hybrid of a published paper and an oral presentation. This methods note demonstrates how to design research posters to convey study objectives, methods, findings, and implications effectively to varied professional audiences. Methods A review of existing literature on research communication and poster design is used to identify and demonstrate important considerations for poster content and layout. Guidelines on how to write about statistical methods, results, and statistical significance are illustrated with samples of ineffective writing annotated to point out weaknesses, accompanied by concrete examples and explanations of improved presentation. A comparison of the content and format of papers, speeches, and posters is also provided. Findings Each component of a research poster about a quantitative analysis should be adapted to the audience and format, with complex statistical results translated into simplified charts, tables, and bulleted text to convey findings as part of a clear, focused story line. Conclusions Effective research posters should be designed around two or three key findings with accompanying handouts and narrative description to supply additional technical detail and encourage dialog with poster viewers. PMID:17355594
A weighted U-statistic for genetic association analyses of sequencing data.
Wei, Changshuai; Li, Ming; He, Zihuai; Vsevolozhskaya, Olga; Schaid, Daniel J; Lu, Qing
2014-12-01
With advancements in next-generation sequencing technology, a massive amount of sequencing data is generated, which offers a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, the high-dimensional sequencing data poses a great challenge for statistical analysis. Association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU-SEQ, for the high-dimensional association analysis of sequencing data. Based on a nonparametric U-statistic, WU-SEQ makes no assumption about the underlying disease model and phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU-SEQ outperformed a commonly used sequence kernel association test (SKAT) method when the underlying assumptions were violated (e.g., the phenotype followed a heavy-tailed distribution). Even when the assumptions were satisfied, WU-SEQ still attained comparable performance to SKAT. Finally, we applied WU-SEQ to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL4 and very low-density lipoprotein cholesterol. © 2014 WILEY PERIODICALS, INC.
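The pairwise form underlying a weighted U-statistic can be sketched as follows. This is a toy illustration of the general shape U = Σ_{i<j} w_ij K(y_i, y_j), where weights come from genotype similarity and the kernel from phenotype concordance; both choices below are hypothetical and are not the published WU-SEQ implementation.

```python
import numpy as np

def weighted_u(genotypes, phenotypes):
    """Toy weighted U-statistic: average over subject pairs of a
    genotype-similarity weight times a phenotype-concordance kernel.
    Both the weight and the kernel are illustrative, not WU-SEQ's."""
    n, _ = genotypes.shape
    ranks = np.argsort(np.argsort(phenotypes))  # rank-based, distribution-free
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            w = np.mean(genotypes[i] == genotypes[j])     # genotype similarity
            k = 1.0 - abs(ranks[i] - ranks[j]) / (n - 1)  # phenotype concordance
            total += w * k
            pairs += 1
    return total / pairs
```

Subjects with similar genotypes and similar phenotype ranks push the statistic up, which is the signal a permutation test would then assess.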
Santori, G; Andorno, E; Morelli, N; Casaccia, M; Bottino, G; Di Domenico, S; Valente, U
2009-05-01
In many Western countries a "minimum volume rule" policy has been adopted as a quality measure for complex surgical procedures. In Italy, the National Transplant Centre set the minimum number of orthotopic liver transplantation (OLT) procedures/y at 25/center. OLT procedures performed in a single center over a reasonably long period may be treated as a time series to evaluate trend, seasonal cycles, and nonsystematic fluctuations. Between January 1, 1987 and December 31, 2006, we performed 563 cadaveric-donor OLTs in adult recipients. During 2007, there were another 28 procedures. The greatest numbers of OLTs/y were performed in 2001 (n = 51), 2005 (n = 50), and 2004 (n = 49). A time series analysis performed using R (R Foundation for Statistical Computing, Vienna, Austria), a free software environment for statistical computing and graphics, showed an incremental trend after exponential smoothing as well as after seasonal decomposition. The number of OLTs/mo predicted for 2007 by Holt-Winters exponential smoothing applied to the period 1987-2006 helped to identify the months with the largest differences between predicted and performed procedures. The time series approach may be helpful in establishing a minimum volume/y at the single-center level.
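The forecasting step can be illustrated with a minimal hand-rolled version of Holt's linear exponential smoothing (the trend component of Holt-Winters without the seasonal term; the abstract's analysis used R's HoltWinters, and the smoothing parameters below are arbitrary illustrative values):

```python
def holt_forecast(y, alpha=0.5, beta=0.3, horizon=1):
    """Holt's linear exponential smoothing: recursive level/trend
    updates over the series, then straight-line extrapolation."""
    level, trend = y[0], y[1] - y[0]      # initialize from first two points
    for t in range(1, len(y)):
        prev_level = level
        level = alpha * y[t] + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + (k + 1) * trend for k in range(horizon)]
```

On a series with a steady trend, e.g. monthly procedure counts, the forecast extends the fitted level and slope, and the gap between forecast and realized counts flags unusual months.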
Approximate median regression for complex survey data with skewed response.
Fraser, Raphael André; Lipsitz, Stuart R; Sinha, Debajyoti; Fitzmaurice, Garrett M; Pan, Yi
2016-12-01
The ready availability of public-use data from various large national complex surveys has immense potential for the assessment of population characteristics using regression models. Complex surveys can be used to identify risk factors for important diseases such as cancer. Existing statistical methods based on estimating equations and/or resampling are often not valid with survey data due to complex survey design features, that is, stratification, multistage sampling, and weighting. In this article, we accommodate these design features in the analysis of highly skewed response variables arising from large complex surveys. Specifically, we propose a double-transform-both-sides (DTBS)-based estimating equations approach to estimate the median regression parameters of the highly skewed response; the DTBS approach applies the same Box-Cox type transformation twice to both the outcome and the regression function. The usual sandwich variance estimate can be used in our approach, whereas a resampling approach would be needed for a pseudo-likelihood based on minimizing absolute deviations (MAD). Furthermore, the approach is relatively robust to the true underlying distribution, and has much smaller mean square error than a MAD approach. The method is motivated by an analysis of laboratory data on urinary iodine (UI) concentration from the National Health and Nutrition Examination Survey. © 2016, The International Biometric Society.
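The reason a transform-both-sides approach suits median (rather than mean) regression is that any monotone transform, Box-Cox included, commutes with the median. A quick numerical check of this property, using an arbitrary λ and simulated skewed data unrelated to the paper's fitted values:

```python
import numpy as np

def boxcox(v, lam):
    """Box-Cox power transform; reduces to log when lam == 0."""
    v = np.asarray(v, float)
    return np.log(v) if lam == 0 else (v**lam - 1.0) / lam

rng = np.random.default_rng(42)
y = rng.lognormal(mean=1.0, sigma=1.2, size=10001)  # heavily skewed, positive

# With an odd sample size the median is an order statistic, so a strictly
# increasing transform commutes with it: median(g(Y)) == g(median(Y)).
lhs = np.median(boxcox(y, 0.5))
rhs = boxcox(np.median(y), 0.5)
assert abs(lhs - rhs) < 1e-10
```

The mean has no such invariance, which is why transform-both-sides estimating equations target the median for skewed survey outcomes.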
Armour, Sean M.; Bennett, Eric J.; Braun, Craig R.; Zhang, Xiao-Yong; McMahon, Steven B.; Gygi, Steven P.; Harper, J. Wade
2013-01-01
Although many functions and targets have been attributed to the histone and protein deacetylase SIRT1, a comprehensive analysis of SIRT1 binding proteins yielding a high-confidence interaction map has not been established. Using a comparative statistical analysis of binding partners, we have assembled a high-confidence SIRT1 interactome. Employing this method, we identified the deubiquitinating enzyme ubiquitin-specific protease 22 (USP22), a component of the deubiquitinating module (DUBm) of the SAGA transcriptional coactivating complex, as a SIRT1-interacting partner. We found that this interaction is highly specific, requires the ZnF-UBP domain of USP22, and is disrupted by the inactivating H363Y mutation within SIRT1. Moreover, we show that USP22 is acetylated on multiple lysine residues and that alteration of a single lysine (K129) within the ZnF-UBP domain is sufficient to alter interaction of the DUBm with the core SAGA complex. Furthermore, USP22-mediated recruitment of SIRT1 activity promotes the deacetylation of individual SAGA complex components. Our results indicate an important role of SIRT1-mediated deacetylation in regulating the formation of DUBm subcomplexes within the larger SAGA complex. PMID:23382074
An Observation-Driven Agent-Based Modeling and Analysis Framework for C. elegans Embryogenesis.
Wang, Zi; Ramsey, Benjamin J; Wang, Dali; Wong, Kwai; Li, Husheng; Wang, Eric; Bao, Zhirong
2016-01-01
With cutting-edge live microscopy and image analysis, biologists can now systematically track individual cells in complex tissues and quantify cellular behavior over extended time windows. Computational approaches that utilize the systematic and quantitative data are needed to understand how cells interact in vivo to give rise to the different cell types and 3D morphology of tissues. An agent-based, minimum descriptive modeling and analysis framework is presented in this paper to study C. elegans embryogenesis. The framework is designed to incorporate the large amounts of experimental observations on cellular behavior and reserve data structures/interfaces that allow regulatory mechanisms to be added as more insights are gained. Observed cellular behaviors are organized into lineage identity, timing and direction of cell division, and path of cell movement. The framework also includes global parameters such as the eggshell and a clock. Division and movement behaviors are driven by statistical models of the observations. Data structures/interfaces are reserved for gene list, cell-cell interaction, cell fate and landscape, and other global parameters until the descriptive model is replaced by a regulatory mechanism. This approach provides a framework to handle the ongoing experiments of single-cell analysis of complex tissues where mechanistic insights lag data collection and need to be validated on complex observations.
NASA Astrophysics Data System (ADS)
Charakopoulos, A. K.; Katsouli, G. A.; Karakasidis, T. E.
2018-04-01
Understanding the underlying processes and extracting detailed characteristics of the spatiotemporal dynamics of the ocean and atmosphere, as well as their interaction, is of significant interest and has not yet been thoroughly established. The purpose of this study was to examine the performance of two complementary methodologies for identifying underlying spatiotemporal dynamic characteristics and patterns among atmospheric and oceanic variables from Seawatch buoys in the Aegean and Ionian Seas, provided by the Hellenic Center for Marine Research (HCMR). The first approach involves cross-correlation analysis to investigate time-lagged relationships; further, to identify the direction of interactions between the variables, we applied the Granger causality method. In the second approach, the time series are converted into complex networks and the main topological network properties, such as degree distribution, average path length, diameter, modularity, and clustering coefficient, are evaluated. Our results show that the proposed complex-network analysis of time series can lead to the extraction of hidden spatiotemporal characteristics. Our findings also indicate high levels of positive and negative correlations and causalities among variables, both within the same buoy and between buoys at different stations, which cannot be determined using simple statistical measures.
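One common way to convert multivariate time series into a network is a correlation network: each series becomes a node, and an edge is drawn when the absolute Pearson correlation exceeds a threshold. The sketch below uses an illustrative threshold of my own choosing, not the study's; the topological properties the abstract lists (degree distribution, clustering, etc.) would then be computed on the resulting graph.

```python
import numpy as np

def correlation_network(series, threshold=0.7):
    """Build an undirected correlation network from the rows of `series`
    (one time series per row). Returns the boolean adjacency matrix and
    the node degrees. The threshold is an illustrative choice."""
    r = np.corrcoef(series)
    n = r.shape[0]
    adj = (np.abs(r) >= threshold) & ~np.eye(n, dtype=bool)
    return adj, adj.sum(axis=1)
```

Two nearly identical signals end up connected, while an unrelated noise signal stays isolated, so the degree sequence already separates coherent from incoherent stations.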
Messai, Habib; Farman, Muhammad; Sarraj-Laabidi, Abir; Hammami-Semmar, Asma; Semmar, Nabil
2016-11-17
Olive oils (OOs) show high chemical variability due to several factors of genetic, environmental and anthropic types. Genetic and environmental factors are responsible for natural compositions and polymorphic diversification resulting in different varietal patterns and phenotypes. Anthropic factors, however, are at the origin of different blends' preparation leading to normative, labelled or adulterated commercial products. Control of complex OO samples requires their (i) characterization by specific markers; (ii) authentication by fingerprint patterns; and (iii) monitoring by traceability analysis. These quality control and management aims require the use of several multivariate statistical tools: specificity highlighting requires ordination methods; authentication checking calls for classification and pattern recognition methods; traceability analysis implies the use of network-based approaches able to separate or extract mixed information and memorized signals from complex matrices. This chapter presents a review of different chemometrics methods applied for the control of OO variability from metabolic and physical-chemical measured characteristics. The different chemometrics methods are illustrated by different study cases on monovarietal and blended OO originated from different countries. Chemometrics tools offer multiple ways for quantitative evaluations and qualitative control of complex chemical variability of OO in relation to several intrinsic and extrinsic factors.
Bailey, Denise C; Todt, Callie E; Orfield, Sarah E; Denney, Rachel D; Snapp, Isaac B; Negga, Rekek; Montgomery, Kara M; Bailey, Andrew C; Pressley, Aireal S; Traynor, Wendy L; Fitsanakis, Vanessa A
2016-09-01
Reports have linked human exposure to Mn/Zn ethylene-bis-dithiocarbamate (Mn/Zn-EBDC) fungicides with multiple pathologies, from dermatitis to central nervous system dysfunction. Although members of this family of agrochemicals have been available for over 50 years, their mechanism of toxicity in humans is still unclear. Since mitochondrial inhibition and oxidative stress are implicated in a wide variety of diseases, we hypothesized that Caenorhabditis elegans (C. elegans) exposed to a commercially-available formulation of an Mn/Zn-EBDC-containing fungicide (Manzate; MZ) would also show these endpoints. Thus, worms were treated chronically (24 h) with various MZ concentrations and assayed for reduced mitochondrial function and increased levels of reactive oxygen species (ROS). Oxygen consumption studies suggested Complex I inhibition in all treatment groups compared to controls (**p < 0.01). In order to verify these findings, assays specific for Complex II or Complex IV activity were also completed. Data analysis from these studies indicated that neither complex was adversely affected by MZ treatment. Additional data from ATP assays indicated a statistically significant decrease (***p < 0.001) in ATP levels in all treatment groups when compared to control worms. Further studies were completed to determine if exposure of C. elegans to MZ also resulted in increased ROS concentrations. Studies demonstrated that hydrogen peroxide, but not superoxide or hydroxyl radical, levels were statistically significantly increased (*p < 0.05). Since hydrogen peroxide is known to up-regulate glutathione-S-transferase (GST), we used a GST:green fluorescent protein transgenic worm strain to test this hypothesis. Results from these studies indicated a statistically significant increase (***p < 0.001) in green pixel number following MZ exposure. Taken together, these data indicate that C.
elegans treated with MZ concentrations to which humans are exposed show mitochondrial Complex I inhibition with concomitant hydrogen peroxide production. Since these mechanisms are associated with numerous human diseases, we suggest further studies to determine if MZ exposure induces similar toxic mechanisms in mammals. Copyright © 2016 Elsevier B.V. All rights reserved.
Verification of forecast ensembles in complex terrain including observation uncertainty
NASA Astrophysics Data System (ADS)
Dorninger, Manfred; Kloiber, Simon
2017-04-01
Traditionally, verification means verifying a forecast (ensemble) against the truth as represented by observations. Observation errors are quite often neglected, on the argument that they are small compared to the forecast error. In this study, part of the MesoVICT (Mesoscale Verification Inter-comparison over Complex Terrain) project, it will be shown that observation errors have to be taken into account for verification purposes. The observation uncertainty is estimated from the VERA (Vienna Enhanced Resolution Analysis) and represented via two analysis ensembles, which are compared to the forecast ensemble. Throughout the study, results from COSMO-LEPS provided by Arpae-SIMC Emilia-Romagna are used as the forecast ensemble. The time period covers the MesoVICT core case from 20-22 June 2007. In a first step, all ensembles are investigated concerning their distribution. Several tests were executed (Kolmogorov-Smirnov test, Finkelstein-Schafer test, chi-square test, etc.), none of which identified an exact mathematical distribution, so the main focus is on non-parametric statistics (e.g., kernel density estimation, boxplots) and on the deviation between "forced" normally distributed data and the kernel density estimates. In a next step, the observational deviations captured by the analysis ensembles are analysed. In a first approach, scores are calculated multiple times, with each member of the analysis ensemble in turn regarded as the "true" observation. The results are presented as boxplots for the different scores and parameters. Additionally, the bootstrapping method is applied to the ensembles. These possible approaches to incorporating observational uncertainty into the computation of statistics will be discussed in the talk.
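The bootstrapping step can be sketched as a percentile bootstrap over per-case verification scores; the score being resampled, the interval level, and the replicate count below are illustrative choices, not the study's actual configuration.

```python
import numpy as np

def bootstrap_ci(scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a mean verification
    score: resample cases with replacement, recompute the mean each
    time, and take the empirical (alpha/2, 1 - alpha/2) quantiles."""
    rng = np.random.default_rng(seed)
    means = np.array([
        rng.choice(scores, size=len(scores), replace=True).mean()
        for _ in range(n_boot)
    ])
    return np.quantile(means, alpha / 2), np.quantile(means, 1 - alpha / 2)
```

Repeating this with each analysis-ensemble member as the "truth" yields a spread of intervals that reflects observation uncertainty on top of sampling uncertainty.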
Ge, Long; Tian, Jin-hui; Li, Xiu-xia; Song, Fujian; Li, Lun; Zhang, Jun; Li, Ge; Pei, Gai-qin; Qiu, Xia; Yang, Ke-hu
2016-01-01
Because of the methodological complexity of network meta-analyses (NMAs), NMAs may be more vulnerable to methodological risks than conventional pair-wise meta-analysis. Our study aims to investigate epidemiological characteristics, conduct of the literature search, methodological quality, and reporting of the statistical analysis process in the field of cancer, based on the PRISMA extension statement and a modified AMSTAR checklist. We identified and included 102 NMAs in the field of cancer. 61 NMAs were conducted using a Bayesian framework; of these, more than half (60.66%) did not report an assessment of convergence. Inconsistency was assessed in 27.87% of NMAs. Assessment of heterogeneity in traditional meta-analyses was more common (42.62%) than in NMAs (6.56%). Most NMAs did not report an assessment of similarity (86.89%) and did not use the GRADE tool to assess quality of evidence (95.08%). 43 NMAs were adjusted indirect comparisons; the methods used were described in 53.49% of them. Only 4.65% of NMAs described the details of handling multi-group trials, and 6.98% described the methods of similarity assessment. The median total AMSTAR score was 8.00 (IQR: 6.00-8.25). Methodological quality and reporting of statistical analysis did not substantially differ by selected general characteristics. Overall, the quality of NMAs in the field of cancer was generally acceptable. PMID:27848997
Separate-channel analysis of two-channel microarrays: recovering inter-spot information.
Smyth, Gordon K; Altman, Naomi S
2013-05-26
Two-channel (or two-color) microarrays are cost-effective platforms for comparative analysis of gene expression. They are traditionally analysed in terms of the log-ratios (M-values) of the two channel intensities at each spot, but this analysis does not use all the information available in the separate channel observations. Mixed models have been proposed to analyse intensities from the two channels as separate observations, but such models can be complex to use and the gain in efficiency over the log-ratio analysis is difficult to quantify. Mixed models yield test statistics for which the null distributions can be specified only approximately, and some approaches do not borrow strength between genes. This article reformulates the mixed model to clarify the relationship with the traditional log-ratio analysis, to facilitate information borrowing between genes, and to obtain an exact distributional theory for the resulting test statistics. The mixed model is transformed to operate on the M-values and A-values (average log-expression for each spot) instead of on the log-expression values. The log-ratio analysis is shown to ignore information contained in the A-values. The relative efficiency of the log-ratio analysis is shown to depend on the size of the intraspot correlation. A new separate channel analysis method is proposed that assumes a constant intra-spot correlation coefficient across all genes. This approach permits the mixed model to be transformed into an ordinary linear model, allowing the data analysis to use a well-understood empirical Bayes analysis pipeline for linear modeling of microarray data. This yields statistically powerful test statistics that have an exact distributional theory. The log-ratio, mixed model and common correlation methods are compared using three case studies. The results show that separate channel analyses that borrow strength between genes are more powerful than log-ratio analyses.
The common correlation analysis is the most powerful of all. The common correlation method proposed in this article for separate-channel analysis of two-channel microarray data is no more difficult to apply in practice than the traditional log-ratio analysis. It provides an intuitive and powerful means to conduct analyses and make comparisons that might otherwise not be possible.
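The M- and A-value transform at the heart of this reformulation is simple to state: for red/green intensities R and G at a spot, M = log2 R − log2 G and A = (log2 R + log2 G)/2. A minimal sketch:

```python
import numpy as np

def ma_values(red, green):
    """Spot-wise M (log-ratio) and A (average log-intensity) for
    two-channel intensities; inputs are assumed background-corrected
    and strictly positive."""
    log_r, log_g = np.log2(red), np.log2(green)
    m = log_r - log_g            # differential expression signal
    a = (log_r + log_g) / 2.0    # overall spot brightness
    return m, a
```

The traditional analysis keeps only M; the separate-channel methods compared in the abstract recover the information carried by A as well.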
Interaction effects of metals and salinity on biodegradation of a complex hydrocarbon waste.
Amatya, Prasanna L; Hettiaratchi, Joseph Patrick A; Joshi, Ramesh C
2006-02-01
The presence of high levels of salts because of produced brine water disposal at flare pits and the presence of metals at sufficient concentrations to impact microbial activity are of concern to bioremediation of flare pit waste in the upstream oil and gas industry. Two slurry-phase biotreatment experiments based on three-level factorial statistical experimental design were conducted with a flare pit waste. The experiments separately studied the primary effect of cadmium [Cd(II)] and interaction effect between Cd(II) and salinity and the primary effect of zinc [Zn(II)] and interaction effect between Zn(II) and salinity on hydrocarbon biodegradation. The results showed 42-52.5% hydrocarbon removal in slurries spiked with Cd and 47-62.5% in the slurries spiked with Zn. The analysis of variance showed that the primary effects of Cd and Cd-salinity interaction were statistically significant on hydrocarbon degradation. The primary effects of Zn and the Zn-salinity interaction were statistically insignificant, whereas the quadratic effect of Zn was highly significant on hydrocarbon degradation. The study on effects of metallic chloro-complexes showed that the total aqueous concentration of Cd or Zn does not give a reliable indication of overall toxicity to the microbial activity in the presence of high salinity levels.
Ojeda, Álvaro Huerta; Ríos, Luis Chirosa; Barrilao, Rafael Guisado; Serrano, Pablo Cáceres
2016-01-01
[Purpose] The aim of this study was to determine the acute temporal effect of a complex training protocol on 30-meter sprint times. A secondary objective was to evaluate the fatigue indexes of military athletes. [Subjects and Methods] Seven military athletes were the subjects of this study. The variables measured were 30-meter sprint times and the average power and peak power of squats. The intervention session with complex training consisted of 4 sets of 5 repetitions at 30% 1RM + 4 repetitions at 60% 1RM + 3 repetitions of 30 meters, with 120-second rests. For the statistical analysis, repeated measures ANOVA was used, and for the post hoc analysis, Student's t-test was used. [Results] Times in 30-meter sprints showed a significant reduction between the control set and the four experimental sets, but the average power and peak power of squats did not show significant changes. [Conclusion] The results of the study show the acute positive effect of complex training, over time, on 30-meter sprints by military athletes. This effect is due to the post-activation potentiation of the lower-limb muscles in the 30-meter sprint. PMID:27134353
THESEUS: maximum likelihood superpositioning and analysis of macromolecular structures
Theobald, Douglas L.; Wuttke, Deborah S.
2008-01-01
Summary THESEUS is a command line program for performing maximum likelihood (ML) superpositions and analysis of macromolecular structures. While conventional superpositioning methods use ordinary least-squares (LS) as the optimization criterion, ML superpositions provide substantially improved accuracy by down-weighting variable structural regions and by correcting for correlations among atoms. ML superpositioning is robust and insensitive to the specific atoms included in the analysis, and thus it does not require subjective pruning of selected variable atomic coordinates. Output includes both likelihood-based and frequentist statistics for accurate evaluation of the adequacy of a superposition and for reliable analysis of structural similarities and differences. THESEUS performs principal components analysis for analyzing the complex correlations found among atoms within a structural ensemble. PMID:16777907
NASA Astrophysics Data System (ADS)
Gourdol, L.; Hissler, C.; Pfister, L.
2012-04-01
The Luxembourg sandstone aquifer is of major relevance for the national supply of drinking water in Luxembourg. The city of Luxembourg (20% of the country's population) gets almost 2/3 of its drinking water from this aquifer. As a consequence, the study of both the groundwater hydrochemistry, as well as its spatial and temporal variations, are considered as of highest priority. Since 2005, a monitoring network has been implemented by the Water Department of Luxembourg City, with a view to a more sustainable management of this strategic water resource. The data collected to date forms a large and complex dataset, describing spatial and temporal variations of many hydrochemical parameters. The data treatment issue is tightly connected to this kind of water monitoring programs and complex databases. Standard multivariate statistical techniques, such as principal components analysis and hierarchical cluster analysis, have been widely used as unbiased methods for extracting meaningful information from groundwater quality data and are now classically used in many hydrogeological studies, in particular to characterize temporal or spatial hydrochemical variations induced by natural and anthropogenic factors. But these classical multivariate methods deal with two-way matrices, usually parameters/sites or parameters/time, while often the dataset resulting from qualitative water monitoring programs should be seen as a datacube parameters/sites/time. Three-way matrices, such as the one we propose here, are difficult to handle and to analyse by classical multivariate statistical tools and thus should be treated with approaches dealing with three-way data structures. One possible analysis approach consists in the use of partial triadic analysis (PTA). The PTA was previously used with success in many ecological studies but never to date in the domain of hydrogeology. 
Applied to the dataset of the Luxembourg Sandstone aquifer, the PTA appears to be a promising new statistical instrument for hydrogeologists, in particular for characterizing temporal and spatial hydrochemical variations induced by natural and anthropogenic factors. This new approach to groundwater management offers potential for 1) identifying a common multivariate spatial structure, 2) disentangling the different hydrochemical patterns and explaining their controlling factors, and 3) analysing the temporal variability of this structure and grasping hydrochemical changes.
NASA Astrophysics Data System (ADS)
Stan Development Team
2018-01-01
Stan facilitates statistical inference at the frontiers of applied statistics and provides both a modeling language for specifying complex statistical models and a library of statistical algorithms for computing inferences with those models. These components are exposed through interfaces in environments such as R, Python, and the command line.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Xiaoying; Liu, Chongxuan; Hu, Bill X.
This study statistically analyzed a grain-size based additivity model that has been proposed to scale reaction rates and parameters from laboratory to field. The additivity model assumed that reaction properties in a sediment, including surface area, reactive site concentration, reaction rate, and extent, can be predicted from field-scale grain size distribution by linearly adding reaction properties for individual grain size fractions. This study focused on the statistical analysis of the additivity model with respect to reaction rate constants, using multi-rate uranyl (U(VI)) surface complexation reactions in a contaminated sediment as an example. Experimental data of rate-limited U(VI) desorption in a stirred flow-cell reactor were used to estimate the statistical properties of multi-rate parameters for individual grain size fractions. The statistical properties of the rate constants for the individual grain size fractions were then used to analyze the statistical properties of the additivity model to predict rate-limited U(VI) desorption in the composite sediment, and to evaluate the relative importance of individual grain size fractions to the overall U(VI) desorption. The results indicated that the additivity model provided a good prediction of U(VI) desorption in the composite sediment. However, the rate constants were not directly scalable using the additivity model, and U(VI) desorption in individual grain size fractions had to be simulated in order to apply the additivity model. An approximate additivity model for directly scaling rate constants was subsequently proposed and evaluated. The approximate model provided a good prediction of the experimental results within statistical uncertainty. This study also found that a gravel size fraction (2-8 mm), which is often ignored in modeling U(VI) sorption and desorption, is statistically significant to the U(VI) desorption in the sediment.
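The additivity model itself is a mass-weighted linear combination over grain-size fractions. A schematic version, where the fractions and per-fraction properties are made-up numbers, not the paper's fitted values:

```python
import numpy as np

def additive_prediction(mass_fractions, fraction_props):
    """Grain-size additivity model: composite-sediment reaction property
    predicted as the mass-weighted sum of per-fraction properties."""
    w = np.asarray(mass_fractions, float)
    if abs(w.sum() - 1.0) > 1e-9:
        raise ValueError("mass fractions must sum to 1")
    return float(np.dot(w, fraction_props))
```

The study's finding is that this linear form works for surface area and desorption extent, but rate constants need per-fraction simulation (or the proposed approximate model) rather than direct weighting.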
Kobayashi, Katsuhiro; Jacobs, Julia; Gotman, Jean
2013-01-01
Objective: A novel type of statistical time-frequency analysis was developed to elucidate changes of high-frequency EEG activity associated with epileptic spikes. Methods: The method uses the Gabor transform and detects changes of power relative to background activity using t-statistics that are controlled by the false discovery rate (FDR) to correct the type I error of multiple testing. The analysis was applied to EEGs recorded at 2000 Hz from three patients with mesial temporal lobe epilepsy. Results: A spike-related increase of high-frequency oscillations (HFOs) was clearly shown in the FDR-controlled t-spectra: it was most dramatic in spikes recorded from the hippocampus when the hippocampus was the seizure onset zone (SOZ). Depression of fast activity was observed immediately after the spikes, most consistently in the discharges from the hippocampal SOZ. It corresponded to the slow-wave part of spike-and-slow-wave complexes, but was noted even in spikes without apparent slow waves. In one patient, a gradual increase of power above 200 Hz preceded spikes. Conclusions: FDR-controlled t-spectra clearly detected the spike-related changes of HFOs that were unclear in standard power spectra. Significance: We developed a promising tool to study the HFOs that may be closely linked to the pathophysiology of epileptogenesis. PMID:19394892
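The FDR-control step this abstract relies on is commonly implemented with the Benjamini-Hochberg procedure. The sketch below applies it to a vector of p-values (such as would come from t-tests of time-frequency power against background); the p-values themselves are invented for illustration:

```python
# Minimal Benjamini-Hochberg FDR control over a set of p-values, as used to
# correct multiple testing in FDR-controlled t-spectra. Example p-values are
# invented; this is not the authors' implementation.
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of p-values significant at FDR level alpha."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Find the largest k with p_(k) <= (k/m)*alpha; reject all smaller ranks too.
    below = ranked <= (np.arange(1, m + 1) / m) * alpha
    mask = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        mask[order[: k + 1]] = True
    return mask

pvals = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
print(benjamini_hochberg(pvals, alpha=0.05))
```

In the time-frequency setting, each (time, frequency) bin contributes one p-value, and only bins surviving the FDR threshold are displayed in the t-spectrum.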
Analysis strategies for longitudinal attachment loss data.
Beck, J D; Elter, J R
2000-02-01
The purpose of this invited review is to describe and discuss methods currently in use to quantify the progression of attachment loss in epidemiological studies of periodontal disease, and to make recommendations for specific analytic methods based upon the particular design of the study and structure of the data. The review concentrates on the definition of incident attachment loss (ALOSS) and its component parts; measurement issues, including thresholds and regression to the mean; methods of accounting for longitudinal change, including changes in means, changes in proportions of affected sites, incidence density, the effect of tooth loss and reversals, and repeated events; statistical models of longitudinal change, including the incorporation of the time element, use of linear, logistic or Poisson regression or survival analysis, and statistical tests; site- vs person-level analysis, including statistical adjustment for correlated data; and the strengths and limitations of ALOSS data. Examples from the Piedmont 65+ Dental Study are used to illustrate specific concepts. We conclude that incidence density is the preferred methodology for periodontal studies with more than one period of follow-up, and that studies not employing methods for dealing with complex samples, correlated data, and repeated measures do not take advantage of our current understanding of the site- and person-level variables important in periodontal disease and may generate biased results.
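The incidence-density measure the review recommends is simply events divided by total at-risk time, pooled across persons (or sites). A minimal sketch, with invented follow-up data:

```python
# Hedged illustration of incidence density: total events per unit of total
# person-time (or site-time) at risk. The subjects and numbers are invented.

def incidence_density(events, time_at_risk):
    """Incidence density = total events / total person-time at risk."""
    total_time = sum(time_at_risk)
    if total_time <= 0:
        raise ValueError("time at risk must be positive")
    return sum(events) / total_time

# Three hypothetical subjects: ALOSS events and years of follow-up
events = [2, 0, 1]
years = [3.0, 1.5, 2.5]
rate = incidence_density(events, years)
print(rate)  # 3 events over 7.0 person-years
```

Unlike a simple before/after comparison, this formulation naturally handles unequal follow-up and more than one observation period per subject.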
Correcting for population structure and kinship using the linear mixed model: theory and extensions.
Hoffman, Gabriel E
2013-01-01
Population structure and kinship are widespread confounding factors in genome-wide association studies (GWAS). It has been standard practice to include principal components of the genotypes in a regression model in order to account for population structure. More recently, the linear mixed model (LMM) has emerged as a powerful method for simultaneously accounting for population structure and kinship. The statistical theory underlying the differences in empirical performance between modeling principal components as fixed versus random effects has not been thoroughly examined. We undertake an analysis to formalize the relationship between these widely used methods and elucidate the statistical properties of each. Moreover, we introduce a new statistic, effective degrees of freedom, that serves as a metric of model complexity and a novel low rank linear mixed model (LRLMM) to learn the dimensionality of the correction for population structure and kinship, and we assess its performance through simulations. A comparison of the results of LRLMM and a standard LMM analysis applied to GWAS data from the Multi-Ethnic Study of Atherosclerosis (MESA) illustrates how our theoretical results translate into empirical properties of the mixed model. Finally, the analysis demonstrates the ability of the LRLMM to substantially boost the strength of an association for HDL cholesterol in Europeans.
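The "effective degrees of freedom" statistic mentioned above can be illustrated in the simpler setting of ridge regression, where shrinkage reduces model complexity smoothly: edf(λ) = Σ dᵢ²/(dᵢ² + λ), with dᵢ the singular values of the design matrix. This is a generic sketch of the concept, not the paper's LRLMM implementation:

```python
# Effective degrees of freedom as the trace of the ridge hat matrix
# X (X'X + lam*I)^{-1} X'. Generic illustration with a random design matrix;
# not the LRLMM code from the paper.
import numpy as np

def effective_df(X, lam):
    """Trace of the ridge hat matrix, computed from singular values of X."""
    d = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(d**2 / (d**2 + lam)))

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))
print(effective_df(X, 0.0))    # no shrinkage: edf equals the 5 columns
print(effective_df(X, 100.0))  # heavy shrinkage: edf well below 5
```

Treating principal components as random effects in an LMM similarly shrinks their contribution, which is why the effective degrees of freedom of the correction lies between 0 and the number of components.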
Statistical analysis of magnetically soft particles in magnetorheological elastomers
NASA Astrophysics Data System (ADS)
Gundermann, T.; Cremer, P.; Löwen, H.; Menzel, A. M.; Odenbach, S.
2017-04-01
The physical properties of magnetorheological elastomers (MRE) are a complex issue and can be influenced and controlled in many ways, e.g. by applying a magnetic field, by external mechanical stimuli, or by an electric potential. In general, the response of MRE materials to these stimuli depends crucially on the distribution of the magnetic particles inside the elastomer. Specific knowledge of the interactions between particles or particle clusters is of high relevance for understanding the macroscopic rheological properties and provides an important input for theoretical calculations. In order to gain better insight into the correlation between macroscopic effects and the microstructure, and to generate a database for theoretical analysis, X-ray micro-computed tomography (X-μCT) investigations were carried out as a basis for a statistical analysis of the particle configurations. Different MREs with iron-powder contents of 2-15 wt% (0.27-2.3 vol%) and different allocations of the particles inside the matrix were prepared. The X-μCT results were processed with image-analysis software with regard to the geometrical properties of the particles, with and without the influence of an external magnetic field. Pair correlation functions for the positions of the particles inside the elastomer were calculated to statistically characterize the distributions of the particles in the samples.
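A radial pair correlation function g(r) of the kind used above can be estimated by histogramming inter-particle distances and normalizing against an ideal-gas expectation. The sketch below assumes a cubic box with periodic boundaries (minimum-image convention) and uses random test positions; it is not the authors' analysis code:

```python
# Minimal g(r) estimator for particle centres in a cubic box with periodic
# boundaries (minimum-image convention). Box size, bin width, and the random
# positions are assumptions for illustration only.
import numpy as np

def pair_correlation(positions, box_length, dr, r_max):
    """Estimate g(r) by comparing pair counts to the ideal-gas expectation."""
    n = len(positions)
    rho = n / box_length**3                      # number density
    edges = np.arange(0.0, r_max + dr, dr)
    counts = np.zeros(len(edges) - 1)
    for i in range(n):
        d = positions[i + 1:] - positions[i]
        d -= box_length * np.round(d / box_length)   # minimum image
        counts += np.histogram(np.linalg.norm(d, axis=1), bins=edges)[0]
    r = 0.5 * (edges[1:] + edges[:-1])
    shell_vol = 4.0 * np.pi * r**2 * dr          # ideal-gas shell volume
    expected = 0.5 * n * rho * shell_vol         # each pair counted once
    return r, counts / expected

rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 10.0, size=(500, 3))      # hypothetical particle centres
r, g = pair_correlation(pts, box_length=10.0, dr=0.5, r_max=4.0)
# For uniformly random points, g(r) should hover near 1 across these bins.
```

Deviations of g(r) from 1, such as the peaks produced by field-induced particle chains, are what statistically distinguish the differently prepared MRE samples.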
Nearfield Summary and Statistical Analysis of the Second AIAA Sonic Boom Prediction Workshop
NASA Technical Reports Server (NTRS)
Park, Michael A.; Nemec, Marian
2017-01-01
A summary is provided for the Second AIAA Sonic Boom Workshop held 8-9 January 2017 in conjunction with AIAA SciTech 2017. The workshop used three required models of increasing complexity: an axisymmetric body, a wing body, and a complete configuration with flow-through nacelle. An optional complete configuration with propulsion boundary conditions is also provided. These models are designed with similar nearfield signatures to isolate geometry and shock/expansion interaction effects. Eleven international participant groups submitted nearfield signatures with forces, pitching moment, and iterative convergence norms. Statistics and grid convergence of these nearfield signatures are presented. These submissions are propagated to the ground, and noise levels are computed, which allows the grid convergence and the statistical distribution of a noise level to be evaluated. While progress since the first workshop is documented, improvements to the analysis methods for a possible subsequent workshop are suggested. The complete configuration with flow-through nacelle showed the most dramatic improvement between the two workshops. The current workshop cases are more relevant to vehicles with lower loudness and have the potential for lower annoyance than the first workshop cases. The models for this workshop, with quieter ground noise levels than the first workshop, exposed weaknesses in analysis, particularly in convective discretization.
Wojcik, Pawel Jerzy; Pereira, Luís; Martins, Rodrigo; Fortunato, Elvira
2014-01-13
An efficient mathematical strategy in the field of solution-processed electrochromic (EC) films is outlined as a combination of experimental work, modeling, and information extraction from massive computational data via statistical software. Design of Experiment (DOE) was used for statistical multivariate analysis and prediction of mixtures through a multiple regression model, as well as the optimization of a five-component sol-gel precursor subjected to complex constraints. This approach significantly reduces the number of experiments to be realized, from 162 in the full factorial (L=3) and 72 in the extreme vertices (D=2) approach down to only 30 runs, while still maintaining a high accuracy of the analysis. Despite the finite number of experiments, the empirical model in this study shows reasonably good prediction ability in terms of overall EC performance. An optimized ink formulation was employed in a prototype of a passive EC matrix fabricated to test this optically active material system together with a solid-state electrolyte for prospective application in EC displays. Coupling DOE with chromogenic material formulation shows the potential to maximize the capabilities of these systems and ensures increased productivity in many potential solution-processed electrochemical applications.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mayer, B. P.; Valdez, C. A.; DeHope, A. J.
Critical to many modern forensic investigations is the chemical attribution of the origin of an illegal drug. This process relies heavily on identification of compounds indicative of its clandestine or commercial production. The results of these studies can yield detailed information on method of manufacture, sophistication of the synthesis operation, starting material source, and final product. In the present work, chemical attribution signatures (CAS) associated with the synthesis of the analgesic 3-methylfentanyl, N-(3-methyl-1-phenethylpiperidin-4-yl)-N-phenylpropanamide, were investigated. Six synthesis methods were studied in an effort to identify and classify route-specific signatures. These methods were chosen to minimize the use of scheduled precursors, complicated laboratory equipment, number of overall steps, and demanding reaction conditions. Using gas and liquid chromatographies combined with mass spectrometric methods (GC-QTOF and LC-QTOF) in conjunction with inductively coupled plasma mass spectrometry (ICP-MS), over 240 distinct compounds and elements were monitored. As seen in our previous work with CAS of fentanyl synthesis, the complexity of the resultant data matrix necessitated the use of multivariate statistical analysis. Using partial least squares discriminant analysis (PLS-DA), 62 statistically significant, route-specific CAS were identified. Statistical classification models using a variety of machine learning techniques were then developed with the ability to predict the method of 3-methylfentanyl synthesis from three blind crude samples generated by synthetic chemists without prior experience with these methods.
A powerful score-based test statistic for detecting gene-gene co-association.
Xu, Jing; Yuan, Zhongshang; Ji, Jiadong; Zhang, Xiaoshuai; Li, Hongkai; Wu, Xuesen; Xue, Fuzhong; Liu, Yanxun
2016-01-29
The genetic variants identified by genome-wide association studies (GWAS) can only account for a small proportion of the total heritability of complex disease. The existence of gene-gene joint effects, which contain the main effects and their co-association, is one possible explanation for the "missing heritability" problem. Gene-gene co-association refers to the extent to which the joint effects of two genes differ from the main effects, not only due to the traditional interaction under nearly independent conditions but also due to the correlation between genes. Generally, genes tend to work collaboratively within a specific pathway or network contributing to the disease, and the specific disease-associated loci will often be highly correlated (e.g. single nucleotide polymorphisms (SNPs) in linkage disequilibrium). Therefore, we proposed a novel score-based statistic (SBS) as a gene-based method for detecting gene-gene co-association. Various simulations illustrate that, under different sample sizes, marginal effects of causal SNPs, and co-association levels, the proposed SBS performs better than other existing methods, including single-SNP-based and principal component analysis (PCA)-based logistic regression models, the statistic based on canonical correlations (CCU), kernel canonical correlation analysis (KCCU), partial least squares path modeling (PLSPM), and the delta-square (δ²) statistic. The real data analysis of rheumatoid arthritis (RA) further confirmed its advantages in practice. SBS is a powerful and efficient gene-based method for detecting gene-gene co-association.
Statistical process control: separating signal from noise in emergency department operations.
Pimentel, Laura; Barrueto, Fermin
2015-05-01
Statistical process control (SPC) is a visually appealing and statistically rigorous methodology well suited to the analysis of emergency department (ED) operations. We demonstrate that the control chart is the primary tool of SPC; it is constructed by plotting data measuring the key quality indicators of operational processes in rationally ordered subgroups such as units of time. Control limits are calculated using formulas reflecting the variation in the data points from one another and from the mean. SPC allows managers to determine whether operational processes are controlled and predictable. We review why the moving range chart is most appropriate for use in the complex ED milieu, how to apply SPC to ED operations, and how to determine when performance improvement is needed. SPC is an excellent tool for operational analysis and quality improvement for these reasons: 1) control charts make large data sets intuitively coherent by integrating statistical and visual descriptions; 2) SPC provides analysis of process stability and capability rather than simple comparison with a benchmark; 3) SPC allows distinction between special cause variation (signal), indicating an unstable process requiring action, and common cause variation (noise), reflecting a stable process; and 4) SPC keeps the focus of quality improvement on process rather than individual performance. Because data have no meaning apart from their context, and every process generates information that can be used to improve it, we contend that SPC should be seriously considered for driving quality improvement in emergency medicine. Copyright © 2015 Elsevier Inc. All rights reserved.
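The moving-range (XmR) chart the review favors sets control limits at the mean ± 2.66 times the average moving range (2.66 = 3/d₂ with d₂ = 1.128 for subgroups of size 2). A minimal sketch, with invented daily ED throughput numbers:

```python
# Sketch of individuals/moving-range (XmR) control limits: mean +/- 2.66 * MRbar,
# where MRbar is the average absolute difference between consecutive points.
# The daily length-of-stay figures below are invented for illustration.

def xmr_limits(data):
    """Return (mean, lower control limit, upper control limit) for an XmR chart."""
    mean = sum(data) / len(data)
    moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return mean, mean - 2.66 * mr_bar, mean + 2.66 * mr_bar

daily_median_los = [212, 198, 225, 204, 219, 201, 230, 208]  # minutes (hypothetical)
mean, lcl, ucl = xmr_limits(daily_median_los)
signals = [x for x in daily_median_los if not lcl <= x <= ucl]
print(mean, lcl, ucl, signals)  # no points outside the limits: common cause only
```

Points outside the limits flag special cause variation (signal) requiring investigation; points within them reflect common cause variation (noise) of a stable process, exactly the distinction drawn in item 3 above.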