Sample records for complex statistical methods

  1. Statistical methods used in articles published by the Journal of Periodontal and Implant Science.

    PubMed

    Choi, Eunsil; Lyu, Jiyoung; Park, Jinyoung; Kim, Hae-Young

    2014-12-01

    The purposes of this study were to assess the trend of use of statistical methods, including parametric and nonparametric methods, and to evaluate the use of complex statistical methodology in recent periodontal studies. This study analyzed 123 articles published in the Journal of Periodontal & Implant Science (JPIS) between 2010 and 2014. Frequencies and percentages were calculated according to the number of statistical methods used, the type of statistical method applied, and the type of statistical software used. Most of the published articles considered (64.4%) used statistical methods. Since 2011, the percentage of JPIS articles using statistics has increased. On the basis of multiple counting, we found that the percentage of studies in JPIS using parametric methods was 61.1%. Further, complex statistical methods were applied in only 6 of the published studies (5.0%), and nonparametric statistical methods were applied in 77 of the published studies (38.9% of the 198 method uses counted). We found an increasing trend towards the application of statistical methods and nonparametric methods in recent periodontal studies, and thus concluded that researchers in the fields covered by JPIS may increasingly prefer complex statistical methodology.

  2. Statistical Analysis of Big Data on Pharmacogenomics

    PubMed Central

    Fan, Jianqing; Liu, Han

    2013-01-01

    This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively review several prominent statistical methods for estimating a large covariance matrix for understanding correlation structure, an inverse covariance matrix for network modeling, large-scale simultaneous tests for selecting significantly differentially expressed genes, proteins, and genetic markers for complex diseases, and high-dimensional variable selection for identifying important molecules for understanding molecular mechanisms in pharmacogenomics. Their applications to gene network estimation and biomarker selection are used to illustrate the methodological power. Several new challenges of Big Data analysis, including complex data distribution, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed. PMID:23602905
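
    A hedged illustration of one of the method classes reviewed above: estimating a sparse inverse covariance (precision) matrix with the graphical lasso, whose nonzero off-diagonal entries suggest conditional-dependence edges for network modeling. This is a minimal sketch assuming scikit-learn is available; the simulated data, the regularization level, and the edge threshold are illustrative stand-ins, not settings from the paper.

```python
# Minimal sketch: sparse inverse covariance estimation via the graphical lasso.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))   # 100 samples, 20 variables (e.g., genes)

model = GraphicalLasso(alpha=0.1)    # alpha controls sparsity (illustrative)
model.fit(X)

precision = model.precision_         # estimated inverse covariance matrix
# Nonzero off-diagonal entries are candidate conditional-dependence edges.
edges = np.argwhere(np.triu(np.abs(precision) > 1e-6, k=1))
print(f"{len(edges)} candidate edges among {X.shape[1]} variables")
```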

  3. Dissecting the genetics of complex traits using summary association statistics.

    PubMed

    Pasaniuc, Bogdan; Price, Alkes L

    2017-02-01

    During the past decade, genome-wide association studies (GWAS) have been used to successfully identify tens of thousands of genetic variants associated with complex traits and diseases. These studies have produced extensive repositories of genetic variation and trait measurements across large numbers of individuals, providing tremendous opportunities for further analyses. However, privacy concerns and other logistical considerations often limit access to individual-level genetic data, motivating the development of methods that analyse summary association statistics. Here, we review recent progress on statistical methods that leverage summary association data to gain insights into the genetic basis of complex traits and diseases.

  4. Dissecting the genetics of complex traits using summary association statistics

    PubMed Central

    Pasaniuc, Bogdan; Price, Alkes L.

    2017-01-01

    During the past decade, genome-wide association studies (GWAS) have successfully identified tens of thousands of genetic variants associated with complex traits and diseases. These studies have produced extensive repositories of genetic variation and trait measurements across large numbers of individuals, providing tremendous opportunities for further analyses. However, privacy concerns and other logistical considerations often limit access to individual-level genetic data, motivating the development of methods that analyze summary association statistics. Here we review recent progress on statistical methods that leverage summary association data to gain insights into the genetic basis of complex traits and diseases. PMID:27840428

  5. Physics-based statistical model and simulation method of RF propagation in urban environments

    DOEpatents

    Pao, Hsueh-Yuan; Dvorak, Steven L.

    2010-09-14

    A physics-based statistical model and simulation/modeling method and system of electromagnetic wave propagation (wireless communication) in urban environments. In particular, the model is a computationally efficient closed-form parametric model of RF propagation in an urban environment which is extracted from a physics-based statistical wireless channel simulation method and system. The simulation divides the complex urban environment into a network of interconnected urban canyon waveguides which can be analyzed individually; calculates spectral coefficients of modal fields in the waveguides excited by the propagation using a database of statistical impedance boundary conditions which incorporates the complexity of building walls in the propagation model; determines statistical parameters of the calculated modal fields; and determines a parametric propagation model based on the statistical parameters of the calculated modal fields from which predictions of communications capability may be made.

  6. Heuristic Identification of Biological Architectures for Simulating Complex Hierarchical Genetic Interactions

    PubMed Central

    Moore, Jason H; Amos, Ryan; Kiralis, Jeff; Andrews, Peter C

    2015-01-01

    Simulation plays an essential role in the development of new computational and statistical methods for the genetic analysis of complex traits. Most simulations start with a statistical model using methods such as linear or logistic regression that specify the relationship between genotype and phenotype. This is appealing due to its simplicity and because these statistical methods are commonly used in genetic analysis. It is our working hypothesis that simulations need to move beyond simple statistical models to more realistically represent the biological complexity of genetic architecture. The goal of the present study was to develop a prototype genotype–phenotype simulation method and software that are capable of simulating complex genetic effects within the context of a hierarchical biology-based framework. Specifically, our goal is to simulate multilocus epistasis or gene–gene interaction where the genetic variants are organized within the framework of one or more genes, their regulatory regions and other regulatory loci. We introduce here the Heuristic Identification of Biological Architectures for simulating Complex Hierarchical Interactions (HIBACHI) method and prototype software for simulating data in this manner. This approach combines a biological hierarchy, a flexible mathematical framework, a liability threshold model for defining disease endpoints, and a heuristic search strategy for identifying high-order epistatic models of disease susceptibility. We provide several simulation examples using genetic models exhibiting independent main effects and three-way epistatic effects. PMID:25395175
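
    As a hedged companion to the description above, the sketch below simulates a case/control phenotype from a liability threshold model with a pure two-way epistatic effect. This is not the HIBACHI software; the minor allele frequency, effect size, and prevalence are illustrative assumptions.

```python
# Minimal sketch: liability threshold model with a two-way epistatic term.
import numpy as np

rng = np.random.default_rng(1)
n, maf = 5000, 0.3
g1 = rng.binomial(2, maf, n)          # genotypes coded 0/1/2
g2 = rng.binomial(2, maf, n)

# Liability: no main effects, only a gene-gene interaction plus noise.
liability = 0.8 * (g1 * g2) + rng.standard_normal(n)

threshold = np.quantile(liability, 0.9)   # top 10% of liability become cases
disease = (liability > threshold).astype(int)
print("prevalence:", disease.mean())
```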

  7. Method and system for efficient video compression with low-complexity encoder

    NASA Technical Reports Server (NTRS)

    Chen, Jun (Inventor); He, Dake (Inventor); Sheinin, Vadim (Inventor); Jagmohan, Ashish (Inventor); Lu, Ligang (Inventor)

    2012-01-01

    Disclosed are a method and system for video compression, wherein the video encoder has low computational complexity and high compression efficiency. The disclosed system comprises a video encoder and a video decoder, wherein the method for encoding includes the steps of converting a source frame into a space-frequency representation; estimating conditional statistics of at least one vector of space-frequency coefficients; estimating encoding rates based on the said conditional statistics; and applying Slepian-Wolf codes with the said computed encoding rates. The preferred method for decoding includes the steps of: generating a side-information vector of frequency coefficients based on previously decoded source data, encoder statistics, and previous reconstructions of the source frequency vector; and performing Slepian-Wolf decoding of at least one source frequency vector based on the generated side-information, the Slepian-Wolf code bits and the encoder statistics.

  8. Review of Statistical Learning Methods in Integrated Omics Studies (An Integrated Information Science).

    PubMed

    Zeng, Irene Sui Lan; Lumley, Thomas

    2018-01-01

    Integrated omics is becoming a new channel for investigating the complex molecular system in modern biological science and sets a foundation for systematic learning for precision medicine. The statistical/machine learning methods that have emerged in the past decade for integrated omics are not only innovative but also multidisciplinary, with integrated knowledge in biology, medicine, statistics, machine learning, and artificial intelligence. Here, we review the nontrivial classes of learning methods from the statistical aspects and streamline these learning methods within the statistical learning framework. The intriguing findings from the review are that the methods used are generalizable to other disciplines with complex systematic structure, and that integrated omics is part of an integrated information science which has collated and integrated different types of information for inferences and decision making. We review the statistical learning methods of exploratory and supervised learning from 42 publications. We also discuss the strengths and limitations of the extended principal component analysis, cluster analysis, network analysis, and regression methods. Statistical techniques such as penalization for sparsity induction when there are fewer observations than features, and the use of Bayesian approaches when there is prior knowledge to be integrated, are also included in the commentary. For the completeness of the review, a table of currently available software and packages from 23 publications for omics is summarized in the appendix.
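
    A hedged sketch of the penalization idea mentioned above: lasso regression when features far outnumber observations, as in integrated omics. It assumes scikit-learn; the dimensions and the planted signal are illustrative.

```python
# Minimal sketch: sparsity-inducing penalization when p >> n.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(2)
n, p = 60, 500                        # fewer observations than features
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0                        # only five truly active features
y = X @ beta + rng.standard_normal(n)

model = LassoCV(cv=5).fit(X, y)       # cross-validated penalty strength
print("selected features:", np.flatnonzero(model.coef_)[:10])
```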

  9. Reconciling statistical and systems science approaches to public health.

    PubMed

    Ip, Edward H; Rahmandad, Hazhir; Shoham, David A; Hammond, Ross; Huang, Terry T-K; Wang, Youfa; Mabry, Patricia L

    2013-10-01

    Although systems science has emerged as a set of innovative approaches to study complex phenomena, many topically focused researchers, including clinicians and scientists working in public health, are somewhat befuddled by this methodology, which at times appears to be radically different from the analytic methods, such as statistical modeling, to which they are accustomed. There also appear to be conflicts between complex systems approaches and traditional statistical methodologies, both in terms of their underlying strategies and the languages they use. We argue that the conflicts are resolvable, and the sooner the better for the field. In this article, we show how statistical and systems science approaches can be reconciled, and how together they can advance solutions to complex problems. We do this by comparing the methods within a theoretical framework based on the work of population biologist Richard Levins. We present different types of models as representing different tradeoffs among the four desiderata of generality, realism, fit, and precision.

  10. Reconciling Statistical and Systems Science Approaches to Public Health

    PubMed Central

    Ip, Edward H.; Rahmandad, Hazhir; Shoham, David A.; Hammond, Ross; Huang, Terry T.-K.; Wang, Youfa; Mabry, Patricia L.

    2016-01-01

    Although systems science has emerged as a set of innovative approaches to study complex phenomena, many topically focused researchers, including clinicians and scientists working in public health, are somewhat befuddled by this methodology, which at times appears to be radically different from the analytic methods, such as statistical modeling, to which they are accustomed. There also appear to be conflicts between complex systems approaches and traditional statistical methodologies, both in terms of their underlying strategies and the languages they use. We argue that the conflicts are resolvable, and the sooner the better for the field. In this article, we show how statistical and systems science approaches can be reconciled, and how together they can advance solutions to complex problems. We do this by comparing the methods within a theoretical framework based on the work of population biologist Richard Levins. We present different types of models as representing different tradeoffs among the four desiderata of generality, realism, fit, and precision. PMID:24084395

  11. Markov Logic Networks in the Analysis of Genetic Data

    PubMed Central

    Sakhanenko, Nikita A.

    2010-01-01

    Complex, non-additive genetic interactions are common and can be critical in determining phenotypes. Genome-wide association studies (GWAS) and similar statistical studies of linkage data, however, assume additive models of gene interactions in looking for genotype-phenotype associations. These statistical methods view the compound effects of multiple genes on a phenotype as a sum of influences of each gene and often miss a substantial part of the heritable effect. Such methods do not use any biological knowledge about underlying mechanisms. Modeling approaches from the artificial intelligence (AI) field that incorporate deterministic knowledge into models to perform statistical analysis can be applied to include prior knowledge in genetic analysis. We chose to use the most general such approach, Markov Logic Networks (MLNs), for combining deterministic knowledge with statistical analysis. Using simple, logistic regression-type MLNs we can replicate the results of traditional statistical methods, but we also show that we are able to go beyond finding independent markers linked to a phenotype by using joint inference without an independence assumption. The method is applied to genetic data on yeast sporulation, a complex phenotype with gene interactions. In addition to detecting all of the previously identified loci associated with sporulation, our method identifies four loci with smaller effects. Since their effect on sporulation is small, these four loci were not detected with methods that do not account for dependence between markers due to gene interactions. We show how gene interactions can be detected using more complex models, which can be used as a general framework for incorporating systems biology with genetics. PMID:20958249

  12. Statistically Validated Networks in Bipartite Complex Systems

    PubMed Central

    Tumminello, Michele; Miccichè, Salvatore; Lillo, Fabrizio; Piilo, Jyrki; Mantegna, Rosario N.

    2011-01-01

    Many complex systems present an intrinsic bipartite structure where elements of one set link to elements of the second set. In these complex systems, such as the system of actors and movies, elements of one set are qualitatively different from elements of the other set. The properties of these complex systems are typically investigated by constructing and analyzing a projected network on one of the two sets (for example, the actor network or the movie network). Complex systems are often very heterogeneous in the number of relationships that the elements of one set establish with the elements of the other set, and this heterogeneity makes it very difficult to discriminate links of the projected network that merely reflect the system's heterogeneity from links relevant to unveiling the properties of the system. Here we introduce an unsupervised method to statistically validate each link of a projected network against a null hypothesis that takes into account system heterogeneity. We apply the method to a biological, an economic, and a social complex system. The method we propose is able to detect network structures which are very informative about the organization and specialization of the investigated systems, and identifies those relationships between elements of the projected network that cannot be explained simply by system heterogeneity. We also show that our method applies to bipartite systems in which different relationships might have a different qualitative nature, generating statistically validated networks in which such differences are preserved. PMID:21483858
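
    A hedged sketch of the validation step as described above: test each link of the projected network against a hypergeometric null that conditions on the degrees of both elements, then compare the p-value with a multiple-testing threshold. The example numbers are invented.

```python
# Minimal sketch: hypergeometric validation of one projected-network link.
from scipy.stats import hypergeom

def link_pvalue(n_b, deg_i, deg_j, co_occ):
    """P(co-occurrence >= co_occ) if i and j connect to deg_i and deg_j
    of the n_b elements of the second set independently at random."""
    return hypergeom.sf(co_occ - 1, n_b, deg_i, deg_j)

# Example: 1000 movies; actor i in 50, actor j in 40, together in 12
# (expected overlap under the null is only 2).
p = link_pvalue(1000, 50, 40, 12)
print(f"p = {p:.2e}")   # compare against, e.g., a Bonferroni threshold
```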

  13. Exploiting Complexity Information for Brain Activation Detection

    PubMed Central

    Zhang, Yan; Liang, Jiali; Lin, Qiang; Hu, Zhenghui

    2016-01-01

    We present a complexity-based approach for the analysis of fMRI time series, in which sample entropy (SampEn) is introduced as a quantification of voxel complexity. Under this hypothesis, voxel complexity is modulated by pertinent cognitive tasks and changes across experimental paradigms. We calculate the complexity of sequential fMRI data for each voxel in two distinct experimental paradigms and use a nonparametric statistical strategy, the Wilcoxon signed rank test, to evaluate the difference in complexity between them. The results are compared with the well-known general linear model based Statistical Parametric Mapping package (SPM12), where a clear difference is observed. This is because the SampEn method detects brain complexity changes between the two experimental conditions, and as a data-driven method it evaluates just the complexity of the specific sequential fMRI data. Also, larger and smaller SampEn values correspond to different meanings, and the neutral-blank design produces higher predictability than threat-neutral. Complexity information can be considered a complementary method to existing fMRI analysis strategies, and it may help improve the understanding of human brain functions from a different perspective. PMID:27045838
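
    A hedged sketch of the pipeline described above: compute sample entropy per time series and compare two conditions with the Wilcoxon signed rank test. The SampEn implementation follows the usual Richman-Moorman definition; the template length m, the tolerance r, and the simulated data are illustrative.

```python
# Minimal sketch: SampEn per series, then a Wilcoxon signed rank comparison.
import numpy as np
from scipy.stats import wilcoxon

def sampen(x, m=2, r=0.2):
    """Sample entropy; tolerance r is given in units of the series SD."""
    x = np.asarray(x, float)
    tol = r * x.std()
    def pairs(mm):
        t = np.array([x[i:i + mm] for i in range(len(x) - mm)])
        d = np.max(np.abs(t[:, None] - t[None, :]), axis=2)
        return (np.sum(d <= tol) - len(t)) / 2    # exclude self-matches
    return -np.log(pairs(m + 1) / pairs(m))

rng = np.random.default_rng(3)
# Stand-ins for per-voxel series under two experimental conditions.
cond_a = [sampen(rng.standard_normal(200)) for _ in range(20)]
cond_b = [sampen(np.cumsum(rng.standard_normal(200))) for _ in range(20)]
print(wilcoxon(cond_a, cond_b))
```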

  14. A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more.

    PubMed

    Rivas, Elena; Lang, Raymond; Eddy, Sean R

    2012-02-01

    The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases.

  15. A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more

    PubMed Central

    Rivas, Elena; Lang, Raymond; Eddy, Sean R.

    2012-01-01

    The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases. PMID:22194308

  16. Using protein-protein interactions for refining gene networks estimated from microarray data by Bayesian networks.

    PubMed

    Nariai, N; Kim, S; Imoto, S; Miyano, S

    2004-01-01

    We propose a statistical method to estimate gene networks from DNA microarray data and protein-protein interactions. Because physical interactions between proteins or multiprotein complexes are likely to regulate biological processes, using only mRNA expression data is not sufficient for estimating a gene network accurately. Our method adds knowledge about protein-protein interactions to the estimation method of gene networks under a Bayesian statistical framework. In the estimated gene network, a protein complex is modeled as a virtual node based on principal component analysis. We show the effectiveness of the proposed method through the analysis of Saccharomyces cerevisiae cell cycle data. The proposed method improves the accuracy of the estimated gene networks, and successfully identifies some biological facts.

  17. graph-GPA: A graphical model for prioritizing GWAS results and investigating pleiotropic architecture.

    PubMed

    Chung, Dongjun; Kim, Hang J; Zhao, Hongyu

    2017-02-01

    Genome-wide association studies (GWAS) have identified tens of thousands of genetic variants associated with hundreds of phenotypes and diseases, which have provided clinical and medical benefits to patients through novel biomarkers and therapeutic targets. However, identification of risk variants associated with complex diseases remains challenging, as such diseases are often affected by many genetic variants with small or moderate effects. There has been accumulating evidence suggesting that different complex traits share a common risk basis, namely pleiotropy. Recently, several statistical methods have been developed to improve statistical power to identify risk variants for complex traits through a joint analysis of multiple GWAS datasets by leveraging pleiotropy. While these methods have been shown to improve statistical power for association mapping compared to separate analyses, they are still limited in the number of phenotypes that can be integrated. In order to address this challenge, in this paper we propose a novel statistical framework, graph-GPA, to integrate a large number of GWAS datasets for multiple phenotypes using a hidden Markov random field approach. Application of graph-GPA to a joint analysis of GWAS datasets for 12 phenotypes shows that graph-GPA improves statistical power to identify risk variants compared to statistical methods based on a smaller number of GWAS datasets. In addition, graph-GPA also promotes better understanding of genetic mechanisms shared among phenotypes, which can potentially be useful for the development of improved diagnosis and therapeutics. The R implementation of graph-GPA is currently available at https://dongjunchung.github.io/GGPA/.

  18. An Investigation of the Variety and Complexity of Statistical Methods Used in Current Internal Medicine Literature.

    PubMed

    Narayanan, Roshni; Nugent, Rebecca; Nugent, Kenneth

    2015-10-01

    Accreditation Council for Graduate Medical Education guidelines require internal medicine residents to develop skills in the interpretation of medical literature and to understand the principles of research. A necessary component is the ability to understand the statistical methods used and their results, material that is not an in-depth focus of most medical school curricula and residency programs. Given the breadth and depth of the current medical literature and an increasing emphasis on complex, sophisticated statistical analyses, the statistical foundation and education necessary for residents are uncertain. We reviewed the statistical methods and terms used in 49 articles discussed at the journal club in the Department of Internal Medicine residency program at Texas Tech University between January 1, 2013 and June 30, 2013. We collected information on the study type and on the statistical methods used for summarizing and comparing samples, determining the relations between independent variables and dependent variables, and estimating models. We then identified the typical statistics education level at which each term or method is learned. A total of 14 articles came from the Journal of the American Medical Association Internal Medicine, 11 from the New England Journal of Medicine, 6 from the Annals of Internal Medicine, 5 from the Journal of the American Medical Association, and 13 from other journals. Twenty reported randomized controlled trials. Summary statistics included mean values (39 articles), category counts (38), and medians (28). Group comparisons were based on t tests (14 articles), χ2 tests (21), and nonparametric ranking tests (10). The relations between dependent and independent variables were analyzed with simple regression (6 articles), multivariate regression (11), and logistic regression (8). Nine studies reported odds ratios with 95% confidence intervals, and seven analyzed test performance using sensitivity and specificity calculations. These papers used 128 statistical terms and context-defined concepts, including some from data analysis (56), epidemiology-biostatistics (31), modeling (24), data collection (12), and meta-analysis (5). Ten different software programs were used in these articles. Based on usual undergraduate and graduate statistics curricula, 64.3% of the concepts and methods used in these papers required at least a master's degree-level statistics education. The interpretation of the current medical literature can require an extensive background in statistical methods at an education level exceeding the material and resources provided to most medical students and residents. Given the complexity and time pressure of medical education, these deficiencies will be hard to correct, but this project can serve as a basis for developing a curriculum in study design and statistical methods needed by physicians-in-training.

  19. Statistical assessment of the learning curves of health technologies.

    PubMed

    Ramsay, C R; Grant, A M; Wallace, S A; Garthwaite, P H; Monk, A F; Russell, I T

    2001-01-01

    (1) To describe systematically studies that directly assessed the learning curve effect of health technologies. (2) Systematically to identify 'novel' statistical techniques applied to learning curve data in other fields, such as psychology and manufacturing. (3) To test these statistical techniques in data sets from studies of varying designs to assess health technologies in which learning curve effects are known to exist. METHODS - STUDY SELECTION (HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW): For a study to be included, it had to include a formal analysis of the learning curve of a health technology using a graphical, tabular or statistical technique. METHODS - STUDY SELECTION (NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH): For a study to be included, it had to include a formal assessment of a learning curve using a statistical technique that had not been identified in the previous search. METHODS - DATA SOURCES: Six clinical and 16 non-clinical biomedical databases were searched. A limited amount of handsearching and scanning of reference lists was also undertaken. METHODS - DATA EXTRACTION (HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW): A number of study characteristics were abstracted from the papers such as study design, study size, number of operators and the statistical method used. METHODS - DATA EXTRACTION (NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH): The new statistical techniques identified were categorised into four subgroups of increasing complexity: exploratory data analysis; simple series data analysis; complex data structure analysis, generic techniques. METHODS - TESTING OF STATISTICAL METHODS: Some of the statistical methods identified in the systematic searches for single (simple) operator series data and for multiple (complex) operator series data were illustrated and explored using three data sets. The first was a case series of 190 consecutive laparoscopic fundoplication procedures performed by a single surgeon; the second was a case series of consecutive laparoscopic cholecystectomy procedures performed by ten surgeons; the third was randomised trial data derived from the laparoscopic procedure arm of a multicentre trial of groin hernia repair, supplemented by data from non-randomised operations performed during the trial. RESULTS - HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW: Of 4571 abstracts identified, 272 (6%) were later included in the study after review of the full paper. Some 51% of studies assessed a surgical minimal access technique and 95% were case series. The statistical method used most often (60%) was splitting the data into consecutive parts (such as halves or thirds), with only 14% attempting a more formal statistical analysis. The reporting of the studies was poor, with 31% giving no details of data collection methods. RESULTS - NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH: Of 9431 abstracts assessed, 115 (1%) were deemed appropriate for further investigation and, of these, 18 were included in the study. All of the methods for complex data sets were identified in the non-clinical literature. These were discriminant analysis, two-stage estimation of learning rates, generalised estimating equations, multilevel models, latent curve models, time series models and stochastic parameter models. In addition, eight new shapes of learning curves were identified. RESULTS - TESTING OF STATISTICAL METHODS: No one particular shape of learning curve performed significantly better than another. 
The performance of 'operation time' as a proxy for learning differed between the three procedures. Multilevel modelling using the laparoscopic cholecystectomy data demonstrated and measured surgeon-specific and confounding effects. The inclusion of non-randomised cases, despite the possible limitations of the method, enhanced the interpretation of learning effects. CONCLUSIONS - HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW: The statistical methods used for assessing learning effects in health technology assessment have been crude and the reporting of studies poor. CONCLUSIONS - NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH: A number of statistical methods for assessing learning effects were identified that had not hitherto been used in health technology assessment. There was a hierarchy of methods for the identification and measurement of learning, and the more sophisticated methods for both have had little if any use in health technology assessment. This demonstrated the value of considering fields outside clinical research when addressing methodological issues in health technology assessment. CONCLUSIONS - TESTING OF STATISTICAL METHODS: It has been demonstrated that the portfolio of techniques identified can enhance investigations of learning curve effects. (ABSTRACT TRUNCATED)
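
    A hedged sketch of one of the 'complex data structure' techniques named above, multilevel modelling: operation time regressed on log case order with a surgeon-level random intercept, so that learning (a negative slope) is estimated while surgeon-specific effects are absorbed. It assumes statsmodels and pandas; the data are simulated, not from the report.

```python
# Minimal sketch: mixed-effects learning curve with a random surgeon intercept.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
rows = []
for surgeon in range(10):
    skill = rng.normal(0, 10)                  # surgeon-specific offset
    for case in range(1, 51):
        time = 120 + skill - 15 * np.log(case) + rng.normal(0, 8)
        rows.append({"surgeon": surgeon, "case": case, "time": time})
df = pd.DataFrame(rows)
df["log_case"] = np.log(df["case"])

# A negative log_case coefficient indicates learning (shorter operations).
fit = smf.mixedlm("time ~ log_case", df, groups=df["surgeon"]).fit()
print(fit.summary())
```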

  20. [Methods of statistical analysis in the differential diagnosis of the degree of brain glioma anaplasia at the preoperative stage].

    PubMed

    Glavatskiĭ, A Ia; Guzhovskaia, N V; Lysenko, S N; Kulik, A V

    2005-12-01

    The authors propose an approach to preoperative diagnosis of the degree of supratentorial brain glioma anaplasia using statistical analysis methods. It relies on a complex examination of 934 patients with degree I-IV anaplasia treated at the Institute of Neurosurgery from 1990 to 2004. The use of statistical analysis methods for the differential diagnosis of the degree of brain glioma anaplasia may optimize the diagnostic algorithm, increase the reliability of the obtained data, and in some cases avoid unwarranted surgical interventions. Clinically important signs for the use of statistical analysis methods directed at preoperative diagnosis of brain glioma anaplasia have been defined.

  21. Evaluation of Two Statistical Methods Provides Insights into the Complex Patterns of Alternative Polyadenylation Site Switching

    PubMed Central

    Li, Jie; Li, Rui; You, Leiming; Xu, Anlong; Fu, Yonggui; Huang, Shengfeng

    2015-01-01

    Switching between different alternative polyadenylation (APA) sites plays an important role in the fine-tuning of gene expression. New technologies for 3’-end enriched RNA-seq allow genome-wide detection of the genes that exhibit significant APA site switching between different samples. Here, we show that the independence test gives better results than the linear trend test in detecting APA site-switching events. Further examination suggests that the discrepancy between these two statistical methods arises from complex APA site-switching events that cannot be represented by a simple change of average 3’-UTR length. In theory, the linear trend test is only effective in detecting these simple changes. We classify the switching events into four switching patterns: two simple patterns (3’-UTR shortening and lengthening) and two complex patterns. By comparing the results of the two statistical methods, we show that complex patterns account for 1/4 of all observed switching events that occur between normal and cancerous human breast cell lines. Because simple and complex switching patterns may convey different biological meanings, they merit separate study. We therefore propose to combine both the independence test and the linear trend test in practice. First, the independence test should be used to detect APA site switching; second, the linear trend test should be invoked to identify simple switching events; and third, those complex switching events that pass independence testing but fail linear trend testing can be identified. PMID:25875641
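
    A hedged sketch contrasting the two tests discussed above on a 2xK table of read counts per poly(A) site (rows: samples; columns: sites ordered 5' to 3'). The trend statistic here is the standard Mantel-Haenszel form M^2 = (N-1)r^2, which may differ in detail from the paper's implementation; the counts are invented so that usage shifts symmetrically toward the middle site, a 'complex' switch that the independence test catches and the trend test misses.

```python
# Minimal sketch: chi-squared independence test vs. a linear trend test.
import numpy as np
from scipy.stats import chi2, chi2_contingency

def linear_trend_test(table):
    """Mantel-Haenszel trend test: M^2 = (N-1) r^2, referred to chi2(1)."""
    table = np.asarray(table, float)
    rows, cols = np.indices(table.shape)
    w, x, y = table.ravel(), rows.ravel(), cols.ravel()
    n = w.sum()
    mx, my = np.average(x, weights=w), np.average(y, weights=w)
    cov = np.average((x - mx) * (y - my), weights=w)
    r = cov / np.sqrt(np.average((x - mx) ** 2, weights=w)
                      * np.average((y - my) ** 2, weights=w))
    m2 = (n - 1) * r ** 2
    return m2, chi2.sf(m2, df=1)

table = [[100,  20, 100],      # sample 1: reads at sites 1..3
         [ 40, 140,  40]]      # sample 2: usage moves to the middle site
print("independence p:", chi2_contingency(table)[1])
print("linear trend p:", linear_trend_test(table)[1])
```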

  22. Unification of the complex Langevin method and the Lefschetz thimble method

    NASA Astrophysics Data System (ADS)

    Nishimura, Jun; Shimasaki, Shinji

    2018-03-01

    Recently there has been remarkable progress in solving the sign problem, which occurs in investigating statistical systems with a complex weight. The two promising methods, the complex Langevin method and the Lefschetz thimble method, share the idea of complexifying the dynamical variables, but their relationship has not been clear. Here we propose a unified formulation, in which the sign problem is taken care of by both the Langevin dynamics and the holomorphic gradient flow. We apply our formulation to a simple model in three different ways and show that one of them interpolates the two methods by changing the flow time.
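
    A hedged toy version of the complex Langevin half of the story above, for the one-variable Gaussian 'action' S(z) = a z^2 / 2 with complex a (a standard pedagogical example, not the paper's model): complexify the variable, evolve the Langevin equation with real noise, and check the long-time average against the exact <x^2> = 1/a. Step size and run length are illustrative.

```python
# Minimal sketch: complex Langevin dynamics for S(z) = a z^2 / 2, complex a.
import numpy as np

rng = np.random.default_rng(5)
a = 1.0 + 1.0j                       # complex coupling -> sign problem
dt, n_steps, n_therm = 1e-3, 200_000, 10_000

z, samples = 0.0 + 0.0j, []
for step in range(n_steps):
    drift = -a * z                   # -dS/dz
    z = z + drift * dt + np.sqrt(2 * dt) * rng.standard_normal()  # real noise
    if step >= n_therm:
        samples.append(z * z)

print("CL estimate of <x^2>:", np.mean(samples))
print("exact value 1/a     :", 1 / a)
```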

  23. A Complex Network Approach to Stylometry

    PubMed Central

    Amancio, Diego Raphael

    2015-01-01

    Statistical methods have been widely employed to study the fundamental properties of language. In recent years, methods from complex and dynamical systems proved useful to create several language models. Despite the large number of studies devoted to representing texts with physical models, only a limited number of studies have shown how the properties of the underlying physical systems can be employed to improve the performance of natural language processing tasks. In this paper, I address this problem by devising complex networks methods that are able to improve the performance of current statistical methods. Using a fuzzy classification strategy, I show that the topological properties extracted from texts complement the traditional textual description. In several cases, the performance obtained with hybrid approaches outperformed the results obtained when only traditional or networked methods were used. Because the proposed model is generic, the framework devised here could be straightforwardly used to study similar textual applications where the topology plays a pivotal role in the description of the interacting agents. PMID:26313921
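
    A hedged, minimal sketch of the networked text representation underlying such studies: link adjacent words into a co-occurrence graph and read off simple topological features that can complement frequency-based descriptions. It assumes networkx; real stylometry pipelines use far richer features and classifiers.

```python
# Minimal sketch: word adjacency network and a few topological features.
import networkx as nx

def text_network_features(tokens):
    g = nx.Graph()
    for a, b in zip(tokens, tokens[1:]):   # adjacent words share an edge
        g.add_edge(a, b)
    return {
        "nodes": g.number_of_nodes(),
        "edges": g.number_of_edges(),
        "density": nx.density(g),
        "avg_clustering": nx.average_clustering(g),
    }

sample = "the quick brown fox jumps over the lazy dog the fox".split()
print(text_network_features(sample))
```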

  24. Testing for independence in J×K contingency tables with complex sample survey data.

    PubMed

    Lipsitz, Stuart R; Fitzmaurice, Garrett M; Sinha, Debajyoti; Hevelone, Nathanael; Giovannucci, Edward; Hu, Jim C

    2015-09-01

    The test of independence of row and column variables in a (J×K) contingency table is a widely used statistical test in many areas of application. For complex survey samples, use of the standard Pearson chi-squared test is inappropriate due to correlation among units within the same cluster. Rao and Scott (1981, Journal of the American Statistical Association 76, 221-230) proposed an approach in which the standard Pearson chi-squared statistic is multiplied by a design effect to adjust for the complex survey design. Unfortunately, this test fails to exist when one of the observed cell counts equals zero. Even with the large samples typical of many complex surveys, zero cell counts can occur for rare events, small domains, or contingency tables with a large number of cells. Here, we propose Wald and score test statistics for independence based on weighted least squares estimating equations. In contrast to the Rao-Scott test statistic, the proposed Wald and score test statistics always exist. In simulations, the score test is found to perform best with respect to type I error. The proposed method is motivated by, and applied to, post-surgical complications data from the United States' Nationwide Inpatient Sample (NIS) complex survey of hospitals in 2008.
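
    For contrast with the proposed Wald and score tests, here is a hedged sketch of the first-order Rao-Scott idea referenced above: rescale the Pearson chi-squared statistic by an average design effect before referring it to the chi-squared distribution. The design effect is treated as known here, which in practice it is not; it must be estimated from the survey design.

```python
# Minimal sketch: first-order design-effect correction of Pearson chi-squared.
import numpy as np
from scipy.stats import chi2, chi2_contingency

def rao_scott_first_order(table, mean_design_effect):
    stat, _, dof, _ = chi2_contingency(np.asarray(table), correction=False)
    adj = stat / mean_design_effect        # rescale for within-cluster correlation
    return adj, chi2.sf(adj, dof)

table = [[120, 80], [90, 110]]             # invented 2x2 survey counts
print(rao_scott_first_order(table, mean_design_effect=1.8))
```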

  25. Statistical inference approach to structural reconstruction of complex networks from binary time series

    NASA Astrophysics Data System (ADS)

    Ma, Chuang; Chen, Han-Shuang; Lai, Ying-Cheng; Zhang, Hai-Feng

    2018-02-01

    Complex networks hosting binary-state dynamics arise in a variety of contexts. In spite of previous works, to fully reconstruct the network structure from observed binary data remains challenging. We articulate a statistical inference based approach to this problem. In particular, exploiting the expectation-maximization (EM) algorithm, we develop a method to ascertain the neighbors of any node in the network based solely on binary data, thereby recovering the full topology of the network. A key ingredient of our method is the maximum-likelihood estimation of the probabilities associated with actual or nonexistent links, and we show that the EM algorithm can distinguish the two kinds of probability values without any ambiguity, insofar as the length of the available binary time series is reasonably long. Our method does not require any a priori knowledge of the detailed dynamical processes, is parameter-free, and is capable of accurate reconstruction even in the presence of noise. We demonstrate the method using combinations of distinct types of binary dynamical processes and network topologies, and provide a physical understanding of the underlying reconstruction mechanism. Our statistical inference based reconstruction method contributes an additional piece to the rapidly expanding "toolbox" of data based reverse engineering of complex networked systems.

  26. Statistical inference approach to structural reconstruction of complex networks from binary time series.

    PubMed

    Ma, Chuang; Chen, Han-Shuang; Lai, Ying-Cheng; Zhang, Hai-Feng

    2018-02-01

    Complex networks hosting binary-state dynamics arise in a variety of contexts. In spite of previous works, to fully reconstruct the network structure from observed binary data remains challenging. We articulate a statistical inference based approach to this problem. In particular, exploiting the expectation-maximization (EM) algorithm, we develop a method to ascertain the neighbors of any node in the network based solely on binary data, thereby recovering the full topology of the network. A key ingredient of our method is the maximum-likelihood estimation of the probabilities associated with actual or nonexistent links, and we show that the EM algorithm can distinguish the two kinds of probability values without any ambiguity, insofar as the length of the available binary time series is reasonably long. Our method does not require any a priori knowledge of the detailed dynamical processes, is parameter-free, and is capable of accurate reconstruction even in the presence of noise. We demonstrate the method using combinations of distinct types of binary dynamical processes and network topologies, and provide a physical understanding of the underlying reconstruction mechanism. Our statistical inference based reconstruction method contributes an additional piece to the rapidly expanding "toolbox" of data based reverse engineering of complex networked systems.

  27. Statistical Literacy in the Data Science Workplace

    ERIC Educational Resources Information Center

    Grant, Robert

    2017-01-01

    Statistical literacy, the ability to understand and make use of statistical information including methods, has particular relevance in the age of data science, when complex analyses are undertaken by teams from diverse backgrounds. Not only is it essential to communicate to the consumers of information but also within the team. Writing from the…

  28. Deconstructing Statistical Analysis

    ERIC Educational Resources Information Center

    Snell, Joel

    2014-01-01

    Using a very complex statistical analysis and research method for the sake of enhancing the prestige of an article, or to make a new product or service appear legitimate, needs to be monitored and questioned for accuracy. 1) The more complicated the statistical analysis and the research, the fewer learned readers can understand it. This adds a…

  29. Student Solution Manual for Essential Mathematical Methods for the Physical Sciences

    NASA Astrophysics Data System (ADS)

    Riley, K. F.; Hobson, M. P.

    2011-02-01

    1. Matrices and vector spaces; 2. Vector calculus; 3. Line, surface and volume integrals; 4. Fourier series; 5. Integral transforms; 6. Higher-order ODEs; 7. Series solutions of ODEs; 8. Eigenfunction methods; 9. Special functions; 10. Partial differential equations; 11. Solution methods for PDEs; 12. Calculus of variations; 13. Integral equations; 14. Complex variables; 15. Applications of complex variables; 16. Probability; 17. Statistics.

  30. Essential Mathematical Methods for the Physical Sciences

    NASA Astrophysics Data System (ADS)

    Riley, K. F.; Hobson, M. P.

    2011-02-01

    1. Matrices and vector spaces; 2. Vector calculus; 3. Line, surface and volume integrals; 4. Fourier series; 5. Integral transforms; 6. Higher-order ODEs; 7. Series solutions of ODEs; 8. Eigenfunction methods; 9. Special functions; 10. Partial differential equations; 11. Solution methods for PDEs; 12. Calculus of variations; 13. Integral equations; 14. Complex variables; 15. Applications of complex variables; 16. Probability; 17. Statistics; Appendices; Index.

  31. Statistical physics of hard combinatorial optimization: Vertex cover problem

    NASA Astrophysics Data System (ADS)

    Zhao, Jin-Hua; Zhou, Hai-Jun

    2014-07-01

    Typical-case computational complexity is a research topic at the boundary of computer science, applied mathematics, and statistical physics. In the last twenty years, the replica-symmetry-breaking mean field theory of spin glasses and the associated message-passing algorithms have greatly deepened our understanding of typical-case computational complexity. In this paper, we use the vertex cover problem, a basic nondeterministic-polynomial (NP)-complete combinatorial optimization problem of wide application, as an example to introduce the statistical physics methods and algorithms. We do not go into the technical details but emphasize mainly the intuitive physical meanings of the message-passing equations. An unfamiliar reader should be able to understand, to a large extent, the physics behind the mean field approaches and to adapt the mean field methods to solving other optimization problems.
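
    As a hedged, elementary companion to the message-passing algorithms discussed above (not the paper's method itself), the sketch below implements the classic leaf-removal procedure for vertex cover: repeatedly cover the neighbor of a degree-1 vertex and strip both from the graph. On graphs that leaf removal empties completely, the resulting cover is minimum; a surviving leafless core signals the hard regime studied with statistical physics.

```python
# Minimal sketch: leaf-removal heuristic for the minimum vertex cover problem.
def leaf_removal_cover(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    cover = set()
    while True:
        leaves = [u for u, nbrs in adj.items() if len(nbrs) == 1]
        if not leaves:
            break
        u = leaves[0]
        (v,) = adj[u]                      # the leaf's unique neighbor
        cover.add(v)                       # cover all of v's edges at once
        for w in list(adj[v]):             # detach v from the graph
            adj[w].discard(v)
        del adj[v]
        adj.pop(u, None)
        for w in [w for w, nbrs in adj.items() if not nbrs]:
            del adj[w]                     # drop isolated vertices
    return cover, adj                      # nonempty adj = leafless core

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (2, 5)]
cover, core = leaf_removal_cover(edges)
print("cover:", cover, "| leftover core:", core)
```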

  32. A nonparametric method to generate synthetic populations to adjust for complex sampling design features.

    PubMed

    Dong, Qi; Elliott, Michael R; Raghunathan, Trivellore E

    2014-06-01

    Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to developing statistical methods to analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in the finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion that inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, adjusting the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered unequal-probability of selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS) and the Medical Expenditure Panel Survey (MEPS), which are stratified, clustered unequal-probability of selection sample designs.

  33. A nonparametric method to generate synthetic populations to adjust for complex sampling design features

    PubMed Central

    Dong, Qi; Elliott, Michael R.; Raghunathan, Trivellore E.

    2017-01-01

    Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to developing statistical methods to analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in the finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion that inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, adjusting the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered unequal-probability of selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS) and the Medical Expenditure Panel Survey (MEPS), which are stratified, clustered unequal-probability of selection sample designs. PMID:29200608

  34. Phase locking route behind complex periodic windows in a forced oscillator

    NASA Astrophysics Data System (ADS)

    Jan, Hengtai; Tsai, Kuo-Ting; Kuo, Li-wei

    2013-09-01

    Chaotic systems have complex reactions to an external driving force; even in cases with low-dimensional oscillators, the routes to synchronization are diverse. We proposed a stroboscope-based method for analyzing driven chaotic systems in their phase space. According to two statistical quantities generated from time series, we could characterize the system state and the driving behavior simultaneously. We demonstrated our method in a driven bistable system, which showed complex periodic windows under a proper driving force. With increasing periodic driving force, a route from interior periodic oscillation to phase synchronization through the chaos state could be found. Periodic windows could also be identified, and the circumstances under which they occurred distinguished. Statistical results were supported by conditional Lyapunov exponent analysis, demonstrating the method's power in analyzing unknown time series.

  35. Pattern-Based Inverse Modeling for Characterization of Subsurface Flow Models with Complex Geologic Heterogeneity

    NASA Astrophysics Data System (ADS)

    Golmohammadi, A.; Jafarpour, B.; M Khaninezhad, M. R.

    2017-12-01

    Calibration of heterogeneous subsurface flow models leads to ill-posed nonlinear inverse problems, where too many unknown parameters are estimated from limited response measurements. When the underlying parameters form complex (non-Gaussian) structured spatial connectivity patterns, classical variogram-based geostatistical techniques cannot describe the underlying connectivity patterns. Modern pattern-based geostatistical methods that incorporate higher-order spatial statistics are more suitable for describing such complex spatial patterns. Moreover, when the underlying unknown parameters are discrete (geologic facies distribution), conventional model calibration techniques that are designed for continuous parameters cannot be applied directly. In this paper, we introduce a novel pattern-based model calibration method to reconstruct discrete and spatially complex facies distributions from dynamic flow response data. To reproduce complex connectivity patterns during model calibration, we impose a feasibility constraint to ensure that the solution follows the expected higher-order spatial statistics. For model calibration, we adopt a regularized least-squares formulation, involving data mismatch, pattern connectivity, and feasibility constraint terms. Using an alternating directions optimization algorithm, the regularized objective function is divided into a continuous model calibration problem, followed by mapping the solution onto the feasible set. The feasibility constraint to honor the expected spatial statistics is implemented using a supervised machine learning algorithm. The two steps of the model calibration formulation are repeated until the convergence criterion is met. Several numerical examples are used to evaluate the performance of the developed method.

  36. Text mining by Tsallis entropy

    NASA Astrophysics Data System (ADS)

    Jamaati, Maryam; Mehri, Ali

    2018-01-01

    Long-range correlations between the elements of natural languages enable them to convey very complex information. The complex structure of human language, as a manifestation of natural languages, motivates us to apply nonextensive statistical mechanics to text mining. Tsallis entropy appropriately ranks terms' relevance to the document subject, taking advantage of their spatial correlation length. We apply this statistical concept as a new powerful word-ranking metric in order to extract keywords from a single document. We carry out an experimental evaluation, which shows the capability of the presented method in keyword extraction. We find that Tsallis entropy has reliable word-ranking performance, on par with the best previous ranking methods.
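
    A hedged, simplified sketch of the idea above: score each word by the Tsallis entropy of its occurrence distribution over equal-sized text segments, so that words concentrated in few segments (keyword-like behavior) score low. The published metric exploits spatial correlations more directly; this segment-based variant, the entropic index q, and the toy text are illustrative only.

```python
# Minimal sketch: Tsallis-entropy word ranking over text segments.
import numpy as np

def tsallis_entropy(p, q=1.5):
    p = np.asarray(p, float)
    p = p[p > 0]
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

def rank_words(tokens, n_segments=20, q=1.5, min_count=5):
    seg_len = max(1, len(tokens) // n_segments)
    scores = {}
    for word in set(tokens):
        counts = [tokens[i:i + seg_len].count(word)
                  for i in range(0, len(tokens), seg_len)]
        total = sum(counts)
        if total < min_count:              # skip very rare words
            continue
        scores[word] = tsallis_entropy(np.array(counts) / total, q)
    return sorted(scores, key=scores.get)  # most clustered words first

toy = ("entropy " * 30 + "the a of " * 40 + "entropy model " * 10).split()
print(rank_words(toy)[:5])                 # meaningful only on real text
```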

  37. GAPIT version 2: an enhanced integrated tool for genomic association and prediction

    USDA-ARS?s Scientific Manuscript database

    Most human diseases and agriculturally important traits are complex. Dissecting their genetic architecture requires continued development of innovative and powerful statistical methods. Corresponding advances in computing tools are critical to efficiently use these statistical innovations and to enh...

  38. Applied statistics in agricultural, biological, and environmental sciences.

    USDA-ARS?s Scientific Manuscript database

    Agronomic research often involves measurement and collection of multiple response variables in an effort to understand the more complex nature of the system being studied. Multivariate statistical methods encompass the simultaneous analysis of all random variables measured on each experimental or s...

  39. Capturing rogue waves by multi-point statistics

    NASA Astrophysics Data System (ADS)

    Hadjihosseini, A.; Wächter, Matthias; Hoffmann, N. P.; Peinke, J.

    2016-01-01

    As an example of a complex system with extreme events, we investigate ocean wave states exhibiting rogue waves. We present a statistical method of data analysis based on multi-point statistics which, for the first time, allows extreme rogue wave events to be captured in a statistically satisfactory manner. The key to the success of the approach is mapping the complexity of multi-point data onto the statistics of hierarchically ordered height increments for different time scales, for which we can show that a stochastic cascade process with Markov properties is governed by a Fokker-Planck equation. Conditional probabilities as well as the Fokker-Planck equation itself can be estimated directly from the available observational data. With this stochastic description, surrogate data sets can in turn be generated, which makes it possible to work out arbitrary statistical features of the complex sea state in general, and extreme rogue wave events in particular. The results also open up new perspectives for forecasting the occurrence probability of extreme rogue wave events, and even for forecasting the occurrence of individual rogue waves based on precursory dynamics.
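
    A hedged sketch of the estimation step described above: Kramers-Moyal (drift and diffusion) coefficients recovered from conditional moments of increments, here demonstrated on a simulated Ornstein-Uhlenbeck process rather than ocean wave data. Bin counts, sample size, and time step are illustrative.

```python
# Minimal sketch: drift/diffusion estimation from conditional increment moments.
import numpy as np

rng = np.random.default_rng(8)
dt, n = 0.01, 200_000
x = np.empty(n)
x[0] = 0.0
for i in range(n - 1):                     # Ornstein-Uhlenbeck test signal
    x[i + 1] = x[i] - x[i] * dt + np.sqrt(dt) * rng.standard_normal()

def kramers_moyal(x, dt, n_bins=30, min_count=100):
    dx = np.diff(x)
    bins = np.linspace(x.min(), x.max(), n_bins + 1)
    idx = np.digitize(x[:-1], bins) - 1
    out = []
    for b in range(n_bins):
        sel = idx == b
        if sel.sum() < min_count:          # skip poorly populated bins
            continue
        center = 0.5 * (bins[b] + bins[b + 1])
        d1 = dx[sel].mean() / dt                  # drift D1(x)
        d2 = (dx[sel] ** 2).mean() / (2 * dt)     # diffusion D2(x)
        out.append((center, d1, d2))
    return np.array(out)

# For this process the exact coefficients are D1(x) = -x and D2(x) = 0.5.
print(kramers_moyal(x, dt)[:5])
```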

  40. Structure and Randomness of Continuous-Time, Discrete-Event Processes

    NASA Astrophysics Data System (ADS)

    Marzen, Sarah E.; Crutchfield, James P.

    2017-10-01

    Loosely speaking, the Shannon entropy rate is used to gauge a stochastic process' intrinsic randomness; the statistical complexity gives the cost of predicting the process. We calculate, for the first time, the entropy rate and statistical complexity of stochastic processes generated by finite unifilar hidden semi-Markov models—memoryful, state-dependent versions of renewal processes. Calculating these quantities requires introducing novel mathematical objects (ε-machines of hidden semi-Markov processes) and new information-theoretic methods to stochastic processes.

  41. Statistical complex fatigue data for SAE 4340 steel and its use in design by reliability

    NASA Technical Reports Server (NTRS)

    Kececioglu, D.; Smith, J. L.

    1970-01-01

    A brief description of the complex fatigue machines used in the test program is presented. The data generated from these machines are given and discussed. Two methods of obtaining strength distributions from the data are also discussed. Then follows a discussion of the construction of statistical fatigue diagrams and their use in designing by reliability. Finally, some of the problems encountered in the test equipment and a corrective modification are presented.

  2. Multivariate analysis: greater insights into complex systems

    USDA-ARS's Scientific Manuscript database

    Many agronomic researchers measure and collect multiple response variables in an effort to understand the more complex nature of the system being studied. Multivariate (MV) statistical methods encompass the simultaneous analysis of all random variables (RV) measured on each experimental or sampling ...

  3. Statistical Methods for Detecting Differentially Abundant Features in Clinical Metagenomic Samples

    PubMed Central

    White, James Robert; Nagarajan, Niranjan; Pop, Mihai

    2009-01-01

    Numerous studies are currently underway to characterize the microbial communities inhabiting our world. These studies aim to dramatically expand our understanding of the microbial biosphere and, more importantly, hope to reveal the secrets of the complex symbiotic relationship between us and our commensal bacterial microflora. An important prerequisite for such discoveries is computational tools that are able to rapidly and accurately compare large datasets generated from complex bacterial communities to identify features that distinguish them. We present a statistical method for comparing clinical metagenomic samples from two treatment populations on the basis of count data (e.g. as obtained through sequencing) to detect differentially abundant features. Our method, Metastats, employs the false discovery rate to improve specificity in high-complexity environments, and separately handles sparsely sampled features using Fisher's exact test. Under a variety of simulations, we show that Metastats performs well compared to previously used methods, and significantly outperforms other methods for features with sparse counts. We demonstrate the utility of our method on several datasets including a 16S rRNA survey of obese and lean human gut microbiomes, COG functional profiles of infant and mature gut microbiomes, and bacterial and viral metabolic subsystem data inferred from random sequencing of 85 metagenomes. The application of our method to the obesity dataset reveals differences between obese and lean subjects not reported in the original study. For the COG and subsystem datasets, we provide the first statistically rigorous assessment of the differences between these populations. The methods described in this paper are the first to address clinical metagenomic datasets comprising samples from multiple subjects. Our methods are robust across datasets of varied complexity and sampling level. While designed for metagenomic applications, our software can also be applied to digital gene expression studies (e.g. SAGE). A web server implementation of our methods and freely available source code can be found at http://metastats.cbcb.umd.edu/. PMID:19360128
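
    The abstract's two-track scheme (a standard two-group test with false discovery rate control for well-sampled features, Fisher's exact test for sparse ones) can be sketched as follows. The count matrix, sparsity cutoff, and table construction are hypothetical stand-ins, not the actual Metastats code.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Toy count matrix: rows = features (e.g., taxa), columns = samples per group.
group_a = rng.poisson(5, size=(50, 10))
group_b = rng.poisson(6, size=(50, 12))

pvals = []
for fa, fb in zip(group_a, group_b):
    if fa.sum() + fb.sum() < 8:  # sparsely sampled feature (hypothetical cutoff)
        # Collapse to a 2x2 table of feature counts vs. remaining library counts,
        # in the spirit of the Fisher's exact handling described in the abstract.
        table = [[fa.sum(), group_a.sum() - fa.sum()],
                 [fb.sum(), group_b.sum() - fb.sum()]]
        _, p = stats.fisher_exact(table)
    else:
        _, p = stats.ttest_ind(fa, fb, equal_var=False)
    pvals.append(p)

# Benjamini-Hochberg false discovery rate adjustment.
p = np.asarray(pvals)
order = np.argsort(p)
q = np.empty_like(p)
q[order] = np.minimum.accumulate(
    (p[order] * len(p) / np.arange(1, len(p) + 1))[::-1])[::-1]
print("features with q < 0.05:", np.sum(q < 0.05))
```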

  4. Statistical complexity measure of pseudorandom bit generators

    NASA Astrophysics Data System (ADS)

    González, C. M.; Larrondo, H. A.; Rosso, O. A.

    2005-08-01

    Pseudorandom number generators (PRNG) are extensively used in Monte Carlo simulations, gambling machines and cryptography as substitutes for ideal random number generators (RNG). Each application imposes different statistical requirements on PRNGs. As L’Ecuyer clearly states, “the main goal for Monte Carlo methods is to reproduce the statistical properties on which these methods are based, whereas for gambling machines and cryptology, observing the sequence of output values for some time should provide no practical advantage for predicting the forthcoming numbers better than by just guessing at random”. In accordance with these different applications, several statistical test suites have been developed to analyze the sequences generated by PRNGs. In a recent paper a new statistical complexity measure [Phys. Lett. A 311 (2003) 126] was defined. Here we propose this measure as a randomness quantifier for PRNGs. The test is applied to three very well known and widely tested PRNGs available in the literature, all of them based on mathematical algorithms. A further PRNG, based on the Lorenz 3D chaotic dynamical system, is also analyzed. PRNGs based on chaos may be considered as models for physical noise sources, and important new results have recently been reported. All the design steps of this PRNG are described, and each stage increases the PRNG's randomness using different strategies. It is shown that the MPR statistical complexity measure is capable of quantifying this randomness improvement. The PRNG based on the chaotic 3D Lorenz dynamical system is also evaluated using traditional digital signal processing tools for comparison.
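
    A minimal sketch of an MPR-style statistical complexity computed from ordinal patterns is given below. The embedding dimension and test sequence are arbitrary choices, and the normalization follows the standard Jensen-Shannon disequilibrium form; it is an approximation of the cited measure, not the authors' code.

```python
import math
from itertools import permutations
import numpy as np

def ordinal_distribution(x, d=4):
    """Distribution of ordinal patterns of embedding dimension d."""
    patterns = {p: 0 for p in permutations(range(d))}
    for i in range(len(x) - d + 1):
        patterns[tuple(np.argsort(x[i:i + d]))] += 1
    p = np.array(list(patterns.values()), dtype=float)
    return p / p.sum()

def shannon(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mpr_complexity(p):
    """Normalized entropy H and statistical complexity C = H * Q, where Q is the
    Jensen-Shannon disequilibrium to the uniform distribution, normalized by its
    known maximum value for an n-state distribution."""
    n = len(p)
    u = np.full(n, 1.0 / n)
    H = shannon(p) / math.log(n)
    js = shannon(0.5 * (p + u)) - 0.5 * shannon(p) - 0.5 * shannon(u)
    q0 = -2.0 / (((n + 1) / n) * math.log(n + 1) - 2 * math.log(2 * n) + math.log(n))
    return H, q0 * js * H

rng = np.random.default_rng(2)
p = ordinal_distribution(rng.random(20_000))
print(mpr_complexity(p))  # a good PRNG: H near 1, C near 0
```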

  5. Improving binding mode and binding affinity predictions of docking by ligand-based search of protein conformations: evaluation in D3R grand challenge 2015

    NASA Astrophysics Data System (ADS)

    Xu, Xianjin; Yan, Chengfei; Zou, Xiaoqin

    2017-08-01

    The growing number of protein-ligand complex structures in the Protein Data Bank, particularly structures of proteins co-bound with different ligands, helps us tackle two major challenges in molecular docking studies: protein flexibility and the scoring function. Here, we introduced a systematic strategy that uses the information embedded in known protein-ligand complex structures to improve both binding mode and binding affinity predictions. Specifically, a ligand similarity calculation method was employed to select, for docking, a receptor structure whose bound ligand shares high similarity with the query ligand. The strategy was applied to the two datasets (HSP90 and MAP4K4) in the recent D3R Grand Challenge 2015. In addition, for the HSP90 dataset, a system-specific scoring function (ITScore2_hsp90) was generated by recalibrating our statistical potential-based scoring function (ITScore2) using the known protein-ligand complex structures and the statistical mechanics-based iterative method. For the HSP90 dataset, better performances were achieved for both binding mode and binding affinity predictions compared with the original ITScore2 and with ensemble docking. For the MAP4K4 dataset, although there were only eight known protein-ligand complex structures, our docking strategy achieved performance comparable to ensemble docking. Our method for receptor conformational selection and our iterative method for developing system-specific statistical potential-based scoring functions can easily be applied to other protein targets with a number of available protein-ligand complex structures to improve binding predictions.
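
    The ligand-similarity selection step could, in spirit, look like the following generic Tanimoto comparison over fingerprint feature sets. The fingerprints and receptor names are hypothetical, and the paper does not specify this exact representation.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprint feature sets."""
    return len(fp_a & fp_b) / len(fp_a | fp_b)

# Hypothetical integer-feature fingerprints for a query ligand and for the
# ligands co-bound in available receptor structures.
query = {1, 4, 7, 9, 12}
library = {
    "receptor_A": {1, 4, 7, 13},
    "receptor_B": {2, 5, 9},
    "receptor_C": {1, 4, 7, 9, 15},
}

# Pick the receptor whose bound ligand is most similar to the query for docking.
best = max(library, key=lambda name: tanimoto(query, library[name]))
print(best, round(tanimoto(query, library[best]), 3))
```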

  6. Statistical mechanics of complex neural systems and high dimensional data

    NASA Astrophysics Data System (ADS)

    Advani, Madhu; Lahiri, Subhaneil; Ganguli, Surya

    2013-03-01

    Recent experimental advances in neuroscience have opened new vistas into the immense complexity of neuronal networks. This proliferation of data challenges us on two parallel fronts. First, how can we form adequate theoretical frameworks for understanding how dynamical network processes cooperate across widely disparate spatiotemporal scales to solve important computational problems? Second, how can we extract meaningful models of neuronal systems from high dimensional datasets? To aid in these challenges, we give a pedagogical review of a collection of ideas and theoretical methods arising at the intersection of statistical physics, computer science and neurobiology. We introduce the interrelated replica and cavity methods, which originated in statistical physics as powerful ways to quantitatively analyze large highly heterogeneous systems of many interacting degrees of freedom. We also introduce the closely related notion of message passing in graphical models, which originated in computer science as a distributed algorithm capable of solving large inference and optimization problems involving many coupled variables. We then show how both the statistical physics and computer science perspectives can be applied in a wide diversity of contexts to problems arising in theoretical neuroscience and data analysis. Along the way we discuss spin glasses, learning theory, illusions of structure in noise, random matrices, dimensionality reduction and compressed sensing, all within the unified formalism of the replica method. Moreover, we review recent conceptual connections between message passing in graphical models, and neural computation and learning. Overall, these ideas illustrate how statistical physics and computer science might provide a lens through which we can uncover emergent computational functions buried deep within the dynamical complexities of neuronal networks.

  7. Bayesian Statistics and Uncertainty Quantification for Safety Boundary Analysis in Complex Systems

    NASA Technical Reports Server (NTRS)

    He, Yuning; Davies, Misty Dawn

    2014-01-01

    The analysis of a safety-critical system often requires detailed knowledge of safe regions and their high-dimensional non-linear boundaries. We present a statistical approach to iteratively detect and characterize the boundaries, which are provided as parameterized shape candidates. Using methods from uncertainty quantification and active learning, we incrementally construct a statistical model from only a few simulation runs and obtain statistically sound estimates of the shape parameters for safety boundaries.

  8. Information geometric methods for complexity

    NASA Astrophysics Data System (ADS)

    Felice, Domenico; Cafaro, Carlo; Mancini, Stefano

    2018-03-01

    Research on the use of information geometry (IG) in modern physics has witnessed significant advances recently. In this review article, we report on the utilization of IG methods to define measures of complexity in both classical and, whenever available, quantum physical settings. A paradigmatic example of a dramatic change in complexity is given by phase transitions (PTs). Hence, we review both global and local aspects of PTs described in terms of the scalar curvature of the parameter manifold and the components of the metric tensor, respectively. We also report on the behavior of geodesic paths on the parameter manifold used to gain insight into the dynamics of PTs. Going further, we survey measures of complexity arising in the geometric framework. In particular, we quantify complexity of networks in terms of the Riemannian volume of the parameter space of a statistical manifold associated with a given network. We are also concerned with complexity measures that account for the interactions of a given number of parts of a system that cannot be described in terms of a smaller number of parts of the system. Finally, we investigate complexity measures of entropic motion on curved statistical manifolds that arise from a probabilistic description of physical systems in the presence of limited information. The Kullback-Leibler divergence, the distance to an exponential family and volumes of curved parameter manifolds, are examples of essential IG notions exploited in our discussion of complexity. We conclude by discussing strengths, limits, and possible future applications of IG methods to the physics of complexity.

  9. Estimation of Global Network Statistics from Incomplete Data

    PubMed Central

    Bliss, Catherine A.; Danforth, Christopher M.; Dodds, Peter Sheridan

    2014-01-01

    Complex networks underlie an enormous variety of social, biological, physical, and virtual systems. A profound complication for the science of complex networks is that in most cases, observing all nodes and all network interactions is impossible. Previous work addressing the impacts of partial network data is surprisingly limited, focuses primarily on missing nodes, and suggests that network statistics derived from subsampled data are not suitable estimators for the same network statistics describing the overall network topology. We generate scaling methods to predict true network statistics, including the degree distribution, from only partial knowledge of nodes, links, or weights. Our methods are transparent and do not assume a known generating process for the network, thus enabling prediction of network statistics for a wide variety of applications. We validate analytical results on four simulated network classes and empirical data sets of various sizes. We perform subsampling experiments by varying proportions of sampled data and demonstrate that our scaling methods can provide very good estimates of true network statistics while acknowledging limits. Lastly, we apply our techniques to a set of rich and evolving large-scale social networks, Twitter reply networks. Based on 100 million tweets, we use our scaling techniques to propose a statistical characterization of the Twitter Interactome from September 2008 to November 2008. Our treatment allows us to find support for Dunbar's hypothesis in detecting an upper threshold for the number of active social contacts that individuals maintain over the course of one week. PMID:25338183
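
    A toy version of the scaling idea, for one statistic (mean degree) under uniform node sampling: in an induced subgraph, each neighbor of a sampled node survives with probability q, so dividing the observed mean degree by q recovers an estimate of the true value. The paper's methods cover richer statistics, including full degree distributions; this only illustrates the principle on a simulated graph.

```python
import random
import networkx as nx

random.seed(3)
G = nx.gnp_random_graph(2000, 0.005, seed=3)  # "true" network (unknown in practice)

q = 0.2  # fraction of nodes actually observed
sampled_nodes = random.sample(list(G.nodes), int(q * G.number_of_nodes()))
H = G.subgraph(sampled_nodes)

obs_mean_deg = sum(d for _, d in H.degree()) / H.number_of_nodes()
# Under uniform node sampling, each edge endpoint survives with probability q,
# so the induced mean degree is scaled down by q: invert that for an estimate.
est_mean_deg = obs_mean_deg / q
true_mean_deg = sum(d for _, d in G.degree()) / G.number_of_nodes()
print(f"observed {obs_mean_deg:.2f}, scaled estimate {est_mean_deg:.2f}, "
      f"true {true_mean_deg:.2f}")
```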

  10. Evaluation of forensic DNA mixture evidence: protocol for evaluation, interpretation, and statistical calculations using the combined probability of inclusion.

    PubMed

    Bieber, Frederick R; Buckleton, John S; Budowle, Bruce; Butler, John M; Coble, Michael D

    2016-08-31

    The evaluation and interpretation of forensic DNA mixture evidence faces greater challenges as mixture evidence becomes increasingly complex. Such challenges include: casework involving low quantity or degraded evidence leading to allele and locus dropout; allele sharing of contributors leading to allele stacking; and differentiation of PCR stutter artifacts from true alleles. There is variation in statistical approaches used to evaluate the strength of the evidence when inclusion of a specific known individual(s) is determined, and the approaches used must be supportable. There are concerns that methods utilized for interpretation of complex forensic DNA mixtures may not be implemented properly in some casework. Similar questions are being raised in a number of U.S. jurisdictions, leading to some confusion about mixture interpretation for current and previous casework. Key elements necessary for the interpretation and statistical evaluation of forensic DNA mixtures are described. Given that the most common method for statistical evaluation of DNA mixtures in many parts of the world, including the USA, is the Combined Probability of Inclusion/Exclusion (CPI/CPE), the exposition and elucidation of this method and a protocol for its use are the focus of this article. Formulae and other supporting materials are provided. Guidance and details of a DNA mixture interpretation protocol are provided for application of the CPI/CPE method in the analysis of more complex forensic DNA mixtures. This description, in turn, should help reduce the variability of interpretation with application of this methodology and thereby improve the quality of DNA mixture interpretation throughout the forensic community.
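
    The CPI calculation itself is simple enough to state as a worked example. With hypothetical allele frequencies for the alleles observed at each locus, the per-locus probability of inclusion is the squared sum of those frequencies, and the CPI is their product:

```python
# Hypothetical allele frequencies of all alleles observed in the mixture, per locus.
mixture_alleles = {
    "D8S1179": [0.12, 0.09, 0.21],
    "D21S11":  [0.18, 0.25],
    "FGA":     [0.07, 0.14, 0.10, 0.05],
}

cpi = 1.0
for locus, freqs in mixture_alleles.items():
    locus_pi = sum(freqs) ** 2  # probability a random person is included at this locus
    cpi *= locus_pi
    print(f"{locus}: PI = {locus_pi:.4f}")

print(f"Combined Probability of Inclusion: {cpi:.3e}")
print(f"Combined Probability of Exclusion: {1 - cpi:.6f}")
```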

  11. Statistical Analysis of Complexity Generators for Cost Estimation

    NASA Technical Reports Server (NTRS)

    Rowell, Ginger Holmes

    1999-01-01

    Predicting the cost of cutting edge new technologies involved with spacecraft hardware can be quite complicated. A new feature of the NASA Air Force Cost Model (NAFCOM), called the Complexity Generator, is being developed to model the complexity factors that drive the cost of space hardware. This parametric approach is also designed to account for the differences in cost, based on factors that are unique to each system and subsystem. The cost driver categories included in this model are weight, inheritance from previous missions, technical complexity, and management factors. This paper explains the Complexity Generator framework, the statistical methods used to select the best model within this framework, and the procedures used to find the region of predictability and the prediction intervals for the cost of a mission.
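
    The "region of predictability and prediction intervals" step corresponds to standard regression prediction intervals. Below is a minimal sketch with invented cost-driver data, not NAFCOM's actual model or data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Hypothetical cost-driver data: x = technical complexity score, y = log cost.
x = rng.uniform(1, 10, 30)
y = 2.0 + 0.8 * x + rng.normal(0, 0.5, 30)

X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
dof = len(y) - X.shape[1]
s2 = resid @ resid / dof  # residual variance estimate

def prediction_interval(x_new, alpha=0.05):
    """Two-sided (1 - alpha) prediction interval for a new observation at x_new."""
    xv = np.array([1.0, x_new])
    XtX_inv = np.linalg.inv(X.T @ X)
    se = np.sqrt(s2 * (1 + xv @ XtX_inv @ xv))
    t = stats.t.ppf(1 - alpha / 2, dof)
    y_hat = xv @ beta
    return y_hat - t * se, y_hat + t * se

print(prediction_interval(7.5))
```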

  12. Modeling Complex Phenomena Using Multiscale Time Sequences

    DTIC Science & Technology

    2009-08-24

    The technology models complex phenomena at different time scales and how these scales relate to each other. This can be done by combining a set of statistical fractal measures based on Hurst and Hölder exponents, auto-regressive methods, and Fourier and wavelet decomposition methods.

  13. Use of Statistical Analyses in the Ophthalmic Literature

    PubMed Central

    Lisboa, Renato; Meira-Freitas, Daniel; Tatham, Andrew J.; Marvasti, Amir H.; Sharpsten, Lucie; Medeiros, Felipe A.

    2014-01-01

    Purpose: To identify the most commonly used statistical analyses in the ophthalmic literature and to determine the likely gain in comprehension of the literature that readers could expect if they were to sequentially add knowledge of more advanced techniques to their statistical repertoire. Design: Cross-sectional study. Methods: All articles published from January 2012 to December 2012 in Ophthalmology, American Journal of Ophthalmology and Archives of Ophthalmology were reviewed. A total of 780 peer-reviewed articles were included. Two reviewers examined each article and assigned categories to each one depending on the type of statistical analyses used. Discrepancies between reviewers were resolved by consensus. Main Outcome Measures: Total number and percentage of articles containing each category of statistical analysis were obtained. Additionally, we estimated the accumulated number and percentage of articles that a reader would be expected to be able to interpret depending on their statistical repertoire. Results: Readers with little or no statistical knowledge would be expected to be able to interpret the statistical methods presented in only 20.8% of articles. In order to understand more than half (51.4%) of the articles published, readers were expected to be familiar with at least 15 different statistical methods. Knowledge of 21 categories of statistical methods was necessary to comprehend 70.9% of articles, while knowledge of more than 29 categories was necessary to comprehend more than 90% of articles. Articles in the retina and glaucoma subspecialties showed a tendency to use more complex analyses when compared to cornea. Conclusions: Readers of clinical journals in ophthalmology need to have substantial knowledge of statistical methodology to understand the results of published studies in the literature. The frequency of use of complex statistical analyses also indicates that those involved in the editorial peer-review process must have sound statistical knowledge in order to critically appraise articles submitted for publication. The results of this study could provide guidance to direct the statistical learning of clinical ophthalmologists, researchers and educators involved in the design of courses for residents and medical students. PMID:24612977

  14. Learning process mapping heuristics under stochastic sampling overheads

    NASA Technical Reports Server (NTRS)

    Ieumwananonthachai, Arthur; Wah, Benjamin W.

    1991-01-01

    A statistical method was developed previously for improving process mapping heuristics. The method systematically explores the space of possible heuristics under a specified time constraint. Its goal is to find the best possible heuristics while trading off the solution quality of the process mapping heuristics against their execution time. Here, the statistical selection method is extended to take into consideration the variations in the amount of time used to evaluate heuristics on a problem instance. The improvement in performance under this more realistic assumption is presented, along with some methods that alleviate the additional complexity.

  15. Dynamic Uncertain Causality Graph for Knowledge Representation and Reasoning: Utilization of Statistical Data and Domain Knowledge in Complex Cases.

    PubMed

    Zhang, Qin; Yao, Quanying

    2018-05-01

    The dynamic uncertain causality graph (DUCG) is a newly presented framework for uncertain causality representation and probabilistic reasoning. It has been successfully applied to online fault diagnosis of large, complex industrial systems, and to disease diagnosis. This paper extends the DUCG to model more complex cases than could previously be modeled, e.g., the case in which statistical data are in different groups with or without overlap, and some domain knowledge and actions (new variables with uncertain causalities) are introduced. In other words, this paper proposes to use -mode, -mode, and -mode of the DUCG to model such complex cases and then transform them into either the standard -mode or the standard -mode. In the former situation, if no directed cyclic graph is involved, the transformed result is simply a Bayesian network (BN), and existing inference methods for BNs can be applied. In the latter situation, an inference method based on the DUCG is proposed. Examples are provided to illustrate the methodology.

  16. Unified functional network and nonlinear time series analysis for complex systems science: The pyunicorn package

    NASA Astrophysics Data System (ADS)

    Donges, Jonathan F.; Heitzig, Jobst; Beronov, Boyan; Wiedermann, Marc; Runge, Jakob; Feng, Qing Yi; Tupikina, Liubov; Stolbova, Veronika; Donner, Reik V.; Marwan, Norbert; Dijkstra, Henk A.; Kurths, Jürgen

    2015-11-01

    We introduce the pyunicorn (Pythonic unified complex network and recurrence analysis toolbox) open source software package for applying and combining modern methods of data analysis and modeling from complex network theory and nonlinear time series analysis. pyunicorn is a fully object-oriented and easily parallelizable package written in the language Python. It allows for the construction of functional networks such as climate networks in climatology or functional brain networks in neuroscience representing the structure of statistical interrelationships in large data sets of time series and, subsequently, investigating this structure using advanced methods of complex network theory such as measures and models for spatial networks, networks of interacting networks, node-weighted statistics, or network surrogates. Additionally, pyunicorn provides insights into the nonlinear dynamics of complex systems as recorded in uni- and multivariate time series from a non-traditional perspective by means of recurrence quantification analysis, recurrence networks, visibility graphs, and construction of surrogate time series. The range of possible applications of the library is outlined, drawing on several examples mainly from the field of climatology.
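
    As one example of the time-series-to-network constructions such a toolbox provides, here is a from-scratch natural visibility graph (in the sense of Lacasa et al.). This is an independent sketch of the construction, not pyunicorn's API.

```python
import numpy as np

def visibility_graph(y):
    """Natural visibility graph: one node per sample; an edge (i, j) exists if the
    straight line between samples i and j passes above every intermediate sample."""
    n = len(y)
    edges = set()
    for i in range(n - 1):
        for j in range(i + 1, n):
            if all(y[k] < y[j] + (y[i] - y[j]) * (j - k) / (j - i)
                   for k in range(i + 1, j)):
                edges.add((i, j))
    return edges

rng = np.random.default_rng(5)
series = rng.normal(size=200)
E = visibility_graph(series)

deg = np.zeros(len(series), dtype=int)
for i, j in E:
    deg[i] += 1
    deg[j] += 1
print("mean degree:", deg.mean())
```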

  17. Comparisons of non-Gaussian statistical models in DNA methylation analysis.

    PubMed

    Ma, Zhanyu; Teschendorff, Andrew E; Yu, Hong; Taghia, Jalil; Guo, Jun

    2014-06-16

    As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, its distribution is non-Gaussian. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance.
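
    The bounded-support point can be illustrated by fitting a beta distribution to toy methylation values and comparing its likelihood with a (mis-specified) Gaussian fit. The paper uses richer non-Gaussian mixture models and clustering, which this sketch does not reproduce.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
# Toy methylation beta-values: bounded in (0, 1), hence non-Gaussian.
m = rng.beta(0.5, 2.0, size=5000)

# Maximum-likelihood beta fit with the support fixed to [0, 1].
a_hat, b_hat, loc, scale = stats.beta.fit(m, floc=0, fscale=1)
print(f"fitted Beta({a_hat:.2f}, {b_hat:.2f})")

# Compare log-likelihoods: the beta model should win on bounded data.
ll_beta = stats.beta.logpdf(m, a_hat, b_hat).sum()
mu, sd = stats.norm.fit(m)
ll_norm = stats.norm.logpdf(m, mu, sd).sum()
print(f"log-likelihood beta {ll_beta:.0f} vs normal {ll_norm:.0f}")
```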

  18. Comparisons of Non-Gaussian Statistical Models in DNA Methylation Analysis

    PubMed Central

    Ma, Zhanyu; Teschendorff, Andrew E.; Yu, Hong; Taghia, Jalil; Guo, Jun

    2014-01-01

    As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, its distribution is non-Gaussian. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance. PMID:24937687

  19. Identification of Major Histocompatibility Complex-Regulated Body Odorants by Statistical Analysis of a Comparative Gas Chromatography/Mass Spectrometry Experiment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Willse, Alan R.; Belcher, Ann; Preti, George

    2005-04-15

    Gas chromatography (GC), combined with mass spectrometry (MS) detection, is a powerful analytical technique that can be used to separate, quantify, and identify volatile compounds in complex mixtures. This paper examines the application of GC-MS in a comparative experiment to identify volatiles that differ in concentration between two groups. A complex mixture might comprise several hundred or even thousands of volatile compounds. Because their number and location in a chromatogram generally are unknown, and because components overlap in populous chromatograms, the statistical problems offer significant challenges beyond traditional two-group screening procedures. We describe a statistical procedure to compare two-dimensional GC-MS profiles between groups, which entails (1) signal processing: baseline correction and peak detection in single ion chromatograms; (2) aligning chromatograms in time; (3) normalizing differences in overall signal intensities; and (4) detecting chromatographic regions that differ between groups. Compared to existing approaches, the proposed method is robust to errors made at earlier stages of analysis, such as missed peaks or slightly misaligned chromatograms. To illustrate the method, we identify differences in GC-MS chromatograms of ether-extracted urine collected from two nearly identical inbred groups of mice, to investigate the relationship between odor and genetics of the major histocompatibility complex.
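
    Step (2), aligning chromatograms in time, can be sketched with a simple cross-correlation lag search on synthetic single-ion chromatograms. The paper's full pipeline (baseline correction, peak detection, normalization, region detection) is considerably more robust than this toy version.

```python
import numpy as np

rng = np.random.default_rng(7)
t = np.arange(500)
# Two toy single-ion chromatograms: the same peak, shifted in time, plus noise.
peak = np.exp(-0.5 * ((t - 200) / 8.0) ** 2)
chrom_a = peak + 0.02 * rng.normal(size=t.size)
chrom_b = np.roll(peak, 17) + 0.02 * rng.normal(size=t.size)

# Crude baseline correction: subtract the global median of each trace.
chrom_a -= np.median(chrom_a)
chrom_b -= np.median(chrom_b)

# Alignment: find the lag that maximizes the cross-correlation.
lags = np.arange(-50, 51)
scores = [np.dot(chrom_a, np.roll(chrom_b, -lag)) for lag in lags]
best_lag = lags[int(np.argmax(scores))]
print("estimated shift:", best_lag)  # should recover ~17
```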

  20. Modeling synthetic lethality

    PubMed Central

    Le Meur, Nolwenn; Gentleman, Robert

    2008-01-01

    Background: Synthetic lethality defines a genetic interaction where the combination of mutations in two or more genes leads to cell death. The implications of synthetic lethal screens have been discussed in the context of drug development, as synthetic lethal pairs could be used to selectively kill cancer cells but leave normal cells relatively unharmed. A challenge is to assess genome-wide experimental data and integrate the results to better understand the underlying biological processes. We propose statistical and computational tools that can be used to find relationships between synthetic lethality and cellular organizational units. Results: In Saccharomyces cerevisiae, we identified multi-protein complexes and pairs of multi-protein complexes that share an unusually high number of synthetic genetic interactions. As previously predicted, we found that synthetic lethality can arise from subunits of an essential multi-protein complex or between pairs of multi-protein complexes. Finally, using multi-protein complexes allowed us to take into account the pleiotropic nature of the gene products. Conclusions: Modeling synthetic lethality using current estimates of the yeast interactome is an efficient approach to disentangle some of the complex molecular interactions that drive a cell. Our model, in conjunction with applied statistical and computational methods, provides new tools to better characterize synthetic genetic interactions. PMID:18789146

  1. A survey of statistics in three UK general practice journals

    PubMed Central

    Rigby, Alan S; Armstrong, Gillian K; Campbell, Michael J; Summerton, Nick

    2004-01-01

    Background: Many medical specialities have reviewed the statistical content of their journals. To our knowledge this has not been done in general practice. Given the main role of a general practitioner as a diagnostician, we thought it would be of interest to see whether the statistical methods reported reflect the diagnostic process. Methods: Hand search of three UK journals of general practice, namely the British Medical Journal (general practice section), British Journal of General Practice and Family Practice, over a one-year period (1 January to 31 December 2000). Results: A wide variety of statistical techniques were used. The most common methods included t-tests and Chi-squared tests. There were few articles reporting likelihood ratios and other useful diagnostic methods. There was evidence that the journals with the more thorough statistical review process reported a more complex and wider variety of statistical techniques. Conclusions: The BMJ had a wider range and greater diversity of statistical methods than the other two journals. However, in all three journals there was a dearth of papers reflecting the diagnostic process. Across all three journals there were relatively few papers describing randomised controlled trials, thus recognising the difficulty of implementing this design in general practice. PMID:15596014

  2. Assessing the accuracy and stability of variable selection methods for random forest modeling in ecology

    EPA Science Inventory

    Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological datasets there is limited guidance on variable selection methods for RF modeling. Typically, e...

  3. Robust Strategy for Rocket Engine Health Monitoring

    NASA Technical Reports Server (NTRS)

    Santi, L. Michael

    2001-01-01

    Monitoring the health of rocket engine systems is essentially a two-phase process. The acquisition phase involves sensing physical conditions at selected locations, converting physical inputs to electrical signals, conditioning the signals as appropriate to establish scale or filter interference, and recording results in a form that is easy to interpret. The inference phase involves analysis of results from the acquisition phase, comparison of analysis results to established health measures, and assessment of health indications. A variety of analytical tools may be employed in the inference phase of health monitoring. These tools can be separated into three broad categories: statistical, rule based, and model based. Statistical methods can provide excellent comparative measures of engine operating health. They require well-characterized data from an ensemble of "typical" engines, or "golden" data from a specific test assumed to define the operating norm in order to establish reliable comparative measures. Statistical methods are generally suitable for real-time health monitoring because they do not deal with the physical complexities of engine operation. The utility of statistical methods in rocket engine health monitoring is hindered by practical limits on the quantity and quality of available data. This is due to the difficulty and high cost of data acquisition, the limited number of available test engines, and the problem of simulating flight conditions in ground test facilities. In addition, statistical methods incur a penalty for disregarding flow complexity and are therefore limited in their ability to define performance shift causality. Rule based methods infer the health state of the engine system based on comparison of individual measurements or combinations of measurements with defined health norms or rules. This does not mean that rule based methods are necessarily simple. Although binary yes-no health assessment can sometimes be established by relatively simple rules, the causality assignment needed for refined health monitoring often requires an exceptionally complex rule base involving complicated logical maps. Structuring the rule system to be clear and unambiguous can be difficult, and the expert input required to maintain a large logic network and associated rule base can be prohibitive.

  4. A new universality class in corpus of texts; A statistical physics study

    NASA Astrophysics Data System (ADS)

    Najafi, Elham; Darooneh, Amir H.

    2018-05-01

    Text can be regarded as a complex system, and several methods from statistical physics can be used to study it. In this work, by means of statistical physics methods, we reveal new universal behaviors of texts associated with the fractality values of words in a text. The fractality measure indicates the importance of words in a text by considering the distribution pattern of words throughout the text. We observed a power law relation between the fractality of text and vocabulary size for texts and corpora. We also observed this behavior in studying biological data.

  5. Statistical significance of the rich-club phenomenon in complex networks

    NASA Astrophysics Data System (ADS)

    Jiang, Zhi-Qiang; Zhou, Wei-Xing

    2008-04-01

    We propose that the rich-club phenomenon in complex networks should be defined in the spirit of bootstrapping, in which a null model is adopted to assess the statistical significance of the rich-club detected. Our method can serve as a definition of the rich-club phenomenon and is applied to analyze three real networks and three model networks. The results show significant improvement compared with previously reported results. We report a dilemma with an exceptional example, showing that there does not exist an omnipotent definition for the rich-club phenomenon.
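
    A bootstrap-flavored version of this test can be sketched with a degree-preserving rewiring null model. The network, threshold degree, and replicate count below are arbitrary, and the paper's null model may differ in detail.

```python
import networkx as nx
import numpy as np

G = nx.barabasi_albert_graph(500, 4, seed=8)
k = 20  # degree threshold at which to test for a rich club

def rich_club(graph, k):
    rc = nx.rich_club_coefficient(graph, normalized=False)
    return rc.get(k, 0.0)

phi_obs = rich_club(G, k)

# Null ensemble: degree-preserving rewirings of the observed network.
null = []
for seed in range(50):
    H = G.copy()
    nx.double_edge_swap(H, nswap=5 * H.number_of_edges(), max_tries=10**6, seed=seed)
    null.append(rich_club(H, k))

# One-sided p-value: how often the null matches or exceeds the observed value.
p = np.mean([phi >= phi_obs for phi in null])
print(f"phi(k={k}) = {phi_obs:.3f}, null mean {np.mean(null):.3f}, p = {p:.2f}")
```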

  6. Methods of Information Geometry to model complex shapes

    NASA Astrophysics Data System (ADS)

    De Sanctis, A.; Gattone, S. A.

    2016-09-01

    In this paper, a new statistical method to model patterns emerging in complex systems is proposed. A framework for shape analysis of 2-dimensional landmark data is introduced, in which each landmark is represented by a bivariate Gaussian distribution. From Information Geometry we know that the Fisher-Rao metric endows the statistical manifold of parameters of a family of probability distributions with a Riemannian metric. This approach thus allows the intermediate steps in the evolution between observed shapes to be reconstructed by computing the geodesic, with respect to the Fisher-Rao metric, between the corresponding distributions. Furthermore, the geodesic path can be used for shape predictions. As an application, we study the evolution of the rat skull shape. A future application in Ophthalmology is introduced.

  7. [Applications of mathematical statistics methods on compatibility researches of traditional Chinese medicines formulae].

    PubMed

    Mai, Lan-Yin; Li, Yi-Xuan; Chen, Yong; Xie, Zhen; Li, Jie; Zhong, Ming-Yu

    2014-05-01

    The compatibility of traditional Chinese medicines (TCMs) formulae, containing enormous information, is a complex component system. Applications of mathematical statistics methods to compatibility research on traditional Chinese medicines formulae have great significance for promoting the modernization of traditional Chinese medicines and improving clinical efficacy and formula optimization. As a tool for quantitative analysis, data inference and exploring the inherent rules of substances, mathematical statistics methods can be used to reveal the working mechanisms of the compatibility of traditional Chinese medicines formulae both qualitatively and quantitatively. By reviewing studies based on applications of mathematical statistics methods, this paper summarizes the field from the perspectives of dosage optimization, efficacy, and changes of chemical components, as well as the rules of incompatibility and contraindication of formulae, and provides references for further study of the working mechanisms and the connotations of traditional Chinese medicines.

  8. Bayesian demography 250 years after Bayes

    PubMed Central

    Bijak, Jakub; Bryant, John

    2016-01-01

    Bayesian statistics offers an alternative to classical (frequentist) statistics. It is distinguished by its use of probability distributions to describe uncertain quantities, which leads to elegant solutions to many difficult statistical problems. Although Bayesian demography, like Bayesian statistics more generally, is around 250 years old, only recently has it begun to flourish. The aim of this paper is to review the achievements of Bayesian demography, address some misconceptions, and make the case for wider use of Bayesian methods in population studies. We focus on three applications: demographic forecasts, limited data, and highly structured or complex models. The key advantages of Bayesian methods are the ability to integrate information from multiple sources and to describe uncertainty coherently. Bayesian methods also allow for including additional (prior) information next to the data sample. As such, Bayesian approaches are complementary to many traditional methods, which can be productively re-expressed in Bayesian terms. PMID:26902889

  9. Small sample estimation of the reliability function for technical products

    NASA Astrophysics Data System (ADS)

    Lyamets, L. L.; Yakimenko, I. V.; Kanishchev, O. A.; Bliznyuk, O. A.

    2017-12-01

    It is demonstrated that, in the absence of large statistical samples obtained from testing complex technical products for failure, statistical estimation of the reliability function of initial elements can be performed by the moments method. A formal description of the moments method is given and its advantages in the analysis of small censored samples are discussed. A modified algorithm is proposed for implementing the moments method using only the moments at which the failures of initial elements occur.
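
    For the simplest case, an uncensored sample under an exponential life model, the moments method reduces to matching the first moment. A sketch with invented failure times follows; the paper's modified algorithm additionally handles censored samples, which this does not.

```python
import numpy as np

# Hypothetical small sample of failure times for a component (hours).
failures = np.array([312.0, 480.0, 521.0, 655.0, 702.0, 948.0])

# Method of moments under an exponential life model: match the first moment,
# so lambda_hat = 1 / sample mean.
lam = 1.0 / failures.mean()

def reliability(t):
    """Estimated probability of surviving beyond time t."""
    return np.exp(-lam * t)

for t in (100, 500, 1000):
    print(f"R({t}) = {reliability(t):.3f}")
```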

  10. Cerebral blood flow during paroxysmal EEG activation induced by sleep in patients with complex partial seizures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gozukirmizi, E.; Meyer, J.S.; Okabe, T.

    1982-01-01

    Cerebral blood flow (CBF) measurements were combined with sleep polysomnography in nine patients with complex partial seizures. Two methods were used: the 133Xe method for measuring regional CBF (rCBF) and the stable xenon CT method for local CBF (LCBF). Compared to nonepileptic subjects, who show diffuse CBF decreases during stages I-II, non-REM sleep onset, patients with complex partial seizures show statistically significant increases in CBF which are maximal in regions where the EEG focus is localized and are predominantly seen in one temporal region but are also propagated to other cerebral areas. Both CBF methods gave comparable results, but greater statistical significance was achieved by the stable xenon CT methodology. CBF increases are more diffuse than predicted by EEG paroxysmal activity recorded from scalp electrodes. An advantage of the 133Xe inhalation method was the achievement of reliable data despite movement of the head. This was attributed to the use of a helmet which maintained the probes approximated to the scalp. Disadvantages were poor resolution (7 cm3) and two-dimensional information. The advantage of the stable xenon CT method is excellent resolution (80 mm3) in three dimensions, but a disadvantage is that movement of the head in patients with seizure disorders may limit satisfactory measurements.

  11. Quantifying the statistical complexity of low-frequency fluctuations in semiconductor lasers with optical feedback

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tiana-Alsina, J.; Torrent, M. C.; Masoller, C.

    Low-frequency fluctuations (LFFs) represent a dynamical instability that occurs in semiconductor lasers when they are operated near the lasing threshold and subject to moderate optical feedback. LFFs consist of sudden power dropouts followed by gradual, stepwise recoveries. We analyze experimental time series of intensity dropouts and quantify the complexity of the underlying dynamics employing two tools from information theory, namely, Shannon's entropy and the Martin, Plastino, and Rosso statistical complexity measure. These measures are computed using a method based on ordinal patterns, by which the relative length and ordering of consecutive interdropout intervals (i.e., the time intervals between consecutive intensity dropouts) are analyzed, disregarding the precise timing of the dropouts and the absolute durations of the interdropout intervals. We show that this methodology is suitable for quantifying subtle characteristics of the LFFs, and in particular the transition to fully developed chaos that takes place when the laser's pump current is increased. Our method shows that the statistical complexity of the laser does not increase continuously with the pump current, but levels off before reaching the coherence collapse regime. This behavior coincides with that of the first- and second-order correlations of the interdropout intervals, suggesting that these correlations, and not the chaotic behavior, are what determine the level of complexity of the laser's dynamics. These results hold for two different dynamical regimes, namely, sustained LFFs and coexistence between LFFs and steady-state emission.

  12. Statistical methods to estimate treatment effects from multichannel electroencephalography (EEG) data in clinical trials.

    PubMed

    Ma, Junshui; Wang, Shubing; Raubertas, Richard; Svetnik, Vladimir

    2010-07-15

    With the increasing popularity of using electroencephalography (EEG) to reveal the treatment effect in drug development clinical trials, the vast volume and complex nature of EEG data compose an intriguing, but challenging, topic. In this paper the statistical analysis methods recommended by the EEG community, along with methods frequently used in the published literature, are first reviewed. A straightforward adjustment of the existing methods to handle multichannel EEG data is then introduced. In addition, based on the spatial smoothness property of EEG data, a new category of statistical methods is proposed. The new methods use a linear combination of low-degree spherical harmonic (SPHARM) basis functions to represent a spatially smoothed version of the EEG data on the scalp, which is close to a sphere in shape. In total, seven statistical methods, including both the existing and the newly proposed methods, are applied to two clinical datasets to compare their power to detect a drug effect. Contrary to the EEG community's recommendation, our results suggest that (1) the nonparametric method does not outperform its parametric counterpart; and (2) including baseline data in the analysis does not always improve the statistical power. In addition, our results recommend that (3) simple paired statistical tests should be avoided due to their poor power; and (4) the proposed spatially smoothed methods perform better than their unsmoothed versions. Copyright 2010 Elsevier B.V. All rights reserved.
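
    The proposed smoothing can be sketched by projecting a scalp map onto a low-degree real spherical-harmonic basis via least squares. Electrode positions and the signal below are synthetic placeholders, and this is not the authors' code.

```python
import numpy as np
from scipy.special import sph_harm

def real_sph_basis(theta, phi, lmax):
    """Real spherical-harmonic design matrix up to degree lmax,
    evaluated at azimuth theta and polar angle phi."""
    cols = []
    for l in range(lmax + 1):
        for m in range(-l, l + 1):
            y = sph_harm(abs(m), l, theta, phi)
            if m < 0:
                cols.append(np.sqrt(2) * y.imag)
            elif m == 0:
                cols.append(y.real)
            else:
                cols.append(np.sqrt(2) * y.real)
    return np.column_stack(cols)

rng = np.random.default_rng(9)
n_elec = 64
theta = rng.uniform(0, 2 * np.pi, n_elec)  # electrode azimuths (toy layout)
phi = rng.uniform(0, np.pi / 2, n_elec)    # polar angles on the upper scalp
signal = np.sin(phi) * np.cos(theta) + 0.3 * rng.normal(size=n_elec)  # toy EEG map

B = real_sph_basis(theta, phi, lmax=3)     # low degree acts as spatial smoothing
coef, *_ = np.linalg.lstsq(B, signal, rcond=None)
smoothed = B @ coef
print("residual RMS:", np.sqrt(np.mean((signal - smoothed) ** 2)))
```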

  13. Multifactor-Dimensionality Reduction Reveals High-Order Interactions among Estrogen-Metabolism Genes in Sporadic Breast Cancer

    PubMed Central

    Ritchie, Marylyn D.; Hahn, Lance W.; Roodi, Nady; Bailey, L. Renee; Dupont, William D.; Parl, Fritz F.; Moore, Jason H.

    2001-01-01

    One of the greatest challenges facing human geneticists is the identification and characterization of susceptibility genes for common complex multifactorial human diseases. This challenge is partly due to the limitations of parametric-statistical methods for detection of gene effects that are dependent solely or partially on interactions with other genes and with environmental exposures. We introduce multifactor-dimensionality reduction (MDR) as a method for reducing the dimensionality of multilocus information, to improve the identification of polymorphism combinations associated with disease risk. The MDR method is nonparametric (i.e., no hypothesis about the value of a statistical parameter is made), is model-free (i.e., it assumes no particular inheritance model), and is directly applicable to case-control and discordant-sib-pair studies. Using simulated case-control data, we demonstrate that MDR has reasonable power to identify interactions among two or more loci in relatively small samples. When it was applied to a sporadic breast cancer case-control data set, in the absence of any statistically significant independent main effects, MDR identified a statistically significant high-order interaction among four polymorphisms from three different estrogen-metabolism genes. To our knowledge, this is the first report of a four-locus interaction associated with a common complex multifactorial disease. PMID:11404819
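
    The core MDR pooling step (label each multilocus genotype cell high or low risk by its case:control ratio, then score the collapsed one-dimensional attribute) can be sketched as follows, on simulated genotypes with a purely epistatic signal. Real MDR adds cross-validation and permutation testing, which this omits.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(10)
n = 400
geno = rng.integers(0, 3, size=(n, 4))  # 4 SNPs coded 0/1/2 (toy data)
# Toy phenotype with a purely epistatic effect between SNP 0 and SNP 1.
status = ((geno[:, 0] == 1) ^ (geno[:, 1] == 1)).astype(int)
status ^= rng.random(n) < 0.1           # add label noise

def mdr_accuracy(snp_pair):
    """Pool genotype cells into high/low risk by case:control ratio versus the
    overall ratio, then score the collapsed classifier on the same data."""
    overall = status.mean() / (1 - status.mean())
    cells = {}
    for g, s in zip(geno[:, snp_pair], status):
        cells.setdefault(tuple(g), []).append(s)
    pred = {}
    for cell, labels in cells.items():
        cases = sum(labels)
        controls = len(labels) - cases
        pred[cell] = int(cases / max(controls, 1) > overall)
    correct = sum(pred[tuple(g)] == s for g, s in zip(geno[:, snp_pair], status))
    return correct / n

best = max(combinations(range(4), 2), key=mdr_accuracy)
print("best SNP pair:", best, "accuracy:", round(mdr_accuracy(best), 3))
```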

  14. Estimating survival probabilities by exposure levels: utilizing vital statistics and complex survey data with mortality follow-up.

    PubMed

    Landsman, V; Lou, W Y W; Graubard, B I

    2015-05-20

    We present a two-step approach for estimating hazard rates and, consequently, survival probabilities, by levels of general categorical exposure. The resulting estimator utilizes three sources of data: vital statistics data and census data are used at the first step to estimate the overall hazard rate for a given combination of gender and age group, and cohort data constructed from a nationally representative complex survey with linked mortality records, are used at the second step to divide the overall hazard rate by exposure levels. We present an explicit expression for the resulting estimator and consider two methods for variance estimation that account for complex multistage sample design: (1) the leaving-one-out jackknife method, and (2) the Taylor linearization method, which provides an analytic formula for the variance estimator. The methods are illustrated with smoking and all-cause mortality data from the US National Health Interview Survey Linked Mortality Files, and the proposed estimator is compared with a previously studied crude hazard rate estimator that uses survey data only. The advantages of a two-step approach and possible extensions of the proposed estimator are discussed. Copyright © 2015 John Wiley & Sons, Ltd.

  15. A clinicomicrobiological study to evaluate the efficacy of manual and powered toothbrushes among autistic patients

    PubMed Central

    Vajawat, Mayuri; Deepika, P. C.; Kumar, Vijay; Rajeshwari, P.

    2015-01-01

    Aim: To compare the efficacy of powered toothbrushes in improving gingival health and reducing salivary red complex counts, as compared to manual toothbrushes, among autistic individuals. Materials and Methods: Forty autistic individuals were selected. The test group received powered toothbrushes, and the control group received manual toothbrushes. Plaque index and gingival index were recorded. Unstimulated saliva was collected for analysis of red complex organisms using polymerase chain reaction. Results: A statistically significant reduction in plaque scores was seen over a period of 12 weeks in both groups (P < 0.001 for tests and P = 0.002 for controls). This reduction was statistically more significant in the test group (P = 0.024). A statistically significant reduction in gingival scores was seen over a period of 12 weeks in both groups (P < 0.001 for tests and P = 0.001 for controls). This reduction was statistically more significant in the test group (P = 0.042). No statistically significant reduction in the detection rate of red complex organisms was seen at 4 weeks in either group. Conclusion: Powered toothbrushes result in a significant overall improvement in gingival health when constant reinforcement of oral hygiene instructions is given. PMID:26681855

  16. Adaptive Distributed Video Coding with Correlation Estimation using Expectation Propagation

    PubMed Central

    Cui, Lijuan; Wang, Shuang; Jiang, Xiaoqian; Cheng, Samuel

    2013-01-01

    Distributed video coding (DVC) is rapidly increasing in popularity because it shifts complexity from the encoder to the decoder while, at least in theory, sacrificing no compression performance. In contrast with conventional video codecs, the inter-frame correlation in DVC is explored at the decoder based on the received syndromes of the Wyner-Ziv (WZ) frame and a side information (SI) frame generated from other frames available only at the decoder. However, the ultimate decoding performance of DVC rests on the assumption that perfect knowledge of the correlation statistic between the WZ and SI frames is available at the decoder. Therefore, the ability to obtain a good statistical correlation estimate is becoming increasingly important in practical DVC implementations. Generally, existing correlation estimation methods in DVC can be classified into two main types: pre-estimation, where estimation starts before decoding, and on-the-fly (OTF) estimation, where the estimate can be refined iteratively during decoding. As potential changes between frames might be unpredictable or dynamical, OTF estimation methods usually outperform pre-estimation techniques at the cost of increased decoding complexity (e.g., sampling methods). In this paper, we propose a low complexity adaptive DVC scheme using expectation propagation (EP), where correlation estimation is performed OTF as it is carried out jointly with decoding of the factor graph-based DVC code. Among different approximate inference methods, EP generally offers a better tradeoff between accuracy and complexity. Experimental results show that our proposed scheme outperforms the benchmark state-of-the-art DISCOVER codec and other cases without correlation tracking, and achieves comparable decoding performance but with significantly lower complexity compared with sampling methods. PMID:23750314

  17. Adaptive distributed video coding with correlation estimation using expectation propagation

    NASA Astrophysics Data System (ADS)

    Cui, Lijuan; Wang, Shuang; Jiang, Xiaoqian; Cheng, Samuel

    2012-10-01

    Distributed video coding (DVC) is rapidly increasing in popularity because it shifts complexity from the encoder to the decoder while, at least in theory, sacrificing no compression performance. In contrast with conventional video codecs, the inter-frame correlation in DVC is explored at the decoder based on the received syndromes of the Wyner-Ziv (WZ) frame and a side information (SI) frame generated from other frames available only at the decoder. However, the ultimate decoding performance of DVC rests on the assumption that perfect knowledge of the correlation statistic between the WZ and SI frames is available at the decoder. Therefore, the ability to obtain a good statistical correlation estimate is becoming increasingly important in practical DVC implementations. Generally, existing correlation estimation methods in DVC can be classified into two main types: pre-estimation, where estimation starts before decoding, and on-the-fly (OTF) estimation, where the estimate can be refined iteratively during decoding. As potential changes between frames might be unpredictable or dynamical, OTF estimation methods usually outperform pre-estimation techniques at the cost of increased decoding complexity (e.g., sampling methods). In this paper, we propose a low complexity adaptive DVC scheme using expectation propagation (EP), where correlation estimation is performed OTF as it is carried out jointly with decoding of the factor graph-based DVC code. Among different approximate inference methods, EP generally offers a better tradeoff between accuracy and complexity. Experimental results show that our proposed scheme outperforms the benchmark state-of-the-art DISCOVER codec and other cases without correlation tracking, and achieves comparable decoding performance but with significantly lower complexity compared with sampling methods.

  18. Adaptive Distributed Video Coding with Correlation Estimation using Expectation Propagation.

    PubMed

    Cui, Lijuan; Wang, Shuang; Jiang, Xiaoqian; Cheng, Samuel

    2012-10-15

    Distributed video coding (DVC) is rapidly increasing in popularity because it shifts complexity from the encoder to the decoder while, at least in theory, sacrificing no compression performance. In contrast with conventional video codecs, the inter-frame correlation in DVC is explored at the decoder based on the received syndromes of the Wyner-Ziv (WZ) frame and a side information (SI) frame generated from other frames available only at the decoder. However, the ultimate decoding performance of DVC rests on the assumption that perfect knowledge of the correlation statistic between the WZ and SI frames is available at the decoder. Therefore, the ability to obtain a good statistical correlation estimate is becoming increasingly important in practical DVC implementations. Generally, existing correlation estimation methods in DVC can be classified into two main types: pre-estimation, where estimation starts before decoding, and on-the-fly (OTF) estimation, where the estimate can be refined iteratively during decoding. As potential changes between frames might be unpredictable or dynamical, OTF estimation methods usually outperform pre-estimation techniques at the cost of increased decoding complexity (e.g., sampling methods). In this paper, we propose a low complexity adaptive DVC scheme using expectation propagation (EP), where correlation estimation is performed OTF as it is carried out jointly with decoding of the factor graph-based DVC code. Among different approximate inference methods, EP generally offers a better tradeoff between accuracy and complexity. Experimental results show that our proposed scheme outperforms the benchmark state-of-the-art DISCOVER codec and other cases without correlation tracking, and achieves comparable decoding performance but with significantly lower complexity compared with sampling methods.

  19. Process optimization using combinatorial design principles: parallel synthesis and design of experiment methods.

    PubMed

    Gooding, Owen W

    2004-06-01

    The use of parallel synthesis techniques with statistical design of experiment (DoE) methods is a powerful combination for the optimization of chemical processes. Advances in parallel synthesis equipment and easy to use software for statistical DoE have fueled a growing acceptance of these techniques in the pharmaceutical industry. As drug candidate structures become more complex at the same time that development timelines are compressed, these enabling technologies promise to become more important in the future.

  20. Routine Discovery of Complex Genetic Models using Genetic Algorithms

    PubMed Central

    Moore, Jason H.; Hahn, Lance W.; Ritchie, Marylyn D.; Thornton, Tricia A.; White, Bill C.

    2010-01-01

    Simulation studies are useful in various disciplines for a number of reasons including the development and evaluation of new computational and statistical methods. This is particularly true in human genetics and genetic epidemiology where new analytical methods are needed for the detection and characterization of disease susceptibility genes whose effects are complex, nonlinear, and partially or solely dependent on the effects of other genes (i.e. epistasis or gene-gene interaction). Despite this need, the development of complex genetic models that can be used to simulate data is not always intuitive. In fact, only a few such models have been published. We have previously developed a genetic algorithm approach to discovering complex genetic models in which two single nucleotide polymorphisms (SNPs) influence disease risk solely through nonlinear interactions. In this paper, we extend this approach for the discovery of high-order epistasis models involving three to five SNPs. We demonstrate that the genetic algorithm is capable of routinely discovering interesting high-order epistasis models in which each SNP influences risk of disease only through interactions with the other SNPs in the model. This study opens the door for routine simulation of complex gene-gene interactions among SNPs for the development and evaluation of new statistical and computational approaches for identifying common, complex multifactorial disease susceptibility genes. PMID:20948983
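
    A skeletal genetic algorithm of the kind described, evolving two-SNP penetrance tables toward pure interaction, might look like the following. The fitness function is an invented stand-in for the statistical criterion the authors actually optimize; it merely rewards tables with flat marginal means (no main effects) but varying cells.

```python
import random

random.seed(11)

def fitness(model):
    """Toy objective: penalize main effects (non-flat row/column means around 0.5),
    reward cell-to-cell spread (pure interaction). A placeholder criterion."""
    rows = [sum(model[i * 3:(i + 1) * 3]) / 3 for i in range(3)]
    cols = [sum(model[j::3]) / 3 for j in range(3)]
    main = sum((r - 0.5) ** 2 for r in rows) + sum((c - 0.5) ** 2 for c in cols)
    spread = max(model) - min(model)
    return spread - 10 * main

def mutate(model, rate=0.1):
    return [min(1.0, max(0.0, x + random.gauss(0, 0.05)))
            if random.random() < rate else x for x in model]

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

# Each individual is a flattened 3x3 two-SNP penetrance table.
pop = [[random.random() for _ in range(9)] for _ in range(60)]
for gen in range(200):
    pop.sort(key=fitness, reverse=True)
    survivors = pop[:20]
    pop = survivors + [mutate(crossover(random.choice(survivors),
                                        random.choice(survivors)))
                       for _ in range(40)]

best = max(pop, key=fitness)
print([round(x, 2) for x in best])
```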

  1. Phase Transitions in Combinatorial Optimization Problems: Basics, Algorithms and Statistical Mechanics

    NASA Astrophysics Data System (ADS)

    Hartmann, Alexander K.; Weigt, Martin

    2005-10-01

    A concise, comprehensive introduction to the topic of statistical physics of combinatorial optimization, bringing together theoretical concepts and algorithms from computer science with analytical methods from physics. The result bridges the gap between statistical physics and combinatorial optimization, investigating problems taken from theoretical computing, such as the vertex-cover problem, with the concepts and methods of theoretical physics. The authors cover rapid developments and analytical methods that are both extremely complex and spread by word-of-mouth, providing all the necessary basics in required detail. Throughout, the algorithms are shown with examples and calculations, while the proofs are given in a way suitable for graduate students, post-docs, and researchers. Ideal for newcomers to this young, multidisciplinary field.

  2. Simulation methods to estimate design power: an overview for applied research

    PubMed Central

    2011-01-01

    Background Estimating the required sample size and statistical power for a study is an integral part of study design. For standard designs, power equations provide an efficient solution to the problem, but they are unavailable for many complex study designs that arise in practice. For such complex study designs, computer simulation is a useful alternative for estimating study power. Although this approach is well known among statisticians, in our experience many epidemiologists and social scientists are unfamiliar with the technique. This article aims to address this knowledge gap. Methods We review an approach to estimate study power for individual- or cluster-randomized designs using computer simulation. This flexible approach arises naturally from the model used to derive conventional power equations, but extends those methods to accommodate arbitrarily complex designs. The method is universally applicable to a broad range of designs and outcomes, and we present the material in a way that is approachable for quantitative, applied researchers. We illustrate the method using two examples (one simple, one complex) based on sanitation and nutritional interventions to improve child growth. Results We first show how simulation reproduces conventional power estimates for simple randomized designs over a broad range of sample scenarios to familiarize the reader with the approach. We then demonstrate how to extend the simulation approach to more complex designs. Finally, we discuss extensions to the examples in the article, and provide computer code to efficiently run the example simulations in both R and Stata. Conclusions Simulation methods offer a flexible option to estimate statistical power for standard and non-traditional study designs and parameters of interest. The approach we have described is universally applicable for evaluating study designs used in epidemiologic and social science research. PMID:21689447
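
    A minimal version of the approach, assuming a two-arm individually randomized design with a normal outcome and illustrative effect sizes (the article's own R and Stata code covers the full examples):

```python
import numpy as np
from scipy import stats

def simulated_power(n_per_arm, effect, sd, n_sims=2000, alpha=0.05, seed=0):
    # Power = fraction of simulated trials in which the null is rejected.
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, sd, n_per_arm)
        treated = rng.normal(effect, sd, n_per_arm)
        _, p = stats.ttest_ind(treated, control)
        rejections += p < alpha
    return rejections / n_sims

# Power rises with sample size; results can be checked against the
# closed-form power equation for this simple design.
for n in (25, 50, 100):
    print(n, simulated_power(n, effect=0.5, sd=1.0))
```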

  3. xTract: software for characterizing conformational changes of protein complexes by quantitative cross-linking mass spectrometry.

    PubMed

    Walzthoeni, Thomas; Joachimiak, Lukasz A; Rosenberger, George; Röst, Hannes L; Malmström, Lars; Leitner, Alexander; Frydman, Judith; Aebersold, Ruedi

    2015-12-01

    Chemical cross-linking in combination with mass spectrometry generates distance restraints of amino acid pairs in close proximity on the surface of native proteins and protein complexes. In this study we used quantitative mass spectrometry and chemical cross-linking to quantify differences in cross-linked peptides obtained from complexes in spatially discrete states. We describe a generic computational pipeline for quantitative cross-linking mass spectrometry consisting of modules for quantitative data extraction and statistical assessment of the obtained results. We used the method to detect conformational changes in two model systems: firefly luciferase and the bovine TRiC complex. Our method discovers and explains the structural heterogeneity of protein complexes using only sparse structural information.

  4. Steganalysis based on reducing the differences of image statistical characteristics

    NASA Astrophysics Data System (ADS)

    Wang, Ran; Niu, Shaozhang; Ping, Xijian; Zhang, Tao

    2018-04-01

    Compared with the embedding process, image content has a more significant impact on the differences in image statistical characteristics. This makes image steganalysis a classification problem with larger within-class scatter distances and smaller between-class scatter distances; as a result, the steganalysis features become inseparable owing to the differences in image statistical characteristics. In this paper, a new steganalysis framework is proposed that reduces the differences in image statistical characteristics caused by varying content and processing methods. The given images are segmented into several sub-images according to texture complexity. Steganalysis features are extracted separately from each subset with the same or similar texture complexity to build a classifier, and the final steganalysis result is obtained through a weighted fusion process. Theoretical analysis and experimental results demonstrate the validity of the framework.

  5. Data Science in the Research Domain Criteria Era: Relevance of Machine Learning to the Study of Stress Pathology, Recovery, and Resilience

    PubMed Central

    Galatzer-Levy, Isaac R.; Ruggles, Kelly; Chen, Zhe

    2017-01-01

    Diverse environmental and biological systems interact to influence individual differences in response to environmental stress. Understanding the nature of these complex relationships can enhance the development of methods to (1) identify risk, (2) classify individuals as healthy or ill, (3) understand mechanisms of change, and (4) develop effective treatments. The Research Domain Criteria (RDoC) initiative provides a theoretical framework for understanding health and illness as the product of multiple interrelated systems, but it does not provide a framework for characterizing or statistically evaluating such complex relationships. Characterizing and statistically evaluating models that integrate multiple levels (e.g., synapses, genes, environmental factors) as they relate to outcomes free from prior diagnostic benchmarks is a challenge requiring new computational tools capable of capturing complex relationships and identifying clinically relevant populations. In the current review, we summarize machine learning methods that can achieve these goals. PMID:29527592

  6. Linguistic Analysis of the Human Heartbeat Using Frequency and Rank Order Statistics

    NASA Astrophysics Data System (ADS)

    Yang, Albert C.-C.; Hseu, Shu-Shya; Yien, Huey-Wen; Goldberger, Ary L.; Peng, C.-K.

    2003-03-01

    Complex physiologic signals may carry unique dynamical signatures that are related to their underlying mechanisms. We present a method based on rank order statistics of symbolic sequences to investigate the profile of different types of physiologic dynamics. We apply this method to heart rate fluctuations, the output of a central physiologic control system. The method robustly discriminates patterns generated from healthy and pathologic states, as well as aging. Furthermore, we observe increased randomness in the heartbeat time series with physiologic aging and pathologic states and also uncover nonrandom patterns in the ventricular response to atrial fibrillation.
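
    A toy version of the symbolic rank-order idea (not the authors' exact mapping), using sign-of-increment symbolization and invented test signals:

```python
import numpy as np
from collections import Counter

def rank_frequency(series, m=4):
    # Symbolize by the sign of successive increments, then count m-symbol words.
    symbols = (np.diff(series) > 0).astype(int)
    words = ["".join(map(str, symbols[i:i + m]))
             for i in range(len(symbols) - m + 1)]
    counts = np.array(sorted(Counter(words).values(), reverse=True))
    return counts / counts.sum()                # word frequency by rank

rng = np.random.default_rng(3)
noise = rng.normal(size=5000)                       # "random" dynamics
sine = np.sin(np.linspace(0, 200 * np.pi, 5000))    # highly regular dynamics

# A steep rank-frequency profile signals structured dynamics; a flat one
# signals randomness, mirroring the increased randomness reported here for
# aging and pathologic states.
print(rank_frequency(noise)[:5])
print(rank_frequency(sine)[:5])
```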

  7. Proposal for a biometrics of the cortical surface: a statistical method for relative surface distance metrics

    NASA Astrophysics Data System (ADS)

    Bookstein, Fred L.

    1995-08-01

    Recent advances in computational geometry have greatly extended the range of neuroanatomical questions that can be approached by rigorous quantitative methods. One of the major current challenges in this area is to describe the variability of human cortical surface form and its implications for individual differences in neurophysiological functioning. Existing techniques for representation of stochastically invaginated surfaces do not conduce to the necessary parametric statistical summaries. In this paper, following a hint from David Van Essen and Heather Drury, I sketch a statistical method customized for the constraints of this complex data type. Cortical surface form is represented by its Riemannian metric tensor and averaged according to parameters of a smooth averaged surface. Sulci are represented by integral trajectories of the smaller principal strains of this metric, and their statistics follow the statistics of that relative metric. The diagrams visualizing this tensor analysis look like alligator leather but summarize all aspects of cortical surface form in between the principal sulci, the reliable ones; no flattening is required.

  8. Statistical Methods Applied to Gamma-ray Spectroscopy Algorithms in Nuclear Security Missions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fagan, Deborah K.; Robinson, Sean M.; Runkle, Robert C.

    2012-10-01

    In a wide range of nuclear security missions, gamma-ray spectroscopy is a critical research and development priority. One particularly relevant challenge is the interdiction of special nuclear material, for which gamma-ray spectroscopy supports the goals of detecting and identifying gamma-ray sources. This manuscript examines the existing set of spectroscopy methods, attempts to categorize them by the statistical methods on which they rely, and identifies methods that have yet to be considered. Our examination shows that current methods effectively estimate the effect of counting uncertainty but in many cases do not address larger sources of decision uncertainty: sources that are significantly more complex. We thus explore the premise that significantly improving algorithm performance requires greater coupling between the problem physics that drives data acquisition and the statistical methods that analyze such data. Untapped statistical methods, such as Bayesian model averaging and hierarchical and empirical Bayes methods, have the potential to reduce decision uncertainty by more rigorously and comprehensively incorporating all sources of uncertainty. We expect that application of such methods will demonstrate progress in meeting the needs of nuclear security missions by improving on the existing numerical infrastructure for which these analyses have not been conducted.

  9. Statistical Learning Theory for High Dimensional Prediction: Application to Criterion-Keyed Scale Development

    PubMed Central

    Chapman, Benjamin P.; Weiss, Alexander; Duberstein, Paul

    2016-01-01

    Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in “big data” problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how three common SLT algorithms (Supervised Principal Components, Regularization, and Boosting) can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach, or perhaps because of them, SLT methods may hold value as a statistically rigorous approach to exploratory regression. PMID:27454257
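
    A compact sketch of the EPE-minimization principle, assuming a synthetic item pool and a plain ridge penalty tuned by 5-fold cross-validation (simpler than the three algorithms used in the paper):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 200, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 1.0                                   # only 5 informative "items"
y = X @ beta + rng.normal(size=n)

folds = np.array_split(rng.permutation(n), 5)    # fixed 5-fold split

def cv_error(lam):
    # Held-out squared error estimates the expected prediction error (EPE).
    errs = []
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        A = X[train].T @ X[train] + lam * np.eye(p)
        b_hat = np.linalg.solve(A, X[train].T @ y[train])
        errs.append(np.mean((y[test] - X[test] @ b_hat) ** 2))
    return np.mean(errs)

penalties = [0.01, 0.1, 1.0, 10.0, 100.0]
best = min(penalties, key=cv_error)
print("ridge penalty minimizing estimated EPE:", best)
```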

  10. Predicting Physical Interactions between Protein Complexes*

    PubMed Central

    Clancy, Trevor; Rødland, Einar Andreas; Nygard, Ståle; Hovig, Eivind

    2013-01-01

    Protein complexes enact most biochemical functions in the cell. Dynamic interactions between protein complexes are frequent in many cellular processes. As they are often of a transient nature, they may be difficult to detect using current genome-wide screens. Here, we describe a method to computationally predict physical interactions between protein complexes, applied to both humans and yeast. We integrated manually curated protein complexes and physical protein interaction networks, and we designed a statistical method to identify pairs of protein complexes for which the number of protein interactions between the pair indicates an actual physical interaction between the complexes. An evaluation against manually curated physical complex-complex interactions in yeast revealed that 50% of these interactions could be predicted in this manner. A community network analysis of the highest scoring pairs revealed a biologically sensible organization of physical complex-complex interactions in the cell. Such analyses of proteomes may serve as a guide to the discovery of novel functional cellular relationships. PMID:23438732
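
    One plausible form of such a test (the abstract does not specify the exact statistic) is a hypergeometric enrichment test on the number of interactions spanning a complex pair; all counts below are invented for illustration.

```python
from scipy.stats import hypergeom

n_a, n_b = 8, 12                  # proteins in complexes A and B (invented)
spanning_pairs = n_a * n_b        # protein pairs with one member in each
network_pairs = 500_000           # all protein pairs in the network
network_edges = 40_000            # observed interactions in the network
observed_between = 11             # interactions seen between A and B

# P(X >= observed) if edges were placed uniformly at random among all pairs.
p_value = hypergeom.sf(observed_between - 1, network_pairs,
                       spanning_pairs, network_edges)
print(f"enrichment p-value: {p_value:.2e}")
```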

  11. A novel approach to simulate gene-environment interactions in complex diseases.

    PubMed

    Amato, Roberto; Pinelli, Michele; D'Andrea, Daniel; Miele, Gennaro; Nicodemi, Mario; Raiconi, Giancarlo; Cocozza, Sergio

    2010-01-05

    Complex diseases are multifactorial traits caused by both genetic and environmental factors. They represent the major part of human diseases and include those with the largest prevalence and mortality (cancer, heart disease, obesity, etc.). Despite the large amount of information that has been collected about both genetic and environmental risk factors, there are few examples of studies on their interactions in the epidemiological literature. One reason may be incomplete knowledge of the power of the statistical methods designed to search for risk factors and their interactions in these data sets. An improvement in this direction would lead to a better understanding and description of gene-environment interactions. To this aim, a possible strategy is to challenge the different statistical methods against data sets where the underlying phenomenon is completely known and fully controllable, for example, simulated ones. We present a mathematical approach that models gene-environment interactions. With this method it is possible to generate simulated populations having gene-environment interactions of any form, involving any number of genetic and environmental factors, and also allowing non-linear interactions such as epistasis. In particular, we implemented a simple version of this model in the Gene-Environment iNteraction Simulator (GENS), a tool designed to simulate case-control data sets where a one gene-one environment interaction influences the disease risk. The main aim has been to allow the input of population characteristics using standard epidemiological measures and to implement constraints that make the simulator's behaviour biologically meaningful. Using the multi-logistic model implemented in GENS, it is possible to simulate case-control samples of complex diseases in which gene-environment interactions influence disease risk. The user has full control of the main characteristics of the simulated population, and a Monte Carlo process allows for random variability. A knowledge-based approach reduces the complexity of the mathematical model by using reasonable biological constraints and makes the simulation more understandable in biological terms. Simulated data sets can be used for the assessment of novel statistical methods or for the evaluation of statistical power when designing a study.
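
    A toy multi-logistic simulation in the spirit of GENS (not its actual code); the effect sizes, allele frequency, and exposure prevalence are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50_000
maf = 0.3
genotype = rng.binomial(2, maf, n)          # 0/1/2 risk-allele counts
exposure = rng.binomial(1, 0.4, n)          # binary environmental exposure

# Log-odds with main effects and a gene-environment interaction term.
b0, b_g, b_e, b_ge = -3.0, 0.2, 0.3, 0.8
logit = b0 + b_g * genotype + b_e * exposure + b_ge * genotype * exposure
disease = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Case-control sample: all cases plus an equal number of random controls.
cases = np.flatnonzero(disease == 1)
controls = rng.choice(np.flatnonzero(disease == 0), size=cases.size,
                      replace=False)
print(f"{cases.size} cases, {controls.size} controls, "
      f"prevalence {disease.mean():.3f}")
```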

  12. Simulation methods to estimate design power: an overview for applied research.

    PubMed

    Arnold, Benjamin F; Hogan, Daniel R; Colford, John M; Hubbard, Alan E

    2011-06-20

    Estimating the required sample size and statistical power for a study is an integral part of study design. For standard designs, power equations provide an efficient solution to the problem, but they are unavailable for many complex study designs that arise in practice. For such complex study designs, computer simulation is a useful alternative for estimating study power. Although this approach is well known among statisticians, in our experience many epidemiologists and social scientists are unfamiliar with the technique. This article aims to address this knowledge gap. We review an approach to estimate study power for individual- or cluster-randomized designs using computer simulation. This flexible approach arises naturally from the model used to derive conventional power equations, but extends those methods to accommodate arbitrarily complex designs. The method is universally applicable to a broad range of designs and outcomes, and we present the material in a way that is approachable for quantitative, applied researchers. We illustrate the method using two examples (one simple, one complex) based on sanitation and nutritional interventions to improve child growth. We first show how simulation reproduces conventional power estimates for simple randomized designs over a broad range of sample scenarios to familiarize the reader with the approach. We then demonstrate how to extend the simulation approach to more complex designs. Finally, we discuss extensions to the examples in the article, and provide computer code to efficiently run the example simulations in both R and Stata. Simulation methods offer a flexible option to estimate statistical power for standard and non-traditional study designs and parameters of interest. The approach we have described is universally applicable for evaluating study designs used in epidemiologic and social science research.

  13. Section Height Determination Methods of the Isotopographic Surface in a Complex Terrain Relief

    ERIC Educational Resources Information Center

    Syzdykova, Guldana D.; Kurmankozhaev, Azimhan K.

    2016-01-01

    A new method for determining the vertical interval of isotopographic surfaces on rugged terrain was developed. The method is based on the concept of determining the differentiated size of the vertical interval using spatial-statistical properties inherent in the modal characteristic, the degree of variability of apical heights and the chosen map…

  14. An approach for the assessment of the statistical aspects of the SEA coupling loss factors and the vibrational energy transmission in complex aircraft structures: Experimental investigation and methods benchmark

    NASA Astrophysics Data System (ADS)

    Bouhaj, M.; von Estorff, O.; Peiffer, A.

    2017-09-01

    In the application of Statistical Energy Analysis (SEA) to complex assembled structures, a purely predictive model often exhibits errors. These errors are mainly due to a lack of accurate modelling of the power transmission mechanism described through the Coupling Loss Factors (CLF). Experimental SEA (ESEA) is used in practice by the automotive and aerospace industry to verify and update the model or to derive the CLFs for use in an SEA predictive model when analytical estimates cannot be made. This work is particularly motivated by the lack of procedures for estimating the variance and confidence intervals of the statistical quantities when using the ESEA technique. The aim of this paper is to introduce procedures enabling a statistical description of measured power input, vibration energies and the derived SEA parameters. Particular emphasis is placed on the identification of structural CLFs of complex built-up structures, comparing different methods. By adopting a Stochastic Energy Model (SEM), the ensemble average in ESEA is also addressed. For this purpose, expressions are obtained to randomly perturb the energy matrix elements and generate individual samples for the Monte Carlo (MC) technique applied to derive the ensemble-averaged CLF. For ESEA tests conducted on an aircraft fuselage section, the SEM approach yields better CLF estimates than classical matrix inversion methods. The expected range of CLF values and the synthesized energy are used as quality criteria for the matrix inversion, making it possible to flag critical SEA subsystems that might require a more refined statistical description of the excitation and response fields. Moreover, the impact of the variance of the normalized vibration energy on the uncertainty of the derived CLFs is outlined.
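
    A schematic of the Monte Carlo perturbation idea for ESEA, assuming a two-subsystem system with invented energies and the standard power-balance matrix inversion; the sign convention and all numbers are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(6)
omega = 2 * np.pi * 1000.0                 # band centre frequency [rad/s]
P = np.diag([1.0, 1.0])                    # injected powers [W]
E = np.array([[2.0e-3, 0.4e-3],            # measured energy matrix [J];
              [0.3e-3, 1.5e-3]])           # column j: injection into subsystem j

clf_samples = []
for _ in range(5000):
    E_pert = E * (1 + 0.05 * rng.normal(size=E.shape))  # 5% relative noise
    L = P @ np.linalg.inv(E_pert) / omega               # loss-factor matrix
    clf_samples.append(-L[0, 1])           # one off-diagonal CLF (sign conv.)
clf_samples = np.array(clf_samples)

print(f"CLF ensemble estimate: {clf_samples.mean():.3e} "
      f"+/- {clf_samples.std():.1e}")
```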

  15. SU-E-J-261: Statistical Analysis and Chaotic Dynamics of Respiratory Signal of Patients in BodyFix

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Michalski, D; Huq, M; Bednarz, G

    Purpose: To quantify the respiratory signal of patients in BodyFix undergoing 4DCT scans with and without the immobilization cover. Methods: 20 pairs of respiratory tracks recorded with the RPM system during 4DCT scans were analyzed. Descriptive statistics were applied to selected parameters of the exhale-inhale decomposition. Standardized signals were used with the delay method to build orbits in embedded space. Nonlinear behavior was tested with surrogate data. Sample entropy (SE), Lempel-Ziv complexity (LZC) and the largest Lyapunov exponents (LLE) were compared. Results: Statistical tests show a difference between scans for inspiration time and its variability, which is larger for scans without the cover; the same holds for the variability of the end of exhalation and inhalation. Other parameters fail to show a difference. For both scans, the respiratory signals show determinism and nonlinear stationarity, and statistical tests on surrogate data reveal their nonlinearity. The LLEs show the signals' chaotic nature and its correlation with the breathing period and its embedding delay time. SE, LZC and LLE measure respiratory signal complexity; the nonlinear characteristics do not differ between scans. Conclusion: Contrary to expectation, the cover applied to patients in BodyFix appears to have a limited effect on signal parameters. Analysis based on trajectories of delay vectors shows the respiratory system's nonlinear character and its sensitive dependence on initial conditions. Reproducibility of the respiratory signal can be evaluated with measures of signal complexity and its predictability window; a longer respiratory period is conducive to signal reproducibility as shown by these gauges. Statistical independence of the exhale and inhale times is also supported by the magnitude of the LLE. The nonlinear parameters seem more appropriate for gauging respiratory signal complexity, given its deterministic chaotic nature; this contrasts with measures based on harmonic analysis, which are blind to nonlinear features. Dynamics of breathing, so crucial for 4D-based clinical technologies, can be better controlled if a nonlinearity-based methodology, which reflects the respiration characteristics, is applied. Funding provided by Varian Medical Systems via Investigator Initiated Research Project.

  16. Inference on network statistics by restricting to the network space: applications to sexual history data.

    PubMed

    Goyal, Ravi; De Gruttola, Victor

    2018-01-30

    Analysis of sexual history data intended to describe sexual networks presents many challenges arising from the fact that most surveys collect information on only a very small fraction of the population of interest. In addition, partners are rarely identified and responses are subject to reporting biases. Typically, each network statistic of interest, such as mean number of sexual partners for men or women, is estimated independently of other network statistics. There is, however, a complex relationship among network statistics, and knowledge of these relationships can aid in addressing the concerns mentioned earlier. We develop a novel method that constrains a posterior predictive distribution of a collection of network statistics in order to leverage the relationships among network statistics in making inference about network properties of interest. The method ensures that inference on network properties is compatible with an actual network. Through extensive simulation studies, we also demonstrate that use of this method can improve estimates in settings where there is uncertainty that arises both from sampling and from systematic reporting bias, compared with currently available approaches to estimation. To illustrate the method, we apply it to estimate network statistics using data from the Chicago Health and Social Life Survey. Copyright © 2017 John Wiley & Sons, Ltd.

  17. Comparison of two fractal interpolation methods

    NASA Astrophysics Data System (ADS)

    Fu, Yang; Zheng, Zeyu; Xiao, Rui; Shi, Haibo

    2017-03-01

    As a tool for studying complex shapes and structures in nature, fractal theory plays a critical role in revealing the organizational structure of complex phenomena. Numerous fractal interpolation methods have been proposed over the past few decades, but they differ substantially in their form features and statistical properties. In this study, we simulated one- and two-dimensional fractal surfaces using the midpoint displacement method and the Weierstrass-Mandelbrot fractal function method, and observed great differences between the two methods in statistical characteristics and autocorrelation features. In terms of form features, the midpoint displacement simulations show a relatively flat surface that develops peaks of differing heights as the fractal dimension increases, whereas the Weierstrass-Mandelbrot simulations show a rough surface with dense and highly similar peaks as the fractal dimension increases. In terms of statistical properties, the peak heights from the Weierstrass-Mandelbrot simulations are greater than those of the midpoint displacement method at the same fractal dimension, and the variances are approximately two times larger. When the fractal dimension equals 1.2, 1.4, 1.6, or 1.8, the skewness is positive with the midpoint displacement method and the peaks are all convex, whereas for the Weierstrass-Mandelbrot fractal function method the skewness takes both positive and negative values fluctuating in the vicinity of zero. The kurtosis is less than one with the midpoint displacement method, and generally less than that of the Weierstrass-Mandelbrot fractal function method. The autocorrelation analysis indicates that the midpoint displacement simulation is aperiodic with prominent randomness, making it suitable for simulating aperiodic surfaces, whereas the Weierstrass-Mandelbrot simulation has strong periodicity, making it suitable for simulating periodic surfaces.
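
    For concreteness, a one-dimensional midpoint displacement sketch (one of the two methods compared), using the standard variance-halving recursion with H = 2 - D for a profile of fractal dimension D:

```python
import numpy as np

def midpoint_displacement(levels, D, sigma=1.0, seed=0):
    # Recursively insert midpoints displaced by Gaussian noise whose
    # standard deviation shrinks by 2**(-H) per subdivision level.
    rng = np.random.default_rng(seed)
    H = 2.0 - D                               # Hurst exponent of the profile
    pts = np.array([0.0, 0.0])
    for level in range(levels):
        mids = 0.5 * (pts[:-1] + pts[1:])
        mids += rng.normal(0.0, sigma * 2 ** (-(level + 1) * H), mids.size)
        out = np.empty(pts.size + mids.size)
        out[0::2], out[1::2] = pts, mids      # interleave old and new points
        pts = out
    return pts

profile = midpoint_displacement(levels=12, D=1.4)
print(profile.size, profile.std())
```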

  18. A Powerful Approach to Estimating Annotation-Stratified Genetic Covariance via GWAS Summary Statistics.

    PubMed

    Lu, Qiongshi; Li, Boyang; Ou, Derek; Erlendsdottir, Margret; Powles, Ryan L; Jiang, Tony; Hu, Yiming; Chang, David; Jin, Chentian; Dai, Wei; He, Qidu; Liu, Zefeng; Mukherjee, Shubhabrata; Crane, Paul K; Zhao, Hongyu

    2017-12-07

    Despite the success of large-scale genome-wide association studies (GWASs) on complex traits, our understanding of their genetic architecture is far from complete. Jointly modeling multiple traits' genetic profiles has provided insights into the shared genetic basis of many complex traits. However, large-scale inference sets a high bar for both statistical power and biological interpretability. Here we introduce a principled framework to estimate annotation-stratified genetic covariance between traits using GWAS summary statistics. Through theoretical and numerical analyses, we demonstrate that our method provides accurate covariance estimates, thereby enabling researchers to dissect both the shared and distinct genetic architecture across traits to better understand their etiologies. Among 50 complex traits with publicly accessible GWAS summary statistics (total N ≈ 4.5 million), we identified more than 170 pairs with statistically significant genetic covariance. In particular, we found strong genetic covariance between late-onset Alzheimer disease (LOAD) and amyotrophic lateral sclerosis (ALS), two major neurodegenerative diseases, in single-nucleotide polymorphisms (SNPs) with high minor allele frequencies and in SNPs located in the predicted functional genome. Joint analysis of LOAD, ALS, and other traits highlights LOAD's correlation with cognitive traits and hints at an autoimmune component for ALS. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  19. Application of a data-mining method based on Bayesian networks to lesion-deficit analysis

    NASA Technical Reports Server (NTRS)

    Herskovits, Edward H.; Gerring, Joan P.

    2003-01-01

    Although lesion-deficit analysis (LDA) has provided extensive information about structure-function associations in the human brain, LDA has suffered from the difficulties inherent to the analysis of spatial data, i.e., there are many more variables than subjects, and data may be difficult to model using standard distributions, such as the normal distribution. We herein describe a Bayesian method for LDA; this method is based on data-mining techniques that employ Bayesian networks to represent structure-function associations. These methods are computationally tractable, and can represent complex, nonlinear structure-function associations. When applied to the evaluation of data obtained from a study of the psychiatric sequelae of traumatic brain injury in children, this method generates a Bayesian network that demonstrates complex, nonlinear associations among lesions in the left caudate, right globus pallidus, right side of the corpus callosum, right caudate, and left thalamus, and subsequent development of attention-deficit hyperactivity disorder, confirming and extending our previous statistical analysis of these data. Furthermore, analysis of simulated data indicates that methods based on Bayesian networks may be more sensitive and specific for detecting associations among categorical variables than methods based on chi-square and Fisher exact statistics.

  20. Neuronal couplings between retinal ganglion cells inferred by efficient inverse statistical physics methods

    PubMed Central

    Cocco, Simona; Leibler, Stanislas; Monasson, Rémi

    2009-01-01

    Complexity of neural systems often makes impracticable explicit measurements of all interactions between their constituents. Inverse statistical physics approaches, which infer effective couplings between neurons from their spiking activity, have been so far hindered by their computational complexity. Here, we present 2 complementary, computationally efficient inverse algorithms based on the Ising and “leaky integrate-and-fire” models. We apply those algorithms to reanalyze multielectrode recordings in the salamander retina in darkness and under random visual stimulus. We find strong positive couplings between nearby ganglion cells common to both stimuli, whereas long-range couplings appear under random stimulus only. The uncertainty on the inferred couplings due to limitations in the recordings (duration, small area covered on the retina) is discussed. Our methods will allow real-time evaluation of couplings for large assemblies of neurons. PMID:19666487
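
    The simplest member of this family of inverse methods is the naive mean-field estimator, which reads couplings off the inverse covariance matrix of the binarized spike trains; the sketch below applies it to surrogate data (this is the textbook formula, not the authors' two algorithms).

```python
import numpy as np

rng = np.random.default_rng(7)
n_cells, n_bins = 8, 20_000

# Surrogate spike data: one coupled pair (cells 0 and 1) embedded in
# otherwise independent cells firing at rate 0.1 per bin.
s = (rng.random((n_bins, n_cells)) < 0.1).astype(float)
s[:, 1] = np.where(rng.random(n_bins) < 0.7, s[:, 0], s[:, 1])

spins = 2 * s - 1                            # map {0,1} -> {-1,+1}
C = np.cov(spins, rowvar=False)              # pairwise covariance matrix
J = -np.linalg.inv(C)                        # naive mean-field couplings
np.fill_diagonal(J, 0.0)
print(np.round(J[:3, :3], 2))                # a strong J[0,1] is expected
```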

  1. Capture approximations beyond a statistical quantum mechanical method for atom-diatom reactions

    NASA Astrophysics Data System (ADS)

    Barrios, Lizandra; Rubayo-Soneira, Jesús; González-Lezana, Tomás

    2016-03-01

    Statistical techniques constitute useful approaches to investigate atom-diatom reactions mediated by insertion dynamics which involves complex-forming mechanisms. Different capture schemes based on energy considerations regarding the specific diatom rovibrational states are suggested to evaluate the corresponding probabilities of formation of such collision species between reactants and products in an attempt to test reliable alternatives for computationally demanding processes. These approximations are tested in combination with a statistical quantum mechanical method for the S + H2(v = 0 ,j = 1) → SH + H and Si + O2(v = 0 ,j = 1) → SiO + O reactions, where this dynamical mechanism plays a significant role, in order to probe their validity.

  2. Unravelling the complex MRI pattern in glutaric aciduria type I using statistical models-a cohort study in 180 patients.

    PubMed

    Garbade, Sven F; Greenberg, Cheryl R; Demirkol, Mübeccel; Gökçay, Gülden; Ribes, Antonia; Campistol, Jaume; Burlina, Alberto B; Burgard, Peter; Kölker, Stefan

    2014-09-01

    Glutaric aciduria type I (GA-I) is a cerebral organic aciduria caused by inherited deficiency of glutaryl-CoA dehydrogenase and is characterized biochemically by an accumulation of putatively neurotoxic dicarboxylic metabolites. The majority of untreated patients develop a complex movement disorder with predominant dystonia between the ages of 3 and 36 months. Magnetic resonance imaging (MRI) studies have demonstrated striatal and extrastriatal abnormalities. The major aim of this study was to elucidate the complex neuroradiological pattern of patients with GA-I and to associate the MRI findings with the severity of the predominant neurological symptoms. In 180 patients, detailed information about the neurological presentation and brain region-specific MRI abnormalities was obtained via a standardized questionnaire. Patients with a movement disorder more often had MRI abnormalities in the putamen, caudate, cortex, ventricles and external CSF spaces than patients without or with minor neurological symptoms. Putaminal MRI changes and strongly dilated ventricles were identified as the most reliable predictors of a movement disorder. In contrast, abnormalities in the globus pallidus were not clearly associated with a movement disorder. Caudate and putamen as well as cortex, ventricles and external CSF spaces clearly colocalized on a two-dimensional map, demonstrating statistical similarity and suggesting the same underlying pathomechanism. This study demonstrates that complex statistical methods are useful for deciphering the age-dependent and region-specific MRI patterns of rare neurometabolic diseases and are helpful for elucidating the clinical relevance of specific MRI findings.

  3. Transmit Designs for the MIMO Broadcast Channel With Statistical CSI

    NASA Astrophysics Data System (ADS)

    Wu, Yongpeng; Jin, Shi; Gao, Xiqi; McKay, Matthew R.; Xiao, Chengshan

    2014-09-01

    We investigate the multiple-input multiple-output broadcast channel with statistical channel state information available at the transmitter. The so-called linear assignment operation is employed, and necessary conditions are derived for the optimal transmit design under general fading conditions. Based on this, we introduce an iterative algorithm to maximize the linear assignment weighted sum-rate by applying a gradient descent method. To reduce complexity, we derive an upper bound on the linear assignment achievable rate of each receiver, from which a simplified closed-form expression for a near-optimal linear assignment matrix is derived. This reveals an interesting construction analogous to that of dirty-paper coding. In light of this, a low-complexity transmission scheme is provided. Numerical examples illustrate the strong performance of the proposed low-complexity scheme.

  4. Mathematical Methods for Physics and Engineering Third Edition Paperback Set

    NASA Astrophysics Data System (ADS)

    Riley, Ken F.; Hobson, Mike P.; Bence, Stephen J.

    2006-06-01

    Prefaces; 1. Preliminary algebra; 2. Preliminary calculus; 3. Complex numbers and hyperbolic functions; 4. Series and limits; 5. Partial differentiation; 6. Multiple integrals; 7. Vector algebra; 8. Matrices and vector spaces; 9. Normal modes; 10. Vector calculus; 11. Line, surface and volume integrals; 12. Fourier series; 13. Integral transforms; 14. First-order ordinary differential equations; 15. Higher-order ordinary differential equations; 16. Series solutions of ordinary differential equations; 17. Eigenfunction methods for differential equations; 18. Special functions; 19. Quantum operators; 20. Partial differential equations: general and particular; 21. Partial differential equations: separation of variables; 22. Calculus of variations; 23. Integral equations; 24. Complex variables; 25. Application of complex variables; 26. Tensors; 27. Numerical methods; 28. Group theory; 29. Representation theory; 30. Probability; 31. Statistics; Index.

  5. Recurrence Density Enhanced Complex Networks for Nonlinear Time Series Analysis

    NASA Astrophysics Data System (ADS)

    Costa, Diego G. De B.; Reis, Barbara M. Da F.; Zou, Yong; Quiles, Marcos G.; Macau, Elbert E. N.

    We introduce a new method, entitled Recurrence Density Enhanced Complex Network (RDE-CN), to analyze nonlinear time series. Our method first transforms a recurrence plot into a figure with a reduced number of points that preserves the main and fundamental recurrence properties of the original plot. This resulting figure is then reinterpreted as a complex network, which is further characterized by network statistical measures. We illustrate the computational power of the RDE-CN approach on time series from both the logistic map and experimental fluid flows, showing that our method distinguishes different dynamics as well as traditional recurrence analysis does. The proposed methodology therefore characterizes the recurrence matrix adequately while using a reduced set of points from the original recurrence plots.
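
    A bare-bones recurrence network (without the density-based point reduction that defines RDE-CN) can be built as follows; the recurrence threshold is an assumption.

```python
import numpy as np

def recurrence_network(x, eps):
    # Recurrence matrix: R[i, j] = 1 when states i and j are eps-close;
    # reinterpreted as the adjacency matrix of an undirected network.
    dist = np.abs(x[:, None] - x[None, :])
    A = (dist < eps).astype(int)
    np.fill_diagonal(A, 0)                   # no self-loops
    return A

# Logistic map in the chaotic regime.
x = np.empty(1000)
x[0] = 0.4
for i in range(999):
    x[i + 1] = 4.0 * x[i] * (1.0 - x[i])

A = recurrence_network(x, eps=0.05)
degrees = A.sum(axis=1)                      # one simple network statistic
print(f"mean degree {degrees.mean():.1f}, max degree {degrees.max()}")
```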

  6. Statistical limitations in functional neuroimaging. I. Non-inferential methods and statistical models.

    PubMed Central

    Petersson, K M; Nichols, T E; Poline, J B; Holmes, A P

    1999-01-01

    Functional neuroimaging (FNI) provides experimental access to the intact living brain making it possible to study higher cognitive functions in humans. In this review and in a companion paper in this issue, we discuss some common methods used to analyse FNI data. The emphasis in both papers is on assumptions and limitations of the methods reviewed. There are several methods available to analyse FNI data indicating that none is optimal for all purposes. In order to make optimal use of the methods available it is important to know the limits of applicability. For the interpretation of FNI results it is also important to take into account the assumptions, approximations and inherent limitations of the methods used. This paper gives a brief overview over some non-inferential descriptive methods and common statistical models used in FNI. Issues relating to the complex problem of model selection are discussed. In general, proper model selection is a necessary prerequisite for the validity of the subsequent statistical inference. The non-inferential section describes methods that, combined with inspection of parameter estimates and other simple measures, can aid in the process of model selection and verification of assumptions. The section on statistical models covers approaches to global normalization and some aspects of univariate, multivariate, and Bayesian models. Finally, approaches to functional connectivity and effective connectivity are discussed. In the companion paper we review issues related to signal detection and statistical inference. PMID:10466149

  7. Regularized learning of linear ordered-statistic constant false alarm rate filters (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Havens, Timothy C.; Cummings, Ian; Botts, Jonathan; Summers, Jason E.

    2017-05-01

    The linear ordered statistic (LOS) is a parameterized ordered statistic (OS) that is a weighted average of a rank-ordered sample. LOS operators are useful generalizations of aggregation as they can represent any linear aggregation, from minimum to maximum, including conventional aggregations, such as mean and median. In the fuzzy logic field, these aggregations are called ordered weighted averages (OWAs). Here, we present a method for learning LOS operators from training data, viz., data for which you know the output of the desired LOS. We then extend the learning process with regularization, such that a lower complexity or sparse LOS can be learned. Hence, we discuss what 'lower complexity' means in this context and how to represent that in the optimization procedure. Finally, we apply our learning methods to the well-known constant-false-alarm-rate (CFAR) detection problem, specifically for the case of background levels modeled by long-tailed distributions, such as the K-distribution. These backgrounds arise in several pertinent imaging problems, including the modeling of clutter in synthetic aperture radar and sonar (SAR and SAS) and in wireless communications.
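
    The LOS itself is just a weighted average of a sorted sample, so its weights can be fit from input-output examples; a minimal unregularized least-squares sketch follows (the paper's regularized, constrained learning would add penalties and simplex constraints).

```python
import numpy as np

rng = np.random.default_rng(8)
n_samples, n_inputs = 500, 5

X = rng.normal(size=(n_samples, n_inputs))
X_sorted = np.sort(X, axis=1)[:, ::-1]       # rank-order each sample

# A "soft max" style LOS: most weight on the largest inputs. With all
# weight on the first (last) position this reduces to max (min); uniform
# weights give the mean.
w_true = np.array([0.5, 0.3, 0.2, 0.0, 0.0])
y = X_sorted @ w_true + rng.normal(0, 0.01, n_samples)

w_hat, *_ = np.linalg.lstsq(X_sorted, y, rcond=None)
print(np.round(w_hat, 3))                    # should approximately recover w_true
```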

  8. A fast boosting-based screening method for large-scale association study in complex traits with genetic heterogeneity.

    PubMed

    Wang, Lu-Yong; Fasulo, D

    2006-01-01

    Genome-wide association studies of complex diseases generate massive amounts of single nucleotide polymorphism (SNP) data. Univariate statistical tests (e.g., the Fisher exact test) are used to single out non-associated SNPs. However, disease-susceptible SNPs may have small marginal effects in the population and are unlikely to be retained after univariate testing. Moreover, model-based methods are impractical for large-scale datasets, and genetic heterogeneity makes it harder for traditional methods to identify the genetic causes of disease. The more recent random forest method provides a more robust way to screen SNPs at the scale of thousands. For still larger data, such as Affymetrix Human Mapping 100K GeneChip data, a faster method is required for screening SNPs in whole-genome association analysis under genetic heterogeneity. We propose a boosting-based method for rapid screening in large-scale analysis of complex traits in the presence of genetic heterogeneity. It provides a relatively fast and fairly good tool for screening and limiting the candidate SNPs for further, more complex computational modeling.
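
    A hedged sketch of boosting-based screening using generic gradient boosting and feature importances (not the authors' algorithm); the simulated interaction and all settings are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(9)
n, n_snps = 1000, 200
snps = rng.integers(0, 3, size=(n, n_snps))      # 0/1/2 genotype codes

# Disease driven by an interaction of SNP 3 and SNP 7 (small marginal effects).
risk = 0.15 + 0.5 * ((snps[:, 3] > 0) & (snps[:, 7] > 0))
status = rng.binomial(1, risk)

model = GradientBoostingClassifier(n_estimators=200, max_depth=3)
model.fit(snps, status)

# Rank SNPs by importance and keep the top candidates for downstream modeling.
top = np.argsort(model.feature_importances_)[::-1][:10]
print("top-ranked SNPs:", top)                   # 3 and 7 should rank high
```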

  9. Dendrimer-paclitaxel complexes for efficient treatment in ovarian cancer: study on OVCAR-3 and HEK293T cells.

    PubMed

    Yao, Hua; Ma, Jinqi

    2018-01-01

    The present paper investigates the enhancement of the therapeutic effect of paclitaxel (a potent anticancer drug) by increasing its cellular uptake in cancerous cells, with a subsequent reduction in its cytotoxic effects. To fulfill these goals, paclitaxel (PTX)-biotinylated PAMAM dendrimer complexes were prepared using a biotinylation method. The primary parameter of biotinylated PAMAM with a terminal NH2 group - the degree of biotinylation - was evaluated using the HABA assay. The basic integrity of the complex was studied using DSC. The drug loading (DL) and drug release (DR) parameters of the biotinylated PAMAM dendrimer-PTX complexes were also examined. A cellular uptake study was performed in OVCAR-3 and HEK293T cells using a fluorescence technique, and statistical analysis was performed to support the experimental data. The results obtained from the HABA assay showed complete biotinylation of the PAMAM dendrimer. The DSC study confirmed the integrity of the complex as compared with the pure drug, the biotinylated complex and their physical mixture. Batch 9 showed the highest DL (12.09%) and DR (70%) over 72 h as compared with other concentrations of drug and biotinylated complex. The OVCAR-3 (cancerous) cells showed more intensive cellular uptake of the complexes than the HEK293T (normal) cells, and the experimental results were supported by the statistical data. Both the experimental and statistical evaluations confirmed that the biotinylated PAMAM-NH2 dendrimer-PTX complex not only displays increased cellular uptake but also shows enhanced release up to 72 h with reduced cytotoxicity.

  10. Mapping of epistatic quantitative trait loci in four-way crosses.

    PubMed

    He, Xiao-Hong; Qin, Hongde; Hu, Zhongli; Zhang, Tianzhen; Zhang, Yuan-Ming

    2011-01-01

    Four-way crosses (4WC) involving four different inbred lines often appear in plant and animal commercial breeding programs. Direct mapping of quantitative trait loci (QTL) in these commercial populations is both economical and practical. However, the existing statistical methods for mapping QTL in a 4WC population are built on the single-QTL genetic model. This simple genetic model fails to take into account QTL interactions, which play an important role in the genetic architecture of complex traits. In this paper, therefore, we attempted to develop a statistical method to detect epistatic QTL in a 4WC population. Conditional probabilities of QTL genotypes, computed by the multi-point single-locus method, were used to sample the genotypes of all putative QTL in the entire genome. The sampled genotypes were used to construct the design matrix for QTL effects. All QTL effects, including main and epistatic effects, were simultaneously estimated by the penalized maximum likelihood method. The proposed method was confirmed by a series of Monte Carlo simulation studies and a real data analysis in cotton. The new method will provide novel tools for the genetic dissection of complex traits, construction of QTL networks, and analysis of heterosis.

  11. Power Analysis for Complex Mediational Designs Using Monte Carlo Methods

    ERIC Educational Resources Information Center

    Thoemmes, Felix; MacKinnon, David P.; Reiser, Mark R.

    2010-01-01

    Applied researchers often include mediation effects in applications of advanced methods such as latent variable models and linear growth curve models. Guidance on how to estimate statistical power to detect mediation for these models has not yet been addressed in the literature. We describe a general framework for power analyses for complex…

  12. AAS and spectrophotometric methods for the determination metoprolol tartrate in tablets

    NASA Astrophysics Data System (ADS)

    Alpdoğan, Güzin; Sungur, Sidika

    1999-11-01

    Sensitive and specific atomic absorption spectroscopy (AAS) and spectrophotometric methods have been developed for the determination of the beta-adrenergic blocking drug metoprolol tartrate. The method is based on the formation of a Cu(II) dithiocarbamate complex by derivatization of the secondary amino group of metoprolol with CS2 and CuCl2 in the presence of ammonia. The copper-bis(dithiocarbamate) complex was extracted into chloroform, and the concentration of metoprolol tartrate was determined directly by spectrophotometry and indirectly by AAS measurement of copper. The two methods were applied to the assay of metoprolol tartrate in commercial tablet formulations and were compared statistically with each other and with the high performance liquid chromatography (HPLC) method of USP XXII using t- and F-tests.

  13. Predicting Binding Free Energy Change Caused by Point Mutations with Knowledge-Modified MM/PBSA Method.

    PubMed

    Petukh, Marharyta; Li, Minghui; Alexov, Emil

    2015-07-01

    A new methodology, termed Single Amino Acid Mutation based change in Binding free Energy (SAAMBE), was developed to predict the changes in binding free energy caused by mutations. The method utilizes 3D structures of the corresponding protein-protein complexes and takes advantage of both sequence- and structure-based approaches. It has two components: a MM/PBSA-based component and an additional set of statistical terms derived from a statistical investigation of the physico-chemical properties of protein complexes. While the method is a rigid-body approach that does not explicitly consider plausible conformational changes caused by binding, the effect of conformational changes, including changes away from the binding interface, on electrostatics is mimicked with amino acid specific dielectric constants. This provides a significant improvement in SAAMBE predictions, as indicated by a better match against experimentally determined binding free energy changes for over 1300 mutations in 43 proteins. The final benchmarking resulted in very good agreement with experimental data (correlation coefficient 0.624), while the algorithm remains fast enough to allow for large-scale calculations (the average time is less than a minute per mutation).

  14. Statistical learning theory for high dimensional prediction: Application to criterion-keyed scale development.

    PubMed

    Chapman, Benjamin P; Weiss, Alexander; Duberstein, Paul R

    2016-12-01

    Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in "big data" problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how 3 common SLT algorithms-supervised principal components, regularization, and boosting-can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach-or perhaps because of them-SLT methods may hold value as a statistically rigorous approach to exploratory regression. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  15. Gender Integration on U.S. Navy Submarines: Views of the First Wave

    DTIC Science & Technology

    2015-06-01

    Previous studies have attempted to build statistical models based on surface fleet data to forecast female sustainability in the submarine fleet, yet … their integration? Such questions cannot be answered by collecting the type of quantitative data that can be analyzed using statistical methods. Complex…

  16. Quasi-experimental study designs series-paper 10: synthesizing evidence for effects collected from quasi-experimental studies presents surmountable challenges.

    PubMed

    Becker, Betsy Jane; Aloe, Ariel M; Duvendack, Maren; Stanley, T D; Valentine, Jeffrey C; Fretheim, Atle; Tugwell, Peter

    2017-09-01

    To outline issues of importance to analytic approaches to the synthesis of quasi-experiments (QEs) and to provide a statistical model for use in analysis. We drew on studies of statistics, epidemiology, and social-science methodology to outline methods for synthesis of QE studies. The design and conduct of QEs, effect sizes from QEs, and moderator variables for the analysis of those effect sizes were discussed. Biases, confounding, design complexities, and comparisons across designs offer serious challenges to syntheses of QEs. Key components of meta-analyses of QEs were identified, including the aspects of QE study design to be coded and analyzed. Of utmost importance are the design and statistical controls implemented in the QEs. Such controls and any potential sources of bias and confounding must be modeled in analyses, along with aspects of the interventions and populations studied. Because of such controls, effect sizes from QEs are more complex than those from randomized experiments. A statistical meta-regression model that incorporates important features of the QEs under review was presented. Meta-analyses of QEs provide particular challenges, but thorough coding of intervention characteristics and study methods, along with careful analysis, should allow for sound inferences. Copyright © 2017 Elsevier Inc. All rights reserved.

  17. Integration of modern statistical tools for the analysis of climate extremes into the web-GIS “CLIMATE”

    NASA Astrophysics Data System (ADS)

    Ryazanova, A. A.; Okladnikov, I. G.; Gordov, E. P.

    2017-11-01

    The frequency of occurrence and magnitude of extreme precipitation and temperature events show positive trends in several geographical regions. These events must be analyzed and studied in order to better understand their impact on the environment, predict their occurrence, and mitigate their effects. For this purpose, we augmented the web-GIS “CLIMATE” with a dedicated statistical package developed in the R language. The web-GIS “CLIMATE” is a software platform for cloud storage, processing and visualization of distributed archives of spatial datasets. It is based on the combined use of web and GIS technologies with reliable procedures for searching, extracting, processing, and visualizing spatial data archives. The system provides a set of thematic online tools for the complex analysis of current and future climate changes and their effects on the environment. The package includes powerful new methods for the time-dependent statistics of extremes, quantile regression and the copula approach for the detailed analysis of various climate extreme events. In particular, the very promising copula approach makes it possible to obtain the structural connections between the extremes and various environmental characteristics. The new statistical methods integrated into the web-GIS “CLIMATE” can significantly facilitate and accelerate the complex analysis of climate extremes using only a desktop PC connected to the Internet.

  18. Bayesian approach to inverse statistical mechanics.

    PubMed

    Habeck, Michael

    2014-05-01

    Inverse statistical mechanics aims to determine particle interactions from ensemble properties. This article looks at this inverse problem from a Bayesian perspective and discusses several statistical estimators to solve it. In addition, a sequential Monte Carlo algorithm is proposed that draws the interaction parameters from their posterior probability distribution. The posterior probability involves an intractable partition function that is estimated along with the interactions. The method is illustrated for inverse problems of varying complexity, including the estimation of a temperature, the inverse Ising problem, maximum entropy fitting, and the reconstruction of molecular interaction potentials.

  19. Bayesian approach to inverse statistical mechanics

    NASA Astrophysics Data System (ADS)

    Habeck, Michael

    2014-05-01

    Inverse statistical mechanics aims to determine particle interactions from ensemble properties. This article looks at this inverse problem from a Bayesian perspective and discusses several statistical estimators to solve it. In addition, a sequential Monte Carlo algorithm is proposed that draws the interaction parameters from their posterior probability distribution. The posterior probability involves an intractable partition function that is estimated along with the interactions. The method is illustrated for inverse problems of varying complexity, including the estimation of a temperature, the inverse Ising problem, maximum entropy fitting, and the reconstruction of molecular interaction potentials.

  20. Statistical dielectronic recombination rates for multielectron ions in plasma

    NASA Astrophysics Data System (ADS)

    Demura, A. V.; Leont'iev, D. S.; Lisitsa, V. S.; Shurygin, V. A.

    2017-10-01

    We describe the general analytic derivation of the dielectronic recombination (DR) rate coefficient for multielectron ions in a plasma, based on the statistical theory of an atom expressed in terms of the spatial distribution of the atomic electron density. The dielectronic recombination rates for complex multielectron tungsten ions are calculated numerically over a wide range of plasma temperatures, which is important for modern nuclear fusion studies. The results of the statistical theory are compared with data obtained using the level-by-level codes ADPAK, FAC, and HULLAC, and with experimental results. We consider different statistical DR models based on the Thomas-Fermi distribution, viz., integral and differential with respect to the orbital angular momenta of the ion core and the trapped electron, as well as the Rost model, an analog of the Franck-Condon model applied to atomic structures. In view of its universality and relative simplicity, the statistical approach can be used to obtain rapid estimates of dielectronic recombination rate coefficients within complex calculations of thermonuclear plasma parameters. The statistical approach also yields dielectronic recombination rates at far smaller computational cost than the available level-by-level codes.

  1. Statistical methods for environmental pollution monitoring

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gilbert, Richard O.

    The application of statistics to environmental pollution monitoring studies requires a knowledge of statistical analysis methods particularly well suited to pollution data. This book fills that need by providing sampling plans, statistical tests, parameter estimation procedure techniques, and references to pertinent publications. Most of the statistical techniques are relatively simple, and examples, exercises, and case studies are provided to illustrate procedures. The book is logically divided into three parts. Chapters 1, 2, and 3 are introductory chapters. Chapters 4 through 10 discuss field sampling designs and Chapters 11 through 18 deal with a broad range of statistical analysis procedures. Some statistical techniques given here are not commonly seen in statistics books. For example, see methods for handling correlated data (Sections 4.5 and 11.12), for detecting hot spots (Chapter 10), and for estimating a confidence interval for the mean of a lognormal distribution (Section 13.2). Also, Appendix B lists a computer code that estimates and tests for trends over time at one or more monitoring stations using nonparametric methods (Chapters 16 and 17). Unfortunately, some important topics could not be included because of their complexity and the need to limit the length of the book. For example, only brief mention could be made of time series analysis using Box-Jenkins methods and of kriging techniques for estimating spatial and spatial-time patterns of pollution, although multiple references on these topics are provided. Also, no discussion of methods for assessing risks from environmental pollution could be included.

  2. On system behaviour using complex networks of a compression algorithm

    NASA Astrophysics Data System (ADS)

    Walker, David M.; Correa, Debora C.; Small, Michael

    2018-01-01

    We construct complex networks of scalar time series using a data compression algorithm. The structure and statistics of the resulting networks can be used to help characterize complex systems, and one property, in particular, appears to be a useful discriminating statistic in surrogate data hypothesis tests. We demonstrate these ideas on systems with known dynamical behaviour and also show that our approach is capable of identifying behavioural transitions within electroencephalogram recordings as well as changes due to a bifurcation parameter of a chaotic system. The technique we propose is dependent on a coarse-grained quantization of the original time series and therefore provides potential for a spatial scale-dependent characterization of the data. Finally, the method is as computationally efficient as the underlying compression algorithm and provides a compression of the salient features of long time series.

  3. Advanced functional network analysis in the geosciences: The pyunicorn package

    NASA Astrophysics Data System (ADS)

    Donges, Jonathan F.; Heitzig, Jobst; Runge, Jakob; Schultz, Hanna C. H.; Wiedermann, Marc; Zech, Alraune; Feldhoff, Jan; Rheinwalt, Aljoscha; Kutza, Hannes; Radebach, Alexander; Marwan, Norbert; Kurths, Jürgen

    2013-04-01

    Functional networks are a powerful tool for analyzing large geoscientific datasets such as global fields of climate time series originating from observations or model simulations. pyunicorn (pythonic unified complex network and recurrence analysis toolbox) is an open-source, fully object-oriented and easily parallelizable package written in the language Python. It allows for constructing functional networks (aka climate networks) representing the structure of statistical interrelationships in large datasets and, subsequently, investigating this structure using advanced methods of complex network theory such as measures for networks of interacting networks, node-weighted statistics or network surrogates. Additionally, pyunicorn allows one to study the complex dynamics of geoscientific systems as recorded by time series by means of recurrence networks and visibility graphs. The range of possible applications of the package is outlined, drawing on several examples from climatology.

  4. Student Solution Manual for Mathematical Methods for Physics and Engineering Third Edition

    NASA Astrophysics Data System (ADS)

    Riley, K. F.; Hobson, M. P.

    2006-03-01

    Preface; 1. Preliminary algebra; 2. Preliminary calculus; 3. Complex numbers and hyperbolic functions; 4. Series and limits; 5. Partial differentiation; 6. Multiple integrals; 7. Vector algebra; 8. Matrices and vector spaces; 9. Normal modes; 10. Vector calculus; 11. Line, surface and volume integrals; 12. Fourier series; 13. Integral transforms; 14. First-order ordinary differential equations; 15. Higher-order ordinary differential equations; 16. Series solutions of ordinary differential equations; 17. Eigenfunction methods for differential equations; 18. Special functions; 19. Quantum operators; 20. Partial differential equations: general and particular; 21. Partial differential equations: separation of variables; 22. Calculus of variations; 23. Integral equations; 24. Complex variables; 25. Application of complex variables; 26. Tensors; 27. Numerical methods; 28. Group theory; 29. Representation theory; 30. Probability; 31. Statistics.

  5. Use of Multiscale Entropy to Facilitate Artifact Detection in Electroencephalographic Signals

    PubMed Central

    Mariani, Sara; Borges, Ana F. T.; Henriques, Teresa; Goldberger, Ary L.; Costa, Madalena D.

    2016-01-01

    Electroencephalographic (EEG) signals present a myriad of challenges to analysis, beginning with the detection of artifacts. Prior approaches to noise detection have utilized multiple techniques, including visual methods, independent component analysis, and wavelets. However, no single method is broadly accepted, inviting alternative ways to address this problem. Here, we introduce a novel approach based on a statistical physics method, multiscale entropy (MSE) analysis, which quantifies the complexity of a signal. We postulate that noise-corrupted EEG signals have lower information content and, therefore, reduced complexity compared with their noise-free counterparts. We test the new method on an open-access database of EEG signals with and without added artifacts due to electrode motion. PMID:26738116
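
    A compact sketch of the MSE computation described above: coarse-grain the signal at each scale, then compute the sample entropy of the coarse-grained series. The parameter choices (m = 2, tolerance r = 0.15 SD, here the SD of the coarse-grained series for simplicity) are common defaults assumed for the example, not necessarily the paper's settings.

```python
# Multiscale entropy sketch: non-overlapping coarse-graining + sample entropy.
import numpy as np

def sample_entropy(x, m=2, r=0.15):
    """SampEn = -ln(A/B); A, B count matching template pairs of length m+1, m."""
    x = np.asarray(x, float)
    tol = r * x.std()
    n = len(x)
    def matches(mm):
        # first n-m templates of length mm, Chebyshev distance between pairs
        t = np.lib.stride_tricks.sliding_window_view(x, mm)[: n - m]
        d = np.abs(t[:, None, :] - t[None, :, :]).max(axis=-1)
        return np.sum(d[np.triu_indices(len(t), k=1)] < tol)
    B, A = matches(m), matches(m + 1)
    return -np.log(A / B) if A > 0 and B > 0 else np.inf

def mse(x, max_scale=5, m=2, r=0.15):
    """Sample entropy of the scale-s coarse-grained series, s = 1..max_scale."""
    x = np.asarray(x, float)
    return [sample_entropy(x[: len(x) // s * s].reshape(-1, s).mean(1), m, r)
            for s in range(1, max_scale + 1)]

rng = np.random.default_rng(3)
print(mse(rng.standard_normal(1000)))   # white noise: entropy falls with scale
```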

  6. Forecasting daily source air quality using multivariate statistical analysis and radial basis function networks.

    PubMed

    Sun, Gang; Hoff, Steven J; Zelle, Brian C; Nelson, Minda A

    2008-12-01

    It is vital to forecast gas and particulate matter concentrations and emission rates (GPCER) from livestock production facilities to assess the impact of airborne pollutants on human health, the ecological environment, and global warming. Modeling source air quality is a complex process because of abundant nonlinear interactions between GPCER and other factors. The objective of this study was to introduce statistical methods and a radial basis function (RBF) neural network to predict daily source air quality in Iowa swine deep-pit finishing buildings. The results show that four variables (outdoor and indoor temperature, animal units, and ventilation rates) were identified as relatively important model inputs using statistical methods. It can be further demonstrated that only two factors, the environment factor and the animal factor, were capable of explaining more than 94% of the total variability after performing principal component analysis. The introduction of fewer uncorrelated variables to the neural network would result in reduced model structure complexity, minimize computation cost, and eliminate model overfitting problems. The obtained results of the RBF network prediction were in good agreement with the actual measurements, with values of the correlation coefficient between 0.741 and 0.995 and very low values of systemic performance indexes for all the models. The good results indicated that the RBF network could be trained to model these highly nonlinear relationships. Thus, the RBF neural network technology combined with multivariate statistical methods is a promising tool for air pollutant emissions modeling.

  7. Comparison of six methods for isolating mycobacteria from swine lymph nodes.

    PubMed

    Thoen, C O; Richards, W D; Jarnagin, J L

    1974-03-01

    Six laboratory methods were compared for isolating acid-fast bacteria. Tuberculous lymph nodes from each of 48 swine, as identified by federal meat inspectors, were processed by each of the methods. Treated tissue suspensions were inoculated onto each of eight media, which were observed at 7-day intervals for 9 weeks. There were no statistically significant differences between the numbers of Mycobacterium avium complex bacteria isolated by each of the six methods. Rapid tissue preparation methods involving treatment with 2% sodium hydroxide or treatment with 0.2% zephiran required only one-third to one-fourth of the processing time of the standard method. There were small differences in the amount of contamination among the six methods, but no detectable differences in the time of first appearance of M. avium complex colonies.

  8. Using the U.S. Geological Survey National Water Quality Laboratory LT-MDL to Evaluate and Analyze Data

    USGS Publications Warehouse

    Bonn, Bernadine A.

    2008-01-01

    A long-term method detection level (LT-MDL) and laboratory reporting level (LRL) are used by the U.S. Geological Survey's National Water Quality Laboratory (NWQL) when reporting results from most chemical analyses of water samples. Changing to this method provided data users with additional information about their data and often resulted in more reported values in the low concentration range. Before this method was implemented, many of these values would have been censored. The use of the LT-MDL and LRL presents some challenges for the data user. Interpreting data in the low concentration range increases the need for adequate quality assurance because even small contamination or recovery problems can be relatively large compared to concentrations near the LT-MDL and LRL. In addition, the definition of the LT-MDL, as well as the inclusion of low values, can result in complex data sets with multiple censoring levels and reported values that are less than a censoring level. Improper interpretation or statistical manipulation of low-range results in these data sets can result in bias and incorrect conclusions. This document is designed to help data users use and interpret data reported with the LT-MDL/LRL method. The calculation and application of the LT-MDL and LRL are described. This document shows how to extract statistical information from the LT-MDL and LRL and how to use that information in USGS investigations, such as assessing the quality of field data, interpreting field data, and planning data collection for new projects. A set of 19 detailed examples is included in this document to help data users think about their data and properly interpret low-range data without introducing bias. Although this document is not meant to be a comprehensive resource of statistical methods, several useful methods of analyzing censored data are demonstrated, including Regression on Order Statistics and Kaplan-Meier Estimation. These two statistical methods handle complex censored data sets without resorting to substitution, thereby avoiding a common source of bias and inaccuracy.
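
    For a flavor of one of the named techniques, the sketch below applies a simplified robust Regression on Order Statistics to a toy data set with a single censoring level (so the censored values occupy the lowest ranks); the concentrations are invented, and Helsel's full multiple-detection-limit procedure is more involved.

```python
# Simplified robust ROS for one detection limit: regress log(detects) on
# normal scores, impute the censored values, then summarize without the bias
# introduced by substituting at the detection limit.
import numpy as np
from scipy import stats

conc = np.array([0.5, 0.5, 0.5, 0.7, 0.9, 1.2, 1.8, 2.4, 3.1, 5.0])  # ug/L, sorted
censored = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0], bool)            # "<0.5" flags

n = len(conc)
pp = (np.arange(1, n + 1) - 0.375) / (n + 0.25)   # Blom plotting positions
q = stats.norm.ppf(pp)                            # normal scores

slope, intercept, *_ = stats.linregress(q[~censored], np.log(conc[~censored]))
imputed = np.exp(intercept + slope * q[censored]) # model-based values below DL

full = np.concatenate([imputed, conc[~censored]])
print(f"ROS mean = {full.mean():.2f}; substitution-at-DL mean = {conc.mean():.2f}")
```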

  9. Rogue waves in terms of multi-point statistics and nonequilibrium thermodynamics

    NASA Astrophysics Data System (ADS)

    Hadjihosseini, Ali; Lind, Pedro; Mori, Nobuhito; Hoffmann, Norbert P.; Peinke, Joachim

    2017-04-01

    Ocean waves, including those that develop into rogue waves, are investigated against the background of complex systems. In contrast to deterministic approaches based on the nonlinear Schrödinger equation or focusing effects, we analyze this system in terms of a noisy stochastic system. In particular, we present a statistical method that maps the complexity of multi-point data into the statistics of hierarchically ordered height increments for different time scales. We show that the stochastic cascade process with Markov properties is governed by a Fokker-Planck equation. Conditional probabilities as well as the Fokker-Planck equation itself can be estimated directly from the available observational data. This stochastic description enables us to show several new aspects of wave states. Surrogate data sets can in turn be generated, allowing one to work out different statistical features of the complex sea state in general and extreme rogue wave events in particular. The results also open up new perspectives for forecasting the occurrence probability of extreme rogue wave events, and even for forecasting the occurrence of individual rogue waves based on precursory dynamics. As a new outlook, ocean wave states will be considered in terms of nonequilibrium thermodynamics, for which the entropy production of different wave heights will be considered. We show evidence that rogue waves are characterized by negative entropy production. The statistics of the entropy production can be used to distinguish different wave states.
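
    The drift and diffusion estimation mentioned above can be illustrated with conditional moments (the first two Kramers-Moyal coefficients) on synthetic data; here an Ornstein-Uhlenbeck process stands in for the wave-height increments, and the bin edges and sample sizes are arbitrary choices for the example.

```python
# Kramers-Moyal sketch: estimate drift D1(x) and diffusion D2(x) by binning
# conditional increments of a synthetic Ornstein-Uhlenbeck series
# (true values: D1 = -x, D2 = 0.5).
import numpy as np

rng = np.random.default_rng(4)
dt, n = 0.01, 200_000
x = np.empty(n); x[0] = 0.0
for i in range(n - 1):
    x[i + 1] = x[i] - x[i] * dt + np.sqrt(dt) * rng.standard_normal()

bins = np.linspace(-2, 2, 21)
idx = np.digitize(x[:-1], bins)
dx = np.diff(x)
for b in range(1, len(bins)):
    sel = idx == b
    if sel.sum() > 100:
        c = 0.5 * (bins[b - 1] + bins[b])
        d1 = dx[sel].mean() / dt                  # drift  ~ -x
        d2 = (dx[sel] ** 2).mean() / (2 * dt)     # diffusion ~ 0.5
        print(f"x={c:+.1f}  D1={d1:+.2f}  D2={d2:.2f}")
```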

  10. How multiplicity determines entropy and the derivation of the maximum entropy principle for complex systems.

    PubMed

    Hanel, Rudolf; Thurner, Stefan; Gell-Mann, Murray

    2014-05-13

    The maximum entropy principle (MEP) is a method for obtaining the most likely distribution functions of observables from statistical systems by maximizing entropy under constraints. The MEP has found hundreds of applications in ergodic and Markovian systems in statistical mechanics, information theory, and statistics. For several decades there has been an ongoing controversy over whether the notion of the maximum entropy principle can be extended in a meaningful way to nonextensive, nonergodic, and complex statistical systems and processes. In this paper we start by reviewing how Boltzmann-Gibbs-Shannon entropy is related to multiplicities of independent random processes. We then show how the relaxation of independence naturally leads to the most general entropies that are compatible with the first three Shannon-Khinchin axioms, the (c,d)-entropies. We demonstrate that the MEP is a perfectly consistent concept for nonergodic and complex statistical systems if their relative entropy can be factored into a generalized multiplicity and a constraint term. The problem of finding such a factorization reduces to finding an appropriate representation of relative entropy in a linear basis. In a particular example we show that path-dependent random processes with memory naturally require specific generalized entropies. The example is to our knowledge the first exact derivation of a generalized entropy from the microscopic properties of a path-dependent random process.
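
    For orientation, the classic ergodic case that the paper generalizes is the constrained maximization below; the (c,d)-entropies replace the Boltzmann-Gibbs-Shannon functional in the first line while the Lagrange-multiplier machinery stays the same.

```latex
% Standard MEP derivation (the special case the abstract generalizes):
\begin{align}
  \mathcal{L}[p] &= -\sum_i p_i \ln p_i
      \;-\; \alpha\Big(\sum_i p_i - 1\Big)
      \;-\; \beta\Big(\sum_i p_i E_i - \langle E\rangle\Big),\\
  0 &= \frac{\partial \mathcal{L}}{\partial p_i}
     = -\ln p_i - 1 - \alpha - \beta E_i
  \;\;\Longrightarrow\;\;
  p_i = \frac{e^{-\beta E_i}}{Z}, \qquad Z = \sum_i e^{-\beta E_i}.
\end{align}
```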

  11. Artificial Neural Network Based Group Contribution Method for Estimating Cetane and Octane Numbers of Hydrocarbons and Oxygenated Organic Compounds

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kubic, William Louis; Jenkins, Rhodri W.; Moore, Cameron M.

    Chemical pathways for converting biomass into fuels produce compounds for which key physical and chemical property data are unavailable. We developed an artificial neural network based group contribution method for estimating cetane and octane numbers that captures the complex dependence of fuel properties of pure compounds on chemical structure and is statistically superior to current methods.

  12. Artificial Neural Network Based Group Contribution Method for Estimating Cetane and Octane Numbers of Hydrocarbons and Oxygenated Organic Compounds

    DOE PAGES

    Kubic, William Louis; Jenkins, Rhodri W.; Moore, Cameron M.; ...

    2017-09-28

    Chemical pathways for converting biomass into fuels produce compounds for which key physical and chemical property data are unavailable. We developed an artificial neural network based group contribution method for estimating cetane and octane numbers that captures the complex dependence of fuel properties of pure compounds on chemical structure and is statistically superior to current methods.

  13. Robust hierarchical state-space models reveal diel variation in travel rates of migrating leatherback turtles.

    PubMed

    Jonsen, Ian D; Myers, Ransom A; James, Michael C

    2006-09-01

    1. Biological and statistical complexity are features common to most ecological data that hinder our ability to extract meaningful patterns using conventional tools. Recent work on implementing modern statistical methods for analysis of such ecological data has focused primarily on population dynamics but other types of data, such as animal movement pathways obtained from satellite telemetry, can also benefit from the application of modern statistical tools. 2. We develop a robust hierarchical state-space approach for analysis of multiple satellite telemetry pathways obtained via the Argos system. State-space models are time-series methods that allow unobserved states and biological parameters to be estimated from data observed with error. We show that the approach can reveal important patterns in complex, noisy data where conventional methods cannot. 3. Using the largest Atlantic satellite telemetry data set for critically endangered leatherback turtles, we show that the diel pattern in travel rates of these turtles changes over different phases of their migratory cycle. While foraging in northern waters the turtles show similar travel rates during day and night, but on their southward migration to tropical waters travel rates are markedly faster during the day. These patterns are generally consistent with diving data, and may be related to changes in foraging behaviour. Interestingly, individuals that migrate southward to breed generally show higher daytime travel rates than individuals that migrate southward in a non-breeding year. 4. Our approach is extremely flexible and can be applied to many ecological analyses that use complex, sequential data.
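
    The authors' robust hierarchical Bayesian model is beyond a short sketch, but the core state-space idea, an unobserved movement state estimated from noisy telemetry, can be shown with a 1-D random walk and a Kalman filter; all variances below are invented for illustration.

```python
# Minimal state-space sketch: random-walk location observed with error,
# recovered by a scalar Kalman filter (not the paper's hierarchical model).
import numpy as np

rng = np.random.default_rng(5)
n, q, r = 200, 0.1, 1.0                 # steps, process var., observation var.
truth = np.cumsum(np.sqrt(q) * rng.standard_normal(n))    # hidden track
obs = truth + np.sqrt(r) * rng.standard_normal(n)         # noisy Argos-like fixes

xhat, P = np.zeros(n), np.zeros(n)
xhat[0], P[0] = obs[0], r
for t in range(1, n):
    x_pred, P_pred = xhat[t - 1], P[t - 1] + q   # predict: random walk
    K = P_pred / (P_pred + r)                    # Kalman gain
    xhat[t] = x_pred + K * (obs[t] - x_pred)     # update with observation
    P[t] = (1 - K) * P_pred

print("RMSE raw     :", np.sqrt(np.mean((obs - truth) ** 2)))
print("RMSE filtered:", np.sqrt(np.mean((xhat - truth) ** 2)))
```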

  14. Emerging technologies for pediatric and adult trauma care.

    PubMed

    Moulton, Steven L; Haley-Andrews, Stephanie; Mulligan, Jane

    2010-06-01

    Current Emergency Medical Service protocols rely on provider-directed care for evaluation, management and triage of injured patients from the field to a trauma center. New methods to quickly diagnose, support and coordinate the movement of trauma patients from the field to the most appropriate trauma center are in development. These methods will enhance trauma care and promote trauma system development. Recent advances in machine learning, statistical methods, device integration and wireless communication are giving rise to new methods for vital sign data analysis and a new generation of transport monitors. These monitors will collect and synchronize exponentially growing amounts of vital sign data with electronic patient care information. The application of advanced statistical methods to these complex clinical data sets has the potential to reveal many important physiological relationships and treatment effects. Several emerging technologies are converging to yield a new generation of smart sensors and tightly integrated transport monitors. These technologies will assist prehospital providers in quickly identifying and triaging the most severely injured children and adults to the most appropriate trauma centers. They will enable the development of real-time clinical support systems of increasing complexity, able to provide timelier, more cost-effective, autonomous care.

  15. Complexity control algorithm based on adaptive mode selection for interframe coding in high efficiency video coding

    NASA Astrophysics Data System (ADS)

    Chen, Gang; Yang, Bing; Zhang, Xiaoyun; Gao, Zhiyong

    2017-07-01

    The latest high efficiency video coding (HEVC) standard significantly increases the encoding complexity in exchange for improved coding efficiency. Due to the limited computational capability of handheld devices, complexity-constrained video coding has drawn great attention in recent years. A complexity control algorithm based on adaptive mode selection is proposed for interframe coding in HEVC. Considering the direct proportionality between encoding time and computational complexity, the computational complexity is measured in terms of encoding time. First, complexity is mapped to a target in terms of prediction modes. Then, an adaptive mode selection algorithm is proposed for the mode decision process. Specifically, the optimal mode combination scheme, chosen through offline statistics, is applied at low complexity. If the complexity budget has not been used up, an adaptive mode sorting method is employed to further improve coding efficiency. The experimental results show that the proposed algorithm achieves a very large complexity control range (as low as 10%) for the HEVC encoder while maintaining good rate-distortion performance. For the low-delay P condition, compared with the direct resource allocation method and the state-of-the-art method, an average gain of 0.63 and 0.17 dB in BD-PSNR is observed for 18 sequences when the target complexity is around 40%.

  16. Accounting for Population Structure in Gene-by-Environment Interactions in Genome-Wide Association Studies Using Mixed Models.

    PubMed

    Sul, Jae Hoon; Bilow, Michael; Yang, Wen-Yun; Kostem, Emrah; Furlotte, Nick; He, Dan; Eskin, Eleazar

    2016-03-01

    Although genome-wide association studies (GWASs) have discovered numerous novel genetic variants associated with many complex traits and diseases, those genetic variants typically explain only a small fraction of phenotypic variance. Factors that account for phenotypic variance include environmental factors and gene-by-environment interactions (GEIs). Recently, several studies have conducted genome-wide gene-by-environment association analyses and demonstrated important roles of GEIs in complex traits. One of the main challenges in these association studies is to control the effects of population structure, which may cause spurious associations. Many studies have analyzed how population structure influences the statistics of genetic variants and have developed several statistical approaches to correct for it. However, the impact of population structure on GEI statistics in GWASs has not been extensively studied, nor have methods been designed to correct for its effects on GEI statistics. In this paper, we show both analytically and empirically that population structure may cause spurious GEIs, and we use both simulation and two GWAS datasets to support our finding. We propose a statistical approach based on mixed models to account for the effect of population structure on GEI statistics. We find that our approach effectively controls for population structure in statistics for GEIs as well as for genetic variants.

  17. Visual Perception-Based Statistical Modeling of Complex Grain Image for Product Quality Monitoring and Supervision on Assembly Production Line

    PubMed Central

    Chen, Qing; Xu, Pengfei; Liu, Wenzhong

    2016-01-01

    Computer vision, as a fast, low-cost, noncontact, and online monitoring technology, has been an important tool for inspecting product quality, particularly on large-scale assembly production lines. However, current industrial vision systems are far from satisfactory in the intelligent perception of complex grain images, which comprise a large number of locally homogeneous fragmentations or patches without a distinct foreground and background. We attempt to solve this problem through statistical modeling of the spatial structures of grain images. We first present a physical explanation indicating that the spatial structures of complex grain images follow a representative Weibull distribution according to the theory of sequential fragmentation, which is well known in the continued comminution of ore grinding. To delineate the spatial structure of a grain image, we present a method of multiscale and omnidirectional Gaussian derivative filtering. Then, a product quality classifier based on a sparse multikernel least squares support vector machine is proposed to solve the low-confidence classification problem posed by imbalanced data distributions. The proposed method is applied on the assembly line of a food-processing enterprise to automatically classify (or identify) the production quality of rice. Experiments on this real application case, compared with commonly used methods, illustrate the validity of our method. PMID:26986726

  18. The statistical mechanics of complex signaling networks: nerve growth factor signaling

    NASA Astrophysics Data System (ADS)

    Brown, K. S.; Hill, C. C.; Calero, G. A.; Myers, C. R.; Lee, K. H.; Sethna, J. P.; Cerione, R. A.

    2004-10-01

    The inherent complexity of cellular signaling networks and their importance to a wide range of cellular functions necessitates the development of modeling methods that can be applied toward making predictions and highlighting the appropriate experiments to test our understanding of how these systems are designed and function. We use methods of statistical mechanics to extract useful predictions for complex cellular signaling networks. A key difficulty with signaling models is that, while significant effort is being made to experimentally measure the rate constants for individual steps in these networks, many of the parameters required to describe their behavior remain unknown or at best represent estimates. To establish the usefulness of our approach, we have applied our methods toward modeling the nerve growth factor (NGF)-induced differentiation of neuronal cells. In particular, we study the actions of NGF and mitogenic epidermal growth factor (EGF) in rat pheochromocytoma (PC12) cells. Through a network of intermediate signaling proteins, each of these growth factors stimulates extracellular regulated kinase (Erk) phosphorylation with distinct dynamical profiles. Using our modeling approach, we are able to predict the influence of specific signaling modules in determining the integrated cellular response to the two growth factors. Our methods also raise some interesting insights into the design and possible evolution of cellular systems, highlighting an inherent property of these systems that we call 'sloppiness.'

  19. Unified functional network and nonlinear time series analysis for complex systems science: The pyunicorn package

    NASA Astrophysics Data System (ADS)

    Donges, Jonathan; Heitzig, Jobst; Beronov, Boyan; Wiedermann, Marc; Runge, Jakob; Feng, Qing Yi; Tupikina, Liubov; Stolbova, Veronika; Donner, Reik; Marwan, Norbert; Dijkstra, Henk; Kurths, Jürgen

    2016-04-01

    We introduce the pyunicorn (Pythonic unified complex network and recurrence analysis toolbox) open source software package for applying and combining modern methods of data analysis and modeling from complex network theory and nonlinear time series analysis. pyunicorn is a fully object-oriented and easily parallelizable package written in the language Python. It allows for the construction of functional networks such as climate networks in climatology or functional brain networks in neuroscience representing the structure of statistical interrelationships in large data sets of time series and, subsequently, investigating this structure using advanced methods of complex network theory such as measures and models for spatial networks, networks of interacting networks, node-weighted statistics, or network surrogates. Additionally, pyunicorn provides insights into the nonlinear dynamics of complex systems as recorded in uni- and multivariate time series from a non-traditional perspective by means of recurrence quantification analysis, recurrence networks, visibility graphs, and construction of surrogate time series. The range of possible applications of the library is outlined, drawing on several examples mainly from the field of climatology. pyunicorn is available online at https://github.com/pik-copan/pyunicorn. Reference: J.F. Donges, J. Heitzig, B. Beronov, M. Wiedermann, J. Runge, Q.-Y. Feng, L. Tupikina, V. Stolbova, R.V. Donner, N. Marwan, H.A. Dijkstra, and J. Kurths, Unified functional network and nonlinear time series analysis for complex systems science: The pyunicorn package, Chaos 25, 113101 (2015), DOI: 10.1063/1.4934554, Preprint: arxiv.org:1507.01571 [physics.data-an].

  20. Reducing Power Differentials in the Classroom Using Student-Based Quantitative Research Scenarios: Applications in Undergraduate and Graduate Research Methods Classrooms

    ERIC Educational Resources Information Center

    Morrow, Jennifer Ann; Kelly, Stephanie; Skolits, Gary

    2013-01-01

    Understanding and conducting research is a complex, integral skill that needs to be mastered by both undergraduate and graduate students. Yet many students are reluctant and often somewhat apprehensive about undertaking research and understanding the underlying statistical methods used to evaluate research (Dauphinee, Schau, & Stevens, 1997).…

  1. Stratification of complexity in congenital heart surgery: comparative study of the Risk Adjustment for Congenital Heart Surgery (RACHS-1) method, Aristotle basic score and Society of Thoracic Surgeons-European Association for Cardio- Thoracic Surgery (STS-EACTS) mortality score.

    PubMed

    Cavalcanti, Paulo Ernando Ferraz; Sá, Michel Pompeu Barros de Oliveira; Santos, Cecília Andrade dos; Esmeraldo, Isaac Melo; Chaves, Mariana Leal; Lins, Ricardo Felipe de Albuquerque; Lima, Ricardo de Carvalho

    2015-01-01

    To determine whether the stratification-of-complexity models in congenital heart surgery (RACHS-1, Aristotle basic score, and STS-EACTS mortality score) fit our center, and to determine the best method of discriminating hospital mortality. Surgical procedures for congenital heart disease in patients under 18 years of age were allocated to the categories proposed by the currently available stratification-of-complexity methods. The outcome of hospital mortality was calculated for each category of the three models. Statistical analysis was performed to verify whether the categories presented different mortalities. The discriminatory ability of the models was determined by calculating the area under the ROC curve, and the curves of the three models were compared. A total of 360 patients were allocated according to the three methods. There was a statistically significant difference between the mortality categories: RACHS-1 (1) 1.3%, (2) 11.4%, (3) 27.3%, (4) 50% (P<0.001); Aristotle basic score (1) 1.1%, (2) 12.2%, (3) 34%, (4) 64.7% (P<0.001); and STS-EACTS mortality score (1) 5.5%, (2) 13.6%, (3) 18.7%, (4) 35.8% (P<0.001). The three models had similar accuracy as measured by the area under the ROC curve: RACHS-1, 0.738; STS-EACTS, 0.739; Aristotle, 0.766. The three stratification-of-complexity models currently available in the literature are useful, with different mortalities between the proposed categories and similar discriminatory capacity for hospital mortality.

  2. Stratification of complexity in congenital heart surgery: comparative study of the Risk Adjustment for Congenital Heart Surgery (RACHS-1) method, Aristotle basic score and Society of Thoracic Surgeons-European Association for Cardio- Thoracic Surgery (STS-EACTS) mortality score

    PubMed Central

    Cavalcanti, Paulo Ernando Ferraz; Sá, Michel Pompeu Barros de Oliveira; dos Santos, Cecília Andrade; Esmeraldo, Isaac Melo; Chaves, Mariana Leal; Lins, Ricardo Felipe de Albuquerque; Lima, Ricardo de Carvalho

    2015-01-01

    Objective To determine whether the stratification-of-complexity models in congenital heart surgery (RACHS-1, Aristotle basic score, and STS-EACTS mortality score) fit our center, and to determine the best method of discriminating hospital mortality. Methods Surgical procedures for congenital heart disease in patients under 18 years of age were allocated to the categories proposed by the currently available stratification-of-complexity methods. The outcome of hospital mortality was calculated for each category of the three models. Statistical analysis was performed to verify whether the categories presented different mortalities. The discriminatory ability of the models was determined by calculating the area under the ROC curve, and the curves of the three models were compared. Results A total of 360 patients were allocated according to the three methods. There was a statistically significant difference between the mortality categories: RACHS-1 (1) 1.3%, (2) 11.4%, (3) 27.3%, (4) 50% (P<0.001); Aristotle basic score (1) 1.1%, (2) 12.2%, (3) 34%, (4) 64.7% (P<0.001); and STS-EACTS mortality score (1) 5.5%, (2) 13.6%, (3) 18.7%, (4) 35.8% (P<0.001). The three models had similar accuracy as measured by the area under the ROC curve: RACHS-1, 0.738; STS-EACTS, 0.739; Aristotle, 0.766. Conclusion The three stratification-of-complexity models currently available in the literature are useful, with different mortalities between the proposed categories and similar discriminatory capacity for hospital mortality. PMID:26107445

  3. [Construction of haplotype and haplotype block based on tag single nucleotide polymorphisms and their applications in association studies].

    PubMed

    Gu, Ming-liang; Chu, Jia-you

    2007-12-01

    The human genome has haplotype and haplotype-block structures that provide valuable information on human evolutionary history and may lead to more efficient strategies for identifying genetic variants that increase susceptibility to complex diseases. The genome can be divided into discrete blocks of limited haplotype diversity, and in each block a small fraction of "tag SNPs" can be used to distinguish a large fraction of the haplotypes. These tag SNPs are potentially useful for the construction of haplotypes and haplotype blocks and for association studies in complex diseases. There are two general classes of methods for constructing haplotypes and haplotype blocks, based on genotypes from large pedigrees and on statistical algorithms, respectively. The authors evaluate several construction methods to assess the power of different association tests under a variety of disease models and block-partitioning criteria, and discuss the advantages, limitations, and applications of each method in association studies. With the completion of the HapMap project and the development of statistical algorithms for haplotype reconstruction, approaches to haplotype construction that combine mathematics, physics, computer science, and related fields will have profound impacts on population genetics, on the localization and cloning of susceptibility genes for complex diseases, and on related areas of the life sciences.

  4. A functional U-statistic method for association analysis of sequencing data.

    PubMed

    Jadhav, Sneha; Tong, Xiaoran; Lu, Qing

    2017-11-01

    Although sequencing studies hold great promise for uncovering novel variants predisposing to human diseases, the high dimensionality of sequencing data brings tremendous challenges to data analysis. Moreover, for many complex diseases (e.g., psychiatric disorders) multiple related phenotypes are collected. These phenotypes can be different measurements of an underlying disease, or measurements characterizing multiple related diseases for studying a common genetic mechanism. Although jointly analyzing these phenotypes could potentially increase the power of identifying disease-associated genes, the different types of phenotypes pose challenges for association analysis. To address these challenges, we propose a nonparametric method, the functional U-statistic method (FU), for multivariate analysis of sequencing data. It first constructs smooth functions from individuals' sequencing data and then tests the association of these functions with multiple phenotypes by using a U-statistic. The method provides a general framework for analyzing various types of phenotypes (e.g., binary and continuous phenotypes) with unknown distributions. Fitting the genetic variants within a gene using a smoothing function also allows us to capture complexities of gene structure (e.g., linkage disequilibrium, LD), which could potentially increase the power of association analysis. Through simulations, we compared our method to the multivariate outcome score test (MOST) and found that our test attained better performance. In a real-data application, we applied our method to sequencing data from the Minnesota Twin Study (MTS) and found potential associations of several nicotine receptor subunit (CHRN) genes, including CHRNB3, with nicotine dependence and/or alcohol dependence. © 2017 WILEY PERIODICALS, INC.

  5. Does transport time help explain the high trauma mortality rates in rural areas? New and traditional predictors assessed by new and traditional statistical methods

    PubMed Central

    Røislien, Jo; Lossius, Hans Morten; Kristiansen, Thomas

    2015-01-01

    Background Trauma is a leading global cause of death. Trauma mortality rates are higher in rural areas, constituting a challenge for quality and equality in trauma care. The aim of the study was to explore population density and transport time to hospital care as possible predictors of geographical differences in mortality rates, and to what extent choice of statistical method might affect the analytical results and accompanying clinical conclusions. Methods Using data from the Norwegian Cause of Death registry, deaths from external causes 1998–2007 were analysed. Norway consists of 434 municipalities, and municipality population density and travel time to hospital care were entered as predictors of municipality mortality rates in univariate and multiple regression models of increasing model complexity. We fitted linear regression models with continuous and categorised predictors, as well as piecewise linear and generalised additive models (GAMs). Models were compared using Akaike's information criterion (AIC). Results Population density was an independent predictor of trauma mortality rates, while the contribution of transport time to hospital care was highly dependent on choice of statistical model. A multiple GAM or piecewise linear model was superior, and similar, in terms of AIC. However, while transport time was statistically significant in multiple models with piecewise linear or categorised predictors, it was not in GAM or standard linear regression. Conclusions Population density is an independent predictor of trauma mortality rates. The added explanatory value of transport time to hospital care is marginal and model-dependent, highlighting the importance of exploring several statistical models when studying complex associations in observational data. PMID:25972600
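
    The model-comparison logic of the study, the same data fit by models of increasing complexity and ranked by AIC, can be sketched on synthetic data; the break point at x = 1 and all coefficients are assumptions for the example, and the registry data are not used.

```python
# AIC comparison sketch: straight line vs. piecewise-linear (broken-stick)
# regression, Gaussian AIC up to a constant.
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(0, 3, 300)
y = np.where(x < 1, 2 - 1.5 * x, 0.5) + 0.3 * rng.standard_normal(300)

def aic(y, yhat, k):
    n = len(y)
    rss = np.sum((y - yhat) ** 2)
    return n * np.log(rss / n) + 2 * k          # k = coefficients + variance

X_lin = np.column_stack([np.ones_like(x), x])
X_pw = np.column_stack([np.ones_like(x), x, np.clip(x - 1, 0, None)])  # hinge
for name, X in [("linear", X_lin), ("piecewise", X_pw)]:
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(name, "AIC =", round(aic(y, X @ beta, X.shape[1] + 1), 1))
```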

  6. Use of statistical and neural net approaches in predicting toxicity of chemicals.

    PubMed

    Basak, S C; Grunwald, G D; Gute, B D; Balasubramanian, K; Opitz, D

    2000-01-01

    Hierarchical quantitative structure-activity relationships (H-QSAR) have been developed as a new approach in constructing models for estimating physicochemical, biomedicinal, and toxicological properties of interest. This approach uses increasingly more complex molecular descriptors in a graduated approach to model building. In this study, statistical and neural network methods have been applied to the development of H-QSAR models for estimating the acute aquatic toxicity (LC50) of 69 benzene derivatives to Pimephales promelas (fathead minnow). Topostructural, topochemical, geometrical, and quantum chemical indices were used as the four levels of the hierarchical method. It is clear from both the statistical and neural network models that topostructural indices alone cannot adequately model this set of congeneric chemicals. Not surprisingly, topochemical indices greatly increase the predictive power of both statistical and neural network models. Quantum chemical indices also add significantly to the modeling of this set of acute aquatic toxicity data.

  7. Different Manhattan project: automatic statistical model generation

    NASA Astrophysics Data System (ADS)

    Yap, Chee Keng; Biermann, Henning; Hertzmann, Aaron; Li, Chen; Meyer, Jon; Pao, Hsing-Kuo; Paxia, Salvatore

    2002-03-01

    We address the automatic generation of large geometric models. This is important in visualization for several reasons. First, many applications need access to large but interesting data models. Second, we often need such data sets with particular characteristics (e.g., urban models, park and recreation landscape). Thus we need the ability to generate models with different parameters. We propose a new approach for generating such models. It is based on a top-down propagation of statistical parameters. We illustrate the method in the generation of a statistical model of Manhattan. But the method is generally applicable in the generation of models of large geographical regions. Our work is related to the literature on generating complex natural scenes (smoke, forests, etc) based on procedural descriptions. The difference in our approach stems from three characteristics: modeling with statistical parameters, integration of ground truth (actual map data), and a library-based approach for texture mapping.

  8. LATTE Linking Acoustic Tests and Tagging Using Statistical Estimation

    DTIC Science & Technology

    2015-09-30

    the complexity of the model (from simplest to most complex): Kalman filter, Markov chain Monte Carlo (MCMC) and ABC. Many of these methods have been...using SMMs fitted using Kalman filters. Therefore, using the DTAG data, we can estimate the distributions associated with 2D horizontal displacement...speed (a key problem in the previous Kalman filter implementation). This new approach also allows the animal's horizontal movement direction to differ

  9. A Method to Predict the Structure and Stability of RNA/RNA Complexes.

    PubMed

    Xu, Xiaojun; Chen, Shi-Jie

    2016-01-01

    RNA/RNA interactions are essential for genomic RNA dimerization and regulation of gene expression. Intermolecular loop-loop base pairing is a widespread and functionally important tertiary structure motif in RNA machinery. However, computational prediction of intermolecular loop-loop base pairing is challenged by the entropy and free energy calculation due to the conformational constraint and the intermolecular interactions. In this chapter, we describe a recently developed statistical mechanics-based method for the prediction of RNA/RNA complex structures and stabilities. The method is based on the virtual bond RNA folding model (Vfold). The main emphasis in the method is placed on the evaluation of the entropy and free energy for the loops, especially tertiary kissing loops. The method also uses recursive partition function calculations and two-step screening algorithm for large, complicated structures of RNA/RNA complexes. As case studies, we use the HIV-1 Mal dimer and the siRNA/HIV-1 mutant (T4) to illustrate the method.

  10. High-Density Signal Interface Electromagnetic Radiation Prediction for Electromagnetic Compatibility Evaluation.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Halligan, Matthew

    Radiated power calculation approaches for practical scenarios of incomplete high-density interface characterization information and incomplete incident power information are presented. The suggested approaches build upon a method that characterizes power losses through the definition of power loss constant matrices. Potential radiated power estimates include using total power loss information, partial radiated power loss information, worst case analysis, and statistical bounding analysis. A method is also proposed to calculate radiated power when incident power information is not fully known for non-periodic signals at the interface. Incident data signals are modeled from a two-state Markov chain where bit state probabilities are derived. The total spectrum for windowed signals is postulated as the superposition of spectra from individual pulses in a data sequence. Statistical bounding methods are proposed as a basis for the radiated power calculation because of the complexity of calculating a radiated power probability density function statistically.
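
    The two-state Markov bit model mentioned above has a simple closed-form stationary distribution; the sketch below verifies it by simulation. The transition rates are illustrative assumptions, not values from the report.

```python
# Two-state Markov chain: with P(0->1) = a and P(1->0) = b, the stationary
# bit probabilities are pi0 = b/(a+b) and pi1 = a/(a+b).
import numpy as np

a, b = 0.3, 0.1
P = np.array([[1 - a, a],
              [b, 1 - b]])
pi = np.array([b, a]) / (a + b)
print(pi, pi @ P)                       # pi is invariant: pi @ P equals pi

rng = np.random.default_rng(7)          # empirical check by simulation
state, counts = 0, np.zeros(2)
for _ in range(100_000):
    counts[state] += 1
    state = rng.choice(2, p=P[state])
print(counts / counts.sum())
```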

  11. Statistical methods for thermonuclear reaction rates and nucleosynthesis simulations

    NASA Astrophysics Data System (ADS)

    Iliadis, Christian; Longland, Richard; Coc, Alain; Timmes, F. X.; Champagne, Art E.

    2015-03-01

    Rigorous statistical methods for estimating thermonuclear reaction rates and nucleosynthesis are becoming increasingly established in nuclear astrophysics. The main challenge being faced is that experimental reaction rates are highly complex quantities derived from a multitude of different measured nuclear parameters (e.g., astrophysical S-factors, resonance energies and strengths, particle and γ-ray partial widths). We discuss the application of the Monte Carlo method to two distinct, but related, questions. First, given a set of measured nuclear parameters, how can one best estimate the resulting thermonuclear reaction rates and associated uncertainties? Second, given a set of appropriate reaction rates, how can one best estimate the abundances from nucleosynthesis (i.e., reaction network) calculations? The techniques described here provide probability density functions that can be used to derive statistically meaningful reaction rates and final abundances for any desired coverage probability. Examples are given for applications to s-process neutron sources, core-collapse supernovae, classical novae, and Big Bang nucleosynthesis.
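
    The Monte Carlo idea is easy to sketch: sample each nuclear input from its (commonly lognormal) uncertainty distribution, propagate through the rate expression, and read coverage intervals from the resulting PDF. The two resonance contributions and their factor uncertainties below are synthetic numbers, not evaluated data.

```python
# Hedged sketch: lognormal uncertainty propagation into a reaction-rate PDF.
import numpy as np

rng = np.random.default_rng(8)
n = 100_000
# each term: median contribution with a multiplicative factor uncertainty f.u.
term1 = 1.0e-14 * np.exp(np.log(1.3) * rng.standard_normal(n))  # f.u. = 1.3
term2 = 4.0e-15 * np.exp(np.log(2.0) * rng.standard_normal(n))  # f.u. = 2.0
rate = term1 + term2

lo, med, hi = np.percentile(rate, [16, 50, 84])    # 68% coverage interval
print(f"rate = {med:.2e} (+{hi - med:.1e} / -{med - lo:.1e})")
```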

  12. Sparse approximation of currents for statistics on curves and surfaces.

    PubMed

    Durrleman, Stanley; Pennec, Xavier; Trouvé, Alain; Ayache, Nicholas

    2008-01-01

    Computing, processing, and visualizing statistics on shapes such as curves or surfaces is a real challenge, with many applications ranging from medical image analysis to computational geometry. Modelling such geometrical primitives with currents avoids both feature-based approaches and point-correspondence methods. This framework has proved powerful for registering brain surfaces and for measuring geometrical invariants. However, while state-of-the-art methods perform pairwise registrations efficiently, new numerical schemes are required to process groupwise statistics, whose complexity grows with the size of the database. Statistics such as the mean and principal modes of a set of shapes often have a heavy and highly redundant representation. We therefore propose to find an adapted basis in which the mean and principal modes have a sparse decomposition. Besides the computational improvement, this sparse representation offers a way to visualize and interpret statistics on currents. Experiments show the relevance of the approach on 34 sets of 70 sulcal lines and on 50 sets of 10 meshes of deep brain structures.

  13. Classical Statistics and Statistical Learning in Imaging Neuroscience

    PubMed Central

    Bzdok, Danilo

    2017-01-01

    Brain-imaging research has predominantly generated insight by means of classical statistics, including regression-type analyses and null-hypothesis testing using t-tests and ANOVA. In recent years, statistical learning methods have enjoyed increasing popularity, especially for applications to rich and complex data, including cross-validated out-of-sample prediction using pattern classification and sparsity-inducing regression. This concept paper discusses the implications of inferential justifications and algorithmic methodologies in common data analysis scenarios in neuroimaging. It retraces how classical statistics and statistical learning originated from different historical contexts, build on different theoretical foundations, make different assumptions, and evaluate different outcome metrics to permit differently nuanced conclusions. The present considerations should help reduce current confusion between model-driven classical hypothesis testing and data-driven learning algorithms for investigating the brain with imaging techniques. PMID:29056896
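
    The contrast drawn in the abstract can be made concrete on one synthetic data set: a classical two-sample t-test on a single feature versus cross-validated out-of-sample accuracy of a classifier. The data and effect size are invented for illustration.

```python
# Classical inference vs. statistical learning on the same synthetic data.
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(9)
X = rng.standard_normal((200, 5))
y = (X[:, 0] + 0.5 * rng.standard_normal(200)) > 0   # signal only in feature 0

t, p = stats.ttest_ind(X[y, 0], X[~y, 0])            # null-hypothesis test
acc = cross_val_score(LogisticRegression(), X, y, cv=5).mean()
print(f"t = {t:.1f}, p = {p:.1e}; 5-fold out-of-sample accuracy = {acc:.2f}")
```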

  14. Evolving Scale-Free Networks by Poisson Process: Modeling and Degree Distribution.

    PubMed

    Feng, Minyu; Qu, Hong; Yi, Zhang; Xie, Xiurui; Kurths, Jurgen

    2016-05-01

    Since the great mathematician Leonhard Euler initiated the study of graph theory, networks have been one of the most significant research subjects across disciplines. In recent years, the proposition of the small-world and scale-free properties of complex networks in statistical physics has made network science intriguing again for many researchers. One of the challenges of network science is to propose rational models for complex networks. In this paper, in order to reveal the influence of the vertex-generating mechanism of complex networks, we propose three novel models based on the homogeneous Poisson, nonhomogeneous Poisson, and birth-death processes, respectively, which can be regarded as typical scale-free networks and utilized to simulate practical networks. The degree distribution and exponent are analyzed and explained mathematically by different approaches. In simulations, we display the modeling process, examine the degree distribution of empirical data by statistical methods, and assess the reliability of the proposed networks; the results show that our models exhibit the features of typical complex networks. Finally, some future challenges for complex systems are discussed.
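
    The paper's Poisson and birth-death generators are not reproduced here; as a stand-in, the classical Barabási-Albert generator shows the scale-free signature the models share, with a crude log-log fit of the degree-distribution exponent.

```python
# Scale-free check with the classical BA model (a stand-in for the paper's
# Poisson/birth-death generators); the exponent estimate is deliberately crude.
import networkx as nx
import numpy as np

g = nx.barabasi_albert_graph(n=10_000, m=3, seed=42)
deg = np.array([d for _, d in g.degree()])

k = np.arange(3, deg.max() + 1)
pk = np.array([(deg == kk).mean() for kk in k])
mask = pk > 0
slope, _ = np.polyfit(np.log(k[mask]), np.log(pk[mask]), 1)
print("estimated degree exponent ~", -slope)        # BA model: close to 3
```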

  15. Avoiding the ensemble decorrelation problem using member-by-member post-processing

    NASA Astrophysics Data System (ADS)

    Van Schaeybroeck, Bert; Vannitsem, Stéphane

    2014-05-01

    Forecast calibration or post-processing has become a standard tool in atmospheric and climatological science due to the presence of systematic initial-condition and model errors. For ensemble forecasts the most competitive methods derive from the assumption of a fixed ensemble distribution. However, when such 'statistical' methods are applied independently at different locations, at different lead times, or for multiple variables, the correlation structure of the individual ensemble members is destroyed. Instead of re-establishing the correlation structure as in Schefzik et al. (2013), we propose a calibration method that avoids the problem by correcting each ensemble member individually. Moreover, we analyse the fundamental mechanisms by which the probabilistic ensemble skill can be enhanced. In terms of the continuous ranked probability score, our member-by-member approach yields a skill gain that extends to lead times far beyond the error-doubling time and that matches the most competitive statistical approach, non-homogeneous Gaussian regression (Gneiting et al. 2005). Besides the conservation of the correlation structure, additional benefits arise, including the fact that higher-order ensemble moments such as kurtosis and skewness are inherited from the uncorrected forecasts. Our detailed analysis is performed in the context of the Kuramoto-Sivashinsky equation and different simple models, but the results extend successfully to the ensemble forecasts of the European Centre for Medium-Range Weather Forecasts (Van Schaeybroeck and Vannitsem, 2013, 2014). References: [1] Gneiting, T., Raftery, A. E., Westveld, A., Goldman, T., 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Weather Rev. 133, 1098-1118. [2] Schefzik, R., T.L. Thorarinsdottir, and T. Gneiting, 2013: Uncertainty Quantification in Complex Simulation Models Using Ensemble Copula Coupling. To appear in Statistical Science 28. [3] Van Schaeybroeck, B., and S. Vannitsem, 2013: Reliable probabilities through statistical post-processing of ensemble forecasts. Proceedings of the European Conference on Complex Systems 2012, Springer Proceedings in Complexity, XVI, p. 347-352. [4] Van Schaeybroeck, B., and S. Vannitsem, 2014: Ensemble post-processing using member-by-member approaches: theoretical aspects, under review.
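
    A minimal member-by-member sketch under simple assumptions: a regression of the observation on the ensemble mean supplies the mean correction, and one global factor rescales each member's anomaly so that ensemble spread matches the error of the corrected mean. The toy ensemble below is synthetic; the published method refines both ingredients.

```python
# Member-by-member calibration sketch: the same affine map is applied to every
# member, so rank order and inter-member correlation structure survive.
import numpy as np

rng = np.random.default_rng(10)
T, K = 2_000, 20
truth = rng.standard_normal(T)
ens = 0.6 * truth[:, None] + 1.0 + 0.4 * rng.standard_normal((T, K))  # biased, misspread

m = ens.mean(1)
b, a = np.polyfit(m, truth, 1)                  # mean correction: truth ~ a + b*m
anom = ens - m[:, None]
resid_var = np.var(truth - (a + b * m))
c = np.sqrt(resid_var / anom.var())             # one global spread-scaling factor
cal = (a + b * m)[:, None] + c * anom

print("raw  : spread %.2f  RMSE %.2f" % (ens.std(1).mean(),
      np.sqrt(np.mean((m - truth) ** 2))))
print("calib: spread %.2f  RMSE %.2f" % (cal.std(1).mean(),
      np.sqrt(np.mean((cal.mean(1) - truth) ** 2))))
```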

  16. A critique of the usefulness of inferential statistics in applied behavior analysis

    PubMed Central

    Hopkins, B. L.; Cole, Brian L.; Mason, Tina L.

    1998-01-01

    Researchers continue to recommend that applied behavior analysts use inferential statistics in making decisions about effects of independent variables on dependent variables. In many other approaches to behavioral science, inferential statistics are the primary means for deciding the importance of effects. Several possible uses of inferential statistics are considered. Rather than being an objective means for making decisions about effects, as is often claimed, inferential statistics are shown to be subjective. It is argued that the use of inferential statistics adds nothing to the complex and admittedly subjective nonstatistical methods that are often employed in applied behavior analysis. Attacks on inferential statistics that are being made, perhaps with increasing frequency, by those who are not behavior analysts, are discussed. These attackers are calling for banning the use of inferential statistics in research publications and commonly recommend that behavioral scientists should switch to using statistics aimed at interval estimation or the method of confidence intervals. Interval estimation is shown to be contrary to the fundamental assumption of behavior analysis that only individuals behave. It is recommended that authors who wish to publish the results of inferential statistics be asked to justify them as a means for helping us to identify any ways in which they may be useful. PMID:22478304

  17. Application of multivariate statistical techniques in microbial ecology

    PubMed Central

    Paliy, O.; Shankar, V.

    2016-01-01

    Recent advances in high-throughput methods of molecular analyses have led to an explosion of studies generating large-scale ecological datasets. The effect has been especially noticeable in the field of microbial ecology, where new experimental approaches have provided in-depth assessments of the composition, functions, and dynamic changes of complex microbial communities. Because even a single high-throughput experiment produces large amounts of data, powerful statistical techniques of multivariate analysis are well suited to analyze and interpret these datasets. Many different multivariate techniques are available, and often it is not clear which method should be applied to a particular dataset. In this review, we describe and compare the most widely used multivariate statistical techniques, including exploratory, interpretive, and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and dataset structure. PMID:26786791

  18. Walking through the statistical black boxes of plant breeding.

    PubMed

    Xavier, Alencar; Muir, William M; Craig, Bruce; Rainey, Katy Martin

    2016-10-01

    The main statistical procedures in plant breeding are based on Gaussian processes and can be computed through mixed linear models. Intelligent decision making relies on our ability to extract useful information from data to help us achieve our goals more efficiently. Many plant breeders and geneticists perform statistical analyses without understanding the underlying assumptions of the methods or their strengths and pitfalls. In other words, they treat these statistical methods (software and programs) like black boxes. Black boxes represent complex pieces of machinery with contents that are not fully understood by the user. The user sees the inputs and outputs without knowing how the outputs are generated. By providing a general background on statistical methodologies, this review aims (1) to introduce basic concepts of machine learning and its applications to plant breeding; (2) to link classical selection theory to current statistical approaches; (3) to show how to solve mixed models and extend their application to pedigree-based and genomic-based prediction; and (4) to clarify how the algorithms of genome-wide association studies work, including their assumptions and limitations.
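
    As one concrete instance of the mixed-model machinery discussed in the review, here is a minimal ridge-regression BLUP (RR-BLUP) sketch for genomic prediction. The shrinkage rule lambda = p(1 - h2)/h2 is a common heuristic assumed here for illustration; the review itself covers more general solvers.

    ```python
    import numpy as np

    def rrblup(M, y, h2=0.5):
        """Ridge-regression BLUP of marker effects (illustrative sketch).

        M  : centred marker matrix (n individuals x p markers)
        y  : phenotypes
        h2 : assumed heritability, used to set the shrinkage parameter."""
        n, p = M.shape
        lam = p * (1.0 - h2) / h2
        # The mixed-model equations for u in y = mu + M u + e collapse
        # to a ridge system once mu is removed by centring y.
        u = np.linalg.solve(M.T @ M + lam * np.eye(p), M.T @ (y - y.mean()))
        return y.mean() + M @ u  # genomic estimated breeding values
    ```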

  19. Quantifying spatial and temporal trends in beach-dune volumetric changes using spatial statistics

    NASA Astrophysics Data System (ADS)

    Eamer, Jordan B. R.; Walker, Ian J.

    2013-06-01

    Spatial statistics are generally underutilized in coastal geomorphology, despite offering great potential for identifying and quantifying spatial-temporal trends in landscape morphodynamics. In particular, local Moran's I_i provides a statistical framework for detecting clusters of significant change in an attribute (e.g., surface erosion or deposition) and quantifying how this changes over space and time. This study analyzes and interprets spatial-temporal patterns in sediment volume changes in a beach-foredune-transgressive dune complex following removal of invasive marram grass (Ammophila spp.). Results are derived by detecting significant changes in post-removal repeat DEMs derived from topographic surveys and airborne LiDAR. The study site was separated into discrete, linked geomorphic units (beach, foredune, transgressive dune complex) to facilitate sub-landscape scale analysis of volumetric change and sediment budget responses. Difference surfaces derived from a pixel-subtraction algorithm between interval DEMs and the LiDAR baseline DEM were filtered using the local Moran's I_i method and two different spatial weights (1.5 and 5 m) to detect statistically significant change. Moran's I_i results were compared with those derived from a more spatially uniform statistical method that uses a simpler Student's t distribution threshold for change detection. Morphodynamic patterns and volumetric estimates were similar between the uniform geostatistical method and Moran's I_i at a spatial weight of 5 m, while the smaller spatial weight (1.5 m) consistently indicated volumetric changes of lesser magnitude. The larger 5 m spatial weight was most representative of broader site morphodynamics and spatial patterns, while the smaller spatial weight provided volumetric changes consistent with field observations. All methods showed foredune deflation immediately following removal with increased sediment volumes into the spring via deposition at the crest and on lobes in the lee, despite erosion on the stoss slope and dune toe. Generally, the foredune became wider by landward extension, and the seaward slope recovered from erosion to a height and form similar to that of pre-restoration, despite remaining essentially free of vegetation.
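
    For readers unfamiliar with the statistic, a minimal local Moran's I_i under a binary distance-band weighting might look like the sketch below; the weighting scheme and names are illustrative assumptions, not the study's exact implementation.

    ```python
    import numpy as np

    def local_morans_i(values, coords, bandwidth):
        """Local Moran's I_i with binary distance-band weights (sketch).

        values    : 1-D array of cell attributes (e.g., elevation change)
        coords    : (n, 2) array of cell coordinates
        bandwidth : spatial weight distance (e.g., 1.5 or 5 m)"""
        z = values - values.mean()
        m2 = (z ** 2).sum() / len(z)
        d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
        w = ((d > 0) & (d <= bandwidth)).astype(float)  # exclude self
        # Row-standardise so each I_i averages over its neighbours.
        w /= np.maximum(w.sum(axis=1, keepdims=True), 1.0)
        return (z / m2) * (w @ z)
    ```

    Positive I_i values flag cells sitting in clusters of similar change (e.g., coherent erosion patches); significance would then be assessed against a permutation or analytical null.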

  20. Synthesizing evidence on complex interventions: how meta-analytical, qualitative, and mixed-method approaches can contribute.

    PubMed

    Petticrew, Mark; Rehfuess, Eva; Noyes, Jane; Higgins, Julian P T; Mayhew, Alain; Pantoja, Tomas; Shemilt, Ian; Sowden, Amanda

    2013-11-01

    Although there is increasing interest in the evaluation of complex interventions, there is little guidance on how evidence from complex interventions may be reviewed and synthesized, and the relevance of the plethora of evidence synthesis methods to complexity is unclear. This article aims to explore how different meta-analytical approaches can be used to examine aspects of complexity; describe the contribution of various narrative, tabular, and graphical approaches to synthesis; and give an overview of the potential choice of selected qualitative and mixed-method evidence synthesis approaches. The methodological discussions presented here build on a 2-day workshop held in Montebello, Canada, in January 2012, involving methodological experts from the Campbell and Cochrane Collaborations and from other international review centers (Anderson L, Petticrew M, Chandler J, et al. Systematic reviews of complex interventions. In press). These systematic review methodologists discussed the broad range of existing methods and considered the relevance of these methods to reviews of complex interventions. The evidence from primary studies of complex interventions may be qualitative or quantitative. There is a wide range of methodological options for reviewing and presenting this evidence. Specific contributions of statistical approaches include the use of meta-analysis, meta-regression, and Bayesian methods, whereas narrative summary approaches provide valuable precursors or alternatives to these. Qualitative and mixed-method approaches include thematic synthesis, framework synthesis, and realist synthesis. A suitable combination of these approaches allows synthesis of evidence for understanding complex interventions. Reviewers need to consider which aspects of complex interventions should be a focus of their review and what types of quantitative and/or qualitative studies they will be including, as this will inform their choice of review methods. These may range from standard meta-analysis through to more complex mixed-method synthesis and synthesis approaches that incorporate theory and/or users' perspectives.
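
    Of the quantitative approaches named above, random-effects meta-analysis is the most standard; a textbook DerSimonian-Laird sketch follows (generic code, not taken from the workshop materials).

    ```python
    import numpy as np

    def dersimonian_laird(effects, variances):
        """Random-effects pooled estimate via DerSimonian-Laird (sketch)."""
        effects = np.asarray(effects, dtype=float)
        variances = np.asarray(variances, dtype=float)
        w = 1.0 / variances                         # fixed-effect weights
        fixed = (w * effects).sum() / w.sum()
        q = (w * (effects - fixed) ** 2).sum()      # Cochran's Q
        k = len(effects)
        tau2 = max(0.0, (q - (k - 1)) / (w.sum() - (w ** 2).sum() / w.sum()))
        w_star = 1.0 / (variances + tau2)           # random-effects weights
        pooled = (w_star * effects).sum() / w_star.sum()
        return pooled, np.sqrt(1.0 / w_star.sum()), tau2

    # Three hypothetical study effect sizes with their variances:
    print(dersimonian_laird([0.30, 0.10, 0.55], [0.04, 0.02, 0.09]))
    ```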

  1. Comparison of Efficiency of Jackknife and Variance Component Estimators of Standard Errors. Program Statistics Research. Technical Report.

    ERIC Educational Resources Information Center

    Longford, Nicholas T.

    Large scale surveys usually employ a complex sampling design and as a consequence, no standard methods for estimation of the standard errors associated with the estimates of population means are available. Resampling methods, such as jackknife or bootstrap, are often used, with reference to their properties of robustness and reduction of bias. A…
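
    For orientation, the sketch below shows the simple delete-one jackknife for independent observations; survey-design jackknives instead delete whole primary sampling units, a complication the report addresses but which is omitted here for brevity.

    ```python
    import numpy as np

    def jackknife_se(data, estimator):
        """Delete-one jackknife standard error of any statistic (sketch)."""
        n = len(data)
        loo = np.array([estimator(np.delete(data, i)) for i in range(n)])
        return np.sqrt((n - 1) / n * ((loo - loo.mean()) ** 2).sum())

    # Sanity check: for the mean this agrees with s / sqrt(n).
    x = np.random.default_rng(1).normal(size=100)
    print(jackknife_se(x, np.mean), x.std(ddof=1) / np.sqrt(len(x)))
    ```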

  2. Robust inference for group sequential trials.

    PubMed

    Ganju, Jitendra; Lin, Yunzhi; Zhou, Kefei

    2017-03-01

    For ethical reasons, group sequential trials were introduced to allow trials to stop early in the event of extreme results. Endpoints in such trials are usually mortality or irreversible morbidity. For a given endpoint, the norm is to use a single test statistic and to use that same statistic for each analysis. This approach is risky because the test statistic has to be specified before the study is unblinded, and there is a loss in power if the assumptions that ensure optimality for each analysis are not met. To minimize the risk of moderate to substantial loss in power due to a suboptimal choice of a statistic, a robust method was developed for nonsequential trials. The concept is analogous to diversification of financial investments to minimize risk. The method is based on combining P values from multiple test statistics for formal inference while controlling the type I error rate at its designated value. This article evaluates the performance of 2 P value combining methods for group sequential trials. The emphasis is on time-to-event trials, although results from less complex trials are also included. The gain or loss in power with the combination method relative to a single statistic is asymmetric in its favor. Depending on the power of each individual test, the combination method can give more power than any single test or give power that is closer to the test with the most power. The versatility of the method is that it can combine P values from different test statistics for analysis at different times. The robustness of results suggests that inference from group sequential trials can be strengthened with the use of combined tests.
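
    One classical way to combine P values from several test statistics is Fisher's method, sketched below for illustration. Note that the chi-square null assumes independent P values; combining correlated statistics computed on the same trial data, as the article does, requires an adjusted reference distribution.

    ```python
    import math
    from scipy import stats

    def fisher_combine(pvalues):
        """Fisher's method: -2 * sum(ln p) is chi-square with 2k df under H0."""
        stat = -2.0 * sum(math.log(p) for p in pvalues)
        return stats.chi2.sf(stat, df=2 * len(pvalues))

    # e.g., P values from a log-rank and a Wilcoxon test of the same endpoint
    print(fisher_combine([0.04, 0.12]))
    ```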

  3. An online sleep apnea detection method based on recurrence quantification analysis.

    PubMed

    Nguyen, Hoa Dinh; Wilkins, Brek A; Cheng, Qi; Benjamin, Bruce Allen

    2014-07-01

    This paper introduces an online sleep apnea detection method based on heart rate complexity as measured by recurrence quantification analysis (RQA) statistics of heart rate variability (HRV) data. RQA statistics can capture the nonlinear dynamics of a complex cardiorespiratory system during obstructive sleep apnea. In order to obtain a more robust measurement of the nonstationarity of the cardiorespiratory system, we use several fixed-amount-of-nearest-neighbors thresholds for the recurrence plot calculation. We integrate a feature selection algorithm based on conditional mutual information to select the most informative RQA features for classification and, hence, to speed up the real-time classification process without degrading the performance of the system. Two types of binary classifiers, i.e., support vector machine and neural network, are used to differentiate apnea from normal sleep. A soft decision fusion rule is developed to combine the results of these classifiers in order to improve the classification performance of the whole system. Experimental results show that our proposed method achieves better classification results compared with the previous recurrence analysis-based approach. We also show that our method is flexible and a strong candidate for a practical, efficient sleep apnea detection system.
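
    A fixed-amount-of-nearest-neighbors (FAN) recurrence matrix can be sketched as below; this toy version works on a raw 1-D series and omits the time-delay embedding a full RQA pipeline would use, so it is an illustration of the thresholding idea only.

    ```python
    import numpy as np

    def recurrence_matrix_fan(x, k):
        """Recurrence plot with a fixed amount of nearest neighbours (FAN).

        Each time point is marked recurrent with its k nearest neighbours,
        so the recurrence rate stays fixed at k/(n-1) regardless of the
        signal's amplitude or nonstationarity."""
        d = np.abs(x[:, None] - x[None, :])   # pairwise distances
        np.fill_diagonal(d, np.inf)           # exclude self-matches
        idx = np.argsort(d, axis=1)[:, :k]    # k nearest neighbours per row
        r = np.zeros_like(d, dtype=bool)
        np.put_along_axis(r, idx, True, axis=1)
        return r
    ```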

  4. Faster Mass Spectrometry-based Protein Inference: Junction Trees are More Efficient than Sampling and Marginalization by Enumeration

    PubMed Central

    Serang, Oliver; Noble, William Stafford

    2012-01-01

    The problem of identifying the proteins in a complex mixture using tandem mass spectrometry can be framed as an inference problem on a graph that connects peptides to proteins. Several existing protein identification methods make use of statistical inference methods for graphical models, including expectation maximization, Markov chain Monte Carlo, and full marginalization coupled with approximation heuristics. We show that, for this problem, the majority of the cost of inference usually comes from a few highly connected subgraphs. Furthermore, we evaluate three different statistical inference methods using a common graphical model, and we demonstrate that junction tree inference substantially improves rates of convergence compared to existing methods. The python code used for this paper is available at http://noble.gs.washington.edu/proj/fido. PMID:22331862

  5. solGS: a web-based tool for genomic selection

    USDA-ARS?s Scientific Manuscript database

    Genomic selection (GS) promises to improve accuracy in estimating breeding values and genetic gain for quantitative traits compared to traditional breeding methods. Its reliance on high-throughput genome-wide markers and statistical complexity, however, is a serious challenge in data management, ana...

  6. Random forests for classification in ecology

    USGS Publications Warehouse

    Cutler, D.R.; Edwards, T.C.; Beard, K.H.; Cutler, A.; Hess, K.T.; Gibson, J.; Lawler, J.J.

    2007-01-01

    Classification procedures are some of the most widely used statistical methods in ecology. Random forests (RF) is a new and powerful statistical classifier that is well established in other disciplines but is relatively unknown in ecology. Advantages of RF compared to other statistical classifiers include (1) very high classification accuracy; (2) a novel method of determining variable importance; (3) ability to model complex interactions among predictor variables; (4) flexibility to perform several types of statistical data analysis, including regression, classification, survival analysis, and unsupervised learning; and (5) an algorithm for imputing missing values. We compared the accuracies of RF and four other commonly used statistical classifiers using data on invasive plant species presence in Lava Beds National Monument, California, USA, rare lichen species presence in the Pacific Northwest, USA, and nest sites for cavity nesting birds in the Uinta Mountains, Utah, USA. We observed high classification accuracy in all applications as measured by cross-validation and, in the case of the lichen data, by independent test data, when comparing RF to other common classification methods. We also observed that the variables that RF identified as most important for classifying invasive plant species coincided with expectations based on the literature.
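
    A minimal presence/absence workflow of the kind described can be sketched with scikit-learn on synthetic stand-in data (the study's datasets are not reproduced here):

    ```python
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    # Synthetic stand-in for a species presence/absence dataset.
    X, y = make_classification(n_samples=300, n_features=12, random_state=0)

    rf = RandomForestClassifier(n_estimators=500, random_state=0)
    print(cross_val_score(rf, X, y, cv=10).mean())  # cross-validated accuracy

    rf.fit(X, y)
    print(rf.feature_importances_)  # impurity-based variable importance
    ```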

  7. GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in Hi-C data.

    PubMed

    Mifsud, Borbala; Martincorena, Inigo; Darbo, Elodie; Sugar, Robert; Schoenfelder, Stefan; Fraser, Peter; Luscombe, Nicholas M

    2017-01-01

    Hi-C is one of the main methods for investigating spatial co-localisation of DNA in the nucleus. However, the raw sequencing data obtained from Hi-C experiments suffer from large biases and spurious contacts, making it difficult to identify true interactions. Existing methods use complex models to account for biases and do not provide a significance threshold for detecting interactions. Here we introduce a simple binomial probabilistic model that resolves complex biases and distinguishes between true and false interactions. The model corrects biases of known and unknown origin and yields a p-value for each interaction, providing a reliable threshold based on significance. We demonstrate this experimentally by testing the method against a random ligation dataset. Our method outperforms previous methods and provides a statistical framework for further data analysis, such as comparisons of Hi-C interactions between different conditions. GOTHiC is available as a BioConductor package (http://www.bioconductor.org/packages/release/bioc/html/GOTHiC.html).
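
    The core of a binomial interaction test of this kind fits in a few lines; the coverage-product null below is an assumption for illustration and simplifies GOTHiC's actual bias treatment.

    ```python
    from scipy import stats

    def interaction_pvalue(n_ij, n_pairs, r_i, r_j):
        """P(X >= n_ij) for contacts between fragments i and j (sketch).

        r_i, r_j are the fragments' relative coverages (fractions of all
        read ends). Under random ligation, a read pair links i and j with
        probability ~ 2 * r_i * r_j (two possible orientations)."""
        p_ij = 2.0 * r_i * r_j
        return stats.binom.sf(n_ij - 1, n_pairs, p_ij)

    print(interaction_pvalue(12, 1_000_000, 1e-3, 2e-3))  # toy numbers
    ```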

  8. Information and material flows in complex networks

    NASA Astrophysics Data System (ADS)

    Helbing, Dirk; Armbruster, Dieter; Mikhailov, Alexander S.; Lefeber, Erjen

    2006-04-01

    In this special issue, an overview of the Thematic Institute (TI) on Information and Material Flows in Complex Systems is given. The TI was carried out within EXYSTENCE, the first EU Network of Excellence in the area of complex systems. Its motivation, research approach and subjects are presented here. Among the various methods used are many-particle and statistical physics, nonlinear dynamics, as well as complex systems, network and control theory. The contributions are relevant for complex systems as diverse as vehicle and data traffic in networks, logistics, production, and material flows in biological systems. The key disciplines involved are socio-, econo-, traffic- and bio-physics, and a new research area that could be called “biologistics”.

  9. Sparse intervertebral fence composition for 3D cervical vertebra segmentation

    NASA Astrophysics Data System (ADS)

    Liu, Xinxin; Yang, Jian; Song, Shuang; Cong, Weijian; Jiao, Peifeng; Song, Hong; Ai, Danni; Jiang, Yurong; Wang, Yongtian

    2018-06-01

    Statistical shape models are capable of extracting shape prior information and are usually utilized to assist the segmentation of medical images. However, such models require large training datasets in the case of multi-object structures, and it is also difficult to achieve satisfactory results for complex shapes. This study proposed a novel statistical model for cervical vertebra segmentation, called sparse intervertebral fence composition (SiFC), which can reconstruct the boundary between adjacent vertebrae by modeling intervertebral fences. The complex shape of the cervical spine is replaced by a simple intervertebral fence, which considerably reduces the difficulty of cervical segmentation. The final segmentation results are obtained by using a 3D active contour deformation model without shape constraint, which substantially enhances the recognition capability of the proposed method for objects with complex shapes. The proposed segmentation framework is tested on a dataset with CT images from 20 patients. A quantitative comparison against corresponding reference vertebral segmentation yields an overall mean absolute surface distance of 0.70 mm and a dice similarity index of 95.47% for cervical vertebral segmentation. The experimental results show that the SiFC method achieves competitive cervical vertebral segmentation performance and completely eliminates inter-process overlap.

  10. RADSS: an integration of GIS, spatial statistics, and network service for regional data mining

    NASA Astrophysics Data System (ADS)

    Hu, Haitang; Bao, Shuming; Lin, Hui; Zhu, Qing

    2005-10-01

    Regional data mining, which aims at the discovery of knowledge about spatial patterns, clusters or associations between regions, has wide applications nowadays in the social sciences, such as sociology, economics, epidemiology and criminology. Many applications in the regional or other social sciences are more concerned with the spatial relationship than with the precise geographical location. Following the spatial continuity rule derived from Tobler's first law of geography (observations at two sites tend to be more similar to each other if the sites are close together than if far apart), spatial statistics, as an important means of spatial data mining, allow users to extract interesting and useful information, such as spatial pattern, spatial structure, spatial association, spatial outliers and spatial interaction, from vast amounts of spatial and non-spatial data. Therefore, by integrating spatial statistical methods, geographical information systems become more powerful in gaining further insights into the nature of the spatial structure of regional systems, and help researchers to be more careful when selecting appropriate models. However, the lack of such tools holds back the application of spatial data analysis techniques and the development of new methods and models (e.g., spatio-temporal models). Herein, we make an attempt to develop such an integrated software package and apply it to complex system analysis for the Poyang Lake Basin. This paper presents a framework for integrating GIS, spatial statistics and network service in regional data mining, as well as their implementation. After discussing the spatial statistics methods involved in regional complex system analysis, we introduce RADSS (Regional Analysis and Decision Support System), our new regional data mining tool, which integrates GIS, spatial statistics and network service. RADSS includes functions for spatial data visualization, exploratory spatial data analysis, and spatial statistics. The tool also includes fundamental spatial and non-spatial databases on regional population and environment, which can be updated from external databases via CD or network. Utilizing this data mining and exploratory analytical tool, users can easily and quickly analyse the huge amount of interrelated regional data and better understand the spatial patterns and trends of regional development, so as to make credible and scientific decisions. Moreover, it can be used as an educational tool for spatial data analysis and environmental studies. In this paper, we also present a case study on the Poyang Lake Basin as an application of the tool and spatial data mining in complex environmental studies. Finally, several concluding remarks are discussed.

  11. Conceptual and statistical problems associated with the use of diversity indices in ecology.

    PubMed

    Barrantes, Gilbert; Sandoval, Luis

    2009-09-01

    Diversity indices, particularly the Shannon-Wiener index, have extensively been used in analyzing patterns of diversity at different geographic and ecological scales. These indices have serious conceptual and statistical problems which make comparisons of species richness or species abundances across communities nearly impossible. There is often no single statistical method that retains all information needed to answer even a simple question. However, multivariate analyses, such as cluster analysis or multiple regression, could be used instead of diversity indices. More complex multivariate analyses, such as Canonical Correspondence Analysis, provide very valuable information on environmental variables associated with the presence and abundance of the species in a community. In addition, particular hypotheses associated with changes in species richness across localities, or changes in the abundance of one or a group of species, can be tested using univariate, bivariate, and/or rarefaction statistical tests. The rarefaction method has proved to be a robust way to standardize all samples to a common size. Even the simplest method, such as reporting the number of species per taxonomic category, possibly provides more information than a diversity index value.

  12. Distinguishing synchronous and time-varying synergies using point process interval statistics: motor primitives in frog and rat

    PubMed Central

    Hart, Corey B.; Giszter, Simon F.

    2013-01-01

    We present and apply a method that uses point process statistics to discriminate the forms of synergies in motor pattern data, prior to explicit synergy extraction. The method uses electromyogram (EMG) pulse peak timing or onset timing. Peak timing is preferable in complex patterns where pulse onsets may be overlapping. An interval statistic derived from the point processes of EMG peak timings distinguishes time-varying synergies from synchronous synergies (SS). Model data show that the statistic is robust under most conditions. Its application to both frog hindlimb EMG and rat locomotion hindlimb EMG shows that data from these preparations are clearly most consistent with synchronous synergy models (p < 0.001). Additional direct tests of pulse and interval relations in frog data further bolster the support for synchronous synergy mechanisms in these data. Our method and analyses support separated control of rhythm and pattern of motor primitives, with the low-level execution primitives comprising pulsed SS in both frog and rat, and both episodic and rhythmic behaviors. PMID:23675341

  13. Huffman and linear scanning methods with statistical language models.

    PubMed

    Roark, Brian; Fried-Oken, Melanie; Gibbons, Chris

    2015-03-01

    Current scanning access methods for text generation in AAC devices are limited to relatively few options, most notably row/column variations within a matrix. We present Huffman scanning, a new method for applying statistical language models to binary-switch, static-grid typing AAC interfaces, and compare it to other scanning options under a variety of conditions. We present results for 16 adults without disabilities and one 36-year-old man with locked-in syndrome who presents with complex communication needs and uses AAC scanning devices for writing. Huffman scanning with a statistical language model yielded significant typing speedups for the 16 participants without disabilities versus any of the other methods tested, including two row/column scanning methods. A similar pattern of results was found with the individual with locked-in syndrome. Interestingly, faster typing speeds were obtained with Huffman scanning using a more leisurely scan rate than relatively fast individually calibrated scan rates. Overall, the results reported here demonstrate great promise for the usability of Huffman scanning as a faster alternative to row/column scanning.
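
    The idea of Huffman scanning rests on assigning shorter switch-decision sequences to likelier letters. A generic Huffman-code construction (illustrative, not the device's actual code) is sketched below:

    ```python
    import heapq
    import itertools

    def huffman_codes(probs):
        """Binary Huffman codes for symbols given their probabilities.

        In Huffman scanning, code bits map to switch decisions, so more
        probable letters need fewer scan steps to select."""
        counter = itertools.count()  # tie-breaker so the heap never compares dicts
        heap = [(p, next(counter), {s: ""}) for s, p in probs.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            p1, _, c1 = heapq.heappop(heap)
            p2, _, c2 = heapq.heappop(heap)
            merged = {s: "0" + code for s, code in c1.items()}
            merged.update({s: "1" + code for s, code in c2.items()})
            heapq.heappush(heap, (p1 + p2, next(counter), merged))
        return heap[0][2]

    # Letter probabilities would come from the statistical language model.
    print(huffman_codes({"e": 0.5, "t": 0.25, "a": 0.15, "q": 0.1}))
    ```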

  14. An example of complex modelling in dentistry using Markov chain Monte Carlo (MCMC) simulation.

    PubMed

    Helfenstein, Ulrich; Menghini, Giorgio; Steiner, Marcel; Murati, Francesca

    2002-09-01

    In the usual regression setting one regression line is computed for a whole data set. In a more complex situation, each person may be observed, for example, at several points in time, and thus a regression line might be calculated for each person. Additional complexities, such as various forms of errors in covariables, may make a straightforward statistical evaluation difficult or even impossible. During recent years methods have been developed allowing convenient analysis of problems where the data and the corresponding models show these and many other forms of complexity. The methodology makes use of a Bayesian approach and Markov chain Monte Carlo (MCMC) simulations. The methods allow the construction of increasingly elaborate models by building them up from local sub-models. The essential structure of the models can be represented visually by directed acyclic graphs (DAG). This attractive property allows communication and discussion of the essential structure and the substantial meaning of a complex model without needing algebra. After presentation of the statistical methods, an example from dentistry is presented in order to demonstrate their application and use. The dataset of the example had a complex structure; each of a set of children was followed up over several years. The number of new fillings in permanent teeth had been recorded at several ages. The dependent variables were markedly different from the normal distribution and could not be transformed to normality. In addition, explanatory variables were assumed to be measured with different forms of error. We illustrate how the corresponding models can be estimated conveniently via MCMC simulation, in particular 'Gibbs sampling', using the freely available software BUGS. In addition, how the measurement error may influence the estimates of the corresponding coefficients is explored. It is demonstrated that the effect of the independent variable on the dependent variable may be markedly underestimated if the measurement error is not taken into account ('regression dilution bias'). Markov chain Monte Carlo methods may be of great value to dentists in allowing analysis of data sets which exhibit a wide range of different forms of complexity.
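
    As a flavour of what BUGS does under the hood, here is a hand-rolled Gibbs sampler for the simplest case, a normal model with vague conjugate priors; the prior settings are illustrative BUGS-style defaults, not those of the dental model.

    ```python
    import numpy as np

    def gibbs_normal(y, n_iter=5000, seed=0):
        """Gibbs sampler for y_i ~ N(mu, 1/tau) with vague priors
        mu ~ N(0, 1e6) and tau ~ Gamma(1e-3, 1e-3) (BUGS parameterisation)."""
        rng = np.random.default_rng(seed)
        y = np.asarray(y, dtype=float)
        n, ybar = len(y), y.mean()
        mu, tau = ybar, 1.0
        draws = []
        for _ in range(n_iter):
            # mu | tau, y is normal with precision-weighted mean.
            prec = 1e-6 + n * tau
            mu = rng.normal((n * tau * ybar) / prec, np.sqrt(1.0 / prec))
            # tau | mu, y is gamma; numpy uses scale = 1/rate.
            tau = rng.gamma(1e-3 + n / 2.0,
                            1.0 / (1e-3 + 0.5 * ((y - mu) ** 2).sum()))
            draws.append((mu, tau))
        return np.array(draws)
    ```

    Each full sweep draws one parameter from its conditional distribution given the others; with more elaborate hierarchies the same alternation runs over every node of the DAG.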

  15. Fault diagnosis of sensor networked structures with multiple faults using a virtual beam based approach

    NASA Astrophysics Data System (ADS)

    Wang, H.; Jing, X. J.

    2017-07-01

    This paper presents a virtual beam based approach suitable for conducting diagnosis of multiple faults in complex structures with limited prior knowledge of the faults involved. The "virtual beam", a recently-proposed concept for fault detection in complex structures, is applied; it consists of a chain of sensors representing a vibration energy transmission path embedded in the complex structure. Statistical tests and adaptive thresholds are adopted for fault detection because prior knowledge of normal operational conditions and fault conditions is limited. To isolate the multiple faults within a specific structure or substructure of a more complex one, a 'biased running' strategy is developed and embedded within the bacterial-based optimization method to construct effective virtual beams and thus to improve the accuracy of localization. The proposed method is easy and efficient to implement for multiple-fault localization with limited prior knowledge of normal conditions and faults. Extensive experimental results validate that the proposed method can localize both single and multiple faults more effectively than the classical trust index subtract on negative add on positive (TI-SNAP) method.

  16. Detection of crossover time scales in multifractal detrended fluctuation analysis

    NASA Astrophysics Data System (ADS)

    Ge, Erjia; Leung, Yee

    2013-04-01

    Fractal analysis is employed in this paper as a scale-based method for identifying the scaling behavior of time series. Many spatial and temporal processes exhibiting complex multi(mono)-scaling behaviors are fractals. One of the important concepts in fractals is the crossover time scale(s) that separates distinct regimes having different fractal scaling behaviors. A common method for characterizing such behavior is multifractal detrended fluctuation analysis (MF-DFA). The detection of crossover time scale(s) is, however, relatively subjective, since it has been made without rigorous statistical procedures and has generally been determined by eyeballing or subjective observation. Crossover time scales so determined may be spurious and problematic and may not reflect the genuine underlying scaling behavior of a time series. The purpose of this paper is to propose a statistical procedure to model complex fractal scaling behaviors and reliably identify the crossover time scales under MF-DFA. The scaling-identification regression model, grounded on a solid statistical foundation, is first proposed to describe the multi-scaling behaviors of fractals. Through regression analysis and statistical inference, we can (1) identify crossover time scales that cannot be detected by visual inspection, (2) determine the number and locations of the genuine crossover time scales, (3) give confidence intervals for the crossover time scales, and (4) establish the statistically significant regression model depicting the underlying scaling behavior of a time series. To substantiate our argument, the regression model is applied to analyze the multi-scaling behaviors of avian-influenza outbreaks, water consumption, daily mean temperature, and rainfall of Hong Kong. Through the proposed model, we can have a deeper understanding of fractals in general and a statistical approach to identify multi-scaling behavior under MF-DFA in particular.
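
    A stripped-down version of the breakpoint search underlying such a scaling-identification regression might look like the following; the full model in the paper adds formal inference and confidence intervals, which this sketch omits.

    ```python
    import numpy as np

    def find_crossover(log_s, log_f):
        """Locate one crossover scale in a log-log fluctuation function by
        fitting two regression lines and minimising the combined residual
        sum of squares over candidate breakpoints (sketch)."""
        best = (np.inf, None)
        for b in range(3, len(log_s) - 3):      # keep >= 3 points per segment
            rss = 0.0
            for seg in (slice(None, b), slice(b, None)):
                coef, res, *_ = np.polyfit(log_s[seg], log_f[seg], 1, full=True)
                rss += res[0] if res.size else 0.0
            if rss < best[0]:
                best = (rss, b)
        return log_s[best[1]]                   # crossover scale (log units)
    ```

    The two fitted slopes on either side of the chosen breakpoint are the regime-specific scaling exponents; a formal test would compare this two-segment fit against a single line.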

  17. A novel method for identifying disease associated protein complexes based on functional similarity protein complex networks.

    PubMed

    Le, Duc-Hau

    2015-01-01

    Protein complexes formed by non-covalent interactions among proteins play important roles in cellular functions. Computational and purification methods have been used to identify many protein complexes and their cellular functions. However, their roles in terms of causing disease have not been well discovered yet. Only a few studies exist for the identification of disease-associated protein complexes, and they mostly utilize complicated heterogeneous networks constructed from out-of-date phenotype similarity data collected from the literature. In addition, they apply only to diseases for which tissue-specific data exist. In this study, we propose a method to identify novel disease-protein complex associations. First, we introduce a framework to construct functional similarity protein complex networks, where two protein complexes are functionally connected either by shared protein elements, by shared annotating GO terms, or based on protein interactions between elements in each protein complex. Second, we propose a simple but effective neighborhood-based algorithm, which yields a local similarity measure, to rank disease candidate protein complexes. Comparing the predictive performance of our proposed algorithm with that of two state-of-the-art network propagation algorithms, including one we used in our previous study, we found that it performed statistically significantly better than these two algorithms for all the constructed functional similarity protein complex networks. In addition, it ran about 32 times faster than these two algorithms. Moreover, our proposed method always achieved high performance in terms of AUC values irrespective of the way the functional similarity protein complex networks were constructed and the algorithms used. The performance of our method was also higher than that reported for some existing methods based on complicated heterogeneous networks. Finally, we tested our method with prostate cancer and selected the top 100 highly ranked candidate protein complexes. Interestingly, 69 of them were supported by evidence, since at least one of their protein elements is known to be associated with prostate cancer. Our proposed method, including the framework to construct functional similarity protein complex networks and the neighborhood-based algorithm on these networks, could be used for the identification of novel disease-protein complex associations.

  18. Spectrophotometric Determination of Mycophenolate Mofetil as Its Charge-Transfer Complexes with Two π-Acceptors

    PubMed Central

    Vinay, K. B.; Revanasiddappa, H. D.; Raghu, M. S.; Abdulrahman, Sameer. A. M.; Rajendraprasad, N.

    2012-01-01

    Two simple, selective, and rapid spectrophotometric methods are described for the determination of mycophenolate mofetil (MPM) in pure form and in tablets. Both methods are based on the charge-transfer complexation reaction of MPM with p-chloranilic acid (p-CA) or 2,3-dichloro-5,6-dicyano-1,4-benzoquinone (DDQ) in dioxane-acetonitrile medium, resulting in a coloured product measurable at 520 nm (p-CA) or 580 nm (DDQ). Beer's law is obeyed over the concentration ranges of 40–400 and 12–120 μg mL^-1 MPM for p-CA and DDQ, respectively, with correlation coefficients (r) of 0.9995 and 0.9947. The apparent molar absorptivity values are calculated to be 1.06 × 10^3 and 3.87 × 10^3 L mol^-1 cm^-1, respectively, and the corresponding Sandell's sensitivities are 0.4106 and 0.1119 μg cm^-2. The limits of detection (LOD) and quantification (LOQ) are also reported for both methods. The described methods were successfully applied to the determination of MPM in tablets. Statistical comparison of the results with those of the reference method showed excellent agreement. No interference was observed from the common excipients present in tablets. Both methods were validated statistically for accuracy and precision. The accuracy and reliability of the methods were further ascertained by recovery studies via the standard addition procedure. PMID:22567572
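
    The reported LOD and LOQ values are conventionally derived from the calibration line; a generic ICH-style computation on hypothetical numbers (not the paper's data) is sketched below.

    ```python
    import numpy as np

    # Hypothetical calibration data: concentration (ug/mL) vs absorbance.
    conc = np.array([40, 80, 160, 240, 320, 400], dtype=float)
    absb = np.array([0.10, 0.19, 0.40, 0.59, 0.81, 1.00])

    slope, intercept = np.polyfit(conc, absb, 1)
    resid = absb - (slope * conc + intercept)
    sd = resid.std(ddof=2)          # sd of regression residuals (n - 2 dof)

    # ICH-style estimates from the calibration line:
    lod = 3.3 * sd / slope
    loq = 10.0 * sd / slope
    print(f"LOD = {lod:.1f} ug/mL, LOQ = {loq:.1f} ug/mL")
    ```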

  19. Synaptic dynamics contribute to long-term single neuron response fluctuations.

    PubMed

    Reinartz, Sebastian; Biro, Istvan; Gal, Asaf; Giugliano, Michele; Marom, Shimon

    2014-01-01

    Firing rate variability at the single neuron level is characterized by long-memory processes and complex statistics over a wide range of time scales (from milliseconds up to several hours). Here, we focus on the contribution of the non-stationary efficacy of the ensemble of synapses activated in response to a given stimulus to single neuron response variability. We present and validate a method tailored for controlled and specific long-term activation of a single cortical neuron in vitro via synaptic or antidromic stimulation, enabling a clear separation between two determinants of neuronal response variability: membrane excitability dynamics vs. synaptic dynamics. Applying this method we show that, within the range of physiological activation frequencies, the synaptic ensemble of a given neuron is a key contributor to the neuronal response variability, long-memory processes and complex statistics observed over extended time scales. Synaptic transmission dynamics affect response variability at stimulation rates substantially lower than those that drive excitability resources to fluctuate. Implications for network-embedded neurons are discussed.

  20. Microfluidic-based mini-metagenomics enables discovery of novel microbial lineages from complex environmental samples.

    PubMed

    Yu, Feiqiao Brian; Blainey, Paul C; Schulz, Frederik; Woyke, Tanja; Horowitz, Mark A; Quake, Stephen R

    2017-07-05

    Metagenomics and single-cell genomics have enabled genome discovery from unknown branches of life. However, extracting novel genomes from complex mixtures of metagenomic data can still be challenging and represents an ill-posed problem which is generally approached with ad hoc methods. Here we present a microfluidic-based mini-metagenomic method which offers a statistically rigorous approach to extract novel microbial genomes while preserving single-cell resolution. We used this approach to analyze two hot spring samples from Yellowstone National Park and extracted 29 new genomes, including three deeply branching lineages. The single-cell resolution enabled accurate quantification of genome function and abundance, down to 1% in relative abundance. Our analyses of genome level SNP distributions also revealed low to moderate environmental selection. The scale, resolution, and statistical power of microfluidic-based mini-metagenomics make it a powerful tool to dissect the genomic structure of microbial communities while effectively preserving the fundamental unit of biology, the single cell.

  1. Partitioning heritability by functional annotation using genome-wide association summary statistics.

    PubMed

    Finucane, Hilary K; Bulik-Sullivan, Brendan; Gusev, Alexander; Trynka, Gosia; Reshef, Yakir; Loh, Po-Ru; Anttila, Verneri; Xu, Han; Zang, Chongzhi; Farh, Kyle; Ripke, Stephan; Day, Felix R; Purcell, Shaun; Stahl, Eli; Lindstrom, Sara; Perry, John R B; Okada, Yukinori; Raychaudhuri, Soumya; Daly, Mark J; Patterson, Nick; Neale, Benjamin M; Price, Alkes L

    2015-11-01

    Recent work has demonstrated that some functional categories of the genome contribute disproportionately to the heritability of complex diseases. Here we analyze a broad set of functional elements, including cell type-specific elements, to estimate their polygenic contributions to heritability in genome-wide association studies (GWAS) of 17 complex diseases and traits with an average sample size of 73,599. To enable this analysis, we introduce a new method, stratified LD score regression, for partitioning heritability from GWAS summary statistics while accounting for linked markers. This new method is computationally tractable at very large sample sizes and leverages genome-wide information. Our findings include a large enrichment of heritability in conserved regions across many traits, a very large immunological disease-specific enrichment of heritability in FANTOM5 enhancers and many cell type-specific enrichments, including significant enrichment of central nervous system cell types in the heritability of body mass index, age at menarche, educational attainment and smoking behavior.

  2. The value and cost of complexity in predictive modelling: role of tissue anisotropic conductivity and fibre tracts in neuromodulation

    NASA Astrophysics Data System (ADS)

    Salman Shahid, Syed; Bikson, Marom; Salman, Humaira; Wen, Peng; Ahfock, Tony

    2014-06-01

    Objectives. Computational methods are increasingly used to optimize transcranial direct current stimulation (tDCS) dose strategies and yet complexities of existing approaches limit their clinical access. Since predictive modelling indicates the relevance of subject/pathology based data and hence the need for subject specific modelling, the incremental clinical value of increasingly complex modelling methods must be balanced against the computational and clinical time and costs. For example, the incorporation of multiple tissue layers and measured diffusion tensor (DTI) based conductivity estimates increase model precision but at the cost of clinical and computational resources. Costs related to such complexities aggregate when considering individual optimization and the myriad of potential montages. Here, rather than considering if additional details change current-flow prediction, we consider when added complexities influence clinical decisions. Approach. Towards developing quantitative and qualitative metrics of value/cost associated with computational model complexity, we considered field distributions generated by two 4 × 1 high-definition montages (m1 = 4 × 1 HD montage with anode at C3 and m2 = 4 × 1 HD montage with anode at C1) and a single conventional (m3 = C3-Fp2) tDCS electrode montage. We evaluated statistical methods, including residual error (RE) and relative difference measure (RDM), to consider the clinical impact and utility of increased complexities, namely the influence of skull, muscle and brain anisotropic conductivities in a volume conductor model. Main results. Anisotropy modulated current-flow in a montage and region dependent manner. However, significant statistical changes, produced within montage by anisotropy, did not change qualitative peak and topographic comparisons across montages. Thus for the examples analysed, clinical decision on which dose to select would not be altered by the omission of anisotropic brain conductivity. Significance. Results illustrate the need to rationally balance the role of model complexity, such as anisotropy in detailed current flow analysis versus value in clinical dose design. However, when extending our analysis to include axonal polarization, the results provide presumably clinically meaningful information. Hence the importance of model complexity may be more relevant with cellular level predictions of neuromodulation.
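
    The two comparison metrics can be written compactly. The definitions below are the ones commonly used in field-modelling studies (an assumption here; the paper's normalisation may differ slightly):

    ```python
    import numpy as np

    def residual_error(a, b):
        """RE: overall discrepancy in magnitude and shape between fields."""
        return np.linalg.norm(a - b) / np.linalg.norm(a)

    def rdm(a, b):
        """RDM: topography-only discrepancy, blind to global rescaling."""
        return np.linalg.norm(a / np.linalg.norm(a) - b / np.linalg.norm(b))

    # Two toy field vectors sampled at the same mesh nodes:
    e1 = np.array([1.0, 2.0, 3.0])
    e2 = 1.1 * e1                      # pure rescaling of the same pattern
    print(residual_error(e1, e2))      # nonzero: magnitudes differ
    print(rdm(e1, e2))                 # ~0: spatial pattern identical
    ```

    Separating the two matters for the paper's argument: anisotropy may shift magnitudes (RE) without reordering the peak and topography comparisons (RDM) that drive montage selection.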

  3. Datamining approaches for modeling tumor control probability.

    PubMed

    Naqa, Issam El; Deasy, Joseph O; Mu, Yi; Huang, Ellen; Hope, Andrew J; Lindsay, Patricia E; Apte, Aditya; Alaly, James; Bradley, Jeffrey D

    2010-11-01

    Tumor control probability (TCP) to radiotherapy is determined by complex interactions between tumor biology, tumor microenvironment, radiation dosimetry, and patient-related variables. The complexity of these heterogeneous variable interactions constitutes a challenge for building predictive models for routine clinical practice. We describe a datamining framework that can unravel the higher order relationships among dosimetric dose-volume prognostic variables, interrogate various radiobiological processes, and generalize to unseen data when applied prospectively. Several datamining approaches are discussed that include dose-volume metrics, equivalent uniform dose, mechanistic Poisson model, and model building methods using statistical regression and machine learning techniques. Institutional datasets of non-small cell lung cancer (NSCLC) patients are used to demonstrate these methods. The performance of the different methods was evaluated using bivariate Spearman rank correlations (rs). Over-fitting was controlled via resampling methods. Using a dataset of 56 patients with primary NSCLC tumors and 23 candidate variables, we estimated GTV volume and V75 to be the best model parameters for predicting TCP using statistical resampling and a logistic model. Using these variables, the support vector machine (SVM) kernel method provided superior performance for TCP prediction with an rs=0.68 on leave-one-out testing compared to logistic regression (rs=0.4), Poisson-based TCP (rs=0.33), and the cell kill equivalent uniform dose model (rs=0.17). The prediction of treatment response can be improved by utilizing datamining approaches, which are able to unravel important non-linear complex interactions among model variables and have the capacity to predict on unseen data for prospective clinical applications.
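
    A minimal version of the leave-one-out evaluation loop with a rank-correlation figure of merit, on synthetic stand-in features rather than the institutional data:

    ```python
    import numpy as np
    from scipy.stats import spearmanr
    from sklearn.svm import SVR
    from sklearn.model_selection import LeaveOneOut, cross_val_predict

    # Synthetic stand-ins for dose-volume predictors (e.g., GTV volume, V75).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(56, 2))
    y = 1 / (1 + np.exp(-(X @ np.array([0.8, 0.5])))) \
        + rng.normal(scale=0.1, size=56)        # TCP-like response

    # Leave-one-out predictions, scored by Spearman rank correlation (rs).
    pred = cross_val_predict(SVR(kernel="rbf"), X, y, cv=LeaveOneOut())
    print(spearmanr(y, pred)[0])
    ```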

  4. Challenges in Biomarker Discovery: Combining Expert Insights with Statistical Analysis of Complex Omics Data

    PubMed Central

    McDermott, Jason E.; Wang, Jing; Mitchell, Hugh; Webb-Robertson, Bobbie-Jo; Hafen, Ryan; Ramey, John; Rodland, Karin D.

    2012-01-01

    Introduction: The advent of high throughput technologies capable of comprehensive analysis of genes, transcripts, proteins and other significant biological molecules has provided an unprecedented opportunity for the identification of molecular markers of disease processes. However, it has simultaneously complicated the problem of extracting meaningful molecular signatures of biological processes from these complex datasets. The process of biomarker discovery and characterization provides opportunities for more sophisticated approaches to integrating purely statistical and expert knowledge-based approaches. Areas covered: In this review we will present examples of current practices for biomarker discovery from complex omic datasets and the challenges that have been encountered in deriving valid and useful signatures of disease. We will then present a high-level review of data-driven (statistical) and knowledge-based methods applied to biomarker discovery, highlighting some current efforts to combine the two distinct approaches. Expert opinion: Effective, reproducible and objective tools for combining data-driven and knowledge-based approaches to identify predictive signatures of disease are key to future success in the biomarker field. We will describe our recommendations for possible approaches to this problem including metrics for the evaluation of biomarkers. PMID:23335946

  5. Challenges in Biomarker Discovery: Combining Expert Insights with Statistical Analysis of Complex Omics Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McDermott, Jason E.; Wang, Jing; Mitchell, Hugh D.

    2013-01-01

    The advent of high throughput technologies capable of comprehensive analysis of genes, transcripts, proteins and other significant biological molecules has provided an unprecedented opportunity for the identification of molecular markers of disease processes. However, it has simultaneously complicated the problem of extracting meaningful signatures of biological processes from these complex datasets. The process of biomarker discovery and characterization provides opportunities both for purely statistical and expert knowledge-based approaches and would benefit from improved integration of the two. Areas covered: In this review we will present examples of current practices for biomarker discovery from complex omic datasets and the challenges that have been encountered. We will then present a high-level review of data-driven (statistical) and knowledge-based methods applied to biomarker discovery, highlighting some current efforts to combine the two distinct approaches. Expert opinion: Effective, reproducible and objective tools for combining data-driven and knowledge-based approaches to biomarker discovery and characterization are key to future success in the biomarker field. We will describe our recommendations of possible approaches to this problem including metrics for the evaluation of biomarkers.

  6. Inferring causal relationships between phenotypes using summary statistics from genome-wide association studies.

    PubMed

    Meng, Xiang-He; Shen, Hui; Chen, Xiang-Ding; Xiao, Hong-Mei; Deng, Hong-Wen

    2018-03-01

    Genome-wide association studies (GWAS) have successfully identified numerous genetic variants associated with diverse complex phenotypes and diseases, and provided tremendous opportunities for further analyses using summary association statistics. Recently, Pickrell et al. developed a robust method for causal inference using independent putative causal SNPs. However, this method may fail to infer the causal relationship between two phenotypes when only a limited number of independent putative causal SNPs are identified. Here, we extended Pickrell's method to make it more applicable to general situations. We extended the causal inference method by replacing the putative causal SNPs with the lead SNPs (the set of the most significant SNPs in each independent locus) and tested the performance of our extended method using both simulated and empirical data. Simulations suggested that when the same number of genetic variants is used, our extended method had a similar distribution of the test statistic under the null model as well as comparable power under the causal model compared with the original method by Pickrell et al. In practice, however, our extended method would generally be more powerful, because the number of independent lead SNPs is often larger than the number of independent putative causal SNPs; including more SNPs, on the other hand, does not cause more false positives. By applying our extended method to summary statistics from GWAS for blood metabolites and femoral neck bone mineral density (FN-BMD), we successfully identified ten blood metabolites that may causally influence FN-BMD. In summary, we extended a causal inference method for inferring the putative causal relationship between two phenotypes using summary statistics from GWAS, and identified a number of potential causal metabolites for FN-BMD, which may provide novel insights into the pathophysiological mechanisms underlying osteoporosis.

  7. A clinical research analytics toolkit for cohort study.

    PubMed

    Yu, Yiqin; Zhu, Yu; Sun, Xingzhi; Tao, Ying; Zhang, Shuo; Xu, Linhao; Pan, Yue

    2012-01-01

    This paper presents a clinical informatics toolkit that can assist physicians to conduct cohort studies effectively and efficiently. The toolkit has three key features: 1) support of procedures defined in epidemiology, 2) recommendation of statistical methods in data analysis, and 3) automatic generation of research reports. On one hand, our system can help physicians control research quality by leveraging the integrated knowledge of epidemiology and medical statistics; on the other hand, it can improve productivity by reducing the complexities for physicians during their cohort studies.

  8. Statistical process control: A feasibility study of the application of time-series measurement in early neurorehabilitation after acquired brain injury.

    PubMed

    Markovic, Gabriela; Schult, Marie-Louise; Bartfai, Aniko; Elg, Mattias

    2017-01-31

    Progress in early cognitive recovery after acquired brain injury is uneven and unpredictable, and thus the evaluation of rehabilitation is complex. The use of time-series measurements is susceptible to statistical change due to process variation. The aim of this study was to evaluate the feasibility of using a time-series method, statistical process control, in early cognitive rehabilitation. Participants were 27 patients with acquired brain injury undergoing interdisciplinary rehabilitation of attention within 4 months post-injury. The outcome measure, the Paced Auditory Serial Addition Test, was analysed using statistical process control. Statistical process control identifies if and when change occurs in the process, according to 3 patterns: rapid, steady or stationary performers. The statistical process control method was adjusted, in terms of constructing the baseline and the total number of measurement points, in order to measure a process in change. Statistical process control methodology is feasible for use in early cognitive rehabilitation, since it provides information about change in a process, thus enabling adjustment of the individual treatment response. Together with the results indicating discernible subgroups that respond differently to rehabilitation, statistical process control could be a valid tool in clinical decision-making. This study is a starting-point in understanding the rehabilitation process using a real-time measurements approach.
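
    A common statistical process control implementation for time-series data of this kind is the individuals (XmR) chart; the sketch below is a generic version, not the adjusted procedure developed in the study.

    ```python
    import numpy as np

    def individuals_chart_limits(x):
        """Control limits for an individuals (XmR) chart (sketch).

        Uses the mean moving range to estimate short-term variation, the
        usual basis for 3-sigma limits in statistical process control."""
        x = np.asarray(x, dtype=float)
        mr = np.abs(np.diff(x)).mean()      # mean moving range
        sigma = mr / 1.128                  # d2 constant for subgroups of 2
        centre = x.mean()
        return centre - 3 * sigma, centre, centre + 3 * sigma

    # Points beyond the limits signal a real change in the process,
    # e.g., a patient's test scores shifting during rehabilitation.
    scores = [31, 33, 30, 35, 34, 38, 41, 44, 47, 51]
    print(individuals_chart_limits(scores))
    ```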

  9. Ariadne's Thread: A Robust Software Solution Leading to Automated Absolute and Relative Quantification of SRM Data.

    PubMed

    Nasso, Sara; Goetze, Sandra; Martens, Lennart

    2015-09-04

    Selected reaction monitoring (SRM) MS is a highly selective and sensitive technique to quantify protein abundances in complex biological samples. To enhance the pace of large SRM studies, a validated, robust method to fully automate absolute quantification and substitute for interactive evaluation would be valuable. To address this demand, we present Ariadne, a Matlab software tool. To quantify monitored targets, Ariadne exploits metadata imported from the transition lists, and targets can be filtered according to mProphet output. Signal processing and statistical learning approaches are combined to compute peptide quantifications. To robustly estimate absolute abundances, the external calibration curve method is applied, ensuring linearity over the measured dynamic range. Ariadne was benchmarked against mProphet and Skyline by comparing its quantification performance on three different dilution series, featuring either noisy/smooth traces without background or smooth traces with complex background. Results, evaluated as efficiency, linearity, accuracy, and precision of quantification, showed that Ariadne's performance is independent of data smoothness and the presence of complex background, that Ariadne outperforms mProphet on the noisier data set, and that it improved Skyline's accuracy and precision 2-fold for the lowest-abundance dilution with complex background. Remarkably, Ariadne could statistically distinguish all the different abundances from each other, discriminating dilutions as low as 0.1 and 0.2 fmol. These results suggest that Ariadne offers reliable and automated analysis of large-scale SRM differential expression studies.

  10. Detection and characterization of nonspecific, sparsely-populated binding modes in the early stages of complexation

    PubMed Central

    Cardone, A.; Bornstein, A.; Pant, H. C.; Brady, M.; Sriram, R.; Hassan, S. A.

    2015-01-01

    A method is proposed to study protein-ligand binding in a system governed by specific and non-specific interactions. Strong associations lead to narrow distributions in the protein's configuration space; weak and ultra-weak associations lead instead to broader distributions, a manifestation of non-specific, sparsely-populated binding modes with multiple interfaces. The method is based on the notion that a discrete set of preferential first-encounter modes are metastable states from which stable (pre-relaxation) complexes at equilibrium evolve. The method can be used to explore alternative pathways of complexation with statistical significance and can be integrated into a general algorithm to study protein interaction networks. The method is applied to a peptide-protein complex. The peptide adopts several low-population conformers and binds in a variety of modes with a broad range of affinities. The system is thus well suited to analyze general features of binding, including conformational selection, multiplicity of binding modes, and nonspecific interactions, and to illustrate how the method can be applied to study these problems systematically. The equilibrium distributions can be used to generate biasing functions for simulations of multiprotein systems from which bulk thermodynamic quantities can be calculated. PMID:25782918

  11. Quantitative assessment of copper proteinates used as animal feed additives using ATR-FTIR spectroscopy and powder X-ray diffraction (PXRD) analysis.

    PubMed

    Cantwell, Caoimhe A; Byrne, Laurann A; Connolly, Cathal D; Hynes, Michael J; McArdle, Patrick; Murphy, Richard A

    2017-08-01

    The aim of the present work was to establish a reliable analytical method to determine the degree of complexation in commercial metal proteinates used as feed additives in the solid state. Two complementary techniques were developed. Firstly, a quantitative attenuated total reflectance Fourier transform infrared (ATR-FTIR) spectroscopic method investigated modifications in vibrational absorption bands of the ligand on complex formation. Secondly, a powder X-ray diffraction (PXRD) method to quantify the amount of crystalline material in the proteinate product was developed. These methods were developed in tandem and cross-validated with each other. Multivariate analysis (MVA) was used to develop validated calibration and prediction models. The FTIR and PXRD calibrations showed excellent linearity (R^2 > 0.99). The diagnostic model parameters showed that the FTIR and PXRD methods were robust, with a root mean square error of calibration (RMSEC) ≤ 3.39% and a root mean square error of prediction (RMSEP) ≤ 7.17%, respectively. Comparative statistics show excellent agreement between the MVA packages assessed and between the FTIR and PXRD methods. The methods can be used to determine the degree of complexation in complexes of both protein hydrolysates and pure amino acids.
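
    The multivariate calibration-and-prediction workflow with RMSEC/RMSEP diagnostics can be sketched with partial least squares on synthetic stand-in spectra (PLS is assumed here for illustration; the paper's MVA packages may use other latent-variable models):

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import train_test_split

    # Stand-in spectra: rows = samples, columns = wavenumbers (synthetic).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 200))
    y = X[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=40)  # % complexation proxy

    X_cal, X_val, y_cal, y_val = train_test_split(X, y, random_state=0)
    pls = PLSRegression(n_components=5).fit(X_cal, y_cal)

    # RMSEC scores the fit on calibration samples, RMSEP on held-out ones.
    rmsec = np.sqrt(np.mean((y_cal - pls.predict(X_cal).ravel()) ** 2))
    rmsep = np.sqrt(np.mean((y_val - pls.predict(X_val).ravel()) ** 2))
    print(f"RMSEC = {rmsec:.3f}, RMSEP = {rmsep:.3f}")
    ```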

  12. Systematic sampling for suspended sediment

    Treesearch

    Robert B. Thomas

    1991-01-01

    Abstract - Because of high costs or complex logistics, scientific populations cannot be measured entirely and must be sampled. Accepted scientific practice holds that sample selection be based on statistical principles to assure objectivity when estimating totals and variances. Probability sampling--obtaining samples with known probabilities--is the only method that...

  13. Spectroscopic and Statistical Techniques for Information Recovery in Metabonomics and Metabolomics

    NASA Astrophysics Data System (ADS)

    Lindon, John C.; Nicholson, Jeremy K.

    2008-07-01

    Methods for generating and interpreting metabolic profiles based on nuclear magnetic resonance (NMR) spectroscopy, mass spectrometry (MS), and chemometric analysis methods are summarized and the relative strengths and weaknesses of NMR and chromatography-coupled MS approaches are discussed. Given that all data sets measured to date only probe subsets of complex metabolic profiles, we describe recent developments for enhanced information recovery from the resulting complex data sets, including integration of NMR- and MS-based metabonomic results and combination of metabonomic data with data from proteomics, transcriptomics, and genomics. We summarize the breadth of applications, highlight some current activities, discuss the issues relating to metabonomics, and identify future trends.

  14. Spectroscopic and statistical techniques for information recovery in metabonomics and metabolomics.

    PubMed

    Lindon, John C; Nicholson, Jeremy K

    2008-01-01

    Methods for generating and interpreting metabolic profiles based on nuclear magnetic resonance (NMR) spectroscopy, mass spectrometry (MS), and chemometric analysis methods are summarized and the relative strengths and weaknesses of NMR and chromatography-coupled MS approaches are discussed. Given that all data sets measured to date only probe subsets of complex metabolic profiles, we describe recent developments for enhanced information recovery from the resulting complex data sets, including integration of NMR- and MS-based metabonomic results and combination of metabonomic data with data from proteomics, transcriptomics, and genomics. We summarize the breadth of applications, highlight some current activities, discuss the issues relating to metabonomics, and identify future trends.

  15. The New Maia Detector System: Methods For High Definition Trace Element Imaging Of Natural Material

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ryan, C. G.; School of Physics, University of Melbourne, Parkville VIC; CODES Centre of Excellence, University of Tasmania, Hobart TAS

    2010-04-06

    Motivated by the need for megapixel high definition trace element imaging to capture intricate detail in natural material, together with faster acquisition and improved counting statistics in elemental imaging, a large energy-dispersive detector array called Maia has been developed by CSIRO and BNL for SXRF imaging on the XFM beamline at the Australian Synchrotron. A 96 detector prototype demonstrated the capacity of the system for real-time deconvolution of complex spectral data using an embedded implementation of the Dynamic Analysis method and acquiring highly detailed images up to 77 M pixels spanning large areas of complex mineral sample sections.

  16. The New Maia Detector System: Methods For High Definition Trace Element Imaging Of Natural Material

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ryan, C.G.; Siddons, D.P.; Kirkham, R.

    2010-05-25

    Motivated by the need for megapixel high definition trace element imaging to capture intricate detail in natural material, together with faster acquisition and improved counting statistics in elemental imaging, a large energy-dispersive detector array called Maia has been developed by CSIRO and BNL for SXRF imaging on the XFM beamline at the Australian Synchrotron. A 96 detector prototype demonstrated the capacity of the system for real-time deconvolution of complex spectral data using an embedded implementation of the Dynamic Analysis method and acquiring highly detailed images up to 77 M pixels spanning large areas of complex mineral sample sections.

  17. Statistical Surrogate Modeling of Atmospheric Dispersion Events Using Bayesian Adaptive Splines

    NASA Astrophysics Data System (ADS)

    Francom, D.; Sansó, B.; Bulaevskaya, V.; Lucas, D. D.

    2016-12-01

    Uncertainty in the inputs of complex computer models, including atmospheric dispersion and transport codes, is often assessed via statistical surrogate models. Surrogate models are computationally efficient statistical approximations of expensive computer models that enable uncertainty analysis. We introduce Bayesian adaptive spline methods for producing surrogate models that capture the major spatiotemporal patterns of the parent model while remaining flexible, accurate, and computationally feasible. We present novel methodological and computational approaches motivated by a controlled atmospheric tracer release experiment conducted at the Diablo Canyon nuclear power plant in California. Traditional methods for building statistical surrogate models often do not scale well to experiments with large amounts of data. Our approach is well suited to experiments involving large numbers of model inputs, large numbers of simulations, and functional output for each simulation, and it allows us to perform global sensitivity analysis with ease. We also present an approach to the calibration of simulators using field data.
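
    The core surrogate-modeling workflow described above - fit a cheap statistical emulator to a limited number of expensive simulator runs, then predict at new inputs with uncertainty - can be sketched as follows. For simplicity the sketch uses a generic Gaussian-process surrogate rather than the authors' Bayesian adaptive splines, and the "simulator" is a toy function:

    ```python
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def expensive_simulator(x):        # placeholder for a dispersion code
        return np.sin(3 * x) + 0.5 * x

    X_train = np.linspace(0, 2, 12).reshape(-1, 1)     # 12 "expensive" runs
    y_train = expensive_simulator(X_train).ravel()

    surrogate = GaussianProcessRegressor(kernel=RBF(length_scale=0.5),
                                         normalize_y=True).fit(X_train, y_train)

    X_new = np.array([[0.37], [1.41]])
    mean, std = surrogate.predict(X_new, return_std=True)   # cheap approximate runs
    print(mean, std)    # predictions plus uncertainty, at negligible cost
    ```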

  18. Action detection by double hierarchical multi-structure space-time statistical matching model

    NASA Astrophysics Data System (ADS)

    Han, Jing; Zhu, Junwei; Cui, Yiyin; Bai, Lianfa; Yue, Jiang

    2018-03-01

    To address the complexity of information in videos and the low efficiency of existing detectors, an action detection model based on a neighboring Gaussian structure and 3D LARK features is put forward. We exploit a double hierarchical multi-structure space-time statistical matching model (DMSM) for temporal action localization. First, a neighboring Gaussian structure is presented to describe the multi-scale structural relationship. Then, a space-time statistical matching method is proposed to obtain similarity matrices on both large and small scales, combining double hierarchical structural constraints from both the neighboring Gaussian structure and the 3D LARK local structure. Finally, the double hierarchical similarities are fused and analyzed to detect actions. In addition, a multi-scale composite template extends the model to multi-view application. Experimental results for DMSM on the complex visual tracker benchmark data sets and the THUMOS 2014 data sets show promising performance; compared with other state-of-the-art algorithms, DMSM achieves superior performance.

  19. Action detection by double hierarchical multi-structure space–time statistical matching model

    NASA Astrophysics Data System (ADS)

    Han, Jing; Zhu, Junwei; Cui, Yiyin; Bai, Lianfa; Yue, Jiang

    2018-06-01

    To address the complexity of information in videos and the low efficiency of existing detectors, an action detection model based on a neighboring Gaussian structure and 3D LARK features is put forward. We exploit a double hierarchical multi-structure space-time statistical matching model (DMSM) for temporal action localization. First, a neighboring Gaussian structure is presented to describe the multi-scale structural relationship. Then, a space-time statistical matching method is proposed to obtain similarity matrices on both large and small scales, combining double hierarchical structural constraints from both the neighboring Gaussian structure and the 3D LARK local structure. Finally, the double hierarchical similarities are fused and analyzed to detect actions. In addition, a multi-scale composite template extends the model to multi-view application. Experimental results for DMSM on the complex visual tracker benchmark data sets and the THUMOS 2014 data sets show promising performance; compared with other state-of-the-art algorithms, DMSM achieves superior performance.

  20. A survey of design methods for failure detection in dynamic systems

    NASA Technical Reports Server (NTRS)

    Willsky, A. S.

    1975-01-01

    A number of methods for detecting abrupt changes (such as failures) in stochastic dynamical systems are surveyed. The survey concentrates on the class of linear systems, but the basic concepts, if not the detailed analyses, carry over to other classes of systems. The methods surveyed range from the design of specific failure-sensitive filters, to the use of statistical tests on filter innovations, to the development of jump process formulations. Tradeoffs in complexity versus performance are discussed.

  1. Use of personalized Dynamic Treatment Regimes (DTRs) and Sequential Multiple Assignment Randomized Trials (SMARTs) in mental health studies

    PubMed Central

    Liu, Ying; ZENG, Donglin; WANG, Yuanjia

    2014-01-01

    Summary: Dynamic treatment regimens (DTRs) are sequential decision rules, tailored at each point where a clinical decision is made, based on each patient's time-varying characteristics and the intermediate outcomes observed at earlier points in time. The complexity, patient heterogeneity, and chronicity of mental disorders call for learning optimal DTRs to dynamically adapt treatment to an individual's response over time. The Sequential Multiple Assignment Randomized Trial (SMART) design allows for estimating the causal effects of DTRs. Modern statistical tools have been developed to optimize DTRs based on personalized variables and intermediate outcomes using the rich data collected from SMARTs; these statistical methods can also be used to recommend tailoring variables for designing future SMART studies. This paper introduces DTRs and SMARTs using two examples from mental health studies, discusses two machine learning methods for estimating the optimal DTR from SMART data, and demonstrates the performance of the statistical methods using simulated data. PMID:25642116

  2. A method to estimate the contribution of regional genetic associations to complex traits from summary association statistics.

    PubMed

    Pare, Guillaume; Mao, Shihong; Deng, Wei Q

    2016-06-08

    Despite considerable efforts, known genetic associations explain only a small fraction of predicted heritability. Regional associations combine information from multiple contiguous genetic variants and can improve the variance explained at established association loci. However, regional associations are not easily amenable to estimation using summary association statistics because of their sensitivity to linkage disequilibrium (LD). We now propose a novel method, LD Adjusted Regional Genetic Variance (LARGV), to estimate the phenotypic variance explained by regional associations using summary statistics while accounting for LD. Our method is asymptotically equivalent to a multiple linear regression model when no interaction or haplotype effects are present. It has several applications, such as ranking genetic regions according to variance explained or comparing the variance explained by two or more regions. Using height and BMI data from the Health and Retirement Study (N = 7,776), we show that most genetic variance lies in a small proportion of the genome and that previously identified linkage peaks have higher than expected regional variance.
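
    The LD sensitivity discussed above can be illustrated schematically: the marginal per-variant effects reported by single-variant GWAS are related to the joint effects through the regional LD (correlation) matrix R, so an LD-adjusted estimate of regional variance explained can be obtained by solving against R. The sketch below shows that general idea only; it is not the LARGV estimator, which includes further corrections:

    ```python
    import numpy as np

    m = 5
    # Toy LD matrix with AR(1)-like decay between neighboring variants
    R = 0.4 ** np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
    beta_joint_true = np.array([0.05, 0.0, 0.02, 0.0, 0.0])   # true joint effects

    beta_marginal = R @ beta_joint_true             # what single-variant GWAS reports
    beta_joint = np.linalg.solve(R, beta_marginal)  # LD-adjusted joint effects

    # Regional variance explained (standardized genotypes and phenotype)
    var_explained = beta_joint @ R @ beta_joint
    print(f"regional variance explained ~ {var_explained:.5f}")
    ```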

  3. Numerical solutions of ideal quantum gas dynamical flows governed by semiclassical ellipsoidal-statistical distribution.

    PubMed

    Yang, Jaw-Yen; Yan, Chih-Yuan; Diaz, Manuel; Huang, Juan-Chen; Li, Zhihui; Zhang, Hanxin

    2014-01-08

    The ideal quantum gas dynamics as manifested by the semiclassical ellipsoidal-statistical (ES) equilibrium distribution derived in Wu et al. (2012, Proc. R. Soc. A 468, 1799-1823; doi:10.1098/rspa.2011.0673) is numerically studied for particles obeying three different statistics. This anisotropic ES equilibrium distribution was derived using the maximum entropy principle and conserves mass, momentum and energy, but differs from the standard Fermi-Dirac or Bose-Einstein distribution. The present numerical method combines the discrete velocity (or momentum) ordinate method in momentum space with the high-resolution shock-capturing method in physical space. A decoding procedure to obtain the necessary parameters for determining the ES distribution is also devised. Computations of two-dimensional Riemann problems are presented, and various contours of the quantities unique to this ES model are illustrated. The main flow features, such as shock waves, expansion waves and slip lines and their complex nonlinear interactions, are depicted and found to be consistent with existing calculations for a classical gas.

  4. A method to estimate the contribution of regional genetic associations to complex traits from summary association statistics

    PubMed Central

    Pare, Guillaume; Mao, Shihong; Deng, Wei Q.

    2016-01-01

    Despite considerable efforts, known genetic associations explain only a small fraction of predicted heritability. Regional associations combine information from multiple contiguous genetic variants and can improve the variance explained at established association loci. However, regional associations are not easily amenable to estimation using summary association statistics because of their sensitivity to linkage disequilibrium (LD). We now propose a novel method, LD Adjusted Regional Genetic Variance (LARGV), to estimate the phenotypic variance explained by regional associations using summary statistics while accounting for LD. Our method is asymptotically equivalent to a multiple linear regression model when no interaction or haplotype effects are present. It has several applications, such as ranking genetic regions according to variance explained or comparing the variance explained by two or more regions. Using height and BMI data from the Health and Retirement Study (N = 7,776), we show that most genetic variance lies in a small proportion of the genome and that previously identified linkage peaks have higher than expected regional variance. PMID:27273519

  5. Methyl-CpG island-associated genome signature tags

    DOEpatents

    Dunn, John J

    2014-05-20

    Disclosed is a method for analyzing the organismic complexity of a sample through analysis of the nucleic acid in the sample. In the disclosed method, through a series of steps, including digestion with a type II restriction enzyme, ligation of capture adapters and linkers and digestion with a type IIS restriction enzyme, genome signature tags are produced. The sequences of a statistically significant number of the signature tags are determined and the sequences are used to identify and quantify the organisms in the sample. Various embodiments of the invention described herein include methods for using single point genome signature tags to analyze the related families present in a sample, methods for analyzing sequences associated with hyper- and hypo-methylated CpG islands, methods for visualizing organismic complexity change in a sampling location over time and methods for generating the genome signature tag profile of a sample of fragmented DNA.

  6. Using genetic data to strengthen causal inference in observational research.

    PubMed

    Pingault, Jean-Baptiste; O'Reilly, Paul F; Schoeler, Tabea; Ploubidis, George B; Rijsdijk, Frühling; Dudbridge, Frank

    2018-06-05

    Causal inference is essential across the biomedical, behavioural and social sciences. By progressing from confounded statistical associations to evidence of causal relationships, causal inference can reveal complex pathways underlying traits and diseases and help to prioritize targets for intervention. Recent progress in genetic epidemiology - including statistical innovation, massive genotyped data sets and novel computational tools for deep data mining - has fostered the intense development of methods that exploit genetic data and relatedness to strengthen causal inference in observational research. In this Review, we describe how such genetically informed methods differ in their rationale, applicability and inherent limitations, and outline how they should be integrated in the future to offer a rich causal inference toolbox.
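
    One of the genetically informed designs this Review covers is Mendelian randomization. As a concrete flavor of the idea, the single-instrument Wald ratio estimates the causal effect of an exposure on an outcome from two summary statistics; the numbers below are hypothetical:

    ```python
    # Hypothetical GWAS summary statistics for one genetic instrument
    beta_exposure = 0.12   # SNP effect on the exposure
    beta_outcome = 0.03    # SNP effect on the outcome
    se_outcome = 0.008     # standard error of the outcome effect

    wald_ratio = beta_outcome / beta_exposure
    # First-order standard error, ignoring uncertainty in the exposure effect
    se_wald = se_outcome / abs(beta_exposure)
    print(f"causal estimate = {wald_ratio:.3f} +/- {se_wald:.3f}")
    ```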

  7. Application of multivariate statistical techniques in microbial ecology.

    PubMed

    Paliy, O; Shankar, V

    2016-03-01

    Recent advances in high-throughput methods of molecular analysis have led to an explosion of studies generating large-scale ecological data sets. Particularly noticeable progress has been made in the field of microbial ecology, where new experimental approaches have provided in-depth assessments of the composition, functions and dynamic changes of complex microbial communities. Because even a single high-throughput experiment produces a large amount of data, powerful statistical techniques of multivariate analysis are well suited to analysing and interpreting these data sets. Many different multivariate techniques are available, and it is often not clear which method should be applied to a particular data set. In this review, we describe and compare the most widely used multivariate statistical techniques, including exploratory, interpretive and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and data set structure. © 2016 John Wiley & Sons Ltd.

  8. Regression modeling of ground-water flow

    USGS Publications Warehouse

    Cooley, R.L.; Naff, R.L.

    1985-01-01

    Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. The text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of the basic statistics needed to apply regression techniques properly, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given for analyzing and using the regression models. A number of exercises and answers are included to give the student practice with nearly all of the methods presented for modeling and statistical analysis. Three computer programs implement the more complex methods: a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium; a program to calculate a measure of model nonlinearity with respect to the regression parameters; and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)

  9. Uncertainties of flood frequency estimation approaches based on continuous simulation using data resampling

    NASA Astrophysics Data System (ADS)

    Arnaud, Patrick; Cantet, Philippe; Odry, Jean

    2017-11-01

    Flood frequency analyses (FFAs) are needed for flood risk management. Many methods exist ranging from classical purely statistical approaches to more complex approaches based on process simulation. The results of these methods are associated with uncertainties that are sometimes difficult to estimate due to the complexity of the approaches or the number of parameters, especially for process simulation. This is the case of the simulation-based FFA approach called SHYREG presented in this paper, in which a rainfall generator is coupled with a simple rainfall-runoff model in an attempt to estimate the uncertainties due to the estimation of the seven parameters needed to estimate flood frequencies. The six parameters of the rainfall generator are mean values, so their theoretical distribution is known and can be used to estimate the generator uncertainties. In contrast, the theoretical distribution of the single hydrological model parameter is unknown; consequently, a bootstrap method is applied to estimate the calibration uncertainties. The propagation of uncertainty from the rainfall generator to the hydrological model is also taken into account. This method is applied to 1112 basins throughout France. Uncertainties coming from the SHYREG method and from purely statistical approaches are compared, and the results are discussed according to the length of the recorded observations, basin size and basin location. Uncertainties of the SHYREG method decrease as the basin size increases or as the length of the recorded flow increases. Moreover, the results show that the confidence intervals of the SHYREG method are relatively small despite the complexity of the method and the number of parameters (seven). This is due to the stability of the parameters and takes into account the dependence of uncertainties due to the rainfall model and the hydrological calibration. Indeed, the uncertainties on the flow quantiles are on the same order of magnitude as those associated with the use of a statistical law with two parameters (here generalised extreme value Type I distribution) and clearly lower than those associated with the use of a three-parameter law (here generalised extreme value Type II distribution). For extreme flood quantiles, the uncertainties are mostly due to the rainfall generator because of the progressive saturation of the hydrological model.
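
    The resampling logic used here for the hydrological parameter can be illustrated in miniature: bootstrap the observed annual maxima, refit the frequency law, and read off the spread of a design quantile. The sketch below does this with a plain GEV fit (a purely statistical stand-in, not the SHYREG simulation chain) on synthetic flows:

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    # Synthetic record of 40 annual maximum flows (m^3/s)
    annual_max = stats.genextreme.rvs(c=-0.1, loc=100, scale=30,
                                      size=40, random_state=rng)

    def q100(sample):
        """Fit a GEV law and return the 100-year flood quantile."""
        c, loc, scale = stats.genextreme.fit(sample)
        return stats.genextreme.ppf(1 - 1 / 100, c, loc=loc, scale=scale)

    # Bootstrap the record to propagate sampling uncertainty into the quantile
    boot = [q100(rng.choice(annual_max, size=annual_max.size, replace=True))
            for _ in range(500)]
    lo, hi = np.percentile(boot, [2.5, 97.5])
    print(f"Q100 = {q100(annual_max):.0f}; 95% CI [{lo:.0f}, {hi:.0f}]")
    ```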

  10. Joint QTL linkage mapping for multiple-cross mating design sharing one common parent

    USDA-ARS?s Scientific Manuscript database

    Nested association mapping (NAM) is a novel genetic mating design that combines the advantages of linkage analysis and association mapping. This design provides opportunities to study the inheritance of complex traits, but also requires more advanced statistical methods. In this paper, we present th...

  11. Averaging Models: Parameters Estimation with the R-Average Procedure

    ERIC Educational Resources Information Center

    Vidotto, G.; Massidda, D.; Noventa, S.

    2010-01-01

    The Functional Measurement approach, proposed within the theoretical framework of Information Integration Theory (Anderson, 1981, 1982), can be a useful multi-attribute analysis tool. Compared to the majority of statistical models, the averaging model can account for interaction effects without adding complexity. The R-Average method (Vidotto &…

  12. Quasi-Experimental Analysis: A Mixture of Methods and Judgment.

    ERIC Educational Resources Information Center

    Cordray, David S.

    1986-01-01

    The role of human judgment in the development and synthesis of evidence has not been adequately developed or acknowledged within quasi-experimental analysis. Corrective solutions need to confront the fact that causal analysis within complex environments will require a more active assessment that entails reasoning and statistical modeling.…

  13. Implementation and Use of the Reference Analytics Module of LibAnswers

    ERIC Educational Resources Information Center

    Flatley, Robert; Jensen, Robert Bruce

    2012-01-01

    Academic libraries have traditionally collected reference statistics using hash marks on paper. Although efficient and simple, this method is not an effective way to capture the complexity of reference transactions. Several electronic tools are now available to assist libraries with collecting often elusive reference data--among them homegrown…

  14. Specifying and Refining a Complex Measurement Model.

    ERIC Educational Resources Information Center

    Levy, Roy; Mislevy, Robert J.

    This paper aims to describe a Bayesian approach to modeling and estimating cognitive models both in terms of statistical machinery and actual instrument development. Such a method taps the knowledge of experts to provide initial estimates for the probabilistic relationships among the variables in a multivariate latent variable model and refines…

  15. Comparisons between physics-based, engineering, and statistical learning models for outdoor sound propagation.

    PubMed

    Hart, Carl R; Reznicek, Nathan J; Wilson, D Keith; Pettit, Chris L; Nykaza, Edward T

    2016-05-01

    Many outdoor sound propagation models exist, ranging from highly complex physics-based simulations to simplified engineering calculations, and, more recently, highly flexible statistical learning methods. Several engineering and statistical learning models are evaluated by using a particular physics-based model, namely a Crank-Nicolson parabolic equation (CNPE), as a benchmark. Narrowband transmission loss values predicted with the CNPE, based upon a simulated data set of meteorological, boundary, and source conditions, act as simulated observations. In the simulated data set, sound propagation conditions span from downward refracting to upward refracting, for acoustically hard and soft boundaries, and for low frequencies. Engineering models used in the comparisons include the ISO 9613-2 method, Harmonoise, and Nord2000 propagation models. Statistical learning methods used in the comparisons include bagged decision tree regression, random forest regression, boosting regression, and artificial neural network models. Computed skill scores are relative to sound propagation in a homogeneous atmosphere over a rigid ground. Overall skill scores for the engineering noise models are 0.6%, -7.1%, and 83.8% for the ISO 9613-2, Harmonoise, and Nord2000 models, respectively. Overall skill scores for the statistical learning models are 99.5%, 99.5%, 99.6%, and 99.6% for bagged decision tree, random forest, boosting, and artificial neural network regression models, respectively.
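
    A skill score of the kind quoted above is conventionally computed as one minus the ratio of the model's mean squared error to that of a reference prediction, expressed as a percentage; the exact convention may differ from the paper's. A toy version, with all numbers invented:

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    tl_benchmark = rng.normal(60, 10, 1000)             # CNPE "observations" (toy)
    tl_baseline = np.full(1000, 60.0)                   # homogeneous-atmosphere TL
    tl_model = tl_benchmark + rng.normal(0, 1, 1000)    # a candidate model's output

    def skill(pred, obs, base):
        """1 - MSE(model)/MSE(baseline), as a percentage."""
        return 100 * (1 - np.mean((pred - obs) ** 2) / np.mean((base - obs) ** 2))

    print(f"skill score = {skill(tl_model, tl_benchmark, tl_baseline):.1f}%")
    ```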

  16. A Subband Coding Method for HDTV

    NASA Technical Reports Server (NTRS)

    Chung, Wilson; Kossentini, Faouzi; Smith, Mark J. T.

    1995-01-01

    This paper introduces a new HDTV coder based on motion compensation, subband coding, and high order conditional entropy coding. The proposed coder exploits the temporal and spatial statistical dependencies inherent in the HDTV signal by using intra- and inter-subband conditioning for coding both the motion coordinates and the residual signal. The new framework provides an easy way to control the system complexity and performance, and inherently supports multiresolution transmission. Experimental results show that the coder outperforms MPEG-2, while still maintaining relatively low complexity.

  17. Complex networks as a unified framework for descriptive analysis and predictive modeling in climate

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Steinhaeuser, Karsten J K; Chawla, Nitesh; Ganguly, Auroop R

    The analysis of climate data has relied heavily on hypothesis-driven statistical methods, while projections of future climate are based primarily on physics-based computational models. However, in recent years a wealth of new datasets has become available. Therefore, we take a more data-centric approach and propose a unified framework for studying climate, with an aim towards characterizing observed phenomena as well as discovering new knowledge in the climate domain. Specifically, we posit that complex networks are well-suited for both descriptive analysis and predictive modeling tasks. We show that the structural properties of climate networks have useful interpretation within the domain. Furthermore, we extract clusters from these networks and demonstrate their predictive power as climate indices. Our experimental results establish that the network clusters are statistically significantly better predictors than clusters derived using a more traditional clustering approach. Using complex networks as the data representation thus enables the unique opportunity for descriptive and predictive modeling to inform each other.

  18. Finding equilibrium in the spatiotemporal chaos of the complex Ginzburg-Landau equation

    NASA Astrophysics Data System (ADS)

    Ballard, Christopher C.; Esty, C. Clark; Egolf, David A.

    2016-11-01

    Equilibrium statistical mechanics allows the prediction of collective behaviors of large numbers of interacting objects from just a few system-wide properties; however, a similar theory does not exist for far-from-equilibrium systems exhibiting complex spatial and temporal behavior. We propose a method for predicting behaviors in a broad class of such systems and apply these ideas to an archetypal example, the spatiotemporal chaotic 1D complex Ginzburg-Landau equation in the defect chaos regime. Building on the ideas of Ruelle and of Cross and Hohenberg that a spatiotemporal chaotic system can be considered a collection of weakly interacting dynamical units of a characteristic size, the chaotic length scale, we identify underlying, mesoscale, chaotic units and effective interaction potentials between them. We find that the resulting equilibrium Takahashi model accurately predicts distributions of particle numbers. These results suggest the intriguing possibility that a class of far-from-equilibrium systems may be well described at coarse-grained scales by the well-established theory of equilibrium statistical mechanics.

  19. Finding equilibrium in the spatiotemporal chaos of the complex Ginzburg-Landau equation.

    PubMed

    Ballard, Christopher C; Esty, C Clark; Egolf, David A

    2016-11-01

    Equilibrium statistical mechanics allows the prediction of collective behaviors of large numbers of interacting objects from just a few system-wide properties; however, a similar theory does not exist for far-from-equilibrium systems exhibiting complex spatial and temporal behavior. We propose a method for predicting behaviors in a broad class of such systems and apply these ideas to an archetypal example, the spatiotemporal chaotic 1D complex Ginzburg-Landau equation in the defect chaos regime. Building on the ideas of Ruelle and of Cross and Hohenberg that a spatiotemporal chaotic system can be considered a collection of weakly interacting dynamical units of a characteristic size, the chaotic length scale, we identify underlying, mesoscale, chaotic units and effective interaction potentials between them. We find that the resulting equilibrium Takahashi model accurately predicts distributions of particle numbers. These results suggest the intriguing possibility that a class of far-from-equilibrium systems may be well described at coarse-grained scales by the well-established theory of equilibrium statistical mechanics.

  20. Spectral statistics and scattering resonances of complex primes arrays

    NASA Astrophysics Data System (ADS)

    Wang, Ren; Pinheiro, Felipe A.; Dal Negro, Luca

    2018-01-01

    We introduce a class of aperiodic arrays of electric dipoles generated from the distribution of prime numbers in complex quadratic fields (Eisenstein and Gaussian primes) as well as quaternion primes (Hurwitz and Lifschitz primes), and study the nature of their scattering resonances using the vectorial Green's matrix method. In these systems we demonstrate several distinctive spectral properties, such as the absence of level repulsion in the strongly scattering regime, critical statistics of level spacings, and the existence of critical modes, which are extended fractal modes with long lifetimes not supported by either random or periodic systems. Moreover, we show that one can predict important physical properties, such as the existence of spectral gaps, by analyzing the eigenvalue distribution of the Green's matrix of the arrays in the complex plane. Our results unveil the importance of aperiodic correlations in prime number arrays for the engineering of gapped photonic media that support far richer mode localization and spectral properties than usual periodic and random media.

  1. Harnessing the complexity of gene expression data from cancer: from single gene to structural pathway methods

    PubMed Central

    2012-01-01

    High-dimensional gene expression data provide a rich source of information because they capture the expression level of genes in dynamic states that reflect the biological functioning of a cell. For this reason, such data are suitable for revealing systems-level properties inside a cell, e.g., in order to elucidate the molecular mechanisms of complex diseases like breast or prostate cancer. However, this depends strongly not only on the sample size and the correlation structure of a data set, but also on the statistical hypotheses tested. Many different approaches have been developed over the years to analyze gene expression data in order to (I) identify changes in single genes, (II) identify changes in gene sets or pathways, and (III) identify changes in the correlation structure in pathways. In this paper, we review statistical methods for all three types of approaches, including subtypes, in the context of cancer data, provide links to software implementations and tools, and also address the general problem of multiple hypothesis testing. Further, we provide recommendations for the selection of such analysis methods. Reviewers: This article was reviewed by Arcady Mushegian, Byung-Soo Kim and Joel Bader. PMID:23227854

  2. Spectroscopic studies on Solvatochromism of mixed-chelate copper(II) complexes using MLR technique

    NASA Astrophysics Data System (ADS)

    Golchoubian, Hamid; Moayyedi, Golasa; Fazilati, Hakimeh

    2012-01-01

    Mixed-chelate copper(II) complexes with the general formula [Cu(acac)(diamine)]X, where acac = acetylacetonate ion, diamine = N,N-dimethyl-N'-benzyl-1,2-diaminoethane, and X = BPh4-, PF6-, ClO4- or BF4-, have been prepared. The complexes were characterized on the basis of elemental analysis, molar conductance, and UV-vis and IR spectroscopies. The complexes are solvatochromic, and their solvatochromism was investigated by visible spectroscopy. All complexes demonstrated positive solvatochromism, and among them [Cu(acac)(diamine)]BPh4·H2O showed the highest Δνmax value. To explore the mechanism of interaction between solvent molecules and the complexes, different solvent parameters such as DN, AN, α and β were employed using the multiple linear regression (MLR) method. The statistical results suggest that the DN parameter of the solvent makes the dominant contribution to the shift of the d-d absorption band of the complexes.
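
    The MLR analysis described above amounts to regressing the observed band shift on the four solvent parameters and inspecting the fitted coefficients. A sketch with invented numbers (the solvent parameter values and shifts below are placeholders, not the paper's data):

    ```python
    import numpy as np

    # Columns: DN, AN, alpha, beta (invented values for six solvents)
    solvent_params = np.array([
        [14.1, 18.9, 1.17, 0.47],
        [19.0, 16.3, 0.00, 0.76],
        [26.6, 37.1, 1.96, 0.00],
        [30.0, 16.0, 0.00, 0.69],
        [20.0, 41.3, 0.98, 0.66],
        [38.8, 19.3, 0.00, 0.69],
    ])
    # Observed d-d band maxima in each solvent (toy numbers, arbitrary units)
    shift = np.array([640.0, 655.0, 628.0, 668.0, 645.0, 684.0])

    X = np.column_stack([np.ones(len(shift)), solvent_params])
    coef, *_ = np.linalg.lstsq(X, shift, rcond=None)   # ordinary least squares
    print(dict(zip(["intercept", "DN", "AN", "alpha", "beta"], coef.round(3))))
    ```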

  3. Origins and Elaboration of the National Health Accounts, 1926-2006

    PubMed Central

    Fetter, Bruce

    2006-01-01

    The National Health Statistics Group (NHSG) has managed to keep the national health accounts (NHA) apolitical and highly respected. NHSG strategies have included the careful acquisition and presentation of statistics relating to health costs and payers; the use of scholarly journals to disseminate ideas to other government offices and, beyond them, to industry, labor, the professions, and universities; and the promotion of cooperation with related U.S. statistical agencies, provider groups, contractors, and international organizations. Responding to an increasingly complex system of third-party payers in the U.S. health system and to controversies over methods, the NHA has continually evolved to meet the demands of health care decisionmakers. Historically, these dialogues have forced health accountants to refine their methods to ensure that their portrayal of spending and financing trends presents information that can inform the decisionmaking process in a non-partisan way. PMID:17290668

  4. Robust Statistical Detection of Power-Law Cross-Correlation.

    PubMed

    Blythe, Duncan A J; Nikulin, Vadim V; Müller, Klaus-Robert

    2016-06-02

    We show that widely used approaches in statistical physics incorrectly indicate the existence of power-law cross-correlations between financial stock market fluctuations measured over several years and the neuronal activity of the human brain lasting for only a few minutes. While such cross-correlations are nonsensical, no current methodology allows them to be reliably discarded, leaving researchers at greater risk when the spurious nature of cross-correlations is not clear from the unrelated origin of the time series and rather requires careful statistical estimation. Here we propose a theory and method (PLCC-test) which allows us to rigorously and robustly test for power-law cross-correlations, correctly detecting genuine and discarding spurious cross-correlations, thus establishing meaningful relationships between processes in complex physical systems. Our method reveals for the first time the presence of power-law cross-correlations between amplitudes of the alpha and beta frequency ranges of the human electroencephalogram.
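
    For context, the scaling statistic at issue in power-law cross-correlation studies is typically a detrended cross-correlation (DCCA) fluctuation function, whose log-log slope estimates the cross-correlation exponent. The sketch below implements that generic textbook estimator, not the authors' PLCC-test:

    ```python
    import numpy as np

    def dcca_fluctuation(x, y, n):
        """Mean detrended covariance of the integrated series in boxes of size n."""
        X, Y = np.cumsum(x - x.mean()), np.cumsum(y - y.mean())
        t = np.arange(n)
        covs = []
        for start in range(0, len(x) - n + 1, n):            # non-overlapping boxes
            xb, yb = X[start:start + n], Y[start:start + n]
            rx = xb - np.polyval(np.polyfit(t, xb, 1), t)    # detrend each box
            ry = yb - np.polyval(np.polyfit(t, yb, 1), t)
            covs.append(np.mean(rx * ry))
        return np.mean(covs)

    rng = np.random.default_rng(0)
    x = rng.normal(size=4096)
    y = 0.6 * x + 0.8 * rng.normal(size=4096)                # genuinely coupled pair

    scales = np.array([16, 32, 64, 128, 256])
    F2 = np.array([abs(dcca_fluctuation(x, y, n)) for n in scales])
    slope = np.polyfit(np.log(scales), 0.5 * np.log(F2), 1)[0]
    print(f"estimated cross-correlation exponent ~ {slope:.2f}")
    ```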

  5. Robust Statistical Detection of Power-Law Cross-Correlation

    PubMed Central

    Blythe, Duncan A. J.; Nikulin, Vadim V.; Müller, Klaus-Robert

    2016-01-01

    We show that widely used approaches in statistical physics incorrectly indicate the existence of power-law cross-correlations between financial stock market fluctuations measured over several years and the neuronal activity of the human brain lasting for only a few minutes. While such cross-correlations are nonsensical, no current methodology allows them to be reliably discarded, leaving researchers at greater risk when the spurious nature of cross-correlations is not clear from the unrelated origin of the time series and rather requires careful statistical estimation. Here we propose a theory and method (PLCC-test) which allows us to rigorously and robustly test for power-law cross-correlations, correctly detecting genuine and discarding spurious cross-correlations, thus establishing meaningful relationships between processes in complex physical systems. Our method reveals for the first time the presence of power-law cross-correlations between amplitudes of the alpha and beta frequency ranges of the human electroencephalogram. PMID:27250630

  6. Preparing and Presenting Effective Research Posters

    PubMed Central

    Miller, Jane E

    2007-01-01

    Objectives Posters are a common way to present results of a statistical analysis, program evaluation, or other project at professional conferences. Often, researchers fail to recognize the unique nature of the format, which is a hybrid of a published paper and an oral presentation. This methods note demonstrates how to design research posters to convey study objectives, methods, findings, and implications effectively to varied professional audiences. Methods A review of existing literature on research communication and poster design is used to identify and demonstrate important considerations for poster content and layout. Guidelines on how to write about statistical methods, results, and statistical significance are illustrated with samples of ineffective writing annotated to point out weaknesses, accompanied by concrete examples and explanations of improved presentation. A comparison of the content and format of papers, speeches, and posters is also provided. Findings Each component of a research poster about a quantitative analysis should be adapted to the audience and format, with complex statistical results translated into simplified charts, tables, and bulleted text to convey findings as part of a clear, focused story line. Conclusions Effective research posters should be designed around two or three key findings with accompanying handouts and narrative description to supply additional technical detail and encourage dialog with poster viewers. PMID:17355594

  7. Derivative Free Optimization of Complex Systems with the Use of Statistical Machine Learning Models

    DTIC Science & Technology

    2015-09-12

    Report AFRL-AFOSR-VA-TR-2015-0278: Derivative Free Optimization of Complex Systems with the Use of Statistical Machine Learning Models (PI: Katya Scheinberg; grant FA9550-11-1-0239). Subject terms: optimization, derivative-free optimization, statistical machine learning. [Only report-documentation-page fragments are available in place of an abstract.]

  8. [Evaluation of using statistical methods in selected national medical journals].

    PubMed

    Sych, Z

    1996-01-01

    The paper evaluates the frequency with which statistical methods were applied in articles published in six national medical journals in the years 1988-1992: Klinika Oczna, Medycyna Pracy, Pediatria Polska, Polski Tygodnik Lekarski, Roczniki Państwowego Zakładu Higieny, and Zdrowie Publiczne. From the respective volumes of Pol. Tyg. Lek., a number of articles corresponding to the average for the remaining journals was randomly selected. Works in which no statistical analysis was implemented were excluded, as were review papers, case reports, reviews of books, handbooks and monographs, reports from scientific congresses, and papers on historical topics; the number of works was determined for each volume. For each study, the mode of sample selection was classified as random or targeted, the presence of a control sample was noted, and the sample characteristics were rated as complete, partial, or lacking. The results are presented in tables and figures (Tab. 1, 3). The analysis covered the rate at which statistical methods were employed across the relevant volumes for 1988-1992, the number of works using no statistical methods, and the frequency of the individual statistical methods applied. Prominence was given to fundamental descriptive statistics (measures of position and of dispersion) and to the most important methods of mathematical statistics: parametric tests of significance, analysis of variance (one-way and two-way classifications), nonparametric tests of significance, and correlation and regression. Works employing multiple correlation, multiple regression, or more complex methods for studying relationships among two or more variables were counted under correlation and regression, alongside other methods such as the statistical methods used in epidemiology (coefficients of incidence and morbidity, standardization of coefficients, survival tables), factor analysis by the Jacobi-Hotelling method, and taxonomic methods. The study found that statistical methods were used in 61.1-66.0% of the analyzed works (Tab. 3), a frequency generally similar to that reported for English-language medical journals. On the whole, no significant differences across the years 1988-1992 were disclosed in the frequency of the statistical methods applied (Tab. 4) or in the frequency of random sampling (Tab. 3).
    The most frequently used statistical methods in the analyzed works for 1988-1992 were measures of position (44.2-55.6%), measures of dispersion (32.5-38.5%), and parametric tests of significance (26.3-33.1% of the works analyzed; Tab. 4). To increase the frequency and reliability of the statistical methods used, the teaching of biostatistics should be expanded both in medical studies and in postgraduate training for physicians and for scientific and teaching staff.

  9. A comparison of spectral magnitude and phase-locking value analyses of the frequency-following response to complex tones

    PubMed Central

    Zhu, Li; Bharadwaj, Hari; Xia, Jing; Shinn-Cunningham, Barbara

    2013-01-01

    Two experiments, both presenting diotic, harmonic tone complexes (100 Hz fundamental), were conducted to explore the envelope-related component of the frequency-following response (FFR_ENV), a measure of synchronous, subcortical neural activity evoked by a periodic acoustic input. Experiment 1 directly compared two common analysis methods, computing the magnitude spectrum and the phase-locking value (PLV). Bootstrapping identified which FFR_ENV frequency components were statistically above the noise floor for each metric and quantified the statistical power of the approaches. Across listeners and conditions, the two methods produced highly correlated results. However, PLV analysis required fewer processing stages to produce readily interpretable results. Moreover, at the fundamental frequency of the input, PLVs were farther above the metric's noise floor than spectral magnitudes. Having established the advantages of PLV analysis, the efficacy of the approach was further demonstrated by investigating how different acoustic frequencies contribute to FFR_ENV, analyzing responses to complex tones composed of different acoustic harmonics of 100 Hz (Experiment 2). Results show that the FFR_ENV response is dominated by peripheral auditory channels responding to unresolved harmonics, although low-frequency channels driven by resolved harmonics also contribute. These results demonstrate the utility of the PLV for quantifying the strength of FFR_ENV across conditions. PMID:23862815
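
    The PLV metric favored by this study is simple to state: extract each trial's phase at the frequency of interest and measure the length of the mean unit phasor across trials (1 for perfect locking, near 0 for random phases). A self-contained sketch on simulated trials:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    fs, f0, n_trials, n_samp = 8000, 100, 200, 1600   # 200 ms epochs, 100 Hz target
    t = np.arange(n_samp) / fs
    # Simulated trials: a phase-locked 100 Hz component buried in noise
    trials = np.sin(2 * np.pi * f0 * t) + 2.0 * rng.normal(size=(n_trials, n_samp))

    spectra = np.fft.rfft(trials, axis=1)
    k = round(f0 * n_samp / fs)                      # FFT bin at 100 Hz
    phases = np.angle(spectra[:, k])
    plv = np.abs(np.mean(np.exp(1j * phases)))       # 0 = random, 1 = perfect locking
    print(f"PLV at {f0} Hz = {plv:.3f}")
    ```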

  10. A survey of design methods for failure detection in dynamic systems

    NASA Technical Reports Server (NTRS)

    Willsky, A. S.

    1975-01-01

    A number of methods for the detection of abrupt changes (such as failures) in stochastic dynamical systems were surveyed. The class of linear systems was emphasized, but the basic concepts, if not the detailed analyses, carry over to other classes of systems. The methods surveyed range from the design of specific failure-sensitive filters, to the use of statistical tests on filter innovations, to the development of jump process formulations. Tradeoffs in complexity versus performance are discussed.

  11. Research in Computational Astrobiology

    NASA Technical Reports Server (NTRS)

    Chaban, Galina; Colombano, Silvano; Scargle, Jeff; New, Michael H.; Pohorille, Andrew; Wilson, Michael A.

    2003-01-01

    We report on several projects in the field of computational astrobiology, which is devoted to advancing our understanding of the origin, evolution and distribution of life in the Universe using theoretical and computational tools. Research projects included modifying existing computer simulation codes to use efficient, multiple time step algorithms, statistical methods for the analysis of astrophysical data via optimal partitioning methods, electronic structure calculations on water-nucleic acid complexes, incorporation of structural information into genomic sequence analysis methods, and calculations of the shock-induced formation of polycyclic aromatic hydrocarbon compounds.

  12. Alternative methods of phylogenetic inference for the Patagonian lizard group Liolaemus elongatus-kriegi (Iguania: Liolaemini) based on mitochondrial and nuclear markers.

    PubMed

    Medina, Cintia Débora; Avila, Luciano Javier; Sites, Jack Walter; Santos, Juan; Morando, Mariana

    2018-03-01

    We present different approaches to a multi-locus phylogeny for the Liolaemus elongatus-kriegi group, including almost all species and recognized lineages. We sequenced two mitochondrial and five nuclear gene regions for 123 individuals from 35 taxa, and compared relationships resolved from concatenated and species tree methods. The L. elongatus-kriegi group was inferred as monophyletic in three of the five analyses (concatenated mitochondrial, concatenated mitochondrial + nuclear gene trees, and SVDquartets species tree). The mitochondrial gene tree resolved four haploclades, three corresponding to the previously recognized complexes - L. elongatus, L. kriegi and L. petrophilus - and one to the L. punmahuida group. The BEAST species tree approach included the L. punmahuida group within the L. kriegi complex, but the SVDquartets method placed it as sister to the L. elongatus-kriegi group. BEAST inferred species of the L. elongatus and L. petrophilus complexes as one clade, while SVDquartets inferred these two complexes as monophyletic (although with no statistical support for the L. petrophilus complex). The species tree approach also included the L. punmahuida group as part of the L. elongatus-kriegi group. Our study provides detailed multilocus phylogenetic hypotheses for the L. elongatus-kriegi group, and we discuss possible reasons for the differences between the concatenation and species tree methods. Copyright © 2017 Elsevier Inc. All rights reserved.

  13. Random Survival Forest in practice: a method for modelling complex metabolomics data in time to event analysis.

    PubMed

    Dietrich, Stefan; Floegel, Anna; Troll, Martina; Kühn, Tilman; Rathmann, Wolfgang; Peters, Anette; Sookthai, Disorn; von Bergen, Martin; Kaaks, Rudolf; Adamski, Jerzy; Prehn, Cornelia; Boeing, Heiner; Schulze, Matthias B; Illig, Thomas; Pischon, Tobias; Knüppel, Sven; Wang-Sattler, Rui; Drogan, Dagmar

    2016-10-01

    The application of metabolomics in prospective cohort studies is statistically challenging. Given the importance of appropriate statistical methods for selecting disease-associated metabolites in highly correlated complex data, we combined random survival forest (RSF) with an automated backward elimination procedure that addresses such issues. Our RSF approach is illustrated with data from the European Prospective Investigation into Cancer and Nutrition (EPIC)-Potsdam study, with concentrations of 127 serum metabolites as exposure variables and time to development of type 2 diabetes mellitus (T2D) as the outcome variable. An analysis of this data set using Cox regression with a stepwise selection method was recently published. The methodological comparison (RSF versus Cox regression) was replicated in two independent cohorts. Finally, the R code for implementing the metabolite selection procedure in the RSF syntax is provided. The application of the RSF approach in EPIC-Potsdam resulted in the identification of 16 incident T2D-associated metabolites, which slightly improved prediction of T2D when used in addition to traditional T2D risk factors and also when used together with classical biomarkers. The identified metabolites partly agreed with previous findings using Cox regression, though RSF selected a higher number of highly correlated metabolites. The RSF method appears to be a promising approach for the identification of disease-associated variables in complex data with time to event as the outcome. The demonstrated RSF approach provides findings comparable to those of the generally used Cox regression, but also addresses the problem of multicollinearity and is suitable for high-dimensional data. © The Author 2016; all rights reserved. Published by Oxford University Press on behalf of the International Epidemiological Association.
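
    The paper supplies R code for its selection procedure; purely as an illustration of the same RSF-plus-backward-elimination idea, here is a Python sketch assuming the scikit-survival package, with toy data, placeholder column names, and a crude "keep the top 5" stopping rule:

    ```python
    import numpy as np
    import pandas as pd
    from sklearn.inspection import permutation_importance
    from sksurv.ensemble import RandomSurvivalForest
    from sksurv.util import Surv

    rng = np.random.default_rng(0)
    X = pd.DataFrame(rng.normal(size=(300, 12)),
                     columns=[f"metab_{i}" for i in range(12)])   # toy metabolites
    # One metabolite (metab_0) truly shortens time to event in this toy cohort
    time = rng.exponential(10, 300) * np.exp(-0.5 * X["metab_0"].to_numpy())
    event = rng.random(300) < 0.7
    y = Surv.from_arrays(event=event, time=time)

    features = list(X.columns)
    while len(features) > 5:                          # crude backward elimination
        rsf = RandomSurvivalForest(n_estimators=100, random_state=0)
        rsf.fit(X[features], y)
        imp = permutation_importance(rsf, X[features], y,
                                     n_repeats=5, random_state=0)
        features.pop(int(np.argmin(imp.importances_mean)))   # drop weakest variable

    print("retained:", features)
    ```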

  14. Efficient statistically accurate algorithms for the Fokker-Planck equation in large dimensions

    NASA Astrophysics Data System (ADS)

    Chen, Nan; Majda, Andrew J.

    2018-02-01

    Solving the Fokker-Planck equation for high-dimensional complex turbulent dynamical systems is an important and practical issue. However, most traditional methods suffer from the curse of dimensionality and have difficulty capturing the fat-tailed, highly intermittent probability density functions (PDFs) of complex systems in turbulence, neuroscience and excitable media. In this article, efficient statistically accurate algorithms are developed for solving both the transient and the equilibrium solutions of Fokker-Planck equations associated with high-dimensional nonlinear turbulent dynamical systems with conditional Gaussian structures. The algorithms involve a hybrid strategy that requires only a small number of ensembles. Here, a conditional Gaussian mixture in a high-dimensional subspace, obtained via an extremely efficient parametric method, is combined with a judicious non-parametric Gaussian kernel density estimation in the remaining low-dimensional subspace. In particular, the parametric method provides closed analytical formulae for determining the conditional Gaussian distributions in the high-dimensional subspace and is therefore computationally efficient and accurate. The full non-Gaussian PDF of the system is then given by a Gaussian mixture. Unlike in traditional particle methods, each conditional Gaussian distribution here covers a significant portion of the high-dimensional PDF, so a small number of ensembles is sufficient to recover the full PDF, which overcomes the curse of dimensionality. Notably, the mixture distribution has significant skill in capturing the transient behavior and fat tails of the high-dimensional non-Gaussian PDFs, and this facilitates the algorithms in accurately describing the intermittency and extreme events in complex turbulent systems. It is shown in a stringent set of test problems that the method requires only on the order of O(100) ensembles to successfully recover the highly non-Gaussian transient PDFs in up to 6 dimensions with only small errors.
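
    The final reconstruction step - representing the full non-Gaussian PDF as an equal-weight mixture of conditional Gaussians, one per ensemble member - can be illustrated in one dimension. The Gaussian parameters below are invented; in the actual algorithm they come from the closed-form conditional statistics:

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_ens = 100                             # a small ensemble suffices
    means = rng.normal(0, 2.0, n_ens)       # conditional means (invented)
    stds = 0.5 + rng.random(n_ens)          # conditional spreads (invented)

    x = np.linspace(-8, 8, 400)
    # Mixture PDF: equal-weight average of the conditional Gaussian densities
    pdf = np.mean([stats.norm.pdf(x, m, s) for m, s in zip(means, stds)], axis=0)
    print("integrates to", round(float(pdf.sum() * (x[1] - x[0])), 3))   # ~1.0
    ```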

  15. Association analysis of multiple traits by an approach of combining P values.

    PubMed

    Chen, Lili; Wang, Yong; Zhou, Yajing

    2018-03-01

    Increasing evidence shows that one variant can affect multiple traits, a widespread phenomenon in complex diseases. Joint analysis of multiple traits can increase the statistical power of association analysis and uncover the underlying genetic mechanisms. Although there are many statistical methods for analysing multiple traits, most of them are suited to detecting common variants associated with multiple traits. However, because of the low minor allele frequencies of rare variants, these methods are not optimal for rare variant association analysis. In this paper, we extend an adaptive combination of P values method (termed ADA) for a single trait to test association between multiple traits and the rare variants in a given region. For a given region, we use a reverse regression model to test each rare variant for association with the multiple traits and obtain the P value of the single-variant test. We then take the weighted combination of these P values as the test statistic. Extensive simulation studies show that our approach is more powerful than several other comparison methods in most cases and is robust to the inclusion of a high proportion of neutral variants and to different directions of effect of the causal variants.
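
    The test statistic described above - a weighted combination of the per-variant P values - can be sketched generically as follows; the weighting of the smallest P values and the Monte Carlo null below are illustrative choices, not the exact ADA scheme:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def combined_stat(pvals, weights):
        """Weighted sum of -log10 P over the most significant variants."""
        smallest = np.sort(pvals)[: len(weights)]
        return np.sum(weights * -np.log10(smallest))

    p_obs = np.array([0.002, 0.8, 0.04, 0.5, 0.3, 0.01])   # per-variant P (toy)
    w = np.array([0.5, 0.3, 0.2])                          # weight the top 3 variants

    t_obs = combined_stat(p_obs, w)
    # Monte Carlo null: under H0 each per-variant P value is Uniform(0, 1)
    null = np.array([combined_stat(rng.random(p_obs.size), w) for _ in range(10000)])
    p_region = (1 + np.sum(null >= t_obs)) / (1 + null.size)
    print(f"region-level P ~ {p_region:.4f}")
    ```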

  16. Methods and means of Fourier-Stokes polarimetry and the spatial frequency filtering of phase anisotropy manifestations

    NASA Astrophysics Data System (ADS)

    Novakovskaya, O. Yu.; Ushenko, A. G.; Dubolazov, A. V.; Ushenko, V. A.; Ushenko, Yu. A.; Sakhnovskiy, M. Yu.; Soltys, I. V.; Zhytaryuk, V. H.; Olar, O. V.; Sidor, M.; Gorsky, M. P.

    2016-12-01

    The theoretical background of an azimuthally stable method of Jones-matrix mapping of histological sections of myocardium tissue biopsies, based on spatial-frequency selection of the mechanisms of linear and circular birefringence, is presented. The diagnostic application of a new correlation parameter - the complex degree of mutual anisotropy - is analytically substantiated. A method is developed for measuring coordinate distributions of the complex degree of mutual anisotropy with subsequent spatial filtration of their high- and low-frequency components. The interconnections of such distributions with the parameters of linear and circular birefringence of myocardium tissue histological sections are established. Comparative results are shown for the coordinate distributions of the complex degree of mutual anisotropy formed by fibrillar networks of myosin fibrils in myocardium tissue in different necrotic states (death due to coronary heart disease versus acute coronary insufficiency). The values and ranges of change of the statistical parameters (moments of the 1st-4th order) of the coordinate distributions of the complex degree of mutual anisotropy are studied. Objective criteria for differentiating the cause of death are determined.

  17. Modelling gene expression profiles related to prostate tumor progression using binary states

    PubMed Central

    2013-01-01

    Background Cancer is a complex disease commonly characterized by the disrupted activity of several cancer-related genes such as oncogenes and tumor-suppressor genes. Previous studies suggest that the process of tumor progression to malignancy is dynamic and can be traced by changes in gene expression. Despite the enormous efforts made for differential expression detection and biomarker discovery, few methods have been designed to model the gene expression level to tumor stage during malignancy progression. Such models could help us understand the dynamics and simplify or reveal the complexity of tumor progression. Methods We have modeled an on-off state of gene activation per sample then per stage to select gene expression profiles associated to tumor progression. The selection is guided by statistical significance of profiles based on random permutated datasets. Results We show that our method identifies expected profiles corresponding to oncogenes and tumor suppressor genes in a prostate tumor progression dataset. Comparisons with other methods support our findings and indicate that a considerable proportion of significant profiles is not found by other statistical tests commonly used to detect differential expression between tumor stages nor found by other tailored methods. Ontology and pathway analysis concurred with these findings. Conclusions Results suggest that our methodology may be a valuable tool to study tumor malignancy progression, which might reveal novel cancer therapies. PMID:23721350

  18. The fundamental problem of treating light incoherence in photovoltaics and its practical consequences

    NASA Astrophysics Data System (ADS)

    Herman, Aline; Sarrazin, Michaël; Deparis, Olivier

    2014-01-01

    The incoherence of sunlight has long been suspected to have an impact on solar cell energy conversion efficiency, although the extent of this is unclear. Existing computational methods used to optimize solar cell efficiency under incoherent light are based on multiple time-consuming runs and statistical averaging. These indirect methods show limitations related to the complexity of the solar cell structure. As a consequence, complex corrugated cells, which exploit light trapping for enhancing the efficiency, have not yet been accessible for optimization under incoherent light. To overcome this bottleneck, we developed an original direct method which has the key advantage that the treatment of incoherence can be totally decoupled from the complexity of the cell. As an illustration, surface-corrugated GaAs and c-Si thin-films are considered. The spectrally integrated absorption in these devices is found to depend strongly on the degree of light coherence and, accordingly, the maximum achievable photocurrent can be higher under incoherent light than under coherent light. These results show the importance of taking into account sunlight incoherence in solar cell optimization and point out the ability of our direct method to deal with complex solar cell structures.

  19. Model Uncertainty Quantification Methods In Data Assimilation

    NASA Astrophysics Data System (ADS)

    Pathiraja, S. D.; Marshall, L. A.; Sharma, A.; Moradkhani, H.

    2017-12-01

    Data Assimilation involves utilising observations to improve model predictions in a seamless and statistically optimal fashion. Its applications are wide-ranging; from improving weather forecasts to tracking targets such as in the Apollo 11 mission. The use of Data Assimilation methods in high dimensional complex geophysical systems is an active area of research, where there exists many opportunities to enhance existing methodologies. One of the central challenges is in model uncertainty quantification; the outcome of any Data Assimilation study is strongly dependent on the uncertainties assigned to both observations and models. I focus on developing improved model uncertainty quantification methods that are applicable to challenging real world scenarios. These include developing methods for cases where the system states are only partially observed, where there is little prior knowledge of the model errors, and where the model error statistics are likely to be highly non-Gaussian.

  20. Multiscale multifractal DCCA and complexity behaviors of return intervals for Potts price model

    NASA Astrophysics Data System (ADS)

    Wang, Jie; Wang, Jun; Stanley, H. Eugene

    2018-02-01

    To investigate the characteristics of extreme events in financial markets and the return intervals between these events, we use a Potts dynamic system to construct a random financial time series model of the attitudes of market traders. We use multiscale multifractal detrended cross-correlation analysis (MM-DCCA) and Lempel-Ziv complexity (LZC) to perform a numerical study of the return intervals for two major Chinese stock market indices and for the proposed model. The new MM-DCCA method is based on the Hurst surface and provides more interpretable cross-correlations of the dynamic mechanism between different return-interval series. We scale the LZC method with different exponents to illustrate the complexity of return intervals at different scales. Empirical studies indicate that the return intervals from the Potts system and from the real stock market indices share similar statistical properties.
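
    A minimal sketch of the LZC ingredient, using the standard LZ76 phrase-counting definition (binarizing a return-interval series at its median is an illustrative assumption made here, not necessarily the paper's coding scheme):

        import numpy as np

        def lempel_ziv_complexity(symbols):
            """Count distinct phrases in the LZ76 parsing of a symbol
            sequence; more phrases means a less compressible, more
            complex sequence."""
            s = ''.join(map(str, symbols))
            i, count, n = 0, 0, len(s)
            while i < n:
                k = 1
                # Extend the current phrase while it has been seen before.
                while i + k <= n and s[i:i + k] in s[:i + k - 1]:
                    k += 1
                count += 1
                i += k
            return count

        # Hypothetical return intervals, coded 0/1 about their median.
        intervals = np.random.default_rng(0).exponential(size=1000)
        binary = (intervals > np.median(intervals)).astype(int)
        print(lempel_ziv_complexity(binary))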

  1. Putting engineering back into protein engineering: bioinformatic approaches to catalyst design.

    PubMed

    Gustafsson, Claes; Govindarajan, Sridhar; Minshull, Jeremy

    2003-08-01

    Complex multivariate engineering problems are commonplace and not unique to protein engineering. Mathematical and data-mining tools developed in other fields of engineering have now been applied to analyze sequence-activity relationships of peptides and proteins and to assist in the design of proteins and peptides with specified properties. Decreasing costs of DNA sequencing in conjunction with methods to quickly synthesize statistically representative sets of proteins allow modern heuristic statistics to be applied to protein engineering. This provides an alternative approach to expensive assays or unreliable high-throughput surrogate screens.

  2. An attribute-driven statistics generator for use in a G.I.S. environment

    NASA Technical Reports Server (NTRS)

    Thomas, R. W.; Ritter, P. R.; Kaugars, A.

    1984-01-01

    When performing research using digital geographic information it is often useful to produce quantitative characterizations of the data, usually within some constraints. In the research environment the different combinations of required data and constraints can often become quite complex. This paper describes a technique that gives the researcher a powerful and flexible way to set up many possible combinations of data and constraints without having to perform numerous intermediate steps or create temporary data bands. This method provides an efficient way to produce descriptive statistics in such situations.

  3. Comment on 'Imaging of prompt gamma rays emitted during delivery of clinical proton beams with a Compton camera: feasibility studies for range verification'.

    PubMed

    Sitek, Arkadiusz

    2016-12-21

    The origin ensemble (OE) algorithm is a new method used for image reconstruction from nuclear tomographic data. The main advantage of this algorithm is the ease of implementation for complex tomographic models and the sound statistical theory. In this comment, the author provides the basics of the statistical interpretation of OE and gives suggestions for the improvement of the algorithm in the application to prompt gamma imaging as described in Polf et al (2015 Phys. Med. Biol. 60 7085).

  4. Comment on ‘Imaging of prompt gamma rays emitted during delivery of clinical proton beams with a Compton camera: feasibility studies for range verification’

    NASA Astrophysics Data System (ADS)

    Sitek, Arkadiusz

    2016-12-01

    The origin ensemble (OE) algorithm is a new method used for image reconstruction from nuclear tomographic data. The main advantage of this algorithm is the ease of implementation for complex tomographic models and the sound statistical theory. In this comment, the author provides the basics of the statistical interpretation of OE and gives suggestions for the improvement of the algorithm in the application to prompt gamma imaging as described in Polf et al (2015 Phys. Med. Biol. 60 7085).

  5. A Ricin Forensic Profiling Approach Based on a Complex Set of Biomarkers

    DOE PAGES

    Fredriksson, Sten-Ake; Wunschel, David S.; Lindstrom, Susanne Wiklund; ...

    2018-03-28

    A forensic method for the retrospective determination of preparation methods used for illicit ricin toxin production was developed. The method was based on a complex set of biomarkers, including carbohydrates, fatty acids, and seed storage proteins, in combination with data on ricin and Ricinus communis agglutinin. The analyses were performed on samples prepared from four castor bean plant (R. communis) cultivars by four different sample preparation methods (PM1-PM4), ranging from simple disintegration of the castor beans to multi-step preparation methods including different protein precipitation methods. Comprehensive analytical data were collected by use of a range of analytical methods, and robust orthogonal partial least squares discriminant analysis (OPLS-DA) models were constructed based on the calibration set. By the use of a decision tree and two OPLS-DA models, the sample preparation methods of test-set samples were determined. The model statistics of the two models were good, and a 100% rate of correct predictions of the test set was achieved.

  6. A Ricin Forensic Profiling Approach Based on a Complex Set of Biomarkers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fredriksson, Sten-Ake; Wunschel, David S.; Lindstrom, Susanne Wiklund

    A forensic method for the retrospective determination of preparation methods used for illicit ricin toxin production was developed. The method was based on a complex set of biomarkers, including carbohydrates, fatty acids, and seed storage proteins, in combination with data on ricin and Ricinus communis agglutinin. The analyses were performed on samples prepared from four castor bean plant (R. communis) cultivars by four different sample preparation methods (PM1-PM4), ranging from simple disintegration of the castor beans to multi-step preparation methods including different protein precipitation methods. Comprehensive analytical data were collected by use of a range of analytical methods, and robust orthogonal partial least squares discriminant analysis (OPLS-DA) models were constructed based on the calibration set. By the use of a decision tree and two OPLS-DA models, the sample preparation methods of test-set samples were determined. The model statistics of the two models were good, and a 100% rate of correct predictions of the test set was achieved.

  7. Statistical learning and selective inference.

    PubMed

    Taylor, Jonathan; Tibshirani, Robert J

    2015-06-23

    We describe the problem of "selective inference." This addresses the following challenge: having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have "cherry-picked" (searched for the strongest associations) means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis.

  8. Statistical Hypothesis Testing in Intraspecific Phylogeography: NCPA versus ABC

    PubMed Central

    Templeton, Alan R.

    2009-01-01

    Nested clade phylogeographic analysis (NCPA) and approximate Bayesian computation (ABC) have been used to test phylogeographic hypotheses. Multilocus NCPA tests null hypotheses, whereas ABC discriminates among a finite set of alternatives. The interpretive criteria of NCPA are explicit and allow complex models to be built from simple components. The interpretive criteria of ABC are ad hoc and require the specification of a complete phylogeographic model. The conclusions from ABC are often influenced by implicit assumptions arising from the many parameters needed to specify a complex model. These complex models confound many assumptions so that biological interpretations are difficult. Sampling error is accounted for in NCPA, but ABC ignores important sources of sampling error that create pseudo-statistical power. NCPA generates the full sampling distribution of its statistics, but ABC only yields local probabilities, which in turn make it impossible to distinguish between a good-fitting model, a non-informative model, and an over-determined model. Both NCPA and ABC use approximations, but the convergence of the approximations used in NCPA is well defined whereas that of ABC is not. NCPA can analyze a large number of locations, but ABC cannot. Finally, the dimensionality of the tested hypothesis is known in NCPA, but not in ABC. As a consequence, the "probabilities" generated by ABC are not true probabilities and are statistically non-interpretable. Accordingly, ABC should not be used for hypothesis testing, but simulation approaches are valuable when used in conjunction with NCPA or other methods that do not rely on highly parameterized models. PMID:19192182

  9. The Complexity of Solar and Geomagnetic Indices

    NASA Astrophysics Data System (ADS)

    Pesnell, W. Dean

    2017-08-01

    How far in advance can the sunspot number be predicted with any degree of confidence? Solar cycle predictions are needed to plan long-term space missions. Fleets of satellites circle the Earth collecting science data, protecting astronauts, and relaying information. All of these satellites are sensitive at some level to solar cycle effects. Statistical and time-series analyses of the sunspot number are often used to predict solar activity. These methods have not been completely successful, as the solar dynamo changes over time and one cycle's sunspots are not a faithful predictor of the next cycle's activity. In some ways, using these techniques is similar to asking whether the stock market can be predicted. It has been shown that the Dow Jones Industrial Average (DJIA) can be more accurately predicted during periods when it obeys certain statistical properties than at other times. The Hurst exponent is one such way to partition the data. Another measure of the complexity of a time series is the fractal dimension. We can use these measures of complexity to compare the sunspot number with other solar and geomagnetic indices. We concentrate on how trends are removed by the various techniques, either internally or externally. Comparisons of the statistical properties of the various solar indices may guide us in understanding how the dynamo manifests in the various indices and the Sun.
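
    A minimal sketch of one such complexity partition, the Hurst exponent estimated by rescaled-range (R/S) analysis (the window sizes and doubling scheme below are arbitrary illustrative choices):

        import numpy as np

        def hurst_rs(x, min_chunk=8):
            """Estimate the Hurst exponent of a 1D series by fitting
            log(R/S) against log(window size) over doubling windows."""
            x = np.asarray(x, dtype=float)
            n = len(x)
            sizes, rs = [], []
            size = min_chunk
            while size <= n // 2:
                vals = []
                for start in range(0, n - size + 1, size):
                    w = x[start:start + size]
                    dev = np.cumsum(w - w.mean())   # cumulative deviations
                    r = dev.max() - dev.min()       # range of deviations
                    s = w.std()
                    if s > 0:
                        vals.append(r / s)
                if vals:
                    sizes.append(size)
                    rs.append(np.mean(vals))
                size *= 2
            slope, _ = np.polyfit(np.log(sizes), np.log(rs), 1)
            return slope

        # Uncorrelated noise gives H ~ 0.5; persistent series give H > 0.5.
        print(hurst_rs(np.random.default_rng(1).standard_normal(4096)))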

  10. Distinguishing signatures of determinism and stochasticity in spiking complex systems

    PubMed Central

    Aragoneses, Andrés; Rubido, Nicolás; Tiana-Alsina, Jordi; Torrent, M. C.; Masoller, Cristina

    2013-01-01

    We describe a method to infer signatures of determinism and stochasticity in the sequence of apparently random intensity dropouts emitted by a semiconductor laser with optical feedback. The method uses ordinal time-series analysis to classify experimental data of inter-dropout-intervals (IDIs) in two categories that display statistically significant different features. Despite the apparent randomness of the dropout events, one IDI category is consistent with waiting times in a resting state until noise triggers a dropout, and the other is consistent with dropouts occurring during the return to the resting state, which have a clear deterministic component. The method we describe can be a powerful tool for inferring signatures of determinism in the dynamics of complex systems in noisy environments, at an event-level description of their dynamics.

  11. Parameterization of light absorption by components of seawater in optically complex coastal waters of the Crimea Peninsula (Black Sea).

    PubMed

    Dmitriev, Egor V; Khomenko, Georges; Chami, Malik; Sokolov, Anton A; Churilova, Tatyana Y; Korotaev, Gennady K

    2009-03-01

    The absorption of sunlight by oceanic constituents significantly contributes to the spectral distribution of the water-leaving radiance. Here it is shown that current parameterizations of absorption coefficients do not apply to the optically complex waters of the Crimea Peninsula. Based on in situ measurements, parameterizations of phytoplankton, nonalgal, and total particulate absorption coefficients are proposed. Their performance is evaluated using a log-log regression combined with a low-pass filter and the nonlinear least-squares method. The statistical significance of the estimated parameters is verified using the bootstrap method. The parameterizations are relevant for chlorophyll a concentrations ranging from 0.45 up to 2 mg/m^3.
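
    A minimal sketch of the bootstrap step for a power-law parameterization of the form a(chl) = A * chl**B fitted in log-log space (the function names are hypothetical and the low-pass filtering step is omitted):

        import numpy as np

        def fit_power_law(chl, absorption):
            """Fit a(chl) = A * chl**B by linear regression in log-log space."""
            B, logA = np.polyfit(np.log(chl), np.log(absorption), 1)
            return np.exp(logA), B

        def bootstrap_ci(chl, absorption, n_boot=2000, alpha=0.05, seed=0):
            """Percentile-bootstrap confidence intervals for (A, B):
            refit on resampled stations and take empirical quantiles."""
            chl = np.asarray(chl, dtype=float)
            absorption = np.asarray(absorption, dtype=float)
            rng = np.random.default_rng(seed)
            n = len(chl)
            estimates = np.array([
                fit_power_law(chl[idx], absorption[idx])
                for idx in rng.integers(0, n, size=(n_boot, n))
            ])
            lo, hi = np.percentile(
                estimates, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0)
            return lo, hi   # CIs for (A, B); a B interval excluding 0 is significant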

  12. Fractal mechanisms in the electrophysiology of the heart

    NASA Technical Reports Server (NTRS)

    Goldberger, A. L.

    1992-01-01

    The mathematical concept of fractals provides insights into complex anatomic branching structures that lack a characteristic (single) length scale, and certain complex physiologic processes, such as heart rate regulation, that lack a single time scale. Heart rate control is perturbed by alterations in neuro-autonomic function in a number of important clinical syndromes, including sudden cardiac death, congestive failure, cocaine intoxication, fetal distress, space sickness and physiologic aging. These conditions are associated with a loss of the normal fractal complexity of interbeat interval dynamics. Such changes, which may not be detectable using conventional statistics, can be quantified using new methods derived from "chaos theory."

  13. Wave Propagation in Non-Stationary Statistical Mantle Models at the Global Scale

    NASA Astrophysics Data System (ADS)

    Meschede, M.; Romanowicz, B. A.

    2014-12-01

    We study the effect of statistically distributed heterogeneities that are smaller than the resolution of current tomographic models on seismic waves that propagate through the Earth's mantle at teleseismic distances. Current global tomographic models are missing small-scale structure as evidenced by the failure of even accurate numerical synthetics to explain enhanced coda in observed body and surface waveforms. One way to characterize small scale heterogeneity is to construct random models and confront observed coda waveforms with predictions from these models. Statistical studies of the coda typically rely on models with simplified isotropic and stationary correlation functions in Cartesian geometries. We show how to construct more complex random models for the mantle that can account for arbitrary non-stationary and anisotropic correlation functions as well as for complex geometries. Although this method is computationally heavy, model characteristics such as translational, cylindrical or spherical symmetries can be used to greatly reduce the complexity such that this method becomes practical. With this approach, we can create 3D models of the full spherical Earth that can be radially anisotropic, i.e. with different horizontal and radial correlation functions, and radially non-stationary, i.e. with radially varying model power and correlation functions. Both of these features are crucial for a statistical description of the mantle in which structure depends to first order on the spherical geometry of the Earth. We combine different random model realizations of S velocity with current global tomographic models that are robust at long wavelengths (e.g. Meschede and Romanowicz, 2014, GJI submitted), and compute the effects of these hybrid models on the wavefield with a spectral element code (SPECFEM3D_GLOBE). We finally analyze the resulting coda waves for our model selection and compare our computations with observations. Based on these observations, we make predictions about the strength of unresolved small-scale structure and extrinsic attenuation.

  14. Microfluidic-based mini-metagenomics enables discovery of novel microbial lineages from complex environmental samples

    PubMed Central

    Yu, Feiqiao Brian; Blainey, Paul C; Schulz, Frederik; Woyke, Tanja; Horowitz, Mark A; Quake, Stephen R

    2017-01-01

    Metagenomics and single-cell genomics have enabled genome discovery from unknown branches of life. However, extracting novel genomes from complex mixtures of metagenomic data can still be challenging and represents an ill-posed problem which is generally approached with ad hoc methods. Here we present a microfluidic-based mini-metagenomic method which offers a statistically rigorous approach to extract novel microbial genomes while preserving single-cell resolution. We used this approach to analyze two hot spring samples from Yellowstone National Park and extracted 29 new genomes, including three deeply branching lineages. The single-cell resolution enabled accurate quantification of genome function and abundance, down to 1% in relative abundance. Our analyses of genome level SNP distributions also revealed low to moderate environmental selection. The scale, resolution, and statistical power of microfluidic-based mini-metagenomics make it a powerful tool to dissect the genomic structure of microbial communities while effectively preserving the fundamental unit of biology, the single cell. DOI: http://dx.doi.org/10.7554/eLife.26580.001 PMID:28678007

  15. Arkansas StreamStats: a U.S. Geological Survey web map application for basin characteristics and streamflow statistics

    USGS Publications Warehouse

    Pugh, Aaron L.

    2014-01-01

    Users of streamflow information often require streamflow statistics and basin characteristics at various locations along a stream. The USGS periodically calculates and publishes streamflow statistics and basin characteristics for streamflow-gaging stations and partial-record stations, but these data commonly are scattered among many reports that may or may not be readily available to the public. The USGS also provides and periodically updates regional analyses of streamflow statistics that include regression equations and other prediction methods for estimating statistics for ungaged and unregulated streams across the State. Use of these regional predictions for a stream can be complex and often requires the user to determine a number of basin characteristics that may require interpretation. Basin characteristics may include drainage area, classifiers for physical properties, climatic characteristics, and other inputs. Obtaining these input values for gaged and ungaged locations traditionally has been time-consuming, subjective, and can lead to inconsistent results.

  16. Defining the best quality-control systems by design and inspection.

    PubMed

    Hinckley, C M

    1997-05-01

    Not all of the many approaches to quality control are equally effective. Nonconformities in laboratory testing are caused basically by excessive process variation and mistakes. Statistical quality control can effectively control process variation, but it cannot detect or prevent most mistakes. Because mistakes or blunders are frequently the dominant source of nonconformities, we conclude that statistical quality control by itself is not effective. I explore the 100% inspection methods essential for controlling mistakes. Unlike the inspection techniques that Deming described as ineffective, the new "source" inspection methods can detect mistakes and enable corrections before nonconformities are generated, achieving the highest degree of quality at a fraction of the cost of traditional methods. Key relationships between task complexity and nonconformity rates are also described, along with cultural changes that are essential for implementing the best quality-control practices.

  17. Can't Count or Won't Count? Embedding Quantitative Methods in Substantive Sociology Curricula: A Quasi-Experiment.

    PubMed

    Williams, Malcolm; Sloan, Luke; Cheung, Sin Yi; Sutton, Carole; Stevens, Sebastian; Runham, Libby

    2016-06-01

    This paper reports on a quasi-experiment in which quantitative methods (QM) are embedded within a substantive sociology module. Through measuring student attitudes before and after the intervention alongside control group comparisons, we illustrate the impact that embedding has on the student experience. Our findings are complex and even contradictory. Whilst the experimental group were less likely to be distrustful of statistics and more likely to appreciate how QM inform social research, they were also less confident about their statistical abilities, suggesting that through 'doing' quantitative sociology the experimental group are exposed to the intricacies of method and their optimism about their own abilities is challenged. We conclude that embedding QM in a single substantive module is not a 'magic bullet' and that a wider programme of content and assessment diversification across the curriculum is preferable.

  18. Multivariate Statistical Analysis of Orthogonal Mass Spectral Data for the Identification of Chemical Attribution Signatures of 3-Methylfentanyl

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mayer, B. P.; Valdez, C. A.; DeHope, A. J.

    Critical to many modern forensic investigations is the chemical attribution of the origin of an illegal drug. This process relies heavily on the identification of compounds indicative of its clandestine or commercial production. The results of these studies can yield detailed information on the method of manufacture, the sophistication of the synthesis operation, the starting material source, and the final product. In the present work, chemical attribution signatures (CAS) associated with the synthesis of the analgesic 3-methylfentanyl, N-(3-methyl-1-phenethylpiperidin-4-yl)-N-phenylpropanamide, were investigated. Six synthesis methods were studied in an effort to identify and classify route-specific signatures. These methods were chosen to minimize the use of scheduled precursors, complicated laboratory equipment, the number of overall steps, and demanding reaction conditions. Using gas and liquid chromatography combined with mass spectrometric methods (GC-QTOF and LC-QTOF) in conjunction with inductively coupled plasma mass spectrometry (ICP-MS), over 240 distinct compounds and elements were monitored. As seen in our previous work with CAS of fentanyl synthesis, the complexity of the resultant data matrix necessitated the use of multivariate statistical analysis. Using partial least squares discriminant analysis (PLS-DA), 62 statistically significant, route-specific CAS were identified. Statistical classification models using a variety of machine learning techniques were then developed with the ability to predict the method of 3-methylfentanyl synthesis from three blind crude samples generated by synthetic chemists without prior experience with these methods.

  19. Quantification of integrated HIV DNA by repetitive-sampling Alu-HIV PCR on the basis of Poisson statistics.

    PubMed

    De Spiegelaere, Ward; Malatinkova, Eva; Lynch, Lindsay; Van Nieuwerburgh, Filip; Messiaen, Peter; O'Doherty, Una; Vandekerckhove, Linos

    2014-06-01

    Quantification of integrated proviral HIV DNA by repetitive-sampling Alu-HIV PCR is a candidate virological tool to monitor the HIV reservoir in patients. However, the experimental procedures and data analysis of the assay are complex and hinder its widespread use. Here, we provide an improved and simplified data analysis method by adopting binomial and Poisson statistics. A modified analysis method on the basis of Poisson statistics was used to analyze the binomial data of positive and negative reactions from a 42-replicate Alu-HIV PCR, using dilutions of an integration standard and samples from 57 HIV-infected patients. Results were compared with the quantitative output of the previously described Alu-HIV PCR method. Poisson-based quantification of the Alu-HIV PCR was linearly correlated with the standard dilution series, indicating that absolute quantification with the Poisson method is a valid alternative for data analysis of repetitive-sampling Alu-HIV PCR data. Quantitative outputs of patient samples assessed by the Poisson method correlated with the previously described Alu-HIV PCR analysis, indicating that this method is a valid alternative for quantifying integrated HIV DNA. Poisson-based analysis of the Alu-HIV PCR data enables absolute quantification without the need for a standard dilution curve. Implementation of confidence interval (CI) estimation permits improved qualitative analysis of the data and provides a statistical basis for the required minimal number of technical replicates. © 2014 The American Association for Clinical Chemistry.
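
    A minimal sketch of the Poisson step, i.e. the standard limiting-dilution estimator (the replicate counts below are illustrative only):

        import math

        def poisson_events_per_reaction(n_replicates, n_positive):
            """Infer the mean number of templates per reaction from the
            fraction of negative replicates, assuming Poisson statistics:
            P(negative) = exp(-lam)  =>  lam = -ln(P(negative))."""
            if n_positive >= n_replicates:
                raise ValueError("all replicates positive; dilute and repeat")
            p_negative = (n_replicates - n_positive) / n_replicates
            return -math.log(p_negative)

        # e.g. 25 positive reactions out of 42 replicates
        print(poisson_events_per_reaction(42, 25))  # ~0.90 templates per reaction

    A binomial confidence interval on the negative fraction propagates directly to a CI on the estimate, which is what motivates the minimal-replicate calculation described above.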

  20. Interference in the classical probabilistic model and its representation in complex Hilbert space

    NASA Astrophysics Data System (ADS)

    Khrennikov, Andrei Yu.

    2005-10-01

    The notion of a context (complex of physical conditions, that is to say: specification of the measurement setup) is basic in this paper. We show that the main structures of quantum theory (interference of probabilities, Born's rule, complex probabilistic amplitudes, Hilbert state space, representation of observables by operators) are present already in a latent form in the classical Kolmogorov probability model. However, this model should be considered as a calculus of contextual probabilities. In our approach it is forbidden to consider abstract context-independent probabilities: "first context and only then probability". We construct the representation of the general contextual probabilistic dynamics in the complex Hilbert space. Thus the dynamics of the wave function (in particular, Schrödinger's dynamics) can be considered as the Hilbert space projection of a realistic dynamics in a "prespace". The basic condition for representing the prespace dynamics is the law of statistical conservation of energy (conservation of probabilities). In general the Hilbert space projection of the prespace dynamics can be nonlinear and even irreversible (but it is always unitary). Methods developed in this paper can be applied not only to quantum mechanics, but also to classical statistical mechanics. The main quantum-like structures (e.g., interference of probabilities) might be found in some models of classical statistical mechanics. Quantum-like probabilistic behavior can be demonstrated by biological systems. In particular, it was recently found in some psychological experiments.

  1. Characterizing and locating air pollution sources in a complex industrial district using optical remote sensing technology and multivariate statistical modeling.

    PubMed

    Chang, Pao-Erh Paul; Yang, Jen-Chih Rena; Den, Walter; Wu, Chang-Fu

    2014-09-01

    Emissions of volatile organic compounds (VOCs) are among the most frequent causes of environmental nuisance complaints in urban areas, especially near industrial districts. Unfortunately, identifying the emission sources responsible for the VOCs is an inherently difficult task. In this study, we propose a dynamic approach to gradually confine the location of potential VOC emission sources in an industrial complex by combining multi-path open-path Fourier transform infrared spectrometry (OP-FTIR) measurement and the statistical method of principal component analysis (PCA). Closed-cell FTIR was further used to verify the VOC emission sources by measuring emitted VOCs from selected exhaust stacks at factories in the confined areas. Multiple open-path monitoring lines were deployed during a 3-month monitoring campaign in a complex industrial district. The emission patterns were identified and the locations of emissions were confined using the wind data collected simultaneously. N,N-Dimethylformamide (DMF), 2-butanone, toluene, and ethyl acetate, with mean concentrations of 80.0 ± 1.8, 34.5 ± 0.8, 103.7 ± 2.8, and 26.6 ± 0.7 ppbv, respectively, were identified as the major VOC mixture at all times of the day around the receptor site. The concentrations of the toxic air pollutant DMF in air samples were found to exceed the ambient standard despite the path-averaging effect of OP-FTIR on concentration levels. The PCA identified three major emission sources: PU coating, chemical packaging, and lithographic printing industries. Applying instrumental measurement and statistical modeling, this study has established a systematic approach for locating emission sources. Statistical modeling (PCA) plays an important role in reducing the dimensionality of a large measured dataset and identifying underlying emission sources; instrumental measurement, in turn, helps verify the outcomes of the statistical modeling. The field study has demonstrated the feasibility of using multi-path OP-FTIR measurement, and the wind data incorporated into the statistical modeling (PCA) successfully identified the major emission sources in a complex industrial district.

  2. Statistical assessment on a combined analysis of GRYN-ROMN-UCBN upland vegetation vital signs

    USGS Publications Warehouse

    Irvine, Kathryn M.; Rodhouse, Thomas J.

    2014-01-01

    As of 2013, Rocky Mountain and Upper Columbia Basin Inventory and Monitoring Networks have multiple years of vegetation data and Greater Yellowstone Network has three years of vegetation data, and monitoring is ongoing in all three networks. Our primary objective is to assess whether a combined analysis of these data aimed at exploring correlations with climate and weather data is feasible. We summarize the core survey design elements across protocols and point out the major statistical challenges for a combined analysis at present. The dissimilarity in response designs between ROMN and UCBN-GRYN network protocols presents a statistical challenge that has not been resolved yet. However, the UCBN and GRYN data are compatible as they implement a similar response design; therefore, a combined analysis is feasible and will be pursued in the future. When data collected by different networks are combined, the survey design describing the merged dataset is (likely) a complex survey design. A complex survey design is the result of combining datasets from different sampling designs. A complex survey design is characterized by unequal probability sampling, varying stratification, and clustering (see Lohr 2010 Chapter 7 for a general overview). Statistical analysis of complex survey data requires modifications to standard methods, one of which is to include survey design weights within a statistical model. We focus on this issue for a combined analysis of upland vegetation from these networks, leaving other topics for future research. We conduct a simulation study on the possible effects of equal versus unequal probability selection of points on parameter estimates of temporal trend using available packages within the R statistical computing package. We find that, as written, using lmer or lm for trend detection in a continuous response and clm and clmm for visually estimated cover classes with "raw" GRTS design weights specified for the weight argument leads to substantially different results and/or computational instability. However, when only fixed effects are of interest, the survey package (svyglm and svyolr) may be suitable for a model-assisted analysis for trend. We provide possible directions for future research into combined analysis for ordinal and continuous vital sign indicators.

  3. Statistical Features of Complex Systems ---Toward Establishing Sociological Physics---

    NASA Astrophysics Data System (ADS)

    Kobayashi, Naoki; Kuninaka, Hiroto; Wakita, Jun-ichi; Matsushita, Mitsugu

    2011-07-01

    Complex systems have recently attracted much attention, both in the natural sciences and in the sociological sciences. Members constituting a complex system evolve through nonlinear interactions with each other. This means that in a complex system the multiplicative experience or, so to speak, the history of each member produces its present characteristics. If attention is paid to any statistical property in any complex system, the lognormal distribution is the most natural and appropriate among the standard or "normal" statistics for overviewing the whole system. In fact, the lognormality emerges rather conspicuously when we examine, as familiar and typical examples of statistical aspects in complex systems, the nursing-care period for the aged, populations of prefectures and municipalities, and our body height and weight. Many other examples are found in nature and society. On the basis of these observations, we discuss the possibility of sociological physics.
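
    A minimal numerical illustration of the lognormal observation, on synthetic data only: if a quantity is lognormally distributed, its logarithm is normal, so both parameters can be read directly off the log-data.

        import numpy as np

        rng = np.random.default_rng(0)
        # Synthetic stand-in for a size-like quantity shaped by
        # multiplicative (history-dependent) growth.
        data = rng.lognormal(mean=4.0, sigma=0.6, size=10_000)
        mu_hat = np.log(data).mean()
        sigma_hat = np.log(data).std(ddof=1)
        print(mu_hat, sigma_hat)  # close to the generating values 4.0 and 0.6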

  4. Machine Learning for Detecting Gene-Gene Interactions

    PubMed Central

    McKinney, Brett A.; Reif, David M.; Ritchie, Marylyn D.; Moore, Jason H.

    2011-01-01

    Complex interactions among genes and environmental factors are known to play a role in common human disease aetiology. There is a growing body of evidence to suggest that complex interactions are ‘the norm’ and, rather than amounting to a small perturbation to classical Mendelian genetics, interactions may be the predominant effect. Traditional statistical methods are not well suited for detecting such interactions, especially when the data are high dimensional (many attributes or independent variables) or when interactions occur between more than two polymorphisms. In this review, we discuss machine-learning models and algorithms for identifying and characterising susceptibility genes in common, complex, multifactorial human diseases. We focus on the following machine-learning methods that have been used to detect gene-gene interactions: neural networks, cellular automata, random forests, and multifactor dimensionality reduction. We conclude with some ideas about how these methods and others can be integrated into a comprehensive and flexible framework for data mining and knowledge discovery in human genetics. PMID:16722772

  5. Evaluation of Penalized and Nonpenalized Methods for Disease Prediction with Large-Scale Genetic Data.

    PubMed

    Won, Sungho; Choi, Hosik; Park, Suyeon; Lee, Juyoung; Park, Changyi; Kwon, Sunghoon

    2015-01-01

    Owing to recent improvements in genotyping technology, large-scale genetic data can be utilized to identify disease susceptibility loci, and these successes have substantially improved our understanding of complex diseases. However, in spite of these successes, most of the genetic effects for many complex diseases were found to be very small, which has been a big hurdle to building disease prediction models. Recently, many statistical methods based on penalized regressions have been proposed to tackle the so-called "large P and small N" problem. Penalized regressions, including the least absolute shrinkage and selection operator (LASSO) and ridge regression, limit the space of parameters, and this constraint enables the estimation of effects for a very large number of SNPs. Various extensions have been suggested, and, in this report, we compare their accuracy by applying them to several complex diseases. Our results show that penalized regressions are usually robust and provide better accuracy than the existing methods, at least for the diseases under consideration.
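
    A minimal sketch of the "large P, small N" setting with an L1-penalized (LASSO-style) logistic regression; the genotype matrix, effect sizes, and penalty strength below are all synthetic assumptions:

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        # Hypothetical genotype matrix: n subjects x p SNPs coded 0/1/2,
        # with p >> n.
        rng = np.random.default_rng(2)
        n, p = 200, 5000
        X = rng.integers(0, 3, size=(n, p)).astype(float)
        beta = np.zeros(p)
        beta[:10] = 0.5                     # ten weakly causal SNPs
        logit = X @ beta
        logit -= logit.mean()
        y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

        # The L1 penalty shrinks most coefficients exactly to zero;
        # penalty="l2" would give ridge-style shrinkage instead.
        model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
        model.fit(X, y)
        print("non-zero SNP effects:", int(np.sum(model.coef_ != 0)))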

  6. Complex Langevin analysis of the spontaneous symmetry breaking in dimensionally reduced super Yang-Mills models

    NASA Astrophysics Data System (ADS)

    Anagnostopoulos, Konstantinos N.; Azuma, Takehiro; Ito, Yuta; Nishimura, Jun; Papadoudis, Stratos Kovalkov

    2018-02-01

    In recent years the complex Langevin method (CLM) has proven a powerful method in studying statistical systems which suffer from the sign problem. Here we show that it can also be applied to an important problem concerning why we live in four-dimensional spacetime. Our target system is the type IIB matrix model, which is conjectured to be a nonperturbative definition of type IIB superstring theory in ten dimensions. The fermion determinant of the model becomes complex upon Euclideanization, which causes a severe sign problem in its Monte Carlo studies. It is speculated that the phase of the fermion determinant actually induces the spontaneous breaking of the SO(10) rotational symmetry, which has direct consequences on the aforementioned question. In this paper, we apply the CLM to the 6D version of the type IIB matrix model and show clear evidence that the SO(6) symmetry is broken down to SO(3). Our results are consistent with those obtained previously by the Gaussian expansion method.

  7. Strategies for Reduced-Order Models in Uncertainty Quantification of Complex Turbulent Dynamical Systems

    NASA Astrophysics Data System (ADS)

    Qi, Di

    Turbulent dynamical systems are ubiquitous in science and engineering. Uncertainty quantification (UQ) in turbulent dynamical systems is a grand challenge where the goal is to obtain statistical estimates for key physical quantities. In the development of a proper UQ scheme for systems characterized by both a high-dimensional phase space and a large number of instabilities, significant model errors compared with the true natural signal are always unavoidable due to both the imperfect understanding of the underlying physical processes and the limited computational resources available. One central issue in contemporary research is the development of a systematic methodology for reduced order models that can recover the crucial features both with model fidelity in statistical equilibrium and with model sensitivity in response to perturbations. In the first part, we discuss a general mathematical framework to construct statistically accurate reduced-order models that have skill in capturing the statistical variability in the principal directions of a general class of complex systems with quadratic nonlinearity. A systematic hierarchy of simple statistical closure schemes, which are built through new global statistical energy conservation principles combined with statistical equilibrium fidelity, are designed and tested for UQ of these problems. Second, the capacity of imperfect low-order stochastic approximations to model extreme events in a passive scalar field advected by turbulent flows is investigated. The effects in complicated flow systems are considered including strong nonlinear and non-Gaussian interactions, and much simpler and cheaper imperfect models with model error are constructed to capture the crucial statistical features in the stationary tracer field. Several mathematical ideas are introduced to improve the prediction skill of the imperfect reduced-order models. Most importantly, empirical information theory and statistical linear response theory are applied in the training phase for calibrating model errors to achieve optimal imperfect model parameters; and total statistical energy dynamics are introduced to improve the model sensitivity in the prediction phase especially when strong external perturbations are exerted. The validity of reduced-order models for predicting statistical responses and intermittency is demonstrated on a series of instructive models with increasing complexity, including the stochastic triad model, the Lorenz '96 model, and models for barotropic and baroclinic turbulence. The skillful low-order modeling methods developed here should also be useful for other applications such as efficient algorithms for data assimilation.

  8. Extracting and Converting Quantitative Data into Human Error Probabilities

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tuan Q. Tran; Ronald L. Boring; Jeffrey C. Joe

    2007-08-01

    This paper discusses a proposed method using a combination of advanced statistical approaches (e.g., meta-analysis, regression, structural equation modeling) that will not only convert different empirical results into a common metric for scaling individual PSF effects, but will also examine the complex interrelationships among PSFs. Furthermore, the paper discusses how the derived statistical estimates (i.e., effect sizes) can be mapped onto an HRA method (e.g., SPAR-H) to generate HEPs that can then be used in probabilistic risk assessment (PRA). The paper concludes with a discussion of the benefits of using the academic literature to assist HRA analysts in generating sound HEPs and HRA developers in validating current HRA models and formulating new ones.

  9. Study/experimental/research design: much more than statistics.

    PubMed

    Knight, Kenneth L

    2010-01-01

    The purpose of study, experimental, or research design in scientific manuscripts has changed significantly over the years. It has evolved from an explanation of the design of the experiment (ie, data gathering or acquisition) to an explanation of the statistical analysis. This practice makes "Methods" sections hard to read and understand. The aims of this article are to clarify the difference between study design and statistical analysis, to show the advantages of a properly written study design for article comprehension, and to encourage authors to correctly describe study designs. The role of study design is explored from the introduction of the concept by Fisher through modern-day scientists and the AMA Manual of Style. At one time, when experiments were simpler, the study design and statistical design were identical or very similar. With the complex research that is common today, which often includes manipulating variables to create new variables and the multiple (and different) analyses of a single data set, data collection is very different from statistical design. Thus, both a study design and a statistical design are necessary. Scientific manuscripts will be much easier to read and comprehend. A proper experimental design serves as a road map to the study methods, helping readers to understand more clearly how the data were obtained and, therefore, assisting them in properly analyzing the results.

  10. Use of neural networks to model complex immunogenetic associations of disease: human leukocyte antigen impact on the progression of human immunodeficiency virus infection.

    PubMed

    Ioannidis, J P; McQueen, P G; Goedert, J J; Kaslow, R A

    1998-03-01

    Complex immunogenetic associations of disease involving a large number of gene products are difficult to evaluate with traditional statistical methods and may require complex modeling. The authors evaluated the performance of feed-forward backpropagation neural networks in predicting rapid progression to acquired immunodeficiency syndrome (AIDS) for patients with human immunodeficiency virus (HIV) infection on the basis of major histocompatibility complex variables. Networks were trained on data from patients from the Multicenter AIDS Cohort Study (n = 139) and then validated on patients from the DC Gay cohort (n = 102). The outcome of interest was rapid disease progression, defined as progression to AIDS in <6 years from seroconversion. Human leukocyte antigen (HLA) variables were selected as network inputs with multivariate regression and a previously described algorithm selecting markers with extreme point estimates for progression risk. Network performance was compared with that of logistic regression. Networks with 15 HLA inputs and a single hidden layer of five nodes achieved a sensitivity of 87.5% and specificity of 95.6% in the training set, vs. 77.0% and 76.9%, respectively, achieved by logistic regression. When validated on the DC Gay cohort, networks averaged a sensitivity of 59.1% and specificity of 74.3%, vs. 53.1% and 61.4%, respectively, for logistic regression. Neural networks offer further support to the notion that HIV disease progression may be dependent on complex interactions between different class I and class II alleles and transporters associated with antigen processing variants. The effect in the current models is of moderate magnitude, and more data as well as other host and pathogen variables may need to be considered to improve the performance of the models. Artificial intelligence methods may complement linear statistical methods for evaluating immunogenetic associations of disease.

  11. Polarization-correlation optical microscopy of anisotropic biological layers

    NASA Astrophysics Data System (ADS)

    Ushenko, A. G.; Dubolazov, A. V.; Ushenko, V. A.; Ushenko, Yu. A.; Sakhnovskiy, M. Y.; Balazyuk, V. N.; Khukhlina, O.; Viligorska, K.; Bykov, A.; Doronin, A.; Meglinski, I.

    2016-09-01

    The theoretical background of an azimuthally stable method of Jones-matrix mapping of histological sections of myocardium biopsy tissue, based on spatial frequency selection of the mechanisms of linear and circular birefringence, is presented. The diagnostic application of a new correlation parameter, the complex degree of mutual anisotropy, is analytically substantiated. A method of measuring coordinate distributions of the complex degree of mutual anisotropy with further spatial filtration of their high- and low-frequency components is developed. The interconnections of such distributions with parameters of linear and circular birefringence of myocardium tissue histological sections are found. Comparative results are shown for measuring the coordinate distributions of the complex degree of mutual anisotropy formed by fibrillar networks of myosin fibrils of myocardium tissue in different necrotic states (death due to coronary heart disease versus acute coronary insufficiency). The values and ranges of change of the statistical parameters (moments of the 1st to 4th order) of the coordinate distributions of the complex degree of mutual anisotropy are studied. Objective criteria for differentiating the cause of death are determined.

  12. Across-cohort QC analyses of GWAS summary statistics from complex traits.

    PubMed

    Chen, Guo-Bo; Lee, Sang Hong; Robinson, Matthew R; Trzaskowski, Maciej; Zhu, Zhi-Xiang; Winkler, Thomas W; Day, Felix R; Croteau-Chonka, Damien C; Wood, Andrew R; Locke, Adam E; Kutalik, Zoltán; Loos, Ruth J F; Frayling, Timothy M; Hirschhorn, Joel N; Yang, Jian; Wray, Naomi R; Visscher, Peter M

    2016-01-01

    Genome-wide association studies (GWASs) have been successful in discovering SNP trait associations for many quantitative traits and common diseases. Typically, the effect sizes of SNP alleles are very small and this requires large genome-wide association meta-analyses (GWAMAs) to maximize statistical power. A trend towards ever-larger GWAMA is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false positive and false negative findings. In this study, we propose four metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose methods to examine the concordance between demographic information, and summary statistics and methods to investigate sample overlap. (I) We use the population genetics Fst statistic to verify the genetic origin of each cohort and their geographic location, and demonstrate using GWAMA data from the GIANT Consortium that geographic locations of cohorts can be recovered and outlier cohorts can be detected. (II) We conduct principal component analysis based on reported allele frequencies, and are able to recover the ancestral information for each cohort. (III) We propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. (IV) To quantify unknown sample overlap across all pairs of cohorts, we propose a method that uses randomly generated genetic predictors that does not require the sharing of individual-level genotype data and does not breach individual privacy.
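
    A crude diagnostic in the spirit of metric (III), as a minimal sketch (not necessarily the exact statistic proposed): under independent cohorts sharing the same true effect, standardized effect-size differences behave like standard normal draws, so their variance across null SNPs flags overlap or heterogeneity.

        import numpy as np

        def pairwise_overlap_diagnostic(b1, se1, b2, se2):
            """For each SNP compute
                t_j = (b1_j - b2_j) / sqrt(se1_j**2 + se2_j**2),
            which is ~N(0, 1) for independent cohorts with a shared effect.
            var(t) < 1 across null SNPs hints at positively correlated
            (overlapping) samples; var(t) > 1 hints at heterogeneity."""
            t = (np.asarray(b1) - np.asarray(b2)) / np.sqrt(
                np.asarray(se1) ** 2 + np.asarray(se2) ** 2)
            return float(t.var(ddof=1))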

  13. Across-cohort QC analyses of GWAS summary statistics from complex traits

    PubMed Central

    Chen, Guo-Bo; Lee, Sang Hong; Robinson, Matthew R; Trzaskowski, Maciej; Zhu, Zhi-Xiang; Winkler, Thomas W; Day, Felix R; Croteau-Chonka, Damien C; Wood, Andrew R; Locke, Adam E; Kutalik, Zoltán; Loos, Ruth J F; Frayling, Timothy M; Hirschhorn, Joel N; Yang, Jian; Wray, Naomi R; Visscher, Peter M

    2017-01-01

    Genome-wide association studies (GWASs) have been successful in discovering SNP trait associations for many quantitative traits and common diseases. Typically, the effect sizes of SNP alleles are very small and this requires large genome-wide association meta-analyses (GWAMAs) to maximize statistical power. A trend towards ever-larger GWAMA is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false positive and false negative findings. In this study, we propose four metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose methods to examine the concordance between demographic information, and summary statistics and methods to investigate sample overlap. (I) We use the population genetics Fst statistic to verify the genetic origin of each cohort and their geographic location, and demonstrate using GWAMA data from the GIANT Consortium that geographic locations of cohorts can be recovered and outlier cohorts can be detected. (II) We conduct principal component analysis based on reported allele frequencies, and are able to recover the ancestral information for each cohort. (III) We propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. (IV) To quantify unknown sample overlap across all pairs of cohorts, we propose a method that uses randomly generated genetic predictors that does not require the sharing of individual-level genotype data and does not breach individual privacy. PMID:27552965

  14. Permutation entropy and statistical complexity analysis of turbulence in laboratory plasmas and the solar wind.

    PubMed

    Weck, P J; Schaffner, D A; Brown, M R; Wicks, R T

    2015-02-01

    The Bandt-Pompe permutation entropy and the Jensen-Shannon statistical complexity are used to analyze fluctuating time series of three different turbulent plasmas: the magnetohydrodynamic (MHD) turbulence in the plasma wind tunnel of the Swarthmore Spheromak Experiment (SSX), drift-wave turbulence of ion saturation current fluctuations in the edge of the Large Plasma Device (LAPD), and fully developed turbulent magnetic fluctuations of the solar wind taken from the Wind spacecraft. The entropy and complexity values are presented as coordinates on the CH plane for comparison among the different plasma environments and other fluctuation models. The solar wind is found to have the highest permutation entropy and lowest statistical complexity of the three data sets analyzed. Both laboratory data sets have larger values of statistical complexity, suggesting that these systems have fewer degrees of freedom in their fluctuations, with SSX magnetic fluctuations having slightly less complexity than the LAPD edge I(sat). The CH plane coordinates are compared to the shape and distribution of a spectral decomposition of the wave forms. These results suggest that fully developed turbulence (solar wind) occupies the lower-right region of the CH plane, and that other plasma systems considered to be turbulent have less permutation entropy and more statistical complexity. This paper presents the use of this statistical analysis tool on solar wind plasma, as well as on an MHD turbulent experimental plasma.
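
    A minimal sketch of the Bandt-Pompe entropy, normalized by log(d!) so values lie in [0, 1] (the embedding order is a free parameter). The Jensen-Shannon statistical complexity would additionally weight this entropy by the divergence of the ordinal-pattern distribution from the uniform one.

        import math
        from collections import Counter

        import numpy as np

        def permutation_entropy(x, order=5, normalize=True):
            """Bandt-Pompe permutation entropy: Shannon entropy of the
            ordinal (rank-order) patterns of overlapping windows."""
            x = np.asarray(x)
            patterns = Counter(
                tuple(np.argsort(x[i:i + order]))
                for i in range(len(x) - order + 1)
            )
            total = sum(patterns.values())
            probs = np.array([c / total for c in patterns.values()])
            h = float(-np.sum(probs * np.log(probs)))
            return h / math.log(math.factorial(order)) if normalize else h

        # White noise visits all ordinal patterns: entropy near 1.
        print(permutation_entropy(np.random.default_rng(0).standard_normal(10_000)))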

  15. Inferring general relations between network characteristics from specific network ensembles.

    PubMed

    Cardanobile, Stefano; Pernice, Volker; Deger, Moritz; Rotter, Stefan

    2012-01-01

    Different network models have been suggested for the topology underlying complex interactions in natural systems. These models are aimed at replicating specific statistical features encountered in real-world networks. However, it is rarely considered to which degree the results obtained for one particular network class can be extrapolated to real-world networks. We address this issue by comparing different classical and more recently developed network models with respect to their ability to generate networks with large structural variability. In particular, we consider the statistical constraints which the respective construction scheme imposes on the generated networks. After having identified the most variable networks, we address the issue of which constraints are common to all network classes and are thus suitable candidates for being generic statistical laws of complex networks. In fact, we find that generic, not model-related dependencies between different network characteristics do exist. This makes it possible to infer global features from local ones using regression models trained on networks with high generalization power. Our results confirm and extend previous findings regarding the synchronization properties of neural networks. Our method seems especially relevant for large networks, which are difficult to map completely, like the neural networks in the brain. The structure of such large networks cannot be fully sampled with the present technology. Our approach provides a method to estimate global properties of under-sampled networks in good approximation. Finally, we demonstrate on three different data sets (C. elegans neuronal network, R. prowazekii metabolic network, and a network of synonyms extracted from Roget's Thesaurus) that real-world networks have statistical relations compatible with those obtained using regression models.

  16. LD-SPatt: large deviations statistics for patterns on Markov chains.

    PubMed

    Nuel, G

    2004-01-01

    Statistics on Markov chains are widely used for the study of patterns in biological sequences. Statistics on these models can be computed through several approaches. Approaches based on the central limit theorem (CLT), which produce Gaussian approximations, are among the most popular. Unfortunately, in order to assess a pattern of interest, these methods have to deal with tail distribution events for which the CLT approximation is especially poor. In this paper, we propose a new approach based on large deviations theory to assess pattern statistics. We first recall theoretical results for empirical mean (level 1) as well as empirical distribution (level 2) large deviations on Markov chains. Then, we present the applications of these results, focusing on numerical issues. LD-SPatt is the name of the GPL software implementing these algorithms. We compare this approach to several existing ones in terms of complexity and reliability and show that the large deviations are more reliable than the Gaussian approximations in absolute values as well as in terms of ranking and are at least as reliable as compound Poisson approximations. Finally, we discuss some further possible improvements and applications of this new method.
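
    The motivation, that Gaussian tail estimates degrade for rare pattern counts, is easy to see numerically. The following toy sketch (not LD-SPatt's algorithm) compares the empirical tail probability of a pattern count on a two-state Markov chain with its Gaussian estimate; all parameters are illustrative.

```python
import numpy as np
from scipy.stats import norm

# Empirical vs Gaussian tail probability for counts of the pattern 0-1-1
# along a two-state Markov chain (toy setup; all parameters illustrative).
rng = np.random.default_rng(2)
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])          # transition matrix
reps, n = 4000, 1000
states = np.zeros((reps, n), dtype=int)
u = rng.random((reps, n - 1))
for t in range(1, n):
    states[:, t] = (u[:, t - 1] < P[states[:, t - 1], 1]).astype(int)

counts = ((states[:, :-2] == 0) & (states[:, 1:-1] == 1)
          & (states[:, 2:] == 1)).sum(axis=1)
thr = counts.mean() + 3 * counts.std()
print("empirical P(N >= mean + 3 sd):", np.mean(counts >= thr))
print("Gaussian (CLT) estimate:      ", norm.sf(thr, counts.mean(), counts.std()))
```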

  17. Applications of modern statistical methods to analysis of data in physical science

    NASA Astrophysics Data System (ADS)

    Wicker, James Eric

    Modern methods of statistical and computational analysis offer solutions to dilemmas confronting researchers in physical science. Although the ideas behind modern statistical and computational analysis methods were originally introduced in the 1970's, most scientists still rely on methods written during the early era of computing. These researchers, who analyze increasingly voluminous and multivariate data sets, need modern analysis methods to extract the best results from their studies. The first section of this work showcases applications of modern linear regression. Since the 1960's, many researchers in spectroscopy have used classical stepwise regression techniques to derive molecular constants. However, problems with thresholds of entry and exit for model variables plague this analysis method. Other criticisms of this kind of stepwise procedure include its inefficient searching method, the order in which variables enter or leave the model and problems with overfitting data. We implement an information scoring technique that overcomes the assumptions inherent in the stepwise regression process to calculate molecular model parameters. We believe that this kind of information-based model evaluation can be applied to more general analysis situations in physical science. The second section proposes new methods of multivariate cluster analysis. The K-means algorithm and the EM algorithm, introduced in the 1960's and 1970's respectively, formed the basis of multivariate cluster analysis methodology for many years. However, several shortcomings of these methods include strong dependence on initial seed values and inaccurate results when the data seriously depart from hypersphericity. We propose new cluster analysis methods based on genetic algorithms that overcome the strong dependence on initial seed values. In addition, we propose a generalization of the Genetic K-means algorithm which can accurately identify clusters with complex hyperellipsoidal covariance structures. We then use this new algorithm in a genetic-algorithm-based Expectation-Maximization process that can accurately calculate parameters describing complex clusters in a mixture model routine. Using the accuracy of this GEM algorithm, we assign information scores to cluster calculations in order to best identify the number of mixture components in a multivariate data set. We will showcase how these algorithms can be used to process multivariate data from astronomical observations.
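
    The information-scoring alternative to stepwise selection can be sketched in a few lines. Here AIC is used as the score, which is an assumption; the dissertation may use a different information criterion.

```python
import numpy as np

# Score polynomial models with AIC rather than stepwise entry/exit thresholds.
rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 80)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(0.0, 0.05, x.size)

def aic(y, yhat, k):
    # Gaussian log-likelihood up to an additive constant, plus 2k penalty
    n = y.size
    return n * np.log(np.sum((y - yhat) ** 2) / n) + 2 * k

scores = {}
for degree in range(1, 7):
    coef = np.polyfit(x, y, degree)
    scores[degree] = aic(y, np.polyval(coef, x), degree + 1)
print(scores, "-> selected degree:", min(scores, key=scores.get))
```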

  18. Statistical robustness of machine-learning estimates for characterizing a groundwater-surface water system, Southland, New Zealand

    NASA Astrophysics Data System (ADS)

    Friedel, M. J.; Daughney, C.

    2016-12-01

    The development of a successful surface-groundwater management strategy depends on the quality of data provided for analysis. This study evaluates the statistical robustness of a modified self-organizing map (MSOM) technique used to estimate missing values for three hypersurface models: synoptic groundwater-surface water hydrochemistry, time-series of groundwater-surface water hydrochemistry, and mixed-survey (combination of groundwater-surface water hydrochemistry and lithologies) hydrostratigraphic unit data. These models of increasing complexity are developed and validated based on observations from the Southland region of New Zealand. In each case, the estimation method is sufficiently robust to cope with the vagaries of groundwater-surface water hydrochemistry due to sample size and extreme data insufficiency, even when >80% of the data are missing. The estimation of surface water hydrochemistry time-series values enabled the evaluation of seasonal variation, and the imputation of lithologies facilitated the evaluation of hydrostratigraphic controls on groundwater-surface water interaction. The robust statistical results for groundwater-surface water models of increasing data complexity provide justification for applying the MSOM technique in other regions of New Zealand and abroad.
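
    The MSOM itself is not specified in the abstract; the generic prototype-imputation loop it belongs to can be sketched with KMeans standing in for the map (a simplifying assumption):

```python
import numpy as np
from sklearn.cluster import KMeans

def prototype_impute(X, n_prototypes=8, n_iter=10, seed=0):
    """Iteratively fill missing entries from matched cluster prototypes
    (KMeans used here as a simplified stand-in for a self-organizing map)."""
    X = X.copy()
    miss = np.isnan(X)
    X[miss] = np.take(np.nanmean(X, axis=0), np.where(miss)[1])  # initial guess
    for _ in range(n_iter):
        km = KMeans(n_clusters=n_prototypes, n_init=10, random_state=seed).fit(X)
        X[miss] = km.cluster_centers_[km.labels_][miss]  # refill missing only
    return X

rng = np.random.default_rng(0)
data = rng.normal(size=(300, 6))
data[rng.random(data.shape) < 0.8] = np.nan   # >80% missing, as in the study
print(prototype_impute(data)[:2])
```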

  19. Path statistics, memory, and coarse-graining of continuous-time random walks on networks

    PubMed Central

    Kion-Crosby, Willow; Morozov, Alexandre V.

    2015-01-01

    Continuous-time random walks (CTRWs) on discrete state spaces, ranging from regular lattices to complex networks, are ubiquitous across physics, chemistry, and biology. Models with coarse-grained states (for example, those employed in studies of molecular kinetics) or spatial disorder can give rise to memory and non-exponential distributions of waiting times and first-passage statistics. However, existing methods for analyzing CTRWs on complex energy landscapes do not address these effects. Here we use statistical mechanics of the nonequilibrium path ensemble to characterize first-passage CTRWs on networks with arbitrary connectivity, energy landscape, and waiting time distributions. Our approach can be applied to calculating higher moments (beyond the mean) of path length, time, and action, as well as statistics of any conservative or non-conservative force along a path. For homogeneous networks, we derive exact relations between length and time moments, quantifying the validity of approximating a continuous-time process with its discrete-time projection. For more general models, we obtain recursion relations, reminiscent of transfer matrix and exact enumeration techniques, to efficiently calculate path statistics numerically. We have implemented our algorithm in PathMAN (Path Matrix Algorithm for Networks), a Python script that users can apply to their model of choice. We demonstrate the algorithm on a few representative examples which underscore the importance of non-exponential distributions, memory, and coarse-graining in CTRWs. PMID:26646868
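
    PathMAN's recursion relations are not reproduced in the abstract; a brute-force Monte Carlo sketch of the same first-passage quantities on a small network (illustrative ring topology and waiting-time scales) is:

```python
import numpy as np

# Monte Carlo first-passage statistics of a CTRW on a small ring network.
# PathMAN computes such moments via recursions; this sketch just
# illustrates the quantities involved.
rng = np.random.default_rng(4)
n_nodes, start, target = 8, 0, 4
neighbors = {i: [(i - 1) % n_nodes, (i + 1) % n_nodes] for i in range(n_nodes)}
mean_wait = 1.0 + 0.5 * np.arange(n_nodes)    # node-dependent waiting scales

lengths, times = [], []
for _ in range(20000):
    node, L, T = start, 0, 0.0
    while node != target:
        T += rng.exponential(mean_wait[node])  # waiting time at current node
        node = rng.choice(neighbors[node])     # jump to a random neighbor
        L += 1
    lengths.append(L)
    times.append(T)

lengths, times = np.array(lengths), np.array(times)
print("mean/var path length:", lengths.mean(), lengths.var())
print("mean/var first-passage time:", times.mean(), times.var())
```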

  20. Numerical solutions of ideal quantum gas dynamical flows governed by semiclassical ellipsoidal-statistical distribution

    PubMed Central

    Yang, Jaw-Yen; Yan, Chih-Yuan; Diaz, Manuel; Huang, Juan-Chen; Li, Zhihui; Zhang, Hanxin

    2014-01-01

    The ideal quantum gas dynamics as manifested by the semiclassical ellipsoidal-statistical (ES) equilibrium distribution derived in Wu et al. (Wu et al. 2012 Proc. R. Soc. A 468, 1799–1823 (doi:10.1098/rspa.2011.0673)) is numerically studied for particles of three statistics. This anisotropic ES equilibrium distribution was derived using the maximum entropy principle and conserves the mass, momentum and energy, but differs from the standard Fermi–Dirac or Bose–Einstein distribution. The present numerical method combines the discrete velocity (or momentum) ordinate method in momentum space and the high-resolution shock-capturing method in physical space. A decoding procedure to obtain the necessary parameters for determining the ES distribution is also devised. Computations of two-dimensional Riemann problems are presented, and various contours of the quantities unique to this ES model are illustrated. The main flow features, such as shock waves, expansion waves and slip lines and their complex nonlinear interactions, are depicted and found to be consistent with existing calculations for a classical gas. PMID:24399919

  1. Geo-statistical analysis of Culicoides spp. distribution and abundance in Sicily, Italy.

    PubMed

    Blanda, Valeria; Blanda, Marcellocalogero; La Russa, Francesco; Scimeca, Rossella; Scimeca, Salvatore; D'Agostino, Rosalia; Auteri, Michelangelo; Torina, Alessandra

    2018-02-01

    Biting midges belonging to Culicoides imicola, the Culicoides obsoletus complex and the Culicoides pulicaris complex (Diptera: Ceratopogonidae) are increasingly implicated as vectors of bluetongue virus in Palaearctic regions. The Culicoides obsoletus complex includes C. obsoletus (sensu stricto), C. scoticus, C. dewulfi and C. chiopterus. Culicoides pulicaris and C. lupicaris belong to the Culicoides pulicaris complex. The aim of this study was a geo-statistical analysis of the abundance and spatial distribution of Culicoides spp. involved in bluetongue virus transmission. As part of the national bluetongue surveillance plan, 7081 catches were collected on 897 Sicilian farms from 2000 to 2013. Onderstepoort-type blacklight traps were used for sample collection and each catch was analysed for the presence of Culicoides spp. and for the presence and abundance of Culicoides vector species (C. imicola and the C. pulicaris / C. obsoletus complexes). A geo-statistical analysis was carried out monthly via the interpolation of measured values based on the Inverse Distance Weighted method, using a GIS tool. Raster maps were reclassified into seven classes according to the presence and abundance of Culicoides, in order to obtain suitable maps for Map Algebra operations. Sicilian provinces showing a very high abundance of Culicoides vector species were Messina (80% of the whole area), Palermo (20%) and Catania (12%). A total of 5654 farms fell within the very high risk area for bluetongue (21% of the 26,676 farms active in Sicily); of these, 3483 farms were in Messina, 1567 in Palermo and 604 in Catania. Culicoides imicola was prevalent in Palermo, C. pulicaris in Messina, and the C. obsoletus complex was very abundant over the whole island with the highest abundance value in Messina. Our study reports the results of a geo-statistical analysis concerning the abundance and spatial distribution of Culicoides spp. in Sicily throughout the fourteen-year study. It provides useful decision support in the field of epidemiology, allowing the identification of areas to be monitored as bases for improved surveillance plans. Moreover, this knowledge can become a tool for the evaluation of virus transmission risks, especially if related to vector competence.
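
    The interpolation step is straightforward to reproduce outside a GIS; a minimal Inverse Distance Weighted sketch with hypothetical trap coordinates and counts:

```python
import numpy as np

def idw(xy_known, z_known, xy_query, power=2.0):
    """Inverse Distance Weighted interpolation at query points."""
    d = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=-1)
    w = 1.0 / np.maximum(d, 1e-12) ** power   # guard against division by zero
    return (w * z_known).sum(axis=1) / w.sum(axis=1)

# Hypothetical trap coordinates (km) and Culicoides counts
traps = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 2.0], [0.5, 1.5]])
counts = np.array([12.0, 80.0, 3.0, 40.0])
grid_points = np.array([[1.0, 1.0], [1.8, 1.7]])
print(idw(traps, counts, grid_points))
```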

  2. A brief introduction to computer-intensive methods, with a view towards applications in spatial statistics and stereology.

    PubMed

    Mattfeldt, Torsten

    2011-04-01

    Computer-intensive methods may be defined as data analytical procedures involving a huge number of highly repetitive computations. We mention resampling methods with replacement (bootstrap methods), resampling methods without replacement (randomization tests) and simulation methods. The resampling methods are based on simple and robust principles and are largely free from distributional assumptions. Bootstrap methods may be used to compute confidence intervals for a scalar model parameter and for summary statistics from replicated planar point patterns, and for significance tests. For some simple models of planar point processes, point patterns can be simulated by elementary Monte Carlo methods. The simulation of models with more complex interaction properties usually requires more advanced computing methods. In this context, we mention simulation of Gibbs processes with Markov chain Monte Carlo methods using the Metropolis-Hastings algorithm. An alternative to simulations on the basis of a parametric model consists of stochastic reconstruction methods. The basic ideas behind the methods are briefly reviewed and illustrated by simple worked examples in order to encourage novices in the field to use computer-intensive methods.
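
    In the same spirit as the paper's worked examples, a percentile bootstrap confidence interval for the mean of a skewed sample takes only a few lines (data simulated here):

```python
import numpy as np

# Percentile bootstrap CI for the mean of a skewed sample.
rng = np.random.default_rng(5)
x = rng.exponential(scale=2.0, size=50)          # skewed data
boot = np.array([rng.choice(x, size=x.size, replace=True).mean()
                 for _ in range(10_000)])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"mean = {x.mean():.2f}, 95% bootstrap CI = ({lo:.2f}, {hi:.2f})")
```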

  3. Networking—a statistical physics perspective

    NASA Astrophysics Data System (ADS)

    Yeung, Chi Ho; Saad, David

    2013-03-01

    Networking encompasses a variety of tasks related to the communication of information on networks; it has a substantial economic and societal impact on a broad range of areas including transportation systems, wired and wireless communications and a range of Internet applications. As transportation and communication networks become increasingly complex, the ever-increasing demand for congestion control, higher traffic capacity, quality of service, robustness and reduced energy consumption requires new tools and methods to meet these conflicting requirements. The new methodology should serve for gaining a better understanding of the properties of networking systems at the macroscopic level, as well as for the development of new principled optimization and management algorithms at the microscopic level. Methods of statistical physics seem best placed to provide new approaches as they have been developed specifically to deal with nonlinear large-scale systems. This review aims at presenting an overview of tools and methods that have been developed within the statistical physics community and that can be readily applied to address the emerging problems in networking. These include diffusion processes, methods from disordered systems and polymer physics, and probabilistic inference, which have direct relevance to network routing, file and frequency distribution, the exploration of network structures and vulnerability, and various other practical networking applications.

  4. Water Quality Sensing and Spatio-Temporal Monitoring Structure with Autocorrelation Kernel Methods.

    PubMed

    Vizcaíno, Iván P; Carrera, Enrique V; Muñoz-Romero, Sergio; Cumbal, Luis H; Rojo-Álvarez, José Luis

    2017-10-16

    Pollution on water resources is usually analyzed with monitoring campaigns, which consist of programmed sampling, measurement, and recording of the most representative water quality parameters. These campaign measurements yield a non-uniform spatio-temporal sampled data structure to characterize complex dynamics phenomena. In this work, we propose an enhanced statistical interpolation method to provide water quality managers with statistically interpolated representations of spatial-temporal dynamics. Specifically, our proposal makes efficient use of the a priori available information of the quality parameter measurements through Support Vector Regression (SVR) based on Mercer's kernels. The methods are benchmarked against previously proposed methods in three segments of the Machángara River and one segment of the San Pedro River in Ecuador, and their different dynamics are shown by statistically interpolated spatial-temporal maps. The best interpolation performance in terms of mean absolute error was obtained with the SVR using a Mercer kernel given by either the Mahalanobis spatial-temporal covariance matrix or by the bivariate estimated autocorrelation function. In particular, the autocorrelation kernel provides a significant improvement in estimation quality, consistently across all six water quality variables, which points out the relevance of including a priori knowledge of the problem.
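
    scikit-learn's SVR accepts a callable kernel, which is enough to sketch the Mahalanobis-type spatio-temporal kernel; the covariance matrix and data below are illustrative assumptions, and the paper's estimated autocorrelation kernel is not reproduced:

```python
import numpy as np
from sklearn.svm import SVR

# SVR with a Mahalanobis-type RBF kernel over (x, y, t) coordinates.
COV = np.diag([1.0, 1.0, 4.0])     # assumed spatial and temporal scales
COV_INV = np.linalg.inv(COV)

def mahalanobis_rbf(A, B):
    diff = A[:, None, :] - B[None, :, :]
    d2 = np.einsum('ijk,kl,ijl->ij', diff, COV_INV, diff)
    return np.exp(-0.5 * d2)

rng = np.random.default_rng(6)
X = rng.uniform(0, 10, size=(200, 3))                  # (x, y, t) samples
y = np.sin(X[:, 0]) + 0.1 * X[:, 2] + rng.normal(0, 0.05, 200)

model = SVR(kernel=mahalanobis_rbf, C=10.0).fit(X, y)
print(model.predict(X[:3]))
print(y[:3])
```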

  5. Water Quality Sensing and Spatio-Temporal Monitoring Structure with Autocorrelation Kernel Methods

    PubMed Central

    Vizcaíno, Iván P.; Muñoz-Romero, Sergio; Cumbal, Luis H.

    2017-01-01

    Pollution on water resources is usually analyzed with monitoring campaigns, which consist of programmed sampling, measurement, and recording of the most representative water quality parameters. These campaign measurements yield a non-uniform spatio-temporal sampled data structure to characterize complex dynamics phenomena. In this work, we propose an enhanced statistical interpolation method to provide water quality managers with statistically interpolated representations of spatial-temporal dynamics. Specifically, our proposal makes efficient use of the a priori available information of the quality parameter measurements through Support Vector Regression (SVR) based on Mercer’s kernels. The methods are benchmarked against previously proposed methods in three segments of the Machángara River and one segment of the San Pedro River in Ecuador, and their different dynamics are shown by statistically interpolated spatial-temporal maps. The best interpolation performance in terms of mean absolute error was obtained with the SVR using a Mercer kernel given by either the Mahalanobis spatial-temporal covariance matrix or by the bivariate estimated autocorrelation function. In particular, the autocorrelation kernel provides a significant improvement in estimation quality, consistently across all six water quality variables, which points out the relevance of including a priori knowledge of the problem. PMID:29035333

  6. Fluorimetric determination of some sulfur containing compounds through complex formation with terbium (Tb+3) and uranium (U+3).

    PubMed

    Taha, Elham Anwer; Hassan, Nagiba Yehya; Aal, Fahima Abdel; Fattah, Laila El-Sayed Abdel

    2007-05-01

    Two simple, sensitive and specific fluorimetric methods have been developed for the determination of some sulphur containing compounds, namely Acetylcysteine (Ac), Carbocisteine (Cc) and Thioctic acid (Th), using terbium Tb+3 and uranium U+3 ions as fluorescent probes. The proposed methods involve the formation of a ternary complex with Tb+3 in the presence of Tris buffer (method I) and a binary complex with aqueous uranyl acetate solution (method II). The fluorescence quenching of Tb+3 at 510, 488 and 540 nm (lambda(ex) 250, 241 and 268 nm) and of uranyl acetate at 512 nm (lambda(ex) 240 nm) due to the complex formation was quantitatively measured for Ac, Cc and Th, respectively. The reaction conditions and the fluorescence spectral properties of the complexes have been investigated. Under the described conditions, the proposed methods were applicable over the concentration ranges (0.2-2.5 microg ml(-1)), (1-4 microg ml(-1)) and (0.5-3.5 microg ml(-1)) with mean percentage recoveries 99.74+/-0.36, 99.70+/-0.52 and 99.43+/-0.23 for method (I) and (0.5-6 microg ml(-1)), (0.5-5 microg ml(-1)), and (1-6 microg ml(-1)) with mean percentage recoveries 99.38+/-0.20, 99.82+/-0.28 and 99.93+/-0.32 for method (II), for the three cited drugs, respectively. The proposed methods were successfully applied for the determination of the studied compounds in bulk powders and in pharmaceutical formulations, as well as in the presence of their related substances. The results obtained were found to agree statistically with those obtained by official and reported methods. The two methods were validated according to USP guidelines and also assessed by applying the standard addition technique.

  7. Cryptic or pseudocryptic: can morphological methods inform copepod taxonomy? An analysis of publications and a case study of the Eurytemora affinis species complex

    PubMed Central

    Lajus, Dmitry; Sukhikh, Natalia; Alekseev, Victor

    2015-01-01

    Interest in cryptic species has increased significantly with current progress in genetic methods. The large number of cryptic species suggests that the resolution of traditional morphological techniques may be insufficient for taxonomical research. However, some species now considered to be cryptic may, in fact, be designated pseudocryptic after close morphological examination. Thus the “cryptic or pseudocryptic” dilemma speaks to the resolution of morphological analysis and its utility for identifying species. We address this dilemma first by systematically reviewing data published from 1980 to 2013 on cryptic species of Copepoda and then by performing an in-depth morphological study of the former Eurytemora affinis complex of cryptic species. Analyzing the published data showed that, in 5 of 24 revisions eligible for systematic review, cryptic species assignment was based solely on the genetic variation of forms, without detailed morphological analysis to confirm the assignment. Therefore, some newly described cryptic species might be designated pseudocryptic under more detailed morphological analysis, as happened with the Eurytemora affinis complex. Recent genetic analyses of the complex found high levels of heterogeneity without morphological differences; the complex was therefore argued to be cryptic. However, subsequent detailed morphological analyses allowed a number of valid species to be described. Our study of this species complex, using in-depth statistical analyses not usually applied in species description, confirmed considerable differences between the former cryptic species. In particular, fluctuating asymmetry (FA), the random variation of left and right structures, was significantly different between forms and provided independent information about their status. Our work showed that multivariate statistical approaches, such as principal component analysis, can be powerful techniques for the morphological discrimination of cryptic taxa. Despite increasing cryptic species designations, morphological techniques have great potential in determining copepod taxonomy. PMID:26120427
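
    The multivariate-discrimination step can be sketched with PCA on simulated morphometric measurements of two hypothetical forms (not the Eurytemora data):

```python
import numpy as np
from sklearn.decomposition import PCA

# PCA on simulated morphometric measurements of two hypothetical forms.
rng = np.random.default_rng(7)
form_a = rng.multivariate_normal([5.0, 2.0, 1.2], 0.02 * np.eye(3), 100)
form_b = rng.multivariate_normal([5.2, 2.1, 1.1], 0.02 * np.eye(3), 100)
scores = PCA(n_components=2).fit_transform(np.vstack([form_a, form_b]))

# Separation of the group means along PC1, in units of its standard deviation
gap = abs(scores[:100, 0].mean() - scores[100:, 0].mean()) / scores[:, 0].std()
print(f"group separation on PC1: {gap:.2f} sd")
```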

  8. Method and system for knowledge discovery using non-linear statistical analysis and a 1st and 2nd tier computer program

    DOEpatents

    Hively, Lee M [Philadelphia, TN]

    2011-07-12

    The invention relates to a method and apparatus for simultaneously processing different sources of test data into informational data and then processing different categories of informational data into knowledge-based data. The knowledge-based data can then be communicated between nodes in a system of multiple computers according to rules for a type of complex, hierarchical computer system modeled on a human brain.

  9. Extractive colorimetric method for the determination of dothiepin hydrochloride and risperidone in pure and in dosage forms.

    PubMed

    Hassan, Wafaa El-Sayed

    2008-08-01

    Three rapid, simple, reproducible and sensitive extractive colorimetric methods (A-C) for assaying dothiepin hydrochloride (I) and risperidone (II) in bulk sample and in dosage forms were investigated. Methods A and B are based on the formation of ion-pair complexes with methyl orange (A) and orange G (B), whereas method C depends on ternary complex formation between cobalt thiocyanate and the studied drug I or II. The optimum reaction conditions were investigated, and it was observed that the calibration curves resulting from the measurements of absorbance-concentration relations of the extracted complexes were linear over the concentration range 0.1-12 microg ml(-1) for method A, 0.5-11 microg ml(-1) for method B, and 3.2-80 microg ml(-1) for method C, with a relative standard deviation (RSD) of 1.17 and 1.28 for drugs I and II, respectively. The molar absorptivity, Sandell sensitivity, Ringbom optimum concentration ranges, and detection and quantification limits for all complexes were calculated and evaluated at maximum wavelengths of 423, 498, and 625 nm, using methods A, B, and C, respectively. The interference from excipients commonly present in dosage forms and common degradation products was studied. The proposed methods are highly specific for the determination of drugs I and II in their dosage forms applying the standard additions technique without any interference from common excipients. The proposed methods have been compared statistically to the reference methods and found to be simple, accurate (t-test) and reproducible (F-value).

  10. Reconstructing Information in Large-Scale Structure via Logarithmic Mapping

    NASA Astrophysics Data System (ADS)

    Szapudi, Istvan

    We propose to develop a new method to extract information from large-scale structure data combining two-point statistics and non-linear transformations; before, this information was available only with substantially more complex higher-order statistical methods. Initially, most of the cosmological information in large-scale structure lies in two-point statistics. With non-linear evolution, some of that useful information leaks into higher-order statistics. The PI and group have shown in a series of theoretical investigations how that leakage occurs, and explained the Fisher information plateau at smaller scales. This plateau means that even as more modes are added to the measurement of the power spectrum, the total cumulative information (loosely speaking the inverse error bar) is not increasing. Recently we have shown in Neyrinck et al. (2009, 2010) that a logarithmic (and a related Gaussianization or Box-Cox) transformation on the non-linear Dark Matter or galaxy field reconstructs a surprisingly large fraction of this missing Fisher information of the initial conditions. This was predicted by the earlier wave mechanical formulation of gravitational dynamics by Szapudi & Kaiser (2003). The present proposal is focused on working out the theoretical underpinning of the method to a point that it can be used in practice to analyze data. In particular, one needs to deal with the usual real-life issues of galaxy surveys, such as complex geometry, discrete sampling (Poisson or sub-Poisson noise), bias (linear, or non-linear, deterministic, or stochastic), redshift distortions, projection effects for 2D samples, and the effects of photometric redshift errors. We will develop methods for weak lensing and Sunyaev-Zeldovich power spectra as well, the latter specifically targeting Planck. In addition, we plan to investigate the question of residual higher-order information after the non-linear mapping, and possible applications for cosmology. Our aim will be to work out practical methods, with the ultimate goal of cosmological parameter estimation. We will quantify with standard MCMC and Fisher methods (including the DETF Figure of Merit when applicable) the efficiency of our estimators, comparing with the conventional method that uses the untransformed field. Preliminary results indicate that the increase for NASA's WFIRST in the DETF Figure of Merit would be 1.5-4.2 using a range of pessimistic to optimistic assumptions, respectively.
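
    The essence of the logarithmic mapping is visible in one dimension: under a lognormal model for the overdensity delta, the transform log(1 + delta) restores Gaussian one-point statistics exactly. A toy sketch:

```python
import numpy as np
from scipy.stats import skew, kurtosis

# Under a lognormal model for the overdensity, log(1 + delta) restores
# Gaussian one-point statistics exactly (toy one-dimensional illustration).
rng = np.random.default_rng(8)
g = rng.normal(0.0, 0.8, 100_000)          # Gaussian initial field
delta = np.exp(g - 0.8**2 / 2) - 1.0       # lognormal overdensity, mean ~ 0
for name, f in [("delta", delta), ("log(1 + delta)", np.log1p(delta))]:
    print(f"{name:>15}: skew = {skew(f):+.2f}, excess kurtosis = {kurtosis(f):+.2f}")
```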

  11. Estimating statistical isotropy violation in CMB due to non-circular beam and complex scan in minutes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pant, Nidhi; Das, Santanu; Mitra, Sanjit

    Mild, unavoidable deviations from circular symmetry of instrumental beams along with scan strategy can give rise to measurable Statistical Isotropy (SI) violation in Cosmic Microwave Background (CMB) experiments. If not accounted for properly, this spurious signal can complicate the extraction of other SI violation signals (if any) in the data. However, estimation of this effect through exact numerical simulation is computationally intensive and time consuming. A generalized analytical formalism not only provides a quick way of estimating this signal, but also gives a detailed understanding connecting the leading beam anisotropy components to a measurable BipoSH characterisation of SI violation. In this paper, we provide an approximate generic analytical method for estimating the SI violation generated due to a non-circular (NC) beam and arbitrary scan strategy, in terms of the Bipolar Spherical Harmonic (BipoSH) spectra. Our analytical method can predict almost all the features introduced by a NC beam in a complex scan and thus reduces the need for extensive numerical simulation worth tens of thousands of CPU hours into minutes-long calculations. As an illustrative example, we use WMAP beams and scanning strategy to demonstrate the ease of use and efficiency of our method. We test all our analytical results against those from exact numerical simulations.

  12. Using assemblage data in ecological indicators: A comparison and evaluation of commonly available statistical tools

    USGS Publications Warehouse

    Smith, Joseph M.; Mather, Martha E.

    2012-01-01

    Ecological indicators are science-based tools used to assess how human activities have impacted environmental resources. For monitoring and environmental assessment, existing species assemblage data can be used to make these comparisons through time or across sites. An impediment to using assemblage data, however, is that these data are complex and need to be simplified in an ecologically meaningful way. Because multivariate statistics are mathematical relationships, statistical groupings may not make ecological sense and will not have utility as indicators. Our goal was to define a process to select defensible and ecologically interpretable statistical simplifications of assemblage data in which researchers and managers can have confidence. For this, we chose a suite of statistical methods, compared the groupings that resulted from these analyses, identified convergence among groupings, and then interpreted the groupings using species and ecological guilds. When we tested this approach using a statewide stream fish dataset, not all statistical methods worked equally well. For our dataset, logistic regression (Log), detrended correspondence analysis (DCA), cluster analysis (CL), and non-metric multidimensional scaling (NMDS) provided consistent, simplified output. Specifically, the Log, DCA, CL-1, and NMDS-1 groupings were ≥60% similar to each other, overlapped with the fluvial-specialist ecological guild, and contained a common subset of species. Groupings based on the number of species (e.g., Log, DCA, CL and NMDS) outperformed groupings based on abundance [e.g., principal components analysis (PCA) and Poisson regression]. Although the specific methods that worked on our test dataset have generality, here we are advocating a process (e.g., identifying convergent groupings with redundant species composition that are ecologically interpretable) rather than the automatic use of any single statistical tool. We summarize this process in step-by-step guidance for the future use of these commonly available ecological and statistical methods in preparing assemblage data for use in ecological indicators.
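
    The convergence check among groupings can be quantified with a similarity index; here the adjusted Rand index (a stand-in for the paper's percent-similarity comparison) on hypothetical assemblage data:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import adjusted_rand_score

# Compare site groupings from two methods on a site-by-species matrix
# (hypothetical assemblage data with three built-in site groups).
rng = np.random.default_rng(9)
centers = rng.uniform(0.0, 4.0, size=(3, 12))
abundance = np.vstack([rng.poisson(c, size=(15, 12)) for c in centers])

labels_km = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(abundance)
labels_hc = AgglomerativeClustering(n_clusters=3).fit_predict(abundance)
print("agreement between groupings (ARI):",
      adjusted_rand_score(labels_km, labels_hc))
```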

  13. Survivability Versus Time

    NASA Technical Reports Server (NTRS)

    Joyner, James J., Sr.

    2014-01-01

    Develop a Survivability-versus-Time model as a decision-evaluation tool to assess various emergency egress methods used at Launch Complex 39B (LC 39B) and in the Vehicle Assembly Building (VAB) at NASA's Kennedy Space Center. For each hazard scenario, develop probability distributions to address statistical uncertainty, resulting in survivability plots over time and composite survivability plots encompassing multiple hazard scenarios.

  14. Hunting Solomonoff's Swans: Exploring the Boundary Between Physics and Statistics in Hydrological Modeling

    NASA Astrophysics Data System (ADS)

    Nearing, G. S.

    2014-12-01

    Statistical models consistently out-perform conceptual models in the short term, however to account for a nonstationary future (or an unobserved past) scientists prefer to base predictions on unchanging and commutable properties of the universe - i.e., physics. The problem with physically-based hydrology models is, of course, that they aren't really based on physics - they are based on statistical approximations of physical interactions, and we almost uniformly lack an understanding of the entropy associated with these approximations. Thermodynamics is successful precisely because entropy statistics are computable for homogeneous (well-mixed) systems, and ergodic arguments explain the success of Newton's laws to describe systems that are fundamentally quantum in nature. Unfortunately, similar arguments do not hold for systems like watersheds that are heterogeneous at a wide range of scales. Ray Solomonoff formalized the situation in 1968 by showing that given infinite evidence, simultaneously minimizing model complexity and entropy in predictions always leads to the best possible model. The open question in hydrology is about what happens when we don't have infinite evidence - for example, when the future will not look like the past, or when one watershed does not behave like another. How do we isolate stationary and commutable components of watershed behavior? I propose that one possible answer to this dilemma lies in a formal combination of physics and statistics. In this talk I outline my recent analogue (Solomonoff's theorem was digital) of Solomonoff's idea that allows us to quantify the complexity/entropy tradeoff in a way that is intuitive to physical scientists. I show how to formally combine "physical" and statistical methods for model development in a way that allows us to derive the theoretically best possible model for any given physics approximation(s) and available observations. Finally, I apply an analogue of Solomonoff's theorem to evaluate the tradeoff between model complexity and prediction power.

  15. Multivariate statistical analysis software technologies for astrophysical research involving large data bases

    NASA Technical Reports Server (NTRS)

    Djorgovski, George

    1993-01-01

    The existing and forthcoming data bases from NASA missions contain an abundance of information whose complexity cannot be efficiently tapped with simple statistical techniques. Powerful multivariate statistical methods already exist which can be used to harness much of the richness of these data. Automatic classification techniques have been developed to solve the problem of identifying known types of objects in multiparameter data sets, in addition to leading to the discovery of new physical phenomena and classes of objects. We propose an exploratory study and integration of promising techniques in the development of a general and modular classification/analysis system for very large data bases, which would enhance and optimize data management and the use of human research resources.

  16. Multivariate statistical analysis software technologies for astrophysical research involving large data bases

    NASA Technical Reports Server (NTRS)

    Djorgovski, Stanislav

    1992-01-01

    The existing and forthcoming data bases from NASA missions contain an abundance of information whose complexity cannot be efficiently tapped with simple statistical techniques. Powerful multivariate statistical methods already exist which can be used to harness much of the richness of these data. Automatic classification techniques have been developed to solve the problem of identifying known types of objects in multiparameter data sets, in addition to leading to the discovery of new physical phenomena and classes of objects. We propose an exploratory study and integration of promising techniques in the development of a general and modular classification/analysis system for very large data bases, which would enhance and optimize data management and the use of human research resources.

  17. “Forward Genetics” as a Method to Maximize Power and Cost-Efficiency in Studies of Human Complex Traits

    PubMed Central

    Derks, E. M.; Dolan, C. V.; Kahn, R. S.; Ophoff, R. A.

    2010-01-01

    There is increasing interest in methods to disentangle the relationship between genotype and (endo)phenotypes in human complex traits. We present a population-based method of increasing the power and cost-efficiency of studies by selecting random individuals with a particular genotype and then assessing the accompanying quantitative phenotypes. Using statistical derivations and power and cost graphs, we show that such a “forward genetics” approach can lead to a marked reduction in sample size and costs. This approach is particularly apt for implementation in epidemiological studies for which DNA is already available but the phenotyping costs are high. PMID:20232132
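
    The power argument can be illustrated with statsmodels' power tools; the effect sizes below are illustrative assumptions, with genotype-based selection roughly doubling the standardized mean contrast:

```python
from statsmodels.stats.power import TTestIndPower

# Per-group sample size at 80% power, alpha = 0.05. Effect sizes are
# illustrative: selecting extreme genotype groups roughly doubles the
# standardized mean contrast relative to an unselected sample.
analysis = TTestIndPower()
for label, d in [("unselected sample", 0.2), ("selected genotype groups", 0.4)]:
    n = analysis.solve_power(effect_size=d, power=0.8, alpha=0.05)
    print(f"{label}: n = {n:.0f} per group")
```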

  18. Comparison of molecular mechanics-Poisson-Boltzmann surface area (MM-PBSA) and molecular mechanics-three-dimensional reference interaction site model (MM-3D-RISM) method to calculate the binding free energy of protein-ligand complexes: Effect of metal ion and advance statistical test

    NASA Astrophysics Data System (ADS)

    Pandey, Preeti; Srivastava, Rakesh; Bandyopadhyay, Pradipta

    2018-03-01

    The relative performance of the MM-PBSA and MM-3D-RISM methods to estimate the binding free energy of protein-ligand complexes is investigated by applying these to three proteins (Dihydrofolate Reductase, Catechol-O-methyltransferase, and Stromelysin-1) differing in the number of metal ions they contain. None of the computational methods could distinguish all the ligands based on their calculated binding free energies (as compared to experimental values). The difference between the two comes from both the polar and non-polar parts of solvation. For the charged-ligand case, MM-PBSA and MM-3D-RISM give qualitatively different results for the polar part of solvation.

  19. A family of variable step-size affine projection adaptive filter algorithms using statistics of channel impulse response

    NASA Astrophysics Data System (ADS)

    Shams Esfand Abadi, Mohammad; AbbasZadeh Arani, Seyed Ali Asghar

    2011-12-01

    This paper extends the recently introduced variable step-size (VSS) approach to the family of affine projection adaptive filter algorithms. This method uses prior knowledge of the channel impulse response statistics. Accordingly, the optimal step-size vector is obtained by minimizing the mean-square deviation (MSD). The presented algorithms are the VSS affine projection algorithm (VSS-APA), the VSS selective partial update NLMS (VSS-SPU-NLMS), the VSS-SPU-APA, and the VSS selective regressor APA (VSS-SR-APA). In the VSS-SPU adaptive algorithms the filter coefficients are partially updated, which reduces the computational complexity. In the VSS-SR-APA, the optimal selection of input regressors is performed during the adaptation. The presented algorithms combine good convergence speed, low steady-state mean square error (MSE), and low computational complexity. We demonstrate the good performance of the proposed algorithms through several simulations in a system identification scenario.
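
    The fixed step-size affine projection core that the VSS variants build on can be sketched as follows (the VSS extension, which replaces the scalar mu with an MSD-optimized step-size vector, is not reproduced):

```python
import numpy as np

def apa_identify(x, d, order=8, K=4, mu=0.5, delta=1e-3):
    """Affine projection algorithm for FIR system identification.
    x: input signal, d: desired output, K: projection order."""
    w = np.zeros(order)
    for n in range(order + K - 1, len(x)):
        # Last K input regressors, newest coefficient first: X is (K, order)
        X = np.array([x[n - k - order + 1:n - k + 1][::-1] for k in range(K)])
        e = d[n - np.arange(K)] - X @ w
        w += mu * X.T @ np.linalg.solve(X @ X.T + delta * np.eye(K), e)
    return w

# Identify a hypothetical 8-tap channel from noisy observations
rng = np.random.default_rng(10)
h = rng.normal(size=8)
x = rng.normal(size=5000)
d = np.convolve(x, h)[:x.size] + 0.01 * rng.normal(size=x.size)
print(np.round(apa_identify(x, d) - h, 3))   # residual misalignment
```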

  20. Massive parallelization of serial inference algorithms for a complex generalized linear model

    PubMed Central

    Suchard, Marc A.; Simpson, Shawn E.; Zorych, Ivan; Ryan, Patrick; Madigan, David

    2014-01-01

    Following a series of high-profile drug safety disasters in recent years, many countries are redoubling their efforts to ensure the safety of licensed medical products. Large-scale observational databases such as claims databases or electronic health record systems are attracting particular attention in this regard, but present significant methodological and computational concerns. In this paper we show how high-performance statistical computation, including graphics processing units, relatively inexpensive highly parallel computing devices, can enable complex methods in large databases. We focus on optimization and massive parallelization of cyclic coordinate descent approaches to fit a conditioned generalized linear model involving tens of millions of observations and thousands of predictors in a Bayesian context. We find orders-of-magnitude improvement in overall run-time. Coordinate descent approaches are ubiquitous in high-dimensional statistics and the algorithms we propose open up exciting new methodological possibilities with the potential to significantly improve drug safety. PMID:25328363
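
    The serial core that the paper parallelizes is a cyclic coordinate descent loop. A minimal single-threaded sketch for an L2-penalized logistic regression (a simple stand-in for the paper's conditioned GLM) is:

```python
import numpy as np

def ccd_logistic(X, y, lam=1.0, n_sweeps=50):
    """Cyclic coordinate descent for L2-penalized logistic regression.
    One Newton step per coordinate per sweep (y in {0, 1})."""
    n, p = X.shape
    w = np.zeros(p)
    eta = X @ w
    for _ in range(n_sweeps):
        for j in range(p):
            mu = 1.0 / (1.0 + np.exp(-eta))
            g = X[:, j] @ (mu - y) + lam * w[j]          # gradient
            h = X[:, j] ** 2 @ (mu * (1 - mu)) + lam     # Hessian diagonal
            step = g / h
            w[j] -= step
            eta -= step * X[:, j]                        # keep eta in sync
    return w

rng = np.random.default_rng(11)
X = rng.normal(size=(500, 10))
w_true = rng.normal(size=10)
y = (rng.random(500) < 1 / (1 + np.exp(-X @ w_true))).astype(float)
print(np.round(ccd_logistic(X, y) - w_true, 2))
```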

  1. Evolutionary dynamics of group interactions on structured populations: a review

    PubMed Central

    Perc, Matjaž; Gómez-Gardeñes, Jesús; Szolnoki, Attila; Floría, Luis M.; Moreno, Yamir

    2013-01-01

    Interactions among living organisms, from bacteria colonies to human societies, are inherently more complex than interactions among particles and non-living matter. Group interactions are a particularly important and widespread class, representative of which is the public goods game. In addition, methods of statistical physics have proved valuable for studying pattern formation, equilibrium selection and self-organization in evolutionary games. Here, we review recent advances in the study of evolutionary dynamics of group interactions on top of structured populations, including lattices, complex networks and coevolutionary models. We also compare these results with those obtained on well-mixed populations. The review particularly highlights that the study of the dynamics of group interactions, like several other important equilibrium and non-equilibrium dynamical processes in biological, economical and social sciences, benefits from the synergy between statistical physics, network science and evolutionary game theory. PMID:23303223

  2. Chemometrics Methods for Specificity, Authenticity and Traceability Analysis of Olive Oils: Principles, Classifications and Applications.

    PubMed

    Messai, Habib; Farman, Muhammad; Sarraj-Laabidi, Abir; Hammami-Semmar, Asma; Semmar, Nabil

    2016-11-17

    Olive oils (OOs) show high chemical variability due to several factors of genetic, environmental and anthropic types. Genetic and environmental factors are responsible for natural compositions and polymorphic diversification resulting in different varietal patterns and phenotypes. Anthropic factors, however, are at the origin of the preparation of different blends, leading to normative, labelled or adulterated commercial products. Control of complex OO samples requires their (i) characterization by specific markers; (ii) authentication by fingerprint patterns; and (iii) monitoring by traceability analysis. These quality control and management aims require the use of several multivariate statistical tools: specificity highlighting requires ordination methods; authentication checking calls for classification and pattern recognition methods; traceability analysis implies the use of network-based approaches able to separate or extract mixed information and memorized signals from complex matrices. This chapter presents a review of different chemometrics methods applied to the control of OO variability from metabolic and physical-chemical measured characteristics. The different chemometrics methods are illustrated by case studies on monovarietal and blended OOs originating from different countries. Chemometrics tools offer multiple ways for quantitative evaluations and qualitative control of the complex chemical variability of OO in relation to several intrinsic and extrinsic factors.

  3. Application of Bayesian inference to the study of hierarchical organization in self-organized complex adaptive systems

    NASA Astrophysics Data System (ADS)

    Knuth, K. H.

    2001-05-01

    We consider the application of Bayesian inference to the study of self-organized structures in complex adaptive systems. In particular, we examine the distribution of elements, agents, or processes in systems dominated by hierarchical structure. We demonstrate that results obtained by Caianiello [1] on Hierarchical Modular Systems (HMS) can be found by applying Jaynes' Principle of Group Invariance [2] to a few key assumptions about our knowledge of hierarchical organization. Subsequent application of the Principle of Maximum Entropy allows inferences to be made about specific systems. The utility of the Bayesian method is considered by examining both successes and failures of the hierarchical model. We discuss how Caianiello's original statements suffer from the Mind Projection Fallacy [3] and we restate his assumptions thus widening the applicability of the HMS model. The relationship between inference and statistical physics, described by Jaynes [4], is reiterated with the expectation that this realization will aid the field of complex systems research by moving away from often inappropriate direct application of statistical mechanics to a more encompassing inferential methodology.

  4. Shannon entropy in the research on stationary regimes and the evolution of complexity

    NASA Astrophysics Data System (ADS)

    Eskov, V. M.; Eskov, V. V.; Vochmina, Yu. V.; Gorbunov, D. V.; Ilyashenko, L. K.

    2017-05-01

    The questions of the identification of complex biological systems (complexity) as special self-organizing systems, or systems of the third type first defined by W. Weaver in 1948, continue to be of interest. No reports on the evaluation of entropy for systems of the third type were found among the publications currently available to the authors. The present study addresses the parameters of muscle biopotentials recorded using surface interference electromyography and presents the results of calculation of the Shannon entropy, autocorrelation functions, and statistical distribution functions for electromyograms of subjects in different physiological states (rest and tension of muscles). The results do not allow for statistically reliable discrimination between the functional states of muscles. However, the data obtained by calculating electromyogram quasi-attractor parameters and matrices of paired comparisons of electromyogram samples (calculation of the number k of "coinciding" pairs among the electromyogram samples) provide an integral characteristic that allows the identification of substantial differences between the state of rest and the different states of functional activity. Modifications and implementation of new methods in combination with the novel methods of the theory of chaos and self-organization are obviously essential. The stochastic approach paradigm is not applicable to systems of the third type due to continuous and chaotic changes of the parameters of the state vector x(t) of an organism or the contrasting constancy of these parameters (in the case of entropy).
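
    A histogram-based Shannon entropy of the kind computed for the electromyograms takes only a few lines (signals simulated here); echoing the paper's finding, the two states can come out with very similar entropies:

```python
import numpy as np

def signal_entropy(x, bins=32):
    """Shannon entropy (bits) of a signal's amplitude histogram."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(12)
rest = 0.1 * rng.normal(size=4096)           # low-amplitude "rest" signal
tension = rng.normal(size=4096) * (1 + 0.5 * np.sin(np.linspace(0, 20, 4096)))
print(f"rest: {signal_entropy(rest):.2f} bits,",
      f"tension: {signal_entropy(tension):.2f} bits")
```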

  5. Approximate median regression for complex survey data with skewed response.

    PubMed

    Fraser, Raphael André; Lipsitz, Stuart R; Sinha, Debajyoti; Fitzmaurice, Garrett M; Pan, Yi

    2016-12-01

    The ready availability of public-use data from various large national complex surveys has immense potential for the assessment of population characteristics using regression models. Complex surveys can be used to identify risk factors for important diseases such as cancer. Existing statistical methods based on estimating equations and/or utilizing resampling methods are often not valid with survey data due to complex survey design features, namely stratification, multistage sampling, and weighting. In this article, we accommodate these design features in the analysis of highly skewed response variables arising from large complex surveys. Specifically, we propose a double-transform-both-sides (DTBS)-based estimating equations approach to estimate the median regression parameters of the highly skewed response; the DTBS approach applies the same Box-Cox type transformation twice to both the outcome and regression function. The usual sandwich variance estimate can be used in our approach, whereas a resampling approach would be needed for a pseudo-likelihood based on minimizing absolute deviations (MAD). Furthermore, the approach is relatively robust to the true underlying distribution, and has much smaller mean square error than a MAD approach. The method is motivated by an analysis of laboratory data on urinary iodine (UI) concentration from the National Health and Nutrition Examination Survey.
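
    The paper's DTBS estimating equations are not reproduced here; for contrast, the MAD-based median-regression comparator can be run with statsmodels (ignoring the survey design features the DTBS approach is built to handle; data hypothetical):

```python
import numpy as np
import statsmodels.api as sm

# MAD-type median regression on a skewed response (hypothetical data);
# note this baseline ignores stratification, multistage sampling and weights.
rng = np.random.default_rng(13)
age = rng.uniform(20, 70, 300)
ui = np.exp(0.01 * age + rng.normal(0, 0.9, 300))   # skewed "UI" response

fit = sm.QuantReg(ui, sm.add_constant(age)).fit(q=0.5)
print(fit.params)    # median-regression intercept and slope
```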

  6. Approximate Median Regression for Complex Survey Data with Skewed Response

    PubMed Central

    Fraser, Raphael André; Lipsitz, Stuart R.; Sinha, Debajyoti; Fitzmaurice, Garrett M.; Pan, Yi

    2016-01-01

    The ready availability of public-use data from various large national complex surveys has immense potential for the assessment of population characteristics using regression models. Complex surveys can be used to identify risk factors for important diseases such as cancer. Existing statistical methods based on estimating equations and/or utilizing resampling methods are often not valid with survey data due to complex survey design features, namely stratification, multistage sampling and weighting. In this paper, we accommodate these design features in the analysis of highly skewed response variables arising from large complex surveys. Specifically, we propose a double-transform-both-sides (DTBS)-based estimating equations approach to estimate the median regression parameters of the highly skewed response; the DTBS approach applies the same Box-Cox type transformation twice to both the outcome and regression function. The usual sandwich variance estimate can be used in our approach, whereas a resampling approach would be needed for a pseudo-likelihood based on minimizing absolute deviations (MAD). Furthermore, the approach is relatively robust to the true underlying distribution, and has much smaller mean square error than a MAD approach. The method is motivated by an analysis of laboratory data on urinary iodine (UI) concentration from the National Health and Nutrition Examination Survey. PMID:27062562

  7. [Systems epidemiology].

    PubMed

    Huang, T; Li, L M

    2018-05-10

    The era of medical big data, translational medicine and precision medicine brings new opportunities for the study of the etiology of chronic complex diseases. How to implement evidence-based medicine, translational medicine and precision medicine is the challenge we are facing. Systems epidemiology, a new field of epidemiology, combines medical big data with systems biology; it examines statistical models of disease risk and the simulation and prediction of future risk using data at the molecular, cellular, population, social and ecological levels. Due to the diversity and complexity of big data sources, the development of study designs and analytic methods for systems epidemiology faces new challenges and opportunities. This paper summarizes the theoretical basis, concept, objectives, significance, research design and analytic methods of systems epidemiology and its application in the field of public health.

  8. A statistical learning strategy for closed-loop control of fluid flows

    NASA Astrophysics Data System (ADS)

    Guéniat, Florimond; Mathelin, Lionel; Hussaini, M. Yousuff

    2016-12-01

    This work discusses a closed-loop control strategy for complex systems utilizing scarce and streaming data. A discrete embedding space is first built using hash functions applied to the sensor measurements, from which a Markov process model is derived, approximating the complex system's dynamics. A control strategy is then learned using reinforcement learning once rewards relevant to the control objective are identified. This method is designed for experimental configurations, requiring neither computations nor prior knowledge of the system, and enjoys intrinsic robustness. It is illustrated on two systems: the control of the transitions of a Lorenz'63 dynamical system, and the control of the drag of a cylinder flow. The method is shown to perform well.
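
    The pipeline (discretize sensor readings, then learn a policy by reinforcement) can be miniaturized to a tabular Q-learning controller for a noisy scalar plant; all dynamics and parameters below are illustrative, not the paper's flow problems:

```python
import numpy as np

# Tabular Q-learning on a noisy scalar plant: the binned measurement is a
# stand-in for the hash-based discrete embedding; dynamics are illustrative.
rng = np.random.default_rng(14)
bins = np.linspace(-3.0, 3.0, 13)              # state discretization
actions = np.array([-0.5, 0.0, 0.5])
Q = np.zeros((bins.size + 1, actions.size))
alpha, gamma, eps = 0.1, 0.95, 0.1

x = 0.0
for step in range(200_000):
    s = int(np.digitize(x, bins))
    a = rng.integers(3) if rng.random() < eps else int(np.argmax(Q[s]))
    x = 0.95 * x + actions[a] + 0.2 * rng.normal()   # plant update
    r = -x * x                                       # reward: stay near zero
    s2 = int(np.digitize(x, bins))
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])

print("greedy action per state bin:", actions[np.argmax(Q, axis=1)])
```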

  9. Validated spectrophotometric methods for determination of sodium valproate based on charge transfer complexation reactions.

    PubMed

    Belal, Tarek S; El-Kafrawy, Dina S; Mahrous, Mohamed S; Abdel-Khalek, Magdi M; Abo-Gharam, Amira H

    2016-02-15

    This work presents the development, validation and application of four simple and direct spectrophotometric methods for determination of sodium valproate (VP) through charge transfer complexation reactions. The first method is based on the reaction of the drug with p-chloranilic acid (p-CA) in acetone to give a purple colored product with maximum absorbance at 524 nm. The second method depends on the reaction of VP with dichlone (DC) in dimethylformamide forming a reddish orange product measured at 490 nm. The third method is based upon the interaction of VP and picric acid (PA) in chloroform resulting in the formation of a yellow complex measured at 415 nm. The fourth method involves the formation of a yellow complex peaking at 361 nm upon the reaction of the drug with iodine in chloroform. Experimental conditions affecting the color development were studied and optimized. Stoichiometry of the reactions was determined. The proposed spectrophotometric procedures were effectively validated with respect to linearity, ranges, precision, accuracy, specificity, robustness, detection and quantification limits. Calibration curves of the formed color products with p-CA, DC, PA and iodine showed good linear relationships over the concentration ranges 24-144, 40-200, 2-20 and 1-8 μg/mL respectively. The proposed methods were successfully applied to the assay of sodium valproate in tablets and oral solution dosage forms with good accuracy and precision. Assay results were statistically compared to a reference pharmacopoeial HPLC method where no significant differences were observed between the proposed methods and reference method.

  10. Validated spectrophotometric methods for determination of sodium valproate based on charge transfer complexation reactions

    NASA Astrophysics Data System (ADS)

    Belal, Tarek S.; El-Kafrawy, Dina S.; Mahrous, Mohamed S.; Abdel-Khalek, Magdi M.; Abo-Gharam, Amira H.

    2016-02-01

    This work presents the development, validation and application of four simple and direct spectrophotometric methods for determination of sodium valproate (VP) through charge transfer complexation reactions. The first method is based on the reaction of the drug with p-chloranilic acid (p-CA) in acetone to give a purple colored product with maximum absorbance at 524 nm. The second method depends on the reaction of VP with dichlone (DC) in dimethylformamide forming a reddish orange product measured at 490 nm. The third method is based upon the interaction of VP and picric acid (PA) in chloroform resulting in the formation of a yellow complex measured at 415 nm. The fourth method involves the formation of a yellow complex peaking at 361 nm upon the reaction of the drug with iodine in chloroform. Experimental conditions affecting the color development were studied and optimized. Stoichiometry of the reactions was determined. The proposed spectrophotometric procedures were effectively validated with respect to linearity, ranges, precision, accuracy, specificity, robustness, detection and quantification limits. Calibration curves of the formed color products with p-CA, DC, PA and iodine showed good linear relationships over the concentration ranges 24-144, 40-200, 2-20 and 1-8 μg/mL respectively. The proposed methods were successfully applied to the assay of sodium valproate in tablets and oral solution dosage forms with good accuracy and precision. Assay results were statistically compared to a reference pharmacopoeial HPLC method where no significant differences were observed between the proposed methods and reference method.

  11. Can’t Count or Won’t Count? Embedding Quantitative Methods in Substantive Sociology Curricula: A Quasi-Experiment

    PubMed Central

    Williams, Malcolm; Sloan, Luke; Cheung, Sin Yi; Sutton, Carole; Stevens, Sebastian; Runham, Libby

    2015-01-01

    This paper reports on a quasi-experiment in which quantitative methods (QM) are embedded within a substantive sociology module. Through measuring student attitudes before and after the intervention alongside control group comparisons, we illustrate the impact that embedding has on the student experience. Our findings are complex and even contradictory. Whilst the experimental group were less likely to be distrustful of statistics and appreciate how QM inform social research, they were also less confident about their statistical abilities, suggesting that through ‘doing’ quantitative sociology the experimental group are exposed to the intricacies of method and their optimism about their own abilities is challenged. We conclude that embedding QM in a single substantive module is not a ‘magic bullet’ and that a wider programme of content and assessment diversification across the curriculum is preferential. PMID:27330225

  12. Extractive-spectrophotometric determination of disopyramide and irbesartan in their pharmaceutical formulation

    NASA Astrophysics Data System (ADS)

    Abdellatef, Hisham E.

    2007-04-01

    Picric acid, bromocresol green, bromothymol blue, cobalt thiocyanate and molybdenum(V) thiocyanate have been tested as spectrophotometric reagents for the determination of disopyramide and irbesartan. Reaction conditions have been optimized to obtain coloured complexes with higher sensitivity and longer stability. The absorbance of the ion-pair complexes formed was found to increase linearly with increasing concentrations of disopyramide and irbesartan, as corroborated by correlation coefficient values. The developed methods have been successfully applied for the determination of disopyramide and irbesartan in bulk drugs and pharmaceutical formulations. The common excipients and additives did not interfere in their determination. The results obtained by the proposed methods have been statistically compared by means of the Student's t-test and the variance ratio F-test. The validity was assessed by applying the standard addition technique. The results were compared statistically with the official or reference methods, showing good agreement with high precision and accuracy.
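
    The statistical comparison described, Student's t-test for means together with the variance-ratio F-test, can be sketched in Python as follows; the percent-recovery values are hypothetical placeholders, not the study's data.

      import numpy as np
      from scipy import stats

      # Hypothetical % recovery values for the proposed and reference methods
      proposed = np.array([99.2, 100.5, 98.7, 101.1, 99.8, 100.2])
      reference = np.array([99.5, 100.1, 99.0, 100.8, 99.9])

      # Student's t-test for equality of means
      t_stat, t_p = stats.ttest_ind(proposed, reference)

      # Variance-ratio F-test (larger sample variance in the numerator)
      s1, s2 = proposed.var(ddof=1), reference.var(ddof=1)
      f_stat = max(s1, s2) / min(s1, s2)
      df1 = (len(proposed) if s1 >= s2 else len(reference)) - 1
      df2 = (len(reference) if s1 >= s2 else len(proposed)) - 1
      f_p = 2 * stats.f.sf(f_stat, df1, df2)  # two-sided

      print(f"t = {t_stat:.3f} (p = {t_p:.3f}), F = {f_stat:.3f} (p = {f_p:.3f})")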

  13. Parameter Estimation and Model Validation of Nonlinear Dynamical Networks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Abarbanel, Henry; Gill, Philip

    In the performance period of this work under a DOE contract, the co-PIs, Philip Gill and Henry Abarbanel, developed new methods for statistical data assimilation for problems of DOE interest, including geophysical and biological problems. This included numerical optimization algorithms for variational principles and new parallel processing Monte Carlo routines for performing the path integrals of statistical data assimilation. These results have been summarized in the monograph: “Predicting the Future: Completing Models of Observed Complex Systems” by Henry Abarbanel, published by Springer-Verlag in June 2013. Additional results and details have appeared in the peer reviewed literature.

  14. An Investigation of the Fine Spatial Structure of Meteor Streams Using the Relational Database ``Meteor''

    NASA Astrophysics Data System (ADS)

    Karpov, A. V.; Yumagulov, E. Z.

    2003-05-01

    We have restored and ordered the archive of meteor observations carried out with a meteor radar complex ``KGU-M5'' since 1986. A relational database has been formed under the control of the Database Management System (DBMS) Oracle 8. We also improved and tested a statistical method for studying the fine spatial structure of meteor streams with allowance for the specific features of application of the DBMS. Statistical analysis of the results of observations made it possible to obtain information about the substance distribution in the Quadrantid, Geminid, and Perseid meteor streams.

  15. Systems genetics approaches to understand complex traits

    PubMed Central

    Civelek, Mete; Lusis, Aldons J.

    2014-01-01

    Systems genetics is an approach to understand the flow of biological information that underlies complex traits. It uses a range of experimental and statistical methods to quantitate and integrate intermediate phenotypes, such as transcript, protein or metabolite levels, in populations that vary for traits of interest. Systems genetics studies have provided the first global view of the molecular architecture of complex traits and are useful for the identification of genes, pathways and networks that underlie common human diseases. Given the urgent need to understand how the thousands of loci that have been identified in genome-wide association studies contribute to disease susceptibility, systems genetics is likely to become an increasingly important approach to understanding both biology and disease. PMID:24296534

  16. A Complex Systems Approach to Causal Discovery in Psychiatry.

    PubMed

    Saxe, Glenn N; Statnikov, Alexander; Fenyo, David; Ren, Jiwen; Li, Zhiguo; Prasad, Meera; Wall, Dennis; Bergman, Nora; Briggs, Ernestine C; Aliferis, Constantin

    2016-01-01

    Conventional research methodologies and data analytic approaches in psychiatric research are unable to reliably infer causal relations without experimental designs, or to make inferences about the functional properties of the complex systems in which psychiatric disorders are embedded. This article describes a series of studies to validate a novel hybrid computational approach--the Complex Systems-Causal Network (CS-CN) method--designed to integrate causal discovery within a complex systems framework for psychiatric research. The CS-CN method was first applied to an existing dataset on psychopathology in 163 children hospitalized with injuries (validation study). Next, it was applied to a much larger dataset of traumatized children (replication study). Finally, the CS-CN method was applied in a controlled experiment using a 'gold standard' dataset for causal discovery and compared with other methods for accurately detecting causal variables (resimulation controlled experiment). The CS-CN method successfully detected a causal network of 111 variables and 167 bivariate relations in the initial validation study. This causal network had well-defined adaptive properties and a set of variables was found that disproportionally contributed to these properties. Modeling the removal of these variables resulted in significant loss of adaptive properties. The CS-CN method was successfully applied in the replication study and performed better than traditional statistical methods, and similarly to state-of-the-art causal discovery algorithms in the causal detection experiment. The CS-CN method was validated, replicated, and yielded both novel and previously validated findings related to risk factors and potential treatments of psychiatric disorders. The novel approach yields both fine-grain (micro) and high-level (macro) insights and thus represents a promising approach for complex systems-oriented research in psychiatry.

  17. Effectiveness of public deliberation methods for gathering input on issues in healthcare: Results from a randomized trial.

    PubMed

    Carman, Kristin L; Mallery, Coretta; Maurer, Maureen; Wang, Grace; Garfinkel, Steve; Yang, Manshu; Gilmore, Dierdre; Windham, Amy; Ginsburg, Marjorie; Sofaer, Shoshanna; Gold, Marthe; Pathak-Sen, Ela; Davies, Todd; Siegel, Joanna; Mangrum, Rikki; Fernandez, Jessica; Richmond, Jennifer; Fishkin, James; Siu Chao, Alice

    2015-05-01

    Public deliberation elicits informed perspectives on complex issues that are values-laden and lack technical solutions. This Deliberative Methods Demonstration examined the effectiveness of public deliberation for obtaining informed public input regarding the role of medical evidence in U.S. healthcare. We conducted a 5-arm randomized controlled trial, assigning participants to one of four deliberative methods or to a reading materials only (RMO) control group. The four deliberative methods reflected important differences in implementation, including length of the deliberative process and mode of interaction. The project convened 76 groups between August and November 2012 in four U.S. cities: Chicago, IL; Sacramento, CA; Silver Spring, MD; and Durham, NC, capturing a sociodemographically diverse sample with specific attention to ensuring inclusion of Hispanic, African-American, and elderly participants. Of 1774 people recruited, 75% participated: 961 took part in a deliberative method and 377 participants comprised the RMO control group. To assess effectiveness of the deliberative methods overall and of individual methods, we evaluated whether mean pre-post changes on a knowledge and attitude survey were statistically different from the RMO control using ANCOVA. In addition, we calculated mean scores capturing participant views of the impact and value of deliberation. Participating in deliberation increased participants' knowledge of evidence and comparative effectiveness research and shifted participants' attitudes regarding the role of evidence in decision-making. When comparing each deliberative method to the RMO control group, all four deliberative methods resulted in statistically significant change on at least one knowledge or attitude measure. These findings were underscored by self-reports that the experience affected participants' opinions. Public deliberation offers unique potential for those seeking informed input on complex, values-laden topics affecting broad public constituencies. Copyright © 2015 Elsevier Ltd. All rights reserved.
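
    The core analysis described, an ANCOVA of post-scores adjusted for pre-scores with a group term, can be sketched with statsmodels as below; the scores, group labels, and sample size are hypothetical stand-ins for the trial data.

      import pandas as pd
      import statsmodels.formula.api as smf

      # Hypothetical knowledge scores before/after, by study arm
      df = pd.DataFrame({
          "pre":   [52, 47, 60, 55, 49, 58, 51, 46, 59, 54],
          "post":  [68, 63, 74, 70, 55, 61, 57, 52, 64, 60],
          "group": ["deliberation"] * 5 + ["control"] * 5,
      })

      # ANCOVA: post-score adjusted for pre-score, testing the group effect
      model = smf.ols("post ~ pre + C(group)", data=df).fit()
      print(model.summary())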

  18. A Systematic Comparison of Linear Regression-Based Statistical Methods to Assess Exposome-Health Associations.

    PubMed

    Agier, Lydiane; Portengen, Lützen; Chadeau-Hyam, Marc; Basagaña, Xavier; Giorgis-Allemand, Lise; Siroux, Valérie; Robinson, Oliver; Vlaanderen, Jelle; González, Juan R; Nieuwenhuijsen, Mark J; Vineis, Paolo; Vrijheid, Martine; Slama, Rémy; Vermeulen, Roel

    2016-12-01

    The exposome constitutes a promising framework to improve understanding of the effects of environmental exposures on health by explicitly considering multiple testing and avoiding selective reporting. However, exposome studies are challenged by the simultaneous consideration of many correlated exposures. We compared the performances of linear regression-based statistical methods in assessing exposome-health associations. In a simulation study, we generated 237 exposure covariates with a realistic correlation structure and with a health outcome linearly related to 0 to 25 of these covariates. Statistical methods were compared primarily in terms of false discovery proportion (FDP) and sensitivity. On average over all simulation settings, the elastic net and sparse partial least-squares regression showed a sensitivity of 76% and an FDP of 44%; Graphical Unit Evolutionary Stochastic Search (GUESS) and the deletion/substitution/addition (DSA) algorithm revealed a sensitivity of 81% and an FDP of 34%. The environment-wide association study (EWAS) underperformed these methods in terms of FDP (average FDP, 86%) despite a higher sensitivity. Performances decreased considerably when assuming an exposome exposure matrix with high levels of correlation between covariates. Correlation between exposures is a challenge for exposome research, and the statistical methods investigated in this study were limited in their ability to efficiently differentiate true predictors from correlated covariates in a realistic exposome context. Although GUESS and DSA provided a marginally better balance between sensitivity and FDP, they did not outperform the other multivariate methods across all scenarios and properties examined, and computational complexity and flexibility should also be considered when choosing between these methods. Citation: Agier L, Portengen L, Chadeau-Hyam M, Basagaña X, Giorgis-Allemand L, Siroux V, Robinson O, Vlaanderen J, González JR, Nieuwenhuijsen MJ, Vineis P, Vrijheid M, Slama R, Vermeulen R. 2016. A systematic comparison of linear regression-based statistical methods to assess exposome-health associations. Environ Health Perspect 124:1848-1856; http://dx.doi.org/10.1289/EHP172.
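
    A minimal sketch of the kind of simulation the study performs, at toy scale (50 correlated exposures rather than 237), using scikit-learn's cross-validated elastic net and scoring the selection by sensitivity and FDP; every setting here is an illustrative assumption.

      import numpy as np
      from sklearn.linear_model import ElasticNetCV

      rng = np.random.default_rng(0)
      n, p, n_true = 200, 50, 5

      # Correlated exposures via an AR(1)-style covariance (rho = 0.7)
      cov = 0.7 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
      X = rng.multivariate_normal(np.zeros(p), cov, size=n)

      beta = np.zeros(p)
      true_idx = rng.choice(p, n_true, replace=False)
      beta[true_idx] = 1.0
      y = X @ beta + rng.normal(scale=2.0, size=n)

      enet = ElasticNetCV(l1_ratio=0.5, cv=5).fit(X, y)
      selected = np.flatnonzero(enet.coef_)

      tp = len(set(selected) & set(true_idx))
      print(f"sensitivity = {tp / n_true:.2f}, "
            f"FDP = {(len(selected) - tp) / max(len(selected), 1):.2f}")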

  19. Robust approaches to quantification of margin and uncertainty for sparse data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hund, Lauren; Schroeder, Benjamin B.; Rumsey, Kelin

    Characterizing the tails of probability distributions plays a key role in quantification of margins and uncertainties (QMU), where the goal is characterization of low probability, high consequence events based on continuous measures of performance. When data are collected using physical experimentation, probability distributions are typically fit using statistical methods based on the collected data, and these parametric distributional assumptions are often used to extrapolate about the extreme tail behavior of the underlying probability distribution. In this project, we characterize the risk associated with such tail extrapolation. Specifically, we conducted a scaling study to demonstrate the large magnitude of the risk; then, we developed new methods for communicating risk associated with tail extrapolation from unvalidated statistical models; lastly, we proposed a Bayesian data-integration framework to mitigate tail extrapolation risk through integrating additional information. We conclude that decision-making using QMU is a complex process that cannot be achieved using statistical analyses alone.
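
    The tail-extrapolation risk described above can be demonstrated in a few lines: fit a convenient parametric model to sparse data, then compare its extrapolated tail probability with the truth. The lognormal/normal pairing and sample size below are arbitrary illustrative choices, not the project's models.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(1)

      # True performance measure is lognormal; only 30 samples are observed
      data = rng.lognormal(mean=0.0, sigma=0.5, size=30)

      # Analyst fits a normal distribution and extrapolates into the tail
      mu, sd = stats.norm.fit(data)
      threshold = stats.lognorm(s=0.5).ppf(0.999)   # true 99.9th percentile

      p_true = stats.lognorm(s=0.5).sf(threshold)   # exactly 1e-3
      p_fitted = stats.norm.sf(threshold, mu, sd)
      print(f"true exceedance: {p_true:.1e}, extrapolated: {p_fitted:.1e}")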

  20. Experimental Determination of Dynamical Lee-Yang Zeros

    NASA Astrophysics Data System (ADS)

    Brandner, Kay; Maisi, Ville F.; Pekola, Jukka P.; Garrahan, Juan P.; Flindt, Christian

    2017-05-01

    Statistical physics provides the concepts and methods to explain the phase behavior of interacting many-body systems. Investigations of Lee-Yang zeros—complex singularities of the free energy in systems of finite size—have led to a unified understanding of equilibrium phase transitions. The ideas of Lee and Yang, however, are not restricted to equilibrium phenomena. Recently, Lee-Yang zeros have been used to characterize nonequilibrium processes such as dynamical phase transitions in quantum systems after a quench or dynamic order-disorder transitions in glasses. Here, we experimentally realize a scheme for determining Lee-Yang zeros in such nonequilibrium settings. We extract the dynamical Lee-Yang zeros of a stochastic process involving Andreev tunneling between a normal-state island and two superconducting leads from measurements of the dynamical activity along a trajectory. From the short-time behavior of the Lee-Yang zeros, we predict the large-deviation statistics of the activity which is typically difficult to measure. Our method paves the way for further experiments on the statistical mechanics of many-body systems out of equilibrium.

  1. Frequent statistics of link-layer bit stream data based on AC-IM algorithm

    NASA Astrophysics Data System (ADS)

    Cao, Chenghong; Lei, Yingke; Xu, Yiming

    2017-08-01

    At present, there is much research on data processing using classical pattern matching and its improved algorithms, but little on frequent statistics over link-layer bit stream data. Because classical multi-pattern matching algorithms such as the AC algorithm have high computational complexity and low efficiency, and cannot be applied to binary bit stream data, this paper adopts a frequent statistical method for link-layer bit stream data based on the AC-IM algorithm. The method's maximum jump distance over the pattern tree is the length of the shortest pattern string plus 3, without missing any matches. In this paper, theoretical analysis is first made of the principle of the algorithm's construction, and then experimental results show that the algorithm can adapt to the binary bit stream data environment and extract frequent sequences accurately. Meanwhile, compared with the classical AC algorithm and other improved algorithms, the AC-IM algorithm has a greater maximum jump distance and is less time-consuming.
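
    The AC-IM algorithm itself is not reproduced here; as a naive baseline for frequent-pattern statistics over a bit stream, overlapping fixed-length substrings can be counted with a sliding window, as in this sketch (the stream and thresholds are made up).

      from collections import Counter

      def frequent_patterns(bits: str, length: int, min_count: int) -> dict:
          """Count overlapping substrings of a given length in a 0/1 stream."""
          counts = Counter(bits[i:i + length]
                           for i in range(len(bits) - length + 1))
          return {pat: c for pat, c in counts.items() if c >= min_count}

      stream = "0110100110010110" * 50   # hypothetical link-layer bit stream
      print(frequent_patterns(stream, length=8, min_count=20))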

  2. Statistical Optics

    NASA Astrophysics Data System (ADS)

    Goodman, Joseph W.

    2000-07-01

    The Wiley Classics Library consists of selected books that have become recognized classics in their respective fields. With these new unabridged and inexpensive editions, Wiley hopes to extend the life of these important works by making them available to future generations of mathematicians and scientists. Currently available in the Series: T. W. Anderson The Statistical Analysis of Time Series T. S. Arthanari & Yadolah Dodge Mathematical Programming in Statistics Emil Artin Geometric Algebra Norman T. J. Bailey The Elements of Stochastic Processes with Applications to the Natural Sciences Robert G. Bartle The Elements of Integration and Lebesgue Measure George E. P. Box & Norman R. Draper Evolutionary Operation: A Statistical Method for Process Improvement George E. P. Box & George C. Tiao Bayesian Inference in Statistical Analysis R. W. Carter Finite Groups of Lie Type: Conjugacy Classes and Complex Characters R. W. Carter Simple Groups of Lie Type William G. Cochran & Gertrude M. Cox Experimental Designs, Second Edition Richard Courant Differential and Integral Calculus, Volume I RIchard Courant Differential and Integral Calculus, Volume II Richard Courant & D. Hilbert Methods of Mathematical Physics, Volume I Richard Courant & D. Hilbert Methods of Mathematical Physics, Volume II D. R. Cox Planning of Experiments Harold S. M. Coxeter Introduction to Geometry, Second Edition Charles W. Curtis & Irving Reiner Representation Theory of Finite Groups and Associative Algebras Charles W. Curtis & Irving Reiner Methods of Representation Theory with Applications to Finite Groups and Orders, Volume I Charles W. Curtis & Irving Reiner Methods of Representation Theory with Applications to Finite Groups and Orders, Volume II Cuthbert Daniel Fitting Equations to Data: Computer Analysis of Multifactor Data, Second Edition Bruno de Finetti Theory of Probability, Volume I Bruno de Finetti Theory of Probability, Volume 2 W. Edwards Deming Sample Design in Business Research

  3. Implementation of cross correlation for energy discrimination on the time-of-flight spectrometer CORELLI.

    PubMed

    Ye, Feng; Liu, Yaohua; Whitfield, Ross; Osborn, Ray; Rosenkranz, Stephan

    2018-04-01

    The CORELLI instrument at Oak Ridge National Laboratory is a statistical chopper spectrometer designed and optimized to probe complex disorder in crystalline materials through diffuse scattering experiments. On CORELLI, the high efficiency of white-beam Laue diffraction combined with elastic discrimination have enabled an unprecedented data collection rate to obtain both the total and the elastic-only scattering over a large volume of reciprocal space from a single measurement. To achieve this, CORELLI is equipped with a statistical chopper to modulate the incoming neutron beam quasi-randomly, and then the cross-correlation method is applied to reconstruct the elastic component from the scattering data. Details of the implementation of the cross-correlation method on CORELLI are given and its performance is discussed.
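
    The cross-correlation idea can be shown on synthetic data: modulate a "spectrum" with a maximal-length pseudo-random 0/1 sequence, then correlate the measured counts against the +/-1 version of the sequence to recover the elastic component. This toy sketch is not the CORELLI implementation; the sequence length, noise level, and single elastic line are assumptions.

      import numpy as np

      def m_sequence(length=31):
          """Maximal-length 0/1 sequence from a small degree-5 LFSR."""
          state = [1, 0, 0, 0, 0]
          out = []
          for _ in range(length):
              out.append(state[-1])
              state = [state[4] ^ state[1]] + state[:-1]
          return np.array(out)

      chopper = m_sequence()                    # quasi-random open/closed pattern
      spectrum = np.zeros(31)
      spectrum[7] = 100.0                       # elastic line in one time bin

      # Detector sees the circular convolution of spectrum and chopper, plus noise
      measured = np.real(np.fft.ifft(np.fft.fft(spectrum) * np.fft.fft(chopper)))
      measured += np.random.default_rng(3).normal(scale=1.0, size=31)

      # Cross-correlate with the +/-1 code; the m-sequence autocorrelation makes
      # off-peak terms cancel, recovering the spectrum up to a constant scale
      code = 2 * chopper - 1
      recon = np.array([measured @ np.roll(code, k) for k in range(31)])
      print(np.argmax(recon))                   # should recover time bin 7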

  4. Implementation of cross correlation for energy discrimination on the time-of-flight spectrometer CORELLI

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ye, Feng; Liu, Yaohua; Whitfield, Ross

    The CORELLI instrument at Oak Ridge National Laboratory is a statistical chopper spectrometer designed and optimized to probe complex disorder in crystalline materials through diffuse scattering experiments. On CORELLI, the high efficiency of white-beam Laue diffraction combined with elastic discrimination have enabled an unprecedented data collection rate to obtain both the total and the elastic-only scattering over a large volume of reciprocal space from a single measurement. To achieve this, CORELLI is equipped with a statistical chopper to modulate the incoming neutron beam quasi-randomly, and then the cross-correlation method is applied to reconstruct the elastic component from the scattering data. Lastly, details of the implementation of the cross-correlation method on CORELLI are given and its performance is discussed.

  5. Implementation of cross correlation for energy discrimination on the time-of-flight spectrometer CORELLI

    DOE PAGES

    Ye, Feng; Liu, Yaohua; Whitfield, Ross; ...

    2018-03-26

    The CORELLI instrument at Oak Ridge National Laboratory is a statistical chopper spectrometer designed and optimized to probe complex disorder in crystalline materials through diffuse scattering experiments. On CORELLI, the high efficiency of white-beam Laue diffraction combined with elastic discrimination have enabled an unprecedented data collection rate to obtain both the total and the elastic-only scattering over a large volume of reciprocal space from a single measurement. To achieve this, CORELLI is equipped with a statistical chopper to modulate the incoming neutron beam quasi-randomly, and then the cross-correlation method is applied to reconstruct the elastic component from the scattering data. Lastly, details of the implementation of the cross-correlation method on CORELLI are given and its performance is discussed.

  6. Chemometric and multivariate statistical analysis of time-of-flight secondary ion mass spectrometry spectra from complex Cu-Fe sulfides.

    PubMed

    Kalegowda, Yogesh; Harmer, Sarah L

    2012-03-20

    Time-of-flight secondary ion mass spectrometry (TOF-SIMS) spectra of mineral samples are complex, comprised of large mass ranges and many peaks. Consequently, characterization and classification analysis of these systems is challenging. In this study, different chemometric and statistical data evaluation methods, based on monolayer sensitive TOF-SIMS data, have been tested for the characterization and classification of copper-iron sulfide minerals (chalcopyrite, chalcocite, bornite, and pyrite) at different flotation pulp conditions (feed, conditioned feed, and Eh modified). The complex mass spectral data sets were analyzed using the following chemometric and statistical techniques: principal component analysis (PCA); principal component-discriminant functional analysis (PC-DFA); soft independent modeling of class analogy (SIMCA); and k-Nearest Neighbor (k-NN) classification. PCA was found to be an important first step in multivariate analysis, providing insight into both the relative grouping of samples and the elemental/molecular basis for those groupings. For samples exposed to oxidative conditions (at Eh ~430 mV), each technique (PCA, PC-DFA, SIMCA, and k-NN) was found to produce excellent classification. For samples at reductive conditions (at Eh ~ -200 mV SHE), k-NN and SIMCA produced the most accurate classification. Phase identification of particles that contain the same elements but a different crystal structure in a mixed multimetal mineral system has been achieved.
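
    A generic version of the PCA-plus-k-NN workflow can be sketched with scikit-learn as below; synthetic data stand in for the TOF-SIMS spectra, and the component and neighbor counts are arbitrary choices.

      from sklearn.datasets import make_classification
      from sklearn.decomposition import PCA
      from sklearn.model_selection import cross_val_score
      from sklearn.neighbors import KNeighborsClassifier
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler

      # Stand-in for spectra: 200 "samples" x 500 "mass channels", 4 mineral classes
      X, y = make_classification(n_samples=200, n_features=500,
                                 n_informative=20, n_classes=4, random_state=0)

      clf = make_pipeline(StandardScaler(), PCA(n_components=10),
                          KNeighborsClassifier(n_neighbors=5))
      print(f"cross-validated accuracy: {cross_val_score(clf, X, y, cv=5).mean():.2f}")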

  7. Statistical complexity without explicit reference to underlying probabilities

    NASA Astrophysics Data System (ADS)

    Pennini, F.; Plastino, A.

    2018-06-01

    We show that extremely simple systems of a not too large number of particles can be simultaneously thermally stable and complex. To such an end, we extend the statistical complexity's notion to simple configurations of non-interacting particles, without appeal to probabilities, and discuss configurational properties.

  8. Maximizing information exchange between complex networks

    NASA Astrophysics Data System (ADS)

    West, Bruce J.; Geneston, Elvis L.; Grigolini, Paolo

    2008-10-01

    Science is not merely the smooth progressive interaction of hypothesis, experiment and theory, although it sometimes has that form. More realistically the scientific study of any given complex phenomenon generates a number of explanations, from a variety of perspectives, that eventually requires synthesis to achieve a deep level of insight and understanding. One such synthesis has created the field of out-of-equilibrium statistical physics as applied to the understanding of complex dynamic networks. Over the past forty years the concept of complexity has undergone a metamorphosis. Complexity was originally seen as a consequence of memory in individual particle trajectories, in full agreement with a Hamiltonian picture of microscopic dynamics and, in principle, macroscopic dynamics could be derived from the microscopic Hamiltonian picture. The main difficulty in deriving macroscopic dynamics from microscopic dynamics is the need to take into account the actions of a very large number of components. The existence of events such as abrupt jumps, considered by the conventional continuous time random walk approach to describing complexity was never perceived as conflicting with the Hamiltonian view. Herein we review many of the reasons why this traditional Hamiltonian view of complexity is unsatisfactory. We show that as a result of technological advances, which make the observation of single elementary events possible, the definition of complexity has shifted from the conventional memory concept towards the action of non-Poisson renewal events. We show that the observation of crucial processes, such as the intermittent fluorescence of blinking quantum dots as well as the brain’s response to music, as monitored by a set of electrodes attached to the scalp, has forced investigators to go beyond the traditional concept of complexity and to establish closer contact with the nascent field of complex networks. Complex networks form one of the most challenging areas of modern research overarching all of the traditional scientific disciplines. The transportation networks of planes, highways and railroads; the economic networks of global finance and stock markets; the social networks of terrorism, governments, businesses and churches; the physical networks of telephones, the Internet, earthquakes and global warming and the biological networks of gene regulation, the human body, clusters of neurons and food webs, share a number of apparently universal properties as the networks become increasingly complex. Ubiquitous aspects of such complex networks are the appearance of non-stationary and non-ergodic statistical processes and inverse power-law statistical distributions. Herein we review the traditional dynamical and phase-space methods for modeling such networks as their complexity increases and focus on the limitations of these procedures in explaining complex networks. Of course we will not be able to review the entire nascent field of network science, so we limit ourselves to a review of how certain complexity barriers have been surmounted using newly applied theoretical concepts such as aging, renewal, non-ergodic statistics and the fractional calculus. One emphasis of this review is information transport between complex networks, which requires a fundamental change in perception that we express as a transition from the familiar stochastic resonance to the new concept of complexity matching.

  9. Effective control of complex turbulent dynamical systems through statistical functionals.

    PubMed

    Majda, Andrew J; Qi, Di

    2017-05-30

    Turbulent dynamical systems characterized by both a high-dimensional phase space and a large number of instabilities are ubiquitous among complex systems in science and engineering, including climate, material, and neural science. Control of these complex systems is a grand challenge, for example, in mitigating the effects of climate change or safe design of technology with fully developed shear turbulence. Control of flows in the transition to turbulence, where there is a small dimension of instabilities about a basic mean state, is an important and successful discipline. In complex turbulent dynamical systems, it is impossible to track and control the large dimension of instabilities, which strongly interact and exchange energy, and new control strategies are needed. The goal of this paper is to propose an effective statistical control strategy for complex turbulent dynamical systems based on a recent statistical energy principle and statistical linear response theory. We illustrate the potential practical efficiency and verify this effective statistical control strategy on the 40D Lorenz 1996 model in forcing regimes with various types of fully turbulent dynamics with nearly one-half of the phase space unstable.
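
    The 40D Lorenz 1996 test bed mentioned above is straightforward to reproduce; the sketch below integrates it with a fourth-order Runge-Kutta step and reports crude statistical functionals of the trajectory. The forcing, step size, and run length are conventional choices, and the control strategy itself is not implemented here.

      import numpy as np

      def l96_rhs(x, forcing=8.0):
          """Right-hand side of the Lorenz 1996 model (cyclic coupling)."""
          return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing

      def rk4_step(x, dt):
          k1 = l96_rhs(x)
          k2 = l96_rhs(x + 0.5 * dt * k1)
          k3 = l96_rhs(x + 0.5 * dt * k2)
          k4 = l96_rhs(x + dt * k3)
          return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

      x = 8.0 + 0.01 * np.random.default_rng(0).standard_normal(40)  # 40D state
      traj = []
      for step in range(5000):
          x = rk4_step(x, dt=0.01)
          if step >= 1000:              # discard the transient
              traj.append(x.copy())
      traj = np.array(traj)
      print(f"mean = {traj.mean():.2f}, variance = {traj.var():.2f}")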

  10. A new test method for the evaluation of total antioxidant activity of herbal products.

    PubMed

    Zaporozhets, Olga A; Krushynska, Olena A; Lipkovska, Natalia A; Barvinchenko, Valentina N

    2004-01-14

    A new test method for measuring the antioxidant power of herbal products, based on solid-phase spectrophotometry using tetrabenzo-[b,f,j,n][1,5,9,13]-tetraazacyclohexadecine-Cu(II) complex immobilized on silica gel, is proposed. The absorbance of the modified sorbent (lambda(max) = 712 nm) increases proportionally to the total antioxidant activity of the sample solution. The method represents an attractive alternative to the mostly used radical scavenging capacity assays, because they generally require complex long-lasting stages to be carried out. The proposed test method is simple ("drop and measure" procedure is applied), rapid (10 min/sample), requires only the monitoring of time and absorbance, and provides good statistical parameters (s(r)

  11. A ricin forensic profiling approach based on a complex set of biomarkers.

    PubMed

    Fredriksson, Sten-Åke; Wunschel, David S; Lindström, Susanne Wiklund; Nilsson, Calle; Wahl, Karen; Åstot, Crister

    2018-08-15

    A forensic method for the retrospective determination of preparation methods used for illicit ricin toxin production was developed. The method was based on a complex set of biomarkers, including carbohydrates, fatty acids, seed storage proteins, in combination with data on ricin and Ricinus communis agglutinin. The analyses were performed on samples prepared from four castor bean plant (R. communis) cultivars by four different sample preparation methods (PM1-PM4) ranging from simple disintegration of the castor beans to multi-step preparation methods including different protein precipitation methods. Comprehensive analytical data was collected by use of a range of analytical methods and robust orthogonal partial least squares-discriminant analysis (OPLS-DA) models were constructed based on the calibration set. By the use of a decision tree and two OPLS-DA models, the sample preparation methods of test set samples were determined. The model statistics of the two models were good and a 100% rate of correct predictions of the test set was achieved. Copyright © 2018 Elsevier B.V. All rights reserved.

  12. A simple method for processing data with least square method

    NASA Astrophysics Data System (ADS)

    Wang, Chunyan; Qi, Liqun; Chen, Yongxiang; Pang, Guangning

    2017-08-01

    The least squares method is widely used in data processing and error estimation. This mathematical method has become an essential technique for parameter estimation, data processing, regression analysis and experimental data fitting, and has become a criterion tool for statistical inference. In measurement data analysis, fitting under the least squares principle is usually carried out in matrix form to obtain the final estimate and to improve its accuracy. In this paper, a new method is presented for solving least squares problems that is based on algebraic computation and is relatively straightforward and easy to understand. The practicability of this method is described by a concrete example.
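
    For reference, the standard matrix formulation that the paper contrasts itself with looks like this in Python: solve the normal equations for a straight-line fit and compare with the library least-squares solver (the data are toy values).

      import numpy as np

      # Toy measurements for a straight-line fit y = a + b*x
      x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
      y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

      # Design matrix with an intercept column
      A = np.column_stack([np.ones_like(x), x])

      # Normal equations: (A^T A) beta = A^T y
      beta_normal = np.linalg.solve(A.T @ A, A.T @ y)

      # Library solver for comparison (more numerically stable in general)
      beta_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

      print(beta_normal, beta_lstsq)   # both ~ [intercept, slope]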

  13. Salient target detection based on pseudo-Wigner-Ville distribution and Rényi entropy.

    PubMed

    Xu, Yuannan; Zhao, Yuan; Jin, Chenfei; Qu, Zengfeng; Liu, Liping; Sun, Xiudong

    2010-02-15

    We present what we believe to be a novel method based on pseudo-Wigner-Ville distribution (PWVD) and Rényi entropy for salient targets detection. In the foundation of studying the statistical property of Rényi entropy via PWVD, the residual entropy-based saliency map of an input image can be obtained. From the saliency map, target detection is completed by the simple and convenient threshold segmentation. Experimental results demonstrate the proposed method can detect targets effectively in complex ground scenes.

  14. Fusion of multiscale wavelet-based fractal analysis on retina image for stroke prediction.

    PubMed

    Che Azemin, M Z; Kumar, Dinesh K; Wong, T Y; Wang, J J; Kawasaki, R; Mitchell, P; Arjunan, Sridhar P

    2010-01-01

    In this paper, we present a novel method of analyzing retinal vasculature using the Fourier Fractal Dimension to extract the complexity of the retinal vasculature enhanced at different wavelet scales. Logistic regression was used as a fusion method to model the classifier for 5-year stroke prediction. The efficacy of this technique has been tested using standard pattern recognition performance evaluation, Receiver Operating Characteristic (ROC) analysis, and a medical prediction statistic, the odds ratio. A stroke prediction model was developed using the proposed system.
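
    The fusion step, logistic regression over multiscale features evaluated by ROC analysis and odds ratios, can be sketched generically as below; the features and outcome are simulated, not retinal measurements.

      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import roc_auc_score
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(0)
      n = 400

      # Hypothetical fractal-dimension features at four wavelet scales
      X = rng.normal(size=(n, 4))
      logit = 0.8 * X[:, 0] - 0.5 * X[:, 2]
      y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)  # simulated outcome

      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
      model = LogisticRegression().fit(X_tr, y_tr)

      auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
      print(f"AUC = {auc:.2f}, odds ratios = {np.exp(model.coef_[0]).round(2)}")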

  15. Power calculation for overall hypothesis testing with high-dimensional commensurate outcomes.

    PubMed

    Chi, Yueh-Yun; Gribbin, Matthew J; Johnson, Jacqueline L; Muller, Keith E

    2014-02-28

    The complexity of system biology means that any metabolic, genetic, or proteomic pathway typically includes so many components (e.g., molecules) that statistical methods specialized for overall testing of high-dimensional and commensurate outcomes are required. While many overall tests have been proposed, very few have power and sample size methods. We develop accurate power and sample size methods and software to facilitate study planning for high-dimensional pathway analysis. With an account of any complex correlation structure between high-dimensional outcomes, the new methods allow power calculation even when the sample size is less than the number of variables. We derive the exact (finite-sample) and approximate non-null distributions of the 'univariate' approach to repeated measures test statistic, as well as power-equivalent scenarios useful to generalize our numerical evaluations. Extensive simulations of group comparisons support the accuracy of the approximations even when the ratio of number of variables to sample size is large. We derive a minimum set of constants and parameters sufficient and practical for power calculation. Using the new methods and specifying the minimum set to determine power for a study of metabolic consequences of vitamin B6 deficiency helps illustrate the practical value of the new results. Free software implementing the power and sample size methods applies to a wide range of designs, including one group pre-intervention and post-intervention comparisons, multiple parallel group comparisons with one-way or factorial designs, and the adjustment and evaluation of covariate effects. Copyright © 2013 John Wiley & Sons, Ltd.
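
    The paper derives closed-form power approximations; the brute-force alternative, estimating power by Monte Carlo simulation, is easy to sketch and is shown below for a deliberately simple two-sample t-test rather than the paper's high-dimensional tests.

      import numpy as np
      from scipy import stats

      def mc_power(n_per_group, effect, n_sims=2000, alpha=0.05, seed=0):
          """Monte Carlo power of a two-sample t-test at a given effect size."""
          rng = np.random.default_rng(seed)
          hits = 0
          for _ in range(n_sims):
              a = rng.normal(0.0, 1.0, n_per_group)
              b = rng.normal(effect, 1.0, n_per_group)
              hits += stats.ttest_ind(a, b).pvalue < alpha
          return hits / n_sims

      print(mc_power(n_per_group=30, effect=0.7))   # roughly 0.77 at these settings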

  16. HDBStat!: a platform-independent software suite for statistical analysis of high dimensional biology data.

    PubMed

    Trivedi, Prinal; Edwards, Jode W; Wang, Jelai; Gadbury, Gary L; Srinivasasainagendra, Vinodh; Zakharkin, Stanislav O; Kim, Kyoungmi; Mehta, Tapan; Brand, Jacob P L; Patki, Amit; Page, Grier P; Allison, David B

    2005-04-06

    Many efforts in microarray data analysis are focused on providing tools and methods for the qualitative analysis of microarray data. HDBStat! (High-Dimensional Biology-Statistics) is a software package designed for analysis of high dimensional biology data such as microarray data. It was initially developed for the analysis of microarray gene expression data, but it can also be used for some applications in proteomics and other aspects of genomics. HDBStat! provides statisticians and biologists a flexible and easy-to-use interface to analyze complex microarray data using a variety of methods for data preprocessing, quality control analysis and hypothesis testing. Results generated from data preprocessing methods, quality control analysis and hypothesis testing methods are output in the form of Excel CSV tables, graphs and an Html report summarizing data analysis. HDBStat! is a platform-independent software that is freely available to academic institutions and non-profit organizations. It can be downloaded from our website http://www.soph.uab.edu/ssg_content.asp?id=1164.

  17. Practical approximation method for firing-rate models of coupled neural networks with correlated inputs

    NASA Astrophysics Data System (ADS)

    Barreiro, Andrea K.; Ly, Cheng

    2017-08-01

    Rapid experimental advances now enable simultaneous electrophysiological recording of neural activity at single-cell resolution across large regions of the nervous system. Models of this neural network activity will necessarily increase in size and complexity, thus increasing the computational cost of simulating them and the challenge of analyzing them. Here we present a method to approximate the activity and firing statistics of a general firing rate network model (of the Wilson-Cowan type) subject to noisy correlated background inputs. The method requires solving a system of transcendental equations and is fast compared to Monte Carlo simulations of coupled stochastic differential equations. We implement the method with several examples of coupled neural networks and show that the results are quantitatively accurate even with moderate coupling strengths and an appreciable amount of heterogeneity in many parameters. This work should be useful for investigating how various neural attributes qualitatively affect the spiking statistics of coupled neural networks.
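
    At its simplest, the steady state of one excitatory-inhibitory firing-rate pair is the solution of a small transcendental system, which scipy's fsolve handles directly, as below; the weights, inputs, and sigmoid are hypothetical, and the paper's actual method additionally solves for firing statistics under correlated noisy input.

      import numpy as np
      from scipy.optimize import fsolve

      def f(x):
          """Sigmoidal firing-rate transfer function."""
          return 1.0 / (1.0 + np.exp(-x))

      # Hypothetical weights and inputs for one excitatory/inhibitory pair
      w_ee, w_ei, w_ie, w_ii = 12.0, 10.0, 9.0, 3.0
      I_e, I_i = 1.0, -2.0

      def steady_state(r):
          e, i = r
          return [f(w_ee * e - w_ei * i + I_e) - e,
                  f(w_ie * e - w_ii * i + I_i) - i]

      e0, i0 = fsolve(steady_state, x0=[0.2, 0.2])
      print(f"E = {e0:.3f}, I = {i0:.3f}")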

  18. Methods for identifying SNP interactions: a review on variations of Logic Regression, Random Forest and Bayesian logistic regression.

    PubMed

    Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula

    2011-01-01

    Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets comes the inherent challenges of new methods of statistical analysis and modeling. Considering a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.
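
    One of the reviewed approaches, Random Forests, is easy to sketch on simulated SNP data with a built-in interaction; the genotype coding, penetrances, and forest settings below are all illustrative assumptions.

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier

      rng = np.random.default_rng(0)
      n, n_snps = 500, 20

      # Genotypes coded 0/1/2; risk driven by an interaction of SNP0 and SNP1
      X = rng.integers(0, 3, size=(n, n_snps))
      risk = (X[:, 0] > 0) & (X[:, 1] > 0)
      y = (rng.random(n) < np.where(risk, 0.7, 0.2)).astype(int)

      rf = RandomForestClassifier(n_estimators=300, random_state=1).fit(X, y)
      print(rf.feature_importances_.round(3)[:5])  # SNP0, SNP1 should rank highest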

  19. Coupled disease-behavior dynamics on complex networks: A review

    NASA Astrophysics Data System (ADS)

    Wang, Zhen; Andrews, Michael A.; Wu, Zhi-Xi; Wang, Lin; Bauch, Chris T.

    2015-12-01

    It is increasingly recognized that a key component of successful infection control efforts is understanding the complex, two-way interaction between disease dynamics and human behavioral and social dynamics. Human behavior such as contact precautions and social distancing clearly influence disease prevalence, but disease prevalence can in turn alter human behavior, forming a coupled, nonlinear system. Moreover, in many cases, the spatial structure of the population cannot be ignored, such that social and behavioral processes and/or transmission of infection must be represented with complex networks. Research on studying coupled disease-behavior dynamics in complex networks in particular is growing rapidly, and frequently makes use of analysis methods and concepts from statistical physics. Here, we review some of the growing literature in this area. We contrast network-based approaches to homogeneous-mixing approaches, point out how their predictions differ, and describe the rich and often surprising behavior of disease-behavior dynamics on complex networks, and compare them to processes in statistical physics. We discuss how these models can capture the dynamics that characterize many real-world scenarios, thereby suggesting ways that policy makers can better design effective prevention strategies. We also describe the growing sources of digital data that are facilitating research in this area. Finally, we suggest pitfalls which might be faced by researchers in the field, and we suggest several ways in which the field could move forward in the coming years.

  20. Online Statistical Modeling (Regression Analysis) for Independent Responses

    NASA Astrophysics Data System (ADS)

    Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus

    2017-06-01

    Regression analysis (statistical modelling) is among the statistical methods most frequently needed in analyzing quantitative data, especially to model the relationship between response and explanatory variables. Nowadays, statistical models have been developed in various directions to model various types of complex relationships in data. Rich varieties of advanced and recent statistical modelling are mostly available in open source software (one of them is R). However, these advanced statistical modelling tools are not very friendly to novice R users, since they are based on programming scripts or a command line interface. Our research aims to develop a web interface (based on R and Shiny), so that the most recent and advanced statistical modelling is readily available, accessible and applicable on the web. We have previously made an interface in the form of an e-tutorial for several modern and advanced statistical modelling approaches in R, especially for independent responses (including linear models/LM, generalized linear models/GLM, generalized additive models/GAM and generalized additive models for location scale and shape/GAMLSS). In this research we unified them in the form of data analysis, including modelling using computer intensive statistics (bootstrap and Markov Chain Monte Carlo/MCMC). All are readily accessible on our online Virtual Statistics Laboratory. The web interface makes the statistical modelling easier to apply and easier to compare in order to find the most appropriate model for the data.

  1. Bayesian models based on test statistics for multiple hypothesis testing problems.

    PubMed

    Ji, Yuan; Lu, Yiling; Mills, Gordon B

    2008-04-01

    We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as the differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check if our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool. Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. In the end, we apply the proposed methodology to an siRNA screening and a gene expression experiment.
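
    The Bayesian FDR machinery of the paper is too involved to reproduce here; a simple frequentist counterpart that also operates on test statistics alone, the Benjamini-Hochberg step-up procedure, is sketched below on simulated z-statistics.

      import numpy as np
      from scipy import stats

      def benjamini_hochberg(pvals, q=0.05):
          """Indices of hypotheses rejected at FDR level q (BH step-up)."""
          p = np.asarray(pvals)
          order = np.argsort(p)
          m = len(p)
          below = p[order] <= q * np.arange(1, m + 1) / m
          k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
          return order[:k]

      # Simulated z-statistics: 900 nulls plus 100 shifted alternatives
      rng = np.random.default_rng(0)
      z = np.concatenate([rng.normal(0, 1, 900), rng.normal(3, 1, 100)])
      pvals = 2 * stats.norm.sf(np.abs(z))
      print(f"{len(benjamini_hochberg(pvals))} rejections at FDR 0.05")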

  2. Chemometrics Methods for Specificity, Authenticity and Traceability Analysis of Olive Oils: Principles, Classifications and Applications

    PubMed Central

    Messai, Habib; Farman, Muhammad; Sarraj-Laabidi, Abir; Hammami-Semmar, Asma; Semmar, Nabil

    2016-01-01

    Background. Olive oils (OOs) show high chemical variability due to several factors of genetic, environmental and anthropic types. Genetic and environmental factors are responsible for natural compositions and polymorphic diversification resulting in different varietal patterns and phenotypes. Anthropic factors, however, are at the origin of different blends’ preparation leading to normative, labelled or adulterated commercial products. Control of complex OO samples requires their (i) characterization by specific markers; (ii) authentication by fingerprint patterns; and (iii) monitoring by traceability analysis. Methods. These quality control and management aims require the use of several multivariate statistical tools: specificity highlighting requires ordination methods; authentication checking calls for classification and pattern recognition methods; traceability analysis implies the use of network-based approaches able to separate or extract mixed information and memorized signals from complex matrices. Results. This chapter presents a review of different chemometrics methods applied for the control of OO variability from metabolic and physical-chemical measured characteristics. The different chemometrics methods are illustrated by different study cases on monovarietal and blended OO originated from different countries. Conclusion. Chemometrics tools offer multiple ways for quantitative evaluations and qualitative control of complex chemical variability of OO in relation to several intrinsic and extrinsic factors. PMID:28231172

  3. Temporal scaling and spatial statistical analyses of groundwater level fluctuations

    NASA Astrophysics Data System (ADS)

    Sun, H.; Yuan, L., Sr.; Zhang, Y.

    2017-12-01

    Natural dynamics such as groundwater level fluctuations can exhibit multifractionality and/or multifractality due likely to multi-scale aquifer heterogeneity and controlling factors, whose statistics requires efficient quantification methods. This study explores multifractionality and non-Gaussian properties in groundwater dynamics expressed by time series of daily level fluctuation at three wells located in the lower Mississippi valley, after removing the seasonal cycle in the temporal scaling and spatial statistical analysis. First, using the time-scale multifractional analysis, a systematic statistical method is developed to analyze groundwater level fluctuations quantified by the time-scale local Hurst exponent (TS-LHE). Results show that the TS-LHE does not remain constant, implying the fractal-scaling behavior changing with time and location. Hence, we can distinguish the potentially location-dependent scaling feature, which may characterize the hydrology dynamic system. Second, spatial statistical analysis shows that the increment of groundwater level fluctuations exhibits a heavy tailed, non-Gaussian distribution, which can be better quantified by a Lévy stable distribution. Monte Carlo simulations of the fluctuation process also show that the linear fractional stable motion model can well depict the transient dynamics (i.e., fractal non-Gaussian property) of groundwater level, while fractional Brownian motion is inadequate to describe natural processes with anomalous dynamics. Analysis of temporal scaling and spatial statistics therefore may provide useful information and quantification to understand further the nature of complex dynamics in hydrology.

  4. Nonlinear stochastic interacting dynamics and complexity of financial gasket fractal-like lattice percolation

    NASA Astrophysics Data System (ADS)

    Zhang, Wei; Wang, Jun

    2018-05-01

    A novel nonlinear stochastic interacting price dynamics is proposed and investigated via bond percolation on a Sierpinski gasket fractal-like lattice, aiming to provide a new approach to reproducing and studying the complex dynamics of real security markets. Fractal-like lattices correspond to finite graphs with vertices and edges, which are similar to fractals, and the Sierpinski gasket is a well-known example of a fractal. Fractional ordinal array entropy and fractional ordinal array complexity are introduced to analyze the complexity behaviors of financial signals. To comprehend more deeply the fluctuation characteristics of the stochastic price evolution, a complexity analysis of random logarithmic returns and volatility is performed, including power-law distribution, fractional sample entropy and fractional ordinal array complexity. To further verify the rationality and validity of the developed stochastic price evolution, actual security market datasets are also studied with the same statistical methods for comparison. The empirical results show that this stochastic price dynamics can reconstruct the complexity behaviors of actual security markets to some extent.
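
    The fractional ordinal-array measures are the authors' own constructions; their classical starting point, permutation (ordinal pattern) entropy, can be sketched as follows on a simulated return series.

      import numpy as np
      from itertools import permutations
      from math import log

      def permutation_entropy(series, order=3):
          """Normalized Shannon entropy of ordinal patterns (classical form)."""
          counts = dict.fromkeys(permutations(range(order)), 0)
          for i in range(len(series) - order + 1):
              counts[tuple(np.argsort(series[i:i + order]))] += 1
          total = sum(counts.values())
          probs = [c / total for c in counts.values() if c > 0]
          return -sum(p * log(p) for p in probs) / log(len(counts))

      rng = np.random.default_rng(0)
      returns = rng.standard_normal(5000)          # stand-in for log-returns
      print(f"permutation entropy: {permutation_entropy(returns):.3f}")  # ~1 for noise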

  5. Statistic inversion of multi-zone transition probability models for aquifer characterization in alluvial fans

    DOE PAGES

    Zhu, Lin; Dai, Zhenxue; Gong, Huili; ...

    2015-06-12

    Understanding the heterogeneity arising from the complex architecture of sedimentary sequences in alluvial fans is challenging. This study develops a statistical inverse framework in a multi-zone transition probability approach for characterizing the heterogeneity in alluvial fans. An analytical solution of the transition probability matrix is used to define the statistical relationships among different hydrofacies and their mean lengths, integral scales, and volumetric proportions. A statistical inversion is conducted to identify the multi-zone transition probability models and estimate the optimal statistical parameters using the modified Gauss–Newton–Levenberg–Marquardt method. The Jacobian matrix is computed by the sensitivity equation method, which results in an accurate inverse solution with quantification of parameter uncertainty. We use the Chaobai River alluvial fan in the Beijing Plain, China, as an example for elucidating the methodology of alluvial fan characterization. The alluvial fan is divided into three sediment zones. In each zone, the explicit mathematical formulations of the transition probability models are constructed with optimized different integral scales and volumetric proportions. The hydrofacies distributions in the three zones are simulated sequentially by the multi-zone transition probability-based indicator simulations. Finally, the result of this study provides the heterogeneous structure of the alluvial fan for further study of flow and transport simulations.
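
    The building block of transition probability geostatistics, estimating a one-step transition matrix from an observed facies sequence, can be sketched as below; the facies log is invented, and the study's embedded analytical model and statistical inversion are not shown.

      import numpy as np

      # Hypothetical borehole facies log (0 = gravel, 1 = sand, 2 = clay)
      facies = np.array([0, 0, 1, 1, 1, 2, 2, 1, 0, 1, 2, 2, 2, 1, 1, 0, 0, 1, 2, 1])

      n_states = 3
      counts = np.zeros((n_states, n_states))
      for a, b in zip(facies[:-1], facies[1:]):
          counts[a, b] += 1

      # Row-normalize to one-step vertical transition probabilities
      tp = counts / counts.sum(axis=1, keepdims=True)
      print(tp.round(2))

      # Mean run length of each facies relates to its self-transition probability
      print((1.0 / (1.0 - np.diag(tp))).round(2))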

  6. Institute for Brain and Neural Systems

    DTIC Science & Technology

    2009-10-06

    to deal with computational complexity when analyzing large amounts of information in visual scenes. It seems natural that in addition to exploring...algorithms using methods from statistical pattern recognition and machine learning. Over the last fifteen years, significant advances had been made in...recognition, robustness to noise and ability to cope with significant variations in lighting conditions. Identifying an occluded target adds another layer of

  7. The Truth, the Whole Truth, and Nothing but the Ground-Truth: Methods to Advance Environmental Justice and Researcher-Community Partnerships

    ERIC Educational Resources Information Center

    Sadd, James; Morello-Frosch, Rachel; Pastor, Manuel; Matsuoka, Martha; Prichard, Michele; Carter, Vanessa

    2014-01-01

    Environmental justice advocates often argue that environmental hazards and their health effects vary by neighborhood, income, and race. To assess these patterns and advance preventive policy, their colleagues in the research world often use complex and methodologically sophisticated statistical and geospatial techniques. One way to bridge the gap…

  8. A Three-Step Approach To Model Tree Mortality in the State of Georgia

    Treesearch

    Qingmin Meng; Chris J. Cieszewski; Roger C. Lowe; Michal Zasada

    2005-01-01

    Tree mortality is one of the most complex phenomena of forest growth and yield. Many types of factors affect tree mortality, which is considered difficult to predict. This study presents a new systematic approach to simulate tree mortality based on the integration of statistical models and geographical information systems. This method begins with variable preselection...

  9. The High Cost of Complexity in Experimental Design and Data Analysis: Type I and Type II Error Rates in Multiway ANOVA.

    ERIC Educational Resources Information Center

    Smith, Rachel A.; Levine, Timothy R.; Lachlan, Kenneth A.; Fediuk, Thomas A.

    2002-01-01

    Notes that the availability of statistical software packages has led to a sharp increase in use of complex research designs and complex statistical analyses in communication research. Reports a series of Monte Carlo simulations which demonstrate that this complexity may come at a heavier cost than many communication researchers realize. Warns…

  10. Comparison of three sampling and analytical methods for the determination of airborne hexavalent chromium.

    PubMed

    Boiano, J M; Wallace, M E; Sieber, W K; Groff, J H; Wang, J; Ashley, K

    2000-08-01

    A field study was conducted with the goal of comparing the performance of three recently developed or modified sampling and analytical methods for the determination of airborne hexavalent chromium (Cr(VI)). The study was carried out in a hard chrome electroplating facility and in a jet engine manufacturing facility where airborne Cr(VI) was expected to be present. The analytical methods evaluated included two laboratory-based procedures (OSHA Method ID-215 and NIOSH Method 7605) and a field-portable method (NIOSH Method 7703). These three methods employ an identical sampling methodology: collection of Cr(VI)-containing aerosol on a polyvinyl chloride (PVC) filter housed in a sampling cassette, which is connected to a personal sampling pump calibrated at an appropriate flow rate. The basis of the analytical methods for all three methods involves extraction of the PVC filter in alkaline buffer solution, chemical isolation of the Cr(VI) ion, complexation of the Cr(VI) ion with 1,5-diphenylcarbazide, and spectrometric measurement of the violet chromium diphenylcarbazone complex at 540 nm. However, there are notable specific differences within the sample preparation procedures used in three methods. To assess the comparability of the three measurement protocols, a total of 20 side-by-side air samples were collected, equally divided between a chromic acid electroplating operation and a spray paint operation where water soluble forms of Cr(VI) were used. A range of Cr(VI) concentrations from 0.6 to 960 microg m(-3), with Cr(VI) mass loadings ranging from 0.4 to 32 microg, was measured at the two operations. The equivalence of the means of the log-transformed Cr(VI) concentrations obtained from the different analytical methods was compared. Based on analysis of variance (ANOVA) results, no statistically significant differences were observed between mean values measured using each of the three methods. Small but statistically significant differences were observed between results obtained from performance evaluation samples for the NIOSH field method and the OSHA laboratory method.
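
    The ANOVA on log-transformed concentrations can be sketched as below; the values are hypothetical and, for simplicity, the pairing of side-by-side samples is ignored.

      import numpy as np
      from scipy import stats

      # Hypothetical side-by-side Cr(VI) results (ug/m3) from the three methods
      osha  = np.array([0.62, 1.10, 5.30, 12.0, 48.0, 110.0])
      niosh = np.array([0.58, 1.05, 5.60, 11.5, 52.0, 105.0])
      field = np.array([0.66, 1.20, 5.10, 12.8, 45.0, 118.0])

      # One-way ANOVA on log-transformed concentrations
      f_stat, p = stats.f_oneway(np.log(osha), np.log(niosh), np.log(field))
      print(f"F = {f_stat:.3f}, p = {p:.3f}")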

  11. Sex genes for genomic analysis in human brain: internal controls for comparison of probe level data extraction.

    PubMed Central

    Galfalvy, Hanga C; Erraji-Benchekroun, Loubna; Smyrniotopoulos, Peggy; Pavlidis, Paul; Ellis, Steven P; Mann, J John; Sibille, Etienne; Arango, Victoria

    2003-01-01

    Background Genomic studies of complex tissues pose unique analytical challenges for assessment of data quality, performance of statistical methods used for data extraction, and detection of differentially expressed genes. Ideally, to assess the accuracy of gene expression analysis methods, one needs a set of genes which are known to be differentially expressed in the samples and which can be used as a "gold standard". We introduce the idea of using sex-chromosome genes as an alternative to spiked-in control genes or simulations for assessment of microarray data and analysis methods. Results Expression of sex-chromosome genes was used as true internal biological controls to compare alternate probe-level data extraction algorithms (Microarray Suite 5.0 [MAS5.0], Model Based Expression Index [MBEI] and Robust Multi-array Average [RMA]), to assess microarray data quality and to establish some statistical guidelines for analyzing large-scale gene expression. These approaches were implemented on a large new dataset of human brain samples. RMA-generated gene expression values were markedly less variable and more reliable than MAS5.0 and MBEI-derived values. A statistical technique controlling the false discovery rate was applied to adjust for multiple testing, as an alternative to the Bonferroni method, and showed no evidence of false negative results. Fourteen probesets, representing nine Y- and two X-chromosome linked genes, displayed significant sex differences in brain prefrontal cortex gene expression. Conclusion In this study, we have demonstrated the use of sex genes as true biological internal controls for genomic analysis of complex tissues, and suggested analytical guidelines for testing alternate oligonucleotide microarray data extraction protocols and for adjusting multiple statistical analysis of differentially expressed genes. Our results also provided evidence for sex differences in gene expression in the brain prefrontal cortex, supporting the notion of a putative direct role of sex-chromosome genes in differentiation and maintenance of sexual dimorphism of the central nervous system. Importantly, these analytical approaches are applicable to all microarray studies that include male and female human or animal subjects. PMID:12962547

  12. Sex genes for genomic analysis in human brain: internal controls for comparison of probe level data extraction.

    PubMed

    Galfalvy, Hanga C; Erraji-Benchekroun, Loubna; Smyrniotopoulos, Peggy; Pavlidis, Paul; Ellis, Steven P; Mann, J John; Sibille, Etienne; Arango, Victoria

    2003-09-08

    Genomic studies of complex tissues pose unique analytical challenges for assessment of data quality, performance of statistical methods used for data extraction, and detection of differentially expressed genes. Ideally, to assess the accuracy of gene expression analysis methods, one needs a set of genes which are known to be differentially expressed in the samples and which can be used as a "gold standard". We introduce the idea of using sex-chromosome genes as an alternative to spiked-in control genes or simulations for assessment of microarray data and analysis methods. Expression of sex-chromosome genes was used as a true internal biological control to compare alternate probe-level data extraction algorithms (Microarray Suite 5.0 [MAS5.0], Model Based Expression Index [MBEI] and Robust Multi-array Average [RMA]), to assess microarray data quality and to establish some statistical guidelines for analyzing large-scale gene expression. These approaches were implemented on a large new dataset of human brain samples. RMA-generated gene expression values were markedly less variable and more reliable than MAS5.0 and MBEI-derived values. A statistical technique controlling the false discovery rate was applied to adjust for multiple testing, as an alternative to the Bonferroni method, and showed no evidence of false negative results. Fourteen probesets, representing nine Y- and two X-chromosome linked genes, displayed significant sex differences in brain prefrontal cortex gene expression. In this study, we have demonstrated the use of sex genes as true biological internal controls for genomic analysis of complex tissues, and suggested analytical guidelines for testing alternate oligonucleotide microarray data extraction protocols and for adjusting multiple statistical analyses of differentially expressed genes. Our results also provided evidence for sex differences in gene expression in the brain prefrontal cortex, supporting the notion of a putative direct role of sex-chromosome genes in differentiation and maintenance of sexual dimorphism of the central nervous system. Importantly, these analytical approaches are applicable to all microarray studies that include male and female human or animal subjects.

  13. Statistical Analysis of Individual Participant Data Meta-Analyses: A Comparison of Methods and Recommendations for Practice

    PubMed Central

    Stewart, Gavin B.; Altman, Douglas G.; Askie, Lisa M.; Duley, Lelia; Simmonds, Mark C.; Stewart, Lesley A.

    2012-01-01

    Background Individual participant data (IPD) meta-analyses that obtain “raw” data from studies rather than summary data typically adopt a “two-stage” approach to analysis whereby IPD within trials generate summary measures, which are combined using standard meta-analytical methods. Recently, a range of “one-stage” approaches which combine all individual participant data in a single meta-analysis have been suggested as providing a more powerful and flexible approach. However, they are more complex to implement and require statistical support. This study uses a dataset to compare “two-stage” and “one-stage” models of varying complexity, to ascertain whether results obtained from the approaches differ in a clinically meaningful way. Methods and Findings We included data from 24 randomised controlled trials, evaluating antiplatelet agents, for the prevention of pre-eclampsia in pregnancy. We performed two-stage and one-stage IPD meta-analyses to estimate overall treatment effect and to explore potential treatment interactions whereby particular types of women and their babies might benefit differentially from receiving antiplatelets. Two-stage and one-stage approaches gave similar results, showing a benefit of using antiplatelets (Relative risk 0.90, 95% CI 0.84 to 0.97). Neither approach suggested that any particular type of woman benefited more or less from antiplatelets. There were no material differences in results between different types of one-stage model. Conclusions For these data, two-stage and one-stage approaches to analysis produce similar results. Although one-stage models offer a flexible environment for exploring model structure and are useful where across-study patterns relating to types of participant, intervention and outcome mask similar relationships within trials, the additional insights provided by their usage may not outweigh the costs of statistical support for routine application in syntheses of randomised controlled trials. Researchers considering undertaking an IPD meta-analysis should not necessarily be deterred by a perceived need for sophisticated statistical methods when combining information from large randomised trials. PMID:23056232
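
    For orientation, here is a minimal sketch of the "two-stage" approach: each trial is first reduced to a log relative risk and its variance, and the per-trial summaries are then pooled by inverse-variance weighting. The trial counts are hypothetical, and the pooling is a plain fixed-effect version, not the exact models compared in the paper.

```python
import numpy as np

# Hypothetical per-trial 2x2 counts: (events_treat, n_treat, events_ctrl, n_ctrl).
trials = [(30, 500, 40, 500), (12, 200, 18, 210), (55, 900, 70, 880)]

log_rr, var = [], []
for et, nt, ec, nc in trials:                # stage 1: summary measure per trial
    log_rr.append(np.log((et / nt) / (ec / nc)))
    var.append(1/et - 1/nt + 1/ec - 1/nc)    # delta-method variance of log RR

w = 1 / np.array(var)                        # stage 2: inverse-variance pooling
pooled = np.sum(w * np.array(log_rr)) / np.sum(w)
se = np.sqrt(1 / np.sum(w))
lo, hi = pooled - 1.96 * se, pooled + 1.96 * se
print(f"RR = {np.exp(pooled):.2f} (95% CI {np.exp(lo):.2f} to {np.exp(hi):.2f})")
```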

  14. Observation-Driven Configuration of Complex Software Systems

    NASA Astrophysics Data System (ADS)

    Sage, Aled

    2010-06-01

    The ever-increasing complexity of software systems makes them hard to comprehend, predict and tune due to emergent properties and non-deterministic behaviour. Complexity arises from the size of software systems and the wide variety of possible operating environments: the increasing choice of platforms and communication policies leads to ever more complex performance characteristics. In addition, software systems exhibit different behaviour under different workloads. Many software systems are designed to be configurable so that policies can be chosen to meet the needs of various stakeholders. For complex software systems it can be difficult to accurately predict the effects of a change and to know which configuration is most appropriate. This thesis demonstrates that it is useful to run automated experiments that measure a selection of system configurations. Experiments can find configurations that meet the stakeholders' needs, find interesting behavioural characteristics, and help produce predictive models of the system's behaviour. The design and use of ACT (Automated Configuration Tool) for running such experiments is described, in combination with a number of search strategies for deciding on the configurations to measure. Design Of Experiments (DOE) is discussed, with emphasis on Taguchi Methods. These statistical methods have been used extensively in manufacturing, but have not previously been used for configuring software systems. The novel contribution here is an industrial case study, applying the combination of ACT and Taguchi Methods to DC-Directory, a product from Data Connection Ltd (DCL). The case study investigated the applicability of Taguchi Methods for configuring complex software systems. Taguchi Methods were found to be useful for modelling and configuring DC-Directory, making them a valuable addition to the techniques available to system administrators and developers.
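
    As a toy illustration of the Taguchi-style designs discussed above, the sketch below evaluates an L4 orthogonal array for three two-level configuration factors and estimates their main effects. The factor names and the simulated response are hypothetical stand-ins for an ACT experiment on a real system.

```python
import numpy as np

# L4 orthogonal array: four runs cover the main effects of three two-level
# factors (a standard Taguchi design; the factors are illustrative only).
L4 = np.array([[0, 0, 0],
               [0, 1, 1],
               [1, 0, 1],
               [1, 1, 0]])

rng = np.random.default_rng(0)

def run_experiment(cache, threads, batching):
    # Stand-in for measuring the system under one configuration.
    return 100 - 20 * cache - 10 * threads + 5 * batching + rng.normal(0, 1)

responses = np.array([run_experiment(*row) for row in L4])

# Main effect of each factor: mean response at level 1 minus at level 0.
for j, name in enumerate(["cache", "threads", "batching"]):
    effect = responses[L4[:, j] == 1].mean() - responses[L4[:, j] == 0].mean()
    print(f"{name}: estimated main effect = {effect:+.2f}")
```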

  15. Predictive Big Data Analytics: A Study of Parkinson's Disease Using Large, Complex, Heterogeneous, Incongruent, Multi-Source and Incomplete Observations.

    PubMed

    Dinov, Ivo D; Heavner, Ben; Tang, Ming; Glusman, Gustavo; Chard, Kyle; Darcy, Mike; Madduri, Ravi; Pa, Judy; Spino, Cathie; Kesselman, Carl; Foster, Ian; Deutsch, Eric W; Price, Nathan D; Van Horn, John D; Ames, Joseph; Clark, Kristi; Hood, Leroy; Hampstead, Benjamin M; Dauer, William; Toga, Arthur W

    2016-01-01

    A unique archive of Big Data on Parkinson's Disease is collected, managed and disseminated by the Parkinson's Progression Markers Initiative (PPMI). The integration of such complex and heterogeneous Big Data from multiple sources offers unparalleled opportunities to study the early stages of prevalent neurodegenerative processes, track their progression and quickly identify the efficacies of alternative treatments. Many previous human and animal studies have examined the relationship of Parkinson's disease (PD) risk to trauma, genetics, environment, co-morbidities, or life style. The defining characteristics of Big Data-large size, incongruency, incompleteness, complexity, multiplicity of scales, and heterogeneity of information-generating sources-all pose challenges to the classical techniques for data management, processing, visualization and interpretation. We propose, implement, test and validate complementary model-based and model-free approaches for PD classification and prediction. To explore PD risk using Big Data methodology, we jointly processed complex PPMI imaging, genetics, clinical and demographic data. Collective representation of the multi-source data facilitates the aggregation and harmonization of complex data elements. This enables joint modeling of the complete data, leading to the development of Big Data analytics, predictive synthesis, and statistical validation. Using heterogeneous PPMI data, we developed a comprehensive protocol for end-to-end data characterization, manipulation, processing, cleaning, analysis and validation. Specifically, we (i) introduce methods for rebalancing imbalanced cohorts, (ii) utilize a wide spectrum of classification methods to generate consistent and powerful phenotypic predictions, and (iii) generate reproducible machine-learning based classification that enables the reporting of model parameters and diagnostic forecasting based on new data. We evaluated several complementary model-based predictive approaches, which failed to generate accurate and reliable diagnostic predictions. However, the results of several machine-learning based classification methods indicated significant power to predict Parkinson's disease in the PPMI subjects (consistent accuracy, sensitivity, and specificity exceeding 96%, confirmed using statistical n-fold cross-validation). Clinical (e.g., Unified Parkinson's Disease Rating Scale (UPDRS) scores), demographic (e.g., age), genetics (e.g., rs34637584, chr12), and derived neuroimaging biomarker (e.g., cerebellum shape index) data all contributed to the predictive analytics and diagnostic forecasting. Model-free Big Data machine learning-based classification methods (e.g., adaptive boosting, support vector machines) can outperform model-based techniques in terms of predictive precision and reliability (e.g., forecasting patient diagnosis). We observed that statistical rebalancing of cohort sizes yields better discrimination of group differences, specifically for predictive analytics based on heterogeneous and incomplete PPMI data. UPDRS scores play a critical role in predicting diagnosis, which is expected based on the clinical definition of Parkinson's disease. Even without longitudinal UPDRS data, however, the accuracy of model-free machine learning based classification is over 80%. 
The methods, software and protocols developed here are openly shared and can be employed to study other neurodegenerative disorders (e.g., Alzheimer's, Huntington's, amyotrophic lateral sclerosis), as well as for other predictive Big Data analytics applications.
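
    A minimal sketch of the model-free classification protocol outlined above (adaptive boosting evaluated with stratified n-fold cross-validation on an imbalanced cohort, scored with balanced accuracy), using synthetic data in place of the PPMI features:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic imbalanced cohort standing in for PPMI-style data (the real study
# used imaging, genetic, clinical and demographic features).
X, y = make_classification(n_samples=600, n_features=20, weights=[0.8, 0.2],
                           random_state=0)

# Model-free classifier with stratified n-fold cross-validation, echoing the
# adaptive-boosting plus n-fold CV protocol described in the abstract.
clf = AdaBoostClassifier(n_estimators=200, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="balanced_accuracy")
print(f"balanced accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```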

  16. Medical Image Compression Using a New Subband Coding Method

    NASA Technical Reports Server (NTRS)

    Kossentini, Faouzi; Smith, Mark J. T.; Scales, Allen; Tucker, Doug

    1995-01-01

    A recently introduced iterative complexity- and entropy-constrained subband quantization design algorithm is generalized and applied to medical image compression. In particular, the corresponding subband coder is used to encode Computed Tomography (CT) axial slice head images, where statistical dependencies between neighboring image subbands are exploited. Inter-slice conditioning is also employed for further improvements in compression performance. The subband coder features many advantages such as relatively low complexity and operation over a very wide range of bit rates. Experimental results demonstrate that the performance of the new subband coder is relatively good, both objectively and subjectively.

  17. Experiment Design for Complex VTOL Aircraft with Distributed Propulsion and Tilt Wing

    NASA Technical Reports Server (NTRS)

    Murphy, Patrick C.; Landman, Drew

    2015-01-01

    Selected experimental results from a wind tunnel study of a subscale VTOL concept with distributed propulsion and tilt lifting surfaces are presented. The vehicle complexity and automated test facility were ideal for use with a randomized designed experiment. Design of Experiments and Response Surface Methods were invoked to produce run efficient, statistically rigorous regression models with minimized prediction error. Static tests were conducted at the NASA Langley 12-Foot Low-Speed Tunnel to model all six aerodynamic coefficients over a large flight envelope. This work supports investigations at NASA Langley in developing advanced configurations, simulations, and advanced control systems.
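
    As a compact illustration of the Response Surface Methods named above, the sketch below fits a full quadratic response surface to a hypothetical two-factor wind-tunnel data set by least squares; the factors and response model are invented and are not the study's actual design.

```python
import numpy as np

# Hypothetical two-factor design: tilt angle and throttle setting, with a
# noisy quadratic lift-coefficient response (illustrative only).
rng = np.random.default_rng(0)
tilt = rng.uniform(0, 90, 40)        # degrees
throttle = rng.uniform(0, 1, 40)
CL = 0.8 + 0.02*tilt - 1.5e-4*tilt**2 + 0.5*throttle + rng.normal(0, 0.02, 40)

# Full quadratic response-surface model: 1, x1, x2, x1^2, x2^2, x1*x2.
X = np.column_stack([np.ones_like(tilt), tilt, throttle,
                     tilt**2, throttle**2, tilt*throttle])
coef, *_ = np.linalg.lstsq(X, CL, rcond=None)
print("fitted quadratic coefficients:", coef.round(5))
```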

  18. Deciphering the complex: methodological overview of statistical models to derive OMICS-based biomarkers.

    PubMed

    Chadeau-Hyam, Marc; Campanella, Gianluca; Jombart, Thibaut; Bottolo, Leonardo; Portengen, Lutzen; Vineis, Paolo; Liquet, Benoit; Vermeulen, Roel C H

    2013-08-01

    Recent technological advances in molecular biology have given rise to numerous large-scale datasets whose analysis imposes serious methodological challenges mainly relating to the size and complex structure of the data. Considerable experience in analyzing such data has been gained over the past decade, mainly in genetics, from the Genome-Wide Association Study era, and more recently in transcriptomics and metabolomics. Building upon the corresponding literature, we provide here a nontechnical overview of well-established methods used to analyze OMICS data within three main types of regression-based approaches: univariate models including multiple testing correction strategies, dimension reduction techniques, and variable selection models. Our methodological description focuses on methods for which ready-to-use implementations are available. We describe the main underlying assumptions, the main features, and advantages and limitations of each of the models. This descriptive summary constitutes a useful tool for driving methodological choices while analyzing OMICS data, especially in environmental epidemiology, where the emergence of the exposome concept clearly calls for unified methods to analyze marginally and jointly complex exposure and OMICS datasets. Copyright © 2013 Wiley Periodicals, Inc.
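
    As an example from the third family of approaches reviewed (variable selection), the following sketch applies cross-validated L1-penalized regression to a synthetic high-dimensional OMICS-like matrix; the data dimensions and signal structure are invented for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# Synthetic high-dimensional matrix: 100 samples, 1000 features, only 10
# truly associated with the outcome (a stand-in for real OMICS data).
X, y, coef = make_regression(n_samples=100, n_features=1000, n_informative=10,
                             coef=True, noise=5.0, random_state=0)

# Variable-selection model: L1-penalized regression with the penalty
# chosen by cross-validation.
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
selected = np.flatnonzero(lasso.coef_)
true_hits = np.intersect1d(selected, np.flatnonzero(coef)).size
print(f"{selected.size} features selected; {true_hits} of 10 true hits recovered")
```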

  19. Megalithic Monument of Abuli, Georgia, and Possible Astronomical Significance

    NASA Astrophysics Data System (ADS)

    Jijelava, Badri; Simonia, Irakli

    2016-08-01

    Background/Objectives: In recent years, astronomical knowledge has played a significant role in the investigation of artefacts and of ancient culture and religion. The aim of this work is to identify the orientations of religious megalithic complexes and their correlation with the celestial luminaries. Methods/Statistical Analysis: We combined archaeological, ethnographical and historical information with a reconstruction of the ancient celestial sphere (using specialized astronomy software), which made it possible to identify correlations between the acronychal or heliacal rising and setting of luminaries and the orientations of megalithic objects. Such connections are often preserved in present-day folklore as well. Findings: This investigative technique gives a clearer understanding of the ancient view of the universe. Using this method, we can recover latent information about the ancient luminary gods, clarify surviving mythology, and date the megalithic complex. Application/Improvements: This method of investigation is an additional instrument for archaeological investigations.

  20. Efficient Statistically Accurate Algorithms for the Fokker-Planck Equation in Large Dimensions

    NASA Astrophysics Data System (ADS)

    Chen, N.; Majda, A.

    2017-12-01

    Solving the Fokker-Planck equation for high-dimensional complex turbulent dynamical systems is an important and practical issue. However, most traditional methods suffer from the curse of dimensionality and have difficulties in capturing the fat tailed highly intermittent probability density functions (PDFs) of complex systems in turbulence, neuroscience and excitable media. In this article, efficient statistically accurate algorithms are developed for solving both the transient and the equilibrium solutions of Fokker-Planck equations associated with high-dimensional nonlinear turbulent dynamical systems with conditional Gaussian structures. The algorithms involve a hybrid strategy that requires only a small number of ensembles. Here, a conditional Gaussian mixture in a high-dimensional subspace via an extremely efficient parametric method is combined with a judicious non-parametric Gaussian kernel density estimation in the remaining low-dimensional subspace. Particularly, the parametric method, which is based on an effective data assimilation framework, provides closed analytical formulae for determining the conditional Gaussian distributions in the high-dimensional subspace. Therefore, it is computationally efficient and accurate. The full non-Gaussian PDF of the system is then given by a Gaussian mixture. Different from the traditional particle methods, each conditional Gaussian distribution here covers a significant portion of the high-dimensional PDF. Therefore a small number of ensembles is sufficient to recover the full PDF, which overcomes the curse of dimensionality. Notably, the mixture distribution has a significant skill in capturing the transient behavior with fat tails of the high-dimensional non-Gaussian PDFs, and this facilitates the algorithms in accurately describing the intermittency and extreme events in complex turbulent systems. It is shown in a stringent set of test problems that the method only requires an order of O(100) ensembles to successfully recover the highly non-Gaussian transient PDFs in up to 6 dimensions with only small errors.

  1. Statistical modeling of temperature, humidity and wind fields in the atmospheric boundary layer over the Siberian region

    NASA Astrophysics Data System (ADS)

    Lomakina, N. Ya.

    2017-11-01

    This work presents the results of an applied climatic division of the Siberian region into districts, based on an objective classification of atmospheric boundary layer climates by the "temperature-moisture-wind" complex, realized using the method of principal components together with special similarity criteria for the average profiles and the eigenvalues of the correlation matrices. Over the territory of Siberia, 14 homogeneous regions were identified for the winter season and 10 for the summer. Local statistical models were constructed for each region; these include vertical profiles of mean values, mean square deviations, and matrices of interlevel correlations for temperature, specific humidity, and zonal and meridional wind velocity. The advantage of the obtained local statistical models over the regional models is shown.
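
    A minimal sketch of the principal-component step underlying such a classification, applied to a hypothetical matrix of boundary-layer profiles; the data and dimensions are invented, and only the eigendecomposition of the correlation matrix mirrors the method.

```python
import numpy as np

# Hypothetical profile matrix: rows = stations/seasons, columns = temperature,
# humidity and wind values at successive levels (synthetic, for illustration).
rng = np.random.default_rng(1)
levels = np.linspace(0, 1, 30)
shared_mode = rng.normal(size=(200, 1)) * levels      # one dominant vertical mode
profiles = shared_mode + 0.5 * rng.normal(size=(200, 30))

# Method of principal components via the eigendecomposition of the
# correlation matrix of the standardized profiles.
z = (profiles - profiles.mean(0)) / profiles.std(0)
corr = np.corrcoef(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]                     # leading components first
explained = eigvals[order] / eigvals.sum()
scores = z @ eigvecs[:, order[:3]]                    # project onto first 3 PCs
print("variance explained by first 3 PCs:", explained[:3].round(3))
```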

  2. Valid Statistical Analysis for Logistic Regression with Multiple Sources

    NASA Astrophysics Data System (ADS)

    Fienberg, Stephen E.; Nardi, Yuval; Slavković, Aleksandra B.

    Considerable effort has gone into understanding issues of privacy protection of individual information in single databases, and various solutions have been proposed depending on the nature of the data, the ways in which the database will be used and the precise nature of the privacy protection being offered. Once data are merged across sources, however, the nature of the problem becomes far more complex and a number of privacy issues arise for the linked individual files that go well beyond those that are considered with regard to the data within individual sources. In this paper, we propose an approach that provides a full statistical analysis of the combined database without actually combining it. We focus mainly on logistic regression, but the method and tools described may be applied to essentially any other statistical model as well.

  3. A simulator for evaluating methods for the detection of lesion-deficit associations

    NASA Technical Reports Server (NTRS)

    Megalooikonomou, V.; Davatzikos, C.; Herskovits, E. H.

    2000-01-01

    Although much has been learned about the functional organization of the human brain through lesion-deficit analysis, the variety of statistical and image-processing methods developed for this purpose precludes a closed-form analysis of the statistical power of these systems. Therefore, we developed a lesion-deficit simulator (LDS), which generates artificial subjects, each of which consists of a set of functional deficits, and a brain image with lesions; the deficits and lesions conform to predefined distributions. We used probability distributions to model the number, sizes, and spatial distribution of lesions, to model the structure-function associations, and to model registration error. We used the LDS to evaluate, as examples, the effects of the complexities and strengths of lesion-deficit associations, and of registration error, on the power of lesion-deficit analysis. We measured the numbers of recovered associations from these simulated data, as a function of the number of subjects analyzed, the strengths and number of associations in the statistical model, the number of structures associated with a particular function, and the prior probabilities of structures being abnormal. The number of subjects required to recover the simulated lesion-deficit associations was found to have an inverse relationship to the strength of associations, and to the smallest probability in the structure-function model. The number of structures associated with a particular function (i.e., the complexity of associations) had a much greater effect on the performance of the analysis method than did the total number of associations. We also found that registration error of 5 mm or less reduces the number of associations discovered by approximately 13% compared to perfect registration. The LDS provides a flexible framework for evaluating many aspects of lesion-deficit analysis.

  4. Lurasidone-β-cyclodextrin complexes: Physicochemical characterization and comparison of their antidepressant, antipsychotic activities against that of self microemulsifying formulation

    NASA Astrophysics Data System (ADS)

    Londhe, Vaishali Y.; Deshmane, Aishwarya B.; Singh, Sarita R.; Kulkarni, Yogesh A.

    2018-04-01

    Lurasidone hydrochloride (LHD) is an atypical antipsychotic drug with poor aqueous solubility and low bioavailability (9-19%). This study describes the effect of different methods of complex formation with β-cyclodextrin (BCD) on the enhancement of dissolution and on the antidepressant and antipsychotic effects of LHD. A further purpose of this study is to compare the pharmacodynamic effects of the complexes with those of a self-microemulsifying drug delivery system of LHD (SMEDDS). Inclusion complexes (IC) of LHD and BCD were prepared by physical mixing (PM), kneading (KN) and spray drying (SD) in a 1:1 molar ratio. These complexes were characterized by different techniques. The KN and SD complexes, which showed enhanced dissolution, were compared with SMEDDS using the forced swim test (FST) and tail suspension test (TST) for antidepressant action and the paw test for antipsychotic activity. Characterization of the complexes confirmed an interaction between LHD and BCD. Enhancement in dissolution was observed in the order SD > KN > PM > LHD. In all three animal models, SD, KN and SMEDDS showed a statistically significant effect (p < .05) relative to the drug alone, indicating enhanced bioavailability. Complexation of LHD with BCD enhances dissolution, which is reflected in the improved antidepressant and antipsychotic activity of the drug. Solubility-enhancement methods such as complexation and self-microemulsification thus improve the pharmacodynamic activity of the drug, in the order SD ≥ SMEDDS ≥ KN > LHD.

  5. Wave packet and statistical quantum calculations for the He + NeH⁺ → HeH⁺ + Ne reaction on the ground electronic state.

    PubMed

    Koner, Debasish; Barrios, Lizandra; González-Lezana, Tomás; Panda, Aditya N

    2014-09-21

    A real wave packet based time-dependent method and a statistical quantum method have been used to study the He + NeH(+) (v, j) reaction with the reactant in various ro-vibrational states, on a recently calculated ab initio ground state potential energy surface. Both the wave packet and statistical quantum calculations were carried out within the centrifugal sudden approximation as well as using the exact Hamiltonian. Quantum reaction probabilities exhibit a dense oscillatory pattern for smaller total angular momentum values, which is a signature of resonances in a complex-forming mechanism for the title reaction. Significant differences, found between exact and approximate quantum reaction cross sections, highlight the importance of including Coriolis coupling in the calculations. Statistical results are in fairly good agreement with the exact quantum results for the ground ro-vibrational states of the reactant. Vibrational excitation greatly enhances the reaction cross sections, whereas rotational excitation has a relatively small effect on the reaction. The nature of the reaction cross section curves depends on the initial vibrational state of the reactant and is typical of a late-barrier-type potential energy profile.

  6. Accounting for isotopic clustering in Fourier transform mass spectrometry data analysis for clinical diagnostic studies.

    PubMed

    Kakourou, Alexia; Vach, Werner; Nicolardi, Simone; van der Burgt, Yuri; Mertens, Bart

    2016-10-01

    Mass spectrometry based clinical proteomics has emerged as a powerful tool for high-throughput protein profiling and biomarker discovery. Recent improvements in mass spectrometry technology have boosted the potential of proteomic studies in biomedical research. However, the complexity of the proteomic expression introduces new statistical challenges in summarizing and analyzing the acquired data. Statistical methods for optimally processing proteomic data are currently a growing field of research. In this paper we present simple, yet appropriate methods to preprocess, summarize and analyze high-throughput MALDI-FTICR mass spectrometry data, collected in a case-control fashion, while dealing with the statistical challenges that accompany such data. The known statistical properties of the isotopic distribution of the peptide molecules are used to preprocess the spectra and translate the proteomic expression into a condensed data set. Information on either the intensity level or the shape of the identified isotopic clusters is used to derive summary measures on which diagnostic rules for disease status allocation will be based. Results indicate that both the shape of the identified isotopic clusters and the overall intensity level carry information on the class outcome and can be used to predict the presence or absence of the disease.

  7. Statistics of Optical Coherence Tomography Data From Human Retina

    PubMed Central

    de Juan, Joaquín; Ferrone, Claudia; Giannini, Daniela; Huang, David; Koch, Giorgio; Russo, Valentina; Tan, Ou; Bruni, Carlo

    2010-01-01

    Optical coherence tomography (OCT) has recently become one of the primary methods for noninvasive probing of the human retina. The pseudoimage formed by OCT (the so-called B-scan) varies probabilistically across pixels due to complexities in the measurement technique. Hence, sensitive automatic procedures of diagnosis using OCT may exploit statistical analysis of the spatial distribution of reflectance. In this paper, we perform a statistical study of retinal OCT data. We find that the stretched exponential probability density function can model well the distribution of intensities in OCT pseudoimages. Moreover, we show a small but significant correlation between neighboring pixels when measuring OCT intensities with pixels of about 5 µm. We then develop a simple joint probability model for the OCT data consistent with known retinal features. This model fits well the stretched exponential distribution of intensities and their spatial correlation. In normal retinas, fit parameters of this model are relatively constant along retinal layers but vary across layers. However, in retinas with diabetic retinopathy, large spikes of parameter modulation interrupt the constancy within layers, exactly where pathologies are visible. We argue that these results give hope for improvement in statistical pathology-detection methods even when the disease is in its early stages. PMID:20304733
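
    A minimal sketch of fitting a stretched-exponential law to intensity data, using synthetic Weibull draws as a stand-in for OCT intensities (a Weibull sample has an exactly stretched-exponential survival function, which keeps the example self-checking; the real analysis fits the density of B-scan pixels).

```python
import numpy as np
from scipy.optimize import curve_fit

def stretched_exp(x, a, beta):
    """Stretched-exponential survival function exp(-(x/a)**beta)."""
    return np.exp(-(x / a) ** beta)

# Synthetic stand-in for OCT B-scan intensities (illustrative only).
rng = np.random.default_rng(0)
intensities = np.sort(rng.weibull(0.7, size=20_000))
survival = 1.0 - np.arange(1, intensities.size + 1) / intensities.size

params, _ = curve_fit(stretched_exp, intensities, survival,
                      p0=(1.0, 1.0), bounds=(0, np.inf))
print("fitted (a, beta):", np.round(params, 3))  # beta should be near 0.7
```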

  8. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhu, Lin; Dai, Zhenxue; Gong, Huili

    Understanding the heterogeneity arising from the complex architecture of sedimentary sequences in alluvial fans is challenging. This study develops a statistical inverse framework in a multi-zone transition probability approach for characterizing the heterogeneity in alluvial fans. An analytical solution of the transition probability matrix is used to define the statistical relationships among different hydrofacies and their mean lengths, integral scales, and volumetric proportions. A statistical inversion is conducted to identify the multi-zone transition probability models and estimate the optimal statistical parameters using the modified Gauss–Newton–Levenberg–Marquardt method. The Jacobian matrix is computed by the sensitivity equation method, which results in an accurate inverse solution with quantification of parameter uncertainty. We use the Chaobai River alluvial fan in the Beijing Plain, China, as an example for elucidating the methodology of alluvial fan characterization. The alluvial fan is divided into three sediment zones. In each zone, the explicit mathematical formulations of the transition probability models are constructed with different optimized integral scales and volumetric proportions. The hydrofacies distributions in the three zones are simulated sequentially by the multi-zone transition probability-based indicator simulations. Finally, the result of this study provides the heterogeneous structure of the alluvial fan for further study of flow and transport simulations.
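
    For intuition, the continuous-lag transition probability model at the heart of this approach can be written as T(h) = expm(Rh) for a transition-rate matrix R built from mean lengths and proportions. The sketch below uses hypothetical parameters and a common simplifying choice for the off-diagonal rates; it is not the paper's calibrated multi-zone model.

```python
import numpy as np
from scipy.linalg import expm

# Illustrative parameters for three hydrofacies in one zone: volumetric
# proportions p_k and mean lengths L_k along one direction (hypothetical).
p = np.array([0.5, 0.3, 0.2])
L = np.array([8.0, 4.0, 2.0])

# Transition-rate matrix R: diagonal -1/L_j; off-diagonal rates distributed
# in proportion to the facies proportions (a common simplifying assumption).
R = np.zeros((3, 3))
for j in range(3):
    for k in range(3):
        R[j, k] = -1.0 / L[j] if j == k else (1.0 / L[j]) * p[k] / (1.0 - p[j])

# Analytical transition-probability matrix at lag h: T(h) = expm(R*h).
# Each row sums to 1, and rows approach stationary proportions as h grows.
for h in (1.0, 5.0, 20.0):
    T = expm(R * h)
    print(f"h={h:5.1f}  diag={np.diag(T).round(3)}")
```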

  9. Toward cost-efficient sampling methods

    NASA Astrophysics Data System (ADS)

    Luo, Peng; Li, Yongli; Wu, Chong; Zhang, Guijie

    2015-09-01

    Sampling methods have received much attention in the field of complex networks in general and statistical physics in particular. This paper proposes two new sampling methods based on the idea that a small set of vertices with high node degree can capture most of the structural information of a complex network. The two proposed sampling methods are efficient in sampling high-degree nodes, so that they remain useful even when the sampling rate is low, which makes them cost-efficient. The first new sampling method is developed on the basis of the widely used stratified random sampling (SRS) method, and the second improves the well-known snowball sampling (SBS) method. In order to demonstrate the validity and accuracy of the two new sampling methods, we compare them with existing sampling methods in three commonly used simulated networks (scale-free, random, and small-world networks) as well as in two real networks. The experimental results illustrate that the two proposed sampling methods perform much better than the existing sampling methods in terms of recovering the true network structure characteristics reflected by the clustering coefficient, Bonacich centrality and average path length, especially when the sampling rate is low.
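
    A sketch of the degree-biased snowball idea, expanding the highest-degree unvisited neighbours first; this illustrates preferentially sampling high-degree nodes and is not the authors' exact algorithm.

```python
import networkx as nx

def degree_biased_snowball(G, seed, sample_size, k=3):
    """Snowball sampling that, at each step, expands the k highest-degree
    unvisited neighbours first (a sketch of an 'improved SBS' strategy)."""
    visited, frontier = {seed}, [seed]
    while frontier and len(visited) < sample_size:
        node = frontier.pop(0)
        neighbours = [n for n in G.neighbors(node) if n not in visited]
        neighbours.sort(key=G.degree, reverse=True)   # high degree first
        for n in neighbours[:k]:
            visited.add(n)
            frontier.append(n)
            if len(visited) >= sample_size:
                break
    return G.subgraph(visited)

G = nx.barabasi_albert_graph(2000, 3, seed=1)         # scale-free test network
sample = degree_biased_snowball(G, seed=0, sample_size=200)
print("full:", nx.average_clustering(G), "sample:", nx.average_clustering(sample))
```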

  10. Analysis strategies for longitudinal attachment loss data.

    PubMed

    Beck, J D; Elter, J R

    2000-02-01

    The purpose of this invited review is to describe and discuss methods currently in use to quantify the progression of attachment loss in epidemiological studies of periodontal disease, and to make recommendations for specific analytic methods based upon the particular design of the study and structure of the data. The review concentrates on the definition of incident attachment loss (ALOSS) and its component parts; measurement issues including thresholds and regression to the mean; methods of accounting for longitudinal change, including changes in means, changes in proportions of affected sites, incidence density, the effect of tooth loss and reversals, and repeated events; statistical models of longitudinal change, including the incorporation of the time element, use of linear, logistic or Poisson regression or survival analysis, and statistical tests; site vs person level of analysis, including statistical adjustment for correlated data; and the strengths and limitations of ALOSS data. Examples from the Piedmont 65+ Dental Study are used to illustrate specific concepts. We conclude that incidence density is the preferred methodology for periodontal studies with more than one period of follow-up, and that studies not employing methods for dealing with complex samples, correlated data, and repeated measures do not take advantage of our current understanding of the site- and person-level variables important in periodontal disease and may generate biased results.
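
    The incidence-density computation recommended above reduces to incident events divided by site-time at risk; a minimal sketch with made-up follow-up records:

```python
# Incidence density = incident ALOSS events per unit of site-time at risk
# (the follow-up records below are invented for illustration).
records = [
    # (sites_with_new_attachment_loss, sites_at_risk, years_of_follow_up)
    (12, 100, 3.0),
    (5,  80,  1.5),
    (20, 120, 3.0),
]

events = sum(e for e, _, _ in records)
site_years = sum(s * t for _, s, t in records)
incidence_density = events / site_years
print(f"{incidence_density:.4f} events per site-year")
```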

  11. High Order Entropy-Constrained Residual VQ for Lossless Compression of Images

    NASA Technical Reports Server (NTRS)

    Kossentini, Faouzi; Smith, Mark J. T.; Scales, Allen

    1995-01-01

    High order entropy coding is a powerful technique for exploiting high order statistical dependencies. However, the exponentially high complexity associated with such a method often discourages its use. In this paper, an entropy-constrained residual vector quantization method is proposed for lossless compression of images. The method consists of first quantizing the input image using a high order entropy-constrained residual vector quantizer and then coding the residual image using a first order entropy coder. The distortion measure used in the entropy-constrained optimization is essentially the first order entropy of the residual image. Experimental results show very competitive performance.

  12. Remote sensing of environmental particulate pollutants - Optical methods for determinations of size distribution and complex refractive index

    NASA Technical Reports Server (NTRS)

    Fymat, A. L.

    1978-01-01

    A unifying approach, based on a generalization of Pearson's differential equation of statistical theory, is proposed for both the representation of particulate size distribution and the interpretation of radiometric measurements in terms of this parameter. A single-parameter gamma-type distribution is introduced, and it is shown that inversion can only provide the dimensionless parameter, r/ab (where r = particle radius, a = effective radius, b = effective variance), at least when the distribution vanishes at both ends. The basic inversion problem in reconstructing the particle size distribution is analyzed, and the existing methods are reviewed (with emphasis on their capabilities) and classified. A two-step strategy is proposed for simultaneously determining the complex refractive index and reconstructing the size distribution of atmospheric particulates.

  13. An evaluation of the quality of statistical design and analysis of published medical research: results from a systematic survey of general orthopaedic journals.

    PubMed

    Parsons, Nick R; Price, Charlotte L; Hiskens, Richard; Achten, Juul; Costa, Matthew L

    2012-04-25

    The application of statistics in reported research in trauma and orthopaedic surgery has become ever more important and complex. Despite the extensive use of statistical analysis, it is still a subject which is often not conceptually well understood, resulting in clear methodological flaws and inadequate reporting in many papers. A detailed statistical survey sampled 100 representative orthopaedic papers using a validated questionnaire that assessed the quality of the trial design and statistical analysis methods. The survey found evidence of failings in study design, statistical methodology and presentation of the results. Overall, in 17% (95% confidence interval; 10-26%) of the studies investigated the conclusions were not clearly justified by the results, in 39% (30-49%) of studies a different analysis should have been undertaken and in 17% (10-26%) a different analysis could have made a difference to the overall conclusions. It is only by an improved dialogue between statistician, clinician, reviewer and journal editor that the failings in design methodology and analysis highlighted by this survey can be addressed.

  14. Nonnormality and Divergence in Posttreatment Alcohol Use

    PubMed Central

    Witkiewitz, Katie; van der Maas, Han L. J.; Hufford, Michael R.; Marlatt, G. Alan

    2007-01-01

    Alcohol lapses are the modal outcome following treatment for alcohol use disorders, yet many alcohol researchers have encountered limited success in the prediction and prevention of relapse. One hypothesis is that lapses are unpredictable, but another possibility is that the complexity of the relapse process is not captured by traditional statistical methods. Data from Project Matching Alcohol Treatments to Client Heterogeneity (Project MATCH), a multisite alcohol treatment study, were reanalyzed with 2 statistical methodologies: catastrophe and 2-part growth mixture modeling. Drawing on previous investigations of self-efficacy as a dynamic predictor of relapse, the current study revisits the self-efficacy matching hypothesis, which was not statistically supported in Project MATCH. Results from both the catastrophe and growth mixture analyses demonstrated a dynamic relationship between self-efficacy and drinking outcomes. The growth mixture analyses provided evidence in support of the original matching hypothesis: Individuals with lower self-efficacy who received cognitive behavior therapy drank far less frequently than did those with low self-efficacy who received motivational therapy. These results highlight the dynamical nature of the relapse process and the importance of using methodologies that accommodate this complexity when evaluating treatment outcomes. PMID:17516769

  15. Phenomenological model to fit complex permittivity data of water from radio to optical frequencies.

    PubMed

    Shubitidze, Fridon; Osterberg, Ulf

    2007-04-01

    A general factorized form of the dielectric function together with a fractional model-based parameter estimation method is used to provide an accurate analytical formula for the complex refractive index in water for the frequency range 10(8)-10(16)Hz . The analytical formula is derived using a combination of a microscopic frequency-dependent rational function for adjusting zeros and poles of the dielectric dispersion together with the macroscopic statistical Fermi-Dirac distribution to provide a description of both the real and imaginary parts of the complex permittivity for water. The Fermi-Dirac distribution allows us to model the dramatic reduction in the imaginary part of the permittivity in the visible window of the water spectrum.

  16. Metrics for evaluation of the author's writing styles: Who is the best?

    NASA Astrophysics Data System (ADS)

    Darooneh, Amir H.; Shariati, Ashrafosadat

    2014-09-01

    Studying the complexity of language has recently attracted physicists' attention. Methods borrowed from statistical mechanics, namely complex network theory, can be used to explore regularities as a characteristic of the complexity of language. In this paper, we focus on authorship identification using the complex network approach. We introduce three metrics that enable us to compare authors' writing styles. We previously used this approach to find the author of an unknown book among a collection of thirty-six books written by five Persian poets. Here, we select a collection of one hundred and one books by nine English writers and quantify their writing styles according to our metrics. In our experiment, Shakespeare emerges as the author who most consistently follows a unique writing style across all of his works.

  17. Weighted fractional permutation entropy and fractional sample entropy for nonlinear Potts financial dynamics

    NASA Astrophysics Data System (ADS)

    Xu, Kaixuan; Wang, Jun

    2017-02-01

    In this paper, the recently introduced permutation entropy and sample entropy are extended to the fractional cases of weighted fractional permutation entropy (WFPE) and fractional sample entropy (FSE). The fractional-order generalization of information entropy is utilized in these two complexity approaches to detect the statistical characteristics of fractional-order information in complex systems. An effectiveness analysis of the proposed methods on synthetic and real-world data reveals that tuning the fractional order allows higher sensitivity and more accurate characterization of the signal evolution, which is useful in describing the dynamics of complex systems. Moreover, nonlinear complexity behaviors are compared numerically between the return series of the Potts financial model and actual stock markets, and the empirical results confirm the feasibility of the proposed model.

  18. Complexity-entropy causality plane: A useful approach for distinguishing songs

    NASA Astrophysics Data System (ADS)

    Ribeiro, Haroldo V.; Zunino, Luciano; Mendes, Renio S.; Lenzi, Ervin K.

    2012-04-01

    Nowadays we are often faced with huge databases resulting from the rapid growth of data storage technologies. This is particularly true when dealing with music databases. In this context, it is essential to have techniques and tools able to discriminate properties from these massive sets. In this work, we report on a statistical analysis of more than ten thousand songs aiming to obtain a complexity hierarchy. Our approach is based on the estimation of the permutation entropy combined with an intensive complexity measure, building up the complexity-entropy causality plane. The results obtained indicate that this representation space is very promising to discriminate songs as well as to allow a relative quantitative comparison among songs. Additionally, we believe that the here-reported method may be applied in practical situations since it is simple, robust and has a fast numerical implementation.
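
    A minimal sketch of placing a signal on the complexity-entropy causality plane, combining normalized permutation entropy with the Jensen-Shannon statistical complexity; the embedding dimension and the test signals are illustrative choices.

```python
import itertools
import numpy as np

def ordinal_distribution(x, d=4):
    """Relative frequencies of the d! ordinal patterns of dimension d."""
    patterns = {perm: 0 for perm in itertools.permutations(range(d))}
    for i in range(len(x) - d + 1):
        patterns[tuple(np.argsort(x[i:i + d]))] += 1
    p = np.array(list(patterns.values()), dtype=float)
    return p / p.sum()

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def complexity_entropy(x, d=4):
    """Point (H, C) on the complexity-entropy causality plane."""
    p = ordinal_distribution(x, d)
    n = len(p)
    u = np.full(n, 1.0 / n)                     # uniform reference distribution
    H = entropy(p) / np.log(n)                  # normalized permutation entropy
    js = entropy((p + u) / 2) - entropy(p) / 2 - entropy(u) / 2
    js_max = -0.5 * ((n + 1) / n * np.log(n + 1) + np.log(n) - 2 * np.log(2 * n))
    return H, (js / js_max) * H                 # (entropy, statistical complexity)

rng = np.random.default_rng(0)
print(complexity_entropy(rng.normal(size=5000)))          # noise: high H, low C
print(complexity_entropy(np.sin(0.05 * np.arange(5000)))) # regular tone: low H
```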

  19. Metrics for evaluation of the author's writing styles: who is the best?

    PubMed

    Darooneh, Amir H; Shariati, Ashrafosadat

    2014-09-01

    Studying the complexity of language has recently attracted physicists' attention. Methods borrowed from statistical mechanics, namely complex network theory, can be used to explore regularities as a characteristic of the complexity of language. In this paper, we focus on authorship identification using the complex network approach. We introduce three metrics that enable us to compare authors' writing styles. We previously used this approach to find the author of an unknown book among a collection of thirty-six books written by five Persian poets. Here, we select a collection of one hundred and one books by nine English writers and quantify their writing styles according to our metrics. In our experiment, Shakespeare emerges as the author who most consistently follows a unique writing style across all of his works.

  20. Multivariate analysis in thoracic research.

    PubMed

    Mengual-Macenlle, Noemí; Marcos, Pedro J; Golpe, Rafael; González-Rivas, Diego

    2015-03-01

    Multivariate analysis is based on the observation and analysis of more than one statistical outcome variable at a time. In design and analysis, the technique is used to perform trade studies across multiple dimensions while taking into account the effects of all variables on the responses of interest. Multivariate methods emerged to analyze large databases and increasingly complex data. Since modeling is the best way to represent knowledge of reality, multivariate statistical methods should be used for such data. Multivariate methods are designed to analyze data sets as a whole, i.e., several variables measured for each person or object studied. Keep in mind at all times that all variables must be treated in a way that accurately reflects the reality of the problem addressed. There are different types of multivariate analysis, and each should be employed according to the type of variables to be analyzed: dependence, interdependence and structural methods. In conclusion, multivariate methods are ideal for the analysis of large data sets and for finding cause and effect relationships between variables; there is a wide range of analysis types that we can use.

  1. An operational GLS model for hydrologic regression

    USGS Publications Warehouse

    Tasker, Gary D.; Stedinger, J.R.

    1989-01-01

    Recent Monte Carlo studies have documented the value of generalized least squares (GLS) procedures to estimate empirical relationships between streamflow statistics and physiographic basin characteristics. This paper presents a number of extensions of the GLS method that deal with realities and complexities of regional hydrologic data sets that were not addressed in the simulation studies. These extensions include: (1) a more realistic model of the underlying model errors; (2) smoothed estimates of cross correlation of flows; (3) procedures for including historical flow data; (4) diagnostic statistics describing leverage and influence for GLS regression; and (5) the formulation of a mathematical program for evaluating future gaging activities. ?? 1989.
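
    For reference, the GLS estimator underlying these extensions is the textbook weighted form with the full error covariance. The sketch below applies it to hypothetical log flood quantiles regressed on log drainage area, with an invented covariance structure; it is not the paper's operational model.

```python
import numpy as np

def gls(X, y, Lambda):
    """Generalized least squares: beta = (X' L^-1 X)^-1 X' L^-1 y, where L is
    the covariance of model plus sampling errors (textbook form)."""
    Li = np.linalg.inv(Lambda)
    beta = np.linalg.solve(X.T @ Li @ X, X.T @ Li @ y)
    cov_beta = np.linalg.inv(X.T @ Li @ X)
    return beta, cov_beta

# Illustrative data: log flood statistic vs log drainage area, with
# cross-correlated sampling errors from overlapping flow records.
rng = np.random.default_rng(0)
n = 25
log_area = rng.uniform(1, 4, n)
X = np.column_stack([np.ones(n), log_area])
Lambda = 0.05 * np.eye(n) + 0.02 * np.exp(
    -np.abs(log_area[:, None] - log_area[None, :]))
y = X @ np.array([1.0, 0.8]) + rng.multivariate_normal(np.zeros(n), Lambda)

beta, cov = gls(X, y, Lambda)
print("beta:", beta.round(3), "se:", np.sqrt(np.diag(cov)).round(3))
```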

  2. Neuronal Correlation Parameter and the Idea of Thermodynamic Entropy of an N-Body Gravitationally Bounded System.

    PubMed

    Haranas, Ioannis; Gkigkitzis, Ioannis; Kotsireas, Ilias; Austerlitz, Carlos

    2017-01-01

    Understanding how the brain encodes information and performs computation requires statistical and functional analysis. Given the complexity of the human brain, simple methods that facilitate the interpretation of statistical correlations among different brain regions can be very useful. In this report we introduce a numerical correlation measure that may serve the interpretation of correlational neuronal data, and may assist in the evaluation of different brain states. Describing the dynamical brain system through a global numerical measure may indicate the presence of an action principle, which could facilitate the application of physics principles to the study of the human brain and cognition.

  3. Stokes-correlometry of polarization-inhomogeneous objects

    NASA Astrophysics Data System (ADS)

    Ushenko, O. G.; Dubolazov, A.; Bodnar, G. B.; Bachynskiy, V. T.; Vanchulyak, O.

    2018-01-01

    The paper consists of two parts. The first part presents brief theoretical foundations of the Stokes-correlometry method for describing the optical anisotropy of biological tissues. Experimentally measured coordinate distributions of the modulus (MSV) and phase (PhSV) of the complex Stokes vector of skeletal muscle tissue are provided, and the values and ranges of variation of the statistical moments of the 1st-4th orders, which characterize the distributions of MSV and PhSV values, were determined. The second part presents a statistical analysis of the MSV and PhSV distributions, and objective criteria for differentiating samples with urinary incontinence were defined.

  4. Kernel-based whole-genome prediction of complex traits: a review.

    PubMed

    Morota, Gota; Gianola, Daniel

    2014-01-01

    Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics.
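
    A minimal sketch of the kind of kernel regression reviewed here: Gaussian-kernel ridge regression on a synthetic marker matrix whose phenotype contains an additive plus a pairwise epistatic term. All data are invented for illustration, and the kernel and penalty choices are arbitrary.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import cross_val_score

# Synthetic marker matrix (0/1/2 genotype codes) with additive signal in the
# first 20 markers plus one pairwise interaction (a stand-in for real data).
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(400, 500)).astype(float)
beta = np.zeros(500)
beta[:20] = rng.normal(0, 0.5, 20)
y = X @ beta + 0.8 * X[:, 0] * X[:, 1] + rng.normal(0, 1.0, 400)

# Gaussian (RBF) kernel regression: an RKHS method that can pick up
# non-additive signal a purely linear model would miss.
model = KernelRidge(kernel="rbf", gamma=1.0 / 500, alpha=1.0)
r2 = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"cross-validated predictive R^2: {r2.mean():.3f}")
```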

  5. Predictive Big Data Analytics: A Study of Parkinson’s Disease Using Large, Complex, Heterogeneous, Incongruent, Multi-Source and Incomplete Observations

    PubMed Central

    Dinov, Ivo D.; Heavner, Ben; Tang, Ming; Glusman, Gustavo; Chard, Kyle; Darcy, Mike; Madduri, Ravi; Pa, Judy; Spino, Cathie; Kesselman, Carl; Foster, Ian; Deutsch, Eric W.; Price, Nathan D.; Van Horn, John D.; Ames, Joseph; Clark, Kristi; Hood, Leroy; Hampstead, Benjamin M.; Dauer, William; Toga, Arthur W.

    2016-01-01

    Background A unique archive of Big Data on Parkinson’s Disease is collected, managed and disseminated by the Parkinson’s Progression Markers Initiative (PPMI). The integration of such complex and heterogeneous Big Data from multiple sources offers unparalleled opportunities to study the early stages of prevalent neurodegenerative processes, track their progression and quickly identify the efficacies of alternative treatments. Many previous human and animal studies have examined the relationship of Parkinson’s disease (PD) risk to trauma, genetics, environment, co-morbidities, or life style. The defining characteristics of Big Data–large size, incongruency, incompleteness, complexity, multiplicity of scales, and heterogeneity of information-generating sources–all pose challenges to the classical techniques for data management, processing, visualization and interpretation. We propose, implement, test and validate complementary model-based and model-free approaches for PD classification and prediction. To explore PD risk using Big Data methodology, we jointly processed complex PPMI imaging, genetics, clinical and demographic data. Methods and Findings Collective representation of the multi-source data facilitates the aggregation and harmonization of complex data elements. This enables joint modeling of the complete data, leading to the development of Big Data analytics, predictive synthesis, and statistical validation. Using heterogeneous PPMI data, we developed a comprehensive protocol for end-to-end data characterization, manipulation, processing, cleaning, analysis and validation. Specifically, we (i) introduce methods for rebalancing imbalanced cohorts, (ii) utilize a wide spectrum of classification methods to generate consistent and powerful phenotypic predictions, and (iii) generate reproducible machine-learning based classification that enables the reporting of model parameters and diagnostic forecasting based on new data. We evaluated several complementary model-based predictive approaches, which failed to generate accurate and reliable diagnostic predictions. However, the results of several machine-learning based classification methods indicated significant power to predict Parkinson’s disease in the PPMI subjects (consistent accuracy, sensitivity, and specificity exceeding 96%, confirmed using statistical n-fold cross-validation). Clinical (e.g., Unified Parkinson's Disease Rating Scale (UPDRS) scores), demographic (e.g., age), genetics (e.g., rs34637584, chr12), and derived neuroimaging biomarker (e.g., cerebellum shape index) data all contributed to the predictive analytics and diagnostic forecasting. Conclusions Model-free Big Data machine learning-based classification methods (e.g., adaptive boosting, support vector machines) can outperform model-based techniques in terms of predictive precision and reliability (e.g., forecasting patient diagnosis). We observed that statistical rebalancing of cohort sizes yields better discrimination of group differences, specifically for predictive analytics based on heterogeneous and incomplete PPMI data. UPDRS scores play a critical role in predicting diagnosis, which is expected based on the clinical definition of Parkinson’s disease. Even without longitudinal UPDRS data, however, the accuracy of model-free machine learning based classification is over 80%. 
The methods, software and protocols developed here are openly shared and can be employed to study other neurodegenerative disorders (e.g., Alzheimer’s, Huntington’s, amyotrophic lateral sclerosis), as well as for other predictive Big Data analytics applications. PMID:27494614

  6. Gene- and pathway-based association tests for multiple traits with GWAS summary statistics.

    PubMed

    Kwak, Il-Youp; Pan, Wei

    2017-01-01

    To identify novel genetic variants associated with complex traits and to shed new insights on underlying biology, in addition to the most popular single SNP-single trait association analysis, it would be useful to explore multiple correlated (intermediate) traits at the gene- or pathway-level by mining existing single GWAS or meta-analyzed GWAS data. For this purpose, we present an adaptive gene-based test and a pathway-based test for association analysis of multiple traits with GWAS summary statistics. The proposed tests are adaptive at both the SNP- and trait-levels; that is, they account for possibly varying association patterns (e.g. signal sparsity levels) across SNPs and traits, thus maintaining high power across a wide range of situations. Furthermore, the proposed methods are general: they can be applied to mixed types of traits, and to Z-statistics or P-values as summary statistics obtained from either a single GWAS or a meta-analysis of multiple GWAS. Our numerical studies with simulated and real data demonstrated the promising performance of the proposed methods. The methods are implemented in the R package aSPU, freely and publicly available at https://cran.r-project.org/web/packages/aSPU/. CONTACT: weip@biostat.umn.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
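
    For intuition, the sum of powered score (SPU) family behind such adaptive tests combines SNP-level Z-statistics as sum_j Z_j^gamma, and the adaptive step takes the most significant choice of gamma. The sketch below is a simplified simulation-based version under an assumed LD correlation matrix, not the aSPU package itself; in particular, a properly calibrated adaptive p-value would reuse the same null simulations rather than reporting the raw minimum.

```python
import numpy as np

def spu_pvalues(z, R, gammas=(1, 2, 4, 8), n_sim=10_000, seed=0):
    """Simulation-based p-values for SPU(gamma) tests on a vector of SNP-level
    Z-statistics with correlation R (a sketch of the SPU idea)."""
    rng = np.random.default_rng(seed)
    null = rng.multivariate_normal(np.zeros(len(z)), R, size=n_sim)
    pvals = {}
    for g in gammas:
        t_obs = np.sum(z ** g)
        t_null = np.sum(null ** g, axis=1)
        pvals[g] = (np.sum(np.abs(t_null) >= np.abs(t_obs)) + 1) / (n_sim + 1)
    return pvals

z = np.array([2.5, 1.8, -0.2, 3.1, 0.5])         # hypothetical SNP Z-scores
R = 0.2 * np.ones((5, 5)) + 0.8 * np.eye(5)      # assumed LD correlation
p = spu_pvalues(z, R)
print(p, "adaptive statistic = min p:", min(p.values()))
```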

  7. Method of analysis of local neuronal circuits in the vertebrate central nervous system.

    PubMed

    Reinis, S; Weiss, D S; McGaraughty, S; Tsoukatos, J

    1992-06-01

    Although a considerable amount of knowledge has been accumulated about the activity of individual nerve cells in the brain, little is known about their mutual interactions at the local level. The method presented in this paper allows the reconstruction of functional relations within a group of neurons as recorded by a single microelectrode. Data are sampled at 10 or 13 kHz. Prominent spikes produced by one or more single cells are selected and sorted by K-means cluster analysis. The activities of single cells are then related to the background firing of neurons in their vicinity. Auto-correlograms of the leading cells, auto-correlograms of the background cells (mass correlograms) and cross-correlograms between these two levels of firing are computed and evaluated. The statistical probability of mutual interactions is determined, and the statistically significant, most common interspike intervals are stored and attributed to real pairs of spikes in the original record. Selected pairs of spikes, characterized by statistically significant intervals between them, are then assembled into a working model of the system. This method has revealed substantial differences between the information processing in the visual cortex, the inferior colliculus, the rostral ventromedial medulla and the ventrobasal complex of the thalamus. Even short 1-s records of the multiple neuronal activity may provide meaningful and statistically significant results.
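
    A toy sketch of two ingredients of this pipeline: K-means sorting of spike waveforms, and a cross-correlogram between a leading cell and background firing. The waveforms and spike times are synthetic, and the real method operates on 10 or 13 kHz recordings.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic spike waveforms (n_spikes x n_samples) from one microelectrode;
# K-means sorts them into putative single units.
rng = np.random.default_rng(0)
templates = np.array([np.hanning(32), -np.hanning(32)])
labels_true = rng.integers(0, 2, 300)
waveforms = templates[labels_true] + rng.normal(0, 0.2, (300, 32))
units = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(waveforms)

def cross_correlogram(t1, t2, max_lag=0.05, bin_width=0.001):
    """Histogram of spike-time differences t2 - t1 within +/- max_lag seconds."""
    diffs = t2[None, :] - t1[:, None]
    diffs = diffs[np.abs(diffs) <= max_lag]
    bins = np.arange(-max_lag, max_lag + bin_width, bin_width)
    return np.histogram(diffs, bins=bins)[0]

t1 = np.sort(rng.uniform(0, 10, 200))        # spike times of the leading cell
t2 = np.sort(rng.uniform(0, 10, 400))        # background multi-unit spike times
print(cross_correlogram(t1, t2).sum(), "spike pairs within +/- 50 ms")
```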

  8. Topographic relationships for design rainfalls over Australia

    NASA Astrophysics Data System (ADS)

    Johnson, F.; Hutchinson, M. F.; The, C.; Beesley, C.; Green, J.

    2016-02-01

    Design rainfall statistics are the primary inputs used to assess flood risk across river catchments. These statistics normally take the form of Intensity-Duration-Frequency (IDF) curves that are derived from extreme value probability distributions fitted to observed daily, and sub-daily, rainfall data. The design rainfall relationships are often required for catchments where there are limited rainfall records, particularly catchments in remote areas with high topographic relief, and hence some form of interpolation is required to provide estimates in these areas. This paper assesses the topographic dependence of rainfall extremes by using elevation-dependent thin plate smoothing splines to interpolate the mean annual maximum rainfall, for periods from one to seven days, across Australia. The analyses confirm the important impact of topography in explaining the spatial patterns of these extreme rainfall statistics. Continent-wide residual and cross validation statistics are used to demonstrate the 100-fold impact of elevation in relation to horizontal coordinates in explaining the spatial patterns, consistent with previous rainfall scaling studies and observational evidence. The impact of the complexity of the fitted spline surfaces, as defined by the number of knots, and the impact of applying variance stabilising transformations to the data, were also assessed. It was found that a relatively large number of 3570 knots, suitably chosen from 8619 gauge locations, was required to minimise the summary error statistics. Square root and log data transformations were found to deliver marginally superior continent-wide cross validation statistics, in comparison to applying no data transformation, but detailed assessments of residuals in complex high rainfall regions with high topographic relief showed that no data transformation gave superior performance in these regions. These results are consistent with the understanding that in areas with modest topographic relief, as for most of the Australian continent, extreme rainfall is closely aligned with elevation, but in areas with high topographic relief the impacts of topography on rainfall extremes are more complex. The interpolated extreme rainfall statistics, using no data transformation, have been used by the Australian Bureau of Meteorology to produce new IDF data for the Australian continent. The comprehensive methods presented for the evaluation of gridded design rainfall statistics will be useful for similar studies, in particular for balancing the need for a continentally optimal solution against maintaining sufficient definition at the local scale.
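
    One simple way to realize elevation-dependent spline interpolation is to treat elevation as a third coordinate scaled by a large factor, reflecting the roughly 100-fold impact of elevation noted above. The sketch below uses SciPy's RBFInterpolator with a thin plate spline kernel as a stand-in for the partial thin plate smoothing splines used in the study; the coordinates, gauge data and smoothing level are illustrative assumptions.

```python
# Hedged sketch: elevation as a third, heavily scaled coordinate in a thin
# plate spline surface (RBFInterpolator stands in for ANUSPLIN-style splines;
# all gauge data below are synthetic).
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
lon = rng.uniform(140, 150, 300)
lat = rng.uniform(-38, -28, 300)
elev_km = rng.gamma(2.0, 0.2, 300)                  # gauge elevations (km)
rmax = 50 + 40 * elev_km + rng.normal(0, 5, 300)    # mean annual max rainfall

scale = 100.0                                       # vertical exaggeration
pts = np.column_stack([lon, lat, scale * elev_km])
spline = RBFInterpolator(pts, rmax, kernel="thin_plate_spline", smoothing=1.0)

query = np.array([[145.0, -33.0, scale * 0.5]])     # lon, lat, scaled elev
print(spline(query))                                # interpolated statistic
```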

  9. Normalization, bias correction, and peak calling for ChIP-seq

    PubMed Central

    Diaz, Aaron; Park, Kiyoub; Lim, Daniel A.; Song, Jun S.

    2012-01-01

    Next-generation sequencing is rapidly transforming our ability to profile the transcriptional, genetic, and epigenetic states of a cell. In particular, sequencing DNA from the immunoprecipitation of protein-DNA complexes (ChIP-seq) and methylated DNA (MeDIP-seq) can reveal the locations of protein binding sites and epigenetic modifications. These approaches are subject to numerous biases that may significantly influence the interpretation of the resulting data. Rigorous computational methods for detecting and removing such biases are still lacking. Also, multi-sample normalization still remains an important open problem. This theoretical paper systematically characterizes the biases and properties of ChIP-seq data by comparing 62 separate publicly available datasets, using rigorous statistical models and signal processing techniques. Statistical methods for separating ChIP-seq signal from background noise, as well as correcting enrichment test statistics for sequence-dependent and sonication biases, are presented. Our method effectively separates reads into signal and background components prior to normalization, improving the signal-to-noise ratio. Moreover, most peak callers currently use a generic null model which suffers from low specificity at the sensitivity level requisite for detecting subtle, but true, ChIP enrichment. The proposed method of determining a cell type-specific null model, which accounts for cell type-specific biases, is shown to be capable of achieving a lower false discovery rate at a given significance threshold than current methods. PMID:22499706

  10. Estimating the expected value of partial perfect information in health economic evaluations using integrated nested Laplace approximation.

    PubMed

    Heath, Anna; Manolopoulou, Ioanna; Baio, Gianluca

    2016-10-15

    The Expected Value of Partial Perfect Information (EVPPI) is a decision-theoretic measure of the 'cost' of parametric uncertainty, used principally in health economic decision making. Despite this decision-theoretic grounding, the uptake of EVPPI calculations in practice has been slow. This is in part due to the prohibitive computational time required to estimate the EVPPI via Monte Carlo simulations. However, recent developments have demonstrated that the EVPPI can be estimated by non-parametric regression methods, which have significantly decreased the computation time required to approximate the EVPPI. Under certain circumstances, high-dimensional Gaussian Process (GP) regression is suggested, but this can still be prohibitively expensive. Applying fast computation methods developed in spatial statistics using Integrated Nested Laplace Approximations (INLA), and projecting from a high-dimensional into a low-dimensional input space, allows us to decrease the computation time for fitting these high-dimensional GPs, often substantially. We demonstrate that the EVPPI calculated using our method for GP regression is in line with the standard GP regression method, and that despite the apparent methodological complexity of this new method, R functions are available in the package BCEA to implement it simply and efficiently. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
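
    The regression-based EVPPI estimator that this work accelerates can be sketched in a few lines: regress the net benefit of each decision option on the parameter(s) of interest across probabilistic sensitivity analysis (PSA) samples, then compare the mean of the row-wise maxima of the fitted values with the maximum of their column means. Plain GP regression stands in here for the paper's INLA-based fit; the two-option net benefits are synthetic.

```python
# Hedged sketch of the regression-based EVPPI estimator (plain GP regression
# stands in for the paper's INLA-accelerated fit; data are synthetic).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
n = 300
phi = rng.normal(size=(n, 1))            # parameter of interest (PSA draws)
noise = rng.normal(size=n)               # remaining-parameter uncertainty
nb = np.column_stack([np.zeros(n),       # net benefit of two decision options
                      2.0 * phi[:, 0] + noise])

fitted = np.empty_like(nb)
fitted[:, 0] = nb[:, 0].mean()           # option 0 does not depend on phi
gp = GaussianProcessRegressor(normalize_y=True).fit(phi, nb[:, 1])
fitted[:, 1] = gp.predict(phi)           # estimate of E[NB_1 | phi]

evppi = fitted.max(axis=1).mean() - fitted.mean(axis=0).max()
print(f"EVPPI estimate: {evppi:.3f}")
```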

  11. How Good Are Statistical Models at Approximating Complex Fitness Landscapes?

    PubMed Central

    du Plessis, Louis; Leventhal, Gabriel E.; Bonhoeffer, Sebastian

    2016-01-01

    Fitness landscapes determine the course of adaptation by constraining and shaping evolutionary trajectories. Knowledge of the structure of a fitness landscape can thus predict evolutionary outcomes. Empirical fitness landscapes, however, have so far only offered limited insight into real-world questions, as the high dimensionality of sequence spaces makes it impossible to exhaustively measure the fitness of all variants of biologically meaningful sequences. We must therefore resort to statistical descriptions of fitness landscapes that are based on a sparse sample of fitness measurements. It remains unclear, however, how much data are required for such statistical descriptions to be useful. Here, we assess the ability of regression models accounting for single and pairwise mutations to correctly approximate a complex quasi-empirical fitness landscape. We compare approximations based on various sampling regimes of an RNA landscape and find that the sampling regime strongly influences the quality of the regression. On the one hand, it is generally impossible to generate sufficient samples to achieve a good approximation of the complete fitness landscape; on the other hand, systematic sampling schemes can only provide a good description of the immediate neighborhood of a sequence of interest. Nevertheless, we obtain a remarkably good and unbiased fit to the local landscape when using sequences from a population that has evolved under strong selection. Thus, current statistical methods can provide a good approximation to the landscape of naturally evolving populations. PMID:27189564
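
    The single-plus-pairwise regression models assessed above can be written down directly: fitness is modeled as a linear function of mutation indicators plus all pairwise interaction terms, fit to a sparse sample. The sketch below uses ridge regression on synthetic genotypes; it illustrates the model class only, not the paper's RNA landscape analysis.

```python
# Illustrative single + pairwise regression on a synthetic landscape (the
# model class assessed in the paper, not its RNA landscape analysis).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
L, n = 12, 400                              # sequence length, sample size
G = rng.integers(0, 2, size=(n, L)).astype(float)   # mutation indicators
beta = rng.normal(0, 1, L)                  # additive effects
epi = np.triu(rng.normal(0, 0.3, (L, L)), 1)        # pairwise epistasis
fitness = G @ beta + np.einsum('ni,nj,ij->n', G, G, epi) \
    + rng.normal(0, 0.1, n)

# Design matrix with single and pairwise interaction terms only.
X = PolynomialFeatures(degree=2, interaction_only=True,
                       include_bias=False).fit_transform(G)
model = Ridge(alpha=1.0).fit(X, fitness)
print(f"R^2 on the training sample: {model.score(X, fitness):.3f}")
```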

  12. Exploring the Connection Between Sampling Problems in Bayesian Inference and Statistical Mechanics

    NASA Technical Reports Server (NTRS)

    Pohorille, Andrew

    2006-01-01

    The Bayesian and statistical mechanical communities often share the same objective in their work - estimating and integrating probability distribution functions (pdfs) describing stochastic systems, models or processes. Frequently, these pdfs are complex functions of random variables exhibiting multiple, well separated local minima. Conventional strategies for sampling such pdfs are inefficient, sometimes leading to an apparent non-ergodic behavior. Several recently developed techniques for handling this problem have been successfully applied in statistical mechanics. In the multicanonical and Wang-Landau Monte Carlo (MC) methods, the correct pdfs are recovered from uniform sampling of the parameter space by iteratively establishing proper weighting factors connecting these distributions. Trivial generalizations allow for sampling from any chosen pdf. The closely related transition matrix method relies on estimating transition probabilities between different states. All these methods have proved to generate estimates of pdfs with high statistical accuracy. In another MC technique, parallel tempering, several random walks, each corresponding to a different value of a parameter (e.g. "temperature"), are generated and occasionally exchanged using the Metropolis criterion. This method can be considered a statistically correct version of simulated annealing. An alternative approach is to represent the set of independent variables as a Hamiltonian system. Considerable progress has been made in understanding how to ensure that the system obeys the equipartition theorem or, equivalently, that coupling between the variables is correctly described. Then a host of techniques developed for dynamical systems can be used. Among them, probably the most powerful is the Adaptive Biasing Force method, in which thermodynamic integration and biased sampling are combined to yield very efficient estimates of pdfs. The third class of methods deals with transitions between states described by rate constants. These problems are isomorphic with chemical kinetics problems. Recently, several efficient techniques for this purpose have been developed based on the approach originally proposed by Gillespie. Although the utility of the techniques mentioned above for Bayesian problems has not been determined, further research along these lines is warranted.
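
    Of the techniques surveyed above, parallel tempering is particularly easy to sketch: chains at several temperatures take Metropolis steps on the tempered density, and neighbouring chains occasionally swap states under the Metropolis criterion. A minimal sketch for a bimodal one-dimensional target, with illustrative temperatures and step sizes:

```python
# Minimal parallel tempering on a bimodal 1-D target: Metropolis steps on the
# tempered density, plus occasional swaps between neighbouring temperatures.
import numpy as np

def log_target(x):            # two well-separated Gaussian modes
    return np.logaddexp(-0.5 * (x - 4.0) ** 2, -0.5 * (x + 4.0) ** 2)

rng = np.random.default_rng(0)
temps = np.array([1.0, 2.0, 4.0, 8.0])
x = np.zeros(len(temps))
samples = []
for it in range(20000):
    prop = x + rng.normal(0, 1, len(temps))          # within-chain moves
    accept = np.log(rng.uniform(size=len(temps))) < (
        log_target(prop) - log_target(x)) / temps
    x[accept] = prop[accept]
    i = rng.integers(len(temps) - 1)                 # propose one swap
    dlog = (1 / temps[i] - 1 / temps[i + 1]) * (log_target(x[i + 1])
                                                - log_target(x[i]))
    if np.log(rng.uniform()) < dlog:
        x[i], x[i + 1] = x[i + 1], x[i]
    samples.append(x[0])                             # keep the cold chain
print(np.mean(np.array(samples[5000:]) > 0))         # both modes visited (~0.5)
```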

  13. TATES: Efficient Multivariate Genotype-Phenotype Analysis for Genome-Wide Association Studies

    PubMed Central

    van der Sluis, Sophie; Posthuma, Danielle; Dolan, Conor V.

    2013-01-01

    To date, the genome-wide association study (GWAS) is the primary tool to identify genetic variants that cause phenotypic variation. As GWAS analyses are generally univariate in nature, multivariate phenotypic information is usually reduced to a single composite score. This practice often results in loss of statistical power to detect causal variants. Multivariate genotype–phenotype methods do exist but attain maximal power only in special circumstances. Here, we present a new multivariate method that we refer to as TATES (Trait-based Association Test that uses Extended Simes procedure), inspired by the GATES procedure proposed by Li et al. (2011). For each component of a multivariate trait, TATES combines p-values obtained in standard univariate GWAS to acquire one trait-based p-value, while correcting for correlations between components. Extensive simulations, probing a wide variety of genotype–phenotype models, show that TATES's false positive rate is correct, and that TATES's statistical power to detect causal variants explaining 0.5% of the variance can be 2.5–9 times higher than the power of univariate tests based on composite scores and 1.5–2 times higher than the power of the standard MANOVA. Unlike other multivariate methods, TATES detects both genetic variants that are common to multiple phenotypes and genetic variants that are specific to a single phenotype, i.e., TATES provides a more complete view of the genetic architecture of complex traits. As the actual causal genotype–phenotype model is usually unknown and probably phenotypically and genetically complex, TATES, available as an open source program, constitutes a powerful new multivariate strategy that allows researchers to identify novel causal variants, while the complexity of traits is no longer a limiting factor. PMID:23359524
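
    The flavor of the underlying Simes-style combination can be sketched as follows. This is a simplified stand-in for TATES, not the released program: the effective numbers of tests are estimated here with Nyholt's eigenvalue-variance formula, whereas TATES derives its effective numbers from the p-value correlation structure.

```python
# Simplified Simes-style combination across correlated trait p-values, with
# effective numbers of tests from Nyholt's (2004) eigenvalue-variance formula
# (a stand-in for the effective numbers used by TATES itself).
import numpy as np

def effective_tests(corr):
    lam = np.linalg.eigvalsh(corr)
    m = len(lam)
    return 1 + (m - 1) * (1 - np.var(lam, ddof=1) / m)

def simes_combined_p(pvals, corr):
    order = np.argsort(pvals)
    me_all = effective_tests(corr)
    combined = []
    for j in range(1, len(pvals) + 1):
        top = order[:j]
        me_j = effective_tests(corr[np.ix_(top, top)]) if j > 1 else 1.0
        combined.append(me_all * pvals[order[j - 1]] / me_j)
    return min(combined)

# Example: four correlated trait p-values for one SNP.
corr = np.full((4, 4), 0.5) + 0.5 * np.eye(4)
print(simes_combined_p(np.array([0.002, 0.04, 0.3, 0.6]), corr))
```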

  14. Combining statistics from two national complex surveys to estimate injury rates per hour exposed and variance by activity in the USA.

    PubMed

    Lin, Tin-Chi; Marucci-Wellman, Helen R; Willetts, Joanna L; Brennan, Melanye J; Verma, Santosh K

    2016-12-01

    A common issue in descriptive injury epidemiology is that in order to calculate injury rates that account for the time spent in an activity, both injury cases and exposure time of specific activities need to be collected. In reality, few national surveys have this capacity. To address this issue, we combined statistics from two different national complex surveys as inputs for the numerator and denominator to estimate injury rates, accounting for the time spent in specific activities, and included a procedure to estimate variance using the combined surveys. The 2010 National Health Interview Survey (NHIS) was used to quantify injuries, and the 2010 American Time Use Survey (ATUS) was used to quantify time of exposure to specific activities. The injury rate was estimated by dividing the average number of injuries (from NHIS) by average exposure hours (from ATUS), both measured for specific activities. The variance was calculated using the 'delta method', a general method for variance estimation with complex surveys. Among the five types of injuries examined, 'sport and exercise' had the highest rate (12.64 injuries per 100 000 h), followed by 'working around house/yard' (6.14), 'driving/riding a motor vehicle' (2.98), 'working' (1.45) and 'sleeping/resting/eating/drinking' (0.23). The results show a ranking of injury rate by activity quite different from estimates using population as the denominator. Our approach produces an estimate of injury risk which includes activity exposure time and may more reliably reflect the underlying injury risks, offering an alternative method for injury surveillance and research. Published by the BMJ Publishing Group Limited.
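
    The rate and variance construction described above reduces to a ratio of two independent survey estimates. A worked sketch with illustrative numbers (not the published NHIS/ATUS point estimates): the rate is X/Y, and the delta method gives Var(X/Y) ~ (X/Y)^2 * (Var(X)/X^2 + Var(Y)/Y^2) when the two surveys are independent.

```python
# Worked sketch with illustrative numbers (not the NHIS/ATUS estimates):
# rate = X / Y for independent survey estimates X (injuries) and Y (hours);
# delta method: Var(X/Y) ~ (X/Y)^2 * (Var(X)/X^2 + Var(Y)/Y^2).
import math

x, se_x = 120.0, 15.0                 # estimated injuries and standard error
y, se_y = 950_000.0, 40_000.0         # estimated exposure hours and SE

rate = x / y
se_rate = rate * math.sqrt((se_x / x) ** 2 + (se_y / y) ** 2)
lo, hi = rate - 1.96 * se_rate, rate + 1.96 * se_rate
print(f"rate = {1e5 * rate:.2f} per 100,000 h "
      f"(95% CI {1e5 * lo:.2f} to {1e5 * hi:.2f})")
```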

  15. An efficient empirical Bayes method for genomewide association studies.

    PubMed

    Wang, Q; Wei, J; Pan, Y; Xu, S

    2016-08-01

    The linear mixed model (LMM) is one of the most popular methods for genomewide association studies (GWAS). Numerous forms of LMM have been developed; however, there are two major issues in GWAS that have not been fully addressed before. The two issues are (i) the genomic background noise and (ii) low statistical power after Bonferroni correction. We proposed an empirical Bayes (EB) method by assigning each marker effect a normal prior distribution, resulting in shrinkage estimates of marker effects. We found that such a shrinkage approach can selectively shrink marker effects and reduce the noise level to zero for the majority of non-associated markers. In the meantime, the EB method allows us to use an 'effective number of tests' to perform Bonferroni correction for multiple tests. Simulation studies for both human and pig data showed that the EB method can significantly increase statistical power compared with the widely used exact GWAS methods, such as GEMMA and FaST-LMM-Select. Real data analyses in human breast cancer identified improved detection signals for markers previously known to be associated with breast cancer. We therefore believe that the EB method is a valuable tool for identifying the genetic basis of complex traits. © 2015 Blackwell Verlag GmbH.
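
    The shrinkage step at the heart of the EB method can be sketched for summary-level effect estimates: each marker effect receives a normal prior whose variance is estimated by the method of moments, and the posterior mean shrinks noisy estimates toward zero. This simplification omits the LMM machinery of the full method; the data are synthetic.

```python
# Hedged sketch of the shrinkage step (synthetic data; the LMM machinery of
# the full method is omitted): normal prior on each effect, prior variance by
# the method of moments, posterior-mean shrinkage toward zero.
import numpy as np

rng = np.random.default_rng(0)
m = 2000
beta = np.where(rng.uniform(size=m) < 0.02, rng.normal(0, 1, m), 0.0)
se = np.full(m, 0.2)                        # per-marker standard errors
beta_hat = beta + rng.normal(0, se)         # GWAS effect estimates

# Method of moments: E[beta_hat^2] = sigma_g^2 + mean(se^2).
sigma_g2 = max(np.mean(beta_hat ** 2) - np.mean(se ** 2), 1e-8)
shrink = sigma_g2 / (sigma_g2 + se ** 2)    # posterior-mean factor
beta_eb = shrink * beta_hat
print(f"estimated prior variance {sigma_g2:.4f}; "
      f"shrinkage factor {shrink[0]:.3f}")
```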

  16. A Guide to the Literature on Learning Graphical Models

    NASA Technical Reports Server (NTRS)

    Buntine, Wray L.; Friedland, Peter (Technical Monitor)

    1994-01-01

    This literature review discusses different methods under the general rubric of learning Bayesian networks from data, and more generally, learning probabilistic graphical models. Because many problems in artificial intelligence, statistics and neural networks can be represented as a probabilistic graphical model, this area provides a unifying perspective on learning. This paper organizes the research in this area along methodological lines of increasing complexity.

  17. A Preliminary Bayesian Analysis of Incomplete Longitudinal Data from a Small Sample: Methodological Advances in an International Comparative Study of Educational Inequality

    ERIC Educational Resources Information Center

    Hsieh, Chueh-An; Maier, Kimberly S.

    2009-01-01

    The capacity of Bayesian methods in estimating complex statistical models is undeniable. Bayesian data analysis is seen as having a range of advantages, such as an intuitive probabilistic interpretation of the parameters of interest, the efficient incorporation of prior information to empirical data analysis, model averaging and model selection.…

  18. Econometrics of joint production: another approach. [Petroleum refining and petrochemicals production

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Griffin, J.M.

    1977-11-01

    The pseudo data approach to the joint production of petroleum refining and chemicals is described as an alternative that avoids the multicollinearity of time series data and allows a complex technology to be characterized in a statistical price possibility frontier. Intended primarily for long-range analysis, the pseudo data method can be used as a source of elasticity estimate for policy analysis. 19 references.

  19. Temporal Comparisons of Internet Topology

    DTIC Science & Technology

    2014-06-01

    ... the CAIDA data. Our methods include analysis of graph theoretical measures as well as complex network and statistical measures that will quantify the ... tool that probes the Internet for topology analysis and performance [26]. Scamper uses network diagnostic tools, such as traceroute and ping, to probe ...

  20. Benchmarking quantitative label-free LC-MS data processing workflows using a complex spiked proteomic standard dataset.

    PubMed

    Ramus, Claire; Hovasse, Agnès; Marcellin, Marlène; Hesse, Anne-Marie; Mouton-Barbosa, Emmanuelle; Bouyssié, David; Vaca, Sebastian; Carapito, Christine; Chaoui, Karima; Bruley, Christophe; Garin, Jérôme; Cianférani, Sarah; Ferro, Myriam; Van Dorssaeler, Alain; Burlet-Schiltz, Odile; Schaeffer, Christine; Couté, Yohann; Gonzalez de Peredo, Anne

    2016-01-30

    Proteomic workflows based on nanoLC-MS/MS data-dependent-acquisition analysis have progressed tremendously in recent years. High-resolution and fast sequencing instruments have enabled the use of label-free quantitative methods, based either on spectral counting or on MS signal analysis, which appear as an attractive way to analyze differential protein expression in complex biological samples. However, the computational processing of the data for label-free quantification still remains a challenge. Here, we used a proteomic standard composed of an equimolar mixture of 48 human proteins (Sigma UPS1) spiked at different concentrations into a background of yeast cell lysate to benchmark several label-free quantitative workflows, involving different software packages developed in recent years. This experimental design allowed us to assess their performance precisely in terms of sensitivity and false discovery rate, by measuring the numbers of true and false positives (respectively, UPS1 or yeast background proteins found as differential). The spiked standard dataset has been deposited in the ProteomeXchange repository with the identifier PXD001819 and can be used to benchmark other label-free workflows, adjust software parameter settings, improve algorithms for extraction of the quantitative metrics from raw MS data, or evaluate downstream statistical methods. Bioinformatic pipelines for label-free quantitative analysis must be objectively evaluated in their ability to detect variant proteins with good sensitivity and low false discovery rate in large-scale proteomic studies. This can be done through the use of complex spiked samples, for which the "ground truth" of variant proteins is known, allowing a statistical evaluation of the performance of the data processing workflow. We provide here such a controlled standard dataset and used it to evaluate the performance of several label-free bioinformatics tools (including MaxQuant, Skyline, MFPaQ, IRMa-hEIDI and Scaffold) in different workflows, for detection of variant proteins with different absolute expression levels and fold change values. The dataset presented here can be useful for tuning software tool parameters, testing new algorithms for label-free quantitative analysis, or evaluating downstream statistical methods. Copyright © 2015 Elsevier B.V. All rights reserved.

  1. Robust inference from multiple test statistics via permutations: a better alternative to the single test statistic approach for randomized trials.

    PubMed

    Ganju, Jitendra; Yu, Xinxin; Ma, Guoguang Julie

    2013-01-01

    Formal inference in randomized clinical trials is based on controlling the type I error rate associated with a single pre-specified statistic. The deficiency of using just one method of analysis is that it depends on assumptions that may not be met. For robust inference, we propose pre-specifying multiple test statistics and relying on the minimum p-value for testing the null hypothesis of no treatment effect. The null hypothesis associated with the various test statistics is that the treatment groups are indistinguishable. The critical value for hypothesis testing comes from permutation distributions. Rejection of the null hypothesis when the smallest p-value is less than the critical value controls the type I error rate at its designated value. Even if one of the candidate test statistics has low power, the adverse effect on the power of the minimum p-value statistic is modest. Its use is illustrated with examples. We conclude that it is better to rely on the minimum p-value rather than a single statistic, particularly when that single statistic is the logrank test, because of the cost and complexity of many survival trials. Copyright © 2013 John Wiley & Sons, Ltd.
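
    A minimal sketch of the proposal for a two-group comparison follows: several candidate statistics are evaluated on the observed data and on label permutations, each statistic's p-value is its permutation rank, and the minimum p-value is itself calibrated against the permutation distribution. The three candidate statistics chosen here are illustrative, not those of the paper.

```python
# Minimal min-p sketch for two groups with three illustrative candidate
# statistics; permutations supply both the per-statistic p-values and the
# calibration of their minimum.
import numpy as np
from scipy import stats

def stat_vector(x, y):
    return np.array([abs(stats.ttest_ind(x, y).statistic),
                     abs(stats.mannwhitneyu(x, y).statistic
                         - len(x) * len(y) / 2.0),
                     abs(np.median(x) - np.median(y))])

rng = np.random.default_rng(0)
x, y = rng.normal(0.8, 1, 30), rng.normal(0, 1, 30)
data, n1, B = np.concatenate([x, y]), len(x), 2000

T = np.array([stat_vector(*np.split(rng.permutation(data), [n1]))
              for _ in range(B)])
t_obs = stat_vector(x, y)

p_obs = (1 + (T >= t_obs).sum(axis=0)) / (B + 1)   # per-statistic p-values
ranks = T.argsort(axis=0).argsort(axis=0)
p_perm = (B - ranks) / B                           # p-values within the null
p_final = (1 + (p_perm.min(axis=1) <= p_obs.min()).sum()) / (B + 1)
print(f"min-p = {p_obs.min():.4f}, adjusted p = {p_final:.4f}")
```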

  2. Teaching Statistics--Despite Its Applications

    ERIC Educational Resources Information Center

    Ridgway, Jim; Nicholson, James; McCusker, Sean

    2007-01-01

    Evidence-based policy requires sophisticated modelling and reasoning about complex social data. The current UK statistics curricula do not equip tomorrow's citizens to understand such reasoning. We advocate radical curriculum reform, designed to require students to reason from complex data.

  3. Etoile Project : Social Intelligent ICT-System for very large scale education in complex systems

    NASA Astrophysics Data System (ADS)

    Bourgine, P.; Johnson, J.

    2009-04-01

    The project will devise new theory and implement new ICT-based methods of delivering high-quality, low-cost postgraduate education to many thousands of people in a scalable way, with the cost of each extra student being negligible (< a few Euros). The research will create an in vivo laboratory of one to ten thousand postgraduate students studying courses in complex systems. This community is chosen because it is large and interdisciplinary and there is a known requirement for courses for thousands of students across Europe. The project involves every aspect of course production and delivery. Within this, the research focuses on the creation of a Socially Intelligent Resource Mining system to gather large volumes of high-quality educational resources from the internet; new methods to deconstruct these to produce a semantically tagged Learning Object Database; a Living Course Ecology to support the creation and maintenance of evolving course materials; systems to deliver courses; and a 'socially intelligent assessment system'. The system will be tested on one to ten thousand postgraduate students in Europe working towards the Complex Systems Society's title of European PhD in Complex Systems. Étoile will have a very high impact both scientifically and socially by (i) providing new scalable ICT-based methods for very low-cost scientific education, (ii) creating new mathematical and statistical theory for the multiscale dynamics of complex systems, (iii) providing a working example of adaptation and emergence in complex socio-technical systems, and (iv) making a major educational contribution to European complex systems science and its applications.

  4. Understanding characteristics in multivariate traffic flow time series from complex network structure

    NASA Astrophysics Data System (ADS)

    Yan, Ying; Zhang, Shen; Tang, Jinjun; Wang, Xiaofei

    2017-07-01

    Discovering dynamic characteristics in traffic flow is a significant step in designing effective traffic management and control strategies for relieving congestion in urban cities. A new method based on complex network theory is proposed to study multivariate traffic flow time series. The data were collected from loop detectors on a freeway over one year. To construct complex networks from the original traffic flow, a weighted Frobenius norm is adopted to estimate the similarity between multivariate time series, and Principal Component Analysis is implemented to determine the weights. We discuss how to select the optimal critical threshold for networks at different hours in terms of the cumulative probability distribution of degree. Furthermore, two statistical properties of the networks, the normalized network structure entropy and the cumulative probability of degree, are utilized to explore hourly variation in traffic flow. The results demonstrate that these two statistical quantities express patterns similar to traffic flow parameters, with morning and evening peak hours. Accordingly, we detect three traffic states, trough, peak and transitional hours, according to the correlation between the two aforementioned properties. The resulting state classification captures the hourly fluctuation in traffic flow, as confirmed by annual average hourly values of traffic volume, occupancy and speed in the corresponding hours.
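
    The construction can be sketched end to end: pairwise similarity of the multivariate series via a variance-weighted Frobenius norm (a simplified stand-in for the paper's PCA-derived weights), thresholding into an adjacency matrix, and the normalized network structure entropy of the degree distribution. The detector data below are synthetic.

```python
# End-to-end sketch with synthetic detector data: variance-weighted Frobenius
# distances (a simplified stand-in for the PCA-derived weights in the paper),
# thresholded adjacency, and normalized network structure entropy.
import numpy as np

rng = np.random.default_rng(0)
# 50 stations, each a 288 x 3 daily profile (volume, occupancy, speed).
X = rng.normal(size=(50, 288, 3))

var = X.reshape(-1, 3).var(axis=0)
w = var / var.sum()                        # variance-share weights
D = np.sqrt((((X[:, None] - X[None]) ** 2) * w).sum(axis=(2, 3)))

thresh = np.quantile(D[np.triu_indices(50, 1)], 0.2)
A = (D < thresh).astype(int)               # adjacency at the chosen threshold
np.fill_diagonal(A, 0)

k = A.sum(axis=1)
p = k / k.sum()
entropy = -(p[p > 0] * np.log(p[p > 0])).sum() / np.log(len(k))
print(f"normalized network structure entropy: {entropy:.3f}")
```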

  5. Kernel methods and flexible inference for complex stochastic dynamics

    NASA Astrophysics Data System (ADS)

    Capobianco, Enrico

    2008-07-01

    Approximation theory suggests that series expansions and projections represent standard tools for random process applications from both numerical and statistical standpoints. Such instruments emphasize the role of both sparsity and smoothness for compression purposes, the decorrelation power achieved in the expansion coefficients space compared to the signal space, and the reproducing kernel property when some special conditions are met. We consider these three aspects central to the discussion in this paper, and attempt to analyze the characteristics of some known approximation instruments employed in a complex application domain such as financial market time series. Volatility models are often built ad hoc, parametrically and through very sophisticated methodologies. But they can hardly deal with stochastic processes with regard to non-Gaussianity, covariance non-stationarity or complex dependence without paying a big price in terms of either model mis-specification or computational efficiency. It is thus a good idea to look at other more flexible inference tools; hence the strategy of combining greedy approximation and space dimensionality reduction techniques, which are less dependent on distributional assumptions and more targeted to achieve computationally efficient performances. Advantages and limitations of their use will be evaluated by looking at algorithmic and model building strategies, and by reporting statistical diagnostics.

  6. Learning Predictive Statistics: Strategies and Brain Mechanisms.

    PubMed

    Wang, Rui; Shen, Yuan; Tino, Peter; Welchman, Andrew E; Kourtzi, Zoe

    2017-08-30

    When immersed in a new environment, we are challenged to decipher initially incomprehensible streams of sensory information. However, quite rapidly, the brain finds structure and meaning in these incoming signals, helping us to predict and prepare ourselves for future actions. This skill relies on extracting the statistics of event streams in the environment that contain regularities of variable complexity from simple repetitive patterns to complex probabilistic combinations. Here, we test the brain mechanisms that mediate our ability to adapt to the environment's statistics and predict upcoming events. By combining behavioral training and multisession fMRI in human participants (male and female), we track the corticostriatal mechanisms that mediate learning of temporal sequences as they change in structure complexity. We show that learning of predictive structures relates to individual decision strategy; that is, selecting the most probable outcome in a given context (maximizing) versus matching the exact sequence statistics. These strategies engage distinct human brain regions: maximizing engages dorsolateral prefrontal, cingulate, sensory-motor regions, and basal ganglia (dorsal caudate, putamen), whereas matching engages occipitotemporal regions (including the hippocampus) and basal ganglia (ventral caudate). Our findings provide evidence for distinct corticostriatal mechanisms that facilitate our ability to extract behaviorally relevant statistics to make predictions. SIGNIFICANCE STATEMENT Making predictions about future events relies on interpreting streams of information that may initially appear incomprehensible. Past work has studied how humans identify repetitive patterns and associative pairings. However, the natural environment contains regularities that vary in complexity from simple repetition to complex probabilistic combinations. Here, we combine behavior and multisession fMRI to track the brain mechanisms that mediate our ability to adapt to changes in the environment's statistics. We provide evidence for an alternate route for learning complex temporal statistics: extracting the most probable outcome in a given context is implemented by interactions between executive and motor corticostriatal mechanisms compared with visual corticostriatal circuits (including hippocampal cortex) that support learning of the exact temporal statistics. Copyright © 2017 Wang et al.

  7. Cortical Auditory Evoked Potentials with Simple (Tone Burst) and Complex (Speech) Stimuli in Children with Cochlear Implant

    PubMed Central

    Martins, Kelly Vasconcelos Chaves; Gil, Daniela

    2017-01-01

    Introduction: The registry of the component P1 of the cortical auditory evoked potential has been widely used to analyze the behavior of auditory pathways in response to cochlear implant stimulation. Objective: To determine the influence of aural rehabilitation on the latency and amplitude parameters of the P1 cortical auditory evoked potential component elicited by simple auditory stimuli (tone burst) and complex stimuli (speech) in children with cochlear implants. Method: The study included six individuals of both genders, aged 5 to 10 years old, who had been cochlear implant users for at least 12 months and who attended auditory rehabilitation with an aural rehabilitation therapy approach. Participants underwent cortical auditory evoked potential testing at the beginning of the study and after 3 months of aural rehabilitation. To elicit the responses, simple stimuli (tone burst) and complex stimuli (speech) were used and presented in free field at 70 dB HL. The results were statistically analyzed, and both evaluations were compared. Results: There was no significant difference between the types of eliciting stimulus of the cortical auditory evoked potential for the latency and the amplitude of P1. There was a statistically significant difference in the P1 latency between the evaluations for both stimuli, with reduction of the latency in the second evaluation after 3 months of auditory rehabilitation. There was no statistically significant difference regarding the amplitude of P1 under the two types of stimuli or in the two evaluations. Conclusion: A decrease in latency of the P1 component elicited by both simple and complex stimuli was observed within a three-month interval in children with cochlear implants undergoing aural rehabilitation. PMID:29018498

  8. Meta-analysis of correlated traits via summary statistics from GWASs with an application in hypertension.

    PubMed

    Zhu, Xiaofeng; Feng, Tao; Tayo, Bamidele O; Liang, Jingjing; Young, J Hunter; Franceschini, Nora; Smith, Jennifer A; Yanek, Lisa R; Sun, Yan V; Edwards, Todd L; Chen, Wei; Nalls, Mike; Fox, Ervin; Sale, Michele; Bottinger, Erwin; Rotimi, Charles; Liu, Yongmei; McKnight, Barbara; Liu, Kiang; Arnett, Donna K; Chakravati, Aravinda; Cooper, Richard S; Redline, Susan

    2015-01-08

    Genome-wide association studies (GWASs) have identified many genetic variants underlying complex traits. Many detected genetic loci harbor variants that associate with multiple, even distinct, traits. Most current analysis approaches focus on single traits, even though the final results from multiple traits are evaluated together. Such approaches miss the opportunity to systemically integrate the phenome-wide data available for genetic association analysis. In this study, we propose a general approach that can integrate association evidence from summary statistics of multiple traits, either correlated, independent, continuous, or binary traits, which might come from the same or different studies. We allow for trait heterogeneity effects. Population structure and cryptic relatedness can also be controlled. Our simulations suggest that the proposed method has improved statistical power over single-trait analysis in most of the cases we studied. We applied our method to the Continental Origins and Genetic Epidemiology Network (COGENT) African ancestry samples for three blood pressure traits and identified four loci (CHIC2, HOXA-EVX1, IGFBP1/IGFBP3, and CDH17; p < 5.0 × 10(-8)) associated with hypertension-related traits that were missed by single-trait analysis in the original report. Six additional loci with suggestive association evidence (p < 5.0 × 10(-7)) were also observed, including CACNA1D and WNT3. Our study strongly suggests that analyzing multiple phenotypes can improve statistical power and that such analysis can be executed with the summary statistics from GWASs. Our method also provides a way to study cross-phenotype (CP) associations by using summary statistics from GWASs of multiple phenotypes. Copyright © 2015 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
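
    One concrete statistic in this spirit is a homogeneous-effects combination of correlated trait Z-scores: with Sigma the correlation matrix of the summary statistics (estimable from genome-wide null SNPs), the quantity (1' Sigma^-1 Z)^2 / (1' Sigma^-1 1) is chi-square with one degree of freedom under the null. A hedged sketch, simplified from the family of tests proposed in the paper:

```python
# Hedged sketch of a homogeneous-effects combination of correlated Z-scores
# (one member of the family of tests above; Sigma would be estimated from
# genome-wide null SNPs in practice).
import numpy as np
from scipy import stats

def homogeneous_test(z, sigma):
    one = np.ones(len(z))
    sol = np.linalg.solve(sigma, np.column_stack([z, one]))
    t = (one @ sol[:, 0]) ** 2 / (one @ sol[:, 1])
    return t, stats.chi2.sf(t, df=1)

# Example: three correlated blood-pressure trait Z-scores for one variant.
sigma = np.array([[1.0, 0.6, 0.4],
                  [0.6, 1.0, 0.5],
                  [0.4, 0.5, 1.0]])
t, p = homogeneous_test(np.array([3.1, 2.7, 2.2]), sigma)
print(f"T = {t:.2f}, p = {p:.2e}")
```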

  9. A methodology for the design of experiments in computational intelligence with multiple regression models.

    PubMed

    Fernandez-Lozano, Carlos; Gestal, Marcos; Munteanu, Cristian R; Dorado, Julian; Pazos, Alejandro

    2016-01-01

    The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence, and especially on the correct comparison of the results provided by different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational Intelligence is implemented in an R package called RRegrs. This package includes ten simple and complex regression models to carry out predictive modeling using Machine Learning and well-known regression algorithms. The framework for experimental design presented herein is evaluated and validated against RRegrs. Our results differ for three out of five state-of-the-art simple datasets, and the selection of the best model according to our proposal is statistically significant and relevant. It is important to use a statistical approach to indicate whether the differences between such algorithms are statistically significant. Furthermore, our results with three real complex datasets report different best models than the previously published methodology. Our final goal is to provide a complete methodology for the use of different steps in order to compare the results obtained in Computational Intelligence problems, as well as in other fields, such as bioinformatics, cheminformatics, etc., given that our proposal is open and modifiable.

  10. A methodology for the design of experiments in computational intelligence with multiple regression models

    PubMed Central

    Gestal, Marcos; Munteanu, Cristian R.; Dorado, Julian; Pazos, Alejandro

    2016-01-01

    The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence, and especially on the correct comparison of the results provided by different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational Intelligence is implemented in an R package called RRegrs. This package includes ten simple and complex regression models to carry out predictive modeling using Machine Learning and well-known regression algorithms. The framework for experimental design presented herein is evaluated and validated against RRegrs. Our results differ for three out of five state-of-the-art simple datasets, and the selection of the best model according to our proposal is statistically significant and relevant. It is important to use a statistical approach to indicate whether the differences between such algorithms are statistically significant. Furthermore, our results with three real complex datasets report different best models than the previously published methodology. Our final goal is to provide a complete methodology for the use of different steps in order to compare the results obtained in Computational Intelligence problems, as well as in other fields, such as bioinformatics, cheminformatics, etc., given that our proposal is open and modifiable. PMID:27920952

  11. Change detection in a time series of polarimetric SAR data by an omnibus test statistic and its factorization (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Nielsen, Allan A.; Conradsen, Knut; Skriver, Henning

    2016-10-01

    Test statistics for comparison of real (as opposed to complex) variance-covariance matrices exist in the statistics literature [1]. In earlier publications we have described a test statistic for the equality of two variance-covariance matrices following the complex Wishart distribution with an associated p-value [2]. We showed their application to bitemporal change detection and to edge detection [3] in multilook, polarimetric synthetic aperture radar (SAR) data in the covariance matrix representation [4]. The test statistic and the associated p-value is described in [5] also. In [6] we focussed on the block-diagonal case, we elaborated on some computer implementation issues, and we gave examples on the application to change detection in both full and dual polarization bitemporal, bifrequency, multilook SAR data. In [7] we described an omnibus test statistic Q for the equality of k variance-covariance matrices following the complex Wishart distribution. We also described a factorization of Q = R2 R3 … Rk, where Q and the Rj determine if and when a difference occurs. Additionally, we gave p-values for Q and Rj. Finally, we demonstrated the use of Q and Rj and the p-values for change detection in truly multitemporal, full polarization SAR data. Here we illustrate the methods by means of airborne L-band SAR data (EMISAR) [8,9]. The methods may be applied to other polarimetric SAR data also, such as data from Sentinel-1, COSMO-SkyMed, TerraSAR-X, ALOS, and RadarSat-2, and also to single-pol data. The account given here closely follows that given in our recent IEEE TGRS paper [7]. Selected References [1] Anderson, T. W., An Introduction to Multivariate Statistical Analysis, John Wiley, New York, third ed. (2003). [2] Conradsen, K., Nielsen, A. A., Schou, J., and Skriver, H., "A test statistic in the complex Wishart distribution and its application to change detection in polarimetric SAR data," IEEE Transactions on Geoscience and Remote Sensing 41(1): 4-19, 2003. [3] Schou, J., Skriver, H., Nielsen, A. A., and Conradsen, K., "CFAR edge detector for polarimetric SAR images," IEEE Transactions on Geoscience and Remote Sensing 41(1): 20-32, 2003. [4] van Zyl, J. J. and Ulaby, F. T., "Scattering matrix representation for simple targets," in Radar Polarimetry for Geoscience Applications, Ulaby, F. T. and Elachi, C., eds., Artech, Norwood, MA (1990). [5] Canty, M. J., Image Analysis, Classification and Change Detection in Remote Sensing, with Algorithms for ENVI/IDL and Python, Taylor & Francis, CRC Press, third revised ed. (2014). [6] Nielsen, A. A., Conradsen, K., and Skriver, H., "Change detection in full and dual polarization, single- and multi-frequency SAR data," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 8(8): 4041-4048, 2015. [7] Conradsen, K., Nielsen, A. A., and Skriver, H., "Determining the points of change in time series of polarimetric SAR data," IEEE Transactions on Geoscience and Remote Sensing 54(5), 3007-3024, 2016. [9] Christensen, E. L., Skou, N., Dall, J., Woelders, K., Jørgensen, J. H., Granholm, J., and Madsen, S. N., "EMISAR: An absolutely calibrated polarimetric L- and C-band SAR," IEEE Transactions on Geoscience and Remote Sensing 36: 1852-1865 (1998).
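
    Following the formula in [7], the omnibus statistic for k multilook covariance matrices X_i (p x p, n looks each) is ln Q = n(pk ln k + sum_i ln|X_i| - k ln|sum_i X_i|), with -2 ln Q asymptotically chi-square with (k-1)p^2 degrees of freedom for fully polarimetric data. The sketch below omits the finite-sample correction factor rho derived in [7] and uses crude synthetic Wishart-like matrices in place of real SAR covariance data.

```python
# Sketch of the omnibus statistic of [7] (the finite-sample correction factor
# rho is omitted; the "Wishart-like" matrices here are crude synthetic draws).
import numpy as np
from scipy import stats

def omnibus_lnq(mats, n_looks):
    k, p = len(mats), mats[0].shape[0]
    logdets = sum(np.linalg.slogdet(m)[1] for m in mats)
    total = np.linalg.slogdet(sum(mats))[1]
    lnq = n_looks * (p * k * np.log(k) + logdets - k * total)
    return lnq, stats.chi2.sf(-2 * lnq, df=(k - 1) * p ** 2)

rng = np.random.default_rng(0)
def wishart_like(p, n):       # identity scale, n looks
    z = (rng.normal(size=(n, p)) + 1j * rng.normal(size=(n, p))) / np.sqrt(2)
    return z.conj().T @ z

mats = [wishart_like(3, 13) for _ in range(4)]   # 4 epochs, 3x3, 13 looks
lnq, p_no_change = omnibus_lnq(mats, 13)
print(f"ln Q = {lnq:.2f}, p(no change) = {p_no_change:.3f}")
```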

  12. Probing the space of toric quiver theories

    NASA Astrophysics Data System (ADS)

    Hewlett, Joseph; He, Yang-Hui

    2010-03-01

    We demonstrate a practical and efficient method for generating toric Calabi-Yau quiver theories, applicable to both D3 and M2 brane world-volume physics. A new analytic method is presented for low numbers of parameters, and an algorithm for the general case is developed which has polynomial complexity in the number of edges in the quiver. Using this algorithm, carefully implemented, we classify the quiver diagrams and assign possible superpotentials for various small values of the number of edges and nodes. We examine some preliminary statistics on this space of toric quiver theories.

  13. Nonparametric method for failures diagnosis in the actuating subsystem of aircraft control system

    NASA Astrophysics Data System (ADS)

    Terentev, M. N.; Karpenko, S. S.; Zybin, E. Yu; Kosyanchuk, V. V.

    2018-02-01

    In this paper we design a nonparametric method for failure diagnosis in an aircraft control system that uses only measurements of the control signals and the aircraft states. It does not require a priori information about the aircraft model parameters, training, or statistical calculations, and is based on an analytical nonparametric one-step-ahead state prediction approach. This makes it possible to predict the behavior of unidentified and failed dynamic systems, to weaken the requirements on control signals, and to reduce the diagnostic time and problem complexity.

  14. Selected-node stochastic simulation algorithm

    NASA Astrophysics Data System (ADS)

    Duso, Lorenzo; Zechner, Christoph

    2018-04-01

    Stochastic simulations of biochemical networks are of vital importance for understanding complex dynamics in cells and tissues. However, existing methods to perform such simulations are associated with computational difficulties, and addressing these remains a daunting challenge. Here we introduce the selected-node stochastic simulation algorithm (snSSA), which allows us to exclusively simulate an arbitrary, selected subset of molecular species of a possibly large and complex reaction network. The algorithm is based on an analytical elimination of chemical species, thereby avoiding explicit simulation of the associated chemical events. These species are instead described continuously in terms of statistical moments derived from a stochastic filtering equation, resulting in a substantial speedup when compared to Gillespie's stochastic simulation algorithm (SSA). Moreover, we show that statistics obtained via snSSA profit from a variance reduction, which can significantly lower the number of Monte Carlo samples needed to achieve a certain performance. We demonstrate the algorithm using several biological case studies for which the simulation time could be reduced by orders of magnitude.
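
    For orientation, the baseline SSA that snSSA accelerates is easy to state: draw the waiting time to the next reaction from an exponential with rate equal to the total propensity, then pick the reaction in proportion to its propensity. A minimal birth-death sketch follows; the moment-based elimination of unselected species in snSSA is not reproduced here.

```python
# Minimal Gillespie SSA for a birth-death process (for reference; snSSA's
# moment-based elimination of unselected species is not reproduced here).
import numpy as np

def gillespie_birth_death(k_birth=10.0, k_death=0.1, x0=0, t_end=200.0, seed=0):
    rng = np.random.default_rng(seed)
    t, x = 0.0, x0
    times, states = [t], [x]
    while t < t_end:
        rates = np.array([k_birth, k_death * x])   # reaction propensities
        total = rates.sum()
        if total == 0.0:
            break
        t += rng.exponential(1.0 / total)          # time to next event
        x += 1 if rng.uniform() < rates[0] / total else -1
        times.append(t)
        states.append(x)
    return np.array(times), np.array(states)

times, states = gillespie_birth_death()
print(f"late-time mean copy number ~ {states[len(states) // 2:].mean():.1f} "
      f"(theory: k_birth / k_death = 100)")
```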

  15. Protein and gene model inference based on statistical modeling in k-partite graphs.

    PubMed

    Gerster, Sarah; Qeli, Ermir; Ahrens, Christian H; Bühlmann, Peter

    2010-07-06

    One of the major goals of proteomics is the comprehensive and accurate description of a proteome. Shotgun proteomics, the method of choice for the analysis of complex protein mixtures, requires that experimentally observed peptides are mapped back to the proteins they were derived from. This process is also known as protein inference. We present Markovian Inference of Proteins and Gene Models (MIPGEM), a statistical model based on clearly stated assumptions to address the problem of protein and gene model inference for shotgun proteomics data. In particular, we are dealing with dependencies among peptides and proteins using a Markovian assumption on k-partite graphs. We are also addressing the problems of shared peptides and ambiguous proteins by scoring the encoding gene models. Empirical results on two control datasets with synthetic mixtures of proteins and on complex protein samples of Saccharomyces cerevisiae, Drosophila melanogaster, and Arabidopsis thaliana suggest that the results with MIPGEM are competitive with existing tools for protein inference.

  16. Untargeted Metabolic Quantitative Trait Loci Analyses Reveal a Relationship between Primary Metabolism and Potato Tuber Quality

    PubMed Central

    Carreno-Quintero, Natalia; Acharjee, Animesh; Maliepaard, Chris; Bachem, Christian W.B.; Mumm, Roland; Bouwmeester, Harro; Visser, Richard G.F.; Keurentjes, Joost J.B.

    2012-01-01

    Recent advances in -omics technologies such as transcriptomics, metabolomics, and proteomics along with genotypic profiling have permitted dissection of the genetics of complex traits represented by molecular phenotypes in nonmodel species. To identify the genetic factors underlying variation in primary metabolism in potato (Solanum tuberosum), we have profiled primary metabolite content in a diploid potato mapping population, derived from crosses between S. tuberosum and wild relatives, using gas chromatography-time of flight-mass spectrometry. In total, 139 polar metabolites were detected, of which we identified metabolite quantitative trait loci for approximately 72% of the detected compounds. In order to obtain an insight into the relationships between metabolic traits and classical phenotypic traits, we also analyzed statistical associations between them. The combined analysis of genetic information through quantitative trait locus coincidence and the application of statistical learning methods provide information on putative indicators associated with the alterations in metabolic networks that affect complex phenotypic traits. PMID:22223596

  17. Coupled disease-behavior dynamics on complex networks: A review.

    PubMed

    Wang, Zhen; Andrews, Michael A; Wu, Zhi-Xi; Wang, Lin; Bauch, Chris T

    2015-12-01

    It is increasingly recognized that a key component of successful infection control efforts is understanding the complex, two-way interaction between disease dynamics and human behavioral and social dynamics. Human behavior such as contact precautions and social distancing clearly influence disease prevalence, but disease prevalence can in turn alter human behavior, forming a coupled, nonlinear system. Moreover, in many cases, the spatial structure of the population cannot be ignored, such that social and behavioral processes and/or transmission of infection must be represented with complex networks. Research on studying coupled disease-behavior dynamics in complex networks in particular is growing rapidly, and frequently makes use of analysis methods and concepts from statistical physics. Here, we review some of the growing literature in this area. We contrast network-based approaches to homogeneous-mixing approaches, point out how their predictions differ, and describe the rich and often surprising behavior of disease-behavior dynamics on complex networks, and compare them to processes in statistical physics. We discuss how these models can capture the dynamics that characterize many real-world scenarios, thereby suggesting ways that policy makers can better design effective prevention strategies. We also describe the growing sources of digital data that are facilitating research in this area. Finally, we suggest pitfalls which might be faced by researchers in the field, and we suggest several ways in which the field could move forward in the coming years. Copyright © 2015 Elsevier B.V. All rights reserved.

  18. A quadratically regularized functional canonical correlation analysis for identifying the global structure of pleiotropy with NGS data

    PubMed Central

    Zhu, Yun; Fan, Ruzong; Xiong, Momiao

    2017-01-01

    Investigating the pleiotropic effects of genetic variants can increase statistical power, provide important information for achieving a deep understanding of the complex genetic structures of disease, and offer powerful tools for designing effective treatments with fewer side effects. However, the current multiple phenotype association analysis paradigm lacks breadth (the number of phenotypes and genetic variants jointly analyzed at the same time) and depth (the hierarchical structure of phenotypes and genotypes). A key issue for high dimensional pleiotropic analysis is to effectively extract informative internal representations and features from high dimensional genotype and phenotype data. To explore correlation information of genetic variants, effectively reduce data dimensions, and overcome critical barriers in advancing the development of novel statistical methods and computational algorithms for genetic pleiotropic analysis, we propose a new statistical method referred to as quadratically regularized functional CCA (QRFCCA) for association analysis, which combines three approaches: (1) quadratically regularized matrix factorization, (2) functional data analysis and (3) canonical correlation analysis (CCA). Large-scale simulations show that QRFCCA has a much higher power than the ten competing statistics while retaining appropriate type 1 error rates. To further evaluate performance, QRFCCA and the ten other statistics are applied to the whole genome sequencing dataset from the TwinsUK study. We identify a total of 79 genes with rare variants and 67 genes with common variants significantly associated with the 46 traits using QRFCCA. The results show that QRFCCA substantially outperforms the ten other statistics. PMID:29040274
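
    The CCA ingredient of QRFCCA can be sketched with ridge regularization: regularize the within-set covariances, whiten, and read the canonical correlations off a singular value decomposition. This is only the core linear-algebra step under illustrative synthetic data; the quadratically regularized factorization and functional smoothing of the full method are not reproduced.

```python
# Core ridge-regularized CCA step (illustrative synthetic data; the quadratic
# factorization and functional smoothing of the full QRFCCA are omitted).
import numpy as np

def regularized_cca(X, Y, lam=0.1):
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / n + lam * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + lam * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):
        w, V = np.linalg.eigh(C)
        return V @ np.diag(w ** -0.5) @ V.T

    K = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    return np.linalg.svd(K, compute_uv=False)   # canonical correlations

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))              # shared signal
X = latent @ rng.normal(size=(1, 10)) + rng.normal(size=(200, 10))
Y = latent @ rng.normal(size=(1, 6)) + rng.normal(size=(200, 6))
print(regularized_cca(X, Y)[:3])                # leading correlations
```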

  19. Ladar imaging detection of salient map based on PWVD and Rényi entropy

    NASA Astrophysics Data System (ADS)

    Xu, Yuannan; Zhao, Yuan; Deng, Rong; Dong, Yanbing

    2013-10-01

    Spatial-frequency information of a given image can be extracted by associating the grey-level spatial data with one of the well-known spatial/spatial-frequency distributions. The Wigner-Ville distribution (WVD) has the useful property that images can be represented jointly in spatial and spatial-frequency domains. For the intensity and range images of ladar, the statistical properties of the Rényi entropy computed through the pseudo Wigner-Ville distribution (PWVD), using one- or two-dimensional windows, are studied. We also analyze how the statistical properties of the Rényi entropy change in ladar intensity and range images when man-made objects appear. On this foundation, a novel method for generating a saliency map based on the PWVD and Rényi entropy is proposed. Target detection is then completed by segmenting the saliency map with a simple and convenient threshold method. For ladar intensity and range images, experimental results show that the proposed method can effectively detect military vehicles against a complex terrestrial background with a low false alarm rate.

  20. Genetic assignment methods for gaining insight into the management of infectious disease by understanding pathogen, vector, and host movement.

    PubMed

    Remais, Justin V; Xiao, Ning; Akullian, Adam; Qiu, Dongchuan; Blair, David

    2011-04-01

    For many pathogens with environmental stages, or those carried by vectors or intermediate hosts, disease transmission is strongly influenced by pathogen, host, and vector movements across complex landscapes, and thus quantitative measures of movement rate and direction can reveal new opportunities for disease management and intervention. Genetic assignment methods are a set of powerful statistical approaches useful for establishing population membership of individuals. Recent theoretical improvements allow these techniques to be used to cost-effectively estimate the magnitude and direction of key movements in infectious disease systems, revealing important ecological and environmental features that facilitate or limit transmission. Here, we review the theory, statistical framework, and molecular markers that underlie assignment methods, and we critically examine recent applications of assignment tests in infectious disease epidemiology. Research directions that capitalize on use of the techniques are discussed, focusing on key parameters needing study for improved understanding of patterns of disease.

  1. A fractional factorial probabilistic collocation method for uncertainty propagation of hydrologic model parameters in a reduced dimensional space

    NASA Astrophysics Data System (ADS)

    Wang, S.; Huang, G. H.; Huang, W.; Fan, Y. R.; Li, Z.

    2015-10-01

    In this study, a fractional factorial probabilistic collocation method is proposed to reveal statistical significance of hydrologic model parameters and their multi-level interactions affecting model outputs, facilitating uncertainty propagation in a reduced dimensional space. The proposed methodology is applied to the Xiangxi River watershed in China to demonstrate its validity and applicability, as well as its capability of revealing complex and dynamic parameter interactions. A set of reduced polynomial chaos expansions (PCEs) only with statistically significant terms can be obtained based on the results of factorial analysis of variance (ANOVA), achieving a reduction of uncertainty in hydrologic predictions. The predictive performance of reduced PCEs is verified by comparing against standard PCEs and the Monte Carlo with Latin hypercube sampling (MC-LHS) method in terms of reliability, sharpness, and Nash-Sutcliffe efficiency (NSE). Results reveal that the reduced PCEs are able to capture hydrologic behaviors of the Xiangxi River watershed, and they are efficient functional representations for propagating uncertainties in hydrologic predictions.
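
    The polynomial chaos machinery underlying the collocation method can be sketched in one dimension: project the model response onto probabilists' Hermite polynomials by least squares and read the output mean and variance off the coefficients. The model function and sample size below are illustrative assumptions; the paper's fractional factorial ANOVA additionally screens which expansion terms are statistically significant.

```python
# One-dimensional PCE sketch: least-squares projection onto probabilists'
# Hermite polynomials; the model function and sample size are illustrative.
import math
import numpy as np
from numpy.polynomial import hermite_e as H

def model(xi):                      # stand-in for an expensive simulator
    return np.exp(0.3 * xi) + 0.1 * xi ** 2

rng = np.random.default_rng(0)
xi = rng.normal(size=500)           # samples of the standard-normal input
Phi = H.hermevander(xi, deg=4)      # He_0 .. He_4 evaluated at the samples
coef, *_ = np.linalg.lstsq(Phi, model(xi), rcond=None)

# Orthogonality E[He_j He_k] = k! delta_jk gives the output moments directly.
facts = np.array([math.factorial(k) for k in range(5)])
mean, var = coef[0], float((coef[1:] ** 2 * facts[1:]).sum())
print(f"PCE mean {mean:.4f}, variance {var:.4f}")
```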

  2. Empirical Bayes scan statistics for detecting clusters of disease risk variants in genetic studies.

    PubMed

    McCallum, Kenneth J; Ionita-Laza, Iuliana

    2015-12-01

    Recent developments of high-throughput genomic technologies offer an unprecedented detailed view of the genetic variation in various human populations, and promise to lead to significant progress in understanding the genetic basis of complex diseases. Despite this tremendous advance in data generation, it remains very challenging to analyze and interpret these data due to their sparse and high-dimensional nature. Here, we propose novel applications and new developments of empirical Bayes scan statistics to identify genomic regions significantly enriched with disease risk variants. We show that the proposed empirical Bayes methodology can be substantially more powerful than existing scan statistics methods especially so in the presence of many non-disease risk variants, and in situations when there is a mixture of risk and protective variants. Furthermore, the empirical Bayes approach has greater flexibility to accommodate covariates such as functional prediction scores and additional biomarkers. As proof-of-concept we apply the proposed methods to a whole-exome sequencing study for autism spectrum disorders and identify several promising candidate genes. © 2015, The International Biometric Society.

  3. Bayesian selection of Markov models for symbol sequences: application to microsaccadic eye movements.

    PubMed

    Bettenbühl, Mario; Rusconi, Marco; Engbert, Ralf; Holschneider, Matthias

    2012-01-01

    Complex biological dynamics often generate sequences of discrete events which can be described as a Markov process. The order of the underlying Markovian stochastic process is fundamental for characterizing statistical dependencies within sequences. As an example for this class of biological systems, we investigate the Markov order of sequences of microsaccadic eye movements from human observers. We calculate the integrated likelihood of a given sequence for various orders of the Markov process and use this in a Bayesian framework for statistical inference on the Markov order. Our analysis shows that data from most participants are best explained by a first-order Markov process. This is compatible with recent findings of a statistical coupling of subsequent microsaccade orientations. Our method might prove to be useful for a broad class of biological systems.
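
    As a concrete illustration of this kind of inference: the integrated likelihood has a closed form when each context's transition probabilities are given a symmetric Dirichlet prior. The sketch below (a Dirichlet-multinomial evidence computation; the prior choice, alphabet, and function names are ours, not taken from the paper) compares candidate Markov orders for a symbol sequence.

      import numpy as np
      from collections import defaultdict
      from scipy.special import gammaln

      def log_evidence(seq, order, n_symbols, alpha=1.0):
          # Dirichlet-multinomial marginal likelihood of the sequence under
          # a Markov model of the given order (alpha = symmetric prior count).
          counts = defaultdict(lambda: np.zeros(n_symbols))
          for i in range(order, len(seq)):
              counts[tuple(seq[i - order:i])][seq[i]] += 1.0
          logp = 0.0
          for n in counts.values():
              logp += gammaln(n_symbols * alpha) - gammaln(n_symbols * alpha + n.sum())
              logp += np.sum(gammaln(alpha + n)) - n_symbols * gammaln(alpha)
          return logp

      rng = np.random.default_rng(0)
      seq = list(rng.integers(0, 4, size=2000))   # e.g. 4 microsaccade directions
      for m in range(4):                          # compare orders 0..3
          print(m, round(log_evidence(seq, m, n_symbols=4), 1))

    With a uniform prior over candidate orders, the order with the largest log evidence is the Bayesian choice, which is the comparison described in the abstract.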

  4. A new method for detecting signal regions in ordered sequences of real numbers, and application to viral genomic data.

    PubMed

    Gog, Julia R; Lever, Andrew M L; Skittrall, Jordan P

    2018-01-01

    We present a fast, robust and parsimonious approach to detecting signals in an ordered sequence of numbers. Our motivation is in seeking a suitable method to take a sequence of scores corresponding to properties of positions in virus genomes, and find outlying regions of low scores. Suitable statistical methods without using complex models or making many assumptions are surprisingly lacking. We resolve this by developing a method that detects regions of low score within sequences of real numbers. The method makes no assumptions a priori about the length of such a region; it gives the explicit location of the region and scores it statistically. It does not use detailed mechanistic models so the method is fast and will be useful in a wide range of applications. We present our approach in detail, and test it on simulated sequences. We show that it is robust to a wide range of signal morphologies, and that it is able to capture multiple signals in the same sequence. Finally we apply it to viral genomic data to identify regions of evolutionary conservation within influenza and rotavirus.
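
    The paper's statistic is its own; one elementary way to locate an outlying low-scoring region, shown here purely for illustration, is to mean-centre the scores and find the minimum-sum contiguous segment with Kadane's algorithm. Significance can then be assessed by recomputing the segment score on permuted sequences.

      import numpy as np

      def lowest_scoring_region(scores):
          # Minimum-sum contiguous segment of the mean-centred scores
          # (Kadane's algorithm); returns inclusive (start, end) indices.
          x = np.asarray(scores, float) - np.mean(scores)
          best_sum = cur_sum = x[0]
          best, start = (0, 0), 0
          for i in range(1, len(x)):
              if cur_sum > 0:              # extending can only raise the sum
                  cur_sum, start = x[i], i
              else:
                  cur_sum += x[i]
              if cur_sum < best_sum:
                  best_sum, best = cur_sum, (start, i)
          return best, best_sum

      rng = np.random.default_rng(0)
      scores = np.concatenate([rng.normal(1, 1, 200),
                               rng.normal(-1, 1, 30),   # planted low region
                               rng.normal(1, 1, 200)])
      print(lowest_scoring_region(scores))              # ~ ((200, 229), ...)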

  5. Focused Belief Measures for Uncertainty Quantification in High Performance Semantic Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Joslyn, Cliff A.; Weaver, Jesse R.

    In web-scale semantic data analytics there is a great need for methods which aggregate uncertainty claims, on the one hand respecting the information provided as accurately as possible, while on the other still being tractable. Traditional statistical methods are more robust, but only represent distributional, additive uncertainty. Generalized information theory methods, including fuzzy systems and Dempster-Shafer (DS) evidence theory, represent multiple forms of uncertainty, but are computationally and methodologically difficult. We require methods which provide an effective balance between the complete representation of the full complexity of uncertainty claims in their interaction, while satisfying the needs of both computational complexity and human cognition. Here we build on Jøsang's subjective logic to posit methods in focused belief measures (FBMs), where a full DS structure is focused to a single event. The resulting ternary logical structure is posited to be able to capture the minimal amount of generalized complexity needed at a maximum of computational efficiency. We demonstrate the efficacy of this approach in a web ingest experiment over the 2012 Billion Triple dataset from the Semantic Web Challenge.

  6. A Not-So-Fundamental Limitation on Studying Complex Systems with Statistics: Comment on Rabin (2011)

    NASA Astrophysics Data System (ADS)

    Thomas, Drew M.

    2012-12-01

    Although living organisms are affected by many interrelated and unidentified variables, this complexity does not automatically impose a fundamental limitation on statistical inference. Nor need one invoke such complexity as an explanation of the "Truth Wears Off" or "decline" effect; similar "decline" effects occur with far simpler systems studied in physics. Selective reporting and publication bias, and scientists' biases in favor of reporting eye-catching results (in general) or conforming to others' results (in physics) better explain this feature of the "Truth Wears Off" effect than Rabin's suggested limitation on statistical inference.

  7. New Sensitive Kinetic Spectrophotometric Methods for Determination of Omeprazole in Dosage Forms

    PubMed Central

    Mahmoud, Ashraf M.

    2009-01-01

    New rapid, sensitive, and accurate kinetic spectrophotometric methods were developed, for the first time, to determine omeprazole (OMZ) in its dosage forms. The methods were based on the formation of charge-transfer complexes with both iodine and 2,3-dichloro-5,6-dicyano-1,4-benzoquinone (DDQ). The variables that affected the reactions were carefully studied and optimized. The formed complexes and the site of interaction were examined by UV/VIS, IR, and 1H-NMR techniques, and computational molecular modeling. Under optimum conditions, the stoichiometry of the reactions between OMZ and the acceptors was found to be 1 : 1. The order of the reactions and the specific rate constants were determined. The thermodynamics of the complexes were computed and the mechanism of the reactions was postulated. The initial rate and fixed time methods were utilized for the determination of OMZ concentrations. The linear ranges for the proposed methods were 0.10–3.00 and 0.50–25.00 μg mL−1 with the lowest LOD of 0.03 and 0.14 μg mL−1 for iodine and DDQ, respectively. Analytical performance of the methods was statistically validated; RSD was <1.25% for the precision and <1.95% for the accuracy. The proposed methods were successfully applied to the analysis of OMZ in its dosage forms; the recovery was 98.91–100.32% ± 0.94–1.84, and was found to be comparable with that of the reference method. PMID:20140076

  8. Controls of multi-modal wave conditions in a complex coastal setting

    USGS Publications Warehouse

    Hegermiller, Christie; Rueda, Ana C.; Erikson, Li H.; Barnard, Patrick L.; Antolinez, J.A.A.; Mendez, Fernando J.

    2017-01-01

    Coastal hazards emerge from the combined effect of wave conditions and sea level anomalies associated with storms or low-frequency atmosphere-ocean oscillations. Rigorous characterization of wave climate is limited by the availability of spectral wave observations, the computational cost of dynamical simulations, and the ability to link wave-generating atmospheric patterns with coastal conditions. We present a hybrid statistical-dynamical approach to simulating nearshore wave climate in complex coastal settings, demonstrated in the Southern California Bight, where waves arriving from distant, disparate locations are refracted over complex bathymetry and shadowed by offshore islands. Contributions of wave families and large-scale atmospheric drivers to nearshore wave energy flux are analyzed. Results highlight the variability of influences controlling wave conditions along neighboring coastlines. The universal method demonstrated here can be applied to complex coastal settings worldwide, facilitating analysis of the effects of climate change on nearshore wave climate.

  9. Controls of Multimodal Wave Conditions in a Complex Coastal Setting

    NASA Astrophysics Data System (ADS)

    Hegermiller, C. A.; Rueda, A.; Erikson, L. H.; Barnard, P. L.; Antolinez, J. A. A.; Mendez, F. J.

    2017-12-01

    Coastal hazards emerge from the combined effect of wave conditions and sea level anomalies associated with storms or low-frequency atmosphere-ocean oscillations. Rigorous characterization of wave climate is limited by the availability of spectral wave observations, the computational cost of dynamical simulations, and the ability to link wave-generating atmospheric patterns with coastal conditions. We present a hybrid statistical-dynamical approach to simulating nearshore wave climate in complex coastal settings, demonstrated in the Southern California Bight, where waves arriving from distant, disparate locations are refracted over complex bathymetry and shadowed by offshore islands. Contributions of wave families and large-scale atmospheric drivers to nearshore wave energy flux are analyzed. Results highlight the variability of influences controlling wave conditions along neighboring coastlines. The universal method demonstrated here can be applied to complex coastal settings worldwide, facilitating analysis of the effects of climate change on nearshore wave climate.

  10. Trends in Mediation Analysis in Nursing Research: Improving Current Practice.

    PubMed

    Hertzog, Melody

    2018-06-01

    The purpose of this study was to describe common approaches used by nursing researchers to test mediation models and evaluate them within the context of current methodological advances. MEDLINE was used to locate studies testing a mediation model and published from 2004 to 2015 in nursing journals. Design (experimental/correlation, cross-sectional/longitudinal, model complexity) and analysis (method, inclusion of test of mediated effect, violations/discussion of assumptions, sample size/power) characteristics were coded for 456 studies. General trends were identified using descriptive statistics. Consistent with findings of reviews in other disciplines, evidence was found that nursing researchers may not be aware of the strong assumptions and serious limitations of their analyses. Suggestions for strengthening the rigor of such studies and an overview of current methods for testing more complex models, including longitudinal mediation processes, are presented.
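
    One improvement such reviews commonly recommend is to test the mediated (indirect) effect directly, e.g. with a bootstrap confidence interval for the product of coefficients, rather than relying on causal-steps logic. A schematic sketch on simulated data (variable names and effect sizes are illustrative, not drawn from the reviewed studies):

      import numpy as np
      from sklearn.linear_model import LinearRegression

      def indirect_effect(x, m, y):
          # Product-of-coefficients estimate a*b of the mediated effect.
          a = LinearRegression().fit(x.reshape(-1, 1), m).coef_[0]
          b = LinearRegression().fit(np.column_stack([m, x]), y).coef_[0]
          return a * b

      rng = np.random.default_rng(1)
      n = 300
      x = rng.normal(size=n)
      m = 0.5 * x + rng.normal(size=n)            # mediator
      y = 0.4 * m + 0.1 * x + rng.normal(size=n)  # outcome

      boot = np.array([indirect_effect(x[i], m[i], y[i])
                       for i in (rng.integers(0, n, n) for _ in range(2000))])
      print("95% CI for a*b:", np.percentile(boot, [2.5, 97.5]))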

  11. Assessing Statistical Competencies in Clinical and Translational Science Education: One Size Does Not Fit All

    PubMed Central

    Lindsell, Christopher J.; Welty, Leah J.; Mazumdar, Madhu; Thurston, Sally W.; Rahbar, Mohammad H.; Carter, Rickey E.; Pollock, Bradley H.; Cucchiara, Andrew J.; Kopras, Elizabeth J.; Jovanovic, Borko D.; Enders, Felicity T.

    2014-01-01

    Introduction: Statistics is an essential training component for a career in clinical and translational science (CTS). Given the increasing complexity of statistics, learners may have difficulty selecting appropriate courses. Our question was: what depth of statistical knowledge do different CTS learners require? Methods: For three types of CTS learners (principal investigator, co-investigator, informed reader of the literature), each with different backgrounds in research (no previous research experience, reader of the research literature, previous research experience), 18 experts in biostatistics, epidemiology, and research design proposed levels for 21 statistical competencies. Results: Statistical competencies were categorized as fundamental, intermediate, or specialized. CTS learners who intend to become independent principal investigators require more specialized training, while those intending to become informed consumers of the medical literature require more fundamental education. For most competencies, less training was proposed for those with more research background. Discussion: When selecting statistical coursework, the learner's research background and career goal should guide the decision. Some statistical competencies are considered to be more important than others. Baseline knowledge assessments may help learners identify appropriate coursework. Conclusion: Rather than one size fits all, tailoring education to baseline knowledge, learner background, and future goals increases learning potential while minimizing classroom time. PMID:25212569

  12. Modelling of electronic excitation and radiation in the Direct Simulation Monte Carlo Macroscopic Chemistry Method

    NASA Astrophysics Data System (ADS)

    Goldsworthy, M. J.

    2012-10-01

    One of the most useful tools for modelling rarefied hypersonic flows is the Direct Simulation Monte Carlo (DSMC) method. Simulator particle movement and collision calculations are combined with statistical procedures to model thermal non-equilibrium flow-fields described by the Boltzmann equation. The Macroscopic Chemistry Method for DSMC simulations was developed to simplify the inclusion of complex thermal non-equilibrium chemistry. The macroscopic approach uses statistical information which is calculated during the DSMC solution process in the modelling procedures. Here it is shown how inclusion of macroscopic information in models of chemical kinetics, electronic excitation, ionization, and radiation can enhance the capabilities of DSMC to model flow-fields where a range of physical processes occur. The approach is applied to the modelling of a 6.4 km/s nitrogen shock wave and results are compared with those from existing shock-tube experiments and continuum calculations. Reasonable agreement between the methods is obtained. The quality of the comparison is highly dependent on the set of vibrational relaxation and chemical kinetic parameters employed.

  13. Advances in computer simulation of genome evolution: toward more realistic evolutionary genomics analysis by approximate bayesian computation.

    PubMed

    Arenas, Miguel

    2015-04-01

    NGS technologies enable fast and cheap generation of genomic data. Nevertheless, ancestral genome inference is not so straightforward due to complex evolutionary processes acting on this material such as inversions, translocations, and other genome rearrangements that, in addition to their implicit complexity, can co-occur and confound ancestral inferences. Recently, models of genome evolution that accommodate such complex genomic events are emerging. This letter explores these novel evolutionary models and proposes their incorporation into robust statistical approaches based on computer simulations, such as approximate Bayesian computation, that may produce a more realistic evolutionary analysis of genomic data. Advantages and pitfalls in using these analytical methods are discussed. Potential applications of these ancestral genomic inferences are also pointed out.

  14. ARK: Aggregation of Reads by K-Means for Estimation of Bacterial Community Composition.

    PubMed

    Koslicki, David; Chatterjee, Saikat; Shahrivar, Damon; Walker, Alan W; Francis, Suzanna C; Fraser, Louise J; Vehkaperä, Mikko; Lan, Yueheng; Corander, Jukka

    2015-01-01

    Estimation of bacterial community composition from high-throughput sequenced 16S rRNA gene amplicons is a key task in microbial ecology. Since the sequence data from each sample typically consist of a large number of reads and are adversely impacted by different levels of biological and technical noise, accurate analysis of such large datasets is challenging. There has been a recent surge of interest in using compressed sensing inspired and convex-optimization based methods to solve the estimation problem for bacterial community composition. These methods typically rely on summarizing the sequence data by frequencies of low-order k-mers and matching this information statistically with a taxonomically structured database. Here we show that the accuracy of the resulting community composition estimates can be substantially improved by aggregating the reads from a sample with an unsupervised machine learning approach prior to the estimation phase. The aggregation of reads is a pre-processing approach in which a standard K-means clustering algorithm partitions a large set of reads into subsets at reasonable computational cost, providing several vectors of first-order statistics instead of only a single statistical summary in terms of k-mer frequencies. The output of the clustering is then processed further to obtain the final estimate for each sample. The resulting method is called Aggregation of Reads by K-means (ARK), and it is based on a statistical argument via a mixture density formulation. ARK is found to improve the fidelity and robustness of several recently introduced methods, with only a modest increase in computational complexity. An open source, platform-independent implementation of the method in the Julia programming language is freely available at https://github.com/dkoslicki/ARK. A Matlab implementation is available at http://www.ee.kth.se/ctsoftware.
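
    A minimal sketch of the pre-processing idea as we read it (the k-mer length, cluster count, and function names are our own choices, and the downstream composition estimator is omitted):

      import numpy as np
      from itertools import product
      from sklearn.cluster import KMeans

      K = 4
      KMER_INDEX = {''.join(p): i for i, p in enumerate(product('ACGT', repeat=K))}

      def kmer_freqs(read):
          # First-order summary of one read: its normalized k-mer frequencies.
          v = np.zeros(len(KMER_INDEX))
          for i in range(len(read) - K + 1):
              j = KMER_INDEX.get(read[i:i + K])
              if j is not None:
                  v[j] += 1
          return v / max(v.sum(), 1.0)

      def aggregate_reads(reads, n_clusters=8):
          # ARK-style pre-processing: partition reads with K-means and return
          # one mean frequency vector plus a mixture weight per cluster.
          X = np.array([kmer_freqs(r) for r in reads])
          labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
          return [(X[labels == c].mean(axis=0), float(np.mean(labels == c)))
                  for c in range(n_clusters)]

      reads = ["ACGTACGTGGCA", "TTGACGTACGTA", "GGCATTGCAGGC"] * 10
      for mean_vec, w in aggregate_reads(reads, n_clusters=2):
          print(w, mean_vec[:4])

    The downstream composition estimator would then be run once per cluster and its outputs combined according to the returned mixture weights.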

  15. Estimating trends in the global mean temperature record

    NASA Astrophysics Data System (ADS)

    Poppick, Andrew; Moyer, Elisabeth J.; Stein, Michael L.

    2017-06-01

    Given uncertainties in physical theory and numerical climate simulations, the historical temperature record is often used as a source of empirical information about climate change. Many historical trend analyses appear to de-emphasize physical and statistical assumptions: examples include regression models that treat time rather than radiative forcing as the relevant covariate, and time series methods that account for internal variability in nonparametric rather than parametric ways. However, given a limited data record and the presence of internal variability, estimating radiatively forced temperature trends in the historical record necessarily requires some assumptions. Ostensibly empirical methods can also involve an inherent conflict in assumptions: they require data records that are short enough for naive trend models to be applicable, but long enough for long-timescale internal variability to be accounted for. In the context of global mean temperatures, empirical methods that appear to de-emphasize assumptions can therefore produce misleading inferences, because the trend over the twentieth century is complex and the scale of temporal correlation is long relative to the length of the data record. We illustrate here how a simple but physically motivated trend model can provide better-fitting and more broadly applicable trend estimates and can allow for a wider array of questions to be addressed. In particular, the model allows one to distinguish, within a single statistical framework, between uncertainties in the shorter-term vs. longer-term response to radiative forcing, with implications not only for historical trends but also for uncertainties in future projections. We also investigate how the choice of statistical description of internal variability affects inferred uncertainties. While nonparametric methods may seem to avoid making explicit assumptions, we demonstrate how even misspecified parametric statistical methods, if attuned to the important characteristics of internal variability, can result in more accurate uncertainty statements about trends.

  16. Data-driven probability concentration and sampling on manifold

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Soize, C., E-mail: christian.soize@univ-paris-est.fr; Ghanem, R., E-mail: ghanem@usc.edu

    2016-09-15

    A new methodology is proposed for generating realizations of a random vector with values in a finite-dimensional Euclidean space that are statistically consistent with a dataset of observations of this vector. The probability distribution of this random vector, while a priori not known, is presumed to be concentrated on an unknown subset of the Euclidean space. A random matrix is introduced whose columns are independent copies of the random vector and for which the number of columns is the number of data points in the dataset. The approach is based on the use of (i) the multidimensional kernel-density estimation method for estimating the probability distribution of the random matrix, (ii) a MCMC method for generating realizations for the random matrix, (iii) the diffusion-maps approach for discovering and characterizing the geometry and the structure of the dataset, and (iv) a reduced-order representation of the random matrix, which is constructed using the diffusion-maps vectors associated with the first eigenvalues of the transition matrix relative to the given dataset. The convergence aspects of the proposed methodology are analyzed and a numerical validation is explored through three applications of increasing complexity. The proposed method is found to be robust to noise levels and data complexity as well as to the intrinsic dimension of data and the size of experimental datasets. Both the methodology and the underlying mathematical framework presented in this paper contribute new capabilities and perspectives at the interface of uncertainty quantification, statistical data analysis, stochastic modeling and associated statistical inverse problems.

  17. Comparison of individual-based model output to data using a model of walleye pollock early life history in the Gulf of Alaska

    NASA Astrophysics Data System (ADS)

    Hinckley, Sarah; Parada, Carolina; Horne, John K.; Mazur, Michael; Woillez, Mathieu

    2016-10-01

    Biophysical individual-based models (IBMs) have been used to study aspects of the early life history of marine fishes such as recruitment, connectivity of spawning and nursery areas, and marine reserve design. However, there is no consistent approach to validating the spatial outputs of these models. In this study, we hope to rectify this gap. We document additions to an existing individual-based biophysical model for Alaska walleye pollock (Gadus chalcogrammus), some simulations made with this model, and the methods used to describe and compare spatial output of the model versus field data derived from ichthyoplankton surveys in the Gulf of Alaska. We used visual methods (e.g. distributional centroids with directional ellipses), several indices (such as a Normalized Difference Index (NDI) and an Overlap Coefficient (OC)), and several statistical methods: the Syrjala method, the Getis-Ord Gi* statistic, and a geostatistical method for comparing spatial indices. We assess the utility of these different methods in analyzing spatial output and comparing model output to data, and give recommendations for their appropriate use. Visual methods are useful for initial comparisons of model and data distributions. Metrics such as the NDI and OC give useful measures of co-location and overlap, but care must be taken in discretizing the fields into bins. The Getis-Ord Gi* statistic is useful for determining the patchiness of the fields. The Syrjala method is an easily implemented statistical measure of the difference between the fields, but does not give information on the details of the distributions. Finally, the geostatistical comparison of spatial indices gives good information on the details of the distributions and whether they differ significantly between the model and the data. We conclude that each technique gives quite different information about the model-data distribution comparison, and that some are easy to apply while others are more complex. We also give recommendations for a multistep process to validate spatial output from IBMs.

  18. Review of Statistical Methods for Analysing Healthcare Resources and Costs

    PubMed Central

    Mihaylova, Borislava; Briggs, Andrew; O'Hagan, Anthony; Thompson, Simon G

    2011-01-01

    We review statistical methods for analysing healthcare resource use and costs, their ability to address skewness, excess zeros, multimodality and heavy right tails, and their ease for general use. We aim to provide guidance on analysing resource use and costs focusing on randomised trials, although methods often have wider applicability. Twelve broad categories of methods were identified: (I) methods based on the normal distribution, (II) methods following transformation of data, (III) single-distribution generalized linear models (GLMs), (IV) parametric models based on skewed distributions outside the GLM family, (V) models based on mixtures of parametric distributions, (VI) two (or multi)-part and Tobit models, (VII) survival methods, (VIII) non-parametric methods, (IX) methods based on truncation or trimming of data, (X) data components models, (XI) methods based on averaging across models, and (XII) Markov chain methods. Based on this review, our recommendations are that, first, simple methods are preferred in large samples where the near-normality of sample means is assured. Second, in somewhat smaller samples, relatively simple methods, able to deal with one or two of above data characteristics, may be preferable but checking sensitivity to assumptions is necessary. Finally, some more complex methods hold promise, but are relatively untried; their implementation requires substantial expertise and they are not currently recommended for wider applied work. Copyright © 2010 John Wiley & Sons, Ltd. PMID:20799344

  19. Nonlinear dynamic analysis of voices before and after surgical excision of vocal polyps

    NASA Astrophysics Data System (ADS)

    Zhang, Yu; McGilligan, Clancy; Zhou, Liang; Vig, Mark; Jiang, Jack J.

    2004-05-01

    Phase space reconstruction, correlation dimension, and second-order entropy, methods from nonlinear dynamics, are used to analyze sustained vowels generated by patients before and after surgical excision of vocal polyps. Two conventional acoustic perturbation parameters, jitter and shimmer, are also employed to analyze voices before and after surgery. Presurgical and postsurgical analyses of jitter, shimmer, correlation dimension, and second-order entropy are statistically compared. Correlation dimension and second-order entropy show a statistically significant decrease after surgery, indicating reduced complexity and higher predictability of postsurgical voice dynamics. There is not a significant postsurgical difference in shimmer, although jitter shows a significant postsurgical decrease. The results suggest that jitter and shimmer should be applied to analyze disordered voices with caution; however, nonlinear dynamic methods may be useful for analyzing abnormal vocal function and quantitatively evaluating the effects of surgical excision of vocal polyps.
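
    For readers unfamiliar with these measures, a correlation-dimension estimate is typically obtained by time-delay embedding followed by the Grassberger-Procaccia correlation sum; the sketch below illustrates the idea on a synthetic signal (the embedding parameters and crude scaling-region choice are our simplifications, not the authors' exact pipeline):

      import numpy as np
      from scipy.spatial.distance import pdist

      def correlation_sum(x, dim=5, delay=2, n_radii=10):
          # Time-delay embedding followed by the Grassberger-Procaccia
          # correlation sum C(r); the slope of log C vs log r estimates D2.
          n = len(x) - (dim - 1) * delay
          emb = np.column_stack([x[i * delay:i * delay + n] for i in range(dim)])
          d = pdist(emb)
          radii = np.logspace(np.log10(d.min() + 1e-12), np.log10(d.max()), n_radii)
          return radii, np.array([(d <= r).mean() for r in radii])

      t = np.linspace(0, 100, 2000)
      x = np.sin(t) + 0.1 * np.random.randn(2000)      # noisy limit cycle
      r, C = correlation_sum(x)
      mask = (C > 0) & (C < 0.5)                       # crude scaling region
      print("D2 estimate:", np.polyfit(np.log(r[mask]), np.log(C[mask]), 1)[0])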

  20. Mapping sea ice leads with a coupled numeric/symbolic system

    NASA Technical Reports Server (NTRS)

    Key, J.; Schweiger, A. J.; Maslanik, J. A.

    1990-01-01

    A method is presented which facilitates the detection and delineation of leads in single-channel Landsat data by coupling numeric and symbolic procedures. The procedure consists of three steps: (1) using the dynamic threshold method, an image is mapped to a lead/no-lead binary image; (2) the likelihood that fragments are real leads is assessed with a set of numeric rules; and (3) pairs of objects are examined geometrically and merged where possible. Processing ends when all fragments are merged and their statistical characteristics are determined, leaving a map of valid lead objects that summarizes useful physical information about the lead complexes. Direct implementation of domain knowledge and rapid prototyping are two benefits of the rule-based system. The approach is found to be most successfully applied to mid- and high-level processing, and the system can retrieve statistics about sea-ice leads as well as detect the leads.

  1. Complex patterns of abnormal heartbeats

    NASA Technical Reports Server (NTRS)

    Schulte-Frohlinde, Verena; Ashkenazy, Yosef; Goldberger, Ary L.; Ivanov, Plamen Ch; Costa, Madalena; Morley-Davies, Adrian; Stanley, H. Eugene; Glass, Leon

    2002-01-01

    Individuals having frequent abnormal heartbeats interspersed with normal heartbeats may be at an increased risk of sudden cardiac death. However, mechanistic understanding of such cardiac arrhythmias is limited. We present a visual and qualitative method to display statistical properties of abnormal heartbeats. We introduce dynamical "heartprints" which reveal characteristic patterns in long clinical records encompassing approximately 10(5) heartbeats and may provide information about underlying mechanisms. We test if these dynamics can be reproduced by model simulations in which abnormal heartbeats are generated (i) randomly, (ii) at a fixed time interval following a preceding normal heartbeat, or (iii) by an independent oscillator that may or may not interact with the normal heartbeat. We compare the results of these three models and test their limitations to comprehensively simulate the statistical features of selected clinical records. This work introduces methods that can be used to test mathematical models of arrhythmogenesis and to develop a new understanding of underlying electrophysiologic mechanisms of cardiac arrhythmia.

  2. Methods and means of Fourier-Stokes polarimetry and the spatial-frequency filtering of phase anisotropy manifestations in endometriosis diagnostics

    NASA Astrophysics Data System (ADS)

    Ushenko, A. G.; Dubolazov, O. V.; Ushenko, Vladimir A.; Ushenko, Yu. A.; Sakhnovskiy, M. Yu.; Prydiy, O. G.; Lakusta, I. I.; Novakovskaya, O. Yu.; Melenko, S. R.

    2016-12-01

    This research presents the results of an investigation of the diagnostic efficiency of a new azimuthally stable Mueller-matrix method for analyzing laser autofluorescence coordinate distributions of dried polycrystalline films of uterine cavity peritoneal fluid. A new model of the generalized optical anisotropy of the protein networks of biological tissues is proposed in order to describe the processes underlying laser autofluorescence. The influence of the complex mechanisms of both phase anisotropy (linear birefringence and optical activity) and linear (circular) dichroism is taken into account. The interconnections between the azimuthally stable Mueller-matrix elements characterizing laser autofluorescence and the different mechanisms of optical anisotropy are determined. A statistical analysis of the coordinate distributions of these Mueller-matrix rotation invariants is proposed. On this basis, quantitative criteria (statistical moments of the first to fourth orders) are estimated for differentiating dried polycrystalline films of peritoneal fluid between group 1 (healthy donors) and group 2 (patients with uterine endometriosis).

  3. A weighted U statistic for association analyses considering genetic heterogeneity.

    PubMed

    Wei, Changshuai; Elston, Robert C; Lu, Qing

    2016-07-20

    Converging evidence suggests that common complex diseases with the same or similar clinical manifestations could have different underlying genetic etiologies. While current research interests have shifted toward uncovering rare variants and structural variations predisposing to human diseases, the impact of heterogeneity in genetic studies of complex diseases has been largely overlooked. Most of the existing statistical methods assume the disease under investigation has a homogeneous genetic effect and could, therefore, have low power if the disease undergoes heterogeneous pathophysiological and etiological processes. In this paper, we propose a heterogeneity-weighted U (HWU) method for association analyses considering genetic heterogeneity. HWU can be applied to various types of phenotypes (e.g., binary and continuous) and is computationally efficient for high-dimensional genetic data. Through simulations, we showed the advantage of HWU when the underlying genetic etiology of a disease was heterogeneous, as well as the robustness of HWU against different model assumptions (e.g., phenotype distributions). Using HWU, we conducted a genome-wide analysis of nicotine dependence from the Study of Addiction: Genetics and Environments dataset. The genome-wide analysis of nearly one million genetic markers took 7h, identifying heterogeneous effects of two new genes (i.e., CYP3A5 and IKBKB) on nicotine dependence. Copyright © 2016 John Wiley & Sons, Ltd.

  4. Why significant variables aren't automatically good predictors.

    PubMed

    Lo, Adeline; Chernoff, Herman; Zheng, Tian; Lo, Shaw-Hwa

    2015-11-10

    Thus far, genome-wide association studies (GWAS) have been disappointing in the inability of investigators to use the results of identified, statistically significant variants in complex diseases to make predictions useful for personalized medicine. Why are significant variables not leading to good prediction of outcomes? We point out that this problem is prevalent in simple as well as complex data, in the sciences as well as the social sciences. We offer a brief explanation and some statistical insights on why higher significance cannot automatically imply stronger predictivity and illustrate through simulations and a real breast cancer example. We also demonstrate that highly predictive variables do not necessarily appear as highly significant, thus evading the researcher using significance-based methods. We point out that what makes variables good for prediction versus significance depends on different properties of the underlying distributions. If prediction is the goal, we must lay aside significance as the only selection standard. We suggest that progress in prediction requires efforts toward a new research agenda of searching for a novel criterion to retrieve highly predictive variables rather than highly significant variables. We offer an alternative approach that was not designed for significance, the partition retention method, which was very effective for prediction on a long-studied breast cancer data set, reducing the classification error rate from 30% to 8%.
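
    The core point is easy to reproduce: with a large sample, a variable with a tiny effect can be overwhelmingly significant yet nearly useless for classification. A toy simulation (all numbers illustrative):

      import numpy as np
      from scipy import stats
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(0)
      n = 20000
      y = rng.integers(0, 2, n)
      x = 0.08 * y + rng.normal(size=n)     # real but tiny effect

      t_stat, p = stats.ttest_ind(x[y == 1], x[y == 0])
      print(f"p-value: {p:.1e}")            # highly significant at this n

      acc = LogisticRegression().fit(x[:, None], y).score(x[:, None], y)
      print(f"classification accuracy: {acc:.3f}")   # barely above 0.5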

  5. New Spectrophotometric Assay of Pyrantel Pamoate in Pharmaceuticals and Spiked Human Urine Using Three Complexing Agents

    NASA Astrophysics Data System (ADS)

    Swamy, N.; Prashanth, K. N.; Basavaiah, K.

    2015-07-01

    Three simple, rapid, inexpensive, and highly sensitive spectrophotometric methods are described for the quantification of pyrantel pamoate (PYP) in pure drug and formulations. The methods are based on the molecular charge-transfer (CT) complexation reaction involving pyrantel base (PYL) as n-donor and iodine (I₂, method A) as σ-acceptor, and 2,4-dinitrophenol (DNP, method B) or picric acid (PA, method C) as π-acceptors. Spectrophotometrically, the CT complexes showed absorption maxima at 380, 420, and 430 nm for methods A, B, and C, respectively. Under optimum conditions, Beer's law was obeyed over the concentration ranges 0.12-2.9, 0.12-3.75, and 0.12-2.9 μg/mL for methods A, B, and C, respectively. The apparent molar absorptivities of the CT complexes at the respective λmax are calculated to be 2.63 × 10⁵, 6.91 × 10⁴, and 1.73 × 10⁵ L/(mol·cm), respectively, and the corresponding Sandell sensitivity values are 0.0009, 0.003, and 0.0012. The limits of detection (LOD) and quantification (LOQ) are calculated to be (0.02 and 0.07), (0.05 and 0.15), and (0.02 and 0.07) μg/mL for methods A, B, and C, respectively. The intra-day and inter-day accuracy, expressed as %RE, and precision, expressed as %RSD, are less than 3%. The methods have been applied to the determination of PYP in tablets, suspensions, and spiked human urine. Parallel assay by a reference method and statistical analysis of the results show no significant difference between the proposed methods and the reference method with respect to accuracy and precision, as evident from Student's t and variance ratio tests. The accuracy of the methods has been further ascertained by recovery tests via the standard addition technique.

  6. Built-up Areas Extraction in High Resolution SAR Imagery based on the method of Multiple Feature Weighted Fusion

    NASA Astrophysics Data System (ADS)

    Liu, X.; Zhang, J. X.; Zhao, Z.; Ma, A. D.

    2015-06-01

    Because of its all-time, all-weather operation, synthetic aperture radar (SAR) is being applied ever more widely in remote sensing, and feature extraction from high-resolution SAR images has become a topic of considerable interest. In particular, with the continuing improvement of airborne SAR image resolution, image texture information has become more abundant, which is of great significance for classification and extraction. In this paper, a novel method for extracting built-up areas using both statistical and structural features is proposed, based on the texture characteristics of built-up areas. First, statistical texture features and structural features are extracted by the classical gray-level co-occurrence matrix method and the variogram-function method, respectively, with direction information taken into account. Next, feature weights are calculated from the Bhattacharyya distance, and all features are fused using these weights. Finally, the fused image is classified with the K-means method and built-up areas are extracted in a post-classification step. The proposed method has been tested on domestic airborne P-band polarimetric SAR images; at the same time, two groups of experiments based on statistical texture alone and on structural texture alone were carried out. In addition to qualitative analysis, quantitative analysis based on manually selected built-up areas was performed: in the relatively simple experimental area the detection rate exceeds 90%, and in the relatively complex experimental area the detection rate is also higher than that of the other two methods. The results for the study area show that this method can effectively and accurately extract built-up areas from high-resolution airborne SAR imagery.

  7. New method of determination of spot welding-adhesive joint fatigue life using full field strain evolution

    NASA Astrophysics Data System (ADS)

    Sadowski, T.; Kneć, M.

    2016-04-01

    Fatigue tests have been conducted for more than two hundred years. Despite this long history, because fatigue phenomena are very complex, assessing the fatigue response of standard materials or composites still requires a long time. A fairly precise way to estimate fatigue parameters is to test at least 30 standardized specimens of the analysed material, followed by statistical post-processing. In the analysis of structural elements such as hybrid joints (Figure 1), the situation is more complex still, as many more factors influence the fatigue load capacity owing to the joint's much more complicated structure compared with standard material specimens, i.e., the occurrence of welded hot spots or rivets, adhesive layers, local notches creating stress concentrations, etc. To shorten testing time, some rapid methods are known: Locati's method [1] (step-by-step load increments up to failure); Prot's method [2] (constant increase of the load amplitude up to failure); and Lehr's method [2] (seeking the point during regular fatigue loading at which the increase in temperature or strain becomes nonlinear). The present article proposes a new method of fatigue response assessment: a combination of the Locati and Lehr methods.

  8. Event coincidence analysis for quantifying statistical interrelationships between event time series. On the role of flood events as triggers of epidemic outbreaks

    NASA Astrophysics Data System (ADS)

    Donges, J. F.; Schleussner, C.-F.; Siegmund, J. F.; Donner, R. V.

    2016-05-01

    Studying event time series is a powerful approach for analyzing the dynamics of complex dynamical systems in many fields of science. In this paper, we describe the method of event coincidence analysis to provide a framework for quantifying the strength, directionality and time lag of statistical interrelationships between event series. Event coincidence analysis makes it possible to formulate and test null hypotheses on the origin of the observed interrelationships, including tests based on Poisson processes or, more generally, stochastic point processes with a prescribed inter-event time distribution and other higher-order properties. Applying the framework to country-level observational data yields evidence that flood events have acted as triggers of epidemic outbreaks globally since the 1950s. Facing projected future changes in the statistics of climatic extreme events, statistical techniques such as event coincidence analysis will be relevant for investigating the impacts of anthropogenic climate change on human societies and ecosystems worldwide.
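
    A minimal sketch of the precursor-coincidence idea with a surrogate-based test (the published framework also handles time lags, directionality, and analytic null distributions for Poisson processes; the function names and the uniform-redistribution null here are our simplifications):

      import numpy as np

      def precursor_coincidence_rate(a_times, b_times, delta_t):
          # Fraction of events in A with at least one B event in the
          # preceding window [t - delta_t, t].
          b = np.asarray(b_times, float)
          return np.mean([np.any((b >= t - delta_t) & (b <= t)) for t in a_times])

      def surrogate_p_value(a_times, b_times, t_max, delta_t, n_surr=2000, seed=0):
          # Null model: B events uniformly redistributed over the record.
          rng = np.random.default_rng(seed)
          obs = precursor_coincidence_rate(a_times, b_times, delta_t)
          null = [precursor_coincidence_rate(a_times,
                                             rng.uniform(0, t_max, len(b_times)),
                                             delta_t)
                  for _ in range(n_surr)]
          return obs, float(np.mean(np.asarray(null) >= obs))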

  9. Enhancing efficiency and quality of statistical estimation of immunogenicity assay cut points through standardization and automation.

    PubMed

    Su, Cheng; Zhou, Lei; Hu, Zheng; Weng, Winnie; Subramani, Jayanthi; Tadkod, Vineet; Hamilton, Kortney; Bautista, Ami; Wu, Yu; Chirmule, Narendra; Zhong, Zhandong Don

    2015-10-01

    Biotherapeutics can elicit immune responses, which can alter the exposure, safety, and efficacy of the therapeutics. A well-designed and robust bioanalytical method is critical for the detection and characterization of relevant anti-drug antibody (ADA) and the success of an immunogenicity study. As a fundamental criterion in immunogenicity testing, assay cut points need to be statistically established with a risk-based approach to reduce subjectivity. This manuscript describes the development of a validated, web-based, multi-tier customized assay statistical tool (CAST) for assessing cut points of ADA assays. The tool provides an intuitive web interface that allows users to import experimental data generated from a standardized experimental design, select the assay factors, run the standardized analysis algorithms, and generate tables, figures, and listings (TFL). It allows bioanalytical scientists to perform complex statistical analysis at the click of a button to produce reliable assay parameters in support of immunogenicity studies. Copyright © 2015 Elsevier B.V. All rights reserved.

  10. Determination of effective loss factors in reduced SEA models

    NASA Astrophysics Data System (ADS)

    Chimeno Manguán, M.; Fernández de las Heras, M. J.; Roibás Millán, E.; Simón Hidalgo, F.

    2017-01-01

    The definition of Statistical Energy Analysis (SEA) models for large complex structures is strongly conditioned by the classification of the structure's elements into a set of coupled subsystems and the subsequent determination of the loss factors representing both the internal damping and the coupling between subsystems. Accurate definition of the complete system can lead to excessively large models as size and complexity increase, a fact that also raises practical issues for the experimental determination of the loss factors. This work presents a formulation of reduced SEA models for incomplete systems defined by a set of effective loss factors. The reduced SEA model provides a feasible number of subsystems for the application of the Power Injection Method (PIM). For structures of high complexity, access to some components, for instance internal equipment or panels, can be restricted; in such cases, using PIM to carry out an experimental SEA analysis is not possible. New methods are presented for this case in combination with the reduced SEA models, allowing the definition of some of the model loss factors that could not be obtained through PIM. The methods are validated on a numerical test case and are also applied to an actual spacecraft structure with accessibility restrictions: a solar wing in folded configuration.

  11. Colorimetric and atomic absorption spectrometric determination of mucolytic drug ambroxol through ion-pair formation with iron and thiocyanate.

    PubMed

    Levent, Abdulkadir; Sentürk, Zühre

    2010-09-01

    Colorimetric and atomic absorption spectrometric methods have been developed for the determination of the mucolytic drug Ambroxol. These procedures depend upon the reaction of the iron(III) metal ion with the drug in the presence of thiocyanate ion to form a stable ion-pair complex that is extractable into chloroform. The red-coloured complex was determined either colorimetrically at 510 nm or by indirect atomic absorption spectrometry (AAS) via the determination of the iron content of the formed complex. The optimum experimental conditions for pH, concentrations of Fe³⁺ and SCN⁻, shaking time, phase ratio, and the number of extractions were determined. Under the proposed conditions, linearity was obeyed in the concentration range 4.1×10⁻⁶-5.7×10⁻⁵ M (1.7-23.6 µg mL⁻¹) using both methods, with detection limits of 4.6×10⁻⁷ M (0.19 µg mL⁻¹) for colorimetry and 1.1×10⁻⁶ M (0.46 µg mL⁻¹) for AAS. The proposed methods were applied to the determination of Ambroxol in tablet dosage forms. The results obtained were statistically analyzed and compared with those obtained by the high-performance liquid chromatographic method with diode-array detection.
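
    Such validations rest on an ordinary linear calibration; the sketch below shows how detection and quantification limits are commonly derived from one (the data points are hypothetical, and the 3.3σ/slope and 10σ/slope factors are the usual ICH conventions, not values from this paper):

      import numpy as np

      # Hypothetical calibration points: absorbance at 510 nm vs. µg/mL
      conc = np.array([2.0, 5.0, 10.0, 15.0, 20.0])
      absb = np.array([0.082, 0.201, 0.398, 0.603, 0.799])

      slope, intercept = np.polyfit(conc, absb, 1)
      resid_sd = np.std(absb - (slope * conc + intercept), ddof=2)

      lod = 3.3 * resid_sd / slope      # ICH 3.3*sigma/slope convention
      loq = 10.0 * resid_sd / slope
      print(f"slope={slope:.4f}, LOD={lod:.2f} ug/mL, LOQ={loq:.2f} ug/mL")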

  12. OPTIMIZING THROUGH CO-EVOLUTIONARY AVALANCHES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    S. BOETTCHER; A. PERCUS

    2000-08-01

    We explore a new general-purpose heuristic for finding high-quality solutions to hard optimization problems. The method, called extremal optimization, is inspired by "self-organized criticality," a concept introduced to describe emergent complexity in many physical systems. In contrast to genetic algorithms, which operate on an entire "gene pool" of possible solutions, extremal optimization successively replaces extremely undesirable elements of a sub-optimal solution with new, random ones. Large fluctuations, called "avalanches," ensue that efficiently explore many local optima. Drawing upon models used to simulate far-from-equilibrium dynamics, extremal optimization complements approximation methods inspired by equilibrium statistical physics, such as simulated annealing. With only one adjustable parameter, its performance has proved competitive with more elaborate methods, especially near phase transitions. Such phase transitions are found in the parameter space of most optimization problems and have recently been conjectured to be the origin of some of the hardest instances in computational complexity. We demonstrate how extremal optimization can be implemented for a variety of combinatorial optimization problems. We believe that extremal optimization will be a useful tool in the investigation of phase transitions in combinatorial optimization problems, hence valuable in elucidating the origin of computational complexity.
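
    To make the procedure concrete, here is a sketch of τ-EO applied to MAX-CUT (our own toy instance, rank-sampling trick, and parameter choices; the paper discusses several combinatorial problems): each vertex gets a fitness, vertices are ranked from worst to best, and the vertex at a power-law-distributed rank is changed unconditionally.

      import numpy as np

      def eo_maxcut(adj, n_steps=3000, tau=1.4, seed=0):
          # tau-EO sketch for MAX-CUT: rank vertices by fitness (fraction of
          # incident edges already cut) and flip a power-law-ranked weak one.
          rng = np.random.default_rng(seed)
          n = adj.shape[0]
          side = rng.integers(0, 2, n)
          deg = np.maximum(adj.sum(1), 1)
          best_cut, best_side = -1, side.copy()
          for _ in range(n_steps):
              same = np.array([adj[i] @ (side == side[i]) for i in range(n)])
              fitness = 1.0 - same / deg
              rank = np.argsort(fitness)                   # worst first
              k = min(int(rng.pareto(tau - 1.0)), n - 1)   # P(k) ~ k^(-tau), approx.
              side[rank[k]] ^= 1                           # unconditional flip
              cut = int(np.sum(adj[side == 0][:, side == 1]))
              if cut > best_cut:
                  best_cut, best_side = cut, side.copy()
          return best_side, best_cut

      rng = np.random.default_rng(1)
      adj = np.triu((rng.random((40, 40)) < 0.15).astype(int), 1)
      adj = adj + adj.T
      print("best cut found:", eo_maxcut(adj)[1])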

  13. Weighted multiscale Rényi permutation entropy of nonlinear time series

    NASA Astrophysics Data System (ADS)

    Chen, Shijian; Shang, Pengjian; Wu, Yue

    2018-04-01

    In this paper, based on Rényi permutation entropy (RPE), which has recently been suggested as a relative measure of complexity in nonlinear systems, we propose multiscale Rényi permutation entropy (MRPE) and weighted multiscale Rényi permutation entropy (WMRPE) to quantify the complexity of nonlinear time series over multiple time scales. First, we apply MRPE and WMRPE to synthetic data, compare the modified methods with RPE, and discuss the influence of parameter changes. We also explain why it is necessary to consider not only multiple scales but also weighting, which takes the amplitude into account. MRPE and WMRPE are then applied to the closing prices of financial stock markets from different areas. By observing the WMRPE curves and analyzing common statistics, the stock markets are divided into four groups: (1) DJI, S&P500, and HSI; (2) NASDAQ and FTSE100; (3) DAX40 and CAC40; and (4) ShangZheng and ShenCheng. Results show that the standard deviations of the weighted methods are smaller, indicating that WMRPE yields more robust results. Moreover, WMRPE can provide abundant information on the dynamical properties of complex systems and reveal their intrinsic mechanisms.
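
    A sketch of the two ingredients, amplitude weighting of ordinal patterns and coarse-graining across scales, under our own reading of the construction (the variance-based weights and all parameter values are assumptions, not the authors' exact definitions):

      import numpy as np

      def weighted_renyi_perm_entropy(x, order=4, delay=1, q=2.0):
          # Ordinal-pattern distribution with amplitude (variance) weights,
          # summarized by the Renyi entropy of order q.
          weights = {}
          n = len(x) - (order - 1) * delay
          for i in range(n):
              v = np.asarray(x[i:i + order * delay:delay], float)
              pat = tuple(np.argsort(v))
              weights[pat] = weights.get(pat, 0.0) + np.var(v)
          p = np.array(list(weights.values()))
          p = p / p.sum()
          return np.log(np.sum(p ** q)) / (1.0 - q)

      def coarse_grain(x, scale):
          # Multiscale step: means over non-overlapping windows.
          m = len(x) // scale
          return np.asarray(x[:m * scale], float).reshape(m, scale).mean(axis=1)

      x = np.random.randn(20000)
      for s in (1, 2, 5, 10):
          print(s, round(weighted_renyi_perm_entropy(coarse_grain(x, s)), 3))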

  14. A weighted U-statistic for genetic association analyses of sequencing data.

    PubMed

    Wei, Changshuai; Li, Ming; He, Zihuai; Vsevolozhskaya, Olga; Schaid, Daniel J; Lu, Qing

    2014-12-01

    With advancements in next-generation sequencing technology, a massive amount of sequencing data is generated, which offers a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, the high-dimensional sequencing data poses a great challenge for statistical analysis. The association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU-SEQ, for the high-dimensional association analysis of sequencing data. Based on a nonparametric U-statistic, WU-SEQ makes no assumption of the underlying disease model and phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU-SEQ outperformed a commonly used sequence kernel association test (SKAT) method when the underlying assumptions were violated (e.g., the phenotype followed a heavy-tailed distribution). Even when the assumptions were satisfied, WU-SEQ still attained comparable performance to SKAT. Finally, we applied WU-SEQ to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL4 and very low density lipoprotein cholesterol. © 2014 WILEY PERIODICALS, INC.

  15. Power-law statistics of neurophysiological processes analyzed using short signals

    NASA Astrophysics Data System (ADS)

    Pavlova, Olga N.; Runnova, Anastasiya E.; Pavlov, Alexey N.

    2018-04-01

    We discuss the problem of quantifying power-law statistics of complex processes from short signals. Based on the analysis of electroencephalograms (EEG), we compare three interrelated approaches to characterizing the power spectral density (PSD) and show that applying detrended fluctuation analysis (DFA) or the wavelet-transform modulus maxima (WTMM) method represents a useful way of indirectly characterizing PSD features from short data sets. We conclude that although DFA- and WTMM-based measures can be obtained from the estimated PSD, these tools outperform standard spectral analysis when the analyzed regime must be characterized from a very limited amount of data.
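
    For orientation, a compact DFA implementation is shown below; for stationary signals the DFA exponent α relates to the PSD slope β via β = 2α − 1, which is why DFA serves as an indirect characterization of the PSD (the scales and test data are illustrative):

      import numpy as np

      def dfa(x, scales):
          # DFA-1: integrate the series, detrend it linearly in windows of
          # each scale, and return the RMS fluctuation F(s).
          y = np.cumsum(x - np.mean(x))
          F = []
          for s in scales:
              n_win = len(y) // s
              segs = y[:n_win * s].reshape(n_win, s)
              t = np.arange(s)
              res = [seg - np.polyval(np.polyfit(t, seg, 1), t) for seg in segs]
              F.append(np.sqrt(np.mean(np.square(res))))
          return np.asarray(F)

      x = np.random.randn(8192)                      # white noise: alpha ~ 0.5
      scales = np.unique(np.logspace(1, 3, 12).astype(int))
      alpha = np.polyfit(np.log(scales), np.log(dfa(x, scales)), 1)[0]
      print("DFA exponent alpha:", round(alpha, 2))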

  16. Testing alternative ground water models using cross-validation and other methods

    USGS Publications Warehouse

    Foglia, L.; Mehl, S.W.; Hill, M.C.; Perona, P.; Burlando, P.

    2007-01-01

    Many methods can be used to test alternative ground water models. Of concern in this work are methods able to (1) rank alternative models (also called model discrimination) and (2) identify observations important to parameter estimates and predictions (equivalent to the purpose served by some types of sensitivity analysis). Some of the measures investigated are computationally efficient; others are computationally demanding. The latter are generally needed to account for model nonlinearity. The efficient model discrimination methods investigated include the information criteria: the corrected Akaike information criterion, Bayesian information criterion, and generalized cross-validation. The efficient sensitivity analysis measures used are dimensionless scaled sensitivity (DSS), composite scaled sensitivity, and parameter correlation coefficient (PCC); the other statistics are DFBETAS, Cook's D, and observation-prediction statistic. Acronyms are explained in the introduction. Cross-validation (CV) is a computationally intensive nonlinear method that is used for both model discrimination and sensitivity analysis. The methods are tested using up to five alternative parsimoniously constructed models of the ground water system of the Maggia Valley in southern Switzerland. The alternative models differ in their representation of hydraulic conductivity. A new method for graphically representing CV and sensitivity analysis results for complex models is presented and used to evaluate the utility of the efficient statistics. The results indicate that for model selection, the information criteria produce similar results at much smaller computational cost than CV. For identifying important observations, the only obviously inferior linear measure is DSS; the poor performance was expected because DSS does not include the effects of parameter correlation and PCC reveals large parameter correlations. © 2007 National Ground Water Association.
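
    For reference, the efficient information criteria can be computed directly from each calibrated model's fit; a sketch under a Gaussian likelihood (the SSE values and parameter counts below are hypothetical, not from the Maggia Valley models):

      import numpy as np

      def aicc_bic(sse, n, k):
          # Gaussian-likelihood AICc and BIC from a model's sum of squared
          # errors, with n observations and k estimated parameters.
          log_lik = -0.5 * n * (np.log(2.0 * np.pi * sse / n) + 1.0)
          aic = 2.0 * k - 2.0 * log_lik
          aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)
          bic = k * np.log(n) - 2.0 * log_lik
          return aicc, bic

      # Hypothetical fits of two alternative hydraulic-conductivity models
      for name, sse, k in [("uniform K", 14.2, 4), ("zoned K", 10.1, 7)]:
          aicc, bic = aicc_bic(sse, n=120, k=k)
          print(f"{name}: AICc={aicc:.1f}, BIC={bic:.1f}")   # lower is better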

  17. A bootstrapping method for development of Treebank

    NASA Astrophysics Data System (ADS)

    Zarei, F.; Basirat, A.; Faili, H.; Mirain, M.

    2017-01-01

    Using statistical approaches alongside the traditional methods of natural language processing (NLP) can significantly improve both the quality and the performance of several NLP tasks. The effective use of these approaches depends on the availability of informative, accurate, and detailed corpora on which the learners are trained. This article introduces a bootstrapping method for developing annotated corpora based on a complex and linguistically rich elementary structure called the supertag. To this end, a hybrid method for supertagging is proposed that combines the generative and discriminative approaches to supertagging. The method was applied to a subset of the Wall Street Journal (WSJ) corpus in order to annotate its sentences with a set of linguistically motivated elementary structures of the English XTAG grammar, which uses a lexicalized tree-adjoining grammar formalism. The empirical results confirm that the bootstrapping method provides a satisfactory way of annotating English sentences with these structures. The experiments show that the method can automatically annotate about 20% of the WSJ with an F-measure of about 80%, which is 12% higher than the F-measure of the XTAG Treebank automatically generated by the approach of Basirat and Faili [(2013). Bridge the gap between statistical and hand-crafted grammars. Computer Speech and Language, 27, 1085-1104].

  18. Discrimination of complex mixtures by a colorimetric sensor array: coffee aromas.

    PubMed

    Suslick, Benjamin A; Feng, Liang; Suslick, Kenneth S

    2010-03-01

    The analysis of complex mixtures presents a difficult challenge even for modern analytical techniques, and the ability to discriminate among closely similar such mixtures often remains problematic. Coffee provides a readily available archetype of such highly multicomponent systems. The use of a low-cost, sensitive colorimetric sensor array for the detection and identification of coffee aromas is reported. The color changes of the sensor array were used as a digital representation of the array response and analyzed with standard statistical methods, including principal component analysis (PCA) and hierarchical clustering analysis (HCA). PCA revealed that the sensor array has exceptionally high dimensionality with 18 dimensions required to define 90% of the total variance. In quintuplicate runs of 10 commercial coffees and controls, no confusions or errors in classification by HCA were observed in 55 trials. In addition, the effects of temperature and time in the roasting of green coffee beans were readily observed and distinguishable with a resolution better than 10 degrees C and 5 min, respectively. Colorimetric sensor arrays demonstrate excellent potential for complex systems analysis in real-world applications and provide a novel method for discrimination among closely similar complex mixtures.
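
    The analysis pipeline described, PCA for dimensionality and hierarchical clustering for classification, is straightforward to reproduce; a sketch on random stand-in data (the array sizes are assumptions loosely modeled on the described experiment, not the published dataset):

      import numpy as np
      from sklearn.decomposition import PCA
      from scipy.cluster.hierarchy import linkage, fcluster

      # Stand-in sensor-array data: 55 trials x (36 spots x 3 RGB changes)
      rng = np.random.default_rng(0)
      X = rng.random((55, 108))

      evr = PCA().fit(X).explained_variance_ratio_
      print("dims for 90% variance:", np.searchsorted(np.cumsum(evr), 0.90) + 1)

      Z = linkage(X, method="ward")                     # HCA on raw responses
      labels = fcluster(Z, t=11, criterion="maxclust")  # 10 coffees + control
      print(labels)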

  19. Efficient computation of the joint sample frequency spectra for multiple populations.

    PubMed

    Kamm, John A; Terhorst, Jonathan; Song, Yun S

    2017-01-01

    A wide range of studies in population genetics have employed the sample frequency spectrum (SFS), a summary statistic which describes the distribution of mutant alleles at a polymorphic site in a sample of DNA sequences and provides a highly efficient dimensional reduction of large-scale population genomic variation data. Recently, there has been much interest in analyzing the joint SFS data from multiple populations to infer parameters of complex demographic histories, including variable population sizes, population split times, migration rates, admixture proportions, and so on. SFS-based inference methods require accurate computation of the expected SFS under a given demographic model. Although much methodological progress has been made, existing methods suffer from numerical instability and high computational complexity when multiple populations are involved and the sample size is large. In this paper, we present new analytic formulas and algorithms that enable accurate, efficient computation of the expected joint SFS for thousands of individuals sampled from hundreds of populations related by a complex demographic model with arbitrary population size histories (including piecewise-exponential growth). Our results are implemented in a new software package called momi (MOran Models for Inference). Through an empirical study we demonstrate our improvements to numerical stability and computational complexity.

  2. Hierarchical modeling and robust synthesis for the preliminary design of large scale complex systems

    NASA Astrophysics Data System (ADS)

    Koch, Patrick Nathan

    Large-scale complex systems are characterized by multiple interacting subsystems and analyses spanning multiple disciplines. The design and development of such systems inevitably requires the resolution of multiple conflicting objectives. The size of complex systems, however, prohibits the development of comprehensive system models, and thus these systems must be partitioned into their constituent parts. Because simultaneous solution of individual subsystem models is often not manageable, iteration is inevitable and often excessive. In this dissertation these issues are addressed through the development of a method for hierarchical robust preliminary design exploration that facilitates concurrent system and subsystem design exploration and the generation of robust system and subsystem specifications for the preliminary design of multi-level, multi-objective, large-scale complex systems. This method is developed through the integration and expansion of current design techniques: (1) Hierarchical partitioning and modeling techniques for partitioning large-scale complex systems into more tractable parts, and allowing integration of subproblems for system synthesis, (2) Statistical experimentation and approximation techniques for increasing both the efficiency and the comprehensiveness of preliminary design exploration, and (3) Noise modeling techniques for implementing robust preliminary design when approximate models are employed. The method developed and associated approaches are illustrated through their application to the preliminary design of a commercial turbofan turbine propulsion system; the turbofan system-level problem is partitioned into engine cycle and configuration design, and a compressor module is integrated for more detailed subsystem-level design exploration, improving system evaluation.

  3. Data series embedding and scale invariant statistics.

    PubMed

    Michieli, I; Medved, B; Ristov, S

    2010-06-01

    Data sequences acquired from bio-systems, such as human gait data, heart rate interbeat data, or DNA sequences, exhibit complex dynamics that are frequently described by long-memory or power-law decay of the autocorrelation function. One way of characterizing such dynamics is through scale invariant statistics or "fractal-like" behavior. Several methods have been proposed for quantifying scale invariant parameters of physiological signals; among them, the most common are detrended fluctuation analysis, sample mean variance analyses, power spectral density analysis, R/S analysis, and, more recently in the realm of the multifractal approach, wavelet analysis. In this paper it is demonstrated that embedding the time series data in a high-dimensional pseudo-phase space reveals scale invariant statistics in a simple fashion. The procedure is applied to different stride-interval data sets from human gait measurement time series (PhysioBank data library). Results show that the introduced mapping adequately separates long-memory from random behavior. Smaller gait data sets were analyzed, and scale-free trends over limited scale intervals were successfully detected. The method was verified on artificially produced time series with known scaling behavior and with varying noise content. The possibility that the method falsely detects long-range dependence in artificially generated short-range-dependent series was also investigated. © 2009 Elsevier B.V. All rights reserved.
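
    A minimal sketch of the embedding step, assuming standard delay coordinates; the series below is a synthetic long-memory stand-in (a random walk), not the PhysioBank gait data.

      import numpy as np

      def delay_embed(x, dim, tau):
          """Map a scalar series into a dim-dimensional pseudo-phase space using
          delay coordinates (x[t], x[t + tau], ..., x[t + (dim - 1) * tau])."""
          n = len(x) - (dim - 1) * tau
          return np.column_stack([x[i * tau: i * tau + n] for i in range(dim)])

      # Synthetic long-memory stand-in for a stride-interval series.
      x = np.cumsum(np.random.default_rng(2).normal(size=4096))
      points = delay_embed(x, dim=5, tau=3)
      print(points.shape)  # (4084, 5) points in the embedding space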

  4. Smoking Gun or Circumstantial Evidence? Comparison of Statistical Learning Methods using Functional Annotations for Prioritizing Risk Variants.

    PubMed

    Gagliano, Sarah A; Ravji, Reena; Barnes, Michael R; Weale, Michael E; Knight, Jo

    2015-08-24

    Although technology has triumphed in facilitating routine genome sequencing, new challenges have been created for the data analyst. Genome-scale surveys of human variation generate volumes of data that far exceed capabilities for laboratory characterization. By incorporating functional annotations as predictors, statistical learning has been widely investigated for prioritizing genetic variants likely to be associated with complex disease. We compared three published prioritization procedures, which use different statistical learning algorithms and different predictors with regard to their quantity, type, and coding. We also explored different combinations of algorithm and annotation set. As an application, we tested which methodology performed best for prioritizing variants using data from a large schizophrenia meta-analysis by the Psychiatric Genomics Consortium. Results suggest that all methods have considerable (and similar) predictive accuracy (AUCs 0.64-0.71) in test set data, but there is more variability in the application to the schizophrenia GWAS. In conclusion, a variety of algorithms and annotations seem to have similar potential to effectively enrich true risk variants in genome-scale datasets; however, none offers more than an incremental improvement in prediction. We discuss how methods might evolve for risk variant prediction to address the impending bottleneck of the new generation of genome re-sequencing studies.
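
    The comparison of algorithm/annotation combinations can be sketched with generic classifiers and synthetic annotation features, as below; these are stand-ins, not the three published procedures.

      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import train_test_split
      from sklearn.metrics import roc_auc_score

      # Hypothetical data: rows = variants, columns = functional annotations,
      # y = 1 for known risk variants (a stand-in for the published training sets).
      rng = np.random.default_rng(3)
      X = rng.normal(size=(2000, 12))
      y = (X[:, :3].sum(axis=1) + rng.normal(size=2000) > 0).astype(int)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

      # Compare two learning algorithms on the same annotation set via test-set AUC.
      for model in (LogisticRegression(max_iter=1000),
                    RandomForestClassifier(random_state=0)):
          auc = roc_auc_score(y_te, model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1])
          print(type(model).__name__, round(auc, 3))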

  5. Statistical monitoring of the hand, foot and mouth disease in China.

    PubMed

    Zhang, Jingnan; Kang, Yicheng; Yang, Yang; Qiu, Peihua

    2015-09-01

    In a period starting around 2007, Hand, Foot, and Mouth Disease (HFMD) became widespread in China, seriously threatening Chinese public health. To prevent outbreaks of infectious diseases like HFMD, effective disease surveillance systems are especially helpful for signaling disease outbreaks as early as possible. Statistical process control (SPC) charts provide a major statistical tool in industrial quality control for detecting defective products in a timely manner, and in recent years SPC charts have been used for disease surveillance. However, disease surveillance data often have much more complicated structures than data collected from industrial production lines. Major challenges, including the lack of in-control data, complex seasonal effects, and spatio-temporal correlations, make the surveillance data difficult to handle. In this article, we propose a three-step procedure for analyzing disease surveillance data, and our procedure is demonstrated using the HFMD data collected during 2008-2009 in China. Our method uses nonparametric longitudinal data and time series analysis methods to eliminate the possible impact of seasonality and temporal correlation before the disease incidence data are sequentially monitored by an SPC chart. At both the national and provincial levels, our proposed method can effectively detect an increasing trend in the disease incidence rate before the disease becomes widespread. © 2015, The International Biometric Society.
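
    A minimal sketch of the three-step idea, assuming for brevity a known seasonal component (the paper estimates it nonparametrically) and a one-sided CUSUM chart as the SPC step; all numbers are invented.

      import numpy as np

      # Hypothetical weekly incidence with a seasonal cycle plus an injected outbreak.
      rng = np.random.default_rng(4)
      t = np.arange(156)
      series = 50 + 20 * np.sin(2 * np.pi * t / 52) + rng.normal(0, 5, 156)
      series[130:] += np.linspace(0, 30, 26)           # emerging upward trend

      # Steps 1-2 (simplified): remove the seasonal component, then standardize.
      seasonal = 50 + 20 * np.sin(2 * np.pi * t / 52)  # stand-in for a fitted curve
      resid = (series - seasonal) / 5.0

      # Step 3: one-sided CUSUM chart on the deseasonalized residuals.
      k, h, c = 0.5, 5.0, 0.0                          # reference value, control limit
      for week, r in enumerate(resid):
          c = max(0.0, c + r - k)
          if c > h:
              print("signal at week", week)
              break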

  6. DNA viewed as an out-of-equilibrium structure

    NASA Astrophysics Data System (ADS)

    Provata, A.; Nicolis, C.; Nicolis, G.

    2014-05-01

    The complexity of the primary structure of human DNA is explored using methods from nonequilibrium statistical mechanics, dynamical systems theory, and information theory. A collection of statistical analyses is performed on the DNA data and the results are compared with sequences derived from different stochastic processes. The use of χ² tests shows that DNA cannot be described as a low-order Markov chain of order up to r = 6. Although detailed balance seems to hold at the level of a binary alphabet, it fails when all four base pairs are considered, suggesting spatial asymmetry and irreversibility. Furthermore, the block entropy does not increase linearly with the block size, reflecting the long-range nature of the correlations in the human genomic sequences. To probe locally the spatial structure of the chain, we study the exit distances from a specific symbol, the distribution of recurrence distances, and the Hurst exponent, all of which show power law tails and long-range characteristics. These results suggest that human DNA can be viewed as a nonequilibrium structure maintained in its state through interactions with a constantly changing environment. Based solely on the exit distance distribution accounting for the nonequilibrium statistics and using the Monte Carlo rejection sampling method, we construct a model DNA sequence. This method allows us to keep both long- and short-range statistical characteristics of the native DNA data. The model sequence presents the same characteristic exponents as the natural DNA but fails to capture spatial correlations and point-to-point details.
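
    A hedged sketch of the χ² idea at the lowest order: testing an order-0 (memoryless) description against first-order dependence on a random surrogate sequence, where the test should not reject; the paper's analysis extends up to order r = 6.

      import numpy as np
      from scipy.stats import chi2_contingency

      # Random surrogate sequence standing in for a genomic segment.
      rng = np.random.default_rng(5)
      seq = rng.choice(list("ACGT"), size=10000)

      # Contingency table of consecutive symbol pairs; independence of rows and
      # columns corresponds to a memoryless (order-0) model.
      idx = {b: i for i, b in enumerate("ACGT")}
      table = np.zeros((4, 4))
      for a, b in zip(seq[:-1], seq[1:]):
          table[idx[a], idx[b]] += 1

      chi2, p, dof, _ = chi2_contingency(table)
      print(f"chi2={chi2:.1f}, p={p:.3g}")  # small p rejects the memoryless model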

  7. Inference with viral quasispecies diversity indices: clonal and NGS approaches.

    PubMed

    Gregori, Josep; Salicrú, Miquel; Domingo, Esteban; Sanchez, Alex; Esteban, Juan I; Rodríguez-Frías, Francisco; Quer, Josep

    2014-04-15

    Given the inherent dynamics of a viral quasispecies, we are often interested in the comparison of diversity indices of sequential samples of a patient, or in the comparison of diversity indices of virus in groups of patients in a treated versus control design. It is then important to make sure that the diversity measures from each sample may be compared with no bias and within a consistent statistical framework. In the present report, we review some indices often used as measures for viral quasispecies complexity and provide means for statistical inference, applying procedures taken from the ecology field. In particular, we examine the Shannon entropy and the mutation frequency, and we discuss the appropriateness of different normalization methods of the Shannon entropy found in the literature. By taking amplicon ultra-deep pyrosequencing (UDPS) raw data as a surrogate for a real hepatitis C virus population, we study through in-silico sampling the statistical properties of these indices under two methods of viral quasispecies sampling, classical cloning followed by Sanger sequencing (CCSS) and next-generation sequencing (NGS) such as UDPS. We propose solutions specific to each of the two sampling methods-CCSS and NGS-to guarantee statistically conforming conclusions as free of bias as possible. Supplementary data are available at Bioinformatics online.
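
    A minimal sketch of the entropy calculation with two normalizations that appear in the literature (by number of haplotypes and by number of reads); the read counts are hypothetical, and the choice shown is illustrative rather than the paper's recommendation.

      import numpy as np

      def shannon_entropy(counts, normalize="haplotypes"):
          """Shannon entropy of haplotype frequencies with two common
          normalizations (the choice here is illustrative, not the paper's verdict)."""
          p = np.asarray(counts, dtype=float)
          p = p[p > 0] / p.sum()
          h = -(p * np.log(p)).sum()
          if normalize == "haplotypes":        # divide by log(number of haplotypes)
              return h / np.log(len(p)) if len(p) > 1 else 0.0
          if normalize == "reads":             # divide by log(number of reads)
              return h / np.log(sum(counts))
          return h

      reads = [500, 300, 120, 50, 30]          # hypothetical haplotype read counts
      print(shannon_entropy(reads), shannon_entropy(reads, "reads"))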

  8. A powerful score-based test statistic for detecting gene-gene co-association.

    PubMed

    Xu, Jing; Yuan, Zhongshang; Ji, Jiadong; Zhang, Xiaoshuai; Li, Hongkai; Wu, Xuesen; Xue, Fuzhong; Liu, Yanxun

    2016-01-29

    The genetic variants identified by genome-wide association studies (GWAS) can account for only a small proportion of the total heritability of complex disease. The existence of gene-gene joint effects, comprising the main effects and their co-association, is one possible explanation for the "missing heritability" problem. Gene-gene co-association refers to the extent to which the joint effects of two genes differ from the main effects, owing not only to traditional interaction under nearly independent conditions but also to the correlation between genes. Generally, genes tend to work collaboratively within a specific pathway or network contributing to the disease, and specific disease-associated loci will often be highly correlated (e.g., single nucleotide polymorphisms (SNPs) in linkage disequilibrium). Therefore, we propose a novel score-based statistic (SBS) as a gene-based method for detecting gene-gene co-association. Various simulations illustrate that, under different sample sizes, marginal effects of causal SNPs, and co-association levels, the proposed SBS performs better than existing methods, including single-SNP-based and principal component analysis (PCA)-based logistic regression models, the statistics based on canonical correlations (CCU), kernel canonical correlation analysis (KCCU), partial least squares path modeling (PLSPM), and the delta-square (δ²) statistic. A real data analysis of rheumatoid arthritis (RA) further confirmed its advantages in practice. SBS is a powerful and efficient gene-based method for detecting gene-gene co-association.

  10. Stochastic Partial Differential Equation Solver for Hydroacoustic Modeling: Improvements to Paracousti Sound Propagation Solver

    NASA Astrophysics Data System (ADS)

    Preston, L. A.

    2017-12-01

    Marine hydrokinetic (MHK) devices offer a clean, renewable alternative energy source for the future. Responsible utilization of MHK devices, however, requires that the effects of acoustic noise produced by these devices on marine life and marine-related human activities be well understood. Paracousti is a 3-D full waveform acoustic modeling suite that can accurately propagate MHK noise signals in the complex bathymetry found in the near-shore to open ocean environment and considers real properties of the seabed, water column, and air-surface interface. However, this is a deterministic simulation that assumes the environment and source are exactly known. In reality, environmental and source characteristics are often only known in a statistical sense. Thus, to fully characterize the expected noise levels within the marine environment, this uncertainty in environmental and source factors should be incorporated into the acoustic simulations. One method is to use Monte Carlo (MC) techniques where simulation results from a large number of deterministic solutions are aggregated to provide statistical properties of the output signal. However, MC methods can be computationally prohibitive since they can require tens of thousands or more simulations to build up an accurate representation of those statistical properties. An alternative method, using the technique of stochastic partial differential equations (SPDE), allows computation of the statistical properties of output signals at a small fraction of the computational cost of MC. We are developing a SPDE solver for the 3-D acoustic wave propagation problem called Paracousti-UQ to help regulators and operators assess the statistical properties of environmental noise produced by MHK devices. In this presentation, we present the SPDE method and compare statistical distributions of simulated acoustic signals in simple models to MC simulations to show the accuracy and efficiency of the SPDE method. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia LLC, a wholly owned subsidiary of Honeywell International Inc. for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA0003525.
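
    The Monte Carlo baseline that the SPDE approach seeks to replace can be sketched with a toy spreading-loss formula standing in for a full Paracousti solve; the formula and parameter distributions are invented, and the point is only the aggregation of many deterministic runs into output statistics.

      import numpy as np

      # Toy stand-in for a deterministic acoustic solve: received level equals
      # source level minus spherical spreading and attenuation over range r.
      def received_level(source_db, r_km, alpha_db_per_km):
          return source_db - 20 * np.log10(r_km * 1000.0) - alpha_db_per_km * r_km

      rng = np.random.default_rng(6)
      n = 10000                                   # MC typically needs many runs
      levels = received_level(170.0,
                              rng.normal(5.0, 0.5, n),     # uncertain range
                              rng.normal(0.05, 0.01, n))   # uncertain attenuation
      print(levels.mean(), levels.std(), np.percentile(levels, 95))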

  11. Exploratory study on a statistical method to analyse time resolved data obtained during nanomaterial exposure measurements

    NASA Astrophysics Data System (ADS)

    Clerc, F.; Njiki-Menga, G.-H.; Witschger, O.

    2013-04-01

    Most of the measurement strategies suggested at the international level to assess workplace exposure to nanomaterials rely on devices measuring airborne particle concentrations in real time (according to different metrics). Since none of the instruments used to measure aerosols can distinguish a particle of interest from the background aerosol, the statistical analysis of time resolved data requires special attention. So far, very few approaches have been used for statistical analysis in the literature, ranging from simple qualitative analysis of graphs to the implementation of more complex statistical models. To date, there is still no consensus on a particular approach, and the search for an appropriate and robust method is ongoing. In this context, this exploratory study investigates a statistical method to analyse time resolved data based on a Bayesian probabilistic approach. To investigate and illustrate the use of this statistical method, particle number concentration data were used from a workplace study that investigated the potential for inhalation exposure during cleanout operations by sandpapering of a reactor producing nanocomposite thin films. In this workplace study, the background issue was addressed through near-field and far-field approaches, and several size integrated and time resolved devices were used. The analysis of the results presented here focuses only on data obtained with two handheld condensation particle counters: one measuring at the source of the released particles, the other measuring in parallel in the far field. The Bayesian probabilistic approach allows probabilistic modelling of the data series, with the observed task modelled in the form of probability distributions. The probability distributions from the time resolved data obtained at the source can be compared with those from the time resolved data obtained far-field, leading to a quantitative estimate of the airborne particles released at the source while the task is performed. Beyond the results obtained, this exploratory study indicates that the analysis of such data requires specific experience in statistics.
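
    A hedged sketch in the spirit of the Bayesian approach described, assuming Poisson counts and conjugate gamma posteriors for the near-field and far-field rates; the prior, count data, and the difference-of-rates summary are illustrative choices, not the study's exact model.

      import numpy as np

      # Hypothetical particle counts (per second) near the source and far-field.
      rng = np.random.default_rng(7)
      near = rng.poisson(120, 300)                 # at the source during the task
      far = rng.poisson(80, 300)                   # background, measured in parallel

      # Gamma posteriors for Poisson rates under a vague Gamma(0.5, ~0) prior;
      # the released contribution is summarized as the difference of the rates.
      post_near = rng.gamma(near.sum() + 0.5, 1.0 / len(near), 20000)
      post_far = rng.gamma(far.sum() + 0.5, 1.0 / len(far), 20000)
      released = post_near - post_far

      print(np.percentile(released, [2.5, 50, 97.5]))  # credible interval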

  12. Methodological problems in the method used by IQWiG within early benefit assessment of new pharmaceuticals in Germany.

    PubMed

    Herpers, Matthias; Dintsios, Charalabos-Markos

    2018-04-25

    The decision matrix applied by the Institute for Quality and Efficiency in Health Care (IQWiG) for quantifying the added benefit of new pharmaceuticals within the German early benefit assessment, with its nine fields, is quite complex and could be simplified. Furthermore, the method used by IQWiG is subject to manifold criticism: (1) it implicitly weights endpoints differently in its assessments, favoring overall survival and thereby drug interventions in fatal diseases; (2) it assumes that two pivotal trials are available when assessing the dossiers submitted by pharmaceutical manufacturers, with far-reaching implications for the quantification of added benefit; and (3) it bases the evaluation primarily on dichotomous endpoints, leading to a loss of usable evidence. Our objectives were to investigate whether this criticism is justified and to propose methodological adaptations. We analyzed the dossiers available up to the end of 2016 using statistical tests, multinomial logistic regression, and simulations. It was shown that, owing to power losses, the method does not ensure statistically valid results, and outcomes of the early benefit assessment may be compromised, although the evidence on favoring overall survival remains unclear. Modifications of the IQWiG method are, however, possible to address the identified problems. By converging with the approach of the approval authorities for confirmatory endpoints, the decision matrix could be simplified and the analysis method improved, putting the results on a more valid statistical basis.

  13. Statistics of baryon correlation functions in lattice QCD

    NASA Astrophysics Data System (ADS)

    Wagman, Michael L.; Savage, Martin J.; Nplqcd Collaboration

    2017-12-01

    A systematic analysis of the structure of single-baryon correlation functions calculated with lattice QCD is performed, with a particular focus on characterizing the structure of the noise associated with quantum fluctuations. The signal-to-noise problem in these correlation functions is shown, as long suspected, to result from a sign problem. The log-magnitude and complex phase are found to be approximately described by normal and wrapped normal distributions respectively. Properties of circular statistics are used to understand the emergence of a large time noise region where standard energy measurements are unreliable. Power-law tails in the distribution of baryon correlation functions, associated with stable distributions and "Lévy flights," are found to play a central role in their time evolution. A new method of analyzing correlation functions is considered for which the signal-to-noise ratio of energy measurements is constant, rather than exponentially degrading, with increasing source-sink separation time. This new method includes an additional systematic uncertainty that can be removed by performing an extrapolation, and the signal-to-noise problem reemerges in the statistics of this extrapolation. It is demonstrated that this new method allows accurate results for the nucleon mass to be extracted from the large-time noise region inaccessible to standard methods. The observations presented here are expected to apply to quantum Monte Carlo calculations more generally. Similar methods to those introduced here may lead to practical improvements in analysis of noisier systems.

  14. Automated detection of hospital outbreaks: A systematic review of methods.

    PubMed

    Leclère, Brice; Buckeridge, David L; Boëlle, Pierre-Yves; Astagneau, Pascal; Lepelletier, Didier

    2017-01-01

    Several automated algorithms for epidemiological surveillance in hospitals have been proposed. However, the usefulness of these methods for detecting nosocomial outbreaks remains unclear. The goal of this review was to describe outbreak detection algorithms that have been tested within hospitals, consider how they were evaluated, and synthesize their results. We developed a search query using keywords associated with hospital outbreak detection and searched the MEDLINE database. To ensure the highest sensitivity, no limitations were initially imposed on publication languages and dates, although we subsequently excluded studies published before 2000. Every study that described a method to detect outbreaks within hospitals was included, without any exclusion based on study design. Additional studies were identified through citations in retrieved studies. Twenty-nine studies were included. The detection algorithms were grouped into 5 categories: simple thresholds (n = 6), statistical process control (n = 12), scan statistics (n = 6), traditional statistical models (n = 6), and data mining methods (n = 4). The evaluation of the algorithms was often solely descriptive (n = 15), but more complex epidemiological criteria were also investigated (n = 10). The performance measures varied widely between studies: e.g., the sensitivity of an algorithm in a real-world setting could vary between 17% and 100%. Although outbreak detection algorithms are useful complementary tools for traditional surveillance, the heterogeneity in results among published studies does not support quantitative synthesis of their performance. A standardized framework should be followed when evaluating outbreak detection methods to allow comparison of algorithms across studies and synthesis of results.

  15. Assimilating Flow Data into Complex Multiple-Point Statistical Facies Models Using Pilot Points Method

    NASA Astrophysics Data System (ADS)

    Ma, W.; Jafarpour, B.

    2017-12-01

    We develop a new pilot points method for conditioning discrete multiple-point statistical (MPS) facies simulation on dynamic flow data. While conditioning MPS simulation on static hard data is straightforward, calibration against nonlinear flow data is nontrivial. The proposed method generates conditional models from a conceptual model of geologic connectivity, known as a training image (TI), by strategically placing and estimating pilot points. To place pilot points, a score map is generated based on three sources of information: (i) the uncertainty in the facies distribution, (ii) the model response sensitivity information, and (iii) the observed flow data. Once the pilot points are placed, the facies values at these points are inferred from production data and are used, along with available hard data at well locations, to simulate a new set of conditional facies realizations. While facies estimation at the pilot points can be performed using different inversion algorithms, in this study the ensemble smoother (ES) and its multiple data assimilation variant (ES-MDA) are adopted to update permeability maps from production data, which are then used to statistically infer facies types at the pilot point locations. The developed method combines the information in the flow data and the TI by using the former to infer facies values at select locations away from the wells and the latter to ensure consistent facies structure and connectivity away from measurement locations. Several numerical experiments are used to evaluate the performance of the developed method and to discuss its important properties.
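
    A minimal sketch of the pilot-point placement step, assuming the three information sources have already been mapped onto normalized grids; the weights and grid values here are invented, not calibrated values from the paper.

      import numpy as np

      # Hypothetical normalized maps (values in [0, 1]) on a 2-D grid:
      rng = np.random.default_rng(8)
      uncertainty = rng.random((50, 50))     # facies probability closest to 0.5
      sensitivity = rng.random((50, 50))     # model response sensitivity map
      data_mismatch = rng.random((50, 50))   # mapped misfit to observed flow data

      # Score map as a weighted combination of the three information sources;
      # the weights are illustrative only.
      score = 0.4 * uncertainty + 0.3 * sensitivity + 0.3 * data_mismatch

      # Place the k highest-scoring cells as pilot points.
      k = 10
      flat = np.argsort(score, axis=None)[-k:]
      pilot_points = np.column_stack(np.unravel_index(flat, score.shape))
      print(pilot_points)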

  16. Diffusion tensor imaging in children with tuberous sclerosis complex: tract-based spatial statistics assessment of brain microstructural changes.

    PubMed

    Zikou, Anastasia K; Xydis, Vasileios G; Astrakas, Loukas G; Nakou, Iliada; Tzarouchi, Loukia C; Tzoufi, Meropi; Argyropoulou, Maria I

    2016-07-01

    There is evidence of microstructural changes in the normal-appearing white matter of patients with tuberous sclerosis complex. The aim of this study was to evaluate major white matter tracts in children with tuberous sclerosis complex using tract-based spatial statistics diffusion tensor imaging (DTI) analysis. Eight children (mean age ± standard deviation: 8.5 ± 5.5 years) with an established diagnosis of tuberous sclerosis complex and eight age-matched controls were studied. The imaging protocol consisted of a T1-weighted high-resolution 3-D spoiled gradient-echo sequence and a spin-echo, echo-planar diffusion-weighted sequence. Differences in the diffusion indices were evaluated using tract-based spatial statistics. Tract-based spatial statistics showed increased axial diffusivity in the children with tuberous sclerosis complex in the superior and anterior corona radiata, the superior longitudinal fascicle, the inferior fronto-occipital fascicle, the uncinate fascicle and the anterior thalamic radiation. No significant differences were observed in fractional anisotropy, mean diffusivity or radial diffusivity between patients and control subjects. No difference was found in the diffusion indices between the baseline and follow-up examinations in the patient group. Patients with tuberous sclerosis complex have increased axial diffusivity in major white matter tracts, probably related to reduced axonal integrity.

  17. A simple iterative independent component analysis algorithm for vibration source signal identification of complex structures

    NASA Astrophysics Data System (ADS)

    Lee, Dong-Sup; Cho, Dae-Seung; Kim, Kookhyun; Jeon, Jae-Jin; Jung, Woo-Jin; Kang, Myeng-Hwan; Kim, Jae-Ho

    2015-01-01

    Independent component analysis (ICA), one of the blind source separation methods, can be applied to extract unknown source signals from received signals alone. This is accomplished by finding statistical independence among signal mixtures, and it has been successfully applied to myriad fields such as medical science and image processing. Nevertheless, inherent problems have been reported when using this technique: instability and invalid ordering of the separated signals, particularly when a conventional ICA technique is used for vibration source signal identification of complex structures. In this study, a simple iterative extension of conventional ICA is proposed to mitigate these problems. To extract more stable source signals in a valid order, the proposed method iteratively reorders the extracted mixing matrix and reconstructs the finally converged source signals, referring to the magnitudes of the correlation coefficients between the intermediately separated signals and signals measured on or near the sources. To review the problems of the conventional ICA technique and to validate the proposed method, numerical analyses were carried out for a virtual response model and a 30 m class submarine model. Moreover, to investigate the applicability of the proposed method to a real problem involving a complex structure, an experiment was carried out on a scaled submarine mockup. The results show that the proposed method resolves the inherent problems of conventional ICA.
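
    A hedged sketch of the reordering idea using FastICA from scikit-learn, with reference signals measured on or near the sources used to fix the order and sign of the separated components; the paper's full iterative convergence scheme is not reproduced.

      import numpy as np
      from sklearn.decomposition import FastICA

      # Hypothetical two-source vibration problem: mix a sine and a square wave.
      rng = np.random.default_rng(9)
      t = np.linspace(0, 1, 2000)
      sources = np.column_stack([np.sin(2 * np.pi * 50 * t),
                                 np.sign(np.sin(2 * np.pi * 37 * t))])
      mixed = sources @ rng.normal(size=(2, 2)).T

      # Separate, then fix ICA's permutation/sign ambiguity using the reference
      # signals measured near the sources.
      est = FastICA(n_components=2, random_state=0).fit_transform(mixed)
      corr = np.corrcoef(est.T, sources.T)[:2, 2:]   # estimated vs reference
      match = np.abs(corr).argmax(axis=1)            # best reference per component
      inv = np.argsort(match)
      aligned = est[:, inv] * np.sign(corr[np.arange(2), match])[inv]
      print(match, aligned.shape)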

  18. Comparative evaluation of spectroscopic models using different multivariate statistical tools in a multicancer scenario

    NASA Astrophysics Data System (ADS)

    Ghanate, A. D.; Kothiwale, S.; Singh, S. P.; Bertrand, Dominique; Krishna, C. Murali

    2011-02-01

    Cancer is now recognized as one of the major causes of morbidity and mortality. Histopathological diagnosis, the gold standard, has been shown to be subjective, time consuming, and prone to interobserver disagreement, and it often fails to predict prognosis. Optical spectroscopic methods are being contemplated as adjuncts or alternatives to conventional cancer diagnostics. The most important aspect of these approaches is their objectivity, and multivariate statistical tools play a major role in realizing it. However, rigorous evaluation of the robustness of spectral models is a prerequisite. The utility of Raman spectroscopy in the diagnosis of cancers has been well established. Until now, the specificity and applicability of spectral models have been evaluated for specific cancer types. In this study, we have evaluated the utility of spectroscopic models representing normal and malignant tissues of the breast, cervix, colon, larynx, and oral cavity in a broader perspective, using different multivariate tests. The limit test, which was used in our earlier study, gave high sensitivity but suffered from poor specificity. The performance of other methods, such as factorial discriminant analysis and partial least squares discriminant analysis, is on par with that of more complex nonlinear methods such as decision trees, but they provide very little information about the classification model. This comparative study thus demonstrates not just the efficacy of Raman spectroscopic models but also the applicability and limitations of different multivariate tools for discrimination under complex conditions such as the multicancer scenario.

  19. A classification scheme for turbulent flows based on their joint velocity-intermittency structure

    NASA Astrophysics Data System (ADS)

    Keylock, C. J.; Nishimura, K.; Peinke, J.

    2011-12-01

    Kolmogorov's classic theory of turbulence assumed independence between velocity increments and the velocity itself. However, this assumption is questionable, particularly in complex geophysical flows. Here we propose a framework for studying velocity-intermittency coupling that is similar in essence to the popular quadrant analysis method for studying near-wall flows. However, we study the dominant (longitudinal) velocity component along with a measure of the roughness of the signal, given mathematically by its series of Hölder exponents. Thus, we permit a possible dependence between velocity and intermittency. We compare boundary layer data obtained in a wind tunnel to turbulent jets and wake flows. These flow classes all have distinct velocity-intermittency characteristics, which allow them to be readily distinguished using our technique. Our method is much simpler and quicker to apply than approaches that condition the velocity increment statistics at some scale, r, on the increment statistics at a neighbouring, larger spatial scale, r + Δ, and on the velocity itself. Classification of environmental flows is then possible based on their similarities to the idealised flow classes, and we demonstrate this using laboratory data for flow in a parallel-channel confluence, where the region of flow recirculation in the lee of the step is discriminated as a flow class distinct from boundary layer, jet, and wake flows. Hence, using our method, it is possible to assign a flow classification to complex geophysical turbulent flows depending upon which idealised flow class they most resemble.

  20. Rapid determination of tartaric acid in wines.

    PubMed

    Bastos, Sandra S T; Tafulo, Paula A R; Queirós, Raquel B; Matos, Cristina D; Sales, M Goreti F

    2009-08-01

    A flow-spectrophotometric method is proposed for the routine determination of tartaric acid (TA) in wines. The reaction between tartaric acid and vanadate in acetic media is carried out in flowing conditions and the resulting colored complex is monitored at 475 nm. The stability of the complex and the corresponding formation constant are presented. The effect of wavelength and pH was evaluated by batch experiments. The selected conditions were transposed to a flow-injection analytical system. Optimization of several flow parameters such as reactor lengths, flow rate, and injection volume was carried out. Using optimized conditions, a linear behavior was observed up to 1000 µg mL⁻¹ tartaric acid, with a molar extinction coefficient of 450 L mg⁻¹ cm⁻¹ and ±1% repeatability. Sample throughput was 25 samples per hour. The flow-spectrophotometric method was satisfactorily applied to the quantification of TA in wines from different sources. Its accuracy was confirmed by statistical comparison to the conventional Rebelein procedure and to a certified analytical method carried out in a routine laboratory.

  1. Chemometric strategy for automatic chromatographic peak detection and background drift correction in chromatographic data.

    PubMed

    Yu, Yong-Jie; Xia, Qiao-Ling; Wang, Sheng; Wang, Bing; Xie, Fu-Wei; Zhang, Xiao-Bing; Ma, Yun-Ming; Wu, Hai-Long

    2014-09-12

    Peak detection and background drift correction (BDC) are the key stages in using chemometric methods to analyze chromatographic fingerprints of complex samples. This study developed a novel chemometric strategy for simultaneous automatic chromatographic peak detection and BDC. A robust statistical method was used for intelligent estimation of instrumental noise level coupled with first-order derivative of chromatographic signal to automatically extract chromatographic peaks in the data. A local curve-fitting strategy was then employed for BDC. Simulated and real liquid chromatographic data were designed with various kinds of background drift and degree of overlapped chromatographic peaks to verify the performance of the proposed strategy. The underlying chromatographic peaks can be automatically detected and reasonably integrated by this strategy. Meanwhile, chromatograms with BDC can be precisely obtained. The proposed method was used to analyze a complex gas chromatography dataset that monitored quality changes in plant extracts during storage procedure. Copyright © 2014 Elsevier B.V. All rights reserved.
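
    A minimal sketch of the two stages, assuming a MAD-based noise estimate from the first derivative and a simple global polynomial background fitted on peak-free regions (a stand-in for the paper's local curve-fitting strategy); the chromatogram is synthetic.

      import numpy as np
      from scipy.signal import find_peaks

      # Synthetic chromatogram: two Gaussian peaks on a slow linear drift plus noise.
      x = np.arange(2000.0)
      clean = (40 * np.exp(-0.5 * ((x - 600) / 15) ** 2)
               + 25 * np.exp(-0.5 * ((x - 1300) / 20) ** 2)
               + 0.005 * x + 2)
      noisy = clean + np.random.default_rng(10).normal(0, 0.3, x.size)

      # Robust noise estimate from the first derivative (median absolute deviation).
      d = np.diff(noisy)
      noise = 1.4826 * np.median(np.abs(d - np.median(d)))

      # Peaks must rise well above the noise level; then fit the background on
      # peak-free regions only (global quadratic as a stand-in for local fitting).
      peaks, _ = find_peaks(noisy, prominence=10 * noise)
      mask = np.ones(x.size, dtype=bool)
      for p in peaks:
          mask[max(0, p - 80): p + 80] = False
      baseline = np.polyval(np.polyfit(x[mask], noisy[mask], 2), x)
      print(peaks, (noisy - baseline)[peaks])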

  2. A permutation testing framework to compare groups of brain networks.

    PubMed

    Simpson, Sean L; Lyday, Robert G; Hayasaka, Satoru; Marsh, Anthony P; Laurienti, Paul J

    2013-01-01

    Brain network analyses have moved to the forefront of neuroimaging research over the last decade. However, methods for statistically comparing groups of networks have lagged behind. These comparisons have great appeal for researchers interested in gaining further insight into complex brain function and how it changes across different mental states and disease conditions. Current comparison approaches generally either rely on a summary metric or on mass-univariate nodal or edge-based comparisons that ignore the inherent topological properties of the network, yielding little power and failing to make network level comparisons. Gleaning deeper insights into normal and abnormal changes in complex brain function demands methods that take advantage of the wealth of data present in an entire brain network. Here we propose a permutation testing framework that allows comparing groups of networks while incorporating topological features inherent in each individual network. We validate our approach using simulated data with known group differences. We then apply the method to functional brain networks derived from fMRI data.
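
    The label-permutation mechanics at the heart of such a framework can be sketched on a simple per-network summary statistic, as below; the paper's method incorporates richer topological features (and argues a single summary is insufficient by itself), but the shuffling step is the same in spirit. All data are synthetic.

      import numpy as np

      # Hypothetical summary statistic per subject network (e.g., global efficiency).
      rng = np.random.default_rng(11)
      group_a = rng.normal(0.60, 0.05, 20)
      group_b = rng.normal(0.55, 0.05, 20)

      observed = group_a.mean() - group_b.mean()
      pooled = np.concatenate([group_a, group_b])
      exceed = 0
      n_perm = 10000
      for _ in range(n_perm):
          rng.shuffle(pooled)                  # reassign group labels at random
          diff = pooled[:20].mean() - pooled[20:].mean()
          exceed += abs(diff) >= abs(observed)
      print("p =", (exceed + 1) / (n_perm + 1))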

  3. Astronomical Context of Georgian Folklore

    NASA Astrophysics Data System (ADS)

    Jijelava, Badri; Holbrook, Jarita; Simonia, Irakli

    2016-10-01

    Objectives: Ancient religious megalithic monuments are oriented towards the ancient gods - the Sun, the Moon, and other luminaries. The aim of this work is to study the ethnographic data and current folklore and, based on the results, to relate the ancient gods to the orientations of religious megalithic complexes. Methods/Statistical Analysis: We combined ethnographic, folkloric, and historical information with a reconstruction of the ancient celestial sphere (using a specialized astronomy application) and identified correlations between the acronychal or heliacal risings and settings of luminaries and the orientations of megalithic objects. Such connections are preserved in folklore. Findings: This investigative technique gives a clearer understanding of the ancient universe. Using this method, we can obtain additional information about the ancient gods (the luminaries), clarify current mythology, and date megalithic complexes. Application/Improvements: Combining cultural astronomy and archaeoastronomy with archaeological investigations will be more fruitful, because it provides reliable information concerning ancient culture, religion, and people.

  4. Simple F Test Reveals Gene-Gene Interactions in Case-Control Studies

    PubMed Central

    Chen, Guanjie; Yuan, Ao; Zhou, Jie; Bentley, Amy R.; Adeyemo, Adebowale; Rotimi, Charles N.

    2012-01-01

    Missing heritability is still a challenge for Genome Wide Association Studies (GWAS). Gene-gene interactions may partially explain this residual genetic influence and contribute broadly to complex disease. To analyze the gene-gene interactions in case-control studies of complex disease, we propose a simple, non-parametric method that utilizes the F-statistic. This approach consists of three steps. First, we examine the joint distribution of a pair of SNPs in cases and controls separately. Second, an F-test is used to evaluate the ratio of dependence in cases to that of controls. Finally, results are adjusted for multiple tests. This method was used to evaluate gene-gene interactions that are associated with risk of Type 2 Diabetes among African Americans in the Howard University Family Study. We identified 18 gene-gene interactions (P < 0.0001). Compared with the commonly-used logistical regression method, we demonstrate that the F-ratio test is an efficient approach to measuring gene-gene interactions, especially for studies with limited sample size. PMID:22837643
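
    A hedged sketch of the ratio-of-dependence idea, using squared Pearson correlation as a simple dependence measure and a permutation null in place of the F reference distribution; genotypes and effect structure are invented.

      import numpy as np

      rng = np.random.default_rng(12)
      n = 500
      # Hypothetical genotype codes (0/1/2) for a SNP pair: dependent in cases only.
      snp1_cases = rng.integers(0, 3, n)
      snp2_cases = np.clip(snp1_cases + rng.integers(-1, 2, n), 0, 2)
      snp1_ctrls = rng.integers(0, 3, n)
      snp2_ctrls = rng.integers(0, 3, n)

      def dependence(a, b):
          """Squared Pearson correlation as a simple dependence measure."""
          return np.corrcoef(a, b)[0, 1] ** 2

      observed = (dependence(snp1_cases, snp2_cases)
                  / max(dependence(snp1_ctrls, snp2_ctrls), 1e-12))

      # Permutation null: reassign case/control labels to genotype pairs.
      s1 = np.concatenate([snp1_cases, snp1_ctrls])
      s2 = np.concatenate([snp2_cases, snp2_ctrls])
      exceed = 0
      for _ in range(2000):
          perm = rng.permutation(2 * n)
          ratio = (dependence(s1[perm[:n]], s2[perm[:n]])
                   / max(dependence(s1[perm[n:]], s2[perm[n:]]), 1e-12))
          exceed += ratio >= observed
      print("p =", (exceed + 1) / 2001)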

  5. Mixed Effects Models for Resampled Network Statistics Improves Statistical Power to Find Differences in Multi-Subject Functional Connectivity

    PubMed Central

    Narayan, Manjari; Allen, Genevera I.

    2016-01-01

    Many complex brain disorders, such as autism spectrum disorders, exhibit a wide range of symptoms and disability. To understand how brain communication is impaired in such conditions, functional connectivity studies seek to understand individual differences in brain network structure in terms of covariates that measure symptom severity. In practice, however, functional connectivity is not observed but estimated from complex and noisy neural activity measurements. Imperfect subject network estimates can compromise subsequent efforts to detect covariate effects on network structure. We address this problem in the case of Gaussian graphical models of functional connectivity, by proposing novel two-level models that treat both subject level networks and population level covariate effects as unknown parameters. To account for imperfectly estimated subject level networks when fitting these models, we propose two related approaches—R2 based on resampling and random effects test statistics, and R3 that additionally employs random adaptive penalization. Simulation studies using realistic graph structures reveal that R2 and R3 have superior statistical power to detect covariate effects compared to existing approaches, particularly when the number of within subject observations is comparable to the size of subject networks. Using our novel models and methods to study parts of the ABIDE dataset, we find evidence of hypoconnectivity associated with symptom severity in autism spectrum disorders, in frontoparietal and limbic systems as well as in anterior and posterior cingulate cortices. PMID:27147940

  6. Combining biophysical methods for the analysis of protein complex stoichiometry and affinity in SEDPHAT

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhao, Huaying, E-mail: zhaoh3@mail.nih.gov; Schuck, Peter, E-mail: zhaoh3@mail.nih.gov

    2015-01-01

    Global multi-method analysis for protein interactions (GMMA) can increase the precision and complexity of binding studies for the determination of the stoichiometry, affinity and cooperativity of multi-site interactions. The principles and recent developments of biophysical solution methods implemented for GMMA in the software SEDPHAT are reviewed, their complementarity in GMMA is described and a new GMMA simulation tool set in SEDPHAT is presented. Reversible macromolecular interactions are ubiquitous in signal transduction pathways, often forming dynamic multi-protein complexes with three or more components. Multivalent binding and cooperativity in these complexes are often key motifs of their biological mechanisms. Traditional solution biophysical techniques for characterizing the binding and cooperativity are very limited in the number of states that can be resolved. A global multi-method analysis (GMMA) approach has recently been introduced that can leverage the strengths and the different observables of different techniques to improve the accuracy of the resulting binding parameters and to facilitate the study of multi-component systems and multi-site interactions. Here, GMMA is described in the software SEDPHAT for the analysis of data from isothermal titration calorimetry, surface plasmon resonance or other biosensing, analytical ultracentrifugation, fluorescence anisotropy and various other spectroscopic and thermodynamic techniques. The basic principles of these techniques are reviewed and recent advances in view of their particular strengths in the context of GMMA are described. Furthermore, a new feature in SEDPHAT is introduced for the simulation of multi-method data. In combination with specific statistical tools for GMMA in SEDPHAT, simulations can be a valuable step in the experimental design.

  7. MSblender: A probabilistic approach for integrating peptide identifications from multiple database search engines.

    PubMed

    Kwon, Taejoon; Choi, Hyungwon; Vogel, Christine; Nesvizhskii, Alexey I; Marcotte, Edward M

    2011-07-01

    Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for every possible PSM and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for most proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improve sensitivity in differential expression analyses.

  9. An Efficient Statistical Method to Compute Molecular Collisional Rate Coefficients

    NASA Astrophysics Data System (ADS)

    Loreau, Jérôme; Lique, François; Faure, Alexandre

    2018-01-01

    Our knowledge about the “cold” universe often relies on molecular spectra. A general property of such spectra is that the energy level populations are rarely at local thermodynamic equilibrium. Solving the radiative transfer thus requires the availability of collisional rate coefficients with the main colliding partners over the temperature range ∼10–1000 K. These rate coefficients are notoriously difficult to measure and expensive to compute. In particular, very few reliable collisional data exist for inelastic collisions involving reactive radicals or ions. In this Letter, we explore the use of a fast quantum statistical method to determine molecular collisional excitation rate coefficients. The method is benchmarked against accurate (but costly) rigid-rotor close-coupling calculations. For collisions proceeding through the formation of a strongly bound complex, the method is found to be highly satisfactory up to room temperature. Its accuracy decreases with decreasing potential well depth and with increasing temperature, as expected. This new method opens the way to the determination of accurate inelastic collisional data involving key reactive species such as H3+, H2O+, and H3O+ for which exact quantum calculations are currently not feasible.

  10. Volcanic hazard assessment for the Canary Islands (Spain) using extreme value theory, and the recent volcanic eruption of El Hierro

    NASA Astrophysics Data System (ADS)

    Sobradelo, R.; Martí, J.; Mendoza-Rosas, A. T.; Gómez, G.

    2012-04-01

    The Canary Islands are an active volcanic region, densely populated and visited by several million tourists every year. Nearly twenty eruptions have been reported in written chronicles in the last 600 years, suggesting that the probability of a new eruption in the near future is far from zero. This shows the importance of assessing and monitoring the volcanic hazard of the region in order to reduce and manage its potential volcanic risk, and ultimately contribute to the design of appropriate preparedness plans. Hence, probabilistic analysis of the volcanic eruption time series for the Canary Islands is an essential step for the assessment of volcanic hazard and risk in the area. Such a series describes complex processes involving different types of eruptions over different time scales. Here we propose a statistical method for calculating the probabilities of future eruptions that is appropriate given the nature of the documented historical eruptive data. We first characterise the eruptions by their magnitudes, and then carry out a preliminary analysis of the data to establish the requirements for the statistical method. Past studies of eruptive time series used conventional statistics and treated the series as a homogeneous process. In this paper, we use a method that accounts for the time-dependence of the series and includes rare or extreme events, in the form of few data points for large eruptions, since such data require special methods of analysis. Hence, we use a statistical method from extreme value theory. In particular, we apply a non-homogeneous Poisson process to the historical eruptive data of the Canary Islands to estimate the probability of having at least one volcanic event of magnitude greater than one in the upcoming years. Shortly after the publication of this method, an eruption took place on the island of El Hierro for the first time in historical times, supporting our method and contributing towards the validation of our results.
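
    A minimal sketch of the resulting probability computation for a non-homogeneous Poisson process, P(N >= 1) = 1 - exp(-Λ), assuming an illustrative power-law intensity rather than parameters fitted to the Canary Islands record.

      import numpy as np

      # Illustrative power-law (Weibull-type) NHPP intensity for eruptions of
      # magnitude > 1; lam and beta are invented, not fitted values.
      def cumulative_intensity(t_years, lam=0.002, beta=1.3):
          """Integrated intensity Lambda(t) = lam * t**beta."""
          return lam * t_years ** beta

      t0, horizon = 600.0, 20.0    # years covered by the record, forecast window
      expected_events = cumulative_intensity(t0 + horizon) - cumulative_intensity(t0)
      print("P(at least one eruption) =", 1.0 - np.exp(-expected_events))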

  11. Quantifying and reducing statistical uncertainty in sample-based health program costing studies in low- and middle-income countries

    PubMed Central

    Resch, Stephen

    2018-01-01

    Objectives: In many low- and middle-income countries, the costs of delivering public health programs such as for HIV/AIDS, nutrition, and immunization are not routinely tracked. A number of recent studies have sought to estimate program costs on the basis of detailed information collected on a subsample of facilities. While unbiased estimates can be obtained via accurate measurement and appropriate analyses, they are subject to statistical uncertainty. Quantification of this uncertainty, for example, via standard errors and/or 95% confidence intervals, provides important contextual information for decision-makers and for the design of future costing studies. While other forms of uncertainty, such as that due to model misspecification, are considered and can be investigated through sensitivity analyses, statistical uncertainty is often not reported in studies estimating the total program costs. This may be due to a lack of awareness/understanding of (1) the technical details regarding uncertainty estimation and (2) the availability of software with which to calculate uncertainty for estimators resulting from complex surveys. We provide an overview of statistical uncertainty in the context of complex costing surveys, emphasizing the various potential specific sources that contribute to overall uncertainty. Methods: We describe how analysts can compute measures of uncertainty, either via appropriately derived formulae or through resampling techniques such as the bootstrap. We also provide an overview of calibration as a means of using additional auxiliary information that is readily available for the entire program, such as the total number of doses administered, to decrease uncertainty and thereby improve decision-making and the planning of future studies. Results: A recent study of the national program for routine immunization in Honduras shows that uncertainty can be reduced by using information available prior to the study. This method can not only be used when estimating the total cost of delivering established health programs but also to decrease uncertainty when the interest lies in assessing the incremental effect of an intervention. Conclusion: Measures of statistical uncertainty associated with survey-based estimates of program costs, such as standard errors and 95% confidence intervals, provide important contextual information for health policy decision-making and key inputs for the design of future costing studies. Such measures are often not reported, possibly because of technical challenges associated with their calculation and a lack of awareness of appropriate software. Modern statistical analysis methods for survey data, such as calibration, provide a means to exploit additional information that is readily available but was not used in the design of the study to significantly improve the estimation of total cost through the reduction of statistical uncertainty. PMID:29636964
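
    A minimal sketch of the bootstrap step for a simple random subsample of facilities; the costs, sample size, and scale-up factor are invented, and the calibration step discussed above is not shown.

      import numpy as np

      # Hypothetical per-facility annual costs from a random subsample, scaled
      # up to a program of N facilities.
      rng = np.random.default_rng(13)
      sample_costs = rng.lognormal(mean=9.0, sigma=0.6, size=40)
      N = 400

      total_hat = N * sample_costs.mean()      # point estimate of total program cost

      # Nonparametric bootstrap for the standard error and a 95% interval.
      boots = np.array([N * rng.choice(sample_costs, sample_costs.size).mean()
                        for _ in range(5000)])
      se = boots.std(ddof=1)
      lo, hi = np.percentile(boots, [2.5, 97.5])
      print(f"total ~ {total_hat:,.0f}, SE {se:,.0f}, 95% CI ({lo:,.0f}, {hi:,.0f})")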

  12. Assessment of changing interdependencies between human electroencephalograms using nonlinear methods

    NASA Astrophysics Data System (ADS)

    Pereda, E.; Rial, R.; Gamundi, A.; González, J.

    2001-01-01

    We investigate the problems that can arise when two recently developed methods for detecting interdependencies between time series via state space embedding are applied to signals of different complexity. To this end, the methods were used to assess the interdependencies between two electroencephalographic channels from 10 adult human subjects during different vigilance states. The significance and nature of the measured interdependencies were checked by comparing the results for the original data with those for several types of surrogates. We found that even with proper reconstructions of the dynamics of the time series, both methods may give spurious statistical evidence of decreasing interdependencies during deep sleep, owing to changes in the complexity of each individual channel. The main factor responsible for this result was the use of an insufficient number of neighbors in the calculations. Once this problem was overcome, both methods showed a significant relationship between the channels that was mostly linear and increased from wakefulness to slow wave sleep. We conclude that the significance of the qualitative results provided by both methods must be carefully tested before drawing any conclusions about their implications.
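
    As an illustration of the kind of state-space method discussed (not the authors' exact algorithms), the sketch below implements a nonlinear interdependence index in the spirit of Arnhold et al.'s S(X|Y): both signals are delay-embedded, and distances to a point's own nearest neighbors are compared with distances to the points selected by the other signal's neighbors. The parameter k is the number of neighbors, whose adequacy the abstract identifies as critical; a Theiler correction for temporally close neighbors is omitted for brevity.

      import numpy as np
      from scipy.spatial import cKDTree

      def embed(x, dim=5, tau=2):
          """Time-delay embedding of a 1-D signal into dim dimensions with lag tau."""
          n = len(x) - (dim - 1) * tau
          return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

      def interdependence_S(x, y, dim=5, tau=2, k=10):
          """S(X|Y)-style index: ratio of X's own k-NN squared distances to the
          squared distances of the points picked out by Y's k nearest neighbors.
          Values near 1 indicate strong dependence, values near 0 independence."""
          X, Y = embed(x, dim, tau), embed(y, dim, tau)
          tx, ty = cKDTree(X), cKDTree(Y)
          # k+1 because each query point is returned as its own nearest neighbor
          dx, _ = tx.query(X, k=k + 1)
          _, iy = ty.query(Y, k=k + 1)
          own = (dx[:, 1:] ** 2).mean(axis=1)                           # R_k(X)
          cond = ((X[:, None, :] - X[iy[:, 1:]]) ** 2).sum(-1).mean(1)  # R_k(X|Y)
          return (own / cond).mean()

      rng = np.random.default_rng(0)
      t = np.linspace(0, 100, 3000)
      x = np.sin(t) + 0.1 * rng.standard_normal(t.size)
      y = np.sin(t + 0.5) + 0.1 * rng.standard_normal(t.size)
      print("S(X|Y) =", interdependence_S(x, y))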

  13. S-curve networks and an approximate method for estimating degree distributions of complex networks

    NASA Astrophysics Data System (ADS)

    Guo, Jin-Li

    2010-12-01

    In the study of complex networks almost all theoretical models have the property of infinite growth, but the size of actual networks is finite. Using statistics on IPv4 (Internet Protocol version 4) addresses on the Chinese Internet, this paper proposes a forecasting model based on the S curve (logistic curve) and forecasts the growth trend of IPv4 addresses in China. The forecast provides reference values for optimizing the distribution of IPv4 address resources and for the development of IPv6. Based on the observed laws of IPv4 growth, namely bulk growth and a finite growth limit, the paper proposes a finite network model with bulk growth, referred to as an S-curve network. Analysis demonstrates that the analytic method based on uniform distributions (i.e., the Barabási-Albert method) is not suitable for this network. An approximate method is therefore developed to predict the growth dynamics of the individual nodes, and is used to calculate analytically the degree distribution and the scaling exponents. The analytical result agrees well with simulations, obeying an approximately power-law form. This method overcomes a shortcoming of the Barabási-Albert method commonly used in current network research.
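
    A minimal sketch of the forecasting step, assuming hypothetical yearly address counts: fit a logistic (S) curve by nonlinear least squares and read off the finite growth limit K.

      import numpy as np
      from scipy.optimize import curve_fit

      def logistic(t, K, r, t0):
          """S curve: K is the finite growth limit, r the growth rate, t0 the midpoint."""
          return K / (1.0 + np.exp(-r * (t - t0)))

      # Hypothetical yearly address counts (arbitrary units), illustrative only.
      years = np.arange(2000, 2011, dtype=float)
      addresses = np.array([12, 19, 30, 47, 70, 98, 128, 155, 174, 186, 193], dtype=float)

      popt, _ = curve_fit(logistic, years, addresses, p0=[200.0, 0.5, 2005.0])
      K, r, t0 = popt
      print(f"fitted limit K={K:.0f}, rate r={r:.2f}, midpoint t0={t0:.1f}")
      print("forecast for 2015:", logistic(2015.0, *popt))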

  14. Cost-Effectiveness Analysis: a proposal of new reporting standards in statistical analysis

    PubMed Central

    Bang, Heejung; Zhao, Hongwei

    2014-01-01

    Cost-effectiveness analysis (CEA) is a method for evaluating the outcomes and costs of competing strategies designed to improve health, and has been applied to a variety of scientific fields. Yet, there are inherent complexities in cost estimation and CEA from statistical perspectives (e.g., skewness, bi-dimensionality, and censoring). The incremental cost-effectiveness ratio, which represents the additional cost per unit of outcome gained by a new strategy, has served as the most widely accepted methodology in CEA. In this article, we call for expanded perspectives and reporting standards reflecting a more comprehensive analysis that can elucidate different aspects of the available data. Specifically, we propose that mean- and median-based incremental cost-effectiveness ratios and average cost-effectiveness ratios be reported together, along with relevant summary and inferential statistics, as complementary measures for informed decision making. PMID:24605979
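
    The sketch below illustrates the kind of reporting the authors propose, on simulated skewed cost data: mean- and median-based ICERs with bootstrap percentile intervals, alongside an average cost-effectiveness ratio. All distributions and sample sizes are hypothetical.

      import numpy as np

      rng = np.random.default_rng(1)

      # Hypothetical per-patient costs (skewed) and effectiveness (e.g., QALYs).
      cost_new, cost_old = rng.lognormal(9.2, 0.6, 300), rng.lognormal(9.0, 0.6, 300)
      eff_new, eff_old = rng.normal(1.9, 0.4, 300), rng.normal(1.7, 0.4, 300)

      def icer(cn, co, en, eo, center=np.mean):
          """Incremental cost-effectiveness ratio with a pluggable center (mean or median)."""
          return (center(cn) - center(co)) / (center(en) - center(eo))

      def boot_ci(center, n_boot=2000, alpha=0.05):
          """Percentile bootstrap interval for the chosen ICER variant."""
          reps = []
          for _ in range(n_boot):
              i = rng.integers(0, 300, 300)
              reps.append(icer(cost_new[i], cost_old[i], eff_new[i], eff_old[i], center))
          return np.quantile(reps, [alpha / 2, 1 - alpha / 2])

      for center in (np.mean, np.median):
          est = icer(cost_new, cost_old, eff_new, eff_old, center)
          lo, hi = boot_ci(center)
          print(f"{center.__name__:>6}-based ICER: {est:,.0f} (95% CI {lo:,.0f} to {hi:,.0f})")

      # Average cost-effectiveness ratio of the new strategy, reported alongside the ICER:
      print("ACER (new):", cost_new.mean() / eff_new.mean())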

  15. Statistical estimation of the potential possibilities for panoramic hydro-optic laser sensing

    NASA Astrophysics Data System (ADS)

    Shamanaev, Vitalii S.; Lisenko, Andrey A.

    2017-11-01

    To provide a statistical estimate of the potential capabilities of a lidar with a matrix photodetector placed on board an aircraft, the nonstationary equation of laser sensing of a complex multicomponent sea water medium is solved by the Monte Carlo method. The lidar return power is estimated for various optical sea water characteristics in the presence of solar background radiation. For clear waters and external background illumination brightnesses of 50, 1, and 10⁻³ W/(m²·μm·sr), the signal/noise ratio (SNR) exceeds 10 down to water depths h = 45-50 m. For coastal waters, SNR >= 10 for h = 17-24 m, whereas for turbid sea waters, SNR >= 10 only down to depths h = 8-12 m. The results of the statistical simulation show that a lidar system with optimal parameters can be used for water sensing to depths of 50 m.
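
    The following is not the authors' Monte Carlo solution but a crude deterministic stand-in (two-way Beer-Lambert attenuation with inverse-square spreading) showing how an SNR >= 10 depth limit arises; the attenuation coefficients and the signal constant are illustrative values chosen only to land near the quoted depth ranges.

      import numpy as np

      def snr(h, c_att, p0=3e6, noise=1.0):
          """Relative SNR at depth h (m) for attenuation coefficient c_att (1/m):
          two-way exponential attenuation divided by 1/h**2 geometric spreading."""
          return p0 * np.exp(-2.0 * c_att * h) / (h ** 2 * noise)

      depths = np.arange(1.0, 60.0)
      for label, c_att in [("clear", 0.05), ("coastal", 0.15), ("turbid", 0.30)]:
          reachable = depths[snr(depths, c_att) >= 10.0]
          print(f"{label:>7}: SNR >= 10 down to ~{reachable.max():.0f} m")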

  16. Stan: Statistical inference

    NASA Astrophysics Data System (ADS)

    Stan Development Team

    2018-01-01

    Stan facilitates statistical inference at the frontiers of applied statistics and provides both a modeling language for specifying complex statistical models and a library of statistical algorithms for computing inferences with those models. These components are exposed through interfaces in environments such as R, Python, and the command line.
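
    As a concrete illustration, here is a minimal Bernoulli model run from Python through the cmdstanpy interface (one of several interfaces mentioned above; this sketch assumes cmdstanpy and CmdStan are installed):

      # Minimal example of calling Stan from Python via cmdstanpy.
      from pathlib import Path
      from cmdstanpy import CmdStanModel

      stan_code = """
      data {
        int<lower=0> N;
        array[N] int<lower=0, upper=1> y;
      }
      parameters {
        real<lower=0, upper=1> theta;   // success probability
      }
      model {
        theta ~ beta(1, 1);             // uniform prior
        y ~ bernoulli(theta);
      }
      """
      Path("bernoulli.stan").write_text(stan_code)

      model = CmdStanModel(stan_file="bernoulli.stan")   # compiles the model
      fit = model.sample(data={"N": 10, "y": [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]})
      print(fit.summary())                               # posterior summary for theta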

  17. Evaluation of physiologic complexity in time series using generalized sample entropy and surrogate data analysis

    NASA Astrophysics Data System (ADS)

    Eduardo Virgilio Silva, Luiz; Otavio Murta, Luiz

    2012-12-01

    Complexity in time series is an intriguing feature of living dynamical systems, with potential use for the identification of system state. Although various methods have been proposed for measuring physiologic complexity, uncorrelated time series are often assigned high values of complexity, erroneously classifying them as complex physiological signals. Here, we propose and discuss a method for complex system analysis based on a generalized statistical formalism and surrogate time series. Sample entropy (SampEn) was rewritten, inspired by the Tsallis generalized entropy, as a function of the parameter q (qSampEn). qSDiff curves were calculated, consisting of the differences between the qSampEn of the original and surrogate series. We evaluated qSDiff for 125 real heart rate variability (HRV) recordings, divided into groups of 70 healthy, 44 congestive heart failure (CHF), and 11 atrial fibrillation (AF) subjects, and for simulated series from stochastic and chaotic processes. The evaluations showed that, for nonperiodic signals, qSDiff curves have a maximum point (qSDiffmax) at q ≠ 1. The values of q at which the maximum occurs and at which qSDiff is zero were also evaluated. Only qSDiffmax values were capable of distinguishing the HRV groups (p-values 5.10×10⁻³, 1.11×10⁻⁷, and 5.50×10⁻⁷ for healthy vs. CHF, healthy vs. AF, and CHF vs. AF, respectively), consistent with the concept of physiologic complexity and suggesting a potential use for chaotic system analysis.
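
    The paper's exact definition is not reproduced here; the sketch below shows one natural reading of qSampEn, replacing the logarithm in SampEn = -ln(A/B) with the Tsallis q-logarithm ln_q(x) = (x^(1-q) - 1)/(1-q), which recovers ordinary SampEn as q -> 1. Template counting is slightly simplified relative to the canonical SampEn definition.

      import numpy as np

      def sample_counts(x, m, r):
          """Count template matches of length m (Chebyshev distance < r), self-matches excluded."""
          n = len(x) - m
          templates = np.array([x[i:i + m] for i in range(n)])
          d = np.abs(templates[:, None, :] - templates[None, :, :]).max(-1)
          return ((d < r).sum() - n) / 2  # drop self-matches, count each pair once

      def q_sampen(x, m=2, r=None, q=1.0):
          """Tsallis-generalized sample entropy: -ln_q(A/B), with ln_q the q-logarithm.
          Reduces to ordinary SampEn = -ln(A/B) as q -> 1."""
          if r is None:
              r = 0.2 * np.std(x)
          a, b = sample_counts(x, m + 1, r), sample_counts(x, m, r)
          ratio = a / b
          if np.isclose(q, 1.0):
              return -np.log(ratio)
          return -(ratio ** (1.0 - q) - 1.0) / (1.0 - q)

      rng = np.random.default_rng(0)
      x = rng.standard_normal(500)
      print("qSampEn at q=1:", q_sampen(x, q=1.0), " at q=2:", q_sampen(x, q=2.0))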

  18. Comparison of ibuprofen release from minitablets and capsules containing ibuprofen: β-cyclodextrin complex.

    PubMed

    Salústio, P J; Cabral-Marques, H M; Costa, P C; Pinto, J F

    2011-05-01

    Mixtures containing ibuprofen (IB) complexed with β-cyclodextrin (βCD), obtained by two complexation methods [suspension/solution (with water removed by air stream, spray- or freeze-drying) and the kneading technique], were processed into pharmaceutical dosage forms (minitablets and capsules). The powders (IB, βCD and IBβCD) were characterized for moisture content, density (true and bulk), angle of repose, Carr's index, X-ray and NMR. Minitablets (2.5 mm in diameter) and capsules were prepared from the physical mixtures and IBβCD complexes without other excipients. The minitablets were characterized for energy of compaction, tensile strength, friability, density and IB release (at pH 1.0 and 7.2), whereas the capsules were characterized for IB release only. The IB release results were analyzed using several parameters, namely the similarity factor (f(2)), the dissolution efficiency (DE) and the amounts released at given times (30, 60 and 180 min), and compared statistically (α=0.05). The release of IB from the minitablets showed no dependency on the amount of water used in the formation of the complexes. Differences were attributable to the compaction force used or to the presence of a shell in the case of the capsules. The differences observed were due mostly to the characteristics of the particles (which depend on the method used to form the complexes) and neither to the dosage form nor to the complexation of IB.
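
    The release-profile comparisons rest on two standard quantities; the sketch below uses the conventional formulas for the similarity factor f2 and dissolution efficiency DE on hypothetical release profiles (the usual regulatory caveats on f2, such as limiting points beyond 85% release, are ignored here).

      import numpy as np

      def similarity_f2(ref, test):
          """Similarity factor f2 = 50*log10(100/sqrt(1 + mean squared difference));
          values of 50-100 are conventionally taken to indicate similar profiles."""
          ref, test = np.asarray(ref, float), np.asarray(test, float)
          msd = ((ref - test) ** 2).mean()
          return 50.0 * np.log10(100.0 / np.sqrt(1.0 + msd))

      def dissolution_efficiency(t, released):
          """DE (%): trapezoidal area under the release curve relative to 100% release."""
          auc = float(np.sum((released[1:] + released[:-1]) / 2.0 * np.diff(t)))
          return 100.0 * auc / (100.0 * (t[-1] - t[0]))

      t = np.array([0, 15, 30, 60, 120, 180], dtype=float)          # minutes
      minitablet = np.array([0, 22, 41, 63, 82, 91], dtype=float)   # % IB released
      capsule = np.array([0, 30, 52, 71, 88, 95], dtype=float)

      print("f2 =", similarity_f2(minitablet, capsule))
      print("DE(minitablet) =", dissolution_efficiency(t, minitablet))
      print("DE(capsule)    =", dissolution_efficiency(t, capsule))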

  19. The Statistical Consulting Center for Astronomy (SCCA)

    NASA Technical Reports Server (NTRS)

    Akritas, Michael

    2001-01-01

    The process by which raw astronomical data acquisition is transformed into scientifically meaningful results and interpretation typically involves many statistical steps. Traditional astronomy limits itself to a narrow range of old and familiar statistical methods: means and standard deviations; least-squares methods like χ² minimization; and simple nonparametric procedures such as the Kolmogorov-Smirnov test. These tools are often inadequate for the complex problems and datasets under investigation, and recent years have witnessed increased use of maximum-likelihood, survival analysis, multivariate analysis, wavelet and advanced time-series methods. The Statistical Consulting Center for Astronomy (SCCA) assisted astronomers in using such sophisticated tools and in matching these tools to specific problems. The SCCA operated with two professors of statistics and a professor of astronomy working together. Questions were received by e-mail and discussed in detail with the questioner. Summaries of those questions and answers leading to new approaches were posted on the Web (www.state.psu.edu/mga/SCCA). In addition to serving individual astronomers, the SCCA established a Web site for general use that provides hypertext links to selected on-line public-domain statistical software and services. The StatCodes site (www.astro.psu.edu/statcodes) provides over 200 links in the areas of: Bayesian statistics; censored and truncated data; correlation and regression; density estimation and smoothing; general statistics packages and information; image analysis; interactive Web tools; multivariate analysis; multivariate clustering and classification; nonparametric analysis; software written by astronomers; spatial statistics; statistical distributions; time series analysis; and visualization tools. StatCodes has received a remarkably high and constant hit rate of 250 hits/week (over 10,000/year) since its inception in mid-1997, and is of interest to scientists both within and outside of astronomy. The most popular sections are multivariate techniques, image analysis, and time series analysis. Hundreds of copies of the ASURV, SLOPES and CENS-TAU codes developed by SCCA scientists were also downloaded from the StatCodes site. In addition to formal SCCA duties, SCCA scientists continued a variety of related activities in astrostatistics, including refereeing statistically oriented papers submitted to the Astrophysical Journal, giving talks at meetings, including Feigelson's talk to science journalists entitled "The reemergence of astrostatistics" at the American Association for the Advancement of Science meeting, and publishing papers of astrostatistical content.

  20. Coevolution at protein complex interfaces can be detected by the complementarity trace with important impact for predictive docking

    PubMed Central

    Madaoui, Hocine; Guerois, Raphaël

    2008-01-01

    Protein surfaces are under significant selection pressure to maintain interactions with their partners throughout evolution. Capturing how selection pressure acts at the interfaces of protein–protein complexes is a fundamental issue of high interest for the structural prediction of macromolecular assemblies. We tackled this issue under the assumption that, throughout evolution, mutations should minimally disrupt the physicochemical compatibility between specific clusters of interacting residues. This constraint drove the development of the Surface COmplementarity Trace in Complex History (SCOTCH) score, which was found to discriminate the structures of biological complexes with high efficiency. SCOTCH performance was assessed not only against other evolution-based approaches, such as conservation and coevolution analyses, but also against statistically based scoring methods. Validated on a set of 129 complexes of known structure exhibiting both permanent and transient intermolecular interactions, SCOTCH appears to be a robust strategy for guiding the prediction of protein–protein complex structures. Of particular interest, it also provides a basic framework for efficiently tracking how protein surfaces may evolve while keeping their partners in contact. PMID:18511568
