statistical framework based: Topics by Science.gov

Sample records for statistical framework based

A Framework for Assessing High School Students' Statistical Reasoning.

PubMed

Chan, Shiau Wei; Ismail, Zaleha; Sumintono, Bambang

2016-01-01

Based on a synthesis of literature, earlier studies, analyses and observations on high school students, this study developed an initial framework for assessing students' statistical reasoning about descriptive statistics. Framework descriptors were established across five levels of statistical reasoning and four key constructs. The former consisted of idiosyncratic reasoning, verbal reasoning, transitional reasoning, procedural reasoning, and integrated process reasoning. The latter include describing data, organizing and reducing data, representing data, and analyzing and interpreting data. In contrast to earlier studies, this initial framework formulated a complete and coherent statistical reasoning framework. A statistical reasoning assessment tool was then constructed from this initial framework. The tool was administered to 10 tenth-grade students in a task-based interview. The initial framework was refined, and the statistical reasoning assessment tool was revised. The ten students then participated in the second task-based interview, and the data obtained were used to validate the framework. The findings showed that the students' statistical reasoning levels were consistent across the four constructs, and this result confirmed the framework's cohesion. Developed to contribute to statistics education, this newly developed statistical reasoning framework provides a guide for planning learning goals and designing instruction and assessments.
A Framework for Assessing High School Students' Statistical Reasoning

PubMed Central

2016-01-01

Based on a synthesis of literature, earlier studies, analyses and observations on high school students, this study developed an initial framework for assessing students’ statistical reasoning about descriptive statistics. Framework descriptors were established across five levels of statistical reasoning and four key constructs. The former consisted of idiosyncratic reasoning, verbal reasoning, transitional reasoning, procedural reasoning, and integrated process reasoning. The latter include describing data, organizing and reducing data, representing data, and analyzing and interpreting data. In contrast to earlier studies, this initial framework formulated a complete and coherent statistical reasoning framework. A statistical reasoning assessment tool was then constructed from this initial framework. The tool was administered to 10 tenth-grade students in a task-based interview. The initial framework was refined, and the statistical reasoning assessment tool was revised. The ten students then participated in the second task-based interview, and the data obtained were used to validate the framework. The findings showed that the students’ statistical reasoning levels were consistent across the four constructs, and this result confirmed the framework’s cohesion. Developed to contribute to statistics education, this newly developed statistical reasoning framework provides a guide for planning learning goals and designing instruction and assessments. PMID:27812091
The APA Task Force on Statistical Inference (TFSI) Report as a Framework for Teaching and Evaluating Students' Understandings of Study Validity.

ERIC Educational Resources Information Center

Thompson, Bruce

Web-based statistical instruction, like all statistical instruction, ought to focus on teaching the essence of the research endeavor: the exercise of reflective judgment. Using the framework of the recent report of the American Psychological Association (APA) Task Force on Statistical Inference (Wilkinson and the APA Task Force on Statistical…
Statistical Irreversible Thermodynamics in the Framework of Zubarev's Nonequilibrium Statistical Operator Method

NASA Astrophysics Data System (ADS)

Luzzi, R.; Vasconcellos, A. R.; Ramos, J. G.; Rodrigues, C. G.

2018-01-01

We describe the formalism of statistical irreversible thermodynamics constructed based on Zubarev's nonequilibrium statistical operator (NSO) method, which is a powerful and universal tool for investigating the most varied physical phenomena. We present brief overviews of the statistical ensemble formalism and statistical irreversible thermodynamics. The first can be constructed either based on a heuristic approach or in the framework of information theory in the Jeffreys-Jaynes scheme of scientific inference; Zubarev and his school used both approaches in formulating the NSO method. We describe the main characteristics of statistical irreversible thermodynamics and discuss some particular considerations of several authors. We briefly describe how Rosenfeld, Bohr, and Prigogine proposed to derive a thermodynamic uncertainty principle.
The extraction and integration framework: a two-process account of statistical learning.

PubMed

Thiessen, Erik D; Kronstein, Alexandra T; Hufnagle, Daniel G

2013-07-01

The term statistical learning in infancy research originally referred to sensitivity to transitional probabilities. Subsequent research has demonstrated that statistical learning contributes to infant development in a wide array of domains. The range of statistical learning phenomena necessitates a broader view of the processes underlying statistical learning. Learners are sensitive to a much wider range of statistical information than the conditional relations indexed by transitional probabilities, including distributional and cue-based statistics. We propose a novel framework that unifies learning about all of these kinds of statistical structure. From our perspective, learning about conditional relations outputs discrete representations (such as words). Integration across these discrete representations yields sensitivity to cues and distributional information. To achieve sensitivity to all of these kinds of statistical structure, our framework combines processes that extract segments of the input with processes that compare across these extracted items. In this framework, the items extracted from the input serve as exemplars in long-term memory. The similarity structure of those exemplars in long-term memory leads to the discovery of cues and categorical structure, which guides subsequent extraction. The extraction and integration framework provides a way to explain sensitivity to both conditional statistical structure (such as transitional probabilities) and distributional statistical structure (such as item frequency and variability), and also a framework for thinking about how these different aspects of statistical learning influence each other. 2013 APA, all rights reserved
Structure-Specific Statistical Mapping of White Matter Tracts

PubMed Central

Yushkevich, Paul A.; Zhang, Hui; Simon, Tony; Gee, James C.

2008-01-01

We present a new model-based framework for the statistical analysis of diffusion imaging data associated with specific white matter tracts. The framework takes advantage of the fact that several of the major white matter tracts are thin sheet-like structures that can be effectively modeled by medial representations. The approach involves segmenting major tracts and fitting them with deformable geometric medial models. The medial representation makes it possible to average and combine tensor-based features along directions locally perpendicular to the tracts, thus reducing data dimensionality and accounting for errors in normalization. The framework enables the analysis of individual white matter structures, and provides a range of possibilities for computing statistics and visualizing differences between cohorts. The framework is demonstrated in a study of white matter differences in pediatric chromosome 22q11.2 deletion syndrome. PMID:18407524
How Does Teacher Knowledge in Statistics Impact on Teacher Listening?

ERIC Educational Resources Information Center

Burgess, Tim

2012-01-01

For teaching statistics investigations at primary school level, teacher knowledge has been identified using a framework developed from a classroom based study. Through development of the framework, three types of teacher listening problems were identified, each of which had potential impact on the students' learning. The three types of problems…
A data fusion framework for meta-evaluation of intelligent transportation system effectiveness

DOT National Transportation Integrated Search

This study presents a framework for the meta-evaluation of Intelligent Transportation System effectiveness. The framework is based on data fusion approaches that adjust for data biases and violations of other standard statistical assumptions. Operati...
Mourning dove hunting regulation strategy based on annual harvest statistics and banding data

USGS Publications Warehouse

Otis, D.L.

2006-01-01

Although managers should strive to base game bird harvest management strategies on mechanistic population models, monitoring programs required to build and continuously update these models may not be in place. Alternatively, If estimates of total harvest and harvest rates are available, then population estimates derived from these harvest data can serve as the basis for making hunting regulation decisions based on population growth rates derived from these estimates. I present a statistically rigorous approach for regulation decision-making using a hypothesis-testing framework and an assumed framework of 3 hunting regulation alternatives. I illustrate and evaluate the technique with historical data on the mid-continent mallard (Anas platyrhynchos) population. I evaluate the statistical properties of the hypothesis-testing framework using the best available data on mourning doves (Zenaida macroura). I use these results to discuss practical implementation of the technique as an interim harvest strategy for mourning doves until reliable mechanistic population models and associated monitoring programs are developed.
Scene-based nonuniformity correction and enhancement: pixel statistics and subpixel motion.

PubMed

Zhao, Wenyi; Zhang, Chao

2008-07-01

We propose a framework for scene-based nonuniformity correction (NUC) and nonuniformity correction and enhancement (NUCE) that is required for focal-plane array-like sensors to obtain clean and enhanced-quality images. The core of the proposed framework is a novel registration-based nonuniformity correction super-resolution (NUCSR) method that is bootstrapped by statistical scene-based NUC methods. Based on a comprehensive imaging model and an accurate parametric motion estimation, we are able to remove severe/structured nonuniformity and in the presence of subpixel motion to simultaneously improve image resolution. One important feature of our NUCSR method is the adoption of a parametric motion model that allows us to (1) handle many practical scenarios where parametric motions are present and (2) carry out perfect super-resolution in principle by exploring available subpixel motions. Experiments with real data demonstrate the efficiency of the proposed NUCE framework and the effectiveness of the NUCSR method.
Library Statistical Data Base Formats and Definitions.

ERIC Educational Resources Information Center

Jones, Dennis; And Others

Represented are the detailed set of data structures relevant to the categorization of information, terminology, and definitions employed in the design of the library statistical data base. The data base, or management information system, provides administrators with a framework of information and standardized data for library management, planning,…
Some Statistics for Assessing Person-Fit Based on Continuous-Response Models

ERIC Educational Resources Information Center

Ferrando, Pere Joan

2010-01-01

This article proposes several statistics for assessing individual fit based on two unidimensional models for continuous responses: linear factor analysis and Samejima's continuous response model. Both models are approached using a common framework based on underlying response variables and are formulated at the individual level as fixed regression…
Commentary to Library Statistical Data Base.

ERIC Educational Resources Information Center

Jones, Dennis; And Others

The National Center for Higher Education Management Systems (NCHEMS) has developed a library statistical data base which concentrates on the management information needs of administrators of public and academic libraries. This document provides an overview of the framework and conceptual approach employed in the design of the data base. The data…
SOCR: Statistics Online Computational Resource

PubMed Central

Dinov, Ivo D.

2011-01-01

The need for hands-on computer laboratory experience in undergraduate and graduate statistics education has been firmly established in the past decade. As a result a number of attempts have been undertaken to develop novel approaches for problem-driven statistical thinking, data analysis and result interpretation. In this paper we describe an integrated educational web-based framework for: interactive distribution modeling, virtual online probability experimentation, statistical data analysis, visualization and integration. Following years of experience in statistical teaching at all college levels using established licensed statistical software packages, like STATA, S-PLUS, R, SPSS, SAS, Systat, etc., we have attempted to engineer a new statistics education environment, the Statistics Online Computational Resource (SOCR). This resource performs many of the standard types of statistical analysis, much like other classical tools. In addition, it is designed in a plug-in object-oriented architecture and is completely platform independent, web-based, interactive, extensible and secure. Over the past 4 years we have tested, fine-tuned and reanalyzed the SOCR framework in many of our undergraduate and graduate probability and statistics courses and have evidence that SOCR resources build student’s intuition and enhance their learning. PMID:21451741
A Survey of Statistical Models for Reverse Engineering Gene Regulatory Networks

PubMed Central

Huang, Yufei; Tienda-Luna, Isabel M.; Wang, Yufeng

2009-01-01

Statistical models for reverse engineering gene regulatory networks are surveyed in this article. To provide readers with a system-level view of the modeling issues in this research, a graphical modeling framework is proposed. This framework serves as the scaffolding on which the review of different models can be systematically assembled. Based on the framework, we review many existing models for many aspects of gene regulation; the pros and cons of each model are discussed. In addition, network inference algorithms are also surveyed under the graphical modeling framework by the categories of point solutions and probabilistic solutions and the connections and differences among the algorithms are provided. This survey has the potential to elucidate the development and future of reverse engineering GRNs and bring statistical signal processing closer to the core of this research. PMID:20046885
Learning Axes and Bridging Tools in a Technology-Based Design for Statistics

ERIC Educational Resources Information Center

Abrahamson, Dor; Wilensky, Uri

2007-01-01

We introduce a design-based research framework, "learning axes and bridging tools," and demonstrate its application in the preparation and study of an implementation of a middle-school experimental computer-based unit on probability and statistics, "ProbLab" (Probability Laboratory, Abrahamson and Wilensky 2002 [Abrahamson, D., & Wilensky, U.…
Inferring Demographic History Using Two-Locus Statistics.

PubMed

Ragsdale, Aaron P; Gutenkunst, Ryan N

2017-06-01

Population demographic history may be learned from contemporary genetic variation data. Methods based on aggregating the statistics of many single loci into an allele frequency spectrum (AFS) have proven powerful, but such methods ignore potentially informative patterns of linkage disequilibrium (LD) between neighboring loci. To leverage such patterns, we developed a composite-likelihood framework for inferring demographic history from aggregated statistics of pairs of loci. Using this framework, we show that two-locus statistics are more sensitive to demographic history than single-locus statistics such as the AFS. In particular, two-locus statistics escape the notorious confounding of depth and duration of a bottleneck, and they provide a means to estimate effective population size based on the recombination rather than mutation rate. We applied our approach to a Zambian population of Drosophila melanogaster Notably, using both single- and two-locus statistics, we inferred a substantially lower ancestral effective population size than previous works and did not infer a bottleneck history. Together, our results demonstrate the broad potential for two-locus statistics to enable powerful population genetic inference. Copyright © 2017 by the Genetics Society of America.
On effectiveness of network sensor-based defense framework

NASA Astrophysics Data System (ADS)

Zhang, Difan; Zhang, Hanlin; Ge, Linqiang; Yu, Wei; Lu, Chao; Chen, Genshe; Pham, Khanh

2012-06-01

Cyber attacks are increasing in frequency, impact, and complexity, which demonstrate extensive network vulnerabilities with the potential for serious damage. Defending against cyber attacks calls for the distributed collaborative monitoring, detection, and mitigation. To this end, we develop a network sensor-based defense framework, with the aim of handling network security awareness, mitigation, and prediction. We implement the prototypical system and show its effectiveness on detecting known attacks, such as port-scanning and distributed denial-of-service (DDoS). Based on this framework, we also implement the statistical-based detection and sequential testing-based detection techniques and compare their respective detection performance. The future implementation of defensive algorithms can be provisioned in our proposed framework for combating cyber attacks.
FORESEE: Fully Outsourced secuRe gEnome Study basEd on homomorphic Encryption

PubMed Central

2015-01-01

Background The increasing availability of genome data motivates massive research studies in personalized treatment and precision medicine. Public cloud services provide a flexible way to mitigate the storage and computation burden in conducting genome-wide association studies (GWAS). However, data privacy has been widely concerned when sharing the sensitive information in a cloud environment. Methods We presented a novel framework (FORESEE: Fully Outsourced secuRe gEnome Study basEd on homomorphic Encryption) to fully outsource GWAS (i.e., chi-square statistic computation) using homomorphic encryption. The proposed framework enables secure divisions over encrypted data. We introduced two division protocols (i.e., secure errorless division and secure approximation division) with a trade-off between complexity and accuracy in computing chi-square statistics. Results The proposed framework was evaluated for the task of chi-square statistic computation with two case-control datasets from the 2015 iDASH genome privacy protection challenge. Experimental results show that the performance of FORESEE can be significantly improved through algorithmic optimization and parallel computation. Remarkably, the secure approximation division provides significant performance gain, but without missing any significance SNPs in the chi-square association test using the aforementioned datasets. Conclusions Unlike many existing HME based studies, in which final results need to be computed by the data owner due to the lack of the secure division operation, the proposed FORESEE framework support complete outsourcing to the cloud and output the final encrypted chi-square statistics. PMID:26733391
FORESEE: Fully Outsourced secuRe gEnome Study basEd on homomorphic Encryption.

PubMed

Zhang, Yuchen; Dai, Wenrui; Jiang, Xiaoqian; Xiong, Hongkai; Wang, Shuang

2015-01-01

The increasing availability of genome data motivates massive research studies in personalized treatment and precision medicine. Public cloud services provide a flexible way to mitigate the storage and computation burden in conducting genome-wide association studies (GWAS). However, data privacy has been widely concerned when sharing the sensitive information in a cloud environment. We presented a novel framework (FORESEE: Fully Outsourced secuRe gEnome Study basEd on homomorphic Encryption) to fully outsource GWAS (i.e., chi-square statistic computation) using homomorphic encryption. The proposed framework enables secure divisions over encrypted data. We introduced two division protocols (i.e., secure errorless division and secure approximation division) with a trade-off between complexity and accuracy in computing chi-square statistics. The proposed framework was evaluated for the task of chi-square statistic computation with two case-control datasets from the 2015 iDASH genome privacy protection challenge. Experimental results show that the performance of FORESEE can be significantly improved through algorithmic optimization and parallel computation. Remarkably, the secure approximation division provides significant performance gain, but without missing any significance SNPs in the chi-square association test using the aforementioned datasets. Unlike many existing HME based studies, in which final results need to be computed by the data owner due to the lack of the secure division operation, the proposed FORESEE framework support complete outsourcing to the cloud and output the final encrypted chi-square statistics.

Development and Demonstration of a Statistical Data Base System for Library and Network Planning and Evaluation. Fourth Quarterly Report.

ERIC Educational Resources Information Center

Jones, Dennis; And Others

The National Center for Higher Education Management Systems (NCHEMS) has completed the development and demonstration of a library statistical data base. The data base, or management information system, was developed for administrators of public and academic libraries. The system provides administrators with a framework of information and…
A Modeling Framework for Optimal Computational Resource Allocation Estimation: Considering the Trade-offs between Physical Resolutions, Uncertainty and Computational Costs

NASA Astrophysics Data System (ADS)

Moslehi, M.; de Barros, F.; Rajagopal, R.

2014-12-01

Hydrogeological models that represent flow and transport in subsurface domains are usually large-scale with excessive computational complexity and uncertain characteristics. Uncertainty quantification for predicting flow and transport in heterogeneous formations often entails utilizing a numerical Monte Carlo framework, which repeatedly simulates the model according to a random field representing hydrogeological characteristics of the field. The physical resolution (e.g. grid resolution associated with the physical space) for the simulation is customarily chosen based on recommendations in the literature, independent of the number of Monte Carlo realizations. This practice may lead to either excessive computational burden or inaccurate solutions. We propose an optimization-based methodology that considers the trade-off between the following conflicting objectives: time associated with computational costs, statistical convergence of the model predictions and physical errors corresponding to numerical grid resolution. In this research, we optimally allocate computational resources by developing a modeling framework for the overall error based on a joint statistical and numerical analysis and optimizing the error model subject to a given computational constraint. The derived expression for the overall error explicitly takes into account the joint dependence between the discretization error of the physical space and the statistical error associated with Monte Carlo realizations. The accuracy of the proposed framework is verified in this study by applying it to several computationally extensive examples. Having this framework at hand aims hydrogeologists to achieve the optimum physical and statistical resolutions to minimize the error with a given computational budget. Moreover, the influence of the available computational resources and the geometric properties of the contaminant source zone on the optimum resolutions are investigated. We conclude that the computational cost associated with optimal allocation can be substantially reduced compared with prevalent recommendations in the literature.
Teaching Statistics with Technology

ERIC Educational Resources Information Center

Prodromou, Theodosia

2015-01-01

The Technological Pedagogical Content Knowledge (TPACK) conceptual framework for teaching mathematics, developed by Mishra and Koehler (2006), emphasises the importance of developing integrated and interdependent understanding of three primary forms of knowledge: technology, pedagogy, and content. The TPACK conceptual framework is based upon the…
Zubarev's Nonequilibrium Statistical Operator Method in the Generalized Statistics of Multiparticle Systems

NASA Astrophysics Data System (ADS)

Glushak, P. A.; Markiv, B. B.; Tokarchuk, M. V.

2018-01-01

We present a generalization of Zubarev's nonequilibrium statistical operator method based on the principle of maximum Renyi entropy. In the framework of this approach, we obtain transport equations for the basic set of parameters of the reduced description of nonequilibrium processes in a classical system of interacting particles using Liouville equations with fractional derivatives. For a classical systems of particles in a medium with a fractal structure, we obtain a non-Markovian diffusion equation with fractional spatial derivatives. For a concrete model of the frequency dependence of a memory function, we obtain generalized Kettano-type diffusion equation with the spatial and temporal fractality taken into account. We present a generalization of nonequilibrium thermofield dynamics in Zubarev's nonequilibrium statistical operator method in the framework of Renyi statistics.
A framework for medical image retrieval using machine learning and statistical similarity matching techniques with relevance feedback.

PubMed

Rahman, Md Mahmudur; Bhattacharya, Prabir; Desai, Bipin C

2007-01-01

A content-based image retrieval (CBIR) framework for diverse collection of medical images of different imaging modalities, anatomic regions with different orientations and biological systems is proposed. Organization of images in such a database (DB) is well defined with predefined semantic categories; hence, it can be useful for category-specific searching. The proposed framework consists of machine learning methods for image prefiltering, similarity matching using statistical distance measures, and a relevance feedback (RF) scheme. To narrow down the semantic gap and increase the retrieval efficiency, we investigate both supervised and unsupervised learning techniques to associate low-level global image features (e.g., color, texture, and edge) in the projected PCA-based eigenspace with their high-level semantic and visual categories. Specially, we explore the use of a probabilistic multiclass support vector machine (SVM) and fuzzy c-mean (FCM) clustering for categorization and prefiltering of images to reduce the search space. A category-specific statistical similarity matching is proposed in a finer level on the prefiltered images. To incorporate a better perception subjectivity, an RF mechanism is also added to update the query parameters dynamically and adjust the proposed matching functions. Experiments are based on a ground-truth DB consisting of 5000 diverse medical images of 20 predefined categories. Analysis of results based on cross-validation (CV) accuracy and precision-recall for image categorization and retrieval is reported. It demonstrates the improvement, effectiveness, and efficiency achieved by the proposed framework.
Detecting Statistically Significant Communities of Triangle Motifs in Undirected Networks

DTIC Science & Technology

2016-04-26

REPORT TYPE Final 3. DATES COVERED (From - To) 15 Oct 2014 to 14 Jan 2015 4. TITLE AND SUBTITLE Detecting statistically significant clusters of...extend the work of Perry et al. [6] by developing a statistical framework that supports the detection of triangle motif-based clusters in complex...priori, the need for triangle motif-based clustering . 2. Developed an algorithm for clustering undirected networks, where the triangle con guration was
Statistical framework for detection of genetically modified organisms based on Next Generation Sequencing.

PubMed

Willems, Sander; Fraiture, Marie-Alice; Deforce, Dieter; De Keersmaecker, Sigrid C J; De Loose, Marc; Ruttink, Tom; Herman, Philippe; Van Nieuwerburgh, Filip; Roosens, Nancy

2016-02-01

Because the number and diversity of genetically modified (GM) crops has significantly increased, their analysis based on real-time PCR (qPCR) methods is becoming increasingly complex and laborious. While several pioneers already investigated Next Generation Sequencing (NGS) as an alternative to qPCR, its practical use has not been assessed for routine analysis. In this study a statistical framework was developed to predict the number of NGS reads needed to detect transgene sequences, to prove their integration into the host genome and to identify the specific transgene event in a sample with known composition. This framework was validated by applying it to experimental data from food matrices composed of pure GM rice, processed GM rice (noodles) or a 10% GM/non-GM rice mixture, revealing some influential factors. Finally, feasibility of NGS for routine analysis of GM crops was investigated by applying the framework to samples commonly encountered in routine analysis of GM crops. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.
EFFECTS-BASED CUMULATIVE RISK ASSESSMENT IN A LOW-INCOME URBAN COMMUNITY NEAR A SUPERFUND SITE

EPA Science Inventory

We will introduce into the cumulative risk assessment framework novel methods for non-cancer risk assessment, techniques for dose-response modeling that extend insights from chemical mixtures frameworks to non-chemical stressors, multilevel statistical methods used to address ...
A consistent framework for Horton regression statistics that leads to a modified Hack's law

USGS Publications Warehouse

Furey, P.R.; Troutman, B.M.

2008-01-01

A statistical framework is introduced that resolves important problems with the interpretation and use of traditional Horton regression statistics. The framework is based on a univariate regression model that leads to an alternative expression for Horton ratio, connects Horton regression statistics to distributional simple scaling, and improves the accuracy in estimating Horton plot parameters. The model is used to examine data for drainage area A and mainstream length L from two groups of basins located in different physiographic settings. Results show that confidence intervals for the Horton plot regression statistics are quite wide. Nonetheless, an analysis of covariance shows that regression intercepts, but not regression slopes, can be used to distinguish between basin groups. The univariate model is generalized to include n > 1 dependent variables. For the case where the dependent variables represent ln A and ln L, the generalized model performs somewhat better at distinguishing between basin groups than two separate univariate models. The generalized model leads to a modification of Hack's law where L depends on both A and Strahler order ??. Data show that ?? plays a statistically significant role in the modified Hack's law expression. ?? 2008 Elsevier B.V.
Fracture behavior of metal-ceramic fixed dental prostheses with frameworks from cast or a newly developed sintered cobalt-chromium alloy.

PubMed

Krug, Klaus-Peter; Knauber, Andreas W; Nothdurft, Frank P

2015-03-01

The aim of this study was to investigate the fracture behavior of metal-ceramic bridges with frameworks from cobalt-chromium-molybdenum (CoCrMo), which are manufactured using conventional casting or a new computer-aided design/computer-aided manufacturing (CAD/CAM) milling and sintering technique. A total of 32 metal-ceramic fixed dental prostheses (FDPs), which are based on a nonprecious metal framework, was produced using a conventional casting process (n = 16) or a new CAD/CAM milling and sintering process (n = 16). Eight unveneered frameworks were manufactured using each of the techniques. After thermal and mechanical aging of half of the restorations, all samples were subjected to a static loading test in a universal testing machine, in which acoustic emission monitoring was performed. Three different critical forces were revealed: the fracture force (F max), the force at the first reduction in force (F decr1), and the force at the critical acoustic event (F acoust1). With the exception of the veneered restorations with cast or sintered metal frameworks without artificial aging, which presented a statistically significant but slightly different F max, no statistically significant differences between cast and CAD/CAM sintered and milled FDPs were detected. Thermal and mechanical loading did not significantly affect the resulting forces. Cast and CAD/CAM milled and sintered metal-ceramic bridges were determined to be comparable with respect to the fracture behavior. FDPs based on CAD/CAM milled and sintered frameworks may be an applicable and less technique-sensitive alternative to frameworks that are based on conventionally cast frameworks.
A framework for conducting mechanistic based reliability assessments of components operating in complex systems

NASA Astrophysics Data System (ADS)

Wallace, Jon Michael

2003-10-01

Reliability prediction of components operating in complex systems has historically been conducted in a statistically isolated manner. Current physics-based, i.e. mechanistic, component reliability approaches focus more on component-specific attributes and mathematical algorithms and not enough on the influence of the system. The result is that significant error can be introduced into the component reliability assessment process. The objective of this study is the development of a framework that infuses the needs and influence of the system into the process of conducting mechanistic-based component reliability assessments. The formulated framework consists of six primary steps. The first three steps, identification, decomposition, and synthesis, are primarily qualitative in nature and employ system reliability and safety engineering principles to construct an appropriate starting point for the component reliability assessment. The following two steps are the most unique. They involve a step to efficiently characterize and quantify the system-driven local parameter space and a subsequent step using this information to guide the reduction of the component parameter space. The local statistical space quantification step is accomplished using two proposed multivariate probability models: Multi-Response First Order Second Moment and Taylor-Based Inverse Transformation. Where existing joint probability models require preliminary distribution and correlation information of the responses, these models combine statistical information of the input parameters with an efficient sampling of the response analyses to produce the multi-response joint probability distribution. Parameter space reduction is accomplished using Approximate Canonical Correlation Analysis (ACCA) employed as a multi-response screening technique. The novelty of this approach is that each individual local parameter and even subsets of parameters representing entire contributing analyses can now be rank ordered with respect to their contribution to not just one response, but the entire vector of component responses simultaneously. The final step of the framework is the actual probabilistic assessment of the component. Although the same multivariate probability tools employed in the characterization step can be used for the component probability assessment, variations of this final step are given to allow for the utilization of existing probabilistic methods such as response surface Monte Carlo and Fast Probability Integration. The overall framework developed in this study is implemented to assess the finite-element based reliability prediction of a gas turbine airfoil involving several failure responses. Results of this implementation are compared to results generated using the conventional 'isolated' approach as well as a validation approach conducted through large sample Monte Carlo simulations. The framework resulted in a considerable improvement to the accuracy of the part reliability assessment and an improved understanding of the component failure behavior. Considerable statistical complexity in the form of joint non-normal behavior was found and accounted for using the framework. Future applications of the framework elements are discussed.
A Statistical Framework for the Functional Analysis of Metagenomes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sharon, Itai; Pati, Amrita; Markowitz, Victor

2008-10-01

Metagenomic studies consider the genetic makeup of microbial communities as a whole, rather than their individual member organisms. The functional and metabolic potential of microbial communities can be analyzed by comparing the relative abundance of gene families in their collective genomic sequences (metagenome) under different conditions. Such comparisons require accurate estimation of gene family frequencies. They present a statistical framework for assessing these frequencies based on the Lander-Waterman theory developed originally for Whole Genome Shotgun (WGS) sequencing projects. They also provide a novel method for assessing the reliability of the estimations which can be used for removing seemingly unreliable measurements.more » They tested their method on a wide range of datasets, including simulated genomes and real WGS data from sequencing projects of whole genomes. Results suggest that their framework corrects inherent biases in accepted methods and provides a good approximation to the true statistics of gene families in WGS projects.« less
A statistical framework for biomedical literature mining.

PubMed

Chung, Dongjun; Lawson, Andrew; Zheng, W Jim

2017-09-30

In systems biology, it is of great interest to identify new genes that were not previously reported to be associated with biological pathways related to various functions and diseases. Identification of these new pathway-modulating genes does not only promote understanding of pathway regulation mechanisms but also allow identification of novel targets for therapeutics. Recently, biomedical literature has been considered as a valuable resource to investigate pathway-modulating genes. While the majority of currently available approaches are based on the co-occurrence of genes within an abstract, it has been reported that these approaches show only sub-optimal performances because 70% of abstracts contain information only for a single gene. To overcome such limitation, we propose a novel statistical framework based on the concept of ontology fingerprint that uses gene ontology to extract information from large biomedical literature data. The proposed framework simultaneously identifies pathway-modulating genes and facilitates interpreting functions of these new genes. We also propose a computationally efficient posterior inference procedure based on Metropolis-Hastings within Gibbs sampler for parameter updates and the poor man's reversible jump Markov chain Monte Carlo approach for model selection. We evaluate the proposed statistical framework with simulation studies, experimental validation, and an application to studies of pathway-modulating genes in yeast. The R implementation of the proposed model is currently available at https://dongjunchung.github.io/bayesGO/. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.
Genomic similarity and kernel methods I: advancements by building on mathematical and statistical foundations.

PubMed

Schaid, Daniel J

2010-01-01

Measures of genomic similarity are the basis of many statistical analytic methods. We review the mathematical and statistical basis of similarity methods, particularly based on kernel methods. A kernel function converts information for a pair of subjects to a quantitative value representing either similarity (larger values meaning more similar) or distance (smaller values meaning more similar), with the requirement that it must create a positive semidefinite matrix when applied to all pairs of subjects. This review emphasizes the wide range of statistical methods and software that can be used when similarity is based on kernel methods, such as nonparametric regression, linear mixed models and generalized linear mixed models, hierarchical models, score statistics, and support vector machines. The mathematical rigor for these methods is summarized, as is the mathematical framework for making kernels. This review provides a framework to move from intuitive and heuristic approaches to define genomic similarities to more rigorous methods that can take advantage of powerful statistical modeling and existing software. A companion paper reviews novel approaches to creating kernels that might be useful for genomic analyses, providing insights with examples [1]. Copyright © 2010 S. Karger AG, Basel.
Using connectivity to identify climatic drivers of local adaptation: a response to Macdonald et al.

PubMed

Prunier, Jérôme G; Blanchet, Simon

2018-04-30

Macdonald et al. (Ecol. Lett., 21, 2018, 207-216) proposed an analytical framework for identifying evolutionary processes underlying trait-environment relationships observed in natural populations. Here, we propose an expanded and refined framework based on simulations and bootstrap-based approaches, and we elaborate on an important statistical caveat common to most datasets. © 2018 John Wiley & Sons Ltd/CNRS.
A general science-based framework for dynamical spatio-temporal models

USGS Publications Warehouse

Wikle, C.K.; Hooten, M.B.

2010-01-01

Spatio-temporal statistical models are increasingly being used across a wide variety of scientific disciplines to describe and predict spatially-explicit processes that evolve over time. Correspondingly, in recent years there has been a significant amount of research on new statistical methodology for such models. Although descriptive models that approach the problem from the second-order (covariance) perspective are important, and innovative work is being done in this regard, many real-world processes are dynamic, and it can be more efficient in some cases to characterize the associated spatio-temporal dependence by the use of dynamical models. The chief challenge with the specification of such dynamical models has been related to the curse of dimensionality. Even in fairly simple linear, first-order Markovian, Gaussian error settings, statistical models are often over parameterized. Hierarchical models have proven invaluable in their ability to deal to some extent with this issue by allowing dependency among groups of parameters. In addition, this framework has allowed for the specification of science based parameterizations (and associated prior distributions) in which classes of deterministic dynamical models (e. g., partial differential equations (PDEs), integro-difference equations (IDEs), matrix models, and agent-based models) are used to guide specific parameterizations. Most of the focus for the application of such models in statistics has been in the linear case. The problems mentioned above with linear dynamic models are compounded in the case of nonlinear models. In this sense, the need for coherent and sensible model parameterizations is not only helpful, it is essential. Here, we present an overview of a framework for incorporating scientific information to motivate dynamical spatio-temporal models. First, we illustrate the methodology with the linear case. We then develop a general nonlinear spatio-temporal framework that we call general quadratic nonlinearity and demonstrate that it accommodates many different classes of scientific-based parameterizations as special cases. The model is presented in a hierarchical Bayesian framework and is illustrated with examples from ecology and oceanography. ?? 2010 Sociedad de Estad??stica e Investigaci??n Operativa.
Statistically optimal perception and learning: from behavior to neural representations

PubMed Central

Fiser, József; Berkes, Pietro; Orbán, Gergő; Lengyel, Máté

2010-01-01

Human perception has recently been characterized as statistical inference based on noisy and ambiguous sensory inputs. Moreover, suitable neural representations of uncertainty have been identified that could underlie such probabilistic computations. In this review, we argue that learning an internal model of the sensory environment is another key aspect of the same statistical inference procedure and thus perception and learning need to be treated jointly. We review evidence for statistically optimal learning in humans and animals, and reevaluate possible neural representations of uncertainty based on their potential to support statistically optimal learning. We propose that spontaneous activity can have a functional role in such representations leading to a new, sampling-based, framework of how the cortex represents information and uncertainty. PMID:20153683
Visualizing Teacher Education as a Complex System: A Nested Simplex System Approach

ERIC Educational Resources Information Center

Ludlow, Larry; Ell, Fiona; Cochran-Smith, Marilyn; Newton, Avery; Trefcer, Kaitlin; Klein, Kelsey; Grudnoff, Lexie; Haigh, Mavis; Hill, Mary F.

2017-01-01

Our purpose is to provide an exploratory statistical representation of initial teacher education as a complex system comprised of dynamic influential elements. More precisely, we reveal what the system looks like for differently-positioned teacher education stakeholders based on our framework for gathering, statistically analyzing, and graphically…
General Framework for Meta-analysis of Rare Variants in Sequencing Association Studies

PubMed Central

Lee, Seunggeun; Teslovich, Tanya M.; Boehnke, Michael; Lin, Xihong

2013-01-01

We propose a general statistical framework for meta-analysis of gene- or region-based multimarker rare variant association tests in sequencing association studies. In genome-wide association studies, single-marker meta-analysis has been widely used to increase statistical power by combining results via regression coefficients and standard errors from different studies. In analysis of rare variants in sequencing studies, region-based multimarker tests are often used to increase power. We propose meta-analysis methods for commonly used gene- or region-based rare variants tests, such as burden tests and variance component tests. Because estimation of regression coefficients of individual rare variants is often unstable or not feasible, the proposed method avoids this difficulty by calculating score statistics instead that only require fitting the null model for each study and then aggregating these score statistics across studies. Our proposed meta-analysis rare variant association tests are conducted based on study-specific summary statistics, specifically score statistics for each variant and between-variant covariance-type (linkage disequilibrium) relationship statistics for each gene or region. The proposed methods are able to incorporate different levels of heterogeneity of genetic effects across studies and are applicable to meta-analysis of multiple ancestry groups. We show that the proposed methods are essentially as powerful as joint analysis by directly pooling individual level genotype data. We conduct extensive simulations to evaluate the performance of our methods by varying levels of heterogeneity across studies, and we apply the proposed methods to meta-analysis of rare variant effects in a multicohort study of the genetics of blood lipid levels. PMID:23768515
A segmentation editing framework based on shape change statistics

NASA Astrophysics Data System (ADS)

Mostapha, Mahmoud; Vicory, Jared; Styner, Martin; Pizer, Stephen

2017-02-01

Segmentation is a key task in medical image analysis because its accuracy significantly affects successive steps. Automatic segmentation methods often produce inadequate segmentations, which require the user to manually edit the produced segmentation slice by slice. Because editing is time-consuming, an editing tool that enables the user to produce accurate segmentations by only drawing a sparse set of contours would be needed. This paper describes such a framework as applied to a single object. Constrained by the additional information enabled by the manually segmented contours, the proposed framework utilizes object shape statistics to transform the failed automatic segmentation to a more accurate version. Instead of modeling the object shape, the proposed framework utilizes shape change statistics that were generated to capture the object deformation from the failed automatic segmentation to its corresponding correct segmentation. An optimization procedure was used to minimize an energy function that consists of two terms, an external contour match term and an internal shape change regularity term. The high accuracy of the proposed segmentation editing approach was confirmed by testing it on a simulated data set based on 10 in-vivo infant magnetic resonance brain data sets using four similarity metrics. Segmentation results indicated that our method can provide efficient and adequately accurate segmentations (Dice segmentation accuracy increase of 10%), with very sparse contours (only 10%), which is promising in greatly decreasing the work expected from the user.

Statistical Techniques For Real-time Anomaly Detection Using Spark Over Multi-source VMware Performance Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Solaimani, Mohiuddin; Iftekhar, Mohammed; Khan, Latifur

Anomaly detection refers to the identi cation of an irregular or unusual pat- tern which deviates from what is standard, normal, or expected. Such deviated patterns typically correspond to samples of interest and are assigned different labels in different domains, such as outliers, anomalies, exceptions, or malware. Detecting anomalies in fast, voluminous streams of data is a formidable chal- lenge. This paper presents a novel, generic, real-time distributed anomaly detection framework for heterogeneous streaming data where anomalies appear as a group. We have developed a distributed statistical approach to build a model and later use it to detect anomaly. Asmore » a case study, we investigate group anomaly de- tection for a VMware-based cloud data center, which maintains a large number of virtual machines (VMs). We have built our framework using Apache Spark to get higher throughput and lower data processing time on streaming data. We have developed a window-based statistical anomaly detection technique to detect anomalies that appear sporadically. We then relaxed this constraint with higher accuracy by implementing a cluster-based technique to detect sporadic and continuous anomalies. We conclude that our cluster-based technique out- performs other statistical techniques with higher accuracy and lower processing time.« less
New statistical scission-point model to predict fission fragment observables

NASA Astrophysics Data System (ADS)

Lemaître, Jean-François; Panebianco, Stefano; Sida, Jean-Luc; Hilaire, Stéphane; Heinrich, Sophie

2015-09-01

The development of high performance computing facilities makes possible a massive production of nuclear data in a full microscopic framework. Taking advantage of the individual potential calculations of more than 7000 nuclei, a new statistical scission-point model, called SPY, has been developed. It gives access to the absolute available energy at the scission point, which allows the use of a parameter-free microcanonical statistical description to calculate the distributions and the mean values of all fission observables. SPY uses the richness of microscopy in a rather simple theoretical framework, without any parameter except the scission-point definition, to draw clear answers based on perfect knowledge of the ingredients involved in the model, with very limited computing cost.
Measuring the Impacts of ICT Using Official Statistics. OECD Digital Economy Papers, No. 136

ERIC Educational Resources Information Center

Roberts, Sheridan

2008-01-01

This paper describes the findings of an OECD project examining ICT impact measurement and analyses based on official statistics. Both economic and social impacts are covered and some results are presented. It attempts to place ICT impacts measurement into an Information Society conceptual framework, provides some suggestions for standardising…
Interaction with Machine Improvisation

NASA Astrophysics Data System (ADS)

Assayag, Gerard; Bloch, George; Cont, Arshia; Dubnov, Shlomo

We describe two multi-agent architectures for an improvisation oriented musician-machine interaction systems that learn in real time from human performers. The improvisation kernel is based on sequence modeling and statistical learning. We present two frameworks of interaction with this kernel. In the first, the stylistic interaction is guided by a human operator in front of an interactive computer environment. In the second framework, the stylistic interaction is delegated to machine intelligence and therefore, knowledge propagation and decision are taken care of by the computer alone. The first framework involves a hybrid architecture using two popular composition/performance environments, Max and OpenMusic, that are put to work and communicate together, each one handling the process at a different time/memory scale. The second framework shares the same representational schemes with the first but uses an Active Learning architecture based on collaborative, competitive and memory-based learning to handle stylistic interactions. Both systems are capable of processing real-time audio/video as well as MIDI. After discussing the general cognitive background of improvisation practices, the statistical modelling tools and the concurrent agent architecture are presented. Then, an Active Learning scheme is described and considered in terms of using different improvisation regimes for improvisation planning. Finally, we provide more details about the different system implementations and describe several performances with the system.
Probabilistic models in human sensorimotor control

PubMed Central

Wolpert, Daniel M.

2009-01-01

Sensory and motor uncertainty form a fundamental constraint on human sensorimotor control. Bayesian decision theory (BDT) has emerged as a unifying framework to understand how the central nervous system performs optimal estimation and control in the face of such uncertainty. BDT has two components: Bayesian statistics and decision theory. Here we review Bayesian statistics and show how it applies to estimating the state of the world and our own body. Recent results suggest that when learning novel tasks we are able to learn the statistical properties of both the world and our own sensory apparatus so as to perform estimation using Bayesian statistics. We review studies which suggest that humans can combine multiple sources of information to form maximum likelihood estimates, can incorporate prior beliefs about possible states of the world so as to generate maximum a posteriori estimates and can use Kalman filter-based processes to estimate time-varying states. Finally, we review Bayesian decision theory in motor control and how the central nervous system processes errors to determine loss functions and optimal actions. We review results that suggest we plan movements based on statistics of our actions that result from signal-dependent noise on our motor outputs. Taken together these studies provide a statistical framework for how the motor system performs in the presence of uncertainty. PMID:17628731
A statistical framework for neuroimaging data analysis based on mutual information estimated via a gaussian copula

PubMed Central

Giordano, Bruno L.; Kayser, Christoph; Rousselet, Guillaume A.; Gross, Joachim; Schyns, Philippe G.

2016-01-01

Abstract We begin by reviewing the statistical framework of information theory as applicable to neuroimaging data analysis. A major factor hindering wider adoption of this framework in neuroimaging is the difficulty of estimating information theoretic quantities in practice. We present a novel estimation technique that combines the statistical theory of copulas with the closed form solution for the entropy of Gaussian variables. This results in a general, computationally efficient, flexible, and robust multivariate statistical framework that provides effect sizes on a common meaningful scale, allows for unified treatment of discrete, continuous, unidimensional and multidimensional variables, and enables direct comparisons of representations from behavioral and brain responses across any recording modality. We validate the use of this estimate as a statistical test within a neuroimaging context, considering both discrete stimulus classes and continuous stimulus features. We also present examples of analyses facilitated by these developments, including application of multivariate analyses to MEG planar magnetic field gradients, and pairwise temporal interactions in evoked EEG responses. We show the benefit of considering the instantaneous temporal derivative together with the raw values of M/EEG signals as a multivariate response, how we can separately quantify modulations of amplitude and direction for vector quantities, and how we can measure the emergence of novel information over time in evoked responses. Open‐source Matlab and Python code implementing the new methods accompanies this article. Hum Brain Mapp 38:1541–1573, 2017. © 2016 Wiley Periodicals, Inc. PMID:27860095
Efficient and Flexible Climate Analysis with Python in a Cloud-Based Distributed Computing Framework

NASA Astrophysics Data System (ADS)

Gannon, C.

2017-12-01

As climate models become progressively more advanced, and spatial resolution further improved through various downscaling projects, climate projections at a local level are increasingly insightful and valuable. However, the raw size of climate datasets presents numerous hurdles for analysts wishing to develop customized climate risk metrics or perform site-specific statistical analysis. Four Twenty Seven, a climate risk consultancy, has implemented a Python-based distributed framework to analyze large climate datasets in the cloud. With the freedom afforded by efficiently processing these datasets, we are able to customize and continually develop new climate risk metrics using the most up-to-date data. Here we outline our process for using Python packages such as XArray and Dask to evaluate netCDF files in a distributed framework, StarCluster to operate in a cluster-computing environment, cloud computing services to access publicly hosted datasets, and how this setup is particularly valuable for generating climate change indicators and performing localized statistical analysis.
Uncovering hidden heterogeneity: Geo-statistical models illuminate the fine scale effects of boating infrastructure on sediment characteristics and contaminants.

PubMed

Hedge, L H; Dafforn, K A; Simpson, S L; Johnston, E L

2017-06-30

Infrastructure associated with coastal communities is likely to not only directly displace natural systems, but also leave environmental footprints' that stretch over multiple scales. Some coastal infrastructure will, there- fore, generate a hidden layer of habitat heterogeneity in sediment systems that is not immediately observable in classical impact assessment frameworks. We examine the hidden heterogeneity associated with one of the most ubiquitous coastal modifications; dense swing moorings fields. Using a model based geo-statistical framework we highlight the variation in sedimentology throughout mooring fields and reference locations. Moorings were correlated with patches of sediment with larger particle sizes, and associated metal(loid) concentrations in these patches were depressed. Our work highlights two important ideas i) mooring fields create a mosaic of habitat in which contamination decreases and grain sizes increase close to moorings, and ii) model- based frameworks provide an information rich, easy-to-interpret way to communicate complex analyses to stakeholders. Crown Copyright © 2017. Published by Elsevier Ltd. All rights reserved.
Statistical mechanics framework for static granular matter.

PubMed

Henkes, Silke; Chakraborty, Bulbul

2009-06-01

The physical properties of granular materials have been extensively studied in recent years. So far, however, there exists no theoretical framework which can explain the observations in a unified manner beyond the phenomenological jamming diagram. This work focuses on the case of static granular matter, where we have constructed a statistical ensemble which mirrors equilibrium statistical mechanics. This ensemble, which is based on the conservation properties of the stress tensor, is distinct from the original Edwards ensemble and applies to packings of deformable grains. We combine it with a field theoretical analysis of the packings, where the field is the Airy stress function derived from the force and torque balance conditions. In this framework, Point J characterized by a diverging stiffness of the pressure fluctuations. Separately, we present a phenomenological mean-field theory of the jamming transition, which incorporates the mean contact number as a variable. We link both approaches in the context of the marginal rigidity picture proposed by Wyart and others.
A Framework for Establishing Standard Reference Scale of Texture by Multivariate Statistical Analysis Based on Instrumental Measurement and Sensory Evaluation.

PubMed

Zhi, Ruicong; Zhao, Lei; Xie, Nan; Wang, Houyin; Shi, Bolin; Shi, Jingye

2016-01-13

A framework of establishing standard reference scale (texture) is proposed by multivariate statistical analysis according to instrumental measurement and sensory evaluation. Multivariate statistical analysis is conducted to rapidly select typical reference samples with characteristics of universality, representativeness, stability, substitutability, and traceability. The reasonableness of the framework method is verified by establishing standard reference scale of texture attribute (hardness) with Chinese well-known food. More than 100 food products in 16 categories were tested using instrumental measurement (TPA test), and the result was analyzed with clustering analysis, principal component analysis, relative standard deviation, and analysis of variance. As a result, nine kinds of foods were determined to construct the hardness standard reference scale. The results indicate that the regression coefficient between the estimated sensory value and the instrumentally measured value is significant (R(2) = 0.9765), which fits well with Stevens's theory. The research provides reliable a theoretical basis and practical guide for quantitative standard reference scale establishment on food texture characteristics.
A Fast Framework for Abrupt Change Detection Based on Binary Search Trees and Kolmogorov Statistic

PubMed Central

Qi, Jin-Peng; Qi, Jie; Zhang, Qing

2016-01-01

Change-Point (CP) detection has attracted considerable attention in the fields of data mining and statistics; it is very meaningful to discuss how to quickly and efficiently detect abrupt change from large-scale bioelectric signals. Currently, most of the existing methods, like Kolmogorov-Smirnov (KS) statistic and so forth, are time-consuming, especially for large-scale datasets. In this paper, we propose a fast framework for abrupt change detection based on binary search trees (BSTs) and a modified KS statistic, named BSTKS (binary search trees and Kolmogorov statistic). In this method, first, two binary search trees, termed as BSTcA and BSTcD, are constructed by multilevel Haar Wavelet Transform (HWT); second, three search criteria are introduced in terms of the statistic and variance fluctuations in the diagnosed time series; last, an optimal search path is detected from the root to leaf nodes of two BSTs. The studies on both the synthetic time series samples and the real electroencephalograph (EEG) recordings indicate that the proposed BSTKS can detect abrupt change more quickly and efficiently than KS, t-statistic (t), and Singular-Spectrum Analyses (SSA) methods, with the shortest computation time, the highest hit rate, the smallest error, and the highest accuracy out of four methods. This study suggests that the proposed BSTKS is very helpful for useful information inspection on all kinds of bioelectric time series signals. PMID:27413364
A Fast Framework for Abrupt Change Detection Based on Binary Search Trees and Kolmogorov Statistic.

PubMed

Qi, Jin-Peng; Qi, Jie; Zhang, Qing

2016-01-01

Change-Point (CP) detection has attracted considerable attention in the fields of data mining and statistics; it is very meaningful to discuss how to quickly and efficiently detect abrupt change from large-scale bioelectric signals. Currently, most of the existing methods, like Kolmogorov-Smirnov (KS) statistic and so forth, are time-consuming, especially for large-scale datasets. In this paper, we propose a fast framework for abrupt change detection based on binary search trees (BSTs) and a modified KS statistic, named BSTKS (binary search trees and Kolmogorov statistic). In this method, first, two binary search trees, termed as BSTcA and BSTcD, are constructed by multilevel Haar Wavelet Transform (HWT); second, three search criteria are introduced in terms of the statistic and variance fluctuations in the diagnosed time series; last, an optimal search path is detected from the root to leaf nodes of two BSTs. The studies on both the synthetic time series samples and the real electroencephalograph (EEG) recordings indicate that the proposed BSTKS can detect abrupt change more quickly and efficiently than KS, t-statistic (t), and Singular-Spectrum Analyses (SSA) methods, with the shortest computation time, the highest hit rate, the smallest error, and the highest accuracy out of four methods. This study suggests that the proposed BSTKS is very helpful for useful information inspection on all kinds of bioelectric time series signals.
Steganalysis based on reducing the differences of image statistical characteristics

NASA Astrophysics Data System (ADS)

Wang, Ran; Niu, Shaozhang; Ping, Xijian; Zhang, Tao

2018-04-01

Compared with the process of embedding, the image contents make a more significant impact on the differences of image statistical characteristics. This makes the image steganalysis to be a classification problem with bigger withinclass scatter distances and smaller between-class scatter distances. As a result, the steganalysis features will be inseparate caused by the differences of image statistical characteristics. In this paper, a new steganalysis framework which can reduce the differences of image statistical characteristics caused by various content and processing methods is proposed. The given images are segmented to several sub-images according to the texture complexity. Steganalysis features are separately extracted from each subset with the same or close texture complexity to build a classifier. The final steganalysis result is figured out through a weighted fusing process. The theoretical analysis and experimental results can demonstrate the validity of the framework.
Integrated framework for developing search and discrimination metrics

NASA Astrophysics Data System (ADS)

Copeland, Anthony C.; Trivedi, Mohan M.

1997-06-01

This paper presents an experimental framework for evaluating target signature metrics as models of human visual search and discrimination. This framework is based on a prototype eye tracking testbed, the Integrated Testbed for Eye Movement Studies (ITEMS). ITEMS determines an observer's visual fixation point while he studies a displayed image scene, by processing video of the observer's eye. The utility of this framework is illustrated with an experiment using gray-scale images of outdoor scenes that contain randomly placed targets. Each target is a square region of a specific size containing pixel values from another image of an outdoor scene. The real-world analogy of this experiment is that of a military observer looking upon the sensed image of a static scene to find camouflaged enemy targets that are reported to be in the area. ITEMS provides the data necessary to compute various statistics for each target to describe how easily the observers located it, including the likelihood the target was fixated or identified and the time required to do so. The computed values of several target signature metrics are compared to these statistics, and a second-order metric based on a model of image texture was found to be the most highly correlated.
Stochastic Individual-Based Modeling of Bacterial Growth and Division Using Flow Cytometry.

PubMed

García, Míriam R; Vázquez, José A; Teixeira, Isabel G; Alonso, Antonio A

2017-01-01

A realistic description of the variability in bacterial growth and division is critical to produce reliable predictions of safety risks along the food chain. Individual-based modeling of bacteria provides the theoretical framework to deal with this variability, but it requires information about the individual behavior of bacteria inside populations. In this work, we overcome this problem by estimating the individual behavior of bacteria from population statistics obtained with flow cytometry. For this objective, a stochastic individual-based modeling framework is defined based on standard assumptions during division and exponential growth. The unknown single-cell parameters required for running the individual-based modeling simulations, such as cell size growth rate, are estimated from the flow cytometry data. Instead of using directly the individual-based model, we make use of a modified Fokker-Plank equation. This only equation simulates the population statistics in function of the unknown single-cell parameters. We test the validity of the approach by modeling the growth and division of Pediococcus acidilactici within the exponential phase. Estimations reveal the statistics of cell growth and division using only data from flow cytometry at a given time. From the relationship between the mother and daughter volumes, we also predict that P. acidilactici divide into two successive parallel planes.
RooStatsCms: A tool for analysis modelling, combination and statistical studies

NASA Astrophysics Data System (ADS)

Piparo, D.; Schott, G.; Quast, G.

2010-04-01

RooStatsCms is an object oriented statistical framework based on the RooFit technology. Its scope is to allow the modelling, statistical analysis and combination of multiple search channels for new phenomena in High Energy Physics. It provides a variety of methods described in literature implemented as classes, whose design is oriented to the execution of multiple CPU intensive jobs on batch systems or on the Grid.
A new framework to increase the efficiency of large-scale solar power plants.

NASA Astrophysics Data System (ADS)

Alimohammadi, Shahrouz; Kleissl, Jan P.

2015-11-01

A new framework to estimate the spatio-temporal behavior of solar power is introduced, which predicts the statistical behavior of power output at utility scale Photo-Voltaic (PV) power plants. The framework is based on spatio-temporal Gaussian Processes Regression (Kriging) models, which incorporates satellite data with the UCSD version of the Weather and Research Forecasting model. This framework is designed to improve the efficiency of the large-scale solar power plants. The results are also validated from measurements of the local pyranometer sensors, and some improvements in different scenarios are observed. Solar energy.
The dimensional salience solution to the expectancy-value muddle: an extension.

PubMed

Newton, Joshua D; Newton, Fiona J; Ewing, Michael T

2014-01-01

The theory of reasoned action (TRA) specifies a set of expectancy-value, belief-based frameworks that underpin attitude (behavioural beliefs × outcome evaluations) and subjective norm (normative beliefs × motivation to comply). Unfortunately, the most common method for analysing these frameworks generates statistically uninterpretable findings, resulting in what has been termed the 'expectancy-value muddle'. Recently, however, a dimensional salience approach was found to resolve this muddle for the belief-based framework underpinning attitude. An online survey of 262 participants was therefore conducted to determine whether the dimensional salience approach could also be applied to the belief-based framework underpinning subjective norm. Results revealed that motivations to comply were greater for salient, as opposed to non-salient, social referents. The belief-based framework underpinning subjective norm was therefore represented by evaluating normative belief ratings for salient social referents. This modified framework was found to predict subjective norm, although predictions were greater when participants were forced to select five salient social referents rather than being free to select any number of social referents. These findings validate the use of the dimensional salience approach for examining the belief-based frameworks underpinning subjective norm. As such, this approach provides a complete solution to addressing the expectancy-value muddle in the TRA.
Compromise-based Robust Prioritization of Climate Change Adaptation Strategies for Watershed Management

NASA Astrophysics Data System (ADS)

Kim, Y.; Chung, E. S.

2014-12-01

This study suggests a robust prioritization framework for climate change adaptation strategies under multiple climate change scenarios with a case study of selecting sites for reusing treated wastewater (TWW) in a Korean urban watershed. The framework utilizes various multi-criteria decision making techniques, including the VIKOR method and the Shannon entropy-based weights. In this case study, the sustainability of TWW use is quantified with indicator-based approaches with the DPSIR framework, which considers both hydro-environmental and socio-economic aspects of the watershed management. Under the various climate change scenarios, the hydro-environmental responses to reusing TWW in potential alternative sub-watersheds are determined using the Hydrologic Simulation Program in Fortran (HSPF). The socio-economic indicators are obtained from the statistical databases. Sustainability scores for multiple scenarios are estimated individually and then integrated with the proposed approach. At last, the suggested framework allows us to prioritize adaptation strategies in a robust manner with varying levels of compromise between utility-based and regret-based strategies.
A statistical framework for neuroimaging data analysis based on mutual information estimated via a gaussian copula.

PubMed

Ince, Robin A A; Giordano, Bruno L; Kayser, Christoph; Rousselet, Guillaume A; Gross, Joachim; Schyns, Philippe G

2017-03-01

We begin by reviewing the statistical framework of information theory as applicable to neuroimaging data analysis. A major factor hindering wider adoption of this framework in neuroimaging is the difficulty of estimating information theoretic quantities in practice. We present a novel estimation technique that combines the statistical theory of copulas with the closed form solution for the entropy of Gaussian variables. This results in a general, computationally efficient, flexible, and robust multivariate statistical framework that provides effect sizes on a common meaningful scale, allows for unified treatment of discrete, continuous, unidimensional and multidimensional variables, and enables direct comparisons of representations from behavioral and brain responses across any recording modality. We validate the use of this estimate as a statistical test within a neuroimaging context, considering both discrete stimulus classes and continuous stimulus features. We also present examples of analyses facilitated by these developments, including application of multivariate analyses to MEG planar magnetic field gradients, and pairwise temporal interactions in evoked EEG responses. We show the benefit of considering the instantaneous temporal derivative together with the raw values of M/EEG signals as a multivariate response, how we can separately quantify modulations of amplitude and direction for vector quantities, and how we can measure the emergence of novel information over time in evoked responses. Open-source Matlab and Python code implementing the new methods accompanies this article. Hum Brain Mapp 38:1541-1573, 2017. © 2016 Wiley Periodicals, Inc. 2016 The Authors Human Brain Mapping Published by Wiley Periodicals, Inc.

A New Mathematical Framework for Design Under Uncertainty

DTIC Science & Technology

2016-05-05

blending multiple information sources via auto-regressive stochastic modeling. A computationally efficient machine learning framework is developed based on...sion and machine learning approaches; see Fig. 1. This will lead to a comprehensive description of system performance with less uncertainty than in the...Bayesian optimization of super-cavitating hy- drofoils The goal of this study is to demonstrate the capabilities of statistical learning and
Optimal moment determination in POME-copula based hydrometeorological dependence modelling

NASA Astrophysics Data System (ADS)

Liu, Dengfeng; Wang, Dong; Singh, Vijay P.; Wang, Yuankun; Wu, Jichun; Wang, Lachun; Zou, Xinqing; Chen, Yuanfang; Chen, Xi

2017-07-01

Copula has been commonly applied in multivariate modelling in various fields where marginal distribution inference is a key element. To develop a flexible, unbiased mathematical inference framework in hydrometeorological multivariate applications, the principle of maximum entropy (POME) is being increasingly coupled with copula. However, in previous POME-based studies, determination of optimal moment constraints has generally not been considered. The main contribution of this study is the determination of optimal moments for POME for developing a coupled optimal moment-POME-copula framework to model hydrometeorological multivariate events. In this framework, margins (marginals, or marginal distributions) are derived with the use of POME, subject to optimal moment constraints. Then, various candidate copulas are constructed according to the derived margins, and finally the most probable one is determined, based on goodness-of-fit statistics. This optimal moment-POME-copula framework is applied to model the dependence patterns of three types of hydrometeorological events: (i) single-site streamflow-water level; (ii) multi-site streamflow; and (iii) multi-site precipitation, with data collected from Yichang and Hankou in the Yangtze River basin, China. Results indicate that the optimal-moment POME is more accurate in margin fitting and the corresponding copulas reflect a good statistical performance in correlation simulation. Also, the derived copulas, capturing more patterns which traditional correlation coefficients cannot reflect, provide an efficient way in other applied scenarios concerning hydrometeorological multivariate modelling.
Spatial Modeling for Resources Framework (SMRF): A modular framework for developing spatial forcing data in mountainous terrain

NASA Astrophysics Data System (ADS)

Havens, S.; Marks, D. G.; Kormos, P.; Hedrick, A. R.; Johnson, M.; Robertson, M.; Sandusky, M.

2017-12-01

In the Western US, operational water supply managers rely on statistical techniques to forecast the volume of water left to enter the reservoirs. As the climate changes and the demand increases for stored water utilized for irrigation, flood control, power generation, and ecosystem services, water managers have begun to move from statistical techniques towards using physically based models. To assist with the transition, a new open source framework was developed, the Spatial Modeling for Resources Framework (SMRF), to automate and simplify the most common forcing data distribution methods. SMRF is computationally efficient and can be implemented for both research and operational applications. Currently, SMRF is able to generate all of the forcing data required to run physically based snow or hydrologic models at 50-100 m resolution over regions of 500-10,000 km2, and has been successfully applied in real time and historical applications for the Boise River Basin in Idaho, USA, the Tuolumne River Basin and San Joaquin in California, USA, and Reynolds Creek Experimental Watershed in Idaho, USA. These applications use meteorological station measurements and numerical weather prediction model outputs as input data. SMRF has significantly streamlined the modeling workflow, decreased model set up time from weeks to days, and made near real-time application of physics-based snow and hydrologic models possible.
A data colocation grid framework for big data medical image processing: backend design

NASA Astrophysics Data System (ADS)

Bao, Shunxing; Huo, Yuankai; Parvathaneni, Prasanna; Plassard, Andrew J.; Bermudez, Camilo; Yao, Yuang; Lyu, Ilwoo; Gokhale, Aniruddha; Landman, Bennett A.

2018-03-01

When processing large medical imaging studies, adopting high performance grid computing resources rapidly becomes important. We recently presented a "medical image processing-as-a-service" grid framework that offers promise in utilizing the Apache Hadoop ecosystem and HBase for data colocation by moving computation close to medical image storage. However, the framework has not yet proven to be easy to use in a heterogeneous hardware environment. Furthermore, the system has not yet validated when considering variety of multi-level analysis in medical imaging. Our target design criteria are (1) improving the framework's performance in a heterogeneous cluster, (2) performing population based summary statistics on large datasets, and (3) introducing a table design scheme for rapid NoSQL query. In this paper, we present a heuristic backend interface application program interface (API) design for Hadoop and HBase for Medical Image Processing (HadoopBase-MIP). The API includes: Upload, Retrieve, Remove, Load balancer (for heterogeneous cluster) and MapReduce templates. A dataset summary statistic model is discussed and implemented by MapReduce paradigm. We introduce a HBase table scheme for fast data query to better utilize the MapReduce model. Briefly, 5153 T1 images were retrieved from a university secure, shared web database and used to empirically access an in-house grid with 224 heterogeneous CPU cores. Three empirical experiments results are presented and discussed: (1) load balancer wall-time improvement of 1.5-fold compared with a framework with built-in data allocation strategy, (2) a summary statistic model is empirically verified on grid framework and is compared with the cluster when deployed with a standard Sun Grid Engine (SGE), which reduces 8-fold of wall clock time and 14-fold of resource time, and (3) the proposed HBase table scheme improves MapReduce computation with 7 fold reduction of wall time compare with a naïve scheme when datasets are relative small. The source code and interfaces have been made publicly available.
A Data Colocation Grid Framework for Big Data Medical Image Processing: Backend Design.

PubMed

Bao, Shunxing; Huo, Yuankai; Parvathaneni, Prasanna; Plassard, Andrew J; Bermudez, Camilo; Yao, Yuang; Lyu, Ilwoo; Gokhale, Aniruddha; Landman, Bennett A

2018-03-01

When processing large medical imaging studies, adopting high performance grid computing resources rapidly becomes important. We recently presented a "medical image processing-as-a-service" grid framework that offers promise in utilizing the Apache Hadoop ecosystem and HBase for data colocation by moving computation close to medical image storage. However, the framework has not yet proven to be easy to use in a heterogeneous hardware environment. Furthermore, the system has not yet validated when considering variety of multi-level analysis in medical imaging. Our target design criteria are (1) improving the framework's performance in a heterogeneous cluster, (2) performing population based summary statistics on large datasets, and (3) introducing a table design scheme for rapid NoSQL query. In this paper, we present a heuristic backend interface application program interface (API) design for Hadoop & HBase for Medical Image Processing (HadoopBase-MIP). The API includes: Upload, Retrieve, Remove, Load balancer (for heterogeneous cluster) and MapReduce templates. A dataset summary statistic model is discussed and implemented by MapReduce paradigm. We introduce a HBase table scheme for fast data query to better utilize the MapReduce model. Briefly, 5153 T1 images were retrieved from a university secure, shared web database and used to empirically access an in-house grid with 224 heterogeneous CPU cores. Three empirical experiments results are presented and discussed: (1) load balancer wall-time improvement of 1.5-fold compared with a framework with built-in data allocation strategy, (2) a summary statistic model is empirically verified on grid framework and is compared with the cluster when deployed with a standard Sun Grid Engine (SGE), which reduces 8-fold of wall clock time and 14-fold of resource time, and (3) the proposed HBase table scheme improves MapReduce computation with 7 fold reduction of wall time compare with a naïve scheme when datasets are relative small. The source code and interfaces have been made publicly available.
Manifold parametrization of the left ventricle for a statistical modelling of its complete anatomy

NASA Astrophysics Data System (ADS)

Gil, D.; Garcia-Barnes, J.; Hernández-Sabate, A.; Marti, E.

2010-03-01

Distortion of Left Ventricle (LV) external anatomy is related to some dysfunctions, such as hypertrophy. The architecture of myocardial fibers determines LV electromechanical activation patterns as well as mechanics. Thus, their joined modelling would allow the design of specific interventions (such as peacemaker implantation and LV remodelling) and therapies (such as resynchronization). On one hand, accurate modelling of external anatomy requires either a dense sampling or a continuous infinite dimensional approach, which requires non-Euclidean statistics. On the other hand, computation of fiber models requires statistics on Riemannian spaces. Most approaches compute separate statistical models for external anatomy and fibers architecture. In this work we propose a general mathematical framework based on differential geometry concepts for computing a statistical model including, both, external and fiber anatomy. Our framework provides a continuous approach to external anatomy supporting standard statistics. We also provide a straightforward formula for the computation of the Riemannian fiber statistics. We have applied our methodology to the computation of complete anatomical atlas of canine hearts from diffusion tensor studies. The orientation of fibers over the average external geometry agrees with the segmental description of orientations reported in the literature.
Statistical estimation via convex optimization for trending and performance monitoring

NASA Astrophysics Data System (ADS)

Samar, Sikandar

This thesis presents an optimization-based statistical estimation approach to find unknown trends in noisy data. A Bayesian framework is used to explicitly take into account prior information about the trends via trend models and constraints. The main focus is on convex formulation of the Bayesian estimation problem, which allows efficient computation of (globally) optimal estimates. There are two main parts of this thesis. The first part formulates trend estimation in systems described by known detailed models as a convex optimization problem. Statistically optimal estimates are then obtained by maximizing a concave log-likelihood function subject to convex constraints. We consider the problem of increasing problem dimension as more measurements become available, and introduce a moving horizon framework to enable recursive estimation of the unknown trend by solving a fixed size convex optimization problem at each horizon. We also present a distributed estimation framework, based on the dual decomposition method, for a system formed by a network of complex sensors with local (convex) estimation. Two specific applications of the convex optimization-based Bayesian estimation approach are described in the second part of the thesis. Batch estimation for parametric diagnostics in a flight control simulation of a space launch vehicle is shown to detect incipient fault trends despite the natural masking properties of feedback in the guidance and control loops. Moving horizon approach is used to estimate time varying fault parameters in a detailed nonlinear simulation model of an unmanned aerial vehicle. An excellent performance is demonstrated in the presence of winds and turbulence.
A framework for grouping nanoparticles based on their measurable characteristics.

PubMed

Sayes, Christie M; Smith, P Alex; Ivanov, Ivan V

2013-01-01

There is a need to take a broader look at nanotoxicological studies. Eventually, the field will demand that some generalizations be made. To begin to address this issue, we posed a question: are metal colloids on the nanometer-size scale a homogeneous group? In general, most people can agree that the physicochemical properties of nanomaterials can be linked and related to their induced toxicological responses. The focus of this study was to determine how a set of selected physicochemical properties of five specific metal-based colloidal materials on the nanometer-size scale - silver, copper, nickel, iron, and zinc - could be used as nanodescriptors that facilitate the grouping of these metal-based colloids. The example of the framework pipeline processing provided in this paper shows the utility of specific statistical and pattern recognition techniques in grouping nanoparticles based on experimental data about their physicochemical properties. Interestingly, the results of the analyses suggest that a seemingly homogeneous group of nanoparticles could be separated into sub-groups depending on interdependencies observed in their nanodescriptors. These particles represent an important category of nanomaterials that are currently mass produced. Each has been reputed to induce toxicological and/or cytotoxicological effects. Here, we propose an experimental methodology coupled with mathematical and statistical modeling that can serve as a prototype for a rigorous framework that aids in the ability to group nanomaterials together and to facilitate the subsequent analysis of trends in data based on quantitative modeling of nanoparticle-specific structure-activity relationships. The computational part of the proposed framework is rather general and can be applied to other groups of nanomaterials as well.
Capturing farm diversity with hypothesis-based typologies: An innovative methodological framework for farming system typology development

PubMed Central

Alvarez, Stéphanie; Timler, Carl J.; Michalscheck, Mirja; Paas, Wim; Descheemaeker, Katrien; Tittonell, Pablo; Andersson, Jens A.; Groot, Jeroen C. J.

2018-01-01

Creating typologies is a way to summarize the large heterogeneity of smallholder farming systems into a few farm types. Various methods exist, commonly using statistical analysis, to create these typologies. We demonstrate that the methodological decisions on data collection, variable selection, data-reduction and clustering techniques can bear a large impact on the typology results. We illustrate the effects of analysing the diversity from different angles, using different typology objectives and different hypotheses, on typology creation by using an example from Zambia’s Eastern Province. Five separate typologies were created with principal component analysis (PCA) and hierarchical clustering analysis (HCA), based on three different expert-informed hypotheses. The greatest overlap between typologies was observed for the larger, wealthier farm types but for the remainder of the farms there were no clear overlaps between typologies. Based on these results, we argue that the typology development should be guided by a hypothesis on the local agriculture features and the drivers and mechanisms of differentiation among farming systems, such as biophysical and socio-economic conditions. That hypothesis is based both on the typology objective and on prior expert knowledge and theories of the farm diversity in the study area. We present a methodological framework that aims to integrate participatory and statistical methods for hypothesis-based typology construction. This is an iterative process whereby the results of the statistical analysis are compared with the reality of the target population as hypothesized by the local experts. Using a well-defined hypothesis and the presented methodological framework, which consolidates the hypothesis through local expert knowledge for the creation of typologies, warrants development of less subjective and more contextualized quantitative farm typologies. PMID:29763422
Capturing farm diversity with hypothesis-based typologies: An innovative methodological framework for farming system typology development.

PubMed

Alvarez, Stéphanie; Timler, Carl J; Michalscheck, Mirja; Paas, Wim; Descheemaeker, Katrien; Tittonell, Pablo; Andersson, Jens A; Groot, Jeroen C J

2018-01-01

Creating typologies is a way to summarize the large heterogeneity of smallholder farming systems into a few farm types. Various methods exist, commonly using statistical analysis, to create these typologies. We demonstrate that the methodological decisions on data collection, variable selection, data-reduction and clustering techniques can bear a large impact on the typology results. We illustrate the effects of analysing the diversity from different angles, using different typology objectives and different hypotheses, on typology creation by using an example from Zambia's Eastern Province. Five separate typologies were created with principal component analysis (PCA) and hierarchical clustering analysis (HCA), based on three different expert-informed hypotheses. The greatest overlap between typologies was observed for the larger, wealthier farm types but for the remainder of the farms there were no clear overlaps between typologies. Based on these results, we argue that the typology development should be guided by a hypothesis on the local agriculture features and the drivers and mechanisms of differentiation among farming systems, such as biophysical and socio-economic conditions. That hypothesis is based both on the typology objective and on prior expert knowledge and theories of the farm diversity in the study area. We present a methodological framework that aims to integrate participatory and statistical methods for hypothesis-based typology construction. This is an iterative process whereby the results of the statistical analysis are compared with the reality of the target population as hypothesized by the local experts. Using a well-defined hypothesis and the presented methodological framework, which consolidates the hypothesis through local expert knowledge for the creation of typologies, warrants development of less subjective and more contextualized quantitative farm typologies.
Something old, something new, something borrowed, something blue: a framework for the marriage of health econometrics and cost-effectiveness analysis.

PubMed

Hoch, Jeffrey S; Briggs, Andrew H; Willan, Andrew R

2002-07-01

Economic evaluation is often seen as a branch of health economics divorced from mainstream econometric techniques. Instead, it is perceived as relying on statistical methods for clinical trials. Furthermore, the statistic of interest in cost-effectiveness analysis, the incremental cost-effectiveness ratio is not amenable to regression-based methods, hence the traditional reliance on comparing aggregate measures across the arms of a clinical trial. In this paper, we explore the potential for health economists undertaking cost-effectiveness analysis to exploit the plethora of established econometric techniques through the use of the net-benefit framework - a recently suggested reformulation of the cost-effectiveness problem that avoids the reliance on cost-effectiveness ratios and their associated statistical problems. This allows the formulation of the cost-effectiveness problem within a standard regression type framework. We provide an example with empirical data to illustrate how a regression type framework can enhance the net-benefit method. We go on to suggest that practical advantages of the net-benefit regression approach include being able to use established econometric techniques, adjust for imperfect randomisation, and identify important subgroups in order to estimate the marginal cost-effectiveness of an intervention. Copyright 2002 John Wiley & Sons, Ltd.
MAFsnp: A Multi-Sample Accurate and Flexible SNP Caller Using Next-Generation Sequencing Data

PubMed Central

Hu, Jiyuan; Li, Tengfei; Xiu, Zidi; Zhang, Hong

2015-01-01

Most existing statistical methods developed for calling single nucleotide polymorphisms (SNPs) using next-generation sequencing (NGS) data are based on Bayesian frameworks, and there does not exist any SNP caller that produces p-values for calling SNPs in a frequentist framework. To fill in this gap, we develop a new method MAFsnp, a Multiple-sample based Accurate and Flexible algorithm for calling SNPs with NGS data. MAFsnp is based on an estimated likelihood ratio test (eLRT) statistic. In practical situation, the involved parameter is very close to the boundary of the parametric space, so the standard large sample property is not suitable to evaluate the finite-sample distribution of the eLRT statistic. Observing that the distribution of the test statistic is a mixture of zero and a continuous part, we propose to model the test statistic with a novel two-parameter mixture distribution. Once the parameters in the mixture distribution are estimated, p-values can be easily calculated for detecting SNPs, and the multiple-testing corrected p-values can be used to control false discovery rate (FDR) at any pre-specified level. With simulated data, MAFsnp is shown to have much better control of FDR than the existing SNP callers. Through the application to two real datasets, MAFsnp is also shown to outperform the existing SNP callers in terms of calling accuracy. An R package “MAFsnp” implementing the new SNP caller is freely available at http://homepage.fudan.edu.cn/zhangh/softwares/. PMID:26309201
The l z ( p ) * Person-Fit Statistic in an Unfolding Model Context.

PubMed

Tendeiro, Jorge N

2017-01-01

Although person-fit analysis has a long-standing tradition within item response theory, it has been applied in combination with dominance response models almost exclusively. In this article, a popular log likelihood-based parametric person-fit statistic under the framework of the generalized graded unfolding model is used. Results from a simulation study indicate that the person-fit statistic performed relatively well in detecting midpoint response style patterns and not so well in detecting extreme response style patterns.
Validation of surrogate endpoints in advanced solid tumors: systematic review of statistical methods, results, and implications for policy makers.

PubMed

Ciani, Oriana; Davis, Sarah; Tappenden, Paul; Garside, Ruth; Stein, Ken; Cantrell, Anna; Saad, Everardo D; Buyse, Marc; Taylor, Rod S

2014-07-01

Licensing of, and coverage decisions on, new therapies should rely on evidence from patient-relevant endpoints such as overall survival (OS). Nevertheless, evidence from surrogate endpoints may also be useful, as it may not only expedite the regulatory approval of new therapies but also inform coverage decisions. It is, therefore, essential that candidate surrogate endpoints be properly validated. However, there is no consensus on statistical methods for such validation and on how the evidence thus derived should be applied by policy makers. We review current statistical approaches to surrogate-endpoint validation based on meta-analysis in various advanced-tumor settings. We assessed the suitability of two surrogates (progression-free survival [PFS] and time-to-progression [TTP]) using three current validation frameworks: Elston and Taylor's framework, the German Institute of Quality and Efficiency in Health Care's (IQWiG) framework and the Biomarker-Surrogacy Evaluation Schema (BSES3). A wide variety of statistical methods have been used to assess surrogacy. The strength of the association between the two surrogates and OS was generally low. The level of evidence (observation-level versus treatment-level) available varied considerably by cancer type, by evaluation tools and was not always consistent even within one specific cancer type. Not in all solid tumors the treatment-level association between PFS or TTP and OS has been investigated. According to IQWiG's framework, only PFS achieved acceptable evidence of surrogacy in metastatic colorectal and ovarian cancer treated with cytotoxic agents. Our study emphasizes the challenges of surrogate-endpoint validation and the importance of building consensus on the development of evaluation frameworks.
Metrics for comparing neuronal tree shapes based on persistent homology.

PubMed

Li, Yanjie; Wang, Dingkang; Ascoli, Giorgio A; Mitra, Partha; Wang, Yusu

2017-01-01

As more and more neuroanatomical data are made available through efforts such as NeuroMorpho.Org and FlyCircuit.org, the need to develop computational tools to facilitate automatic knowledge discovery from such large datasets becomes more urgent. One fundamental question is how best to compare neuron structures, for instance to organize and classify large collection of neurons. We aim to develop a flexible yet powerful framework to support comparison and classification of large collection of neuron structures efficiently. Specifically we propose to use a topological persistence-based feature vectorization framework. Existing methods to vectorize a neuron (i.e, convert a neuron to a feature vector so as to support efficient comparison and/or searching) typically rely on statistics or summaries of morphometric information, such as the average or maximum local torque angle or partition asymmetry. These simple summaries have limited power in encoding global tree structures. Based on the concept of topological persistence recently developed in the field of computational topology, we vectorize each neuron structure into a simple yet informative summary. In particular, each type of information of interest can be represented as a descriptor function defined on the neuron tree, which is then mapped to a simple persistence-signature. Our framework can encode both local and global tree structure, as well as other information of interest (electrophysiological or dynamical measures), by considering multiple descriptor functions on the neuron. The resulting persistence-based signature is potentially more informative than simple statistical summaries (such as average/mean/max) of morphometric quantities-Indeed, we show that using a certain descriptor function will give a persistence-based signature containing strictly more information than the classical Sholl analysis. At the same time, our framework retains the efficiency associated with treating neurons as points in a simple Euclidean feature space, which would be important for constructing efficient searching or indexing structures over them. We present preliminary experimental results to demonstrate the effectiveness of our persistence-based neuronal feature vectorization framework.
Metrics for comparing neuronal tree shapes based on persistent homology

PubMed Central

Li, Yanjie; Wang, Dingkang; Ascoli, Giorgio A.; Mitra, Partha

2017-01-01

As more and more neuroanatomical data are made available through efforts such as NeuroMorpho.Org and FlyCircuit.org, the need to develop computational tools to facilitate automatic knowledge discovery from such large datasets becomes more urgent. One fundamental question is how best to compare neuron structures, for instance to organize and classify large collection of neurons. We aim to develop a flexible yet powerful framework to support comparison and classification of large collection of neuron structures efficiently. Specifically we propose to use a topological persistence-based feature vectorization framework. Existing methods to vectorize a neuron (i.e, convert a neuron to a feature vector so as to support efficient comparison and/or searching) typically rely on statistics or summaries of morphometric information, such as the average or maximum local torque angle or partition asymmetry. These simple summaries have limited power in encoding global tree structures. Based on the concept of topological persistence recently developed in the field of computational topology, we vectorize each neuron structure into a simple yet informative summary. In particular, each type of information of interest can be represented as a descriptor function defined on the neuron tree, which is then mapped to a simple persistence-signature. Our framework can encode both local and global tree structure, as well as other information of interest (electrophysiological or dynamical measures), by considering multiple descriptor functions on the neuron. The resulting persistence-based signature is potentially more informative than simple statistical summaries (such as average/mean/max) of morphometric quantities—Indeed, we show that using a certain descriptor function will give a persistence-based signature containing strictly more information than the classical Sholl analysis. At the same time, our framework retains the efficiency associated with treating neurons as points in a simple Euclidean feature space, which would be important for constructing efficient searching or indexing structures over them. We present preliminary experimental results to demonstrate the effectiveness of our persistence-based neuronal feature vectorization framework. PMID:28809960
Event coincidence analysis for quantifying statistical interrelationships between event time series. On the role of flood events as triggers of epidemic outbreaks

NASA Astrophysics Data System (ADS)

Donges, J. F.; Schleussner, C.-F.; Siegmund, J. F.; Donner, R. V.

2016-05-01

Studying event time series is a powerful approach for analyzing the dynamics of complex dynamical systems in many fields of science. In this paper, we describe the method of event coincidence analysis to provide a framework for quantifying the strength, directionality and time lag of statistical interrelationships between event series. Event coincidence analysis allows to formulate and test null hypotheses on the origin of the observed interrelationships including tests based on Poisson processes or, more generally, stochastic point processes with a prescribed inter-event time distribution and other higher-order properties. Applying the framework to country-level observational data yields evidence that flood events have acted as triggers of epidemic outbreaks globally since the 1950s. Facing projected future changes in the statistics of climatic extreme events, statistical techniques such as event coincidence analysis will be relevant for investigating the impacts of anthropogenic climate change on human societies and ecosystems worldwide.
Statistical lamb wave localization based on extreme value theory

NASA Astrophysics Data System (ADS)

Harley, Joel B.

2018-04-01

Guided wave localization methods based on delay-and-sum imaging, matched field processing, and other techniques have been designed and researched to create images that locate and describe structural damage. The maximum value of these images typically represent an estimated damage location. Yet, it is often unclear if this maximum value, or any other value in the image, is a statistically significant indicator of damage. Furthermore, there are currently few, if any, approaches to assess the statistical significance of guided wave localization images. As a result, we present statistical delay-and-sum and statistical matched field processing localization methods to create statistically significant images of damage. Our framework uses constant rate of false alarm statistics and extreme value theory to detect damage with little prior information. We demonstrate our methods with in situ guided wave data from an aluminum plate to detect two 0.75 cm diameter holes. Our results show an expected improvement in statistical significance as the number of sensors increase. With seventeen sensors, both methods successfully detect damage with statistical significance.
Toward Global Comparability of Sexual Orientation Data in Official Statistics: A Conceptual Framework of Sexual Orientation for Health Data Collection in New Zealand's Official Statistics System

PubMed Central

Gray, Alistair; Veale, Jaimie F.; Binson, Diane; Sell, Randell L.

2013-01-01

Objective. Effectively addressing health disparities experienced by sexual minority populations requires high-quality official data on sexual orientation. We developed a conceptual framework of sexual orientation to improve the quality of sexual orientation data in New Zealand's Official Statistics System. Methods. We reviewed conceptual and methodological literature, culminating in a draft framework. To improve the framework, we held focus groups and key-informant interviews with sexual minority stakeholders and producers and consumers of official statistics. An advisory board of experts provided additional guidance. Results. The framework proposes working definitions of the sexual orientation topic and measurement concepts, describes dimensions of the measurement concepts, discusses variables framing the measurement concepts, and outlines conceptual grey areas. Conclusion. The framework proposes standard definitions and concepts for the collection of official sexual orientation data in New Zealand. It presents a model for producers of official statistics in other countries, who wish to improve the quality of health data on their citizens. PMID:23840231
Improved statistical power with a sparse shape model in detecting an aging effect in the hippocampus and amygdala

NASA Astrophysics Data System (ADS)

Chung, Moo K.; Kim, Seung-Goo; Schaefer, Stacey M.; van Reekum, Carien M.; Peschke-Schmitz, Lara; Sutterer, Matthew J.; Davidson, Richard J.

2014-03-01

The sparse regression framework has been widely used in medical image processing and analysis. However, it has been rarely used in anatomical studies. We present a sparse shape modeling framework using the Laplace- Beltrami (LB) eigenfunctions of the underlying shape and show its improvement of statistical power. Tradition- ally, the LB-eigenfunctions are used as a basis for intrinsically representing surface shapes as a form of Fourier descriptors. To reduce high frequency noise, only the first few terms are used in the expansion and higher frequency terms are simply thrown away. However, some lower frequency terms may not necessarily contribute significantly in reconstructing the surfaces. Motivated by this idea, we present a LB-based method to filter out only the significant eigenfunctions by imposing a sparse penalty. For dense anatomical data such as deformation fields on a surface mesh, the sparse regression behaves like a smoothing process, which will reduce the error of incorrectly detecting false negatives. Hence the statistical power improves. The sparse shape model is then applied in investigating the influence of age on amygdala and hippocampus shapes in the normal population. The advantage of the LB sparse framework is demonstrated by showing the increased statistical power.

Probabilistic arithmetic automata and their applications.

PubMed

Marschall, Tobias; Herms, Inke; Kaltenbach, Hans-Michael; Rahmann, Sven

2012-01-01

We present a comprehensive review on probabilistic arithmetic automata (PAAs), a general model to describe chains of operations whose operands depend on chance, along with two algorithms to numerically compute the distribution of the results of such probabilistic calculations. PAAs provide a unifying framework to approach many problems arising in computational biology and elsewhere. We present five different applications, namely 1) pattern matching statistics on random texts, including the computation of the distribution of occurrence counts, waiting times, and clump sizes under hidden Markov background models; 2) exact analysis of window-based pattern matching algorithms; 3) sensitivity of filtration seeds used to detect candidate sequence alignments; 4) length and mass statistics of peptide fragments resulting from enzymatic cleavage reactions; and 5) read length statistics of 454 and IonTorrent sequencing reads. The diversity of these applications indicates the flexibility and unifying character of the presented framework. While the construction of a PAA depends on the particular application, we single out a frequently applicable construction method: We introduce deterministic arithmetic automata (DAAs) to model deterministic calculations on sequences, and demonstrate how to construct a PAA from a given DAA and a finite-memory random text model. This procedure is used for all five discussed applications and greatly simplifies the construction of PAAs. Implementations are available as part of the MoSDi package. Its application programming interface facilitates the rapid development of new applications based on the PAA framework.
Complete integrability of information processing by biochemical reactions

PubMed Central

Agliari, Elena; Barra, Adriano; Dello Schiavo, Lorenzo; Moro, Antonio

2016-01-01

Statistical mechanics provides an effective framework to investigate information processing in biochemical reactions. Within such framework far-reaching analogies are established among (anti-) cooperative collective behaviors in chemical kinetics, (anti-)ferromagnetic spin models in statistical mechanics and operational amplifiers/flip-flops in cybernetics. The underlying modeling – based on spin systems – has been proved to be accurate for a wide class of systems matching classical (e.g. Michaelis–Menten, Hill, Adair) scenarios in the infinite-size approximation. However, the current research in biochemical information processing has been focusing on systems involving a relatively small number of units, where this approximation is no longer valid. Here we show that the whole statistical mechanical description of reaction kinetics can be re-formulated via a mechanical analogy – based on completely integrable hydrodynamic-type systems of PDEs – which provides explicit finite-size solutions, matching recently investigated phenomena (e.g. noise-induced cooperativity, stochastic bi-stability, quorum sensing). The resulting picture, successfully tested against a broad spectrum of data, constitutes a neat rationale for a numerically effective and theoretically consistent description of collective behaviors in biochemical reactions. PMID:27812018
Complete integrability of information processing by biochemical reactions

NASA Astrophysics Data System (ADS)

Agliari, Elena; Barra, Adriano; Dello Schiavo, Lorenzo; Moro, Antonio

2016-11-01

Statistical mechanics provides an effective framework to investigate information processing in biochemical reactions. Within such framework far-reaching analogies are established among (anti-) cooperative collective behaviors in chemical kinetics, (anti-)ferromagnetic spin models in statistical mechanics and operational amplifiers/flip-flops in cybernetics. The underlying modeling - based on spin systems - has been proved to be accurate for a wide class of systems matching classical (e.g. Michaelis-Menten, Hill, Adair) scenarios in the infinite-size approximation. However, the current research in biochemical information processing has been focusing on systems involving a relatively small number of units, where this approximation is no longer valid. Here we show that the whole statistical mechanical description of reaction kinetics can be re-formulated via a mechanical analogy - based on completely integrable hydrodynamic-type systems of PDEs - which provides explicit finite-size solutions, matching recently investigated phenomena (e.g. noise-induced cooperativity, stochastic bi-stability, quorum sensing). The resulting picture, successfully tested against a broad spectrum of data, constitutes a neat rationale for a numerically effective and theoretically consistent description of collective behaviors in biochemical reactions.
Complete integrability of information processing by biochemical reactions.

PubMed

Agliari, Elena; Barra, Adriano; Dello Schiavo, Lorenzo; Moro, Antonio

2016-11-04

Statistical mechanics provides an effective framework to investigate information processing in biochemical reactions. Within such framework far-reaching analogies are established among (anti-) cooperative collective behaviors in chemical kinetics, (anti-)ferromagnetic spin models in statistical mechanics and operational amplifiers/flip-flops in cybernetics. The underlying modeling - based on spin systems - has been proved to be accurate for a wide class of systems matching classical (e.g. Michaelis-Menten, Hill, Adair) scenarios in the infinite-size approximation. However, the current research in biochemical information processing has been focusing on systems involving a relatively small number of units, where this approximation is no longer valid. Here we show that the whole statistical mechanical description of reaction kinetics can be re-formulated via a mechanical analogy - based on completely integrable hydrodynamic-type systems of PDEs - which provides explicit finite-size solutions, matching recently investigated phenomena (e.g. noise-induced cooperativity, stochastic bi-stability, quorum sensing). The resulting picture, successfully tested against a broad spectrum of data, constitutes a neat rationale for a numerically effective and theoretically consistent description of collective behaviors in biochemical reactions.
QPROT: Statistical method for testing differential expression using protein-level intensity data in label-free quantitative proteomics.

PubMed

Choi, Hyungwon; Kim, Sinae; Fermin, Damian; Tsou, Chih-Chiang; Nesvizhskii, Alexey I

2015-11-03

We introduce QPROT, a statistical framework and computational tool for differential protein expression analysis using protein intensity data. QPROT is an extension of the QSPEC suite, originally developed for spectral count data, adapted for the analysis using continuously measured protein-level intensity data. QPROT offers a new intensity normalization procedure and model-based differential expression analysis, both of which account for missing data. Determination of differential expression of each protein is based on the standardized Z-statistic based on the posterior distribution of the log fold change parameter, guided by the false discovery rate estimated by a well-known Empirical Bayes method. We evaluated the classification performance of QPROT using the quantification calibration data from the clinical proteomic technology assessment for cancer (CPTAC) study and a recently published Escherichia coli benchmark dataset, with evaluation of FDR accuracy in the latter. QPROT is a statistical framework with computational software tool for comparative quantitative proteomics analysis. It features various extensions of QSPEC method originally built for spectral count data analysis, including probabilistic treatment of missing values in protein intensity data. With the increasing popularity of label-free quantitative proteomics data, the proposed method and accompanying software suite will be immediately useful for many proteomics laboratories. This article is part of a Special Issue entitled: Computational Proteomics. Copyright © 2015 Elsevier B.V. All rights reserved.
Evaluate, assess, treat: development and evaluation of the EAT framework to increase effective communication regarding sensitive oral-systemic health issues.

PubMed

DeBate, R D; Cragun, D; Gallentine, A A; Severson, H H; Shaw, T; Cantwell, C; Christiansen, S; Koerber, A; Hendricson, W; Tomar, S L; McCormack Brown, K; Tedesco, L A

2012-11-01

Oral healthcare providers are likely to encounter a number of sensitive oral/systemic health issues whilst interacting with patients. The purpose of the current study was to develop and evaluate a framework aimed at oral healthcare providers to engage in active secondary prevention of eating disorders (i.e. early detection of oral manifestations of disordered eating behaviours, patient approach and communication, patient-specific oral treatment, and referral to care) for patients presenting with signs of disordered eating behaviours. The EAT Framework was developed based on the Brief Motivational Interviewing (B-MI) conceptual framework and comprises three continuous steps: Evaluating, Assessing, and Treating. Using a group-randomized control design, 11 dental hygiene (DH) and seven dental (D) classes from eight institutions were randomized to either the intervention or control conditions. Both groups completed pre- and post-intervention assessments. Hierarchical linear models were conducted to measure the effects of the intervention whilst controlling for baseline levels. Statistically significant improvements from pre- to post-intervention were observed in the Intervention group compared with the Control group on knowledge of eating disorders and oral findings, skills-based knowledge, and self-efficacy (all P < 0.01). Effect sizes ranged from 0.57 to 0.95. No statistically significant differences in outcomes were observed by type of student. Although the EAT Framework was developed as part of a larger study on secondary prevention of eating disorders, the procedures and skills presented can be applied to other sensitive oral/systemic health issues. Because the EAT Framework was developed by translating B-MI principles and procedures, the framework can be easily adopted as a non-confrontational method for patient communication. © 2012 John Wiley & Sons A/S.
Evaluate, Assess, Treat: Development and evaluation of the EAT framework to increase effective communication regarding sensitive oral-systemic health issues

PubMed Central

DeBate, Rita D.; Cragun, Deborah; Gallentine, Ashley A.; Severson, Herbert H.; Shaw, Tracey; Cantwell, Carley; Christiansen, Steve; Koerber, Anne; Hendricson, William; Tomar, Scott L.; Brown, Kelli McCormack; Tedesco, Lisa A.

2012-01-01

Oral healthcare providers are likely to encounter a number of sensitive oral/systemic health issues while interacting with patients. The purpose of the current study was to develop and evaluate a framework aimed at oral healthcare providers to engage in active secondary prevention of eating disorders (i.e., early detection of oral manifestations of disordered eating behaviors, patient approach and communication, patient-specific oral treatment, and referral to care) for patients presenting with signs of disordered eating behaviors. The EAT Framework was developed based on the Brief Motivational Interviewing (B-MI) conceptual framework and comprises three continuous steps: Evaluating, Assessing, and Treating. Using a group-randomized control design, 11 dental hygiene (DH) and 7 dental (D) classes from 8 institutions were randomized to either the intervention or control conditions. Both groups completed preand post-intervention assessments. Hierarchical linear models were conducted to measure the effects of the intervention while controlling for baseline levels. Statistically significant improvements from pre-to post-intervention were observed in the Intervention group compared with the Control group on knowledge of eating disorders and oral findings, skills-based knowledge, and self-efficacy (all p < .01). Effect sizes ranged from .57–.95. No statistically significant differences in outcomes were observed by type of student. Although the EAT Framework was developed as part of a larger study on secondary prevention of eating disorders, the procedures and skills presented can be applied to other sensitive oral/systemic health issues. Because the EAT Framework was developed by translating B-MI principles and procedures, the framework can be easily adopted as a non-confrontational method for patient communication. PMID:23050505
A Statistical Framework for Analyzing Cyber Threats

DTIC Science & Technology

defender cares most about the attacks against certain ports or services). The grey-box statistical framework formulates a new methodology of Cybersecurity ...the design of prediction models. Our research showed that the grey-box framework is effective in predicting cybersecurity situational awareness.
Landscape of Research Areas for Zeolites and Metal-Organic Frameworks Using Computational Classification Based on Citation Networks.

PubMed

Ogawa, Takaya; Iyoki, Kenta; Fukushima, Tomohiro; Kajikawa, Yuya

2017-12-14

The field of porous materials is widely spreading nowadays, and researchers need to read tremendous numbers of papers to obtain a "bird's eye" view of a given research area. However, it is difficult for researchers to obtain an objective database based on statistical data without any relation to subjective knowledge related to individual research interests. Here, citation network analysis was applied for a comparative analysis of the research areas for zeolites and metal-organic frameworks as examples for porous materials. The statistical and objective data contributed to the analysis of: (1) the computational screening of research areas; (2) classification of research stages to a certain domain; (3) "well-cited" research areas; and (4) research area preferences of specific countries. Moreover, we proposed a methodology to assist researchers to gain potential research ideas by reviewing related research areas, which is based on the detection of unfocused ideas in one area but focused in the other area by a bibliometric approach.
Landscape of Research Areas for Zeolites and Metal-Organic Frameworks Using Computational Classification Based on Citation Networks

PubMed Central

Ogawa, Takaya; Fukushima, Tomohiro; Kajikawa, Yuya

2017-01-01

The field of porous materials is widely spreading nowadays, and researchers need to read tremendous numbers of papers to obtain a “bird’s eye” view of a given research area. However, it is difficult for researchers to obtain an objective database based on statistical data without any relation to subjective knowledge related to individual research interests. Here, citation network analysis was applied for a comparative analysis of the research areas for zeolites and metal-organic frameworks as examples for porous materials. The statistical and objective data contributed to the analysis of: (1) the computational screening of research areas; (2) classification of research stages to a certain domain; (3) “well-cited” research areas; and (4) research area preferences of specific countries. Moreover, we proposed a methodology to assist researchers to gain potential research ideas by reviewing related research areas, which is based on the detection of unfocused ideas in one area but focused in the other area by a bibliometric approach. PMID:29240708
Interactive classification and content-based retrieval of tissue images

NASA Astrophysics Data System (ADS)

Aksoy, Selim; Marchisio, Giovanni B.; Tusk, Carsten; Koperski, Krzysztof

2002-11-01

We describe a system for interactive classification and retrieval of microscopic tissue images. Our system models tissues in pixel, region and image levels. Pixel level features are generated using unsupervised clustering of color and texture values. Region level features include shape information and statistics of pixel level feature values. Image level features include statistics and spatial relationships of regions. To reduce the gap between low-level features and high-level expert knowledge, we define the concept of prototype regions. The system learns the prototype regions in an image collection using model-based clustering and density estimation. Different tissue types are modeled using spatial relationships of these regions. Spatial relationships are represented by fuzzy membership functions. The system automatically selects significant relationships from training data and builds models which can also be updated using user relevance feedback. A Bayesian framework is used to classify tissues based on these models. Preliminary experiments show that the spatial relationship models we developed provide a flexible and powerful framework for classification and retrieval of tissue images.
Evicase: an evidence-based case structuring approach for personalized healthcare.

PubMed

Carmeli, Boaz; Casali, Paolo; Goldbraich, Anna; Goldsteen, Abigail; Kent, Carmel; Licitra, Lisa; Locatelli, Paolo; Restifo, Nicola; Rinott, Ruty; Sini, Elena; Torresani, Michele; Waks, Zeev

2012-01-01

The personalized medicine era stresses a growing need to combine evidence-based medicine with case based reasoning in order to improve the care process. To address this need we suggest a framework to generate multi-tiered statistical structures we call Evicases. Evicase integrates established medical evidence together with patient cases from the bedside. It then uses machine learning algorithms to produce statistical results and aggregators, weighted predictions, and appropriate recommendations. Designed as a stand-alone structure, Evicase can be used for a range of decision support applications including guideline adherence monitoring and personalized prognostic predictions.
Automatically detect and track infrared small targets with kernel Fukunaga-Koontz transform and Kalman prediction.

PubMed

Liu, Ruiming; Liu, Erqi; Yang, Jie; Zeng, Yong; Wang, Fanglin; Cao, Yuan

2007-11-01

Fukunaga-Koontz transform (FKT), stemming from principal component analysis (PCA), is used in many pattern recognition and image-processing fields. It cannot capture the higher-order statistical property of natural images, so its detection performance is not satisfying. PCA has been extended into kernel PCA in order to capture the higher-order statistics. However, thus far there have been no researchers who have definitely proposed kernel FKT (KFKT) and researched its detection performance. For accurately detecting potential small targets from infrared images, we first extend FKT into KFKT to capture the higher-order statistical properties of images. Then a framework based on Kalman prediction and KFKT, which can automatically detect and track small targets, is developed. Results of experiments show that KFKT outperforms FKT and the proposed framework is competent to automatically detect and track infrared point targets.
Automatically detect and track infrared small targets with kernel Fukunaga-Koontz transform and Kalman prediction

NASA Astrophysics Data System (ADS)

Liu, Ruiming; Liu, Erqi; Yang, Jie; Zeng, Yong; Wang, Fanglin; Cao, Yuan

2007-11-01

Fukunaga-Koontz transform (FKT), stemming from principal component analysis (PCA), is used in many pattern recognition and image-processing fields. It cannot capture the higher-order statistical property of natural images, so its detection performance is not satisfying. PCA has been extended into kernel PCA in order to capture the higher-order statistics. However, thus far there have been no researchers who have definitely proposed kernel FKT (KFKT) and researched its detection performance. For accurately detecting potential small targets from infrared images, we first extend FKT into KFKT to capture the higher-order statistical properties of images. Then a framework based on Kalman prediction and KFKT, which can automatically detect and track small targets, is developed. Results of experiments show that KFKT outperforms FKT and the proposed framework is competent to automatically detect and track infrared point targets.
Conceptualizing a Framework for Advanced Placement Statistics Teaching Knowledge

ERIC Educational Resources Information Center

Haines, Brenna

2015-01-01

The purpose of this article is to sketch a conceptualization of a framework for Advanced Placement (AP) Statistics Teaching Knowledge. Recent research continues to problematize the lack of knowledge and preparation among secondary level statistics teachers. The College Board's AP Statistics course continues to grow and gain popularity, but is a…
Bayesian inference and decision theory - A framework for decision making in natural resource management

USGS Publications Warehouse

Dorazio, R.M.; Johnson, F.A.

2003-01-01

Bayesian inference and decision theory may be used in the solution of relatively complex problems of natural resource management, owing to recent advances in statistical theory and computing. In particular, Markov chain Monte Carlo algorithms provide a computational framework for fitting models of adequate complexity and for evaluating the expected consequences of alternative management actions. We illustrate these features using an example based on management of waterfowl habitat.
Generalization of Entropy Based Divergence Measures for Symbolic Sequence Analysis

PubMed Central

Ré, Miguel A.; Azad, Rajeev K.

2014-01-01

Entropy based measures have been frequently used in symbolic sequence analysis. A symmetrized and smoothed form of Kullback-Leibler divergence or relative entropy, the Jensen-Shannon divergence (JSD), is of particular interest because of its sharing properties with families of other divergence measures and its interpretability in different domains including statistical physics, information theory and mathematical statistics. The uniqueness and versatility of this measure arise because of a number of attributes including generalization to any number of probability distributions and association of weights to the distributions. Furthermore, its entropic formulation allows its generalization in different statistical frameworks, such as, non-extensive Tsallis statistics and higher order Markovian statistics. We revisit these generalizations and propose a new generalization of JSD in the integrated Tsallis and Markovian statistical framework. We show that this generalization can be interpreted in terms of mutual information. We also investigate the performance of different JSD generalizations in deconstructing chimeric DNA sequences assembled from bacterial genomes including that of E. coli, S. enterica typhi, Y. pestis and H. influenzae. Our results show that the JSD generalizations bring in more pronounced improvements when the sequences being compared are from phylogenetically proximal organisms, which are often difficult to distinguish because of their compositional similarity. While small but noticeable improvements were observed with the Tsallis statistical JSD generalization, relatively large improvements were observed with the Markovian generalization. In contrast, the proposed Tsallis-Markovian generalization yielded more pronounced improvements relative to the Tsallis and Markovian generalizations, specifically when the sequences being compared arose from phylogenetically proximal organisms. PMID:24728338
Generalization of entropy based divergence measures for symbolic sequence analysis.

PubMed

Ré, Miguel A; Azad, Rajeev K

2014-01-01

Entropy based measures have been frequently used in symbolic sequence analysis. A symmetrized and smoothed form of Kullback-Leibler divergence or relative entropy, the Jensen-Shannon divergence (JSD), is of particular interest because of its sharing properties with families of other divergence measures and its interpretability in different domains including statistical physics, information theory and mathematical statistics. The uniqueness and versatility of this measure arise because of a number of attributes including generalization to any number of probability distributions and association of weights to the distributions. Furthermore, its entropic formulation allows its generalization in different statistical frameworks, such as, non-extensive Tsallis statistics and higher order Markovian statistics. We revisit these generalizations and propose a new generalization of JSD in the integrated Tsallis and Markovian statistical framework. We show that this generalization can be interpreted in terms of mutual information. We also investigate the performance of different JSD generalizations in deconstructing chimeric DNA sequences assembled from bacterial genomes including that of E. coli, S. enterica typhi, Y. pestis and H. influenzae. Our results show that the JSD generalizations bring in more pronounced improvements when the sequences being compared are from phylogenetically proximal organisms, which are often difficult to distinguish because of their compositional similarity. While small but noticeable improvements were observed with the Tsallis statistical JSD generalization, relatively large improvements were observed with the Markovian generalization. In contrast, the proposed Tsallis-Markovian generalization yielded more pronounced improvements relative to the Tsallis and Markovian generalizations, specifically when the sequences being compared arose from phylogenetically proximal organisms.
The non-equilibrium allele frequency spectrum in a Poisson random field framework.

PubMed

Kaj, Ingemar; Mugal, Carina F

2016-10-01

In population genetic studies, the allele frequency spectrum (AFS) efficiently summarizes genome-wide polymorphism data and shapes a variety of allele frequency-based summary statistics. While existing theory typically features equilibrium conditions, emerging methodology requires an analytical understanding of the build-up of the allele frequencies over time. In this work, we use the framework of Poisson random fields to derive new representations of the non-equilibrium AFS for the case of a Wright-Fisher population model with selection. In our approach, the AFS is a scaling-limit of the expectation of a Poisson stochastic integral and the representation of the non-equilibrium AFS arises in terms of a fixation time probability distribution. The known duality between the Wright-Fisher diffusion process and a birth and death process generalizing Kingman's coalescent yields an additional representation. The results carry over to the setting of a random sample drawn from the population and provide the non-equilibrium behavior of sample statistics. Our findings are consistent with and extend a previous approach where the non-equilibrium AFS solves a partial differential forward equation with a non-traditional boundary condition. Moreover, we provide a bridge to previous coalescent-based work, and hence tie several frameworks together. Since frequency-based summary statistics are widely used in population genetics, for example, to identify candidate loci of adaptive evolution, to infer the demographic history of a population, or to improve our understanding of the underlying mechanics of speciation events, the presented results are potentially useful for a broad range of topics. Copyright © 2016 Elsevier Inc. All rights reserved.
Semiautomatic tumor segmentation with multimodal images in a conditional random field framework.

PubMed

Hu, Yu-Chi; Grossberg, Michael; Mageras, Gikas

2016-04-01

Volumetric medical images of a single subject can be acquired using different imaging modalities, such as computed tomography, magnetic resonance imaging (MRI), and positron emission tomography. In this work, we present a semiautomatic segmentation algorithm that can leverage the synergies between different image modalities while integrating interactive human guidance. The algorithm provides a statistical segmentation framework partly automating the segmentation task while still maintaining critical human oversight. The statistical models presented are trained interactively using simple brush strokes to indicate tumor and nontumor tissues and using intermediate results within a patient's image study. To accomplish the segmentation, we construct the energy function in the conditional random field (CRF) framework. For each slice, the energy function is set using the estimated probabilities from both user brush stroke data and prior approved segmented slices within a patient study. The progressive segmentation is obtained using a graph-cut-based minimization. Although no similar semiautomated algorithm is currently available, we evaluated our method with an MRI data set from Medical Image Computing and Computer Assisted Intervention Society multimodal brain segmentation challenge (BRATS 2012 and 2013) against a similar fully automatic method based on CRF and a semiautomatic method based on grow-cut, and our method shows superior performance.

Improved Bayesian Infrasonic Source Localization for regional infrasound

DOE PAGES

Blom, Philip S.; Marcillo, Omar; Arrowsmith, Stephen J.

2015-10-20

The Bayesian Infrasonic Source Localization (BISL) methodology is examined and simplified providing a generalized method of estimating the source location and time for an infrasonic event and the mathematical framework is used therein. The likelihood function describing an infrasonic detection used in BISL has been redefined to include the von Mises distribution developed in directional statistics and propagation-based, physically derived celerity-range and azimuth deviation models. Frameworks for constructing propagation-based celerity-range and azimuth deviation statistics are presented to demonstrate how stochastic propagation modelling methods can be used to improve the precision and accuracy of the posterior probability density function describing themore » source localization. Infrasonic signals recorded at a number of arrays in the western United States produced by rocket motor detonations at the Utah Test and Training Range are used to demonstrate the application of the new mathematical framework and to quantify the improvement obtained by using the stochastic propagation modelling methods. Moreover, using propagation-based priors, the spatial and temporal confidence bounds of the source decreased by more than 40 per cent in all cases and by as much as 80 per cent in one case. Further, the accuracy of the estimates remained high, keeping the ground truth within the 99 per cent confidence bounds for all cases.« less
Probabilistic Graphical Model Representation in Phylogenetics

PubMed Central

Höhna, Sebastian; Heath, Tracy A.; Boussau, Bastien; Landis, Michael J.; Ronquist, Fredrik; Huelsenbeck, John P.

2014-01-01

Recent years have seen a rapid expansion of the model space explored in statistical phylogenetics, emphasizing the need for new approaches to statistical model representation and software development. Clear communication and representation of the chosen model is crucial for: (i) reproducibility of an analysis, (ii) model development, and (iii) software design. Moreover, a unified, clear and understandable framework for model representation lowers the barrier for beginners and nonspecialists to grasp complex phylogenetic models, including their assumptions and parameter/variable dependencies. Graphical modeling is a unifying framework that has gained in popularity in the statistical literature in recent years. The core idea is to break complex models into conditionally independent distributions. The strength lies in the comprehensibility, flexibility, and adaptability of this formalism, and the large body of computational work based on it. Graphical models are well-suited to teach statistical models, to facilitate communication among phylogeneticists and in the development of generic software for simulation and statistical inference. Here, we provide an introduction to graphical models for phylogeneticists and extend the standard graphical model representation to the realm of phylogenetics. We introduce a new graphical model component, tree plates, to capture the changing structure of the subgraph corresponding to a phylogenetic tree. We describe a range of phylogenetic models using the graphical model framework and introduce modules to simplify the representation of standard components in large and complex models. Phylogenetic model graphs can be readily used in simulation, maximum likelihood inference, and Bayesian inference using, for example, Metropolis–Hastings or Gibbs sampling of the posterior distribution. [Computation; graphical models; inference; modularization; statistical phylogenetics; tree plate.] PMID:24951559
Theory-based Bayesian models of inductive learning and reasoning.

PubMed

Tenenbaum, Joshua B; Griffiths, Thomas L; Kemp, Charles

2006-07-01

Inductive inference allows humans to make powerful generalizations from sparse data when learning about word meanings, unobserved properties, causal relationships, and many other aspects of the world. Traditional accounts of induction emphasize either the power of statistical learning, or the importance of strong constraints from structured domain knowledge, intuitive theories or schemas. We argue that both components are necessary to explain the nature, use and acquisition of human knowledge, and we introduce a theory-based Bayesian framework for modeling inductive learning and reasoning as statistical inferences over structured knowledge representations.
Statistical Power Analysis with Microsoft Excel: Normal Tests for One or Two Means as a Prelude to Using Non-Central Distributions to Calculate Power

ERIC Educational Resources Information Center

Texeira, Antonio; Rosa, Alvaro; Calapez, Teresa

2009-01-01

This article presents statistical power analysis (SPA) based on the normal distribution using Excel, adopting textbook and SPA approaches. The objective is to present the latter in a comparative way within a framework that is familiar to textbook level readers, as a first step to understand SPA with other distributions. The analysis focuses on the…
Markov Random Fields, Stochastic Quantization and Image Analysis

DTIC Science & Technology

1990-01-01

Markov random fields based on the lattice Z2 have been extensively used in image analysis in a Bayesian framework as a-priori models for the...of Image Analysis can be given some fundamental justification then there is a remarkable connection between Probabilistic Image Analysis , Statistical Mechanics and Lattice-based Euclidean Quantum Field Theory.
Geometry of behavioral spaces: A computational approach to analysis and understanding of agent based models and agent behaviors

NASA Astrophysics Data System (ADS)

Cenek, Martin; Dahl, Spencer K.

2016-11-01

Systems with non-linear dynamics frequently exhibit emergent system behavior, which is important to find and specify rigorously to understand the nature of the modeled phenomena. Through this analysis, it is possible to characterize phenomena such as how systems assemble or dissipate and what behaviors lead to specific final system configurations. Agent Based Modeling (ABM) is one of the modeling techniques used to study the interaction dynamics between a system's agents and its environment. Although the methodology of ABM construction is well understood and practiced, there are no computational, statistically rigorous, comprehensive tools to evaluate an ABM's execution. Often, a human has to observe an ABM's execution in order to analyze how the ABM functions, identify the emergent processes in the agent's behavior, or study a parameter's effect on the system-wide behavior. This paper introduces a new statistically based framework to automatically analyze agents' behavior, identify common system-wide patterns, and record the probability of agents changing their behavior from one pattern of behavior to another. We use network based techniques to analyze the landscape of common behaviors in an ABM's execution. Finally, we test the proposed framework with a series of experiments featuring increasingly emergent behavior. The proposed framework will allow computational comparison of ABM executions, exploration of a model's parameter configuration space, and identification of the behavioral building blocks in a model's dynamics.
Geometry of behavioral spaces: A computational approach to analysis and understanding of agent based models and agent behaviors.

PubMed

Cenek, Martin; Dahl, Spencer K

2016-11-01

Systems with non-linear dynamics frequently exhibit emergent system behavior, which is important to find and specify rigorously to understand the nature of the modeled phenomena. Through this analysis, it is possible to characterize phenomena such as how systems assemble or dissipate and what behaviors lead to specific final system configurations. Agent Based Modeling (ABM) is one of the modeling techniques used to study the interaction dynamics between a system's agents and its environment. Although the methodology of ABM construction is well understood and practiced, there are no computational, statistically rigorous, comprehensive tools to evaluate an ABM's execution. Often, a human has to observe an ABM's execution in order to analyze how the ABM functions, identify the emergent processes in the agent's behavior, or study a parameter's effect on the system-wide behavior. This paper introduces a new statistically based framework to automatically analyze agents' behavior, identify common system-wide patterns, and record the probability of agents changing their behavior from one pattern of behavior to another. We use network based techniques to analyze the landscape of common behaviors in an ABM's execution. Finally, we test the proposed framework with a series of experiments featuring increasingly emergent behavior. The proposed framework will allow computational comparison of ABM executions, exploration of a model's parameter configuration space, and identification of the behavioral building blocks in a model's dynamics.
A UNIFIED FRAMEWORK FOR VARIANCE COMPONENT ESTIMATION WITH SUMMARY STATISTICS IN GENOME-WIDE ASSOCIATION STUDIES.

PubMed

Zhou, Xiang

2017-12-01

Linear mixed models (LMMs) are among the most commonly used tools for genetic association studies. However, the standard method for estimating variance components in LMMs-the restricted maximum likelihood estimation method (REML)-suffers from several important drawbacks: REML requires individual-level genotypes and phenotypes from all samples in the study, is computationally slow, and produces downward-biased estimates in case control studies. To remedy these drawbacks, we present an alternative framework for variance component estimation, which we refer to as MQS. MQS is based on the method of moments (MoM) and the minimal norm quadratic unbiased estimation (MINQUE) criterion, and brings two seemingly unrelated methods-the renowned Haseman-Elston (HE) regression and the recent LD score regression (LDSC)-into the same unified statistical framework. With this new framework, we provide an alternative but mathematically equivalent form of HE that allows for the use of summary statistics. We provide an exact estimation form of LDSC to yield unbiased and statistically more efficient estimates. A key feature of our method is its ability to pair marginal z -scores computed using all samples with SNP correlation information computed using a small random subset of individuals (or individuals from a proper reference panel), while capable of producing estimates that can be almost as accurate as if both quantities are computed using the full data. As a result, our method produces unbiased and statistically efficient estimates, and makes use of summary statistics, while it is computationally efficient for large data sets. Using simulations and applications to 37 phenotypes from 8 real data sets, we illustrate the benefits of our method for estimating and partitioning SNP heritability in population studies as well as for heritability estimation in family studies. Our method is implemented in the GEMMA software package, freely available at www.xzlab.org/software.html.
Texas two-step: a framework for optimal multi-input single-output deconvolution.

PubMed

Neelamani, Ramesh; Deffenbaugh, Max; Baraniuk, Richard G

2007-11-01

Multi-input single-output deconvolution (MISO-D) aims to extract a deblurred estimate of a target signal from several blurred and noisy observations. This paper develops a new two step framework--Texas Two-Step--to solve MISO-D problems with known blurs. Texas Two-Step first reduces the MISO-D problem to a related single-input single-output deconvolution (SISO-D) problem by invoking the concept of sufficient statistics (SSs) and then solves the simpler SISO-D problem using an appropriate technique. The two-step framework enables new MISO-D techniques (both optimal and suboptimal) based on the rich suite of existing SISO-D techniques. In fact, the properties of SSs imply that a MISO-D algorithm is mean-squared-error optimal if and only if it can be rearranged to conform to the Texas Two-Step framework. Using this insight, we construct new wavelet- and curvelet-based MISO-D algorithms with asymptotically optimal performance. Simulated and real data experiments verify that the framework is indeed effective.
Statistical wave climate projections for coastal impact assessments

NASA Astrophysics Data System (ADS)

Camus, P.; Losada, I. J.; Izaguirre, C.; Espejo, A.; Menéndez, M.; Pérez, J.

2017-09-01

Global multimodel wave climate projections are obtained at 1.0° × 1.0° scale from 30 Coupled Model Intercomparison Project Phase 5 (CMIP5) global circulation model (GCM) realizations. A semi-supervised weather-typing approach based on a characterization of the ocean wave generation areas and the historical wave information from the recent GOW2 database are used to train the statistical model. This framework is also applied to obtain high resolution projections of coastal wave climate and coastal impacts as port operability and coastal flooding. Regional projections are estimated using the collection of weather types at spacing of 1.0°. This assumption is feasible because the predictor is defined based on the wave generation area and the classification is guided by the local wave climate. The assessment of future changes in coastal impacts is based on direct downscaling of indicators defined by empirical formulations (total water level for coastal flooding and number of hours per year with overtopping for port operability). Global multimodel projections of the significant wave height and peak period are consistent with changes obtained in previous studies. Statistical confidence of expected changes is obtained due to the large number of GCMs to construct the ensemble. The proposed methodology is proved to be flexible to project wave climate at different spatial scales. Regional changes of additional variables as wave direction or other statistics can be estimated from the future empirical distribution with extreme values restricted to high percentiles (i.e., 95th, 99th percentiles). The statistical framework can also be applied to evaluate regional coastal impacts integrating changes in storminess and sea level rise.
Overarching framework for data-based modelling

NASA Astrophysics Data System (ADS)

Schelter, Björn; Mader, Malenka; Mader, Wolfgang; Sommerlade, Linda; Platt, Bettina; Lai, Ying-Cheng; Grebogi, Celso; Thiel, Marco

2014-02-01

One of the main modelling paradigms for complex physical systems are networks. When estimating the network structure from measured signals, typically several assumptions such as stationarity are made in the estimation process. Violating these assumptions renders standard analysis techniques fruitless. We here propose a framework to estimate the network structure from measurements of arbitrary non-linear, non-stationary, stochastic processes. To this end, we propose a rigorous mathematical theory that underlies this framework. Based on this theory, we present a highly efficient algorithm and the corresponding statistics that are immediately sensibly applicable to measured signals. We demonstrate its performance in a simulation study. In experiments of transitions between vigilance stages in rodents, we infer small network structures with complex, time-dependent interactions; this suggests biomarkers for such transitions, the key to understand and diagnose numerous diseases such as dementia. We argue that the suggested framework combines features that other approaches followed so far lack.
Imperial College near infrared spectroscopy neuroimaging analysis framework.

PubMed

Orihuela-Espina, Felipe; Leff, Daniel R; James, David R C; Darzi, Ara W; Yang, Guang-Zhong

2018-01-01

This paper describes the Imperial College near infrared spectroscopy neuroimaging analysis (ICNNA) software tool for functional near infrared spectroscopy neuroimaging data. ICNNA is a MATLAB-based object-oriented framework encompassing an application programming interface and a graphical user interface. ICNNA incorporates reconstruction based on the modified Beer-Lambert law and basic processing and data validation capabilities. Emphasis is placed on the full experiment rather than individual neuroimages as the central element of analysis. The software offers three types of analyses including classical statistical methods based on comparison of changes in relative concentrations of hemoglobin between the task and baseline periods, graph theory-based metrics of connectivity and, distinctively, an analysis approach based on manifold embedding. This paper presents the different capabilities of ICNNA in its current version.
Comparative analysis on the selection of number of clusters in community detection

NASA Astrophysics Data System (ADS)

Kawamoto, Tatsuro; Kabashima, Yoshiyuki

2018-02-01

We conduct a comparative analysis on various estimates of the number of clusters in community detection. An exhaustive comparison requires testing of all possible combinations of frameworks, algorithms, and assessment criteria. In this paper we focus on the framework based on a stochastic block model, and investigate the performance of greedy algorithms, statistical inference, and spectral methods. For the assessment criteria, we consider modularity, map equation, Bethe free energy, prediction errors, and isolated eigenvalues. From the analysis, the tendency of overfit and underfit that the assessment criteria and algorithms have becomes apparent. In addition, we propose that the alluvial diagram is a suitable tool to visualize statistical inference results and can be useful to determine the number of clusters.
A Data Colocation Grid Framework for Big Data Medical Image Processing: Backend Design

PubMed Central

Huo, Yuankai; Parvathaneni, Prasanna; Plassard, Andrew J.; Bermudez, Camilo; Yao, Yuang; Lyu, Ilwoo; Gokhale, Aniruddha; Landman, Bennett A.

2018-01-01

When processing large medical imaging studies, adopting high performance grid computing resources rapidly becomes important. We recently presented a "medical image processing-as-a-service" grid framework that offers promise in utilizing the Apache Hadoop ecosystem and HBase for data colocation by moving computation close to medical image storage. However, the framework has not yet proven to be easy to use in a heterogeneous hardware environment. Furthermore, the system has not yet validated when considering variety of multi-level analysis in medical imaging. Our target design criteria are (1) improving the framework’s performance in a heterogeneous cluster, (2) performing population based summary statistics on large datasets, and (3) introducing a table design scheme for rapid NoSQL query. In this paper, we present a heuristic backend interface application program interface (API) design for Hadoop & HBase for Medical Image Processing (HadoopBase-MIP). The API includes: Upload, Retrieve, Remove, Load balancer (for heterogeneous cluster) and MapReduce templates. A dataset summary statistic model is discussed and implemented by MapReduce paradigm. We introduce a HBase table scheme for fast data query to better utilize the MapReduce model. Briefly, 5153 T1 images were retrieved from a university secure, shared web database and used to empirically access an in-house grid with 224 heterogeneous CPU cores. Three empirical experiments results are presented and discussed: (1) load balancer wall-time improvement of 1.5-fold compared with a framework with built-in data allocation strategy, (2) a summary statistic model is empirically verified on grid framework and is compared with the cluster when deployed with a standard Sun Grid Engine (SGE), which reduces 8-fold of wall clock time and 14-fold of resource time, and (3) the proposed HBase table scheme improves MapReduce computation with 7 fold reduction of wall time compare with a naïve scheme when datasets are relative small. The source code and interfaces have been made publicly available. PMID:29887668
Sequential-Optimization-Based Framework for Robust Modeling and Design of Heterogeneous Catalytic Systems

DOE PAGES

Rangarajan, Srinivas; Maravelias, Christos T.; Mavrikakis, Manos

2017-11-09

Here, we present a general optimization-based framework for (i) ab initio and experimental data driven mechanistic modeling and (ii) optimal catalyst design of heterogeneous catalytic systems. Both cases are formulated as a nonlinear optimization problem that is subject to a mean-field microkinetic model and thermodynamic consistency requirements as constraints, for which we seek sparse solutions through a ridge (L 2 regularization) penalty. The solution procedure involves an iterative sequence of forward simulation of the differential algebraic equations pertaining to the microkinetic model using a numerical tool capable of handling stiff systems, sensitivity calculations using linear algebra, and gradient-based nonlinear optimization.more » A multistart approach is used to explore the solution space, and a hierarchical clustering procedure is implemented for statistically classifying potentially competing solutions. An example of methanol synthesis through hydrogenation of CO and CO 2 on a Cu-based catalyst is used to illustrate the framework. The framework is fast, is robust, and can be used to comprehensively explore the model solution and design space of any heterogeneous catalytic system.« less
Sequential-Optimization-Based Framework for Robust Modeling and Design of Heterogeneous Catalytic Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rangarajan, Srinivas; Maravelias, Christos T.; Mavrikakis, Manos

Here, we present a general optimization-based framework for (i) ab initio and experimental data driven mechanistic modeling and (ii) optimal catalyst design of heterogeneous catalytic systems. Both cases are formulated as a nonlinear optimization problem that is subject to a mean-field microkinetic model and thermodynamic consistency requirements as constraints, for which we seek sparse solutions through a ridge (L 2 regularization) penalty. The solution procedure involves an iterative sequence of forward simulation of the differential algebraic equations pertaining to the microkinetic model using a numerical tool capable of handling stiff systems, sensitivity calculations using linear algebra, and gradient-based nonlinear optimization.more » A multistart approach is used to explore the solution space, and a hierarchical clustering procedure is implemented for statistically classifying potentially competing solutions. An example of methanol synthesis through hydrogenation of CO and CO 2 on a Cu-based catalyst is used to illustrate the framework. The framework is fast, is robust, and can be used to comprehensively explore the model solution and design space of any heterogeneous catalytic system.« less
Characterizing behavioural ‘characters’: an evolutionary framework

PubMed Central

Araya-Ajoy, Yimen G.; Dingemanse, Niels J.

2014-01-01

Biologists often study phenotypic evolution assuming that phenotypes consist of a set of quasi-independent units that have been shaped by selection to accomplish a particular function. In the evolutionary literature, such quasi-independent functional units are called ‘evolutionary characters’, and a framework based on evolutionary principles has been developed to characterize them. This framework mainly focuses on ‘fixed’ characters, i.e. those that vary exclusively between individuals. In this paper, we introduce multi-level variation and thereby expand the framework to labile characters, focusing on behaviour as a worked example. We first propose a concept of ‘behavioural characters’ based on the original evolutionary character concept. We then detail how integration of variation between individuals (cf. ‘personality’) and within individuals (cf. ‘individual plasticity’) into the framework gives rise to a whole suite of novel testable predictions about the evolutionary character concept. We further propose a corresponding statistical methodology to test whether observed behaviours should be considered expressions of a hypothesized evolutionary character. We illustrate the application of our framework by characterizing the behavioural character ‘aggressiveness’ in wild great tits, Parus major. PMID:24335984
A research framework for pharmacovigilance in health social media: Identification and evaluation of patient adverse drug event reports.

PubMed

Liu, Xiao; Chen, Hsinchun

2015-12-01

Social media offer insights of patients' medical problems such as drug side effects and treatment failures. Patient reports of adverse drug events from social media have great potential to improve current practice of pharmacovigilance. However, extracting patient adverse drug event reports from social media continues to be an important challenge for health informatics research. In this study, we develop a research framework with advanced natural language processing techniques for integrated and high-performance patient reported adverse drug event extraction. The framework consists of medical entity extraction for recognizing patient discussions of drug and events, adverse drug event extraction with shortest dependency path kernel based statistical learning method and semantic filtering with information from medical knowledge bases, and report source classification to tease out noise. To evaluate the proposed framework, a series of experiments were conducted on a test bed encompassing about postings from major diabetes and heart disease forums in the United States. The results reveal that each component of the framework significantly contributes to its overall effectiveness. Our framework significantly outperforms prior work. Published by Elsevier Inc.
Framework for making better predictions by directly estimating variables' predictivity.

PubMed

Lo, Adeline; Chernoff, Herman; Zheng, Tian; Lo, Shaw-Hwa

2016-12-13

We propose approaching prediction from a framework grounded in the theoretical correct prediction rate of a variable set as a parameter of interest. This framework allows us to define a measure of predictivity that enables assessing variable sets for, preferably high, predictivity. We first define the prediction rate for a variable set and consider, and ultimately reject, the naive estimator, a statistic based on the observed sample data, due to its inflated bias for moderate sample size and its sensitivity to noisy useless variables. We demonstrate that the [Formula: see text]-score of the PR method of VS yields a relatively unbiased estimate of a parameter that is not sensitive to noisy variables and is a lower bound to the parameter of interest. Thus, the PR method using the [Formula: see text]-score provides an effective approach to selecting highly predictive variables. We offer simulations and an application of the [Formula: see text]-score on real data to demonstrate the statistic's predictive performance on sample data. We conjecture that using the partition retention and [Formula: see text]-score can aid in finding variable sets with promising prediction rates; however, further research in the avenue of sample-based measures of predictivity is much desired.
Predicted range expansion of Chinese tallow tree (Triadica sebifera) in forestlands of the southern United States

Treesearch

Hsiao-Hsuan Wang; William Grant; Todd Swannack; Jianbang Gan; William Rogers; Tomasz Koralewski; James Miller; John W. Taylor Jr.

2011-01-01

We present an integrated approach for predicting future range expansion of an invasive species (Chinese tallow tree) that incorporates statistical forecasting and analytical techniques within a spatially explicit, agent-based, simulation framework.

Pre-crash scenario framework for crash avoidance systems based on vehicle-to-vehicle communications

DOT National Transportation Integrated Search

2011-06-13

This paper prioritizes and statistically describes precrash : scenarios as a basis for the identification of : crash avoidance functions enhanced or enabled by : vehicle-to-vehicle (V2V) communication technology. : Pre-crash scenarios depict vehicle ...
Combined data preprocessing and multivariate statistical analysis characterizes fed-batch culture of mouse hybridoma cells for rational medium design.

PubMed

Selvarasu, Suresh; Kim, Do Yun; Karimi, Iftekhar A; Lee, Dong-Yup

2010-10-01

We present an integrated framework for characterizing fed-batch cultures of mouse hybridoma cells producing monoclonal antibody (mAb). This framework systematically combines data preprocessing, elemental balancing and statistical analysis technique. Initially, specific rates of cell growth, glucose/amino acid consumptions and mAb/metabolite productions were calculated via curve fitting using logistic equations, with subsequent elemental balancing of the preprocessed data indicating the presence of experimental measurement errors. Multivariate statistical analysis was then employed to understand physiological characteristics of the cellular system. The results from principal component analysis (PCA) revealed three major clusters of amino acids with similar trends in their consumption profiles: (i) arginine, threonine and serine, (ii) glycine, tyrosine, phenylalanine, methionine, histidine and asparagine, and (iii) lysine, valine and isoleucine. Further analysis using partial least square (PLS) regression identified key amino acids which were positively or negatively correlated with the cell growth, mAb production and the generation of lactate and ammonia. Based on these results, the optimal concentrations of key amino acids in the feed medium can be inferred, potentially leading to an increase in cell viability and productivity, as well as a decrease in toxic waste production. The study demonstrated how the current methodological framework using multivariate statistical analysis techniques can serve as a potential tool for deriving rational medium design strategies. Copyright © 2010 Elsevier B.V. All rights reserved.
Heuristic Identification of Biological Architectures for Simulating Complex Hierarchical Genetic Interactions

PubMed Central

Moore, Jason H; Amos, Ryan; Kiralis, Jeff; Andrews, Peter C

2015-01-01

Simulation plays an essential role in the development of new computational and statistical methods for the genetic analysis of complex traits. Most simulations start with a statistical model using methods such as linear or logistic regression that specify the relationship between genotype and phenotype. This is appealing due to its simplicity and because these statistical methods are commonly used in genetic analysis. It is our working hypothesis that simulations need to move beyond simple statistical models to more realistically represent the biological complexity of genetic architecture. The goal of the present study was to develop a prototype genotype–phenotype simulation method and software that are capable of simulating complex genetic effects within the context of a hierarchical biology-based framework. Specifically, our goal is to simulate multilocus epistasis or gene–gene interaction where the genetic variants are organized within the framework of one or more genes, their regulatory regions and other regulatory loci. We introduce here the Heuristic Identification of Biological Architectures for simulating Complex Hierarchical Interactions (HIBACHI) method and prototype software for simulating data in this manner. This approach combines a biological hierarchy, a flexible mathematical framework, a liability threshold model for defining disease endpoints, and a heuristic search strategy for identifying high-order epistatic models of disease susceptibility. We provide several simulation examples using genetic models exhibiting independent main effects and three-way epistatic effects. PMID:25395175
SparRec: An effective matrix completion framework of missing data imputation for GWAS

NASA Astrophysics Data System (ADS)

Jiang, Bo; Ma, Shiqian; Causey, Jason; Qiao, Linbo; Hardin, Matthew Price; Bitts, Ian; Johnson, Daniel; Zhang, Shuzhong; Huang, Xiuzhen

2016-10-01

Genome-wide association studies present computational challenges for missing data imputation, while the advances of genotype technologies are generating datasets of large sample sizes with sample sets genotyped on multiple SNP chips. We present a new framework SparRec (Sparse Recovery) for imputation, with the following properties: (1) The optimization models of SparRec, based on low-rank and low number of co-clusters of matrices, are different from current statistics methods. While our low-rank matrix completion (LRMC) model is similar to Mendel-Impute, our matrix co-clustering factorization (MCCF) model is completely new. (2) SparRec, as other matrix completion methods, is flexible to be applied to missing data imputation for large meta-analysis with different cohorts genotyped on different sets of SNPs, even when there is no reference panel. This kind of meta-analysis is very challenging for current statistics based methods. (3) SparRec has consistent performance and achieves high recovery accuracy even when the missing data rate is as high as 90%. Compared with Mendel-Impute, our low-rank based method achieves similar accuracy and efficiency, while the co-clustering based method has advantages in running time. The testing results show that SparRec has significant advantages and competitive performance over other state-of-the-art existing statistics methods including Beagle and fastPhase.
The forecasting of menstruation based on a state-space modeling of basal body temperature time series.

PubMed

Fukaya, Keiichi; Kawamori, Ai; Osada, Yutaka; Kitazawa, Masumi; Ishiguro, Makio

2017-09-20

Women's basal body temperature (BBT) shows a periodic pattern that associates with menstrual cycle. Although this fact suggests a possibility that daily BBT time series can be useful for estimating the underlying phase state as well as for predicting the length of current menstrual cycle, little attention has been paid to model BBT time series. In this study, we propose a state-space model that involves the menstrual phase as a latent state variable to explain the daily fluctuation of BBT and the menstruation cycle length. Conditional distributions of the phase are obtained by using sequential Bayesian filtering techniques. A predictive distribution of the next menstruation day can be derived based on this conditional distribution and the model, leading to a novel statistical framework that provides a sequentially updated prediction for upcoming menstruation day. We applied this framework to a real data set of women's BBT and menstruation days and compared prediction accuracy of the proposed method with that of previous methods, showing that the proposed method generally provides a better prediction. Because BBT can be obtained with relatively small cost and effort, the proposed method can be useful for women's health management. Potential extensions of this framework as the basis of modeling and predicting events that are associated with the menstrual cycles are discussed. © 2017 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. © 2017 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
A novel probabilistic framework for event-based speech recognition

NASA Astrophysics Data System (ADS)

Juneja, Amit; Espy-Wilson, Carol

2003-10-01

One of the reasons for unsatisfactory performance of the state-of-the-art automatic speech recognition (ASR) systems is the inferior acoustic modeling of low-level acoustic-phonetic information in the speech signal. An acoustic-phonetic approach to ASR, on the other hand, explicitly targets linguistic information in the speech signal, but such a system for continuous speech recognition (CSR) is not known to exist. A probabilistic and statistical framework for CSR based on the idea of the representation of speech sounds by bundles of binary valued articulatory phonetic features is proposed. Multiple probabilistic sequences of linguistically motivated landmarks are obtained using binary classifiers of manner phonetic features-syllabic, sonorant and continuant-and the knowledge-based acoustic parameters (APs) that are acoustic correlates of those features. The landmarks are then used for the extraction of knowledge-based APs for source and place phonetic features and their binary classification. Probabilistic landmark sequences are constrained using manner class language models for isolated or connected word recognition. The proposed method could overcome the disadvantages encountered by the early acoustic-phonetic knowledge-based systems that led the ASR community to switch to systems highly dependent on statistical pattern analysis methods and probabilistic language or grammar models.
A framework for incorporating DTI Atlas Builder registration into Tract-Based Spatial Statistics and a simulated comparison to standard TBSS.

PubMed

Leming, Matthew; Steiner, Rachel; Styner, Martin

2016-02-27

Tract-based spatial statistics (TBSS) 6 is a software pipeline widely employed in comparative analysis of the white matter integrity from diffusion tensor imaging (DTI) datasets. In this study, we seek to evaluate the relationship between different methods of atlas registration for use with TBSS and different measurements of DTI (fractional anisotropy, FA, axial diffusivity, AD, radial diffusivity, RD, and medial diffusivity, MD). To do so, we have developed a novel tool that builds on existing diffusion atlas building software, integrating it into an adapted version of TBSS called DAB-TBSS (DTI Atlas Builder-Tract-Based Spatial Statistics) by using the advanced registration offered in DTI Atlas Builder 7 . To compare the effectiveness of these two versions of TBSS, we also propose a framework for simulating population differences for diffusion tensor imaging data, providing a more substantive means of empirically comparing DTI group analysis programs such as TBSS. In this study, we used 33 diffusion tensor imaging datasets and simulated group-wise changes in this data by increasing, in three different simulations, the principal eigenvalue (directly altering AD), the second and third eigenvalues (RD), and all three eigenvalues (MD) in the genu, the right uncinate fasciculus, and the left IFO. Additionally, we assessed the benefits of comparing the tensors directly using a functional analysis of diffusion tensor tract statistics (FADTTS 10 ). Our results indicate comparable levels of FA-based detection between DAB-TBSS and TBSS, with standard TBSS registration reporting a higher rate of false positives in other measurements of DTI. Within the simulated changes investigated here, this study suggests that the use of DTI Atlas Builder's registration enhances TBSS group-based studies.
Corra: Computational framework and tools for LC-MS discovery and targeted mass spectrometry-based proteomics

PubMed Central

Brusniak, Mi-Youn; Bodenmiller, Bernd; Campbell, David; Cooke, Kelly; Eddes, James; Garbutt, Andrew; Lau, Hollis; Letarte, Simon; Mueller, Lukas N; Sharma, Vagisha; Vitek, Olga; Zhang, Ning; Aebersold, Ruedi; Watts, Julian D

2008-01-01

Background Quantitative proteomics holds great promise for identifying proteins that are differentially abundant between populations representing different physiological or disease states. A range of computational tools is now available for both isotopically labeled and label-free liquid chromatography mass spectrometry (LC-MS) based quantitative proteomics. However, they are generally not comparable to each other in terms of functionality, user interfaces, information input/output, and do not readily facilitate appropriate statistical data analysis. These limitations, along with the array of choices, present a daunting prospect for biologists, and other researchers not trained in bioinformatics, who wish to use LC-MS-based quantitative proteomics. Results We have developed Corra, a computational framework and tools for discovery-based LC-MS proteomics. Corra extends and adapts existing algorithms used for LC-MS-based proteomics, and statistical algorithms, originally developed for microarray data analyses, appropriate for LC-MS data analysis. Corra also adapts software engineering technologies (e.g. Google Web Toolkit, distributed processing) so that computationally intense data processing and statistical analyses can run on a remote server, while the user controls and manages the process from their own computer via a simple web interface. Corra also allows the user to output significantly differentially abundant LC-MS-detected peptide features in a form compatible with subsequent sequence identification via tandem mass spectrometry (MS/MS). We present two case studies to illustrate the application of Corra to commonly performed LC-MS-based biological workflows: a pilot biomarker discovery study of glycoproteins isolated from human plasma samples relevant to type 2 diabetes, and a study in yeast to identify in vivo targets of the protein kinase Ark1 via phosphopeptide profiling. Conclusion The Corra computational framework leverages computational innovation to enable biologists or other researchers to process, analyze and visualize LC-MS data with what would otherwise be a complex and not user-friendly suite of tools. Corra enables appropriate statistical analyses, with controlled false-discovery rates, ultimately to inform subsequent targeted identification of differentially abundant peptides by MS/MS. For the user not trained in bioinformatics, Corra represents a complete, customizable, free and open source computational platform enabling LC-MS-based proteomic workflows, and as such, addresses an unmet need in the LC-MS proteomics field. PMID:19087345
Addressing the issue of insufficient information in data-based bridge health monitoring : final report.

DOT National Transportation Integrated Search

2015-11-01

One of the most efficient ways to solve the damage detection problem using the statistical pattern recognition : approach is that of exploiting the methods of outlier analysis. Cast within the pattern recognition framework, : damage detection assesse...
Statistical label fusion with hierarchical performance models

PubMed Central

Asman, Andrew J.; Dagley, Alexander S.; Landman, Bennett A.

2014-01-01

Label fusion is a critical step in many image segmentation frameworks (e.g., multi-atlas segmentation) as it provides a mechanism for generalizing a collection of labeled examples into a single estimate of the underlying segmentation. In the multi-label case, typical label fusion algorithms treat all labels equally – fully neglecting the known, yet complex, anatomical relationships exhibited in the data. To address this problem, we propose a generalized statistical fusion framework using hierarchical models of rater performance. Building on the seminal work in statistical fusion, we reformulate the traditional rater performance model from a multi-tiered hierarchical perspective. This new approach provides a natural framework for leveraging known anatomical relationships and accurately modeling the types of errors that raters (or atlases) make within a hierarchically consistent formulation. Herein, we describe several contributions. First, we derive a theoretical advancement to the statistical fusion framework that enables the simultaneous estimation of multiple (hierarchical) performance models within the statistical fusion context. Second, we demonstrate that the proposed hierarchical formulation is highly amenable to the state-of-the-art advancements that have been made to the statistical fusion framework. Lastly, in an empirical whole-brain segmentation task we demonstrate substantial qualitative and significant quantitative improvement in overall segmentation accuracy. PMID:24817809
IMAGINE: Interstellar MAGnetic field INference Engine

NASA Astrophysics Data System (ADS)

Steininger, Theo

2018-03-01

IMAGINE (Interstellar MAGnetic field INference Engine) performs inference on generic parametric models of the Galaxy. The modular open source framework uses highly optimized tools and technology such as the MultiNest sampler (ascl:1109.006) and the information field theory framework NIFTy (ascl:1302.013) to create an instance of the Milky Way based on a set of parameters for physical observables, using Bayesian statistics to judge the mismatch between measured data and model prediction. The flexibility of the IMAGINE framework allows for simple refitting for newly available data sets and makes state-of-the-art Bayesian methods easily accessible particularly for random components of the Galactic magnetic field.
Item Analysis Appropriate for Domain-Referenced Classroom Testing. (Project Technical Report Number 1).

ERIC Educational Resources Information Center

Nitko, Anthony J.; Hsu, Tse-chi

Item analysis procedures appropriate for domain-referenced classroom testing are described. A conceptual framework within which item statistics can be considered and promising statistics in light of this framework are presented. The sampling fluctuations of the more promising item statistics for sample sizes comparable to the typical classroom…
Statistical Analysis of Complexity Generators for Cost Estimation

NASA Technical Reports Server (NTRS)

Rowell, Ginger Holmes

1999-01-01

Predicting the cost of cutting edge new technologies involved with spacecraft hardware can be quite complicated. A new feature of the NASA Air Force Cost Model (NAFCOM), called the Complexity Generator, is being developed to model the complexity factors that drive the cost of space hardware. This parametric approach is also designed to account for the differences in cost, based on factors that are unique to each system and subsystem. The cost driver categories included in this model are weight, inheritance from previous missions, technical complexity, and management factors. This paper explains the Complexity Generator framework, the statistical methods used to select the best model within this framework, and the procedures used to find the region of predictability and the prediction intervals for the cost of a mission.
Statistical analysis of water-quality data containing multiple detection limits II: S-language software for nonparametric distribution modeling and hypothesis testing

USGS Publications Warehouse

Lee, L.; Helsel, D.

2007-01-01

Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data-perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply-censored data is the Kaplan-Meier (K-M) method. This method has seen widespread usage in the medical sciences within a general framework termed "survival analysis" where it is employed with right-censored time-to-failure data. However, K-M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K-M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and related confidence limits computation. Additionally, our software contains K-M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K-M methods is that they do not perform extrapolation and interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.
Indicator-based approach to assess sustainability of current and projected water use in Korea

NASA Astrophysics Data System (ADS)

Kong, I.; Kim, I., Sr.

2016-12-01

Recently occurred failures in water supply system derived from lacking rainfall in Korea has raised severe concerns about limited water resources exacerbated by anthropogenic drivers as well as climatic changes. Since Korea is under unprecedented changes in both social and environmental aspects, it is required to integrate social and environmental changes as well as climate factors in order to consider underlying problems and their upcoming impacts on sustainable water use. In this study, we proposed a framework to assess multilateral aspects in sustainable water use in support of performance-based monitoring. The framework is consisted of four thematic indices (climate, infrastructure, pollution, and management capacity) and subordinate indicators. Second, in order to project future circumstances, climate variability, demographic, and land cover scenarios to 2050 were applied after conducting statistical analysis identifying correlations between indicators within the framework since water crisis are caused by numerous interrelated factors. Assessment was conducted throughout 161 administrative boundaries in Korea at the time of 2010, 2030, and 2050. Third, current and future status in water use were illustrated using GIS-based methodology and statistical clustering (K-means and HCA) to elucidate spatially explicit maps and to categorize administrative regions showing similar phenomenon in the future. Based on conspicuous results shown in spatial analysis and clustering method, we suggested policy implementations to navigate local communities to decide which countermeasures should be supplemented or adopted to increase resiliency to upcoming changes in water use environments.
Shiny FHIR: An Integrated Framework Leveraging Shiny R and HL7 FHIR to Empower Standards-Based Clinical Data Applications.

PubMed

Hong, Na; Prodduturi, Naresh; Wang, Chen; Jiang, Guoqian

2017-01-01

In this study, we describe our efforts in building a clinical statistics and analysis application platform using an emerging clinical data standard, HL7 FHIR, and an open source web application framework, Shiny. We designed two primary workflows that integrate a series of R packages to enable both patient-centered and cohort-based interactive analyses. We leveraged Shiny with R to develop interactive interfaces on FHIR-based data and used ovarian cancer study datasets as a use case to implement a prototype. Specifically, we implemented patient index, patient-centered data report and analysis, and cohort analysis. The evaluation of our study was performed by testing the adaptability of the framework on two public FHIR servers. We identify common research requirements and current outstanding issues, and discuss future enhancement work of the current studies. Overall, our study demonstrated that it is feasible to use Shiny for implementing interactive analysis on FHIR-based standardized clinical data.
Risk Driven Outcome-Based Command and Control (C2) Assessment

DTIC Science & Technology

2000-01-01

shaping the risk ranking scores into more interpretable and statistically sound risk measures. Regression analysis was applied to determine what...Architecture Framework Implementation, AFCEA Coursebook 503J, February 8-11, 2000, San Diego, California. [Morgan and Henrion, 1990] M. Granger Morgan and
Increasing Army Supply Chain Performance: Using an Integrated End to End Metrics System

DTIC Science & Technology

2017-01-01

Sched Deliver Sched Delinquent Contracts Current Metrics PQDR/SDRs Forecasting Accuracy Reliability Demand Management Asset Mgmt Strategies Pipeline...are identified and characterized by statistical analysis. The study proposed a framework and tool for inventory management based on factors such as
A Stochastic Framework for Evaluating Seizure Prediction Algorithms Using Hidden Markov Models

PubMed Central

Wong, Stephen; Gardner, Andrew B.; Krieger, Abba M.; Litt, Brian

2007-01-01

Responsive, implantable stimulation devices to treat epilepsy are now in clinical trials. New evidence suggests that these devices may be more effective when they deliver therapy before seizure onset. Despite years of effort, prospective seizure prediction, which could improve device performance, remains elusive. In large part, this is explained by lack of agreement on a statistical framework for modeling seizure generation and a method for validating algorithm performance. We present a novel stochastic framework based on a three-state hidden Markov model (HMM) (representing interictal, preictal, and seizure states) with the feature that periods of increased seizure probability can transition back to the interictal state. This notion reflects clinical experience and may enhance interpretation of published seizure prediction studies. Our model accommodates clipped EEG segments and formalizes intuitive notions regarding statistical validation. We derive equations for type I and type II errors as a function of the number of seizures, duration of interictal data, and prediction horizon length and we demonstrate the model’s utility with a novel seizure detection algorithm that appeared to predicted seizure onset. We propose this framework as a vital tool for designing and validating prediction algorithms and for facilitating collaborative research in this area. PMID:17021032
Complex networks as a unified framework for descriptive analysis and predictive modeling in climate

DOE Office of Scientific and Technical Information (OSTI.GOV)

Steinhaeuser, Karsten J K; Chawla, Nitesh; Ganguly, Auroop R

The analysis of climate data has relied heavily on hypothesis-driven statistical methods, while projections of future climate are based primarily on physics-based computational models. However, in recent years a wealth of new datasets has become available. Therefore, we take a more data-centric approach and propose a unified framework for studying climate, with an aim towards characterizing observed phenomena as well as discovering new knowledge in the climate domain. Specifically, we posit that complex networks are well-suited for both descriptive analysis and predictive modeling tasks. We show that the structural properties of climate networks have useful interpretation within the domain. Further,more » we extract clusters from these networks and demonstrate their predictive power as climate indices. Our experimental results establish that the network clusters are statistically significantly better predictors than clusters derived using a more traditional clustering approach. Using complex networks as data representation thus enables the unique opportunity for descriptive and predictive modeling to inform each other.« less

Initial Systematic Investigations of the Weakly Coupled Free Fermionic Heterotic String Landscape Statistics

NASA Astrophysics Data System (ADS)

Renner, Timothy

2011-12-01

A C++ framework was constructed with the explicit purpose of systematically generating string models using the Weakly Coupled Free Fermionic Heterotic String (WCFFHS) method. The software, optimized for speed, generality, and ease of use, has been used to conduct preliminary systematic investigations of WCFFHS vacua. Documentation for this framework is provided in the Appendix. After an introduction to theoretical and computational aspects of WCFFHS model building, a study of ten-dimensional WCFFHS models is presented. Degeneracies among equivalent expressions of each of the known models are investigated and classified. A study of more phenomenologically realistic four-dimensional models based on the well known "NAHE" set is then presented, with statistics being reported on gauge content, matter representations, and space-time supersymmetries. The final study is a parallel to the NAHE study in which a variation of the NAHE set is systematically extended and examined statistically. Special attention is paid to models with "mirroring"---identical observable and hidden sector gauge groups and matter representations.
Proof of Concept for an Approach to a Finer Resolution Inventory

Treesearch

Chris J. Cieszewski; Kim Iles; Roger C. Lowe; Michal Zasada

2005-01-01

This report presents a proof of concept for a statistical framework to develop a timely, accurate, and unbiased fiber supply assessment in the State of Georgia, U.S.A. The proposed approach is based on using various data sources and modeling techniques to calibrate satellite image-based statewide stand lists, which provide initial estimates for a State inventory on a...
Knowledge Discovery from Vibration Measurements

PubMed Central

Li, Jian; Wang, Daoyao

2014-01-01

The framework as well as the particular algorithms of pattern recognition process is widely adopted in structural health monitoring (SHM). However, as a part of the overall process of knowledge discovery from data bases (KDD), the results of pattern recognition are only changes and patterns of changes of data features. In this paper, based on the similarity between KDD and SHM and considering the particularity of SHM problems, a four-step framework of SHM is proposed which extends the final goal of SHM from detecting damages to extracting knowledge to facilitate decision making. The purposes and proper methods of each step of this framework are discussed. To demonstrate the proposed SHM framework, a specific SHM method which is composed by the second order structural parameter identification, statistical control chart analysis, and system reliability analysis is then presented. To examine the performance of this SHM method, real sensor data measured from a lab size steel bridge model structure are used. The developed four-step framework of SHM has the potential to clarify the process of SHM to facilitate the further development of SHM techniques. PMID:24574933
UNITY: Confronting Supernova Cosmology's Statistical and Systematic Uncertainties in a Unified Bayesian Framework

NASA Astrophysics Data System (ADS)

Rubin, D.; Aldering, G.; Barbary, K.; Boone, K.; Chappell, G.; Currie, M.; Deustua, S.; Fagrelius, P.; Fruchter, A.; Hayden, B.; Lidman, C.; Nordin, J.; Perlmutter, S.; Saunders, C.; Sofiatti, C.; Supernova Cosmology Project, The

2015-11-01

While recent supernova (SN) cosmology research has benefited from improved measurements, current analysis approaches are not statistically optimal and will prove insufficient for future surveys. This paper discusses the limitations of current SN cosmological analyses in treating outliers, selection effects, shape- and color-standardization relations, unexplained dispersion, and heterogeneous observations. We present a new Bayesian framework, called UNITY (Unified Nonlinear Inference for Type-Ia cosmologY), that incorporates significant improvements in our ability to confront these effects. We apply the framework to real SN observations and demonstrate smaller statistical and systematic uncertainties. We verify earlier results that SNe Ia require nonlinear shape and color standardizations, but we now include these nonlinear relations in a statistically well-justified way. This analysis was primarily performed blinded, in that the basic framework was first validated on simulated data before transitioning to real data. We also discuss possible extensions of the method.
Density profiles in the Scrape-Off Layer interpreted through filament dynamics

NASA Astrophysics Data System (ADS)

Militello, Fulvio

2017-10-01

We developed a new theoretical framework to clarify the relation between radial Scrape-Off Layer density profiles and the fluctuations that generate them. The framework provides an interpretation of the experimental features of the profiles and of the turbulence statistics on the basis of simple properties of the filaments, such as their radial motion and their draining towards the divertor. L-mode and inter-ELM filaments are described as a Poisson process in which each event is independent and modelled with a wave function of amplitude and width statistically distributed according to experimental observations and evolving according to fluid equations. We will rigorously show that radially accelerating filaments, less efficient parallel exhaust and also a statistical distribution of their radial velocity can contribute to induce flatter profiles in the far SOL and therefore enhance plasma-wall interactions. A quite general result of our analysis is the resiliency of this non-exponential nature of the profiles and the increase of the relative fluctuation amplitude towards the wall, as experimentally observed. According to the framework, profile broadening at high fueling rates can be caused by interactions with neutrals (e.g. charge exchange) in the divertor or by a significant radial acceleration of the filaments. The framework assumptions were tested with 3D numerical simulations of seeded SOL filaments based on a two fluid model. In particular, filaments interact through the electrostatic field they generate only when they are in close proximity (separation comparable to their width in the drift plane), thus justifying our independence hypothesis. In addition, we will discuss how isolated filament motion responds to variations in the plasma conditions, and specifically divertor conditions. Finally, using the theoretical framework we will reproduce and interpret experimental results obtained on JET, MAST and HL-2A.
Pixels, Blocks of Pixels, and Polygons: Choosing a Spatial Unit for Thematic Accuracy Assessment

EPA Science Inventory

Pixels, polygons, and blocks of pixels are all potentially viable spatial assessment units for conducting an accuracy assessment. We develop a statistical population-based framework to examine how the spatial unit chosen affects the outcome of an accuracy assessment. The populati...
Background Knowledge in Learning-Based Relation Extraction

ERIC Educational Resources Information Center

Do, Quang Xuan

2012-01-01

In this thesis, we study the importance of background knowledge in relation extraction systems. We not only demonstrate the benefits of leveraging background knowledge to improve the systems' performance but also propose a principled framework that allows one to effectively incorporate knowledge into statistical machine learning models for…
Tatool: a Java-based open-source programming framework for psychological studies.

PubMed

von Bastian, Claudia C; Locher, André; Ruflin, Michael

2013-03-01

Tatool (Training and Testing Tool) was developed to assist researchers with programming training software, experiments, and questionnaires. Tatool is Java-based, and thus is a platform-independent and object-oriented framework. The architecture was designed to meet the requirements of experimental designs and provides a large number of predefined functions that are useful in psychological studies. Tatool comprises features crucial for training studies (e.g., configurable training schedules, adaptive training algorithms, and individual training statistics) and allows for running studies online via Java Web Start. The accompanying "Tatool Online" platform provides the possibility to manage studies and participants' data easily with a Web-based interface. Tatool is published open source under the GNU Lesser General Public License, and is available at www.tatool.ch.
Low-complexity stochastic modeling of wall-bounded shear flows

NASA Astrophysics Data System (ADS)

Zare, Armin

Turbulent flows are ubiquitous in nature and they appear in many engineering applications. Transition to turbulence, in general, increases skin-friction drag in air/water vehicles compromising their fuel-efficiency and reduces the efficiency and longevity of wind turbines. While traditional flow control techniques combine physical intuition with costly experiments, their effectiveness can be significantly enhanced by control design based on low-complexity models and optimization. In this dissertation, we develop a theoretical and computational framework for the low-complexity stochastic modeling of wall-bounded shear flows. Part I of the dissertation is devoted to the development of a modeling framework which incorporates data-driven techniques to refine physics-based models. We consider the problem of completing partially known sample statistics in a way that is consistent with underlying stochastically driven linear dynamics. Neither the statistics nor the dynamics are precisely known. Thus, our objective is to reconcile the two in a parsimonious manner. To this end, we formulate optimization problems to identify the dynamics and directionality of input excitation in order to explain and complete available covariance data. For problem sizes that general-purpose solvers cannot handle, we develop customized optimization algorithms based on alternating direction methods. The solution to the optimization problem provides information about critical directions that have maximal effect in bringing model and statistics in agreement. In Part II, we employ our modeling framework to account for statistical signatures of turbulent channel flow using low-complexity stochastic dynamical models. We demonstrate that white-in-time stochastic forcing is not sufficient to explain turbulent flow statistics and develop models for colored-in-time forcing of the linearized Navier-Stokes equations. We also examine the efficacy of stochastically forced linearized NS equations and their parabolized equivalents in the receptivity analysis of velocity fluctuations to external sources of excitation as well as capturing the effect of the slowly-varying base flow on streamwise streaks and Tollmien-Schlichting waves. In Part III, we develop a model-based approach to design surface actuation of turbulent channel flow in the form of streamwise traveling waves. This approach is capable of identifying the drag reducing trends of traveling waves in a simulation-free manner. We also use the stochastically forced linearized NS equations to examine the Reynolds number independent effects of spanwise wall oscillations on drag reduction in turbulent channel flows. This allows us to extend the predictive capability of our simulation-free approach to high Reynolds numbers.
Toward statistical modeling of saccadic eye-movement and visual saliency.

PubMed

Sun, Xiaoshuai; Yao, Hongxun; Ji, Rongrong; Liu, Xian-Ming

2014-11-01

In this paper, we present a unified statistical framework for modeling both saccadic eye movements and visual saliency. By analyzing the statistical properties of human eye fixations on natural images, we found that human attention is sparsely distributed and usually deployed to locations with abundant structural information. This observations inspired us to model saccadic behavior and visual saliency based on super-Gaussian component (SGC) analysis. Our model sequentially obtains SGC using projection pursuit, and generates eye movements by selecting the location with maximum SGC response. Besides human saccadic behavior simulation, we also demonstrated our superior effectiveness and robustness over state-of-the-arts by carrying out dense experiments on synthetic patterns and human eye fixation benchmarks. Multiple key issues in saliency modeling research, such as individual differences, the effects of scale and blur, are explored in this paper. Based on extensive qualitative and quantitative experimental results, we show promising potentials of statistical approaches for human behavior research.
A Query Expansion Framework in Image Retrieval Domain Based on Local and Global Analysis

PubMed Central

Rahman, M. M.; Antani, S. K.; Thoma, G. R.

2011-01-01

We present an image retrieval framework based on automatic query expansion in a concept feature space by generalizing the vector space model of information retrieval. In this framework, images are represented by vectors of weighted concepts similar to the keyword-based representation used in text retrieval. To generate the concept vocabularies, a statistical model is built by utilizing Support Vector Machine (SVM)-based classification techniques. The images are represented as “bag of concepts” that comprise perceptually and/or semantically distinguishable color and texture patches from local image regions in a multi-dimensional feature space. To explore the correlation between the concepts and overcome the assumption of feature independence in this model, we propose query expansion techniques in the image domain from a new perspective based on both local and global analysis. For the local analysis, the correlations between the concepts based on the co-occurrence pattern, and the metrical constraints based on the neighborhood proximity between the concepts in encoded images, are analyzed by considering local feedback information. We also analyze the concept similarities in the collection as a whole in the form of a similarity thesaurus and propose an efficient query expansion based on the global analysis. The experimental results on a photographic collection of natural scenes and a biomedical database of different imaging modalities demonstrate the effectiveness of the proposed framework in terms of precision and recall. PMID:21822350
A two-dimensional statistical framework connecting thermodynamic profiles with filaments in the scrape off layer and application to experiments

NASA Astrophysics Data System (ADS)

Militello, F.; Farley, T.; Mukhi, K.; Walkden, N.; Omotani, J. T.

2018-05-01

A statistical framework was introduced in Militello and Omotani [Nucl. Fusion 56, 104004 (2016)] to correlate the dynamics and statistics of L-mode and inter-ELM plasma filaments with the radial profiles of thermodynamic quantities they generate in the Scrape Off Layer. This paper extends the framework to cases in which the filaments are emitted from the separatrix at different toroidal positions and with a finite toroidal velocity. It is found that the toroidal velocity does not affect the profiles, while the toroidal distribution of filament emission renormalises the waiting time between two events. Experimental data collected by visual camera imaging are used to evaluate the statistics of the fluctuations, to inform the choice of the probability distribution functions used in the application of the framework. It is found that the toroidal separation of the filaments is exponentially distributed, thus suggesting the lack of a toroidal modal structure. Finally, using these measurements, the framework is applied to an experimental case and good agreement is found.
A Stochastic Framework for Modeling the Population Dynamics of Convective Clouds

DOE PAGES

Hagos, Samson; Feng, Zhe; Plant, Robert S.; ...

2018-02-20

A stochastic prognostic framework for modeling the population dynamics of convective clouds and representing them in climate models is proposed. The framework follows the nonequilibrium statistical mechanical approach to constructing a master equation for representing the evolution of the number of convective cells of a specific size and their associated cloud-base mass flux, given a large-scale forcing. In this framework, referred to as STOchastic framework for Modeling Population dynamics of convective clouds (STOMP), the evolution of convective cell size is predicted from three key characteristics of convective cells: (i) the probability of growth, (ii) the probability of decay, and (iii)more » the cloud-base mass flux. STOMP models are constructed and evaluated against CPOL radar observations at Darwin and convection permitting model (CPM) simulations. Multiple models are constructed under various assumptions regarding these three key parameters and the realisms of these models are evaluated. It is shown that in a model where convective plumes prefer to aggregate spatially and the cloud-base mass flux is a nonlinear function of convective cell area, the mass flux manifests a recharge-discharge behavior under steady forcing. Such a model also produces observed behavior of convective cell populations and CPM simulated cloud-base mass flux variability under diurnally varying forcing. Finally, in addition to its use in developing understanding of convection processes and the controls on convective cell size distributions, this modeling framework is also designed to serve as a nonequilibrium closure formulations for spectral mass flux parameterizations.« less
A Stochastic Framework for Modeling the Population Dynamics of Convective Clouds

NASA Astrophysics Data System (ADS)

Hagos, Samson; Feng, Zhe; Plant, Robert S.; Houze, Robert A.; Xiao, Heng

2018-02-01

A stochastic prognostic framework for modeling the population dynamics of convective clouds and representing them in climate models is proposed. The framework follows the nonequilibrium statistical mechanical approach to constructing a master equation for representing the evolution of the number of convective cells of a specific size and their associated cloud-base mass flux, given a large-scale forcing. In this framework, referred to as STOchastic framework for Modeling Population dynamics of convective clouds (STOMP), the evolution of convective cell size is predicted from three key characteristics of convective cells: (i) the probability of growth, (ii) the probability of decay, and (iii) the cloud-base mass flux. STOMP models are constructed and evaluated against CPOL radar observations at Darwin and convection permitting model (CPM) simulations. Multiple models are constructed under various assumptions regarding these three key parameters and the realisms of these models are evaluated. It is shown that in a model where convective plumes prefer to aggregate spatially and the cloud-base mass flux is a nonlinear function of convective cell area, the mass flux manifests a recharge-discharge behavior under steady forcing. Such a model also produces observed behavior of convective cell populations and CPM simulated cloud-base mass flux variability under diurnally varying forcing. In addition to its use in developing understanding of convection processes and the controls on convective cell size distributions, this modeling framework is also designed to serve as a nonequilibrium closure formulations for spectral mass flux parameterizations.
A Stochastic Framework for Modeling the Population Dynamics of Convective Clouds

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hagos, Samson; Feng, Zhe; Plant, Robert S.

A stochastic prognostic framework for modeling the population dynamics of convective clouds and representing them in climate models is proposed. The framework follows the nonequilibrium statistical mechanical approach to constructing a master equation for representing the evolution of the number of convective cells of a specific size and their associated cloud-base mass flux, given a large-scale forcing. In this framework, referred to as STOchastic framework for Modeling Population dynamics of convective clouds (STOMP), the evolution of convective cell size is predicted from three key characteristics of convective cells: (i) the probability of growth, (ii) the probability of decay, and (iii)more » the cloud-base mass flux. STOMP models are constructed and evaluated against CPOL radar observations at Darwin and convection permitting model (CPM) simulations. Multiple models are constructed under various assumptions regarding these three key parameters and the realisms of these models are evaluated. It is shown that in a model where convective plumes prefer to aggregate spatially and the cloud-base mass flux is a nonlinear function of convective cell area, the mass flux manifests a recharge-discharge behavior under steady forcing. Such a model also produces observed behavior of convective cell populations and CPM simulated cloud-base mass flux variability under diurnally varying forcing. Finally, in addition to its use in developing understanding of convection processes and the controls on convective cell size distributions, this modeling framework is also designed to serve as a nonequilibrium closure formulations for spectral mass flux parameterizations.« less
Effects of Peer Tutoring on Reading Self-Concept

ERIC Educational Resources Information Center

Flores, Marta; Duran, David

2013-01-01

This study investigates the development of the Reading Self-Concept and of the mechanisms underlying it, within a framework of a reading programme based on peer tutoring. The multiple methodological design adopted allowed for a quantitative approach which showed statistically significant changes in the Reading Self-Concept of those students who…
Dynamic Weather Routes Architecture Overview

NASA Technical Reports Server (NTRS)

Eslami, Hassan; Eshow, Michelle

2014-01-01

Dynamic Weather Routes Architecture Overview, presents the high level software architecture of DWR, based on the CTAS software framework and the Direct-To automation tool. The document also covers external and internal data flows, required dataset, changes to the Direct-To software for DWR, collection of software statistics, and the code structure.
Trial Sequential Methods for Meta-Analysis

ERIC Educational Resources Information Center

Kulinskaya, Elena; Wood, John

2014-01-01

Statistical methods for sequential meta-analysis have applications also for the design of new trials. Existing methods are based on group sequential methods developed for single trials and start with the calculation of a required information size. This works satisfactorily within the framework of fixed effects meta-analysis, but conceptual…
Student Background, School Climate, School Disorder, and Student Achievement: An Empirical Study of New York City's Middle Schools

ERIC Educational Resources Information Center

Chen, Greg; Weikart, Lynne A.

2008-01-01

This study develops and tests a school disorder and student achievement model based upon the school climate framework. The model was fitted to 212 New York City middle schools using the Structural Equations Modeling Analysis method. The analysis shows that the model fits the data well based upon test statistics and goodness of fit indices. The…
Scaling and stochastic cascade properties of NEMO oceanic simulations and their potential value for GCM evaluation and downscaling

NASA Astrophysics Data System (ADS)

Verrier, Sébastien; Crépon, Michel; Thiria, Sylvie

2014-09-01

Spectral scaling properties have already been evidenced on oceanic numerical simulations and have been subject to several interpretations. They can be used to evaluate classical turbulence theories that predict scaling with specific exponents and to evaluate the quality of GCM outputs from a statistical and multiscale point of view. However, a more complete framework based on multifractal cascades is able to generalize the classical but restrictive second-order spectral framework to other moment orders, providing an accurate description of probability distributions of the fields at multiple scales. The predictions of this formalism still needed systematic verification in oceanic GCM while they have been confirmed recently for their atmospheric counterparts by several papers. The present paper is devoted to a systematic analysis of several oceanic fields produced by the NEMO oceanic GCM. Attention is focused to regional, idealized configurations that permit to evaluate the NEMO engine core from a scaling point of view regardless of limitations involved by land masks. Based on classical multifractal analysis tools, multifractal properties were evidenced for several oceanic state variables (sea surface temperature and salinity, velocity components, etc.). While first-order structure functions estimated a different nonconservativity parameter H in two scaling ranges, the multiorder statistics of turbulent fluxes were scaling over almost the whole available scaling range. This multifractal scaling was then parameterized with the help of the universal multifractal framework, providing parameters that are coherent with existing empirical literature. Finally, we argue that the knowledge of these properties may be useful for oceanographers. The framework seems very well suited for the statistical evaluation of OGCM outputs. Moreover, it also provides practical solutions to simulate subpixel variability stochastically for GCM downscaling purposes. As an independent perspective, the existence of multifractal properties in oceanic flows seems also interesting for investigating scale dependencies in remote sensing inversion algorithms.

Active learning methods for interactive image retrieval.

PubMed

Gosselin, Philippe Henri; Cord, Matthieu

2008-07-01

Active learning methods have been considered with increased interest in the statistical learning community. Initially developed within a classification framework, a lot of extensions are now being proposed to handle multimedia applications. This paper provides algorithms within a statistical framework to extend active learning for online content-based image retrieval (CBIR). The classification framework is presented with experiments to compare several powerful classification techniques in this information retrieval context. Focusing on interactive methods, active learning strategy is then described. The limitations of this approach for CBIR are emphasized before presenting our new active selection process RETIN. First, as any active method is sensitive to the boundary estimation between classes, the RETIN strategy carries out a boundary correction to make the retrieval process more robust. Second, the criterion of generalization error to optimize the active learning selection is modified to better represent the CBIR objective of database ranking. Third, a batch processing of images is proposed. Our strategy leads to a fast and efficient active learning scheme to retrieve sets of online images (query concept). Experiments on large databases show that the RETIN method performs well in comparison to several other active strategies.
Load Model Verification, Validation and Calibration Framework by Statistical Analysis on Field Data

NASA Astrophysics Data System (ADS)

Jiao, Xiangqing; Liao, Yuan; Nguyen, Thai

2017-11-01

Accurate load models are critical for power system analysis and operation. A large amount of research work has been done on load modeling. Most of the existing research focuses on developing load models, while little has been done on developing formal load model verification and validation (V&V) methodologies or procedures. Most of the existing load model validation is based on qualitative rather than quantitative analysis. In addition, not all aspects of model V&V problem have been addressed by the existing approaches. To complement the existing methods, this paper proposes a novel load model verification and validation framework that can systematically and more comprehensively examine load model's effectiveness and accuracy. Statistical analysis, instead of visual check, quantifies the load model's accuracy, and provides a confidence level of the developed load model for model users. The analysis results can also be used to calibrate load models. The proposed framework can be used as a guidance to systematically examine load models for utility engineers and researchers. The proposed method is demonstrated through analysis of field measurements collected from a utility system.
An Argument Framework for the Application of Null Hypothesis Statistical Testing in Support of Research

ERIC Educational Resources Information Center

LeMire, Steven D.

2010-01-01

This paper proposes an argument framework for the teaching of null hypothesis statistical testing and its application in support of research. Elements of the Toulmin (1958) model of argument are used to illustrate the use of p values and Type I and Type II error rates in support of claims about statistical parameters and subject matter research…
A probabilistic framework to infer brain functional connectivity from anatomical connections.

PubMed

Deligianni, Fani; Varoquaux, Gael; Thirion, Bertrand; Robinson, Emma; Sharp, David J; Edwards, A David; Rueckert, Daniel

2011-01-01

We present a novel probabilistic framework to learn across several subjects a mapping from brain anatomical connectivity to functional connectivity, i.e. the covariance structure of brain activity. This prediction problem must be formulated as a structured-output learning task, as the predicted parameters are strongly correlated. We introduce a model selection framework based on cross-validation with a parametrization-independent loss function suitable to the manifold of covariance matrices. Our model is based on constraining the conditional independence structure of functional activity by the anatomical connectivity. Subsequently, we learn a linear predictor of a stationary multivariate autoregressive model. This natural parameterization of functional connectivity also enforces the positive-definiteness of the predicted covariance and thus matches the structure of the output space. Our results show that functional connectivity can be explained by anatomical connectivity on a rigorous statistical basis, and that a proper model of functional connectivity is essential to assess this link.
Calibration and analysis of genome-based models for microbial ecology.

PubMed

Louca, Stilianos; Doebeli, Michael

2015-10-16

Microbial ecosystem modeling is complicated by the large number of unknown parameters and the lack of appropriate calibration tools. Here we present a novel computational framework for modeling microbial ecosystems, which combines genome-based model construction with statistical analysis and calibration to experimental data. Using this framework, we examined the dynamics of a community of Escherichia coli strains that emerged in laboratory evolution experiments, during which an ancestral strain diversified into two coexisting ecotypes. We constructed a microbial community model comprising the ancestral and the evolved strains, which we calibrated using separate monoculture experiments. Simulations reproduced the successional dynamics in the evolution experiments, and pathway activation patterns observed in microarray transcript profiles. Our approach yielded detailed insights into the metabolic processes that drove bacterial diversification, involving acetate cross-feeding and competition for organic carbon and oxygen. Our framework provides a missing link towards a data-driven mechanistic microbial ecology.
On-line early fault detection and diagnosis of municipal solid waste incinerators

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhao Jinsong; Huang Jianchao; Sun Wei

A fault detection and diagnosis framework is proposed in this paper for early fault detection and diagnosis (FDD) of municipal solid waste incinerators (MSWIs) in order to improve the safety and continuity of production. In this framework, principal component analysis (PCA), one of the multivariate statistical technologies, is used for detecting abnormal events, while rule-based reasoning performs the fault diagnosis and consequence prediction, and also generates recommendations for fault mitigation once an abnormal event is detected. A software package, SWIFT, is developed based on the proposed framework, and has been applied in an actual industrial MSWI. The application shows thatmore » automated real-time abnormal situation management (ASM) of the MSWI can be achieved by using SWIFT, resulting in an industrially acceptable low rate of wrong diagnosis, which has resulted in improved process continuity and environmental performance of the MSWI.« less
Optimal Symmetric Multimodal Templates and Concatenated Random Forests for Supervised Brain Tumor Segmentation (Simplified) with ANTsR.

PubMed

Tustison, Nicholas J; Shrinidhi, K L; Wintermark, Max; Durst, Christopher R; Kandel, Benjamin M; Gee, James C; Grossman, Murray C; Avants, Brian B

2015-04-01

Segmenting and quantifying gliomas from MRI is an important task for diagnosis, planning intervention, and for tracking tumor changes over time. However, this task is complicated by the lack of prior knowledge concerning tumor location, spatial extent, shape, possible displacement of normal tissue, and intensity signature. To accommodate such complications, we introduce a framework for supervised segmentation based on multiple modality intensity, geometry, and asymmetry feature sets. These features drive a supervised whole-brain and tumor segmentation approach based on random forest-derived probabilities. The asymmetry-related features (based on optimal symmetric multimodal templates) demonstrate excellent discriminative properties within this framework. We also gain performance by generating probability maps from random forest models and using these maps for a refining Markov random field regularized probabilistic segmentation. This strategy allows us to interface the supervised learning capabilities of the random forest model with regularized probabilistic segmentation using the recently developed ANTsR package--a comprehensive statistical and visualization interface between the popular Advanced Normalization Tools (ANTs) and the R statistical project. The reported algorithmic framework was the top-performing entry in the MICCAI 2013 Multimodal Brain Tumor Segmentation challenge. The challenge data were widely varying consisting of both high-grade and low-grade glioma tumor four-modality MRI from five different institutions. Average Dice overlap measures for the final algorithmic assessment were 0.87, 0.78, and 0.74 for "complete", "core", and "enhanced" tumor components, respectively.
The Blurred Line between Form and Process: A Comparison of Stream Channel Classification Frameworks

PubMed Central

Kasprak, Alan; Hough-Snee, Nate

2016-01-01

Stream classification provides a means to understand the diversity and distribution of channels and floodplains that occur across a landscape while identifying links between geomorphic form and process. Accordingly, stream classification is frequently employed as a watershed planning, management, and restoration tool. At the same time, there has been intense debate and criticism of particular frameworks, on the grounds that these frameworks classify stream reaches based largely on their physical form, rather than direct measurements of their component hydrogeomorphic processes. Despite this debate surrounding stream classifications, and their ongoing use in watershed management, direct comparisons of channel classification frameworks are rare. Here we implement four stream classification frameworks and explore the degree to which each make inferences about hydrogeomorphic process from channel form within the Middle Fork John Day Basin, a watershed of high conservation interest within the Columbia River Basin, U.S.A. We compare the results of the River Styles Framework, Natural Channel Classification, Rosgen Classification System, and a channel form-based statistical classification at 33 field-monitored sites. We found that the four frameworks consistently classified reach types into similar groups based on each reach or segment’s dominant hydrogeomorphic elements. Where classified channel types diverged, differences could be attributed to the (a) spatial scale of input data used, (b) the requisite metrics and their order in completing a framework’s decision tree and/or, (c) whether the framework attempts to classify current or historic channel form. Divergence in framework agreement was also observed at reaches where channel planform was decoupled from valley setting. Overall, the relative agreement between frameworks indicates that criticism of individual classifications for their use of form in grouping stream channels may be overstated. These form-based criticisms may also ignore the geomorphic tenet that channel form reflects formative hydrogeomorphic processes across a given landscape. PMID:26982076
Statistical Inference for Data Adaptive Target Parameters.

PubMed

Hubbard, Alan E; Kherad-Pajouh, Sara; van der Laan, Mark J

2016-05-01

Consider one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. In order to define our statistical target we partition the sample in V equal size sub-samples, and use this partitioning to define V splits in an estimation sample (one of the V subsamples) and corresponding complementary parameter-generating sample. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a statistical target parameter. We define our sample-split data adaptive statistical target parameter as the average of these V-sample specific target parameters. We present an estimator (and corresponding central limit theorem) of this type of data adaptive target parameter. This general methodology for generating data adaptive target parameters is demonstrated with a number of practical examples that highlight new opportunities for statistical learning from data. This new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming "data-driven", the theory developed within this paper provides a new impetus for a greater involvement of statistical inference into problems that are being increasingly addressed by clever, yet ad hoc pattern finding methods. To suggest such potential, and to verify the predictions of the theory, extensive simulation studies, along with a data analysis based on adaptively determined intervention rules are shown and give insight into how to structure such an approach. The results show that the data adaptive target parameter approach provides a general framework and resulting methodology for data-driven science.
Feasibility Study on the Use of On-line Multivariate Statistical Process Control for Safeguards Applications in Natural Uranium Conversion Plants

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ladd-Lively, Jennifer L

2014-01-01

The objective of this work was to determine the feasibility of using on-line multivariate statistical process control (MSPC) for safeguards applications in natural uranium conversion plants. Multivariate statistical process control is commonly used throughout industry for the detection of faults. For safeguards applications in uranium conversion plants, faults could include the diversion of intermediate products such as uranium dioxide, uranium tetrafluoride, and uranium hexafluoride. This study was limited to a 100 metric ton of uranium (MTU) per year natural uranium conversion plant (NUCP) using the wet solvent extraction method for the purification of uranium ore concentrate. A key component inmore » the multivariate statistical methodology is the Principal Component Analysis (PCA) approach for the analysis of data, development of the base case model, and evaluation of future operations. The PCA approach was implemented through the use of singular value decomposition of the data matrix where the data matrix represents normal operation of the plant. Component mole balances were used to model each of the process units in the NUCP. However, this approach could be applied to any data set. The monitoring framework developed in this research could be used to determine whether or not a diversion of material has occurred at an NUCP as part of an International Atomic Energy Agency (IAEA) safeguards system. This approach can be used to identify the key monitoring locations, as well as locations where monitoring is unimportant. Detection limits at the key monitoring locations can also be established using this technique. Several faulty scenarios were developed to test the monitoring framework after the base case or normal operating conditions of the PCA model were established. In all of the scenarios, the monitoring framework was able to detect the fault. Overall this study was successful at meeting the stated objective.« less
A Framework for Thinking about Informal Statistical Inference

ERIC Educational Resources Information Center

Makar, Katie; Rubin, Andee

2009-01-01

Informal inferential reasoning has shown some promise in developing students' deeper understanding of statistical processes. This paper presents a framework to think about three key principles of informal inference--generalizations "beyond the data," probabilistic language, and data as evidence. The authors use primary school classroom…
Robustness of movement models: can models bridge the gap between temporal scales of data sets and behavioural processes?

PubMed

Schlägel, Ulrike E; Lewis, Mark A

2016-12-01

Discrete-time random walks and their extensions are common tools for analyzing animal movement data. In these analyses, resolution of temporal discretization is a critical feature. Ideally, a model both mirrors the relevant temporal scale of the biological process of interest and matches the data sampling rate. Challenges arise when resolution of data is too coarse due to technological constraints, or when we wish to extrapolate results or compare results obtained from data with different resolutions. Drawing loosely on the concept of robustness in statistics, we propose a rigorous mathematical framework for studying movement models' robustness against changes in temporal resolution. In this framework, we define varying levels of robustness as formal model properties, focusing on random walk models with spatially-explicit component. With the new framework, we can investigate whether models can validly be applied to data across varying temporal resolutions and how we can account for these different resolutions in statistical inference results. We apply the new framework to movement-based resource selection models, demonstrating both analytical and numerical calculations, as well as a Monte Carlo simulation approach. While exact robustness is rare, the concept of approximate robustness provides a promising new direction for analyzing movement models.
Bayesian Image Segmentations by Potts Prior and Loopy Belief Propagation

NASA Astrophysics Data System (ADS)

Tanaka, Kazuyuki; Kataoka, Shun; Yasuda, Muneki; Waizumi, Yuji; Hsu, Chiou-Ting

2014-12-01

This paper presents a Bayesian image segmentation model based on Potts prior and loopy belief propagation. The proposed Bayesian model involves several terms, including the pairwise interactions of Potts models, and the average vectors and covariant matrices of Gauss distributions in color image modeling. These terms are often referred to as hyperparameters in statistical machine learning theory. In order to determine these hyperparameters, we propose a new scheme for hyperparameter estimation based on conditional maximization of entropy in the Potts prior. The algorithm is given based on loopy belief propagation. In addition, we compare our conditional maximum entropy framework with the conventional maximum likelihood framework, and also clarify how the first order phase transitions in loopy belief propagations for Potts models influence our hyperparameter estimation procedures.
A unified framework for image retrieval using keyword and visual features.

PubMed

Jing, Feng; Li, Mingling; Zhang, Hong-Jiang; Zhang, Bo

2005-07-01

In this paper, a unified image retrieval framework based on both keyword annotations and visual features is proposed. In this framework, a set of statistical models are built based on visual features of a small set of manually labeled images to represent semantic concepts and used to propagate keywords to other unlabeled images. These models are updated periodically when more images implicitly labeled by users become available through relevance feedback. In this sense, the keyword models serve the function of accumulation and memorization of knowledge learned from user-provided relevance feedback. Furthermore, two sets of effective and efficient similarity measures and relevance feedback schemes are proposed for query by keyword scenario and query by image example scenario, respectively. Keyword models are combined with visual features in these schemes. In particular, a new, entropy-based active learning strategy is introduced to improve the efficiency of relevance feedback for query by keyword. Furthermore, a new algorithm is proposed to estimate the keyword features of the search concept for query by image example. It is shown to be more appropriate than two existing relevance feedback algorithms. Experimental results demonstrate the effectiveness of the proposed framework.
Machine Learning-based discovery of closures for reduced models of dynamical systems

NASA Astrophysics Data System (ADS)

Pan, Shaowu; Duraisamy, Karthik

2017-11-01

Despite the successful application of machine learning (ML) in fields such as image processing and speech recognition, only a few attempts has been made toward employing ML to represent the dynamics of complex physical systems. Previous attempts mostly focus on parameter calibration or data-driven augmentation of existing models. In this work we present a ML framework to discover closure terms in reduced models of dynamical systems and provide insights into potential problems associated with data-driven modeling. Based on exact closure models for linear system, we propose a general linear closure framework from viewpoint of optimization. The framework is based on trapezoidal approximation of convolution term. Hyperparameters that need to be determined include temporal length of memory effect, number of sampling points, and dimensions of hidden states. To circumvent the explicit specification of memory effect, a general framework inspired from neural networks is also proposed. We conduct both a priori and posteriori evaluations of the resulting model on a number of non-linear dynamical systems. This work was supported in part by AFOSR under the project ``LES Modeling of Non-local effects using Statistical Coarse-graining'' with Dr. Jean-Luc Cambier as the technical monitor.
Assessing Fire Weather Index using statistical downscaling and spatial interpolation techniques in Greece

NASA Astrophysics Data System (ADS)

Karali, Anna; Giannakopoulos, Christos; Frias, Maria Dolores; Hatzaki, Maria; Roussos, Anargyros; Casanueva, Ana

2013-04-01

Forest fires have always been present in the Mediterranean ecosystems, thus they constitute a major ecological and socio-economic issue. The last few decades though, the number of forest fires has significantly increased, as well as their severity and impact on the environment. Local fire danger projections are often required when dealing with wild fire research. In the present study the application of statistical downscaling and spatial interpolation methods was performed to the Canadian Fire Weather Index (FWI), in order to assess forest fire risk in Greece. The FWI is used worldwide (including the Mediterranean basin) to estimate the fire danger in a generalized fuel type, based solely on weather observations. The meteorological inputs to the FWI System are noon values of dry-bulb temperature, air relative humidity, 10m wind speed and precipitation during the previous 24 hours. The statistical downscaling methods are based on a statistical model that takes into account empirical relationships between large scale variables (used as predictors) and local scale variables. In the framework of the current study the statistical downscaling portal developed by the Santander Meteorology Group (https://www.meteo.unican.es/downscaling) in the framework of the EU project CLIMRUN (www.climrun.eu) was used to downscale non standard parameters related to forest fire risk. In this study, two different approaches were adopted. Firstly, the analogue downscaling technique was directly performed to the FWI index values and secondly the same downscaling technique was performed indirectly through the meteorological inputs of the index. In both cases, the statistical downscaling portal was used considering the ERA-Interim reanalysis as predictands due to the lack of observations at noon. Additionally, a three-dimensional (3D) interpolation method of position and elevation, based on Thin Plate Splines (TPS) was used, to interpolate the ERA-Interim data used to calculate the index. Results from this method were compared with the statistical downscaling results obtained from the portal. Finally, FWI was computed using weather observations obtained from the Hellenic National Meteorological Service, mainly in the south continental part of Greece and a comparison with the previous results was performed.
A unified framework for physical print quality

NASA Astrophysics Data System (ADS)

Eid, Ahmed; Cooper, Brian; Rippetoe, Ed

2007-01-01

In this paper we present a unified framework for physical print quality. This framework includes a design for a testbed, testing methodologies and quality measures of physical print characteristics. An automatic belt-fed flatbed scanning system is calibrated to acquire L* data for a wide range of flat field imagery. Testing methodologies based on wavelet pre-processing and spectral/statistical analysis are designed. We apply the proposed framework to three common printing artifacts: banding, jitter, and streaking. Since these artifacts are directional, wavelet based approaches are used to extract one artifact at a time and filter out other artifacts. Banding is characterized as a medium-to-low frequency, vertical periodic variation down the page. The same definition is applied to the jitter artifact, except that the jitter signal is characterized as a high-frequency signal above the banding frequency range. However, streaking is characterized as a horizontal aperiodic variation in the high-to-medium frequency range. Wavelets at different levels are applied to the input images in different directions to extract each artifact within specified frequency bands. Following wavelet reconstruction, images are converted into 1-D signals describing the artifact under concern. Accurate spectral analysis using a DFT with Blackman-Harris windowing technique is used to extract the power (strength) of periodic signals (banding and jitter). Since streaking is an aperiodic signal, a statistical measure is used to quantify the streaking strength. Experiments on 100 print samples scanned at 600 dpi from 10 different printers show high correlation (75% to 88%) between the ranking of these samples by the proposed metrologies and experts' visual ranking.
Simulation, identification and statistical variation in cardiovascular analysis (SISCA) - A software framework for multi-compartment lumped modeling.

PubMed

Huttary, Rudolf; Goubergrits, Leonid; Schütte, Christof; Bernhard, Stefan

2017-08-01

It has not yet been possible to obtain modeling approaches suitable for covering a wide range of real world scenarios in cardiovascular physiology because many of the system parameters are uncertain or even unknown. Natural variability and statistical variation of cardiovascular system parameters in healthy and diseased conditions are characteristic features for understanding cardiovascular diseases in more detail. This paper presents SISCA, a novel software framework for cardiovascular system modeling and its MATLAB implementation. The framework defines a multi-model statistical ensemble approach for dimension reduced, multi-compartment models and focuses on statistical variation, system identification and patient-specific simulation based on clinical data. We also discuss a data-driven modeling scenario as a use case example. The regarded dataset originated from routine clinical examinations and comprised typical pre and post surgery clinical data from a patient diagnosed with coarctation of aorta. We conducted patient and disease specific pre/post surgery modeling by adapting a validated nominal multi-compartment model with respect to structure and parametrization using metadata and MRI geometry. In both models, the simulation reproduced measured pressures and flows fairly well with respect to stenosis and stent treatment and by pre-treatment cross stenosis phase shift of the pulse wave. However, with post-treatment data showing unrealistic phase shifts and other more obvious inconsistencies within the dataset, the methods and results we present suggest that conditioning and uncertainty management of routine clinical data sets needs significantly more attention to obtain reasonable results in patient-specific cardiovascular modeling. Copyright © 2017 Elsevier Ltd. All rights reserved.
Patterns of local and nonlocal water resource use across the western U.S. determined via stable isotope intercomparisons

USDA-ARS?s Scientific Manuscript database

In this paper we develop an isotope-based statistical framework to evaluate the dynamics of the relationship between water supplies used for human consumption and several hydrological factors, including the spatiotemporal distribution of precipitation and snowmelt as well as the timing and rates of ...
The Reliability of Setting Grade Boundaries Using Comparative Judgement

ERIC Educational Resources Information Center

Benton, Tom; Elliott, Gill

2016-01-01

In recent years the use of expert judgement to set and maintain examination standards has been increasingly criticised in favour of approaches based on statistical modelling. This paper reviews existing research on this controversy and attempts to unify the evidence within a framework where expertise is utilised in the form of comparative…

Using Interactive "Shiny" Applications to Facilitate Research-Informed Learning and Teaching

ERIC Educational Resources Information Center

Fawcett, Lee

2018-01-01

In this article we discuss our attempt to incorporate research-informed learning and teaching activities into a final year undergraduate Statistics course. We make use of the Shiny web-based application framework for R to develop "Shiny apps" designed to help facilitate student interaction with methods from recently published papers in…
A Framework to Support Research on Informal Inferential Reasoning

ERIC Educational Resources Information Center

Zieffler, Andrew; Garfield, Joan; delMas, Robert; Reading, Chris

2008-01-01

Informal inferential reasoning is a relatively recent concept in the research literature. Several research studies have defined this type of cognitive process in slightly different ways. In this paper, a working definition of informal inferential reasoning based on an analysis of the key aspects of statistical inference, and on research from…
The Chicanos of El Paso: A Case of Changing Colonization.

ERIC Educational Resources Information Center

Martinez, Oscar J.

Using historical statistics and key indicators, data were synthesized to identify longitudinal trends and patterns in the social, economic, and political status of El Paso's Chicanos. Data related to group achievement were analyzed. A framework adapted to local conditions based on the internal colonialism model was used for the periodization of El…
Understanding Discourse on Work and Job-Related Well-Being in Public Social Media

PubMed Central

Liu, Tong; Homan, Christopher M.; Alm, Cecilia Ovesdotter; White, Ann Marie; Lytle, Megan C.; Kautz, Henry A.

2016-01-01

We construct a humans-in-the-loop supervised learning framework that integrates crowdsourcing feedback and local knowledge to detect job-related tweets from individual and business accounts. Using data-driven ethnography, we examine discourse about work by fusing language-based analysis with temporal, geospational, and labor statistics information. PMID:27795613
Technology Integration in K-12 Science Classrooms: An Analysis of Barriers and Implications

ERIC Educational Resources Information Center

Hechter, Richard P.; Vermette, Laurie Anne

2013-01-01

This paper examines the barriers to technology integration for Manitoban K-12 inservice science educators (n = 430) based on a 10-item online survey; results are analyzed according to teaching stream using the Technology, Pedagogy, and Content Knowledge (TPACK) framework. Quantitative descriptive statistics indicated that the leading barriers…
Sequential geophysical and flow inversion to characterize fracture networks in subsurface systems

DOE PAGES

Mudunuru, Maruti Kumar; Karra, Satish; Makedonska, Nataliia; ...

2017-09-05

Subsurface applications, including geothermal, geological carbon sequestration, and oil and gas, typically involve maximizing either the extraction of energy or the storage of fluids. Fractures form the main pathways for flow in these systems, and locating these fractures is critical for predicting flow. However, fracture characterization is a highly uncertain process, and data from multiple sources, such as flow and geophysical are needed to reduce this uncertainty. We present a nonintrusive, sequential inversion framework for integrating data from geophysical and flow sources to constrain fracture networks in the subsurface. In this framework, we first estimate bounds on the statistics formore » the fracture orientations using microseismic data. These bounds are estimated through a combination of a focal mechanism (physics-based approach) and clustering analysis (statistical approach) of seismic data. Then, the fracture lengths are constrained using flow data. In conclusion, the efficacy of this inversion is demonstrated through a representative example.« less
Sequential geophysical and flow inversion to characterize fracture networks in subsurface systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mudunuru, Maruti Kumar; Karra, Satish; Makedonska, Nataliia

Subsurface applications, including geothermal, geological carbon sequestration, and oil and gas, typically involve maximizing either the extraction of energy or the storage of fluids. Fractures form the main pathways for flow in these systems, and locating these fractures is critical for predicting flow. However, fracture characterization is a highly uncertain process, and data from multiple sources, such as flow and geophysical are needed to reduce this uncertainty. We present a nonintrusive, sequential inversion framework for integrating data from geophysical and flow sources to constrain fracture networks in the subsurface. In this framework, we first estimate bounds on the statistics formore » the fracture orientations using microseismic data. These bounds are estimated through a combination of a focal mechanism (physics-based approach) and clustering analysis (statistical approach) of seismic data. Then, the fracture lengths are constrained using flow data. In conclusion, the efficacy of this inversion is demonstrated through a representative example.« less
A statistical framework for the validation of a population exposure model based on personal exposure data

NASA Astrophysics Data System (ADS)

Rodriguez, Delphy; Valari, Myrto; Markakis, Konstantinos; Payan, Sébastien

2016-04-01

Currently, ambient pollutant concentrations at monitoring sites are routinely measured by local networks, such as AIRPARIF in Paris, France. Pollutant concentration fields are also simulated with regional-scale chemistry transport models such as CHIMERE (http://www.lmd.polytechnique.fr/chimere) under air-quality forecasting platforms (e.g. Prev'Air http://www.prevair.org) or research projects. These data may be combined with more or less sophisticated techniques to provide a fairly good representation of pollutant concentration spatial gradients over urban areas. Here we focus on human exposure to atmospheric contaminants. Based on census data on population dynamics and demographics, modeled outdoor concentrations and infiltration of outdoor air-pollution indoors we have developed a population exposure model for ozone and PM2.5. A critical challenge in the field of population exposure modeling is model validation since personal exposure data are expensive and therefore, rare. However, recent research has made low cost mobile sensors fairly common and therefore personal exposure data should become more and more accessible. In view of planned cohort field-campaigns where such data will be available over the Paris region, we propose in the present study a statistical framework that makes the comparison between modeled and measured exposures meaningful. Our ultimate goal is to evaluate the exposure model by comparing modeled exposures to monitor data. The scientific question we address here is how to downscale modeled data that are estimated on the county population scale at the individual scale which is appropriate to the available measurements. To assess this question we developed a Bayesian hierarchical framework that assimilates actual individual data into population statistics and updates the probability estimate.
Illicit and pharmaceutical drug consumption estimated via wastewater analysis. Part B: placing back-calculations in a formal statistical framework.

PubMed

Jones, Hayley E; Hickman, Matthew; Kasprzyk-Hordern, Barbara; Welton, Nicky J; Baker, David R; Ades, A E

2014-07-15

Concentrations of metabolites of illicit drugs in sewage water can be measured with great accuracy and precision, thanks to the development of sensitive and robust analytical methods. Based on assumptions about factors including the excretion profile of the parent drug, routes of administration and the number of individuals using the wastewater system, the level of consumption of a drug can be estimated from such measured concentrations. When presenting results from these 'back-calculations', the multiple sources of uncertainty are often discussed, but are not usually explicitly taken into account in the estimation process. In this paper we demonstrate how these calculations can be placed in a more formal statistical framework by assuming a distribution for each parameter involved, based on a review of the evidence underpinning it. Using a Monte Carlo simulations approach, it is then straightforward to propagate uncertainty in each parameter through the back-calculations, producing a distribution for instead of a single estimate of daily or average consumption. This can be summarised for example by a median and credible interval. To demonstrate this approach, we estimate cocaine consumption in a large urban UK population, using measured concentrations of two of its metabolites, benzoylecgonine and norbenzoylecgonine. We also demonstrate a more sophisticated analysis, implemented within a Bayesian statistical framework using Markov chain Monte Carlo simulation. Our model allows the two metabolites to simultaneously inform estimates of daily cocaine consumption and explicitly allows for variability between days. After accounting for this variability, the resulting credible interval for average daily consumption is appropriately wider, representing additional uncertainty. We discuss possibilities for extensions to the model, and whether analysis of wastewater samples has potential to contribute to a prevalence model for illicit drug use. Copyright © 2014. Published by Elsevier B.V.
Illicit and pharmaceutical drug consumption estimated via wastewater analysis. Part B: Placing back-calculations in a formal statistical framework

PubMed Central

Jones, Hayley E.; Hickman, Matthew; Kasprzyk-Hordern, Barbara; Welton, Nicky J.; Baker, David R.; Ades, A.E.

2014-01-01

Concentrations of metabolites of illicit drugs in sewage water can be measured with great accuracy and precision, thanks to the development of sensitive and robust analytical methods. Based on assumptions about factors including the excretion profile of the parent drug, routes of administration and the number of individuals using the wastewater system, the level of consumption of a drug can be estimated from such measured concentrations. When presenting results from these ‘back-calculations’, the multiple sources of uncertainty are often discussed, but are not usually explicitly taken into account in the estimation process. In this paper we demonstrate how these calculations can be placed in a more formal statistical framework by assuming a distribution for each parameter involved, based on a review of the evidence underpinning it. Using a Monte Carlo simulations approach, it is then straightforward to propagate uncertainty in each parameter through the back-calculations, producing a distribution for instead of a single estimate of daily or average consumption. This can be summarised for example by a median and credible interval. To demonstrate this approach, we estimate cocaine consumption in a large urban UK population, using measured concentrations of two of its metabolites, benzoylecgonine and norbenzoylecgonine. We also demonstrate a more sophisticated analysis, implemented within a Bayesian statistical framework using Markov chain Monte Carlo simulation. Our model allows the two metabolites to simultaneously inform estimates of daily cocaine consumption and explicitly allows for variability between days. After accounting for this variability, the resulting credible interval for average daily consumption is appropriately wider, representing additional uncertainty. We discuss possibilities for extensions to the model, and whether analysis of wastewater samples has potential to contribute to a prevalence model for illicit drug use. PMID:24636801
Combination of statistical and physically based methods to assess shallow slide susceptibility at the basin scale

NASA Astrophysics Data System (ADS)

Oliveira, Sérgio C.; Zêzere, José L.; Lajas, Sara; Melo, Raquel

2017-07-01

Approaches used to assess shallow slide susceptibility at the basin scale are conceptually different depending on the use of statistical or physically based methods. The former are based on the assumption that the same causes are more likely to produce the same effects, whereas the latter are based on the comparison between forces which tend to promote movement along the slope and the counteracting forces that are resistant to motion. Within this general framework, this work tests two hypotheses: (i) although conceptually and methodologically distinct, the statistical and deterministic methods generate similar shallow slide susceptibility results regarding the model's predictive capacity and spatial agreement; and (ii) the combination of shallow slide susceptibility maps obtained with statistical and physically based methods, for the same study area, generate a more reliable susceptibility model for shallow slide occurrence. These hypotheses were tested at a small test site (13.9 km2) located north of Lisbon (Portugal), using a statistical method (the information value method, IV) and a physically based method (the infinite slope method, IS). The landslide susceptibility maps produced with the statistical and deterministic methods were combined into a new landslide susceptibility map. The latter was based on a set of integration rules defined by the cross tabulation of the susceptibility classes of both maps and analysis of the corresponding contingency tables. The results demonstrate a higher predictive capacity of the new shallow slide susceptibility map, which combines the independent results obtained with statistical and physically based models. Moreover, the combination of the two models allowed the identification of areas where the results of the information value and the infinite slope methods are contradictory. Thus, these areas were classified as uncertain and deserve additional investigation at a more detailed scale.
A resilient and efficient CFD framework: Statistical learning tools for multi-fidelity and heterogeneous information fusion

NASA Astrophysics Data System (ADS)

Lee, Seungjoon; Kevrekidis, Ioannis G.; Karniadakis, George Em

2017-09-01

Exascale-level simulations require fault-resilient algorithms that are robust against repeated and expected software and/or hardware failures during computations, which may render the simulation results unsatisfactory. If each processor can share some global information about the simulation from a coarse, limited accuracy but relatively costless auxiliary simulator we can effectively fill-in the missing spatial data at the required times by a statistical learning technique - multi-level Gaussian process regression, on the fly; this has been demonstrated in previous work [1]. Based on the previous work, we also employ another (nonlinear) statistical learning technique, Diffusion Maps, that detects computational redundancy in time and hence accelerate the simulation by projective time integration, giving the overall computation a "patch dynamics" flavor. Furthermore, we are now able to perform information fusion with multi-fidelity and heterogeneous data (including stochastic data). Finally, we set the foundations of a new framework in CFD, called patch simulation, that combines information fusion techniques from, in principle, multiple fidelity and resolution simulations (and even experiments) with a new adaptive timestep refinement technique. We present two benchmark problems (the heat equation and the Navier-Stokes equations) to demonstrate the new capability that statistical learning tools can bring to traditional scientific computing algorithms. For each problem, we rely on heterogeneous and multi-fidelity data, either from a coarse simulation of the same equation or from a stochastic, particle-based, more "microscopic" simulation. We consider, as such "auxiliary" models, a Monte Carlo random walk for the heat equation and a dissipative particle dynamics (DPD) model for the Navier-Stokes equations. More broadly, in this paper we demonstrate the symbiotic and synergistic combination of statistical learning, domain decomposition, and scientific computing in exascale simulations.
Performance comparison between total variation (TV)-based compressed sensing and statistical iterative reconstruction algorithms.

PubMed

Tang, Jie; Nett, Brian E; Chen, Guang-Hong

2009-10-07

Of all available reconstruction methods, statistical iterative reconstruction algorithms appear particularly promising since they enable accurate physical noise modeling. The newly developed compressive sampling/compressed sensing (CS) algorithm has shown the potential to accurately reconstruct images from highly undersampled data. The CS algorithm can be implemented in the statistical reconstruction framework as well. In this study, we compared the performance of two standard statistical reconstruction algorithms (penalized weighted least squares and q-GGMRF) to the CS algorithm. In assessing the image quality using these iterative reconstructions, it is critical to utilize realistic background anatomy as the reconstruction results are object dependent. A cadaver head was scanned on a Varian Trilogy system at different dose levels. Several figures of merit including the relative root mean square error and a quality factor which accounts for the noise performance and the spatial resolution were introduced to objectively evaluate reconstruction performance. A comparison is presented between the three algorithms for a constant undersampling factor comparing different algorithms at several dose levels. To facilitate this comparison, the original CS method was formulated in the framework of the statistical image reconstruction algorithms. Important conclusions of the measurements from our studies are that (1) for realistic neuro-anatomy, over 100 projections are required to avoid streak artifacts in the reconstructed images even with CS reconstruction, (2) regardless of the algorithm employed, it is beneficial to distribute the total dose to more views as long as each view remains quantum noise limited and (3) the total variation-based CS method is not appropriate for very low dose levels because while it can mitigate streaking artifacts, the images exhibit patchy behavior, which is potentially harmful for medical diagnosis.
Equivalent statistics and data interpretation.

PubMed

Francis, Gregory

2017-08-01

Recent reform efforts in psychological science have led to a plethora of choices for scientists to analyze their data. A scientist making an inference about their data must now decide whether to report a p value, summarize the data with a standardized effect size and its confidence interval, report a Bayes Factor, or use other model comparison methods. To make good choices among these options, it is necessary for researchers to understand the characteristics of the various statistics used by the different analysis frameworks. Toward that end, this paper makes two contributions. First, it shows that for the case of a two-sample t test with known sample sizes, many different summary statistics are mathematically equivalent in the sense that they are based on the very same information in the data set. When the sample sizes are known, the p value provides as much information about a data set as the confidence interval of Cohen's d or a JZS Bayes factor. Second, this equivalence means that different analysis methods differ only in their interpretation of the empirical data. At first glance, it might seem that mathematical equivalence of the statistics suggests that it does not matter much which statistic is reported, but the opposite is true because the appropriateness of a reported statistic is relative to the inference it promotes. Accordingly, scientists should choose an analysis method appropriate for their scientific investigation. A direct comparison of the different inferential frameworks provides some guidance for scientists to make good choices and improve scientific practice.
Explicit optimization of plan quality measures in intensity-modulated radiation therapy treatment planning.

PubMed

Engberg, Lovisa; Forsgren, Anders; Eriksson, Kjell; Hårdemark, Björn

2017-06-01

To formulate convex planning objectives of treatment plan multicriteria optimization with explicit relationships to the dose-volume histogram (DVH) statistics used in plan quality evaluation. Conventional planning objectives are designed to minimize the violation of DVH statistics thresholds using penalty functions. Although successful in guiding the DVH curve towards these thresholds, conventional planning objectives offer limited control of the individual points on the DVH curve (doses-at-volume) used to evaluate plan quality. In this study, we abandon the usual penalty-function framework and propose planning objectives that more closely relate to DVH statistics. The proposed planning objectives are based on mean-tail-dose, resulting in convex optimization. We also demonstrate how to adapt a standard optimization method to the proposed formulation in order to obtain a substantial reduction in computational cost. We investigated the potential of the proposed planning objectives as tools for optimizing DVH statistics through juxtaposition with the conventional planning objectives on two patient cases. Sets of treatment plans with differently balanced planning objectives were generated using either the proposed or the conventional approach. Dominance in the sense of better distributed doses-at-volume was observed in plans optimized within the proposed framework. The initial computational study indicates that the DVH statistics are better optimized and more efficiently balanced using the proposed planning objectives than using the conventional approach. © 2017 American Association of Physicists in Medicine.
Fine Mapping Causal Variants with an Approximate Bayesian Method Using Marginal Test Statistics

PubMed Central

Chen, Wenan; Larrabee, Beth R.; Ovsyannikova, Inna G.; Kennedy, Richard B.; Haralambieva, Iana H.; Poland, Gregory A.; Schaid, Daniel J.

2015-01-01

Two recently developed fine-mapping methods, CAVIAR and PAINTOR, demonstrate better performance over other fine-mapping methods. They also have the advantage of using only the marginal test statistics and the correlation among SNPs. Both methods leverage the fact that the marginal test statistics asymptotically follow a multivariate normal distribution and are likelihood based. However, their relationship with Bayesian fine mapping, such as BIMBAM, is not clear. In this study, we first show that CAVIAR and BIMBAM are actually approximately equivalent to each other. This leads to a fine-mapping method using marginal test statistics in the Bayesian framework, which we call CAVIAR Bayes factor (CAVIARBF). Another advantage of the Bayesian framework is that it can answer both association and fine-mapping questions. We also used simulations to compare CAVIARBF with other methods under different numbers of causal variants. The results showed that both CAVIARBF and BIMBAM have better performance than PAINTOR and other methods. Compared to BIMBAM, CAVIARBF has the advantage of using only marginal test statistics and takes about one-quarter to one-fifth of the running time. We applied different methods on two independent cohorts of the same phenotype. Results showed that CAVIARBF, BIMBAM, and PAINTOR selected the same top 3 SNPs; however, CAVIARBF and BIMBAM had better consistency in selecting the top 10 ranked SNPs between the two cohorts. Software is available at https://bitbucket.org/Wenan/caviarbf. PMID:25948564
Product plots.

PubMed

Wickham, Hadley; Hofmann, Heike

2011-12-01

We propose a new framework for visualising tables of counts, proportions and probabilities. We call our framework product plots, alluding to the computation of area as a product of height and width, and the statistical concept of generating a joint distribution from the product of conditional and marginal distributions. The framework, with extensions, is sufficient to encompass over 20 visualisations previously described in fields of statistical graphics and infovis, including bar charts, mosaic plots, treemaps, equal area plots and fluctuation diagrams. © 2011 IEEE
Measuring implementation behaviour of menu guidelines in the childcare setting: confirmatory factor analysis of a theoretical domains framework questionnaire (TDFQ).

PubMed

Seward, Kirsty; Wolfenden, Luke; Wiggers, John; Finch, Meghan; Wyse, Rebecca; Oldmeadow, Christopher; Presseau, Justin; Clinton-McHarg, Tara; Yoong, Sze Lin

2017-04-04

While there are number of frameworks which focus on supporting the implementation of evidence based approaches, few psychometrically valid measures exist to assess constructs within these frameworks. This study aimed to develop and psychometrically assess a scale measuring each domain of the Theoretical Domains Framework for use in assessing the implementation of dietary guidelines within a non-health care setting (childcare services). A 75 item 14-domain Theoretical Domains Framework Questionnaire (TDFQ) was developed and administered via telephone interview to 202 centre based childcare service cooks who had a role in planning the service menu. Confirmatory factor analysis (CFA) was undertaken to assess the reliability, discriminant validity and goodness of fit of the 14-domain theoretical domain framework measure. For the CFA, five iterative processes of adjustment were undertaken where 14 items were removed, resulting in a final measure consisting of 14 domains and 61 items. For the final measure: the Chi-Square goodness of fit statistic was 3447.19; the Standardized Root Mean Square Residual (SRMR) was 0.070; the Root Mean Square Error of Approximation (RMSEA) was 0.072; and the Comparative Fit Index (CFI) had a value of 0.78. While only one of the three indices support goodness of fit of the measurement model tested, a 14-domain model with 61 items showed good discriminant validity and internally consistent items. Future research should aim to assess the psychometric properties of the developed TDFQ in other community-based settings.
Statistical prediction with Kanerva's sparse distributed memory

NASA Technical Reports Server (NTRS)

Rogers, David

1989-01-01

A new viewpoint of the processing performed by Kanerva's sparse distributed memory (SDM) is presented. In conditions of near- or over-capacity, where the associative-memory behavior of the model breaks down, the processing performed by the model can be interpreted as that of a statistical predictor. Mathematical results are presented which serve as the framework for a new statistical viewpoint of sparse distributed memory and for which the standard formulation of SDM is a special case. This viewpoint suggests possible enhancements to the SDM model, including a procedure for improving the predictiveness of the system based on Holland's work with genetic algorithms, and a method for improving the capacity of SDM even when used as an associative memory.
SmartMal: a service-oriented behavioral malware detection framework for mobile devices.

PubMed

Wang, Chao; Wu, Zhizhong; Li, Xi; Zhou, Xuehai; Wang, Aili; Hung, Patrick C K

2014-01-01

This paper presents SmartMal--a novel service-oriented behavioral malware detection framework for vehicular and mobile devices. The highlight of SmartMal is to introduce service-oriented architecture (SOA) concepts and behavior analysis into the malware detection paradigms. The proposed framework relies on client-server architecture, the client continuously extracts various features and transfers them to the server, and the server's main task is to detect anomalies using state-of-art detection algorithms. Multiple distributed servers simultaneously analyze the feature vector using various detectors and information fusion is used to concatenate the results of detectors. We also propose a cycle-based statistical approach for mobile device anomaly detection. We accomplish this by analyzing the users' regular usage patterns. Empirical results suggest that the proposed framework and novel anomaly detection algorithm are highly effective in detecting malware on Android devices.

SmartMal: A Service-Oriented Behavioral Malware Detection Framework for Mobile Devices

PubMed Central

Wu, Zhizhong; Li, Xi; Zhou, Xuehai; Wang, Aili; Hung, Patrick C. K.

2014-01-01

This paper presents SmartMal—a novel service-oriented behavioral malware detection framework for vehicular and mobile devices. The highlight of SmartMal is to introduce service-oriented architecture (SOA) concepts and behavior analysis into the malware detection paradigms. The proposed framework relies on client-server architecture, the client continuously extracts various features and transfers them to the server, and the server's main task is to detect anomalies using state-of-art detection algorithms. Multiple distributed servers simultaneously analyze the feature vector using various detectors and information fusion is used to concatenate the results of detectors. We also propose a cycle-based statistical approach for mobile device anomaly detection. We accomplish this by analyzing the users' regular usage patterns. Empirical results suggest that the proposed framework and novel anomaly detection algorithm are highly effective in detecting malware on Android devices. PMID:25165729
Interpreter of maladies: redescription mining applied to biomedical data analysis.

PubMed

Waltman, Peter; Pearlman, Alex; Mishra, Bud

2006-04-01

Comprehensive, systematic and integrated data-centric statistical approaches to disease modeling can provide powerful frameworks for understanding disease etiology. Here, one such computational framework based on redescription mining in both its incarnations, static and dynamic, is discussed. The static framework provides bioinformatic tools applicable to multifaceted datasets, containing genetic, transcriptomic, proteomic, and clinical data for diseased patients and normal subjects. The dynamic redescription framework provides systems biology tools to model complex sets of regulatory, metabolic and signaling pathways in the initiation and progression of a disease. As an example, the case of chronic fatigue syndrome (CFS) is considered, which has so far remained intractable and unpredictable in its etiology and nosology. The redescription mining approaches can be applied to the Centers for Disease Control and Prevention's Wichita (KS, USA) dataset, integrating transcriptomic, epidemiological and clinical data, and can also be used to study how pathways in the hypothalamic-pituitary-adrenal axis affect CFS patients.
Statistical framework for evaluation of climate model simulations by use of climate proxy data from the last millennium - Part 1: Theory

NASA Astrophysics Data System (ADS)

Sundberg, R.; Moberg, A.; Hind, A.

2012-08-01

A statistical framework for comparing the output of ensemble simulations from global climate models with networks of climate proxy and instrumental records has been developed, focusing on near-surface temperatures for the last millennium. This framework includes the formulation of a joint statistical model for proxy data, instrumental data and simulation data, which is used to optimize a quadratic distance measure for ranking climate model simulations. An essential underlying assumption is that the simulations and the proxy/instrumental series have a shared component of variability that is due to temporal changes in external forcing, such as volcanic aerosol load, solar irradiance or greenhouse gas concentrations. Two statistical tests have been formulated. Firstly, a preliminary test establishes whether a significant temporal correlation exists between instrumental/proxy and simulation data. Secondly, the distance measure is expressed in the form of a test statistic of whether a forced simulation is closer to the instrumental/proxy series than unforced simulations. The proposed framework allows any number of proxy locations to be used jointly, with different seasons, record lengths and statistical precision. The goal is to objectively rank several competing climate model simulations (e.g. with alternative model parameterizations or alternative forcing histories) by means of their goodness of fit to the unobservable true past climate variations, as estimated from noisy proxy data and instrumental observations.
Improved analyses using function datasets and statistical modeling

Treesearch

John S. Hogland; Nathaniel M. Anderson

2014-01-01

Raster modeling is an integral component of spatial analysis. However, conventional raster modeling techniques can require a substantial amount of processing time and storage space and have limited statistical functionality and machine learning algorithms. To address this issue, we developed a new modeling framework using C# and ArcObjects and integrated that framework...
General Aviation Avionics Statistics : 1975

DOT National Transportation Integrated Search

1978-06-01

This report presents avionics statistics for the 1975 general aviation (GA) aircraft fleet and updates a previous publication, General Aviation Avionics Statistics: 1974. The statistics are presented in a capability group framework which enables one ...
A unified statistical approach to non-negative matrix factorization and probabilistic latent semantic indexing

PubMed Central

Wang, Guoli; Ebrahimi, Nader

2014-01-01

Non-negative matrix factorization (NMF) is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix V into the product of two nonnegative matrices, W and H, such that V ∼ W H. It has been shown to have a parts-based, sparse representation of the data. NMF has been successfully applied in a variety of areas such as natural language processing, neuroscience, information retrieval, image processing, speech recognition and computational biology for the analysis and interpretation of large-scale data. There has also been simultaneous development of a related statistical latent class modeling approach, namely, probabilistic latent semantic indexing (PLSI), for analyzing and interpreting co-occurrence count data arising in natural language processing. In this paper, we present a generalized statistical approach to NMF and PLSI based on Renyi's divergence between two non-negative matrices, stemming from the Poisson likelihood. Our approach unifies various competing models and provides a unique theoretical framework for these methods. We propose a unified algorithm for NMF and provide a rigorous proof of monotonicity of multiplicative updates for W and H. In addition, we generalize the relationship between NMF and PLSI within this framework. We demonstrate the applicability and utility of our approach as well as its superior performance relative to existing methods using real-life and simulated document clustering data. PMID:25821345
A unified statistical approach to non-negative matrix factorization and probabilistic latent semantic indexing.

PubMed

Devarajan, Karthik; Wang, Guoli; Ebrahimi, Nader

2015-04-01

Non-negative matrix factorization (NMF) is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix V into the product of two nonnegative matrices, W and H , such that V ∼ W H . It has been shown to have a parts-based, sparse representation of the data. NMF has been successfully applied in a variety of areas such as natural language processing, neuroscience, information retrieval, image processing, speech recognition and computational biology for the analysis and interpretation of large-scale data. There has also been simultaneous development of a related statistical latent class modeling approach, namely, probabilistic latent semantic indexing (PLSI), for analyzing and interpreting co-occurrence count data arising in natural language processing. In this paper, we present a generalized statistical approach to NMF and PLSI based on Renyi's divergence between two non-negative matrices, stemming from the Poisson likelihood. Our approach unifies various competing models and provides a unique theoretical framework for these methods. We propose a unified algorithm for NMF and provide a rigorous proof of monotonicity of multiplicative updates for W and H . In addition, we generalize the relationship between NMF and PLSI within this framework. We demonstrate the applicability and utility of our approach as well as its superior performance relative to existing methods using real-life and simulated document clustering data.
PSSMSearch: a server for modeling, visualization, proteome-wide discovery and annotation of protein motif specificity determinants.

PubMed

Krystkowiak, Izabella; Manguy, Jean; Davey, Norman E

2018-06-05

There is a pressing need for in silico tools that can aid in the identification of the complete repertoire of protein binding (SLiMs, MoRFs, miniMotifs) and modification (moiety attachment/removal, isomerization, cleavage) motifs. We have created PSSMSearch, an interactive web-based tool for rapid statistical modeling, visualization, discovery and annotation of protein motif specificity determinants to discover novel motifs in a proteome-wide manner. PSSMSearch analyses proteomes for regions with significant similarity to a motif specificity determinant model built from a set of aligned motif-containing peptides. Multiple scoring methods are available to build a position-specific scoring matrix (PSSM) describing the motif specificity determinant model. This model can then be modified by a user to add prior knowledge of specificity determinants through an interactive PSSM heatmap. PSSMSearch includes a statistical framework to calculate the significance of specificity determinant model matches against a proteome of interest. PSSMSearch also includes the SLiMSearch framework's annotation, motif functional analysis and filtering tools to highlight relevant discriminatory information. Additional tools to annotate statistically significant shared keywords and GO terms, or experimental evidence of interaction with a motif-recognizing protein have been added. Finally, PSSM-based conservation metrics have been created for taxonomic range analyses. The PSSMSearch web server is available at http://slim.ucd.ie/pssmsearch/.
Statistical analysis of hydrological response in urbanising catchments based on adaptive sampling using inter-amount times

NASA Astrophysics Data System (ADS)

ten Veldhuis, Marie-Claire; Schleiss, Marc

2017-04-01

Urban catchments are typically characterised by a more flashy nature of the hydrological response compared to natural catchments. Predicting flow changes associated with urbanisation is not straightforward, as they are influenced by interactions between impervious cover, basin size, drainage connectivity and stormwater management infrastructure. In this study, we present an alternative approach to statistical analysis of hydrological response variability and basin flashiness, based on the distribution of inter-amount times. We analyse inter-amount time distributions of high-resolution streamflow time series for 17 (semi-)urbanised basins in North Carolina, USA, ranging from 13 to 238 km2 in size. We show that in the inter-amount-time framework, sampling frequency is tuned to the local variability of the flow pattern, resulting in a different representation and weighting of high and low flow periods in the statistical distribution. This leads to important differences in the way the distribution quantiles, mean, coefficient of variation and skewness vary across scales and results in lower mean intermittency and improved scaling. Moreover, we show that inter-amount-time distributions can be used to detect regulation effects on flow patterns, identify critical sampling scales and characterise flashiness of hydrological response. The possibility to use both the classical approach and the inter-amount-time framework to identify minimum observable scales and analyse flow data opens up interesting areas for future research.
Assessing the Development of Educational Research Literacy: The Effect of Courses on Research Methods in Studies of Educational Science

ERIC Educational Resources Information Center

Groß Ophoff, Jana; Schladitz, Sandra; Leuders, Juliane; Leuders, Timo; Wirtz, Markus A.

2015-01-01

The ability to purposefully access, reflect, and use evidence from educational research (Educational Research Literacy) is expected of future professionals in educational practice. Based on the presented conceptual framework, a test instrument was developed to assess the different competency aspects: Information Literacy, Statistical Literacy, and…
Results of EPA’s Assessment of Fish Tissue from U.S. Rivers for Selenium with Implications for Aquatic Life and Human Health

EPA Science Inventory

EPA’s Office of Water and Office of Research and Development collaborated to conduct the first statistically based survey of contaminants in fish fillets from U.S. rivers. This national fish survey was conducted under the framework of EPA’s National Rivers and Streams Assessment ...
Effect of geometry on deformation of anterior implant-supported zirconia frameworks: An in vitro study using digital image correlation.

PubMed

Calha, Nuno; Messias, Ana; Guerra, Fernando; Martinho, Beatriz; Neto, Maria Augusta; Nicolau, Pedro

2017-04-01

To evaluate the effect of geometry on the displacement and the strain distribution of anterior implant-supported zirconia frameworks under static load using the 3D digital image correlation method. Two groups (n=5) of 4-unit zirconia frameworks were produced by CAD/CAM for the implant-abutment assembly. Group 1 comprised five straight configuration frameworks and group 2 consisted of five curved configuration frameworks. Specimens were cemented and submitted to static load up to 200N. Displacements were captured with two high-speed photographic cameras and analyzed with video correlation system in three spacial axes U, V, W. Statistical analysis was made using the nonparametric Mann-Whitney test. Up to 150N loads, the vertical displacements (V axis) were statistically higher for curved frameworks (-267.83±23.76μm), when compared to the straight frameworks (-120.73±36.17μm) (p=0.008), as well as anterior displacements in the W transformed axis (589.55±64.51μm vs 224.29±50.38μm for the curved and straight frameworks), respectively (p=0.008). The mean von Mises strains over the surface frameworks were statistically higher for the curved frameworks under any load. Within the limitations of this in vitro study, it is possible to conclude that the geometric configuration influences the deformation of 4-unit anterior frameworks under static load. The higher strain distribution and micro-movements of the curved frameworks reflect less rigidity and increased risk of fractures associated to FPDs. Copyright © 2016 Japan Prosthodontic Society. Published by Elsevier Ltd. All rights reserved.
When mechanism matters: Bayesian forecasting using models of ecological diffusion

USGS Publications Warehouse

Hefley, Trevor J.; Hooten, Mevin B.; Russell, Robin E.; Walsh, Daniel P.; Powell, James A.

2017-01-01

Ecological diffusion is a theory that can be used to understand and forecast spatio-temporal processes such as dispersal, invasion, and the spread of disease. Hierarchical Bayesian modelling provides a framework to make statistical inference and probabilistic forecasts, using mechanistic ecological models. To illustrate, we show how hierarchical Bayesian models of ecological diffusion can be implemented for large data sets that are distributed densely across space and time. The hierarchical Bayesian approach is used to understand and forecast the growth and geographic spread in the prevalence of chronic wasting disease in white-tailed deer (Odocoileus virginianus). We compare statistical inference and forecasts from our hierarchical Bayesian model to phenomenological regression-based methods that are commonly used to analyse spatial occurrence data. The mechanistic statistical model based on ecological diffusion led to important ecological insights, obviated a commonly ignored type of collinearity, and was the most accurate method for forecasting.
Combining statistical inference and decisions in ecology

USGS Publications Warehouse

Williams, Perry J.; Hooten, Mevin B.

2016-01-01

Statistical decision theory (SDT) is a sub-field of decision theory that formally incorporates statistical investigation into a decision-theoretic framework to account for uncertainties in a decision problem. SDT provides a unifying analysis of three types of information: statistical results from a data set, knowledge of the consequences of potential choices (i.e., loss), and prior beliefs about a system. SDT links the theoretical development of a large body of statistical methods including point estimation, hypothesis testing, and confidence interval estimation. The theory and application of SDT have mainly been developed and published in the fields of mathematics, statistics, operations research, and other decision sciences, but have had limited exposure in ecology. Thus, we provide an introduction to SDT for ecologists and describe its utility for linking the conventionally separate tasks of statistical investigation and decision making in a single framework. We describe the basic framework of both Bayesian and frequentist SDT, its traditional use in statistics, and discuss its application to decision problems that occur in ecology. We demonstrate SDT with two types of decisions: Bayesian point estimation, and an applied management problem of selecting a prescribed fire rotation for managing a grassland bird species. Central to SDT, and decision theory in general, are loss functions. Thus, we also provide basic guidance and references for constructing loss functions for an SDT problem.
Feature-Based Statistical Analysis of Combustion Simulation Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bennett, J; Krishnamoorthy, V; Liu, S

2011-11-18

We present a new framework for feature-based statistical analysis of large-scale scientific data and demonstrate its effectiveness by analyzing features from Direct Numerical Simulations (DNS) of turbulent combustion. Turbulent flows are ubiquitous and account for transport and mixing processes in combustion, astrophysics, fusion, and climate modeling among other disciplines. They are also characterized by coherent structure or organized motion, i.e. nonlocal entities whose geometrical features can directly impact molecular mixing and reactive processes. While traditional multi-point statistics provide correlative information, they lack nonlocal structural information, and hence, fail to provide mechanistic causality information between organized fluid motion and mixing andmore » reactive processes. Hence, it is of great interest to capture and track flow features and their statistics together with their correlation with relevant scalar quantities, e.g. temperature or species concentrations. In our approach we encode the set of all possible flow features by pre-computing merge trees augmented with attributes, such as statistical moments of various scalar fields, e.g. temperature, as well as length-scales computed via spectral analysis. The computation is performed in an efficient streaming manner in a pre-processing step and results in a collection of meta-data that is orders of magnitude smaller than the original simulation data. This meta-data is sufficient to support a fully flexible and interactive analysis of the features, allowing for arbitrary thresholds, providing per-feature statistics, and creating various global diagnostics such as Cumulative Density Functions (CDFs), histograms, or time-series. We combine the analysis with a rendering of the features in a linked-view browser that enables scientists to interactively explore, visualize, and analyze the equivalent of one terabyte of simulation data. We highlight the utility of this new framework for combustion science; however, it is applicable to many other science domains.« less
Next generation of weather generators on web service framework

NASA Astrophysics Data System (ADS)

Chinnachodteeranun, R.; Hung, N. D.; Honda, K.; Ines, A. V. M.

2016-12-01

Weather generator is a statistical model that synthesizes possible realization of long-term historical weather in future. It generates several tens to hundreds of realizations stochastically based on statistical analysis. Realization is essential information as a crop modeling's input for simulating crop growth and yield. Moreover, they can be contributed to analyzing uncertainty of weather to crop development stage and to decision support system on e.g. water management and fertilizer management. Performing crop modeling requires multidisciplinary skills which limit the usage of weather generator only in a research group who developed it as well as a barrier for newcomers. To improve the procedures of performing weather generators as well as the methodology to acquire the realization in a standard way, we implemented a framework for providing weather generators as web services, which support service interoperability. Legacy weather generator programs were wrapped in the web service framework. The service interfaces were implemented based on an international standard that was Sensor Observation Service (SOS) defined by Open Geospatial Consortium (OGC). Clients can request realizations generated by the model through SOS Web service. Hierarchical data preparation processes required for weather generator are also implemented as web services and seamlessly wired. Analysts and applications can invoke services over a network easily. The services facilitate the development of agricultural applications and also reduce the workload of analysts on iterative data preparation and handle legacy weather generator program. This architectural design and implementation can be a prototype for constructing further services on top of interoperable sensor network system. This framework opens an opportunity for other sectors such as application developers and scientists in other fields to utilize weather generators.
Teaching ear reconstruction using an alloplastic carving model.

PubMed

Murabit, Amera; Anzarut, Alexander; Kasrai, Laila; Fisher, David; Wilkes, Gordon

2010-11-01

Ear reconstruction is challenging surgery, often with poor outcomes. Our purpose was to develop a surgical training model for auricular reconstruction. Silicone costal cartilage models were incorporated in a workshop-based instructional program. Trainees were randomly divided. Workshop group (WG) participated in an interactive session, carving frameworks under supervision. Nonworkshop group (NWG) did not participate. Standard Nagata templates were used. Two further frameworks were created, first with supervision then without. Groups were combined after the first carving because of frustration in the NWG. Assessment was completed by 3 microtia surgeons from 2 different centers, blinded to framework origin. Frameworks were rated out of 10 using Likert and visual analog scales. Results were examined using SPSS (version 14), with t test, ANOVA, and Bonferroni post hoc analyses. Cartilaginous frameworks from the WG scored better for the first carving (WG 5.5 vs NWG 4.4), the NWG improved for the second carving (WG 6.6 vs NWG 6.5), and both groups scored lower with the third unsupervised carving (WG 5.9 vs NWG 5.6). Combined scores after 3 frameworks were not statistically significantly different between original groups. A statistically significant improvement was demonstrated for all carvers between sessions 1 and 2 (P ≤ 0.09), between sessions 1 and 3 (P ≤ 0.05), but not between sessions 2 and 3, thus suggesting the necessity of in vitro practice until high scores are achieved and maintained without supervision before embarking on in vivo carvings. Quality of carvings was not related to level of training. An appropriate and applicable surgical training model and training method can aid in attaining skills necessary for successful auricular reconstruction.
BagMOOV: A novel ensemble for heart disease prediction bootstrap aggregation with multi-objective optimized voting.

PubMed

Bashir, Saba; Qamar, Usman; Khan, Farhan Hassan

2015-06-01

Conventional clinical decision support systems are based on individual classifiers or simple combination of these classifiers which tend to show moderate performance. This research paper presents a novel classifier ensemble framework based on enhanced bagging approach with multi-objective weighted voting scheme for prediction and analysis of heart disease. The proposed model overcomes the limitations of conventional performance by utilizing an ensemble of five heterogeneous classifiers: Naïve Bayes, linear regression, quadratic discriminant analysis, instance based learner and support vector machines. Five different datasets are used for experimentation, evaluation and validation. The datasets are obtained from publicly available data repositories. Effectiveness of the proposed ensemble is investigated by comparison of results with several classifiers. Prediction results of the proposed ensemble model are assessed by ten fold cross validation and ANOVA statistics. The experimental evaluation shows that the proposed framework deals with all type of attributes and achieved high diagnosis accuracy of 84.16 %, 93.29 % sensitivity, 96.70 % specificity, and 82.15 % f-measure. The f-ratio higher than f-critical and p value less than 0.05 for 95 % confidence interval indicate that the results are extremely statistically significant for most of the datasets.
Robustly detecting differential expression in RNA sequencing data using observation weights

PubMed Central

Zhou, Xiaobei; Lindsay, Helen; Robinson, Mark D.

2014-01-01

A popular approach for comparing gene expression levels between (replicated) conditions of RNA sequencing data relies on counting reads that map to features of interest. Within such count-based methods, many flexible and advanced statistical approaches now exist and offer the ability to adjust for covariates (e.g. batch effects). Often, these methods include some sort of ‘sharing of information’ across features to improve inferences in small samples. It is important to achieve an appropriate tradeoff between statistical power and protection against outliers. Here, we study the robustness of existing approaches for count-based differential expression analysis and propose a new strategy based on observation weights that can be used within existing frameworks. The results suggest that outliers can have a global effect on differential analyses. We demonstrate the effectiveness of our new approach with real data and simulated data that reflects properties of real datasets (e.g. dispersion-mean trend) and develop an extensible framework for comprehensive testing of current and future methods. In addition, we explore the origin of such outliers, in some cases highlighting additional biological or technical factors within the experiment. Further details can be downloaded from the project website: http://imlspenticton.uzh.ch/robinson_lab/edgeR_robust/. PMID:24753412
Multi-Dielectric Brownian Dynamics and Design-Space-Exploration Studies of Permeation in Ion Channels.

PubMed

Siksik, May; Krishnamurthy, Vikram

2017-09-01

This paper proposes a multi-dielectric Brownian dynamics simulation framework for design-space-exploration (DSE) studies of ion-channel permeation. The goal of such DSE studies is to estimate the channel modeling-parameters that minimize the mean-squared error between the simulated and expected "permeation characteristics." To address this computational challenge, we use a methodology based on statistical inference that utilizes the knowledge of channel structure to prune the design space. We demonstrate the proposed framework and DSE methodology using a case study based on the KcsA ion channel, in which the design space is successfully reduced from a 6-D space to a 2-D space. Our results show that the channel dielectric map computed using the framework matches with that computed directly using molecular dynamics with an error of 7%. Finally, the scalability and resolution of the model used are explored, and it is shown that the memory requirements needed for DSE remain constant as the number of parameters (degree of heterogeneity) increases.

Functional Path Analysis as a Multivariate Technique in Developing a Theory of Participation in Adult Education.

ERIC Educational Resources Information Center

Martin, James L.

This paper reports on attempts by the author to construct a theoretical framework of adult education participation using a theory development process and the corresponding multivariate statistical techniques. Two problems are identified: the lack of theoretical framework in studying problems, and the limiting of statistical analysis to univariate…
Teaching Introductory Business Statistics Using the DCOVA Framework

ERIC Educational Resources Information Center

Levine, David M.; Stephan, David F.

2011-01-01

Introductory business statistics students often receive little guidance on how to apply the methods they learn to further business objectives they may one day face. And those students may fail to see the continuity among the topics taught in an introductory course if they learn those methods outside a context that provides a unifying framework.…
Unified framework for information integration based on information geometry

PubMed Central

Oizumi, Masafumi; Amari, Shun-ichi

2016-01-01

Assessment of causal influences is a ubiquitous and important subject across diverse research fields. Drawn from consciousness studies, integrated information is a measure that defines integration as the degree of causal influences among elements. Whereas pairwise causal influences between elements can be quantified with existing methods, quantifying multiple influences among many elements poses two major mathematical difficulties. First, overestimation occurs due to interdependence among influences if each influence is separately quantified in a part-based manner and then simply summed over. Second, it is difficult to isolate causal influences while avoiding noncausal confounding influences. To resolve these difficulties, we propose a theoretical framework based on information geometry for the quantification of multiple causal influences with a holistic approach. We derive a measure of integrated information, which is geometrically interpreted as the divergence between the actual probability distribution of a system and an approximated probability distribution where causal influences among elements are statistically disconnected. This framework provides intuitive geometric interpretations harmonizing various information theoretic measures in a unified manner, including mutual information, transfer entropy, stochastic interaction, and integrated information, each of which is characterized by how causal influences are disconnected. In addition to the mathematical assessment of consciousness, our framework should help to analyze causal relationships in complex systems in a complete and hierarchical manner. PMID:27930289
Selection vector filter framework

NASA Astrophysics Data System (ADS)

Lukac, Rastislav; Plataniotis, Konstantinos N.; Smolka, Bogdan; Venetsanopoulos, Anastasios N.

2003-10-01

We provide a unified framework of nonlinear vector techniques outputting the lowest ranked vector. The proposed framework constitutes a generalized filter class for multichannel signal processing. A new class of nonlinear selection filters are based on the robust order-statistic theory and the minimization of the weighted distance function to other input samples. The proposed method can be designed to perform a variety of filtering operations including previously developed filtering techniques such as vector median, basic vector directional filter, directional distance filter, weighted vector median filters and weighted directional filters. A wide range of filtering operations is guaranteed by the filter structure with two independent weight vectors for angular and distance domains of the vector space. In order to adapt the filter parameters to varying signal and noise statistics, we provide also the generalized optimization algorithms taking the advantage of the weighted median filters and the relationship between standard median filter and vector median filter. Thus, we can deal with both statistical and deterministic aspects of the filter design process. It will be shown that the proposed method holds the required properties such as the capability of modelling the underlying system in the application at hand, the robustness with respect to errors in the model of underlying system, the availability of the training procedure and finally, the simplicity of filter representation, analysis, design and implementation. Simulation studies also indicate that the new filters are computationally attractive and have excellent performance in environments corrupted by bit errors and impulsive noise.
General Aviation Avionics Statistics : 1976

DOT National Transportation Integrated Search

1979-11-01

This report presents avionics statistics for the 1976 general aviation (GA) aircraft fleet and is the third in a series titled "General Aviation Avionics Statistics." The statistics are presented in a capability group framework which enables one to r...
General Aviation Avionics Statistics : 1978 Data

DOT National Transportation Integrated Search

1980-12-01

The report presents avionics statistics for the 1978 general aviation (GA) aircraft fleet and is the fifth in a series titled "General Aviation Statistics." The statistics are presented in a capability group framework which enables one to relate airb...
General Aviation Avionics Statistics : 1979 Data

DOT National Transportation Integrated Search

1981-04-01

This report presents avionics statistics for the 1979 general aviation (GA) aircraft fleet and is the sixth in a series titled General Aviation Avionics Statistics. The statistics preseneted in a capability group framework which enables one to relate...
An Observation-Driven Agent-Based Modeling and Analysis Framework for C. elegans Embryogenesis.

PubMed

Wang, Zi; Ramsey, Benjamin J; Wang, Dali; Wong, Kwai; Li, Husheng; Wang, Eric; Bao, Zhirong

2016-01-01

With cutting-edge live microscopy and image analysis, biologists can now systematically track individual cells in complex tissues and quantify cellular behavior over extended time windows. Computational approaches that utilize the systematic and quantitative data are needed to understand how cells interact in vivo to give rise to the different cell types and 3D morphology of tissues. An agent-based, minimum descriptive modeling and analysis framework is presented in this paper to study C. elegans embryogenesis. The framework is designed to incorporate the large amounts of experimental observations on cellular behavior and reserve data structures/interfaces that allow regulatory mechanisms to be added as more insights are gained. Observed cellular behaviors are organized into lineage identity, timing and direction of cell division, and path of cell movement. The framework also includes global parameters such as the eggshell and a clock. Division and movement behaviors are driven by statistical models of the observations. Data structures/interfaces are reserved for gene list, cell-cell interaction, cell fate and landscape, and other global parameters until the descriptive model is replaced by a regulatory mechanism. This approach provides a framework to handle the ongoing experiments of single-cell analysis of complex tissues where mechanistic insights lag data collection and need to be validated on complex observations.
FRAN and RBF-PSO as two components of a hyper framework to recognize protein folds.

PubMed

Abbasi, Elham; Ghatee, Mehdi; Shiri, M E

2013-09-01

In this paper, an intelligent hyper framework is proposed to recognize protein folds from its amino acid sequence which is a fundamental problem in bioinformatics. This framework includes some statistical and intelligent algorithms for proteins classification. The main components of the proposed framework are the Fuzzy Resource-Allocating Network (FRAN) and the Radial Bases Function based on Particle Swarm Optimization (RBF-PSO). FRAN applies a dynamic method to tune up the RBF network parameters. Due to the patterns complexity captured in protein dataset, FRAN classifies the proteins under fuzzy conditions. Also, RBF-PSO applies PSO to tune up the RBF classifier. Experimental results demonstrate that FRAN improves prediction accuracy up to 51% and achieves acceptable multi-class results for protein fold prediction. Although RBF-PSO provides reasonable results for protein fold recognition up to 48%, it is weaker than FRAN in some cases. However the proposed hyper framework provides an opportunity to use a great range of intelligent methods and can learn from previous experiences. Thus it can avoid the weakness of some intelligent methods in terms of memory, computational time and static structure. Furthermore, the performance of this system can be enhanced throughout the system life-cycle. Copyright © 2013 Elsevier Ltd. All rights reserved.
A flexible data-driven comorbidity feature extraction framework.

PubMed

Sideris, Costas; Pourhomayoun, Mohammad; Kalantarian, Haik; Sarrafzadeh, Majid

2016-06-01

Disease and symptom diagnostic codes are a valuable resource for classifying and predicting patient outcomes. In this paper, we propose a novel methodology for utilizing disease diagnostic information in a predictive machine learning framework. Our methodology relies on a novel, clustering-based feature extraction framework using disease diagnostic information. To reduce the data dimensionality, we identify disease clusters using co-occurrence statistics. We optimize the number of generated clusters in the training set and then utilize these clusters as features to predict patient severity of condition and patient readmission risk. We build our clustering and feature extraction algorithm using the 2012 National Inpatient Sample (NIS), Healthcare Cost and Utilization Project (HCUP) which contains 7 million hospital discharge records and ICD-9-CM codes. The proposed framework is tested on Ronald Reagan UCLA Medical Center Electronic Health Records (EHR) from 3041 Congestive Heart Failure (CHF) patients and the UCI 130-US diabetes dataset that includes admissions from 69,980 diabetic patients. We compare our cluster-based feature set with the commonly used comorbidity frameworks including Charlson's index, Elixhauser's comorbidities and their variations. The proposed approach was shown to have significant gains between 10.7-22.1% in predictive accuracy for CHF severity of condition prediction and 4.65-5.75% in diabetes readmission prediction. Copyright © 2016 Elsevier Ltd. All rights reserved.
A Content-Adaptive Analysis and Representation Framework for Audio Event Discovery from "Unscripted" Multimedia

NASA Astrophysics Data System (ADS)

Radhakrishnan, Regunathan; Divakaran, Ajay; Xiong, Ziyou; Otsuka, Isao

2006-12-01

We propose a content-adaptive analysis and representation framework to discover events using audio features from "unscripted" multimedia such as sports and surveillance for summarization. The proposed analysis framework performs an inlier/outlier-based temporal segmentation of the content. It is motivated by the observation that "interesting" events in unscripted multimedia occur sparsely in a background of usual or "uninteresting" events. We treat the sequence of low/mid-level features extracted from the audio as a time series and identify subsequences that are outliers. The outlier detection is based on eigenvector analysis of the affinity matrix constructed from statistical models estimated from the subsequences of the time series. We define the confidence measure on each of the detected outliers as the probability that it is an outlier. Then, we establish a relationship between the parameters of the proposed framework and the confidence measure. Furthermore, we use the confidence measure to rank the detected outliers in terms of their departures from the background process. Our experimental results with sequences of low- and mid-level audio features extracted from sports video show that "highlight" events can be extracted effectively as outliers from a background process using the proposed framework. We proceed to show the effectiveness of the proposed framework in bringing out suspicious events from surveillance videos without any a priori knowledge. We show that such temporal segmentation into background and outliers, along with the ranking based on the departure from the background, can be used to generate content summaries of any desired length. Finally, we also show that the proposed framework can be used to systematically select "key audio classes" that are indicative of events of interest in the chosen domain.
Stochastic modeling of sunshine number data

NASA Astrophysics Data System (ADS)

Brabec, Marek; Paulescu, Marius; Badescu, Viorel

2013-11-01

In this paper, we will present a unified statistical modeling framework for estimation and forecasting sunshine number (SSN) data. Sunshine number has been proposed earlier to describe sunshine time series in qualitative terms (Theor Appl Climatol 72 (2002) 127-136) and since then, it was shown to be useful not only for theoretical purposes but also for practical considerations, e.g. those related to the development of photovoltaic energy production. Statistical modeling and prediction of SSN as a binary time series has been challenging problem, however. Our statistical model for SSN time series is based on an underlying stochastic process formulation of Markov chain type. We will show how its transition probabilities can be efficiently estimated within logistic regression framework. In fact, our logistic Markovian model can be relatively easily fitted via maximum likelihood approach. This is optimal in many respects and it also enables us to use formalized statistical inference theory to obtain not only the point estimates of transition probabilities and their functions of interest, but also related uncertainties, as well as to test of various hypotheses of practical interest, etc. It is straightforward to deal with non-homogeneous transition probabilities in this framework. Very importantly from both physical and practical points of view, logistic Markov model class allows us to test hypotheses about how SSN dependents on various external covariates (e.g. elevation angle, solar time, etc.) and about details of the dynamic model (order and functional shape of the Markov kernel, etc.). Therefore, using generalized additive model approach (GAM), we can fit and compare models of various complexity which insist on keeping physical interpretation of the statistical model and its parts. After introducing the Markovian model and general approach for identification of its parameters, we will illustrate its use and performance on high resolution SSN data from the Solar Radiation Monitoring Station of the West University of Timisoara.
Stochastic modeling of sunshine number data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brabec, Marek, E-mail: mbrabec@cs.cas.cz; Paulescu, Marius; Badescu, Viorel

2013-11-13

In this paper, we will present a unified statistical modeling framework for estimation and forecasting sunshine number (SSN) data. Sunshine number has been proposed earlier to describe sunshine time series in qualitative terms (Theor Appl Climatol 72 (2002) 127-136) and since then, it was shown to be useful not only for theoretical purposes but also for practical considerations, e.g. those related to the development of photovoltaic energy production. Statistical modeling and prediction of SSN as a binary time series has been challenging problem, however. Our statistical model for SSN time series is based on an underlying stochastic process formulation ofmore » Markov chain type. We will show how its transition probabilities can be efficiently estimated within logistic regression framework. In fact, our logistic Markovian model can be relatively easily fitted via maximum likelihood approach. This is optimal in many respects and it also enables us to use formalized statistical inference theory to obtain not only the point estimates of transition probabilities and their functions of interest, but also related uncertainties, as well as to test of various hypotheses of practical interest, etc. It is straightforward to deal with non-homogeneous transition probabilities in this framework. Very importantly from both physical and practical points of view, logistic Markov model class allows us to test hypotheses about how SSN dependents on various external covariates (e.g. elevation angle, solar time, etc.) and about details of the dynamic model (order and functional shape of the Markov kernel, etc.). Therefore, using generalized additive model approach (GAM), we can fit and compare models of various complexity which insist on keeping physical interpretation of the statistical model and its parts. After introducing the Markovian model and general approach for identification of its parameters, we will illustrate its use and performance on high resolution SSN data from the Solar Radiation Monitoring Station of the West University of Timisoara.« less
Economic Impacts of Infrastructure Damages on Industrial Sector

NASA Astrophysics Data System (ADS)

Kajitani, Yoshio

This paper proposes a basic model for evaluating economic impacts on industrial sectors under the conditions that multiple infrastructures are simultaneously damaged during the earthquake disasters. Especially, focusing on the available economic data developed in the smallest spatial scale in Japan (small area statistics), economic loss estimation model based on the small area statistics and its applicability are investigated on. In the detail, a loss estimation framework, utilizing survey results on firms' activities under electricity, water and gas disruptions, and route choice models in Transportation Engineering, are applied to the case of 2004 Mid-Niigata Earthquake.
Retrieving clinically relevant diabetic retinopathy images using a multi-class multiple-instance framework

NASA Astrophysics Data System (ADS)

Chandakkar, Parag S.; Venkatesan, Ragav; Li, Baoxin

2013-02-01

Diabetic retinopathy (DR) is a vision-threatening complication from diabetes mellitus, a medical condition that is rising globally. Unfortunately, many patients are unaware of this complication because of absence of symptoms. Regular screening of DR is necessary to detect the condition for timely treatment. Content-based image retrieval, using archived and diagnosed fundus (retinal) camera DR images can improve screening efficiency of DR. This content-based image retrieval study focuses on two DR clinical findings, microaneurysm and neovascularization, which are clinical signs of non-proliferative and proliferative diabetic retinopathy. The authors propose a multi-class multiple-instance image retrieval framework which deploys a modified color correlogram and statistics of steerable Gaussian Filter responses, for retrieving clinically relevant images from a database of DR fundus image database.
Nonequilibrium Statistical Operator Method and Generalized Kinetic Equations

NASA Astrophysics Data System (ADS)

Kuzemsky, A. L.

2018-01-01

We consider some principal problems of nonequilibrium statistical thermodynamics in the framework of the Zubarev nonequilibrium statistical operator approach. We present a brief comparative analysis of some approaches to describing irreversible processes based on the concept of nonequilibrium Gibbs ensembles and their applicability to describing nonequilibrium processes. We discuss the derivation of generalized kinetic equations for a system in a heat bath. We obtain and analyze a damped Schrödinger-type equation for a dynamical system in a heat bath. We study the dynamical behavior of a particle in a medium taking the dissipation effects into account. We consider the scattering problem for neutrons in a nonequilibrium medium and derive a generalized Van Hove formula. We show that the nonequilibrium statistical operator method is an effective, convenient tool for describing irreversible processes in condensed matter.
Assessing the significance of pedobarographic signals using random field theory.

PubMed

Pataky, Todd C

2008-08-07

Traditional pedobarographic statistical analyses are conducted over discrete regions. Recent studies have demonstrated that regionalization can corrupt pedobarographic field data through conflation when arbitrary dividing lines inappropriately delineate smooth field processes. An alternative is to register images such that homologous structures optimally overlap and then conduct statistical tests at each pixel to generate statistical parametric maps (SPMs). The significance of SPM processes may be assessed within the framework of random field theory (RFT). RFT is ideally suited to pedobarographic image analysis because its fundamental data unit is a lattice sampling of a smooth and continuous spatial field. To correct for the vast number of multiple comparisons inherent in such data, recent pedobarographic studies have employed a Bonferroni correction to retain a constant family-wise error rate. This approach unfortunately neglects the spatial correlation of neighbouring pixels, so provides an overly conservative (albeit valid) statistical threshold. RFT generally relaxes the threshold depending on field smoothness and on the geometry of the search area, but it also provides a framework for assigning p values to suprathreshold clusters based on their spatial extent. The current paper provides an overview of basic RFT concepts and uses simulated and experimental data to validate both RFT-relevant field smoothness estimations and RFT predictions regarding the topological characteristics of random pedobarographic fields. Finally, previously published experimental data are re-analysed using RFT inference procedures to demonstrate how RFT yields easily understandable statistical results that may be incorporated into routine clinical and laboratory analyses.
Fine Mapping Causal Variants with an Approximate Bayesian Method Using Marginal Test Statistics.

PubMed

Chen, Wenan; Larrabee, Beth R; Ovsyannikova, Inna G; Kennedy, Richard B; Haralambieva, Iana H; Poland, Gregory A; Schaid, Daniel J

2015-07-01

Two recently developed fine-mapping methods, CAVIAR and PAINTOR, demonstrate better performance over other fine-mapping methods. They also have the advantage of using only the marginal test statistics and the correlation among SNPs. Both methods leverage the fact that the marginal test statistics asymptotically follow a multivariate normal distribution and are likelihood based. However, their relationship with Bayesian fine mapping, such as BIMBAM, is not clear. In this study, we first show that CAVIAR and BIMBAM are actually approximately equivalent to each other. This leads to a fine-mapping method using marginal test statistics in the Bayesian framework, which we call CAVIAR Bayes factor (CAVIARBF). Another advantage of the Bayesian framework is that it can answer both association and fine-mapping questions. We also used simulations to compare CAVIARBF with other methods under different numbers of causal variants. The results showed that both CAVIARBF and BIMBAM have better performance than PAINTOR and other methods. Compared to BIMBAM, CAVIARBF has the advantage of using only marginal test statistics and takes about one-quarter to one-fifth of the running time. We applied different methods on two independent cohorts of the same phenotype. Results showed that CAVIARBF, BIMBAM, and PAINTOR selected the same top 3 SNPs; however, CAVIARBF and BIMBAM had better consistency in selecting the top 10 ranked SNPs between the two cohorts. Software is available at https://bitbucket.org/Wenan/caviarbf. Copyright © 2015 by the Genetics Society of America.
Improving alignment in Tract-based spatial statistics: evaluation and optimization of image registration.

PubMed

de Groot, Marius; Vernooij, Meike W; Klein, Stefan; Ikram, M Arfan; Vos, Frans M; Smith, Stephen M; Niessen, Wiro J; Andersson, Jesper L R

2013-08-01

Anatomical alignment in neuroimaging studies is of such importance that considerable effort is put into improving the registration used to establish spatial correspondence. Tract-based spatial statistics (TBSS) is a popular method for comparing diffusion characteristics across subjects. TBSS establishes spatial correspondence using a combination of nonlinear registration and a "skeleton projection" that may break topological consistency of the transformed brain images. We therefore investigated feasibility of replacing the two-stage registration-projection procedure in TBSS with a single, regularized, high-dimensional registration. To optimize registration parameters and to evaluate registration performance in diffusion MRI, we designed an evaluation framework that uses native space probabilistic tractography for 23 white matter tracts, and quantifies tract similarity across subjects in standard space. We optimized parameters for two registration algorithms on two diffusion datasets of different quality. We investigated reproducibility of the evaluation framework, and of the optimized registration algorithms. Next, we compared registration performance of the regularized registration methods and TBSS. Finally, feasibility and effect of incorporating the improved registration in TBSS were evaluated in an example study. The evaluation framework was highly reproducible for both algorithms (R(2) 0.993; 0.931). The optimal registration parameters depended on the quality of the dataset in a graded and predictable manner. At optimal parameters, both algorithms outperformed the registration of TBSS, showing feasibility of adopting such approaches in TBSS. This was further confirmed in the example experiment. Copyright © 2013 Elsevier Inc. All rights reserved.
A voting-based statistical cylinder detection framework applied to fallen tree mapping in terrestrial laser scanning point clouds

NASA Astrophysics Data System (ADS)

Polewski, Przemyslaw; Yao, Wei; Heurich, Marco; Krzystek, Peter; Stilla, Uwe

2017-07-01

This paper introduces a statistical framework for detecting cylindrical shapes in dense point clouds. We target the application of mapping fallen trees in datasets obtained through terrestrial laser scanning. This is a challenging task due to the presence of ground vegetation, standing trees, DTM artifacts, as well as the fragmentation of dead trees into non-collinear segments. Our method shares the concept of voting in parameter space with the generalized Hough transform, however two of its significant drawbacks are improved upon. First, the need to generate samples on the shape's surface is eliminated. Instead, pairs of nearby input points lying on the surface cast a vote for the cylinder's parameters based on the intrinsic geometric properties of cylindrical shapes. Second, no discretization of the parameter space is required: the voting is carried out in continuous space by means of constructing a kernel density estimator and obtaining its local maxima, using automatic, data-driven kernel bandwidth selection. Furthermore, we show how the detected cylindrical primitives can be efficiently merged to obtain object-level (entire tree) semantic information using graph-cut segmentation and a tailored dynamic algorithm for eliminating cylinder redundancy. Experiments were performed on 3 plots from the Bavarian Forest National Park, with ground truth obtained through visual inspection of the point clouds. It was found that relative to sample consensus (SAC) cylinder fitting, the proposed voting framework can improve the detection completeness by up to 10 percentage points while maintaining the correctness rate.

Development of Turbulent Biological Closure Parameterizations

DTIC Science & Technology

2011-09-30

LONG-TERM GOAL: The long-term goals of this project are: (1) to develop a theoretical framework to quantify turbulence induced NPZ interactions. (2) to apply the theory to develop parameterizations to be used in realistic environmental physical biological coupling numerical models. OBJECTIVES: Connect the Goodman and Robinson (2008) statistically based pdf theory to Advection Diffusion Reaction (ADR) modeling of NPZ interaction.
The Legal Consequences of Mandating High Stakes Decisions Based on Low Quality Information: Teacher Evaluation in the Race-to-the-Top Era

ERIC Educational Resources Information Center

Baker, Bruce D.; Oluwole, Joseph O.; Green, Preston C., III

2013-01-01

In this article, we explain how overly prescriptive, rigid state statutory and regulatory policy frameworks regarding teacher evaluation, tenure and employment decisions outstrip the statistical reliability and validity of proposed measures of teaching effectiveness. We begin with a discussion of the emergence of highly prescriptive state…
Volcano plots in analyzing differential expressions with mRNA microarrays.

PubMed

Li, Wentian

2012-12-01

A volcano plot displays unstandardized signal (e.g. log-fold-change) against noise-adjusted/standardized signal (e.g. t-statistic or -log(10)(p-value) from the t-test). We review the basic and interactive use of the volcano plot and its crucial role in understanding the regularized t-statistic. The joint filtering gene selection criterion based on regularized statistics has a curved discriminant line in the volcano plot, as compared to the two perpendicular lines for the "double filtering" criterion. This review attempts to provide a unifying framework for discussions on alternative measures of differential expression, improved methods for estimating variance, and visual display of a microarray analysis result. We also discuss the possibility of applying volcano plots to other fields beyond microarray.
Linking Associations of Rare Low-Abundance Species to Their Environments by Association Networks

DOE PAGES

Karpinets, Tatiana V.; Gopalakrishnan, Vancheswaran; Wargo, Jennifer; ...

2018-03-07

Studies of microbial communities by targeted sequencing of rRNA genes lead to recovering numerous rare low-abundance taxa with unknown biological roles. We propose to study associations of such rare organisms with their environments by a computational framework based on transformation of the data into qualitative variables. Namely, we analyze the sparse table of putative species or OTUs (operational taxonomic units) and samples generated in such studies, also known as an OTU table, by collecting statistics on co-occurrences of the species and on shared species richness across samples. Based on the statistics we built two association networks, of the rare putativemore » species and of the samples respectively, using a known computational technique, Association networks (Anets) developed for analysis of qualitative data. Clusters of samples and clusters of OTUs are then integrated and combined with metadata of the study to produce a map of associated putative species in their environments. We tested and validated the framework on two types of microbiomes, of human body sites and that of the Populus tree root systems. We show that in both studies the associations of OTUs can separate samples according to environmental or physiological characteristics of the studied systems.« less
Linking Associations of Rare Low-Abundance Species to Their Environments by Association Networks

DOE Office of Scientific and Technical Information (OSTI.GOV)

Karpinets, Tatiana V.; Gopalakrishnan, Vancheswaran; Wargo, Jennifer

Studies of microbial communities by targeted sequencing of rRNA genes lead to recovering numerous rare low-abundance taxa with unknown biological roles. We propose to study associations of such rare organisms with their environments by a computational framework based on transformation of the data into qualitative variables. Namely, we analyze the sparse table of putative species or OTUs (operational taxonomic units) and samples generated in such studies, also known as an OTU table, by collecting statistics on co-occurrences of the species and on shared species richness across samples. Based on the statistics we built two association networks, of the rare putativemore » species and of the samples respectively, using a known computational technique, Association networks (Anets) developed for analysis of qualitative data. Clusters of samples and clusters of OTUs are then integrated and combined with metadata of the study to produce a map of associated putative species in their environments. We tested and validated the framework on two types of microbiomes, of human body sites and that of the Populus tree root systems. We show that in both studies the associations of OTUs can separate samples according to environmental or physiological characteristics of the studied systems.« less
Cluster size statistic and cluster mass statistic: two novel methods for identifying changes in functional connectivity between groups or conditions.

PubMed

Ing, Alex; Schwarzbauer, Christian

2014-01-01

Functional connectivity has become an increasingly important area of research in recent years. At a typical spatial resolution, approximately 300 million connections link each voxel in the brain with every other. This pattern of connectivity is known as the functional connectome. Connectivity is often compared between experimental groups and conditions. Standard methods used to control the type 1 error rate are likely to be insensitive when comparisons are carried out across the whole connectome, due to the huge number of statistical tests involved. To address this problem, two new cluster based methods--the cluster size statistic (CSS) and cluster mass statistic (CMS)--are introduced to control the family wise error rate across all connectivity values. These methods operate within a statistical framework similar to the cluster based methods used in conventional task based fMRI. Both methods are data driven, permutation based and require minimal statistical assumptions. Here, the performance of each procedure is evaluated in a receiver operator characteristic (ROC) analysis, utilising a simulated dataset. The relative sensitivity of each method is also tested on real data: BOLD (blood oxygen level dependent) fMRI scans were carried out on twelve subjects under normal conditions and during the hypercapnic state (induced through the inhalation of 6% CO2 in 21% O2 and 73%N2). Both CSS and CMS detected significant changes in connectivity between normal and hypercapnic states. A family wise error correction carried out at the individual connection level exhibited no significant changes in connectivity.
Cluster Size Statistic and Cluster Mass Statistic: Two Novel Methods for Identifying Changes in Functional Connectivity Between Groups or Conditions

PubMed Central

Ing, Alex; Schwarzbauer, Christian

2014-01-01

Functional connectivity has become an increasingly important area of research in recent years. At a typical spatial resolution, approximately 300 million connections link each voxel in the brain with every other. This pattern of connectivity is known as the functional connectome. Connectivity is often compared between experimental groups and conditions. Standard methods used to control the type 1 error rate are likely to be insensitive when comparisons are carried out across the whole connectome, due to the huge number of statistical tests involved. To address this problem, two new cluster based methods – the cluster size statistic (CSS) and cluster mass statistic (CMS) – are introduced to control the family wise error rate across all connectivity values. These methods operate within a statistical framework similar to the cluster based methods used in conventional task based fMRI. Both methods are data driven, permutation based and require minimal statistical assumptions. Here, the performance of each procedure is evaluated in a receiver operator characteristic (ROC) analysis, utilising a simulated dataset. The relative sensitivity of each method is also tested on real data: BOLD (blood oxygen level dependent) fMRI scans were carried out on twelve subjects under normal conditions and during the hypercapnic state (induced through the inhalation of 6% CO2 in 21% O2 and 73%N2). Both CSS and CMS detected significant changes in connectivity between normal and hypercapnic states. A family wise error correction carried out at the individual connection level exhibited no significant changes in connectivity. PMID:24906136
A General Framework for Power Analysis to Detect the Moderator Effects in Two- and Three-Level Cluster Randomized Trials

ERIC Educational Resources Information Center

Dong, Nianbo; Spybrook, Jessaca; Kelcey, Ben

2016-01-01

The purpose of this study is to propose a general framework for power analyses to detect the moderator effects in two- and three-level cluster randomized trials (CRTs). The study specifically aims to: (1) develop the statistical formulations for calculating statistical power, minimum detectable effect size (MDES) and its confidence interval to…
Reducing Anxiety and Increasing Self-Efficacy within an Advanced Graduate Psychology Statistics Course

ERIC Educational Resources Information Center

McGrath, April L.; Ferns, Alyssa; Greiner, Leigh; Wanamaker, Kayla; Brown, Shelley

2015-01-01

In this study we assessed the usefulness of a multifaceted teaching framework in an advanced statistics course. We sought to expand on past findings by using this framework to assess changes in anxiety and self-efficacy, and we collected focus group data to ascertain whether students attribute such changes to a multifaceted teaching approach.…
Automated speech understanding: the next generation

NASA Astrophysics Data System (ADS)

Picone, J.; Ebel, W. J.; Deshmukh, N.

1995-04-01

Modern speech understanding systems merge interdisciplinary technologies from Signal Processing, Pattern Recognition, Natural Language, and Linguistics into a unified statistical framework. These systems, which have applications in a wide range of signal processing problems, represent a revolution in Digital Signal Processing (DSP). Once a field dominated by vector-oriented processors and linear algebra-based mathematics, the current generation of DSP-based systems rely on sophisticated statistical models implemented using a complex software paradigm. Such systems are now capable of understanding continuous speech input for vocabularies of several thousand words in operational environments. The current generation of deployed systems, based on small vocabularies of isolated words, will soon be replaced by a new technology offering natural language access to vast information resources such as the Internet, and provide completely automated voice interfaces for mundane tasks such as travel planning and directory assistance.
Applications of statistics to medical science (1) Fundamental concepts.

PubMed

Watanabe, Hiroshi

2011-01-01

The conceptual framework of statistical tests and statistical inferences are discussed, and the epidemiological background of statistics is briefly reviewed. This study is one of a series in which we survey the basics of statistics and practical methods used in medical statistics. Arguments related to actual statistical analysis procedures will be made in subsequent papers.
History matching through dynamic decision-making

PubMed Central

Maschio, Célio; Santos, Antonio Alberto; Schiozer, Denis; Rocha, Anderson

2017-01-01

History matching is the process of modifying the uncertain attributes of a reservoir model to reproduce the real reservoir performance. It is a classical reservoir engineering problem and plays an important role in reservoir management since the resulting models are used to support decisions in other tasks such as economic analysis and production strategy. This work introduces a dynamic decision-making optimization framework for history matching problems in which new models are generated based on, and guided by, the dynamic analysis of the data of available solutions. The optimization framework follows a ‘learning-from-data’ approach, and includes two optimizer components that use machine learning techniques, such as unsupervised learning and statistical analysis, to uncover patterns of input attributes that lead to good output responses. These patterns are used to support the decision-making process while generating new, and better, history matched solutions. The proposed framework is applied to a benchmark model (UNISIM-I-H) based on the Namorado field in Brazil. Results show the potential the dynamic decision-making optimization framework has for improving the quality of history matching solutions using a substantial smaller number of simulations when compared with a previous work on the same benchmark. PMID:28582413
OpenNFT: An open-source Python/Matlab framework for real-time fMRI neurofeedback training based on activity, connectivity and multivariate pattern analysis.

PubMed

Koush, Yury; Ashburner, John; Prilepin, Evgeny; Sladky, Ronald; Zeidman, Peter; Bibikov, Sergei; Scharnowski, Frank; Nikonorov, Artem; De Ville, Dimitri Van

2017-08-01

Neurofeedback based on real-time functional magnetic resonance imaging (rt-fMRI) is a novel and rapidly developing research field. It allows for training of voluntary control over localized brain activity and connectivity and has demonstrated promising clinical applications. Because of the rapid technical developments of MRI techniques and the availability of high-performance computing, new methodological advances in rt-fMRI neurofeedback become possible. Here we outline the core components of a novel open-source neurofeedback framework, termed Open NeuroFeedback Training (OpenNFT), which efficiently integrates these new developments. This framework is implemented using Python and Matlab source code to allow for diverse functionality, high modularity, and rapid extendibility of the software depending on the user's needs. In addition, it provides an easy interface to the functionality of Statistical Parametric Mapping (SPM) that is also open-source and one of the most widely used fMRI data analysis software. We demonstrate the functionality of our new framework by describing case studies that include neurofeedback protocols based on brain activity levels, effective connectivity models, and pattern classification approaches. This open-source initiative provides a suitable framework to actively engage in the development of novel neurofeedback approaches, so that local methodological developments can be easily made accessible to a wider range of users. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Combining statistical inference and decisions in ecology.

PubMed

Williams, Perry J; Hooten, Mevin B

2016-09-01

Statistical decision theory (SDT) is a sub-field of decision theory that formally incorporates statistical investigation into a decision-theoretic framework to account for uncertainties in a decision problem. SDT provides a unifying analysis of three types of information: statistical results from a data set, knowledge of the consequences of potential choices (i.e., loss), and prior beliefs about a system. SDT links the theoretical development of a large body of statistical methods, including point estimation, hypothesis testing, and confidence interval estimation. The theory and application of SDT have mainly been developed and published in the fields of mathematics, statistics, operations research, and other decision sciences, but have had limited exposure in ecology. Thus, we provide an introduction to SDT for ecologists and describe its utility for linking the conventionally separate tasks of statistical investigation and decision making in a single framework. We describe the basic framework of both Bayesian and frequentist SDT, its traditional use in statistics, and discuss its application to decision problems that occur in ecology. We demonstrate SDT with two types of decisions: Bayesian point estimation and an applied management problem of selecting a prescribed fire rotation for managing a grassland bird species. Central to SDT, and decision theory in general, are loss functions. Thus, we also provide basic guidance and references for constructing loss functions for an SDT problem. © 2016 by the Ecological Society of America.
Cracking the Neural Code for Sensory Perception by Combining Statistics, Intervention, and Behavior.

PubMed

Panzeri, Stefano; Harvey, Christopher D; Piasini, Eugenio; Latham, Peter E; Fellin, Tommaso

2017-02-08

The two basic processes underlying perceptual decisions-how neural responses encode stimuli, and how they inform behavioral choices-have mainly been studied separately. Thus, although many spatiotemporal features of neural population activity, or "neural codes," have been shown to carry sensory information, it is often unknown whether the brain uses these features for perception. To address this issue, we propose a new framework centered on redefining the neural code as the neural features that carry sensory information used by the animal to drive appropriate behavior; that is, the features that have an intersection between sensory and choice information. We show how this framework leads to a new statistical analysis of neural activity recorded during behavior that can identify such neural codes, and we discuss how to combine intersection-based analysis of neural recordings with intervention on neural activity to determine definitively whether specific neural activity features are involved in a task. Copyright © 2017 Elsevier Inc. All rights reserved.
Population analysis of the cingulum bundle using the tubular surface model for schizophrenia detection

NASA Astrophysics Data System (ADS)

Mohan, Vandana; Sundaramoorthi, Ganesh; Kubicki, Marek; Terry, Douglas; Tannenbaum, Allen

2010-03-01

We propose a novel framework for population analysis of DW-MRI data using the Tubular Surface Model. We focus on the Cingulum Bundle (CB) - a major tract for the Limbic System and the main connection of the Cingulate Gyrus, which has been associated with several aspects of Schizophrenia symptomatology. The Tubular Surface Model represents a tubular surface as a center-line with an associated radius function. It provides a natural way to sample statistics along the length of the fiber bundle and reduces the registration of fiber bundle surfaces to that of 4D curves. We apply our framework to a population of 20 subjects (10 normal, 10 schizophrenic) and obtain excellent results with neural network based classification (90% sensitivity, 95% specificity) as well as unsupervised clustering (k-means). Further, we apply statistical analysis to the feature data and characterize the discrimination ability of local regions of the CB, as a step towards localizing CB regions most relevant to Schizophrenia.
Using Intervention Mapping to Develop an Oral Health e-Curriculum for Secondary Prevention of Eating Disorders.

PubMed

DeBate, Rita D; Bleck, Jennifer R; Raven, Jessica; Severson, Herb

2017-06-01

Preventing oral-systemic health issues relies on evidence-based interventions across various system-level target groups. Although the use of theory- and evidence-based approaches has been encouraged in developing oral health behavior change programs, the translation of theoretical constructs and principles to behavior change interventions has not been well described. Based on a series of six systematic steps, Intervention Mapping provides a framework for effective decision making with regard to developing, implementing, and evaluating theory- and evidence-informed, system-based behavior change programs. This article describes the application of the Intervention Mapping framework to develop the EAT (evaluating, assessing, and treating) evidence-based intervention with the goal of increasing the capacity of oral health providers to engage in secondary prevention of oral-systemic issues associated with disordered eating behaviors. Examples of data and deliverables for each step are described. In addition, results from evaluation of the intervention via randomized control trial are described, with statistically significant differences observed in behavioral outcomes in the intervention group with effect sizes ranging from r=0.62 to 0.83. These results suggest that intervention mapping, via the six systematic steps, can be useful as a framework for continued development of preventive interventions.
Confounding in statistical mediation analysis: What it is and how to address it.

PubMed

Valente, Matthew J; Pelham, William E; Smyth, Heather; MacKinnon, David P

2017-11-01

Psychology researchers are often interested in mechanisms underlying how randomized interventions affect outcomes such as substance use and mental health. Mediation analysis is a common statistical method for investigating psychological mechanisms that has benefited from exciting new methodological improvements over the last 2 decades. One of the most important new developments is methodology for estimating causal mediated effects using the potential outcomes framework for causal inference. Potential outcomes-based methods developed in epidemiology and statistics have important implications for understanding psychological mechanisms. We aim to provide a concise introduction to and illustration of these new methods and emphasize the importance of confounder adjustment. First, we review the traditional regression approach for estimating mediated effects. Second, we describe the potential outcomes framework. Third, we define what a confounder is and how the presence of a confounder can provide misleading evidence regarding mechanisms of interventions. Fourth, we describe experimental designs that can help rule out confounder bias. Fifth, we describe new statistical approaches to adjust for measured confounders of the mediator-outcome relation and sensitivity analyses to probe effects of unmeasured confounders on the mediated effect. All approaches are illustrated with application to a real counseling intervention dataset. Counseling psychologists interested in understanding the causal mechanisms of their interventions can benefit from incorporating the most up-to-date techniques into their mediation analyses. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Variational and perturbative formulations of quantum mechanical/molecular mechanical free energy with mean-field embedding and its analytical gradients.

PubMed

Yamamoto, Takeshi

2008-12-28

Conventional quantum chemical solvation theories are based on the mean-field embedding approximation. That is, the electronic wavefunction is calculated in the presence of the mean field of the environment. In this paper a direct quantum mechanical/molecular mechanical (QM/MM) analog of such a mean-field theory is formulated based on variational and perturbative frameworks. In the variational framework, an appropriate QM/MM free energy functional is defined and is minimized in terms of the trial wavefunction that best approximates the true QM wavefunction in a statistically averaged sense. Analytical free energy gradient is obtained, which takes the form of the gradient of effective QM energy calculated in the averaged MM potential. In the perturbative framework, the above variational procedure is shown to be equivalent to the first-order expansion of the QM energy (in the exact free energy expression) about the self-consistent reference field. This helps understand the relation between the variational procedure and the exact QM/MM free energy as well as existing QM/MM theories. Based on this, several ways are discussed for evaluating non-mean-field effects (i.e., statistical fluctuations of the QM wavefunction) that are neglected in the mean-field calculation. As an illustration, the method is applied to an S(N)2 Menshutkin reaction in water, NH(3)+CH(3)Cl-->NH(3)CH(3) (+)+Cl(-), for which free energy profiles are obtained at the Hartree-Fock, MP2, B3LYP, and BHHLYP levels by integrating the free energy gradient. Non-mean-field effects are evaluated to be <0.5 kcal/mol using a Gaussian fluctuation model for the environment, which suggests that those effects are rather small for the present reaction in water.
A Classification of Statistics Courses (A Framework for Studying Statistical Education)

ERIC Educational Resources Information Center

Turner, J. C.

1976-01-01

A classification of statistics courses in presented, with main categories of "course type,""methods of presentation,""objectives," and "syllabus." Examples and suggestions for uses of the classification are given. (DT)

Compressing random microstructures via stochastic Wang tilings.

PubMed

Novák, Jan; Kučerová, Anna; Zeman, Jan

2012-10-01

This Rapid Communication presents a stochastic Wang tiling-based technique to compress or reconstruct disordered microstructures on the basis of given spatial statistics. Unlike the existing approaches based on a single unit cell, it utilizes a finite set of tiles assembled by a stochastic tiling algorithm, thereby allowing to accurately reproduce long-range orientation orders in a computationally efficient manner. Although the basic features of the method are demonstrated for a two-dimensional particulate suspension, the present framework is fully extensible to generic multidimensional media.
Accurate continuous geographic assignment from low- to high-density SNP data.

PubMed

Guillot, Gilles; Jónsson, Hákon; Hinge, Antoine; Manchih, Nabil; Orlando, Ludovic

2016-04-01

Large-scale genotype datasets can help track the dispersal patterns of epidemiological outbreaks and predict the geographic origins of individuals. Such genetically-based geographic assignments also show a range of possible applications in forensics for profiling both victims and criminals, and in wildlife management, where poaching hotspot areas can be located. They, however, require fast and accurate statistical methods to handle the growing amount of genetic information made available from genotype arrays and next-generation sequencing technologies. We introduce a novel statistical method for geopositioning individuals of unknown origin from genotypes. Our method is based on a geostatistical model trained with a dataset of georeferenced genotypes. Statistical inference under this model can be implemented within the theoretical framework of Integrated Nested Laplace Approximation, which represents one of the major recent breakthroughs in statistics, as it does not require Monte Carlo simulations. We compare the performance of our method and an alternative method for geospatial inference, SPA in a simulation framework. We highlight the accuracy and limits of continuous spatial assignment methods at various scales by analyzing genotype datasets from a diversity of species, including Florida Scrub-jay birds Aphelocoma coerulescens, Arabidopsis thaliana and humans, representing 41-197,146 SNPs. Our method appears to be best suited for the analysis of medium-sized datasets (a few tens of thousands of loci), such as reduced-representation sequencing data that become increasingly available in ecology. http://www2.imm.dtu.dk/∼gigu/Spasiba/ gilles.b.guillot@gmail.com Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
[The GIPSY-RECPAM model: a versatile approach for integrated evaluation in cardiologic care].

PubMed

Carinci, F

2009-01-01

Tree-structured methodology applied for the GISSI-PSICOLOGIA project, although performed in the framework of earliest GISSI studies, represents a powerful tool to analyze different aspects of cardiologic care. The GISSI-PSICOLOGIA project has delivered a novel methodology based on the joint application of psychometric tools and sophisticated statistical techniques. Its prospective use could allow building effective epidemiological models relevant to the prognosis of the cardiologic patient. The various features of the RECPAM method allow a versatile use in the framework of modern e-health projects. The study used the Cognitive Behavioral Assessment H Form (CBA-H) psychometrics scales. The potential for its future application in the framework of Italian cardiology is relevant and particularly indicated to assist planning of systems for integrated care and routine evaluation of the cardiologic patient.
Nonlinear and non-Gaussian Bayesian based handwriting beautification

NASA Astrophysics Data System (ADS)

Shi, Cao; Xiao, Jianguo; Xu, Canhui; Jia, Wenhua

2013-03-01

A framework is proposed in this paper to effectively and efficiently beautify handwriting by means of a novel nonlinear and non-Gaussian Bayesian algorithm. In the proposed framework, format and size of handwriting image are firstly normalized, and then typeface in computer system is applied to optimize vision effect of handwriting. The Bayesian statistics is exploited to characterize the handwriting beautification process as a Bayesian dynamic model. The model parameters to translate, rotate and scale typeface in computer system are controlled by state equation, and the matching optimization between handwriting and transformed typeface is employed by measurement equation. Finally, the new typeface, which is transformed from the original one and gains the best nonlinear and non-Gaussian optimization, is the beautification result of handwriting. Experimental results demonstrate the proposed framework provides a creative handwriting beautification methodology to improve visual acceptance.
Improving UWB-Based Localization in IoT Scenarios with Statistical Models of Distance Error.

PubMed

Monica, Stefania; Ferrari, Gianluigi

2018-05-17

Interest in the Internet of Things (IoT) is rapidly increasing, as the number of connected devices is exponentially growing. One of the application scenarios envisaged for IoT technologies involves indoor localization and context awareness. In this paper, we focus on a localization approach that relies on a particular type of communication technology, namely Ultra Wide Band (UWB). UWB technology is an attractive choice for indoor localization, owing to its high accuracy. Since localization algorithms typically rely on estimated inter-node distances, the goal of this paper is to evaluate the improvement brought by a simple (linear) statistical model of the distance error. On the basis of an extensive experimental measurement campaign, we propose a general analytical framework, based on a Least Square (LS) method, to derive a novel statistical model for the range estimation error between a pair of UWB nodes. The proposed statistical model is then applied to improve the performance of a few illustrative localization algorithms in various realistic scenarios. The obtained experimental results show that the use of the proposed statistical model improves the accuracy of the considered localization algorithms with a reduction of the localization error up to 66%.
The Practicality of Statistical Physics Handout Based on KKNI and the Constructivist Approach

NASA Astrophysics Data System (ADS)

Sari, S. Y.; Afrizon, R.

2018-04-01

Statistical physics lecture shows that: 1) the performance of lecturers, social climate, students’ competence and soft skills needed at work are in enough category, 2) students feel difficulties in following the lectures of statistical physics because it is abstract, 3) 40.72% of students needs more understanding in the form of repetition, practice questions and structured tasks, and 4) the depth of statistical physics material needs to be improved gradually and structured. This indicates that learning materials in accordance of The Indonesian National Qualification Framework or Kerangka Kualifikasi Nasional Indonesia (KKNI) with the appropriate learning approach are needed to help lecturers and students in lectures. The author has designed statistical physics handouts which have very valid criteria (90.89%) according to expert judgment. In addition, the practical level of handouts designed also needs to be considered in order to be easy to use, interesting and efficient in lectures. The purpose of this research is to know the practical level of statistical physics handout based on KKNI and a constructivist approach. This research is a part of research and development with 4-D model developed by Thiagarajan. This research activity has reached part of development test at Development stage. Data collection took place by using a questionnaire distributed to lecturers and students. Data analysis using descriptive data analysis techniques in the form of percentage. The analysis of the questionnaire shows that the handout of statistical physics has very practical criteria. The conclusion of this study is statistical physics handouts based on the KKNI and constructivist approach have been practically used in lectures.
Comparative analysis of the fit of 3-unit implant-supported frameworks cast in nickel-chromium and cobalt-chromium alloys and commercially pure titanium after casting, laser welding, and simulated porcelain firings.

PubMed

Tiossi, Rodrigo; Rodrigues, Renata Cristina Silveira; de Mattos, Maria da Glória Chiarello; Ribeiro, Ricardo Faria

2008-01-01

This study compared the vertical misfit of 3-unit implant-supported nickel-chromium (Ni-Cr) and cobalt-chromium (Co-Cr) alloy and commercially pure titanium (cpTi) frameworks after casting as 1 piece, after sectioning and laser welding, and after simulated porcelain firings. The results on the tightened side showed no statistically significant differences. On the opposite side, statistically significant differences were found for Co-Cr alloy (118.64 microm [SD: 91.48] to 39.90 microm [SD: 27.13]) and cpTi (118.56 microm [51.35] to 27.87 microm [12.71]) when comparing 1-piece to laser-welded frameworks. With both sides tightened, only Co-Cr alloy showed statistically significant differences after laser welding. Ni-Cr alloy showed the lowest misfit values, though the differences were not statistically significantly different. Simulated porcelain firings revealed no significant differences.
Using expert knowledge to incorporate uncertainty in cause-of-death assignments for modeling of cause-specific mortality

USGS Publications Warehouse

Walsh, Daniel P.; Norton, Andrew S.; Storm, Daniel J.; Van Deelen, Timothy R.; Heisy, Dennis M.

2018-01-01

Implicit and explicit use of expert knowledge to inform ecological analyses is becoming increasingly common because it often represents the sole source of information in many circumstances. Thus, there is a need to develop statistical methods that explicitly incorporate expert knowledge, and can successfully leverage this information while properly accounting for associated uncertainty during analysis. Studies of cause-specific mortality provide an example of implicit use of expert knowledge when causes-of-death are uncertain and assigned based on the observer's knowledge of the most likely cause. To explicitly incorporate this use of expert knowledge and the associated uncertainty, we developed a statistical model for estimating cause-specific mortality using a data augmentation approach within a Bayesian hierarchical framework. Specifically, for each mortality event, we elicited the observer's belief of cause-of-death by having them specify the probability that the death was due to each potential cause. These probabilities were then used as prior predictive values within our framework. This hierarchical framework permitted a simple and rigorous estimation method that was easily modified to include covariate effects and regularizing terms. Although applied to survival analysis, this method can be extended to any event-time analysis with multiple event types, for which there is uncertainty regarding the true outcome. We conducted simulations to determine how our framework compared to traditional approaches that use expert knowledge implicitly and assume that cause-of-death is specified accurately. Simulation results supported the inclusion of observer uncertainty in cause-of-death assignment in modeling of cause-specific mortality to improve model performance and inference. Finally, we applied the statistical model we developed and a traditional method to cause-specific survival data for white-tailed deer, and compared results. We demonstrate that model selection results changed between the two approaches, and incorporating observer knowledge in cause-of-death increased the variability associated with parameter estimates when compared to the traditional approach. These differences between the two approaches can impact reported results, and therefore, it is critical to explicitly incorporate expert knowledge in statistical methods to ensure rigorous inference.
Lagrangian Particle Tracking Simulation for Warm-Rain Processes in Quasi-One-Dimensional Domain

NASA Astrophysics Data System (ADS)

Kunishima, Y.; Onishi, R.

2017-12-01

Conventional cloud simulations are based on the Euler method and compute each microphysics process in a stochastic way assuming infinite numbers of particles within each numerical grid. They therefore cannot provide the Lagrangian statistics of individual particles in cloud microphysics (i.e., aerosol particles, cloud particles, and rain drops) nor discuss the statistical fluctuations due to finite number of particles. We here simulate the entire precipitation process of warm-rain, with tracking individual particles. We use the Lagrangian Cloud Simulator (LCS), which is based on the Euler-Lagrangian framework. In that framework, flow motion and scalar transportation are computed with the Euler method, and particle motion with the Lagrangian one. The LCS tracks particle motions and collision events individually with considering the hydrodynamic interaction between approaching particles with a superposition method, that is, it can directly represent the collisional growth of cloud particles. It is essential for trustworthy collision detection to take account of the hydrodynamic interaction. In this study, we newly developed a stochastic model based on the Twomey cloud condensation nuclei (CCN) activation for the Lagrangian tracking simulation and integrated it into the LCS. Coupling with the Euler computation for water vapour and temperature fields, the initiation and condensational growth of water droplets were computed in the Lagrangian way. We applied the integrated LCS for a kinematic simulation of warm-rain processes in a vertically-elongated domain of, at largest, 0.03×0.03×3000 (m3) with horizontal periodicity. Aerosol particles with a realistic number density, 5×107 (m3), were evenly distributed over the domain at the initial state. Prescribed updraft at the early stage initiated development of a precipitating cloud. We have confirmed that the obtained bulk statistics fairly agree with those from a conventional spectral-bin scheme for a vertical column domain. The centre of the discussion will be the Lagrangian statistics which is collected from the individual behaviour of the tracked particles.
Design of a testing strategy using non-animal based test methods: lessons learnt from the ACuteTox project.

PubMed

Kopp-Schneider, Annette; Prieto, Pilar; Kinsner-Ovaskainen, Agnieszka; Stanzel, Sven

2013-06-01

In the framework of toxicology, a testing strategy can be viewed as a series of steps which are taken to come to a final prediction about a characteristic of a compound under study. The testing strategy is performed as a single-step procedure, usually called a test battery, using simultaneously all information collected on different endpoints, or as tiered approach in which a decision tree is followed. Design of a testing strategy involves statistical considerations, such as the development of a statistical prediction model. During the EU FP6 ACuteTox project, several prediction models were proposed on the basis of statistical classification algorithms which we illustrate here. The final choice of testing strategies was not based on statistical considerations alone. However, without thorough statistical evaluations a testing strategy cannot be identified. We present here a number of observations made from the statistical viewpoint which relate to the development of testing strategies. The points we make were derived from problems we had to deal with during the evaluation of this large research project. A central issue during the development of a prediction model is the danger of overfitting. Procedures are presented to deal with this challenge. Copyright © 2012 Elsevier Ltd. All rights reserved.
Using Bayesian regression to test hypotheses about relationships between parameters and covariates in cognitive models.

PubMed

Boehm, Udo; Steingroever, Helen; Wagenmakers, Eric-Jan

2018-06-01

An important tool in the advancement of cognitive science are quantitative models that represent different cognitive variables in terms of model parameters. To evaluate such models, their parameters are typically tested for relationships with behavioral and physiological variables that are thought to reflect specific cognitive processes. However, many models do not come equipped with the statistical framework needed to relate model parameters to covariates. Instead, researchers often revert to classifying participants into groups depending on their values on the covariates, and subsequently comparing the estimated model parameters between these groups. Here we develop a comprehensive solution to the covariate problem in the form of a Bayesian regression framework. Our framework can be easily added to existing cognitive models and allows researchers to quantify the evidential support for relationships between covariates and model parameters using Bayes factors. Moreover, we present a simulation study that demonstrates the superiority of the Bayesian regression framework to the conventional classification-based approach.
Bayesian networks for evaluation of evidence from forensic entomology.

PubMed

Andersson, M Gunnar; Sundström, Anders; Lindström, Anders

2013-09-01

In the aftermath of a CBRN incident, there is an urgent need to reconstruct events in order to bring the perpetrators to court and to take preventive actions for the future. The challenge is to discriminate, based on available information, between alternative scenarios. Forensic interpretation is used to evaluate to what extent results from the forensic investigation favor the prosecutors' or the defendants' arguments, using the framework of Bayesian hypothesis testing. Recently, several new scientific disciplines have been used in a forensic context. In the AniBioThreat project, the framework was applied to veterinary forensic pathology, tracing of pathogenic microorganisms, and forensic entomology. Forensic entomology is an important tool for estimating the postmortem interval in, for example, homicide investigations as a complement to more traditional methods. In this article we demonstrate the applicability of the Bayesian framework for evaluating entomological evidence in a forensic investigation through the analysis of a hypothetical scenario involving suspect movement of carcasses from a clandestine laboratory. Probabilities of different findings under the alternative hypotheses were estimated using a combination of statistical analysis of data, expert knowledge, and simulation, and entomological findings are used to update the beliefs about the prosecutors' and defendants' hypotheses and to calculate the value of evidence. The Bayesian framework proved useful for evaluating complex hypotheses using findings from several insect species, accounting for uncertainty about development rate, temperature, and precolonization. The applicability of the forensic statistic approach to evaluating forensic results from a CBRN incident is discussed.
Tipping point analysis of atmospheric oxygen concentration

DOE Office of Scientific and Technical Information (OSTI.GOV)

Livina, V. N.; Forbes, A. B.; Vaz Martins, T. M.

2015-03-15

We apply tipping point analysis to nine observational oxygen concentration records around the globe, analyse their dynamics and perform projections under possible future scenarios, leading to oxygen deficiency in the atmosphere. The analysis is based on statistical physics framework with stochastic modelling, where we represent the observed data as a composition of deterministic and stochastic components estimated from the observed data using Bayesian and wavelet techniques.
Logic Assumptions and Risks Framework Applied to Defence Campaign Planning and Evaluation

DTIC Science & Technology

2013-05-01

based on prescriptive targets of reduction in particular crime statistics in a certain timeframe. Similarly, if overall desired effects are not well...the Evaluation Journal of Australasia, Australasian Evaluation Society. UNCLASSIFIED 21 UNCLASSIFIED DSTO-TR-2840 These six campaign functions...Callahan’s article in “Anecdotally” Newsletter January 2013, Anecdote Pty Ltd., a commercial consultancy specialising in narrative technique for business
A statistical parts-based appearance model of inter-subject variability.

PubMed

Toews, Matthew; Collins, D Louis; Arbel, Tal

2006-01-01

In this article, we present a general statistical parts-based model for representing the appearance of an image set, applied to the problem of inter-subject MR brain image matching. In contrast with global image representations such as active appearance models, the parts-based model consists of a collection of localized image parts whose appearance, geometry and occurrence frequency are quantified statistically. The parts-based approach explicitly addresses the case where one-to-one correspondence does not exist between subjects due to anatomical differences, as parts are not expected to occur in all subjects. The model can be learned automatically, discovering structures that appear with statistical regularity in a large set of subject images, and can be robustly fit to new images, all in the presence of significant inter-subject variability. As parts are derived from generic scale-invariant features, the framework can be applied in a wide variety of image contexts, in order to study the commonality of anatomical parts or to group subjects according to the parts they share. Experimentation shows that a parts-based model can be learned from a large set of MR brain images, and used to determine parts that are common within the group of subjects. Preliminary results indicate that the model can be used to automatically identify distinctive features for inter-subject image registration despite large changes in appearance.
Particle acceleration in solar active regions being in the state of self-organized criticality.

NASA Astrophysics Data System (ADS)

Vlahos, Loukas

We review the recent observational results on flare initiation and particle acceleration in solar active regions. Elaborating a statistical approach to describe the spatiotemporally intermittent electric field structures formed inside a flaring solar active region, we investigate the efficiency of such structures in accelerating charged particles (electrons and protons). The large-scale magnetic configuration in the solar atmosphere responds to the strong turbulent flows that convey perturbations across the active region by initiating avalanche-type processes. The resulting unstable structures correspond to small-scale dissipation regions hosting strong electric fields. Previous research on particle acceleration in strongly turbulent plasmas provides a general framework for addressing such a problem. This framework combines various electromagnetic field configurations obtained by magnetohydrodynamical (MHD) or cellular automata (CA) simulations, or by employing a statistical description of the field’s strength and configuration with test particle simulations. We work on data-driven 3D magnetic field extrapolations, based on a self-organized criticality models (SOC). A relativistic test-particle simulation traces each particle’s guiding center within these configurations. Using the simulated particle-energy distributions we test our results against observations, in the framework of the collisional thick target model (CTTM) of solar hard X-ray (HXR) emission and compare our results with the current observations.
Condensate statistics and thermodynamics of weakly interacting Bose gas: Recursion relation approach

NASA Astrophysics Data System (ADS)

Dorfman, K. E.; Kim, M.; Svidzinsky, A. A.

2011-03-01

We study condensate statistics and thermodynamics of weakly interacting Bose gas with a fixed total number N of particles in a cubic box. We find the exact recursion relation for the canonical ensemble partition function. Using this relation, we calculate the distribution function of condensate particles for N=200. We also calculate the distribution function based on multinomial expansion of the characteristic function. Similar to the ideal gas, both approaches give exact statistical moments for all temperatures in the framework of Bogoliubov model. We compare them with the results of unconstraint canonical ensemble quasiparticle formalism and the hybrid master equation approach. The present recursion relation can be used for any external potential and boundary conditions. We investigate the temperature dependence of the first few statistical moments of condensate fluctuations as well as thermodynamic potentials and heat capacity analytically and numerically in the whole temperature range.
Edwards statistical mechanics for jammed granular matter

NASA Astrophysics Data System (ADS)

Baule, Adrian; Morone, Flaviano; Herrmann, Hans J.; Makse, Hernán A.

2018-01-01

In 1989, Sir Sam Edwards made the visionary proposition to treat jammed granular materials using a volume ensemble of equiprobable jammed states in analogy to thermal equilibrium statistical mechanics, despite their inherent athermal features. Since then, the statistical mechanics approach for jammed matter—one of the very few generalizations of Gibbs-Boltzmann statistical mechanics to out-of-equilibrium matter—has garnered an extraordinary amount of attention by both theorists and experimentalists. Its importance stems from the fact that jammed states of matter are ubiquitous in nature appearing in a broad range of granular and soft materials such as colloids, emulsions, glasses, and biomatter. Indeed, despite being one of the simplest states of matter—primarily governed by the steric interactions between the constitutive particles—a theoretical understanding based on first principles has proved exceedingly challenging. Here a systematic approach to jammed matter based on the Edwards statistical mechanical ensemble is reviewed. The construction of microcanonical and canonical ensembles based on the volume function, which replaces the Hamiltonian in jammed systems, is discussed. The importance of approximation schemes at various levels is emphasized leading to quantitative predictions for ensemble averaged quantities such as packing fractions and contact force distributions. An overview of the phenomenology of jammed states and experiments, simulations, and theoretical models scrutinizing the strong assumptions underlying Edwards approach is given including recent results suggesting the validity of Edwards ergodic hypothesis for jammed states. A theoretical framework for packings whose constitutive particles range from spherical to nonspherical shapes such as dimers, polymers, ellipsoids, spherocylinders or tetrahedra, hard and soft, frictional, frictionless and adhesive, monodisperse, and polydisperse particles in any dimensions is discussed providing insight into a unifying phase diagram for all jammed matter. Furthermore, the connection between the Edwards ensemble of metastable jammed states and metastability in spin glasses is established. This highlights the fact that the packing problem can be understood as a constraint satisfaction problem for excluded volume and force and torque balance leading to a unifying framework between the Edwards ensemble of equiprobable jammed states and out-of-equilibrium spin glasses.
Understanding Statistical Concepts and Terms in Context: The GovStat Ontology and the Statistical Interactive Glossary.

ERIC Educational Resources Information Center

Haas, Stephanie W.; Pattuelli, Maria Cristina; Brown, Ron T.

2003-01-01

Describes the Statistical Interactive Glossary (SIG), an enhanced glossary of statistical terms supported by the GovStat ontology of statistical concepts. Presents a conceptual framework whose components articulate different aspects of a term's basic explanation that can be manipulated to produce a variety of presentations. The overarching…
Active contours on statistical manifolds and texture segmentation

Treesearch

Sang-Mook Lee; A. Lynn Abbott; Neil A. Clark; Philip A. Araman

2005-01-01

A new approach to active contours on statistical manifolds is presented. The statistical manifolds are 2- dimensional Riemannian manifolds that are statistically defined by maps that transform a parameter domain onto a set of probability density functions. In this novel framework, color or texture features are measured at each image point and their statistical...

Active contours on statistical manifolds and texture segmentaiton

Treesearch

Sang-Mook Lee; A. Lynn Abbott; Neil A. Clark; Philip A. Araman

2005-01-01

A new approach to active contours on statistical manifolds is presented. The statistical manifolds are 2- dimensional Riemannian manifolds that are statistically defined by maps that transform a parameter domain onto-a set of probability density functions. In this novel framework, color or texture features are measured at each Image point and their statistical...
Body Composition Assessment in Axial CT Images Using FEM-Based Automatic Segmentation of Skeletal Muscle.

PubMed

Popuri, Karteek; Cobzas, Dana; Esfandiari, Nina; Baracos, Vickie; Jägersand, Martin

2016-02-01

The proportions of muscle and fat tissues in the human body, referred to as body composition is a vital measurement for cancer patients. Body composition has been recently linked to patient survival and the onset/recurrence of several types of cancers in numerous cancer research studies. This paper introduces a fully automatic framework for the segmentation of muscle and fat tissues from CT images to estimate body composition. We developed a novel finite element method (FEM) deformable model that incorporates a priori shape information via a statistical deformation model (SDM) within the template-based segmentation framework. The proposed method was validated on 1000 abdominal and 530 thoracic CT images and we obtained very good segmentation results with Jaccard scores in excess of 90% for both the muscle and fat regions.
Incorporating geologic information into hydraulic tomography: A general framework based on geostatistical approach

NASA Astrophysics Data System (ADS)

Zha, Yuanyuan; Yeh, Tian-Chyi J.; Illman, Walter A.; Onoe, Hironori; Mok, Chin Man W.; Wen, Jet-Chau; Huang, Shao-Yang; Wang, Wenke

2017-04-01

Hydraulic tomography (HT) has become a mature aquifer test technology over the last two decades. It collects nonredundant information of aquifer heterogeneity by sequentially stressing the aquifer at different wells and collecting aquifer responses at other wells during each stress. The collected information is then interpreted by inverse models. Among these models, the geostatistical approaches, built upon the Bayesian framework, first conceptualize hydraulic properties to be estimated as random fields, which are characterized by means and covariance functions. They then use the spatial statistics as prior information with the aquifer response data to estimate the spatial distribution of the hydraulic properties at a site. Since the spatial statistics describe the generic spatial structures of the geologic media at the site rather than site-specific ones (e.g., known spatial distributions of facies, faults, or paleochannels), the estimates are often not optimal. To improve the estimates, we introduce a general statistical framework, which allows the inclusion of site-specific spatial patterns of geologic features. Subsequently, we test this approach with synthetic numerical experiments. Results show that this approach, using conditional mean and covariance that reflect site-specific large-scale geologic features, indeed improves the HT estimates. Afterward, this approach is applied to HT surveys at a kilometer-scale-fractured granite field site with a distinct fault zone. We find that by including fault information from outcrops and boreholes for HT analysis, the estimated hydraulic properties are improved. The improved estimates subsequently lead to better prediction of flow during a different pumping test at the site.
Structured statistical models of inductive reasoning.

PubMed

Kemp, Charles; Tenenbaum, Joshua B

2009-01-01

Everyday inductive inferences are often guided by rich background knowledge. Formal models of induction should aim to incorporate this knowledge and should explain how different kinds of knowledge lead to the distinctive patterns of reasoning found in different inductive contexts. This article presents a Bayesian framework that attempts to meet both goals and describes [corrected] 4 applications of the framework: a taxonomic model, a spatial model, a threshold model, and a causal model. Each model makes probabilistic inferences about the extensions of novel properties, but the priors for the 4 models are defined over different kinds of structures that capture different relationships between the categories in a domain. The framework therefore shows how statistical inference can operate over structured background knowledge, and the authors argue that this interaction between structure and statistics is critical for explaining the power and flexibility of human reasoning.
BAYESIAN ESTIMATION OF THERMONUCLEAR REACTION RATES

DOE Office of Scientific and Technical Information (OSTI.GOV)

Iliadis, C.; Anderson, K. S.; Coc, A.

The problem of estimating non-resonant astrophysical S -factors and thermonuclear reaction rates, based on measured nuclear cross sections, is of major interest for nuclear energy generation, neutrino physics, and element synthesis. Many different methods have been applied to this problem in the past, almost all of them based on traditional statistics. Bayesian methods, on the other hand, are now in widespread use in the physical sciences. In astronomy, for example, Bayesian statistics is applied to the observation of extrasolar planets, gravitational waves, and Type Ia supernovae. However, nuclear physics, in particular, has been slow to adopt Bayesian methods. We presentmore » astrophysical S -factors and reaction rates based on Bayesian statistics. We develop a framework that incorporates robust parameter estimation, systematic effects, and non-Gaussian uncertainties in a consistent manner. The method is applied to the reactions d(p, γ ){sup 3}He, {sup 3}He({sup 3}He,2p){sup 4}He, and {sup 3}He( α , γ ){sup 7}Be, important for deuterium burning, solar neutrinos, and Big Bang nucleosynthesis.« less
Multiresolution multiscale active mask segmentation of fluorescence microscope images

NASA Astrophysics Data System (ADS)

Srinivasa, Gowri; Fickus, Matthew; Kovačević, Jelena

2009-08-01

We propose an active mask segmentation framework that combines the advantages of statistical modeling, smoothing, speed and flexibility offered by the traditional methods of region-growing, multiscale, multiresolution and active contours respectively. At the crux of this framework is a paradigm shift from evolving contours in the continuous domain to evolving multiple masks in the discrete domain. Thus, the active mask framework is particularly suited to segment digital images. We demonstrate the use of the framework in practice through the segmentation of punctate patterns in fluorescence microscope images. Experiments reveal that statistical modeling helps the multiple masks converge from a random initial configuration to a meaningful one. This obviates the need for an involved initialization procedure germane to most of the traditional methods used to segment fluorescence microscope images. While we provide the mathematical details of the functions used to segment fluorescence microscope images, this is only an instantiation of the active mask framework. We suggest some other instantiations of the framework to segment different types of images.
On the use of attractor dimension as a feature in structural health monitoring

USGS Publications Warehouse

Nichols, J.M.; Virgin, L.N.; Todd, M.D.; Nichols, J.D.

2003-01-01

Recent works in the vibration-based structural health monitoring community have emphasised the use of correlation dimension as a discriminating statistic in seperating a damaged from undamaged response. This paper explores the utility of attractor dimension as a 'feature' and offers some comparisons between different metrics reflecting dimension. This focus is on evaluating the performance of two different measures of dimension as damage indicators in a structural health monitoring context. Results indicate that the correlation dimension is probably a poor choice of statistic for the purpose of signal discrimination. Other measures of dimension may be used for the same purposes with a higher degree of statistical reliability. The question of competing methodologies is placed in a hypothesis testing framework and answered with experimental data taken from a cantilivered beam.
Langley Wind Tunnel Data Quality Assurance-Check Standard Results

NASA Technical Reports Server (NTRS)

Hemsch, Michael J.; Grubb, John P.; Krieger, William B.; Cler, Daniel L.

2000-01-01

A framework for statistical evaluation, control and improvement of wind funnel measurement processes is presented The methodology is adapted from elements of the Measurement Assurance Plans developed by the National Bureau of Standards (now the National Institute of Standards and Technology) for standards and calibration laboratories. The present methodology is based on the notions of statistical quality control (SQC) together with check standard testing and a small number of customer repeat-run sets. The results of check standard and customer repeat-run -sets are analyzed using the statistical control chart-methods of Walter A. Shewhart long familiar to the SQC community. Control chart results are presented for. various measurement processes in five facilities at Langley Research Center. The processes include test section calibration, force and moment measurements with a balance, and instrument calibration.
Maximum entropy models as a tool for building precise neural controls.

PubMed

Savin, Cristina; Tkačik, Gašper

2017-10-01

Neural responses are highly structured, with population activity restricted to a small subset of the astronomical range of possible activity patterns. Characterizing these statistical regularities is important for understanding circuit computation, but challenging in practice. Here we review recent approaches based on the maximum entropy principle used for quantifying collective behavior in neural activity. We highlight recent models that capture population-level statistics of neural data, yielding insights into the organization of the neural code and its biological substrate. Furthermore, the MaxEnt framework provides a general recipe for constructing surrogate ensembles that preserve aspects of the data, but are otherwise maximally unstructured. This idea can be used to generate a hierarchy of controls against which rigorous statistical tests are possible. Copyright © 2017 Elsevier Ltd. All rights reserved.
Sinogram noise reduction for low-dose CT by statistics-based nonlinear filters

NASA Astrophysics Data System (ADS)

Wang, Jing; Lu, Hongbing; Li, Tianfang; Liang, Zhengrong

2005-04-01

Low-dose CT (computed tomography) sinogram data have been shown to be signal-dependent with an analytical relationship between the sample mean and sample variance. Spatially-invariant low-pass linear filters, such as the Butterworth and Hanning filters, could not adequately handle the data noise and statistics-based nonlinear filters may be an alternative choice, in addition to other choices of minimizing cost functions on the noisy data. Anisotropic diffusion filter and nonlinear Gaussian filters chain (NLGC) are two well-known classes of nonlinear filters based on local statistics for the purpose of edge-preserving noise reduction. These two filters can utilize the noise properties of the low-dose CT sinogram for adaptive noise reduction, but can not incorporate signal correlative information for an optimal regularized solution. Our previously-developed Karhunen-Loeve (KL) domain PWLS (penalized weighted least square) minimization considers the signal correlation via the KL strategy and seeks the PWLS cost function minimization for an optimal regularized solution for each KL component, i.e., adaptive to the KL components. This work compared the nonlinear filters with the KL-PWLS framework for low-dose CT application. Furthermore, we investigated the nonlinear filters for post KL-PWLS noise treatment in the sinogram space, where the filters were applied after ramp operation on the KL-PWLS treated sinogram data prior to backprojection operation (for image reconstruction). By both computer simulation and experimental low-dose CT data, the nonlinear filters could not outperform the KL-PWLS framework. The gain of post KL-PWLS edge-preserving noise filtering in the sinogram space is not significant, even the noise has been modulated by the ramp operation.
A game-based platform for crowd-sourcing biomedical image diagnosis and standardized remote training and education of diagnosticians

NASA Astrophysics Data System (ADS)

Feng, Steve; Woo, Minjae; Chandramouli, Krithika; Ozcan, Aydogan

2015-03-01

Over the past decade, crowd-sourcing complex image analysis tasks to a human crowd has emerged as an alternative to energy-inefficient and difficult-to-implement computational approaches. Following this trend, we have developed a mathematical framework for statistically combining human crowd-sourcing of biomedical image analysis and diagnosis through games. Using a web-based smart game (BioGames), we demonstrated this platform's effectiveness for telediagnosis of malaria from microscopic images of individual red blood cells (RBCs). After public release in early 2012 (http://biogames.ee.ucla.edu), more than 3000 gamers (experts and non-experts) used this BioGames platform to diagnose over 2800 distinct RBC images, marking them as positive (infected) or negative (non-infected). Furthermore, we asked expert diagnosticians to tag the same set of cells with labels of positive, negative, or questionable (insufficient information for a reliable diagnosis) and statistically combined their decisions to generate a gold standard malaria image library. Our framework utilized minimally trained gamers' diagnoses to generate a set of statistical labels with an accuracy that is within 98% of our gold standard image library, demonstrating the "wisdom of the crowd". Using the same image library, we have recently launched a web-based malaria training and educational game allowing diagnosticians to compare their performance with their peers. After diagnosing a set of ~500 cells per game, diagnosticians can compare their quantified scores against a leaderboard and view their misdiagnosed cells. Using this platform, we aim to expand our gold standard library with new RBC images and provide a quantified digital tool for measuring and improving diagnostician training globally.
Statistical analysis of hydrological response in urbanising catchments based on adaptive sampling using inter-amount times

NASA Astrophysics Data System (ADS)

ten Veldhuis, Marie-Claire; Schleiss, Marc

2017-04-01

In this study, we introduced an alternative approach for analysis of hydrological flow time series, using an adaptive sampling framework based on inter-amount times (IATs). The main difference with conventional flow time series is the rate at which low and high flows are sampled: the unit of analysis for IATs is a fixed flow amount, instead of a fixed time window. We analysed statistical distributions of flows and IATs across a wide range of sampling scales to investigate sensitivity of statistical properties such as quantiles, variance, skewness, scaling parameters and flashiness indicators to the sampling scale. We did this based on streamflow time series for 17 (semi)urbanised basins in North Carolina, US, ranging from 13 km2 to 238 km2 in size. Results showed that adaptive sampling of flow time series based on inter-amounts leads to a more balanced representation of low flow and peak flow values in the statistical distribution. While conventional sampling gives a lot of weight to low flows, as these are most ubiquitous in flow time series, IAT sampling gives relatively more weight to high flow values, when given flow amounts are accumulated in shorter time. As a consequence, IAT sampling gives more information about the tail of the distribution associated with high flows, while conventional sampling gives relatively more information about low flow periods. We will present results of statistical analyses across a range of subdaily to seasonal scales and will highlight some interesting insights that can be derived from IAT statistics with respect to basin flashiness and impact urbanisation on hydrological response.
General aviation avionics statistics : 1977.

DOT National Transportation Integrated Search

1980-06-01

This report presents avionics statistics for the 1977 general aviation (GA) aircraft fleet and is the fourth in a series. The statistics are presented in a capability group framework which enables one to relate airborne avionics equipment to the capa...
A Statistical Framework for Protein Quantitation in Bottom-Up MS-Based Proteomics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Karpievitch, Yuliya; Stanley, Jeffrey R.; Taverner, Thomas

2009-08-15

Motivation: Quantitative mass spectrometry-based proteomics requires protein-level estimates and associated confidence measures. Challenges include the presence of low quality or incorrectly identified peptides and informative missingness. Furthermore, models are required for rolling peptide-level information up to the protein level. Results: We present a statistical model that carefully accounts for informative missingness in peak intensities and allows unbiased, model-based, protein-level estimation and inference. The model is applicable to both label-based and label-free quantitation experiments. We also provide automated, model-based, algorithms for filtering of proteins and peptides as well as imputation of missing values. Two LC/MS datasets are used to illustrate themore » methods. In simulation studies, our methods are shown to achieve substantially more discoveries than standard alternatives. Availability: The software has been made available in the opensource proteomics platform DAnTE (http://omics.pnl.gov/software/). Contact: adabney@stat.tamu.edu Supplementary information: Supplementary data are available at Bioinformatics online.« less
Automatic media-adventitia IVUS image segmentation based on sparse representation framework and dynamic directional active contour model.

PubMed

Zakeri, Fahimeh Sadat; Setarehdan, Seyed Kamaledin; Norouzi, Somayye

2017-10-01

Segmentation of the arterial wall boundaries from intravascular ultrasound images is an important image processing task in order to quantify arterial wall characteristics such as shape, area, thickness and eccentricity. Since manual segmentation of these boundaries is a laborious and time consuming procedure, many researchers attempted to develop (semi-) automatic segmentation techniques as a powerful tool for educational and clinical purposes in the past but as yet there is no any clinically approved method in the market. This paper presents a deterministic-statistical strategy for automatic media-adventitia border detection by a fourfold algorithm. First, a smoothed initial contour is extracted based on the classification in the sparse representation framework which is combined with the dynamic directional convolution vector field. Next, an active contour model is utilized for the propagation of the initial contour toward the interested borders. Finally, the extracted contour is refined in the leakage, side branch openings and calcification regions based on the image texture patterns. The performance of the proposed algorithm is evaluated by comparing the results to those manually traced borders by an expert on 312 different IVUS images obtained from four different patients. The statistical analysis of the results demonstrates the efficiency of the proposed method in the media-adventitia border detection with enough consistency in the leakage and calcification regions. Copyright © 2017 Elsevier Ltd. All rights reserved.
Identifying Epigenetic Biomarkers using Maximal Relevance and Minimal Redundancy Based Feature Selection for Multi-Omics Data.

PubMed

Mallik, Saurav; Bhadra, Tapas; Maulik, Ujjwal

2017-01-01

Epigenetic Biomarker discovery is an important task in bioinformatics. In this article, we develop a new framework of identifying statistically significant epigenetic biomarkers using maximal-relevance and minimal-redundancy criterion based feature (gene) selection for multi-omics dataset. Firstly, we determine the genes that have both expression as well as methylation values, and follow normal distribution. Similarly, we identify the genes which consist of both expression and methylation values, but do not follow normal distribution. For each case, we utilize a gene-selection method that provides maximal-relevant, but variable-weighted minimum-redundant genes as top ranked genes. For statistical validation, we apply t-test on both the expression and methylation data consisting of only the normally distributed top ranked genes to determine how many of them are both differentially expressed andmethylated. Similarly, we utilize Limma package for performing non-parametric Empirical Bayes test on both expression and methylation data comprising only the non-normally distributed top ranked genes to identify how many of them are both differentially expressed and methylated. We finally report the top-ranking significant gene-markerswith biological validation. Moreover, our framework improves positive predictive rate and reduces false positive rate in marker identification. In addition, we provide a comparative analysis of our gene-selection method as well as othermethods based on classificationperformances obtained using several well-known classifiers.
The Development of Design Tools for Fault Tolerant Quantum Dot Cellular Automata Based Logic

NASA Technical Reports Server (NTRS)

Armstrong, Curtis D.; Humphreys, William M.

2003-01-01

We are developing software to explore the fault tolerance of quantum dot cellular automata gate architectures in the presence of manufacturing variations and device defects. The Topology Optimization Methodology using Applied Statistics (TOMAS) framework extends the capabilities of the A Quantum Interconnected Network Array Simulator (AQUINAS) by adding front-end and back-end software and creating an environment that integrates all of these components. The front-end tools establish all simulation parameters, configure the simulation system, automate the Monte Carlo generation of simulation files, and execute the simulation of these files. The back-end tools perform automated data parsing, statistical analysis and report generation.
A new statistical method for transfer coefficient calculations in the framework of the general multiple-compartment model of transport for radionuclides in biological systems.

PubMed

Garcia, F; Arruda-Neto, J D; Manso, M V; Helene, O M; Vanin, V R; Rodriguez, O; Mesa, J; Likhachev, V P; Filho, J W; Deppman, A; Perez, G; Guzman, F; de Camargo, S P

1999-10-01

A new and simple statistical procedure (STATFLUX) for the calculation of transfer coefficients of radionuclide transport to animals and plants is proposed. The method is based on the general multiple-compartment model, which uses a system of linear equations involving geometrical volume considerations. By using experimentally available curves of radionuclide concentrations versus time, for each animal compartment (organs), flow parameters were estimated by employing a least-squares procedure, whose consistency is tested. Some numerical results are presented in order to compare the STATFLUX transfer coefficients with those from other works and experimental data.
A quantum framework for likelihood ratios

NASA Astrophysics Data System (ADS)

Bond, Rachael L.; He, Yang-Hui; Ormerod, Thomas C.

The ability to calculate precise likelihood ratios is fundamental to science, from Quantum Information Theory through to Quantum State Estimation. However, there is no assumption-free statistical methodology to achieve this. For instance, in the absence of data relating to covariate overlap, the widely used Bayes’ theorem either defaults to the marginal probability driven “naive Bayes’ classifier”, or requires the use of compensatory expectation-maximization techniques. This paper takes an information-theoretic approach in developing a new statistical formula for the calculation of likelihood ratios based on the principles of quantum entanglement, and demonstrates that Bayes’ theorem is a special case of a more general quantum mechanical expression.
Studying Weather and Climate Extremes in a Non-stationary Framework

NASA Astrophysics Data System (ADS)

Wu, Z.

2010-12-01

The study of weather and climate extremes often uses the theory of extreme values. Such a detection method has a major problem: to obtain the probability distribution of extremes, one has to implicitly assume the Earth’s climate is stationary over a long period within which the climatology is defined. While such detection makes some sense in a purely statistical view of stationary processes, it can lead to misleading statistical properties of weather and climate extremes caused by long term climate variability and change, and may also cause enormous difficulty in attributing and predicting these extremes. To alleviate this problem, here we report a novel non-stationary framework for studying weather and climate extremes in a non-stationary framework. In this new framework, the weather and climate extremes will be defined as timescale-dependent quantities derived from the anomalies with respect to non-stationary climatologies of different timescales. With this non-stationary framework, the non-stationary and nonlinear nature of climate system will be taken into account; and the attribution and the prediction of weather and climate extremes can then be separated into 1) the change of the statistical properties of the weather and climate extremes themselves and 2) the background climate variability and change. The new non-stationary framework will use the ensemble empirical mode decomposition (EEMD) method, which is a recent major improvement of the Hilbert-Huang Transform for time-frequency analysis. Using this tool, we will adaptively decompose various weather and climate data from observation and climate models in terms of the components of the various natural timescales contained in the data. With such decompositions, the non-stationary statistical properties (both spatial and temporal) of weather and climate anomalies and of their corresponding climatologies will be analyzed and documented.

A statistical framework for protein quantitation in bottom-up MS-based proteomics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Karpievitch, Yuliya; Stanley, Jeffrey R.; Taverner, Thomas

2009-08-15

ABSTRACT Motivation: Quantitative mass spectrometry-based proteomics requires protein-level estimates and confidence measures. Challenges include the presence of low-quality or incorrectly identified peptides and widespread, informative, missing data. Furthermore, models are required for rolling peptide-level information up to the protein level. Results: We present a statistical model for protein abundance in terms of peptide peak intensities, applicable to both label-based and label-free quantitation experiments. The model allows for both random and censoring missingness mechanisms and provides naturally for protein-level estimates and confidence measures. The model is also used to derive automated filtering and imputation routines. Three LC-MS datasets are used tomore » illustrate the methods. Availability: The software has been made available in the open-source proteomics platform DAnTE (Polpitiya et al. (2008)) (http://omics.pnl.gov/software/). Contact: adabney@stat.tamu.edu« less
Risk assessment of vector-borne diseases for public health governance.

PubMed

Sedda, L; Morley, D W; Braks, M A H; De Simone, L; Benz, D; Rogers, D J

2014-12-01

In the context of public health, risk governance (or risk analysis) is a framework for the assessment and subsequent management and/or control of the danger posed by an identified disease threat. Generic frameworks in which to carry out risk assessment have been developed by various agencies. These include monitoring, data collection, statistical analysis and dissemination. Due to the inherent complexity of disease systems, however, the generic approach must be modified for individual, disease-specific risk assessment frameworks. The analysis was based on the review of the current risk assessments of vector-borne diseases adopted by the main Public Health organisations (OIE, WHO, ECDC, FAO, CDC etc…). Literature, legislation and statistical assessment of the risk analysis frameworks. This review outlines the need for the development of a general public health risk assessment method for vector-borne diseases, in order to guarantee that sufficient information is gathered to apply robust models of risk assessment. Stochastic (especially spatial) methods, often in Bayesian frameworks are now gaining prominence in standard risk assessment procedures because of their ability to assess accurately model uncertainties. Risk assessment needs to be addressed quantitatively wherever possible, and submitted with its quality assessment in order to enable successful public health measures to be adopted. In terms of current practice, often a series of different models and analyses are applied to the same problem, with results and outcomes that are difficult to compare because of the unknown model and data uncertainties. Therefore, the risk assessment areas in need of further research are identified in this article. Copyright © 2014 The Royal Society for Public Health. Published by Elsevier Ltd. All rights reserved.
Sparse approximation of currents for statistics on curves and surfaces.

PubMed

Durrleman, Stanley; Pennec, Xavier; Trouvé, Alain; Ayache, Nicholas

2008-01-01

Computing, processing, visualizing statistics on shapes like curves or surfaces is a real challenge with many applications ranging from medical image analysis to computational geometry. Modelling such geometrical primitives with currents avoids feature-based approach as well as point-correspondence method. This framework has been proved to be powerful to register brain surfaces or to measure geometrical invariants. However, if the state-of-the-art methods perform efficiently pairwise registrations, new numerical schemes are required to process groupwise statistics due to an increasing complexity when the size of the database is growing. Statistics such as mean and principal modes of a set of shapes often have a heavy and highly redundant representation. We propose therefore to find an adapted basis on which mean and principal modes have a sparse decomposition. Besides the computational improvement, this sparse representation offers a way to visualize and interpret statistics on currents. Experiments show the relevance of the approach on 34 sets of 70 sulcal lines and on 50 sets of 10 meshes of deep brain structures.
A New Adaptive Framework for Collaborative Filtering Prediction

PubMed Central

Almosallam, Ibrahim A.; Shang, Yi

2010-01-01

Collaborative filtering is one of the most successful techniques for recommendation systems and has been used in many commercial services provided by major companies including Amazon, TiVo and Netflix. In this paper we focus on memory-based collaborative filtering (CF). Existing CF techniques work well on dense data but poorly on sparse data. To address this weakness, we propose to use z-scores instead of explicit ratings and introduce a mechanism that adaptively combines global statistics with item-based values based on data density level. We present a new adaptive framework that encapsulates various CF algorithms and the relationships among them. An adaptive CF predictor is developed that can self adapt from user-based to item-based to hybrid methods based on the amount of available ratings. Our experimental results show that the new predictor consistently obtained more accurate predictions than existing CF methods, with the most significant improvement on sparse data sets. When applied to the Netflix Challenge data set, our method performed better than existing CF and singular value decomposition (SVD) methods and achieved 4.67% improvement over Netflix’s system. PMID:21572924
A New Adaptive Framework for Collaborative Filtering Prediction.

PubMed

Almosallam, Ibrahim A; Shang, Yi

2008-06-01

Collaborative filtering is one of the most successful techniques for recommendation systems and has been used in many commercial services provided by major companies including Amazon, TiVo and Netflix. In this paper we focus on memory-based collaborative filtering (CF). Existing CF techniques work well on dense data but poorly on sparse data. To address this weakness, we propose to use z-scores instead of explicit ratings and introduce a mechanism that adaptively combines global statistics with item-based values based on data density level. We present a new adaptive framework that encapsulates various CF algorithms and the relationships among them. An adaptive CF predictor is developed that can self adapt from user-based to item-based to hybrid methods based on the amount of available ratings. Our experimental results show that the new predictor consistently obtained more accurate predictions than existing CF methods, with the most significant improvement on sparse data sets. When applied to the Netflix Challenge data set, our method performed better than existing CF and singular value decomposition (SVD) methods and achieved 4.67% improvement over Netflix's system.
Word-level language modeling for P300 spellers based on discriminative graphical models

NASA Astrophysics Data System (ADS)

Delgado Saa, Jaime F.; de Pesters, Adriana; McFarland, Dennis; Çetin, Müjdat

2015-04-01

Objective. In this work we propose a probabilistic graphical model framework that uses language priors at the level of words as a mechanism to increase the performance of P300-based spellers. Approach. This paper is concerned with brain-computer interfaces based on P300 spellers. Motivated by P300 spelling scenarios involving communication based on a limited vocabulary, we propose a probabilistic graphical model framework and an associated classification algorithm that uses learned statistical models of language at the level of words. Exploiting such high-level contextual information helps reduce the error rate of the speller. Main results. Our experimental results demonstrate that the proposed approach offers several advantages over existing methods. Most importantly, it increases the classification accuracy while reducing the number of times the letters need to be flashed, increasing the communication rate of the system. Significance. The proposed approach models all the variables in the P300 speller in a unified framework and has the capability to correct errors in previous letters in a word, given the data for the current one. The structure of the model we propose allows the use of efficient inference algorithms, which in turn makes it possible to use this approach in real-time applications.
Robust approaches to quantification of margin and uncertainty for sparse data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hund, Lauren; Schroeder, Benjamin B.; Rumsey, Kelin

Characterizing the tails of probability distributions plays a key role in quantification of margins and uncertainties (QMU), where the goal is characterization of low probability, high consequence events based on continuous measures of performance. When data are collected using physical experimentation, probability distributions are typically fit using statistical methods based on the collected data, and these parametric distributional assumptions are often used to extrapolate about the extreme tail behavior of the underlying probability distribution. In this project, we character- ize the risk associated with such tail extrapolation. Specifically, we conducted a scaling study to demonstrate the large magnitude of themore » risk; then, we developed new methods for communicat- ing risk associated with tail extrapolation from unvalidated statistical models; lastly, we proposed a Bayesian data-integration framework to mitigate tail extrapolation risk through integrating ad- ditional information. We conclude that decision-making using QMU is a complex process that cannot be achieved using statistical analyses alone.« less
Accurate landmarking of three-dimensional facial data in the presence of facial expressions and occlusions using a three-dimensional statistical facial feature model.

PubMed

Zhao, Xi; Dellandréa, Emmanuel; Chen, Liming; Kakadiaris, Ioannis A

2011-10-01

Three-dimensional face landmarking aims at automatically localizing facial landmarks and has a wide range of applications (e.g., face recognition, face tracking, and facial expression analysis). Existing methods assume neutral facial expressions and unoccluded faces. In this paper, we propose a general learning-based framework for reliable landmark localization on 3-D facial data under challenging conditions (i.e., facial expressions and occlusions). Our approach relies on a statistical model, called 3-D statistical facial feature model, which learns both the global variations in configurational relationships between landmarks and the local variations of texture and geometry around each landmark. Based on this model, we further propose an occlusion classifier and a fitting algorithm. Results from experiments on three publicly available 3-D face databases (FRGC, BU-3-DFE, and Bosphorus) demonstrate the effectiveness of our approach, in terms of landmarking accuracy and robustness, in the presence of expressions and occlusions.
Breast cancer statistics and prediction methodology: a systematic review and analysis.

PubMed

Dubey, Ashutosh Kumar; Gupta, Umesh; Jain, Sonal

2015-01-01

Breast cancer is a menacing cancer, primarily affecting women. Continuous research is going on for detecting breast cancer in the early stage as the possibility of cure in early stages is bright. There are two main objectives of this current study, first establish statistics for breast cancer and second to find methodologies which can be helpful in the early stage detection of the breast cancer based on previous studies. The breast cancer statistics for incidence and mortality of the UK, US, India and Egypt were considered for this study. The finding of this study proved that the overall mortality rates of the UK and US have been improved because of awareness, improved medical technology and screening, but in case of India and Egypt the condition is less positive because of lack of awareness. The methodological findings of this study suggest a combined framework based on data mining and evolutionary algorithms. It provides a strong bridge in improving the classification and detection accuracy of breast cancer data.
Probing the Statistical Properties of Unknown Texts: Application to the Voynich Manuscript

PubMed Central

Amancio, Diego R.; Altmann, Eduardo G.; Rybski, Diego; Oliveira, Osvaldo N.; Costa, Luciano da F.

2013-01-01

While the use of statistical physics methods to analyze large corpora has been useful to unveil many patterns in texts, no comprehensive investigation has been performed on the interdependence between syntactic and semantic factors. In this study we propose a framework for determining whether a text (e.g., written in an unknown alphabet) is compatible with a natural language and to which language it could belong. The approach is based on three types of statistical measurements, i.e. obtained from first-order statistics of word properties in a text, from the topology of complex networks representing texts, and from intermittency concepts where text is treated as a time series. Comparative experiments were performed with the New Testament in 15 different languages and with distinct books in English and Portuguese in order to quantify the dependency of the different measurements on the language and on the story being told in the book. The metrics found to be informative in distinguishing real texts from their shuffled versions include assortativity, degree and selectivity of words. As an illustration, we analyze an undeciphered medieval manuscript known as the Voynich Manuscript. We show that it is mostly compatible with natural languages and incompatible with random texts. We also obtain candidates for keywords of the Voynich Manuscript which could be helpful in the effort of deciphering it. Because we were able to identify statistical measurements that are more dependent on the syntax than on the semantics, the framework may also serve for text analysis in language-dependent applications. PMID:23844002
Probing the statistical properties of unknown texts: application to the Voynich Manuscript.

PubMed

Amancio, Diego R; Altmann, Eduardo G; Rybski, Diego; Oliveira, Osvaldo N; Costa, Luciano da F

2013-01-01

While the use of statistical physics methods to analyze large corpora has been useful to unveil many patterns in texts, no comprehensive investigation has been performed on the interdependence between syntactic and semantic factors. In this study we propose a framework for determining whether a text (e.g., written in an unknown alphabet) is compatible with a natural language and to which language it could belong. The approach is based on three types of statistical measurements, i.e. obtained from first-order statistics of word properties in a text, from the topology of complex networks representing texts, and from intermittency concepts where text is treated as a time series. Comparative experiments were performed with the New Testament in 15 different languages and with distinct books in English and Portuguese in order to quantify the dependency of the different measurements on the language and on the story being told in the book. The metrics found to be informative in distinguishing real texts from their shuffled versions include assortativity, degree and selectivity of words. As an illustration, we analyze an undeciphered medieval manuscript known as the Voynich Manuscript. We show that it is mostly compatible with natural languages and incompatible with random texts. We also obtain candidates for keywords of the Voynich Manuscript which could be helpful in the effort of deciphering it. Because we were able to identify statistical measurements that are more dependent on the syntax than on the semantics, the framework may also serve for text analysis in language-dependent applications.
Recognition of pornographic web pages by classifying texts and images.

PubMed

Hu, Weiming; Wu, Ou; Chen, Zhouyao; Fu, Zhouyu; Maybank, Steve

2007-06-01

With the rapid development of the World Wide Web, people benefit more and more from the sharing of information. However, Web pages with obscene, harmful, or illegal content can be easily accessed. It is important to recognize such unsuitable, offensive, or pornographic Web pages. In this paper, a novel framework for recognizing pornographic Web pages is described. A C4.5 decision tree is used to divide Web pages, according to content representations, into continuous text pages, discrete text pages, and image pages. These three categories of Web pages are handled, respectively, by a continuous text classifier, a discrete text classifier, and an algorithm that fuses the results from the image classifier and the discrete text classifier. In the continuous text classifier, statistical and semantic features are used to recognize pornographic texts. In the discrete text classifier, the naive Bayes rule is used to calculate the probability that a discrete text is pornographic. In the image classifier, the object's contour-based features are extracted to recognize pornographic images. In the text and image fusion algorithm, the Bayes theory is used to combine the recognition results from images and texts. Experimental results demonstrate that the continuous text classifier outperforms the traditional keyword-statistics-based classifier, the contour-based image classifier outperforms the traditional skin-region-based image classifier, the results obtained by our fusion algorithm outperform those by either of the individual classifiers, and our framework can be adapted to different categories of Web pages.
RGG: A general GUI Framework for R scripts

PubMed Central

Visne, Ilhami; Dilaveroglu, Erkan; Vierlinger, Klemens; Lauss, Martin; Yildiz, Ahmet; Weinhaeusel, Andreas; Noehammer, Christa; Leisch, Friedrich; Kriegner, Albert

2009-01-01

Background R is the leading open source statistics software with a vast number of biostatistical and bioinformatical analysis packages. To exploit the advantages of R, extensive scripting/programming skills are required. Results We have developed a software tool called R GUI Generator (RGG) which enables the easy generation of Graphical User Interfaces (GUIs) for the programming language R by adding a few Extensible Markup Language (XML) – tags. RGG consists of an XML-based GUI definition language and a Java-based GUI engine. GUIs are generated in runtime from defined GUI tags that are embedded into the R script. User-GUI input is returned to the R code and replaces the XML-tags. RGG files can be developed using any text editor. The current version of RGG is available as a stand-alone software (RGGRunner) and as a plug-in for JGR. Conclusion RGG is a general GUI framework for R that has the potential to introduce R statistics (R packages, built-in functions and scripts) to users with limited programming skills and helps to bridge the gap between R developers and GUI-dependent users. RGG aims to abstract the GUI development from individual GUI toolkits by using an XML-based GUI definition language. Thus RGG can be easily integrated in any software. The RGG project further includes the development of a web-based repository for RGG-GUIs. RGG is an open source project licensed under the Lesser General Public License (LGPL) and can be downloaded freely at PMID:19254356
Statistical model based iterative reconstruction in clinical CT systems. Part III. Task-based kV/mAs optimization for radiation dose reduction

PubMed Central

Li, Ke; Gomez-Cardona, Daniel; Hsieh, Jiang; Lubner, Meghan G.; Pickhardt, Perry J.; Chen, Guang-Hong

2015-01-01

Purpose: For a given imaging task and patient size, the optimal selection of x-ray tube potential (kV) and tube current-rotation time product (mAs) is pivotal in achieving the maximal radiation dose reduction while maintaining the needed diagnostic performance. Although contrast-to-noise (CNR)-based strategies can be used to optimize kV/mAs for computed tomography (CT) imaging systems employing the linear filtered backprojection (FBP) reconstruction method, a more general framework needs to be developed for systems using the nonlinear statistical model-based iterative reconstruction (MBIR) method. The purpose of this paper is to present such a unified framework for the optimization of kV/mAs selection for both FBP- and MBIR-based CT systems. Methods: The optimal selection of kV and mAs was formulated as a constrained optimization problem to minimize the objective function, Dose(kV,mAs), under the constraint that the achievable detectability index d′(kV,mAs) is not lower than the prescribed value of d℞′ for a given imaging task. Since it is difficult to analytically model the dependence of d′ on kV and mAs for the highly nonlinear MBIR method, this constrained optimization problem is solved with comprehensive measurements of Dose(kV,mAs) and d′(kV,mAs) at a variety of kV–mAs combinations, after which the overlay of the dose contours and d′ contours is used to graphically determine the optimal kV–mAs combination to achieve the lowest dose while maintaining the needed detectability for the given imaging task. As an example, d′ for a 17 mm hypoattenuating liver lesion detection task was experimentally measured with an anthropomorphic abdominal phantom at four tube potentials (80, 100, 120, and 140 kV) and fifteen mA levels (25 and 50–700) with a sampling interval of 50 mA at a fixed rotation time of 0.5 s, which corresponded to a dose (CTDIvol) range of [0.6, 70] mGy. Using the proposed method, the optimal kV and mA that minimized dose for the prescribed detectability level of d℞′=16 were determined. As another example, the optimal kV and mA for an 8 mm hyperattenuating liver lesion detection task were also measured using the developed framework. Both an in vivo animal and human subject study were used as demonstrations of how the developed framework can be applied to the clinical work flow. Results: For the first task, the optimal kV and mAs were measured to be 100 and 500, respectively, for FBP, which corresponded to a dose level of 24 mGy. In comparison, the optimal kV and mAs for MBIR were 80 and 150, respectively, which corresponded to a dose level of 4 mGy. The topographies of the iso-d′ map and the iso-CNR map were the same for FBP; thus, the use of d′- and CNR-based optimization methods generated the same results for FBP. However, the topographies of the iso-d′ and iso-CNR map were significantly different in MBIR; the CNR-based method overestimated the performance of MBIR, predicting an overly aggressive dose reduction factor. For the second task, the developed framework generated the following optimization results: for FBP, kV = 140, mA = 350, dose = 37.5 mGy; for MBIR, kV = 120, mA = 250, dose = 18.8 mGy. Again, the CNR-based method overestimated the performance of MBIR. Results of the preliminary in vivo studies were consistent with those of the phantom experiments. Conclusions: A unified and task-driven kV/mAs optimization framework has been developed in this work. The framework is applicable to both linear and nonlinear CT systems such as those using the MBIR method. As expected, the developed framework can be reduced to the conventional CNR-based kV/mAs optimization frameworks if the system is linear. For MBIR-based nonlinear CT systems, however, the developed task-based kV/mAs optimization framework is needed to achieve the maximal dose reduction while maintaining the desired diagnostic performance. PMID:26328971
Genetic Algorithm Based Framework for Automation of Stochastic Modeling of Multi-Season Streamflows

NASA Astrophysics Data System (ADS)

Srivastav, R. K.; Srinivasan, K.; Sudheer, K.

2009-05-01

Synthetic streamflow data generation involves the synthesis of likely streamflow patterns that are statistically indistinguishable from the observed streamflow data. The various kinds of stochastic models adopted for multi-season streamflow generation in hydrology are: i) parametric models which hypothesize the form of the periodic dependence structure and the distributional form a priori (examples are PAR, PARMA); disaggregation models that aim to preserve the correlation structure at the periodic level and the aggregated annual level; ii) Nonparametric models (examples are bootstrap/kernel based methods), which characterize the laws of chance, describing the stream flow process, without recourse to prior assumptions as to the form or structure of these laws; (k-nearest neighbor (k-NN), matched block bootstrap (MABB)); non-parametric disaggregation model. iii) Hybrid models which blend both parametric and non-parametric models advantageously to model the streamflows effectively. Despite many of these developments that have taken place in the field of stochastic modeling of streamflows over the last four decades, accurate prediction of the storage and the critical drought characteristics has been posing a persistent challenge to the stochastic modeler. This is partly because, usually, the stochastic streamflow model parameters are estimated by minimizing a statistically based objective function (such as maximum likelihood (MLE) or least squares (LS) estimation) and subsequently the efficacy of the models is being validated based on the accuracy of prediction of the estimates of the water-use characteristics, which requires large number of trial simulations and inspection of many plots and tables. Still accurate prediction of the storage and the critical drought characteristics may not be ensured. In this study a multi-objective optimization framework is proposed to find the optimal hybrid model (blend of a simple parametric model, PAR(1) model and matched block bootstrap (MABB) ) based on the explicit objective functions of minimizing the relative bias and relative root mean square error in estimating the storage capacity of the reservoir. The optimal parameter set of the hybrid model is obtained based on the search over a multi- dimensional parameter space (involving simultaneous exploration of the parametric (PAR(1)) as well as the non-parametric (MABB) components). This is achieved using the efficient evolutionary search based optimization tool namely, non-dominated sorting genetic algorithm - II (NSGA-II). This approach helps in reducing the drudgery involved in the process of manual selection of the hybrid model, in addition to predicting the basic summary statistics dependence structure, marginal distribution and water-use characteristics accurately. The proposed optimization framework is used to model the multi-season streamflows of River Beaver and River Weber of USA. In case of both the rivers, the proposed GA-based hybrid model yields a much better prediction of the storage capacity (where simultaneous exploration of both parametric and non-parametric components is done) when compared with the MLE-based hybrid models (where the hybrid model selection is done in two stages, thus probably resulting in a sub-optimal model). This framework can be further extended to include different linear/non-linear hybrid stochastic models at other temporal and spatial scales as well.
A quasi-likelihood approach to non-negative matrix factorization

PubMed Central

Devarajan, Karthik; Cheung, Vincent C.K.

2017-01-01

A unified approach to non-negative matrix factorization based on the theory of generalized linear models is proposed. This approach embeds a variety of statistical models, including the exponential family, within a single theoretical framework and provides a unified view of such factorizations from the perspective of quasi-likelihood. Using this framework, a family of algorithms for handling signal-dependent noise is developed and its convergence proven using the Expectation-Maximization algorithm. In addition, a measure to evaluate the goodness-of-fit of the resulting factorization is described. The proposed methods allow modeling of non-linear effects via appropriate link functions and are illustrated using an application in biomedical signal processing. PMID:27348511
Geostatistical estimation of forest biomass in interior Alaska combining Landsat-derived tree cover, sampled airborne lidar and field observations

NASA Astrophysics Data System (ADS)

Babcock, Chad; Finley, Andrew O.; Andersen, Hans-Erik; Pattison, Robert; Cook, Bruce D.; Morton, Douglas C.; Alonzo, Michael; Nelson, Ross; Gregoire, Timothy; Ene, Liviu; Gobakken, Terje; Næsset, Erik

2018-06-01

The goal of this research was to develop and examine the performance of a geostatistical coregionalization modeling approach for combining field inventory measurements, strip samples of airborne lidar and Landsat-based remote sensing data products to predict aboveground biomass (AGB) in interior Alaska's Tanana Valley. The proposed modeling strategy facilitates pixel-level mapping of AGB density predictions across the entire spatial domain. Additionally, the coregionalization framework allows for statistically sound estimation of total AGB for arbitrary areal units within the study area---a key advance to support diverse management objectives in interior Alaska. This research focuses on appropriate characterization of prediction uncertainty in the form of posterior predictive coverage intervals and standard deviations. Using the framework detailed here, it is possible to quantify estimation uncertainty for any spatial extent, ranging from pixel-level predictions of AGB density to estimates of AGB stocks for the full domain. The lidar-informed coregionalization models consistently outperformed their counterpart lidar-free models in terms of point-level predictive performance and total AGB precision. Additionally, the inclusion of Landsat-derived forest cover as a covariate further improved estimation precision in regions with lower lidar sampling intensity. Our findings also demonstrate that model-based approaches that do not explicitly account for residual spatial dependence can grossly underestimate uncertainty, resulting in falsely precise estimates of AGB. On the other hand, in a geostatistical setting, residual spatial structure can be modeled within a Bayesian hierarchical framework to obtain statistically defensible assessments of uncertainty for AGB estimates.
EHR-based phenotyping: Bulk learning and evaluation.

PubMed

Chiu, Po-Hsiang; Hripcsak, George

2017-06-01

In data-driven phenotyping, a core computational task is to identify medical concepts and their variations from sources of electronic health records (EHR) to stratify phenotypic cohorts. A conventional analytic framework for phenotyping largely uses a manual knowledge engineering approach or a supervised learning approach where clinical cases are represented by variables encompassing diagnoses, medicinal treatments and laboratory tests, among others. In such a framework, tasks associated with feature engineering and data annotation remain a tedious and expensive exercise, resulting in poor scalability. In addition, certain clinical conditions, such as those that are rare and acute in nature, may never accumulate sufficient data over time, which poses a challenge to establishing accurate and informative statistical models. In this paper, we use infectious diseases as the domain of study to demonstrate a hierarchical learning method based on ensemble learning that attempts to address these issues through feature abstraction. We use a sparse annotation set to train and evaluate many phenotypes at once, which we call bulk learning. In this batch-phenotyping framework, disease cohort definitions can be learned from within the abstract feature space established by using multiple diseases as a substrate and diagnostic codes as surrogates. In particular, using surrogate labels for model training renders possible its subsequent evaluation using only a sparse annotated sample. Moreover, statistical models can be trained and evaluated, using the same sparse annotation, from within the abstract feature space of low dimensionality that encapsulates the shared clinical traits of these target diseases, collectively referred to as the bulk learning set. Copyright © 2017 Elsevier Inc. All rights reserved.
Relationships among Classical Test Theory and Item Response Theory Frameworks via Factor Analytic Models

ERIC Educational Resources Information Center

Kohli, Nidhi; Koran, Jennifer; Henn, Lisa

2015-01-01

There are well-defined theoretical differences between the classical test theory (CTT) and item response theory (IRT) frameworks. It is understood that in the CTT framework, person and item statistics are test- and sample-dependent. This is not the perception with IRT. For this reason, the IRT framework is considered to be theoretically superior…
Method to characterize directional changes in Arctic sea ice drift and associated deformation due to synoptic atmospheric variations using Lagrangian dispersion statistics

NASA Astrophysics Data System (ADS)

Lukovich, Jennifer V.; Geiger, Cathleen A.; Barber, David G.

2017-07-01

A framework is developed to assess the directional changes in sea ice drift paths and associated deformation processes in response to atmospheric forcing. The framework is based on Lagrangian statistical analyses leveraging particle dispersion theory which tells us whether ice drift is in a subdiffusive, diffusive, ballistic, or superdiffusive dynamical regime using single-particle (absolute) dispersion statistics. In terms of sea ice deformation, the framework uses two- and three-particle dispersion to characterize along- and across-shear transport as well as differential kinematic parameters. The approach is tested with GPS beacons deployed in triplets on sea ice in the southern Beaufort Sea at varying distances from the coastline in fall of 2009 with eight individual events characterized. One transition in particular follows the sea level pressure (SLP) high on 8 October in 2009 while the sea ice drift was in a superdiffusive dynamic regime. In this case, the dispersion scaling exponent (which is a slope between single-particle absolute dispersion of sea ice drift and elapsed time) changed from superdiffusive (α ˜ 3) to ballistic (α ˜ 2) as the SLP was rounding its maximum pressure value. Following this shift between regimes, there was a loss in synchronicity between sea ice drift and atmospheric motion patterns. While this is only one case study, the outcomes suggest similar studies be conducted on more buoy arrays to test momentum transfer linkages between storms and sea ice responses as a function of dispersion regime states using scaling exponents. The tools and framework developed in this study provide a unique characterization technique to evaluate these states with respect to sea ice processes in general. Application of these techniques can aid ice hazard assessments and weather forecasting in support of marine transportation and indigenous use of near-shore Arctic areas.

A novel bi-level meta-analysis approach: applied to biological pathway analysis.

PubMed

Nguyen, Tin; Tagett, Rebecca; Donato, Michele; Mitrea, Cristina; Draghici, Sorin

2016-02-01

The accumulation of high-throughput data in public repositories creates a pressing need for integrative analysis of multiple datasets from independent experiments. However, study heterogeneity, study bias, outliers and the lack of power of available methods present real challenge in integrating genomic data. One practical drawback of many P-value-based meta-analysis methods, including Fisher's, Stouffer's, minP and maxP, is that they are sensitive to outliers. Another drawback is that, because they perform just one statistical test for each individual experiment, they may not fully exploit the potentially large number of samples within each study. We propose a novel bi-level meta-analysis approach that employs the additive method and the Central Limit Theorem within each individual experiment and also across multiple experiments. We prove that the bi-level framework is robust against bias, less sensitive to outliers than other methods, and more sensitive to small changes in signal. For comparative analysis, we demonstrate that the intra-experiment analysis has more power than the equivalent statistical test performed on a single large experiment. For pathway analysis, we compare the proposed framework versus classical meta-analysis approaches (Fisher's, Stouffer's and the additive method) as well as against a dedicated pathway meta-analysis package (MetaPath), using 1252 samples from 21 datasets related to three human diseases, acute myeloid leukemia (9 datasets), type II diabetes (5 datasets) and Alzheimer's disease (7 datasets). Our framework outperforms its competitors to correctly identify pathways relevant to the phenotypes. The framework is sufficiently general to be applied to any type of statistical meta-analysis. The R scripts are available on demand from the authors. sorin@wayne.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Quality Evaluation of Zirconium Dioxide Frameworks Produced in Five Dental Laboratories from Different Countries.

PubMed

Schneebeli, Esther; Brägger, Urs; Scherrer, Susanne S; Keller, Andrea; Wittneben, Julia G; Hicklin, Stefan P

2017-07-01

The aim of this study was to assess and compare quality as well as economic aspects of CAD/CAM high strength ceramic three-unit FDP frameworks ordered from dental laboratories located in emerging countries and Switzerland. The master casts of six cases were sent to five dental laboratories located in Thailand (Bangkok), China (Peking and Shenzhen), Turkey (Izmir), and Switzerland (Bern). Each laboratory was using a different CAD/CAM system. The clinical fit of the frameworks was qualitatively assessed, and the thickness of the framework material, the connector height, the width, and the diameter were evaluated using a measuring sensor. The analysis of the internal fit of the frameworks was performed by means of a replica technique, whereas the inner and outer surfaces of the frameworks were evaluated for traces of postprocessing and damage to the intaglio surface with light and electronic microscopes. Groups (dental laboratories and cases) were compared for statistically significant differences using Mann-Whitney U-tests after Bonferroni correction. An acceptable clinical fit was found at 97.9% of the margins produced in laboratory E, 87.5% in B, 93.7% in C, 79.2% in A, and 62.5% in D. The mean framework thicknesses were not statistically significantly different for the premolar regions; however, for the molar area 4/8 of the evaluated sites were statistically significantly different. Circumference, surface, and width of the connectors produced in the different laboratories were statistically significantly different but not the height. There were great differences in the designs for the pontic and connector regions, and some of the frameworks would not be recommended for clinical use. Traces of heavy postprocessing were found in frameworks from some of the laboratories. The prices per framework ranged from US$177 to US$896. By ordering laboratory work in developing countries, a considerable price reduction was obtained compared to the price level in Switzerland. Despite the use of the standardized CAD/CAM chains of production in all laboratories, a large variability in the quality aspects, such as clinical marginal fit, connector and pontic design, as well as postprocessing traces was noted. Recommended sound handling of postprocessing was not applied in all laboratories. Dentists should be aware of the true and factitious advantages of CAD/CAM production chains and not lose control over the process. © 2015 by the American College of Prosthodontists.
Crime Scenes and Mystery Players! Using Driving Questions to Support the Development of Statistical Literacy

ERIC Educational Resources Information Center

Leavy, Aisling; Hourigan, Mairead

2016-01-01

We argue that the development of statistical literacy is greatly supported by engaging students in carrying out statistical investigations. We describe the use of driving questions and interesting contexts to motivate two statistical investigations. The PPDAC cycle is use as an organizing framework to support the process statistical investigation.
Intelligent Systems Approaches to Product Sound Quality Analysis

NASA Astrophysics Data System (ADS)

Pietila, Glenn M.

As a product market becomes more competitive, consumers become more discriminating in the way in which they differentiate between engineered products. The consumer often makes a purchasing decision based on the sound emitted from the product during operation by using the sound to judge quality or annoyance. Therefore, in recent years, many sound quality analysis tools have been developed to evaluate the consumer preference as it relates to a product sound and to quantify this preference based on objective measurements. This understanding can be used to direct a product design process in order to help differentiate the product from competitive products or to establish an impression on consumers regarding a product's quality or robustness. The sound quality process is typically a statistical tool that is used to model subjective preference, or merit score, based on objective measurements, or metrics. In this way, new product developments can be evaluated in an objective manner without the laborious process of gathering a sample population of consumers for subjective studies each time. The most common model used today is the Multiple Linear Regression (MLR), although recently non-linear Artificial Neural Network (ANN) approaches are gaining popularity. This dissertation will review publicly available published literature and present additional intelligent systems approaches that can be used to improve on the current sound quality process. The focus of this work is to address shortcomings in the current paired comparison approach to sound quality analysis. This research will propose a framework for an adaptive jury analysis approach as an alternative to the current Bradley-Terry model. The adaptive jury framework uses statistical hypothesis testing to focus on sound pairings that are most interesting and is expected to address some of the restrictions required by the Bradley-Terry model. It will also provide a more amicable framework for an intelligent systems approach. Next, an unsupervised jury clustering algorithm is used to identify and classify subgroups within a jury who have conflicting preferences. In addition, a nested Artificial Neural Network (ANN) architecture is developed to predict subjective preference based on objective sound quality metrics, in the presence of non-linear preferences. Finally, statistical decomposition and correlation algorithms are reviewed that can help an analyst establish a clear understanding of the variability of the product sounds used as inputs into the jury study and to identify correlations between preference scores and sound quality metrics in the presence of non-linearities.
An online tool for Operational Probabilistic Drought Forecasting System (OPDFS): a Statistical-Dynamical Framework

NASA Astrophysics Data System (ADS)

Zarekarizi, M.; Moradkhani, H.; Yan, H.

2017-12-01

The Operational Probabilistic Drought Forecasting System (OPDFS) is an online tool recently developed at Portland State University for operational agricultural drought forecasting. This is an integrated statistical-dynamical framework issuing probabilistic drought forecasts monthly for the lead times of 1, 2, and 3 months. The statistical drought forecasting method utilizes copula functions in order to condition the future soil moisture values on the antecedent states. Due to stochastic nature of land surface properties, the antecedent soil moisture states are uncertain; therefore, data assimilation system based on Particle Filtering (PF) is employed to quantify the uncertainties associated with the initial condition of the land state, i.e. soil moisture. PF assimilates the satellite soil moisture data to Variable Infiltration Capacity (VIC) land surface model and ultimately updates the simulated soil moisture. The OPDFS builds on the NOAA's seasonal drought outlook by offering drought probabilities instead of qualitative ordinal categories and provides the user with the probability maps associated with a particular drought category. A retrospective assessment of the OPDFS showed that the forecasting of the 2012 Great Plains and 2014 California droughts were possible at least one month in advance. The OPDFS offers a timely assistance to water managers, stakeholders and decision-makers to develop resilience against uncertain upcoming droughts.
A statistical framework for applying RNA profiling to chemical hazard detection.

PubMed

Kostich, Mitchell S

2017-12-01

Use of 'omics technologies in environmental science is expanding. However, application is mostly restricted to characterizing molecular steps leading from toxicant interaction with molecular receptors to apical endpoints in laboratory species. Use in environmental decision-making is limited, due to difficulty in elucidating mechanisms in sufficient detail to make quantitative outcome predictions in any single species or in extending predictions to aquatic communities. Here we introduce a mechanism-agnostic statistical approach, supplementing mechanistic investigation by allowing probabilistic outcome prediction even when understanding of molecular pathways is limited, and facilitating extrapolation from results in laboratory test species to predictions about aquatic communities. We use concepts familiar to environmental managers, supplemented with techniques employed for clinical interpretation of 'omics-based biomedical tests. We describe the framework in step-wise fashion, beginning with single test replicates of a single RNA variant, then extending to multi-gene RNA profiling, collections of test replicates, and integration of complementary data. In order to simplify the presentation, we focus on using RNA profiling for distinguishing presence versus absence of chemical hazards, but the principles discussed can be extended to other types of 'omics measurements, multi-class problems, and regression. We include a supplemental file demonstrating many of the concepts using the open source R statistical package. Published by Elsevier Ltd.
Elements of an integrated health monitoring framework

NASA Astrophysics Data System (ADS)

Fraser, Michael; Elgamal, Ahmed; Conte, Joel P.; Masri, Sami; Fountain, Tony; Gupta, Amarnath; Trivedi, Mohan; El Zarki, Magda

2003-07-01

Internet technologies are increasingly facilitating real-time monitoring of Bridges and Highways. The advances in wireless communications for instance, are allowing practical deployments for large extended systems. Sensor data, including video signals, can be used for long-term condition assessment, traffic-load regulation, emergency response, and seismic safety applications. Computer-based automated signal-analysis algorithms routinely process the incoming data and determine anomalies based on pre-defined response thresholds and more involved signal analysis techniques. Upon authentication, appropriate action may be authorized for maintenance, early warning, and/or emergency response. In such a strategy, data from thousands of sensors can be analyzed with near real-time and long-term assessment and decision-making implications. Addressing the above, a flexible and scalable (e.g., for an entire Highway system, or portfolio of Networked Civil Infrastructure) software architecture/framework is being developed and implemented. This framework will network and integrate real-time heterogeneous sensor data, database and archiving systems, computer vision, data analysis and interpretation, physics-based numerical simulation of complex structural systems, visualization, reliability & risk analysis, and rational statistical decision-making procedures. Thus, within this framework, data is converted into information, information into knowledge, and knowledge into decision at the end of the pipeline. Such a decision-support system contributes to the vitality of our economy, as rehabilitation, renewal, replacement, and/or maintenance of this infrastructure are estimated to require expenditures in the Trillion-dollar range nationwide, including issues of Homeland security and natural disaster mitigation. A pilot website (http://bridge.ucsd.edu/compositedeck.html) currently depicts some basic elements of the envisioned integrated health monitoring analysis framework.
Stochastic Analysis and Design of Heterogeneous Microstructural Materials System

NASA Astrophysics Data System (ADS)

Xu, Hongyi

Advanced materials system refers to new materials that are comprised of multiple traditional constituents but complex microstructure morphologies, which lead to superior properties over the conventional materials. To accelerate the development of new advanced materials system, the objective of this dissertation is to develop a computational design framework and the associated techniques for design automation of microstructure materials systems, with an emphasis on addressing the uncertainties associated with the heterogeneity of microstructural materials. Five key research tasks are identified: design representation, design evaluation, design synthesis, material informatics and uncertainty quantification. Design representation of microstructure includes statistical characterization and stochastic reconstruction. This dissertation develops a new descriptor-based methodology, which characterizes 2D microstructures using descriptors of composition, dispersion and geometry. Statistics of 3D descriptors are predicted based on 2D information to enable 2D-to-3D reconstruction. An efficient sequential reconstruction algorithm is developed to reconstruct statistically equivalent random 3D digital microstructures. In design evaluation, a stochastic decomposition and reassembly strategy is developed to deal with the high computational costs and uncertainties induced by material heterogeneity. The properties of Representative Volume Elements (RVE) are predicted by stochastically reassembling SVE elements with stochastic properties into a coarse representation of the RVE. In design synthesis, a new descriptor-based design framework is developed, which integrates computational methods of microstructure characterization and reconstruction, sensitivity analysis, Design of Experiments (DOE), metamodeling and optimization the enable parametric optimization of the microstructure for achieving the desired material properties. Material informatics is studied to efficiently reduce the dimension of microstructure design space. This dissertation develops a machine learning-based methodology to identify the key microstructure descriptors that highly impact properties of interest. In uncertainty quantification, a comparative study on data-driven random process models is conducted to provide guidance for choosing the most accurate model in statistical uncertainty quantification. Two new goodness-of-fit metrics are developed to provide quantitative measurements of random process models' accuracy. The benefits of the proposed methods are demonstrated by the example of designing the microstructure of polymer nanocomposites. This dissertation provides material-generic, intelligent modeling/design methodologies and techniques to accelerate the process of analyzing and designing new microstructural materials system.
Improved biliary detection and diagnosis through intelligent machine analysis.

PubMed

Logeswaran, Rajasvaran

2012-09-01

This paper reports on work undertaken to improve automated detection of bile ducts in magnetic resonance cholangiopancreatography (MRCP) images, with the objective of conducting preliminary classification of the images for diagnosis. The proposed I-BDeDIMA (Improved Biliary Detection and Diagnosis through Intelligent Machine Analysis) scheme is a multi-stage framework consisting of successive phases of image normalization, denoising, structure identification, object labeling, feature selection and disease classification. A combination of multiresolution wavelet, dynamic intensity thresholding, segment-based region growing, region elimination, statistical analysis and neural networks, is used in this framework to achieve good structure detection and preliminary diagnosis. Tests conducted on over 200 clinical images with known diagnosis have shown promising results of over 90% accuracy. The scheme outperforms related work in the literature, making it a viable framework for computer-aided diagnosis of biliary diseases. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
Maximum Likelihood Reconstruction for Magnetic Resonance Fingerprinting

PubMed Central

Zhao, Bo; Setsompop, Kawin; Ye, Huihui; Cauley, Stephen; Wald, Lawrence L.

2017-01-01

This paper introduces a statistical estimation framework for magnetic resonance (MR) fingerprinting, a recently proposed quantitative imaging paradigm. Within this framework, we present a maximum likelihood (ML) formalism to estimate multiple parameter maps directly from highly undersampled, noisy k-space data. A novel algorithm, based on variable splitting, the alternating direction method of multipliers, and the variable projection method, is developed to solve the resulting optimization problem. Representative results from both simulations and in vivo experiments demonstrate that the proposed approach yields significantly improved accuracy in parameter estimation, compared to the conventional MR fingerprinting reconstruction. Moreover, the proposed framework provides new theoretical insights into the conventional approach. We show analytically that the conventional approach is an approximation to the ML reconstruction; more precisely, it is exactly equivalent to the first iteration of the proposed algorithm for the ML reconstruction, provided that a gridding reconstruction is used as an initialization. PMID:26915119
Maximum Likelihood Reconstruction for Magnetic Resonance Fingerprinting.

PubMed

Zhao, Bo; Setsompop, Kawin; Ye, Huihui; Cauley, Stephen F; Wald, Lawrence L

2016-08-01

This paper introduces a statistical estimation framework for magnetic resonance (MR) fingerprinting, a recently proposed quantitative imaging paradigm. Within this framework, we present a maximum likelihood (ML) formalism to estimate multiple MR tissue parameter maps directly from highly undersampled, noisy k-space data. A novel algorithm, based on variable splitting, the alternating direction method of multipliers, and the variable projection method, is developed to solve the resulting optimization problem. Representative results from both simulations and in vivo experiments demonstrate that the proposed approach yields significantly improved accuracy in parameter estimation, compared to the conventional MR fingerprinting reconstruction. Moreover, the proposed framework provides new theoretical insights into the conventional approach. We show analytically that the conventional approach is an approximation to the ML reconstruction; more precisely, it is exactly equivalent to the first iteration of the proposed algorithm for the ML reconstruction, provided that a gridding reconstruction is used as an initialization.
Mapping the Energy Cascade in the North Atlantic Ocean: The Coarse-graining Approach

DOE PAGES

Aluie, Hussein; Hecht, Matthew; Vallis, Geoffrey K.

2017-11-14

A coarse-graining framework is implemented to analyze nonlinear processes, measure energy transfer rates and map out the energy pathways from simulated global ocean data. Traditional tools to measure the energy cascade from turbulence theory, such as spectral flux or spectral transfer rely on the assumption of statistical homogeneity, or at least a large separation between the scales of motion and the scales of statistical inhomogeneity. The coarse-graining framework allows for probing the fully nonlinear dynamics simultaneously in scale and in space, and is not restricted by those assumptions. This study describes how the framework can be applied to ocean flows.
Mapping the Energy Cascade in the North Atlantic Ocean: The Coarse-graining Approach

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aluie, Hussein; Hecht, Matthew; Vallis, Geoffrey K.

A coarse-graining framework is implemented to analyze nonlinear processes, measure energy transfer rates and map out the energy pathways from simulated global ocean data. Traditional tools to measure the energy cascade from turbulence theory, such as spectral flux or spectral transfer rely on the assumption of statistical homogeneity, or at least a large separation between the scales of motion and the scales of statistical inhomogeneity. The coarse-graining framework allows for probing the fully nonlinear dynamics simultaneously in scale and in space, and is not restricted by those assumptions. This study describes how the framework can be applied to ocean flows.
Software for Data Analysis with Graphical Models

NASA Technical Reports Server (NTRS)

Buntine, Wray L.; Roy, H. Scott

1994-01-01

Probabilistic graphical models are being used widely in artificial intelligence and statistics, for instance, in diagnosis and expert systems, as a framework for representing and reasoning with probabilities and independencies. They come with corresponding algorithms for performing statistical inference. This offers a unifying framework for prototyping and/or generating data analysis algorithms from graphical specifications. This paper illustrates the framework with an example and then presents some basic techniques for the task: problem decomposition and the calculation of exact Bayes factors. Other tools already developed, such as automatic differentiation, Gibbs sampling, and use of the EM algorithm, make this a broad basis for the generation of data analysis software.
Metrics and Mappings: A Framework for Understanding Real-World Quantitative Estimation.

ERIC Educational Resources Information Center

Brown, Norman R.; Siegler, Robert S.

1993-01-01

A metrics and mapping framework is proposed to account for how heuristics, domain-specific reasoning, and intuitive statistical induction processes are integrated to generate estimates. Results of 4 experiments involving 188 undergraduates illustrate framework usefulness and suggest when people use heuristics and when they emphasize…
Quantum mechanics as classical statistical mechanics with an ontic extension and an epistemic restriction.

PubMed

Budiyono, Agung; Rohrlich, Daniel

2017-11-03

Where does quantum mechanics part ways with classical mechanics? How does quantum randomness differ fundamentally from classical randomness? We cannot fully explain how the theories differ until we can derive them within a single axiomatic framework, allowing an unambiguous account of how one theory is the limit of the other. Here we derive non-relativistic quantum mechanics and classical statistical mechanics within a common framework. The common axioms include conservation of average energy and conservation of probability current. But two axioms distinguish quantum mechanics from classical statistical mechanics: an "ontic extension" defines a nonseparable (global) random variable that generates physical correlations, and an "epistemic restriction" constrains allowed phase space distributions. The ontic extension and epistemic restriction, with strength on the order of Planck's constant, imply quantum entanglement and uncertainty relations. This framework suggests that the wave function is epistemic, yet it does not provide an ontic dynamics for individual systems.
Certified Reduced Basis Model Characterization: a Frequentistic Uncertainty Framework

DTIC Science & Technology

2011-01-11

14) It then follows that the Legendre coefficient random vector, (Z [0], Z [1], . . . , Z [I])(ω), is (I+1)– variate normally distributed with mean (δ...I. Note each two-sided inequality represents two constraints. 3. PDE-Based Statistical Inference We now proceed to the parametrized partial...appearance of defects or geometric variations relative to an initial baseline, or perhaps manufacturing departures from nominal specifications; if our
Successful classification of cocaine dependence using brain imaging: a generalizable machine learning approach.

PubMed

Mete, Mutlu; Sakoglu, Unal; Spence, Jeffrey S; Devous, Michael D; Harris, Thomas S; Adinoff, Bryon

2016-10-06

Neuroimaging studies have yielded significant advances in the understanding of neural processes relevant to the development and persistence of addiction. However, these advances have not explored extensively for diagnostic accuracy in human subjects. The aim of this study was to develop a statistical approach, using a machine learning framework, to correctly classify brain images of cocaine-dependent participants and healthy controls. In this study, a framework suitable for educing potential brain regions that differed between the two groups was developed and implemented. Single Photon Emission Computerized Tomography (SPECT) images obtained during rest or a saline infusion in three cohorts of 2-4 week abstinent cocaine-dependent participants (n = 93) and healthy controls (n = 69) were used to develop a classification model. An information theoretic-based feature selection algorithm was first conducted to reduce the number of voxels. A density-based clustering algorithm was then used to form spatially connected voxel clouds in three-dimensional space. A statistical classifier, Support Vectors Machine (SVM), was then used for participant classification. Statistically insignificant voxels of spatially connected brain regions were removed iteratively and classification accuracy was reported through the iterations. The voxel-based analysis identified 1,500 spatially connected voxels in 30 distinct clusters after a grid search in SVM parameters. Participants were successfully classified with 0.88 and 0.89 F-measure accuracies in 10-fold cross validation (10xCV) and leave-one-out (LOO) approaches, respectively. Sensitivity and specificity were 0.90 and 0.89 for LOO; 0.83 and 0.83 for 10xCV. Many of the 30 selected clusters are highly relevant to the addictive process, including regions relevant to cognitive control, default mode network related self-referential thought, behavioral inhibition, and contextual memories. Relative hyperactivity and hypoactivity of regional cerebral blood flow in brain regions in cocaine-dependent participants are presented with corresponding level of significance. The SVM-based approach successfully classified cocaine-dependent and healthy control participants using voxels selected with information theoretic-based and statistical methods from participants' SPECT data. The regions found in this study align with brain regions reported in the literature. These findings support the future use of brain imaging and SVM-based classifier in the diagnosis of substance use disorders and furthering an understanding of their underlying pathology.
Synthetic data sets for the identification of key ingredients for RNA-seq differential analysis.

PubMed

Rigaill, Guillem; Balzergue, Sandrine; Brunaud, Véronique; Blondet, Eddy; Rau, Andrea; Rogier, Odile; Caius, José; Maugis-Rabusseau, Cathy; Soubigou-Taconnat, Ludivine; Aubourg, Sébastien; Lurin, Claire; Martin-Magniette, Marie-Laure; Delannoy, Etienne

2018-01-01

Numerous statistical pipelines are now available for the differential analysis of gene expression measured with RNA-sequencing technology. Most of them are based on similar statistical frameworks after normalization, differing primarily in the choice of data distribution, mean and variance estimation strategy and data filtering. We propose an evaluation of the impact of these choices when few biological replicates are available through the use of synthetic data sets. This framework is based on real data sets and allows the exploration of various scenarios differing in the proportion of non-differentially expressed genes. Hence, it provides an evaluation of the key ingredients of the differential analysis, free of the biases associated with the simulation of data using parametric models. Our results show the relevance of a proper modeling of the mean by using linear or generalized linear modeling. Once the mean is properly modeled, the impact of the other parameters on the performance of the test is much less important. Finally, we propose to use the simple visualization of the raw P-value histogram as a practical evaluation criterion of the performance of differential analysis methods on real data sets. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
A semi-supervised learning framework for biomedical event extraction based on hidden topics.

PubMed

Zhou, Deyu; Zhong, Dayou

2015-05-01

Scientists have devoted decades of efforts to understanding the interaction between proteins or RNA production. The information might empower the current knowledge on drug reactions or the development of certain diseases. Nevertheless, due to the lack of explicit structure, literature in life science, one of the most important sources of this information, prevents computer-based systems from accessing. Therefore, biomedical event extraction, automatically acquiring knowledge of molecular events in research articles, has attracted community-wide efforts recently. Most approaches are based on statistical models, requiring large-scale annotated corpora to precisely estimate models' parameters. However, it is usually difficult to obtain in practice. Therefore, employing un-annotated data based on semi-supervised learning for biomedical event extraction is a feasible solution and attracts more interests. In this paper, a semi-supervised learning framework based on hidden topics for biomedical event extraction is presented. In this framework, sentences in the un-annotated corpus are elaborately and automatically assigned with event annotations based on their distances to these sentences in the annotated corpus. More specifically, not only the structures of the sentences, but also the hidden topics embedded in the sentences are used for describing the distance. The sentences and newly assigned event annotations, together with the annotated corpus, are employed for training. Experiments were conducted on the multi-level event extraction corpus, a golden standard corpus. Experimental results show that more than 2.2% improvement on F-score on biomedical event extraction is achieved by the proposed framework when compared to the state-of-the-art approach. The results suggest that by incorporating un-annotated data, the proposed framework indeed improves the performance of the state-of-the-art event extraction system and the similarity between sentences might be precisely described by hidden topics and structures of the sentences. Copyright © 2015 Elsevier B.V. All rights reserved.

The role of ensemble-based statistics in variational assimilation of cloud-affected observations from infrared imagers

NASA Astrophysics Data System (ADS)

Hacker, Joshua; Vandenberghe, Francois; Jung, Byoung-Jo; Snyder, Chris

2017-04-01

Effective assimilation of cloud-affected radiance observations from space-borne imagers, with the aim of improving cloud analysis and forecasting, has proven to be difficult. Large observation biases, nonlinear observation operators, and non-Gaussian innovation statistics present many challenges. Ensemble-variational data assimilation (EnVar) systems offer the benefits of flow-dependent background error statistics from an ensemble, and the ability of variational minimization to handle nonlinearity. The specific benefits of ensemble statistics, relative to static background errors more commonly used in variational systems, have not been quantified for the problem of assimilating cloudy radiances. A simple experiment framework is constructed with a regional NWP model and operational variational data assimilation system, to provide the basis understanding the importance of ensemble statistics in cloudy radiance assimilation. Restricting the observations to those corresponding to clouds in the background forecast leads to innovations that are more Gaussian. The number of large innovations is reduced compared to the more general case of all observations, but not eliminated. The Huber norm is investigated to handle the fat tails of the distributions, and allow more observations to be assimilated without the need for strict background checks that eliminate them. Comparing assimilation using only ensemble background error statistics with assimilation using only static background error statistics elucidates the importance of the ensemble statistics. Although the cost functions in both experiments converge to similar values after sufficient outer-loop iterations, the resulting cloud water, ice, and snow content are greater in the ensemble-based analysis. The subsequent forecasts from the ensemble-based analysis also retain more condensed water species, indicating that the local environment is more supportive of clouds. In this presentation we provide details that explain the apparent benefit from using ensembles for cloudy radiance assimilation in an EnVar context.
An Oracle-based co-training framework for writer identification in offline handwriting

NASA Astrophysics Data System (ADS)

Porwal, Utkarsh; Rajan, Sreeranga; Govindaraju, Venu

2012-01-01

State-of-the-art techniques for writer identification have been centered primarily on enhancing the performance of the system for writer identification. Machine learning algorithms have been used extensively to improve the accuracy of such system assuming sufficient amount of data is available for training. Little attention has been paid to the prospect of harnessing the information tapped in a large amount of un-annotated data. This paper focuses on co-training based framework that can be used for iterative labeling of the unlabeled data set exploiting the independence between the multiple views (features) of the data. This paradigm relaxes the assumption of sufficiency of the data available and tries to generate labeled data from unlabeled data set along with improving the accuracy of the system. However, performance of co-training based framework is dependent on the effectiveness of the algorithm used for the selection of data points to be added in the labeled set. We propose an Oracle based approach for data selection that learns the patterns in the score distribution of classes for labeled data points and then predicts the labels (writers) of the unlabeled data point. This method for selection statistically learns the class distribution and predicts the most probable class unlike traditional selection algorithms which were based on heuristic approaches. We conducted experiments on publicly available IAM dataset and illustrate the efficacy of the proposed approach.
Extracting genetic alteration information for personalized cancer therapy from ClinicalTrials.gov

PubMed Central

Xu, Jun; Lee, Hee-Jin; Zeng, Jia; Wu, Yonghui; Zhang, Yaoyun; Huang, Liang-Chin; Johnson, Amber; Holla, Vijaykumar; Bailey, Ann M; Cohen, Trevor; Meric-Bernstam, Funda; Bernstam, Elmer V

2016-01-01

Objective: Clinical trials investigating drugs that target specific genetic alterations in tumors are important for promoting personalized cancer therapy. The goal of this project is to create a knowledge base of cancer treatment trials with annotations about genetic alterations from ClinicalTrials.gov. Methods: We developed a semi-automatic framework that combines advanced text-processing techniques with manual review to curate genetic alteration information in cancer trials. The framework consists of a document classification system to identify cancer treatment trials from ClinicalTrials.gov and an information extraction system to extract gene and alteration pairs from the Title and Eligibility Criteria sections of clinical trials. By applying the framework to trials at ClinicalTrials.gov, we created a knowledge base of cancer treatment trials with genetic alteration annotations. We then evaluated each component of the framework against manually reviewed sets of clinical trials and generated descriptive statistics of the knowledge base. Results and Discussion: The automated cancer treatment trial identification system achieved a high precision of 0.9944. Together with the manual review process, it identified 20 193 cancer treatment trials from ClinicalTrials.gov. The automated gene-alteration extraction system achieved a precision of 0.8300 and a recall of 0.6803. After validation by manual review, we generated a knowledge base of 2024 cancer trials that are labeled with specific genetic alteration information. Analysis of the knowledge base revealed the trend of increased use of targeted therapy for cancer, as well as top frequent gene-alteration pairs of interest. We expect this knowledge base to be a valuable resource for physicians and patients who are seeking information about personalized cancer therapy. PMID:27013523
Extracting genetic alteration information for personalized cancer therapy from ClinicalTrials.gov.

PubMed

Xu, Jun; Lee, Hee-Jin; Zeng, Jia; Wu, Yonghui; Zhang, Yaoyun; Huang, Liang-Chin; Johnson, Amber; Holla, Vijaykumar; Bailey, Ann M; Cohen, Trevor; Meric-Bernstam, Funda; Bernstam, Elmer V; Xu, Hua

2016-07-01

Clinical trials investigating drugs that target specific genetic alterations in tumors are important for promoting personalized cancer therapy. The goal of this project is to create a knowledge base of cancer treatment trials with annotations about genetic alterations from ClinicalTrials.gov. We developed a semi-automatic framework that combines advanced text-processing techniques with manual review to curate genetic alteration information in cancer trials. The framework consists of a document classification system to identify cancer treatment trials from ClinicalTrials.gov and an information extraction system to extract gene and alteration pairs from the Title and Eligibility Criteria sections of clinical trials. By applying the framework to trials at ClinicalTrials.gov, we created a knowledge base of cancer treatment trials with genetic alteration annotations. We then evaluated each component of the framework against manually reviewed sets of clinical trials and generated descriptive statistics of the knowledge base. The automated cancer treatment trial identification system achieved a high precision of 0.9944. Together with the manual review process, it identified 20 193 cancer treatment trials from ClinicalTrials.gov. The automated gene-alteration extraction system achieved a precision of 0.8300 and a recall of 0.6803. After validation by manual review, we generated a knowledge base of 2024 cancer trials that are labeled with specific genetic alteration information. Analysis of the knowledge base revealed the trend of increased use of targeted therapy for cancer, as well as top frequent gene-alteration pairs of interest. We expect this knowledge base to be a valuable resource for physicians and patients who are seeking information about personalized cancer therapy. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Data-driven fuel consumption estimation: A multivariate adaptive regression spline approach

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Yuche; Zhu, Lei; Gonder, Jeffrey

Providing guidance and information to drivers to help them make fuel-efficient route choices remains an important and effective strategy in the near term to reduce fuel consumption from the transportation sector. One key component in implementing this strategy is a fuel-consumption estimation model. In this paper, we developed a mesoscopic fuel consumption estimation model that can be implemented into an eco-routing system. Our proposed model presents a framework that utilizes large-scale, real-world driving data, clusters road links by free-flow speed and fits one statistical model for each of cluster. This model includes predicting variables that were rarely or never consideredmore » before, such as free-flow speed and number of lanes. We applied the model to a real-world driving data set based on a global positioning system travel survey in the Philadelphia-Camden-Trenton metropolitan area. Results from the statistical analyses indicate that the independent variables we chose influence the fuel consumption rates of vehicles. But the magnitude and direction of the influences are dependent on the type of road links, specifically free-flow speeds of links. Here, a statistical diagnostic is conducted to ensure the validity of the models and results. Although the real-world driving data we used to develop statistical relationships are specific to one region, the framework we developed can be easily adjusted and used to explore the fuel consumption relationship in other regions.« less
Data-driven fuel consumption estimation: A multivariate adaptive regression spline approach

DOE PAGES

Chen, Yuche; Zhu, Lei; Gonder, Jeffrey; ...

2017-08-12

Providing guidance and information to drivers to help them make fuel-efficient route choices remains an important and effective strategy in the near term to reduce fuel consumption from the transportation sector. One key component in implementing this strategy is a fuel-consumption estimation model. In this paper, we developed a mesoscopic fuel consumption estimation model that can be implemented into an eco-routing system. Our proposed model presents a framework that utilizes large-scale, real-world driving data, clusters road links by free-flow speed and fits one statistical model for each of cluster. This model includes predicting variables that were rarely or never consideredmore » before, such as free-flow speed and number of lanes. We applied the model to a real-world driving data set based on a global positioning system travel survey in the Philadelphia-Camden-Trenton metropolitan area. Results from the statistical analyses indicate that the independent variables we chose influence the fuel consumption rates of vehicles. But the magnitude and direction of the influences are dependent on the type of road links, specifically free-flow speeds of links. Here, a statistical diagnostic is conducted to ensure the validity of the models and results. Although the real-world driving data we used to develop statistical relationships are specific to one region, the framework we developed can be easily adjusted and used to explore the fuel consumption relationship in other regions.« less
Open Source Tools for Seismicity Analysis

NASA Astrophysics Data System (ADS)

Powers, P.

2010-12-01

The spatio-temporal analysis of seismicity plays an important role in earthquake forecasting and is integral to research on earthquake interactions and triggering. For instance, the third version of the Uniform California Earthquake Rupture Forecast (UCERF), currently under development, will use Epidemic Type Aftershock Sequences (ETAS) as a model for earthquake triggering. UCERF will be a "living" model and therefore requires robust, tested, and well-documented ETAS algorithms to ensure transparency and reproducibility. Likewise, as earthquake aftershock sequences unfold, real-time access to high quality hypocenter data makes it possible to monitor the temporal variability of statistical properties such as the parameters of the Omori Law and the Gutenberg Richter b-value. Such statistical properties are valuable as they provide a measure of how much a particular sequence deviates from expected behavior and can be used when assigning probabilities of aftershock occurrence. To address these demands and provide public access to standard methods employed in statistical seismology, we present well-documented, open-source JavaScript and Java software libraries for the on- and off-line analysis of seismicity. The Javascript classes facilitate web-based asynchronous access to earthquake catalog data and provide a framework for in-browser display, analysis, and manipulation of catalog statistics; implementations of this framework will be made available on the USGS Earthquake Hazards website. The Java classes, in addition to providing tools for seismicity analysis, provide tools for modeling seismicity and generating synthetic catalogs. These tools are extensible and will be released as part of the open-source OpenSHA Commons library.
Statistical testing and power analysis for brain-wide association study.

PubMed

Gong, Weikang; Wan, Lin; Lu, Wenlian; Ma, Liang; Cheng, Fan; Cheng, Wei; Grünewald, Stefan; Feng, Jianfeng

2018-04-05

The identification of connexel-wise associations, which involves examining functional connectivities between pairwise voxels across the whole brain, is both statistically and computationally challenging. Although such a connexel-wise methodology has recently been adopted by brain-wide association studies (BWAS) to identify connectivity changes in several mental disorders, such as schizophrenia, autism and depression, the multiple correction and power analysis methods designed specifically for connexel-wise analysis are still lacking. Therefore, we herein report the development of a rigorous statistical framework for connexel-wise significance testing based on the Gaussian random field theory. It includes controlling the family-wise error rate (FWER) of multiple hypothesis testings using topological inference methods, and calculating power and sample size for a connexel-wise study. Our theoretical framework can control the false-positive rate accurately, as validated empirically using two resting-state fMRI datasets. Compared with Bonferroni correction and false discovery rate (FDR), it can reduce false-positive rate and increase statistical power by appropriately utilizing the spatial information of fMRI data. Importantly, our method bypasses the need of non-parametric permutation to correct for multiple comparison, thus, it can efficiently tackle large datasets with high resolution fMRI images. The utility of our method is shown in a case-control study. Our approach can identify altered functional connectivities in a major depression disorder dataset, whereas existing methods fail. A software package is available at https://github.com/weikanggong/BWAS. Copyright © 2018 Elsevier B.V. All rights reserved.
Scenario based optimization of a container vessel with respect to its projected operating conditions

NASA Astrophysics Data System (ADS)

Wagner, Jonas; Binkowski, Eva; Bronsart, Robert

2014-06-01

In this paper the scenario based optimization of the bulbous bow of the KRISO Container Ship (KCS) is presented. The optimization of the parametrically modeled vessel is based on a statistically developed operational profile generated from noon-to-noon reports of a comparable 3600 TEU container vessel and specific development functions representing the growth of global economy during the vessels service time. In order to consider uncertainties, statistical fluctuations are added. An analysis of these data lead to a number of most probable upcoming operating conditions (OC) the vessel will stay in the future. According to their respective likeliness an objective function for the evaluation of the optimal design variant of the vessel is derived and implemented within the parametrical optimization workbench FRIENDSHIP Framework. In the following this evaluation is done with respect to vessel's calculated effective power based on the usage of potential flow code. The evaluation shows, that the usage of scenarios within the optimization process has a strong influence on the hull form.
The Role of Discrete Global Grid Systems in the Global Statistical Geospatial Framework

NASA Astrophysics Data System (ADS)

Purss, M. B. J.; Peterson, P.; Minchin, S. A.; Bermudez, L. E.

2016-12-01

The United Nations Committee of Experts on Global Geospatial Information Management (UN-GGIM) has proposed the development of a Global Statistical Geospatial Framework (GSGF) as a mechanism for the establishment of common analytical systems that enable the integration of statistical and geospatial information. Conventional coordinate reference systems address the globe with a continuous field of points suitable for repeatable navigation and analytical geometry. While this continuous field is represented on a computer in a digitized and discrete fashion by tuples of fixed-precision floating point values, it is a non-trivial exercise to relate point observations spatially referenced in this way to areal coverages on the surface of the Earth. The GSGF states the need to move to gridded data delivery and the importance of using common geographies and geocoding. The challenges associated with meeting these goals are not new and there has been a significant effort within the geospatial community to develop nested gridding standards to tackle these issues over many years. These efforts have recently culminated in the development of a Discrete Global Grid Systems (DGGS) standard which has been developed under the auspices of Open Geospatial Consortium (OGC). DGGS provide a fixed areal based geospatial reference frame for the persistent location of measured Earth observations, feature interpretations, and modelled predictions. DGGS address the entire planet by partitioning it into a discrete hierarchical tessellation of progressively finer resolution cells, which are referenced by a unique index that facilitates rapid computation, query and analysis. The geometry and location of the cell is the principle aspect of a DGGS. Data integration, decomposition, and aggregation is optimised in the DGGS hierarchical structure and can be exploited for efficient multi-source data processing, storage, discovery, transmission, visualization, computation, analysis, and modelling. During the 6th Session of the UN-GGIM in August 2016 the role of DGGS in the context of the GSGF was formally acknowledged. This paper proposes to highlight the synergies and role of DGGS in the Global Statistical Geospatial Framework and to show examples of the use of DGGS to combine geospatial statistics with traditional geoscientific data.
A Unifying Framework for Teaching Nonparametric Statistical Tests

ERIC Educational Resources Information Center

Bargagliotti, Anna E.; Orrison, Michael E.

2014-01-01

Increased importance is being placed on statistics at both the K-12 and undergraduate level. Research divulging effective methods to teach specific statistical concepts is still widely sought after. In this paper, we focus on best practices for teaching topics in nonparametric statistics at the undergraduate level. To motivate the work, we…
Survey of Native English Speakers and Spanish-Speaking English Language Learners in Tertiary Introductory Statistics

ERIC Educational Resources Information Center

Lesser, Lawrence M.; Wagler, Amy E.; Esquinca, Alberto; Valenzuela, M. Guadalupe

2013-01-01

The framework of linguistic register and case study research on Spanish-speaking English language learners (ELLs) learning statistics informed the construction of a quantitative instrument, the Communication, Language, And Statistics Survey (CLASS). CLASS aims to assess whether ELLs and non-ELLs approach the learning of statistics differently with…
Using GAISE and NCTM Standards as Frameworks for Teaching Probability and Statistics to Pre-Service Elementary and Middle School Mathematics Teachers

ERIC Educational Resources Information Center

Metz, Mary Louise

2010-01-01

Statistics education has become an increasingly important component of the mathematics education of today's citizens. In part to address the call for a more statistically literate citizenship, The "Guidelines for Assessment and Instruction in Statistics Education (GAISE)" were developed in 2005 by the American Statistical Association. These…
Investigation of Super Learner Methodology on HIV-1 Small Sample: Application on Jaguar Trial Data.

PubMed

Houssaïni, Allal; Assoumou, Lambert; Marcelin, Anne Geneviève; Molina, Jean Michel; Calvez, Vincent; Flandre, Philippe

2012-01-01

Background. Many statistical models have been tested to predict phenotypic or virological response from genotypic data. A statistical framework called Super Learner has been introduced either to compare different methods/learners (discrete Super Learner) or to combine them in a Super Learner prediction method. Methods. The Jaguar trial is used to apply the Super Learner framework. The Jaguar study is an "add-on" trial comparing the efficacy of adding didanosine to an on-going failing regimen. Our aim was also to investigate the impact on the use of different cross-validation strategies and different loss functions. Four different repartitions between training set and validations set were tested through two loss functions. Six statistical methods were compared. We assess performance by evaluating R(2) values and accuracy by calculating the rates of patients being correctly classified. Results. Our results indicated that the more recent Super Learner methodology of building a new predictor based on a weighted combination of different methods/learners provided good performance. A simple linear model provided similar results to those of this new predictor. Slight discrepancy arises between the two loss functions investigated, and slight difference arises also between results based on cross-validated risks and results from full dataset. The Super Learner methodology and linear model provided around 80% of patients correctly classified. The difference between the lower and higher rates is around 10 percent. The number of mutations retained in different learners also varys from one to 41. Conclusions. The more recent Super Learner methodology combining the prediction of many learners provided good performance on our small dataset.
Intrinsic Bayesian Active Contours for Extraction of Object Boundaries in Images

PubMed Central

Srivastava, Anuj

2010-01-01

We present a framework for incorporating prior information about high-probability shapes in the process of contour extraction and object recognition in images. Here one studies shapes as elements of an infinite-dimensional, non-linear quotient space, and statistics of shapes are defined and computed intrinsically using differential geometry of this shape space. Prior models on shapes are constructed using probability distributions on tangent bundles of shape spaces. Similar to the past work on active contours, where curves are driven by vector fields based on image gradients and roughness penalties, we incorporate the prior shape knowledge in the form of vector fields on curves. Through experimental results, we demonstrate the use of prior shape models in the estimation of object boundaries, and their success in handling partial obscuration and missing data. Furthermore, we describe the use of this framework in shape-based object recognition or classification. PMID:21076692
The importance of stress percolation patterns in rocks and other polycrystalline materials.

PubMed

Burnley, P C

2013-01-01

A new framework for thinking about the deformation behavior of rocks and other heterogeneous polycrystalline materials is proposed, based on understanding the patterns of stress transmission through these materials. Here, using finite element models, I show that stress percolates through polycrystalline materials that have heterogeneous elastic and plastic properties of the same order as those found in rocks. The pattern of stress percolation is related to the degree of heterogeneity in and statistical distribution of the elastic and plastic properties of the constituent grains in the aggregate. The development of these stress patterns leads directly to shear localization, and their existence provides insight into the formation of rhythmic features such as compositional banding and foliation in rocks that are reacting or dissolving while being deformed. In addition, this framework provides a foundation for understanding and predicting the macroscopic rheology of polycrystalline materials based on single-crystal elastic and plastic mechanical properties.
The importance of stress percolation patterns in rocks and other polycrystalline materials

PubMed Central

Burnley, P.C.

2013-01-01

A new framework for thinking about the deformation behavior of rocks and other heterogeneous polycrystalline materials is proposed, based on understanding the patterns of stress transmission through these materials. Here, using finite element models, I show that stress percolates through polycrystalline materials that have heterogeneous elastic and plastic properties of the same order as those found in rocks. The pattern of stress percolation is related to the degree of heterogeneity in and statistical distribution of the elastic and plastic properties of the constituent grains in the aggregate. The development of these stress patterns leads directly to shear localization, and their existence provides insight into the formation of rhythmic features such as compositional banding and foliation in rocks that are reacting or dissolving while being deformed. In addition, this framework provides a foundation for understanding and predicting the macroscopic rheology of polycrystalline materials based on single-crystal elastic and plastic mechanical properties. PMID:23823992
Atlas-based liver segmentation and hepatic fat-fraction assessment for clinical trials.

PubMed

Yan, Zhennan; Zhang, Shaoting; Tan, Chaowei; Qin, Hongxing; Belaroussi, Boubakeur; Yu, Hui Jing; Miller, Colin; Metaxas, Dimitris N

2015-04-01

Automated assessment of hepatic fat-fraction is clinically important. A robust and precise segmentation would enable accurate, objective and consistent measurement of hepatic fat-fraction for disease quantification, therapy monitoring and drug development. However, segmenting the liver in clinical trials is a challenging task due to the variability of liver anatomy as well as the diverse sources the images were acquired from. In this paper, we propose an automated and robust framework for liver segmentation and assessment. It uses single statistical atlas registration to initialize a robust deformable model to obtain fine segmentation. Fat-fraction map is computed by using chemical shift based method in the delineated region of liver. This proposed method is validated on 14 abdominal magnetic resonance (MR) volumetric scans. The qualitative and quantitative comparisons show that our proposed method can achieve better segmentation accuracy with less variance comparing with two other atlas-based methods. Experimental results demonstrate the promises of our assessment framework. Copyright © 2014 Elsevier Ltd. All rights reserved.
The difference between a dynamic and mechanical approach to stroke treatment.

PubMed

Helgason, Cathy M

2007-06-01

The current classification of stroke is based on causation, also called pathogenesis, and relies on binary logic faithful to the Aristotelian tradition. Accordingly, a pathology is or is not the cause of the stroke, is considered independent of others, and is the target for treatment. It is the subject for large double-blind randomized clinical therapeutic trials. The scientific view behind clinical trials is the fundamental concept that information is statistical, and causation is determined by probabilities. Therefore, the cause and effect relation will be determined by probability-theory-based statistics. This is the basis of evidence-based medicine, which calls for the results of such trials to be the basis for physician decisions regarding diagnosis and treatment. However, there are problems with the methodology behind evidence-based medicine. Calculations using probability-theory-based statistics regarding cause and effect are performed within an automatic system where there are known inputs and outputs. This method of research provides a framework of certainty with no surprise elements or outcomes. However, it is not a system or method that will come up with previously unknown variables, concepts, or universal principles; it is not a method that will give a new outcome; and it is not a method that allows for creativity, expertise, or new insight for problem solving.
Experimental Test of Heisenberg's Measurement Uncertainty Relation Based on Statistical Distances

NASA Astrophysics Data System (ADS)

Ma, Wenchao; Ma, Zhihao; Wang, Hengyan; Chen, Zhihua; Liu, Ying; Kong, Fei; Li, Zhaokai; Peng, Xinhua; Shi, Mingjun; Shi, Fazhan; Fei, Shao-Ming; Du, Jiangfeng

2016-04-01

Incompatible observables can be approximated by compatible observables in joint measurement or measured sequentially, with constrained accuracy as implied by Heisenberg's original formulation of the uncertainty principle. Recently, Busch, Lahti, and Werner proposed inaccuracy trade-off relations based on statistical distances between probability distributions of measurement outcomes [P. Busch et al., Phys. Rev. Lett. 111, 160405 (2013); P. Busch et al., Phys. Rev. A 89, 012129 (2014)]. Here we reformulate their theoretical framework, derive an improved relation for qubit measurement, and perform an experimental test on a spin system. The relation reveals that the worst-case inaccuracy is tightly bounded from below by the incompatibility of target observables, and is verified by the experiment employing joint measurement in which two compatible observables designed to approximate two incompatible observables on one qubit are measured simultaneously.

Experimental Test of Heisenberg's Measurement Uncertainty Relation Based on Statistical Distances.

PubMed

Ma, Wenchao; Ma, Zhihao; Wang, Hengyan; Chen, Zhihua; Liu, Ying; Kong, Fei; Li, Zhaokai; Peng, Xinhua; Shi, Mingjun; Shi, Fazhan; Fei, Shao-Ming; Du, Jiangfeng

2016-04-22

Incompatible observables can be approximated by compatible observables in joint measurement or measured sequentially, with constrained accuracy as implied by Heisenberg's original formulation of the uncertainty principle. Recently, Busch, Lahti, and Werner proposed inaccuracy trade-off relations based on statistical distances between probability distributions of measurement outcomes [P. Busch et al., Phys. Rev. Lett. 111, 160405 (2013); P. Busch et al., Phys. Rev. A 89, 012129 (2014)]. Here we reformulate their theoretical framework, derive an improved relation for qubit measurement, and perform an experimental test on a spin system. The relation reveals that the worst-case inaccuracy is tightly bounded from below by the incompatibility of target observables, and is verified by the experiment employing joint measurement in which two compatible observables designed to approximate two incompatible observables on one qubit are measured simultaneously.
Quantifying risks with exact analytical solutions of derivative pricing distribution

NASA Astrophysics Data System (ADS)

Zhang, Kun; Liu, Jing; Wang, Erkang; Wang, Jin

2017-04-01

Derivative (i.e. option) pricing is essential for modern financial instrumentations. Despite of the previous efforts, the exact analytical forms of the derivative pricing distributions are still challenging to obtain. In this study, we established a quantitative framework using path integrals to obtain the exact analytical solutions of the statistical distribution for bond and bond option pricing for the Vasicek model. We discuss the importance of statistical fluctuations away from the expected option pricing characterized by the distribution tail and their associations to value at risk (VaR). The framework established here is general and can be applied to other financial derivatives for quantifying the underlying statistical distributions.
Post-fire debris flow prediction in Western United States: Advancements based on a nonparametric statistical technique

NASA Astrophysics Data System (ADS)

Nikolopoulos, E. I.; Destro, E.; Bhuiyan, M. A. E.; Borga, M., Sr.; Anagnostou, E. N.

2017-12-01

Fire disasters affect modern societies at global scale inducing significant economic losses and human casualties. In addition to their direct impacts they have various adverse effects on hydrologic and geomorphologic processes of a region due to the tremendous alteration of the landscape characteristics (vegetation, soil properties etc). As a consequence, wildfires often initiate a cascade of hazards such as flash floods and debris flows that usually follow the occurrence of a wildfire thus magnifying the overall impact in a region. Post-fire debris flows (PFDF) is one such type of hazards frequently occurring in Western United States where wildfires are a common natural disaster. Prediction of PDFD is therefore of high importance in this region and over the last years a number of efforts from United States Geological Survey (USGS) and National Weather Service (NWS) have been focused on the development of early warning systems that will help mitigate PFDF risk. This work proposes a prediction framework that is based on a nonparametric statistical technique (random forests) that allows predicting the occurrence of PFDF at regional scale with a higher degree of accuracy than the commonly used approaches that are based on power-law thresholds and logistic regression procedures. The work presented is based on a recently released database from USGS that reports a total of 1500 storms that triggered and did not trigger PFDF in a number of fire affected catchments in Western United States. The database includes information on storm characteristics (duration, accumulation, max intensity etc) and other auxiliary information of land surface properties (soil erodibility index, local slope etc). Results show that the proposed model is able to achieve a satisfactory prediction accuracy (threat score > 0.6) superior of previously published prediction frameworks highlighting the potential of nonparametric statistical techniques for development of PFDF prediction systems.
Using high-resolution variant frequencies to empower clinical genome interpretation.

PubMed

Whiffin, Nicola; Minikel, Eric; Walsh, Roddy; O'Donnell-Luria, Anne H; Karczewski, Konrad; Ing, Alexander Y; Barton, Paul J R; Funke, Birgit; Cook, Stuart A; MacArthur, Daniel; Ware, James S

2017-10-01

PurposeWhole-exome and whole-genome sequencing have transformed the discovery of genetic variants that cause human Mendelian disease, but discriminating pathogenic from benign variants remains a daunting challenge. Rarity is recognized as a necessary, although not sufficient, criterion for pathogenicity, but frequency cutoffs used in Mendelian analysis are often arbitrary and overly lenient. Recent very large reference datasets, such as the Exome Aggregation Consortium (ExAC), provide an unprecedented opportunity to obtain robust frequency estimates even for very rare variants.MethodsWe present a statistical framework for the frequency-based filtering of candidate disease-causing variants, accounting for disease prevalence, genetic and allelic heterogeneity, inheritance mode, penetrance, and sampling variance in reference datasets.ResultsUsing the example of cardiomyopathy, we show that our approach reduces by two-thirds the number of candidate variants under consideration in the average exome, without removing true pathogenic variants (false-positive rate<0.001).ConclusionWe outline a statistically robust framework for assessing whether a variant is "too common" to be causative for a Mendelian disorder of interest. We present precomputed allele frequency cutoffs for all variants in the ExAC dataset.
Multidimensional Analysis of Linguistic Networks

NASA Astrophysics Data System (ADS)

Araújo, Tanya; Banisch, Sven

Network-based approaches play an increasingly important role in the analysis of data even in systems in which a network representation is not immediately apparent. This is particularly true for linguistic networks, which use to be induced from a linguistic data set for which a network perspective is only one out of several options for representation. Here we introduce a multidimensional framework for network construction and analysis with special focus on linguistic networks. Such a framework is used to show that the higher is the abstraction level of network induction, the harder is the interpretation of the topological indicators used in network analysis. Several examples are provided allowing for the comparison of different linguistic networks as well as to networks in other fields of application of network theory. The computation and the intelligibility of some statistical indicators frequently used in linguistic networks are discussed. It suggests that the field of linguistic networks, by applying statistical tools inspired by network studies in other domains, may, in its current state, have only a limited contribution to the development of linguistic theory.
Bayesian analysis of the flutter margin method in aeroelasticity

DOE PAGES

Khalil, Mohammad; Poirel, Dominique; Sarkar, Abhijit

2016-08-27

A Bayesian statistical framework is presented for Zimmerman and Weissenburger flutter margin method which considers the uncertainties in aeroelastic modal parameters. The proposed methodology overcomes the limitations of the previously developed least-square based estimation technique which relies on the Gaussian approximation of the flutter margin probability density function (pdf). Using the measured free-decay responses at subcritical (preflutter) airspeeds, the joint non-Gaussain posterior pdf of the modal parameters is sampled using the Metropolis–Hastings (MH) Markov chain Monte Carlo (MCMC) algorithm. The posterior MCMC samples of the modal parameters are then used to obtain the flutter margin pdfs and finally the fluttermore » speed pdf. The usefulness of the Bayesian flutter margin method is demonstrated using synthetic data generated from a two-degree-of-freedom pitch-plunge aeroelastic model. The robustness of the statistical framework is demonstrated using different sets of measurement data. In conclusion, it will be shown that the probabilistic (Bayesian) approach reduces the number of test points required in providing a flutter speed estimate for a given accuracy and precision.« less
Prediction of the low-velocity distribution from the pore structure in simple porous media

NASA Astrophysics Data System (ADS)

de Anna, Pietro; Quaife, Bryan; Biros, George; Juanes, Ruben

2017-12-01

The macroscopic properties of fluid flow and transport through porous media are a direct consequence of the underlying pore structure. However, precise relations that characterize flow and transport from the statistics of pore-scale disorder have remained elusive. Here we investigate the relationship between pore structure and the resulting fluid flow and asymptotic transport behavior in two-dimensional geometries of nonoverlapping circular posts. We derive an analytical relationship between the pore throat size distribution fλ˜λ-β and the distribution of the low fluid velocities fu˜u-β /2 , based on a conceptual model of porelets (the flow established within each pore throat, here a Hagen-Poiseuille flow). Our model allows us to make predictions, within a continuous-time random-walk framework, for the asymptotic statistics of the spreading of fluid particles along their own trajectories. These predictions are confirmed by high-fidelity simulations of Stokes flow and advective transport. The proposed framework can be extended to other configurations which can be represented as a collection of known flow distributions.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Smith, Kandler; Shi, Ying; Santhanagopalan, Shriram

Predictive models of Li-ion battery lifetime must consider a multiplicity of electrochemical, thermal, and mechanical degradation modes experienced by batteries in application environments. To complicate matters, Li-ion batteries can experience different degradation trajectories that depend on storage and cycling history of the application environment. Rates of degradation are controlled by factors such as temperature history, electrochemical operating window, and charge/discharge rate. We present a generalized battery life prognostic model framework for battery systems design and control. The model framework consists of trial functions that are statistically regressed to Li-ion cell life datasets wherein the cells have been aged under differentmore » levels of stress. Degradation mechanisms and rate laws dependent on temperature, storage, and cycling condition are regressed to the data, with multiple model hypotheses evaluated and the best model down-selected based on statistics. The resulting life prognostic model, implemented in state variable form, is extensible to arbitrary real-world scenarios. The model is applicable in real-time control algorithms to maximize battery life and performance. We discuss efforts to reduce lifetime prediction error and accommodate its inevitable impact in controller design.« less
A Framework for Authenticity in the Mathematics and Statistics Classroom

ERIC Educational Resources Information Center

Garrett, Lauretta; Huang, Li; Charleton, Maria Calhoun

2016-01-01

Authenticity is a term commonly used in reference to pedagogical and curricular qualities of mathematics teaching and learning, but its use lacks a coherent framework. The work of researchers in engineering education provides such a framework. Authentic qualities of mathematics teaching and learning are fit within a model described by Strobel,…
Prostatome: A combined anatomical and disease based MRI atlas of the prostate

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rusu, Mirabela; Madabhushi, Anant, E-mail: anant.madabhushi@case.edu; Bloch, B. Nicolas

Purpose: In this work, the authors introduce a novel framework, the anatomically constrained registration (AnCoR) scheme and apply it to create a fused anatomic-disease atlas of the prostate which the authors refer to as the prostatome. The prostatome combines a MRI based anatomic and a histology based disease atlas. Statistical imaging atlases allow for the integration of information across multiple scales and imaging modalities into a single canonical representation, in turn enabling a fused anatomical-disease representation which may facilitate the characterization of disease appearance relative to anatomic structures. While statistical atlases have been extensively developed and studied for the brain,more » approaches that have attempted to combine pathology and imaging data for study of prostate pathology are not extant. This works seeks to address this gap. Methods: The AnCoR framework optimizes a scoring function composed of two surface (prostate and central gland) misalignment measures and one intensity-based similarity term. This ensures the correct mapping of anatomic regions into the atlas, even when regional MRI intensities are inconsistent or highly variable between subjects. The framework allows for creation of an anatomic imaging and a disease atlas, while enabling their fusion into the anatomic imaging-disease atlas. The atlas presented here was constructed using 83 subjects with biopsy confirmed cancer who had pre-operative MRI (collected at two institutions) followed by radical prostatectomy. The imaging atlas results from mapping thein vivo MRI into the canonical space, while the anatomic regions serve as domain constraints. Elastic co-registration MRI and corresponding ex vivo histology provides “ground truth” mapping of cancer extent on in vivo imaging for 23 subjects. Results: AnCoR was evaluated relative to alternative construction strategies that use either MRI intensities or the prostate surface alone for registration. The AnCoR framework yielded a central gland Dice similarity coefficient (DSC) of 90%, and prostate DSC of 88%, while the misalignment of the urethra and verumontanum was found to be 3.45 mm, and 4.73 mm, respectively, which were measured to be significantly smaller compared to the alternative strategies. As might have been anticipated from our limited cohort of biopsy confirmed cancers, the disease atlas showed that most of the tumor extent was limited to the peripheral zone. Moreover, central gland tumors were typically larger in size, possibly because they are only discernible at a much later stage. Conclusions: The authors presented the AnCoR framework to explicitly model anatomic constraints for the construction of a fused anatomic imaging-disease atlas. The framework was applied to constructing a preliminary version of an anatomic-disease atlas of the prostate, the prostatome. The prostatome could facilitate the quantitative characterization of gland morphology and imaging features of prostate cancer. These techniques, may be applied on a large sample size data set to create a fully developed prostatome that could serve as a spatial prior for targeted biopsies by urologists. Additionally, the AnCoR framework could allow for incorporation of complementary imaging and molecular data, thereby enabling their careful correlation for population based radio-omics studies.« less
Quantifying Variation in Gait Features from Wearable Inertial Sensors Using Mixed Effects Models

PubMed Central

Cresswell, Kellen Garrison; Shin, Yongyun; Chen, Shanshan

2017-01-01

The emerging technology of wearable inertial sensors has shown its advantages in collecting continuous longitudinal gait data outside laboratories. This freedom also presents challenges in collecting high-fidelity gait data. In the free-living environment, without constant supervision from researchers, sensor-based gait features are susceptible to variation from confounding factors such as gait speed and mounting uncertainty, which are challenging to control or estimate. This paper is one of the first attempts in the field to tackle such challenges using statistical modeling. By accepting the uncertainties and variation associated with wearable sensor-based gait data, we shift our efforts from detecting and correcting those variations to modeling them statistically. From gait data collected on one healthy, non-elderly subject during 48 full-factorial trials, we identified four major sources of variation, and quantified their impact on one gait outcome—range per cycle—using a random effects model and a fixed effects model. The methodology developed in this paper lays the groundwork for a statistical framework to account for sources of variation in wearable gait data, thus facilitating informative statistical inference for free-living gait analysis. PMID:28245602
Observers Exploit Stochastic Models of Sensory Change to Help Judge the Passage of Time

PubMed Central

Ahrens, Misha B.; Sahani, Maneesh

2011-01-01

Summary Sensory stimulation can systematically bias the perceived passage of time [1–5], but why and how this happens is mysterious. In this report, we provide evidence that such biases may ultimately derive from an innate and adaptive use of stochastically evolving dynamic stimuli to help refine estimates derived from internal timekeeping mechanisms [6–15]. A simplified statistical model based on probabilistic expectations of stimulus change derived from the second-order temporal statistics of the natural environment [16, 17] makes three predictions. First, random noise-like stimuli whose statistics violate natural expectations should induce timing bias. Second, a previously unexplored obverse of this effect is that similar noise stimuli with natural statistics should reduce the variability of timing estimates. Finally, this reduction in variability should scale with the interval being timed, so as to preserve the overall Weber law of interval timing. All three predictions are borne out experimentally. Thus, in the context of our novel theoretical framework, these results suggest that observers routinely rely on sensory input to augment their sense of the passage of time, through a process of Bayesian inference based on expectations of change in the natural environment. PMID:21256018
Neurophysiological Markers of Statistical Learning in Music and Language: Hierarchy, Entropy, and Uncertainty.

PubMed

Daikoku, Tatsuya

2018-06-19

Statistical learning (SL) is a method of learning based on the transitional probabilities embedded in sequential phenomena such as music and language. It has been considered an implicit and domain-general mechanism that is innate in the human brain and that functions independently of intention to learn and awareness of what has been learned. SL is an interdisciplinary notion that incorporates information technology, artificial intelligence, musicology, and linguistics, as well as psychology and neuroscience. A body of recent study has suggested that SL can be reflected in neurophysiological responses based on the framework of information theory. This paper reviews a range of work on SL in adults and children that suggests overlapping and independent neural correlations in music and language, and that indicates disability of SL. Furthermore, this article discusses the relationships between the order of transitional probabilities (TPs) (i.e., hierarchy of local statistics) and entropy (i.e., global statistics) regarding SL strategies in human's brains; claims importance of information-theoretical approaches to understand domain-general, higher-order, and global SL covering both real-world music and language; and proposes promising approaches for the application of therapy and pedagogy from various perspectives of psychology, neuroscience, computational studies, musicology, and linguistics.
Structural graph-based morphometry: A multiscale searchlight framework based on sulcal pits.

PubMed

Takerkart, Sylvain; Auzias, Guillaume; Brun, Lucile; Coulon, Olivier

2017-01-01

Studying the topography of the cortex has proved valuable in order to characterize populations of subjects. In particular, the recent interest towards the deepest parts of the cortical sulci - the so-called sulcal pits - has opened new avenues in that regard. In this paper, we introduce the first fully automatic brain morphometry method based on the study of the spatial organization of sulcal pits - Structural Graph-Based Morphometry (SGBM). Our framework uses attributed graphs to model local patterns of sulcal pits, and further relies on three original contributions. First, a graph kernel is defined to provide a new similarity measure between pit-graphs, with few parameters that can be efficiently estimated from the data. Secondly, we present the first searchlight scheme dedicated to brain morphometry, yielding dense information maps covering the full cortical surface. Finally, a multi-scale inference strategy is designed to jointly analyze the searchlight information maps obtained at different spatial scales. We demonstrate the effectiveness of our framework by studying gender differences and cortical asymmetries: we show that SGBM can both localize informative regions and estimate their spatial scales, while providing results which are consistent with the literature. Thanks to the modular design of our kernel and the vast array of available kernel methods, SGBM can easily be extended to include a more detailed description of the sulcal patterns and solve different statistical problems. Therefore, we suggest that our SGBM framework should be useful for both reaching a better understanding of the normal brain and defining imaging biomarkers in clinical settings. Copyright © 2016 Elsevier B.V. All rights reserved.
Application of an integrated Weather Research and Forecasting (WRF)/CALPUFF modeling tool for source apportionment of atmospheric pollutants for air quality management: A case study in the urban area of Benxi, China.

PubMed

Wu, Hao; Zhang, Yan; Yu, Qi; Ma, Weichun

2018-04-01

In this study, the authors endeavored to develop an effective framework for improving local urban air quality on meso-micro scales in cities in China that are experiencing rapid urbanization. Within this framework, the integrated Weather Research and Forecasting (WRF)/CALPUFF modeling system was applied to simulate the concentration distributions of typical pollutants (particulate matter with an aerodynamic diameter <10 μm [PM 10 ], sulfur dioxide [SO 2 ], and nitrogen oxides [NO x ]) in the urban area of Benxi. Statistical analyses were performed to verify the credibility of this simulation, including the meteorological fields and concentration fields. The sources were then categorized using two different classification methods (the district-based and type-based methods), and the contributions to the pollutant concentrations from each source category were computed to provide a basis for appropriate control measures. The statistical indexes showed that CALMET had sufficient ability to predict the meteorological conditions, such as the wind fields and temperatures, which provided meteorological data for the subsequent CALPUFF run. The simulated concentrations from CALPUFF showed considerable agreement with the observed values but were generally underestimated. The spatial-temporal concentration pattern revealed that the maximum concentrations tended to appear in the urban centers and during the winter. In terms of their contributions to pollutant concentrations, the districts of Xihu, Pingshan, and Mingshan all affected the urban air quality to different degrees. According to the type-based classification, which categorized the pollution sources as belonging to the Bengang Group, large point sources, small point sources, and area sources, the source apportionment showed that the Bengang Group, the large point sources, and the area sources had considerable impacts on urban air quality. Finally, combined with the industrial characteristics, detailed control measures were proposed with which local policy makers could improve the urban air quality in Benxi. In summary, the results of this study showed that this framework has credibility for effectively improving urban air quality, based on the source apportionment of atmospheric pollutants. The authors endeavored to build up an effective framework based on the integrated WRF/CALPUFF to improve the air quality in many cities on meso-micro scales in China. Via this framework, the integrated modeling tool is accurately used to study the characteristics of meteorological fields, concentration fields, and source apportionments of pollutants in target area. The impacts of classified sources on air quality together with the industrial characteristics can provide more effective control measures for improving air quality. Through the case study, the technical framework developed in this study, particularly the source apportionment, could provide important data and technical support for policy makers to assess air pollution on the scale of a city in China or even the world.
Statistical Learning and Language: An Individual Differences Study

ERIC Educational Resources Information Center

Misyak, Jennifer B.; Christiansen, Morten H.

2012-01-01

Although statistical learning and language have been assumed to be intertwined, this theoretical presupposition has rarely been tested empirically. The present study investigates the relationship between statistical learning and language using a within-subject design embedded in an individual-differences framework. Participants were administered…
A Statistical Test for Comparing Nonnested Covariance Structure Models.

ERIC Educational Resources Information Center

Levy, Roy; Hancock, Gregory R.

While statistical procedures are well known for comparing hierarchically related (nested) covariance structure models, statistical tests for comparing nonhierarchically related (nonnested) models have proven more elusive. While isolated attempts have been made, none exists within the commonly used maximum likelihood estimation framework, thereby…
A proposed framework for assessing risk from less-than-lifetime exposures to carcinogens.

PubMed

Felter, Susan P; Conolly, Rory B; Bercu, Joel P; Bolger, P Michael; Boobis, Alan R; Bos, Peter M J; Carthew, Philip; Doerrer, Nancy G; Goodman, Jay I; Harrouk, Wafa A; Kirkland, David J; Lau, Serrine S; Llewellyn, G Craig; Preston, R Julian; Schoeny, Rita; Schnatter, A Robert; Tritscher, Angelika; van Velsen, Frans; Williams, Gary M

2011-07-01

Quantitative methods for estimation of cancer risk have been developed for daily, lifetime human exposures. There are a variety of studies or methodologies available to address less-than-lifetime exposures. However, a common framework for evaluating risk from less-than-lifetime exposures (including short-term and/or intermittent exposures) does not exist, which could result in inconsistencies in risk assessment practice. To address this risk assessment need, a committee of the International Life Sciences Institute (ILSI) Health and Environmental Sciences Institute conducted a multisector workshop in late 2009 to discuss available literature, different methodologies, and a proposed framework. The proposed framework provides a decision tree and guidance for cancer risk assessments for less-than-lifetime exposures based on current knowledge of mode of action and dose-response. Available data from rodent studies and epidemiological studies involving less-than-lifetime exposures are considered, in addition to statistical approaches described in the literature for evaluating the impact of changing the dose rate and exposure duration for exposure to carcinogens. The decision tree also provides for scenarios in which an assumption of potential carcinogenicity is appropriate (e.g., based on structural alerts or genotoxicity data), but bioassay or other data are lacking from which a chemical-specific cancer potency can be determined. This paper presents an overview of the rationale for the workshop, reviews historical background, describes the proposed framework for assessing less-than-lifetime exposures to potential human carcinogens, and suggests next steps.
A Stochastic Framework for Modeling the Population Dynamics of Convective Clouds

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hagos, Samson; Feng, Zhe; Plant, Robert S.

A stochastic prognostic framework for modeling the population dynamics of convective clouds and representing them in climate models is proposed. The approach used follows the non-equilibrium statistical mechanical approach through a master equation. The aim is to represent the evolution of the number of convective cells of a specific size and their associated cloud-base mass flux, given a large-scale forcing. In this framework, referred to as STOchastic framework for Modeling Population dynamics of convective clouds (STOMP), the evolution of convective cell size is predicted from three key characteristics: (i) the probability of growth, (ii) the probability of decay, and (iii)more » the cloud-base mass flux. STOMP models are constructed and evaluated against CPOL radar observations at Darwin and convection permitting model (CPM) simulations. Multiple models are constructed under various assumptions regarding these three key parameters and the realisms of these models are evaluated. It is shown that in a model where convective plumes prefer to aggregate spatially and mass flux is a non-linear function of convective cell area, mass flux manifests a recharge-discharge behavior under steady forcing. Such a model also produces observed behavior of convective cell populations and CPM simulated mass flux variability under diurnally varying forcing. Besides its use in developing understanding of convection processes and the controls on convective cell size distributions, this modeling framework is also designed to be capable of providing alternative, non-equilibrium, closure formulations for spectral mass flux parameterizations.« less
The A, C, G, and T of Genome Assembly.

PubMed

Wajid, Bilal; Sohail, Muhammad U; Ekti, Ali R; Serpedin, Erchin

2016-01-01

Genome assembly in its two decades of history has produced significant research, in terms of both biotechnology and computational biology. This contribution delineates sequencing platforms and their characteristics, examines key steps involved in filtering and processing raw data, explains assembly frameworks, and discusses quality statistics for the assessment of the assembled sequence. Furthermore, the paper explores recent Ubuntu-based software environments oriented towards genome assembly as well as some avenues for future research.

KOJAK: Scalable Semantic Link Discovery Via Integrated Knowledge-Based and Statistical Reasoning

DTIC Science & Technology

2006-11-01

program can find interesting connections in a network without having to learn the patterns of interestingness beforehand. The key advantage of our...Interesting Instances in Semantic Graphs Below we describe how the UNICORN framework can discover interesting instances in a multi-relational dataset...We can now describe how UNICORN solves the first problem of finding the top interesting nodes in a semantic net by ranking them according to
Survey of the World Agricultural Documentation Services, Draft; Prepared on Behalf of the FAO Panel of Experts on "AGRIS" (International Information System for the Agricultural Sciences and Technology).

ERIC Educational Resources Information Center

Buntrock, H.

The purpose of the survey was: (1) to evaluate existing agricultural information services and (2) to propose possible frameworks for an improved world-wide agricultural information service. The principal statistical results of the survey are summarized in the following figures which are based on data collected in nearly all instances for the year…
Analytical approach for collective diffusion: One-dimensional lattice with the nearest neighbor and the next nearest neighbor lateral interactions

NASA Astrophysics Data System (ADS)

Tarasenko, Alexander

2018-01-01

Diffusion of particles adsorbed on a homogeneous one-dimensional lattice is investigated using a theoretical approach and MC simulations. The analytical dependencies calculated in the framework of approach are tested using the numerical data. The perfect coincidence of the data obtained by these different methods demonstrates that the correctness of the approach based on the theory of the non-equilibrium statistical operator.
Evaluation of high-resolution sea ice models on the basis of statistical and scaling properties of Arctic sea ice drift and deformation

NASA Astrophysics Data System (ADS)

Girard, L.; Weiss, J.; Molines, J. M.; Barnier, B.; Bouillon, S.

2009-08-01

Sea ice drift and deformation from models are evaluated on the basis of statistical and scaling properties. These properties are derived from two observation data sets: the RADARSAT Geophysical Processor System (RGPS) and buoy trajectories from the International Arctic Buoy Program (IABP). Two simulations obtained with the Louvain-la-Neuve Ice Model (LIM) coupled to a high-resolution ocean model and a simulation obtained with the Los Alamos Sea Ice Model (CICE) were analyzed. Model ice drift compares well with observations in terms of large-scale velocity field and distributions of velocity fluctuations although a significant bias on the mean ice speed is noted. On the other hand, the statistical properties of ice deformation are not well simulated by the models: (1) The distributions of strain rates are incorrect: RGPS distributions of strain rates are power law tailed, i.e., exhibit "wild randomness," whereas models distributions remain in the Gaussian attraction basin, i.e., exhibit "mild randomness." (2) The models are unable to reproduce the spatial and temporal correlations of the deformation fields: In the observations, ice deformation follows spatial and temporal scaling laws that express the heterogeneity and the intermittency of deformation. These relations do not appear in simulated ice deformation. Mean deformation in models is almost scale independent. The statistical properties of ice deformation are a signature of the ice mechanical behavior. The present work therefore suggests that the mechanical framework currently used by models is inappropriate. A different modeling framework based on elastic interactions could improve the representation of the statistical and scaling properties of ice deformation.
A GPU-Parallelized Eigen-Based Clutter Filter Framework for Ultrasound Color Flow Imaging.

PubMed

Chee, Adrian J Y; Yiu, Billy Y S; Yu, Alfred C H

2017-01-01

Eigen-filters with attenuation response adapted to clutter statistics in color flow imaging (CFI) have shown improved flow detection sensitivity in the presence of tissue motion. Nevertheless, its practical adoption in clinical use is not straightforward due to the high computational cost for solving eigendecompositions. Here, we provide a pedagogical description of how a real-time computing framework for eigen-based clutter filtering can be developed through a single-instruction, multiple data (SIMD) computing approach that can be implemented on a graphical processing unit (GPU). Emphasis is placed on the single-ensemble-based eigen-filtering approach (Hankel singular value decomposition), since it is algorithmically compatible with GPU-based SIMD computing. The key algebraic principles and the corresponding SIMD algorithm are explained, and annotations on how such algorithm can be rationally implemented on the GPU are presented. Real-time efficacy of our framework was experimentally investigated on a single GPU device (GTX Titan X), and the computing throughput for varying scan depths and slow-time ensemble lengths was studied. Using our eigen-processing framework, real-time video-range throughput (24 frames/s) can be attained for CFI frames with full view in azimuth direction (128 scanlines), up to a scan depth of 5 cm ( λ pixel axial spacing) for slow-time ensemble length of 16 samples. The corresponding CFI image frames, with respect to the ones derived from non-adaptive polynomial regression clutter filtering, yielded enhanced flow detection sensitivity in vivo, as demonstrated in a carotid imaging case example. These findings indicate that the GPU-enabled eigen-based clutter filtering can improve CFI flow detection performance in real time.
Wavelet domain image restoration with adaptive edge-preserving regularization.

PubMed

Belge, M; Kilmer, M E; Miller, E L

2000-01-01

In this paper, we consider a wavelet based edge-preserving regularization scheme for use in linear image restoration problems. Our efforts build on a collection of mathematical results indicating that wavelets are especially useful for representing functions that contain discontinuities (i.e., edges in two dimensions or jumps in one dimension). We interpret the resulting theory in a statistical signal processing framework and obtain a highly flexible framework for adapting the degree of regularization to the local structure of the underlying image. In particular, we are able to adapt quite easily to scale-varying and orientation-varying features in the image while simultaneously retaining the edge preservation properties of the regularizer. We demonstrate a half-quadratic algorithm for obtaining the restorations from observed data.
Detecting Anomalous Insiders in Collaborative Information Systems

PubMed Central

Chen, You; Nyemba, Steve; Malin, Bradley

2012-01-01

Collaborative information systems (CISs) are deployed within a diverse array of environments that manage sensitive information. Current security mechanisms detect insider threats, but they are ill-suited to monitor systems in which users function in dynamic teams. In this paper, we introduce the community anomaly detection system (CADS), an unsupervised learning framework to detect insider threats based on the access logs of collaborative environments. The framework is based on the observation that typical CIS users tend to form community structures based on the subjects accessed (e.g., patients’ records viewed by healthcare providers). CADS consists of two components: 1) relational pattern extraction, which derives community structures and 2) anomaly prediction, which leverages a statistical model to determine when users have sufficiently deviated from communities. We further extend CADS into MetaCADS to account for the semantics of subjects (e.g., patients’ diagnoses). To empirically evaluate the framework, we perform an assessment with three months of access logs from a real electronic health record (EHR) system in a large medical center. The results illustrate our models exhibit significant performance gains over state-of-the-art competitors. When the number of illicit users is low, MetaCADS is the best model, but as the number grows, commonly accessed semantics lead to hiding in a crowd, such that CADS is more prudent. PMID:24489520
Statistical approach for selection of biologically informative genes.

PubMed

Das, Samarendra; Rai, Anil; Mishra, D C; Rai, Shesh N

2018-05-20

Selection of informative genes from high dimensional gene expression data has emerged as an important research area in genomics. Many gene selection techniques have been proposed so far are either based on relevancy or redundancy measure. Further, the performance of these techniques has been adjudged through post selection classification accuracy computed through a classifier using the selected genes. This performance metric may be statistically sound but may not be biologically relevant. A statistical approach, i.e. Boot-MRMR, was proposed based on a composite measure of maximum relevance and minimum redundancy, which is both statistically sound and biologically relevant for informative gene selection. For comparative evaluation of the proposed approach, we developed two biological sufficient criteria, i.e. Gene Set Enrichment with QTL (GSEQ) and biological similarity score based on Gene Ontology (GO). Further, a systematic and rigorous evaluation of the proposed technique with 12 existing gene selection techniques was carried out using five gene expression datasets. This evaluation was based on a broad spectrum of statistically sound (e.g. subject classification) and biological relevant (based on QTL and GO) criteria under a multiple criteria decision-making framework. The performance analysis showed that the proposed technique selects informative genes which are more biologically relevant. The proposed technique is also found to be quite competitive with the existing techniques with respect to subject classification and computational time. Our results also showed that under the multiple criteria decision-making setup, the proposed technique is best for informative gene selection over the available alternatives. Based on the proposed approach, an R Package, i.e. BootMRMR has been developed and available at https://cran.r-project.org/web/packages/BootMRMR. This study will provide a practical guide to select statistical techniques for selecting informative genes from high dimensional expression data for breeding and system biology studies. Published by Elsevier B.V.
Hand and goods judgment algorithm based on depth information

NASA Astrophysics Data System (ADS)

Li, Mingzhu; Zhang, Jinsong; Yan, Dan; Wang, Qin; Zhang, Ruiqi; Han, Jing

2016-03-01

A tablet computer with a depth camera and a color camera is loaded on a traditional shopping cart. The inside information of the shopping cart is obtained by two cameras. In the shopping cart monitoring field, it is very important for us to determine whether the customer with goods in or out of the shopping cart. This paper establishes a basic framework for judging empty hand, it includes the hand extraction process based on the depth information, process of skin color model building based on WPCA (Weighted Principal Component Analysis), an algorithm for judging handheld products based on motion and skin color information, statistical process. Through this framework, the first step can ensure the integrity of the hand information, and effectively avoids the influence of sleeve and other debris, the second step can accurately extract skin color and eliminate the similar color interference, light has little effect on its results, it has the advantages of fast computation speed and high efficiency, and the third step has the advantage of greatly reducing the noise interference and improving the accuracy.
Development of an audit method to assess the prevalence of the ACGME's general competencies in an undergraduate medical education curriculum.

PubMed

Mooney, Christopher J; Lurie, Stephen J; Lyness, Jeffrey M; Lambert, David R; Guzick, David S

2010-10-01

Despite the use of competency-based frameworks to evaluate physicians, the role of competency-based objectives in undergraduate medical education remains uncertain. By use of an audit methodology, we sought to determine how the six Accreditation Council for Graduate Medical Education (ACGME) competencies, conceptualized as educational domains, would map onto an undergraduate medical curriculum. Standardized audit forms listing required activities were provided to course directors, who were then asked to indicate which of the domains were represented in each activity. Descriptive statistics were calculated. Of 1,500 activities, there was a mean of 2.13 domains per activity. Medical Knowledge was the most prevalent (44%), followed by Patient Care (20%), Interpersonal and Communication Skills (12%), Professionalism (9%), Systems-Based Practice (8%), and Practice-Based Learning and Improvement (7%). There was considerable variation by year and course. The domains provide a useful framework for organizing didactic components. Faculty can also consider activities in light of the domains, providing a vocabulary for instituting curricular change and innovation.
Computational Analysis for Rocket-Based Combined-Cycle Systems During Rocket-Only Operation

NASA Technical Reports Server (NTRS)

Steffen, C. J., Jr.; Smith, T. D.; Yungster, S.; Keller, D. J.

2000-01-01

A series of Reynolds-averaged Navier-Stokes calculations were employed to study the performance of rocket-based combined-cycle systems operating in an all-rocket mode. This parametric series of calculations were executed within a statistical framework, commonly known as design of experiments. The parametric design space included four geometric and two flowfield variables set at three levels each, for a total of 729 possible combinations. A D-optimal design strategy was selected. It required that only 36 separate computational fluid dynamics (CFD) solutions be performed to develop a full response surface model, which quantified the linear, bilinear, and curvilinear effects of the six experimental variables. The axisymmetric, Reynolds-averaged Navier-Stokes simulations were executed with the NPARC v3.0 code. The response used in the statistical analysis was created from Isp efficiency data integrated from the 36 CFD simulations. The influence of turbulence modeling was analyzed by using both one- and two-equation models. Careful attention was also given to quantify the influence of mesh dependence, iterative convergence, and artificial viscosity upon the resulting statistical model. Thirteen statistically significant effects were observed to have an influence on rocket-based combined-cycle nozzle performance. It was apparent that the free-expansion process, directly downstream of the rocket nozzle, can influence the Isp efficiency. Numerical schlieren images and particle traces have been used to further understand the physical phenomena behind several of the statistically significant results.
Applying Sociocultural Theory to Teaching Statistics for Doctoral Social Work Students

ERIC Educational Resources Information Center

Mogro-Wilson, Cristina; Reeves, Michael G.; Charter, Mollie Lazar

2015-01-01

This article describes the development of two doctoral-level multivariate statistics courses utilizing sociocultural theory, an integrative pedagogical framework. In the first course, the implementation of sociocultural theory helps to support the students through a rigorous introduction to statistics. The second course involves students…
SU-D-BRA-04: Computerized Framework for Marker-Less Localization of Anatomical Feature Points in Range Images Based On Differential Geometry Features for Image-Guided Radiation Therapy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Soufi, M; Arimura, H; Toyofuku, F

Purpose: To propose a computerized framework for localization of anatomical feature points on the patient surface in infrared-ray based range images by using differential geometry (curvature) features. Methods: The general concept was to reconstruct the patient surface by using a mathematical modeling technique for the computation of differential geometry features that characterize the local shapes of the patient surfaces. A region of interest (ROI) was firstly extracted based on a template matching technique applied on amplitude (grayscale) images. The extracted ROI was preprocessed for reducing temporal and spatial noises by using Kalman and bilateral filters, respectively. Next, a smooth patientmore » surface was reconstructed by using a non-uniform rational basis spline (NURBS) model. Finally, differential geometry features, i.e. the shape index and curvedness features were computed for localizing the anatomical feature points. The proposed framework was trained for optimizing shape index and curvedness thresholds and tested on range images of an anthropomorphic head phantom. The range images were acquired by an infrared ray-based time-of-flight (TOF) camera. The localization accuracy was evaluated by measuring the mean of minimum Euclidean distances (MMED) between reference (ground truth) points and the feature points localized by the proposed framework. The evaluation was performed for points localized on convex regions (e.g. apex of nose) and concave regions (e.g. nasofacial sulcus). Results: The proposed framework has localized anatomical feature points on convex and concave anatomical landmarks with MMEDs of 1.91±0.50 mm and 3.70±0.92 mm, respectively. A statistically significant difference was obtained between the feature points on the convex and concave regions (P<0.001). Conclusion: Our study has shown the feasibility of differential geometry features for localization of anatomical feature points on the patient surface in range images. The proposed framework might be useful for tasks involving feature-based image registration in range-image guided radiation therapy.« less
A surrogate-based sensitivity quantification and Bayesian inversion of a regional groundwater flow model

NASA Astrophysics Data System (ADS)

Chen, Mingjie; Izady, Azizallah; Abdalla, Osman A.; Amerjeed, Mansoor

2018-02-01

Bayesian inference using Markov Chain Monte Carlo (MCMC) provides an explicit framework for stochastic calibration of hydrogeologic models accounting for uncertainties; however, the MCMC sampling entails a large number of model calls, and could easily become computationally unwieldy if the high-fidelity hydrogeologic model simulation is time consuming. This study proposes a surrogate-based Bayesian framework to address this notorious issue, and illustrates the methodology by inverse modeling a regional MODFLOW model. The high-fidelity groundwater model is approximated by a fast statistical model using Bagging Multivariate Adaptive Regression Spline (BMARS) algorithm, and hence the MCMC sampling can be efficiently performed. In this study, the MODFLOW model is developed to simulate the groundwater flow in an arid region of Oman consisting of mountain-coast aquifers, and used to run representative simulations to generate training dataset for BMARS model construction. A BMARS-based Sobol' method is also employed to efficiently calculate input parameter sensitivities, which are used to evaluate and rank their importance for the groundwater flow model system. According to sensitivity analysis, insensitive parameters are screened out of Bayesian inversion of the MODFLOW model, further saving computing efforts. The posterior probability distribution of input parameters is efficiently inferred from the prescribed prior distribution using observed head data, demonstrating that the presented BMARS-based Bayesian framework is an efficient tool to reduce parameter uncertainties of a groundwater system.
A novel latent gaussian copula framework for modeling spatial correlation in quantized SAR imagery with applications to ATR

NASA Astrophysics Data System (ADS)

Thelen, Brian T.; Xique, Ismael J.; Burns, Joseph W.; Goley, G. Steven; Nolan, Adam R.; Benson, Jonathan W.

2017-04-01

With all of the new remote sensing modalities available, and with ever increasing capabilities and frequency of collection, there is a desire to fundamentally understand/quantify the information content in the collected image data relative to various exploitation goals, such as detection/classification. A fundamental approach for this is the framework of Bayesian decision theory, but a daunting challenge is to have significantly flexible and accurate multivariate models for the features and/or pixels that capture a wide assortment of distributions and dependen- cies. In addition, data can come in the form of both continuous and discrete representations, where the latter is often generated based on considerations of robustness to imaging conditions and occlusions/degradations. In this paper we propose a novel suite of "latent" models fundamentally based on multivariate Gaussian copula models that can be used for quantized data from SAR imagery. For this Latent Gaussian Copula (LGC) model, we derive an approximate, maximum-likelihood estimation algorithm and demonstrate very reasonable estimation performance even for the larger images with many pixels. However applying these LGC models to large dimen- sions/images within a Bayesian decision/classification theory is infeasible due to the computational/numerical issues in evaluating the true full likelihood, and we propose an alternative class of novel pseudo-likelihoood detection statistics that are computationally feasible. We show in a few simple examples that these statistics have the potential to provide very good and robust detection/classification performance. All of this framework is demonstrated on a simulated SLICY data set, and the results show the importance of modeling the dependencies, and of utilizing the pseudo-likelihood methods.
Towards a resource-based habitat approach for spatial modelling of vector-borne disease risks.

PubMed

Hartemink, Nienke; Vanwambeke, Sophie O; Purse, Bethan V; Gilbert, Marius; Van Dyck, Hans

2015-11-01

Given the veterinary and public health impact of vector-borne diseases, there is a clear need to assess the suitability of landscapes for the emergence and spread of these diseases. Current approaches for predicting disease risks neglect key features of the landscape as components of the functional habitat of vectors or hosts, and hence of the pathogen. Empirical-statistical methods do not explicitly incorporate biological mechanisms, whereas current mechanistic models are rarely spatially explicit; both methods ignore the way animals use the landscape (i.e. movement ecology). We argue that applying a functional concept for habitat, i.e. the resource-based habitat concept (RBHC), can solve these issues. The RBHC offers a framework to identify systematically the different ecological resources that are necessary for the completion of the transmission cycle and to relate these resources to (combinations of) landscape features and other environmental factors. The potential of the RBHC as a framework for identifying suitable habitats for vector-borne pathogens is explored and illustrated with the case of bluetongue virus, a midge-transmitted virus affecting ruminants. The concept facilitates the study of functional habitats of the interacting species (vectors as well as hosts) and provides new insight into spatial and temporal variation in transmission opportunities and exposure that ultimately determine disease risks. It may help to identify knowledge gaps and control options arising from changes in the spatial configuration of key resources across the landscape. The RBHC framework may act as a bridge between existing mechanistic and statistical modelling approaches. © 2014 The Authors. Biological Reviews published by John Wiley & Sons Ltd on behalf of Cambridge Philosophical Society.
A theoretical Gaussian framework for anomalous change detection in hyperspectral images

NASA Astrophysics Data System (ADS)

Acito, Nicola; Diani, Marco; Corsini, Giovanni

2017-10-01

Exploitation of temporal series of hyperspectral images is a relatively new discipline that has a wide variety of possible applications in fields like remote sensing, area surveillance, defense and security, search and rescue and so on. In this work, we discuss how images taken at two different times can be processed to detect changes caused by insertion, deletion or displacement of small objects in the monitored scene. This problem is known in the literature as anomalous change detection (ACD) and it can be viewed as the extension, to the multitemporal case, of the well-known anomaly detection problem in a single image. In fact, in both cases, the hyperspectral images are processed blindly in an unsupervised manner and without a-priori knowledge about the target spectrum. We introduce the ACD problem using an approach based on the statistical decision theory and we derive a common framework including different ACD approaches. Particularly, we clearly define the observation space, the data statistical distribution conditioned to the two competing hypotheses and the procedure followed to come with the solution. The proposed overview places emphasis on techniques based on the multivariate Gaussian model that allows a formal presentation of the ACD problem and the rigorous derivation of the possible solutions in a way that is both mathematically more tractable and easier to interpret. We also discuss practical problems related to the application of the detectors in the real world and present affordable solutions. Namely, we describe the ACD processing chain including the strategies that are commonly adopted to compensate pervasive radiometric changes, caused by the different illumination/atmospheric conditions, and to mitigate the residual geometric image co-registration errors. Results obtained on real freely available data are discussed in order to test and compare the methods within the proposed general framework.
Reconciling statistical and systems science approaches to public health.

PubMed

Ip, Edward H; Rahmandad, Hazhir; Shoham, David A; Hammond, Ross; Huang, Terry T-K; Wang, Youfa; Mabry, Patricia L

2013-10-01

Although systems science has emerged as a set of innovative approaches to study complex phenomena, many topically focused researchers including clinicians and scientists working in public health are somewhat befuddled by this methodology that at times appears to be radically different from analytic methods, such as statistical modeling, to which the researchers are accustomed. There also appears to be conflicts between complex systems approaches and traditional statistical methodologies, both in terms of their underlying strategies and the languages they use. We argue that the conflicts are resolvable, and the sooner the better for the field. In this article, we show how statistical and systems science approaches can be reconciled, and how together they can advance solutions to complex problems. We do this by comparing the methods within a theoretical framework based on the work of population biologist Richard Levins. We present different types of models as representing different tradeoffs among the four desiderata of generality, realism, fit, and precision.
Reconciling Statistical and Systems Science Approaches to Public Health

PubMed Central

Ip, Edward H.; Rahmandad, Hazhir; Shoham, David A.; Hammond, Ross; Huang, Terry T.-K.; Wang, Youfa; Mabry, Patricia L.

2016-01-01

Although systems science has emerged as a set of innovative approaches to study complex phenomena, many topically focused researchers including clinicians and scientists working in public health are somewhat befuddled by this methodology that at times appears to be radically different from analytic methods, such as statistical modeling, to which the researchers are accustomed. There also appears to be conflicts between complex systems approaches and traditional statistical methodologies, both in terms of their underlying strategies and the languages they use. We argue that the conflicts are resolvable, and the sooner the better for the field. In this article, we show how statistical and systems science approaches can be reconciled, and how together they can advance solutions to complex problems. We do this by comparing the methods within a theoretical framework based on the work of population biologist Richard Levins. We present different types of models as representing different tradeoffs among the four desiderata of generality, realism, fit, and precision. PMID:24084395
Methods for Assessment of Memory Reactivation.

PubMed

Liu, Shizhao; Grosmark, Andres D; Chen, Zhe

2018-04-13

It has been suggested that reactivation of previously acquired experiences or stored information in declarative memories in the hippocampus and neocortex contributes to memory consolidation and learning. Understanding memory consolidation depends crucially on the development of robust statistical methods for assessing memory reactivation. To date, several statistical methods have seen established for assessing memory reactivation based on bursts of ensemble neural spike activity during offline states. Using population-decoding methods, we propose a new statistical metric, the weighted distance correlation, to assess hippocampal memory reactivation (i.e., spatial memory replay) during quiet wakefulness and slow-wave sleep. The new metric can be combined with an unsupervised population decoding analysis, which is invariant to latent state labeling and allows us to detect statistical dependency beyond linearity in memory traces. We validate the new metric using two rat hippocampal recordings in spatial navigation tasks. Our proposed analysis framework may have a broader impact on assessing memory reactivations in other brain regions under different behavioral tasks.

Conducting Human Research

DTIC Science & Technology

2009-08-05

Socio-cultural data acquisition, extraction, and management.??? First the idea of a theoretical framework will be very briefly discussed as well as...SUBJECT TERMS human behavior, theoretical framework , hypothesis development, experimental design, ethical research, statistical power, human laboratory...who throw rocks? • How can we make them stay too far away to throw rocks? UNCLASSIFIED – Approved for Public Release Theoretical Framework / Conceptual
An Example-Based Brain MRI Simulation Framework.

PubMed

He, Qing; Roy, Snehashis; Jog, Amod; Pham, Dzung L

2015-02-21

The simulation of magnetic resonance (MR) images plays an important role in the validation of image analysis algorithms such as image segmentation, due to lack of sufficient ground truth in real MR images. Previous work on MRI simulation has focused on explicitly modeling the MR image formation process. However, because of the overwhelming complexity of MR acquisition these simulations must involve simplifications and approximations that can result in visually unrealistic simulated images. In this work, we describe an example-based simulation framework, which uses an "atlas" consisting of an MR image and its anatomical models derived from the hard segmentation. The relationships between the MR image intensities and its anatomical models are learned using a patch-based regression that implicitly models the physics of the MR image formation. Given the anatomical models of a new brain, a new MR image can be simulated using the learned regression. This approach has been extended to also simulate intensity inhomogeneity artifacts based on the statistical model of training data. Results show that the example based MRI simulation method is capable of simulating different image contrasts and is robust to different choices of atlas. The simulated images resemble real MR images more than simulations produced by a physics-based model.
MIDAS: Regionally linear multivariate discriminative statistical mapping.

PubMed

Varol, Erdem; Sotiras, Aristeidis; Davatzikos, Christos

2018-07-01

Statistical parametric maps formed via voxel-wise mass-univariate tests, such as the general linear model, are commonly used to test hypotheses about regionally specific effects in neuroimaging cross-sectional studies where each subject is represented by a single image. Despite being informative, these techniques remain limited as they ignore multivariate relationships in the data. Most importantly, the commonly employed local Gaussian smoothing, which is important for accounting for registration errors and making the data follow Gaussian distributions, is usually chosen in an ad hoc fashion. Thus, it is often suboptimal for the task of detecting group differences and correlations with non-imaging variables. Information mapping techniques, such as searchlight, which use pattern classifiers to exploit multivariate information and obtain more powerful statistical maps, have become increasingly popular in recent years. However, existing methods may lead to important interpretation errors in practice (i.e., misidentifying a cluster as informative, or failing to detect truly informative voxels), while often being computationally expensive. To address these issues, we introduce a novel efficient multivariate statistical framework for cross-sectional studies, termed MIDAS, seeking highly sensitive and specific voxel-wise brain maps, while leveraging the power of regional discriminant analysis. In MIDAS, locally linear discriminative learning is applied to estimate the pattern that best discriminates between two groups, or predicts a variable of interest. This pattern is equivalent to local filtering by an optimal kernel whose coefficients are the weights of the linear discriminant. By composing information from all neighborhoods that contain a given voxel, MIDAS produces a statistic that collectively reflects the contribution of the voxel to the regional classifiers as well as the discriminative power of the classifiers. Critically, MIDAS efficiently assesses the statistical significance of the derived statistic by analytically approximating its null distribution without the need for computationally expensive permutation tests. The proposed framework was extensively validated using simulated atrophy in structural magnetic resonance imaging (MRI) and further tested using data from a task-based functional MRI study as well as a structural MRI study of cognitive performance. The performance of the proposed framework was evaluated against standard voxel-wise general linear models and other information mapping methods. The experimental results showed that MIDAS achieves relatively higher sensitivity and specificity in detecting group differences. Together, our results demonstrate the potential of the proposed approach to efficiently map effects of interest in both structural and functional data. Copyright © 2018. Published by Elsevier Inc.
The epistemology of mathematical and statistical modeling: a quiet methodological revolution.

PubMed

Rodgers, Joseph Lee

2010-01-01

A quiet methodological revolution, a modeling revolution, has occurred over the past several decades, almost without discussion. In contrast, the 20th century ended with contentious argument over the utility of null hypothesis significance testing (NHST). The NHST controversy may have been at least partially irrelevant, because in certain ways the modeling revolution obviated the NHST argument. I begin with a history of NHST and modeling and their relation to one another. Next, I define and illustrate principles involved in developing and evaluating mathematical models. Following, I discuss the difference between using statistical procedures within a rule-based framework and building mathematical models from a scientific epistemology. Only the former is treated carefully in most psychology graduate training. The pedagogical implications of this imbalance and the revised pedagogy required to account for the modeling revolution are described. To conclude, I discuss how attention to modeling implies shifting statistical practice in certain progressive ways. The epistemological basis of statistics has moved away from being a set of procedures, applied mechanistically, and moved toward building and evaluating statistical and scientific models. Copyrigiht 2009 APA, all rights reserved.
Estimation of integral curves from high angular resolution diffusion imaging (HARDI) data.

PubMed

Carmichael, Owen; Sakhanenko, Lyudmila

2015-05-15

We develop statistical methodology for a popular brain imaging technique HARDI based on the high order tensor model by Özarslan and Mareci [10]. We investigate how uncertainty in the imaging procedure propagates through all levels of the model: signals, tensor fields, vector fields, and fibers. We construct asymptotically normal estimators of the integral curves or fibers which allow us to trace the fibers together with confidence ellipsoids. The procedure is computationally intense as it blends linear algebra concepts from high order tensors with asymptotical statistical analysis. The theoretical results are illustrated on simulated and real datasets. This work generalizes the statistical methodology proposed for low angular resolution diffusion tensor imaging by Carmichael and Sakhanenko [3], to several fibers per voxel. It is also a pioneering statistical work on tractography from HARDI data. It avoids all the typical limitations of the deterministic tractography methods and it delivers the same information as probabilistic tractography methods. Our method is computationally cheap and it provides well-founded mathematical and statistical framework where diverse functionals on fibers, directions and tensors can be studied in a systematic and rigorous way.
Estimation of integral curves from high angular resolution diffusion imaging (HARDI) data

PubMed Central

Carmichael, Owen; Sakhanenko, Lyudmila

2015-01-01

We develop statistical methodology for a popular brain imaging technique HARDI based on the high order tensor model by Özarslan and Mareci [10]. We investigate how uncertainty in the imaging procedure propagates through all levels of the model: signals, tensor fields, vector fields, and fibers. We construct asymptotically normal estimators of the integral curves or fibers which allow us to trace the fibers together with confidence ellipsoids. The procedure is computationally intense as it blends linear algebra concepts from high order tensors with asymptotical statistical analysis. The theoretical results are illustrated on simulated and real datasets. This work generalizes the statistical methodology proposed for low angular resolution diffusion tensor imaging by Carmichael and Sakhanenko [3], to several fibers per voxel. It is also a pioneering statistical work on tractography from HARDI data. It avoids all the typical limitations of the deterministic tractography methods and it delivers the same information as probabilistic tractography methods. Our method is computationally cheap and it provides well-founded mathematical and statistical framework where diverse functionals on fibers, directions and tensors can be studied in a systematic and rigorous way. PMID:25937674
SHARE: system design and case studies for statistical health information release

PubMed Central

Gardner, James; Xiong, Li; Xiao, Yonghui; Gao, Jingjing; Post, Andrew R; Jiang, Xiaoqian; Ohno-Machado, Lucila

2013-01-01

Objectives We present SHARE, a new system for statistical health information release with differential privacy. We present two case studies that evaluate the software on real medical datasets and demonstrate the feasibility and utility of applying the differential privacy framework on biomedical data. Materials and Methods SHARE releases statistical information in electronic health records with differential privacy, a strong privacy framework for statistical data release. It includes a number of state-of-the-art methods for releasing multidimensional histograms and longitudinal patterns. We performed a variety of experiments on two real datasets, the surveillance, epidemiology and end results (SEER) breast cancer dataset and the Emory electronic medical record (EeMR) dataset, to demonstrate the feasibility and utility of SHARE. Results Experimental results indicate that SHARE can deal with heterogeneous data present in medical data, and that the released statistics are useful. The Kullback–Leibler divergence between the released multidimensional histograms and the original data distribution is below 0.5 and 0.01 for seven-dimensional and three-dimensional data cubes generated from the SEER dataset, respectively. The relative error for longitudinal pattern queries on the EeMR dataset varies between 0 and 0.3. While the results are promising, they also suggest that challenges remain in applying statistical data release using the differential privacy framework for higher dimensional data. Conclusions SHARE is one of the first systems to provide a mechanism for custodians to release differentially private aggregate statistics for a variety of use cases in the medical domain. This proof-of-concept system is intended to be applied to large-scale medical data warehouses. PMID:23059729
Are sectioning and soldering of short-span implant-supported prostheses necessary procedures?

PubMed

Bianchini, Marco A; Souza, João G O; Souza, Dircilene C; Magini, Ricardo S; Benfatti, Cesar A M; Cardoso, Antonio C

2011-01-01

The aim of this study was to evaluate the fit between dental abutments and the metal framework of a 3-unit fixed prosthesis screwed to two implants to determine whether sectioning and soldering of the framework are in fact necessary procedures. The study was based on a model of a metal framework of a 3-unit prosthesis screwed to two implants. A total of 18 metal frameworks were constructed and divided into 3 groups: (1) NS group - each framework was cast in one piece and not sectioned; (2) CS group - the components of each sectioned framework were joined by conventional soldering; and (3) LW group - the components of each sectioned framework were joined by laser welding. The control group consisted of six silver-palladium alloy copings that were not cast together. Two analyses were mperformed: in the first analysis, the framework was screwed only to the first abutment, and in the second analysis, the framework was screwed to both abutments. The prosthetic fit was assessed at a single point using a measuring microscope (Measurescope, Nikon, Japan) and the marginal gap was measured in micrometers. Statistical analysis was performed using analysis of variance (ANOVA), Scheffe's test, Student's t-test, and Mann-Whitney U test. The NS group had larger marginal gaps than the other groups (p<0.01), while the CS and LW groups had a similar degree of misfit with no significant difference between them. The results revealed that, in the case of short-span 3-unit fixed prostheses, the framework should be sectioned and soldered or welded to prevent or reduce marginal gaps between the metal framework and dental abutments.
Phenomenology of small violations of Fermi and Bose statistics

NASA Astrophysics Data System (ADS)

Greenberg, O. W.; Mohapatra, Rabindra N.

1989-04-01

In a recent paper, we proposed a ``paronic'' field-theory framework for possible small deviations from the Pauli exclusion principle. This theory cannot be represented in a positive-metric (Hilbert) space. Nonetheless, the issue of possible small violations of the exclusion principle can be addressed in the framework of quantum mechanics, without being connected with a local quantum field theory. In this paper, we discuss the phenomenology of small violations of both Fermi and Bose statistics. We consider the implications of such violations in atomic, nuclear, particle, and condensed-matter physics and in astrophysics and cosmology. We also discuss experiments that can detect small violations of Fermi and Bose statistics or place stringent bounds on their validity.
The Development of a Professional Statistics Teaching Identity

ERIC Educational Resources Information Center

Whitaker, Douglas

2016-01-01

Motivated by the increased statistics expectations for students and their teachers because of the widespread adoption of the Common Core State Standards for Mathematics, this study explores exemplary, in-service statistics teachers' professional identities using a theoretical framework informed by Gee (2000) and communities of practice (Lave &…
An Effective Post-Filtering Framework for 3-D PET Image Denoising Based on Noise and Sensitivity Characteristics

NASA Astrophysics Data System (ADS)

Kim, Ji Hye; Ahn, Il Jun; Nam, Woo Hyun; Ra, Jong Beom

2015-02-01

Positron emission tomography (PET) images usually suffer from a noticeable amount of statistical noise. In order to reduce this noise, a post-filtering process is usually adopted. However, the performance of this approach is limited because the denoising process is mostly performed on the basis of the Gaussian random noise. It has been reported that in a PET image reconstructed by the expectation-maximization (EM), the noise variance of each voxel depends on its mean value, unlike in the case of Gaussian noise. In addition, we observe that the variance also varies with the spatial sensitivity distribution in a PET system, which reflects both the solid angle determined by a given scanner geometry and the attenuation information of a scanned object. Thus, if a post-filtering process based on the Gaussian random noise is applied to PET images without consideration of the noise characteristics along with the spatial sensitivity distribution, the spatially variant non-Gaussian noise cannot be reduced effectively. In the proposed framework, to effectively reduce the noise in PET images reconstructed by the 3-D ordinary Poisson ordered subset EM (3-D OP-OSEM), we first denormalize an image according to the sensitivity of each voxel so that the voxel mean value can represent its statistical properties reliably. Based on our observation that each noisy denormalized voxel has a linear relationship between the mean and variance, we try to convert this non-Gaussian noise image to a Gaussian noise image. We then apply a block matching 4-D algorithm that is optimized for noise reduction of the Gaussian noise image, and reconvert and renormalize the result to obtain a final denoised image. Using simulated phantom data and clinical patient data, we demonstrate that the proposed framework can effectively suppress the noise over the whole region of a PET image while minimizing degradation of the image resolution.
Dendritic tree extraction from noisy maximum intensity projection images in C. elegans.

PubMed

Greenblum, Ayala; Sznitman, Raphael; Fua, Pascal; Arratia, Paulo E; Oren, Meital; Podbilewicz, Benjamin; Sznitman, Josué

2014-06-12

Maximum Intensity Projections (MIP) of neuronal dendritic trees obtained from confocal microscopy are frequently used to study the relationship between tree morphology and mechanosensory function in the model organism C. elegans. Extracting dendritic trees from noisy images remains however a strenuous process that has traditionally relied on manual approaches. Here, we focus on automated and reliable 2D segmentations of dendritic trees following a statistical learning framework. Our dendritic tree extraction (DTE) method uses small amounts of labelled training data on MIPs to learn noise models of texture-based features from the responses of tree structures and image background. Our strategy lies in evaluating statistical models of noise that account for both the variability generated from the imaging process and from the aggregation of information in the MIP images. These noisy models are then used within a probabilistic, or Bayesian framework to provide a coarse 2D dendritic tree segmentation. Finally, some post-processing is applied to refine the segmentations and provide skeletonized trees using a morphological thinning process. Following a Leave-One-Out Cross Validation (LOOCV) method for an MIP databse with available "ground truth" images, we demonstrate that our approach provides significant improvements in tree-structure segmentations over traditional intensity-based methods. Improvements for MIPs under various imaging conditions are both qualitative and quantitative, as measured from Receiver Operator Characteristic (ROC) curves and the yield and error rates in the final segmentations. In a final step, we demonstrate our DTE approach on previously unseen MIP samples including the extraction of skeletonized structures, and compare our method to a state-of-the art dendritic tree tracing software. Overall, our DTE method allows for robust dendritic tree segmentations in noisy MIPs, outperforming traditional intensity-based methods. Such approach provides a useable segmentation framework, ultimately delivering a speed-up for dendritic tree identification on the user end and a reliable first step towards further morphological characterizations of tree arborization.
A 3D object-based model to simulate highly-heterogeneous, coarse, braided river deposits

NASA Astrophysics Data System (ADS)

Huber, E.; Huggenberger, P.; Caers, J.

2016-12-01

There is a critical need in hydrogeological modeling for geologically more realistic representation of the subsurface. Indeed, widely-used representations of the subsurface heterogeneity based on smooth basis functions such as cokriging or the pilot-point approach fail at reproducing the connectivity of high permeable geological structures that control subsurface solute transport. To realistically model the connectivity of high permeable structures of coarse, braided river deposits, multiple-point statistics and object-based models are promising alternatives. We therefore propose a new object-based model that, according to a sedimentological model, mimics the dominant processes of floodplain dynamics. Contrarily to existing models, this object-based model possesses the following properties: (1) it is consistent with field observations (outcrops, ground-penetrating radar data, etc.), (2) it allows different sedimentological dynamics to be modeled that result in different subsurface heterogeneity patterns, (3) it is light in memory and computationally fast, and (4) it can be conditioned to geophysical data. In this model, the main sedimentological elements (scour fills with open-framework-bimodal gravel cross-beds, gravel sheet deposits, open-framework and sand lenses) and their internal structures are described by geometrical objects. Several spatial distributions are proposed that allow to simulate the horizontal position of the objects on the floodplain as well as the net rate of sediment deposition. The model is grid-independent and any vertical section can be computed algebraically. Furthermore, model realizations can serve as training images for multiple-point statistics. The significance of this model is shown by its impact on the subsurface flow distribution that strongly depends on the sedimentological dynamics modeled. The code will be provided as a free and open-source R-package.
A Complementary Note to 'A Lag-1 Smoother Approach to System-Error Estimation': The Intrinsic Limitations of Residual Diagnostics

NASA Technical Reports Server (NTRS)

Todling, Ricardo

2015-01-01

Recently, this author studied an approach to the estimation of system error based on combining observation residuals derived from a sequential filter and fixed lag-1 smoother. While extending the methodology to a variational formulation, experimenting with simple models and making sure consistency was found between the sequential and variational formulations, the limitations of the residual-based approach came clearly to the surface. This note uses the sequential assimilation application to simple nonlinear dynamics to highlight the issue. Only when some of the underlying error statistics are assumed known is it possible to estimate the unknown component. In general, when considerable uncertainties exist in the underlying statistics as a whole, attempts to obtain separate estimates of the various error covariances are bound to lead to misrepresentation of errors. The conclusions are particularly relevant to present-day attempts to estimate observation-error correlations from observation residual statistics. A brief illustration of the issue is also provided by comparing estimates of error correlations derived from a quasi-operational assimilation system and a corresponding Observing System Simulation Experiments framework.
Maternal parental self-efficacy in the postpartum period.

PubMed

Leahy-Warren, Patricia; McCarthy, Geraldine

2011-12-01

To present an integrated literature review on maternal parental self-efficacy (MPSE) in the postpartum period. A literature search of CINAHL with full text and MEDLINE and PsycINFO from their start dates to February 2010. Inclusion criteria were English written research articles which reported the measurement of MPSE in the postpartum period. Articles were reviewed based on purpose, theoretical framework, data collection method, sample, main findings and nursing implications for maternal parenting. In addition, data related to the instruments that were used to measure MPSE were included. Data revealed is a statistically significant increase in MPSE over time from baseline; a positive relationship between MPSE and number of children, social support, maternal parenting satisfaction and marital satisfaction; and a negative relationship between MPSE and maternal stress, anxiety and postpartum depression. A variety of instruments to measure MPSE were used but the majority were based on Bandura's framework. Findings from this review may assist women's health researchers and clinical nurses/midwives in assessing and developing appropriate interventions for increasing risk awareness, enhancing MPSE and subsequent satisfaction with parenting and emotional well-being. Further research is necessary underpinned by theoretical frameworks using domain-specific instruments to identify predictors of MPSE. Copyright © 2010 Elsevier Ltd. All rights reserved.
From Cues to Nudge: A Knowledge-Based Framework for Surveillance of Healthcare-Associated Infections.

PubMed

Shaban-Nejad, Arash; Mamiya, Hiroshi; Riazanov, Alexandre; Forster, Alan J; Baker, Christopher J O; Tamblyn, Robyn; Buckeridge, David L

2016-01-01

We propose an integrated semantic web framework consisting of formal ontologies, web services, a reasoner and a rule engine that together recommend appropriate level of patient-care based on the defined semantic rules and guidelines. The classification of healthcare-associated infections within the HAIKU (Hospital Acquired Infections - Knowledge in Use) framework enables hospitals to consistently follow the standards along with their routine clinical practice and diagnosis coding to improve quality of care and patient safety. The HAI ontology (HAIO) groups over thousands of codes into a consistent hierarchy of concepts, along with relationships and axioms to capture knowledge on hospital-associated infections and complications with focus on the big four types, surgical site infections (SSIs), catheter-associated urinary tract infection (CAUTI); hospital-acquired pneumonia, and blood stream infection. By employing statistical inferencing in our study we use a set of heuristics to define the rule axioms to improve the SSI case detection. We also demonstrate how the occurrence of an SSI is identified using semantic e-triggers. The e-triggers will be used to improve our risk assessment of post-operative surgical site infections (SSIs) for patients undergoing certain type of surgeries (e.g., coronary artery bypass graft surgery (CABG)).
Robust Dehaze Algorithm for Degraded Image of CMOS Image Sensors.

PubMed

Qu, Chen; Bi, Du-Yan; Sui, Ping; Chao, Ai-Nong; Wang, Yun-Fei

2017-09-22

The CMOS (Complementary Metal-Oxide-Semiconductor) is a new type of solid image sensor device widely used in object tracking, object recognition, intelligent navigation fields, and so on. However, images captured by outdoor CMOS sensor devices are usually affected by suspended atmospheric particles (such as haze), causing a reduction in image contrast, color distortion problems, and so on. In view of this, we propose a novel dehazing approach based on a local consistent Markov random field (MRF) framework. The neighboring clique in traditional MRF is extended to the non-neighboring clique, which is defined on local consistent blocks based on two clues, where both the atmospheric light and transmission map satisfy the character of local consistency. In this framework, our model can strengthen the restriction of the whole image while incorporating more sophisticated statistical priors, resulting in more expressive power of modeling, thus, solving inadequate detail recovery effectively and alleviating color distortion. Moreover, the local consistent MRF framework can obtain details while maintaining better results for dehazing, which effectively improves the image quality captured by the CMOS image sensor. Experimental results verified that the method proposed has the combined advantages of detail recovery and color preservation.
Using a service sector segmented approach to identify community stakeholders who can improve access to suicide prevention services for veterans.

PubMed

Matthieu, Monica M; Gardiner, Giovanina; Ziegemeier, Ellen; Buxton, Miranda

2014-04-01

Veterans in need of social services may access many different community agencies within the public and private sectors. Each of these settings has the potential to be a pipeline for attaining needed health, mental health, and benefits services; however, many service providers lack information on how to conceptualize where Veterans go for services within their local community. This article describes a conceptual framework for outreach that uses a service sector segmented approach. This framework was developed to aid recruitment of a provider-based sample of stakeholders (N = 70) for a study on improving access to the Department of Veterans Affairs and community-based suicide prevention services. Results indicate that although there are statistically significant differences in the percent of Veterans served by the different service sectors (F(9, 55) = 2.71, p = 0.04), exposure to suicidal Veterans and providers' referral behavior is consistent across the sectors. Challenges to using this framework include isolating the appropriate sectors for targeted outreach efforts. The service sector segmented approach holds promise for identifying and referring at-risk Veterans in need of services. Reprint & Copyright © 2014 Association of Military Surgeons of the U.S.
Extending information retrieval methods to personalized genomic-based studies of disease.

PubMed

Ye, Shuyun; Dawson, John A; Kendziorski, Christina

2014-01-01

Genomic-based studies of disease now involve diverse types of data collected on large groups of patients. A major challenge facing statistical scientists is how best to combine the data, extract important features, and comprehensively characterize the ways in which they affect an individual's disease course and likelihood of response to treatment. We have developed a survival-supervised latent Dirichlet allocation (survLDA) modeling framework to address these challenges. Latent Dirichlet allocation (LDA) models have proven extremely effective at identifying themes common across large collections of text, but applications to genomics have been limited. Our framework extends LDA to the genome by considering each patient as a "document" with "text" detailing his/her clinical events and genomic state. We then further extend the framework to allow for supervision by a time-to-event response. The model enables the efficient identification of collections of clinical and genomic features that co-occur within patient subgroups, and then characterizes each patient by those features. An application of survLDA to The Cancer Genome Atlas ovarian project identifies informative patient subgroups showing differential response to treatment, and validation in an independent cohort demonstrates the potential for patient-specific inference.
Framework for making better predictions by directly estimating variables’ predictivity

PubMed Central

Chernoff, Herman; Lo, Shaw-Hwa

2016-01-01

We propose approaching prediction from a framework grounded in the theoretical correct prediction rate of a variable set as a parameter of interest. This framework allows us to define a measure of predictivity that enables assessing variable sets for, preferably high, predictivity. We first define the prediction rate for a variable set and consider, and ultimately reject, the naive estimator, a statistic based on the observed sample data, due to its inflated bias for moderate sample size and its sensitivity to noisy useless variables. We demonstrate that the I-score of the PR method of VS yields a relatively unbiased estimate of a parameter that is not sensitive to noisy variables and is a lower bound to the parameter of interest. Thus, the PR method using the I-score provides an effective approach to selecting highly predictive variables. We offer simulations and an application of the I-score on real data to demonstrate the statistic’s predictive performance on sample data. We conjecture that using the partition retention and I-score can aid in finding variable sets with promising prediction rates; however, further research in the avenue of sample-based measures of predictivity is much desired. PMID:27911830

A LES-based Eulerian-Lagrangian approach to predict the dynamics of bubble plumes

NASA Astrophysics Data System (ADS)

Fraga, Bruño; Stoesser, Thorsten; Lai, Chris C. K.; Socolofsky, Scott A.

2016-01-01

An approach for Eulerian-Lagrangian large-eddy simulation of bubble plume dynamics is presented and its performance evaluated. The main numerical novelties consist in defining the gas-liquid coupling based on the bubble size to mesh resolution ratio (Dp/Δx) and the interpolation between Eulerian and Lagrangian frameworks through the use of delta functions. The model's performance is thoroughly validated for a bubble plume in a cubic tank in initially quiescent water using experimental data obtained from high-resolution ADV and PIV measurements. The predicted time-averaged velocities and second-order statistics show good agreement with the measurements, including the reproduction of the anisotropic nature of the plume's turbulence. Further, the predicted Eulerian and Lagrangian velocity fields, second-order turbulence statistics and interfacial gas-liquid forces are quantified and discussed as well as the visualization of the time-averaged primary and secondary flow structure in the tank.
Precision Cosmology: The First Half Million Years

NASA Astrophysics Data System (ADS)

Jones, Bernard J. T.

2017-06-01

Cosmology seeks to characterise our Universe in terms of models based on well-understood and tested physics. Today we know our Universe with a precision that once would have been unthinkable. This book develops the entire mathematical, physical and statistical framework within which this has been achieved. It tells the story of how we arrive at our profound conclusions, starting from the early twentieth century and following developments up to the latest data analysis of big astronomical datasets. It provides an enlightening description of the mathematical, physical and statistical basis for understanding and interpreting the results of key space- and ground-based data. Subjects covered include general relativity, cosmological models, the inhomogeneous Universe, physics of the cosmic background radiation, and methods and results of data analysis. Extensive online supplementary notes, exercises, teaching materials, and exercises in Python make this the perfect companion for researchers, teachers and students in physics, mathematics, and astrophysics.
regioneR: an R/Bioconductor package for the association analysis of genomic regions based on permutation tests.

PubMed

Gel, Bernat; Díez-Villanueva, Anna; Serra, Eduard; Buschbeck, Marcus; Peinado, Miguel A; Malinverni, Roberto

2016-01-15

Statistically assessing the relation between a set of genomic regions and other genomic features is a common challenging task in genomic and epigenomic analyses. Randomization based approaches implicitly take into account the complexity of the genome without the need of assuming an underlying statistical model. regioneR is an R package that implements a permutation test framework specifically designed to work with genomic regions. In addition to the predefined randomization and evaluation strategies, regioneR is fully customizable allowing the use of custom strategies to adapt it to specific questions. Finally, it also implements a novel function to evaluate the local specificity of the detected association. regioneR is an R package released under Artistic-2.0 License. The source code and documents are freely available through Bioconductor (http://www.bioconductor.org/packages/regioneR). rmalinverni@carrerasresearch.org. © The Author 2015. Published by Oxford University Press.
Extracting Association Patterns in Network Communications

PubMed Central

Portela, Javier; Villalba, Luis Javier García; Trujillo, Alejandra Guadalupe Silva; Orozco, Ana Lucila Sandoval; Kim, Tai-hoon

2015-01-01

In network communications, mixes provide protection against observers hiding the appearance of messages, patterns, length and links between senders and receivers. Statistical disclosure attacks aim to reveal the identity of senders and receivers in a communication network setting when it is protected by standard techniques based on mixes. This work aims to develop a global statistical disclosure attack to detect relationships between users. The only information used by the attacker is the number of messages sent and received by each user for each round, the batch of messages grouped by the anonymity system. A new modeling framework based on contingency tables is used. The assumptions are more flexible than those used in the literature, allowing to apply the method to multiple situations automatically, such as email data or social networks data. A classification scheme based on combinatoric solutions of the space of rounds retrieved is developed. Solutions about relationships between users are provided for all pairs of users simultaneously, since the dependence of the data retrieved needs to be addressed in a global sense. PMID:25679311
Extracting association patterns in network communications.

PubMed

Portela, Javier; Villalba, Luis Javier García; Trujillo, Alejandra Guadalupe Silva; Orozco, Ana Lucila Sandoval; Kim, Tai-hoon

2015-02-11

In network communications, mixes provide protection against observers hiding the appearance of messages, patterns, length and links between senders and receivers. Statistical disclosure attacks aim to reveal the identity of senders and receivers in a communication network setting when it is protected by standard techniques based on mixes. This work aims to develop a global statistical disclosure attack to detect relationships between users. The only information used by the attacker is the number of messages sent and received by each user for each round, the batch of messages grouped by the anonymity system. A new modeling framework based on contingency tables is used. The assumptions are more flexible than those used in the literature, allowing to apply the method to multiple situations automatically, such as email data or social networks data. A classification scheme based on combinatoric solutions of the space of rounds retrieved is developed. Solutions about relationships between users are provided for all pairs of users simultaneously, since the dependence of the data retrieved needs to be addressed in a global sense.
Impact of a community-based prevention marketing intervention to promote physical activity among middle-aged women.

PubMed

Sharpe, Patricia A; Burroughs, Ericka L; Granner, Michelle L; Wilcox, Sara; Hutto, Brent E; Bryant, Carol A; Peck, Lara; Pekuri, Linda

2010-06-01

A physical activity intervention applied principles of community-based participatory research, the community-based prevention marketing framework, and social cognitive theory. A nonrandomized design included women ages 35 to 54 in the southeastern United States. Women (n = 430 preprogram, n = 217 postprogram) enrolled in a 24-week behavioral intervention and were exposed to a media campaign. They were compared to cross-sectional survey samples at pre- (n = 245) and postprogram (n = 820) from the media exposed county and a no-intervention county (n = 234 pre, n = 822 post). Women in the behavioral intervention had statistically significant positive changes on physical activity minutes, walking, park and trail use, knowledge of mapped routes and exercise partner, and negative change on exercise self-efficacy. Media exposed women had statistically significant pre- to postprogram differences on knowledge of mapped routes. No-intervention women had significant pre- to postprogram differences on physical activity minutes, walking, and knowledge of mapped routes.
Basis function models for animal movement

USGS Publications Warehouse

Hooten, Mevin B.; Johnson, Devin S.

2017-01-01

Advances in satellite-based data collection techniques have served as a catalyst for new statistical methodology to analyze these data. In wildlife ecological studies, satellite-based data and methodology have provided a wealth of information about animal space use and the investigation of individual-based animal–environment relationships. With the technology for data collection improving dramatically over time, we are left with massive archives of historical animal telemetry data of varying quality. While many contemporary statistical approaches for inferring movement behavior are specified in discrete time, we develop a flexible continuous-time stochastic integral equation framework that is amenable to reduced-rank second-order covariance parameterizations. We demonstrate how the associated first-order basis functions can be constructed to mimic behavioral characteristics in realistic trajectory processes using telemetry data from mule deer and mountain lion individuals in western North America. Our approach is parallelizable and provides inference for heterogenous trajectories using nonstationary spatial modeling techniques that are feasible for large telemetry datasets. Supplementary materials for this article are available online.
Evidence of non-extensivity and complexity in the seismicity observed during 2011-2012 at the Santorini volcanic complex, Greece

NASA Astrophysics Data System (ADS)

Vallianatos, F.; Tzanis, A.; Michas, G.; Papadakis, G.

2012-04-01

Since the middle of summer 2011, an increase in the seismicity rates of the volcanic complex system of Santorini Island, Greece, was observed. In the present work, the temporal distribution of seismicity, as well as the magnitude distribution of earthquakes, have been studied using the concept of Non-Extensive Statistical Physics (NESP; Tsallis, 2009) along with the evolution of Shanon entropy H (also called information entropy). The analysis is based on the earthquake catalogue of the Geodynamic Institute of the National Observatory of Athens for the period July 2011-January 2012 (http://www.gein.noa.gr/). Non-Extensive Statistical Physics, which is a generalization of Boltzmann-Gibbs statistical physics, seems a suitable framework for studying complex systems. The observed distributions of seismicity rates at Santorini can be described (fitted) with NESP models to exceptionally well. This implies the inherent complexity of the Santorini volcanic seismicity, the applicability of NESP concepts to volcanic earthquake activity and the usefulness of NESP in investigating phenomena exhibiting multifractality and long-range coupling effects. Acknowledgments. This work was supported in part by the THALES Program of the Ministry of Education of Greece and the European Union in the framework of the project entitled "Integrated understanding of Seismicity, using innovative Methodologies of Fracture mechanics along with Earthquake and non extensive statistical physics - Application to the geodynamic system of the Hellenic Arc. SEISMO FEAR HELLARC". GM and GP wish to acknowledge the partial support of the Greek State Scholarships Foundation (ΙΚΥ).
A complete-pelvis segmentation framework for image-free total hip arthroplasty (THA): methodology and clinical study.

PubMed

Xie, Weiguo; Franke, Jochen; Chen, Cheng; Grützner, Paul A; Schumann, Steffen; Nolte, Lutz-P; Zheng, Guoyan

2015-06-01

Complete-pelvis segmentation in antero-posterior pelvic radiographs is required to create a patient-specific three-dimensional pelvis model for surgical planning and postoperative assessment in image-free navigation of total hip arthroplasty. A fast and robust framework for accurately segmenting the complete pelvis is presented, consisting of two consecutive modules. In the first module, a three-stage method was developed to delineate the left hemi-pelvis based on statistical appearance and shape models. To handle complex pelvic structures, anatomy-specific information processing techniques were employed. As the input to the second module, the delineated left hemi-pelvis was then reflected about an estimated symmetry line of the radiograph to initialize the right hemi-pelvis segmentation. The right hemi-pelvis was segmented by the same three-stage method, Two experiments conducted on respectively 143 and 40 AP radiographs demonstrated a mean segmentation accuracy of 1.61±0.68 mm. A clinical study to investigate the postoperative assessment of acetabular cup orientations based on the proposed framework revealed an average accuracy of 1.2°±0.9° and 1.6°±1.4° for anteversion and inclination, respectively. Delineation of each radiograph costs less than one minute. Despite further validation needed, the preliminary results implied the underlying clinical applicability of the proposed framework for image-free THA. Copyright © 2014 John Wiley & Sons, Ltd.
A Unified Theoretical Framework for Cognitive Sequencing.

PubMed

Savalia, Tejas; Shukla, Anuj; Bapi, Raju S

2016-01-01

The capacity to sequence information is central to human performance. Sequencing ability forms the foundation stone for higher order cognition related to language and goal-directed planning. Information related to the order of items, their timing, chunking and hierarchical organization are important aspects in sequencing. Past research on sequencing has emphasized two distinct and independent dichotomies: implicit vs. explicit and goal-directed vs. habits. We propose a theoretical framework unifying these two streams. Our proposal relies on brain's ability to implicitly extract statistical regularities from the stream of stimuli and with attentional engagement organizing sequences explicitly and hierarchically. Similarly, sequences that need to be assembled purposively to accomplish a goal require engagement of attentional processes. With repetition, these goal-directed plans become habits with concomitant disengagement of attention. Thus, attention and awareness play a crucial role in the implicit-to-explicit transition as well as in how goal-directed plans become automatic habits. Cortico-subcortical loops basal ganglia-frontal cortex and hippocampus-frontal cortex loops mediate the transition process. We show how the computational principles of model-free and model-based learning paradigms, along with a pivotal role for attention and awareness, offer a unifying framework for these two dichotomies. Based on this framework, we make testable predictions related to the potential influence of response-to-stimulus interval (RSI) on developing awareness in implicit learning tasks.
A Unified Theoretical Framework for Cognitive Sequencing

PubMed Central

Savalia, Tejas; Shukla, Anuj; Bapi, Raju S.

2016-01-01

The capacity to sequence information is central to human performance. Sequencing ability forms the foundation stone for higher order cognition related to language and goal-directed planning. Information related to the order of items, their timing, chunking and hierarchical organization are important aspects in sequencing. Past research on sequencing has emphasized two distinct and independent dichotomies: implicit vs. explicit and goal-directed vs. habits. We propose a theoretical framework unifying these two streams. Our proposal relies on brain's ability to implicitly extract statistical regularities from the stream of stimuli and with attentional engagement organizing sequences explicitly and hierarchically. Similarly, sequences that need to be assembled purposively to accomplish a goal require engagement of attentional processes. With repetition, these goal-directed plans become habits with concomitant disengagement of attention. Thus, attention and awareness play a crucial role in the implicit-to-explicit transition as well as in how goal-directed plans become automatic habits. Cortico-subcortical loops basal ganglia-frontal cortex and hippocampus-frontal cortex loops mediate the transition process. We show how the computational principles of model-free and model-based learning paradigms, along with a pivotal role for attention and awareness, offer a unifying framework for these two dichotomies. Based on this framework, we make testable predictions related to the potential influence of response-to-stimulus interval (RSI) on developing awareness in implicit learning tasks. PMID:27917146
The A, C, G, and T of Genome Assembly

PubMed Central

Wajid, Bilal; Sohail, Muhammad U.; Ekti, Ali R.; Serpedin, Erchin

2016-01-01

Genome assembly in its two decades of history has produced significant research, in terms of both biotechnology and computational biology. This contribution delineates sequencing platforms and their characteristics, examines key steps involved in filtering and processing raw data, explains assembly frameworks, and discusses quality statistics for the assessment of the assembled sequence. Furthermore, the paper explores recent Ubuntu-based software environments oriented towards genome assembly as well as some avenues for future research. PMID:27247941
Social significance of community structure: Statistical view

NASA Astrophysics Data System (ADS)

Li, Hui-Jia; Daniels, Jasmine J.

2015-01-01

Community structure analysis is a powerful tool for social networks that can simplify their topological and functional analysis considerably. However, since community detection methods have random factors and real social networks obtained from complex systems always contain error edges, evaluating the significance of a partitioned community structure is an urgent and important question. In this paper, integrating the specific characteristics of real society, we present a framework to analyze the significance of a social community. The dynamics of social interactions are modeled by identifying social leaders and corresponding hierarchical structures. Instead of a direct comparison with the average outcome of a random model, we compute the similarity of a given node with the leader by the number of common neighbors. To determine the membership vector, an efficient community detection algorithm is proposed based on the position of the nodes and their corresponding leaders. Then, using a log-likelihood score, the tightness of the community can be derived. Based on the distribution of community tightness, we establish a connection between p -value theory and network analysis, and then we obtain a significance measure of statistical form . Finally, the framework is applied to both benchmark networks and real social networks. Experimental results show that our work can be used in many fields, such as determining the optimal number of communities, analyzing the social significance of a given community, comparing the performance among various algorithms, etc.
Sub-grid scale models for discontinuous Galerkin methods based on the Mori-Zwanzig formalism

NASA Astrophysics Data System (ADS)

Parish, Eric; Duraisamy, Karthk

2017-11-01

The optimal prediction framework of Chorin et al., which is a reformulation of the Mori-Zwanzig (M-Z) formalism of non-equilibrium statistical mechanics, provides a framework for the development of mathematically-derived closure models. The M-Z formalism provides a methodology to reformulate a high-dimensional Markovian dynamical system as a lower-dimensional, non-Markovian (non-local) system. In this lower-dimensional system, the effects of the unresolved scales on the resolved scales are non-local and appear as a convolution integral. The non-Markovian system is an exact statement of the original dynamics and is used as a starting point for model development. In this work, we investigate the development of M-Z-based closures model within the context of the Variational Multiscale Method (VMS). The method relies on a decomposition of the solution space into two orthogonal subspaces. The impact of the unresolved subspace on the resolved subspace is shown to be non-local in time and is modeled through the M-Z-formalism. The models are applied to hierarchical discontinuous Galerkin discretizations. Commonalities between the M-Z closures and conventional flux schemes are explored. This work was supported in part by AFOSR under the project ''LES Modeling of Non-local effects using Statistical Coarse-graining'' with Dr. Jean-Luc Cambier as the technical monitor.
Using the ECD Framework to Support Evidentiary Reasoning in the Context of a Simulation Study for Detecting Learner Differences in Epistemic Games

ERIC Educational Resources Information Center

Sweet, Shauna J.; Rupp, Andre A.

2012-01-01

The "evidence-centered design" (ECD) framework is a powerful tool that supports careful and critical thinking about the identification and accumulation of evidence in assessment contexts. In this paper, we demonstrate how the ECD framework provides critical support for designing simulation studies to investigate statistical methods…
Climate Projections from the NARCliM Project: Bayesian Model Averaging of Maximum Temperature Projections

NASA Astrophysics Data System (ADS)

Olson, R.; Evans, J. P.; Fan, Y.

2015-12-01

NARCliM (NSW/ACT Regional Climate Modelling Project) is a regional climate project for Australia and the surrounding region. It dynamically downscales 4 General Circulation Models (GCMs) using three Regional Climate Models (RCMs) to provide climate projections for the CORDEX-AustralAsia region at 50 km resolution, and for south-east Australia at 10 km resolution. The project differs from previous work in the level of sophistication of model selection. Specifically, the selection process for GCMs included (i) conducting literature review to evaluate model performance, (ii) analysing model independence, and (iii) selecting models that span future temperature and precipitation change space. RCMs for downscaling the GCMs were chosen based on their performance for several precipitation events over South-East Australia, and on model independence.Bayesian Model Averaging (BMA) provides a statistically consistent framework for weighing the models based on their likelihood given the available observations. These weights are used to provide probability distribution functions (pdfs) for model projections. We develop a BMA framework for constructing probabilistic climate projections for spatially-averaged variables from the NARCliM project. The first step in the procedure is smoothing model output in order to exclude the influence of internal climate variability. Our statistical model for model-observations residuals is a homoskedastic iid process. Comparing RCMs with Australian Water Availability Project (AWAP) observations is used to determine model weights through Monte Carlo integration. Posterior pdfs of statistical parameters of model-data residuals are obtained using Markov Chain Monte Carlo. The uncertainty in the properties of the model-data residuals is fully accounted for when constructing the projections. We present the preliminary results of the BMA analysis for yearly maximum temperature for New South Wales state planning regions for the period 2060-2079.
A new statistical framework to assess structural alignment quality using information compression

PubMed Central

Collier, James H.; Allison, Lloyd; Lesk, Arthur M.; Garcia de la Banda, Maria; Konagurthu, Arun S.

2014-01-01

Motivation: Progress in protein biology depends on the reliability of results from a handful of computational techniques, structural alignments being one. Recent reviews have highlighted substantial inconsistencies and differences between alignment results generated by the ever-growing stock of structural alignment programs. The lack of consensus on how the quality of structural alignments must be assessed has been identified as the main cause for the observed differences. Current methods assess structural alignment quality by constructing a scoring function that attempts to balance conflicting criteria, mainly alignment coverage and fidelity of structures under superposition. This traditional approach to measuring alignment quality, the subject of considerable literature, has failed to solve the problem. Further development along the same lines is unlikely to rectify the current deficiencies in the field. Results: This paper proposes a new statistical framework to assess structural alignment quality and significance based on lossless information compression. This is a radical departure from the traditional approach of formulating scoring functions. It links the structural alignment problem to the general class of statistical inductive inference problems, solved using the information-theoretic criterion of minimum message length. Based on this, we developed an efficient and reliable measure of structural alignment quality, I-value. The performance of I-value is demonstrated in comparison with a number of popular scoring functions, on a large collection of competing alignments. Our analysis shows that I-value provides a rigorous and reliable quantification of structural alignment quality, addressing a major gap in the field. Availability: http://lcb.infotech.monash.edu.au/I-value Contact: arun.konagurthu@monash.edu Supplementary information: Online supplementary data are available at http://lcb.infotech.monash.edu.au/I-value/suppl.html PMID:25161241
Influence of laser-welding and electroerosion on passive fit of implant-supported prosthesis.

PubMed

Silva, Tatiana Bernardon; De Arruda Nobilo, Mauro Antonio; Pessanha Henriques, Guilherme Elias; Mesquita, Marcelo Ferraz; Guimaraes, Magali Beck

2008-01-01

This study investigated the influence of laser welding and electroerosion procedure on the passive fit of interim fixed implant-supported titanium frameworks. Twenty frameworks were made from a master model, with five parallel placed implants in the inter foramen region, and cast in commercially pure titanium. The frameworks were divided into 4 groups: 10 samples were tested before (G1) and after (G2) electroerosion application; and another 10 were sectioned into five pieces and laser welded before (G3) and after (G4) electroerosion application. The passive fit between the UCLA abutment of the framework and the implant was evaluated using an optical microscope Olympus STM (Olympus Optical Co., Tokyo, Japan) with 0.0005mm of accuracy. Statistical analyses showed significant differences between G1 and G2, G1 and G3, G1 and G4, G2 and G4. However, no statistical difference was observed when comparing G2 and G3. These results indicate that frameworks may show a more precise adaptation if they are sectioned and laser welded. In the same way, electroerosion improves the precision in the framework adaptation.
Journal selection decisions: a biomedical library operations research model. I. The framework.

PubMed Central

Kraft, D H; Polacsek, R A; Soergel, L; Burns, K; Klair, A

1976-01-01

The problem of deciding which journal titles to select for acquisition in a biomedical library is modeled. The approach taken is based on cost/benefit ratios. Measures of journal worth, methods of data collection, and journal cost data are considered. The emphasis is on the development of a practical process for selecting journal titles, based on the objectivity and rationality of the model; and on the collection of the approprate data and library statistics in a reasonable manner. The implications of this process towards an overall management information system (MIS) for biomedical serials handling are discussed. PMID:820391
Workflow based framework for life science informatics.

PubMed

Tiwari, Abhishek; Sekhar, Arvind K T

2007-10-01

Workflow technology is a generic mechanism to integrate diverse types of available resources (databases, servers, software applications and different services) which facilitate knowledge exchange within traditionally divergent fields such as molecular biology, clinical research, computational science, physics, chemistry and statistics. Researchers can easily incorporate and access diverse, distributed tools and data to develop their own research protocols for scientific analysis. Application of workflow technology has been reported in areas like drug discovery, genomics, large-scale gene expression analysis, proteomics, and system biology. In this article, we have discussed the existing workflow systems and the trends in applications of workflow based systems.

Quantum generalized observables framework for psychological data: a case of preference reversals in US elections

NASA Astrophysics Data System (ADS)

Khrennikova, Polina; Haven, Emmanuel

2017-10-01

Politics is regarded as a vital area of public choice theory, and it is strongly relying on the assumptions of voters' rationality and as such, stability of preferences. However, recent opinion polls and real election outcomes in the USA have shown that voters often engage in `ticket splitting', by exhibiting contrasting party support in Congressional and Presidential elections (cf. Khrennikova 2014 Phys. Scripta T163, 014010 (doi:10.1088/0031-8949/2014/T163/014010); Khrennikova & Haven 2016 Phil. Trans. R. Soc. A 374, 20150106 (doi:10.1098/rsta.2015.0106); Smith et al. 1999 Am. J. Polit. Sci. 43, 737-764 (doi:10.2307/2991833)). Such types of preference reversals cannot be mathematically captured via the formula of total probability, thus showing that voters' decision making is at variance with the classical probabilistic information processing framework. In recent work, we have shown that quantum probability describes well the violation of Bayesian rationality in statistical data of voting in US elections, through the so-called interference effects of probability amplitudes. This paper is proposing a novel generalized observables framework of voting behaviour, by using the statistical data collected and analysed in previous studies by Khrennikova (Khrennikova 2015 Lect. Notes Comput. Sci. 8951, 196-209) and Khrennikova & Haven (Khrennikova & Haven 2016 Phil. Trans. R. Soc. A 374, 20150106 (doi:10.1098/rsta.2015.0106)). This framework aims to overcome the main problems associated with the quantum probabilistic representation of psychological data, namely the non-double stochasticity of transition probability matrices. We develop a simplified construction of generalized positive operator valued measures by formulating special non-orthonormal bases with respect to these operators. This article is part of the themed issue `Second quantum revolution: foundational questions'.
An order statistics approach to the halo model for galaxies

NASA Astrophysics Data System (ADS)

Paul, Niladri; Paranjape, Aseem; Sheth, Ravi K.

2017-04-01

We use the halo model to explore the implications of assuming that galaxy luminosities in groups are randomly drawn from an underlying luminosity function. We show that even the simplest of such order statistics models - one in which this luminosity function p(L) is universal - naturally produces a number of features associated with previous analyses based on the 'central plus Poisson satellites' hypothesis. These include the monotonic relation of mean central luminosity with halo mass, the lognormal distribution around this mean and the tight relation between the central and satellite mass scales. In stark contrast to observations of galaxy clustering; however, this model predicts no luminosity dependence of large-scale clustering. We then show that an extended version of this model, based on the order statistics of a halo mass dependent luminosity function p(L|m), is in much better agreement with the clustering data as well as satellite luminosities, but systematically underpredicts central luminosities. This brings into focus the idea that central galaxies constitute a distinct population that is affected by different physical processes than are the satellites. We model this physical difference as a statistical brightening of the central luminosities, over and above the order statistics prediction. The magnitude gap between the brightest and second brightest group galaxy is predicted as a by-product, and is also in good agreement with observations. We propose that this order statistics framework provides a useful language in which to compare the halo model for galaxies with more physically motivated galaxy formation models.
A Framework for Performing Multiscale Stochastic Progressive Failure Analysis of Composite Structures

NASA Technical Reports Server (NTRS)

Bednarcyk, Brett A.; Arnold, Steven M.

2006-01-01

A framework is presented that enables coupled multiscale analysis of composite structures. The recently developed, free, Finite Element Analysis - Micromechanics Analysis Code (FEAMAC) software couples the Micromechanics Analysis Code with Generalized Method of Cells (MAC/GMC) with ABAQUS to perform micromechanics based FEA such that the nonlinear composite material response at each integration point is modeled at each increment by MAC/GMC. As a result, the stochastic nature of fiber breakage in composites can be simulated through incorporation of an appropriate damage and failure model that operates within MAC/GMC on the level of the fiber. Results are presented for the progressive failure analysis of a titanium matrix composite tensile specimen that illustrate the power and utility of the framework and address the techniques needed to model the statistical nature of the problem properly. In particular, it is shown that incorporating fiber strength randomness on multiple scales improves the quality of the simulation by enabling failure at locations other than those associated with structural level stress risers.
A new framework for interactive quality assessment with application to light field coding

NASA Astrophysics Data System (ADS)

Viola, Irene; Ebrahimi, Touradj

2017-09-01

In recent years, light field has experienced a surge of popularity, mainly due to the recent advances in acquisition and rendering technologies that have made it more accessible to the public. Thanks to image-based rendering techniques, light field contents can be rendered in real time on common 2D screens, allowing virtual navigation through the captured scenes in an interactive fashion. However, this richer representation of the scene poses the problem of reliable quality assessments for light field contents. In particular, while subjective methodologies that enable interaction have already been proposed, no work has been done on assessing how users interact with light field contents. In this paper, we propose a new framework to subjectively assess the quality of light field contents in an interactive manner and simultaneously track users behaviour. The framework is successfully used to perform subjective assessment of two coding solutions. Moreover, statistical analysis performed on the results shows interesting correlation between subjective scores and average interaction time.
A Framework for Performing Multiscale Stochastic Progressive Failure Analysis of Composite Structures

NASA Technical Reports Server (NTRS)

Bednarcyk, Brett A.; Arnold, Steven M.

2007-01-01

A framework is presented that enables coupled multiscale analysis of composite structures. The recently developed, free, Finite Element Analysis-Micromechanics Analysis Code (FEAMAC) software couples the Micromechanics Analysis Code with Generalized Method of Cells (MAC/GMC) with ABAQUS to perform micromechanics based FEA such that the nonlinear composite material response at each integration point is modeled at each increment by MAC/GMC. As a result, the stochastic nature of fiber breakage in composites can be simulated through incorporation of an appropriate damage and failure model that operates within MAC/GMC on the level of the fiber. Results are presented for the progressive failure analysis of a titanium matrix composite tensile specimen that illustrate the power and utility of the framework and address the techniques needed to model the statistical nature of the problem properly. In particular, it is shown that incorporating fiber strength randomness on multiple scales improves the quality of the simulation by enabling failure at locations other than those associated with structural level stress risers.
A novel framework for improvement of road accidents considering decision-making styles of drivers in a large metropolitan area.

PubMed

Azadeh, Ali; Zarrin, Mansour; Hamid, Mehdi

2016-02-01

Road accidents can be caused by different factors such as human factors. Quality of the decision-making process of drivers could have a considerable impact on preventing disasters. The main objective of this study is the analysis of factors affecting road accidents by considering the severity of accidents and decision-making styles of drivers. To this end, a novel framework is proposed based on data envelopment analysis (DEA) and statistical methods (SMs) to assess the factors affecting road accidents. In this study, for the first time, dominant decision-making styles of drivers with respect to severity of injuries are identified. To show the applicability of the proposed framework, this research employs actual data of more than 500 samples in Tehran, Iran. The empirical results indicate that the flexible decision style is the dominant style for both minor and severe levels of accident injuries. Copyright © 2015 Elsevier Ltd. All rights reserved.
Systematic and fully automated identification of protein sequence patterns.

PubMed

Hart, R K; Royyuru, A K; Stolovitzky, G; Califano, A

2000-01-01

We present an efficient algorithm to systematically and automatically identify patterns in protein sequence families. The procedure is based on the Splash deterministic pattern discovery algorithm and on a framework to assess the statistical significance of patterns. We demonstrate its application to the fully automated discovery of patterns in 974 PROSITE families (the complete subset of PROSITE families which are defined by patterns and contain DR records). Splash generates patterns with better specificity and undiminished sensitivity, or vice versa, in 28% of the families; identical statistics were obtained in 48% of the families, worse statistics in 15%, and mixed behavior in the remaining 9%. In about 75% of the cases, Splash patterns identify sequence sites that overlap more than 50% with the corresponding PROSITE pattern. The procedure is sufficiently rapid to enable its use for daily curation of existing motif and profile databases. Third, our results show that the statistical significance of discovered patterns correlates well with their biological significance. The trypsin subfamily of serine proteases is used to illustrate this method's ability to exhaustively discover all motifs in a family that are statistically and biologically significant. Finally, we discuss applications of sequence patterns to multiple sequence alignment and the training of more sensitive score-based motif models, akin to the procedure used by PSI-BLAST. All results are available at httpl//www.research.ibm.com/spat/.
metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis.

PubMed

Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti

2016-07-01

A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness.Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Code is available at https://github.com/aalto-ics-kepaco anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis

PubMed Central

Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J.; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T.; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti

2016-01-01

Motivation: A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. Results: We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Availability and implementation: Code is available at https://github.com/aalto-ics-kepaco Contacts: anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153689
Spatial Modeling for Resources Framework (SMRF): A modular framework for developing spatial forcing data for snow modeling in mountain basins

NASA Astrophysics Data System (ADS)

Havens, Scott; Marks, Danny; Kormos, Patrick; Hedrick, Andrew

2017-12-01

In the Western US and many mountainous regions of the world, critical water resources and climate conditions are difficult to monitor because the observation network is generally very sparse. The critical resource from the mountain snowpack is water flowing into streams and reservoirs that will provide for irrigation, flood control, power generation, and ecosystem services. Water supply forecasting in a rapidly changing climate has become increasingly difficult because of non-stationary conditions. In response, operational water supply managers have begun to move from statistical techniques towards the use of physically based models. As we begin to transition physically based models from research to operational use, we must address the most difficult and time-consuming aspect of model initiation: the need for robust methods to develop and distribute the input forcing data. In this paper, we present a new open source framework, the Spatial Modeling for Resources Framework (SMRF), which automates and simplifies the common forcing data distribution methods. It is computationally efficient and can be implemented for both research and operational applications. We present an example of how SMRF is able to generate all of the forcing data required to a run physically based snow model at 50-100 m resolution over regions of 1000-7000 km2. The approach has been successfully applied in real time and historical applications for both the Boise River Basin in Idaho, USA and the Tuolumne River Basin in California, USA. These applications use meteorological station measurements and numerical weather prediction model outputs as input. SMRF has significantly streamlined the modeling workflow, decreased model set up time from weeks to days, and made near real-time application of a physically based snow model possible.
Distributed decision-making in electric power system transmission maintenance scheduling using multi-agent systems (MAS)

NASA Astrophysics Data System (ADS)

Zhang, Zhong

In this work, motivated by the need to coordinate transmission maintenance scheduling among a multiplicity of self-interested entities in restructured power industry, a distributed decision support framework based on multiagent negotiation systems (MANS) is developed. An innovative risk-based transmission maintenance optimization procedure is introduced. Several models for linking condition monitoring information to the equipment's instantaneous failure probability are presented, which enable quantitative evaluation of the effectiveness of maintenance activities in terms of system cumulative risk reduction. Methodologies of statistical processing, equipment deterioration evaluation and time-dependent failure probability calculation are also described. A novel framework capable of facilitating distributed decision-making through multiagent negotiation is developed. A multiagent negotiation model is developed and illustrated that accounts for uncertainty and enables social rationality. Some issues of multiagent negotiation convergence and scalability are discussed. The relationships between agent-based negotiation and auction systems are also identified. A four-step MAS design methodology for constructing multiagent systems for power system applications is presented. A generic multiagent negotiation system, capable of inter-agent communication and distributed decision support through inter-agent negotiations, is implemented. A multiagent system framework for facilitating the automated integration of condition monitoring information and maintenance scheduling for power transformers is developed. Simulations of multiagent negotiation-based maintenance scheduling among several independent utilities are provided. It is shown to be a viable alternative solution paradigm to the traditional centralized optimization approach in today's deregulated environment. This multiagent system framework not only facilitates the decision-making among competing power system entities, but also provides a tool to use in studying competitive industry relative to monopolistic industry.
Statistical Research of Investment Development of Russian Regions

ERIC Educational Resources Information Center

Burtseva, Tatiana A.; Aleshnikova, Vera I.; Dubovik, Mayya V.; Naidenkova, Ksenya V.; Kovalchuk, Nadezda B.; Repetskaya, Natalia V.; Kuzmina, Oksana G.; Surkov, Anton A.; Bershadskaya, Olga I.; Smirennikova, Anna V.

2016-01-01

This article the article is concerned with a substantiation of procedures ensuring the implementation of statistical research and monitoring of investment development of the Russian regions, which would be pertinent for modern development of the state statistics. The aim of the study is to develop the methodological framework in order to estimate…
Long-term strategy for the statistical design of a forest health monitoring system

Treesearch

Hans T. Schreuder; Raymond L. Czaplewski

1993-01-01

A conceptual framework is given for a broad-scale survey of forest health that accomplishes three objectives: generate descriptive statistics; detect changes in such statistics; and simplify analytical inferences that identify, and possibly establish cause-effect relationships. Our paper discusses the development of sampling schemes to satisfy these three objectives,...
Using protein-protein interactions for refining gene networks estimated from microarray data by Bayesian networks.

PubMed

Nariai, N; Kim, S; Imoto, S; Miyano, S

2004-01-01

We propose a statistical method to estimate gene networks from DNA microarray data and protein-protein interactions. Because physical interactions between proteins or multiprotein complexes are likely to regulate biological processes, using only mRNA expression data is not sufficient for estimating a gene network accurately. Our method adds knowledge about protein-protein interactions to the estimation method of gene networks under a Bayesian statistical framework. In the estimated gene network, a protein complex is modeled as a virtual node based on principal component analysis. We show the effectiveness of the proposed method through the analysis of Saccharomyces cerevisiae cell cycle data. The proposed method improves the accuracy of the estimated gene networks, and successfully identifies some biological facts.
Robust estimation approach for blind denoising.

PubMed

Rabie, Tamer

2005-11-01

This work develops a new robust statistical framework for blind image denoising. Robust statistics addresses the problem of estimation when the idealized assumptions about a system are occasionally violated. The contaminating noise in an image is considered as a violation of the assumption of spatial coherence of the image intensities and is treated as an outlier random variable. A denoised image is estimated by fitting a spatially coherent stationary image model to the available noisy data using a robust estimator-based regression method within an optimal-size adaptive window. The robust formulation aims at eliminating the noise outliers while preserving the edge structures in the restored image. Several examples demonstrating the effectiveness of this robust denoising technique are reported and a comparison with other standard denoising filters is presented.
Statistical Learning Analysis in Neuroscience: Aiming for Transparency

PubMed Central

Hanke, Michael; Halchenko, Yaroslav O.; Haxby, James V.; Pollmann, Stefan

2009-01-01

Encouraged by a rise of reciprocal interest between the machine learning and neuroscience communities, several recent studies have demonstrated the explanatory power of statistical learning techniques for the analysis of neural data. In order to facilitate a wider adoption of these methods, neuroscientific research needs to ensure a maximum of transparency to allow for comprehensive evaluation of the employed procedures. We argue that such transparency requires “neuroscience-aware” technology for the performance of multivariate pattern analyses of neural data that can be documented in a comprehensive, yet comprehensible way. Recently, we introduced PyMVPA, a specialized Python framework for machine learning based data analysis that addresses this demand. Here, we review its features and applicability to various neural data modalities. PMID:20582270
Memory Effects and Nonequilibrium Correlations in the Dynamics of Open Quantum Systems

NASA Astrophysics Data System (ADS)

Morozov, V. G.

2018-01-01

We propose a systematic approach to the dynamics of open quantum systems in the framework of Zubarev's nonequilibrium statistical operator method. The approach is based on the relation between ensemble means of the Hubbard operators and the matrix elements of the reduced statistical operator of an open quantum system. This key relation allows deriving master equations for open systems following a scheme conceptually identical to the scheme used to derive kinetic equations for distribution functions. The advantage of the proposed formalism is that some relevant dynamical correlations between an open system and its environment can be taken into account. To illustrate the method, we derive a non-Markovian master equation containing the contribution of nonequilibrium correlations associated with energy conservation.
Examining statewide capacity for school health and mental health promotion: a post hoc application of a district capacity-building framework.

PubMed

Maras, Melissa A; Weston, Karen J; Blacksmith, Jennifer; Brophy, Chelsey

2015-03-01

Schools must possess a variety of capacities to effectively support comprehensive and coordinated school health promotion activities, and researchers have developed a district-level capacity-building framework specific to school health promotion. State-level school health coalitions often support such capacity-building efforts and should embed this work within a data-based, decision-making model. However, there is a lack of guidance for state school health coalitions on how they should collect and use data. This article uses a district-level capacity-building framework to interpret findings from a statewide coordinated school health needs/resource assessment in order to examine statewide capacity for school health promotion. Participants included school personnel (N = 643) from one state. Descriptive statistics were calculated for survey items, with further examination of subgroup differences among school administrators and nurses. Results were then interpreted via a post hoc application of a district-level capacity-building framework. Findings across districts revealed statewide strengths and gaps with regard to leadership and management capacities, internal and external supports, and an indicator of global capacity. Findings support the utility of using a common framework across local and state levels to align efforts and embed capacity-building activities within a data-driven, continuous improvement model. © 2014 Society for Public Health Education.
A framework for streamlining research workflow in neuroscience and psychology

PubMed Central

Kubilius, Jonas

2014-01-01

Successful accumulation of knowledge is critically dependent on the ability to verify and replicate every part of scientific conduct. However, such principles are difficult to enact when researchers continue to resort on ad-hoc workflows and with poorly maintained code base. In this paper I examine the needs of neuroscience and psychology community, and introduce psychopy_ext, a unifying framework that seamlessly integrates popular experiment building, analysis and manuscript preparation tools by choosing reasonable defaults and implementing relatively rigid patterns of workflow. This structure allows for automation of multiple tasks, such as generated user interfaces, unit testing, control analyses of stimuli, single-command access to descriptive statistics, and publication quality plotting. Taken together, psychopy_ext opens an exciting possibility for a faster, more robust code development and collaboration for researchers. PMID:24478691
Sequential Inverse Problems Bayesian Principles and the Logistic Map Example

NASA Astrophysics Data System (ADS)

Duan, Lian; Farmer, Chris L.; Moroz, Irene M.

2010-09-01

Bayesian statistics provides a general framework for solving inverse problems, but is not without interpretation and implementation problems. This paper discusses difficulties arising from the fact that forward models are always in error to some extent. Using a simple example based on the one-dimensional logistic map, we argue that, when implementation problems are minimal, the Bayesian framework is quite adequate. In this paper the Bayesian Filter is shown to be able to recover excellent state estimates in the perfect model scenario (PMS) and to distinguish the PMS from the imperfect model scenario (IMS). Through a quantitative comparison of the way in which the observations are assimilated in both the PMS and the IMS scenarios, we suggest that one can, sometimes, measure the degree of imperfection.

Sound Processing Features for Speaker-Dependent and Phrase-Independent Emotion Recognition in Berlin Database

NASA Astrophysics Data System (ADS)

Anagnostopoulos, Christos Nikolaos; Vovoli, Eftichia

An emotion recognition framework based on sound processing could improve services in human-computer interaction. Various quantitative speech features obtained from sound processing of acting speech were tested, as to whether they are sufficient or not to discriminate between seven emotions. Multilayered perceptrons were trained to classify gender and emotions on the basis of a 24-input vector, which provide information about the prosody of the speaker over the entire sentence using statistics of sound features. Several experiments were performed and the results were presented analytically. Emotion recognition was successful when speakers and utterances were “known” to the classifier. However, severe misclassifications occurred during the utterance-independent framework. At least, the proposed feature vector achieved promising results for utterance-independent recognition of high- and low-arousal emotions.
Domain generality vs. modality specificity: The paradox of statistical learning

PubMed Central

Frost, Ram; Armstrong, Blair C.; Siegelman, Noam; Christiansen, Morten H.

2015-01-01

Statistical learning is typically considered to be a domain-general mechanism by which cognitive systems discover the underlying distributional properties of the input. Recent studies examining whether there are commonalities in the learning of distributional information across different domains or modalities consistently reveal, however, modality and stimulus specificity. An important question is, therefore, how and why a hypothesized domain-general learning mechanism systematically produces such effects. We offer a theoretical framework according to which statistical learning is not a unitary mechanism, but a set of domain-general computational principles, that operate in different modalities and therefore are subject to the specific constraints characteristic of their respective brain regions. This framework offers testable predictions and we discuss its computational and neurobiological plausibility. PMID:25631249
Functional genomics annotation of a statistical epistasis network associated with bladder cancer susceptibility.

PubMed

Hu, Ting; Pan, Qinxin; Andrew, Angeline S; Langer, Jillian M; Cole, Michael D; Tomlinson, Craig R; Karagas, Margaret R; Moore, Jason H

2014-04-11

Several different genetic and environmental factors have been identified as independent risk factors for bladder cancer in population-based studies. Recent studies have turned to understanding the role of gene-gene and gene-environment interactions in determining risk. We previously developed the bioinformatics framework of statistical epistasis networks (SEN) to characterize the global structure of interacting genetic factors associated with a particular disease or clinical outcome. By applying SEN to a population-based study of bladder cancer among Caucasians in New Hampshire, we were able to identify a set of connected genetic factors with strong and significant interaction effects on bladder cancer susceptibility. To support our statistical findings using networks, in the present study, we performed pathway enrichment analyses on the set of genes identified using SEN, and found that they are associated with the carcinogen benzo[a]pyrene, a component of tobacco smoke. We further carried out an mRNA expression microarray experiment to validate statistical genetic interactions, and to determine if the set of genes identified in the SEN were differentially expressed in a normal bladder cell line and a bladder cancer cell line in the presence or absence of benzo[a]pyrene. Significant nonrandom sets of genes from the SEN were found to be differentially expressed in response to benzo[a]pyrene in both the normal bladder cells and the bladder cancer cells. In addition, the patterns of gene expression were significantly different between these two cell types. The enrichment analyses and the gene expression microarray results support the idea that SEN analysis of bladder in population-based studies is able to identify biologically meaningful statistical patterns. These results bring us a step closer to a systems genetic approach to understanding cancer susceptibility that integrates population and laboratory-based studies.
Large Ensemble Analytic Framework for Consequence-Driven Discovery of Climate Change Scenarios

NASA Astrophysics Data System (ADS)

Lamontagne, Jonathan R.; Reed, Patrick M.; Link, Robert; Calvin, Katherine V.; Clarke, Leon E.; Edmonds, James A.

2018-03-01

An analytic scenario generation framework is developed based on the idea that the same climate outcome can result from very different socioeconomic and policy drivers. The framework builds on the Scenario Matrix Framework's abstraction of "challenges to mitigation" and "challenges to adaptation" to facilitate the flexible discovery of diverse and consequential scenarios. We combine visual and statistical techniques for interrogating a large factorial data set of 33,750 scenarios generated using the Global Change Assessment Model. We demonstrate how the analytic framework can aid in identifying which scenario assumptions are most tied to user-specified measures for policy relevant outcomes of interest, specifically for our example high or low mitigation costs. We show that the current approach for selecting reference scenarios can miss policy relevant scenario narratives that often emerge as hybrids of optimistic and pessimistic scenario assumptions. We also show that the same scenario assumption can be associated with both high and low mitigation costs depending on the climate outcome of interest and the mitigation policy context. In the illustrative example, we show how agricultural productivity, population growth, and economic growth are most predictive of the level of mitigation costs. Formulating policy relevant scenarios of deeply and broadly uncertain futures benefits from large ensemble-based exploration of quantitative measures of consequences. To this end, we have contributed a large database of climate change futures that can support "bottom-up" scenario generation techniques that capture a broader array of consequences than those that emerge from limited sampling of a few reference scenarios.
PRECISE:PRivacy-prEserving Cloud-assisted quality Improvement Service in hEalthcare

PubMed Central

Chen, Feng; Wang, Shuang; Mohammed, Noman; Cheng, Samuel; Jiang, Xiaoqian

2015-01-01

Quality improvement (QI) requires systematic and continuous efforts to enhance healthcare services. A healthcare provider might wish to compare local statistics with those from other institutions in order to identify problems and develop intervention to improve the quality of care. However, the sharing of institution information may be deterred by institutional privacy as publicizing such statistics could lead to embarrassment and even financial damage. In this article, we propose a PRivacy-prEserving Cloud-assisted quality Improvement Service in hEalthcare (PRECISE), which aims at enabling cross-institution comparison of healthcare statistics while protecting privacy. The proposed framework relies on a set of state-of-the-art cryptographic protocols including homomorphic encryption and Yao’s garbled circuit schemes. By securely pooling data from different institutions, PRECISE can rank the encrypted statistics to facilitate QI among participating institutes. We conducted experiments using MIMIC II database and demonstrated the feasibility of the proposed PRECISE framework. PMID:26146645
PRECISE:PRivacy-prEserving Cloud-assisted quality Improvement Service in hEalthcare.

PubMed

Chen, Feng; Wang, Shuang; Mohammed, Noman; Cheng, Samuel; Jiang, Xiaoqian

2014-10-01

Quality improvement (QI) requires systematic and continuous efforts to enhance healthcare services. A healthcare provider might wish to compare local statistics with those from other institutions in order to identify problems and develop intervention to improve the quality of care. However, the sharing of institution information may be deterred by institutional privacy as publicizing such statistics could lead to embarrassment and even financial damage. In this article, we propose a PRivacy-prEserving Cloud-assisted quality Improvement Service in hEalthcare (PRECISE), which aims at enabling cross-institution comparison of healthcare statistics while protecting privacy. The proposed framework relies on a set of state-of-the-art cryptographic protocols including homomorphic encryption and Yao's garbled circuit schemes. By securely pooling data from different institutions, PRECISE can rank the encrypted statistics to facilitate QI among participating institutes. We conducted experiments using MIMIC II database and demonstrated the feasibility of the proposed PRECISE framework.
Generating action descriptions from statistically integrated representations of human motions and sentences.

PubMed

Takano, Wataru; Kusajima, Ikuo; Nakamura, Yoshihiko

2016-08-01

It is desirable for robots to be able to linguistically understand human actions during human-robot interactions. Previous research has developed frameworks for encoding human full body motion into model parameters and for classifying motion into specific categories. For full understanding, the motion categories need to be connected to the natural language such that the robots can interpret human motions as linguistic expressions. This paper proposes a novel framework for integrating observation of human motion with that of natural language. This framework consists of two models; the first model statistically learns the relations between motions and their relevant words, and the second statistically learns sentence structures as word n-grams. Integration of these two models allows robots to generate sentences from human motions by searching for words relevant to the motion using the first model and then arranging these words in appropriate order using the second model. This allows making sentences that are the most likely to be generated from the motion. The proposed framework was tested on human full body motion measured by an optical motion capture system. In this, descriptive sentences were manually attached to the motions, and the validity of the system was demonstrated. Copyright © 2016 Elsevier Ltd. All rights reserved.
Marginal discrepancy of CAD-CAM complete-arch fixed implant-supported frameworks.

PubMed

Yilmaz, Burak; Kale, Ediz; Johnston, William M

2018-02-21

Computer-aided design and computer-aided manufacturing (CAD-CAM) high-density polymers (HDPs) have recently been marketed for the fabrication of long-term interim implant-supported fixed prostheses. However, information regarding the precision of fit of CAD-CAM HDP implant-supported complete-arch screw-retained prostheses is scarce. The purpose of this in vitro study was to evaluate the marginal discrepancy of CAD-CAM HDP complete-arch implant-supported screw-retained fixed prosthesis frameworks and compare them with conventional titanium (Ti) and zirconia (Zir) frameworks. A screw-retained complete-arch acrylic resin prototype with multiunit abutments was fabricated on a typodont model with 2 straight implants in the anterior region and 2 implants with a 30-degree distal tilt in the posterior region. A 3-dimensional (3D) laboratory laser scanner was used to digitize the typodont model with scan bodies and the resin prototype to generate a virtual 3D CAD framework. A CAM milling unit was used to fabricate 5 frameworks from HDP, Ti, and Zir blocks. The 1-screw test was performed by tightening the prosthetic screw in the maxillary left first molar abutment (terminal location) when the frameworks were on the typodont model, and the marginal discrepancy of frameworks was evaluated using an industrial computed tomographic scanner and a 3D volumetric software. The 3D marginal discrepancy at the abutment-framework interface of the maxillary left canine (L1), right canine (L2), and right first molar (L3) sites was measured. The mean values for 3D marginal discrepancy were calculated for each location in a group with 95% confidence limits. The results were analyzed by repeated-measures 2-way ANOVA using the restricted maximum likelihood estimation and the Satterthwaite degrees of freedom methods, which do not require normality and homoscedasticity in the data. The between-subjects factor was material, the within-subjects factor was location, and the interaction was included in the model. Tukey tests were applied to resolve any statistically significant source of variation (overall α=.05). The 3D marginal discrepancy measurement was possible only for L2 and L3 because the L1 values were too small to detect. The mean discrepancy values at L2 were 60 μm for HDP, 74 μm for Ti, and 84 μm for Zir. At the L3 location, the mean discrepancy values were 55 μm for HDP, 102 μm for Ti, and 94 μm for Zir. The ANOVA did not find a statistically significant overall effect for implant location (P=.072) or a statistically significant interaction of location and material (P=.078), but it did find a statistically significant overall effect of material (P=.019). Statistical differences were found overall between HDP and the other 2 materials (P≤.037). When the tested materials were used with the CAD-CAM system, the 3D marginal discrepancy of CAD-CAM HDP frameworks was smaller than that of titanium or zirconia frameworks. Copyright © 2017 Editorial Council for the Journal of Prosthetic Dentistry. Published by Elsevier Inc. All rights reserved.
A Framework for Restructuring the Military Retirement System

DTIC Science & Technology

2013-07-01

Associate Professor of Economics in the Social Sciences Department at West Point where he teaches econometrics and labor economics. His areas of...others worth considering, but each should be carefully benchmarked against our proposed framework. 25 ENDNOTES 1. Office of the Actuary , Statistical
A Nonlinear Framework of Delayed Particle Smoothing Method for Vehicle Localization under Non-Gaussian Environment.

PubMed

Xiao, Zhu; Havyarimana, Vincent; Li, Tong; Wang, Dong

2016-05-13

In this paper, a novel nonlinear framework of smoothing method, non-Gaussian delayed particle smoother (nGDPS), is proposed, which enables vehicle state estimation (VSE) with high accuracy taking into account the non-Gaussianity of the measurement and process noises. Within the proposed method, the multivariate Student's t-distribution is adopted in order to compute the probability distribution function (PDF) related to the process and measurement noises, which are assumed to be non-Gaussian distributed. A computation approach based on Ensemble Kalman Filter (EnKF) is designed to cope with the mean and the covariance matrix of the proposal non-Gaussian distribution. A delayed Gibbs sampling algorithm, which incorporates smoothing of the sampled trajectories over a fixed-delay, is proposed to deal with the sample degeneracy of particles. The performance is investigated based on the real-world data, which is collected by low-cost on-board vehicle sensors. The comparison study based on the real-world experiments and the statistical analysis demonstrates that the proposed nGDPS has significant improvement on the vehicle state accuracy and outperforms the existing filtering and smoothing methods.
PNNL Report on the Development of Bench-scale CFD Simulations for Gas Absorption across a Wetted Wall Column

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, Chao; Xu, Zhijie; Lai, Canhai

This report is prepared for the demonstration of hierarchical prediction of carbon capture efficiency of a solvent-based absorption column. A computational fluid dynamics (CFD) model is first developed to simulate the core phenomena of solvent-based carbon capture, i.e., the CO2 physical absorption and chemical reaction, on a simplified geometry of wetted wall column (WWC) at bench scale. Aqueous solutions of ethanolamine (MEA) are commonly selected as a CO2 stream scrubbing liquid. CO2 is captured by both physical and chemical absorption using highly CO2 soluble and reactive solvent, MEA, during the scrubbing process. In order to provide confidence bound on themore » computational predictions of this complex engineering system, a hierarchical calibration and validation framework is proposed. The overall goal of this effort is to provide a mechanism-based predictive framework with confidence bound for overall mass transfer coefficient of the wetted wall column (WWC) with statistical analyses of the corresponding WWC experiments with increasing physical complexity.« less
A Semiparametric Approach for Composite Functional Mapping of Dynamic Quantitative Traits

PubMed Central

Yang, Runqing; Gao, Huijiang; Wang, Xin; Zhang, Ji; Zeng, Zhao-Bang; Wu, Rongling

2007-01-01

Functional mapping has emerged as a powerful tool for mapping quantitative trait loci (QTL) that control developmental patterns of complex dynamic traits. Original functional mapping has been constructed within the context of simple interval mapping, without consideration of separate multiple linked QTL for a dynamic trait. In this article, we present a statistical framework for mapping QTL that affect dynamic traits by capitalizing on the strengths of functional mapping and composite interval mapping. Within this so-called composite functional-mapping framework, functional mapping models the time-dependent genetic effects of a QTL tested within a marker interval using a biologically meaningful parametric function, whereas composite interval mapping models the time-dependent genetic effects of the markers outside the test interval to control the genome background using a flexible nonparametric approach based on Legendre polynomials. Such a semiparametric framework was formulated by a maximum-likelihood model and implemented with the EM algorithm, allowing for the estimation and the test of the mathematical parameters that define the QTL effects and the regression coefficients of the Legendre polynomials that describe the marker effects. Simulation studies were performed to investigate the statistical behavior of composite functional mapping and compare its advantage in separating multiple linked QTL as compared to functional mapping. We used the new mapping approach to analyze a genetic mapping example in rice, leading to the identification of multiple QTL, some of which are linked on the same chromosome, that control the developmental trajectory of leaf age. PMID:17947431
Mapping and discrimination of networks in the complexity-entropy plane

NASA Astrophysics Data System (ADS)

Wiedermann, Marc; Donges, Jonathan F.; Kurths, Jürgen; Donner, Reik V.

2017-10-01

Complex networks are usually characterized in terms of their topological, spatial, or information-theoretic properties and combinations of the associated metrics are used to discriminate networks into different classes or categories. However, even with the present variety of characteristics at hand it still remains a subject of current research to appropriately quantify a network's complexity and correspondingly discriminate between different types of complex networks, like infrastructure or social networks, on such a basis. Here we explore the possibility to classify complex networks by means of a statistical complexity measure that has formerly been successfully applied to distinguish different types of chaotic and stochastic time series. It is composed of a network's averaged per-node entropic measure characterizing the network's information content and the associated Jenson-Shannon divergence as a measure of disequilibrium. We study 29 real-world networks and show that networks of the same category tend to cluster in distinct areas of the resulting complexity-entropy plane. We demonstrate that within our framework, connectome networks exhibit among the highest complexity while, e.g., transportation and infrastructure networks display significantly lower values. Furthermore, we demonstrate the utility of our framework by applying it to families of random scale-free and Watts-Strogatz model networks. We then show in a second application that the proposed framework is useful to objectively construct threshold-based networks, such as functional climate networks or recurrence networks, by choosing the threshold such that the statistical network complexity is maximized.
Optimal population prediction of sandhill crane recruitment based on climate-mediated habitat limitations

USGS Publications Warehouse

Gerber, Brian D.; Kendall, William L.; Hooten, Mevin B.; Dubovsky, James A.; Drewien, Roderick C.

2015-01-01

Prediction is fundamental to scientific enquiry and application; however, ecologists tend to favour explanatory modelling. We discuss a predictive modelling framework to evaluate ecological hypotheses and to explore novel/unobserved environmental scenarios to assist conservation and management decision-makers. We apply this framework to develop an optimal predictive model for juvenile (<1 year old) sandhill crane Grus canadensis recruitment of the Rocky Mountain Population (RMP). We consider spatial climate predictors motivated by hypotheses of how drought across multiple time-scales and spring/summer weather affects recruitment.Our predictive modelling framework focuses on developing a single model that includes all relevant predictor variables, regardless of collinearity. This model is then optimized for prediction by controlling model complexity using a data-driven approach that marginalizes or removes irrelevant predictors from the model. Specifically, we highlight two approaches of statistical regularization, Bayesian least absolute shrinkage and selection operator (LASSO) and ridge regression.Our optimal predictive Bayesian LASSO and ridge regression models were similar and on average 37% superior in predictive accuracy to an explanatory modelling approach. Our predictive models confirmed a priori hypotheses that drought and cold summers negatively affect juvenile recruitment in the RMP. The effects of long-term drought can be alleviated by short-term wet spring–summer months; however, the alleviation of long-term drought has a much greater positive effect on juvenile recruitment. The number of freezing days and snowpack during the summer months can also negatively affect recruitment, while spring snowpack has a positive effect.Breeding habitat, mediated through climate, is a limiting factor on population growth of sandhill cranes in the RMP, which could become more limiting with a changing climate (i.e. increased drought). These effects are likely not unique to cranes. The alteration of hydrological patterns and water levels by drought may impact many migratory, wetland nesting birds in the Rocky Mountains and beyond.Generalizable predictive models (trained by out-of-sample fit and based on ecological hypotheses) are needed by conservation and management decision-makers. Statistical regularization improves predictions and provides a general framework for fitting models with a large number of predictors, even those with collinearity, to simultaneously identify an optimal predictive model while conducting rigorous Bayesian model selection. Our framework is important for understanding population dynamics under a changing climate and has direct applications for making harvest and habitat management decisions.
Spatial analysis of electricity demand patterns in Greece: Application of a GIS-based methodological framework

NASA Astrophysics Data System (ADS)

Tyralis, Hristos; Mamassis, Nikos; Photis, Yorgos N.

2016-04-01

We investigate various uses of electricity demand in Greece (agricultural, commercial, domestic, industrial use as well as use for public and municipal authorities and street lightning) and we examine their relation with variables such as population, total area, population density and the Gross Domestic Product. The analysis is performed on data which span from 2008 to 2012 and have annual temporal resolution and spatial resolution down to the level of prefecture. We both visualize the results of the analysis and we perform cluster and outlier analysis using the Anselin local Moran's I statistic as well as hot spot analysis using the Getis-Ord Gi* statistic. The definition of the spatial patterns and relationships of the aforementioned variables in a GIS environment provides meaningful insight and better understanding of the regional development model in Greece and justifies the basis for an energy demand forecasting methodology. Acknowledgement: This research has been partly financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research Funding Program: ARISTEIA II: Reinforcement of the interdisciplinary and/ or inter-institutional research and innovation (CRESSENDO project; grant number 5145).
A model-based approach to wildland fire reconstruction using sediment charcoal records

USGS Publications Warehouse

Itter, Malcolm S.; Finley, Andrew O.; Hooten, Mevin B.; Higuera, Philip E.; Marlon, Jennifer R.; Kelly, Ryan; McLachlan, Jason S.

2017-01-01

Lake sediment charcoal records are used in paleoecological analyses to reconstruct fire history, including the identification of past wildland fires. One challenge of applying sediment charcoal records to infer fire history is the separation of charcoal associated with local fire occurrence and charcoal originating from regional fire activity. Despite a variety of methods to identify local fires from sediment charcoal records, an integrated statistical framework for fire reconstruction is lacking. We develop a Bayesian point process model to estimate the probability of fire associated with charcoal counts from individual-lake sediments and estimate mean fire return intervals. A multivariate extension of the model combines records from multiple lakes to reduce uncertainty in local fire identification and estimate a regional mean fire return interval. The univariate and multivariate models are applied to 13 lakes in the Yukon Flats region of Alaska. Both models resulted in similar mean fire return intervals (100–350 years) with reduced uncertainty under the multivariate model due to improved estimation of regional charcoal deposition. The point process model offers an integrated statistical framework for paleofire reconstruction and extends existing methods to infer regional fire history from multiple lake records with uncertainty following directly from posterior distributions.
Molecular Monte Carlo Simulations Using Graphics Processing Units: To Waste Recycle or Not?

PubMed

Kim, Jihan; Rodgers, Jocelyn M; Athènes, Manuel; Smit, Berend

2011-10-11

In the waste recycling Monte Carlo (WRMC) algorithm, (1) multiple trial states may be simultaneously generated and utilized during Monte Carlo moves to improve the statistical accuracy of the simulations, suggesting that such an algorithm may be well posed for implementation in parallel on graphics processing units (GPUs). In this paper, we implement two waste recycling Monte Carlo algorithms in CUDA (Compute Unified Device Architecture) using uniformly distributed random trial states and trial states based on displacement random-walk steps, and we test the methods on a methane-zeolite MFI framework system to evaluate their utility. We discuss the specific implementation details of the waste recycling GPU algorithm and compare the methods to other parallel algorithms optimized for the framework system. We analyze the relationship between the statistical accuracy of our simulations and the CUDA block size to determine the efficient allocation of the GPU hardware resources. We make comparisons between the GPU and the serial CPU Monte Carlo implementations to assess speedup over conventional microprocessors. Finally, we apply our optimized GPU algorithms to the important problem of determining free energy landscapes, in this case for molecular motion through the zeolite LTA.
A Sensor Driven Probabilistic Method for Enabling Hyper Resolution Flood Simulations

NASA Astrophysics Data System (ADS)

Fries, K. J.; Salas, F.; Kerkez, B.

2016-12-01

A reduction in the cost of sensors and wireless communications is now enabling researchers and local governments to make flow, stage and rain measurements at locations that are not covered by existing USGS or state networks. We ask the question: how should these new sources of densified, street-level sensor measurements be used to make improved forecasts using the National Water Model (NWM)? Assimilating these data "into" the NWM can be challenging due to computational complexity, as well as heterogeneity of sensor and other input data. Instead, we introduce a machine learning and statistical framework that layers these data "on top" of the NWM outputs to improve high-resolution hydrologic and hydraulic forecasting. By generalizing our approach into a post-processing framework, a rapidly repeatable blueprint is generated for for decision makers who want to improve local forecasts by coupling sensor data with the NWM. We present preliminary results based on case studies in highly instrumented watersheds in the US. Through the use of statistical learning tools and hydrologic routing schemes, we demonstrate the ability of our approach to improve forecasts while simultaneously characterizing bias and uncertainty in the NWM.
Modeling exposure–lag–response associations with distributed lag non-linear models

PubMed Central

Gasparrini, Antonio

2014-01-01

In biomedical research, a health effect is frequently associated with protracted exposures of varying intensity sustained in the past. The main complexity of modeling and interpreting such phenomena lies in the additional temporal dimension needed to express the association, as the risk depends on both intensity and timing of past exposures. This type of dependency is defined here as exposure–lag–response association. In this contribution, I illustrate a general statistical framework for such associations, established through the extension of distributed lag non-linear models, originally developed in time series analysis. This modeling class is based on the definition of a cross-basis, obtained by the combination of two functions to flexibly model linear or nonlinear exposure-responses and the lag structure of the relationship, respectively. The methodology is illustrated with an example application to cohort data and validated through a simulation study. This modeling framework generalizes to various study designs and regression models, and can be applied to study the health effects of protracted exposures to environmental factors, drugs or carcinogenic agents, among others. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd. PMID:24027094
Characterizing the reliability of a bioMEMS-based cantilever sensor

NASA Astrophysics Data System (ADS)

Bhalerao, Kaustubh D.

2004-12-01

The cantilever-based BioMEMS sensor represents one instance from many competing ideas of biosensor technology based on Micro Electro Mechanical Systems. The advancement of BioMEMS from laboratory-scale experiments to applications in the field will require standardization of their components and manufacturing procedures as well as frameworks to evaluate their performance. Reliability, the likelihood with which a system performs its intended task, is a compact mathematical description of its performance. The mathematical and statistical foundation of systems-reliability has been applied to the cantilever-based BioMEMS sensor. The sensor is designed to detect one aspect of human ovarian cancer, namely the over-expression of the folate receptor surface protein (FR-alpha). Even as the application chosen is clinically motivated, the objective of this study was to demonstrate the underlying systems-based methodology used to design, develop and evaluate the sensor. The framework development can be readily extended to other BioMEMS-based devices for disease detection and will have an impact in the rapidly growing $30 bn industry. The Unified Modeling Language (UML) is a systems-based framework for design and development of object-oriented information systems which has potential application for use in systems designed to interact with biological environments. The UML has been used to abstract and describe the application of the biosensor, to identify key components of the biosensor, and the technology needed to link them together in a coherent manner. The use of the framework is also demonstrated in computation of system reliability from first principles as a function of the structure and materials of the biosensor. The outcomes of applying the systems-based framework to the study are the following: (1) Characterizing the cantilever-based MEMS device for disease (cell) detection. (2) Development of a novel chemical interface between the analyte and the sensor that provides a degree of selectivity towards the disease. (3) Demonstrating the performance and measuring the reliability of the biosensor prototype, and (4) Identification of opportunities in technological development in order to further refine the proposed biosensor. Application of the methodology to design develop and evaluate the reliability of BioMEMS devices will be beneficial in the streamlining the growth of the BioMEMS industry, while providing a decision-support tool in comparing and adopting suitable technologies from available competing options.

Enhancing Multimedia Imbalanced Concept Detection Using VIMP in Random Forests.

PubMed

Sadiq, Saad; Yan, Yilin; Shyu, Mei-Ling; Chen, Shu-Ching; Ishwaran, Hemant

2016-07-01

Recent developments in social media and cloud storage lead to an exponential growth in the amount of multimedia data, which increases the complexity of managing, storing, indexing, and retrieving information from such big data. Many current content-based concept detection approaches lag from successfully bridging the semantic gap. To solve this problem, a multi-stage random forest framework is proposed to generate predictor variables based on multivariate regressions using variable importance (VIMP). By fine tuning the forests and significantly reducing the predictor variables, the concept detection scores are evaluated when the concept of interest is rare and imbalanced, i.e., having little collaboration with other high level concepts. Using classical multivariate statistics, estimating the value of one coordinate using other coordinates standardizes the covariates and it depends upon the variance of the correlations instead of the mean. Thus, conditional dependence on the data being normally distributed is eliminated. Experimental results demonstrate that the proposed framework outperforms those approaches in the comparison in terms of the Mean Average Precision (MAP) values.
The 4D hyperspherical diffusion wavelet: A new method for the detection of localized anatomical variation.

PubMed

Hosseinbor, Ameer Pasha; Kim, Won Hwa; Adluru, Nagesh; Acharya, Amit; Vorperian, Houri K; Chung, Moo K

2014-01-01

Recently, the HyperSPHARM algorithm was proposed to parameterize multiple disjoint objects in a holistic manner using the 4D hyperspherical harmonics. The HyperSPHARM coefficients are global; they cannot be used to directly infer localized variations in signal. In this paper, we present a unified wavelet framework that links Hyper-SPHARM to the diffusion wavelet transform. Specifically, we will show that the HyperSPHARM basis forms a subset of a wavelet-based multiscale representation of surface-based signals. This wavelet, termed the hyperspherical diffusion wavelet, is a consequence of the equivalence of isotropic heat diffusion smoothing and the diffusion wavelet transform on the hypersphere. Our framework allows for the statistical inference of highly localized anatomical changes, which we demonstrate in the first-ever developmental study on the hyoid bone investigating gender and age effects. We also show that the hyperspherical wavelet successfully picks up group-wise differences that are barely detectable using SPHARM.
The 4D Hyperspherical Diffusion Wavelet: A New Method for the Detection of Localized Anatomical Variation

PubMed Central

Hosseinbor, A. Pasha; Kim, Won Hwa; Adluru, Nagesh; Acharya, Amit; Vorperian, Houri K.; Chung, Moo K.

2014-01-01

Recently, the HyperSPHARM algorithm was proposed to parameterize multiple disjoint objects in a holistic manner using the 4D hyperspherical harmonics. The HyperSPHARM coefficients are global; they cannot be used to directly infer localized variations in signal. In this paper, we present a unified wavelet framework that links HyperSPHARM to the diffusion wavelet transform. Specifically, we will show that the HyperSPHARM basis forms a subset of a wavelet-based multiscale representation of surface-based signals. This wavelet, termed the hyperspherical diffusion wavelet, is a consequence of the equivalence of isotropic heat diffusion smoothing and the diffusion wavelet transform on the hypersphere. Our framework allows for the statistical inference of highly localized anatomical changes, which we demonstrate in the firstever developmental study on the hyoid bone investigating gender and age effects. We also show that the hyperspherical wavelet successfully picks up group-wise differences that are barely detectable using SPHARM. PMID:25320783
A Probabilistic Framework for Peptide and Protein Quantification from Data-Dependent and Data-Independent LC-MS Proteomics Experiments

PubMed Central

Richardson, Keith; Denny, Richard; Hughes, Chris; Skilling, John; Sikora, Jacek; Dadlez, Michał; Manteca, Angel; Jung, Hye Ryung; Jensen, Ole Nørregaard; Redeker, Virginie; Melki, Ronald; Langridge, James I.; Vissers, Johannes P.C.

2013-01-01

A probability-based quantification framework is presented for the calculation of relative peptide and protein abundance in label-free and label-dependent LC-MS proteomics data. The results are accompanied by credible intervals and regulation probabilities. The algorithm takes into account data uncertainties via Poisson statistics modified by a noise contribution that is determined automatically during an initial normalization stage. Protein quantification relies on assignments of component peptides to the acquired data. These assignments are generally of variable reliability and may not be present across all of the experiments comprising an analysis. It is also possible for a peptide to be identified to more than one protein in a given mixture. For these reasons the algorithm accepts a prior probability of peptide assignment for each intensity measurement. The model is constructed in such a way that outliers of any type can be automatically reweighted. Two discrete normalization methods can be employed. The first method is based on a user-defined subset of peptides, while the second method relies on the presence of a dominant background of endogenous peptides for which the concentration is assumed to be unaffected. Normalization is performed using the same computational and statistical procedures employed by the main quantification algorithm. The performance of the algorithm will be illustrated on example data sets, and its utility demonstrated for typical proteomics applications. The quantification algorithm supports relative protein quantification based on precursor and product ion intensities acquired by means of data-dependent methods, originating from all common isotopically-labeled approaches, as well as label-free ion intensity-based data-independent methods. PMID:22871168
From Sub-basin to Grid Scale Soil Moisture Disaggregation in SMART, A Semi-distributed Hydrologic Modeling Framework

NASA Astrophysics Data System (ADS)

Ajami, H.; Sharma, A.

2016-12-01

A computationally efficient, semi-distributed hydrologic modeling framework is developed to simulate water balance at a catchment scale. The Soil Moisture and Runoff simulation Toolkit (SMART) is based upon the delineation of contiguous and topologically connected Hydrologic Response Units (HRUs). In SMART, HRUs are delineated using thresholds obtained from topographic and geomorphic analysis of a catchment, and simulation elements are distributed cross sections or equivalent cross sections (ECS) delineated in first order sub-basins. ECSs are formulated by aggregating topographic and physiographic properties of the part or entire first order sub-basins to further reduce computational time in SMART. Previous investigations using SMART have shown that temporal dynamics of soil moisture are well captured at a HRU level using the ECS delineation approach. However, spatial variability of soil moisture within a given HRU is ignored. Here, we examined a number of disaggregation schemes for soil moisture distribution in each HRU. The disaggregation schemes are either based on topographic based indices or a covariance matrix obtained from distributed soil moisture simulations. To assess the performance of the disaggregation schemes, soil moisture simulations from an integrated land surface-groundwater model, ParFlow.CLM in Baldry sub-catchment, Australia are used. ParFlow is a variably saturated sub-surface flow model that is coupled to the Common Land Model (CLM). Our results illustrate that the statistical disaggregation scheme performs better than the methods based on topographic data in approximating soil moisture distribution at a 60m scale. Moreover, the statistical disaggregation scheme maintains temporal correlation of simulated daily soil moisture while preserves the mean sub-basin soil moisture. Future work is focused on assessing the performance of this scheme in catchments with various topographic and climate settings.
Remarks on a financial inverse problem by means of Monte Carlo Methods

NASA Astrophysics Data System (ADS)

Cuomo, Salvatore; Di Somma, Vittorio; Sica, Federica

2017-10-01

Estimating the price of a barrier option is a typical inverse problem. In this paper we present a numerical and statistical framework for a market with risk-free interest rate and a risk asset, described by a Geometric Brownian Motion (GBM). After approximating the risk asset with a numerical method, we find the final option price by following an approach based on sequential Monte Carlo methods. All theoretical results are applied to the case of an option whose underlying is a real stock.
A Subband Coding Method for HDTV

NASA Technical Reports Server (NTRS)

Chung, Wilson; Kossentini, Faouzi; Smith, Mark J. T.

1995-01-01

This paper introduces a new HDTV coder based on motion compensation, subband coding, and high order conditional entropy coding. The proposed coder exploits the temporal and spatial statistical dependencies inherent in the HDTV signal by using intra- and inter-subband conditioning for coding both the motion coordinates and the residual signal. The new framework provides an easy way to control the system complexity and performance, and inherently supports multiresolution transmission. Experimental results show that the coder outperforms MPEG-2, while still maintaining relatively low complexity.
PharmML in Action: an Interoperable Language for Modeling and Simulation

PubMed Central

Bizzotto, R; Smith, G; Yvon, F; Kristensen, NR; Swat, MJ

2017-01-01

PharmML1 is an XML‐based exchange format2, 3, 4 created with a focus on nonlinear mixed‐effect (NLME) models used in pharmacometrics,5, 6 but providing a very general framework that also allows describing mathematical and statistical models such as single‐subject or nonlinear and multivariate regression models. This tutorial provides an overview of the structure of this language, brief suggestions on how to work with it, and use cases demonstrating its power and flexibility. PMID:28575551
Statistical Extremes of Turbulence and a Cascade Generalisation of Euler's Gyroscope Equation

NASA Astrophysics Data System (ADS)

Tchiguirinskaia, Ioulia; Scherzer, Daniel

2016-04-01

Turbulence refers to a rather well defined hydrodynamical phenomenon uncovered by Reynolds. Nowadays, the word turbulence is used to designate the loss of order in many different geophysical fields and the related fundamental extreme variability of environmental data over a wide range of scales. Classical statistical techniques for estimating the extremes, being largely limited to statistical distributions, do not take into account the mechanisms generating such extreme variability. An alternative approaches to nonlinear variability are based on a fundamental property of the non-linear equations: scale invariance, which means that these equations are formally invariant under given scale transforms. Its specific framework is that of multifractals. In this framework extreme variability builds up scale by scale leading to non-classical statistics. Although multifractals are increasingly understood as a basic framework for handling such variability, there is still a gap between their potential and their actual use. In this presentation we discuss how to dealt with highly theoretical problems of mathematical physics together with a wide range of geophysical applications. We use Euler's gyroscope equation as a basic element in constructing a complex deterministic system that preserves not only the scale symmetry of the Navier-Stokes equations, but some more of their symmetries. Euler's equation has been not only the object of many theoretical investigations of the gyroscope device, but also generalised enough to become the basic equation of fluid mechanics. Therefore, there is no surprise that a cascade generalisation of this equation can be used to characterise the intermittency of turbulence, to better understand the links between the multifractal exponents and the structure of a simplified, but not simplistic, version of the Navier-Stokes equations. In a given way, this approach is similar to that of Lorenz, who studied how the flap of a butterfly wing could generate a cyclone with the help of a 3D ordinary differential system. Being well supported by the extensive numerical results, the cascade generalisation of Euler's gyroscope equation opens new horizons for predictability and predictions of processes having long-range dependences.
Putting Cognitive Science behind a Statistics Teacher's Intuition

ERIC Educational Resources Information Center

Jones, Karrie A.; Jones, Jennifer L.; Vermette, Paul J.

2011-01-01

Recent advances in cognitive science have led to an enriched understanding of how people learn. Using a framework presented by Willingham, this article examines instructional best practice from the perspective of conceptual understanding and its implications on statistics education.
Scale-based fuzzy connectivity: a novel image segmentation methodology and its validation

NASA Astrophysics Data System (ADS)

Saha, Punam K.; Udupa, Jayaram K.

1999-05-01

This paper extends a previously reported theory and algorithms for fuzzy connected object definition. It introduces `object scale' for determining the neighborhood size for defining affinity, the degree of local hanging togetherness between image elements. Object scale allows us to use a varying neighborhood size in different parts of the image. This paper argues that scale-based fuzzy connectivity is natural in object definition and demonstrates that this leads to a more effective object segmentation than without using scale in fuzzy concentrations. Affinity is described as consisting of a homogeneity-based and an object-feature- based component. Families of non scale-based and scale-based affinity relations are constructed. An effective method for giving a rough estimate of scale at different locations in the image is presented. The original theoretical and algorithmic framework remains more-or-less the same but considerably improved segmentations result. A quantitative statistical comparison between the non scale-based and the scale-based methods was made based on phantom images generated from patient MR brain studies by first segmenting the objects, and then by adding noise and blurring, and background component. Both the statistical and the subjective tests clearly indicate the superiority of scale- based method in capturing details and in robustness to noise.
Modeling kinetic rate variation in third generation DNA sequencing data to detect putative modifications to DNA bases

PubMed Central

Schadt, Eric E.; Banerjee, Onureena; Fang, Gang; Feng, Zhixing; Wong, Wing H.; Zhang, Xuegong; Kislyuk, Andrey; Clark, Tyson A.; Luong, Khai; Keren-Paz, Alona; Chess, Andrew; Kumar, Vipin; Chen-Plotkin, Alice; Sondheimer, Neal; Korlach, Jonas; Kasarskis, Andrew

2013-01-01

Current generation DNA sequencing instruments are moving closer to seamlessly sequencing genomes of entire populations as a routine part of scientific investigation. However, while significant inroads have been made identifying small nucleotide variation and structural variations in DNA that impact phenotypes of interest, progress has not been as dramatic regarding epigenetic changes and base-level damage to DNA, largely due to technological limitations in assaying all known and unknown types of modifications at genome scale. Recently, single-molecule real time (SMRT) sequencing has been reported to identify kinetic variation (KV) events that have been demonstrated to reflect epigenetic changes of every known type, providing a path forward for detecting base modifications as a routine part of sequencing. However, to date no statistical framework has been proposed to enhance the power to detect these events while also controlling for false-positive events. By modeling enzyme kinetics in the neighborhood of an arbitrary location in a genomic region of interest as a conditional random field, we provide a statistical framework for incorporating kinetic information at a test position of interest as well as at neighboring sites that help enhance the power to detect KV events. The performance of this and related models is explored, with the best-performing model applied to plasmid DNA isolated from Escherichia coli and mitochondrial DNA isolated from human brain tissue. We highlight widespread kinetic variation events, some of which strongly associate with known modification events, while others represent putative chemically modified sites of unknown types. PMID:23093720
Modeling kinetic rate variation in third generation DNA sequencing data to detect putative modifications to DNA bases.

PubMed

Schadt, Eric E; Banerjee, Onureena; Fang, Gang; Feng, Zhixing; Wong, Wing H; Zhang, Xuegong; Kislyuk, Andrey; Clark, Tyson A; Luong, Khai; Keren-Paz, Alona; Chess, Andrew; Kumar, Vipin; Chen-Plotkin, Alice; Sondheimer, Neal; Korlach, Jonas; Kasarskis, Andrew

2013-01-01

Current generation DNA sequencing instruments are moving closer to seamlessly sequencing genomes of entire populations as a routine part of scientific investigation. However, while significant inroads have been made identifying small nucleotide variation and structural variations in DNA that impact phenotypes of interest, progress has not been as dramatic regarding epigenetic changes and base-level damage to DNA, largely due to technological limitations in assaying all known and unknown types of modifications at genome scale. Recently, single-molecule real time (SMRT) sequencing has been reported to identify kinetic variation (KV) events that have been demonstrated to reflect epigenetic changes of every known type, providing a path forward for detecting base modifications as a routine part of sequencing. However, to date no statistical framework has been proposed to enhance the power to detect these events while also controlling for false-positive events. By modeling enzyme kinetics in the neighborhood of an arbitrary location in a genomic region of interest as a conditional random field, we provide a statistical framework for incorporating kinetic information at a test position of interest as well as at neighboring sites that help enhance the power to detect KV events. The performance of this and related models is explored, with the best-performing model applied to plasmid DNA isolated from Escherichia coli and mitochondrial DNA isolated from human brain tissue. We highlight widespread kinetic variation events, some of which strongly associate with known modification events, while others represent putative chemically modified sites of unknown types.
UNC-Utah NA-MIC framework for DTI fiber tract analysis.

PubMed

Verde, Audrey R; Budin, Francois; Berger, Jean-Baptiste; Gupta, Aditya; Farzinfar, Mahshid; Kaiser, Adrien; Ahn, Mihye; Johnson, Hans; Matsui, Joy; Hazlett, Heather C; Sharma, Anuja; Goodlett, Casey; Shi, Yundi; Gouttard, Sylvain; Vachet, Clement; Piven, Joseph; Zhu, Hongtu; Gerig, Guido; Styner, Martin

2014-01-01

Diffusion tensor imaging has become an important modality in the field of neuroimaging to capture changes in micro-organization and to assess white matter integrity or development. While there exists a number of tractography toolsets, these usually lack tools for preprocessing or to analyze diffusion properties along the fiber tracts. Currently, the field is in critical need of a coherent end-to-end toolset for performing an along-fiber tract analysis, accessible to non-technical neuroimaging researchers. The UNC-Utah NA-MIC DTI framework represents a coherent, open source, end-to-end toolset for atlas fiber tract based DTI analysis encompassing DICOM data conversion, quality control, atlas building, fiber tractography, fiber parameterization, and statistical analysis of diffusion properties. Most steps utilize graphical user interfaces (GUI) to simplify interaction and provide an extensive DTI analysis framework for non-technical researchers/investigators. We illustrate the use of our framework on a small sample, cross sectional neuroimaging study of eight healthy 1-year-old children from the Infant Brain Imaging Study (IBIS) Network. In this limited test study, we illustrate the power of our method by quantifying the diffusion properties at 1 year of age on the genu and splenium fiber tracts.
A Machine Learning Framework for Plan Payment Risk Adjustment.

PubMed

Rose, Sherri

2016-12-01

To introduce cross-validation and a nonparametric machine learning framework for plan payment risk adjustment and then assess whether they have the potential to improve risk adjustment. 2011-2012 Truven MarketScan database. We compare the performance of multiple statistical approaches within a broad machine learning framework for estimation of risk adjustment formulas. Total annual expenditure was predicted using age, sex, geography, inpatient diagnoses, and hierarchical condition category variables. The methods included regression, penalized regression, decision trees, neural networks, and an ensemble super learner, all in concert with screening algorithms that reduce the set of variables considered. The performance of these methods was compared based on cross-validated R 2 . Our results indicate that a simplified risk adjustment formula selected via this nonparametric framework maintains much of the efficiency of a traditional larger formula. The ensemble approach also outperformed classical regression and all other algorithms studied. The implementation of cross-validated machine learning techniques provides novel insight into risk adjustment estimation, possibly allowing for a simplified formula, thereby reducing incentives for increased coding intensity as well as the ability of insurers to "game" the system with aggressive diagnostic upcoding. © Health Research and Educational Trust.
UNC-Utah NA-MIC framework for DTI fiber tract analysis

PubMed Central

Verde, Audrey R.; Budin, Francois; Berger, Jean-Baptiste; Gupta, Aditya; Farzinfar, Mahshid; Kaiser, Adrien; Ahn, Mihye; Johnson, Hans; Matsui, Joy; Hazlett, Heather C.; Sharma, Anuja; Goodlett, Casey; Shi, Yundi; Gouttard, Sylvain; Vachet, Clement; Piven, Joseph; Zhu, Hongtu; Gerig, Guido; Styner, Martin

2014-01-01

Diffusion tensor imaging has become an important modality in the field of neuroimaging to capture changes in micro-organization and to assess white matter integrity or development. While there exists a number of tractography toolsets, these usually lack tools for preprocessing or to analyze diffusion properties along the fiber tracts. Currently, the field is in critical need of a coherent end-to-end toolset for performing an along-fiber tract analysis, accessible to non-technical neuroimaging researchers. The UNC-Utah NA-MIC DTI framework represents a coherent, open source, end-to-end toolset for atlas fiber tract based DTI analysis encompassing DICOM data conversion, quality control, atlas building, fiber tractography, fiber parameterization, and statistical analysis of diffusion properties. Most steps utilize graphical user interfaces (GUI) to simplify interaction and provide an extensive DTI analysis framework for non-technical researchers/investigators. We illustrate the use of our framework on a small sample, cross sectional neuroimaging study of eight healthy 1-year-old children from the Infant Brain Imaging Study (IBIS) Network. In this limited test study, we illustrate the power of our method by quantifying the diffusion properties at 1 year of age on the genu and splenium fiber tracts. PMID:24409141
Ceramic Defects in Metal-Ceramic Fixed Dental Prostheses Made from Co-Cr and Au-Pt Alloys: A Retrospective Study.

PubMed

Mikeli, Aikaterini; Boening, Klaus W; Lißke, Benjamin

2015-01-01

Ceramic defects in porcelain-fused-to-metal (PFM) restorations may depend on framework alloy type. This study assessed ceramic defects on cobalt-chromium- (Co-Cr-) and gold-platinum- (Au-Pt-) based PFM restorations. In this study, 147 Co-Cr-based and 168 Au-Pt-based PFM restorations inserted between 1998 and 2010 (139 patients) were examined for ceramic defects. Detected defects were assigned to three groups according to clinical defect relevance. Ceramic defect rates (Co-Cr-based: 12.9%; Au-Pt-based: 7.2%) revealed no significant difference but a strong statistical trend (U test, P = .082). Most defects were of little clinical relevance. Co-Cr PFM restorations may be at higher risk for ceramic defects compared to Au-Pt-based restorations.
Assessing Cultural Competence in Graduating Students

ERIC Educational Resources Information Center

Kohli, Hermeet K.; Kohli, Amarpreet S.; Huber, Ruth; Faul, Anna C.

2010-01-01

Twofold purpose of this study was to develop a framework to understand cultural competence in graduating social work students, and test that framework for appropriateness and predictability using multivariate statistics. Scale and predictor variables were collected using an online instrument from a nationwide convenience sample of graduating…
A weighted generalized score statistic for comparison of predictive values of diagnostic tests.

PubMed

Kosinski, Andrzej S

2013-03-15

Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations that are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we presented, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic that incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, always reduces to the score statistic in the independent samples situation, and preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe that the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the WGS test statistic in a general GEE setting. Copyright © 2012 John Wiley & Sons, Ltd.
A weighted generalized score statistic for comparison of predictive values of diagnostic tests

PubMed Central

Kosinski, Andrzej S.

2013-01-01

Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations which are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we present, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic which incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, it always reduces to the score statistic in the independent samples situation, and it preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the weighted generalized score test statistic in a general GEE setting. PMID:22912343

Some links on this page may take you to non-federal websites. Their policies may differ from this site.