Shlizerman, Eli; Riffell, Jeffrey A.; Kutz, J. Nathan
2014-01-01
The antennal lobe (AL), the olfactory processing center in insects, processes stimuli into distinct neural activity patterns, called olfactory neural codes. To model their dynamics we perform multichannel recordings from the projection neurons in the AL driven by different odorants. We then derive a dynamic neuronal network from the electrophysiological data. The network consists of lateral-inhibitory neurons and excitatory neurons (modeled as firing-rate units), and is capable of producing unique olfactory neural codes for the tested odorants. To construct the network, we (1) design a projection, an odor space, for the neural recordings from the AL that discriminates between distinct odorant trajectories; (2) characterize scent recognition, i.e., decision-making based on olfactory signals; and (3) infer the wiring of the neural circuit, the connectome of the AL. We show that the constructed model is consistent with biological observations, such as contrast enhancement and robustness to noise. The study suggests a data-driven approach to answer a key biological question in identifying how lateral inhibitory neurons can be wired to excitatory neurons to permit robust activity patterns. PMID:25165442
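[Editor's illustration] A minimal sketch of the kind of excitatory/lateral-inhibitory firing-rate network described in this record; the connectivity, gain function, and all parameters are illustrative assumptions, not the fitted model from the paper.

```python
# Toy firing-rate network: excitatory units receive odor input and shared
# lateral inhibition; two odor inputs yield distinct steady-state "codes".
import numpy as np

rng = np.random.default_rng(0)
n_e, n_i = 20, 5                           # excitatory and inhibitory units
W_ei = rng.uniform(0.0, 1.0, (n_e, n_i))   # inhibitory -> excitatory weights
W_ie = rng.uniform(0.0, 1.0, (n_i, n_e))   # excitatory -> inhibitory weights

def f(x):
    return np.tanh(np.maximum(x, 0.0))     # saturating rectified gain (assumed)

def simulate(odor_input, t_max=2.0, dt=1e-3, tau=0.05):
    """Euler-integrate tau * dr/dt = -r + f(drive) for both populations."""
    r_e, r_i = np.zeros(n_e), np.zeros(n_i)
    for _ in range(int(t_max / dt)):
        drive_e = odor_input - W_ei @ r_i  # excitation minus lateral inhibition
        drive_i = W_ie @ r_e
        r_e += dt / tau * (-r_e + f(drive_e))
        r_i += dt / tau * (-r_i + f(drive_i))
    return r_e

# Two hypothetical odorants produce separable activity patterns.
odor_a = rng.uniform(0.0, 1.0, n_e)
odor_b = rng.uniform(0.0, 1.0, n_e)
print(np.linalg.norm(simulate(odor_a) - simulate(odor_b)))  # code separation
```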
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chikkagoudar, Satish; Chatterjee, Samrat; Thomas, Dennis G.
The absence of a robust and unified theory of cyber dynamics presents challenges and opportunities for using machine learning-based, data-driven approaches to further the understanding of the behavior of such complex systems. Analysts can also use machine learning approaches to gain operational insights. In order to be operationally beneficial, cybersecurity machine learning models need to have the ability to: (1) represent a real-world system, (2) infer system properties, and (3) learn and adapt based on expert knowledge and observations. Probabilistic models and probabilistic graphical models provide these necessary properties and are further explored in this chapter. Bayesian networks and hidden Markov models are introduced as examples of a widely used data-driven classification/modeling strategy.
New robust statistical procedures for the polytomous logistic regression models.
Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro
2018-05-17
This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real life examples are presented to justify the requirement of suitable robust statistical procedures in place of the likelihood based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article is further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.
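[Editor's illustration] A hedged sketch of a minimum density power divergence (DPD) fit for a multinomial (polytomous) logistic model. The objective below is the standard DPD loss for discrete responses (alpha -> 0 recovers maximum likelihood); the article's exact formulation, tuning-parameter selection, and tests are not reproduced, and the toy data are invented.

```python
# Minimum-DPD estimation for a 3-category logistic model (simplified sketch).
import numpy as np
from scipy.optimize import minimize

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def dpd_loss(beta_flat, X, y, k, alpha=0.3):
    """Average DPD objective: sum_j p_j^(1+a) - (1 + 1/a) * p_y^a."""
    d = X.shape[1]
    B = np.zeros((d, k))
    B[:, 1:] = beta_flat.reshape(d, k - 1)     # first category as baseline
    P = softmax(X @ B)
    term1 = (P ** (1.0 + alpha)).sum(axis=1)
    term2 = (1.0 + 1.0 / alpha) * P[np.arange(len(y)), y] ** alpha
    return np.mean(term1 - term2)

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(300), rng.normal(size=(300, 2))])
y = rng.integers(0, 3, size=300)               # toy 3-category response
res = minimize(dpd_loss, np.zeros(X.shape[1] * 2), args=(X, y, 3))
print(res.x.reshape(X.shape[1], 2))            # robust coefficient estimates
```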
NASA Astrophysics Data System (ADS)
Bellugi, D. G.; Tennant, C.; Larsen, L.
2016-12-01
Catchment and climate heterogeneity complicate prediction of runoff across time and space, and the resulting parameter uncertainty can lead to large accumulated errors in hydrologic models, particularly in ungauged basins. Recently, data-driven modeling approaches have been shown to avoid the accumulated uncertainty associated with many physically-based models, providing an appealing alternative for hydrologic prediction. However, the effectiveness of different methods in hydrologically and geomorphically distinct catchments, and the robustness of these methods to changing climate and changing hydrologic processes, remain to be tested. Here, we evaluate the use of machine learning techniques to predict daily runoff across time and space using only essential climatic forcing time series (e.g., precipitation, temperature, and potential evapotranspiration) as model input. Model training and testing were done using a high-quality dataset of daily runoff and climate forcing data spanning 25+ years for 600+ minimally-disturbed catchments (drainage area range 5-25,000 km2, median size 336 km2) that cover a wide range of climatic and physical characteristics. Preliminary results using Support Vector Regression (SVR) suggest that in some catchments this nonlinear regression technique can accurately predict daily runoff, while the same approach fails in other catchments, indicating that the representation of climate inputs and/or catchment filter characteristics in the model structure needs further refinement to increase performance. We bolster this analysis by using Sparse Identification of Nonlinear Dynamics (a sparse symbolic regression technique) to uncover the governing equations that describe runoff processes in catchments where SVR performed well and in ones where it performed poorly, thereby enabling inference about governing processes. This provides a robust means of examining how catchment complexity influences runoff prediction skill, and represents a contribution towards the integration of data-driven inference and physically-based models.
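[Editor's illustration] A minimal sketch of the SVR setup described above: daily runoff predicted from climate forcings only. The file name and column names are hypothetical; hyperparameters are placeholders, not the study's tuned values.

```python
# SVR on climate forcings -> daily runoff (illustrative, scikit-learn).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

df = pd.read_csv("catchment_daily.csv")       # hypothetical input file
X = df[["precip", "temp", "pet"]].values      # forcings: P, T, PET
y = df["runoff"].values

X_tr, X_te, y_tr, y_te = train_test_split(X, y, shuffle=False)  # keep time order
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X_tr, y_tr)
print("test R^2:", model.score(X_te, y_te))
```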
Inference on periodicity of circadian time series.
Costa, Maria J; Finkenstädt, Bärbel; Roche, Véronique; Lévi, Francis; Gould, Peter D; Foreman, Julia; Halliday, Karen; Hall, Anthony; Rand, David A
2013-09-01
Estimation of the period length of time-course data from cyclical biological processes, such as those driven by the circadian pacemaker, is crucial for inferring the properties of the biological clock found in many living organisms. We propose a methodology for period estimation based on spectrum resampling (SR) techniques. Simulation studies show that SR is superior and more robust to non-sinusoidal and noisy cycles than a currently used routine based on Fourier approximations. In addition, a simple fit to the oscillations using linear least squares is available, together with a non-parametric test for detecting changes in period length which allows for period estimates with different variances, as frequently encountered in practice. The proposed methods are motivated by and applied to various data examples from chronobiology.
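[Editor's illustration] A hedged sketch of the spectrum-resampling (SR) idea: periodogram ordinates are approximately independent exponentials with mean equal to the spectrum, so resampled periodograms give a bootstrap distribution of the dominant period. This is a simplified reading of SR, not the authors' exact procedure; the circadian test signal is invented.

```python
# Bootstrap a period estimate by resampling periodogram ordinates.
import numpy as np
from scipy.signal import periodogram

rng = np.random.default_rng(2)
t = np.arange(0, 96, 0.5)                         # hours, 0.5 h sampling
x = np.sin(2 * np.pi * t / 24.3) + 0.5 * rng.normal(size=t.size)

freqs, pxx = periodogram(x, fs=2.0)               # fs = samples per hour
freqs, pxx = freqs[1:], pxx[1:]                   # drop the zero frequency

boot_periods = []
for _ in range(1000):
    pxx_star = pxx * rng.exponential(1.0, size=pxx.size)  # resampled spectrum
    boot_periods.append(1.0 / freqs[np.argmax(pxx_star)])
lo, hi = np.percentile(boot_periods, [2.5, 97.5])
print(f"period ~ {1.0 / freqs[np.argmax(pxx)]:.1f} h, 95% CI ({lo:.1f}, {hi:.1f})")
```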
Robust inference under the beta regression model with application to health care studies.
Ghosh, Abhik
2017-01-01
Data on rates, percentages, or proportions arise frequently in many different applied disciplines like medical biology, health care, psychology, and several others. In this paper, we develop a robust inference procedure for the beta regression model, which is used to describe such response variables taking values in (0, 1) through some related explanatory variables. In relation to the beta regression model, the issue of robustness has been largely ignored in the literature so far. The existing maximum likelihood-based inference suffers from a serious lack of robustness against outliers in the data and generates drastically different (erroneous) inferences in the presence of data contamination. Here, we develop the robust minimum density power divergence estimator and a class of robust Wald-type tests for the beta regression model along with several applications. We derive their asymptotic properties and describe their robustness theoretically through the influence function analyses. Finite sample performances of the proposed estimators and tests are examined through suitable simulation studies and real data applications in the context of health care and psychology. Although we primarily focus on the beta regression models with a fixed dispersion parameter, some indications are also provided for extension to the variable dispersion beta regression models with an application.
Che-Castaldo, Christian; Jenouvrier, Stephanie; Youngflesh, Casey; Shoemaker, Kevin T; Humphries, Grant; McDowall, Philip; Landrum, Laura; Holland, Marika M; Li, Yun; Ji, Rubao; Lynch, Heather J
2017-10-10
Colonially-breeding seabirds have long served as indicator species for the health of the oceans on which they depend. Abundance and breeding data are repeatedly collected at fixed study sites in the hopes that changes in abundance and productivity may be useful for adaptive management of marine resources, but their suitability for this purpose is often unknown. To address this, we fit a Bayesian population dynamics model that includes process and observation error to all known Adélie penguin abundance data (1982-2015) in the Antarctic, covering >95% of their population globally. We find that process error exceeds observation error in this system, and that continent-wide "year effects" strongly influence population growth rates. Our findings have important implications for the use of Adélie penguins in Southern Ocean feedback management, and suggest that aggregating abundance across space provides the fastest reliable signal of true population change for species whose dynamics are driven by stochastic processes. Adélie penguins are a key Antarctic indicator species, but data patchiness has challenged efforts to link population dynamics to key drivers. Che-Castaldo et al. resolve this issue using a pan-Antarctic Bayesian model to infer missing data, and show that spatial aggregation leads to more robust inference regarding dynamics.
Wisdom of crowds for robust gene network inference
Marbach, Daniel; Costello, James C.; Küffner, Robert; Vega, Nicci; Prill, Robert J.; Camacho, Diogo M.; Allison, Kyle R.; Kellis, Manolis; Collins, James J.; Stolovitzky, Gustavo
2012-01-01
Reconstructing gene regulatory networks from high-throughput data is a long-standing problem. Through the DREAM project (Dialogue on Reverse Engineering Assessment and Methods), we performed a comprehensive blind assessment of over thirty network inference methods on Escherichia coli, Staphylococcus aureus, Saccharomyces cerevisiae, and in silico microarray data. We characterize the performance, data requirements, and inherent biases of different inference approaches, offering guidelines for both algorithm application and development. We observe that no single inference method performs optimally across all datasets. In contrast, integration of predictions from multiple inference methods shows robust and high performance across diverse datasets. Thereby, we construct high-confidence networks for E. coli and S. aureus, each comprising ~1700 transcriptional interactions at an estimated precision of 50%. We experimentally test 53 novel interactions in E. coli, of which 23 were supported (43%). Our results establish community-based methods as a powerful and robust tool for the inference of transcriptional gene regulatory networks. PMID:22796662
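[Editor's illustration] A minimal sketch of the "wisdom of crowds" integration reported above: average the rank each inference method assigns to every candidate edge, so edges ranked highly by many methods rise to the top. The score matrices below are random stand-ins for real method outputs.

```python
# Community (rank-average) integration of multiple network-inference methods.
import numpy as np
from scipy.stats import rankdata

def community_ranks(score_matrices):
    """score_matrices: list of (genes x genes) edge-confidence arrays."""
    ranks = [rankdata(-s, axis=None).reshape(s.shape) for s in score_matrices]
    return np.mean(ranks, axis=0)                 # lower = more confident edge

rng = np.random.default_rng(3)
methods = [rng.random((50, 50)) for _ in range(5)]  # stand-ins for 5 methods
consensus = community_ranks(methods)
edge = np.unravel_index(np.argmin(consensus), consensus.shape)
print("top consensus edge:", edge)
```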
Identifying Seizure Onset Zone From the Causal Connectivity Inferred Using Directed Information
NASA Astrophysics Data System (ADS)
Malladi, Rakesh; Kalamangalam, Giridhar; Tandon, Nitin; Aazhang, Behnaam
2016-10-01
In this paper, we developed a model-based and a data-driven estimator for directed information (DI) to infer the causal connectivity graph between electrocorticographic (ECoG) signals recorded from the brain and to identify the seizure onset zone (SOZ) in epileptic patients. Directed information, an information theoretic quantity, is a general metric to infer causal connectivity between time series and is not restricted to a particular class of models, unlike the popular metrics based on Granger causality or transfer entropy. The proposed estimators are shown to be almost surely convergent. Causal connectivity between ECoG electrodes in five epileptic patients is inferred using the proposed DI estimators, after validating their performance on simulated data. We then propose a model-based and a data-driven SOZ identification algorithm to identify the SOZ from the causal connectivity inferred using the model-based and data-driven DI estimators, respectively. The data-driven SOZ identification outperforms the model-based SOZ identification algorithm when benchmarked against visual analysis by neurologists, the current clinical gold standard. The causal connectivity analysis presented here is the first step towards developing novel non-surgical treatments for epilepsy.
Robust functional regression model for marginal mean and subject-specific inferences.
Cao, Chunzheng; Shi, Jian Qing; Lee, Youngjo
2017-01-01
We introduce flexible robust functional regression models, using various heavy-tailed processes, including a Student t-process. We propose efficient algorithms in estimating parameters for the marginal mean inferences and in predicting conditional means as well as interpolation and extrapolation for the subject-specific inferences. We develop bootstrap prediction intervals (PIs) for conditional mean curves. Numerical studies show that the proposed model provides a robust approach against data contamination or distribution misspecification, and the proposed PIs maintain the nominal confidence levels. A real data application is presented as an illustrative example.
Doubly robust nonparametric inference on the average treatment effect.
Benkeser, D; Carone, M; Laan, M J Van Der; Gilbert, P B
2017-12-01
Doubly robust estimators are widely used to draw inference about the average effect of a treatment. Such estimators are consistent for the effect of interest if either one of two nuisance parameters is consistently estimated. However, if flexible, data-adaptive estimators of these nuisance parameters are used, double robustness does not readily extend to inference. We present a general theoretical study of the behaviour of doubly robust estimators of an average treatment effect when one of the nuisance parameters is inconsistently estimated. We contrast different methods for constructing such estimators and investigate the extent to which they may be modified to also allow doubly robust inference. We find that while targeted minimum loss-based estimation can be used to solve this problem very naturally, common alternative frameworks appear to be inappropriate for this purpose. We provide a theoretical study and a numerical evaluation of the alternatives considered. Our simulations highlight the need for and usefulness of these approaches in practice, while our theoretical developments have broad implications for the construction of estimators that permit doubly robust inference in other problems.
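[Editor's illustration] The canonical doubly robust construction the paper studies is the augmented inverse-probability-weighted (AIPW) estimator of the average treatment effect, sketched below with simulated data. This plain version illustrates the construction only; the paper's TMLE-based corrections for data-adaptive nuisance estimators are not shown.

```python
# AIPW estimator of the ATE: consistent if either the propensity model or
# the outcome regressions are correctly specified ("double robustness").
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

def aipw_ate(X, a, y):
    ps = LogisticRegression().fit(X, a).predict_proba(X)[:, 1]   # propensity
    m1 = LinearRegression().fit(X[a == 1], y[a == 1]).predict(X) # E[Y|A=1,X]
    m0 = LinearRegression().fit(X[a == 0], y[a == 0]).predict(X) # E[Y|A=0,X]
    psi1 = m1 + a * (y - m1) / ps                # augmented treated term
    psi0 = m0 + (1 - a) * (y - m0) / (1 - ps)    # augmented control term
    return np.mean(psi1 - psi0)

rng = np.random.default_rng(4)
X = rng.normal(size=(2000, 3))
a = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))  # confounded treatment
y = X @ np.array([1.0, 0.5, -0.5]) + 2.0 * a + rng.normal(size=2000)
print("AIPW ATE estimate:", aipw_ate(X, a, y))   # true effect is 2.0
```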
Bayesian functional integral method for inferring continuous data from discrete measurements.
Heuett, William J; Miller, Bernard V; Racette, Susan B; Holloszy, John O; Chow, Carson C; Periwal, Vipul
2012-02-08
Inference of the insulin secretion rate (ISR) from C-peptide measurements as a quantification of pancreatic β-cell function is clinically important in diseases related to reduced insulin sensitivity and insulin action. ISR derived from C-peptide concentration is an example of nonparametric Bayesian model selection where a proposed ISR time-course is considered to be a "model". Inferring the value of an inaccessible continuous variable from discrete observable data is often problematic in biology and medicine, because it is a priori unclear how robust the inference is to the deletion of data points and, a closely related question, how much smoothness or continuity the data actually support. Predictions weighted by the posterior distribution can be cast as functional integrals as used in statistical field theory. Functional integrals are generally difficult to evaluate, especially for nonanalytic constraints such as positivity of the estimated parameters. We propose a computationally tractable method that uses the exact solution of an associated likelihood function as a prior probability distribution for a Markov-chain Monte Carlo evaluation of the posterior for the full model. As a concrete application of our method, we calculate the ISR from actual clinical C-peptide measurements in human subjects with varying degrees of insulin sensitivity. Our method demonstrates the feasibility of functional integral Bayesian model selection as a practical method for such data-driven inference, allowing the data to determine the smoothing timescale and the width of the prior probability distribution on the space of models. In particular, our model comparison method determines the discrete time-step for interpolation of the unobservable continuous variable that is supported by the data. Attempts to go to finer discrete time-steps lead to less likely models. Copyright © 2012 Biophysical Society. Published by Elsevier Inc. All rights reserved.
Evaluating data-driven causal inference techniques in noisy physical and ecological systems
NASA Astrophysics Data System (ADS)
Tennant, C.; Larsen, L.
2016-12-01
Causal inference from observational time series challenges traditional approaches for understanding processes and offers exciting opportunities to gain new understanding of complex systems where nonlinearity, delayed forcing, and emergent behavior are common. We present a formal evaluation of the performance of convergent cross-mapping (CCM) and transfer entropy (TE) for data-driven causal inference under real-world conditions. CCM is based on nonlinear state-space reconstruction, and causality is determined by the convergence of prediction skill with an increasing number of observations of the system. TE quantifies the reduction in uncertainty based on the transition probabilities of a pair of time-lagged variables; with TE, causal inference is based on asymmetry in information flow between the variables. Observational data and numerical simulations from a number of classical physical and ecological systems (atmospheric convection in the Lorenz system, species competition in patch tournaments, and long-term climate change in the Vostok ice core) were used to evaluate the ability of CCM and TE to infer causal relationships as data series become increasingly corrupted by observational (instrument-driven) or process (model- or stochastic-driven) noise. While both techniques show promise for causal inference, TE appears to be applicable to a wider range of systems, especially when the data series are of sufficient length to reliably estimate transition probabilities of system components. Both techniques also show a clear effect of observational noise on causal inference. For example, CCM exhibits a negative logarithmic decline in prediction skill as the noise level of the system increases. Changes in TE strongly depend on noise type and on which variable the noise was added to. The ability of CCM and TE to detect driving influences suggests that their application to physical and ecological systems could be transformative for understanding driving mechanisms as Earth systems undergo change.
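[Editor's illustration] A hedged sketch of a histogram (plug-in) transfer entropy estimator, the TE quantity evaluated above. Binning and lag choices are illustrative, and, as the abstract notes, reliable estimation requires long series.

```python
# Plug-in transfer entropy TE(x -> y) with lag 1: I(Y_t ; X_{t-1} | Y_{t-1}).
import numpy as np

def transfer_entropy(x, y, bins=4):
    # Discretize each series into equal-probability bins.
    xd = np.digitize(x, np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1]))
    yd = np.digitize(y, np.quantile(y, np.linspace(0, 1, bins + 1)[1:-1]))
    yt, yp, xp = yd[1:], yd[:-1], xd[:-1]         # present and lagged states
    te = 0.0
    for a in range(bins):
        for b in range(bins):
            for c in range(bins):
                p_abc = np.mean((yt == a) & (yp == b) & (xp == c))
                if p_abc == 0:
                    continue
                p_bc = np.mean((yp == b) & (xp == c))
                p_ab = np.mean((yt == a) & (yp == b))
                p_b = np.mean(yp == b)
                te += p_abc * np.log2(p_abc * p_b / (p_ab * p_bc))
    return te

rng = np.random.default_rng(5)
x = rng.normal(size=5000)
y = np.roll(x, 1) + 0.5 * rng.normal(size=5000)   # y driven by lagged x
print("TE x->y:", transfer_entropy(x, y), "TE y->x:", transfer_entropy(y, x))
```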
Tsiatis, Anastasios A.; Davidian, Marie; Cao, Weihua
2010-01-01
A routine challenge is that of making inference on parameters in a statistical model of interest from longitudinal data subject to drop out, which are a special case of the more general setting of monotonely coarsened data. Considerable recent attention has focused on doubly robust estimators, which in this context involve positing models for both the missingness (more generally, coarsening) mechanism and aspects of the distribution of the full data, that have the appealing property of yielding consistent inferences if only one of these models is correctly specified. Doubly robust estimators have been criticized for potentially disastrous performance when both of these models are even only mildly misspecified. We propose a doubly robust estimator applicable in general monotone coarsening problems that achieves comparable or improved performance relative to existing doubly robust methods, which we demonstrate via simulation studies and by application to data from an AIDS clinical trial. PMID:20731640
Enhancing Transparency and Control When Drawing Data-Driven Inferences About Individuals.
Chen, Daizhuo; Fraiberger, Samuel P; Moakler, Robert; Provost, Foster
2017-09-01
Recent studies show the remarkable power of fine-grained information disclosed by users on social network sites to infer users' personal characteristics via predictive modeling. Similar fine-grained data are being used successfully in other commercial applications. In response, attention is turning increasingly to the transparency that organizations provide to users as to what inferences are drawn and why, as well as to what sort of control users can be given over inferences that are drawn about them. In this article, we focus on inferences about personal characteristics based on information disclosed by users' online actions. As a use case, we explore personal inferences that are made possible from "Likes" on Facebook. We first present a means for providing transparency into the information responsible for inferences drawn by data-driven models. We then introduce the "cloaking device"-a mechanism for users to inhibit the use of particular pieces of information in inference. Using these analytical tools we ask two main questions: (1) How much information must users cloak to significantly affect inferences about their personal traits? We find that usually users must cloak only a small portion of their actions to inhibit inference. We also find that, encouragingly, false-positive inferences are significantly easier to cloak than true-positive inferences. (2) Can firms change their modeling behavior to make cloaking more difficult? The answer is a definitive yes. We demonstrate a simple modeling change that requires users to cloak substantially more information to affect the inferences drawn. The upshot is that organizations can provide transparency and control even into complicated, predictive model-driven inferences, but they also can make control easier or harder for their users.
Enhancing Transparency and Control When Drawing Data-Driven Inferences About Individuals
Chen, Daizhuo; Fraiberger, Samuel P.; Moakler, Robert; Provost, Foster
2017-01-01
Recent studies show the remarkable power of fine-grained information disclosed by users on social network sites to infer users' personal characteristics via predictive modeling. Similar fine-grained data are being used successfully in other commercial applications. In response, attention is turning increasingly to the transparency that organizations provide to users as to what inferences are drawn and why, as well as to what sort of control users can be given over inferences that are drawn about them. In this article, we focus on inferences about personal characteristics based on information disclosed by users' online actions. As a use case, we explore personal inferences that are made possible from “Likes” on Facebook. We first present a means for providing transparency into the information responsible for inferences drawn by data-driven models. We then introduce the “cloaking device”—a mechanism for users to inhibit the use of particular pieces of information in inference. Using these analytical tools we ask two main questions: (1) How much information must users cloak to significantly affect inferences about their personal traits? We find that usually users must cloak only a small portion of their actions to inhibit inference. We also find that, encouragingly, false-positive inferences are significantly easier to cloak than true-positive inferences. (2) Can firms change their modeling behavior to make cloaking more difficult? The answer is a definitive yes. We demonstrate a simple modeling change that requires users to cloak substantially more information to affect the inferences drawn. The upshot is that organizations can provide transparency and control even into complicated, predictive model-driven inferences, but they also can make control easier or harder for their users. PMID:28933942
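[Editor's illustration] A hedged sketch of the "cloaking device" idea from the two records above: for a linear model, greedily withhold the features contributing most evidence toward a trait until the predicted probability drops below the decision threshold. The model and binary "Like" data are stand-ins, not the paper's Facebook models.

```python
# Greedy cloaking against a logistic trait predictor (illustrative).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
X = (rng.random((500, 40)) < 0.2).astype(float)    # binary "Like" indicators
w_true = rng.normal(size=40)
y = (X @ w_true + rng.normal(size=500) > 0).astype(int)
clf = LogisticRegression().fit(X, y)

def cloak(x, clf, threshold=0.5):
    """Return feature indices to withhold so the positive inference flips."""
    x = x.copy()
    removed = []
    while clf.predict_proba([x])[0, 1] >= threshold:
        active = np.flatnonzero(x)                 # features still disclosed
        if active.size == 0:
            break
        contrib = clf.coef_[0][active]             # evidence toward the trait
        drop = active[np.argmax(contrib)]          # cloak the strongest evidence
        x[drop] = 0.0
        removed.append(int(drop))
    return removed

i = int(np.argmax(clf.predict_proba(X)[:, 1]))     # a confidently labeled user
print("features to cloak:", cloak(X[i], clf))
```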
Schema-driven facilitation of new hierarchy learning in the transitive inference paradigm
Kumaran, Dharshan
2013-01-01
Prior knowledge, in the form of a mental schema or framework, is thought to facilitate the learning of new information in a range of experimental and everyday scenarios. Despite rising interest in the cognitive and neural mechanisms underlying schema-driven facilitation of new learning, few paradigms have been developed to examine this issue in humans. Here we develop a multiphase experimental scenario aimed at characterizing schema-based effects in the context of a paradigm that has been very widely used across species, the transitive inference task. We show that an associative schema, composed of prior knowledge of the rank positions of familiar items in the hierarchy, has a marked effect on transitivity performance and the development of relational knowledge of the hierarchy that cannot be accounted for by more general changes in task strategy. Further, we show that participants are capable of deploying prior knowledge to successful effect under surprising conditions (i.e., when corrective feedback is totally absent), but only when the associative schema is robust. Finally, our results provide insights into the cognitive mechanisms underlying such schema-driven effects, and suggest that new hierarchy learning in the transitive inference task can occur through a contextual transfer mechanism that exploits the structure of associative experiences. PMID:23782509
Schema-driven facilitation of new hierarchy learning in the transitive inference paradigm.
Kumaran, Dharshan
2013-06-19
Prior knowledge, in the form of a mental schema or framework, is thought to facilitate the learning of new information in a range of experimental and everyday scenarios. Despite rising interest in the cognitive and neural mechanisms underlying schema-driven facilitation of new learning, few paradigms have been developed to examine this issue in humans. Here we develop a multiphase experimental scenario aimed at characterizing schema-based effects in the context of a paradigm that has been very widely used across species, the transitive inference task. We show that an associative schema, composed of prior knowledge of the rank positions of familiar items in the hierarchy, has a marked effect on transitivity performance and the development of relational knowledge of the hierarchy that cannot be accounted for by more general changes in task strategy. Further, we show that participants are capable of deploying prior knowledge to successful effect under surprising conditions (i.e., when corrective feedback is totally absent), but only when the associative schema is robust. Finally, our results provide insights into the cognitive mechanisms underlying such schema-driven effects, and suggest that new hierarchy learning in the transitive inference task can occur through a contextual transfer mechanism that exploits the structure of associative experiences.
Wen, Fu-Lai; Wang, Yu-Chiun; Shibata, Tatsuo
2017-06-20
During embryonic development, epithelial sheets fold into complex structures required for tissue and organ functions. Although substantial efforts have been devoted to identifying molecular mechanisms underlying epithelial folding, far less is understood about how forces deform individual cells to sculpt the overall sheet morphology. Here we describe a simple and general theoretical model for the autonomous folding of monolayered epithelial sheets. We show that active modulation of intracellular mechanics along the basal-lateral as well as the apical surfaces is capable of inducing fold formation in the absence of buckling instability. Apical modulation sculpts epithelia into shallow and V-shaped folds, whereas basal-lateral modulation generates deep and U-shaped folds. These characteristic tissue shapes remain unchanged when subject to mechanical perturbations from the surroundings, illustrating that the autonomous folding is robust against environmental variabilities. At the cellular scale, how cells change shape depends on their initial aspect ratios and the modulation mechanisms. Such cell deformation characteristics are verified via experimental measurements for a canonical folding process driven by apical modulation, indicating that our theory could be used to infer the underlying folding mechanisms based on experimental data. The mechanical principles revealed in our model could potentially guide future studies on epithelial folding in diverse systems. Copyright © 2017. Published by Elsevier Inc.
Generalized species sampling priors with latent Beta reinforcements
Airoldi, Edoardo M.; Costa, Thiago; Bassetti, Federico; Leisen, Fabrizio; Guindani, Michele
2014-01-01
Many popular Bayesian nonparametric priors can be characterized in terms of exchangeable species sampling sequences. However, in some applications, exchangeability may not be appropriate. We introduce a novel and probabilistically coherent family of non-exchangeable species sampling sequences characterized by a tractable predictive probability function with weights driven by a sequence of independent Beta random variables. We compare their theoretical clustering properties with those of the Dirichlet process and the two-parameter Poisson-Dirichlet process. Unlike existing work, the proposed construction provides a complete characterization of the joint process. We then propose the use of such a process as the prior distribution in a hierarchical Bayes modeling framework, and we describe a Markov chain Monte Carlo sampler for posterior inference. We evaluate the performance of the prior and the robustness of the resulting inference in a simulation study, providing a comparison with popular Dirichlet process mixtures and hidden Markov models. Finally, we develop an application to the detection of chromosomal aberrations in breast cancer by leveraging array CGH data. PMID:25870462
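[Editor's illustration] A toy sketch of a species-sampling sequence whose innovation weights are driven by independent Beta random variables, in the spirit of the construction above. The paper's exact predictive probability function differs in detail; this only illustrates the non-exchangeable, Beta-reinforced mechanism.

```python
# Sequential clustering with fresh Beta-distributed innovation probabilities.
import numpy as np

def beta_reinforced_sequence(n, a=1.0, b=4.0, seed=0):
    rng = np.random.default_rng(seed)
    labels, counts = [0], [1]                     # first observation, cluster 0
    for _ in range(1, n):
        v = rng.beta(a, b)                        # fresh Beta reinforcement
        if rng.random() < v:                      # open a new species/cluster
            counts.append(1)
            labels.append(len(counts) - 1)
        else:                                     # join an existing cluster
            probs = np.array(counts) / sum(counts)
            j = rng.choice(len(counts), p=probs)
            counts[j] += 1
            labels.append(j)
    return labels, counts

labels, counts = beta_reinforced_sequence(200)
print("clusters:", len(counts), "largest sizes:", sorted(counts, reverse=True)[:5])
```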
A Review of Some Aspects of Robust Inference for Time Series.
1984-09-01
A Review of Some Aspects of Robust Inference for Time Series, by R. Douglas Martin. Technical Report No. 53, September 1984, Department of Statistics, University of Washington, Seattle (DTIC accession AD-A149 716). Recoverable fragment: "One cannot hope to have a good method for dealing with outliers in time series by using only an instantaneous nonlinear transformation of the data..."
Kapun, Martin; van Schalkwyk, Hester; McAllister, Bryant; Flatt, Thomas; Schlötterer, Christian
2014-04-01
Sequencing of pools of individuals (Pool-Seq) represents a reliable and cost-effective approach for estimating genome-wide SNP and transposable element insertion frequencies. However, Pool-Seq does not provide direct information on haplotypes so that, for example, obtaining inversion frequencies has not been possible until now. Here, we have developed a new set of diagnostic marker SNPs for seven cosmopolitan inversions in Drosophila melanogaster that can be used to infer inversion frequencies from Pool-Seq data. We applied our novel marker set to Pool-Seq data from an experimental evolution study and from North American and Australian latitudinal clines. In the experimental evolution data, we find evidence that positive selection has driven the frequencies of In(3R)C and In(3R)Mo to increase over time. In the clinal data, we confirm the existence of frequency clines for In(2L)t, In(3L)P and In(3R)Payne in both North America and Australia and detect a previously unknown latitudinal cline for In(3R)Mo in North America. The inversion markers developed here provide a versatile and robust tool for characterizing inversion frequencies and their dynamics in Pool-Seq data from diverse D. melanogaster populations. © 2013 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.
Kapun, Martin; van Schalkwyk, Hester; McAllister, Bryant; Flatt, Thomas; Schlötterer, Christian
2014-01-01
Sequencing of pools of individuals (Pool-Seq) represents a reliable and cost-effective approach for estimating genome-wide SNP and transposable element insertion frequencies. However, Pool-Seq does not provide direct information on haplotypes so that, for example, obtaining inversion frequencies has not been possible until now. Here, we have developed a new set of diagnostic marker SNPs for seven cosmopolitan inversions in Drosophila melanogaster that can be used to infer inversion frequencies from Pool-Seq data. We applied our novel marker set to Pool-Seq data from an experimental evolution study and from North American and Australian latitudinal clines. In the experimental evolution data, we find evidence that positive selection has driven the frequencies of In(3R)C and In(3R)Mo to increase over time. In the clinal data, we confirm the existence of frequency clines for In(2L)t, In(3L)P and In(3R)Payne in both North America and Australia and detect a previously unknown latitudinal cline for In(3R)Mo in North America. The inversion markers developed here provide a versatile and robust tool for characterizing inversion frequencies and their dynamics in Pool-Seq data from diverse D. melanogaster populations. PMID:24372777
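[Editor's illustration] A minimal sketch of the inference step described in the two records above: with diagnostic SNPs fixed within a given inversion, the inversion's frequency in a Pool-Seq sample can be estimated from pooled read counts of the inversion-specific alleles at those markers. The counts below are hypothetical.

```python
# Pooled-read estimate of an inversion frequency from diagnostic marker SNPs.
import numpy as np

def inversion_frequency(alt_counts, coverages):
    """alt_counts / coverages: per-marker reads supporting the inverted allele."""
    alt_counts = np.asarray(alt_counts, dtype=float)
    coverages = np.asarray(coverages, dtype=float)
    return alt_counts.sum() / coverages.sum()

# Hypothetical counts at five diagnostic markers for In(3R)Payne:
print(inversion_frequency([12, 9, 15, 11, 8], [60, 52, 71, 58, 49]))
```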
Bayesian Inference and Application of Robust Growth Curve Models Using Student's "t" Distribution
ERIC Educational Resources Information Center
Zhang, Zhiyong; Lai, Keke; Lu, Zhenqiu; Tong, Xin
2013-01-01
Despite the widespread popularity of growth curve analysis, few studies have investigated robust growth curve models. In this article, the "t" distribution is applied to model heavy-tailed data and contaminated normal data with outliers for growth curve analysis. The derived robust growth curve models are estimated through Bayesian…
Evaluation of respondent-driven sampling.
McCreesh, Nicky; Frost, Simon D W; Seeley, Janet; Katongole, Joseph; Tarsh, Matilda N; Ndunguse, Richard; Jichi, Fatima; Lunel, Natasha L; Maher, Dermot; Johnston, Lisa G; Sonnenberg, Pam; Copas, Andrew J; Hayes, Richard J; White, Richard G
2012-01-01
Respondent-driven sampling is a novel variant of link-tracing sampling for estimating the characteristics of hard-to-reach groups, such as HIV prevalence in sex workers. Despite its use by leading health organizations, the performance of this method in realistic situations is still largely unknown. We evaluated respondent-driven sampling by comparing estimates from a respondent-driven sampling survey with total population data. Total population data on age, tribe, religion, socioeconomic status, sexual activity, and HIV status were available on a population of 2402 male household heads from an open cohort in rural Uganda. A respondent-driven sampling (RDS) survey was carried out in this population, using current methods of sampling (RDS sample) and statistical inference (RDS estimates). Analyses were carried out for the full RDS sample and then repeated for the first 250 recruits (small sample). We recruited 927 household heads. Full and small RDS samples were largely representative of the total population, but both samples underrepresented men who were younger, of higher socioeconomic status, and with unknown sexual activity and HIV status. Respondent-driven sampling statistical inference methods failed to reduce these biases. Only 31%-37% (depending on method and sample size) of RDS estimates were closer to the true population proportions than the RDS sample proportions. Only 50%-74% of respondent-driven sampling bootstrap 95% confidence intervals included the population proportion. Respondent-driven sampling produced a generally representative sample of this well-connected nonhidden population. However, current respondent-driven sampling inference methods failed to reduce bias when it occurred. Whether the data required to remove bias and measure precision can be collected in a respondent-driven sampling survey is unresolved. Respondent-driven sampling should be regarded as a (potentially superior) form of convenience sampling method, and caution is required when interpreting findings based on the sampling method.
Inferring Binary and Trinary Stellar Populations in Photometric and Astrometric Surveys
NASA Astrophysics Data System (ADS)
Widmark, Axel; Leistedt, Boris; Hogg, David W.
2018-04-01
Multiple stellar systems are ubiquitous in the Milky Way but are often unresolved and seen as single objects in spectroscopic, photometric, and astrometric surveys. However, modeling them is essential for developing a full understanding of large surveys such as Gaia and connecting them to stellar and Galactic models. In this paper, we address this problem by jointly fitting the Gaia and Two Micron All Sky Survey photometric and astrometric data using a data-driven Bayesian hierarchical model that includes populations of binary and trinary systems. This allows us to classify observations into singles, binaries, and trinaries, in a robust and efficient manner, without resorting to external models. We are able to identify multiple systems and, in some cases, make strong predictions for the properties of their unresolved stars. We will be able to compare such predictions with Gaia Data Release 4, which will contain astrometric identification and analysis of binary systems.
Mumford, Jeanette A.
2017-01-01
Even after thorough preprocessing and a careful time series analysis of functional magnetic resonance imaging (fMRI) data, artifacts and other issues can lead to violations of the assumption that the variance is constant across subjects in the group level model. This is especially concerning when modeling a continuous covariate at the group level, as the slope is easily biased by outliers. Various models have been proposed to deal with outliers, including models that use the first level variance or the group level residual magnitude to differentially weight subjects. The most commonly used robust regression, implementing a robust estimator of the regression slope, has been previously studied in the context of fMRI studies and was found to perform well in some scenarios, but a loss of Type I error control can occur in some outlier settings. A second type of robust regression using a heteroscedastic autocorrelation consistent (HAC) estimator, which produces robust slope and variance estimates, has been shown to perform well, with better Type I error control, but only with large sample sizes (500–1000 subjects). The Type I error control with smaller sample sizes has not been studied for this model, nor has it been compared to other modeling approaches that handle outliers, such as FSL's Flame 1 and FSL's outlier de-weighting. Focusing on group level inference with a continuous covariate over a range of sample sizes and degrees of heteroscedasticity, which can be driven either by within- or between-subject variability, both styles of robust regression are compared to ordinary least squares (OLS), FSL's Flame 1, Flame 1 with the outlier de-weighting algorithm, and Kendall's Tau. Additionally, subject omission using the Cook's Distance measure with OLS and nonparametric inference with the OLS statistic are studied. Pros and cons of these models, as well as general strategies for detecting outliers in data and taking precautions to avoid inflated Type I error rates, are discussed. PMID:28030782
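[Editor's illustration] A hedged sketch contrasting two of the approaches compared above for a group-level model with a continuous covariate: a robust-slope estimator (Huber-type RLM) versus OLS with a heteroscedasticity-consistent sandwich covariance, used here as a simpler stand-in for the HAC-style robust-variance idea. Data are simulated with injected outlier subjects.

```python
# Robust slope vs. robust variance for a group-level continuous covariate.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 40                                            # subjects in the group model
age = rng.uniform(20, 60, n)                      # continuous covariate
noise = rng.normal(scale=1.0 + 0.05 * age)        # heteroscedastic errors
y = 0.03 * age + noise
y[:3] += 8.0                                      # inject outlier subjects

X = sm.add_constant(age)
ols_hc = sm.OLS(y, X).fit(cov_type="HC3")         # robust (sandwich) variance
rlm = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()  # robust slope estimate

print("OLS+HC3 slope, p:", ols_hc.params[1], ols_hc.pvalues[1])
print("Huber RLM slope :", rlm.params[1])
```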
Recurrence measure of conditional dependence and applications.
Ramos, Antônio M T; Builes-Jaramillo, Alejandro; Poveda, Germán; Goswami, Bedartha; Macau, Elbert E N; Kurths, Jürgen; Marwan, Norbert
2017-05-01
Identifying causal relations from observational data sets has posed great challenges in data-driven causality inference studies. One of the successful approaches to detect direct coupling in the information theory framework is transfer entropy. However, the core of entropy-based tools lies in estimating the probabilities of the underlying variables. Here we propose a data-driven approach for causality inference that incorporates recurrence plot features into the framework of information theory. We define it as the recurrence measure of conditional dependence (RMCD), and we present some applications. The RMCD quantifies the causal dependence between two processes based on joint recurrence patterns between the past of the possible driver and the present of the potentially driven variable, excluding the contribution of the contemporaneous past of the driven variable. Finally, it can unveil the time scale of the influence of the sea-surface temperature of the Pacific Ocean on the precipitation in the Amazonia during recent major droughts.
Recurrence measure of conditional dependence and applications
NASA Astrophysics Data System (ADS)
Ramos, Antônio M. T.; Builes-Jaramillo, Alejandro; Poveda, Germán; Goswami, Bedartha; Macau, Elbert E. N.; Kurths, Jürgen; Marwan, Norbert
2017-05-01
Identifying causal relations from observational data sets has posed great challenges in data-driven causality inference studies. One of the successful approaches to detect direct coupling in the information theory framework is transfer entropy. However, the core of entropy-based tools lies in estimating the probabilities of the underlying variables. Here we propose a data-driven approach for causality inference that incorporates recurrence plot features into the framework of information theory. We define it as the recurrence measure of conditional dependence (RMCD), and we present some applications. The RMCD quantifies the causal dependence between two processes based on joint recurrence patterns between the past of the possible driver and the present of the potentially driven variable, excluding the contribution of the contemporaneous past of the driven variable. Finally, it can unveil the time scale of the influence of the sea-surface temperature of the Pacific Ocean on the precipitation in the Amazonia during recent major droughts.
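[Editor's illustration] A hedged sketch of the recurrence-plot ingredients behind RMCD: binary recurrence matrices for each series and their joint recurrences. The full RMCD additionally conditions on the driven variable's own past; only the building blocks are shown, with invented data.

```python
# Recurrence matrices and joint recurrence rate (building blocks of RMCD).
import numpy as np

def recurrence_matrix(x, eps):
    """R[i, j] = 1 if |x_i - x_j| < eps (scalar series for simplicity)."""
    d = np.abs(x[:, None] - x[None, :])
    return (d < eps).astype(int)

rng = np.random.default_rng(8)
x = rng.normal(size=300)
y = 0.8 * np.roll(x, 1) + 0.2 * rng.normal(size=300)  # y driven by lagged x

Rx = recurrence_matrix(x, eps=0.5)
Ry = recurrence_matrix(y, eps=0.5)
joint = Rx * Ry                                   # simultaneous recurrences
print("joint recurrence rate:", joint.mean())     # basis for dependence measures
```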
Implicit Value Updating Explains Transitive Inference Performance: The Betasort Model
Jensen, Greg; Muñoz, Fabian; Alkan, Yelda; Ferrera, Vincent P.; Terrace, Herbert S.
2015-01-01
Transitive inference (the ability to infer that B > D given that B > C and C > D) is a widespread characteristic of serial learning, observed in dozens of species. Despite these robust behavioral effects, reinforcement learning models reliant on reward prediction error or associative strength routinely fail to perform these inferences. We propose an algorithm called betasort, inspired by cognitive processes, which performs transitive inference at low computational cost. This is accomplished by (1) representing stimulus positions along a unit span using beta distributions, (2) treating positive and negative feedback asymmetrically, and (3) updating the position of every stimulus during every trial, whether that stimulus was visible or not. Performance was compared for rhesus macaques, humans, and the betasort algorithm, as well as Q-learning, an established reward-prediction error (RPE) model. Of these, only Q-learning failed to respond above chance during critical test trials. Betasort’s success (when compared to RPE models) and its computational efficiency (when compared to full Markov decision process implementations) suggest that the study of reinforcement learning in organisms will be best served by a feature-driven approach to comparing formal models. PMID:26407227
Implicit Value Updating Explains Transitive Inference Performance: The Betasort Model.
Jensen, Greg; Muñoz, Fabian; Alkan, Yelda; Ferrera, Vincent P; Terrace, Herbert S
2015-01-01
Transitive inference (the ability to infer that B > D given that B > C and C > D) is a widespread characteristic of serial learning, observed in dozens of species. Despite these robust behavioral effects, reinforcement learning models reliant on reward prediction error or associative strength routinely fail to perform these inferences. We propose an algorithm called betasort, inspired by cognitive processes, which performs transitive inference at low computational cost. This is accomplished by (1) representing stimulus positions along a unit span using beta distributions, (2) treating positive and negative feedback asymmetrically, and (3) updating the position of every stimulus during every trial, whether that stimulus was visible or not. Performance was compared for rhesus macaques, humans, and the betasort algorithm, as well as Q-learning, an established reward-prediction error (RPE) model. Of these, only Q-learning failed to respond above chance during critical test trials. Betasort's success (when compared to RPE models) and its computational efficiency (when compared to full Markov decision process implementations) suggest that the study of reinforcement learning in organisms will be best served by a feature-driven approach to comparing formal models.
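[Editor's illustration] A simplified sketch of the betasort idea from the two records above: each stimulus's position on a unit span is a Beta distribution, feedback is treated asymmetrically, and every stimulus is nudged on every trial. This compresses the published algorithm; consult the paper for the exact update rules.

```python
# Betasort-style learning of a 5-item hierarchy A > B > C > D > E.
import numpy as np

rng = np.random.default_rng(9)
n_stim = 5
a = np.ones(n_stim)                               # Beta(a, b) per stimulus
b = np.ones(n_stim)

def choose(i, j):
    """Sample implicit positions and pick the stimulus judged higher."""
    return i if rng.beta(a[i], b[i]) > rng.beta(a[j], b[j]) else j

for _ in range(2000):
    i = rng.integers(0, n_stim - 1)               # adjacent training pair
    j = i + 1                                     # i outranks j by design
    if choose(i, j) == i:                         # correct: consolidate gently
        a += a / (a + b)                          # (means unchanged, variance down)
        b += b / (a + b)
    else:                                         # error: shift the implicated pair
        a[i] += 1.0
        b[j] += 1.0

means = a / (a + b)
print("inferred positions:", np.round(means, 2))  # should decrease A -> E
print("B > D inference:", means[1] > means[3])    # critical transitive test
```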
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhou, Ping; Lv, Youbin; Wang, Hong
Optimal operation of a practical blast furnace (BF) ironmaking process depends largely on a good measurement of molten iron quality (MIQ) indices. However, measuring the MIQ online is not feasible using the available techniques. In this paper, a novel data-driven robust modeling approach is proposed for online estimation of MIQ using improved random vector functional-link networks (RVFLNs). Since the output weights of traditional RVFLNs are obtained by the least squares approach, a robustness problem may occur when the training dataset is contaminated with outliers. This affects the modeling accuracy of RVFLNs. To solve this problem, a Cauchy distribution weighted M-estimation based robust RVFLNs is proposed. Since the weights of different outlier data are properly determined by the Cauchy distribution, their corresponding contributions to modeling can be properly distinguished. Thus robust and better modeling results can be achieved. Moreover, given that the BF is a complex nonlinear system with numerous coupling variables, data-driven canonical correlation analysis is employed to identify the most influential components from the multitudinous factors that affect the MIQ indices, to reduce the model dimension. Finally, experiments using industrial data and comparative studies have demonstrated that the obtained model produces better modeling and estimation accuracy and stronger robustness than other modeling methods.
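[Editor's illustration] A hedged sketch of the modeling idea above: a random vector functional-link network whose output weights are fit by iteratively reweighted least squares with Cauchy-style weights, so outlier samples are down-weighted. The scaling constants, feature design, and data are illustrative assumptions, not the paper's specification.

```python
# Robust RVFLN: random hidden features + Cauchy-weighted IRLS output weights.
import numpy as np

rng = np.random.default_rng(10)

def rvfln_features(X, n_hidden=50):
    W = rng.normal(size=(X.shape[1], n_hidden))   # fixed random input weights
    bias = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + bias)                     # random enhancement nodes
    return np.hstack([X, H])                      # direct links + hidden layer

def cauchy_irls(Phi, y, c=2.385, n_iter=20):
    beta = np.linalg.lstsq(Phi, y, rcond=None)[0] # ordinary LS start
    for _ in range(n_iter):
        r = y - Phi @ beta
        s = np.median(np.abs(r)) / 0.6745 + 1e-12 # robust residual scale
        w = 1.0 / (1.0 + (r / (c * s)) ** 2)      # Cauchy M-estimation weights
        Wsq = np.sqrt(w)[:, None]
        beta = np.linalg.lstsq(Phi * Wsq, y * Wsq[:, 0], rcond=None)[0]
    return beta

X = rng.normal(size=(400, 6))                     # stand-in BF process variables
y = np.sin(X[:, 0]) + 0.3 * X[:, 1] + 0.1 * rng.normal(size=400)
y[:10] += 15.0                                    # contaminate with outliers
Phi = rvfln_features(X)
beta = cauchy_irls(Phi, y)
print("robust RVFLN train RMSE:", np.sqrt(np.mean((Phi @ beta - y) ** 2)))
```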
Robust model-based analysis of single-particle tracking experiments with Spot-On
Grimm, Jonathan B; Lavis, Luke D
2018-01-01
Single-particle tracking (SPT) has become an important method to bridge biochemistry and cell biology since it allows direct observation of protein binding and diffusion dynamics in live cells. However, accurately inferring information from SPT studies is challenging due to biases in both data analysis and experimental design. To address analysis bias, we introduce ‘Spot-On’, an intuitive web-interface. Spot-On implements a kinetic modeling framework that accounts for known biases, including molecules moving out-of-focus, and robustly infers diffusion constants and subpopulations from pooled single-molecule trajectories. To minimize inherent experimental biases, we implement and validate stroboscopic photo-activation SPT (spaSPT), which minimizes motion-blur bias and tracking errors. We validate Spot-On using experimentally realistic simulations and show that Spot-On outperforms other methods. We then apply Spot-On to spaSPT data from live mammalian cells spanning a wide range of nuclear dynamics and demonstrate that Spot-On consistently and robustly infers subpopulation fractions and diffusion constants. PMID:29300163
Robust model-based analysis of single-particle tracking experiments with Spot-On.
Hansen, Anders S; Woringer, Maxime; Grimm, Jonathan B; Lavis, Luke D; Tjian, Robert; Darzacq, Xavier
2018-01-04
Single-particle tracking (SPT) has become an important method to bridge biochemistry and cell biology since it allows direct observation of protein binding and diffusion dynamics in live cells. However, accurately inferring information from SPT studies is challenging due to biases in both data analysis and experimental design. To address analysis bias, we introduce 'Spot-On', an intuitive web-interface. Spot-On implements a kinetic modeling framework that accounts for known biases, including molecules moving out-of-focus, and robustly infers diffusion constants and subpopulations from pooled single-molecule trajectories. To minimize inherent experimental biases, we implement and validate stroboscopic photo-activation SPT (spaSPT), which minimizes motion-blur bias and tracking errors. We validate Spot-On using experimentally realistic simulations and show that Spot-On outperforms other methods. We then apply Spot-On to spaSPT data from live mammalian cells spanning a wide range of nuclear dynamics and demonstrate that Spot-On consistently and robustly infers subpopulation fractions and diffusion constants. © 2018, Hansen et al.
Network Model-Assisted Inference from Respondent-Driven Sampling Data
Gile, Krista J.; Handcock, Mark S.
2015-01-01
Respondent-Driven Sampling is a widely-used method for sampling hard-to-reach human populations by link-tracing over their social networks. Inference from such data requires specialized techniques because the sampling process is both partially beyond the control of the researcher, and partially implicitly defined. Therefore, it is not generally possible to directly compute the sampling weights for traditional design-based inference, and likelihood inference requires modeling the complex sampling process. As an alternative, we introduce a model-assisted approach, resulting in a design-based estimator leveraging a working network model. We derive a new class of estimators for population means and a corresponding bootstrap standard error estimator. We demonstrate improved performance compared to existing estimators, including adjustment for an initial convenience sample. We also apply the method and an extension to the estimation of HIV prevalence in a high-risk population. PMID:26640328
Network Model-Assisted Inference from Respondent-Driven Sampling Data.
Gile, Krista J; Handcock, Mark S
2015-06-01
Respondent-Driven Sampling is a widely-used method for sampling hard-to-reach human populations by link-tracing over their social networks. Inference from such data requires specialized techniques because the sampling process is both partially beyond the control of the researcher, and partially implicitly defined. Therefore, it is not generally possible to directly compute the sampling weights for traditional design-based inference, and likelihood inference requires modeling the complex sampling process. As an alternative, we introduce a model-assisted approach, resulting in a design-based estimator leveraging a working network model. We derive a new class of estimators for population means and a corresponding bootstrap standard error estimator. We demonstrate improved performance compared to existing estimators, including adjustment for an initial convenience sample. We also apply the method and an extension to the estimation of HIV prevalence in a high-risk population.
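[Editor's illustration] A sketch of the standard RDS-II (Volz-Heckathorn) estimator that model-assisted approaches such as the one above build on: weight each respondent by the inverse of their reported network degree. The paper's working-model adjustments are not reproduced; the survey data below are hypothetical.

```python
# RDS-II estimator of a population proportion via inverse-degree weights.
import numpy as np

def rds_ii(y, degrees):
    """y: trait indicators; degrees: reported network sizes per recruit."""
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(degrees, dtype=float)
    return np.sum(w * y) / np.sum(w)

y = [1, 0, 1, 1, 0, 0, 1, 0]                      # hypothetical recruits
degrees = [10, 3, 25, 8, 4, 6, 30, 5]
print("RDS-II prevalence estimate:", rds_ii(y, degrees))
```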
NASA Astrophysics Data System (ADS)
Badrzadeh, Honey; Sarukkalige, Ranjan; Jayawardena, A. W.
2015-10-01
Reliable river flow forecasts play a key role in flood risk mitigation. Among different approaches to river flow forecasting, data-driven approaches have become increasingly popular in recent years due to their minimal information requirements and their ability to simulate nonlinear and non-stationary characteristics of hydrological processes. In this study, attempts are made to apply four different types of data-driven approaches, namely traditional artificial neural networks (ANN), adaptive neuro-fuzzy inference systems (ANFIS), wavelet neural networks (WNN), and hybrid ANFIS with multi-resolution analysis using wavelets (WNF). The developed models were applied to real-time flood forecasting at Casino station on the Richmond River, Australia, which is highly prone to flooding. Hourly rainfall and runoff data were used to drive the models, which were used for forecasting with 1, 6, 12, 24, 36 and 48 h lead-times. The performance of the models was further improved by adding upstream river flow data (Wiangaree station) as another effective input. All models perform satisfactorily up to a 12 h lead-time. However, the hybrid wavelet-based models significantly outperform the ANFIS and ANN models in longer lead-time forecasting. The results confirm the robustness of the proposed structure of the hybrid models for real-time runoff forecasting in the study area.
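[Editor's illustration] A hedged sketch of the wavelet-hybrid idea: decompose each input series into approximation/detail sub-series with a discrete wavelet transform and feed those, rather than the raw series, to a data-driven regressor. This pairs PyWavelets with an MLP as a stand-in for the study's ANFIS/ANN components; the rainfall-runoff data are simulated.

```python
# Wavelet sub-series features for a 6 h ahead runoff forecast (illustrative).
import numpy as np
import pywt
from sklearn.neural_network import MLPRegressor

def wavelet_subseries(x, wavelet="db4", level=3):
    """Return len(x)-long reconstructions of each DWT band of x."""
    coeffs = pywt.wavedec(x, wavelet, level=level)
    bands = []
    for k in range(len(coeffs)):
        kept = [c if i == k else np.zeros_like(c) for i, c in enumerate(coeffs)]
        bands.append(pywt.waverec(kept, wavelet)[: len(x)])
    return np.column_stack(bands)

rng = np.random.default_rng(11)
n = 2000
rain = np.clip(rng.gamma(0.5, 2.0, n) - 1.0, 0, None)   # toy hourly rainfall
flow = np.convolve(rain, np.exp(-np.arange(24) / 6.0), mode="full")[:n]
flow += 0.1 * rng.normal(size=n)

lead = 6                                                # 6 h ahead forecast
X = wavelet_subseries(rain)[:-lead]
y = flow[lead:]
model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
model.fit(X[:1500], y[:1500])
print("holdout R^2:", model.score(X[1500:], y[1500:]))
```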
Quantum theory as plausible reasoning applied to data obtained by robust experiments.
De Raedt, H; Katsnelson, M I; Michielsen, K
2016-05-28
We review recent work that employs the framework of logical inference to establish a bridge between data gathered through experiments and their objective description in terms of human-made concepts. It is shown that logical inference applied to experiments for which the observed events are independent, and for which the frequency distribution of these events is robust with respect to small changes of the conditions under which the experiments are carried out, yields, without introducing any concept of quantum theory, the quantum theoretical description of the Stern-Gerlach and Einstein-Podolsky-Rosen-Bohm experiments in terms of the Schrödinger or the Pauli equation. The extraordinary descriptive power of quantum theory then follows from the fact that it is plausible reasoning, that is, common sense, applied to reproducible and robust experimental data. © 2016 The Author(s).
Arciszewski, Tim J; Munkittrick, Kelly R; Scrimgeour, Garry J; Dubé, Monique G; Wrona, Fred J; Hazewinkel, Rod R
2017-09-01
The primary goals of environmental monitoring are to indicate whether unexpected changes related to development are occurring in the physical, chemical, and biological attributes of ecosystems and to inform meaningful management intervention. Although achieving these objectives is conceptually simple, varying scientific and social challenges often result in their breakdown. Conceptualizing, designing, and operating programs that better delineate monitoring, management, and risk assessment processes supported by hypothesis-driven approaches, strong inference, and adverse outcome pathways can overcome many of the challenges. Generally, a robust monitoring program is characterized by hypothesis-driven questions associated with potential adverse outcomes and feedback loops informed by data. Specifically, key and basic features are predictions of future observations (triggers) and mechanisms to respond to success or failure of those predictions (tiers). The adaptive processes accelerate or decelerate the effort to highlight and overcome ignorance while preventing the potentially unnecessary escalation of unguided monitoring and management. The deployment of the mutually reinforcing components can allow for more meaningful and actionable monitoring programs that better associate activities with consequences. Integr Environ Assess Manag 2017;13:877-891. © 2017 The Authors. Integrated Environmental Assessment and Management Published by Wiley Periodicals, Inc. on behalf of Society of Environmental Toxicology & Chemistry (SETAC). © 2017 The Authors. Integrated Environmental Assessment and Management Published by Wiley Periodicals, Inc. on behalf of Society of Environmental Toxicology & Chemistry (SETAC).
Evaluation of Respondent-Driven Sampling
McCreesh, Nicky; Frost, Simon; Seeley, Janet; Katongole, Joseph; Tarsh, Matilda Ndagire; Ndunguse, Richard; Jichi, Fatima; Lunel, Natasha L; Maher, Dermot; Johnston, Lisa G; Sonnenberg, Pam; Copas, Andrew J; Hayes, Richard J; White, Richard G
2012-01-01
Background Respondent-driven sampling is a novel variant of link-tracing sampling for estimating the characteristics of hard-to-reach groups, such as HIV prevalence in sex-workers. Despite its use by leading health organizations, the performance of this method in realistic situations is still largely unknown. We evaluated respondent-driven sampling by comparing estimates from a respondent-driven sampling survey with total-population data. Methods Total-population data on age, tribe, religion, socioeconomic status, sexual activity and HIV status were available on a population of 2402 male household-heads from an open cohort in rural Uganda. A respondent-driven sampling (RDS) survey was carried out in this population, employing current methods of sampling (RDS sample) and statistical inference (RDS estimates). Analyses were carried out for the full RDS sample and then repeated for the first 250 recruits (small sample). Results We recruited 927 household-heads. Full and small RDS samples were largely representative of the total population, but both samples under-represented men who were younger, of higher socioeconomic status, and with unknown sexual activity and HIV status. Respondent-driven-sampling statistical-inference methods failed to reduce these biases. Only 31%-37% (depending on method and sample size) of RDS estimates were closer to the true population proportions than the RDS sample proportions. Only 50%-74% of respondent-driven-sampling bootstrap 95% confidence intervals included the population proportion. Conclusions Respondent-driven sampling produced a generally representative sample of this well-connected non-hidden population. However, current respondent-driven-sampling inference methods failed to reduce bias when it occurred. Whether the data required to remove bias and measure precision can be collected in a respondent-driven sampling survey is unresolved. Respondent-driven sampling should be regarded as a (potentially superior) form of convenience-sampling method, and caution is required when interpreting findings based on the sampling method. PMID:22157309
Hettling, Hannes; Condamine, Fabien L.; Vos, Karin; Nilsson, R. Henrik; Sanderson, Michael J.; Sauquet, Hervé; Scharn, Ruud; Silvestro, Daniele; Töpel, Mats; Bacon, Christine D.; Oxelman, Bengt; Vos, Rutger A.
2017-01-01
Abstract Rapidly growing biological data—including molecular sequences and fossils—hold an unprecedented potential to reveal how evolutionary processes generate and maintain biodiversity. However, researchers often have to develop their own idiosyncratic workflows to integrate and analyze these data for reconstructing time-calibrated phylogenies. In addition, divergence times estimated under different methods and assumptions, and based on data of various quality and reliability, should not be combined without proper correction. Here we introduce a modular framework termed SUPERSMART (Self-Updating Platform for Estimating Rates of Speciation and Migration, Ages, and Relationships of Taxa), and provide a proof of concept for dealing with the moving targets of evolutionary and biogeographical research. This framework assembles comprehensive data sets of molecular and fossil data for any taxa and infers dated phylogenies using robust species tree methods, also allowing for the inclusion of genomic data produced through next-generation sequencing techniques. We exemplify the application of our method by presenting phylogenetic and dating analyses for the mammal order Primates and for the plant family Arecaceae (palms). We believe that this framework will provide a valuable tool for a wide range of hypothesis-driven research questions in systematics, biogeography, and evolution. SUPERSMART will also accelerate the inference of a “Dated Tree of Life” where all node ages are directly comparable. PMID:27616324
Evaluation of Anomaly Detection Capability for Ground-Based Pre-Launch Shuttle Operations. Chapter 8
NASA Technical Reports Server (NTRS)
Martin, Rodney Alexander
2010-01-01
This chapter provides a thorough end-to-end description of the process for evaluating three different data-driven algorithms for anomaly detection, to select the best candidate for deployment as part of a suite of IVHM (Integrated Vehicle Health Management) technologies. These algorithms were deemed sufficiently mature to be considered viable candidates for deployment in support of the maiden launch of Ares I-X, the successor to the Space Shuttle for NASA's Constellation program. Data-driven algorithms are just one of three different types being deployed. The other two types are a "rule-based" expert system and a "model-based" system. Within these two categories, the deployable candidates have already been selected based upon qualitative factors such as flight heritage. For the rule-based system, SHINE (Spacecraft High-speed Inference Engine) has been selected for deployment; it is a component of BEAM (Beacon-based Exception Analysis for Multimissions), a patented technology developed at NASA's JPL (Jet Propulsion Laboratory), and serves to aid in the management and identification of operational modes. For the "model-based" system, a commercially available package developed by QSI (Qualtech Systems, Inc.), TEAMS (Testability Engineering and Maintenance System), has been selected for deployment to aid in diagnosis. In the context of this particular deployment, distinctions among the use of the terms "data-driven," "rule-based," and "model-based" can be found in. Although three different categories of algorithms have been selected for deployment, our main focus in this chapter is the evaluation of the three candidates for data-driven anomaly detection. These algorithms are evaluated on their capability for robustly detecting incipient faults or failures in the ground-based phase of pre-launch Space Shuttle operations, rather than on heritage as in previous studies. Robust detection allows for the achievement of pre-specified minimum false alarm and/or missed detection rates in the selection of alert thresholds. All algorithms are also optimized with respect to an aggregation of these same criteria. Our study relies upon the use of Shuttle data as a proxy for, and in preparation for application to, Ares I-X data, which uses a very similar hardware platform for the subsystems being targeted (the TVC, Thrust Vector Control, subsystem of the SRB, Solid Rocket Booster).
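The alert-threshold criterion described above (meeting a pre-specified false alarm rate) can be sketched generically. The snippet below is our illustration, not the chapter's code: it sets the threshold as an empirical quantile of anomaly scores on nominal data and then checks the achieved rates; all scores are simulated stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical anomaly scores from a data-driven detector evaluated on
# nominal and known-fault ground operations data.
nominal_scores = rng.normal(0.0, 1.0, 5000)
fault_scores = rng.normal(3.0, 1.0, 500)

def threshold_for_far(nominal_scores, max_false_alarm_rate):
    """Choose the alert threshold as the (1 - FAR) quantile of nominal
    scores so the empirical false alarm rate meets the target."""
    return np.quantile(nominal_scores, 1.0 - max_false_alarm_rate)

tau = threshold_for_far(nominal_scores, max_false_alarm_rate=0.01)
far = np.mean(nominal_scores >= tau)  # achieved false alarm rate
mdr = np.mean(fault_scores < tau)     # achieved missed detection rate
print(f"threshold={tau:.2f}  FAR={far:.3f}  MDR={mdr:.3f}")
```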
Non-linear Interactions between Niño region 3 and the Southern Amazon
NASA Astrophysics Data System (ADS)
Ramos, A. M. D. T.; Builes-Jaramillo, L. A.; Poveda, G.; Goswami, B.; Macau, E. E. N.; Kurths, J.; Marwan, N.
2016-12-01
Identifying causal relations from observational datasets poses great challenges in data-driven inference studies. However, the complex-systems framework offers promising approaches to tackle such problems. Here we propose a new data-driven causality inference method using the framework of recurrence plots. We present the Recurrence Measure of Conditional Dependence (RMCD) and its applications. The RMCD incorporates recurrence behavior into transfer entropy theory: it quantifies the causal dependence between two processes based on joint recurrence patterns between the past of the potential driver and the present of the potential driven process, excluding any contribution already contained in the past of the driven process. We apply this methodology to some paradigmatic models and investigate the possible influence of Pacific Ocean temperatures on the southwestern Amazon during the 2005 and 2010 droughts. The results reveal that for the 2005 drought there is no significant signal of dependence from the Pacific Ocean, while for 2010 there is a signal of dependence of around 200 days. These outcomes are confirmed by traditional climatological analyses of these episodes available in the literature and show the accuracy of RMCD in inferring causal relations in climate systems.
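To make the recurrence machinery concrete, the sketch below computes binary recurrence matrices and a crude joint-recurrence ratio for a lagged driver-response pair. This is only a simplified precursor of RMCD, which additionally conditions on the past of the driven process in the spirit of transfer entropy; the threshold and series are hypothetical.

```python
import numpy as np

def recurrence_matrix(x, eps):
    """R[i, j] = 1 when |x_i - x_j| < eps (recurrence of states)."""
    x = np.asarray(x, dtype=float)
    return (np.abs(x[:, None] - x[None, :]) < eps).astype(float)

# Hypothetical driver/response pair: y lags x by 5 steps, plus noise.
rng = np.random.default_rng(1)
x = np.sin(np.linspace(0, 20 * np.pi, 600)) + 0.1 * rng.normal(size=600)
y = np.roll(x, 5) + 0.1 * rng.normal(size=600)

Rx, Ry = recurrence_matrix(x, 0.3), recurrence_matrix(y, 0.3)
joint = Rx * Ry  # joint recurrences of driver and driven states
# Fraction of the driven process's recurrences shared with the driver:
# a crude, unconditioned stand-in for the dependence RMCD quantifies.
print(f"joint/driven recurrence ratio: {joint.sum() / Ry.sum():.3f}")
```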
DOE Office of Scientific and Technical Information (OSTI.GOV)
Humberto E. Garcia
This paper illustrates safeguards benefits that process monitoring (PM) can have as a diversion deterrent and as a complementary safeguards measure to nuclear material accountancy (NMA). Whereas the objective of NMA-based methods, in order to infer the possible existence of proliferation-driven activities, is often to statistically evaluate materials unaccounted for (MUF), computed by solving a given mass balance equation related to a material balance area (MBA) at every material balance period (MBP), a particular objective for a PM-based approach may be to statistically infer and evaluate anomalies unaccounted for (AUF) that may have occurred within a MBP. Although possibly indicative of proliferation-driven activities, the detection and tracking of anomaly patterns is not trivial because some executed events may be unobservable or less reliably observed than others. The proposed similarity between NMA- and PM-based approaches is important because performance metrics utilized for evaluating NMA-based methods, such as detection probability (DP) and false alarm probability (FAP), can also be applied to assess PM-based safeguards solutions. To this end, AUF count estimates can be translated into significant quantity (SQ) equivalents that may have been diverted within a given MBP. A diversion alarm is reported if this mass estimate is greater than or equal to the selected value of the alarm level (AL), appropriately chosen to optimize DP and FAP based on the particular characteristics of the monitored MBA, the sensors utilized, and the data processing method employed for integrating and analyzing collected measurements. To illustrate the application of the proposed PM approach, a protracted diversion of Pu in a waste stream was selected, based on incomplete fuel dissolution in a dissolver unit operation, as this diversion scenario is considered problematic for detection using NMA-based methods alone. Results demonstrate the benefits of conducting PM under a system-centric strategy that utilizes data collected from a system of sensors and effectively exploits known characterizations of sensors and facility operations in order to significantly improve anomaly detection, reduce false alarms, and enhance assessment robustness under unreliable partial sensor information.
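The AL-based alarm logic and its DP/FAP assessment lend themselves to a short Monte Carlo sketch. Everything numeric below (noise level, diversion rate, alarm level) is an illustrative assumption, not a value from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

ALARM_LEVEL = 2.0  # kg per material balance period; hypothetical choice

def diversion_alarm(auf_mass_estimate, alarm_level=ALARM_LEVEL):
    """Report a diversion alarm when the mass equivalent of anomalies
    unaccounted for (AUF) within a balance period reaches the AL."""
    return auf_mass_estimate >= alarm_level

# Monte Carlo assessment of DP and FAP for a hypothetical sensor system:
# daily AUF mass estimates with measurement noise only (no diversion)
# versus a protracted 0.1 kg/day diversion over a 30-day MBP.
days, trials = 30, 10000
no_div = rng.normal(0.0, 0.1, (trials, days)).sum(axis=1)
div = rng.normal(0.1, 0.1, (trials, days)).sum(axis=1)

fap = diversion_alarm(no_div).mean()  # false alarm probability
dp = diversion_alarm(div).mean()      # detection probability
print(f"AL={ALARM_LEVEL} kg -> FAP={fap:.4f}, DP={dp:.3f}")
```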
Liao, J. G.; Mcmurry, Timothy; Berg, Arthur
2014-01-01
Empirical Bayes methods have been extensively used for microarray data analysis by modeling the large number of unknown parameters as random effects. Empirical Bayes allows borrowing information across genes and can automatically adjust for multiple testing and selection bias. However, the standard empirical Bayes model can perform poorly if the assumed working prior deviates from the true prior. This paper proposes a new rank-conditioned inference in which the shrinkage and confidence intervals are based on the distribution of the error conditioned on rank of the data. Our approach is in contrast to a Bayesian posterior, which conditions on the data themselves. The new method is almost as efficient as standard Bayesian methods when the working prior is close to the true prior, and it is much more robust when the working prior is not close. In addition, it allows a more accurate (but also more complex) non-parametric estimate of the prior to be easily incorporated, resulting in improved inference. The new method’s prior robustness is demonstrated via simulation experiments. Application to a breast cancer gene expression microarray dataset is presented. Our R package rank.Shrinkage provides a ready-to-use implementation of the proposed methodology. PMID:23934072
OncoNEM: inferring tumor evolution from single-cell sequencing data.
Ross, Edith M; Markowetz, Florian
2016-04-15
Single-cell sequencing promises a high-resolution view of genetic heterogeneity and clonal evolution in cancer. However, methods to infer tumor evolution from single-cell sequencing data lag behind methods developed for bulk-sequencing data. Here, we present OncoNEM, a probabilistic method for inferring intra-tumor evolutionary lineage trees from somatic single nucleotide variants of single cells. OncoNEM identifies homogeneous cellular subpopulations and infers their genotypes as well as a tree describing their evolutionary relationships. In simulation studies, we assess OncoNEM's robustness and benchmark its performance against competing methods. Finally, we show its applicability in case studies of muscle-invasive bladder cancer and essential thrombocythemia.
IMNN: Information Maximizing Neural Networks
NASA Astrophysics Data System (ADS)
Charnock, Tom; Lavaux, Guilhem; Wandelt, Benjamin D.
2018-04-01
This software trains artificial neural networks to find non-linear functionals of data that maximize Fisher information: information maximizing neural networks (IMNNs). Compressing large data sets vastly simplifies both frequentist and Bayesian inference, but important information may be inadvertently missed. Likelihood-free inference based on automatically derived IMNN summaries produces summaries that are good approximations to sufficient statistics. IMNNs are robustly capable of automatically finding optimal, non-linear summaries of the data even in cases where linear compression fails: inferring the variance of a Gaussian signal in the presence of noise, inferring cosmological parameters from mock simulations of the Lyman-α forest in quasar spectra, and inferring frequency-domain parameters from LISA-like detections of gravitational waveforms. In this final case, the IMNN summary outperforms linear data compression by avoiding the introduction of spurious likelihood maxima.
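The quantity IMNNs maximize, the Fisher information of a learned summary, can be estimated without any network. The sketch below reproduces the abstract's first test case in miniature: for inferring the variance of a zero-mean Gaussian, a linear summary (the mean) carries essentially no Fisher information, while the mean of squares is informative. The finite-difference estimator is a standard construction, not IMNN code.

```python
import numpy as np

rng = np.random.default_rng(3)

def fisher_of_summary(summary, theta, dtheta, n_sims=20000, n_data=50):
    """Gaussian-approximation Fisher information of a scalar summary,
    F = (d mean / d theta)^2 / var, estimated from simulations run at
    theta +/- dtheta (the objective IMNN training maximizes)."""
    sims = lambda t: rng.normal(0.0, np.sqrt(t), (n_sims, n_data))
    s_plus = summary(sims(theta + dtheta))
    s_minus = summary(sims(theta - dtheta))
    s_fid = summary(sims(theta))
    dmu = (s_plus.mean() - s_minus.mean()) / (2 * dtheta)
    return dmu ** 2 / s_fid.var()

# Inferring the variance theta of zero-mean Gaussian data: the linear
# summary fails, the nonlinear one approaches the true Fisher info n/(2 theta^2).
theta = 1.0
print("F(mean)         =", fisher_of_summary(lambda d: d.mean(axis=1), theta, 0.05))
print("F(mean of sq.)  =", fisher_of_summary(lambda d: (d ** 2).mean(axis=1), theta, 0.05))
```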
Strong Inference in Mathematical Modeling: A Method for Robust Science in the Twenty-First Century.
Ganusov, Vitaly V
2016-01-01
While there are many opinions on what mathematical modeling in biology is, in essence, modeling is a mathematical tool, like a microscope, which allows consequences to logically follow from a set of assumptions. Only when this tool is applied appropriately, as a microscope is used to look at small items, can it help us understand the importance of specific mechanisms/assumptions in biological processes. Mathematical modeling can be less useful or even misleading if used inappropriately, for example, when a microscope is used to study stars. According to some philosophers (Oreskes et al., 1994), the best use of mathematical models is not when a model is used to confirm a hypothesis but rather when a model shows inconsistency between the model (defined by a specific set of assumptions) and data. Following the principle of strong inference for experimental sciences proposed by Platt (1964), I suggest "strong inference in mathematical modeling" as an effective and robust way of using mathematical modeling to understand mechanisms driving the dynamics of biological systems. The major steps of strong inference in mathematical modeling are (1) to develop multiple alternative models for the phenomenon in question; (2) to compare the models with available experimental data and to determine which of the models are not consistent with the data; (3) to determine why rejected models failed to explain the data; and (4) to suggest experiments that would allow discrimination between the remaining alternative models. The use of strong inference is likely to improve the robustness of predictions of mathematical models, and it should be strongly encouraged in mathematical modeling-based publications in the twenty-first century.
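Steps (1) and (2) of this procedure can be illustrated in a few lines: fit several alternative models to the same data and reject those that are inconsistent with it. The toy data, models, and grid fit below are our generic illustration, not an example from the paper.

```python
import numpy as np

rng = np.random.default_rng(12)

# Hypothetical decay-like data (e.g., a declining population measure).
t = np.linspace(0, 10, 25)
y = 100 * np.exp(-0.5 * t) + rng.normal(0, 2, t.size)

# Two alternative models for the same phenomenon.
models = {
    "linear": lambda t, a, b: a + b * t,
    "exponential": lambda t, a, b: a * np.exp(b * t),
}

def aic(y, yhat, k):
    """Gaussian AIC from the residual sum of squares, k parameters."""
    n = y.size
    return n * np.log(((y - yhat) ** 2).sum() / n) + 2 * k

# Crude grid fit for each alternative, then compare support via AIC.
for name, f in models.items():
    grid = [(a, b) for a in np.linspace(1, 200, 60)
                   for b in np.linspace(-1, 1, 81)]
    a, b = min(grid, key=lambda p: ((y - f(t, *p)) ** 2).sum())
    print(f"{name:12s} AIC = {aic(y, f(t, a, b), k=2):7.1f}")
```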
Topographically driven groundwater flow and the San Andreas heat flow paradox revisited
Saffer, D.M.; Bekins, B.A.; Hickman, S.
2003-01-01
Evidence for a weak San Andreas Fault includes (1) borehole heat flow measurements that show no evidence for a frictionally generated heat flow anomaly and (2) the inferred orientation of σ1 nearly perpendicular to the fault trace. Interpretations of the stress orientation data remain controversial, at least in close proximity to the fault, leading some researchers to hypothesize that the San Andreas Fault is, in fact, strong and that its thermal signature may be removed or redistributed by topographically driven groundwater flow in areas of rugged topography, such as typify the San Andreas Fault system. To evaluate this scenario, we use a steady state, two-dimensional model of coupled heat and fluid flow within cross sections oriented perpendicular to the fault and to the primary regional topography. Our results show that existing heat flow data near Parkfield, California, do not readily discriminate between the expected thermal signature of a strong fault and that of a weak fault. In contrast, for a wide range of groundwater flow scenarios in the Mojave Desert, models that include frictional heat generation along a strong fault are inconsistent with existing heat flow data, suggesting that the San Andreas Fault at this location is indeed weak. In both areas, comparison of modeling results and heat flow data suggest that advective redistribution of heat is minimal. The robust results for the Mojave region demonstrate that topographically driven groundwater flow, at least in two dimensions, is inadequate to obscure the frictionally generated heat flow anomaly from a strong fault. However, our results do not preclude the possibility of transient advective heat transport associated with earthquakes.
A Robust Adaptive Autonomous Approach to Optimal Experimental Design
NASA Astrophysics Data System (ADS)
Gu, Hairong
Experimentation is the fundamental tool of scientific inquiry for understanding the laws governing nature and human behavior. Many complex real-world experimental scenarios, particularly in quest of prediction accuracy, encounter difficulties in conducting experiments with existing experimental procedures, for the following two reasons. First, the existing experimental procedures require a parametric model to serve as the proxy of the latent data structure or data-generating mechanism at the beginning of an experiment. However, for the experimental scenarios of concern, a sound model is often unavailable before an experiment. Second, those experimental scenarios usually contain a large number of design variables, which potentially leads to a lengthy and costly data collection cycle. Moreover, the existing experimental procedures are unable to optimize large-scale experiments so as to minimize the experimental length and cost. Facing these two challenges, the aim of the present study is to develop a new experimental procedure that allows an experiment to be conducted without the assumption of a parametric model while still achieving satisfactory prediction, and that optimizes experimental designs to improve the efficiency of an experiment. The new experimental procedure developed in the present study is named the robust adaptive autonomous system (RAAS). RAAS is a procedure for sequential experiments composed of multiple experimental trials, which performs function estimation, variable selection, reverse prediction and design optimization on each trial. Directly addressing the challenges in the experimental scenarios of concern, function estimation and variable selection are performed by data-driven modeling methods to generate a predictive model from data collected during the course of an experiment, thus removing the requirement of a parametric model at the beginning of an experiment; design optimization selects experimental designs on the fly, based on their usefulness, so that the fewest designs are needed to reach useful inferential conclusions. Technically, function estimation is realized by Bayesian P-splines, variable selection by a Bayesian spike-and-slab prior, reverse prediction by grid search, and design optimization by the concepts of active learning. The present study demonstrated that RAAS achieves statistical robustness by making accurate predictions without assuming a parametric model as proxy of the latent data structure, whereas the existing procedures can draw poor statistical inferences if a misspecified model is assumed; RAAS also achieves inferential efficiency by taking fewer designs to acquire useful statistical inferences than non-optimal procedures. Thus, RAAS is expected to be a principled solution for real-world experimental scenarios pursuing robust prediction and efficient experimentation.
An algebra-based method for inferring gene regulatory networks.
Vera-Licona, Paola; Jarrah, Abdul; Garcia-Puente, Luis David; McGee, John; Laubenbacher, Reinhard
2014-03-26
The inference of gene regulatory networks (GRNs) from experimental observations is at the heart of systems biology. This includes the inference of both the network topology and its dynamics. While there are many algorithms available to infer the network topology from experimental data, less emphasis has been placed on methods that infer network dynamics. Furthermore, since the network inference problem is typically underdetermined, it is essential to have the option of incorporating prior knowledge about the network into the inference process, along with an effective description of the search space of dynamic models. Finally, it is also important to have an understanding of how a given inference method is affected by experimental and other noise in the data used. This paper contains a novel inference algorithm using the algebraic framework of Boolean polynomial dynamical systems (BPDS), meeting all these requirements. The algorithm takes as input time series data, including those from network perturbations, such as knock-out mutant strains and RNAi experiments. It allows for the incorporation of prior biological knowledge while being robust to significant levels of noise in the data used for inference. It uses an evolutionary algorithm for local optimization with an encoding of the mathematical models as BPDS. The BPDS framework allows an effective representation of the search space for algebraic dynamic models that improves computational performance. The algorithm is validated with both simulated and experimental microarray expression profile data. Robustness to noise is tested using a published mathematical model of the segment polarity gene network in Drosophila melanogaster. Benchmarking of the algorithm is done by comparison with a spectrum of state-of-the-art network inference methods on data from the synthetic IRMA network, demonstrating that our method has good precision and recall for the network reconstruction task while also predicting several of the dynamic patterns present in the network. Boolean polynomial dynamical systems provide a powerful modeling framework for the reverse engineering of gene regulatory networks that enables a rich mathematical structure on the model search space. A C++ implementation of the method, distributed under the LGPL license, is available, together with the source code, at http://www.paola-vera-licona.net/Software/EARevEng/REACT.html.
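A minimal example of the model class being searched, a Boolean polynomial dynamical system over F_2 with synchronous updates, is sketched below; the three update rules are hypothetical and unrelated to the segment polarity network.

```python
import itertools

# Each node's update rule is a polynomial over F_2 in the current state,
# written here with XOR (+) and AND (*). Rules are hypothetical.
update = [
    lambda s: s[1] ^ (s[0] & s[2]),  # f0 = x1 + x0*x2
    lambda s: s[0],                  # f1 = x0
    lambda s: s[0] ^ s[1],           # f2 = x0 + x1
]

def step(state):
    """One synchronous update of the BPDS."""
    return tuple(f(state) for f in update)

def trajectory(state, n_steps):
    """Iterate the dynamics from an initial state."""
    out = [state]
    for _ in range(n_steps):
        out.append(step(out[-1]))
    return out

# Exhaustive dynamics over the 2^3 state space reveals the attractors.
for s0 in itertools.product((0, 1), repeat=3):
    print(s0, "->", trajectory(s0, 4))
```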
Data Driven Model Development for the Supersonic Semispan Transport (S(sup 4)T)
NASA Technical Reports Server (NTRS)
Kukreja, Sunil L.
2011-01-01
We investigate two common approaches to model development for robust control synthesis in the aerospace community: namely, reduced-order aeroservoelastic modelling based on structural finite-element and computational fluid dynamics based aerodynamic models, and a data-driven system identification procedure. It is shown via analysis of experimental SuperSonic SemiSpan Transport (S4T) wind-tunnel data that by using a system identification approach it is possible to estimate a model at a fixed Mach number which is parsimonious and robust across varying dynamic pressures.
Data-driven Inference and Investigation of Thermosphere Dynamics and Variations
NASA Astrophysics Data System (ADS)
Mehta, P. M.; Linares, R.
2017-12-01
This paper presents a methodology for data-driven inference and investigation of thermosphere dynamics and variations. The approach uses data-driven modal analysis to extract the most energetic modes of variation for neutral thermospheric species using proper orthogonal decomposition, where the time-independent modes or basis represent the dynamics and the time-dependent coefficients or amplitudes represent the model parameters. The data-driven modal analysis approach, combined with sparse, discrete observations, is used to infer amplitudes for the dynamic modes and to calibrate the energy content of the system. In this work, two different data types, namely the number density measurements from TIMED/GUVI and the mass density measurements from CHAMP/GRACE, are simultaneously ingested for an accurate and self-consistent specification of the thermosphere. The assimilation process is achieved with a non-linear least squares solver and allows estimation/tuning of the model parameters or amplitudes rather than the drivers. In this work, we use the Naval Research Lab's MSIS model to derive the most energetic modes for six different species: He, O, N2, O2, H, and N. We examine the dominant drivers of variations for helium in MSIS and observe that seasonal latitudinal variation accounts for about 80% of the dynamic energy, with a strong preference of helium for the winter hemisphere. We also observe enhanced helium presence near the poles at GRACE altitudes during periods of low solar activity (Feb 2007), as previously deduced. We will also examine the storm-time response of helium derived from observations. The results are expected to be useful in tuning/calibration of physics-based models.
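The modal-analysis step, proper orthogonal decomposition of a snapshot matrix followed by amplitude estimation from sparse observations, can be sketched with a plain SVD. The field below is a synthetic stand-in for a gridded density field, not TIMED/GUVI or CHAMP/GRACE data.

```python
import numpy as np

rng = np.random.default_rng(4)

def pod_modes(snapshots, n_modes):
    """POD of a snapshot matrix whose columns are density fields at
    successive times: time-independent spatial modes (the basis) plus
    time-dependent amplitudes (the model parameters)."""
    X = snapshots - snapshots.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    modes = U[:, :n_modes]                     # basis (dynamics)
    amps = s[:n_modes, None] * Vt[:n_modes]    # amplitudes (parameters)
    energy = s[:n_modes] ** 2 / np.sum(s ** 2) # captured variance fraction
    return modes, amps, energy

# Hypothetical space-time field playing the role of a He density grid.
space = np.linspace(0, 2 * np.pi, 200)[:, None]
time = np.linspace(0, 50, 300)[None, :]
field = np.sin(space) * np.cos(time) + 0.3 * np.sin(2 * space) * np.sin(0.2 * time)

modes, amps, energy = pod_modes(field, n_modes=2)
print("energy captured per mode:", np.round(energy, 3))

# Calibration step: infer amplitudes at one epoch from a handful of
# 'observed' grid points by least squares, as with sparse measurements.
mean_state = field.mean(axis=1)
obs_idx = rng.choice(200, size=12, replace=False)
rhs = field[obs_idx, 42] - mean_state[obs_idx]
a_hat, *_ = np.linalg.lstsq(modes[obs_idx], rhs, rcond=None)
print("inferred amplitudes:", np.round(a_hat, 3))
print("SVD amplitudes:     ", np.round(amps[:, 42], 3))
```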
Supervised dictionary learning for inferring concurrent brain networks.
Zhao, Shijie; Han, Junwei; Lv, Jinglei; Jiang, Xi; Hu, Xintao; Zhao, Yu; Ge, Bao; Guo, Lei; Liu, Tianming
2015-10-01
Task-based fMRI (tfMRI) has been widely used to explore functional brain networks via predefined stimulus paradigms in the fMRI scan. Traditionally, the general linear model (GLM) has been the dominant approach for detecting task-evoked networks. However, GLM focuses on task-evoked or event-evoked brain responses and possibly ignores intrinsic brain functions. In comparison, dictionary learning and sparse coding methods have attracted much attention recently, and these methods have shown promise in automatically and systematically decomposing fMRI signals into meaningful task-evoked and intrinsic concurrent networks. Nevertheless, two notable limitations of current data-driven dictionary learning methods are that the prior knowledge of the task paradigm is not sufficiently utilized and that establishing correspondences among dictionary atoms in different brains has been challenging. In this paper, we propose a novel supervised dictionary learning and sparse coding method for inferring functional networks from tfMRI data, which takes the advantages of both model-driven and data-driven methods. The basic idea is to fix the task stimulus curves as predefined model-driven dictionary atoms and only optimize the remaining data-driven dictionary atoms. Application of this novel methodology to the publicly available Human Connectome Project (HCP) tfMRI datasets has achieved promising results.
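The fixed-atom idea can be sketched with a crude alternating least-squares loop: the dictionary is split into task atoms, held fixed, and data-driven atoms, which are updated. This is our simplified stand-in (plain least squares instead of sparse coding) under hypothetical data, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(11)

# Dictionary D = [D_fixed | D_free]: task stimulus regressors are fixed,
# only the data-driven atoms D_free are learned. Sizes are hypothetical.
T, V, K_free = 120, 500, 5
D_fixed = np.column_stack([np.sin(2 * np.pi * np.arange(T) / 30),
                           np.cos(2 * np.pi * np.arange(T) / 30)])
X = rng.normal(size=(T, V))        # stand-in for tfMRI voxel signals
D_free = rng.normal(size=(T, K_free))

for _ in range(20):                # crude alternating least squares
    D = np.column_stack([D_fixed, D_free])
    A = np.linalg.lstsq(D, X, rcond=None)[0]   # coding step (LS here)
    # Update only the data-driven atoms, holding the task atoms fixed:
    R = X - D_fixed @ A[:2]                    # residual beyond task fit
    D_free = np.linalg.lstsq(A[2:].T, R.T, rcond=None)[0].T
    D_free /= np.linalg.norm(D_free, axis=0, keepdims=True)

print("learned data-driven atoms:", D_free.shape)
```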
A robust interrupted time series model for analyzing complex health care intervention data.
Cruz, Maricela; Bender, Miriam; Ombao, Hernando
2017-12-20
Current health policy calls for greater use of evidence-based care delivery services to improve patient quality and safety outcomes. Care delivery is complex, with interacting and interdependent components that challenge traditional statistical analytic techniques, in particular when modeling a time series of outcomes data that might be "interrupted" by a change in a particular method of health care delivery. Interrupted time series (ITS) is a robust quasi-experimental design that accounts for data dependency and can infer the effectiveness of an intervention. Current standardized methods for analyzing ITS data do not model changes in variation and correlation following the intervention. This is a key limitation, since it is plausible for data variability and dependency to change because of the intervention. Moreover, present methodology either assumes a prespecified interruption time point with an instantaneous effect or removes data for which the effect of intervention is not fully realized. In this paper, we describe and develop a novel robust interrupted time series (robust-ITS) model that overcomes these omissions and limitations. The robust-ITS model formally performs inference on: (1) the change point; (2) differences in preintervention and postintervention correlation; (3) differences in preintervention and postintervention outcome variance; and (4) differences in preintervention and postintervention means. We illustrate the proposed method by analyzing patient satisfaction data from a hospital that implemented and evaluated a new nursing care delivery model as the intervention of interest. The robust-ITS model is implemented in an R Shiny toolbox, which is freely available to the community. Copyright © 2017 John Wiley & Sons, Ltd.
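The first inference task of such a model, locating the change point and contrasting pre/post mean and variance, can be sketched with a simple grid search; the published robust-ITS model additionally handles changes in correlation, which this toy omits, and the data below are simulated.

```python
import numpy as np

rng = np.random.default_rng(5)

def fit_change_point(y, min_seg=5):
    """Grid-search the change point minimizing total squared error of
    segment-wise means (a simplified first step of an ITS analysis)."""
    best_t, best_sse = None, np.inf
    for t in range(min_seg, len(y) - min_seg):
        sse = (((y[:t] - y[:t].mean()) ** 2).sum()
               + ((y[t:] - y[t:].mean()) ** 2).sum())
        if sse < best_sse:
            best_t, best_sse = t, sse
    return best_t

# Hypothetical monthly satisfaction scores: an intervention at t=60
# shifts both the mean and the variance.
y = np.r_[rng.normal(70, 2, 60), rng.normal(75, 4, 40)]
t = fit_change_point(y)
print(f"estimated change point: {t}")
print(f"pre:  mean={y[:t].mean():.1f}  var={y[:t].var():.1f}")
print(f"post: mean={y[t:].mean():.1f}  var={y[t:].var():.1f}")
```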
Causal inference with missing exposure information: Methods and applications to an obstetric study.
Zhang, Zhiwei; Liu, Wei; Zhang, Bo; Tang, Li; Zhang, Jun
2016-10-01
Causal inference in observational studies is frequently challenged by the occurrence of missing data, in addition to confounding. Motivated by the Consortium on Safe Labor, a large observational study of obstetric labor practice and birth outcomes, this article focuses on the problem of missing exposure information in a causal analysis of observational data. This problem can be approached from different angles (i.e. missing covariates and causal inference), and useful methods can be obtained by drawing upon the available techniques and insights in both areas. In this article, we describe and compare a collection of methods based on different modeling assumptions, under standard assumptions for missing data (i.e. missing-at-random and positivity) and for causal inference with complete data (i.e. no unmeasured confounding and another positivity assumption). These methods involve three models: one for treatment assignment, one for the dependence of outcome on treatment and covariates, and one for the missing data mechanism. In general, consistent estimation of causal quantities requires correct specification of at least two of the three models, although there may be some flexibility as to which two models need to be correct. Such flexibility is afforded by doubly robust estimators adapted from the missing covariates literature and the literature on causal inference with complete data, and by a newly developed triply robust estimator that is consistent if any two of the three models are correct. The methods are applied to the Consortium on Safe Labor data and compared in a simulation study mimicking the Consortium on Safe Labor. © The Author(s) 2013.
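The doubly robust construction mentioned above is conventionally the augmented inverse-probability-weighted (AIPW) estimator. A minimal sketch follows, with simulated data and the true nuisance functions plugged in for illustration; in practice the propensity and outcome models are fitted, e.g., by logistic and linear regression.

```python
import numpy as np

def aipw_ate(y, a, ps, m1, m0):
    """AIPW estimate of the average treatment effect: consistent if
    either the propensity model (ps) or the outcome regressions
    (m1, m0) are correctly specified -- the 'doubly robust' property
    the article extends to a triply robust version for missing exposures."""
    t1 = m1 + a * (y - m1) / ps
    t0 = m0 + (1 - a) * (y - m0) / (1 - ps)
    return np.mean(t1 - t0)

# Hypothetical data with confounding by x and a true effect of 2.
rng = np.random.default_rng(6)
n = 5000
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-x))       # true propensity score
a = rng.binomial(1, p)
y = 2 * a + x + rng.normal(size=n)

print(f"AIPW ATE estimate: {aipw_ate(y, a, p, m1=2 + x, m0=x):.2f}")
```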
High Accuracy Monocular SFM and Scale Correction for Autonomous Driving.
Song, Shiyu; Chandraker, Manmohan; Guest, Clark C
2016-04-01
We present a real-time monocular visual odometry system that achieves high accuracy in real-world autonomous driving applications. First, we demonstrate robust monocular SFM that exploits multithreading to handle driving scenes with large motions and rapidly changing imagery. To correct for scale drift, we use known height of the camera from the ground plane. Our second contribution is a novel data-driven mechanism for cue combination that allows highly accurate ground plane estimation by adapting observation covariances of multiple cues, such as sparse feature matching and dense inter-frame stereo, based on their relative confidences inferred from visual data on a per-frame basis. Finally, we demonstrate extensive benchmark performance and comparisons on the challenging KITTI dataset, achieving accuracy comparable to stereo and exceeding prior monocular systems. Our SFM system is optimized to output pose within 50 ms in the worst case, while average case operation is over 30 fps. Our framework also significantly boosts the accuracy of applications like object localization that rely on the ground plane.
Dinov, Ivo D
2016-01-01
Managing, processing and understanding big healthcare data is challenging, costly and demanding. Without a robust fundamental theory for representation, analysis and inference, a roadmap for uniform handling and analyzing of such complex data remains elusive. In this article, we outline various big data challenges, opportunities, modeling methods and software techniques for blending complex healthcare data, advanced analytic tools, and distributed scientific computing. Using imaging, genetic and healthcare data we provide examples of processing heterogeneous datasets using distributed cloud services, automated and semi-automated classification techniques, and open-science protocols. Despite substantial advances, new innovative technologies need to be developed that enhance, scale and optimize the management and processing of large, complex and heterogeneous data. Stakeholder investments in data acquisition, research and development, computational infrastructure and education will be critical to realize the huge potential of big data, to reap the expected information benefits and to build lasting knowledge assets. Multi-faceted proprietary, open-source, and community developments will be essential to enable broad, reliable, sustainable and efficient data-driven discovery and analytics. Big data will affect every sector of the economy and their hallmark will be 'team science'.
Bayesian Approaches to Imputation, Hypothesis Testing, and Parameter Estimation
ERIC Educational Resources Information Center
Ross, Steven J.; Mackey, Beth
2015-01-01
This chapter introduces three applications of Bayesian inference to common and novel issues in second language research. After a review of the critiques of conventional hypothesis testing, our focus centers on ways Bayesian inference can be used for dealing with missing data, for testing theory-driven substantive hypotheses without a default null…
Johnston, Iain G; Williams, Ben P
2016-02-24
Since their endosymbiotic origin, mitochondria have lost most of their genes. Although many selective mechanisms underlying the evolution of mitochondrial genomes have been proposed, a data-driven exploration of these hypotheses is lacking, and a quantitatively supported consensus remains absent. We developed HyperTraPS, a methodology coupling stochastic modeling with Bayesian inference, to identify the ordering of evolutionary events and suggest their causes. Using 2015 complete mitochondrial genomes, we inferred evolutionary trajectories of mtDNA gene loss across the eukaryotic tree of life. We find that proteins comprising the structural cores of the electron transport chain are preferentially encoded within mitochondrial genomes across eukaryotes. A combination of high GC content and high protein hydrophobicity is required to explain patterns of mtDNA gene retention; a model that accounts for these selective pressures can also predict the success of artificial gene transfer experiments in vivo. This work provides a general method for data-driven inference of the ordering of evolutionary and progressive events, here identifying the distinct features shaping mitochondrial genomes of present-day species. Copyright © 2016 Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baker, Kyri; Dall'Anese, Emiliano; Summers, Tyler
This paper outlines a data-driven, distributionally robust approach to solve chance-constrained AC optimal power flow problems in distribution networks. Uncertain forecasts for loads and power generated by photovoltaic (PV) systems are considered, with the goal of minimizing PV curtailment while meeting power flow and voltage regulation constraints. A data-driven approach is utilized to develop a distributionally robust conservative convex approximation of the chance constraints; in particular, the mean and covariance matrix of the forecast errors are updated online and leveraged to enforce voltage regulation with predetermined probability via Chebyshev-based bounds. By combining an accurate linear approximation of the AC power flow equations with the distributionally robust chance constraint reformulation, the resulting optimization problem becomes convex and computationally tractable.
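The Chebyshev-based tightening can be written down directly: for any distribution with known mean and variance, the one-sided Chebyshev (Cantelli) bound turns a chance constraint into a deterministic margin. The sketch below uses hypothetical forecast-error data and voltage limits, not values from the paper.

```python
import numpy as np

def chebyshev_margin(eps):
    """Distributionally robust multiplier k: for any distribution with
    the given mean and variance, enforcing mu + k*sigma <= v_max
    guarantees P(V > v_max) <= eps (one-sided Chebyshev/Cantelli)."""
    return np.sqrt((1.0 - eps) / eps)

# Online update of forecast-error statistics from streaming data, then
# the tightened (convex) voltage constraint usable inside the OPF.
rng = np.random.default_rng(7)
errors = rng.laplace(0.0, 0.005, 500)   # hypothetical p.u. voltage errors
mu, sigma = errors.mean(), errors.std(ddof=1)

V_MAX, EPS = 1.05, 0.05                 # illustrative limit, 5% violation budget
k = chebyshev_margin(EPS)
tightened = V_MAX - (mu + k * sigma)    # bound on the *nominal* voltage
print(f"k={k:.2f}, enforce nominal voltage <= {tightened:.4f} p.u.")
```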
Data Driven Model Development for the SuperSonic SemiSpan Transport (S(sup 4)T)
NASA Technical Reports Server (NTRS)
Kukreja, Sunil L.
2011-01-01
In this report, we will investigate two common approaches to model development for robust control synthesis in the aerospace community; namely, reduced order aeroservoelastic modelling based on structural finite-element and computational fluid dynamics based aerodynamic models, and a data-driven system identification procedure. It is shown via analysis of experimental SuperSonic SemiSpan Transport (S4T) wind-tunnel data that by using a system identification approach it is possible to estimate a model at a fixed Mach, which is parsimonious and robust across varying dynamic pressures.
A 1985-2015 data-driven global reconstruction of GRACE total water storage
NASA Astrophysics Data System (ADS)
Humphrey, Vincent; Gudmundsson, Lukas; Seneviratne, Sonia Isabelle
2016-04-01
After thirteen years of measurements, the Gravity Recovery and Climate Experiment (GRACE) mission has enabled an unprecedented view of total water storage (TWS) variability. However, the relatively short record length, irregular time steps, and multiple data gaps since 2011 still represent important limitations to a wider use of this dataset within the hydrological and climatological community, especially for applications such as model evaluation or assimilation of GRACE into land surface models. To address this issue, we make use of the available GRACE record (2002-2015) to infer local statistical relationships between detrended monthly TWS anomalies and the main controlling atmospheric drivers (e.g., daily precipitation and temperature) at 1 degree resolution (Humphrey et al., in revision). Long-term and homogeneous monthly time series of detrended anomalies in total water storage are then reconstructed for the period 1985-2015. The quality of this reconstruction is evaluated in two different ways. First, we perform a cross-validation experiment to assess the performance and robustness of the statistical model. Second, we compare with independent basin-scale estimates of TWS anomalies derived by means of a combined atmospheric and terrestrial water balance using atmospheric water vapor flux convergence and change in atmospheric water vapor content (Mueller et al. 2011). The reconstructed time series are shown to provide robust data-driven estimates of global variations in water storage over large regions of the world. Example applications are provided for illustration, including an analysis of selected major drought events which occurred before the GRACE era. References: Humphrey V, Gudmundsson L, Seneviratne SI (in revision) Assessing global water storage variability from GRACE: trends, seasonal cycle, sub-seasonal anomalies and extremes. Surv Geophys. Mueller B, Hirschi M, Seneviratne SI (2011) New diagnostic estimates of variations in terrestrial water storage based on ERA-Interim data. Hydrol Process 25:996-1008.
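The flavor of such a local statistical model can be sketched as a regression of TWS anomalies on simple climate-forcing features; the decaying precipitation storage proxy, coefficients, and synthetic data below are illustrative assumptions, not the model of Humphrey et al.

```python
import numpy as np

rng = np.random.default_rng(8)

def decay_accumulate(p, tau):
    """Exponentially decaying accumulation of daily precipitation: a
    simple storage proxy of the kind such reconstructions regress on."""
    out = np.zeros_like(p)
    for t in range(1, len(p)):
        out[t] = out[t - 1] * np.exp(-1.0 / tau) + p[t]
    return out

# Hypothetical grid-cell training data over the 'GRACE era'.
days = 4000
precip = rng.gamma(0.3, 5.0, days)
temp = 10 + 8 * np.sin(2 * np.pi * np.arange(days) / 365.25) + rng.normal(0, 2, days)
tws_true = 0.8 * decay_accumulate(precip, tau=90) - 1.5 * temp
tws_obs = tws_true + rng.normal(0, 5, days)   # noisy 'observed' TWS

# Local statistical model: least-squares fit of TWS anomalies on the
# storage proxy and temperature; the fitted model can then hindcast
# TWS for years before the satellite record.
X = np.column_stack([decay_accumulate(precip, 90), temp, np.ones(days)])
beta, *_ = np.linalg.lstsq(X, tws_obs, rcond=None)
print("fitted coefficients:", np.round(beta, 2))
```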
USDA-ARS?s Scientific Manuscript database
Information embodied in ecological site descriptions and their state-and-transition models is crucial to effective land management, and as such is needed now. There is not time (or money) to employ a traditional research-based approach (i.e., inductive/deductive, hypothesis driven inference) to addr...
NASA Astrophysics Data System (ADS)
Paasche, Hendrik
2018-01-01
Site characterization requires detailed and ideally spatially continuous information about the subsurface. Geophysical tomographic experiments allow for spatially continuous imaging of physical parameter variations, e.g., seismic wave propagation velocities. Such physical parameters are often related to typical geotechnical or hydrological target parameters, e.g., as obtained from 1D direct push or borehole logging. Here, the probabilistic inference of 2D tip resistance, sleeve friction, and relative dielectric permittivity distributions in near-surface sediments is constrained by ill-posed cross-borehole seismic P- and S-wave and radar wave traveltime tomography. In doing so, we follow a discovery-science strategy employing a fully data-driven approach capable of accounting for tomographic ambiguity and differences in spatial resolution between the geophysical tomograms and the geotechnical logging data used for calibration. We compare the outcome to results achieved employing classical hypothesis-driven approaches, i.e., deterministic transfer functions derived empirically for the inference of 2D sleeve friction from S-wave velocity tomograms and theoretically for the inference of 2D dielectric permittivity from radar wave velocity tomograms. The data-driven approach offers maximal flexibility in combination with very relaxed assumptions about the character of the expected links. This makes it a versatile tool applicable to almost any combination of data sets. However, error propagation may be critical and may justify a hypothesis-driven pre-selection of an optimal database, which carries the risk of excluding relevant information from the analyses. Results achieved by transfer functions rely on information about the nature of the link and on optimal calibration settings drawn as retrospective hypotheses by other authors. Applying such transfer functions at other sites turns them into a priori valid hypotheses, which can, particularly for empirically derived transfer functions, result in poor predictions. However, a mindful utilization and critical evaluation of the consequences of turning a retrospectively drawn hypothesis into an a priori valid one can still yield good results for inference and prediction problems when using classical transfer function concepts.
Jin, Suoqin; MacLean, Adam L; Peng, Tao; Nie, Qing
2018-02-05
Single-cell RNA-sequencing (scRNA-seq) offers unprecedented resolution for studying cellular decision-making processes. Robust inference of cell state transition paths and probabilities is an important yet challenging step in the analysis of these data. Here we present scEpath, an algorithm that calculates energy landscapes and probabilistic directed graphs in order to reconstruct developmental trajectories. We quantify the energy landscape using "single-cell energy" and distance-based measures, and find that the combination of these enables robust inference of the transition probabilities and lineage relationships between cell states. We also identify marker genes and gene expression patterns associated with cell state transitions. Our approach produces pseudotemporal orderings that are - in combination - more robust and accurate than current methods, and offers higher resolution dynamics of the cell state transitions, leading to new insight into key transition events during differentiation and development. Moreover, scEpath is robust to variation in the size of the input gene set, and is broadly unsupervised, requiring few parameters to be set by the user. Applications of scEpath led to the identification of a cell-cell communication network implicated in early human embryo development, and novel transcription factors important for myoblast differentiation. scEpath allows us to identify common and specific temporal dynamics and transcriptional factor programs along branched lineages, as well as the transition probabilities that control cell fates. A MATLAB package of scEpath is available at https://github.com/sqjin/scEpath. qnie@uci.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2018. Published by Oxford University Press.
NASA Astrophysics Data System (ADS)
Lasslop, G.; Reichstein, M.; Papale, D.; Richardson, A. D.
2009-12-01
The FLUXNET database provides measurements of the net ecosystem exchange (NEE) of carbon across vegetation types and climate regions. To simplify the interpretation in terms of processes, the net exchange is frequently split into its two main components: gross primary production (GPP) and ecosystem respiration (Reco). A strong relation between these two fluxes derived from eddy covariance data was found across temporal scales and is to be expected, as variation in recent photosynthesis is known to be correlated with root respiration; plants use energy from photosynthesis to drive their metabolism. At long time scales, substrate availability (constrained by past productivity) limits whole-ecosystem respiration. Previous studies exploring this relationship relied on GPP and Reco estimates derived from the same data, which may lead to spurious correlation that must not be interpreted ecologically. In this study we use two estimates derived from disjunct datasets, one based on daytime data and the other on nighttime data, and explore the reliability and robustness of this relationship. We find distinct relationships between the two fluxes, varying between vegetation types but also across temporal and spatial scales. We also infer that the spatial and temporal variability of net ecosystem exchange is driven by GPP in many cases; exceptions to this rule include, for example, disturbed sites. We advocate that for model calibration and evaluation not only the fluxes themselves but also robust patterns between fluxes that can be extracted from the database, for instance between the flux components, should be considered.
Deng, Zhimin; Tian, Tianhai
2014-07-29
Advances in systems biology have produced a large number of sophisticated mathematical models for describing the dynamic properties of complex biological systems. One of the major steps in developing mathematical models is to estimate unknown parameters of the model based on experimentally measured quantities. However, experimental conditions limit the amount of data that is available for mathematical modelling. The number of unknown parameters in a mathematical model may be larger than the number of observations. This imbalance between the number of experimental data and the number of unknown parameters makes reverse-engineering problems particularly challenging. To address the issue of inadequate experimental data, we propose a continuous optimization approach for making reliable inference of model parameters. This approach first uses spline interpolation to generate continuous functions of the system dynamics as well as their first and second order derivatives. The expanded dataset is the basis for inferring the unknown model parameters using various continuous optimization criteria, including the error of simulation only, the error of both simulation and the first derivative, or the error of simulation as well as the first and second derivatives. We use three case studies to demonstrate the accuracy and reliability of the proposed approach. Compared with the corresponding discrete criteria using experimental data at the measurement time points only, numerical results for the ERK kinase activation module show that the continuous absolute-error criteria using both function and higher-order derivatives generate estimates with better accuracy. This result is also supported by the second and third case studies for the G1/S transition network and the MAP kinase pathway, respectively. This suggests that the continuous absolute-error criteria lead to more accurate estimates than the corresponding discrete criteria. We also study the robustness of these three models to examine the reliability of the estimates. Simulation results show that the models with parameters estimated using continuous fitness functions have better robustness properties than those using the corresponding discrete fitness functions. The inference studies and robustness analysis suggest that the proposed continuous optimization criteria are effective and robust for estimating unknown parameters in mathematical models.
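The expanded-dataset idea is easy to sketch: fit a spline to sparse observations, evaluate it and its derivative densely, and score a candidate model with a continuous absolute-error criterion. The toy model dx/dt = k(1 - x), its data, and the grid search below are hypothetical illustrations, not one of the paper's case studies.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Sparse, noisy measurements of one state variable (hypothetical).
t_obs = np.array([0.0, 0.5, 1.0, 2.0, 3.0, 5.0])
x_obs = np.array([0.0, 0.36, 0.59, 0.84, 0.93, 0.99])

spline = CubicSpline(t_obs, x_obs)
t_dense = np.linspace(0.0, 5.0, 200)              # expanded dataset
x_s, dx_s = spline(t_dense), spline(t_dense, 1)   # values and 1st derivative

def continuous_error(k):
    """Continuous absolute-error criterion combining simulation error
    and first-derivative error for the toy model dx/dt = k*(1 - x)."""
    x_model = 1.0 - np.exp(-k * t_dense)          # closed-form solution
    dx_model = k * (1.0 - x_model)                # model derivative
    return (np.mean(np.abs(x_model - x_s))
            + np.mean(np.abs(dx_model - dx_s)))

ks = np.linspace(0.1, 3.0, 300)
print("best k:", ks[np.argmin([continuous_error(k) for k in ks])])
```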
A brief introduction to mixed effects modelling and multi-model inference in ecology.
Harrison, Xavier A; Donaldson, Lynda; Correa-Cano, Maria Eugenia; Evans, Julian; Fisher, David N; Goodwin, Cecily E D; Robinson, Beth S; Hodgson, David J; Inger, Richard
2018-01-01
The use of linear mixed effects models (LMMs) is increasingly common in the analysis of biological data. Whilst LMMs offer a flexible approach to modelling a broad range of data types, ecological data are often complex and require complex model structures, and the fitting and interpretation of such models is not always straightforward. The ability to achieve robust biological inference requires that practitioners know how and when to apply these tools. Here, we provide a general overview of current methods for the application of LMMs to biological data, and highlight the typical pitfalls that can be encountered in the statistical modelling process. We tackle several issues regarding methods of model selection, with particular reference to the use of information theory and multi-model inference in ecology. We offer practical solutions and direct the reader to key references that provide further technical detail for those seeking a deeper understanding. This overview should serve as a widely accessible code of best practice for applying LMMs to complex biological problems and model structures, and in doing so improve the robustness of conclusions drawn from studies investigating ecological and evolutionary questions.
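A minimal worked example of the kind of model discussed, a random-intercept LMM on simulated data with a site grouping factor, is sketched below; statsmodels is an assumed tooling choice (the paper itself is software-agnostic), and all data are fabricated for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)

# Hypothetical data: body mass vs. resource availability, with a random
# intercept per sampling site (a grouping factor that would otherwise
# pseudo-replicate the observations).
n_sites, n_per = 12, 30
site = np.repeat(np.arange(n_sites), n_per)
resource = rng.uniform(0, 1, n_sites * n_per)
site_effect = rng.normal(0, 1.5, n_sites)[site]
mass = 10 + 3 * resource + site_effect + rng.normal(0, 1, n_sites * n_per)

df = pd.DataFrame({"mass": mass, "resource": resource, "site": site})
# Fixed effect of resource, random intercept for site.
lmm = smf.mixedlm("mass ~ resource", df, groups=df["site"]).fit()
print(lmm.summary())
```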
A Robust Response of the Hadley Circulation to Global Warming
NASA Technical Reports Server (NTRS)
Lau, William K M.; Kim, Kyu-Myong
2014-01-01
Tropical rainfall is expected to increase in a warmer climate. Yet recent studies have inferred that the Hadley Circulation (HC), which is primarily driven by latent heating from tropical rainfall, is weakened under global warming. Here, we show evidence of a robust intensification of the HC from analyses of 33 CMIP5 model projections under a scenario of a 1% per year CO2 increase. The intensification is manifested in a deep-tropics squeeze, characterized by a pronounced increase in the zonal mean ascending motion in the mid and upper troposphere, a deepening and narrowing of the convective zone, and enhanced rainfall in the deep tropics. These changes occur in conjunction with a rise in the region of maximum outflow of the HC, with accelerated meridional mass outflow in the uppermost branch of the HC away from the equator, coupled to a weakened inflow in the return branches of the HC in the lower troposphere.
Zhou, Ping; Guo, Dongwei; Wang, Hong; Chai, Tianyou
2017-09-29
Optimal operation of an industrial blast furnace (BF) ironmaking process largely depends on a reliable measurement of molten iron quality (MIQ) indices, which are not feasible using the conventional sensors. This paper proposes a novel data-driven robust modeling method for the online estimation and control of MIQ indices. First, a nonlinear autoregressive exogenous (NARX) model is constructed for the MIQ indices to completely capture the nonlinear dynamics of the BF process. Then, considering that the standard least-squares support vector regression (LS-SVR) cannot directly cope with the multioutput problem, a multitask transfer learning is proposed to design a novel multioutput LS-SVR (M-LS-SVR) for the learning of the NARX model. Furthermore, a novel M-estimator is proposed to reduce the interference of outliers and improve the robustness of the M-LS-SVR model. Since the weights of different outlier data are properly given by the weight function, their corresponding contributions on modeling can properly be distinguished, thus a robust modeling result can be achieved. Finally, a novel multiobjective evaluation index on the modeling performance is developed by comprehensively considering the root-mean-square error of modeling and the correlation coefficient on trend fitting, based on which the nondominated sorting genetic algorithm II is used to globally optimize the model parameters. Both experiments using industrial data and industrial applications illustrate that the proposed method can eliminate the adverse effect caused by the fluctuation of data in BF process efficiently. This indicates its stronger robustness and higher accuracy. Moreover, control testing shows that the developed model can be well applied to realize data-driven control of the BF process.
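The robustness mechanism, down-weighting outliers via an M-estimator's weight function, can be illustrated on a plain linear model with Huber weights and iteratively reweighted least squares. This is a generic sketch on fabricated data, not the paper's M-LS-SVR.

```python
import numpy as np

def huber_irls(X, y, delta=1.345, n_iter=50):
    """Iteratively reweighted least squares with Huber weights: an
    M-estimator that down-weights large residuals so outliers
    contribute less to the fit."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - X @ beta
        s = np.median(np.abs(r)) / 0.6745 + 1e-12  # robust scale (MAD)
        u = np.abs(r / s)
        w = np.where(u <= delta, 1.0, delta / u)   # Huber weight function
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]
    return beta

# Hypothetical process data with 10% gross outliers.
rng = np.random.default_rng(10)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 0.3, 200)
y[:20] += rng.normal(8, 2, 20)                     # contaminate 10%
print("OLS:  ", np.linalg.lstsq(X, y, rcond=None)[0].round(2))
print("Huber:", huber_irls(X, y).round(2))
```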
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhou, Ping; Guo, Dongwei; Wang, Hong
2017-09-29
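To make the M-estimator idea above concrete, here is a minimal sketch of iteratively reweighted least squares with a Huber-type weight function applied to a plain ridge regression. It is not the paper's M-LS-SVR; the weight function, tuning constant, and data are illustrative, but the mechanism is the same: large residuals receive small weights, so outliers contribute little to the fit.

```python
import numpy as np

def huber_weights(residuals, k=1.345):
    """Huber-type M-estimator weights: 1 inside the threshold, decaying outside."""
    s = 1.4826 * np.median(np.abs(residuals - np.median(residuals)))  # MAD scale
    r = np.abs(residuals) / max(s, 1e-12)
    return np.where(r <= k, 1.0, k / r)

def robust_ridge(X, y, lam=0.1, n_iter=20):
    """Ridge regression refit with M-estimator weights until convergence."""
    n, d = X.shape
    w = np.ones(n)
    for _ in range(n_iter):
        W = np.diag(w)
        beta = np.linalg.solve(X.T @ W @ X + lam * np.eye(d), X.T @ W @ y)
        w = huber_weights(y - X @ beta)
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
y[:10] += 15.0                      # inject gross outliers
print(robust_ridge(X, y))           # coefficients stay close to [1, -2, 0.5]
```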
Schmidt, Paul; Schmid, Volker J; Gaser, Christian; Buck, Dorothea; Bührlen, Susanne; Förschler, Annette; Mühlau, Mark
2013-01-01
Aiming at iron-related T2-hypointensity, which is related to normal aging and neurodegenerative processes, we here present two practicable approaches, based on Bayesian inference, for preprocessing and statistical analysis of a complex set of structural MRI data. In particular, Markov Chain Monte Carlo methods were used to simulate posterior distributions. First, we rendered a segmentation algorithm that uses outlier detection based on model checking techniques within a Bayesian mixture model. Second, we rendered an analytical tool comprising a Bayesian regression model with smoothness priors (in the form of Gaussian Markov random fields) mitigating the necessity to smooth data prior to statistical analysis. For validation, we used simulated data and MRI data of 27 healthy controls (age: [Formula: see text]; range, [Formula: see text]). We first observed robust segmentation of both simulated T2-hypointensities and gray-matter regions known to be T2-hypointense. Second, simulated data and images of segmented T2-hypointensity were analyzed. We found not only robust identification of simulated effects but also a biologically plausible age-related increase of T2-hypointensity primarily within the dentate nucleus but also within the globus pallidus, substantia nigra, and red nucleus. Our results indicate that fully Bayesian inference can successfully be applied for preprocessing and statistical analysis of structural MRI data.
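As a toy illustration of the machinery described above, and a deliberately reduced stand-in for the paper's mixture and smoothness-prior models, the following sketch uses a Metropolis sampler to simulate the posterior of a single location parameter under a likelihood with an explicit outlier component. The mixture keeps the posterior centered on the bulk of the data even when a few points lie far off.

```python
import numpy as np

def log_post(mu, x, p_out=0.05, sigma=1.0, sigma_out=10.0):
    """Unnormalized log-posterior: most points ~ N(mu, sigma^2),
    a small fraction of outliers ~ N(mu, sigma_out^2), vague prior on mu."""
    lik = (1 - p_out) * np.exp(-0.5 * ((x - mu) / sigma) ** 2) / sigma \
        + p_out * np.exp(-0.5 * ((x - mu) / sigma_out) ** 2) / sigma_out
    return np.sum(np.log(lik)) - 0.5 * (mu / 100.0) ** 2

def metropolis(x, n_steps=5000, step=0.2, seed=1):
    rng = np.random.default_rng(seed)
    mu, lp, chain = 0.0, log_post(0.0, x), []
    for _ in range(n_steps):
        prop = mu + step * rng.normal()
        lp_prop = log_post(prop, x)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject
            mu, lp = prop, lp_prop
        chain.append(mu)
    return np.array(chain)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(3.0, 1.0, 95), rng.normal(30.0, 1.0, 5)])
chain = metropolis(x)
print(chain[1000:].mean())   # close to 3 despite the 5% outliers
```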
Value Driven Information Processing and Fusion
2016-03-01
consensus approach allows a decentralized approach to achieve the optimal error exponent of the centralized counterpart, a conclusion that is signifi... The objective of the project is to develop a general framework for value driven decentralized information processing..., including: optimal data reduction in a network setting for decentralized inference with quantization constraint; interactive fusion that allows queries and...
Robust Long-Range Coordination of Spontaneous Neural Activity in Waking, Sleep and Anesthesia.
Liu, Xiao; Yanagawa, Toru; Leopold, David A; Fujii, Naotaka; Duyn, Jeff H
2015-09-01
Although the emerging field of functional connectomics relies increasingly on the analysis of spontaneous fMRI signal covariation to infer the spatial fingerprint of the brain's large-scale functional networks, the nature of the underlying neuro-electrical activity remains incompletely understood. In part, this lack of understanding owes to the invasiveness of electrophysiological acquisition, the difficulty of recording simultaneously over large cortical areas, and the absence of fully established methods for unbiased extraction of network information from these data. Here, we demonstrate a novel, data-driven approach to analyze spontaneous signal variations in electrocorticographic (ECoG) recordings from nearly entire hemispheres of macaque monkeys. Based on both broadband analysis and analysis of specific frequency bands, the ECoG signals were found to co-vary in patterns that resembled the fMRI networks reported in previous studies. The extracted patterns were robust against changes in consciousness associated with sleep and anesthesia, despite profound changes in intrinsic characteristics of the raw signals, including their spectral signatures. These results suggest that the spatial organization of large-scale brain networks results from neural activity with a broadband spectral feature and is a core aspect of the brain's physiology that does not depend on the state of consciousness. Published by Oxford University Press 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.
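A minimal, hypothetical version of such a pipeline can be sketched as follows: band-pass each channel, take the amplitude envelope of the analytic signal (Hilbert transform), correlate envelopes across channels to get a network pattern per state, and then correlate the patterns themselves to measure their spatial similarity. Band edges, channel count, and data below are placeholders.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def band_envelope(data, fs, lo, hi, order=4):
    """Band-pass each channel and take the analytic-signal amplitude envelope.
    data: (n_channels, n_samples)."""
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return np.abs(hilbert(filtfilt(b, a, data, axis=1), axis=1))

def connectivity(data, fs, band=(12.0, 30.0)):
    """Channel-by-channel correlation of band-limited power envelopes."""
    return np.corrcoef(band_envelope(data, fs, *band))

rng = np.random.default_rng(0)
fs = 1000.0
state_a = rng.normal(size=(16, 10_000))                   # e.g., waking
state_b = state_a + 0.5 * rng.normal(size=(16, 10_000))   # e.g., anesthesia
ca, cb = connectivity(state_a, fs), connectivity(state_b, fs)
iu = np.triu_indices_from(ca, k=1)
print(np.corrcoef(ca[iu], cb[iu])[0, 1])   # spatial similarity across states
```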
Event-Driven Random Back-Propagation: Enabling Neuromorphic Deep Learning Machines
Neftci, Emre O.; Augustine, Charles; Paul, Somnath; Detorakis, Georgios
2017-01-01
An ongoing challenge in neuromorphic computing is to devise general and computationally efficient models of inference and learning which are compatible with the spatial and temporal constraints of the brain. One increasingly popular and successful approach is to take inspiration from inference and learning algorithms used in deep neural networks. However, the workhorse of deep learning, the gradient-descent-based Back Propagation (BP) rule, often relies on the immediate availability of network-wide information stored with high-precision memory during learning, and precise operations that are difficult to realize in neuromorphic hardware. Remarkably, recent work showed that exact backpropagated gradients are not essential for learning deep representations. Building on these results, we demonstrate an event-driven random BP (eRBP) rule that uses error-modulated synaptic plasticity for learning deep representations. Using a two-compartment Leaky Integrate & Fire (I&F) neuron, the rule requires only one addition and two comparisons for each synaptic weight, making it very suitable for implementation in digital or mixed-signal neuromorphic hardware. Our results show that using eRBP, deep representations are rapidly learned, achieving classification accuracies on permutation invariant datasets comparable to those obtained in artificial neural network simulations on GPUs, while being robust to neural and synaptic state quantizations during learning. PMID:28680387
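The core trick, a fixed random matrix standing in for the transposed forward weights in the error pathway, with purely local, error-modulated updates, can be sketched in a rate-based (non-spiking) form. This is feedback alignment in miniature under invented dimensions and task, not the spiking eRBP rule itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 10, 30, 2
W1 = 0.1 * rng.normal(size=(n_hid, n_in))
W2 = 0.1 * rng.normal(size=(n_out, n_hid))
B = rng.normal(size=(n_hid, n_out))                  # fixed random feedback matrix
T = rng.normal(size=(n_out, n_in)) / np.sqrt(n_in)   # toy target map to learn
lr = 0.005

for step in range(3000):
    x = rng.normal(size=n_in)
    target = T @ x
    h = np.tanh(W1 @ x)                    # hidden activity
    y = W2 @ h                             # linear readout
    e = y - target                         # output error
    delta_h = (B @ e) * (1.0 - h ** 2)     # random feedback replaces W2.T @ e
    W2 -= lr * np.outer(e, h)              # local, error-modulated updates
    W1 -= lr * np.outer(delta_h, x)
    if step % 1000 == 0:
        print(step, float(e @ e))          # squared error trends down
```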
Distribution-Agnostic Stochastic Optimal Power Flow for Distribution Grids: Preprint
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baker, Kyri; Dall'Anese, Emiliano; Summers, Tyler
2016-09-01
This paper outlines a data-driven, distributionally robust approach to solve chance-constrained AC optimal power flow problems in distribution networks. Uncertain forecasts for loads and power generated by photovoltaic (PV) systems are considered, with the goal of minimizing PV curtailment while meeting power flow and voltage regulation constraints. A data-driven approach is utilized to develop a distributionally robust conservative convex approximation of the chance-constraints; particularly, the mean and covariance matrix of the forecast errors are updated online, and leveraged to enforce voltage regulation with predetermined probability via Chebyshev-based bounds. By combining an accurate linear approximation of the AC power flow equations with the distributionally robust chance constraint reformulation, the resulting optimization problem becomes convex and computationally tractable.
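The Chebyshev-based step can be illustrated on a single affine chance constraint. Using only the empirical mean and covariance of the forecast errors, a one-sided Chebyshev (Cantelli) bound turns Pr(a @ xi <= v_max) >= 1 - eps into a deterministic margin condition that holds for any error distribution matching those moments. The sensitivity vector and limits below are invented for illustration.

```python
import numpy as np

def chebyshev_tightening(a, xi_samples, v_max, eps=0.05):
    """Distributionally robust surrogate for Pr(a @ xi <= v_max) >= 1 - eps,
    valid for any distribution with the empirical mean and covariance
    (one-sided Chebyshev / Cantelli bound)."""
    mu = xi_samples.mean(axis=0)                 # empirical mean, updated online
    Sigma = np.cov(xi_samples, rowvar=False)     # empirical covariance
    margin = np.sqrt((1 - eps) / eps) * np.sqrt(a @ Sigma @ a)
    return a @ mu + margin <= v_max              # deterministic condition

rng = np.random.default_rng(0)
errors = rng.normal(0.0, 0.01, size=(500, 4))    # e.g., PV forecast errors
a = np.array([0.5, 0.2, 0.2, 0.1])               # illustrative voltage sensitivity
print(chebyshev_tightening(a, errors, v_max=0.05))
```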
Free will in Bayesian and inverse Bayesian inference-driven endo-consciousness.
Gunji, Yukio-Pegio; Minoura, Mai; Kojima, Kei; Horry, Yoichi
2017-12-01
How can we link challenging issues related to consciousness and/or qualia with natural science? The introduction of the endo-perspective, instead of the exo-perspective, as proposed by Matsuno, Rössler, and Gunji, is considered one of the most promising candidate approaches. Here, we distinguish the endo- from the exo-perspective in terms of whether the external is or is not directly operated. In the endo-perspective, the external can be neither perceived nor recognized directly; rather, one can only indirectly summon something outside of the perspective, which can be illustrated by a causation-reversal pair. On one hand, causation logically proceeds from the cause to the effect. On the other hand, a reversal from the effect to the cause is non-logical and is equipped with a metaphorical structure. We argue that the differences in exo- and endo-perspectives result not from the difference between Western and Eastern cultures, but from differences between modernism and animism. Here, a causation-reversal pair is described using a pair of upward (from premise to consequence) and downward (from consequence to premise) causation and a pair of Bayesian and inverse Bayesian inference (BIB inference). Accordingly, the notion of endo-consciousness is proposed as an agent equipped with BIB inference. We also argue that BIB inference can yield both highly efficient computations through Bayesian inference and robust computations through inverse Bayesian inference. By adapting a logical model of the free will theorem to the BIB inference, we show that endo-consciousness can explain free will as a regression of the controllability of voluntary action. Copyright © 2017. Published by Elsevier Ltd.
Albert, Carlo; Ulzega, Simone; Stoop, Ruedi
2016-04-01
Parameter inference is a fundamental problem in data-driven modeling. Given observed data that is believed to be a realization of some parameterized model, the aim is to find parameter values that are able to explain the observed data. In many situations, the dominant sources of uncertainty must be included into the model for making reliable predictions. This naturally leads to stochastic models. Stochastic models render parameter inference much harder, as the aim then is to find a distribution of likely parameter values. In Bayesian statistics, which is a consistent framework for data-driven learning, this so-called posterior distribution can be used to make probabilistic predictions. We propose a novel, exact, and very efficient approach for generating posterior parameter distributions for stochastic differential equation models calibrated to measured time series. The algorithm is inspired by reinterpreting the posterior distribution as a statistical mechanics partition function of an object akin to a polymer, where the measurements are mapped on heavier beads compared to those of the simulated data. To arrive at distribution samples, we employ a Hamiltonian Monte Carlo approach combined with a multiple time-scale integration. A separation of time scales naturally arises if either the number of measurement points or the number of simulation points becomes large. Furthermore, at least for one-dimensional problems, we can decouple the harmonic modes between measurement points and solve the fastest part of their dynamics analytically. Our approach is applicable to a wide range of inference problems and is highly parallelizable.
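Stripped of the polymer mapping, multiple time scales, and analytic mode decoupling, the mechanical core of the approach is Hamiltonian Monte Carlo: the log-posterior and its gradient drive leapfrog dynamics, and a Metropolis correction keeps the sampler exact. A minimal sketch on a toy Gaussian target:

```python
import numpy as np

def hmc(logp_grad, theta0, n_samples=1000, eps=0.1, n_leap=20, seed=0):
    """Minimal Hamiltonian Monte Carlo with a leapfrog integrator."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    logp, grad = logp_grad(theta)
    samples = []
    for _ in range(n_samples):
        p = rng.normal(size=theta.shape)        # resample momenta
        H0 = -logp + 0.5 * p @ p
        th, g = theta.copy(), grad
        p = p + 0.5 * eps * g                   # first half momentum step
        for _ in range(n_leap):
            th = th + eps * p                   # full position step
            lp, g = logp_grad(th)
            p = p + eps * g                     # full momentum step
        p = p - 0.5 * eps * g                   # undo the extra half step
        if np.log(rng.uniform()) < H0 - (-lp + 0.5 * p @ p):
            theta, logp, grad = th, lp, g       # accept
        samples.append(theta.copy())
    return np.array(samples)

f = lambda th: (-0.5 * th @ th, -th)            # standard 2-D Gaussian target
draws = hmc(f, np.zeros(2))
print(draws.mean(axis=0), draws.var(axis=0))    # near [0, 0] and [1, 1]
```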
Data fusion and classification using a hybrid intrinsic cellular inference network
NASA Astrophysics Data System (ADS)
Woodley, Robert; Walenz, Brett; Seiffertt, John; Robinette, Paul; Wunsch, Donald
2010-04-01
Hybrid Intrinsic Cellular Inference Network (HICIN) is designed for battlespace decision support applications. We developed an automatic method of generating hypotheses for an entity-attribute classifier. The capability and effectiveness of a domain-specific ontology was used to generate automatic categories for data classification. Heterogeneous data is clustered using an Adaptive Resonance Theory (ART) inference engine on a sample (unclassified) data set. The data set is the Lahman baseball database. The actual data is immaterial to the architecture; however, parallels in the data can be easily drawn (i.e., "Team" maps to organization, "Runs scored/allowed" to Measure of organization performance (positive/negative), "Payroll" to organization resources, etc.). Results show that HICIN classifiers create known inferences from the heterogeneous data. These inferences are not explicitly stated in the ontological description of the domain and are strictly data driven. HICIN uses data uncertainty handling to reduce errors in the classification. The uncertainty handling is based on subjective logic. The belief mass allows evidence from multiple sources to be mathematically combined to increase or discount an assertion. In military operations the ability to reduce uncertainty will be vital in the data fusion operation.
Robust nonlinear system identification: Bayesian mixture of experts using the t-distribution
NASA Astrophysics Data System (ADS)
Baldacchino, Tara; Worden, Keith; Rowson, Jennifer
2017-02-01
A novel variational Bayesian mixture of experts model for robust regression of bifurcating and piece-wise continuous processes is introduced. The mixture of experts model is a powerful model which probabilistically splits the input space allowing different models to operate in the separate regions. However, current methods have no fail-safe against outliers. In this paper, a robust mixture of experts model is proposed which consists of Student-t mixture models at the gates and Student-t distributed experts, trained via Bayesian inference. The Student-t distribution has heavier tails than the Gaussian distribution, and so it is more robust to outliers, noise and non-normality in the data. Using both simulated data and real data obtained from the Z24 bridge this robust mixture of experts performs better than its Gaussian counterpart when outliers are present. In particular, it provides robustness to outliers in two forms: unbiased parameter regression models, and robustness to overfitting/complex models.
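The computational heart of Student-t robustness is the scale-mixture EM weight (nu + 1)/(nu + delta), where delta is the squared standardized residual: outliers get small weights automatically. A minimal single-expert sketch (no gating network, unlike the mixture-of-experts model above):

```python
import numpy as np

def t_regression(X, y, nu=4.0, n_iter=50):
    """Robust linear regression with Student-t errors fitted by EM."""
    n, d = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    s2 = np.var(y - X @ beta)
    for _ in range(n_iter):
        r2 = (y - X @ beta) ** 2
        w = (nu + 1.0) / (nu + r2 / s2)                    # E-step: weights
        W = np.diag(w)
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)   # M-step: weighted LS
        s2 = np.sum(w * (y - X @ beta) ** 2) / n
    return beta

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(300), rng.normal(size=300)])
y = X @ np.array([1.0, 2.0]) + 0.2 * rng.normal(size=300)
y[:15] += 20.0                                  # gross outliers
print(t_regression(X, y))                       # close to [1, 2]
```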
Robust Smoothing: Smoothing Parameter Selection and Applications to Fluorescence Spectroscopy
Lee, Jong Soo; Cox, Dennis D.
2009-01-01
Fluorescence spectroscopy has emerged in recent years as an effective way to detect cervical cancer. Investigation of the data preprocessing stage uncovered a need for a robust smoothing to extract the signal from the noise. Various robust smoothing methods for estimating fluorescence emission spectra are compared and data driven methods for the selection of smoothing parameter are suggested. The methods currently implemented in R for smoothing parameter selection proved to be unsatisfactory, and a computationally efficient procedure that approximates robust leave-one-out cross validation is presented. PMID:20729976
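One simple way to realize a robust, data-driven smoothing parameter selection, sketched here with a Nadaraya-Watson smoother rather than the paper's method, is to score each bandwidth by a median absolute leave-one-out residual instead of squared error; for a kernel smoother the LOO fit is cheap, since it amounts to zeroing each point's own kernel weight.

```python
import numpy as np

def nw_loo_residuals(x, y, h):
    """Leave-one-out residuals of a Gaussian-kernel Nadaraya-Watson smoother."""
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    np.fill_diagonal(K, 0.0)                   # drop each point's own weight
    return y - (K @ y) / K.sum(axis=1)

def robust_bandwidth(x, y, grid):
    """Pick the bandwidth minimizing a robust (median absolute) LOO criterion,
    so a few spikes do not drive the selection the way squared error would."""
    scores = [np.median(np.abs(nw_loo_residuals(x, y, h))) for h in grid]
    return grid[int(np.argmin(scores))]

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(4 * np.pi * x) + 0.1 * rng.normal(size=200)
y[::25] += 3.0                                 # spike-like outliers
print(robust_bandwidth(x, y, np.linspace(0.005, 0.1, 20)))
```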
iNJclust: Iterative Neighbor-Joining Tree Clustering Framework for Inferring Population Structure.
Limpiti, Tulaya; Amornbunchornvej, Chainarong; Intarapanich, Apichart; Assawamakin, Anunchai; Tongsima, Sissades
2014-01-01
Understanding genetic differences among populations is one of the most important issues in population genetics. Genetic variations, e.g., single nucleotide polymorphisms, are used to characterize commonality and difference of individuals from various populations. This paper presents an efficient graph-based clustering framework which operates iteratively on the Neighbor-Joining (NJ) tree called the iNJclust algorithm. The framework uses well-known genetic measurements, namely the allele-sharing distance, the neighbor-joining tree, and the fixation index. The behavior of the fixation index is utilized in the algorithm's stopping criterion. The algorithm provides an estimated number of populations, individual assignments, and relationships between populations as outputs. The clustering result is reported in the form of a binary tree, whose terminal nodes represent the final inferred populations and the tree structure preserves the genetic relationships among them. The clustering performance and the robustness of the proposed algorithm are tested extensively using simulated and real data sets from bovine, sheep, and human populations. The result indicates that the number of populations within each data set is reasonably estimated, the individual assignment is robust, and the structure of the inferred population tree corresponds to the intrinsic relationships among populations within the data.
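Of the measurements listed, the allele-sharing distance is the easiest to make concrete. One common form for genotypes coded as 0/1/2 copies of the reference allele averages |g_i - g_j| / 2 over loci; the resulting matrix is what a neighbor-joining tree can then be built from. The populations below are synthetic.

```python
import numpy as np

def allele_sharing_distance(G):
    """Pairwise allele-sharing distance for a genotype matrix G
    (individuals x SNPs, entries 0/1/2 copies of the reference allele)."""
    n = G.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        D[i] = np.mean(np.abs(G - G[i]), axis=1) / 2.0
    return D

rng = np.random.default_rng(0)
G = np.vstack([rng.binomial(2, 0.2, size=(10, 500)),    # population A
               rng.binomial(2, 0.8, size=(10, 500))])   # population B
D = allele_sharing_distance(G)
print(D[:10, :10].mean(), D[:10, 10:].mean())           # within < between
```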
NASA Astrophysics Data System (ADS)
El Houda Thabet, Rihab; Combastel, Christophe; Raïssi, Tarek; Zolghadri, Ali
2015-09-01
The paper develops a set membership detection methodology which is applied to the detection of abnormal positions of aircraft control surfaces. Robust and early detection of such abnormal positions is an important issue for early system reconfiguration and overall optimisation of aircraft design. In order to improve fault sensitivity while ensuring a high level of robustness, the method combines a data-driven characterisation of noise and a model-driven approach based on interval prediction. The efficiency of the proposed methodology is illustrated through simulation results obtained based on data recorded in several flight scenarios of a highly representative aircraft benchmark.
Application of AI techniques to infer vegetation characteristics from directional reflectance(s)
NASA Technical Reports Server (NTRS)
Kimes, D. S.; Smith, J. A.; Harrison, P. A.; Harrison, P. R.
1994-01-01
Traditionally, the remote sensing community has relied totally on spectral knowledge to extract vegetation characteristics. However, there are other knowledge bases (KB's) that can be used to significantly improve the accuracy and robustness of inference techniques. Using AI (artificial intelligence) techniques a KB system (VEG) was developed that integrates input spectral measurements with diverse KB's. These KB's consist of data sets of directional reflectance measurements, knowledge from literature, and knowledge from experts which are combined into an intelligent and efficient system for making vegetation inferences. VEG accepts spectral data of an unknown target as input, determines the best techniques for inferring the desired vegetation characteristic(s), applies the techniques to the target data, and provides a rigorous estimate of the accuracy of the inference. VEG was developed to: infer spectral hemispherical reflectance from any combination of nadir and/or off-nadir view angles; infer percent ground cover from any combination of nadir and/or off-nadir view angles; infer unknown view angle(s) from known view angle(s) (known as view angle extension); and discriminate between user defined vegetation classes using spectral and directional reflectance relationships developed from an automated learning algorithm. The errors for these techniques were generally very good ranging between 2 to 15% (proportional root mean square). The system is designed to aid scientists in developing, testing, and applying new inference techniques using directional reflectance data.
Mobility timing for agent communities, a cue for advanced connectionist systems.
Apolloni, Bruno; Bassis, Simone; Pagani, Elena; Rossi, Gian Paolo; Valerio, Lorenzo
2011-12-01
We introduce a wait-and-chase scheme that models the contact times between moving agents within a connectionist construct. The idea that elementary processors move within a network to get a proper position is borne out both by biological neurons in the brain morphogenesis and by agents within social networks. From the former, we take inspiration to devise a medium-term project for new artificial neural network training procedures where mobile neurons exchange data only when they are close to one another in a proper space (are in contact). From the latter, we accumulate mobility tracks experience. We focus on the preliminary step of characterizing the elapsed time between neuron contacts, which results from a spatial process fitting in the family of random processes with memory, where chasing neurons are stochastically driven by the goal of hitting target neurons. Thus, we add an unprecedented mobility model to the literature in the field, introducing a distribution law of the intercontact times that merges features of both negative exponential and Pareto distribution laws. We give a constructive description and implementation of our model, as well as a short analytical form whose parameters are suitably estimated in terms of confidence intervals from experimental data. Numerical experiments show the model and related inference tools to be sufficiently robust to cope with two main requisites for its exploitation in a neural network: the nonindependence of the observed intercontact times and the feasibility of the model inversion problem to infer suitable mobility parameters.
Genie: An Inference Engine with Applications to Vulnerability Analysis.
1986-06-01
Stanford Artificial Intelligence Laboratory, 1976. D. A. Waterman and F. Hayes-Roth, eds. Pattern-Directed Inference Systems. Academic Press, Inc... Expert Systems; Artificial Intelligence; Vulnerability Analysis; Knowledge... deduction: it is used wherever possible in data-driven mode (forward chaining). Production rules...
Use of Network Inference to Elucidate Common and Chemical-specific Effects on Steoidogenesis
Microarray data is a key source for modeling gene regulatory interactions. Regulatory network models based on multiple datasets are potentially more robust and can provide greater confidence. In this study, we used network modeling on microarray data generated by exposing the fat...
Zhang, Huaguang; Cui, Lili; Zhang, Xin; Luo, Yanhong
2011-12-01
In this paper, a novel data-driven robust approximate optimal tracking control scheme is proposed for unknown general nonlinear systems by using the adaptive dynamic programming (ADP) method. In the design of the controller, only available input-output data is required instead of known system dynamics. A data-driven model is established by a recurrent neural network (NN) to reconstruct the unknown system dynamics using available input-output data. By adding a novel adjustable term related to the modeling error, the resultant modeling error is first guaranteed to converge to zero. Then, based on the obtained data-driven model, the ADP method is utilized to design the approximate optimal tracking controller, which consists of the steady-state controller and the optimal feedback controller. Further, a robustifying term is developed to compensate for the NN approximation errors introduced by implementing the ADP method. Based on Lyapunov approach, stability analysis of the closed-loop system is performed to show that the proposed controller guarantees the system state asymptotically tracking the desired trajectory. Additionally, the obtained control input is proven to be close to the optimal control input within a small bound. Finally, two numerical examples are used to demonstrate the effectiveness of the proposed control scheme.
2017-01-01
The diversity of microbiota is best explored by understanding the phylogenetic structure of the microbial communities. Traditionally, sequence alignment has been used for phylogenetic inference. However, alignment-based approaches come with significant challenges and limitations when massive amounts of data are analyzed. In the recent decade, alignment-free approaches have enabled genome-scale phylogenetic inference. Here we evaluate three alignment-free methods: ACS, CVTree, and Kr for phylogenetic inference with 16S rRNA gene data. We use a taxonomic gold standard to compare the accuracy of alignment-free phylogenetic inference with that of common microbiome-wide phylogenetic inference pipelines based on PyNAST and MUSCLE alignments with FastTree and RAxML. We re-simulate fecal communities from Human Microbiome Project data to evaluate the performance of the methods on datasets with properties of real data. Our comparisons show that alignment-free methods are not inferior to alignment-based methods in giving accurate and robust phylogenetic trees. Moreover, consensus ensembles of alignment-free phylogenies are superior to those built from alignment-based methods in their ability to highlight community differences in low power settings. In addition, the overall running times of alignment-based and alignment-free phylogenetic inference are comparable. Taken together our empirical results suggest that alignment-free methods provide a viable approach for microbiome-wide phylogenetic inference. PMID:29136663
Data-driven sensitivity inference for Thomson scattering electron density measurement systems.
Fujii, Keisuke; Yamada, Ichihiro; Hasuo, Masahiro
2017-01-01
We developed a method to infer the calibration parameters of multichannel measurement systems, such as channel variations of sensitivity and noise amplitude, from experimental data. We regard such uncertainties of the calibration parameters as dependent noise. The statistical properties of the dependent noise and that of the latent functions were modeled and implemented in the Gaussian process kernel. Based on their statistical difference, both parameters were inferred from the data. We applied this method to the electron density measurement system by Thomson scattering for the Large Helical Device plasma, which is equipped with 141 spatial channels. Based on the 210 sets of experimental data, we evaluated the correction factor of the sensitivity and noise amplitude for each channel. The correction factor varies by ≈10%, and the random noise amplitude is ≈2%, i.e., the measurement accuracy increases by a factor of 5 after this sensitivity correction. The certainty improvement in the spatial derivative inference was demonstrated.
Automatic physical inference with information maximizing neural networks
NASA Astrophysics Data System (ADS)
Charnock, Tom; Lavaux, Guilhem; Wandelt, Benjamin D.
2018-04-01
Compressing large data sets to a manageable number of summaries that are informative about the underlying parameters vastly simplifies both frequentist and Bayesian inference. When only simulations are available, these summaries are typically chosen heuristically, so they may inadvertently miss important information. We introduce a simulation-based machine learning technique that trains artificial neural networks to find nonlinear functionals of data that maximize Fisher information: information maximizing neural networks (IMNNs). In test cases where the posterior can be derived exactly, likelihood-free inference based on automatically derived IMNN summaries produces nearly exact posteriors, showing that these summaries are good approximations to sufficient statistics. In a series of numerical examples of increasing complexity and astrophysical relevance we show that IMNNs are robustly capable of automatically finding optimal, nonlinear summaries of the data even in cases where linear compression fails: inferring the variance of Gaussian signal in the presence of noise, inferring cosmological parameters from mock simulations of the Lyman-α forest in quasar spectra, and inferring frequency-domain parameters from LISA-like detections of gravitational waveforms. In this final case, the IMNN summary outperforms linear data compression by avoiding the introduction of spurious likelihood maxima. We anticipate that the automatic physical inference method described in this paper will be essential to obtain both accurate and precise cosmological parameter estimates from complex and large astronomical data sets, including those from LSST and Euclid.
DOE Office of Scientific and Technical Information (OSTI.GOV)
De Raedt, Hans; Katsnelson, Mikhail I.; Donker, Hylke C.
It is shown that the Pauli equation and the concept of spin naturally emerge from logical inference applied to experiments on a charged particle under the conditions that (i) space is homogeneous, (ii) the observed events are logically independent, and (iii) the observed frequency distributions are robust with respect to small changes in the conditions under which the experiment is carried out. The derivation does not take recourse to concepts of quantum theory and is based on the same principles which have already been shown to lead to e.g. the Schrödinger equation and the probability distributions of pairs of particles in the singlet or triplet state. Application to Stern–Gerlach experiments with chargeless, magnetic particles provides additional support for the thesis that quantum theory follows from logical inference applied to a well-defined class of experiments. - Highlights: • The Pauli equation is obtained through logical inference applied to robust experiments on a charged particle. • The concept of spin appears as an inference resulting from the treatment of two-valued data. • The same reasoning yields the quantum theoretical description of neutral magnetic particles. • Logical inference provides a framework to establish a bridge between objective knowledge gathered through experiments and their description in terms of concepts.
Bellot, Pau; Olsen, Catharina; Salembier, Philippe; Oliveras-Vergés, Albert; Meyer, Patrick E
2015-09-29
In the last decade, a great number of methods for reconstructing gene regulatory networks from expression data have been proposed. However, very few tools and datasets allow to evaluate accurately and reproducibly those methods. Hence, we propose here a new tool, able to perform a systematic, yet fully reproducible, evaluation of transcriptional network inference methods. Our open-source and freely available Bioconductor package aggregates a large set of tools to assess the robustness of network inference algorithms against different simulators, topologies, sample sizes and noise intensities. The benchmarking framework that uses various datasets highlights the specialization of some methods toward network types and data. As a result, it is possible to identify the techniques that have broad overall performances.
Caley, Peter; Ramsey, David S. L.; Barry, Simon C.
2015-01-01
A recent study has inferred that the red fox (Vulpes vulpes) is now widespread in Tasmania as of 2010, based on the extraction of fox DNA from predator scats. Heuristically, this inference appears at first glance to be at odds with the lack of recent confirmed discoveries of either road-killed foxes—the last of which occurred in 2006, or hunter killed foxes—the most recent in 2001. This paper demonstrates a method to codify this heuristic analysis and produce inferences consistent with assumptions and data. It does this by formalising the analysis in a transparent and repeatable manner to make inference on the past, present and future distribution of an invasive species. It utilizes Approximate Bayesian Computation to make inferences. Importantly, the method is able to inform management of invasive species within realistic time frames, and can be applied widely. We illustrate the technique using the Tasmanian fox data. Based on the pattern of carcass discoveries of foxes in Tasmania, we infer that the population of foxes in Tasmania is most likely extinct, or restricted in distribution and demographically weak as of 2013. It is possible, though unlikely, that that population is widespread and/or demographically robust. This inference is largely at odds with the inference from the predator scat survey data. Our results suggest the chances of successfully eradicating the introduced red fox population in Tasmania may be significantly higher than previously thought. PMID:25602618
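The ABC logic is simple to sketch: draw parameters from a prior, simulate discovery data, and keep the draws whose simulations most resemble the observations. The simulator below (growth rate, initial population, per-capita discovery probability) is a made-up toy, not the authors' model.

```python
import numpy as np

def simulate_carcasses(params, n_years, seed=None):
    """Toy simulator: yearly carcass counts from a population with growth
    rate r, initial size n0, and per-capita discovery probability p."""
    r, n0, p = params
    rng = np.random.default_rng(seed)
    n, counts = n0, []
    for _ in range(n_years):
        n = n * r
        counts.append(rng.poisson(n * p))
    return np.array(counts)

def abc_rejection(observed, prior_draw, n_sims=20000, quantile=0.01):
    """Basic rejection ABC: keep the closest 1% of simulations."""
    draws = [prior_draw() for _ in range(n_sims)]
    dists = np.array([np.sum((simulate_carcasses(t, len(observed), seed=i)
                              - observed) ** 2) for i, t in enumerate(draws)])
    keep = dists <= np.quantile(dists, quantile)
    return np.array(draws)[keep]

rng = np.random.default_rng(42)
observed = np.array([1, 2, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0])  # dwindling discoveries
prior = lambda: (rng.uniform(0.5, 1.5), rng.uniform(1, 50), rng.uniform(0.01, 0.2))
posterior = abc_rejection(observed, prior)
print("P(declining population) ~", np.mean(posterior[:, 0] < 1.0))
```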
Robust Inference of Risks of Large Portfolios
Fan, Jianqing; Han, Fang; Liu, Han; Vickers, Byron
2016-01-01
We propose a bootstrap-based robust high-confidence level upper bound (Robust H-CLUB) for assessing the risks of large portfolios. The proposed approach exploits rank-based and quantile-based estimators, and can be viewed as a robust extension of the H-CLUB procedure (Fan et al., 2015). Such an extension allows us to handle possibly misspecified models and heavy-tailed data, which are stylized features in financial returns. Under mixing conditions, we analyze the proposed approach and demonstrate its advantage over H-CLUB. We further provide thorough numerical results to back up the developed theory, and also apply the proposed method to analyze a stock market dataset. PMID:27818569
Conomos, Matthew P; Miller, Michael B; Thornton, Timothy A
2015-05-01
Population structure inference with genetic data has been motivated by a variety of applications in population genetics and genetic association studies. Several approaches have been proposed for the identification of genetic ancestry differences in samples where study participants are assumed to be unrelated, including principal components analysis (PCA), multidimensional scaling (MDS), and model-based methods for proportional ancestry estimation. Many genetic studies, however, include individuals with some degree of relatedness, and existing methods for inferring genetic ancestry fail in related samples. We present a method, PC-AiR, for robust population structure inference in the presence of known or cryptic relatedness. PC-AiR utilizes genome-screen data and an efficient algorithm to identify a diverse subset of unrelated individuals that is representative of all ancestries in the sample. The PC-AiR method directly performs PCA on the identified ancestry representative subset and then predicts components of variation for all remaining individuals based on genetic similarities. In simulation studies and in applications to real data from Phase III of the HapMap Project, we demonstrate that PC-AiR provides a substantial improvement over existing approaches for population structure inference in related samples. We also demonstrate significant efficiency gains, where a single axis of variation from PC-AiR provides better prediction of ancestry in a variety of structure settings than using 10 (or more) components of variation from widely used PCA and MDS approaches. Finally, we illustrate that PC-AiR can provide improved population stratification correction over existing methods in genetic association studies with population structure and relatedness. © 2015 WILEY PERIODICALS, INC.
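The projection step at the heart of this strategy, PCA computed on an unrelated ancestry-representative subset with everyone else placed onto those axes via the SNP loadings, can be sketched as follows. This shows the projection idea only, not PC-AiR's relatedness detection or subset selection; the data are synthetic.

```python
import numpy as np

def pca_project(G, unrelated_idx, n_pc=2):
    """PCA on the unrelated subset, then coordinates for all individuals
    by projecting standardized genotypes onto the top SNP loadings."""
    mu = G[unrelated_idx].mean(axis=0)
    sd = G[unrelated_idx].std(axis=0) + 1e-12
    Zs = (G[unrelated_idx] - mu) / sd
    _, _, Vt = np.linalg.svd(Zs, full_matrices=False)
    return ((G - mu) / sd) @ Vt[:n_pc].T

rng = np.random.default_rng(0)
G = np.vstack([rng.binomial(2, 0.2, size=(30, 1000)),
               rng.binomial(2, 0.7, size=(30, 1000))]).astype(float)
G[10] = G[11]                                  # a cryptically related pair
unrelated = [i for i in range(60) if i != 10]  # drop one relative before PCA
coords = pca_project(G, unrelated)
print(coords[:3, 0], coords[-3:, 0])           # populations separate on PC 1
```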
Fisher information framework for time series modeling
NASA Astrophysics Data System (ADS)
Venkatesan, R. C.; Plastino, A.
2017-08-01
A robust prediction model invoking the Takens embedding theorem, whose working hypothesis is obtained via an inference procedure based on the minimum Fisher information principle, is presented. The coefficients of the ansatz, central to the working hypothesis, satisfy a time-independent Schrödinger-like equation in a vector setting. The inference of (i) the probability density function of the coefficients of the working hypothesis and (ii) the establishment of a constraint-driven pseudo-inverse condition for the modeling phase of the prediction scheme is made, for the case of normal distributions, with the aid of the quantum mechanical virial theorem. The well-known reciprocity relations and the associated Legendre transform structure for the Fisher information measure (FIM, hereafter)-based model in a vector setting (with least square constraints) are self-consistently derived. These relations are demonstrated to yield an intriguing form of the FIM for the modeling phase, which defines the working hypothesis solely in terms of the observed data. Cases for prediction are presented employing time series obtained from: (i) the Mackey-Glass delay-differential equation, (ii) one ECG signal from the MIT-Beth Israel Deaconess Hospital (MIT-BIH) cardiac arrhythmia database, and (iii) one ECG signal from the Creighton University ventricular tachyarrhythmia database. The ECG samples were obtained from the Physionet online repository. These examples demonstrate the efficiency of the prediction model. Numerical examples for exemplary cases are provided.
Model-free information-theoretic approach to infer leadership in pairs of zebrafish.
Butail, Sachit; Mwaffo, Violet; Porfiri, Maurizio
2016-04-01
Collective behavior affords several advantages to fish in avoiding predators, foraging, mating, and swimming. Although fish schools have been traditionally considered egalitarian superorganisms, a number of empirical observations suggest the emergence of leadership in gregarious groups. Detecting and classifying leader-follower relationships is central to elucidate the behavioral and physiological causes of leadership and understand its consequences. Here, we demonstrate an information-theoretic approach to infer leadership from positional data of fish swimming. In this framework, we measure social interactions between fish pairs through the mathematical construct of transfer entropy, which quantifies the predictive power of a time series to anticipate another, possibly coupled, time series. We focus on the zebrafish model organism, which is rapidly emerging as a species of choice in preclinical research for its genetic similarity to humans and reduced neurobiological complexity with respect to mammals. To overcome experimental confounds and generate test data sets on which we can thoroughly assess our approach, we adapt and calibrate a data-driven stochastic model of zebrafish motion for the simulation of a coupled dynamical system of zebrafish pairs. In this synthetic data set, the extent and direction of the coupling between the fish are systematically varied across a wide parameter range to demonstrate the accuracy and reliability of transfer entropy in inferring leadership. Our approach is expected to aid in the analysis of collective behavior, providing a data-driven perspective to understand social interactions.
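A plug-in estimate of transfer entropy for discretized series captures the quantity at work here: how much the past of x improves prediction of y beyond y's own past. The demo couples y to x with a one-step lag, so TE(x -> y) clearly exceeds TE(y -> x); the binning and series are illustrative.

```python
import numpy as np
from collections import Counter

def transfer_entropy(x, y, n_bins=4):
    """Plug-in TE_{x->y} in bits after binning into equiprobable symbols."""
    def symbolize(s):
        cuts = np.quantile(s, np.linspace(0, 1, n_bins + 1)[1:-1])
        return np.searchsorted(cuts, s)
    xs, ys = symbolize(x), symbolize(y)
    triples = Counter(zip(ys[1:], ys[:-1], xs[:-1]))
    pairs_yx = Counter(zip(ys[:-1], xs[:-1]))
    pairs_yy = Counter(zip(ys[1:], ys[:-1]))
    singles = Counter(ys[:-1])
    n = len(ys) - 1
    te = 0.0
    for (y1, y0, x0), c in triples.items():
        p_cond_full = c / pairs_yx[(y0, x0)]             # p(y1 | y0, x0)
        p_cond_self = pairs_yy[(y1, y0)] / singles[y0]   # p(y1 | y0)
        te += (c / n) * np.log2(p_cond_full / p_cond_self)
    return te

rng = np.random.default_rng(0)
x = rng.normal(size=5001)
y = np.zeros(5001)
for t in range(5000):
    y[t + 1] = 0.8 * x[t] + 0.2 * rng.normal()           # y follows x by one step
print(transfer_entropy(x, y), transfer_entropy(y, x))    # first clearly larger
```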
Robust inference in the negative binomial regression model with an application to falls data.
Aeberhard, William H; Cantoni, Eva; Heritier, Stephane
2014-12-01
A popular way to model overdispersed count data, such as the number of falls reported during intervention studies, is by means of the negative binomial (NB) distribution. Classical estimating methods are well known to be sensitive to model misspecifications, which in such intervention studies can take the form of patients falling much more often than expected when the NB regression model is used. We extend in this article two approaches for building robust M-estimators of the regression parameters in the class of generalized linear models to the NB distribution. The first approach achieves robustness in the response by applying a bounded function on the Pearson residuals arising in the maximum likelihood estimating equations, while the second approach achieves robustness by bounding the unscaled deviance components. For both approaches, we explore different choices for the bounding functions. Through a unified notation, we show how close these approaches may actually be as long as the bounding functions are chosen and tuned appropriately, and provide the asymptotic distributions of the resulting estimators. Moreover, we introduce a robust weighted maximum likelihood estimator for the overdispersion parameter, specific to the NB distribution. Simulations under various settings show that redescending bounding functions yield estimates with smaller biases under contamination while keeping high efficiency at the assumed model, and this for both approaches. We present an application to a recent randomized controlled trial measuring the effectiveness of an exercise program at reducing the number of falls among people suffering from Parkinson's disease to illustrate the diagnostic use of such robust procedures and their need for reliable inference. © 2014, The International Biometric Society.
Boolean dynamics of genetic regulatory networks inferred from microarray time series data
Martin, Shawn; Zhang, Zhaoduo; Martino, Anthony; ...
2007-01-31
Methods available for the inference of genetic regulatory networks strive to produce a single network, usually by optimizing some quantity to fit the experimental observations. In this paper we investigate the possibility that multiple networks can be inferred, all resulting in similar dynamics. This idea is motivated by theoretical work which suggests that biological networks are robust and adaptable to change, and that the overall behavior of a genetic regulatory network might be captured in terms of dynamical basins of attraction. We have developed and implemented a method for inferring genetic regulatory networks for time series microarray data. Our method first clusters and discretizes the gene expression data using k-means and support vector regression. We then enumerate Boolean activation–inhibition networks to match the discretized data, and examine the dynamics of the resulting Boolean networks. We have tested our method on two immunology microarray datasets: an IL-2-stimulated T cell response dataset and a LPS-stimulated macrophage response dataset. In both cases, we discovered that many networks matched the data, and that most of these networks had similar dynamics.
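The dynamics step can be made concrete with a small threshold activation-inhibition network: exhaustively iterate every initial state to its attractor, and compare candidate networks by their attractor sets rather than by their wiring alone. The update rule (default-on unless inhibition wins) and the 3-gene inhibitory ring are invented for illustration.

```python
import numpy as np
from itertools import product

def step(state, act, inh):
    """Threshold update: a gene stays on unless inhibition outweighs activation."""
    s = np.asarray(state)
    return tuple((act @ s - inh @ s >= 0).astype(int))

def attractors(act, inh):
    """Follow every initial state to its attractor (fixed point or cycle)."""
    n, found = act.shape[0], set()
    for init in product([0, 1], repeat=n):
        seen, state = {}, init
        while state not in seen:
            seen[state] = len(seen)
            state = step(state, act, inh)
        first = seen[state]
        found.add(tuple(sorted(s for s, t in seen.items() if t >= first)))
    return found

# 3-gene inhibitory ring: gene 0 -| gene 1 -| gene 2 -| gene 0.
act = np.zeros((3, 3), dtype=int)
inh = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]])
for a in attractors(act, inh):
    print(a)    # a 6-state oscillation and a 2-state blinker emerge
```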
Rank-preserving regression: a more robust rank regression model against outliers.
Chen, Tian; Kowalski, Jeanne; Chen, Rui; Wu, Pan; Zhang, Hui; Feng, Changyong; Tu, Xin M
2016-08-30
Mean-based semi-parametric regression models such as the popular generalized estimating equations are widely used to improve robustness of inference over parametric models. Unfortunately, such models are quite sensitive to outlying observations. The Wilcoxon-score-based rank regression (RR) provides more robust estimates over generalized estimating equations against outliers. However, the RR and its extensions do not sufficiently address missing data arising in longitudinal studies. In this paper, we propose a new approach to address outliers under a different framework based on the functional response models. This functional-response-model-based alternative not only addresses limitations of the RR and its extensions for longitudinal data, but, with its rank-preserving property, even provides more robust estimates than these alternatives. The proposed approach is illustrated with both real and simulated data. Copyright © 2016 John Wiley & Sons, Ltd.
NASA Technical Reports Server (NTRS)
Huang, Frank T.; Mayr, Hans; Russell, James; Mlynczak, Marty; Reber, Carl A.
2005-01-01
In the Numerical Spectral Model (NSM, Mayr et al., 2003), small-scale gravity waves propagating in the north/south direction can generate zonal mean (m = 0) meridional wind oscillations with periods between 2 and 4 months. These oscillations tend to be confined to low latitudes and have been interpreted to be the meridional counterpart of the wave-driven Quasi Biennial Oscillation in the zonal circulation. Wave-driven meridional winds across the equator should generate, due to dynamical heating and cooling, temperature oscillations with opposite phase in the two hemispheres. We have analyzed SABER temperature measurements in the altitude range between 55 and 95 km to investigate the existence of such variations. Because there are also strong tidal signatures (up to approximately 20 K) in the data, our algorithm estimates both mean values and tides together from the data. Based on SABER temperature data, the intra-annual variations with periods between 2 and 4 months can have amplitudes up to 5 K or more, depending on the altitude. Their amplitudes are in qualitative agreement with those inferred from UARS data (from different years). The SABER temperature variations also reveal pronounced hemispherical asymmetries, which are qualitatively consistent with wave-driven meridional wind oscillations across the equator. Oscillations with similar periods have been seen in the meridional winds based on UARS data (Huang and Reber, 2003).
Data-driven reconstruction of directed networks
NASA Astrophysics Data System (ADS)
Hempel, Sabrina; Koseska, Aneta; Nikoloski, Zoran
2013-06-01
We investigate the properties of a recently introduced asymmetric association measure, called inner composition alignment (IOTA), aimed at inferring regulatory links (couplings). We show that the measure can be used to determine the direction of coupling, detect superfluous links, and to account for autoregulation. In addition, the measure can be extended to infer the type of regulation (positive or negative). The capabilities of IOTA to correctly infer couplings together with their directionality are compared against Kendall's rank correlation for time series of different lengths, particularly focussing on biological examples. We demonstrate that an extended version of the measure, bidirectional inner composition alignment (biIOTA), increases the accuracy of the network reconstruction for short time series. Finally, we discuss the applicability of the measure to infer couplings in chaotic systems.
Arenas, Miguel
2015-04-01
NGS technologies enable fast and cheap generation of genomic data. Nevertheless, ancestral genome inference is not so straightforward due to complex evolutionary processes acting on this material, such as inversions, translocations, and other genome rearrangements that, in addition to their implicit complexity, can co-occur and confound ancestral inferences. Recently, models of genome evolution that accommodate such complex genomic events are emerging. This letter explores these novel evolutionary models and proposes their incorporation into robust statistical approaches based on computer simulations, such as approximate Bayesian computation, that may produce a more realistic evolutionary analysis of genomic data. Advantages and pitfalls in using these analytical methods are discussed. Potential applications of these ancestral genomic inferences are also pointed out.
NASA Astrophysics Data System (ADS)
Ameur, Mourad; Derras, Boumédiène; Zendagui, Djawed
2018-03-01
Adaptive neuro-fuzzy inference systems (ANFIS) are used here to obtain a robust ground motion prediction model (GMPM). Avoiding an a priori functional form, ANFIS provides fully data-driven predictive models. A large subset of the NGA-West2 database is used, including 2335 records from 580 sites and 137 earthquakes. Only shallow earthquakes and recordings corresponding to stations with measured Vs30 properties are selected. Three basic input parameters are chosen: the moment magnitude (Mw), the Joyner-Boore distance (RJB), and Vs30. ANFIS model output is the peak ground acceleration (PGA), peak ground velocity (PGV) and 5% damped pseudo-spectral acceleration (PSA) at periods from 0.01 to 4 s. A procedure similar to the random-effects approach is developed to provide between- and within-event standard deviations. The total standard deviation (SD) varies between [0.303 and 0.360] (log10 units) depending on the period. The ground motion predictions resulting from such simple three-explanatory-variable ANFIS models are shown to be comparable to the most recent NGA results (e.g., Boore et al., in Earthquake Spectra 30:1057-1085, 2014; Derras et al., in Earthquake Spectra 32:2027-2056, 2016). The main advantage of ANFIS compared to an artificial neural network (ANN) is its simple and one-off topology: five layers. Our results exhibit a number of physically sound features: magnitude scaling of the distance dependency, near-fault saturation distance increasing with magnitude, and amplification on soft soils. The ability to implement the ANFIS model using an analytic equation and Excel is demonstrated.
A Systematic Bayesian Integration of Epidemiological and Genetic Data
Lau, Max S. Y.; Marion, Glenn; Streftaris, George; Gibson, Gavin
2015-01-01
Genetic sequence data on pathogens have great potential to inform inference of their transmission dynamics ultimately leading to better disease control. Where genetic change and disease transmission occur on comparable timescales additional information can be inferred via the joint analysis of such genetic sequence data and epidemiological observations based on clinical symptoms and diagnostic tests. Although recently introduced approaches represent substantial progress, for computational reasons they approximate genuine joint inference of disease dynamics and genetic change in the pathogen population, capturing partially the joint epidemiological-evolutionary dynamics. Improved methods are needed to fully integrate such genetic data with epidemiological observations, for achieving a more robust inference of the transmission tree and other key epidemiological parameters such as latent periods. Here, building on current literature, a novel Bayesian framework is proposed that infers simultaneously and explicitly the transmission tree and unobserved transmitted pathogen sequences. Our framework facilitates the use of realistic likelihood functions and enables systematic and genuine joint inference of the epidemiological-evolutionary process from partially observed outbreaks. Using simulated data it is shown that this approach is able to infer accurately joint epidemiological-evolutionary dynamics, even when pathogen sequences and epidemiological data are incomplete, and when sequences are available for only a fraction of exposures. These results also characterise and quantify the value of incomplete and partial sequence data, which has important implications for sampling design, and demonstrate the abilities of the introduced method to identify multiple clusters within an outbreak. The framework is used to analyse an outbreak of foot-and-mouth disease in the UK, enhancing current understanding of its transmission dynamics and evolutionary process. PMID:26599399
Sequential defense against random and intentional attacks in complex networks.
Chen, Pin-Yu; Cheng, Shin-Ming
2015-02-01
Network robustness against attacks is one of the most fundamental research topics in network science, as it is closely associated with the reliability and functionality of various networking paradigms. However, despite studies of intrinsic topological vulnerability to node removal, little is known about network robustness when network defense mechanisms are implemented, especially for networked engineering systems equipped with detection capabilities. In this paper, a sequential defense mechanism is first proposed in complex networks for attack inference and vulnerability assessment, where the data fusion center sequentially infers the presence of an attack based on the binary attack status reported from the nodes in the network. The network robustness is evaluated in terms of the ability to identify the attack prior to network disruption under two major attack schemes, i.e., random and intentional attacks. We provide a parametric plug-in model for performance evaluation of the proposed mechanism and validate its effectiveness and reliability via canonical complex network models and a real-world large-scale network topology. The results show that the sequential defense mechanism greatly improves the network robustness and mitigates the possibility of network disruption by acquiring limited attack status information from a small subset of nodes in the network.
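Sequential inference from binary reports of this kind is commonly implemented as a sequential probability ratio test (SPRT); the sketch below illustrates that general idea, not the paper's specific plug-in model, and the per-node detection and false-alarm probabilities are assumed values.

```python
import numpy as np

def sprt(reports, p_attack=0.8, p_noise=0.1, alpha=0.01, beta=0.01):
    """Fusion center decision from sequential binary attack reports.
    p_attack / p_noise: probability a node reports 1 under attack / no attack."""
    upper, lower = np.log((1 - beta) / alpha), np.log(beta / (1 - alpha))
    llr = 0.0
    for t, x in enumerate(reports, start=1):
        llr += np.log((p_attack if x else 1 - p_attack) /
                      (p_noise if x else 1 - p_noise))
        if llr >= upper:
            return "attack", t
        if llr <= lower:
            return "no attack", t
    return "undecided", len(reports)

print(sprt([1, 1, 0, 1, 1]))  # -> ('attack', 4) under these settings
```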
Inferring interventional predictions from observational learning data.
Meder, Bjorn; Hagmayer, York; Waldmann, Michael R
2008-02-01
Previous research has shown that people are capable of deriving correct predictions for previously unseen actions from passive observations of causal systems (Waldmann & Hagmayer, 2005). However, these studies were limited in that the learning data were presented only in tabulated form, which may have turned the task into a reasoning task rather than a learning task. In two experiments, we therefore presented learners with trial-by-trial observational learning input referring to a complex causal model consisting of four events. To test the robustness of the capacity to derive correct observational and interventional inferences, we pitted causal order against the temporal order of learning events. The results show that people are, in principle, capable of deriving correct predictions after purely observational trial-by-trial learning, even with relatively complex causal models. However, conflicting temporal information can impair performance, particularly when the inferences require taking alternative causal pathways into account.
Software for Quantifying and Simulating Microsatellite Genotyping Error
Johnson, Paul C.D.; Haydon, Daniel T.
2007-01-01
Microsatellite genetic marker data are exploited in a variety of fields, including forensics, gene mapping, kinship inference and population genetics. In all of these fields, inference can be thwarted by failure to quantify and account for data errors, and kinship inference in particular can benefit from separating errors into two distinct classes: allelic dropout and false alleles. Pedant is MS Windows software for estimating locus-specific maximum likelihood rates of these two classes of error. Estimation is based on comparison of duplicate error-prone genotypes: neither reference genotypes nor pedigree data are required. Other functions include: plotting of error rate estimates and confidence intervals; simulations for performing power analysis and for testing the robustness of error rate estimates to violation of the underlying assumptions; and estimation of expected heterozygosity, which is a required input. The program, documentation and source code are available from http://www.stats.gla.ac.uk/~paulj/pedant.html. PMID:20066126
Fusion And Inference From Multiple And Massive Disparate Distributed Dynamic Data Sets
2017-07-01
Achievements include a principled methodology for two-sample graph testing; a provably almost-surely perfect vertex clustering algorithm for block model graphs; semi-supervised clustering methodology; and robust hypothesis testing. Embedding graphs in low-dimensional Euclidean space allows the full arsenal of statistical and machine learning methodology for multivariate Euclidean data to be deployed for graph inference.
Whitmore, Roy W; Chen, Wenlin
2013-12-04
The ability to infer human exposure to substances from drinking water using monitoring data helps determine and/or refine potential risks associated with drinking water consumption. We describe a survey sampling approach and its application to an atrazine groundwater monitoring study to adequately characterize upper exposure centiles and associated confidence intervals with predetermined precision. Study design and data analysis included sampling frame definition, sample stratification, sample size determination, allocation to strata, analysis weights, and weighted population estimates. The sampling frame encompassed 15,840 groundwater community water systems (CWS) in 21 states throughout the U.S. Median and 95th percentile atrazine concentrations were 0.0022 and 0.024 ppb, respectively, for all CWS. Statistical estimates agreed with historical monitoring results, suggesting that the study design was adequate and robust. This methodology makes no assumptions regarding the occurrence distribution (e.g., lognormality); thus, analyses based on the design-induced distribution provide the most robust basis for making inferences from the sample to the target population.
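The design-weighted percentile estimation at the heart of such an analysis can be sketched as follows; the concentrations and stratum weights are hypothetical, and the real study would also derive confidence intervals from the survey design.

```python
import numpy as np

def weighted_percentile(values, weights, q):
    """Design-weighted q-th percentile: sort values, accumulate normalized
    weights, and interpolate at the q/100 quantile point."""
    order = np.argsort(values)
    v, w = np.asarray(values, float)[order], np.asarray(weights, float)[order]
    cum = np.cumsum(w) / np.sum(w)
    return np.interp(q / 100.0, cum, v)

conc = [0.001, 0.002, 0.003, 0.010, 0.030]  # ppb, hypothetical CWS values
wts = [500, 400, 300, 120, 60]              # systems represented per stratum
print(weighted_percentile(conc, wts, 95))
```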
Schlägel, Ulrike E; Lewis, Mark A
2016-12-01
Discrete-time random walks and their extensions are common tools for analyzing animal movement data. In these analyses, the resolution of temporal discretization is a critical feature. Ideally, a model both mirrors the relevant temporal scale of the biological process of interest and matches the data sampling rate. Challenges arise when the resolution of the data is too coarse due to technological constraints, or when we wish to extrapolate results or compare results obtained from data with different resolutions. Drawing loosely on the concept of robustness in statistics, we propose a rigorous mathematical framework for studying movement models' robustness against changes in temporal resolution. In this framework, we define varying levels of robustness as formal model properties, focusing on random walk models with a spatially explicit component. With the new framework, we can investigate whether models can validly be applied to data across varying temporal resolutions and how we can account for these different resolutions in statistical inference results. We apply the new framework to movement-based resource selection models, demonstrating both analytical and numerical calculations, as well as a Monte Carlo simulation approach. While exact robustness is rare, the concept of approximate robustness provides a promising new direction for analyzing movement models.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Donker, H.C., E-mail: h.donker@science.ru.nl; Katsnelson, M.I.; De Raedt, H.
2016-09-15
The logical inference approach to quantum theory, proposed earlier by De Raedt et al. (2014), is considered in a relativistic setting. It is shown that the Klein–Gordon equation for a massive, charged, and spinless particle derives from the combination of two requirements: that the space–time data collected by probing the particle are obtained from the most robust experiment, and that, on average, the classical relativistic equation of motion of a particle holds. Highlights: • Logical inference applied to relativistic, massive, charged, and spinless particle experiments leads to the Klein–Gordon equation. • The relativistic Hamilton–Jacobi equation is scrutinized by employing a field description for the four-velocity. • Logical inference allows analysis of experiments with uncertainty in detection events and experimental conditions.
Network inference from multimodal data: A review of approaches from infectious disease transmission.
Ray, Bisakha; Ghedin, Elodie; Chunara, Rumi
2016-12-01
Network inference problems are commonly found in multiple biomedical subfields such as genomics, metagenomics, neuroscience, and epidemiology. Networks are useful for representing a wide range of complex interactions, ranging from those between molecular biomarkers, neurons, and microbial communities, to those found in human or animal populations. Recent technological advances have resulted in an increasing amount of healthcare data in multiple modalities, increasing the prevalence of network inference problems. Multi-domain data can now be used to improve the robustness and reliability of networks recovered from unimodal data. For infectious diseases in particular, there is a body of knowledge focused on combining multiple pieces of linked information. Combining or analyzing disparate modalities in concert has demonstrated greater insight into disease transmission than could be obtained from any single modality in isolation. This has been particularly helpful in understanding incidence and transmission at early stages of infections that have pandemic potential. Novel pieces of linked information in the form of spatial, temporal, and other covariates, including high-throughput sequence data, clinical visits, social network information, pharmaceutical prescriptions, and clinical symptoms (reported as free-text data), also encourage further investigation of these methods. The purpose of this review is to provide an in-depth analysis of multimodal infectious disease transmission network inference methods, with a specific focus on Bayesian inference. We focus on analytical Bayesian inference-based methods as this enables recovering multiple parameters simultaneously, for example, not just the disease transmission network, but also parameters of epidemic dynamics. Our review studies their assumptions, key inference parameters and limitations, and ultimately provides insights about improving future network inference methods in multiple applications. Copyright © 2016 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Liu, J.; Li, Z.; Mauzerall, D. L.; Fan, S.; Horowitz, L. W.; He, C.; Yi, K.; Tao, S.
2015-12-01
Knowledge of the spatiotemporal distribution of black carbon (BC) aerosol over the Northern Pacific is limited by a deficiency of observations. The HIAPER Pole-to-Pole Observation (HIPPO) program from 2009 to 2011 is the most comprehensive data source available, and it reveals that current global models overestimate BC by a factor of 2 to 10. Incorporation and assimilation of more data sources are needed to increase our understanding of the spatiotemporal distribution of black carbon aerosol and its corresponding climate effects. Based on measurements from aircraft campaigns and satellites, a robust association is observed between BC concentrations and satellite-retrieved CO, tropospheric NO2, and aerosol optical depth (AOD) (R2 > 0.7). Such robust relationships indicate that BC aerosols share similar emission sources, evolution processes, and transport characteristics with other pollutants measured by satellite observations. This also establishes a basis to derive a satellite-based proxy (BC*) over remote oceans. The inferred satellite-based BC* shows that Asian export events in spring bring much more BC aerosol to the mid-Pacific than occurs in other seasons. In addition, inter-annual variability of BC* is seen over the Northern Pacific, with abundances correlated to the springtime Pacific/North American (PNA) index. The inferred BC* dataset also indicates a widespread overestimation of BC loadings by models over most remote oceans beyond the Pacific. Our method presents a novel approach to infer BC concentrations by combining satellite and aircraft observations.
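A proxy of this kind can be sketched as a regression of aircraft BC measurements on collocated satellite fields, applied wherever only the satellite fields are available; the data below are synthetic stand-ins, and the real analysis would involve careful collocation and validation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Synthetic collocated predictors: satellite CO, tropospheric NO2, AOD
co = rng.uniform(60, 120, 200)
no2 = rng.uniform(0.2, 2.0, 200)
aod = rng.uniform(0.05, 0.40, 200)
bc = 0.5 * co + 20 * no2 + 80 * aod + rng.normal(0, 5, 200)  # synthetic "truth"

X = np.column_stack([co, no2, aod])
model = LinearRegression().fit(X, bc)   # train where aircraft BC exists
print(model.score(X, bc))               # R^2 of the proxy fit
bc_star = model.predict(X)              # satellite-based BC* elsewhere
```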
Vezér, Martin A
2016-04-01
To study climate change, scientists employ computer models, which approximate target systems with various levels of skill. Given the imperfection of climate models, how do scientists use simulations to generate knowledge about the causes of observed climate change? Addressing a similar question in the context of biological modelling, Levins (1966) proposed an account grounded in robustness analysis. Recent philosophical discussions dispute the confirmatory power of robustness, raising the question of how the results of computer modelling studies contribute to the body of evidence supporting hypotheses about climate change. Expanding on Staley's (2004) distinction between evidential strength and security, and Lloyd's (2015) argument connecting variety-of-evidence inferences and robustness analysis, I address this question with respect to recent challenges to the epistemology of robustness analysis. Applying this epistemology to case studies of climate change, I argue that, despite imperfections in climate models, and epistemic constraints on variety-of-evidence reasoning and robustness analysis, this framework accounts for the strength and security of evidence supporting climatological inferences, including the finding that global warming is occurring and that its primary causes are anthropogenic. Copyright © 2016 Elsevier Ltd. All rights reserved.
Monitoring of adult Lost River and shortnose suckers in Clear Lake Reservoir, California, 2008–2010
Hewitt, David A.; Hayes, Brian S.
2013-01-01
Problems with inferring status and population dynamics from size composition data can be overcome by a robust capture-recapture program that follows the histories of PIT-tagged individuals. Inferences from such a program are currently hindered by poor detection rates during spawning seasons with low flows in Willow Creek, which indicate that a key assumption of capture-recapture models is violated. We suggest that the most straightforward solution to this issue would be to collect detection data during the spawning season using remote PIT tag antennas in the strait between the west and east lobes of the lake.
EXPLORING DATA-DRIVEN SPECTRAL MODELS FOR APOGEE M DWARFS
NASA Astrophysics Data System (ADS)
Birky, Jessica; Hogg, David; Burgasser, Adam J.
2018-01-01
The Cannon (Ness et al. 2015; Casey et al. 2016) is a flexible, data-driven spectral modeling and parameter inference framework, demonstrated on high-resolution Apache Point Galactic Evolution Experiment (APOGEE; λ/Δλ~22,500, 1.5-1.7 µm) spectra of giant stars to estimate stellar labels (Teff, log g, [Fe/H], and chemical abundances) to precisions higher than the model-grid pipeline. The lack of reliable stellar parameters reported by the APOGEE pipeline for temperatures less than ~3550 K motivates the extension of this approach to M dwarf stars. Using a training set of 51 M dwarfs with spectral types ranging from M0 to M9, obtained from SDSS optical spectra, we demonstrate that the Cannon can infer spectral types to a precision of +/-0.6 types, making it an effective tool for classifying high-resolution near-infrared spectra. We discuss the potential for extending this work to determine the physical stellar labels Teff, log g, and [Fe/H]. This work is supported by the SDSS Faculty and Student (FAST) initiative.
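The core of a Cannon-style model is a per-pixel polynomial in the labels, fit across a labeled training set and then inverted for new spectra. The sketch below uses a single label (spectral type) and synthetic spectra purely for illustration; the actual framework fits richer label sets with per-pixel noise terms.

```python
import numpy as np

rng = np.random.default_rng(1)
n_stars, n_pix = 51, 300
labels = rng.uniform(0, 9, n_stars)        # spectral types M0-M9 (numeric)
design = np.column_stack([np.ones(n_stars), labels, labels**2])
true_coeffs = rng.normal(0, 1, (3, n_pix))
flux = design @ true_coeffs + rng.normal(0, 0.01, (n_stars, n_pix))

# Training step: fit quadratic-in-label coefficients at every pixel at once
coeffs, *_ = np.linalg.lstsq(design, flux, rcond=None)

# Test step: grid-search the label that best reproduces a spectrum
grid = np.linspace(0, 9, 901)
G = np.column_stack([np.ones_like(grid), grid, grid**2])
best = grid[np.argmin(((G @ coeffs - flux[0])**2).sum(axis=1))]
print(best, labels[0])   # recovered vs. true spectral type
```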
Data-Adaptive Bias-Reduced Doubly Robust Estimation.
Vermeulen, Karel; Vansteelandt, Stijn
2016-05-01
Doubly robust estimators have now been proposed for a variety of target parameters in the causal inference and missing data literature. These consistently estimate the parameter of interest under a semiparametric model when one of two nuisance working models is correctly specified, regardless of which. The recently proposed bias-reduced doubly robust estimation procedure aims to partially retain this robustness in more realistic settings where both working models are misspecified. These so-called bias-reduced doubly robust estimators make use of special (finite-dimensional) nuisance parameter estimators that are designed to locally minimize the squared asymptotic bias of the doubly robust estimator in certain directions of these finite-dimensional nuisance parameters under misspecification of both parametric working models. In this article, we extend this idea to incorporate the use of data-adaptive estimators (infinite-dimensional nuisance parameters), by exploiting the bias reduction estimation principle in the direction of only one nuisance parameter. We additionally provide an asymptotic linearity theorem which gives the influence function of the proposed doubly robust estimator under correct specification of a parametric nuisance working model for the missingness mechanism/propensity score but a possibly misspecified (finite- or infinite-dimensional) outcome working model. Simulation studies confirm the desirable finite-sample performance of the proposed estimators relative to a variety of other doubly robust estimators.
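For reference, the baseline AIPW estimator that this line of work builds on can be sketched as follows for a population mean with responses missing at random; both nuisance models here are simple parametric fits, whereas the paper's contribution concerns bias-reduced and data-adaptive nuisance estimation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(2)
n = 2000
x = rng.normal(size=(n, 1))
y = 2 + 3 * x[:, 0] + rng.normal(size=n)        # true mean of y is 2
p_obs = 1 / (1 + np.exp(-(0.5 + x[:, 0])))      # missingness depends on x
r = rng.uniform(size=n) < p_obs                 # r=1: response observed

ps = LogisticRegression().fit(x, r).predict_proba(x)[:, 1]  # propensity model
out = LinearRegression().fit(x[r], y[r]).predict(x)         # outcome model

# AIPW: IPW term plus augmentation; consistent if either model is correct
aipw = np.mean(r * y / ps - (r - ps) / ps * out)
print(aipw)
```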
Path-space variational inference for non-equilibrium coarse-grained systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harmandaris, Vagelis, E-mail: harman@uoc.gr; Institute of Applied and Computational Mathematics; Kalligiannaki, Evangelia, E-mail: ekalligian@tem.uoc.gr
In this paper we discuss information-theoretic tools for obtaining optimized coarse-grained molecular models for both equilibrium and non-equilibrium molecular simulations. The latter are ubiquitous in physicochemical and biological applications, where they are typically associated with coupling mechanisms, multi-physics and/or boundary conditions. In general the non-equilibrium steady states are not known explicitly as they do not necessarily have a Gibbs structure. The presented approach can compare microscopic behavior of molecular systems to parametric and non-parametric coarse-grained models using the relative entropy between distributions on the path space and setting up a corresponding path-space variational inference problem. The methods can become entirely data-driven when the microscopic dynamics are replaced with corresponding correlated data in the form of time series. Furthermore, we present connections and generalizations of force matching methods in coarse-graining with path-space information methods. We demonstrate the enhanced transferability of information-based parameterizations to different observables, at a specific thermodynamic point, due to information inequalities. We discuss methodological connections between information-based coarse-graining of molecular systems and variational inference methods primarily developed in the machine learning community. However, we note that the work presented here addresses variational inference for correlated time series due to the focus on dynamics. The applicability of the proposed methods is demonstrated on high-dimensional stochastic processes given by overdamped and driven Langevin dynamics of interacting particles.
Boolean network inference from time series data incorporating prior biological knowledge.
Haider, Saad; Pal, Ranadip
2012-01-01
Numerous approaches exist for modeling genetic regulatory networks (GRNs), but the low sampling rates often employed in biological studies prevent the inference of detailed models from experimental data. In this paper, we analyze the issues involved in estimating a model of a GRN from single cell line time series data with limited time points. We present an inference approach for a Boolean Network (BN) model of a GRN from limited transcriptomic or proteomic time series data based on prior biological knowledge of connectivity, constraints on attractor structure, and robust design. We applied our inference approach to 6-time-point transcriptomic data on a Human Mammary Epithelial Cell line (HMEC) after application of Epidermal Growth Factor (EGF), and generated a BN with a plausible biological structure satisfying the data. We further defined and applied a similarity measure to compare synthetic BNs with BNs generated through the proposed approach from transitions along various paths of the synthetic BNs. We have also compared the performance of our algorithm with two existing BN inference algorithms. Through theoretical analysis and simulations, we showed the rarity of arriving at a BN with plausible biological structure from limited time series data when using random connectivity and data lacking structure. The framework, when applied to experimental data and to data generated from synthetic BNs, was able to estimate BNs with high similarity scores. Comparison with existing BN inference algorithms showed the better performance of our proposed algorithm for limited time series data. The proposed framework can also be applied to optimize the connectivity of a GRN from experimental data when the prior biological knowledge on regulators is limited or not unique.
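The core consistency check in this kind of BN inference can be illustrated as follows: enumerate candidate Boolean rules for a gene, restricted by prior knowledge of its possible regulators, and keep those that reproduce every observed state transition. The states, regulator set, and rule space below are toy assumptions.

```python
import itertools

# Successive observed network states for genes (g0, g1, g2) -- toy data
states = [(0, 1, 1), (1, 1, 0), (1, 0, 0)]
regulators = (1, 2)   # prior knowledge: only g1 and g2 may regulate g0

consistent = []
for tt in itertools.product([0, 1], repeat=4):   # all 2-input truth tables
    rule = lambda a, b, tt=tt: tt[2 * a + b]
    if all(rule(s[regulators[0]], s[regulators[1]]) == nxt[0]
           for s, nxt in zip(states[:-1], states[1:])):
        consistent.append(tt)
print(consistent)   # candidate rules for g0 consistent with the data
```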
Data-Driven Neural Network Model for Robust Reconstruction of Automobile Casting
NASA Astrophysics Data System (ADS)
Lin, Jinhua; Wang, Yanjie; Li, Xin; Wang, Lu
2017-09-01
In computer vision systems, robustly reconstructing the complex 3D geometry of automobile castings is a challenging task: 3D scanning data are usually corrupted by noise and the scanning resolution is low, which normally leads to incomplete matching and drift. To solve these problems, a data-driven local geometric learning model is proposed to achieve robust reconstruction of automobile castings. To relieve the interference of sensor noise and to be compatible with incomplete scanning data, a 3D convolutional neural network is established to match the local geometric features of automobile castings. The proposed neural network combines a geometric feature representation with a correlation metric function to robustly match local correspondences. We use the truncated distance field (TDF) around each key point to represent the 3D surface of the casting geometry, so that the model can be directly embedded into 3D space to learn the geometric feature representation. Finally, training labels are automatically generated for deep learning based on an existing RGB-D reconstruction algorithm that accesses the same global key matching descriptor. The experimental results show that the matching accuracy of our network is 92.2% for automobile castings, and the closed-loop rate is about 74.0% when the matching tolerance threshold τ is 0.2. The matching descriptors performed well, retaining 81.6% matching accuracy at a 95% closed-loop rate. For sparse geometric castings with initial matching failure, the 3D object can be reconstructed robustly by training the key descriptors. Our method performs robust 3D reconstruction for complex automobile castings.
Robust estimation procedure in panel data model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shariff, Nurul Sima Mohamad; Hamzah, Nor Aishah
2014-06-19
Panel data modeling has received great attention in econometric research recently. This is due to the availability of data sources and the interest in studying cross sections of individuals observed over time. However, problems may arise in modeling panels in the presence of cross-sectional dependence and outliers. Even though a few methods take the presence of cross-sectional dependence in the panel into consideration, these methods may provide inconsistent parameter estimates and inferences when outliers occur in the panel. As such, an alternative method that is robust to outliers and cross-sectional dependence is introduced in this paper. The properties and the construction of confidence intervals for the parameter estimates are also considered. The robustness of the procedure is investigated and comparisons are made to the existing method via simulation studies. Our results show that the robust approach is able to produce accurate and reliable parameter estimates under the conditions considered.
Towards a global historical biogeography of Palms
NASA Astrophysics Data System (ADS)
Couvreur, Thomas; Baker, William J.; Frigerio, Jean-Marc; Sepulchre, Pierre; Franc, Alain
2017-04-01
Four mechanisms are at work in the historical biogeography of plants: speciation, extinction, migration, and drift (a sort of neutral speciation). The first three mechanisms are under selection pressure of the environment, mainly the climate and the connectivity of land masses. Hence, an accurate history of climate and of connectivity between landmasses, as well as of orogenesis, can shed new light on the most likely speciation events and migration routes driven by paleogeography and paleoclimatology. Currently, some models exist (like DIVA) to infer the most parsimonious history (in the number of migration events) given the speciation history provided by phylogenies (extinctions are mostly unknown), in a given setting of climate and landmass connectivity. In a previous project, we built, in collaboration with LSCE, a series of paleogeographic and paleoclimatic maps since the Early Cretaceous. We have developed a program, called Aran, which extends DIVA to a time series of varying paleoclimatic and paleogeographic conditions. We apply these new methods and data to unravel the biogeographic history of palms (Arecaceae), a pantropical family of 182 genera and >2600 species whose divergence is dated to the Late Cretaceous (100 My). Based on a robust dated molecular phylogeny and novel paleoclimatic and paleogeographic maps, we will generate an updated biogeographic history of Arecaceae inferred from the most parsimonious history using Aran. We will discuss the results and put them in context with what is known and needed to provide a global biogeographic history of tropical palms.
Inductive reasoning and the understanding of intention in schizophrenia.
Corcoran, Rhiannon
2003-08-01
The study explored the relationship between the understanding of intention in veiled speech acts and the ability to reason inductively. A total of 39 people with DSM-IV-defined schizophrenia with no behavioural signs and 44 healthy participants performed the Hinting Task, a measure of pragmatic language in which the speaker's intention must be inferred, and a measure of inductive reasoning (Aha! Sentences) in which the meaning of ambiguous nonsocial sentences had to be inferred. The participants also completed measures of general intellectual ability, immediate memory for narrative and social problem-solving ability. A substantial correlation was found between performance on the inductive reasoning task and the Hinting Task in the sample of people with schizophrenia. The same relationship was not seen in the normal control sample. The robust relationship between these two measures in this sample survived when the roles of immediate memory for narrative and intellectual ability were controlled for. Furthermore, the relationship was distinctly more compelling for the patients who were currently ill compared to those in remission. These data suggest that people with schizophrenia use a different strategy to infer the meaning behind pragmatic language than that used by normally functioning adults. It is suggested that a reliance on different, possibly less specialised, skills in this group to perform this simple social inference task underlies their deficient performance on this and other measures of social inference. The fact that the relationship between the tasks in patients in remission is not as robust implies that the use of specialised skills to perform social inference tasks may be compromised most significantly during acute phases.
A note on variance estimation in random effects meta-regression.
Sidik, Kurex; Jonkman, Jeffrey N
2005-01-01
For random effects meta-regression inference, variance estimation for the parameter estimates is discussed. Because estimated weights are used for meta-regression analysis in practice, the assumed or estimated covariance matrix used in meta-regression is not strictly correct, due to possible errors in estimating the weights. Therefore, this note investigates the use of a robust variance estimation approach for obtaining variances of the parameter estimates in random effects meta-regression inference. This method treats the assumed covariance matrix of the effect measure variables as a working covariance matrix. Using an example of meta-analysis data from clinical trials of a vaccine, the robust variance estimation approach is illustrated in comparison with two other methods of variance estimation. A simulation study is presented, comparing the three methods of variance estimation in terms of bias and coverage probability. We find that, despite the seeming suitability of the robust estimator for random effects meta-regression, the improved variance estimator of Knapp and Hartung (2003) yields the best performance among the three estimators, and thus may provide the best protection against errors in the estimated weights.
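A sketch of the robust (sandwich) variance estimator discussed in the note, for a weighted meta-regression with estimated inverse-variance weights; the effect estimates, moderator values, and variances below are illustrative.

```python
import numpy as np

y = np.array([0.12, 0.30, 0.25, 0.40, 0.55])       # study effect estimates
x = np.array([0.0, 1.0, 1.5, 2.0, 3.0])            # moderator
v = np.array([0.010, 0.020, 0.015, 0.030, 0.020])  # estimated variances

X = np.column_stack([np.ones_like(x), x])
W = np.diag(1.0 / v)                           # estimated (working) weights
bread = np.linalg.inv(X.T @ W @ X)
beta = bread @ X.T @ W @ y                     # weighted LS estimates
resid = y - X @ beta
meat = X.T @ W @ np.diag(resid**2) @ W @ X     # empirical "meat" matrix
robust_cov = bread @ meat @ bread              # sandwich variance estimator
print(beta, np.sqrt(np.diag(robust_cov)))
```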
Implicit and explicit social mentalizing: dual processes driven by a shared neural network
Van Overwalle, Frank; Vandekerckhove, Marie
2013-01-01
Recent social neuroscientific evidence indicates that implicit and explicit inferences on the mind of another person (i.e., intentions, attributions or traits), are subserved by a shared mentalizing network. Under both implicit and explicit instructions, ERP studies reveal that early inferences occur at about the same time, and fMRI studies demonstrate an overlap in core mentalizing areas, including the temporo-parietal junction (TPJ) and the medial prefrontal cortex (mPFC). These results suggest a rapid shared implicit intuition followed by a slower explicit verification processes (as revealed by additional brain activation during explicit vs. implicit inferences). These data provide support for a default-adjustment dual-process framework of social mentalizing. PMID:24062663
NASA Astrophysics Data System (ADS)
Shekhar, Karthik; Ruberman, Claire F.; Ferguson, Andrew L.; Barton, John P.; Kardar, Mehran; Chakraborty, Arup K.
2013-12-01
Mutational escape from vaccine-induced immune responses has thwarted the development of a successful vaccine against AIDS, whose causative agent is HIV, a highly mutable virus. Knowing the virus' fitness as a function of its proteomic sequence can enable rational design of potent vaccines, as this information can focus vaccine-induced immune responses to target mutational vulnerabilities of the virus. Spin models have been proposed as a means to infer intrinsic fitness landscapes of HIV proteins from patient-derived viral protein sequences. These sequences are the product of nonequilibrium viral evolution driven by patient-specific immune responses and are subject to phylogenetic constraints. How can such sequence data allow inference of intrinsic fitness landscapes? We combined computer simulations and variational theory à la Feynman to show that, in most circumstances, spin models inferred from patient-derived viral sequences reflect the correct rank order of the fitness of mutant viral strains. Our findings are relevant for diverse viruses.
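The spin-model idea can be sketched as scoring a sequence with inferred fields and couplings, where lower energy is read as higher intrinsic fitness; the fields and couplings below are random placeholders, not values inferred from patient data.

```python
import numpy as np

rng = np.random.default_rng(3)
L = 20                                         # number of sites
h = rng.normal(0, 1.0, L)                      # fields
J = rng.normal(0, 0.2, (L, L))
J = (J + J.T) / 2                              # symmetric couplings
np.fill_diagonal(J, 0)

def energy(s):
    """Ising-like energy of a binary sequence (0 = wild type, 1 = mutant)."""
    return h @ s + 0.5 * s @ J @ s

wild_type = np.zeros(L)
mutant = wild_type.copy()
mutant[[2, 7]] = 1                             # a double mutant
print(energy(wild_type), energy(mutant))       # rank-orders strain fitness
```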
Inferring the photometric and size evolution of galaxies from image simulations. I. Method
NASA Astrophysics Data System (ADS)
Carassou, Sébastien; de Lapparent, Valérie; Bertin, Emmanuel; Le Borgne, Damien
2017-09-01
Context. Current constraints on models of galaxy evolution rely on morphometric catalogs extracted from multi-band photometric surveys. However, these catalogs are altered by selection effects that are difficult to model, that correlate in non-trivial ways, and that can lead to contradictory predictions if not taken into account carefully. Aims: To address this issue, we have developed a new approach combining parametric Bayesian indirect likelihood (pBIL) techniques and empirical modeling with realistic image simulations that reproduce a large fraction of these selection effects. This allows us to perform a direct comparison between observed and simulated images and to infer robust constraints on model parameters. Methods: We use a semi-empirical forward model to generate a distribution of mock galaxies from a set of physical parameters. These galaxies are passed through an image simulator reproducing the instrumental characteristics of any survey and are then extracted in the same way as the observed data. The discrepancy between the simulated and observed data is quantified, and minimized with a custom sampling process based on adaptive Markov chain Monte Carlo methods. Results: Using synthetic data matching most of the properties of a Canada-France-Hawaii Telescope Legacy Survey Deep field, we demonstrate the robustness and internal consistency of our approach by inferring the parameters governing the size and luminosity functions and their evolution for different realistic populations of galaxies. We also compare the results of our approach with those obtained from the classical spectral energy distribution fitting and photometric redshift approach. Conclusions: Our pipeline efficiently infers the luminosity and size distribution and evolution parameters with a very limited number of observables (three photometric bands). When compared to SED fitting based on the same set of observables, our method yields results that are more accurate and free from systematic biases.
Liu, Li-Zhi; Wu, Fang-Xiang; Zhang, Wen-Jun
2014-01-01
As an abstract mapping of gene regulation in the cell, the gene regulatory network is important to both biological research and practical applications. The reverse engineering of gene regulatory networks from microarray gene expression data is a challenging research problem in systems biology. With the development of biological technologies, multiple time-course gene expression datasets might be collected for a specific gene network under different circumstances. The inference of a gene regulatory network can be improved by integrating these multiple datasets. It is also known that gene expression data may be contaminated with large errors or outliers, which may affect the inference results. A novel method, Huber group LASSO, is proposed to infer the same underlying network topology from multiple time-course gene expression datasets while taking robustness to large errors or outliers into account. To solve the optimization problem involved in the proposed method, an efficient algorithm combining the ideas of auxiliary function minimization and block descent is developed. A stability selection method is adapted to our method to find a network topology consisting of edges with scores. The proposed method is applied to both simulation datasets and real experimental datasets. It shows that Huber group LASSO outperforms the group LASSO in terms of both areas under receiver operating characteristic curves and areas under precision-recall curves. The convergence analysis of the algorithm theoretically shows that the sequence generated by the algorithm converges to the optimal solution of the problem. The simulation and real data examples demonstrate the effectiveness of the Huber group LASSO in integrating multiple time-course gene expression datasets and improving resistance to large errors or outliers.
Notes on power of normality tests of error terms in regression models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Střelec, Luboš
2015-03-10
Normality is one of the basic assumptions in applying statistical procedures. For example, in linear regression most of the inferential procedures are based on the assumption of normality, i.e. the disturbance vector is assumed to be normally distributed. Failure to assess non-normality of the error terms may lead to incorrect results of usual statistical inference techniques such as the t-test or F-test. Thus, error terms should be normally distributed in order to allow us to make exact inferences. As a consequence, normally distributed stochastic errors are necessary in order to draw inferences that are not misleading, which explains the necessity and importance of robust tests of normality. Therefore, the aim of this contribution is to discuss normality testing of error terms in regression models. In this contribution, we introduce the general RT class of robust tests for normality, and present and discuss the trade-off between power and robustness of selected classical and robust normality tests of error terms in regression models.
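The practice the contribution discusses can be illustrated by testing regression residuals before trusting t- or F-based inference; the RT class of tests is not available in scipy, so the standard Shapiro-Wilk and Jarque-Bera tests stand in here, and the data are synthetic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 200)
y = 1.5 + 2.0 * x + rng.standard_t(df=3, size=200)  # heavy-tailed errors

slope, intercept = np.polyfit(x, y, 1)              # fit the regression
resid = y - (intercept + slope * x)                 # residuals to be tested

print(stats.shapiro(resid))       # small p-value flags non-normal errors
print(stats.jarque_bera(resid))
```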
NASA Technical Reports Server (NTRS)
Hayashi, Isao; Nomura, Hiroyoshi; Wakami, Noboru
1991-01-01
Whereas conventional fuzzy reasoning is associated with tuning problems, namely the lack of systematic designs for membership functions and inference rules, a neural network driven fuzzy reasoning (NDF) capable of determining membership functions by neural network is formulated. In the antecedent parts of neural network driven fuzzy reasoning, the optimum membership function is determined by a neural network, while in the consequent parts, the amount of control for each rule is determined by other, plural neural networks. By introducing the algorithm of neural network driven fuzzy reasoning, inference rules for making a pendulum stand up from its lowest suspended point are determined, verifying the usefulness of the algorithm.
Statistical primer: how to deal with missing data in scientific research?
Papageorgiou, Grigorios; Grant, Stuart W; Takkenberg, Johanna J M; Mokhles, Mostafa M
2018-05-10
Missing data are a common challenge encountered in research which can compromise the results of statistical inference when not handled appropriately. This paper aims to introduce basic concepts of missing data to a non-statistical audience, list and compare some of the most popular approaches for handling missing data in practice, and provide guidelines and recommendations for dealing with and reporting missing data in scientific research. Complete case analysis and single imputation are simple approaches for handling missing data and are popular in practice; however, in most cases they are not guaranteed to provide valid inferences. Multiple imputation is a robust and general alternative which is appropriate for data missing at random, surpassing the disadvantages of the simpler approaches, but should always be conducted with care. The aforementioned approaches are illustrated and compared in an example application using Cox regression.
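A minimal sketch of the multiple-imputation workflow the primer recommends, using scikit-learn's IterativeImputer as the imputation engine on synthetic data: impute m times with different seeds, analyze each completed dataset, then pool. Rubin's rules also pool the variances; only the point estimates are pooled here for brevity.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 3))
X[rng.uniform(size=X.shape) < 0.2] = np.nan     # ~20% missing at random

estimates = []
for seed in range(5):                           # m = 5 imputations
    imputer = IterativeImputer(random_state=seed, sample_posterior=True)
    Xc = imputer.fit_transform(X)               # one completed dataset
    estimates.append(Xc.mean(axis=0))           # analysis step per dataset
print(np.mean(estimates, axis=0))               # pooled point estimates
```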
An expert system shell for inferring vegetation characteristics: The learning system (tasks C and D)
NASA Technical Reports Server (NTRS)
Harrison, P. Ann; Harrison, Patrick R.
1992-01-01
This report describes the implementation of a learning system that uses a data base of historical cover type reflectance data taken at different solar zenith angles and wavelengths to learn class descriptions of classes of cover types. It has been integrated with the VEG system and requires that the VEG system be loaded to operate. VEG is the NASA VEGetation workbench - an expert system for inferring vegetation characteristics from reflectance data. The learning system provides three basic options. Using option one, the system learns class descriptions of one or more classes. Using option two, the system learns class descriptions of one or more classes and then uses the learned classes to classify an unknown sample. Using option three, the user can test the system's classification performance. The learning system can also be run in an automatic mode. In this mode, options two and three are executed on each sample from an input file. The system was developed using KEE. It is menu driven and contains a sophisticated window and mouse driven interface which guides the user through various computations. Input and output file management and data formatting facilities are also provided.
Honert, Eric C; Zelik, Karl E
2016-01-01
Inverse dynamics joint kinetics are often used to infer contributions from underlying groups of muscle-tendon units (MTUs). However, such interpretations are confounded by multiarticular (multi-joint) musculature, which can cause inverse dynamics to over- or under-estimate net MTU power. Misestimation of MTU power could lead to incorrect scientific conclusions, or to empirical estimates that misguide musculoskeletal simulations, assistive device designs, or clinical interventions. The objective of this study was to investigate the degree to which ankle joint power overestimates net plantarflexor MTU power during the Push-off phase of walking, due to the behavior of the flexor digitorum and hallucis longus (FDHL)-multiarticular MTUs crossing the ankle and metatarsophalangeal (toe) joints. We performed a gait analysis study on six healthy participants, recording ground reaction forces, kinematics, and electromyography (EMG). Empirical data were input into an EMG-driven musculoskeletal model to estimate ankle power. This model enabled us to parse contributions from mono- and multi-articular MTUs, and required only one scaling and one time delay factor for each subject and speed, which were solved for based on empirical data. Net plantarflexing MTU power was computed by the model and quantitatively compared to inverse dynamics ankle power. The EMG-driven model was able to reproduce inverse dynamics ankle power across a range of gait speeds (R2 ≥ 0.97), while also providing MTU-specific power estimates. We found that FDHL dynamics caused ankle power to slightly overestimate net plantarflexor MTU power, but only by ~2-7%. During Push-off, FDHL MTU dynamics do not substantially confound the inference of net plantarflexor MTU power from inverse dynamics ankle power. However, other methodological limitations may cause inverse dynamics to overestimate net MTU power; for instance, due to rigid-body foot assumptions. Moving forward, the EMG-driven modeling approach presented could be applied to understand other tasks or larger multiarticular MTUs.
NASA Astrophysics Data System (ADS)
Bowman, Christopher; Haith, Gary; Steinberg, Alan; Morefield, Charles; Morefield, Michael
2013-05-01
This paper describes methods to affordably improve the robustness of distributed fusion systems by opportunistically leveraging non-traditional data sources. Adaptive methods help find relevant data, create models, and characterize model quality. These methods can also measure the conformity of this non-traditional data with fusion system products, including situation modeling and mission impact prediction. Non-traditional data can improve the quantity, quality, availability, timeliness, and diversity of the baseline fusion system sources and can therefore improve prediction and estimation accuracy and robustness at all levels of fusion. Techniques are described that automatically learn to characterize and search non-traditional contextual data, enabling operators to integrate the data with high-level fusion systems and ontologies. These techniques apply the extension of the Data Fusion & Resource Management Dual Node Network (DNN) technical architecture at Level 4. The DNN architecture supports effective assessment and management of the expanded portfolio of data sources, entities of interest, models, and algorithms, including data pattern discovery and context conformity. Affordable model-driven and data-driven data mining methods to discover unknown models from non-traditional and `big data' sources are used to automatically learn entity behaviors and correlations with fusion products [14, 15]. This paper describes our context assessment software development and the demonstration of context assessment of non-traditional data for comparison with an intelligence surveillance and reconnaissance fusion product based upon an IED POIs workflow.
Christensen, Hilary B.
2014-01-01
Low-magnification microwear techniques have been used effectively to infer diets within many unrelated mammalian orders, but the extent to which patterns are comparable among such different groups, including long extinct mammal lineages, is unknown. Microwear patterns between ecologically equivalent placental and marsupial mammals are found to be statistically indistinguishable, indicating that microwear can be used to infer diet across the mammals. Microwear data were compared to body size and molar shearing crest length in order to develop a system to distinguish the diet of mammals. Insectivores and carnivores were difficult to distinguish from herbivores using microwear alone, but combining microwear data with body size estimates and tooth morphology provides robust dietary inferences. This approach is a powerful tool for dietary assessment of fossils from extinct lineages and from museum specimens of living species where field study would be difficult owing to the animal’s behavior, habitat, or conservation status. PMID:25099537
Pointwise probability reinforcements for robust statistical inference.
Frénay, Benoît; Verleysen, Michel
2014-02-01
Statistical inference using machine learning techniques may be difficult with small datasets because of abnormally frequent data (AFDs). AFDs are observations that are much more frequent in the training sample than they should be, with respect to their theoretical probability, and include, e.g., outliers. Estimates of parameters tend to be biased towards models which support such data. This paper proposes to introduce pointwise probability reinforcements (PPRs): the probability of each observation is reinforced by a PPR, and a regularisation allows controlling the amount of reinforcement, which compensates for AFDs. The proposed solution is very generic, since it can be used to robustify any statistical inference method which can be formulated as a likelihood maximisation. Experiments show that PPRs can be easily used to tackle regression, classification and projection: models are freed from the influence of outliers. Moreover, outliers can be filtered manually since an abnormality degree is obtained for each observation. Copyright © 2013 Elsevier Ltd. All rights reserved.
Robust Inference of Cell-to-Cell Expression Variations from Single- and K-Cell Profiling
Narayanan, Manikandan; Martins, Andrew J.; Tsang, John S.
2016-01-01
Quantifying heterogeneity in gene expression among single cells can reveal information inaccessible to cell-population averaged measurements. However, the expression level of many genes in single cells fall below the detection limit of even the most sensitive technologies currently available. One proposed approach to overcome this challenge is to measure random pools of k cells (e.g., 10) to increase sensitivity, followed by computational “deconvolution” of cellular heterogeneity parameters (CHPs), such as the biological variance of single-cell expression levels. Existing approaches infer CHPs using either single-cell or k-cell data alone, and typically within a single population of cells. However, integrating both single- and k-cell data may reap additional benefits, and quantifying differences in CHPs across cell populations or conditions could reveal novel biological information. Here we present a Bayesian approach that can utilize single-cell, k-cell, or both simultaneously to infer CHPs within a single condition or their differences across two conditions. Using simulated as well as experimentally generated single- and k-cell data, we found situations where each data type would offer advantages, but using both together can improve precision and better reconcile CHP information contained in single- and k-cell data. We illustrate the utility of our approach by applying it to jointly generated single- and k-cell data to reveal CHP differences in several key inflammatory genes between resting and inflammatory cytokine-activated human macrophages, delineating differences in the distribution of ‘ON’ versus ‘OFF’ cells and in continuous variation of expression level among cells. Our approach thus offers a practical and robust framework to assess and compare cellular heterogeneity within and across biological conditions using modern multiplexed technologies. PMID:27438699
Vector autoregressive models: A Gini approach
NASA Astrophysics Data System (ADS)
Mussard, Stéphane; Ndiaye, Oumar Hamady
2018-02-01
In this paper, it is proven that the usual VAR models may be performed in the Gini sense, that is, on an ℓ1 metric space. The Gini regression is robust to outliers. As a consequence, when data are contaminated by extreme values, we show that semi-parametric VAR-Gini regressions may be used to obtain robust estimators. Inference about the estimators is made with the ℓ1 norm. Also, impulse response functions and Gini decompositions of forecast errors are introduced. Finally, Granger causality tests are properly derived based on U-statistics.
Minimal-assumption inference from population-genomic data
NASA Astrophysics Data System (ADS)
Weissman, Daniel; Hallatschek, Oskar
Samples of multiple complete genome sequences contain vast amounts of information about the evolutionary history of populations, much of it in the associations among polymorphisms at different loci. Current methods that take advantage of this linkage information rely on models of recombination and coalescence, limiting the sample sizes and populations that they can analyze. We introduce a method, Minimal-Assumption Genomic Inference of Coalescence (MAGIC), that reconstructs key features of the evolutionary history, including the distribution of coalescence times, by integrating information across genomic length scales without using an explicit model of recombination, demography or selection. Using simulated data, we show that MAGIC's performance is comparable to PSMC' on single diploid samples generated with standard coalescent and recombination models. More importantly, MAGIC can also analyze arbitrarily large samples and is robust to changes in the coalescent and recombination processes. Using MAGIC, we show that the inferred coalescence time histories of samples of multiple human genomes exhibit inconsistencies with a description in terms of an effective population size based on single-genome data.
Deploying digital health data to optimize influenza surveillance at national and local scales
Arab, Ali; Viboud, Cécile; Grenfell, Bryan T.; Bansal, Shweta
2018-01-01
The surveillance of influenza activity is critical to early detection of epidemics and pandemics and the design of disease control strategies. Case reporting through a voluntary network of sentinel physicians is a commonly used method of passive surveillance for monitoring rates of influenza-like illness (ILI) worldwide. Despite its ubiquity, little attention has been given to the processes underlying the observation, collection, and spatial aggregation of sentinel surveillance data, and its subsequent effects on epidemiological understanding. We harnessed the high specificity of diagnosis codes in medical claims from a database that represented 2.5 billion visits from upwards of 120,000 United States healthcare providers each year. Among influenza seasons from 2002-2009 and the 2009 pandemic, we simulated limitations of sentinel surveillance systems such as low coverage and coarse spatial resolution, and performed Bayesian inference to probe the robustness of ecological inference and spatial prediction of disease burden. Our models suggest that a number of socio-environmental factors, in addition to local population interactions, state-specific health policies, as well as sampling effort may be responsible for the spatial patterns in U.S. sentinel ILI surveillance. In addition, we find that biases related to spatial aggregation were accentuated among areas with more heterogeneous disease risk, and sentinel systems designed with fixed reporting locations across seasons provided robust inference and prediction. With the growing availability of health-associated big data worldwide, our results suggest mechanisms for optimizing digital data streams to complement traditional surveillance in developed settings and enhance surveillance opportunities in developing countries. PMID:29513661
Measuring Learning Progressions Using Bayesian Modeling in Complex Assessments
ERIC Educational Resources Information Center
Rutstein, Daisy Wise
2012-01-01
This research examines issues regarding model estimation and robustness in the use of Bayesian Inference Networks (BINs) for measuring Learning Progressions (LPs). It provides background information on LPs and how they might be used in practice. Two simulation studies are performed, along with real data examples. The first study examines the case…
An alternative empirical likelihood method in missing response problems and causal inference.
Ren, Kaili; Drummond, Christopher A; Brewster, Pamela S; Haller, Steven T; Tian, Jiang; Cooper, Christopher J; Zhang, Biao
2016-11-30
Missing responses are common problems in medical, social, and economic studies. When responses are missing at random, a complete case data analysis may result in biases. A popular debiasing method is the inverse probability weighting proposed by Horvitz and Thompson. To improve efficiency, Robins et al. proposed an augmented inverse probability weighting method. The augmented inverse probability weighting estimator has a double-robustness property and achieves the semiparametric efficiency lower bound when the regression model and propensity score model are both correctly specified. In this paper, we introduce an empirical likelihood-based estimator as an alternative to that of Qin and Zhang (2007). Our proposed estimator is also doubly robust and locally efficient. Simulation results show that the proposed estimator has better performance when the propensity score is correctly modeled. Moreover, the proposed method can be applied to the estimation of average treatment effects in observational causal inference. Finally, we apply our method to an observational study of smoking, using data from the Cardiovascular Outcomes in Renal Atherosclerotic Lesions clinical trial. Copyright © 2016 John Wiley & Sons, Ltd.
Data-driven forecasting of high-dimensional chaotic systems with long short-term memory networks.
Vlachas, Pantelis R; Byeon, Wonmin; Wan, Zhong Y; Sapsis, Themistoklis P; Koumoutsakos, Petros
2018-05-01
We introduce a data-driven forecasting method for high-dimensional chaotic systems using long short-term memory (LSTM) recurrent neural networks. The proposed LSTM neural networks perform inference of high-dimensional dynamical systems in their reduced order space and are shown to be an effective set of nonlinear approximators of their attractor. We demonstrate the forecasting performance of the LSTM and compare it with Gaussian processes (GPs) in time series obtained from the Lorenz 96 system, the Kuramoto-Sivashinsky equation and a prototype climate model. The LSTM networks outperform the GPs in short-term forecasting accuracy in all applications considered. A hybrid architecture, extending the LSTM with a mean stochastic model (MSM-LSTM), is proposed to ensure convergence to the invariant measure. This novel hybrid method is fully data-driven and extends the forecasting capabilities of LSTM networks.
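A compact sketch of the data-driven forecasting setup: train an LSTM to map a window of past states to the next state. For brevity, the chaotic system here is a toy logistic map rather than Lorenz 96 or Kuramoto-Sivashinsky, the model operates directly on the scalar state rather than a reduced-order space, and all hyperparameters are illustrative.

```python
import numpy as np
import torch
import torch.nn as nn

# Generate a chaotic time series from the logistic map (toy stand-in)
x = [0.4]
for _ in range(2000):
    x.append(3.9 * x[-1] * (1 - x[-1]))
x = np.array(x, dtype=np.float32)

win = 10                                            # history window length
X = np.stack([x[i:i + win] for i in range(len(x) - win)])[..., None]
y = x[win:][:, None]                                # next-state targets

class Forecaster(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)
    def forward(self, seq):
        out, _ = self.lstm(seq)
        return self.head(out[:, -1])                # predict from last state

net = Forecaster()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
Xt, yt = torch.from_numpy(X), torch.from_numpy(y)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(Xt), yt)
    loss.backward()
    opt.step()
print(float(loss))                                  # final training MSE
```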
Reasoning with Vectors: A Continuous Model for Fast Robust Inference.
Widdows, Dominic; Cohen, Trevor
2015-10-01
This paper describes the use of continuous vector space models for reasoning with a formal knowledge base. The practical significance of these models is that they support fast, approximate but robust inference and hypothesis generation, which is complementary to the slow, exact, but sometimes brittle behavior of more traditional deduction engines such as theorem provers. The paper explains the way logical connectives can be used in semantic vector models, and summarizes the development of Predication-based Semantic Indexing, which involves the use of Vector Symbolic Architectures to represent the concepts and relationships from a knowledge base of subject-predicate-object triples. Experiments show that the use of continuous models for formal reasoning is not only possible, but already demonstrably effective for some recognized informatics tasks, and showing promise in other traditional problem areas. Examples described in this paper include: predicting new uses for existing drugs in biomedical informatics; removing unwanted meanings from search results in information retrieval and concept navigation; type-inference from attributes; comparing words based on their orthography; and representing tabular data, including modelling numerical values. The algorithms and techniques described in this paper are all publicly released and freely available in the Semantic Vectors open-source software package. PMID:26582967
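A minimal sketch of the kind of operation Vector Symbolic Architectures provide, using circular-convolution binding as in holographic reduced representations. The exact encoding in the Semantic Vectors package may differ, and the drug/disease vectors below are random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 1024  # dimensionality of the semantic space

def randvec():
    return rng.normal(size=d) / np.sqrt(d)

def bind(a, b):
    # Circular convolution: binds two vectors into one vector of the same dimension
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def unbind(c, a):
    # Circular correlation: approximate inverse of bind
    return np.real(np.fft.ifft(np.fft.fft(c) * np.conj(np.fft.fft(a))))

treats, disease, other_fact = randvec(), randvec(), randvec()
# A subject's semantic vector accumulates its bound predications (plus other stored facts)
s_drug = bind(treats, disease) + other_fact

# Query "drug TREATS ?" by unbinding the predicate vector
guess = unbind(s_drug, treats)
cos = lambda u, v: np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(cos(guess, disease), cos(guess, randvec()))  # clearly higher for `disease` than for noise
```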
A Robust Mass Estimator for Dark Matter Subhalo Perturbations in Strong Gravitational Lenses
DOE Office of Scientific and Technical Information (OSTI.GOV)
Minor, Quinn E.; Kaplinghat, Manoj; Li, Nan
A few dark matter substructures have recently been detected in strong gravitational lenses through their perturbations of highly magnified images. We derive a characteristic scale for lensing perturbations and show that they are significantly larger than the perturber’s Einstein radius. We show that the perturber’s projected mass enclosed within this radius, scaled by the log-slope of the host galaxy’s density profile, can be robustly inferred even if the inferred density profile and tidal radius of the perturber are biased. We demonstrate the validity of our analytic derivation using several gravitational lens simulations where the tidal radii and the inner log-slopes of the density profile of the perturbing subhalo are allowed to vary. By modeling these simulated data, we find that our mass estimator, which we call the effective subhalo lensing mass, is accurate to within about 10% or smaller in each case, whereas the inferred total subhalo mass can potentially be biased by nearly an order of magnitude. We therefore recommend that the effective subhalo lensing mass be reported in future lensing reconstructions, as this will allow for a more accurate comparison with the results of dark matter simulations.
Robust PLS approach for KPI-related prediction and diagnosis against outliers and missing data
NASA Astrophysics Data System (ADS)
Yin, Shen; Wang, Guang; Yang, Xu
2014-07-01
In practical industrial applications, the key performance indicator (KPI)-related prediction and diagnosis are quite important for the product quality and economic benefits. To meet these requirements, many advanced prediction and monitoring approaches have been developed which can be classified into model-based or data-driven techniques. Among these approaches, partial least squares (PLS) is one of the most popular data-driven methods due to its simplicity and easy implementation in large-scale industrial processes. As PLS is totally based on the measured process data, the characteristics of the process data are critical for the success of PLS. Outliers and missing values are two common characteristics of the measured data which can severely affect the effectiveness of PLS. To ensure the applicability of PLS in practical industrial applications, this paper introduces a robust version of PLS to deal with outliers and missing values simultaneously. The effectiveness of the proposed method is finally demonstrated by the application results of the KPI-related prediction and diagnosis on an industrial benchmark of the Tennessee Eastman process.
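For context, a standard (non-robust) PLS fit for KPI prediction looks like the sketch below; the paper's contribution is to make this step tolerate outliers and missing entries, which plain PLSRegression does not. All data here are simulated:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))                     # process measurements
beta = np.zeros(p); beta[:3] = [1.5, -2.0, 0.8]
y = X @ beta + 0.1 * rng.normal(size=n)         # KPI driven by a few latent directions

pls = PLSRegression(n_components=3).fit(X, y)
y_hat = pls.predict(X).ravel()
print(np.corrcoef(y, y_hat)[0, 1])

# A simple KPI-monitoring statistic: squared prediction error per sample
spe = (y - y_hat) ** 2
print(spe.mean())
```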
Image segmentation with a novel regularized composite shape prior based on surrogate study
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhao, Tingting, E-mail: tingtingzhao@mednet.ucla.edu; Ruan, Dan, E-mail: druan@mednet.ucla.edu
Purpose: Incorporating training into image segmentation is a good approach to achieve additional robustness. This work aims to develop an effective strategy to utilize shape prior knowledge, so that the segmentation label evolution can be driven toward the desired global optimum. Methods: In the variational image segmentation framework, a regularization for the composite shape prior is designed to incorporate the geometric relevance of individual training data to the target, which is inferred by an image-based surrogate relevance metric. Specifically, this regularization is imposed on the linear weights of composite shapes and serves as a hyperprior. The overall problem is formulated in a unified optimization setting and a variational block-descent algorithm is derived. Results: The performance of the proposed scheme is assessed in both corpus callosum segmentation from an MR image set and clavicle segmentation based on CT images. The resulted shape composition provides a proper preference for the geometrically relevant training data. A paired Wilcoxon signed rank test demonstrates statistically significant improvement of image segmentation accuracy, when compared to multiatlas label fusion method and three other benchmark active contour schemes. Conclusions: This work has developed a novel composite shape prior regularization, which achieves superior segmentation performance than typical benchmark schemes.
A Bayesian Approach to the Paleomagnetic Conglomerate Test
NASA Astrophysics Data System (ADS)
Heslop, David; Roberts, Andrew P.
2018-02-01
The conglomerate test has served the paleomagnetic community for over 60 years as a means to detect remagnetizations. The test states that if a suite of clasts within a bed have uniformly random paleomagnetic directions, then the conglomerate cannot have experienced a pervasive event that remagnetized the clasts in the same direction. The current form of the conglomerate test is based on null hypothesis testing, which results in a binary "pass" (uniformly random directions) or "fail" (nonrandom directions) outcome. We have recast the conglomerate test in a Bayesian framework with the aim of providing more information concerning the level of support a given data set provides for a hypothesis of uniformly random paleomagnetic directions. Using this approach, we place the conglomerate test in a fully probabilistic framework that allows for inconclusive results when insufficient information is available to draw firm conclusions concerning the randomness or nonrandomness of directions. With our method, sample sets larger than those typically employed in paleomagnetism may be required to achieve strong support for a hypothesis of random directions. Given the potentially detrimental effect of unrecognized remagnetizations on paleomagnetic reconstructions, it is important to provide a means to draw statistically robust data-driven inferences. Our Bayesian analysis provides a means to do this for the conglomerate test.
Using Historical Data to Automatically Identify Air-Traffic Control Behavior
NASA Technical Reports Server (NTRS)
Lauderdale, Todd A.; Wu, Yuefeng; Tretto, Celeste
2014-01-01
This project seeks to develop statistics-based machine learning models to characterize the types of errors present when using current systems to predict future aircraft states. These models will be data-driven, based on large quantities of historical data. Once these models are developed, they will be used to infer situations in the historical data where an air-traffic controller intervened on an aircraft's route, even when there is no direct recording of this action.
Phylotranscriptomic analysis of the origin and early diversification of land plants
Wickett, Norman J.; Mirarab, Siavash; Nguyen, Nam; Warnow, Tandy; Carpenter, Eric; Matasci, Naim; Ayyampalayam, Saravanaraj; Barker, Michael S.; Burleigh, J. Gordon; Gitzendanner, Matthew A.; Ruhfel, Brad R.; Wafula, Eric; Graham, Sean W.; Mathews, Sarah; Melkonian, Michael; Soltis, Douglas E.; Soltis, Pamela S.; Miles, Nicholas W.; Rothfels, Carl J.; Pokorny, Lisa; Shaw, A. Jonathan; DeGironimo, Lisa; Stevenson, Dennis W.; Surek, Barbara; Villarreal, Juan Carlos; Roure, Béatrice; Philippe, Hervé; dePamphilis, Claude W.; Chen, Tao; Deyholos, Michael K.; Baucom, Regina S.; Kutchan, Toni M.; Augustin, Megan M.; Wang, Jun; Zhang, Yong; Tian, Zhijian; Yan, Zhixiang; Wu, Xiaolei; Sun, Xiao; Wong, Gane Ka-Shu; Leebens-Mack, James
2014-01-01
Reconstructing the origin and evolution of land plants and their algal relatives is a fundamental problem in plant phylogenetics, and is essential for understanding how critical adaptations arose, including the embryo, vascular tissue, seeds, and flowers. Despite advances in molecular systematics, some hypotheses of relationships remain weakly resolved. Inferring deep phylogenies with bouts of rapid diversification can be problematic; however, genome-scale data should significantly increase the number of informative characters for analyses. Recent phylogenomic reconstructions focused on the major divergences of plants have resulted in promising but inconsistent results. One limitation is sparse taxon sampling, likely resulting from the difficulty and cost of data generation. To address this limitation, transcriptome data for 92 streptophyte taxa were generated and analyzed along with 11 published plant genome sequences. Phylogenetic reconstructions were conducted using up to 852 nuclear genes and 1,701,170 aligned sites. Sixty-nine analyses were performed to test the robustness of phylogenetic inferences to permutations of the data matrix or to phylogenetic method, including supermatrix, supertree, and coalescent-based approaches, maximum-likelihood and Bayesian methods, partitioned and unpartitioned analyses, and amino acid versus DNA alignments. Among other results, we find robust support for a sister-group relationship between land plants and one group of streptophyte green algae, the Zygnematophyceae. Strong and robust support for a clade comprising liverworts and mosses is inconsistent with a widely accepted view of early land plant evolution, and suggests that phylogenetic hypotheses used to understand the evolution of fundamental plant traits should be reevaluated. PMID:25355905
Rasmussen, Peter M.; Smith, Amy F.; Sakadžić, Sava; Boas, David A.; Pries, Axel R.; Secomb, Timothy W.; Østergaard, Leif
2017-01-01
Objective In vivo imaging of the microcirculation and network-oriented modeling have emerged as powerful means of studying microvascular function and understanding its physiological significance. Network-oriented modeling may provide the means of summarizing vast amounts of data produced by high-throughput imaging techniques in terms of key, physiological indices. To estimate such indices with sufficient certainty, however, network-oriented analysis must be robust to the inevitable presence of uncertainty due to measurement errors as well as model errors. Methods We propose the Bayesian probabilistic data analysis framework as a means of integrating experimental measurements and network model simulations into a combined and statistically coherent analysis. The framework naturally handles noisy measurements and provides posterior distributions of model parameters as well as physiological indices associated with uncertainty. Results We applied the analysis framework to experimental data from three rat mesentery networks and one mouse brain cortex network. We inferred distributions for more than five hundred unknown pressure and hematocrit boundary conditions. Model predictions were consistent with previous analyses, and remained robust when measurements were omitted from model calibration. Conclusion Our Bayesian probabilistic approach may be suitable for optimizing data acquisition and for analyzing and reporting large datasets acquired as part of microvascular imaging studies. PMID:27987383
The influence of social information on children's statistical and causal inferences.
Sobel, David M; Kirkham, Natasha Z
2012-01-01
Constructivist accounts of learning posit that causal inference is a child-driven process. Recent interpretations of such accounts also suggest that the process children use for causal learning is rational: Children interpret and learn from new evidence in light of their existing beliefs. We argue that such mechanisms are also driven by informative social cues and suggest ways in which such information influences both preschoolers' and infants' inferences. In doing so, we argue that a rational constructivist account should not only focus on describing the child's internal cognitive mechanisms for learning but also on how social information affects the process of learning.
Computational statistics using the Bayesian Inference Engine
NASA Astrophysics Data System (ADS)
Weinberg, Martin D.
2013-09-01
This paper introduces the Bayesian Inference Engine (BIE), a general parallel, optimized software package for parameter inference and model selection. This package is motivated by the analysis needs of modern astronomical surveys and the need to organize and reuse expensive derived data. The BIE is the first platform for computational statistics designed explicitly to enable Bayesian update and model comparison for astronomical problems. Bayesian update is based on the representation of high-dimensional posterior distributions using metric-ball-tree based kernel density estimation. Among its algorithmic offerings, the BIE emphasizes hybrid tempered Markov chain Monte Carlo schemes that robustly sample multimodal posterior distributions in high-dimensional parameter spaces. Moreover, the BIE implements a full persistence or serialization system that stores the full byte-level image of the running inference and previously characterized posterior distributions for later use. Two new algorithms to compute the marginal likelihood from the posterior distribution, developed for and implemented in the BIE, enable model comparison for complex models and data sets. Finally, the BIE was designed to be a collaborative platform for applying Bayesian methodology to astronomy. It includes an extensible, object-oriented framework that implements every aspect of the Bayesian inference. By providing a variety of statistical algorithms for all phases of the inference problem, a scientist may explore a variety of approaches with a single model and data implementation. Additional technical and download details are available from http://www.astro.umass.edu/bie. The BIE is distributed under the GNU General Public License.
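The hybrid tempered MCMC idea can be sketched in a few lines: run chains at several temperatures and occasionally propose exchanging their states, so the flattened (hot) chain carries the target (cold) chain across modes. The sketch below is a generic parallel-tempering toy on a hypothetical bimodal target, not the BIE's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def logpost(x):
    # Toy bimodal posterior: the kind of multimodal target tempering helps sample
    return np.logaddexp(-0.5 * ((x + 3.0) / 0.5) ** 2, -0.5 * ((x - 3.0) / 0.5) ** 2)

betas = [1.0, 0.2]              # chain 0 targets the posterior; chain 1 is tempered (flattened)
state = [0.0, 0.0]
samples = []
for it in range(20000):
    for k, beta in enumerate(betas):
        prop = state[k] + rng.normal()
        if np.log(rng.uniform()) < beta * (logpost(prop) - logpost(state[k])):
            state[k] = prop
    if it % 10 == 0:            # propose exchanging states between temperatures
        log_acc = (betas[0] - betas[1]) * (logpost(state[1]) - logpost(state[0]))
        if np.log(rng.uniform()) < log_acc:
            state[0], state[1] = state[1], state[0]
    samples.append(state[0])

print(np.mean(np.array(samples[5000:]) > 0))  # both modes visited, so roughly 0.5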
Johnson, Douglas H.; Cook, R.D.
2013-01-01
In her AAAS News & Notes piece "Can the Southwest manage its thirst?" (26 July, p. 362), K. Wren quotes Ajay Kalra, who advocates a particular method for predicting Colorado River streamflow "because it eschews complex physical climate models for a statistical data-driven modeling approach." A preference for data-driven models may be appropriate in this individual situation, but it is not so generally. Data-driven models often come with a warning against extrapolating beyond the range of the data used to develop the models. When the future is like the past, data-driven models can work well for prediction, but it is easy to over-model local or transient phenomena, often leading to predictive inaccuracy (1). Mechanistic models are built on established knowledge of the process that connects the response variables with the predictors, using information obtained outside of an extant data set. One may shy away from a mechanistic approach when the underlying process is judged to be too complicated, but good predictive models can be constructed with statistical components that account for ingredients missing in the mechanistic analysis. Models with sound mechanistic components are more generally applicable and robust than data-driven models.
Swain, Timothy D
2018-01-01
The recent rapid proliferation of novel taxon identification in the Zoanthidea has been accompanied by a parallel propagation of gene trees as a tool of species discovery, but not a corresponding increase in our understanding of phylogeny. This disparity is caused by the trade-off between the capabilities of automated DNA sequence alignment and data content of genes applied to phylogenetic inference in this group. Conserved genes or segments are easily aligned across the order, but produce poorly resolved trees; hypervariable genes or segments contain the evolutionary signal necessary for resolution and robust support, but sequence alignment is daunting. Staggered alignments are a form of phylogeny-informed sequence alignment composed of a mosaic of local and universal regions that allow phylogenetic inference to be applied to all nucleotides from both hypervariable and conserved gene segments. Comparisons between species tree phylogenies inferred from all data (staggered alignment) and hypervariable-excluded data (standard alignment) demonstrate improved confidence and greater topological agreement with other sources of data for the complete-data tree. This novel phylogeny is the most comprehensive to date (in terms of taxa and data) and can serve as an expandable tool for evolutionary hypothesis testing in the Zoanthidea. Spanish language abstract available in Text S1. Translation by L. O. Swain, DePaul University, Chicago, Illinois, 60604, USA. Copyright © 2017 Elsevier Inc. All rights reserved.
BALANCE: Towards a Usable Pervasive Wellness Application with Accurate Activity Inference
Denning, Tamara; Andrew, Adrienne; Chaudhri, Rohit; Hartung, Carl; Lester, Jonathan; Borriello, Gaetano; Duncan, Glen
2010-01-01
Technology offers the potential to objectively monitor people’s eating and activity behaviors and encourage healthier lifestyles. BALANCE is a mobile phone-based system for long term wellness management. The BALANCE system automatically detects the user’s caloric expenditure via sensor data from a Mobile Sensing Platform unit worn on the hip. Users manually enter information on foods eaten via an interface on an N95 mobile phone. Initial validation experiments measuring oxygen consumption during treadmill walking and jogging show that the system’s estimate of caloric output is within 87% of the actual value. Future work will refine and continue to evaluate the system’s efficacy and develop more robust data input and activity inference methods. PMID:20445819
Voltage-Driven Conformational Switching with Distinct Raman Signature in a Single-Molecule Junction.
Bi, Hai; Palma, Carlos-Andres; Gong, Yuxiang; Hasch, Peter; Elbing, Mark; Mayor, Marcel; Reichert, Joachim; Barth, Johannes V
2018-04-11
Precisely controlling well-defined, stable single-molecule junctions represents a pillar of single-molecule electronics. Early attempts to establish computing with molecular switching arrays were partly challenged by limitations in the direct chemical characterization of metal-molecule-metal junctions. While cryogenic scanning probe studies have advanced the mechanistic understanding of current- and voltage-induced conformational switching, metal-molecule-metal conformations are still largely inferred from indirect evidence. Hence, the development of robust, chemically sensitive techniques is instrumental for advancement in the field. Here we probe the conformation of a two-state molecular switch with vibrational spectroscopy, while simultaneously operating it by means of the applied voltage. Our study emphasizes measurements of single-molecule Raman spectra in a room-temperature stable single-molecule switch presenting a signal modulation of nearly 2 orders of magnitude.
Computational Natural Language Inference: Robust and Interpretable Question Answering
ERIC Educational Resources Information Center
Sharp, Rebecca Reynolds
2017-01-01
We address the challenging task of "computational natural language inference," by which we mean bridging two or more natural language texts while also providing an explanation of how they are connected. In the context of question answering (i.e., finding short answers to natural language questions), this inference connects the question…
Goldberg, Tony L; Gillespie, Thomas R; Singer, Randall S
2006-09-01
Repetitive-element PCR (rep-PCR) is a method for genotyping bacteria based on the selective amplification of repetitive genetic elements dispersed throughout bacterial chromosomes. The method has great potential for large-scale epidemiological studies because of its speed and simplicity; however, objective guidelines for inferring relationships among bacterial isolates from rep-PCR data are lacking. We used multilocus sequence typing (MLST) as a "gold standard" to optimize the analytical parameters for inferring relationships among Escherichia coli isolates from rep-PCR data. We chose 12 isolates from a large database to represent a wide range of pairwise genetic distances, based on the initial evaluation of their rep-PCR fingerprints. We conducted MLST with these same isolates and systematically varied the analytical parameters to maximize the correspondence between the relationships inferred from rep-PCR and those inferred from MLST. Methods that compared the shapes of densitometric profiles ("curve-based" methods) yielded consistently higher correspondence values between data types than did methods that calculated indices of similarity based on shared and different bands (maximum correspondences of 84.5% and 80.3%, respectively). Curve-based methods were also markedly more robust in accommodating variations in user-specified analytical parameter values than were "band-sharing coefficient" methods, and they enhanced the reproducibility of rep-PCR. Phylogenetic analyses of rep-PCR data yielded trees with high topological correspondence to trees based on MLST and high statistical support for major clades. These results indicate that rep-PCR yields accurate information for inferring relationships among E. coli isolates and that accuracy can be enhanced with the use of analytical methods that consider the shapes of densitometric profiles.
Inferring Recent Demography from Isolation by Distance of Long Shared Sequence Blocks
Ringbauer, Harald; Coop, Graham
2017-01-01
Recently it has become feasible to detect long blocks of nearly identical sequence shared between pairs of genomes. These identity-by-descent (IBD) blocks are direct traces of recent coalescence events and, as such, contain ample signal to infer recent demography. Here, we examine sharing of such blocks in two-dimensional populations with local migration. Using a diffusion approximation to trace genetic ancestry, we derive analytical formulas for patterns of isolation by distance of IBD blocks, which can also incorporate recent population density changes. We introduce an inference scheme that uses a composite-likelihood approach to fit these formulas. We then extensively evaluate our theory and inference method on a range of scenarios using simulated data. We first validate the diffusion approximation by showing that the theoretical results closely match the simulated block-sharing patterns. We then demonstrate that our inference scheme can accurately and robustly infer dispersal rate and effective density, as well as bounds on recent dynamics of population density. To demonstrate an application, we use our estimation scheme to explore the fit of a diffusion model to Eastern European samples in the Population Reference Sample data set. We show that ancestry diffusing with a rate of σ ≈ 50–100 km/gen during the last centuries, combined with accelerating population growth, can explain the observed exponential decay of block sharing with increasing pairwise sample distance. PMID:28108588
2014-01-01
Background Network inference of gene expression data is an important challenge in systems biology. Novel algorithms may provide more detailed gene regulatory networks (GRN) for complex, chronic inflammatory diseases such as rheumatoid arthritis (RA), in which activated synovial fibroblasts (SFBs) play a major role. Since the detailed mechanisms underlying this activation are still unclear, simultaneous investigation of multi-stimuli activation of SFBs offers the possibility to elucidate the regulatory effects of multiple mediators and to gain new insights into disease pathogenesis. Methods A GRN was therefore inferred from RA-SFBs treated with 4 different stimuli (IL-1β, TNF-α, TGF-β, and PDGF-D). Data from time series microarray experiments (0, 1, 2, 4, 12 h; Affymetrix HG-U133 Plus 2.0) were batch-corrected applying ‘ComBat’, analyzed for differentially expressed genes over time with ‘Limma’, and used for the inference of a robust GRN with NetGenerator V2.0, a heuristic ordinary differential equation-based method with soft integration of prior knowledge. Results Using all genes differentially expressed over time in RA-SFBs for any stimulus, and selecting the genes belonging to the most significant gene ontology (GO) term, i.e., ‘cartilage development’, a dynamic, robust, moderately complex multi-stimuli GRN was generated with 24 genes and 57 edges in total, 31 of which were gene-to-gene edges. Prior literature-based knowledge derived from Pathway Studio or manual searches was reflected in the final network by 25/57 confirmed edges (44%). The model contained known network motifs crucial for dynamic cellular behavior, e.g., cross-talk among pathways, positive feed-back loops, and positive feed-forward motifs (including suppression of the transcriptional repressor OSR2 by all 4 stimuli). Conclusion A multi-stimuli GRN highly concordant with literature data was successfully generated by network inference from the gene expression of stimulated RA-SFBs. The GRN showed high reliability, since 10 predicted edges were independently validated by literature findings post network inference. The selected GO term ‘cartilage development’ contained a number of differentiation markers, growth factors, and transcription factors with potential relevance for RA. Finally, the model provided new insight into the response of RA-SFBs to multiple stimuli implicated in the pathogenesis of RA, in particular to the ‘novel’ potent growth factor PDGF-D. PMID:24989895
A Simple Model-Based Approach to Inferring and Visualizing Cancer Mutation Signatures
Shiraishi, Yuichi; Tremmel, Georg; Miyano, Satoru; Stephens, Matthew
2015-01-01
Recent advances in sequencing technologies have enabled the production of massive amounts of data on somatic mutations from cancer genomes. These data have led to the detection of characteristic patterns of somatic mutations or “mutation signatures” at an unprecedented resolution, with the potential for new insights into the causes and mechanisms of tumorigenesis. Here we present new methods for modelling, identifying and visualizing such mutation signatures. Our methods greatly simplify mutation signature models compared with existing approaches, reducing the number of parameters by orders of magnitude even while increasing the contextual factors (e.g. the number of flanking bases) that are accounted for. This improves both sensitivity and robustness of inferred signatures. We also provide a new intuitive way to visualize the signatures, analogous to the use of sequence logos to visualize transcription factor binding sites. We illustrate our new method on somatic mutation data from urothelial carcinoma of the upper urinary tract, and a larger dataset from 30 diverse cancer types. The results illustrate several important features of our methods, including the ability of our new visualization tool to clearly highlight the key features of each signature, the improved robustness of signature inferences from small sample sizes, and more detailed inference of signature characteristics such as strand biases and sequence context effects at the base two positions 5′ to the mutated site. The overall framework of our work is based on probabilistic models that are closely connected with “mixed-membership models” which are widely used in population genetic admixture analysis, and in machine learning for document clustering. We argue that recognizing these relationships should help improve understanding of mutation signature extraction problems, and suggests ways to further improve the statistical methods. Our methods are implemented in an R package pmsignature (https://github.com/friend1ws/pmsignature) and a web application available at https://friend1ws.shinyapps.io/pmsignature_shiny/. PMID:26630308
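The authors' pmsignature model is probabilistic and released as an R package; the common baseline it improves on factorizes a sample-by-mutation-type count matrix, which can be sketched with non-negative matrix factorization. The signature shapes, exposures, and choice of three components below are simulated stand-ins:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Rows: tumour samples; columns: the 96 trinucleotide mutation types
true_sigs = rng.dirichlet(np.ones(96) * 0.1, size=3)     # 3 ground-truth "signatures"
exposure = rng.gamma(2.0, 50.0, size=(40, 3))            # per-sample signature activities
counts = rng.poisson(exposure @ true_sigs)               # observed mutation counts

model = NMF(n_components=3, init="nndsvda", max_iter=500)
W = model.fit_transform(counts)                          # inferred exposures per sample
H = model.components_
H = H / H.sum(axis=1, keepdims=True)                     # normalise rows into signatures
```

The paper's point is that replacing this kind of free 96-parameter-per-signature factorization with a structured probabilistic model (independent factors for substitution type, flanking bases, strand) cuts the parameter count dramatically and stabilizes inference from small samples.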
Inference of segmented color and texture description by tensor voting.
Jia, Jiaya; Tang, Chi-Keung
2004-06-01
A robust synthesis method is proposed to automatically infer missing color and texture information from a damaged 2D image by (N)D tensor voting (N > 3). The same approach is generalized to range and 3D data in the presence of occlusion, missing data and noise. Our method translates texture information into an adaptive (N)D tensor, followed by a voting process that infers noniteratively the optimal color values in the (N)D texture space. A two-step method is proposed. First, we perform segmentation based on insufficient geometry, color, and texture information in the input, and extrapolate partitioning boundaries by either 2D or 3D tensor voting to generate a complete segmentation for the input. Missing colors are synthesized using (N)D tensor voting in each segment. Different feature scales in the input are automatically adapted by our tensor scale analysis. Results on a variety of difficult inputs demonstrate the effectiveness of our tensor voting approach.
Causal strength induction from time series data.
Soo, Kevin W; Rottman, Benjamin M
2018-04-01
One challenge when inferring the strength of cause-effect relations from time series data is that the cause and/or effect can exhibit temporal trends. If temporal trends are not accounted for, a learner could infer that a causal relation exists when it does not, or even infer that there is a positive causal relation when the relation is negative, or vice versa. We propose that learners use a simple heuristic to control for temporal trends-that they focus not on the states of the cause and effect at a given instant, but on how the cause and effect change from one observation to the next, which we call transitions. Six experiments were conducted to understand how people infer causal strength from time series data. We found that participants indeed use transitions in addition to states, which helps them to reach more accurate causal judgments (Experiments 1A and 1B). Participants use transitions more when the stimuli are presented in a naturalistic visual format than a numerical format (Experiment 2), and the effect of transitions is not driven by primacy or recency effects (Experiment 3). Finally, we found that participants primarily use the direction in which variables change rather than the magnitude of the change for estimating causal strength (Experiments 4 and 5). Collectively, these studies provide evidence that people often use a simple yet effective heuristic for inferring causal strength from time series data. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
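The transitions heuristic is closely related to differencing in time-series analysis: judging the relation between changes rather than levels removes a shared trend. A toy illustration with simulated data (not the authors' stimuli):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(200)
# Two causally unrelated variables that both trend upward over time
cause = 0.05 * t + rng.normal(size=t.size)
effect = 0.05 * t + rng.normal(size=t.size)

print(np.corrcoef(cause, effect)[0, 1])                    # large: spurious, trend-driven
print(np.corrcoef(np.diff(cause), np.diff(effect))[0, 1])  # near zero: transitions control for it
```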
Sieve estimation in semiparametric modeling of longitudinal data with informative observation times.
Zhao, Xingqiu; Deng, Shirong; Liu, Li; Liu, Lei
2014-01-01
Analyzing irregularly spaced longitudinal data often involves modeling possibly correlated response and observation processes. In this article, we propose a new class of semiparametric mean models that allows for the interaction between the observation history and covariates, leaving patterns of the observation process to be arbitrary. For inference on the regression parameters and the baseline mean function, a spline-based least squares estimation approach is proposed. The consistency, rate of convergence, and asymptotic normality of the proposed estimators are established. Our new approach is different from the usual approaches relying on the model specification of the observation scheme, and it can be easily used for predicting the longitudinal response. Simulation studies demonstrate that the proposed inference procedure performs well and is more robust. The analyses of bladder tumor data and medical cost data are presented to illustrate the proposed method.
MICCA: a complete and accurate software for taxonomic profiling of metagenomic data.
Albanese, Davide; Fontana, Paolo; De Filippo, Carlotta; Cavalieri, Duccio; Donati, Claudio
2015-05-19
The introduction of high throughput sequencing technologies has triggered an increase in the number of studies in which the microbiota of environmental and human samples is characterized through the sequencing of selected marker genes. While experimental protocols have undergone a process of standardization that makes them accessible to a large community of scientists, standard and robust data analysis pipelines are still lacking. Here we introduce MICCA, a software pipeline for the processing of amplicon metagenomic datasets that efficiently combines quality filtering, clustering of Operational Taxonomic Units (OTUs), taxonomy assignment and phylogenetic tree inference. MICCA provides accurate results reaching a good compromise between modularity and usability. Moreover, we introduce a de-novo clustering algorithm specifically designed for the inference of Operational Taxonomic Units (OTUs). Tests on real and synthetic datasets show that, thanks to the optimized read-filtering process and to the new clustering algorithm, MICCA provides estimates of the number of OTUs and of other common ecological indices that are more accurate and robust than those of currently available pipelines. Analysis of public metagenomic datasets shows that the higher consistency of results improves our understanding of the structure of environmental and human associated microbial communities. MICCA is an open source project. PMID:25988396
Data-driven modelling of social forces and collective behaviour in zebrafish.
Zienkiewicz, Adam K; Ladu, Fabrizio; Barton, David A W; Porfiri, Maurizio; Bernardo, Mario Di
2018-04-14
Zebrafish are rapidly emerging as a powerful model organism in hypothesis-driven studies targeting a number of functional and dysfunctional processes. Mathematical models of zebrafish behaviour can inform the design of experiments, through the unprecedented ability to perform pilot trials on a computer. At the same time, in-silico experiments could help refining the analysis of real data, by enabling the systematic investigation of key neurobehavioural factors. Here, we establish a data-driven model of zebrafish social interaction. Specifically, we derive a set of interaction rules to capture the primary response mechanisms which have been observed experimentally. Contrary to previous studies, we include dynamic speed regulation in addition to turning responses, which together provide attractive, repulsive and alignment interactions between individuals. The resulting multi-agent model provides a novel, bottom-up framework to describe both the spontaneous motion and individual-level interaction dynamics of zebrafish, inferred directly from experimental observations. Copyright © 2018 Elsevier Ltd. All rights reserved.
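A generic social-force sketch of the ingredients the abstract names (attraction, alignment, and dynamic speed regulation) is given below. The functional forms and coefficients are hypothetical placeholders; the paper derives its interaction rules and parameters from tracking data:

```python
import numpy as np

rng = np.random.default_rng(0)
pos = rng.normal(size=(2, 2))            # two fish, x-y positions
vel = rng.normal(size=(2, 2)) * 0.1
dt, v_pref, k_att, k_align = 0.1, 1.0, 0.5, 0.3   # hypothetical parameters

for _ in range(1000):
    for i in range(2):
        j = 1 - i
        r = pos[j] - pos[i]
        dist = np.linalg.norm(r) + 1e-9
        # Attraction toward the neighbour plus alignment with its velocity
        acc = k_att * r / dist + k_align * (vel[j] - vel[i])
        # Dynamic speed regulation: relax toward a preferred cruising speed
        speed = np.linalg.norm(vel[i]) + 1e-9
        acc += (v_pref - speed) * vel[i] / speed
        vel[i] = vel[i] + dt * acc
    pos += dt * vel
```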
Automated adaptive inference of phenomenological dynamical models.
Daniels, Bryan C; Nemenman, Ilya
2015-08-21
Dynamics of complex systems is often driven by large and intricate networks of microscopic interactions, whose sheer size obfuscates understanding. With limited experimental data, many parameters of such dynamics are unknown, and thus detailed, mechanistic models risk overfitting and making faulty predictions. At the other extreme, simple ad hoc models often miss defining features of the underlying systems. Here we develop an approach that instead constructs phenomenological, coarse-grained models of network dynamics that automatically adapt their complexity to the available data. Such adaptive models produce accurate predictions even when microscopic details are unknown. The approach is computationally tractable, even for a relatively large number of dynamical variables. Using simulated data, it correctly infers the phase space structure for planetary motion, avoids overfitting in a biological signalling system and produces accurate predictions for yeast glycolysis with tens of data points and over half of the interacting species unobserved. PMID:26293508
Robustness of waves with a high phase velocity
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tajima, T., E-mail: ttajima@uci.edu; Tri Alpha Energy, Inc., P.O. Box 7010, Rancho Santa Margarita, CA 92688; Necas, A., E-mail: anecas@trialphaenergy.com
Norman Rostoker pioneered research of (1) plasma-driven accelerators and (2) beam-driven fusion reactors. The collective acceleration, coined by Veksler, advocates driving above-ionization plasma waves with an electron beam to accelerate ions. The research on this, among others, by the Rostoker group incubated the idea that eventually led to the birth of the laser wakefield acceleration (LWFA), by which large and robust collective accelerating fields may be generated in plasma while the plasma remains robust and undisrupted. Besides the emergence of LWFA, the Rostoker research spawned our lessons learned on the importance of adiabatic acceleration of ions in collective accelerators, including the recent rebirth in laser-driven ion acceleration efforts in a smooth adiabatic fashion by a variety of ingenious methods. Following Rostoker’s research in (2), the beam-driven Field Reversed Configuration (FRC) has accomplished breakthroughs in recent years. The beam-driven kinetic plasma instabilities have been found to drive the reactivity of deuteron-deuteron fusion beyond the thermonuclear yield in the C-2U plasma that Rostoker started. This remarkable result in FRCs, as well as the above mentioned LWFA, may be understood with the aid of the newly introduced idea of the “robustness hypothesis of waves with a high phase velocity”. It posits that when the wave driven by a particle beam (or laser pulse) has a high phase velocity, its amplitude can be high without disrupting the supporting bulk plasma. This hypothesis may guide us toward more robust and efficient fusion reactors and more compact accelerators.
Robust EM Continual Reassessment Method in Oncology Dose Finding
Yuan, Ying; Yin, Guosheng
2012-01-01
The continual reassessment method (CRM) is a commonly used dose-finding design for phase I clinical trials. Practical applications of this method have been restricted by two limitations: (1) the requirement that the toxicity outcome needs to be observed shortly after the initiation of the treatment; and (2) the potential sensitivity to the prespecified toxicity probability at each dose. To overcome these limitations, we naturally treat the unobserved toxicity outcomes as missing data, and use the expectation-maximization (EM) algorithm to estimate the dose toxicity probabilities based on the incomplete data to direct dose assignment. To enhance the robustness of the design, we propose prespecifying multiple sets of toxicity probabilities, each set corresponding to an individual CRM model. We carry out these multiple CRMs in parallel, across which model selection and model averaging procedures are used to make more robust inference. We evaluate the operating characteristics of the proposed robust EM-CRM designs through simulation studies and show that the proposed methods satisfactorily resolve both limitations of the CRM. Besides improving the MTD selection percentage, the new designs dramatically shorten the duration of the trial, and are robust to the prespecification of the toxicity probabilities. PMID:22375092
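For orientation, the single-parameter CRM update that the proposed EM-CRM designs build on can be sketched as a grid posterior over a power-model parameter. The skeleton, target, and outcome counts below are hypothetical; the paper's additions (EM handling of pending toxicity outcomes, multiple skeletons with model selection or averaging) sit on top of this step:

```python
import numpy as np

skeleton = np.array([0.05, 0.10, 0.20, 0.30, 0.45])  # prespecified toxicity guesses (hypothetical)
target = 0.25
tox = np.array([0, 0, 1, 0, 0])                      # toxicities observed at each dose
n_pat = np.array([3, 3, 3, 0, 0])                    # patients treated at each dose

a = np.linspace(-4, 4, 2001)                         # grid over the model parameter
prior = np.exp(-0.5 * a**2 / 2.0)                    # N(0, variance 2) prior, up to a constant
p = skeleton[None, :] ** np.exp(a)[:, None]          # power model: dose-toxicity curve per grid point
loglik = (tox * np.log(p) + (n_pat - tox) * np.log1p(-p)).sum(axis=1)
post = prior * np.exp(loglik - loglik.max())
post /= post.sum()

p_hat = (post[:, None] * p).sum(axis=0)              # posterior-mean toxicity at each dose
next_dose = int(np.argmin(np.abs(p_hat - target)))   # assign the dose closest to the target
print(p_hat.round(3), next_dose)
```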
Data-driven inference for the spatial scan statistic.
Almeida, Alexandre C L; Duarte, Anderson R; Duczmal, Luiz H; Oliveira, Fernando L P; Takahashi, Ricardo H C
2011-08-02
Kulldorff's spatial scan statistic for aggregated area maps searches for clusters of cases without specifying their size (number of areas) or geographic location in advance. Their statistical significance is tested while adjusting for the multiple testing inherent in such a procedure. However, as is shown in this work, this adjustment is not done in an even manner for all possible cluster sizes. A modification is proposed to the usual inference test of the spatial scan statistic, incorporating additional information about the size of the most likely cluster found. A new interpretation of the results of the spatial scan statistic is proposed, posing a modified inference question: what is the probability that the null hypothesis is rejected for the original observed cases map with a most likely cluster of size k, taking into account only those most likely clusters of size k found under the null hypothesis for comparison? This question is especially important when the p-value computed by the usual inference process is near the alpha significance level, regarding the correctness of the decision based on this inference. A practical procedure is provided to make more accurate inferences about the most likely cluster found by the spatial scan statistic.
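The usual inference process being modified works roughly as below: scan candidate zones for the maximum Poisson log-likelihood ratio, then calibrate it by Monte Carlo replication under the null. The paper's modification additionally conditions on the size k of the most likely cluster. Zones here are contiguous windows over a toy one-dimensional map, a stand-in for true spatial zones, and all counts are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
cases = np.array([4, 2, 9, 3, 12, 5])          # observed cases per area
expected = np.array([5.0, 3.0, 4.0, 3.0, 5.0, 5.0])
C = cases.sum()

def llr(c, e):
    # Kulldorff's Poisson log-likelihood ratio for a candidate cluster
    if c <= e:
        return 0.0
    term = (C - c) * np.log((C - c) / (C - e)) if c < C else 0.0
    return c * np.log(c / e) + term

# Candidate zones: contiguous windows of every size
zones = [list(range(i, j + 1)) for i in range(6) for j in range(i, 6)]
obs = max(llr(cases[z].sum(), expected[z].sum()) for z in zones)

# Monte Carlo: redistribute the C cases under the null, re-scan, compare maxima
null = []
for _ in range(999):
    sim = rng.multinomial(C, expected / expected.sum())
    null.append(max(llr(sim[z].sum(), expected[z].sum()) for z in zones))
p_value = (1 + sum(m >= obs for m in null)) / 1000
print(obs, p_value)
```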
NASA Astrophysics Data System (ADS)
Xu, T.; Valocchi, A. J.; Ye, M.; Liang, F.
2016-12-01
Due to simplification and/or misrepresentation of the real aquifer system, numerical groundwater flow and solute transport models are usually subject to model structural error. During model calibration, the hydrogeological parameters may be overly adjusted to compensate for unknown structural error. This may result in biased predictions when models are used to forecast aquifer response to new forcing. In this study, we extend a fully Bayesian method [Xu and Valocchi, 2015] to calibrate a real-world, regional groundwater flow model. The method uses a data-driven error model to describe model structural error and jointly infers model parameters and structural error. Bayesian inference is facilitated using high-performance computing and fast surrogate models. The surrogate models are constructed using machine learning techniques to emulate the response simulated by the computationally expensive groundwater model. We demonstrate in the real-world case study that explicitly accounting for model structural error yields parameter posterior distributions that are substantially different from those derived by the classical Bayesian calibration that does not account for model structural error. In addition, the Bayesian method with an error model yields significantly more accurate predictions along with reasonable credible intervals.
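A cartoon of the two ingredients, an emulator trained on a handful of expensive model runs and joint MCMC over the physical parameter and a structural-error term, is sketched below. The paper's error model is data-driven rather than a single additive bias, and every function and constant here is a hypothetical stand-in for the groundwater model:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

def expensive_model(theta):
    # Stand-in for a long-running groundwater simulation
    return np.sin(3 * theta) + theta

# Train a cheap surrogate on a small design of model runs
design = np.linspace(-2, 2, 15)[:, None]
surrogate = GaussianProcessRegressor().fit(design, expensive_model(design[:, 0]))

obs = expensive_model(0.7) + 0.3       # observation, offset by structural error
sigma = 0.1                            # observation-error scale

def log_target(theta, bias):
    pred = surrogate.predict(np.array([[theta]]))[0]
    # The bias term absorbs structural discrepancy; weak priors on theta and bias
    return (-0.5 * ((obs - pred - bias) / sigma) ** 2
            - 0.5 * theta ** 2 / 4.0 - 0.5 * bias ** 2 / 0.25)

theta, bias, chain = 0.0, 0.0, []
for _ in range(5000):                  # Metropolis on the cheap surrogate posterior
    t_new, b_new = theta + 0.2 * rng.normal(), bias + 0.2 * rng.normal()
    if np.log(rng.uniform()) < log_target(t_new, b_new) - log_target(theta, bias):
        theta, bias = t_new, b_new
    chain.append((theta, bias))
```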
Smuk, M; Carpenter, J R; Morris, T P
2017-02-06
Within epidemiological and clinical research, missing data are a common issue and often overlooked in publications. When the issue of missing observations is addressed, it is usually assumed that the missing data are 'missing at random' (MAR). This assumption should be checked for plausibility; however, it is untestable, so inferences should be assessed for robustness to departures from missing at random. We highlight the method of pattern-mixture sensitivity analysis after multiple imputation using colorectal cancer data as an example. We focus on the Dukes' stage variable, which has the highest proportion of missing observations. First, we find the probability of being in each Dukes' stage given the MAR imputed dataset. We use these probabilities in a questionnaire to elicit prior beliefs from experts on what they believe the probability would be in the missing data. The questionnaire responses are then used in a Dirichlet draw to create a Bayesian 'missing not at random' (MNAR) prior to impute the missing observations. The model of interest is applied and inferences are compared to those from the MAR imputed data. The inferences were largely insensitive to departure from MAR. Inferences under MNAR suggested a smaller association between Dukes' stage and death, though the association remained positive and with similarly low p values. We conclude by discussing the positives and negatives of our method and highlight the importance of making people aware of the need to test the MAR assumption.
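The Dirichlet-draw step might look like the sketch below, where elicited expert probabilities are turned into pseudo-counts and each imputation draws a stage distribution for the missing observations. The stage probabilities, prior strength, and counts are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# MAR-imputed probabilities of each Dukes' stage (hypothetical values)
p_mar = np.array([0.15, 0.35, 0.35, 0.15])

# Elicited expert belief about the stage distribution among the *missing* records,
# encoded as Dirichlet pseudo-counts (the strength 20 is a hypothetical choice)
expert = np.array([0.10, 0.25, 0.40, 0.25]) * 20

n_missing = 120
# One MNAR imputation: draw stage probabilities from the prior, then impute stages
p_mnar = rng.dirichlet(expert)
imputed = rng.choice(4, size=n_missing, p=p_mnar)
print(p_mar, p_mnar.round(3))
```

Repeating the draw across imputations propagates the uncertainty in the elicited MNAR prior into the final inferences, which are then compared against the MAR results.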
de Vries, Natalie Jane; Carlson, Jamie; Moscato, Pablo
2014-01-01
Online consumer behavior in general and online customer engagement with brands in particular, has become a major focus of research activity fuelled by the exponential increase of interactive functions of the internet and social media platforms and applications. Current research in this area is mostly hypothesis-driven and much debate about the concept of Customer Engagement and its related constructs remains existent in the literature. In this paper, we aim to propose a novel methodology for reverse engineering a consumer behavior model for online customer engagement, based on a computational and data-driven perspective. This methodology could be generalized and prove useful for future research in the fields of consumer behaviors using questionnaire data or studies investigating other types of human behaviors. The method we propose contains five main stages; symbolic regression analysis, graph building, community detection, evaluation of results and finally, investigation of directed cycles and common feedback loops. The ‘communities’ of questionnaire items that emerge from our community detection method form possible ‘functional constructs’ inferred from data rather than assumed from literature and theory. Our results show consistent partitioning of questionnaire items into such ‘functional constructs’ suggesting the method proposed here could be adopted as a new data-driven way of human behavior modeling. PMID:25036766
What time is it? Deep learning approaches for circadian rhythms.
Agostinelli, Forest; Ceglia, Nicholas; Shahbaba, Babak; Sassone-Corsi, Paolo; Baldi, Pierre
2016-06-15
Circadian rhythms date back to the origins of life, are found in virtually every species and every cell, and play fundamental roles in functions ranging from metabolism to cognition. Modern high-throughput technologies allow the measurement of concentrations of transcripts, metabolites and other species along the circadian cycle, creating novel computational challenges and opportunities, including the problems of inferring whether a given species oscillates in circadian fashion or not, and inferring the time at which a set of measurements was taken. We first curate several large synthetic and biological time series datasets containing labels for both periodic and aperiodic signals. We then use deep learning methods to develop and train BIO_CYCLE, a system to robustly estimate which signals are periodic in high-throughput circadian experiments, producing estimates of amplitudes, periods, phases, as well as several statistical significance measures. Using the curated data, BIO_CYCLE is compared to other approaches and shown to achieve state-of-the-art performance across multiple metrics. We then use deep learning methods to develop and train BIO_CLOCK to robustly estimate the time at which a particular single-time-point transcriptomic experiment was carried out. In most cases, BIO_CLOCK can reliably predict time, within approximately 1 h, using the expression levels of only a small number of core clock genes. BIO_CLOCK is shown to work reasonably well across tissue types, and often with only small degradation across conditions. BIO_CLOCK is used to annotate most mouse experiments found in the GEO database with an inferred time stamp. All data and software are publicly available on the CircadiOmics web portal: circadiomics.igb.uci.edu/. Contact: fagostin@uci.edu or pfbaldi@uci.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. PMID:27307647
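BIO_CYCLE itself is a deep network, but the quantities it estimates (amplitude, phase, significance of periodicity) have a classical baseline in cosinor regression, sketched here for a known 24-hour period with simulated expression values:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(0, 48, 2.0)                       # sampling times in hours
period = 24.0
y = 5 + 2 * np.cos(2 * np.pi * (t - 8) / period) + rng.normal(scale=0.5, size=t.size)

# Cosinor regression: ordinary least squares on cosine/sine covariates
X = np.column_stack([np.ones_like(t),
                     np.cos(2 * np.pi * t / period),
                     np.sin(2 * np.pi * t / period)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
amplitude = np.hypot(beta[1], beta[2])
phase_h = (np.arctan2(beta[2], beta[1]) * period / (2 * np.pi)) % period
print(amplitude, phase_h)                        # recovers roughly 2 and 8
```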
Knowledge-guided fuzzy logic modeling to infer cellular signaling networks from proteomic data
Liu, Hui; Zhang, Fan; Mishra, Shital Kumar; Zhou, Shuigeng; Zheng, Jie
2016-01-01
Modeling of signaling pathways is crucial for understanding and predicting cellular responses to drug treatments. However, canonical signaling pathways curated from literature are seldom context-specific and thus can hardly predict cell type-specific response to external perturbations; purely data-driven methods also have drawbacks such as limited biological interpretability. Therefore, hybrid methods that can integrate prior knowledge and real data for network inference are highly desirable. In this paper, we propose a knowledge-guided fuzzy logic network model to infer signaling pathways by exploiting both prior knowledge and time-series data. In particular, the dynamic time warping algorithm is employed to measure the goodness of fit between experimental and predicted data, so that our method can model temporally-ordered experimental observations. We evaluated the proposed method on a synthetic dataset and two real phosphoproteomic datasets. The experimental results demonstrate that our model can uncover drug-induced alterations in signaling pathways in cancer cells. Compared with existing hybrid models, our method can model feedback loops so that the dynamical mechanisms of signaling networks can be uncovered from time-series data. By calibrating generic models of signaling pathways against real data, our method supports precise predictions of context-specific anticancer drug effects, which is an important step towards precision medicine. PMID:27774993
Bayesian reconstruction of transmission within outbreaks using genomic variants.
De Maio, Nicola; Worby, Colin J; Wilson, Daniel J; Stoesser, Nicole
2018-04-01
Pathogen genome sequencing can reveal details of transmission histories and is a powerful tool in the fight against infectious disease. In particular, within-host pathogen genomic variants identified through heterozygous nucleotide base calls are a potential source of information to identify linked cases and infer direction and time of transmission. However, using such data effectively to model disease transmission presents a number of challenges, including differentiating genuine variants from those observed due to sequencing error, as well as the specification of a realistic model for within-host pathogen population dynamics. Here we propose a new Bayesian approach to transmission inference, BadTrIP (BAyesian epiDemiological TRansmission Inference from Polymorphisms), that explicitly models evolution of pathogen populations in an outbreak, transmission (including transmission bottlenecks), and sequencing error. BadTrIP enables the inference of host-to-host transmission from pathogen sequencing data and epidemiological data. By assuming that genomic variants are unlinked, our method does not require the computationally intensive and unreliable reconstruction of individual haplotypes. Using simulations we show that BadTrIP is robust in most scenarios and can accurately infer transmission events by efficiently combining information from genetic and epidemiological sources; thanks to its realistic model of pathogen evolution and the inclusion of epidemiological data, BadTrIP is also more accurate than existing approaches. BadTrIP is distributed as an open source package (https://bitbucket.org/nicofmay/badtrip) for the phylogenetic software BEAST2. We apply our method to reconstruct transmission history at the early stages of the 2014 Ebola outbreak, showcasing the power of within-host genomic variants to reconstruct transmission events.
Long-branch attraction bias and inconsistency in Bayesian phylogenetics.
Kolaczkowski, Bryan; Thornton, Joseph W
2009-12-09
Bayesian inference (BI) of phylogenetic relationships uses the same probabilistic models of evolution as its precursor maximum likelihood (ML), so BI has generally been assumed to share ML's desirable statistical properties, such as largely unbiased inference of topology given an accurate model and increasingly reliable inferences as the amount of data increases. Here we show that BI, unlike ML, is biased in favor of topologies that group long branches together, even when the true model and prior distributions of evolutionary parameters over a group of phylogenies are known. Using experimental simulation studies and numerical and mathematical analyses, we show that this bias becomes more severe as more data are analyzed, causing BI to infer an incorrect tree as the maximum a posteriori phylogeny with asymptotically high support as sequence length approaches infinity. BI's long branch attraction bias is relatively weak when the true model is simple but becomes pronounced when sequence sites evolve heterogeneously, even when this complexity is incorporated in the model. This bias--which is apparent under both controlled simulation conditions and in analyses of empirical sequence data--also makes BI less efficient and less robust to the use of an incorrect evolutionary model than ML. Surprisingly, BI's bias is caused by one of the method's stated advantages--that it incorporates uncertainty about branch lengths by integrating over a distribution of possible values instead of estimating them from the data, as ML does. Our findings suggest that trees inferred using BI should be interpreted with caution and that ML may be a more reliable framework for modern phylogenetic analysis.
Predictive Coding in Area V4: Dynamic Shape Discrimination under Partial Occlusion
Choi, Hannah; Pasupathy, Anitha; Shea-Brown, Eric
2018-01-01
The primate visual system has an exquisite ability to discriminate partially occluded shapes. Recent electrophysiological recordings suggest that response dynamics in intermediate visual cortical area V4, shaped by feedback from prefrontal cortex (PFC), may play a key role. To probe the algorithms that may underlie these findings, we build and test a model of V4 and PFC interactions based on a hierarchical predictive coding framework. We propose that probabilistic inference occurs in two steps. Initially, V4 responses are driven solely by bottom-up sensory input and are thus strongly influenced by the level of occlusion. After a delay, V4 responses combine both feedforward input and feedback signals from the PFC; the latter reflect predictions made by PFC about the visual stimulus underlying V4 activity. We find that this model captures key features of V4 and PFC dynamics observed in experiments. Specifically, PFC responses are strongest for occluded stimuli and delayed responses in V4 are less sensitive to occlusion, supporting our hypothesis that the feedback signals from PFC underlie robust discrimination of occluded shapes. Thus, our study proposes that area V4 and PFC participate in hierarchical inference, with feedback signals encoding top-down predictions about occluded shapes. PMID:29566355
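The two-step inference described above can be caricatured in a few lines: an initial V4 response driven purely bottom-up (and therefore occlusion-sensitive), followed by a delayed response that mixes feedforward drive with a top-down PFC prediction. All signals and weights below are invented for illustration and are not the authors' model.

```python
# Toy two-step predictive-coding update: delayed V4 activity blends bottom-up
# (occluded) drive with a PFC prediction of the unoccluded shape.
import numpy as np

shape_signal = np.array([1.0, 0.8, 0.6, 0.9])   # idealized unoccluded V4 drive
occlusion = 0.5                                  # fraction of shape occluded

bottom_up = shape_signal * (1 - occlusion)       # step 1: occlusion-sensitive
pfc_prediction = shape_signal                    # PFC hypothesis about the shape

w_feedback = 0.6                                 # strength of top-down feedback
delayed_v4 = (1 - w_feedback) * bottom_up + w_feedback * pfc_prediction

print("initial V4:", bottom_up)                  # strongly reduced by occlusion
print("delayed V4:", delayed_v4)                 # partially restored by feedback
```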
Statistical Inference for Data Adaptive Target Parameters.
Hubbard, Alan E; Kherad-Pajouh, Sara; van der Laan, Mark J
2016-05-01
Consider a setting in which one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. In order to define our statistical target, we partition the sample into V equal-size subsamples, and use this partitioning to define V splits into an estimation sample (one of the V subsamples) and a corresponding complementary parameter-generating sample. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a statistical target parameter. We define our sample-split data adaptive statistical target parameter as the average of these V sample-specific target parameters. We present an estimator (and corresponding central limit theorem) of this type of data adaptive target parameter. This general methodology for generating data adaptive target parameters is demonstrated with a number of practical examples that highlight new opportunities for statistical learning from data. This new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming "data-driven", the theory developed within this paper provides a new impetus for a greater involvement of statistical inference in problems that are increasingly addressed by clever, yet ad hoc pattern-finding methods. To suggest such potential, and to verify the predictions of the theory, extensive simulation studies, along with a data analysis based on adaptively determined intervention rules, are presented and give insight into how to structure such an approach. The results show that the data adaptive target parameter approach provides a general framework and resulting methodology for data-driven science.
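The sample-splitting scheme described above is easy to mimic in code. In the simplified stand-in below, the algorithm applied to each parameter-generating sample adaptively defines the target (the mean of the feature most correlated with an outcome), which is then estimated on the held-out estimation sample, and the V estimates are averaged. Data and the choice of algorithm are illustrative assumptions, not the paper's examples.

```python
# V-fold split: parameter-generating sample defines a data-adaptive target,
# estimation sample evaluates it; the final estimate averages over folds.
import numpy as np
from sklearn.model_selection import KFold

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X[:, 2] + rng.normal(scale=0.5, size=200)    # feature 2 is truly relevant

estimates = []
for gen_idx, est_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    # parameter-generating sample picks the target feature adaptively
    corr = [abs(np.corrcoef(X[gen_idx, j], y[gen_idx])[0, 1]) for j in range(5)]
    j_star = int(np.argmax(corr))
    # estimation sample evaluates the data-adaptive parameter
    estimates.append(X[est_idx, j_star].mean())

print("data-adaptive target estimate:", np.mean(estimates))
```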
Data-Driven Anomaly Detection Performance for the Ares I-X Ground Diagnostic Prototype
NASA Technical Reports Server (NTRS)
Martin, Rodney A.; Schwabacher, Mark A.; Matthews, Bryan L.
2010-01-01
In this paper, we will assess the performance of a data-driven anomaly detection algorithm, the Inductive Monitoring System (IMS), which can be used to detect simulated Thrust Vector Control (TVC) system failures. However, the ability of IMS to detect these failures in a true operational setting may be related to the realistic nature of how they are simulated. As such, we will investigate both a low fidelity and high fidelity approach to simulating such failures, with the latter based upon the underlying physics. Furthermore, the ability of IMS to detect anomalies that were previously unknown and not previously simulated will be studied in earnest, as well as apparent deficiencies or misapplications that result from using the data-driven paradigm. Our conclusions indicate that robust detection performance of simulated failures using IMS is not appreciably affected by the use of a high fidelity simulation. However, we have found that the inclusion of a data-driven algorithm such as IMS into a suite of deployable health management technologies does add significant value.
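IMS itself builds hyper-rectangular clusters from nominal data incrementally; the sketch below uses k-means only as a simplified stand-in to illustrate the same data-driven monitoring paradigm: learn the envelope of nominal behavior, then flag test points that fall too far from it. Thresholds and data are invented.

```python
# Simplified IMS-style monitor: distance to the nearest cluster of nominal
# training data serves as the anomaly score; exceedances are flagged.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
nominal = rng.normal(0, 1, size=(500, 3))        # nominal sensor vectors
km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(nominal)

threshold = np.quantile(km.transform(nominal).min(axis=1), 0.99)

test = np.vstack([rng.normal(0, 1, (5, 3)),      # nominal-like samples
                  rng.normal(6, 1, (2, 3))])     # simulated failure signatures
scores = km.transform(test).min(axis=1)          # distance to nearest centre
print(scores > threshold)                        # True marks detected anomalies
```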
DOE Office of Scientific and Technical Information (OSTI.GOV)
La Russa, D
Purpose: The purpose of this project is to develop a robust method of parameter estimation for a Poisson-based TCP model using Bayesian inference. Methods: Bayesian inference was performed using the PyMC3 probabilistic programming framework written in Python. A Poisson-based TCP regression model that accounts for clonogen proliferation was fit to observed rates of local relapse as a function of equivalent dose in 2 Gy fractions for a population of 623 stage-I non-small-cell lung cancer patients. The Slice Markov Chain Monte Carlo sampling algorithm was used to sample the posterior distributions, and was initiated using the maximum of the posterior distributions found by optimization. The calculation of TCP with each sample step required integration over the free parameter α, which was performed using an adaptive 24-point Gauss-Legendre quadrature. Convergence was verified via inspection of the trace plot and posterior distribution for each of the fit parameters, as well as with comparisons of the most probable parameter values with their respective maximum likelihood estimates. Results: Posterior distributions for α, the standard deviation of α (σ), the average tumour cell-doubling time (Td), and the repopulation delay time (Tk), were generated assuming α/β = 10 Gy and a fixed clonogen density of 10^7 cm^-3. Posterior predictive plots generated from samples from these posterior distributions are in excellent agreement with the observed rates of local relapse used in the Bayesian inference. The most probable values of the model parameters also agree well with maximum likelihood estimates. Conclusion: A robust method of performing Bayesian inference of TCP data using a complex TCP model has been established.
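A heavily reduced sketch of this workflow in PyMC3 follows. The published model integrates over a distribution of radiosensitivity α and includes proliferation terms; here a single fixed-α Poisson TCP, TCP(D) = exp(-N0 exp(-αD)), is fit to invented control/relapse counts purely to show the shape of the inference. Priors, counts, and dose bins are assumptions for illustration.

```python
# Minimal Poisson-TCP Bayesian fit with the Slice sampler in PyMC3.
import numpy as np
import pymc3 as pm

eqd2 = np.array([40., 50., 60., 70., 80.])       # dose bins (Gy), invented
n_pat = np.array([50, 60, 80, 70, 40])           # patients per bin, invented
n_ctrl = np.array([0, 3, 68, 69, 40])            # locally controlled, invented

with pm.Model():
    alpha = pm.Lognormal("alpha", mu=np.log(0.3), sigma=0.5)   # Gy^-1
    log_n0 = pm.Normal("log_n0", mu=np.log(1e7), sigma=2.0)    # clonogen number
    tcp = pm.math.exp(-pm.math.exp(log_n0) * pm.math.exp(-alpha * eqd2))
    pm.Binomial("obs", n=n_pat, p=tcp, observed=n_ctrl)
    trace = pm.sample(2000, tune=1000, step=pm.Slice(), chains=2)

print(pm.summary(trace))
```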
Testing for Granger Causality in the Frequency Domain: A Phase Resampling Method.
Liu, Siwei; Molenaar, Peter
2016-01-01
This article introduces phase resampling, an existing but rarely used surrogate data method for making statistical inferences of Granger causality in frequency domain time series analysis. Granger causality testing is essential for establishing causal relations among variables in multivariate dynamic processes. However, testing for Granger causality in the frequency domain is challenging due to the nonlinear relation between frequency domain measures (e.g., partial directed coherence, generalized partial directed coherence) and time domain data. Through a simulation study, we demonstrate that phase resampling is a general and robust method for making statistical inferences even with short time series. With Gaussian data, phase resampling yields satisfactory type I and type II error rates in all but one condition we examine: when a small effect size is combined with an insufficient number of data points. Violations of normality lead to slightly higher error rates but are mostly within acceptable ranges. We illustrate the utility of phase resampling with two empirical examples involving multivariate electroencephalography (EEG) and skin conductance data.
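The phase-resampling idea itself is compact: keep each series' amplitude spectrum, randomize its Fourier phases, and invert back to the time domain; repeating this builds a null distribution for a causality statistic. The sketch below is a generic implementation of that surrogate-generation step, with invented data.

```python
# Phase-randomized surrogate: same power spectrum, scrambled temporal order.
import numpy as np

def phase_surrogate(x, rng):
    spec = np.fft.rfft(x)
    phases = rng.uniform(0, 2 * np.pi, spec.size)
    phases[0] = 0.0                               # keep the mean (DC) component
    phases[-1] = 0.0                              # keep Nyquist real (even length)
    return np.fft.irfft(np.abs(spec) * np.exp(1j * phases), n=len(x))

rng = np.random.default_rng(3)
x = np.sin(np.linspace(0, 20 * np.pi, 512)) + rng.normal(0, 0.2, 512)
s = phase_surrogate(x, rng)
# amplitude spectra agree; temporal structure (and any causality) is destroyed
print(np.allclose(np.abs(np.fft.rfft(x))[1:], np.abs(np.fft.rfft(s))[1:]))
```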
MultiNest: Efficient and Robust Bayesian Inference
NASA Astrophysics Data System (ADS)
Feroz, F.; Hobson, M. P.; Bridges, M.
2011-09-01
We present further development and the first public release of our multimodal nested sampling algorithm, called MultiNest. This Bayesian inference tool calculates the evidence, with an associated error estimate, and produces posterior samples from distributions that may contain multiple modes and pronounced (curving) degeneracies in high dimensions. The developments presented here lead to further substantial improvements in sampling efficiency and robustness, as compared to the original algorithm presented in Feroz & Hobson (2008), which itself significantly outperformed existing MCMC techniques in a wide range of astrophysical inference problems. The accuracy and economy of the MultiNest algorithm is demonstrated by application to two toy problems and to a cosmological inference problem focusing on the extension of the vanilla LambdaCDM model to include spatial curvature and a varying equation of state for dark energy. The MultiNest software is fully parallelized using MPI and includes an interface to CosmoMC. It will also be released as part of the SuperBayeS package, for the analysis of supersymmetric theories of particle physics, at this http URL.
Fuzzy support vector machine: an efficient rule-based classification technique for microarrays.
Hajiloo, Mohsen; Rabiee, Hamid R; Anooshahpour, Mahdi
2013-01-01
The abundance of gene expression microarray data has led to the development of machine learning algorithms applicable for tackling disease diagnosis, disease prognosis, and treatment selection problems. However, these algorithms often produce classifiers with weaknesses in terms of accuracy, robustness, and interpretability. This paper introduces fuzzy support vector machine which is a learning algorithm based on combination of fuzzy classifiers and kernel machines for microarray classification. Experimental results on public leukemia, prostate, and colon cancer datasets show that fuzzy support vector machine applied in combination with filter or wrapper feature selection methods develops a robust model with higher accuracy than the conventional microarray classification models such as support vector machine, artificial neural network, decision trees, k nearest neighbors, and diagonal linear discriminant analysis. Furthermore, the interpretable rule-base inferred from fuzzy support vector machine helps extracting biological knowledge from microarray data. Fuzzy support vector machine as a new classification model with high generalization power, robustness, and good interpretability seems to be a promising tool for gene expression microarray classification.
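One simple way to realize the "fuzzy" idea on top of a kernel machine is to assign each training sample a membership degree and pass it as a weight to an SVM, so noisy or atypical expression profiles contribute less to the decision boundary. The sketch below uses that weighting scheme on synthetic data; it is a generic illustration, not the authors' rule-based formulation.

```python
# Membership-weighted SVM: samples far from their class centroid get smaller
# weights, attenuating the influence of outlying profiles.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=50, n_informative=10,
                           random_state=0)       # stand-in for microarray data

weights = np.empty(len(y))
for c in np.unique(y):
    d = np.linalg.norm(X[y == c] - X[y == c].mean(axis=0), axis=1)
    weights[y == c] = 1.0 - 0.5 * (d - d.min()) / (d.max() - d.min())

clf = SVC(kernel="rbf", gamma="scale")
clf.fit(X, y, sample_weight=weights)
print("training accuracy:", clf.score(X, y))
```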
Urrestarazu, Jorge; Royo, José B.; Santesteban, Luis G.; Miranda, Carlos
2015-01-01
Fingerprinting information can be used to elucidate in a robust manner the genetic structure of germplasm collections, allowing a more rational and fine assessment of genetic resources. Bayesian model-based approaches are nowadays majorly preferred to infer genetic structure, but it is still largely unresolved how marker sets should be built in order to obtain a robust inference. The objective was to evaluate, in Pyrus germplasm collections, the influence of the SSR marker set size on the genetic structure inferred, also evaluating the influence of the criterion used to select those markers. Inferences were performed considering an increasing number of SSR markers that ranged from just two up to 25, incorporated one at a time into the analysis. The influence of the number of SSR markers used was evaluated by comparing the number of populations and the strength of the signal detected, as well as the similarity of the genotype assignments to populations between analyses. In order to test whether those results were influenced by the criterion used to select the SSRs, several choosing scenarios based on the discrimination power or the fixation index values of the SSRs were tested. Our results indicate that population structure could be inferred accurately once a certain SSR number threshold was reached, which depended on the underlying structure within the genotypes, but the method used to select the markers included in each set appeared not to be very relevant. The minimum number of SSRs required to provide robust structure inferences and adequate measurements of the differentiation, even when low differentiation levels exist within populations, proved similar to that of the complete list of recommended markers for fingerprinting. When an SSR set size similar to the minimum marker set recommended for fingerprinting is used, only major divisions or moderate (F_ST > 0.05) differentiation of the germplasm are detected. PMID:26382618
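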
Mistaking geography for biology: inferring processes from species distributions.
Warren, Dan L; Cardillo, Marcel; Rosauer, Dan F; Bolnick, Daniel I
2014-10-01
Over the past few decades, there has been a rapid proliferation of statistical methods that infer evolutionary and ecological processes from data on species distributions. These methods have led to considerable new insights, but they often fail to account for the effects of historical biogeography on present-day species distributions. Because the geography of speciation can lead to patterns of spatial and temporal autocorrelation in the distributions of species within a clade, this can result in misleading inferences about the importance of deterministic processes in generating spatial patterns of biodiversity. In this opinion article, we discuss ways in which patterns of species distributions driven by historical biogeography are often interpreted as evidence of particular evolutionary or ecological processes. We focus on three areas that are especially prone to such misinterpretations: community phylogenetics, environmental niche modelling, and analyses of beta diversity (compositional turnover of biodiversity). Crown Copyright © 2014. Published by Elsevier Ltd. All rights reserved.
Characterizing wood-plastic composites via data-driven methodologies
John G. Michopoulos; John C. Hermanson; Robert Badaliance
2007-01-01
The recent increase of wood-plastic composite materials in various application areas has underlined the need for an efficient and robust methodology to characterize their nonlinear anisotropic constitutive behavior. In addition, the multiplicity of various loading conditions in structures utilizing these materials further increases the need for a characterization...
Data-driven reverse engineering of signaling pathways using ensembles of dynamic models.
Henriques, David; Villaverde, Alejandro F; Rocha, Miguel; Saez-Rodriguez, Julio; Banga, Julio R
2017-02-01
Despite significant efforts and remarkable progress, the inference of signaling networks from experimental data remains very challenging. The problem is particularly difficult when the objective is to obtain a dynamic model capable of predicting the effect of novel perturbations not considered during model training. The problem is ill-posed due to the nonlinear nature of these systems, the fact that only a fraction of the involved proteins and their post-translational modifications can be measured, and limitations on the technologies used for growing cells in vitro, perturbing them, and measuring their variations. As a consequence, there is a pervasive lack of identifiability. To overcome these issues, we present a methodology called SELDOM (enSEmbLe of Dynamic lOgic-based Models), which builds an ensemble of logic-based dynamic models, trains them to experimental data, and combines their individual simulations into an ensemble prediction. It also includes a model reduction step to prune spurious interactions and mitigate overfitting. SELDOM is a data-driven method, in the sense that it does not require any prior knowledge of the system: the interaction networks that act as scaffolds for the dynamic models are inferred from data using mutual information. We have tested SELDOM on a number of experimental and in silico signal transduction case-studies, including the recent HPN-DREAM breast cancer challenge. We found that its performance is highly competitive compared to state-of-the-art methods for the purpose of recovering network topology. More importantly, the utility of SELDOM goes beyond basic network inference (i.e. uncovering static interaction networks): it builds dynamic (based on ordinary differential equation) models, which can be used for mechanistic interpretations and reliable dynamic predictions in new experimental conditions (i.e. not used in the training). For this task, SELDOM's ensemble prediction is not only consistently better than predictions from individual models, but also often outperforms the state of the art represented by the methods used in the HPN-DREAM challenge.
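The scaffold-inference step of SELDOM, building candidate interaction networks from mutual information, can be illustrated compactly. The sketch below estimates pairwise mutual information between measured signals via simple binning and reports the strongest pairs as candidate edges; signals, bin counts, and thresholds are invented, and the real method's details differ.

```python
# Mutual-information screen for candidate network edges between measured signals.
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(4)
t = np.linspace(0, 10, 200)
a = np.sin(t) + rng.normal(0, 0.1, t.size)       # three measured "proteins"
b = np.roll(a, 5) + rng.normal(0, 0.1, t.size)   # b lags a -> dependent
c = rng.normal(0, 1, t.size)                     # independent noise

signals = {"A": a, "B": b, "C": c}
names = list(signals)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        xi = np.digitize(signals[names[i]], np.histogram_bin_edges(signals[names[i]], 8))
        xj = np.digitize(signals[names[j]], np.histogram_bin_edges(signals[names[j]], 8))
        mi = mutual_info_score(xi, xj)
        print(names[i], "-", names[j], "MI = %.3f" % mi)  # A-B should dominate
```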
A Hybrid Neuro-Fuzzy Model For Integrating Large Earth-Science Datasets
NASA Astrophysics Data System (ADS)
Porwal, A.; Carranza, J.; Hale, M.
2004-12-01
A GIS-based hybrid neuro-fuzzy approach to integration of large earth-science datasets for mineral prospectivity mapping is described. It implements a Takagi-Sugeno type fuzzy inference system in the framework of a four-layered feed-forward adaptive neural network. Each unique combination of the datasets is considered a feature vector whose components are derived by knowledge-based ordinal encoding of the constituent datasets. A subset of feature vectors with a known output target vector (i.e., unique conditions known to be associated with either a mineralized or a barren location) is used for the training of an adaptive neuro-fuzzy inference system. Training involves iterative adjustment of parameters of the adaptive neuro-fuzzy inference system using a hybrid learning procedure for mapping each training vector to its output target vector with minimum sum of squared error. The trained adaptive neuro-fuzzy inference system is used to process all feature vectors. The output for each feature vector is a value that indicates the extent to which a feature vector belongs to the mineralized class or the barren class. These values are used to generate a prospectivity map. The procedure is demonstrated by an application to regional-scale base metal prospectivity mapping in a study area located in the Aravalli metallogenic province (western India). A comparison of the hybrid neuro-fuzzy approach with pure knowledge-driven fuzzy and pure data-driven neural network approaches indicates that the former offers a superior method for integrating large earth-science datasets for predictive spatial mathematical modelling.
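The Takagi-Sugeno inference embedded in such a network reduces to a normalized weighted average of linear rule outputs, with memberships acting as firing strengths. The sketch below shows that mechanics on a single input; rule parameters are invented, whereas in ANFIS they would be tuned by the hybrid learning procedure described above.

```python
# Minimal Takagi-Sugeno inference: Gaussian memberships gate linear rules.
import numpy as np

def gauss(x, c, s):
    return np.exp(-0.5 * ((x - c) / s) ** 2)

def takagi_sugeno(x):
    # two rules on a single encoded input: "low" and "high" evidence
    w = np.array([gauss(x, 0.2, 0.15), gauss(x, 0.8, 0.15)])  # firing strengths
    rule_out = np.array([0.1 + 0.2 * x,                        # rule 1: linear
                         0.5 + 0.4 * x])                       # rule 2: linear
    return np.sum(w * rule_out) / np.sum(w)                    # weighted average

for x in (0.1, 0.5, 0.9):                         # encoded evidence values
    print(x, "->", round(takagi_sugeno(x), 3))    # prospectivity-like score
```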
Robust biological parametric mapping: an improved technique for multimodal brain image analysis
NASA Astrophysics Data System (ADS)
Yang, Xue; Beason-Held, Lori; Resnick, Susan M.; Landman, Bennett A.
2011-03-01
Mapping the quantitative relationship between structure and function in the human brain is an important and challenging problem. Numerous volumetric, surface, region of interest and voxelwise image processing techniques have been developed to statistically assess potential correlations between imaging and non-imaging metrics. Recently, biological parametric mapping has extended the widely popular statistical parametric approach to enable application of the general linear model to multiple image modalities (both for regressors and regressands) along with scalar valued observations. This approach offers great promise for direct, voxelwise assessment of structural and functional relationships with multiple imaging modalities. However, as presented, the biological parametric mapping approach is not robust to outliers and may lead to invalid inferences (e.g., artifactual low p-values) due to slight mis-registration or variation in anatomy between subjects. To enable widespread application of this approach, we introduce robust regression and robust inference in the neuroimaging context of application of the general linear model. Through simulation and empirical studies, we demonstrate that our robust approach reduces sensitivity to outliers without substantial degradation in power. The robust approach and associated software package provides a reliable way to quantitatively assess voxelwise correlations between structural and functional neuroimaging modalities.
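The core robust-regression idea can be seen at a single voxel: an ordinary least-squares fit is pulled by one outlier (e.g., a mis-registered subject), while an M-estimator down-weights it. The sketch below uses a Huber estimator on synthetic data as a generic stand-in for the robust fitting used in the paper.

```python
# OLS vs Huber M-estimation on data with one gross outlier.
import numpy as np
from sklearn.linear_model import LinearRegression, HuberRegressor

rng = np.random.default_rng(5)
x = rng.normal(size=(60, 1))                     # structural measure
y = 0.8 * x[:, 0] + rng.normal(0, 0.2, 60)       # functional measure
y[0] = 8.0                                       # one grossly outlying subject

print("OLS slope:  ", LinearRegression().fit(x, y).coef_[0])
print("Huber slope:", HuberRegressor().fit(x, y).coef_[0])
```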
Data-driven outbreak forecasting with a simple nonlinear growth model
Lega, Joceline; Brown, Heidi E.
2016-01-01
Recent events have thrown the spotlight on infectious disease outbreak response. We developed a data-driven method, EpiGro, which can be applied to cumulative case reports to estimate the order of magnitude of the duration, peak and ultimate size of an ongoing outbreak. It is based on a surprisingly simple mathematical property of many epidemiological data sets, does not require knowledge or estimation of disease transmission parameters, is robust to noise and to small data sets, and runs quickly due to its mathematical simplicity. Using data from historic and ongoing epidemics, we present the model. We also provide modeling considerations that justify this approach and discuss its limitations. In the absence of other information or in conjunction with other models, EpiGro may be useful to public health responders. PMID:27770752
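In the same spirit, a simple nonlinear growth curve fit to cumulative case counts yields order-of-magnitude estimates of final size and peak timing. The logistic fit below is only an analogue of EpiGro (which exploits a property of incidence-versus-cumulative-case curves); the outbreak data are synthetic.

```python
# Logistic fit to noisy cumulative case counts from a partial outbreak.
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, t0):
    return K / (1.0 + np.exp(-r * (t - t0)))

t = np.arange(0, 30)                              # days observed so far
true = logistic(t, 1000, 0.4, 20)
cases = np.random.default_rng(6).poisson(true)    # noisy cumulative counts

p, _ = curve_fit(logistic, t, cases, p0=[cases[-1] * 2, 0.3, 15], maxfev=5000)
K, r, t0 = p
print("final size ~%d, peak day ~%.0f" % (K, t0))  # peak incidence at t0
```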
Bounded-Influence Inference in Regression.
1984-02-01
... be viewed as a generalization of the classical F-test. By means of the influence function, their robustness properties are investigated, and optimally robust tests that maximize the asymptotic power within each class, under the side condition of a bounded influence function, are constructed. Finally, an ...
Inferring the mode of origin of polyploid species from next-generation sequence data.
Roux, Camille; Pannell, John R
2015-03-01
Many eukaryote organisms are polyploid. However, despite their importance, evolutionary inference of polyploid origins and modes of inheritance has been limited by a need for analyses of allele segregation at multiple loci using crosses. The increasing availability of sequence data for nonmodel species now allows the application of established approaches for the analysis of genomic data in polyploids. Here, we ask whether approximate Bayesian computation (ABC), applied to realistic traditional and next-generation sequence data, allows correct inference of the evolutionary and demographic history of polyploids. Using simulations, we evaluate the robustness of evolutionary inference by ABC for tetraploid species as a function of the number of individuals and loci sampled, and the presence or absence of an outgroup. We find that ABC adequately retrieves the recent evolutionary history of polyploid species on the basis of both old and new sequencing technologies. The application of ABC to sequence data from diploid and polyploid species of the plant genus Capsella confirms its utility. Our analysis strongly supports an allopolyploid origin of C. bursa-pastoris about 80 000 years ago. This conclusion runs contrary to previous findings based on the same data set but using an alternative approach and is in agreement with recent findings based on whole-genome sequencing. Our results indicate that ABC is a promising and powerful method for revealing the evolution of polyploid species, without the need to attribute alleles to a homeologous chromosome pair. The approach can readily be extended to more complex scenarios involving higher ploidy levels. © 2015 John Wiley & Sons Ltd.
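The ABC rejection scheme underlying such analyses is generic: draw parameters from a prior, simulate data, and keep draws whose summary statistics fall close to the observed ones. The toy sketch below infers a single parameter from a mean statistic; the real analyses simulate sequence data under demographic models.

```python
# Generic ABC rejection sampler on a toy one-parameter model.
import numpy as np

rng = np.random.default_rng(7)
observed_stat = 0.8                               # e.g., a diversity summary

accepted = []
for _ in range(20000):
    theta = rng.uniform(0, 2)                     # prior on the parameter
    sim = rng.normal(theta, 0.5, size=50)         # stand-in simulator
    if abs(sim.mean() - observed_stat) < 0.05:    # tolerance on the summary
        accepted.append(theta)

accepted = np.array(accepted)
print("posterior mean %.2f, 95%% CI (%.2f, %.2f)" %
      (accepted.mean(), *np.quantile(accepted, [0.025, 0.975])))
```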
A CORONAL HOLE'S EFFECTS ON CORONAL MASS EJECTION SHOCK MORPHOLOGY IN THE INNER HELIOSPHERE
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wood, B. E.; Wu, C.-C.; Howard, R. A.
2012-08-10
We use STEREO imagery to study the morphology of a shock driven by a fast coronal mass ejection (CME) launched from the Sun on 2011 March 7. The source region of the CME is located just to the east of a coronal hole. The CME ejecta is deflected away from the hole, in contrast with the shock, which readily expands into the fast outflow from the coronal hole. The result is a CME with ejecta not well centered within the shock surrounding it. The shock shape inferred from the imaging is compared with in situ data at 1 AU, where the shock is observed near Earth by the Wind spacecraft, and at STEREO-A. Shock normals computed from the in situ data are consistent with the shock morphology inferred from imaging.
Generative inference for cultural evolution.
Kandler, Anne; Powell, Adam
2018-04-05
One of the major challenges in cultural evolution is to understand why and how various forms of social learning are used in human populations, both now and in the past. To date, much of the theoretical work on social learning has been done in isolation of data, and consequently many insights focus on revealing the learning processes or the distributions of cultural variants that are expected to have evolved in human populations. In population genetics, recent methodological advances have allowed a greater understanding of the explicit demographic and/or selection mechanisms that underlie observed allele frequency distributions across the globe, and their change through time. In particular, generative frameworks-often using coalescent-based simulation coupled with approximate Bayesian computation (ABC)-have provided robust inferences on the human past, with no reliance on a priori assumptions of equilibrium. Here, we demonstrate the applicability and utility of generative inference approaches to the field of cultural evolution. The framework advocated here uses observed population-level frequency data directly to establish the likely presence or absence of particular hypothesized learning strategies. In this context, we discuss the problem of equifinality and argue that, in the light of sparse cultural data and the multiplicity of possible social learning processes, the exclusion of those processes inconsistent with the observed data might be the most instructive outcome. Finally, we summarize the findings of generative inference approaches applied to a number of case studies.This article is part of the theme issue 'Bridging cultural gaps: interdisciplinary studies in human cultural evolution'. © 2018 The Author(s).
Mwakanyamale, Kisa; Slater, Lee; Day-Lewis, Frederick D.; Elwaseif, Mehrez; Johnson, Carole D.
2012-01-01
Characterization of groundwater-surface water exchange is essential for improving understanding of contaminant transport between aquifers and rivers. Fiber-optic distributed temperature sensing (FODTS) provides rich spatiotemporal datasets for quantitative and qualitative analysis of groundwater-surface water exchange. We demonstrate how time-frequency analysis of FODTS and synchronous river stage time series from the Columbia River adjacent to the Hanford 300-Area, Richland, Washington, provides spatial information on the strength of stage-driven exchange of uranium contaminated groundwater in response to subsurface heterogeneity. Although used in previous studies, the stage-temperature correlation coefficient proved an unreliable indicator of the stage-driven forcing on groundwater discharge in the presence of other factors influencing river water temperature. In contrast, S-transform analysis of the stage and FODTS data definitively identifies the spatial distribution of discharge zones and provided information on the dominant forcing periods (≥2 d) of the complex dam operations driving stage fluctuations and hence groundwater-surface water exchange at the 300-Area.
Van Landeghem, Sofie; Van Parys, Thomas; Dubois, Marieke; Inzé, Dirk; Van de Peer, Yves
2016-01-05
Differential networks have recently been introduced as a powerful way to study the dynamic rewiring capabilities of an interactome in response to changing environmental conditions or stimuli. Currently, such differential networks are generated and visualised using ad hoc methods, and are often limited to the analysis of only one condition-specific response or one interaction type at a time. In this work, we present a generic, ontology-driven framework to infer, visualise and analyse an arbitrary set of condition-specific responses against one reference network. To this end, we have implemented novel ontology-based algorithms that can process highly heterogeneous networks, accounting for both physical interactions and regulatory associations, symmetric and directed edges, edge weights and negation. We propose this integrative framework as a standardised methodology that allows a unified view on differential networks and promotes comparability between differential network studies. As an illustrative application, we demonstrate its usefulness on a plant abiotic stress study and we experimentally confirmed a predicted regulator. Diffany is freely available as open-source java library and Cytoscape plugin from http://bioinformatics.psb.ugent.be/supplementary_data/solan/diffany/.
Learning and Information Approaches for Inference in Dynamic Data-Driven Geophysical Applications
NASA Astrophysics Data System (ADS)
Ravela, S.
2015-12-01
Many geophysical inference problems are characterized by non-linear processes, high-dimensional models and complex uncertainties. A dynamic coupling between models, estimation, and sampling is typically sought to efficiently characterize and reduce uncertainty. This process is, however, fraught with several difficulties, chief among them the treatment of model error, the efficacy of uncertainty quantification, and data assimilation. In this presentation, we present three key ideas from learning and intelligent systems theory and apply them to two geophysical applications. The first idea is the use of Ensemble Learning to compensate for model error, the second is to develop tractable Information Theoretic Learning to deal with non-Gaussianity in inference, and the third is a Manifold Resampling technique for effective uncertainty quantification. We apply these methods first to the development of a cooperative autonomous observing system using sUAS for studying coherent structures, and second to the problem of quantifying risk from hurricanes and storm surges in a changing climate. Results indicate that learning approaches can enable new effectiveness in cases where standard approaches to model reduction, uncertainty quantification and data assimilation fail.
A defect-driven diagnostic method for machine tool spindles
Vogl, Gregory W.; Donmez, M. Alkan
2016-01-01
Simple vibration-based metrics are, in many cases, insufficient to diagnose machine tool spindle condition. These metrics couple defect-based motion with spindle dynamics; diagnostics should be defect-driven. A new method and spindle condition estimation device (SCED) were developed to acquire data and to separate system dynamics from defect geometry. Based on this method, a spindle condition metric relying only on defect geometry is proposed. Application of the SCED on various milling and turning spindles shows that the new approach is robust for diagnosing the machine tool spindle condition. PMID:28065985
Maximum caliber inference of nonequilibrium processes
NASA Astrophysics Data System (ADS)
Otten, Moritz; Stock, Gerhard
2010-07-01
Thirty years ago, Jaynes suggested a general theoretical approach to nonequilibrium statistical mechanics, called maximum caliber (MaxCal) [Annu. Rev. Phys. Chem. 31, 579 (1980)]. MaxCal is a variational principle for dynamics in the same spirit that maximum entropy is a variational principle for equilibrium statistical mechanics. Motivated by the success of maximum entropy inference methods for equilibrium problems, in this work the MaxCal formulation is applied to the inference of nonequilibrium processes. That is, given some time-dependent observables of a dynamical process, one constructs a model that reproduces these input data and moreover, predicts the underlying dynamics of the system. For example, the observables could be some time-resolved measurements of the folding of a protein, which are described by a few-state model of the free energy landscape of the system. MaxCal then calculates the probabilities of an ensemble of trajectories such that on average the data are reproduced. From this probability distribution, any dynamical quantity of the system can be calculated, including population probabilities, fluxes, or waiting time distributions. After briefly reviewing the formalism, the practical numerical implementation of MaxCal in the case of an inference problem is discussed. Adopting various few-state models of increasing complexity, it is demonstrated that the MaxCal principle indeed works as a practical method of inference: The scheme is fairly robust and yields correct results as long as the input data are sufficient. As the method is unbiased and general, it can deal with any kind of time dependency such as oscillatory transients and multitime decays.
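The Lagrange-multiplier structure of MaxCal can be shown on a toy system: over all two-state trajectories of fixed length, the caliber-maximizing distribution subject to a mean-transition-count constraint weights each trajectory by exp(λ·n_transitions), with λ set to match the constraint. The enumeration below is an illustrative assumption-laden sketch, feasible only because the trajectory space is tiny.

```python
# Toy maximum caliber: match an observed mean transition count by exponential
# reweighting of enumerated two-state trajectories.
import itertools
import numpy as np
from scipy.optimize import brentq

L = 8                                             # trajectory length (short!)
trajs = np.array(list(itertools.product([0, 1], repeat=L)))
n_trans = np.abs(np.diff(trajs, axis=1)).sum(axis=1)   # transitions per path

def mean_transitions(lam):
    w = np.exp(lam * n_trans)
    return (w * n_trans).sum() / w.sum()

target = 2.0                                      # "measured" average
lam = brentq(lambda l: mean_transitions(l) - target, -10, 10)
w = np.exp(lam * n_trans); w /= w.sum()
print("lambda = %.3f, check <n> = %.2f" % (lam, (w * n_trans).sum()))
```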
Dynamic modelling of microRNA regulation during mesenchymal stem cell differentiation.
Weber, Michael; Sotoca, Ana M; Kupfer, Peter; Guthke, Reinhard; van Zoelen, Everardus J
2013-11-12
Network inference from gene expression data is a typical approach to reconstruct gene regulatory networks. During chondrogenic differentiation of human mesenchymal stem cells (hMSCs), a complex transcriptional network is active and regulates the temporal differentiation progress. As modulators of transcriptional regulation, microRNAs (miRNAs) play a critical role in stem cell differentiation. Integrated network inference aims at determining interrelations between miRNAs and mRNAs on the basis of expression data as well as miRNA target predictions. We applied the NetGenerator tool in order to infer an integrated gene regulatory network. Time series experiments were performed to measure mRNA and miRNA abundances of TGF-beta1+BMP2 stimulated hMSCs. Network nodes were identified by analysing temporal expression changes, miRNA target gene predictions, time series correlation and literature knowledge. Network inference was performed using NetGenerator to reconstruct a dynamical regulatory model based on the measured data and prior knowledge. The resulting model is robust against noise and shows an optimal trade-off between fitting precision and inclusion of prior knowledge. It predicts the influence of miRNAs on the expression of chondrogenic marker genes and therefore proposes novel regulatory relations in differentiation control. By analysing the inferred network, we identified a previously unknown regulatory effect of miR-524-5p on the expression of the transcription factor SOX9 and the chondrogenic marker genes COL2A1, ACAN and COL10A1. Genome-wide exploration of miRNA-mRNA regulatory relationships is a reasonable approach to identify miRNAs which have so far not been associated with the investigated differentiation process. The NetGenerator tool is able to identify valid gene regulatory networks on the basis of miRNA and mRNA time series data.
Minimally Informative Prior Distributions for PSA
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dana L. Kelly; Robert W. Youngblood; Kurt G. Vedros
2010-06-01
A salient feature of Bayesian inference is its ability to incorporate information from a variety of sources into the inference model, via the prior distribution (hereafter simply "the prior"). However, over-reliance on old information can lead to priors that dominate new data. Some analysts seek to avoid this by trying to work with a minimally informative prior distribution. Another reason for choosing a minimally informative prior is to avoid the often-voiced criticism of subjectivity in the choice of prior. Minimally informative priors fall into two broad classes: 1) so-called noninformative priors, which attempt to be completely objective, in that the posterior distribution is determined as completely as possible by the observed data, the most well known example in this class being the Jeffreys prior, and 2) priors that are diffuse over the region where the likelihood function is nonnegligible, but that incorporate some information about the parameters being estimated, such as a mean value. In this paper, we compare four approaches in the second class, with respect to their practical implications for Bayesian inference in Probabilistic Safety Assessment (PSA). The most commonly used such prior, the so-called constrained noninformative prior, is a special case of the maximum entropy prior. This is formulated as a conjugate distribution for the most commonly encountered aleatory models in PSA, and is correspondingly mathematically convenient; however, it has a relatively light tail and this can cause the posterior mean to be overly influenced by the prior in updates with sparse data. A more informative prior that is capable, in principle, of dealing more effectively with sparse data is a mixture of conjugate priors. A particular diffuse nonconjugate prior, the logistic-normal, is shown to behave similarly for some purposes. Finally, we review the so-called robust prior. Rather than relying on the mathematical abstraction of entropy, as does the constrained noninformative prior, the robust prior places a heavy-tailed Cauchy prior on the canonical parameter of the aleatory model.
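The practical effect of the prior choice under sparse data is easy to illustrate numerically for a binomial failure probability, where the Jeffreys prior is Beta(0.5, 0.5) and conjugate updating is closed-form. The diffuse prior with mean 0.01 below is an invented stand-in for a constrained noninformative prior, included only for contrast.

```python
# Beta-binomial updates under two priors with one failure in ten demands.
from scipy.stats import beta

failures, demands = 1, 10                         # sparse operating experience

for name, a, b in [("Jeffreys Beta(0.5,0.5)", 0.5, 0.5),
                   ("diffuse, mean 0.01", 0.5, 49.5)]:
    post_mean = (a + failures) / (a + b + demands)
    lo, hi = beta.ppf([0.05, 0.95], a + failures, b + demands - failures)
    print("%-22s posterior mean %.3f, 90%% interval (%.4f, %.3f)"
          % (name, post_mean, lo, hi))
```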
A Semiautomated Framework for Integrating Expert Knowledge into Disease Marker Identification
Wang, Jing; Webb-Robertson, Bobbie-Jo M.; Matzke, Melissa M.; Varnum, Susan M.; Brown, Joseph N.; Riensche, Roderick M.; Adkins, Joshua N.; Jacobs, Jon M.; Hoidal, John R.; Scholand, Mary Beth; Pounds, Joel G.; Blackburn, Michael R.; Rodland, Karin D.; McDermott, Jason E.
2013-01-01
Background. The availability of large complex data sets generated by high throughput technologies has enabled the recent proliferation of disease biomarker studies. However, a recurring problem in deriving biological information from large data sets is how to best incorporate expert knowledge into the biomarker selection process. Objective. To develop a generalizable framework that can incorporate expert knowledge into data-driven processes in a semiautomated way while providing a metric for optimization in a biomarker selection scheme. Methods. The framework was implemented as a pipeline consisting of five components for the identification of signatures from integrated clustering (ISIC). Expert knowledge was integrated into the biomarker identification process using the combination of two distinct approaches; a distance-based clustering approach and an expert knowledge-driven functional selection. Results. The utility of the developed framework ISIC was demonstrated on proteomics data from a study of chronic obstructive pulmonary disease (COPD). Biomarker candidates were identified in a mouse model using ISIC and validated in a study of a human cohort. Conclusions. Expert knowledge can be introduced into a biomarker discovery process in different ways to enhance the robustness of selected marker candidates. Developing strategies for extracting orthogonal and robust features from large data sets increases the chances of success in biomarker identification. PMID:24223463
Helaers, Raphaël; Milinkovitch, Michel C
2010-07-15
The development, in the last decade, of stochastic heuristics implemented in robust application software has made large phylogeny inference a key step in most comparative studies involving molecular sequences. Still, the choice of a phylogeny inference software is often dictated by a combination of parameters not related to the raw performance of the implemented algorithm(s) but rather by practical issues such as ergonomics and/or the availability of specific functionalities. Here, we present MetaPIGA v2.0, a robust implementation of several stochastic heuristics for large phylogeny inference (under maximum likelihood), including a Simulated Annealing algorithm, a classical Genetic Algorithm, and the Metapopulation Genetic Algorithm (metaGA), together with complex substitution models, discrete Gamma rate heterogeneity, and the possibility to partition data. MetaPIGA v2.0 also implements the Likelihood Ratio Test, the Akaike Information Criterion, and the Bayesian Information Criterion for automated selection of substitution models that best fit the data. Heuristics and substitution models are highly customizable through manual batch files and command line processing. However, MetaPIGA v2.0 also offers an extensive graphical user interface for setting parameters, generating and running batch files, following run progress, and manipulating result trees. MetaPIGA v2.0 uses standard formats for data sets and trees, is platform independent, runs on 32- and 64-bit systems, and takes advantage of multiprocessor and multicore computers. The metaGA resolves the major problem inherent to classical Genetic Algorithms by maintaining high inter-population variation even under strong intra-population selection. Implementation of the metaGA together with additional stochastic heuristics into a single software package will allow rigorous optimization of each heuristic as well as a meaningful comparison of performances among these algorithms. MetaPIGA v2.0 gives access both to high customization for the phylogeneticist and to an ergonomic interface and functionalities assisting the non-specialist for sound inference of large phylogenetic trees using nucleotide sequences. MetaPIGA v2.0 and its extensive user-manual are freely available to academics at http://www.metapiga.org.
NASA Technical Reports Server (NTRS)
Bakhtiari-Nejad, Maryam; Nguyen, Nhan T.; Krishnakumar, Kalmanje Srinvas
2009-01-01
This paper presents the application of the Bounded Linear Stability Analysis (BLSA) method to metrics-driven adaptive control. The BLSA method is used for analyzing the stability of adaptive control models without linearizing the adaptive laws. Metrics-driven adaptive control introduces the notion that adaptation should be driven by stability metrics to achieve robustness. By applying the BLSA method, the adaptive gain is adjusted during adaptation in order to meet certain phase margin requirements. Metrics-driven adaptive control is evaluated for a linear damaged twin-engine generic transport aircraft model. The analysis shows that the system with the adjusted adaptive gain becomes more robust to unmodeled dynamics or time delay.
NASA Astrophysics Data System (ADS)
He, Zhibin; Wen, Xiaohu; Liu, Hu; Du, Jun
2014-02-01
Data-driven models are very useful for river flow forecasting when the underlying physical relationships are not fully understood, but it is not clear whether these models still perform well in the small river basins of semiarid mountain regions, which have complicated topography. In this study, the potential of three different data-driven methods, artificial neural network (ANN), adaptive neuro-fuzzy inference system (ANFIS) and support vector machine (SVM), was evaluated for forecasting river flow in the semiarid mountain region of northwestern China. The models analyzed different combinations of antecedent river flow values, and the appropriate input vector was selected based on the analysis of residuals. The performance of the ANN, ANFIS and SVM models in the training and validation sets was compared with the observed data. The model that uses three antecedent flow values was selected as the best-fit model for river flow forecasting. To obtain a more accurate evaluation of the ANN, ANFIS and SVM models, four standard quantitative statistical performance measures, the coefficient of correlation (R), root mean squared error (RMSE), Nash-Sutcliffe efficiency coefficient (NS) and mean absolute relative error (MARE), were employed to evaluate the models developed. The results indicate that the performance obtained by ANN, ANFIS and SVM in terms of the different evaluation criteria does not vary substantially between the training and validation periods; the performance of the ANN, ANFIS and SVM models in river flow forecasting was satisfactory. A detailed comparison of the overall performance indicated that the SVM model performed better than ANN and ANFIS in river flow forecasting for the validation data sets. The results also suggest that the ANN, ANFIS and SVM methods can be successfully applied to establish river flow forecasting models in semiarid mountain regions with complicated topography.
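The model structure compared above, predicting flow at time t from three antecedent flow values, is straightforward to sketch with support vector regression and the RMSE and Nash-Sutcliffe metrics named in the abstract. Flows below are synthetic; the study used observed river flows, and hyperparameters are illustrative.

```python
# SVR river-flow forecast from lagged flows, scored with RMSE and NS.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(8)
q = 5 + np.sin(np.linspace(0, 12 * np.pi, 600)) + rng.normal(0, 0.1, 600)

lags = 3                                          # inputs Q(t-1), Q(t-2), Q(t-3)
X = np.column_stack([q[i:len(q) - lags + i] for i in range(lags)])
y = q[lags:]
split = 450
model = SVR(kernel="rbf", C=10.0).fit(X[:split], y[:split])
pred = model.predict(X[split:])
obs = y[split:]

rmse = np.sqrt(np.mean((obs - pred) ** 2))
ns = 1 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)
print("RMSE = %.3f, NS = %.3f" % (rmse, ns))
```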
NASA Astrophysics Data System (ADS)
Underwood, Kristen L.; Rizzo, Donna M.; Schroth, Andrew W.; Dewoolkar, Mandar M.
2017-12-01
Given the variable biogeochemical, physical, and hydrological processes driving fluvial sediment and nutrient export, the water science and management communities need data-driven methods to identify regions prone to production and transport under variable hydrometeorological conditions. We use Bayesian analysis to segment concentration-discharge linear regression models for total suspended solids (TSS) and particulate and dissolved phosphorus (PP, DP) using 22 years of monitoring data from 18 Lake Champlain watersheds. Bayesian inference was leveraged to estimate segmented regression model parameters and identify threshold position. The identified threshold positions demonstrated a considerable range below and above the median discharge, which has previously been used as the default breakpoint in segmented regression models to discern differences between pre- and post-threshold export regimes. We then applied a Self-Organizing Map (SOM), which partitioned the watersheds into clusters of TSS, PP, and DP export regimes using watershed characteristics as well as the Bayesian regression intercepts and slopes. A SOM defined two clusters of high-flux basins: one in which PP flux was predominantly episodic and hydrologically driven, and another in which sediment and nutrient sourcing and mobilization were more bimodal, resulting from both hydrologic processes at post-threshold discharges and reactive processes (e.g., nutrient cycling or lateral/vertical exchanges of fine sediment) at pre-threshold discharges. A separate DP SOM defined two high-flux clusters exhibiting a bimodal concentration-discharge response but driven by differing land use. Our novel framework shows promise as a tool with broad management application that provides insights into landscape drivers of riverine solute and sediment export.
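The threshold idea behind the segmented concentration-discharge models above can be reduced to a two-segment linear fit whose breakpoint is chosen by profiling the residual sum of squares over candidate discharges; the study's Bayesian treatment additionally yields posterior uncertainty on the breakpoint position. The synthetic data and grid-search estimator in this sketch are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic log-discharge (x) and log-concentration (y), true breakpoint at x = 0.5.
x = np.sort(rng.uniform(-2, 3, 300))
y = np.where(x < 0.5, 0.2 * x, 0.2 * 0.5 + 1.1 * (x - 0.5)) + rng.normal(0, 0.15, x.size)

def fit_segmented(x, y, breakpoint):
    """Continuous two-segment linear fit; returns (RSS, coefficients)."""
    # Basis: intercept, x, and a hinge term max(x - breakpoint, 0).
    X = np.column_stack([np.ones_like(x), x, np.maximum(x - breakpoint, 0.0)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), beta

# Profile the breakpoint over interior quantiles of discharge rather than
# assuming the median, mirroring the finding that thresholds vary widely.
candidates = np.quantile(x, np.linspace(0.1, 0.9, 81))
rss, betas = zip(*(fit_segmented(x, y, b) for b in candidates))
best = int(np.argmin(rss))
print(f"estimated breakpoint = {candidates[best]:.2f}, "
      f"pre/post slopes = {betas[best][1]:.2f}, {betas[best][1] + betas[best][2]:.2f}")
```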
Distributed Sensing and Processing for Multi-Camera Networks
NASA Astrophysics Data System (ADS)
Sankaranarayanan, Aswin C.; Chellappa, Rama; Baraniuk, Richard G.
Sensor networks with large numbers of cameras are becoming increasingly prevalent in a wide range of applications, including video conferencing, motion capture, surveillance, and clinical diagnostics. In this chapter, we identify some of the fundamental challenges in designing such systems: robust statistical inference, computational efficiency, and opportunistic and parsimonious sensing. We show that the geometric constraints induced by the imaging process are extremely useful for identifying and designing optimal estimators for object detection and tracking tasks. We also derive pipelined and parallelized implementations of popular tools used for statistical inference in non-linear systems, of which multi-camera systems are examples. Finally, we highlight the use of the emerging theory of compressive sensing in reducing the amount of data sensed and communicated by a camera network.
Visualizing time-related data in biology, a review
Secrier, Maria; Schneider, Reinhard
2014-01-01
Time is of the essence in biology as in so much else. For example, monitoring disease progression or the timing of developmental defects is important for the processes of drug discovery and therapy trials. Furthermore, an understanding of the basic dynamics of biological phenomena that are often strictly time regulated (e.g. circadian rhythms) is needed to make accurate inferences about the evolution of biological processes. Recent advances in technologies have enabled us to measure timing effects more accurately and in more detail. This has driven related advances in visualization and analysis tools that try to effectively exploit this data. Beyond timeline plots, notable attempts at more involved temporal interpretation have been made in recent years, but awareness of the available resources is still limited within the scientific community. Here, we review some advances in biological visualization of time-driven processes and consider how they aid data analysis and interpretation. PMID:23585583
Inferring microbial interaction networks from metagenomic data using SgLV-EKF algorithm.
Alshawaqfeh, Mustafa; Serpedin, Erchin; Younes, Ahmad Bani
2017-03-27
Inferring the microbial interaction networks (MINs) and modeling their dynamics are critical in understanding the mechanisms of the bacterial ecosystem and designing antibiotic and/or probiotic therapies. Recently, several approaches were proposed to infer MINs using the generalized Lotka-Volterra (gLV) model. A main drawback of these models is that they consider only measurement noise, without accounting for uncertainty in the underlying dynamics. Furthermore, MIN inference is hampered by the limited number of observations and by nonlinearity in the regulatory mechanisms. Therefore, novel estimation techniques are needed to address these challenges. This work proposes SgLV-EKF: a stochastic gLV model that adopts the extended Kalman filter (EKF) algorithm to model the MIN dynamics. In particular, SgLV-EKF employs a stochastic modeling of the MIN by adding a noise term to the dynamical model to compensate for modeling uncertainties. This stochastic modeling is more realistic than the conventional gLV model, which assumes that the MIN dynamics are perfectly governed by the gLV equations. After specifying the stochastic model structure, we propose the EKF to estimate the MIN. SgLV-EKF was compared with two similarity-based algorithms, one algorithm from the integral-based family, and two regression-based algorithms, in terms of the achieved performance on two synthetic data sets and two real data sets. The first data set models randomness in the measurement data, whereas the second data set incorporates uncertainties in the underlying dynamics. The real data sets are provided by a recent study pertaining to an antibiotic-mediated Clostridium difficile infection. The experimental results demonstrate that SgLV-EKF outperforms the alternative methods in terms of robustness to measurement noise, modeling errors, and tracking the dynamics of the MIN. Performance analysis demonstrates that the proposed SgLV-EKF algorithm represents a powerful and reliable tool to infer MINs and track their dynamics.
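A minimal sketch of the EKF machinery in the spirit of SgLV-EKF is given below: two-species gLV dynamics are Euler-discretized, a process-noise covariance represents model uncertainty, and noisy abundance observations are assimilated at each step. The parameter values, noise covariances, and direct observation model are assumptions for illustration, not values from the paper.

```python
import numpy as np

# Two-species gLV parameters (illustrative values, not from the paper).
r = np.array([0.8, -0.4])                  # intrinsic growth rates
A = np.array([[-0.9, -0.3], [0.5, -0.7]])  # interaction matrix
dt, Q, R = 0.05, 1e-4 * np.eye(2), 1e-2 * np.eye(2)

def glv_step(x):
    return x + dt * x * (r + A @ x)        # Euler-discretized gLV dynamics

def glv_jacobian(x):
    # d/dx of the Euler map: I + dt * (diag(r + A x) + diag(x) A)
    return np.eye(2) + dt * (np.diag(r + A @ x) + np.diag(x) @ A)

rng = np.random.default_rng(2)
x_true, x_est, P = np.array([0.5, 0.3]), np.array([0.4, 0.4]), 0.1 * np.eye(2)
for _ in range(400):
    # Truth evolves with process noise (the "stochastic" part of SgLV).
    x_true = glv_step(x_true) + rng.multivariate_normal(np.zeros(2), Q)
    z = x_true + rng.multivariate_normal(np.zeros(2), R)  # noisy abundances
    # EKF predict
    F = glv_jacobian(x_est)
    x_est, P = glv_step(x_est), F @ P @ F.T + Q
    # EKF update (direct observation: H = I)
    K = P @ np.linalg.inv(P + R)
    x_est, P = x_est + K @ (z - x_est), (np.eye(2) - K) @ P
print("true:", np.round(x_true, 3), " estimated:", np.round(x_est, 3))
```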
Reasoning and Knowledge Acquisition Framework for 5G Network Analytics.
Sotelo Monge, Marco Antonio; Maestre Vidal, Jorge; García Villalba, Luis Javier
2017-10-21
Autonomic self-management is a key challenge for next-generation networks. This paper proposes an automated analysis framework to infer knowledge in 5G networks with the aim to understand the network status and to predict potential situations that might disrupt the network operability. The framework is based on the Endsley situational awareness model, and integrates automated capabilities for metrics discovery, pattern recognition, prediction techniques and rule-based reasoning to infer anomalous situations in the current operational context. Those situations should then be mitigated, either proactively or reactively, by a more complex decision-making process. The framework is driven by a use case methodology, where the network administrator is able to customize the knowledge inference rules and operational parameters. The proposal has also been instantiated to prove its adaptability to a real use case. To this end, a reference network traffic dataset was used to identify suspicious patterns and to predict the behavior of the monitored data volume. The preliminary results suggest a good level of accuracy on the inference of anomalous traffic volumes based on a simple configuration. PMID:29065473
Data-Driven Robust Control Design: Unfalsified Control
2006-12-01
Agency as Inference: Toward a Critical Theory of Knowledge Objectification
ERIC Educational Resources Information Center
Gutiérrez, José Francisco
2013-01-01
This article evaluates the plausibility of synthesizing theory of knowledge objectification (Radford, 2003) with equity research on mathematics education. I suggest the cognitive phenomenon of mathematical inference as a promising locus for investigating the types of agency that equity-driven scholars often care for. In particular, I conceptualize…
Data mining and statistical inference in selective laser melting
Kamath, Chandrika
2016-01-11
Selective laser melting (SLM) is an additive manufacturing process that builds a complex three-dimensional part, layer-by-layer, using a laser beam to fuse fine metal powder together. The design freedom afforded by SLM comes associated with complexity. As the physical phenomena occur over a broad range of length and time scales, the computational cost of modeling the process is high. At the same time, the large number of parameters that control the quality of a part make experiments expensive. In this paper, we describe ways in which we can use data mining and statistical inference techniques to intelligently combine simulations and experiments to build parts with desired properties. We start with a brief summary of prior work in finding process parameters for high-density parts. We then expand on this work to show how we can improve the approach by using feature selection techniques to identify important variables, data-driven surrogate models to reduce computational costs, improved sampling techniques to cover the design space adequately, and uncertainty analysis for statistical inference. Here, our results indicate that techniques from data mining and statistics can complement those from physical modeling to provide greater insight into complex processes such as selective laser melting.
NASA Astrophysics Data System (ADS)
Sheng, J. X.; Jacob, D.; Turner, A. J.; Maasakkers, J. D.; Benmergui, J. S.; Bloom, A. A.; Arndt, C.; Gautam, R.; Zavala Araiza, D.; Hamburg, S.; Boesch, H.; Parker, R.
2017-12-01
We use six years (2010-2015) of methane column data from the GOSAT satellite to examine trends in atmospheric methane over North America and infer trends in emissions. Local methane enhancements above background are diagnosed in the GOSAT data on a 0.5° × 0.5° grid by estimating the local background as the low (10th-25th) quantile of the deseasonalized frequency distributions of the data for individual years. Trends in methane enhancements on the 0.5° × 0.5° grid are then aggregated nationally and for individual source sectors, using information from state-of-science bottom-up inventories, to increase statistical power. We infer that US methane emissions increased by 1.9% a⁻¹ over the six-year period, with contributions from both oil/gas systems (possibly unconventional gas production) and from livestock in the Midwest (possibly swine production). Mexican emissions show a decrease that can be attributed to a decreasing cattle population. Canadian emissions show interannual variability driven by wetlands emissions and correlated with wetland areal extent. The US emission trends inferred from the GOSAT data are within the constraint provided by surface observations from the North American Carbon Program network.
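The background-estimation step described above can be sketched as follows: for each year, the local background is a low quantile of the deseasonalized column distribution and the enhancement is the residual above it. The synthetic columns, the fixed 15th percentile within the stated 10th-25th range, and the simple day-of-year climatology are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
days = np.arange(365 * 6)
years = days // 365

# Synthetic methane columns (ppb): background + trend + seasonal cycle + local source.
seasonal = 15 * np.sin(2 * np.pi * days / 365)
columns = 1800 + 1.9 * (days / 365) + seasonal + rng.exponential(8, days.size)

# Deseasonalize by removing a day-of-year climatology.
doy = days % 365
clim = np.array([columns[doy == d].mean() for d in range(365)])
deseason = columns - clim[doy] + clim.mean()

# Enhancement above background, per year: background = low (here 15th) quantile.
for yr in range(6):
    sel = deseason[years == yr]
    background = np.quantile(sel, 0.15)
    print(f"year {yr}: background {background:7.1f} ppb, "
          f"mean enhancement {np.maximum(sel - background, 0).mean():5.1f} ppb")
```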
Schörgendorfer, Angela; Branscum, Adam J; Hanson, Timothy E
2013-06-01
Logistic regression is a popular tool for risk analysis in medical and population health science. With continuous response data, it is common to create a dichotomous outcome for logistic regression analysis by specifying a threshold for positivity. Fitting a linear regression to the nondichotomized response variable assuming a logistic sampling model for the data has been empirically shown to yield more efficient estimates of odds ratios than ordinary logistic regression of the dichotomized endpoint. We illustrate that risk inference is not robust to departures from the parametric logistic distribution. Moreover, the model assumption of proportional odds is generally not satisfied when the condition of a logistic distribution for the data is violated, leading to biased inference from a parametric logistic analysis. We develop novel Bayesian semiparametric methodology for testing goodness of fit of parametric logistic regression with continuous measurement data. The testing procedures hold for any cutoff threshold and our approach simultaneously provides the ability to perform semiparametric risk estimation. Bayes factors are calculated using the Savage-Dickey ratio for testing the null hypothesis of logistic regression versus a semiparametric generalization. We propose a fully Bayesian and a computationally efficient empirical Bayesian approach to testing, and we present methods for semiparametric estimation of risks, relative risks, and odds ratios when parametric logistic regression fails. Theoretical results establish the consistency of the empirical Bayes test. Results from simulated data show that the proposed approach provides accurate inference irrespective of whether parametric assumptions hold or not. Evaluation of risk factors for obesity shows that different inferences are derived from an analysis of a real data set when deviations from a logistic distribution are permissible in a flexible semiparametric framework. © 2013, The International Biometric Society.
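The efficiency point above can be illustrated with a small simulation. When the continuous response truly has logistic errors around a linear mean, P(Y > c | x) = expit((b0 + b1·x − c)/s), so the log odds ratio per unit x is b1/s and can be recovered from an ordinary linear fit plus a moment-based scale estimate. The sketch below, with invented parameter values and none of the paper's Bayesian semiparametric machinery, contrasts this with logistic regression on the dichotomized endpoint.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n, beta0, beta1, scale, cutoff = 500, 1.0, 0.8, 1.5, 2.0
x = rng.normal(size=n)
# Continuous response with logistic errors: Y = b0 + b1*x + scale * logistic noise.
y = beta0 + beta1 * x + scale * rng.logistic(size=n)

# Route 1: logistic regression on the dichotomized endpoint 1{Y > cutoff}.
logit_fit = sm.Logit((y > cutoff).astype(int), sm.add_constant(x)).fit(disp=0)

# Route 2: linear fit to Y; under the logistic sampling model the
# log-OR per unit x is b1/scale, for any cutoff c.
ols_fit = sm.OLS(y, sm.add_constant(x)).fit()
scale_hat = ols_fit.resid.std(ddof=2) * np.sqrt(3) / np.pi  # logistic sd = scale*pi/sqrt(3)

print(f"true log-OR: {beta1 / scale:.3f}")
print(f"dichotomized logistic: {logit_fit.params[1]:.3f}")
print(f"linear-logistic:       {ols_fit.params[1] / scale_hat:.3f}")
```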
Comment on "Ducklings imprint on the relational concept of 'same or different'".
Hupé, Jean-Michel
2017-02-24
Martinho and Kacelnik's (Reports, 15 July 2016, p. 286) finding that mallard ducklings can deal with abstract concepts is important for understanding the evolution of cognition. However, a statistically more robust analysis of the data calls their conclusions into question. This example brings to light the risk of drawing too strong an inference by relying solely on P values. Copyright © 2017, American Association for the Advancement of Science.
Misra, Sudip; Singh, Ranjit; Rohith Mohan, S. V.
2010-01-01
The proposed mechanism for jamming attack detection for wireless sensor networks is novel in three respects: firstly, it upgrades the jammer model to include versatile military jammers; secondly, it graduates from the existing node-centric detection system to a network-centric system, making it robust and economical at the nodes; and thirdly, it tackles the problem through a fuzzy inference system, as the decision regarding the intensity of jamming is seldom crisp. The system, with its high robustness, its ability to grade nodes with jamming indices, and its true-detection rate as high as 99.8%, is worthy of consideration for information warfare defense purposes. PMID:22319307
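A toy version of such a fuzzy jamming detector is sketched below: crisp measurements (packet delivery ratio and received signal strength, both hypothetical inputs) are fuzzified with piecewise-linear membership functions, a small Mamdani-style rule base is evaluated with min/max operators, and a jamming index is produced by centroid defuzzification. The breakpoints and rules are invented for illustration and do not reproduce the paper's system.

```python
import numpy as np

def ramp_up(x, a, b):
    """Membership rising from 0 at a to 1 at b."""
    return np.clip((x - a) / (b - a), 0.0, 1.0)

def ramp_down(x, a, b):
    """Membership falling from 1 at a to 0 at b."""
    return np.clip((b - x) / (b - a), 0.0, 1.0)

def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def jamming_index(pdr, rss):
    # Fuzzify crisp inputs (hypothetical breakpoints on [0, 1] scales).
    pdr_low, pdr_high = ramp_down(pdr, 0.2, 0.6), ramp_up(pdr, 0.4, 0.8)
    rss_low, rss_high = ramp_down(rss, 0.2, 0.5), ramp_up(rss, 0.5, 0.8)
    # Mamdani rule base, min as AND: strong signal plus heavy loss => jamming.
    fire_high = min(pdr_low, rss_high)   # classic jamming signature
    fire_med = min(pdr_low, rss_low)     # losses without strong interference
    fire_low = float(pdr_high)           # healthy link
    # Clip output sets by rule strength, aggregate by max, defuzzify by centroid.
    u = np.linspace(0.0, 1.0, 101)
    out = np.maximum.reduce([
        np.minimum(fire_low, ramp_down(u, 0.2, 0.4)),
        np.minimum(fire_med, tri(u, 0.2, 0.5, 0.8)),
        np.minimum(fire_high, ramp_up(u, 0.6, 0.8)),
    ])
    return float((u * out).sum() / out.sum())

print(f"jamming index = {jamming_index(pdr=0.15, rss=0.9):.2f}")  # high index expected
```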
Real-Time Detection and Tracking of Multiple People in Laser Scan Frames
NASA Astrophysics Data System (ADS)
Cui, J.; Song, X.; Zhao, H.; Zha, H.; Shibasaki, R.
This chapter presents an approach to detect and track multiple people robustly in real time using laser scan frames. The detection and tracking of people in real time is a problem that arises in a variety of different contexts. Examples include intelligent surveillance for security purposes, scene analysis for service robots, and crowd behavior analysis for human behavior study. Over the last several years, an increasing number of laser-based people-tracking systems have been developed on both mobile robotics platforms and fixed platforms using one or multiple laser scanners. It has been shown that processing laser scanner data makes the tracker much faster and more robust than a vision-only based one in complex situations. In this chapter, we present a novel robust tracker to detect and track multiple people in a crowded and open area in real time. First, raw data are obtained that measure the two legs of each person at a height of 16 cm above the horizontal ground, using multiple registered laser scanners. A stable feature is extracted using the accumulated distribution of successive laser frames. In this way, the noise that generates split and merged measurements is smoothed well, and the pattern of rhythmically swinging legs is utilized to extract each leg. Second, a probabilistic tracking model is presented, and then a sequential inference process using a Bayesian rule is described. A sequential inference process is difficult to compute analytically, so two strategies are presented to simplify the computation. In the case of independent tracking, the Kalman filter is used with a more efficient measurement likelihood model based on a region coherency property. Finally, to deal with trajectory fragments, we present a concise approach to fuse a small amount of visual information from a synchronized video camera with the laser data. Evaluation with real data shows that the proposed method is robust and effective. It achieves a significant improvement compared with existing laser-based trackers.
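For the independent-tracking case mentioned above, the essential predict/update cycle is that of a constant-velocity Kalman filter over 2-D leg-center positions, sketched below; the motion and noise parameters are invented, and the chapter's region-coherency measurement likelihood is not reproduced.

```python
import numpy as np

dt = 0.1  # laser frame interval (s), assumed
# Constant-velocity model: state [x, y, vx, vy], observe position only.
F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]], float)
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)
Q = 0.05 * np.eye(4)   # process noise: people change speed and direction
R = 0.02 * np.eye(2)   # measurement noise of the extracted leg centers

rng = np.random.default_rng(5)
truth = np.array([0.0, 0.0, 1.2, 0.4])          # a walking person
state, P = np.zeros(4), np.eye(4)
for _ in range(50):
    truth = F @ truth
    z = H @ truth + rng.multivariate_normal(np.zeros(2), R)
    # Predict
    state, P = F @ state, F @ P @ F.T + Q
    # Update
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    state, P = state + K @ (z - H @ state), (np.eye(4) - K @ H) @ P
print("true pos:", np.round(truth[:2], 2), " tracked pos:", np.round(state[:2], 2))
```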
Leveraging the Polar Cap: Ground-Based Measurements of the Solar Wind
NASA Astrophysics Data System (ADS)
Urban, K. D.; Gerrard, A. J.; Weatherwax, A. T.; Lanzerotti, L. J.; Patterson, J. D.
2016-12-01
In this study, we identify relationships between solar wind quantities that have previously been shown to have direct access into the very high-latitude polar cap as measured by ground-based riometers and magnetometers in Antarctica: ultra-low frequency (ULF) power in the interplanetary magnetic field (IMF) Bz component and solar energetic proton (SEP) flux (Urban [2016] and Patterson et al. [2001], respectively). It is shown that such solar wind and ground-based observations can be used to infer the hydromagnetic structure and magnetospheric mapping of the polar cap region in a data-driven manner, and that high-latitude ground-based instrumentation can be used to concurrently infer various state parameters of the geospace environment.
A review of group ICA for fMRI data and ICA for joint inference of imaging, genetic, and ERP data.
Calhoun, Vince D; Liu, Jingyu; Adali, Tülay
2009-03-01
Independent component analysis (ICA) has become an increasingly utilized approach for analyzing brain imaging data. In contrast to the widely used general linear model (GLM) that requires the user to parameterize the data (e.g. the brain's response to stimuli), ICA, by relying upon a general assumption of independence, allows the user to be agnostic regarding the exact form of the response. In addition, ICA is intrinsically a multivariate approach, and hence each component provides a grouping of brain activity into regions that share the same response pattern thus providing a natural measure of functional connectivity. There are a wide variety of ICA approaches that have been proposed, in this paper we focus upon two distinct methods. The first part of this paper reviews the use of ICA for making group inferences from fMRI data. We provide an overview of current approaches for utilizing ICA to make group inferences with a focus upon the group ICA approach implemented in the GIFT software. In the next part of this paper, we provide an overview of the use of ICA to combine or fuse multimodal data. ICA has proven particularly useful for data fusion of multiple tasks or data modalities such as single nucleotide polymorphism (SNP) data or event-related potentials. As demonstrated by a number of examples in this paper, ICA is a powerful and versatile data-driven approach for studying the brain. PMID:19059344
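A minimal illustration of the decomposition underlying ICA, assuming scikit-learn's FastICA and synthetic signals standing in for component time courses; the group-level temporal concatenation and back-reconstruction steps of the GIFT approach are beyond this sketch.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(6)
t = np.linspace(0, 10, 1000)
# Synthetic "component time courses": task-like block signal, slow drift, artifact.
sources = np.column_stack([
    np.sign(np.sin(2 * np.pi * 0.5 * t)),   # block-design response
    np.sin(2 * np.pi * 0.08 * t),           # slow physiological drift
    rng.laplace(size=t.size),               # non-Gaussian "artifact"
])
mixing = rng.normal(size=(3, 3))            # unknown mixing
observed = sources @ mixing.T               # what the scanner "sees"

# ICA needs no parametric model of the response: it recovers the sources
# (up to order and sign) from statistical independence alone.
ica = FastICA(n_components=3, random_state=0)
recovered = ica.fit_transform(observed)

# Match recovered components to true sources by absolute correlation.
corr = np.abs(np.corrcoef(sources.T, recovered.T)[:3, 3:])
print(np.round(corr.max(axis=1), 3))  # each near 1.0 => sources recovered
```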
Highly efficient and robust molecular ruthenium catalysts for water oxidation.
Duan, Lele; Araujo, Carlos Moyses; Ahlquist, Mårten S G; Sun, Licheng
2012-09-25
Water oxidation catalysts are essential components of light-driven water splitting systems, which could convert water to H(2) driven by solar radiation (H(2)O + hν → 1/2O(2) + H(2)). The oxidation of water (H(2)O → 1/2O(2) + 2H(+) + 2e(-)) provides protons and electrons for the production of dihydrogen (2H(+) + 2e(-) → H(2)), a clean-burning and high-capacity energy carrier. One of the obstacles now is the lack of effective and robust water oxidation catalysts. Aiming at developing robust molecular Ru-bda (H(2)bda = 2,2'-bipyridine-6,6'-dicarboxylic acid) water oxidation catalysts, we carried out density functional theory studies, correlated the robustness of catalysts against hydration with the highest occupied molecular orbital levels of a set of ligands, and successfully directed the synthesis of robust Ru-bda water oxidation catalysts. A series of mononuclear ruthenium complexes [Ru(bda)L(2)] (L = pyridazine, pyrimidine, and phthalazine) were subsequently synthesized and shown to effectively catalyze Ce(IV)-driven [Ce(IV) = Ce(NH(4))(2)(NO(3))(6)] water oxidation with high oxygen production rates up to 286 s(-1) and high turnover numbers up to 55,400.
Inference on the Ranks of the Canonical Correlation Matrices for Elliptically Symmetric Populations.
1985-05-01
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lu, Ning; Du, Pengwei; Greitzer, Frank L.
2012-12-31
This paper presents the multi-layer, data-driven advanced reasoning tool (M-DART), a proof-of-principle decision support tool for improved power system operation. M-DART will cross-correlate and examine different data sources to assess anomalies, infer root causes, and anneal data into actionable information. By performing higher-level reasoning "triage" of diverse data sources, M-DART focuses on early detection of emerging power system events and identifies highest priority actions for the human decision maker. M-DART represents a significant advancement over today's grid monitoring technologies that apply offline analyses to derive model-based guidelines for online real-time operations and use isolated data processing mechanisms focusing on individual data domains. The development of the M-DART will bridge these gaps by reasoning about results obtained from multiple data sources that are enabled by the smart grid infrastructure. This hybrid approach integrates a knowledge base that is trained offline but tuned online to capture model-based relationships while revealing complex causal relationships among data from different domains.
Chakraborty, Arindom
2016-12-01
A common objective in longitudinal studies is to characterize the relationship between a longitudinal response process and time-to-event data. The ordinal nature of the response and possible missing information on covariates add complications to the joint model. In such circumstances, influential observations often present in the data may upset the analysis. In this paper, a joint model based on an ordinal partial mixed model and an accelerated failure time model is used to account for the repeated ordered response and the time-to-event data, respectively. We propose an influence function-based robust estimation method, with a Monte Carlo expectation-maximization algorithm used for parameter estimation. A detailed simulation study has been done to evaluate the performance of the proposed method. As an application, data on muscular dystrophy among children are used. Robust estimates are then compared with classical maximum likelihood estimates. © The Author(s) 2014.
Robust regression for large-scale neuroimaging studies.
Fritsch, Virgile; Da Mota, Benoit; Loth, Eva; Varoquaux, Gaël; Banaschewski, Tobias; Barker, Gareth J; Bokde, Arun L W; Brühl, Rüdiger; Butzek, Brigitte; Conrod, Patricia; Flor, Herta; Garavan, Hugh; Lemaitre, Hervé; Mann, Karl; Nees, Frauke; Paus, Tomas; Schad, Daniel J; Schümann, Gunter; Frouin, Vincent; Poline, Jean-Baptiste; Thirion, Bertrand
2015-05-01
Multi-subject datasets used in neuroimaging group studies have a complex structure, as they exhibit non-stationary statistical properties across regions and display various artifacts. While studies with small sample sizes can rarely be shown to deviate from standard hypotheses (such as the normality of the residuals) due to the poor sensitivity of normality tests with low degrees of freedom, large-scale studies (e.g. >100 subjects) exhibit more obvious deviations from these hypotheses and call for more refined models for statistical inference. Here, we demonstrate the benefits of robust regression as a tool for analyzing large neuroimaging cohorts. First, we use an analytic test based on robust parameter estimates; based on simulations, this procedure is shown to provide an accurate statistical control without resorting to permutations. Second, we show that robust regression yields more detections than standard algorithms using as an example an imaging genetics study with 392 subjects. Third, we show that robust regression can avoid false positives in a large-scale analysis of brain-behavior relationships with over 1500 subjects. Finally we embed robust regression in the Randomized Parcellation Based Inference (RPBI) method and demonstrate that this combination further improves the sensitivity of tests carried out across the whole brain. Altogether, our results show that robust procedures provide important advantages in large-scale neuroimaging group studies. Copyright © 2015 Elsevier Inc. All rights reserved.
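As a pointer to practice, robust regression of the kind advocated above is available off the shelf. The sketch below contrasts OLS with a Huber M-estimator fit on data contaminated by a few gross outliers, using statsmodels' RLM; the simulated data are an illustrative assumption, not the paper's neuroimaging analyses.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 200
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(scale=0.5, size=n)
y[:10] += rng.normal(loc=8, scale=2, size=10)   # ~5% gross artifacts

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()
rlm = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()  # Huber M-estimator

print(f"true slope 0.5 | OLS: {ols.params[1]:.3f} | robust: {rlm.params[1]:.3f}")
```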
Reconciling differences in stratospheric ozone composites
NASA Astrophysics Data System (ADS)
Ball, William T.; Alsing, Justin; Mortlock, Daniel J.; Rozanov, Eugene V.; Tummon, Fiona; Haigh, Joanna D.
2017-10-01
Observations of stratospheric ozone from multiple instruments now span three decades; combining these into composite datasets allows long-term ozone trends to be estimated. Recently, several ozone composites have been published, but trends disagree by latitude and altitude, even between composites built upon the same instrument data. We confirm that the main causes of differences in decadal trend estimates lie in (i) steps in the composite time series when the instrument source data changes and (ii) artificial sub-decadal trends in the underlying instrument data. These artefacts introduce features that can alias with regressors in multiple linear regression (MLR) analysis; both can lead to inaccurate trend estimates. Here, we aim to remove these artefacts using Bayesian methods to infer the underlying ozone time series from a set of composites by building a joint-likelihood function using a Gaussian-mixture density to model outliers introduced by data artefacts, together with a data-driven prior on ozone variability that incorporates knowledge of problems during instrument operation. We apply this Bayesian self-calibration approach to stratospheric ozone in 10° bands from 60° S to 60° N and from 46 to 1 hPa (~21-48 km) for 1985-2012. There are two main outcomes: (i) we independently identify and confirm many of the data problems previously identified, but which remain unaccounted for in existing composites; (ii) we construct an ozone composite, with uncertainties, that is free from most of these problems - we call this the BAyeSian Integrated and Consolidated (BASIC) composite. To analyse the new BASIC composite, we use dynamical linear modelling (DLM), which provides a more robust estimate of long-term changes through Bayesian inference than MLR. BASIC and DLM, together, provide a step forward in improving estimates of decadal trends. Our results indicate a significant recovery of ozone since 1998 in the upper stratosphere, of both northern and southern midlatitudes, in all four composites analysed, and particularly in the BASIC composite. The BASIC results also show no hemispheric difference in the recovery at midlatitudes, in contrast to an apparent feature that is present, but not consistent, in the four composites. Our overall conclusion is that it is possible to effectively combine different ozone composites and account for artefacts and drifts, and that this leads to a clear and significant result that upper stratospheric ozone levels have increased since 1998, following an earlier decline.
MOST: most-similar ligand based approach to target prediction.
Huang, Tao; Mi, Hong; Lin, Cheng-Yuan; Zhao, Ling; Zhong, Linda L D; Liu, Feng-Bin; Zhang, Ge; Lu, Ai-Ping; Bian, Zhao-Xiang
2017-03-11
Many computational approaches have been used for target prediction, including machine learning, reverse docking, bioactivity spectra analysis, and chemical similarity searching. Recent studies have suggested that chemical similarity searching may be driven by the most-similar ligand. However, the extent of bioactivity of most-similar ligands has been oversimplified or even neglected in these studies, and this has impaired the prediction power. Here we propose the MOst-Similar ligand-based Target inference approach, namely MOST, which uses fingerprint similarity and explicit bioactivity of the most-similar ligands to predict targets of the query compound. Performance of MOST was evaluated by using combinations of different fingerprint schemes, machine learning methods, and bioactivity representations. In sevenfold cross-validation with a benchmark Ki dataset from CHEMBL release 19 containing 61,937 bioactivity data of 173 human targets, MOST achieved high average prediction accuracy (0.95 for pKi ≥ 5, and 0.87 for pKi ≥ 6). Morgan fingerprint was shown to be slightly better than FP2. Logistic Regression and Random Forest methods performed better than Naïve Bayes. In a temporal validation, the Ki dataset from CHEMBL19 was used to train models and predict the bioactivity of newly deposited ligands in CHEMBL20. MOST also performed well with high accuracy (0.90 for pKi ≥ 5, and 0.76 for pKi ≥ 6), when Logistic Regression and Morgan fingerprint were employed. Furthermore, the p values associated with explicit bioactivity were found to be a robust index for removing false positive predictions. Implicit bioactivity did not offer this capability. Finally, p values generated with Logistic Regression, Morgan fingerprint and explicit activity were integrated with a false discovery rate (FDR) control procedure to reduce false positives in the multiple-target prediction scenario, and the success of this strategy was demonstrated with the case of fluanisone. In the case of aloe-emodin's laxative effect, MOST predicted that acetylcholinesterase was the mechanism-of-action target; in vivo studies validated this prediction. Using the MOST approach can result in highly accurate and robust target prediction. Integrated with an FDR control procedure, MOST provides a reliable framework for multiple-target inference. It has prospective applications in drug repurposing and mechanism-of-action target prediction.
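The core most-similar-ligand step can be sketched with RDKit: compute Morgan fingerprints for a reference library annotated with targets and bioactivities, then return the annotation of the library ligand most similar to the query by Tanimoto similarity. The tiny SMILES library and pKi values below are invented, and MOST's logistic-regression scoring and FDR control are not reproduced.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

# Hypothetical reference library: (SMILES, annotated target, pKi).
library = [
    ("CC(=O)Oc1ccccc1C(=O)O", "COX-1", 5.2),      # aspirin-like
    ("CN1CCC[C@H]1c1cccnc1", "nAChR", 7.1),       # nicotine-like
    ("CC(C)Cc1ccc(cc1)C(C)C(=O)O", "COX-2", 5.8), # ibuprofen-like
]

def fingerprint(smiles):
    mol = Chem.MolFromSmiles(smiles)
    return AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)

def most_similar_target(query_smiles):
    """Return (Tanimoto similarity, target, pKi) of the most-similar library ligand."""
    qfp = fingerprint(query_smiles)
    scored = [(DataStructs.TanimotoSimilarity(qfp, fingerprint(s)), t, pki)
              for s, t, pki in library]
    return max(scored)

sim, target, pki = most_similar_target("CC(=O)Oc1ccccc1C(=O)OC")  # methyl-ester query
print(f"most-similar ligand: Tanimoto={sim:.2f}, predicted target={target}, pKi={pki}")
```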
Observation of ionization fronts in low density foam targets
NASA Astrophysics Data System (ADS)
Hoarty, D.; Willi, O.; Barringer, L.; Vickers, C.; Watt, R.; Nazarov, W.
1999-05-01
Ionization fronts have been observed in low density chlorinated foam targets and low density foams confined in gold tubes using time resolved K-shell absorption spectroscopy. The front was driven by an intense pulse of soft x-rays produced by high power laser irradiation. The density and temperature profiles inferred from the radiographs provided detailed measurement of the conditions. The experimental data were compared to radiation hydrodynamics simulations and reasonable agreement was obtained.
Estimating mountain basin-mean precipitation from streamflow using Bayesian inference
NASA Astrophysics Data System (ADS)
Henn, Brian; Clark, Martyn P.; Kavetski, Dmitri; Lundquist, Jessica D.
2015-10-01
Estimating basin-mean precipitation in complex terrain is difficult due to uncertainty in the topographical representativeness of precipitation gauges relative to the basin. To address this issue, we use Bayesian methodology coupled with a multimodel framework to infer basin-mean precipitation from streamflow observations, and we apply this approach to snow-dominated basins in the Sierra Nevada of California. Using streamflow observations, forcing data from lower-elevation stations, the Bayesian Total Error Analysis (BATEA) methodology and the Framework for Understanding Structural Errors (FUSE), we infer basin-mean precipitation, and compare it to basin-mean precipitation estimated using topographically informed interpolation from gauges (PRISM, the Parameter-elevation Regression on Independent Slopes Model). The BATEA-inferred spatial patterns of precipitation show agreement with PRISM in terms of the rank of basins from wet to dry but differ in absolute values. In some of the basins, these differences may reflect biases in PRISM, because some implied PRISM runoff ratios may be inconsistent with the regional climate. We also infer annual time series of basin precipitation using a two-step calibration approach. Assessment of the precision and robustness of the BATEA approach suggests that uncertainty in the BATEA-inferred precipitation is primarily related to uncertainties in hydrologic model structure. Despite these limitations, time series of inferred annual precipitation under different model and parameter assumptions are strongly correlated with one another, suggesting that this approach is capable of resolving year-to-year variability in basin-mean precipitation.
The efficacy of respondent-driven sampling for the health assessment of minority populations.
Badowski, Grazyna; Somera, Lilnabeth P; Simsiman, Brayan; Lee, Hye-Ryeon; Cassel, Kevin; Yamanaka, Alisha; Ren, JunHao
2017-10-01
Respondent-driven sampling (RDS) is a relatively new network sampling technique typically employed for hard-to-reach populations. Like snowball sampling, initial respondents or "seeds" recruit additional respondents from their network of friends. Under certain assumptions, the method promises to produce a sample independent from the biases that may have been introduced by the non-random choice of "seeds." We conducted a survey on health communication in Guam's general population using the RDS method, the first survey to utilize this methodology in Guam. It was conducted in hopes of identifying a cost-efficient non-probability sampling strategy that could generate reasonable population estimates for both minority and general populations. RDS data were collected in Guam in 2013 (n=511) and population estimates were compared with 2012 BRFSS data (n=2031) and the 2010 census data. The estimates were calculated using the unweighted RDS sample and the weighted sample using RDS inference methods, and compared with known population characteristics. The sample size was reached in 23 days, providing evidence that the RDS method is a viable, cost-effective data collection method which can provide reasonable population estimates. However, the results also suggest that the RDS inference methods used to reduce bias, based on self-reported estimates of network sizes, may not always work. Caution is needed when interpreting RDS study findings. For a more diverse sample, data collection should not be conducted in just one location. Fewer questions about network estimates should be asked, and more careful consideration should be given to the kind of incentives offered to participants. Copyright © 2017. Published by Elsevier Ltd.
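The inference-method caveat above concerns estimators such as RDS-II (Volz-Heckathorn), which reweight respondents inversely to their self-reported network size. A minimal version with invented data is sketched below; note that uniform degree misreporting cancels out of the weights, while differential misreporting does not.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 511
# Hypothetical respondents: self-reported network size and a binary health indicator.
degree = rng.integers(1, 50, size=n).astype(float)
indicator = rng.binomial(1, p=np.clip(0.2 + 0.004 * degree, 0, 1))  # correlates with degree

# RDS-II (Volz-Heckathorn) estimator: inverse-degree weights correct for the
# higher inclusion probability of well-connected respondents.
w = 1.0 / degree
rds2 = np.sum(w * indicator) / np.sum(w)
naive = indicator.mean()
print(f"naive sample proportion: {naive:.3f}, RDS-II estimate: {rds2:.3f}")

# If all respondents overstated their degree by the same factor, the weights
# would rescale uniformly and rds2 would be unchanged; differential
# misreporting is what biases the estimator.
```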
Lemke, R. W.; Dolan, D. H.; Dalton, D. G.; ...
2016-01-07
We report on a new technique for obtaining off-Hugoniot pressure vs. density data for solid metals compressed to extreme pressure by a magnetically driven liner implosion on the Z-machine (Z) at Sandia National Laboratories. In our experiments, the liner comprises inner and outer metal tubes. The inner tube is composed of a sample material (e.g., Ta and Cu) whose compressed state is to be inferred. The outer tube is composed of Al and serves as the current carrying cathode. Another aluminum liner at much larger radius serves as the anode. A shaped current pulse quasi-isentropically compresses the sample as it implodes. The iterative method used to infer pressure vs. density requires two velocity measurements. Photonic Doppler velocimetry probes measure the implosion velocity of the free (inner) surface of the sample material and the explosion velocity of the anode free (outer) surface. These two velocities are used in conjunction with magnetohydrodynamic simulation and mathematical optimization to obtain the current driving the liner implosion, and to infer pressure and density in the sample through maximum compression. This new equation of state calibration technique is illustrated using a simulated experiment with a Cu sample. Monte Carlo uncertainty quantification of synthetic data establishes convergence criteria for experiments. Results are presented from experiments with Al/Ta, Al/Cu, and Al liners. Symmetric liner implosion with quasi-isentropic compression to peak pressure ~1000 GPa is achieved in all cases. Lastly, these experiments exhibit unexpectedly softer behavior above 200 GPa, which we conjecture is related to differences in the actual and modeled properties of aluminum.
Northern Russian chironomid-based modern summer temperature data set and inference models
NASA Astrophysics Data System (ADS)
Nazarova, Larisa; Self, Angela E.; Brooks, Stephen J.; van Hardenbroek, Maarten; Herzschuh, Ulrike; Diekmann, Bernhard
2015-11-01
West and East Siberian data sets and 55 new sites were merged based on their high taxonomic similarity and on the strong relationship between mean July air temperature and the distribution of chironomid taxa in both data sets compared with other environmental parameters. Multivariate statistical analysis of chironomid and environmental data from the combined data set, consisting of 268 lakes located in northern Russia, suggests that mean July air temperature explains the greatest amount of variance in chironomid distribution compared with the other measured variables (latitude, longitude, altitude, water depth, lake surface area, pH, conductivity, mean January air temperature, mean July air temperature, and continentality). We established two robust inference models to reconstruct mean summer air temperatures from subfossil chironomids based on ecological and geographical approaches. The North Russian 2-component WA-PLS model (RMSEP_jack = 1.35 °C, r²_jack = 0.87) can be recommended for application in palaeoclimatic studies in northern Russia. Based on the distinctive chironomid fauna and climatic regime of Kamchatka, the Far East 2-component WA-PLS model (RMSEP_jack = 1.3 °C, r²_jack = 0.81) has potentially better applicability in Kamchatka.
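The simplest member of the transfer-function family used here is plain weighted averaging (WA), a stripped-down relative of WA-PLS: taxon optima are abundance-weighted means of calibration-lake temperatures, and a reconstruction is the abundance-weighted mean of the optima present in a fossil assemblage. The toy calibration set below is an assumption; real applications use WA-PLS with cross-validated RMSEP, as reported above.

```python
import numpy as np

rng = np.random.default_rng(9)
n_lakes, n_taxa = 268, 20
july_temp = rng.uniform(2, 16, n_lakes)            # calibration temperatures

# Synthetic taxon abundances: each taxon peaks near its own temperature optimum.
true_optima = rng.uniform(2, 16, n_taxa)
abundance = np.exp(-0.5 * ((july_temp[:, None] - true_optima) / 2.0) ** 2)
abundance /= abundance.sum(axis=1, keepdims=True)  # relative abundances per lake

# WA step 1: estimate optima as abundance-weighted means of lake temperatures.
optima_hat = (abundance * july_temp[:, None]).sum(axis=0) / abundance.sum(axis=0)

# WA step 2: reconstruct temperature for a "fossil" assemblage.
fossil = abundance[0]                               # pretend lake 0 is a fossil sample
t_hat = (fossil * optima_hat).sum() / fossil.sum()
print(f"true July temperature {july_temp[0]:.2f} °C, WA reconstruction {t_hat:.2f} °C")
```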
NASA Astrophysics Data System (ADS)
Kim, Junhan; Marrone, Daniel P.; Chan, Chi-Kwan; Medeiros, Lia; Özel, Feryal; Psaltis, Dimitrios
2016-12-01
The Event Horizon Telescope (EHT) is a millimeter-wavelength, very-long-baseline interferometry (VLBI) experiment that is capable of observing black holes with horizon-scale resolution. Early observations have revealed variable horizon-scale emission in the Galactic Center black hole, Sagittarius A* (Sgr A*). Comparing such observations to time-dependent general relativistic magnetohydrodynamic (GRMHD) simulations requires statistical tools that explicitly consider the variability in both the data and the models. We develop here a Bayesian method to compare time-resolved simulation images to variable VLBI data, in order to infer model parameters and perform model comparisons. We use mock EHT data based on GRMHD simulations to explore the robustness of this Bayesian method and contrast it to approaches that do not consider the effects of variability. We find that time-independent models lead to offset values of the inferred parameters with artificially reduced uncertainties. Moreover, neglecting the variability in the data and the models often leads to erroneous model selections. We finally apply our method to the early EHT data on Sgr A*.
Testing hypotheses on distribution shifts and changes in phenology of imperfectly detectable species
Chambert, Thierry A.; Kendall, William L.; Hines, James E.; Nichols, James D.; Pedrini, Paolo; Waddle, J. Hardin; Tavecchia, Giacomo; Walls, Susan C.; Tenan, Simone
2015-01-01
With ongoing climate change, many species are expected to shift their spatial and temporal distributions. To document changes in species distribution and phenology, detection/non-detection data have proven very useful. Occupancy models provide a robust way to analyse such data, but inference is usually focused on species spatial distribution, not phenology. We present a multi-season extension of the staggered-entry occupancy model of Kendall et al. (2013, Ecology, 94, 610), which permits inference about the within-season patterns of species arrival and departure at sampling sites. The new model presented here allows investigation of species phenology and spatial distribution across years, as well as site extinction/colonization dynamics. We illustrate the model with two data sets on European migratory passerines and one data set on North American treefrogs. We show how to derive several additional phenological parameters, such as annual mean arrival and departure dates, from estimated arrival and departure probabilities. Given the extent of detection/non-detection data that are available, we believe that this modelling approach will prove very useful to further understand and predict species responses to climate change.
The development of variable MLM editor and TSQL translator based on Arden Syntax in Taiwan.
Liang, Yan Ching; Chang, Polun
2003-01-01
The Arden Syntax standard has been utilized in the medical informatics community in several countries during the past decade, but it has never been used in nursing in Taiwan. We developed a system that acquires medical expert knowledge in Chinese and translates the data and logic slots into TSQL. The system implements a TSQL translator that interprets the database queries referred to in the knowledge modules. Decision-support systems in medicine are data-driven systems in which TSQL triggers, acting as an inference engine, can be used to facilitate linking to a database.
Alamaniotis, Miltiadis; Agarwal, Vivek
2014-04-01
Anticipatory control systems are a class of systems whose decisions are based on predictions of the future state of the system under monitoring. Anticipation denotes intelligence and is an inherent property of humans, who make decisions by projecting into the future. Likewise, artificially intelligent systems equipped with predictive functions may be utilized for anticipating future states of complex systems, and can therefore facilitate automated control decisions. Anticipatory control of complex energy systems is paramount to their normal and safe operation. In this paper a new intelligent methodology integrating fuzzy inference with support vector regression is introduced. Our proposed methodology implements an anticipatory system aiming at controlling energy systems in a robust way. Initially a set of support vector regressors is adopted for making predictions over critical system parameters. Furthermore, the predicted values are fed into a two-stage fuzzy inference system that makes decisions regarding the state of the energy system. The inference system integrates the individual predictions into a single one at its first stage, and outputs a decision together with a certainty factor computed at its second stage. The certainty factor is an index of the significance of the decision. The proposed anticipatory control system is tested on a real-world set of data obtained from a complex energy system, describing the degradation of a turbine. Results exhibit the robustness of the proposed system in controlling complex energy systems.
Wang, Junbai; Wu, Qianqian; Hu, Xiaohua Tony; Tian, Tianhai
2016-11-01
Investigating the dynamics of genetic regulatory networks through high-throughput experimental data, such as microarray gene expression profiles, is a very important but challenging task. One of the major hindrances in building detailed mathematical models for genetic regulation is the large number of unknown model parameters. To tackle this challenge, a new integrated method is proposed by combining a top-down approach and a bottom-up approach. First, the top-down approach uses probabilistic graphical models to predict the network structure of the DNA repair pathway that is regulated by the p53 protein. Two networks are predicted, namely a network of eight genes with eight inferred interactions and an extended network of 21 genes with 17 interactions. Then, the bottom-up approach using differential equation models is developed to study the detailed genetic regulation based on either a fully connected regulatory network or a gene network obtained by the top-down approach. Model simulation error, parameter identifiability and robustness are used as criteria to select the optimal network. Simulation results together with permutation tests of input gene network structures indicate that the prediction accuracy and robustness of the two predicted networks using the top-down approach are better than those of the corresponding fully connected networks. In particular, the proposed approach reduces computational cost significantly for inferring model parameters. Overall, the new integrated method is a promising approach for investigating the dynamics of genetic regulation. Copyright © 2016 Elsevier Inc. All rights reserved.
Radar attenuation and temperature within the Greenland Ice Sheet
MacGregor, Joseph A; Li, Jilu; Paden, John D; Catania, Ginny A; Clow, Gary D.; Fahnestock, Mark A; Gogineni, Prasad S.; Grimm, Robert E.; Morlighem, Mathieu; Nandi, Soumyaroop; Seroussi, Helene; Stillman, David E
2015-01-01
The flow of ice is temperature-dependent, but direct measurements of englacial temperature are sparse. The dielectric attenuation of radio waves through ice is also temperature-dependent, and radar sounding of ice sheets is sensitive to this attenuation. Here we estimate depth-averaged radar-attenuation rates within the Greenland Ice Sheet from airborne radar-sounding data and its associated radiostratigraphy. Using existing empirical relationships between temperature, chemistry, and radar attenuation, we then infer the depth-averaged englacial temperature. The dated radiostratigraphy permits a correction for the confounding effect of spatially varying ice chemistry. Where radar transects intersect boreholes, radar-inferred temperature is consistently higher than that measured directly. We attribute this discrepancy to the poorly recognized frequency dependence of the radar-attenuation rate and correct for this effect empirically, resulting in a robust relationship between radar-inferred and borehole-measured depth-averaged temperature. Radar-inferred englacial temperature is often lower than modern surface temperature and that of a steady state ice-sheet model, particularly in southern Greenland. This pattern suggests that past changes in surface boundary conditions (temperature and accumulation rate) affect the ice sheet's present temperature structure over a much larger area than previously recognized. This radar-inferred temperature structure provides a new constraint for thermomechanical models of the Greenland Ice Sheet.
Robust inference in discrete hazard models for randomized clinical trials.
Nguyen, Vinh Q; Gillen, Daniel L
2012-10-01
Time-to-event data in which failures are only assessed at discrete time points are common in many clinical trials. Examples include oncology studies where events are observed through periodic screenings such as radiographic scans. When the survival endpoint is acknowledged to be discrete, common methods for the analysis of observed failure times include the discrete hazard models (e.g., the discrete-time proportional hazards and the continuation ratio model) and the proportional odds model. In this manuscript, we consider estimation of a marginal treatment effect in discrete hazard models where the constant treatment effect assumption is violated. We demonstrate that the estimator resulting from these discrete hazard models is consistent for a parameter that depends on the underlying censoring distribution. An estimator that removes the dependence on the censoring mechanism is proposed and its asymptotic distribution is derived. Basing inference on the proposed estimator allows for statistical inference that is scientifically meaningful and reproducible. Simulation is used to assess the performance of the presented methodology in finite samples.
Nonparametric methods for doubly robust estimation of continuous treatment effects.
Kennedy, Edward H; Ma, Zongming; McHugh, Matthew D; Small, Dylan S
2017-09-01
Continuous treatments (e.g., doses) arise often in practice, but many available causal effect estimators are limited by either requiring parametric models for the effect curve, or by not allowing doubly robust covariate adjustment. We develop a novel kernel smoothing approach that requires only mild smoothness assumptions on the effect curve, and still allows for misspecification of either the treatment density or outcome regression. We derive asymptotic properties and give a procedure for data-driven bandwidth selection. The methods are illustrated via simulation and in a study of the effect of nurse staffing on hospital readmissions penalties.
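The following sketch illustrates the general doubly robust recipe for a continuous treatment: form a pseudo-outcome from estimated treatment-density and outcome-regression nuisances, then kernel-smooth it against the treatment. It uses a Nadaraya-Watson smoother rather than the paper's local-linear version, and the nuisance models, data and bandwidth are placeholders; the authors' exact estimator and data-driven bandwidth selector may differ in detail.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 2000
X = rng.normal(size=n)
A = 0.5 * X + rng.normal(size=n)                  # treatment depends on X
Y = np.sin(A) + X + rng.normal(scale=0.5, size=n)

# Nuisance estimates (parametric placeholders; either may be misspecified).
pi_hat = norm.pdf(A, loc=0.5 * X, scale=1.0)      # treatment density pi(a|x)
w_hat = norm.pdf(A, loc=A.mean(), scale=A.std())  # marginal density w(a)
beta = np.linalg.lstsq(np.c_[np.ones(n), X, A], Y, rcond=None)[0]
mu_hat = np.c_[np.ones(n), X, A] @ beta           # outcome regression mu(x,a)

def theta(a):
    """E_X[mu(X, a)]: average the outcome regression over covariates."""
    return (np.c_[np.ones(n), X, np.full(n, a)] @ beta).mean()

# Pseudo-outcome combining both nuisances (doubly robust construction).
xi = (Y - mu_hat) * w_hat / pi_hat + np.array([theta(a) for a in A])

def kernel_smooth(a0, h=0.3):
    """Nadaraya-Watson regression of the pseudo-outcome on treatment."""
    k = norm.pdf((A - a0) / h)
    return np.sum(k * xi) / np.sum(k)

grid = np.linspace(-2, 2, 9)
print([round(kernel_smooth(a), 2) for a in grid])   # estimated effect curve
```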
Fan, Jean; Lee, Hae-Ock; Lee, Soohyun; Ryu, Da-Eun; Lee, Semin; Xue, Catherine; Kim, Seok Jin; Kim, Kihyun; Barkas, Nikolas; Park, Peter J; Park, Woong-Yang; Kharchenko, Peter V
2018-06-13
Characterization of intratumoral heterogeneity is critical to cancer therapy, as the presence of phenotypically diverse cell populations commonly fuels relapse and resistance to treatment. Although genetic variation is a well-studied source of intratumoral heterogeneity, the functional impact of most genetic alterations remains unclear. Even less understood is the relative importance of other factors influencing heterogeneity, such as epigenetic state or tumor microenvironment. To investigate the relationship between genetic and transcriptional heterogeneity in the context of cancer progression, we devised a computational approach called HoneyBADGER to identify copy number variation and loss-of-heterozygosity in individual cells from single-cell RNA-sequencing data. By integrating allele and normalized expression information, HoneyBADGER is able to identify and infer the presence of subclone-specific alterations in individual cells and reconstruct the underlying subclonal architecture. Examining several tumor types, we show that HoneyBADGER is effective at identifying deletions, amplifications, and copy-neutral loss-of-heterozygosity events, and is capable of robustly identifying subclonal focal alterations as small as 10 megabases. We further apply HoneyBADGER to analyze single cells from a progressive multiple myeloma patient to identify major genetic subclones that exhibit distinct transcriptional signatures relevant to cancer progression. Surprisingly, other prominent transcriptional subpopulations within these tumors did not line up with the genetic subclonal structure, and were likely driven by alternative, non-clonal mechanisms. These results highlight the need for integrative analysis to understand the molecular and phenotypic heterogeneity in cancer. Published by Cold Spring Harbor Laboratory Press.
Plis, Sergey M; Sui, Jing; Lane, Terran; Roy, Sushmita; Clark, Vincent P; Potluru, Vamsi K; Huster, Rene J; Michael, Andrew; Sponheim, Scott R; Weisend, Michael P; Calhoun, Vince D
2013-01-01
Identifying the complex activity relationships present in rich, modern neuroimaging data sets remains a key challenge for neuroscience. The problem is hard because (a) the underlying spatial and temporal networks may be nonlinear and multivariate and (b) the observed data may be driven by numerous latent factors. Further, modern experiments often produce data sets containing multiple stimulus contexts or tasks processed by the same subjects. Fusing such multi-session data sets may reveal additional structure, but raises further statistical challenges. We present a novel analysis method for extracting complex activity networks from such multifaceted imaging data sets. Compared to previous methods, we choose a new point in the trade-off space, sacrificing detailed generative probability models and explicit latent variable inference in order to achieve robust estimation of multivariate, nonlinear group factors (“network clusters”). We apply our method to identify relationships of task-specific intrinsic networks in schizophrenia patients and control subjects from a large fMRI study. After identifying network-clusters characterized by within- and between-task interactions, we find significant differences between patient and control groups in interaction strength among networks. Our results are consistent with known findings of brain regions exhibiting deviations in schizophrenic patients. However, we also find high-order, nonlinear interactions that discriminate groups but that are not detected by linear, pair-wise methods. We additionally identify high-order relationships that provide new insights into schizophrenia but that have not been found by traditional univariate or second-order methods. Overall, our approach can identify key relationships that are missed by existing analysis methods, without losing the ability to find relationships that are known to be important. PMID:23876245
Approximation of epidemic models by diffusion processes and their statistical inference.
Guy, Romain; Larédo, Catherine; Vergu, Elisabeta
2015-02-01
Multidimensional continuous-time Markov jump processes (Z(t)) on ℤᵖ form a usual set-up for modeling SIR-like epidemics. However, when facing incomplete epidemic data, inference based on (Z(t)) is not easily achieved. Here, we start building a new framework for the estimation of key parameters of epidemic models based on statistics of diffusion processes approximating (Z(t)). First, previous results on the approximation of density-dependent SIR-like models by diffusion processes with small diffusion coefficient ε = 1/√N, where N is the population size, are generalized to non-autonomous systems. Second, our previous inference results on discretely observed diffusion processes with small diffusion coefficient are extended to time-dependent diffusions. Consistent and asymptotically Gaussian estimates are obtained for a fixed number n of observations, which corresponds to the epidemic context, and for N → ∞. A correction term, which yields better estimates non-asymptotically, is also included. Finally, performances and robustness of our estimators with respect to various parameters such as R₀ (the basic reproduction number), N, and n are investigated on simulations. Two models, SIR and SIRS, corresponding to single and recurrent outbreaks, respectively, are used to simulate data. The findings indicate that our estimators have good asymptotic properties and behave noticeably well for realistic numbers of observations and population sizes. This study lays the foundations of a generic inference method currently under extension to incompletely observed epidemic data. Indeed, contrary to the majority of current inference techniques for partially observed processes, which necessitate computer-intensive simulations, our method, being mostly an analytical approach, requires only the classical optimization steps.
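A small sketch of the small-noise diffusion setting: an Euler-Maruyama simulation of the SIR diffusion approximation with noise scaled by ε = 1/√N, followed by a crude least-squares fit of (λ, γ) to discrete observations. This illustrates the set-up only; the authors' estimators and correction term are not implemented.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
N, lam, gam = 10000, 0.6, 0.25           # population size, infection, recovery
eps, dt, T = 1 / np.sqrt(N), 0.1, 80.0
steps, stride = int(T / dt), int(1 / dt)

s, i = 0.99, 0.01                        # proportions susceptible / infected
path = [(s, i)]
for _ in range(steps):
    b1, b2 = max(lam * s * i, 0.0), max(gam * i, 0.0)   # event rates
    dB1, dB2 = rng.normal(scale=np.sqrt(dt), size=2)
    s += -b1 * dt - eps * np.sqrt(b1) * dB1
    i += (b1 - b2) * dt + eps * (np.sqrt(b1) * dB1 - np.sqrt(b2) * dB2)
    path.append((s, i))
obs = np.array(path)[::stride]           # n ~ 80 discrete observations

def sse(params):
    """Misfit of the deterministic (ODE) skeleton to the observations."""
    l, g = params
    s_, i_ = 0.99, 0.01
    out = [(s_, i_)]
    for _ in range(steps):
        b1, b2 = l * s_ * i_, g * i_
        s_, i_ = s_ - b1 * dt, i_ + (b1 - b2) * dt
        out.append((s_, i_))
    return float(np.sum((np.array(out)[::stride] - obs) ** 2))

fit = minimize(sse, x0=[0.4, 0.4], bounds=[(0.01, 2.0), (0.01, 2.0)])
print("estimated (lambda, gamma):", fit.x.round(3),
      "-> R0 ~", round(fit.x[0] / fit.x[1], 2))
```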
What can we learn from fitness landscapes?
Hartl, Daniel L
2014-10-01
A combinatorially complete data set consists of studies of all possible combinations of a set of mutant sites in a gene or mutant alleles in a genome. Among the most robust conclusions from these studies is that epistasis between beneficial mutations often shows a pattern of diminishing returns, in which favorable mutations are less fit when combined than would be expected. Another robust inference is that the number of adaptive evolutionary paths is often limited to a relatively small fraction of the theoretical possibilities, owing largely to sign epistasis requiring evolutionary steps that would entail a decrease in fitness. Here we summarize these and other results while also examining issues that remain unresolved and future directions that seem promising. Copyright © 2014 Elsevier Ltd. All rights reserved.
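A toy illustration of the path-counting idea: on a made-up, combinatorially complete three-site landscape, count the mutational orderings from wild type to triple mutant along which fitness increases at every step. Sign epistasis shows up as blocked paths; the fitness values below are invented so that one deleterious single mutant blocks two of the six orderings.

```python
from itertools import permutations

fitness = {          # hypothetical fitness for all 2^3 genotypes
    "000": 1.00, "100": 1.02, "010": 0.98, "001": 1.01,
    "110": 1.05, "101": 1.03, "011": 1.04, "111": 1.10,
}

def flip(g, site):
    """Return the genotype with one site mutated."""
    return g[:site] + ("1" if g[site] == "0" else "0") + g[site + 1:]

accessible = 0
for order in permutations(range(3)):          # 3! = 6 possible mutation orders
    g, ok = "000", True
    for site in order:
        nxt = flip(g, site)
        ok &= fitness[nxt] > fitness[g]       # require a fitness gain per step
        g = nxt
    accessible += ok
print(f"{accessible} of 6 paths are monotonically increasing")   # prints 4
```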
Influence function for robust phylogenetic reconstructions.
Bar-Hen, Avner; Mariadassou, Mahendra; Poursat, Marie-Anne; Vandenkoornhuyse, Philippe
2008-05-01
Based on the computation of the influence function, a tool to measure the impact of each piece of sampled data on the statistical inference of a parameter, we propose to analyze the support of the maximum-likelihood (ML) tree for each site. We provide a new tool for filtering data sets (nucleotides, amino acids, and others) in the context of ML phylogenetic reconstructions. Because different sites support different phylogenetic topologies in different ways, outlier sites, that is, sites with a very negative influence value, are important: they can drastically change the topology resulting from the statistical inference. Therefore, these outlier sites must be clearly identified and their effects accounted for before drawing biological conclusions from the inferred tree. A matrix containing 158 fungal terminals all belonging to Chytridiomycota, Zygomycota, and Glomeromycota is analyzed. We show that removing the strongest outlier from the analysis strikingly modifies the ML topology, with a loss of as many as 20% of the internal nodes. As a result, estimating the topology on the filtered data set results in a topology with enhanced bootstrap support. From this analysis, the polyphyletic status of the fungal phyla Chytridiomycota and Zygomycota is reinforced, suggesting the necessity of revisiting the systematics of these fungal groups. We show the ability of the influence function to produce new evolutionary hypotheses.
Materials discovery guided by data-driven insights
NASA Astrophysics Data System (ADS)
Klintenberg, Mattias
As computational power continues to grow, systematic computational exploration has become an important tool for materials discovery. In this presentation the Electronic Structure Project (ESP/ELSA) will be discussed and a number of examples presented that show some of the capabilities of a data-driven methodology for guiding materials discovery. These examples include topological insulators, detector materials, and 2D materials. ESP/ELSA is an initiative that dates back to 2001 and today contains many tens of thousands of materials that have been investigated using a robust and high-accuracy electronic structure method (all-electron FP-LMTO), thus providing basic first-principles materials data for most inorganic compounds that have been structurally characterized. The website containing the ESP/ELSA data has to date been accessed from more than 4,000 unique computers around the world.
Life is a Self-Organizing Machine Driven by the Informational Cycle of Brillouin
NASA Astrophysics Data System (ADS)
Michel, Denis
2013-04-01
Acquiring information is indisputably energy-consuming and conversely, the availability of information permits greater efficiency. Strangely, the scientific community long remained reluctant to establish a physical equivalence between the abstract notion of information and sensible thermodynamics. However, certain physicists such as Szilard and Brillouin proposed: (i) to give to information the status of a genuine thermodynamic entity (k_B T ln 2 joules/bit) and (ii) to link the capacity of storing information inferred from correlated systems, to that of indefinitely increasing organization. This positive feedback coupled to the self-templating molecular potential could provide a universal basis for the spontaneous rise of highly organized structures, typified by the emergence of life from a prebiotic chemical soup. Once established, this mechanism ensures the longevity and robustness of life envisioned as a general system, by allowing it to accumulate and optimize microstate-reducing recipes, thereby giving rise to strong nonlinearity, decisional capacity and multistability. Mechanisms possibly involved in priming this cycle are proposed.
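For concreteness, the thermodynamic value of one bit quoted above, k_B T ln 2 joules per bit, evaluated at room temperature (a standard calculation, not specific to this paper):

```python
import math

k_B = 1.380649e-23            # Boltzmann constant, J/K (exact, 2019 SI)
T = 300.0                     # room temperature, K
print(k_B * T * math.log(2))  # ~2.87e-21 J per bit
```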
The dependence of cosmic ray-driven galactic winds on halo mass
NASA Astrophysics Data System (ADS)
Jacob, Svenja; Pakmor, Rüdiger; Simpson, Christine M.; Springel, Volker; Pfrommer, Christoph
2018-03-01
Galactic winds regulate star formation in disc galaxies and help to enrich the circum-galactic medium. They are therefore crucial for galaxy formation, but their driving mechanism is still poorly understood. Recent studies have demonstrated that cosmic rays (CRs) can drive outflows if active CR transport is taken into account. Using hydrodynamical simulations of isolated galaxies with virial masses between 10¹⁰ and 10¹³ M⊙, we study how the properties of CR-driven winds depend on halo mass. CRs are treated in a two-fluid approximation and their transport is modelled through isotropic or anisotropic diffusion. We find that CRs are only able to drive mass-loaded winds beyond the virial radius in haloes with masses below 10¹² M⊙. For our lowest examined halo mass, the wind is roughly spherical and has velocities of ∼20 km s⁻¹. With increasing halo mass, the wind becomes biconical and can reach 10 times higher velocities. The mass loading factor drops rapidly with virial mass, a dependence that approximately follows a power law with a slope between -1 and -2. This scaling is slightly steeper than observational inferences, and also steeper than commonly used prescriptions for wind feedback in cosmological simulations. The slope is quite robust to variations of the CR injection efficiency or the CR diffusion coefficient. In contrast to the mass loading, the energy loading shows no significant dependence on halo mass. While these scalings are close to successful heuristic models of wind feedback, the CR-driven winds in our present models are not yet powerful enough to fully account for the required feedback strength.
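A minimal sketch of how such a scaling is read off in practice: fit a power law η ∝ M^α to (halo mass, mass loading) pairs in log-log space. The three data points are invented to mimic a slope between -1 and -2.

```python
import numpy as np

M = np.array([1e10, 1e11, 1e12])        # virial masses [Msun] (made up)
eta = np.array([8.0, 0.3, 0.01])        # mass loading factors (made up)
slope, intercept = np.polyfit(np.log10(M), np.log10(eta), 1)
print(f"best-fit slope alpha = {slope:.2f}")   # ~ -1.45 for these numbers
```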
A Predictive Approach to Network Reverse-Engineering
NASA Astrophysics Data System (ADS)
Wiggins, Chris
2005-03-01
A central challenge of systems biology is the "reverse engineering" of transcriptional networks: inferring which genes exert regulatory control over which other genes. Attempting such inference at the genomic scale has only recently become feasible, via data-intensive biological innovations such as DNA microarrays ("DNA chips") and the sequencing of whole genomes. In this talk we present a predictive approach to network reverse-engineering, in which we integrate DNA chip data and sequence data to build a model of the transcriptional network of the yeast S. cerevisiae capable of predicting the response of genes in unseen experiments. The technique can also be used to extract "motifs," sequence elements which act as binding sites for regulatory proteins. We validate by a number of approaches and present a comparison of theoretical predictions vs. experimental data, along with biological interpretations of the resulting model. En route, we will illustrate some basic notions in statistical learning theory (fitting vs. over-fitting; cross-validation; assessing statistical significance), highlighting ways in which physicists can make a unique contribution in data-driven approaches to reverse engineering.
Inferring multi-scale neural mechanisms with brain network modelling
Schirner, Michael; McIntosh, Anthony Randal; Jirsa, Viktor; Deco, Gustavo
2018-01-01
The neurophysiological processes underlying non-invasive brain activity measurements are incompletely understood. Here, we developed a connectome-based brain network model that integrates individual structural and functional data with neural population dynamics to support multi-scale neurophysiological inference. Simulated populations were linked by structural connectivity and, as a novelty, driven by electroencephalography (EEG) source activity. Simulations not only predicted subjects' individual resting-state functional magnetic resonance imaging (fMRI) time series and spatial network topologies over 20 minutes of activity, but more importantly, they also revealed precise neurophysiological mechanisms that underlie and link six empirical observations from different scales and modalities: (1) resting-state fMRI oscillations, (2) functional connectivity networks, (3) excitation-inhibition balance, (4, 5) inverse relationships between α-rhythms, spike-firing and fMRI on short and long time scales, and (6) fMRI power-law scaling. These findings underscore the potential of this new modelling framework for general inference and integration of neurophysiological knowledge to complement empirical studies. PMID:29308767
Application of Bounded Linear Stability Analysis Method for Metrics-Driven Adaptive Control
NASA Technical Reports Server (NTRS)
Bakhtiari-Nejad, Maryam; Nguyen, Nhan T.; Krishnakumar, Kalmanje
2009-01-01
This paper presents the application of the Bounded Linear Stability Analysis (BLSA) method to metrics-driven adaptive control. The BLSA method is used for analyzing the stability of adaptive control models without linearizing the adaptive laws. Metrics-driven adaptive control introduces the notion that adaptation should be driven by stability metrics to achieve robustness. Applying BLSA, the adaptive gain is adjusted during adaptation in order to meet certain phase-margin requirements. The analysis of metrics-driven adaptive control is carried out for a second-order system that represents the pitch attitude control of a generic transport aircraft. The analysis shows that the system with the metrics-conforming variable adaptive gain becomes more robust to unmodeled dynamics and time delay. The effect of the BLSA analysis time window on meeting the stability-margin criteria is also evaluated.
Pisharady, Pramod Kumar; Duarte-Carvajalino, Julio M; Sotiropoulos, Stamatios N; Sapiro, Guillermo; Lenglet, Christophe
2017-01-01
The RubiX [1] algorithm combines the high SNR characteristics of low resolution data with the high spatial specificity of high resolution data, to extract microstructural tissue parameters from diffusion MRI. In this paper we focus on estimating crossing fiber orientations and introduce sparsity to the RubiX algorithm, making it suitable for reconstruction from compressed (under-sampled) data. We propose a sparse Bayesian algorithm for estimation of fiber orientations and volume fractions from compressed diffusion MRI. The data at high resolution is modeled using a parametric spherical deconvolution approach and represented using a dictionary created with the exponential decay components along different possible directions. Volume fractions of fibers along these orientations define the dictionary weights. The data at low resolution is modeled using a spatial partial volume representation. The proposed dictionary representation and sparsity priors consider the dependence between fiber orientations and the spatial redundancy in data representation. Our method exploits the sparsity of fiber orientations, therefore facilitating inference from under-sampled data. Experimental results show improved accuracy and decreased uncertainty in fiber orientation estimates. For under-sampled data, the proposed method is also shown to produce more robust estimates of fiber orientations. PMID:28845484
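A simplified sketch of the sparse dictionary idea described above, with an assumed single-shell stick model standing in for the paper's multi-resolution formulation: volume fractions are recovered as non-negative sparse weights over a dictionary of candidate orientations. Acquisition parameters and the solver (a positive Lasso instead of sparse Bayesian inference) are illustrative choices, not the RubiX implementation.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)
b, d = 1000.0, 1.7e-3                    # b-value [s/mm^2], diffusivity (assumed)
g = rng.normal(size=(64, 3)); g /= np.linalg.norm(g, axis=1, keepdims=True)
v = rng.normal(size=(200, 3)); v /= np.linalg.norm(v, axis=1, keepdims=True)

# Dictionary column j: stick response exp(-b d (g . v_j)^2) for candidate
# orientation v_j; the weights play the role of volume fractions.
D = np.exp(-b * d * (g @ v.T) ** 2)

true_f = np.zeros(200)
true_f[[10, 120]] = [0.6, 0.4]           # two crossing fiber populations
signal = D @ true_f + rng.normal(scale=0.01, size=64)

fit = Lasso(alpha=1e-3, positive=True, max_iter=50000).fit(D, signal)
top = np.argsort(fit.coef_)[-3:][::-1]
print("recovered (orientation index, weight):",
      [(int(j), round(float(fit.coef_[j]), 2)) for j in top if fit.coef_[j] > 0.05])
```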
Design of experiments and data analysis challenges in calibration for forensics applications
Anderson-Cook, Christine M.; Burr, Thomas L.; Hamada, Michael S.; ...
2015-07-15
Forensic science aims to infer characteristics of source terms using measured observables. Our focus is on statistical design of experiments and data analysis challenges arising in nuclear forensics. More specifically, we focus on inferring aspects of experimental conditions (of a process to produce product Pu oxide powder), such as temperature, nitric acid concentration, and Pu concentration, using measured features of the product Pu oxide powder. The measured features, Y, include trace chemical concentrations and particle morphology such as particle size and shape of the produced Pu oxide powder particles. Making inferences about the nature of inputs X that were used to create nuclear materials having particular characteristics, Y, is an inverse problem. Therefore, statistical analysis can be used to identify the best set (or sets) of Xs for a new set of observed responses Y. One can fit a model (or models) such as Y = f(X) + error, for each of the responses, based on a calibration experiment and then "invert" to solve for the best set of Xs for a new set of Ys. This perspectives paper uses archived experimental data to consider aspects of data collection and experiment design for the calibration data to maximize the quality of the predicted Ys in the forward models; that is, we assume that well-estimated forward models are effective in the inverse problem. In addition, we consider how to identify a best solution for the inferred X, and evaluate the quality of the result and its robustness to a variety of initial assumptions, and different correlation structures between the responses. In addition, we also briefly review recent advances in metrology issues related to characterizing particle morphology measurements used in the response vector, Y.
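A compact sketch of the calibrate-then-invert workflow described above: fit forward models Y = f(X) + error on calibration runs, then search for the X that best reproduces a new response vector. Linear forward models and generic variable names are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n = 60
X = rng.uniform(0, 1, size=(n, 3))       # e.g. temperature, [HNO3], [Pu] (scaled)
B_true = rng.normal(size=(3, 4))
Y = X @ B_true + rng.normal(scale=0.05, size=(n, 4))   # 4 measured features

# Calibration: one linear forward model per response, by least squares.
B_hat = np.linalg.lstsq(X, Y, rcond=None)[0]

# Inversion: find X* minimizing the misfit to a new response vector Y*.
x_true = np.array([0.3, 0.7, 0.5])
y_new = x_true @ B_true + rng.normal(scale=0.05, size=4)
obj = lambda x: float(np.sum((x @ B_hat - y_new) ** 2))
sol = minimize(obj, x0=np.full(3, 0.5), bounds=[(0, 1)] * 3)
print("true X:", x_true, " inferred X:", sol.x.round(2))
```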
Data-driven outbreak forecasting with a simple nonlinear growth model.
Lega, Joceline; Brown, Heidi E
2016-12-01
Recent events have thrown the spotlight on infectious disease outbreak response. We developed a data-driven method, EpiGro, which can be applied to cumulative case reports to estimate the order of magnitude of the duration, peak and ultimate size of an ongoing outbreak. It is based on a surprisingly simple mathematical property of many epidemiological data sets, does not require knowledge or estimation of disease transmission parameters, is robust to noise and to small data sets, and runs quickly due to its mathematical simplicity. Using data from historic and ongoing epidemics, we present the model. We also provide modeling considerations that justify this approach and discuss its limitations. In the absence of other information or in conjunction with other models, EpiGro may be useful to public health responders. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
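A generic stand-in for this kind of data-driven outbreak estimation: fit a logistic growth curve to noisy cumulative case counts and read off order-of-magnitude final size, peak timing and growth-phase duration. The curve family and synthetic data are illustrative assumptions; EpiGro's actual construction differs.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, t0):
    """Cumulative cases: final size K, growth rate r, midpoint (peak) t0."""
    return K / (1.0 + np.exp(-r * (t - t0)))

rng = np.random.default_rng(6)
t = np.arange(0, 40.0)                        # days since first report
true = logistic(t, K=5000, r=0.3, t0=20)
cases = np.maximum.accumulate(true + rng.normal(scale=80, size=t.size))

(K, r, t0), _ = curve_fit(logistic, t, cases,
                          p0=[cases[-1] * 2, 0.2, t.mean()])
print(f"final size ~{K:.0f} cases, incidence peak near day {t0:.0f}, "
      f"growth phase ~{4 / r:.0f} days wide around the peak")
```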
Bayesian Inference on the Radio-quietness of Gamma-ray Pulsars
NASA Astrophysics Data System (ADS)
Yu, Hoi-Fung; Hui, Chung Yue; Kong, Albert K. H.; Takata, Jumpei
2018-04-01
For the first time, we demonstrate the use of a robust Bayesian approach to analyze the populations of radio-quiet (RQ) and radio-loud (RL) gamma-ray pulsars. We quantify their differences and obtain their distributions of the radio-cone opening half-angle δ and the magnetic inclination angle α by Bayesian inference. In contrast to conventional frequentist point estimations, which might be non-representative when the distribution is highly skewed or multi-modal, as is often the case when data points are scarce, Bayesian statistics displays the complete posterior distribution, so that the uncertainties can be readily obtained regardless of skewness and modality. We found that the spin period, the magnetic field strength at the light cylinder, the spin-down power, the gamma-ray-to-X-ray flux ratio, and the spectral curvature significance of the two groups of pulsars exhibit significant differences at the 99% level. Using Bayesian inference, we are able to infer the values and uncertainties of δ and α from the distribution of RQ and RL pulsars. We found that δ is between 10° and 35° and the distribution of α is skewed toward large values.
EEG-based functional networks evoked by acupuncture at ST 36: A data-driven thresholding study
NASA Astrophysics Data System (ADS)
Li, Huiyan; Wang, Jiang; Yi, Guosheng; Deng, Bin; Zhou, Hexi
2017-10-01
This paper investigates how acupuncture at ST 36 modulates the brain functional network. 20-channel EEG signals from 15 healthy subjects are respectively recorded before, during and after acupuncture. The correlation between two EEG channels is calculated by using Pearson's coefficient. A data-driven approach is applied to determine the threshold, which is performed by considering the connected set, connected edges and network connectivity. Based on such a thresholding approach, the functional network in each acupuncture period is built with graph theory, and the associated functional connectivity is determined. We show that acupuncture at ST 36 increases the connectivity of the EEG-based functional network, especially for the long-distance connections between the two hemispheres. The properties of the functional network in five EEG sub-bands are also characterized. It is found that the delta and gamma bands are affected more strongly by acupuncture than the other sub-bands. These findings highlight the modulatory effects of acupuncture on EEG-based functional connectivity, which is helpful for understanding how acupuncture participates in cortical and subcortical activities. Further, the data-driven threshold provides an alternative approach to infer the functional connectivity under other physiological conditions.
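A sketch of one plausible reading of the data-driven thresholding step: compute the Pearson correlation matrix of the channels and keep the largest threshold at which the graph still forms a single connected set. Surrogate signals replace real EEG, and the paper's criterion also weighs connected edges and other network properties.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(7)
n_ch, n_samp = 20, 1000
common = rng.normal(size=n_samp)
eeg = 0.4 * common + rng.normal(size=(n_ch, n_samp))   # 20-channel surrogate

R = np.corrcoef(eeg)                                   # Pearson coefficients
np.fill_diagonal(R, 0.0)

def graph_at(th):
    """Graph with an edge wherever |correlation| exceeds the threshold."""
    g = nx.Graph()
    g.add_nodes_from(range(n_ch))
    ii, jj = np.where(np.triu(np.abs(R), 1) >= th)
    g.add_edges_from(zip(ii, jj))
    return g

# Scan thresholds from high to low; keep the largest one giving connectivity.
for th in np.linspace(0.9, 0.0, 91):
    g = graph_at(th)
    if nx.is_connected(g):
        print(f"data-driven threshold {th:.2f}, {g.number_of_edges()} edges")
        break
```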
NASA Astrophysics Data System (ADS)
González, D. L., II; Angus, M. P.; Tetteh, I. K.; Bello, G. A.; Padmanabhan, K.; Pendse, S. V.; Srinivas, S.; Yu, J.; Semazzi, F.; Kumar, V.; Samatova, N. F.
2014-04-01
Decades of hypothesis-driven and/or first-principles research have been applied towards the discovery and explanation of the mechanisms that drive climate phenomena, such as western African Sahel summer rainfall variability. Although connections between various climate factors have been theorized, not all of the key relationships are fully understood. We propose a data-driven approach to identify candidate players in this climate system, which can help explain underlying mechanisms and/or even suggest new relationships, to facilitate building a more comprehensive and predictive model of the modulatory relationships influencing a climate phenomenon of interest. We applied coupled heterogeneous association rule mining (CHARM), Lasso multivariate regression, and Dynamic Bayesian networks to find relationships within a complex system, and explored means with which to obtain a consensus result from the application of such varied methodologies. Using this fusion of approaches, we identified relationships among climate factors that modulate Sahel rainfall, including well-known associations from prior climate knowledge, as well as promising discoveries that invite further research by the climate science community.
PREMER: a Tool to Infer Biological Networks.
Villaverde, Alejandro F; Becker, Kolja; Banga, Julio R
2017-10-04
Inferring the structure of unknown cellular networks is a main challenge in computational biology. Data-driven approaches based on information theory can determine the existence of interactions among network nodes automatically. However, the elucidation of certain features - such as distinguishing between direct and indirect interactions or determining the direction of a causal link - requires estimating information-theoretic quantities in a multidimensional space. This can be a computationally demanding task, which acts as a bottleneck for the application of elaborate algorithms to large-scale network inference problems. The computational cost of such calculations can be alleviated by the use of compiled programs and parallelization. To this end we have developed PREMER (Parallel Reverse Engineering with Mutual information & Entropy Reduction), a software toolbox that can run in parallel and sequential environments. It uses information theoretic criteria to recover network topology and determine the strength and causality of interactions, and allows incorporating prior knowledge, imputing missing data, and correcting outliers. PREMER is a free, open source software tool that does not require any commercial software. Its core algorithms are programmed in FORTRAN 90 and implement OpenMP directives. It has user interfaces in Python and MATLAB/Octave, and runs on Windows, Linux and OSX (https://sites.google.com/site/premertoolbox/).
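A generic sketch of the information-theoretic core (not PREMER's specific entropy-reduction algorithm): estimate pairwise mutual information, then apply a data-processing-inequality step that drops the weakest edge of each fully connected triangle, separating direct from indirect interactions in a small chain x → y → z.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(8)
n = 1000
x = rng.normal(size=n)
y = x + 0.3 * rng.normal(size=n)          # direct: x -> y
z = y + 0.3 * rng.normal(size=n)          # direct: y -> z (x-z is indirect)
data = np.c_[x, y, z]
p = data.shape[1]

MI = np.zeros((p, p))                     # pairwise mutual information
for i in range(p):
    for j in range(i + 1, p):
        MI[i, j] = MI[j, i] = mutual_info_regression(
            data[:, [i]], data[:, j], random_state=0)[0]

edges = {(i, j) for i in range(p) for j in range(i + 1, p)}
for i, j, k in [(0, 1, 2)]:               # every triangle (only one here)
    trio = {(i, j): MI[i, j], (i, k): MI[i, k], (j, k): MI[j, k]}
    edges.discard(min(trio, key=trio.get))  # DPI: drop the weakest edge
print("surviving edges:", sorted(edges))    # expect (0, 1) and (1, 2)
```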
Robustly detecting differential expression in RNA sequencing data using observation weights
Zhou, Xiaobei; Lindsay, Helen; Robinson, Mark D.
2014-01-01
A popular approach for comparing gene expression levels between (replicated) conditions of RNA sequencing data relies on counting reads that map to features of interest. Within such count-based methods, many flexible and advanced statistical approaches now exist and offer the ability to adjust for covariates (e.g. batch effects). Often, these methods include some sort of ‘sharing of information’ across features to improve inferences in small samples. It is important to achieve an appropriate tradeoff between statistical power and protection against outliers. Here, we study the robustness of existing approaches for count-based differential expression analysis and propose a new strategy based on observation weights that can be used within existing frameworks. The results suggest that outliers can have a global effect on differential analyses. We demonstrate the effectiveness of our new approach with real data and simulated data that reflects properties of real datasets (e.g. dispersion-mean trend) and develop an extensible framework for comprehensive testing of current and future methods. In addition, we explore the origin of such outliers, in some cases highlighting additional biological or technical factors within the experiment. Further details can be downloaded from the project website: http://imlspenticton.uzh.ch/robinson_lab/edgeR_robust/. PMID:24753412
Robust geostatistical analysis of spatial data
NASA Astrophysics Data System (ADS)
Papritz, Andreas; Künsch, Hans Rudolf; Schwierz, Cornelia; Stahel, Werner A.
2013-04-01
Most geostatistical software tools rely on non-robust algorithms. This is unfortunate, because outlying observations are rather the rule than the exception, in particular in environmental data sets. Outliers affect the modelling of the large-scale spatial trend, the estimation of the spatial dependence of the residual variation and the predictions by kriging. Identifying outliers manually is cumbersome and requires expertise because one needs parameter estimates to decide which observation is a potential outlier. Moreover, inference after the rejection of some observations is problematic. A better approach is to use robust algorithms that automatically prevent outlying observations from having undue influence. Former studies on robust geostatistics focused on robust estimation of the sample variogram and ordinary kriging without external drift. Furthermore, Richardson and Welsh (1995) proposed a robustified version of (restricted) maximum likelihood ([RE]ML) estimation for the variance components of a linear mixed model, which was later used by Marchant and Lark (2007) for robust REML estimation of the variogram. We propose here a novel method for robust REML estimation of the variogram of a Gaussian random field that is possibly contaminated by independent errors from a long-tailed distribution. It is based on robustification of estimating equations for the Gaussian REML estimation (Welsh and Richardson, 1997). Besides robust estimates of the parameters of the external drift and of the variogram, the method also provides standard errors for the estimated parameters, robustified kriging predictions at both sampled and non-sampled locations and kriging variances. Apart from presenting our modelling framework, we shall present selected simulation results by which we explored the properties of the new method. This will be complemented by an analysis of a data set on heavy metal contamination of the soil in the vicinity of a metal smelter. Marchant, B.P. and Lark, R.M. 2007. Robust estimation of the variogram by residual maximum likelihood. Geoderma 140: 62-72. Richardson, A.M. and Welsh, A.H. 1995. Robust restricted maximum likelihood in mixed linear models. Biometrics 51: 1429-1439. Welsh, A.H. and Richardson, A.M. 1997. Approaches to the robust estimation of mixed models. In: Handbook of Statistics Vol. 15, Elsevier, pp. 343-384.
Ruane, Sara; Raxworthy, Christopher J; Lemmon, Alan R; Lemmon, Emily Moriarty; Burbrink, Frank T
2015-10-12
Using molecular data generated by high throughput next generation sequencing (NGS) platforms to infer phylogeny is becoming common as costs go down and the ability to capture loci from across the genome goes up. While there is a general consensus that greater numbers of independent loci should result in more robust phylogenetic estimates, few studies have compared phylogenies resulting from smaller datasets for commonly used genetic markers with the large datasets captured using NGS. Here, we determine how a 5-locus Sanger dataset compares with a 377-locus anchored genomics dataset for understanding the evolutionary history of the pseudoxyrhophiine snake radiation centered in Madagascar. The Pseudoxyrhophiinae comprise ~86% of Madagascar's serpent diversity, yet they are poorly known with respect to ecology, behavior, and systematics. Using the 377-locus NGS dataset and the summary statistics species-tree methods STAR and MP-EST, we estimated a well-supported species tree that provides new insights concerning intergeneric relationships for the pseudoxyrhophiines. We also compared how these and other methods performed with respect to estimating tree topology using datasets with varying numbers of loci. Using Sanger sequencing and an anchored phylogenomics approach, we sequenced datasets comprised of 5 and 377 loci, respectively, for 23 pseudoxyrhophiine taxa. For each dataset, we estimated phylogenies using both gene-tree (concatenation) and species-tree (STAR, MP-EST) approaches. We determined the similarity of resulting tree topologies from the different datasets using Robinson-Foulds distances. In addition, we examined how subsets of these data performed compared to the complete Sanger and anchored datasets for phylogenetic accuracy using the same tree inference methodologies, as well as the program *BEAST to determine if a full coalescent model for species tree estimation could generate robust results with fewer loci compared to the summary statistics species tree approaches. We also examined the individual gene trees in comparison to the 377-locus species tree using the program MetaTree. Using the full anchored dataset under a variety of methods gave us the same, well-supported phylogeny for pseudoxyrhophiines. The African pseudoxyrhophiine Duberria is the sister taxon to the Malagasy pseudoxyrhophiine genera, providing evidence for a monophyletic radiation in Madagascar. In addition, within Madagascar, the two major clades inferred correspond largely to the aglyphous and opisthoglyphous genera, suggesting that feeding specializations associated with tooth venom delivery may have played a major role in the early diversification of this radiation. The comparison of tree topologies from the concatenated and species-tree methods using different datasets indicated that the 5-locus dataset cannot be used to infer a correct phylogeny for the pseudoxyrhophiines under any method tested here and that summary statistics methods require 50 or more loci to consistently recover the species tree inferred using the complete anchored dataset. However, as few as 15 loci may infer the correct topology when using the full coalescent species tree method *BEAST. MetaTree analyses of each gene tree from the Sanger and anchored datasets found that none of the individual gene trees matched the 377-locus species tree, and that no gene trees were identical with respect to topology.
Our results suggest that ≥50 loci may be necessary to confidently infer phylogenies when using summary species-tree methods, but that the coalescent-based method *BEAST consistently recovers the same topology using only 15 loci. These results reinforce that datasets with small numbers of markers may result in misleading topologies, and further, that the method of inference used to generate a phylogeny also has a major influence on the number of loci necessary to infer robust species trees.
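A minimal example of the tree-comparison step used throughout the study, assuming the dendropy library: compute the Robinson-Foulds (symmetric difference) distance between a tree estimated from a loci subset and the full-data tree. The topologies below are toy examples, not trees from the paper.

```python
import dendropy
from dendropy.calculate import treecompare

# Both trees must share one taxon namespace for a valid comparison.
tns = dendropy.TaxonNamespace()
full = dendropy.Tree.get(data="((A,B),((C,D),(E,F)));", schema="newick",
                         taxon_namespace=tns)
sub = dendropy.Tree.get(data="((A,C),((B,D),(E,F)));", schema="newick",
                        taxon_namespace=tns)
full.encode_bipartitions()
sub.encode_bipartitions()
print("Robinson-Foulds distance:", treecompare.symmetric_difference(full, sub))
```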
Observation of Transonic Ionization Fronts in Low-Density Foam Targets
NASA Astrophysics Data System (ADS)
Hoarty, D.; Barringer, L.; Vickers, C.; Willi, O.; Nazarov, W.
1999-04-01
Transonic ionization fronts have been observed in low-density chlorinated foam targets using time-resolved K-shell absorption spectroscopy. The front was driven by an intense pulse of soft x rays produced by high-power laser irradiation of a thin foil. The density and temperature profiles inferred from the radiographs provided detailed measurement of the conditions at a number of times. The experimental data were compared to radiation hydrodynamics simulations and reasonable agreement was obtained.
Sass, Steffen; Pitea, Adriana; Unger, Kristian; Hess, Julia; Mueller, Nikola S.; Theis, Fabian J.
2015-01-01
MicroRNAs represent ~22 nt long endogenous small RNA molecules that have been experimentally shown to regulate gene expression post-transcriptionally. One main interest in miRNA research is the investigation of their functional roles, which can typically be accomplished by identification of mi-/mRNA interactions and functional annotation of target gene sets. We here present a novel method “miRlastic”, which infers miRNA-target interactions using transcriptomic data as well as prior knowledge and performs functional annotation of target genes by exploiting the local structure of the inferred network. For the network inference, we applied linear regression modeling with elastic net regularization on matched microRNA and messenger RNA expression profiling data to perform feature selection on prior knowledge from sequence-based target prediction resources. The novelty of miRlastic inference originates in predicting data-driven intra-transcriptome regulatory relationships through feature selection. With synthetic data, we showed that miRlastic outperformed commonly used methods and was suitable even for low sample sizes. To gain insight into the functional role of miRNAs and to determine joint functional properties of miRNA clusters, we introduced a local enrichment analysis procedure. The principle of this procedure lies in identifying regions of high functional similarity by evaluating the shortest paths between genes in the network. We can finally assign functional roles to the miRNAs by taking their regulatory relationships into account. We thoroughly evaluated miRlastic on a cohort of head and neck cancer (HNSCC) patients provided by The Cancer Genome Atlas. We inferred an mi-/mRNA regulatory network for human papilloma virus (HPV)-associated miRNAs in HNSCC. The resulting network best enriched for experimentally validated miRNA-target interaction, when compared to common methods. Finally, the local enrichment step identified two functional clusters of miRNAs that were predicted to mediate HPV-associated dysregulation in HNSCC. Our novel approach was able to characterize distinct pathway regulations from matched miRNA and mRNA data. An R package of miRlastic was made available through: http://icb.helmholtz-muenchen.de/mirlastic. PMID:26694379
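A small sketch of the inference step with synthetic matrices: for each mRNA, an elastic-net regression on only those miRNAs flagged by a prior sequence-based target mask, so that data-driven feature selection acts on prior knowledge as described above. Dimensions and the prior mask are placeholders.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(9)
n_samples, n_mirna, n_mrna = 80, 30, 5
mirna = rng.normal(size=(n_samples, n_mirna))
prior = rng.random((n_mirna, n_mrna)) < 0.3       # sequence-predicted targets
W_true = np.where(prior, -np.abs(rng.normal(0, 0.5, prior.shape)), 0.0)
mrna = mirna @ W_true + rng.normal(scale=0.3, size=(n_samples, n_mrna))

for g in range(n_mrna):
    cand = np.flatnonzero(prior[:, g])            # restrict to prior knowledge
    if cand.size == 0:
        continue
    fit = ElasticNetCV(l1_ratio=0.5, cv=5, max_iter=10000)
    fit.fit(mirna[:, cand], mrna[:, g])
    hits = cand[np.abs(fit.coef_) > 1e-3]
    print(f"mRNA {g}: {hits.size}/{cand.size} candidate miRNAs retained")
```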
Diagnostic quality driven physiological data collection for personal healthcare.
Jea, David; Balani, Rahul; Hsu, Ju-Lan; Cho, Dae-Ki; Gerla, Mario; Srivastava, Mani B
2008-01-01
We believe that each individual is unique, and that it is necessary for diagnostic purposes to have a distinctive combination of signals and data features that fits the personal health status. It is essential to develop mechanisms for reducing the amount of data that needs to be transferred (to mitigate the troublesome periodic recharging of a device) while maintaining diagnostic accuracy. Thus, the system should not uniformly compress the collected physiological data, but compress data in a personalized fashion that preserves the 'important' signal features for each individual, such that it is enough to make the diagnosis with a required high confidence level. We present a diagnostic-quality-driven mechanism for remote ECG monitoring, which enables a notion of priorities encoded into the wave segments. The priority is specified by the diagnosis engine or medical experts and is dynamic and individual-dependent. The system pre-processes the collected physiological information according to the assigned priority before delivering it to the backend server. We demonstrate that the proposed approach provides accurate inference results while effectively compressing the data.
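A toy sketch of priority-driven compression in this spirit: wave segments carry priorities (as assigned by a diagnosis engine), and lower-priority segments are decimated more aggressively before transmission. Segment boundaries, priorities and decimation factors are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(10)
fs = 250                                    # sampling rate, Hz
ecg = rng.normal(size=10 * fs)              # stand-in for a 10 s ECG strip

# (start_sample, end_sample, priority); e.g. a QRS region gets priority 1.
segments = [(0, 500, 3), (500, 700, 1), (700, 2500, 3)]
keep_every = {1: 1, 2: 2, 3: 8}             # priority -> decimation factor

compressed = [ecg[s:e:keep_every[p]] for s, e, p in segments]
n_out = sum(len(c) for c in compressed)
n_in = sum(e - s for s, e, _ in segments)
print(f"kept {n_out} of {n_in} samples ({100 * n_out / n_in:.0f}%)")
```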
MULTINEST: an efficient and robust Bayesian inference tool for cosmology and particle physics
NASA Astrophysics Data System (ADS)
Feroz, F.; Hobson, M. P.; Bridges, M.
2009-10-01
We present further development and the first public release of our multimodal nested sampling algorithm, called MULTINEST. This Bayesian inference tool calculates the evidence, with an associated error estimate, and produces posterior samples from distributions that may contain multiple modes and pronounced (curving) degeneracies in high dimensions. The developments presented here lead to further substantial improvements in sampling efficiency and robustness, as compared to the original algorithm presented in Feroz & Hobson, which itself significantly outperformed existing Markov chain Monte Carlo techniques in a wide range of astrophysical inference problems. The accuracy and economy of the MULTINEST algorithm are demonstrated by application to two toy problems and to a cosmological inference problem focusing on the extension of the vanilla Λ cold dark matter model to include spatial curvature and a varying equation of state for dark energy. The MULTINEST software, which is fully parallelized using MPI and includes an interface to COSMOMC, is available at http://www.mrao.cam.ac.uk/software/multinest/. It will also be released as part of the SUPERBAYES package, for the analysis of supersymmetric theories of particle physics, at http://www.superbayes.org.
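To make the evidence calculation concrete, here is a toy nested-sampling loop for a 1D Gaussian likelihood under a uniform prior. MULTINEST's contributions (multimodal ellipsoidal sampling, MPI parallelism, robust error estimates) go far beyond this sketch, which replaces constrained sampling with simple rejection and omits the final live-point correction.

```python
import numpy as np

rng = np.random.default_rng(11)

def loglike(x):
    """Normalized Gaussian likelihood; the evidence should be ~1 (logZ ~ 0)."""
    return -0.5 * ((x - 0.3) / 0.05) ** 2 - np.log(0.05 * np.sqrt(2 * np.pi))

n_live = 100
live = rng.uniform(0, 1, n_live)             # uniform prior on [0, 1]
logL = loglike(live)
logw0 = np.log(1.0 - np.exp(-1.0 / n_live))  # width of the first prior shell
logZ = -np.inf

for it in range(1000):
    i = int(np.argmin(logL))                 # worst live point sets L_min
    logZ = np.logaddexp(logZ, logw0 - it / n_live + logL[i])
    while True:                              # replace by rejection from the
        x = rng.uniform(0, 1)                # prior, subject to L(x) > L_min
        if loglike(x) > logL[i]:
            live[i], logL[i] = x, loglike(x)
            break
# The remaining live-point contribution is negligible after 1000 iterations.
print(f"log-evidence estimate: {logZ:.2f} (analytic value ~ 0)")
```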
Variation in reaction norms: Statistical considerations and biological interpretation.
Morrissey, Michael B; Liefting, Maartje
2016-09-01
Analysis of reaction norms, the functions by which the phenotype produced by a given genotype depends on the environment, is critical to studying many aspects of phenotypic evolution. Different techniques are available for quantifying different aspects of reaction norm variation. We examine what biological inferences can be drawn from some of the more readily applicable analyses for studying reaction norms. We adopt a strongly biologically motivated view, but draw on statistical theory to highlight strengths and drawbacks of different techniques. In particular, consideration of some formal statistical theory leads to revision of some recently, and forcefully, advocated opinions on reaction norm analysis. We clarify what simple analysis of the slope between mean phenotype in two environments can tell us about reaction norms, explore the conditions under which polynomial regression can provide robust inferences about reaction norm shape, and explore how different existing approaches may be used to draw inferences about variation in reaction norm shape. We show how mixed model-based approaches can provide more robust inferences than more commonly used multistep statistical approaches, and derive new metrics of the relative importance of variation in reaction norm intercepts, slopes, and curvatures. © 2016 The Author(s). Evolution © 2016 The Society for the Study of Evolution.
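A brief sketch of the quantities involved: per-genotype polynomial regressions of phenotype on environment yield an intercept, slope and curvature, whose among-genotype variances summarize reaction-norm variation. In practice these would come from a mixed model, as the abstract argues; the simulation below is only illustrative.

```python
import numpy as np

rng = np.random.default_rng(12)
env = np.linspace(-1, 1, 7)                      # standardized environment
coefs = []
for g in range(5):                               # 5 genotypes
    a, b, c = 10 + rng.normal(0, 1), rng.normal(1, 0.3), rng.normal(0, 0.2)
    phen = a + b * env + c * env**2 + rng.normal(0, 0.2, env.size)
    # polyfit returns highest degree first; reverse to (intercept, slope, curv)
    coefs.append(np.polyfit(env, phen, 2)[::-1])

coefs = np.array(coefs)
for name, col in zip(["intercept", "slope", "curvature"], coefs.T):
    print(f"variance among genotypes in {name}: {col.var(ddof=1):.3f}")
```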
Origin of the pulse-like signature of shallow long-period volcano seismicity
Chouet, Bernard A.; Dawson, Phillip B.
2016-01-01
Short-duration, pulse-like long-period (LP) events are a characteristic type of seismicity accompanying eruptive activity at Mount Etna in Italy in 2004 and 2008 and at Turrialba Volcano in Costa Rica and Ubinas Volcano in Peru in 2009. We use the discrete wave number method to compute the free surface response in the near field of a rectangular tensile crack embedded in a homogeneous elastic half space and to gain insights into the origin of the LP pulses. Two source models are considered, including (1) a vertical fluid-driven crack and (2) a unilateral tensile rupture growing at a fixed sub-Rayleigh velocity with constant opening on a vertical crack. We apply cross correlation to the synthetics and data to demonstrate that a fluid-driven crack provides a natural explanation for these data with realistic source sizes and fluid properties. Our modeling points to shallow sources (<1 km depth), whose signatures are representative of the Rayleigh pulse sampled at epicentral distances >∼1 km. While a slow-rupture failure provides another potential model for these events, the synthetics and resulting fits to the data are not optimal in this model compared to a fluid-driven source. We infer that pulse-like LP signatures are parts of the continuum of responses produced by shallow fluid-driven sources in volcanoes.
Cultural Geography Model Validation
2010-03-01
the Cultural Geography Model (CGM), a government-owned, open-source multi-agent system utilizing Bayesian networks, queuing systems, the Theory of... referent determined either from theory or SME opinion. CGM Overview: The CGM is a government-owned, open-source, data-driven multi-agent social... Keywords: HSCB, validation, social network analysis. ABSTRACT: In the current warfighting environment, the military needs robust modeling and simulation (M&S
Personalized health care and health information technology policy: an exploratory analysis.
Wald, Jonathan S; Shapiro, Michael
2013-01-01
Personalized healthcare (PHC) is envisioned to enhance clinical practice decision-making using new genome-driven knowledge that tailors diagnosis, treatment, and prevention to the individual patient. In 2012, we conducted a focused environmental scan and informal interviews with fifteen experts to anticipate how PHC might impact health information technology (IT) policy in the United States. Findings indicated that PHC has a variable impact on current clinical practice, creates complex questions for providers, patients, and policy-makers, and will require a robust health IT infrastructure with advanced data architecture, clinical decision support, provider workflow tools, and re-use of clinical data for research. A number of health IT challenge areas were identified, along with five policy areas, including: interoperable clinical decision support, standards for patient values and preferences, patient engagement, data transparency, and robust privacy and security.
Cascading Failures in Networks: Inference, Intervention and Robustness to WMDs
2016-08-01
model is posited, and different cascade eventualities are investigated), this proposal aimed to focus on the inverse problem and... theory and algorithms for an “inverse problem” or “data-driven” study of cascades – specifically, learning about how they start
Towards Automatically Detecting Whether Student Learning Is Shallow
ERIC Educational Resources Information Center
Gowda, Sujith M.; Baker, Ryan S.; Corbett, Albert T.; Rossi, Lisa M.
2013-01-01
Recent research has extended student modeling to infer not just whether a student knows a skill or set of skills, but also whether the student has achieved robust learning--learning that enables the student to transfer their knowledge and prepares them for future learning (PFL). However, a student may fail to have robust learning in two fashions:…
Olsher, Daniel
2014-10-01
Noise-resistant and nuanced, COGBASE makes 10 million pieces of commonsense data and a host of novel reasoning algorithms available via a family of semantically-driven prior probability distributions. Machine learning, Big Data, natural language understanding/processing, and social AI can draw on COGBASE to determine lexical semantics, infer goals and interests, simulate emotion and affect, calculate document gists and topic models, and link commonsense knowledge to domain models and social, spatial, cultural, and psychological data. COGBASE is especially ideal for social Big Data, which tends to involve highly implicit contexts, cognitive artifacts, difficult-to-parse texts, and deep domain knowledge dependencies. Copyright © 2014 Elsevier Ltd. All rights reserved.
Polynomial chaos representation of databases on manifolds
DOE Office of Scientific and Technical Information (OSTI.GOV)
Soize, C., E-mail: christian.soize@univ-paris-est.fr; Ghanem, R., E-mail: ghanem@usc.edu
2017-04-15
Characterizing the polynomial chaos expansion (PCE) of a vector-valued random variable with probability distribution concentrated on a manifold is a relevant problem in data-driven settings. The probability distribution of such random vectors is multimodal in general, leading to potentially very slow convergence of the PCE. In this paper, we build on a recent development for estimating and sampling from probabilities concentrated on a diffusion manifold. The proposed methodology constructs a PCE of the random vector together with an associated generator that samples from the target probability distribution which is estimated from data concentrated in the neighborhood of the manifold. The method is robust and remains efficient for high dimension and large datasets. The resulting polynomial chaos construction on manifolds permits the adaptation of many uncertainty quantification and statistical tools to emerging questions motivated by data-driven queries.
An improved method for bivariate meta-analysis when within-study correlations are unknown.
Hong, Chuan; D Riley, Richard; Chen, Yong
2018-03-01
Multivariate meta-analysis, which jointly analyzes multiple and possibly correlated outcomes in a single analysis, is becoming increasingly popular in recent years. An attractive feature of the multivariate meta-analysis is its ability to account for the dependence between multiple estimates from the same study. However, standard inference procedures for multivariate meta-analysis require the knowledge of within-study correlations, which are usually unavailable. This limits standard inference approaches in practice. Riley et al proposed a working model and an overall synthesis correlation parameter to account for the marginal correlation between outcomes, where the only data needed are those required for a separate univariate random-effects meta-analysis. As within-study correlations are not required, the Riley method is applicable to a wide variety of evidence synthesis situations. However, the standard variance estimator of the Riley method is not entirely correct under many important settings. As a consequence, the coverage of a function of pooled estimates may not reach the nominal level even when the number of studies in the multivariate meta-analysis is large. In this paper, we improve the Riley method by proposing a robust variance estimator, which is asymptotically correct even when the model is misspecified (ie, when the likelihood function is incorrect). Simulation studies of a bivariate meta-analysis, in a variety of settings, show a function of pooled estimates has improved performance when using the proposed robust variance estimator. In terms of individual pooled estimates themselves, the standard variance estimator and robust variance estimator give similar results to the original method, with appropriate coverage. The proposed robust variance estimator performs well when the number of studies is relatively large. Therefore, we recommend the use of the robust method for meta-analyses with a relatively large number of studies (eg, m≥50). When the sample size is relatively small, we recommend the use of the robust method under the working independence assumption. We illustrate the proposed method through 2 meta-analyses. Copyright © 2017 John Wiley & Sons, Ltd.
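To illustrate the flavor of a robust variance in meta-analysis, the sketch below pools simulated studies with DerSimonian-Laird weights and contrasts the model-based variance with a sandwich-type estimator that remains valid under misspecification. This is a univariate stand-in, not the paper's estimator for the bivariate Riley working model.

```python
import numpy as np

rng = np.random.default_rng(13)
m = 50                                      # number of studies
theta_i = 0.3 + rng.normal(0, 0.2, m)       # true study effects (tau^2 = 0.04)
v_i = rng.uniform(0.01, 0.05, m)            # within-study variances
y_i = theta_i + rng.normal(0, np.sqrt(v_i))

# DerSimonian-Laird estimate of the between-study variance tau^2.
w_f = 1 / v_i
theta_f = np.sum(w_f * y_i) / np.sum(w_f)
Q = np.sum(w_f * (y_i - theta_f) ** 2)
tau2 = max(0.0, (Q - (m - 1)) / (np.sum(w_f) - np.sum(w_f**2) / np.sum(w_f)))

w = 1 / (v_i + tau2)
theta = np.sum(w * y_i) / np.sum(w)
var_model = 1 / np.sum(w)                                        # model-based
var_robust = np.sum(w**2 * (y_i - theta) ** 2) / np.sum(w) ** 2  # sandwich
print(f"pooled = {theta:.3f}, model SE = {np.sqrt(var_model):.3f}, "
      f"robust SE = {np.sqrt(var_robust):.3f}")
```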
The early rise and late demise of New Zealand’s last glacial maximum
Rother, Henrik; Fink, David; Shulmeister, James; Mifsud, Charles; Evans, Michael; Pugh, Jeremy
2014-01-01
Recent debate on records of southern midlatitude glaciation has focused on reconstructing glacier dynamics during the last glacial termination, with different results supporting both in-phase and out-of-phase correlations with Northern Hemisphere glacial signals. A continuing major weakness in this debate is the lack of robust data, particularly from the early and maximum phase of southern midlatitude glaciation (∼30–20 ka), to verify the competing models. Here we present a suite of 58 cosmogenic exposure ages from 17 last-glacial ice limits in the Rangitata Valley of New Zealand, capturing an extensive record of glacial oscillations between 28–16 ka. The sequence shows that the local last glacial maximum in this region occurred shortly before 28 ka, followed by several successively less extensive ice readvances between 26–19 ka. The onset of Termination 1 and the ensuing glacial retreat is preserved in exceptional detail through numerous recessional moraines, indicating that ice retreat between 19–16 ka was very gradual. Extensive valley glaciers survived in the Rangitata catchment until at least 15.8 ka. These findings preclude the previously inferred rapid climate-driven ice retreat in the Southern Alps after the onset of Termination 1. Our record documents an early last glacial maximum, an overall trend of diminishing ice volume in New Zealand between 28–20 ka, and gradual deglaciation until at least 15 ka. PMID:25071171
DeGiorgio, Michael; Syring, John; Eckert, Andrew J; Liston, Aaron; Cronn, Richard; Neale, David B; Rosenberg, Noah A
2014-03-29
As it becomes increasingly possible to obtain DNA sequences of orthologous genes from diverse sets of taxa, species trees are frequently being inferred from multilocus data. However, the behavior of many methods for performing this inference has remained largely unexplored. Some methods have been proven to be consistent given certain evolutionary models, whereas others rely on criteria that, although appropriate for many parameter values, have peculiar zones of the parameter space in which they fail to converge on the correct estimate as data sets increase in size. Here, using North American pines, we empirically evaluate the behavior of 24 strategies for species tree inference using three alternative outgroups (72 strategies total). The data consist of 120 individuals sampled in eight ingroup species from subsection Strobus and three outgroup species from subsection Gerardianae, spanning ∼47 kilobases of sequence at 121 loci. Each "strategy" for inferring species trees consists of three features: a species tree construction method, a gene tree inference method, and a choice of outgroup. We use multivariate analysis techniques such as principal components analysis and hierarchical clustering to identify tree characteristics that are robustly observed across strategies, as well as to identify groups of strategies that produce trees with similar features. We find that strategies that construct species trees using only topological information cluster together and that strategies that use additional non-topological information (e.g., branch lengths) also cluster together. Strategies that utilize more than one individual within a species to infer gene trees tend to produce estimates of species trees that contain clades present in trees estimated by other strategies. Strategies that use the minimize-deep-coalescences criterion to construct species trees tend to produce species tree estimates that contain clades that are not present in trees estimated by the Concatenation, RTC, SMRT, STAR, and STEAC methods, and that in general are more balanced than those inferred by these other strategies. When constructing a species tree from a multilocus set of sequences, our observations provide a basis for interpreting differences in species tree estimates obtained via different approaches that have a two-stage structure in common, one step for gene tree estimation and a second step for species tree estimation. The methods explored here employ a number of distinct features of the data, and our analysis suggests that recovery of the same results from multiple methods that tend to differ in their patterns of inference can be a valuable tool for obtaining reliable estimates.
Real-time quality monitoring in debutanizer column with regression tree and ANFIS
NASA Astrophysics Data System (ADS)
Siddharth, Kumar; Pathak, Amey; Pani, Ajaya Kumar
2018-05-01
A debutanizer column is an integral part of any petroleum refinery. Online composition monitoring of debutanizer column outlet streams is highly desirable in order to maximize the production of liquefied petroleum gas. In this article, data-driven models for a debutanizer column are developed for real-time composition monitoring. The dataset used has seven process variables as inputs, and the output is the butane concentration in the debutanizer column bottom product. The input-output dataset is divided equally into a training (calibration) set and a validation (testing) set. The training set data were used to develop fuzzy inference, adaptive neuro-fuzzy (ANFIS), and regression tree models for the debutanizer column. The accuracy of the developed models was evaluated by simulating the models with the validation dataset. The ANFIS model is observed to have better estimation accuracy than the other models developed in this work and many data-driven models proposed so far in the literature for the debutanizer column.
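A compact sketch of this soft-sensor workflow, assuming synthetic stand-in data (the real seven-variable dataset is not reproduced here): an equal calibration/validation split and a regression tree predicting the bottom-product butane concentration.

    # Soft-sensor sketch: 50/50 split, regression tree, validation RMSE.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(1)
    X = rng.normal(size=(2000, 7))                  # seven process inputs (synthetic)
    y = 0.3 * X[:, 0] - 0.2 * X[:, 3] * X[:, 5] + 0.05 * rng.normal(size=len(X))

    half = len(X) // 2                              # equal calibration/test split
    X_tr, X_te, y_tr, y_te = X[:half], X[half:], y[:half], y[half:]

    tree = DecisionTreeRegressor(max_depth=6, random_state=0).fit(X_tr, y_tr)
    rmse = mean_squared_error(y_te, tree.predict(X_te)) ** 0.5
    print(f"validation RMSE: {rmse:.3f}")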
Biologically inspired robots elicit a robust fear response in zebrafish
NASA Astrophysics Data System (ADS)
Ladu, Fabrizio; Bartolini, Tiziana; Panitz, Sarah G.; Butail, Sachit; Macrì, Simone; Porfiri, Maurizio
2015-03-01
We investigate the behavioral response of zebrafish to three fear-evoking stimuli. In a binary choice test, zebrafish are exposed to a live allopatric predator, a biologically inspired robot, and a computer-animated image of the live predator. A target tracking algorithm is developed to score zebrafish behavior. Unlike computer-animated images, the robotic and live predators elicit a robust avoidance response. Importantly, the robotic stimulus elicits more consistent inter-individual responses than the live predator. Results from this effort are expected to aid hypothesis-driven studies on zebrafish fear response by offering a valuable approach to maximize data throughput and minimize the use of animal subjects.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tobias, Benjamin John; Palaniyappan, Sasikumar; Gautier, Donald Cort
Images of the R2DTO resolution target were obtained during laser-driven-radiography experiments performed at the TRIDENT laser facility, and analysis of these images using the Bayesian Inference Engine (BIE) determines a most probable full-width half maximum (FWHM) spot size of 78 μm. However, significant uncertainty prevails due to variation in the measured detector blur. Propagating this uncertainty in detector blur through the forward model results in an interval of probabilistic ambiguity spanning approximately 35-195 μm when the laser energy impinges on a thick (1 mm) tantalum target. In other phases of the experiment, laser energy is deposited on a thin (~100 nm) aluminum target placed 250 μm ahead of the tantalum converter. When the energetic electron beam is generated in this manner, upstream from the bremsstrahlung converter, the inferred spot size shifts to a range of much larger values, approximately 270-600 μm FWHM. This report discusses methods applied to obtain these intervals as well as concepts necessary for interpreting the result within a context of probabilistic quantitative inference.
NASA Astrophysics Data System (ADS)
Li, Shuanghong; Cao, Hongliang; Yang, Yupu
2018-02-01
Fault diagnosis is a key process for the reliability and safety of solid oxide fuel cell (SOFC) systems. However, it is difficult to rapidly and accurately identify faults in complicated SOFC systems, especially when simultaneous faults appear. In this research, a data-driven Multi-Label (ML) pattern identification approach is proposed to address the simultaneous fault diagnosis of SOFC systems. The framework of the simultaneous-fault diagnosis primarily includes two components: feature extraction and an ML-SVM classifier. The approach can be trained to diagnose simultaneous SOFC faults, such as fuel leakage and air leakage at different positions in the SOFC system, using simple training data sets consisting only of single faults, without demanding simultaneous-fault data. The experimental results show that the proposed framework can diagnose simultaneous SOFC system faults with high accuracy while requiring little training data and a low computational burden. In addition, Fault Inference Tree Analysis (FITA) is employed to identify the correlations among possible faults and their corresponding symptoms at the system component level.
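To make the multi-label step concrete, here is a hedged sketch (synthetic features and fault signatures, not the paper's SOFC data): one-vs-rest SVMs trained only on normal and single-fault records can flag two faults occurring together, because each binary classifier fires independently.

    # Multi-label fault flagging from single-fault training data.
    import numpy as np
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(2)
    n = 300
    X_normal = rng.normal(loc=[0, 0, 0], scale=0.3, size=(n, 3))
    X_fuel = rng.normal(loc=[2, 0, 0], scale=0.3, size=(n, 3))   # fuel-leak signature
    X_air = rng.normal(loc=[0, 2, 0], scale=0.3, size=(n, 3))    # air-leak signature
    X = np.vstack([X_normal, X_fuel, X_air])
    Y = np.vstack([np.tile([0, 0], (n, 1)),       # indicator columns: [fuel, air]
                   np.tile([1, 0], (n, 1)),
                   np.tile([0, 1], (n, 1))])

    clf = OneVsRestClassifier(LinearSVC()).fit(X, Y)

    x_sim = np.array([[2.0, 2.0, 0.0]])           # both signatures at once
    print(clf.predict(x_sim))                     # expected: [[1 1]], both faults flagged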
The Development of Variable MLM Editor and TSQL Translator Based on Arden Syntax in Taiwan
Liang, Yan-Ching; Chang, Polun
2003-01-01
The Arden Syntax standard has been utilized in the medical informatics community in several countries during the past decade, but it has never been used in nursing in Taiwan. We developed a system that acquires medical expert knowledge in Chinese and translates the data and logic slots into TSQL. The system implements a TSQL translator that interprets the database queries referred to in the knowledge modules. Medical decision-support systems are data-driven systems in which TSQL triggers, acting as an inference engine, can be used to facilitate linking to a database. PMID:14728414
Effect of distance-related heterogeneity on population size estimates from point counts
Efford, Murray G.; Dawson, Deanna K.
2009-01-01
Point counts are used widely to index bird populations. Variation in the proportion of birds counted is a known source of error, and for robust inference it has been advocated that counts be converted to estimates of absolute population size. We used simulation to assess nine methods for the conduct and analysis of point counts when the data included distance-related heterogeneity of individual detection probability. Distance from the observer is a ubiquitous source of heterogeneity, because nearby birds are more easily detected than distant ones. Several recent methods (dependent double-observer, time of first detection, time of detection, independent multiple-observer, and repeated counts) do not account for distance-related heterogeneity, at least in their simpler forms. We assessed bias in estimates of population size by simulating counts with fixed radius w over four time intervals (occasions). Detection probability per occasion was modeled as a half-normal function of distance with scale parameter sigma and intercept g(0) = 1.0. Bias varied with sigma/w; for values of sigma inferred from published studies, bias often exceeded 50% for a 100-m fixed-radius count. More critically, the bias of adjusted counts sometimes varied more than that of unadjusted counts, and inference from adjusted counts would be less robust. The problem was not solved by using mixture models or including distance as a covariate. Conventional distance sampling performed well in simulations, but its assumptions are difficult to meet in the field. We conclude that no existing method allows effective estimation of population size from point counts.
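A quick simulation in this spirit, assuming a two-occasion Lincoln-Petersen estimator as a stand-in for the paper's nine methods, shows how distance-related heterogeneity biases estimators that assume equal detectability:

    # Half-normal detection heterogeneity negatively biases Lincoln-Petersen.
    import numpy as np

    rng = np.random.default_rng(3)
    N, w, sigma, g0 = 500, 100.0, 50.0, 1.0
    est = []
    for _ in range(1000):
        # Birds placed uniformly in a disc of radius w around the observer.
        r = w * np.sqrt(rng.uniform(size=N))
        p = g0 * np.exp(-r**2 / (2 * sigma**2))      # half-normal detection
        c1 = rng.uniform(size=N) < p                 # occasion 1
        c2 = rng.uniform(size=N) < p                 # occasion 2
        n1, n2, m2 = c1.sum(), c2.sum(), (c1 & c2).sum()
        if m2 > 0:
            est.append(n1 * n2 / m2)                 # Lincoln-Petersen estimate

    print(f"true N = {N}, mean estimate = {np.mean(est):.0f}")  # underestimates N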
NASA Astrophysics Data System (ADS)
Dietze, M.; Raiho, A.; Fer, I.; Dawson, A.; Heilman, K.; Hooten, M.; McLachlan, J. S.; Moore, D. J.; Paciorek, C. J.; Pederson, N.; Rollinson, C.; Tipton, J.
2017-12-01
The pre-industrial period serves as an essential baseline against which we judge anthropogenic impacts on the earth's systems. However, direct measurements of key biogeochemical processes, such as carbon, water, and nutrient cycling, are absent for this period, and there is no direct way to link paleoecological proxies, such as pollen and tree rings, to these processes. Process-based terrestrial ecosystem models provide a way to make inferences about the past, but have large uncertainties and by themselves often fail to capture much of the observed variability. Here we investigate the ability to improve inferences about pre-industrial biogeochemical cycles through the formal assimilation of proxy data into multiple process-based models. A Tobit ensemble filter with explicit estimation of process error was run at five sites across the eastern US for three models (LINKAGES, ED2, LPJ-GUESS). In addition to process error, the ensemble accounted for parameter uncertainty, estimated through the assimilation of the TRY and BETY trait databases, and driver uncertainty, accommodated by probabilistically downscaling and debiasing CMIP5 GCM output and then filtering based on paleoclimate reconstructions. The assimilation was informed by four PalEON data products, each of which includes an explicit Bayesian error estimate: (1) STEPPS forest composition estimated from fossil pollen; (2) REFAB aboveground biomass (AGB) estimated from fossil pollen; (3) tree ring AGB and woody net primary productivity (wNPP); and (4) public land survey composition, stem density, and AGB. By comparing ensemble runs with and without data assimilation we are able to assess the information contribution of the proxy data to constraining biogeochemical fluxes, which is driven by the combination of model uncertainty, data uncertainty, and the strength of correlation between observed and unobserved quantities in the model ensemble. To our knowledge this is the first attempt at multi-model data assimilation with terrestrial ecosystem models. Results from the data-model assimilation allow us to assess the consistency across models in post-assimilation inferences about indirectly inferred quantities, such as GPP, soil carbon, and the water budget.
Nichols, James D.; Pollock, Kenneth H.; Hines, James E.
1984-01-01
The robust design of Pollock (1982) was used to estimate parameters of a Maryland M. pennsylvanicus population. Closed model tests provided strong evidence of heterogeneity of capture probability, and model M_h (Otis et al., 1978) was selected as the most appropriate model for estimating population size. The Jolly-Seber model goodness-of-fit test indicated rejection of the model for this data set, and the M_h estimates of population size were all higher than the Jolly-Seber estimates. Both of these results are consistent with the evidence of heterogeneous capture probabilities. The authors thus used M_h estimates of population size, Jolly-Seber estimates of survival rate, and estimates of birth-immigration based on a combination of the population size and survival rate estimates. Advantages of the robust design estimates for certain inference procedures are discussed, and the design is recommended for future small mammal capture-recapture studies directed at estimation.
Log-Normal Turbulence Dissipation in Global Ocean Models
NASA Astrophysics Data System (ADS)
Pearson, Brodie; Fox-Kemper, Baylor
2018-03-01
Data from turbulent numerical simulations of the global ocean demonstrate that the dissipation of kinetic energy obeys a nearly log-normal distribution even at large horizontal scales O(10 km). As the horizontal scales of resolved turbulence are larger than the ocean is deep, the Kolmogorov-Yaglom theory for intermittency in 3D homogeneous, isotropic turbulence cannot apply; instead, the down-scale potential enstrophy cascade of quasigeostrophic turbulence should. Yet, energy dissipation obeys approximate log-normality, robustly across depths, seasons, regions, and subgrid schemes. The distribution parameters, skewness and kurtosis, show small systematic departures from log-normality with depth and subgrid friction schemes. Log-normality suggests that a few high-dissipation locations dominate the integrated energy and enstrophy budgets, which should be taken into account when making inferences from simplified models and inferring global energy budgets from sparse observations.
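Checking near-log-normality of a dissipation field is straightforward; this sketch uses a synthetic stand-in for model output and inspects the moments of log(eps):

    # Under exact log-normality, skewness and excess kurtosis of log(eps)
    # both vanish; a small additive departure mimics the paper's findings.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    log_eps = rng.normal(loc=-9.0, scale=1.5, size=100_000)      # stand-in field
    log_eps += 0.05 * rng.standard_t(df=5, size=log_eps.size)    # mild departure

    print(f"skewness of log(eps): {stats.skew(log_eps):+.3f}")     # ~0 if log-normal
    print(f"excess kurtosis:      {stats.kurtosis(log_eps):+.3f}") # ~0 if log-normal
    s = log_eps.std()
    # Log-normality implies the mean is carried by rare, intense events:
    print(f"mean/median ratio of eps: {np.exp(s**2 / 2):.1f}")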
Robust geostatistical analysis of spatial data
NASA Astrophysics Data System (ADS)
Papritz, A.; Künsch, H. R.; Schwierz, C.; Stahel, W. A.
2012-04-01
Most geostatistical software tools rely on non-robust algorithms. This is unfortunate, because outlying observations are rather the rule than the exception, in particular in environmental data sets. Outlying observations may result from errors (e.g. in data transcription) or from local perturbations in the processes that are responsible for a given pattern of spatial variation. As an example, the spatial distribution of some trace metal in the soils of a region may be distorted by emissions of local anthropogenic sources. Outliers affect the modelling of the large-scale spatial variation, the so-called external drift or trend, the estimation of the spatial dependence of the residual variation, and the predictions by kriging. Identifying outliers manually is cumbersome and requires expertise, because one needs parameter estimates to decide which observation is a potential outlier. Moreover, inference after the rejection of some observations is problematic. A better approach is to use robust algorithms that automatically prevent outlying observations from having undue influence. Earlier studies on robust geostatistics focused on robust estimation of the sample variogram and ordinary kriging without external drift. Furthermore, Richardson and Welsh (1995) [2] proposed a robustified version of (restricted) maximum likelihood ([RE]ML) estimation for the variance components of a linear mixed model, which was later used by Marchant and Lark (2007) [1] for robust REML estimation of the variogram. We propose here a novel method for robust REML estimation of the variogram of a Gaussian random field that is possibly contaminated by independent errors from a long-tailed distribution. It is based on robustification of the estimating equations for Gaussian REML estimation. Besides robust estimates of the parameters of the external drift and of the variogram, the method also provides standard errors for the estimated parameters, robustified kriging predictions at both sampled and unsampled locations, and kriging variances. The method has been implemented in an R package. Apart from presenting our modelling framework, we shall present selected simulation results that explore the properties of the new method. This will be complemented by an analysis of the Tarrawarra soil moisture data set [3].
Schema-Driven Facilitation of New Hierarchy Learning in the Transitive Inference Paradigm
ERIC Educational Resources Information Center
Kumaran, Dharshan
2013-01-01
Prior knowledge, in the form of a mental schema or framework, is viewed to facilitate the learning of new information in a range of experimental and everyday scenarios. Despite rising interest in the cognitive and neural mechanisms underlying schema-driven facilitation of new learning, few paradigms have been developed to examine this issue in…
Symbolic phase transfer entropy method and its application
NASA Astrophysics Data System (ADS)
Zhang, Ningning; Lin, Aijing; Shang, Pengjian
2017-10-01
In this paper, we introduce symbolic phase transfer entropy (SPTE) to infer the direction and strength of information flow among systems. The advantages of the proposed method are investigated by simulations on synthetic signals and real-world data. We demonstrate that symbolic phase transfer entropy is a robust and efficient tool to infer the information flow between complex systems. Based on the study of the synthetic data, we find that a significant advantage of SPTE is its reduced sensitivity to noise. In addition, SPTE requires less data than symbolic transfer entropy (STE). We analyze the direction and strength of information flow between six stock markets during the period from 2006 to 2016. The results indicate that the information flow among stocks varies over different periods. We also find that the interaction network pattern among stocks undergoes hierarchical reorganization with the transition from one period to another. It is shown that the clusters are mainly classified according to period, and then by region. Stocks from the same time period are shown to fall into the same cluster.
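A rough sketch of the underlying symbolic transfer entropy (the phase variant would first extract instantaneous phases, e.g. via scipy.signal.hilbert): series are symbolized by ordinal patterns, and TE is a plug-in estimate over symbol triples. All details here are illustrative.

    # Symbolic transfer entropy X -> Y via ordinal-pattern symbolization.
    import numpy as np
    from collections import Counter

    def symbolize(x, m=3):
        """Map each length-m window to its ordinal (permutation) pattern."""
        windows = np.lib.stride_tricks.sliding_window_view(x, m)
        return [tuple(np.argsort(w)) for w in windows]

    def symbolic_te(x, y, m=3):
        """Plug-in transfer entropy from symbol sequences, in bits."""
        sx, sy = symbolize(x, m), symbolize(y, m)
        trip = Counter(zip(sy[1:], sy[:-1], sx[:-1]))    # (y_next, y, x)
        pair_yx = Counter(zip(sy[:-1], sx[:-1]))
        pair_yy = Counter(zip(sy[1:], sy[:-1]))
        marg_y = Counter(sy[:-1])
        n = len(sy) - 1
        te = 0.0
        for (y1, y0, x0), c in trip.items():
            te += (c / n) * np.log2(c * marg_y[y0] / (pair_yx[(y0, x0)] * pair_yy[(y1, y0)]))
        return te

    rng = np.random.default_rng(5)
    x = rng.normal(size=5000)
    y = np.roll(x, 1) + 0.5 * rng.normal(size=5000)      # y driven by lagged x
    print(f"TE x->y: {symbolic_te(x, y):.3f}  TE y->x: {symbolic_te(y, x):.3f}")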
Extreme learning machine for reduced order modeling of turbulent geophysical flows.
San, Omer; Maulik, Romit
2018-04-01
We investigate the application of artificial neural networks to stabilize proper orthogonal decomposition-based reduced order models for quasistationary geophysical turbulent flows. An extreme learning machine concept is introduced for computing an eddy-viscosity closure dynamically to incorporate the effects of the truncated modes. We consider a four-gyre wind-driven ocean circulation problem as our prototype setting to assess the performance of the proposed data-driven approach. Our framework provides a significant reduction in computational time and effectively retains the dynamics of the full-order model during the forward simulation period beyond the training data set. Furthermore, we show that the method is robust for larger choices of time steps and can be used as an efficient and reliable tool for long time integration of general circulation models.
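The extreme learning machine itself is tiny: a random, untrained hidden layer plus a least-squares readout. A toy regression sketch follows (illustrative target function, not the POD eddy-viscosity closure):

    # Bare-bones extreme learning machine: only the linear readout is "trained".
    import numpy as np

    rng = np.random.default_rng(6)
    X = rng.uniform(-1, 1, size=(400, 2))
    y = np.sin(3 * X[:, 0]) * X[:, 1]                 # toy target

    n_hidden = 60
    W = rng.normal(size=(2, n_hidden))                # random input weights (fixed)
    b = rng.normal(size=n_hidden)                     # random biases (fixed)
    H = np.tanh(X @ W + b)                            # hidden-layer features

    beta, *_ = np.linalg.lstsq(H, y, rcond=None)      # least-squares readout

    X_test = rng.uniform(-1, 1, size=(100, 2))
    y_hat = np.tanh(X_test @ W + b) @ beta
    err = np.sqrt(np.mean((y_hat - np.sin(3 * X_test[:, 0]) * X_test[:, 1])**2))
    print(f"test RMSE: {err:.3f}")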
Meyer, Georg F; Spray, Amy; Fairlie, Jo E; Uomini, Natalie T
2014-01-01
Current neuroimaging techniques with high spatial resolution constrain participant motion, so that many natural tasks cannot be carried out. The aim of this paper is to show how a time-locked correlation analysis of cerebral blood flow velocity (CBFV) lateralization data, obtained with functional TransCranial Doppler (fTCD) ultrasound, can be used to infer cerebral activation patterns across tasks. In a first experiment we demonstrate that the proposed analysis method results in data that are comparable with the standard Lateralization Index (LI) for within-task comparisons of CBFV patterns, recorded during cued word generation (CWG) at two difficulty levels. In the main experiment we demonstrate that the proposed analysis method shows correlated blood-flow patterns for two different cognitive tasks that are known to draw on common brain areas: CWG and music synthesis. We show that CBFV patterns for music synthesis and CWG are correlated only for participants with prior musical training. CBFV patterns for tasks that draw on distinct brain areas, the Tower of London and CWG, are not correlated. The proposed methodology extends conventional fTCD analysis by including temporal information in the analysis of cerebral blood-flow patterns to provide a robust, non-invasive method to infer whether common brain areas are used in different cognitive tasks. It complements conventional high-resolution imaging techniques.
McGeachie, Michael J; Chang, Hsun-Hsien; Weiss, Scott T
2014-06-01
Bayesian Networks (BN) have been a popular predictive modeling formalism in bioinformatics, but their application in modern genomics has been slowed by an inability to cleanly handle domains with mixed discrete and continuous variables. Existing free BN software packages either discretize continuous variables, which can lead to information loss, or do not include inference routines, which makes prediction with the BN impossible. We present CGBayesNets, a BN package focused around prediction of a clinical phenotype from mixed discrete and continuous variables, which fills these gaps. CGBayesNets implements Bayesian likelihood and inference algorithms for the conditional Gaussian Bayesian network (CGBNs) formalism, one appropriate for predicting an outcome of interest from, e.g., multimodal genomic data. We provide four different network learning algorithms, each making a different tradeoff between computational cost and network likelihood. CGBayesNets provides a full suite of functions for model exploration and verification, including cross validation, bootstrapping, and AUC manipulation. We highlight several results obtained previously with CGBayesNets, including predictive models of wood properties from tree genomics, leukemia subtype classification from mixed genomic data, and robust prediction of intensive care unit mortality outcomes from metabolomic profiles. We also provide detailed example analysis on public metabolomic and gene expression datasets. CGBayesNets is implemented in MATLAB and available as MATLAB source code, under an Open Source license and anonymous download at http://www.cgbayesnets.com.
Highly efficient and robust molecular ruthenium catalysts for water oxidation
Duan, Lele; Araujo, Carlos Moyses; Ahlquist, Mårten S.G.; Sun, Licheng
2012-01-01
Water oxidation catalysts are essential components of light-driven water splitting systems, which could convert water to H2 driven by solar radiation (H2O + hν → 1/2O2 + H2). The oxidation of water (H2O → 1/2O2 + 2H+ + 2e-) provides protons and electrons for the production of dihydrogen (2H+ + 2e- → H2), a clean-burning and high-capacity energy carrier. One of the obstacles now is the lack of effective and robust water oxidation catalysts. Aiming at developing robust molecular Ru-bda (H2bda = 2,2′-bipyridine-6,6′-dicarboxylic acid) water oxidation catalysts, we carried out density functional theory studies, correlated the robustness of catalysts against hydration with the highest occupied molecular orbital levels of a set of ligands, and successfully directed the synthesis of robust Ru-bda water oxidation catalysts. A series of mononuclear ruthenium complexes [Ru(bda)L2] (L = pyridazine, pyrimidine, and phthalazine) were subsequently synthesized and shown to effectively catalyze CeIV-driven [CeIV = Ce(NH4)2(NO3)6] water oxidation with high oxygen production rates up to 286 s-1 and high turnover numbers up to 55,400. PMID:22753518
Robust non-parametric one-sample tests for the analysis of recurrent events.
Rebora, Paola; Galimberti, Stefania; Valsecchi, Maria Grazia
2010-12-30
One-sample non-parametric tests are proposed here for inference on recurring events. The focus is on the marginal mean function of events and the basis for inference is the standardized distance between the observed and the expected number of events under a specified reference rate. Different weights are considered in order to account for various types of alternative hypotheses on the mean function of the recurrent events process. A robust version and a stratified version of the test are also proposed. The performance of these tests was investigated through simulation studies under various underlying event generation processes, such as homogeneous and nonhomogeneous Poisson processes, autoregressive and renewal processes, with and without frailty effects. The robust versions of the test have been shown to be suitable in a wide variety of event generating processes. The motivating context is a study on gene therapy in a very rare immunodeficiency in children, where a major end-point is the recurrence of severe infections. Robust non-parametric one-sample tests for recurrent events can be useful to assess efficacy and especially safety in non-randomized studies or in epidemiological studies for comparison with a standard population. Copyright © 2010 John Wiley & Sons, Ltd.
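In its simplest form, the test statistic is a standardized distance between observed and expected event counts under the reference rate. A toy Poisson-reference sketch (invented numbers, ignoring the paper's weights and robustification):

    # Standardized distance between observed and expected recurrent events;
    # under a Poisson reference, Z is approximately standard normal.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    followup = rng.uniform(0.5, 3.0, size=40)         # years at risk per subject
    ref_rate = 1.2                                    # reference events/year
    observed = rng.poisson(0.8 * followup)            # data generated at a lower rate

    O, E = observed.sum(), ref_rate * followup.sum()
    z = (O - E) / np.sqrt(E)
    print(f"O = {O}, E = {E:.1f}, Z = {z:.2f}, p = {2 * stats.norm.sf(abs(z)):.4f}")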
DeepInfer: open-source deep learning deployment toolkit for image-guided therapy
NASA Astrophysics Data System (ADS)
Mehrtash, Alireza; Pesteie, Mehran; Hetherington, Jorden; Behringer, Peter A.; Kapur, Tina; Wells, William M.; Rohling, Robert; Fedorov, Andriy; Abolmaesumi, Purang
2017-03-01
Deep learning models have outperformed some of the previous state-of-the-art approaches in medical image analysis. Instead of using hand-engineered features, deep models attempt to automatically extract hierarchical representations at multiple levels of abstraction from the data. Therefore, deep models are usually considered to be more flexible and robust solutions for image analysis problems compared to conventional computer vision models. They have demonstrated significant improvements in computer-aided diagnosis and automatic medical image analysis applied to such tasks as image segmentation, classification and registration. However, deploying deep learning models often has a steep learning curve and requires detailed knowledge of various software packages. Thus, many deep models have not been integrated into the clinical research workflows, causing a gap between the state-of-the-art machine learning in medical applications and evaluation in clinical research procedures. In this paper, we propose "DeepInfer" - an open-source toolkit for developing and deploying deep learning models within the 3D Slicer medical image analysis platform. Utilizing a repository of task-specific models, DeepInfer allows clinical researchers and biomedical engineers to deploy a trained model selected from the public registry, and apply it to new data without the need for software development or configuration. As two practical use cases, we demonstrate the application of DeepInfer in prostate segmentation for targeted MRI-guided biopsy and identification of the target plane in 3D ultrasound for spinal injections.
Preserving temporal relations in clinical data while maintaining privacy
Mirhaji, Parsa; Low, Alexander FH; Malin, Bradley A
2016-01-01
Objective: Maintaining patient privacy is a challenge in large-scale observational research. To assist in reducing the risk of identifying study subjects through publicly available data, we introduce a method for obscuring date information for clinical events and patient characteristics. Methods: The method, which we call Shift and Truncate (SANT), obscures date information to any desired granularity. Shift and Truncate first assigns each patient a random shift value, such that all dates in that patient's record are shifted by that amount. Data are then truncated from the beginning and end of the data set. Results: The data set can be proven to not disclose temporal information finer than the chosen granularity. Unlike previous strategies such as a simple shift, it remains robust to frequent, even daily, updates and robust to inferring dates at the beginning and end of date-shifted data sets. Time-of-day may be retained or obscured, depending on the goal and anticipated knowledge of the data recipient. Conclusions: The method can be useful as a scientific approach for reducing re-identification risk under the Privacy Rule of the Health Insurance Portability and Accountability Act and may contribute to qualification for the Safe Harbor implementation. PMID:27013522
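A toy sketch of the shift-and-truncate idea (field names and parameter values are illustrative, not taken from the paper): one random shift per patient applied to all of that patient's dates, then trimming records within one shift-range of the data set's ends.

    # Per-patient date shifting followed by truncation at the window edges.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(8)
    events = pd.DataFrame({
        "patient_id": [1, 1, 2, 2, 3],
        "event_date": pd.to_datetime(
            ["2015-01-10", "2015-02-01", "2015-01-20", "2015-03-05", "2015-02-15"]),
    })

    max_shift = 30  # days; the granularity to which dates are obscured
    shifts = {pid: int(rng.integers(-max_shift, max_shift + 1))
              for pid in events["patient_id"].unique()}
    events["shifted"] = events.apply(
        lambda r: r["event_date"] + pd.Timedelta(days=shifts[r["patient_id"]]), axis=1)

    # Truncate: drop shifted events too close to either end of the data set,
    # where the original window boundary would otherwise leak date information.
    lo = events["event_date"].min() + pd.Timedelta(days=max_shift)
    hi = events["event_date"].max() - pd.Timedelta(days=max_shift)
    released = events.loc[events["shifted"].between(lo, hi), ["patient_id", "shifted"]]
    print(released)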
Mahajan, Anubha; Wessel, Jennifer; Willems, Sara M; Zhao, Wei; Robertson, Neil R; Chu, Audrey Y; Gan, Wei; Kitajima, Hidetoshi; Taliun, Daniel; Rayner, N William; Guo, Xiuqing; Lu, Yingchang; Li, Man; Jensen, Richard A; Hu, Yao; Huo, Shaofeng; Lohman, Kurt K; Zhang, Weihua; Cook, James P; Prins, Bram Peter; Flannick, Jason; Grarup, Niels; Trubetskoy, Vassily Vladimirovich; Kravic, Jasmina; Kim, Young Jin; Rybin, Denis V; Yaghootkar, Hanieh; Müller-Nurasyid, Martina; Meidtner, Karina; Li-Gao, Ruifang; Varga, Tibor V; Marten, Jonathan; Li, Jin; Smith, Albert Vernon; An, Ping; Ligthart, Symen; Gustafsson, Stefan; Malerba, Giovanni; Demirkan, Ayse; Tajes, Juan Fernandez; Steinthorsdottir, Valgerdur; Wuttke, Matthias; Lecoeur, Cécile; Preuss, Michael; Bielak, Lawrence F; Graff, Marielisa; Highland, Heather M; Justice, Anne E; Liu, Dajiang J; Marouli, Eirini; Peloso, Gina Marie; Warren, Helen R; Afaq, Saima; Afzal, Shoaib; Ahlqvist, Emma; Almgren, Peter; Amin, Najaf; Bang, Lia B; Bertoni, Alain G; Bombieri, Cristina; Bork-Jensen, Jette; Brandslund, Ivan; Brody, Jennifer A; Burtt, Noël P; Canouil, Mickaël; Chen, Yii-Der Ida; Cho, Yoon Shin; Christensen, Cramer; Eastwood, Sophie V; Eckardt, Kai-Uwe; Fischer, Krista; Gambaro, Giovanni; Giedraitis, Vilmantas; Grove, Megan L; de Haan, Hugoline G; Hackinger, Sophie; Hai, Yang; Han, Sohee; Tybjærg-Hansen, Anne; Hivert, Marie-France; Isomaa, Bo; Jäger, Susanne; Jørgensen, Marit E; Jørgensen, Torben; Käräjämäki, Annemari; Kim, Bong-Jo; Kim, Sung Soo; Koistinen, Heikki A; Kovacs, Peter; Kriebel, Jennifer; Kronenberg, Florian; Läll, Kristi; Lange, Leslie A; Lee, Jung-Jin; Lehne, Benjamin; Li, Huaixing; Lin, Keng-Hung; Linneberg, Allan; Liu, Ching-Ti; Liu, Jun; Loh, Marie; Mägi, Reedik; Mamakou, Vasiliki; McKean-Cowdin, Roberta; Nadkarni, Girish; Neville, Matt; Nielsen, Sune F; Ntalla, Ioanna; Peyser, Patricia A; Rathmann, Wolfgang; Rice, Kenneth; Rich, Stephen S; Rode, Line; Rolandsson, Olov; Schönherr, Sebastian; Selvin, Elizabeth; Small, Kerrin S; Stančáková, Alena; Surendran, Praveen; Taylor, Kent D; Teslovich, Tanya M; Thorand, Barbara; Thorleifsson, Gudmar; Tin, Adrienne; Tönjes, Anke; Varbo, Anette; Witte, Daniel R; Wood, Andrew R; Yajnik, Pranav; Yao, Jie; Yengo, Loïc; Young, Robin; Amouyel, Philippe; Boeing, Heiner; Boerwinkle, Eric; Bottinger, Erwin P; Chowdhury, Rajiv; Collins, Francis S; Dedoussis, George; Dehghan, Abbas; Deloukas, Panos; Ferrario, Marco M; Ferrières, Jean; Florez, Jose C; Frossard, Philippe; Gudnason, Vilmundur; Harris, Tamara B; Heckbert, Susan R; Howson, Joanna M M; Ingelsson, Martin; Kathiresan, Sekar; Kee, Frank; Kuusisto, Johanna; Langenberg, Claudia; Launer, Lenore J; Lindgren, Cecilia M; Männistö, Satu; Meitinger, Thomas; Melander, Olle; Mohlke, Karen L; Moitry, Marie; Morris, Andrew D; Murray, Alison D; de Mutsert, Renée; Orho-Melander, Marju; Owen, Katharine R; Perola, Markus; Peters, Annette; Province, Michael A; Rasheed, Asif; Ridker, Paul M; Rivadineira, Fernando; Rosendaal, Frits R; Rosengren, Anders H; Salomaa, Veikko; Sheu, Wayne H-H; Sladek, Rob; Smith, Blair H; Strauch, Konstantin; Uitterlinden, André G; Varma, Rohit; Willer, Cristen J; Blüher, Matthias; Butterworth, Adam S; Chambers, John Campbell; Chasman, Daniel I; Danesh, John; van Duijn, Cornelia; Dupuis, Josée; Franco, Oscar H; Franks, Paul W; Froguel, Philippe; Grallert, Harald; Groop, Leif; Han, Bok-Ghee; Hansen, Torben; Hattersley, Andrew T; Hayward, Caroline; Ingelsson, Erik; Kardia, Sharon L R; Karpe, Fredrik; Kooner, Jaspal Singh; Köttgen, Anna; 
Kuulasmaa, Kari; Laakso, Markku; Lin, Xu; Lind, Lars; Liu, Yongmei; Loos, Ruth J F; Marchini, Jonathan; Metspalu, Andres; Mook-Kanamori, Dennis; Nordestgaard, Børge G; Palmer, Colin N A; Pankow, James S; Pedersen, Oluf; Psaty, Bruce M; Rauramaa, Rainer; Sattar, Naveed; Schulze, Matthias B; Soranzo, Nicole; Spector, Timothy D; Stefansson, Kari; Stumvoll, Michael; Thorsteinsdottir, Unnur; Tuomi, Tiinamaija; Tuomilehto, Jaakko; Wareham, Nicholas J; Wilson, James G; Zeggini, Eleftheria; Scott, Robert A; Barroso, Inês; Frayling, Timothy M; Goodarzi, Mark O; Meigs, James B; Boehnke, Michael; Saleheen, Danish; Morris, Andrew P; Rotter, Jerome I; McCarthy, Mark I
2018-04-01
We aggregated coding variant data for 81,412 type 2 diabetes cases and 370,832 controls of diverse ancestry, identifying 40 coding variant association signals (P < 2.2 × 10^-7); of these, 16 map outside known risk-associated loci. We make two important observations. First, only five of these signals are driven by low-frequency variants: even for these, effect sizes are modest (odds ratio ≤ 1.29). Second, when we used large-scale genome-wide association data to fine-map the associated variants in their regional context, accounting for the global enrichment of complex trait associations in coding sequence, compelling evidence for coding variant causality was obtained for only 16 signals. At 13 others, the associated coding variants clearly represent 'false leads' with potential to generate erroneous mechanistic inference. Coding variant associations offer a direct route to biological insight for complex diseases and identification of validated therapeutic targets; however, appropriate mechanistic inference requires careful specification of their causal contribution to disease predisposition.
The MR-Base platform supports systematic causal inference across the human phenome
Wade, Kaitlin H; Haberland, Valeriia; Baird, Denis; Laurin, Charles; Burgess, Stephen; Bowden, Jack; Langdon, Ryan; Tan, Vanessa Y; Yarmolinsky, James; Shihab, Hashem A; Timpson, Nicholas J; Evans, David M; Relton, Caroline; Martin, Richard M; Davey Smith, George
2018-01-01
Results from genome-wide association studies (GWAS) can be used to infer causal relationships between phenotypes, using a strategy known as 2-sample Mendelian randomization (2SMR) and bypassing the need for individual-level data. However, 2SMR methods are evolving rapidly and GWAS results are often insufficiently curated, undermining efficient implementation of the approach. We therefore developed MR-Base (http://www.mrbase.org): a platform that integrates a curated database of complete GWAS results (no restrictions according to statistical significance) with an application programming interface, web app and R packages that automate 2SMR. The software includes several sensitivity analyses for assessing the impact of horizontal pleiotropy and other violations of assumptions. The database currently comprises 11 billion single nucleotide polymorphism-trait associations from 1673 GWAS and is updated on a regular basis. Integrating data with software ensures more rigorous application of hypothesis-driven analyses and allows millions of potential causal relationships to be efficiently evaluated in phenome-wide association studies. PMID:29846171
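The core 2SMR computation that MR-Base automates is compact. A hedged sketch with invented summary statistics: per-variant Wald ratios combined by inverse-variance weighting.

    # Two-sample Mendelian randomization: Wald ratios + IVW pooling.
    import numpy as np

    bx = np.array([0.11, 0.08, 0.15, 0.05])      # SNP-exposure effects (invented)
    by = np.array([0.024, 0.014, 0.035, 0.009])  # SNP-outcome effects (invented)
    se_by = np.array([0.010, 0.008, 0.012, 0.007])

    ratio = by / bx                               # Wald ratio per instrument
    se_ratio = se_by / np.abs(bx)                 # first-order delta method

    w = 1.0 / se_ratio**2
    beta_ivw = np.sum(w * ratio) / np.sum(w)      # inverse-variance-weighted estimate
    se_ivw = np.sqrt(1.0 / np.sum(w))
    print(f"causal estimate = {beta_ivw:.3f} (SE {se_ivw:.3f})")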
Interpretable inference on the mixed effect model with the Box-Cox transformation.
Maruo, K; Yamaguchi, Y; Noma, H; Gosho, M
2017-07-10
We derived results for inference on parameters of the marginal model of the mixed effect model with the Box-Cox transformation based on the asymptotic theory approach. We also provided a robust variance estimator of the maximum likelihood estimator of the parameters of this model, accounting for possible model misspecification. Using these results, we developed an inference procedure for the difference of the model median between treatment groups at the specified occasion in the context of mixed effects models for repeated measures analysis for randomized clinical trials, which provided interpretable estimates of the treatment effect. Simulation studies showed that our proposed method controlled the type I error of the statistical test for the model median difference in almost all situations and had moderate to high power compared with existing methods. We illustrated our method with cluster of differentiation 4 (CD4) data in an AIDS clinical trial, where the interpretability of the analysis results based on our proposed method is demonstrated. Copyright © 2017 John Wiley & Sons, Ltd.
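For reference, the Box-Cox transform and the back-transformed model median that make the "model median difference" interpretable (notation assumed here for illustration, not copied from the paper):

    \[
      z \;=\;
      \begin{cases}
        \dfrac{y^{\lambda}-1}{\lambda}, & \lambda \neq 0,\\[4pt]
        \log y, & \lambda = 0,
      \end{cases}
      \qquad
      z \sim N(\mu,\sigma^{2})
      \;\;\Longrightarrow\;\;
      \operatorname{Med}(y) = (\lambda\mu + 1)^{1/\lambda} \quad (\lambda \neq 0),
    \]

since the inverse transform is monotone, the model median on the original scale follows by inverting at \mu (and equals e^{\mu} when \lambda = 0); the treatment effect is then the difference of back-transformed medians at the occasion of interest.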
Topics in inference and decision-making with partial knowledge
NASA Technical Reports Server (NTRS)
Safavian, S. Rasoul; Landgrebe, David
1990-01-01
Two essential elements needed in the process of inference and decision-making are prior probabilities and likelihood functions. When both of these components are known accurately and precisely, the Bayesian approach provides a consistent and coherent solution to the problems of inference and decision-making. In many situations, however, either one or both of the above components may not be known, or at least may not be known precisely. This problem of partial knowledge about prior probabilities and likelihood functions is addressed. There are at least two ways to cope with this lack of precise knowledge: robust methods and interval-valued methods. First, ways of modeling imprecision and indeterminacies in prior probabilities and likelihood functions are examined; then how imprecision in the above components carries over to the posterior probabilities is examined. Finally, the problem of decision-making with imprecise posterior probabilities and the consequences of such actions are addressed. These problems arise, for example, in statistical pattern recognition, such as the classification of high-dimensional multispectral remote sensing image data.
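A small numeric illustration of the interval-valued route, under the assumption that the posterior is monotone in a scalar prior (all values are invented):

    # When the prior for class A is only known to lie in [0.2, 0.5], the
    # posterior is bounded by evaluating Bayes' rule at the interval endpoints.
    def posterior(prior_a, like_a, like_b):
        return prior_a * like_a / (prior_a * like_a + (1 - prior_a) * like_b)

    like_a, like_b = 0.9, 0.3            # p(observation | class), invented
    lo, hi = 0.2, 0.5                    # imprecise prior for class A
    print(f"posterior for A lies in [{posterior(lo, like_a, like_b):.3f}, "
          f"{posterior(hi, like_a, like_b):.3f}]")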
IC-Finder: inferring robustly the hierarchical organization of chromatin folding
Haddad, Noelle
2017-01-01
The spatial organization of the genome plays a crucial role in the regulation of gene expression. Recent experimental techniques like Hi-C have emphasized the segmentation of genomes into interaction compartments that constitute conserved functional domains participating in the maintenance of a proper cell identity. Here, we propose a novel method, IC-Finder, to identify interaction compartments (IC) from experimental Hi-C maps. IC-Finder is based on a hierarchical clustering approach that we adapted to account for the polymeric nature of chromatin. Based on a benchmark of realistic in silico Hi-C maps, we show that IC-Finder is one of the best methods in terms of reliability and is the most efficient numerically. IC-Finder offers two original options: a probabilistic description of the inferred compartments and the possibility to explore the various hierarchies of chromatin organization. Applying the method to experimental data in fly and human, we show how the predicted segmentation may depend on the normalization scheme and how 3D compartmentalization is tightly associated with epigenomic information. IC-Finder provides a robust and generic ‘all-in-one’ tool to uncover the general principles of 3D chromatin folding and their influence on gene regulation. The software is available at http://membres-timc.imag.fr/Daniel.Jost/DJ-TIMC/Software.html. PMID:28130423
Reconstructing Dynamic Promoter Activity Profiles from Reporter Gene Data.
Kannan, Soumya; Sams, Thomas; Maury, Jérôme; Workman, Christopher T
2018-03-16
Accurate characterization of promoter activity is important when designing expression systems for systems biology and metabolic engineering applications. Promoters that respond to changes in the environment enable the dynamic control of gene expression without the necessity of inducer compounds, for example. However, the dynamic nature of these processes poses challenges for estimating promoter activity. Most experimental approaches utilize reporter gene expression to estimate promoter activity. Typically the reporter gene encodes a fluorescent protein that is used to infer a constant promoter activity despite the fact that the observed output may be dynamic and is a number of steps away from the transcription process. In fact, some promoters that are often thought of as constitutive can show changes in activity when growth conditions change. For these reasons, we have developed a system of ordinary differential equations for estimating dynamic promoter activity for promoters that change their activity in response to the environment that is robust to noise and changes in growth rate. Our approach, inference of dynamic promoter activity (PromAct), improves on existing methods by more accurately inferring known promoter activity profiles. This method is also capable of estimating the correct scale of promoter activity and can be applied to quantitative data sets to estimate quantitative rates.
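A sketch of the kind of forward model PromAct inverts (parameter values are arbitrary): promoter activity drives mRNA, which drives the fluorescent reporter, so the observed signal lags and smooths the activity profile.

    # Two-stage reporter ODE: a(t) -> mRNA -> protein; the protein trace is
    # what a fluorescence time course would approximate.
    import numpy as np
    from scipy.integrate import solve_ivp

    def activity(t):                       # step change in promoter activity
        return 1.0 if t < 5.0 else 4.0

    def rhs(t, s, dm=0.8, kp=1.0, dp=0.1):
        m, p = s
        return [activity(t) - dm * m,      # mRNA: synthesis minus decay
                kp * m - dp * p]           # protein: translation minus decay/dilution

    sol = solve_ivp(rhs, (0.0, 20.0), [0.0, 0.0], max_step=0.1, dense_output=True)
    t = np.linspace(0, 20, 5)
    print(np.round(sol.sol(t), 2))         # protein rises well after a(t) steps up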
Frey, Jennifer K.; Lewis, Jeremy C.; Guy, Rachel K.; Stuart, James N.
2013-01-01
Species distributions are usually inferred from occurrence records. However, these records are prone to errors in spatial precision and reliability. Although the influence of spatial errors has been fairly well studied, there is little information on the impacts of poor reliability. The reliability of an occurrence record can be influenced by characteristics of the species, conditions during the observation, and the observer's knowledge. Some studies have advocated the use of anecdotal data, while others have advocated more stringent evidentiary standards, such as only accepting records verified by physical evidence, at least for rare or elusive species. Our goal was to evaluate the influence of occurrence records with different reliability on species distribution models (SDMs) of a unique mammal, the white-nosed coati (Nasua narica), in the American Southwest. We compared SDMs developed using maximum entropy analysis of combined bioclimatic and biophysical variables and based on seven subsets of occurrence records that varied in reliability and spatial precision. We found that the predicted distributions of the coati based on datasets that included anecdotal occurrence records were similar to those based on datasets that only included physical evidence. Coati distribution in the American Southwest was predicted to occur in southwestern New Mexico and southeastern Arizona and was defined primarily by evenness of climate and Madrean woodland and chaparral land-cover types. Coati distribution patterns in this region suggest a good model for understanding the biogeographic structure of range margins. We concluded that occurrence datasets that include anecdotal records can be used to infer species distributions, provided such data are used only for easily identifiable species and based on robust modeling methods such as maximum entropy. Use of a reliability rating system is critical for using anecdotal data. PMID:26487405
Görgen, Kai; Hebart, Martin N; Allefeld, Carsten; Haynes, John-Dylan
2017-12-27
Standard neuroimaging data analysis based on traditional principles of experimental design, modelling, and statistical inference is increasingly complemented by novel analysis methods, driven, for example, by machine learning. While these novel approaches provide new insights into neuroimaging data, they often have unexpected properties, generating a growing literature on possible pitfalls. We propose to meet this challenge by adopting a habit of systematic testing of experimental design, analysis procedures, and statistical inference. Specifically, we suggest applying the analysis method used for experimental data also to aspects of the experimental design, simulated confounds, simulated null data, and control data. We stress the importance of keeping the analysis method the same in main and test analyses, because only in this way can possible confounds and unexpected properties be reliably detected and avoided. We describe and discuss this Same Analysis Approach in detail and demonstrate it in two worked examples using multivariate decoding. With these examples, we reveal two sources of error: a mismatch between counterbalancing (crossover designs) and cross-validation, which leads to systematic below-chance accuracies, and linear decoding of a nonlinear effect, a difference in variance. Copyright © 2017 Elsevier Inc. All rights reserved.
Global Quantitative Modeling of Chromatin Factor Interactions
Zhou, Jian; Troyanskaya, Olga G.
2014-01-01
Chromatin is the driver of gene regulation, yet understanding the molecular interactions underlying chromatin factor combinatorial patterns (or the “chromatin codes”) remains a fundamental challenge in chromatin biology. Here we developed a global modeling framework that leverages chromatin profiling data to produce a systems-level view of the macromolecular complex of chromatin. Our model utilizes maximum entropy modeling with regularization-based structure learning to statistically dissect dependencies between chromatin factors and produce an accurate probability distribution of the chromatin code. Our unsupervised quantitative model, trained on genome-wide chromatin profiles of 73 histone marks and chromatin proteins from modENCODE, enabled various data-driven inferences about chromatin profiles and interactions. We provided a highly accurate predictor of chromatin factor pairwise interactions validated by known experimental evidence, and for the first time enabled higher-order interaction prediction. Our predictions can thus help guide future experimental studies. The model can also serve as an inference engine for predicting unknown chromatin profiles: we demonstrated that with this approach we can leverage data from well-characterized cell types to help understand less-studied cell types or conditions. PMID:24675896
Browne, Fiona; Wang, Haiying; Zheng, Huiru; Azuaje, Francisco
2010-03-01
This study applied a knowledge-driven data integration framework for the inference of protein-protein interactions (PPI). Evidence from diverse genomic features is integrated using a knowledge-driven Bayesian network (KD-BN). Receiver operating characteristic (ROC) curves may not be the optimal way to evaluate a classifier's performance in PPI prediction, as the majority of the area under the curve (AUC) may not represent biologically meaningful results. It can be more informative to interpret the AUC of a partial ROC curve restricted to the region in which biologically interesting results are represented. Therefore, the novel application of the assessment method referred to as the partial ROC has been employed in this study to assess the predictive performance of PPI predictions, along with the true positive/false positive rate and true positive/positive rate. By incorporating domain knowledge into the construction of the KD-BN, we demonstrate improved predictive performance compared with previous studies based upon the naive Bayesian approach. Copyright (c) 2010 Elsevier Ltd. All rights reserved.
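A small sketch of the partial-ROC idea with hypothetical scores and labels; `max_fpr` marks the low false-positive region in which PPI predictions are biologically credible.

```python
import numpy as np
from sklearn.metrics import roc_curve

def partial_auc(y_true, scores, max_fpr=0.1):
    """Area under the ROC curve restricted to FPR <= max_fpr."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    keep = fpr <= max_fpr
    # Close the region with an interpolated point at exactly max_fpr.
    fpr_cut = np.append(fpr[keep], max_fpr)
    tpr_cut = np.append(tpr[keep], np.interp(max_fpr, fpr, tpr))
    widths = np.diff(fpr_cut)
    return float(np.sum(widths * (tpr_cut[:-1] + tpr_cut[1:]) / 2))

# Hypothetical predicted interaction scores vs. gold-standard labels.
rng = np.random.default_rng(3)
labels = rng.integers(0, 2, 500)
scores = labels * 0.5 + rng.random(500)
print(partial_auc(labels, scores, max_fpr=0.1))
```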
Eastern Indian Ocean microcontinent formation driven by plate motion changes
NASA Astrophysics Data System (ADS)
Whittaker, J. M.; Williams, S. E.; Halpin, J. A.; Wild, T. J.; Stilwell, J. D.; Jourdan, F.; Daczko, N. R.
2016-11-01
The roles of plate tectonic or mantle dynamic forces in rupturing continental lithosphere remain controversial. Particularly enigmatic is the rifting of microcontinents from mature continental rifted margins, with plume-driven thermal weakening commonly inferred to facilitate calving. However, a role for plate tectonic reorganisations has also been suggested. Here, we show that a combination of plate tectonic reorganisation and plume-driven thermal weakening was required to calve the Batavia and Gulden Draak microcontinents in the Cretaceous Indian Ocean. We reconstruct the evolution of these two microcontinents using constraints from new paleontological samples, 40Ar/39Ar ages, and geophysical data. Calving from India occurred at 101-104 Ma, coinciding with the onset of a dramatic change in Indian plate motion. Critically, Kerguelen plume volcanism does not appear to have directly triggered calving. Rather, it is likely that plume-related thermal weakening of the Indian passive margin preconditioned it for microcontinent formation, but calving was triggered by changes in plate tectonic boundary forces.
NASA Astrophysics Data System (ADS)
Yi, J.; Choi, C.
2014-12-01
Rainfall observation and forecasting using remote sensing such as RADAR (Radio Detection and Ranging) and satellite images are widely used to cope with the increasing damage caused by rapid weather changes like regional storms and flash floods. Flood runoff was calculated using an adaptive neuro-fuzzy inference system, a data-driven model, with MAPLE (McGill Algorithm for Precipitation Nowcasting by Lagrangian Extrapolation) forecasted precipitation data as the input variables. The flood estimates produced by the neuro-fuzzy technique with RADAR-forecasted precipitation data were evaluated by comparison with observations. The adaptive neuro-fuzzy method was applied to the Chungju Reservoir basin in Korea, using six rainfall events from the flood seasons of 2010 and 2011 as input data. The reservoir inflow estimates were compared according to the rainfall data used for training, checking, and testing in the model setup process. The results of 15 models with different combinations of the input variables were compared and analyzed. Using a relatively larger clustering radius and the largest flood on record as training data gave better flood estimates in this study. The model using the MAPLE forecasted precipitation data showed better inflow estimation for the Chungju Reservoir.
GPS Imaging of Global Vertical Land Motion for Sea Level Studies
NASA Astrophysics Data System (ADS)
Hammond, W. C.; Blewitt, G.; Hamlington, B. D.
2015-12-01
Coastal vertical land motion contributes to the signal of local relative sea level change. Moreover, understanding global sea level change requires understanding local sea level rise at many locations around Earth. It is therefore essential to understand the regional secular vertical land motion attributable to mantle flow, tectonic deformation, glacial isostatic adjustment, postseismic viscoelastic relaxation, groundwater basin subsidence, elastic rebound from groundwater unloading, or other processes that can change the geocentric height of tide gauges anchored to the land. These changes can affect inferences of global sea level rise and should be taken into account for global projections. We present new results of GPS imaging of vertical land motion across most of Earth's continents, including its ice-free coastlines around North and South America, Europe, Australia, Japan, and parts of Africa and Indonesia. These images are based on data from many independent, open-access, globally distributed, continuously recording GPS networks comprising over 13,500 stations. The data are processed in our system to obtain solutions aligned to the International Terrestrial Reference Frame (ITRF08). To generate images of vertical rate we apply the Median Interannual Difference Adjusted for Skewness (MIDAS) algorithm to the vertical time series to obtain robust non-parametric estimates with realistic uncertainties. We estimate the vertical land motion at 1420 tide gauge locations using Delaunay-based geographic interpolation with an empirically derived distance weighting function and median spatial filtering. The resulting image is insensitive to outliers and steps in the GPS time series, omits short-wavelength features attributable to unstable stations or unrepresentative rates, and emphasizes long-wavelength mantle-driven vertical rates.
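The core of MIDAS can be sketched as the median of slopes over coordinate pairs separated by roughly one year, which suppresses outliers, steps, and seasonal cycles; this is a simplified sketch (the published algorithm additionally adjusts for skewness and trims the slope distribution).

```python
import numpy as np

def midas_rate(t, h, pair_dt=1.0, tol=0.01):
    """Robust vertical rate from times t (years) and heights h (mm):
    median slope over pairs separated by ~pair_dt years."""
    slopes = []
    for i in range(len(t)):
        j = np.searchsorted(t, t[i] + pair_dt)   # sample ~one year later
        if j < len(t) and abs((t[j] - t[i]) - pair_dt) < tol:
            slopes.append((h[j] - h[i]) / (t[j] - t[i]))
    slopes = np.asarray(slopes)
    rate = np.median(slopes)
    # MAD-based scatter scaled to approximate a standard error.
    sigma = 1.4826 * np.median(np.abs(slopes - rate)) / np.sqrt(len(slopes))
    return rate, sigma

# Hypothetical daily vertical time series: 2 mm/yr trend, noise, and an
# equipment step that would bias an ordinary least-squares fit.
t = np.arange(0, 8, 1 / 365.25)
h = 2.0 * t + np.random.default_rng(4).normal(0, 3, t.size)
h[t > 4] += 20.0
print(midas_rate(t, h))   # ~ (2.0, small sigma) despite the step
```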
Design and experiment of data-driven modeling and flutter control of a prototype wing
NASA Astrophysics Data System (ADS)
Lum, Kai-Yew; Xu, Cai-Lin; Lu, Zhenbo; Lai, Kwok-Leung; Cui, Yongdong
2017-06-01
This paper presents an approach for data-driven modeling of aeroelasticity and its application to flutter control design of a wind-tunnel wing model. Modeling is centered on system identification of unsteady aerodynamic loads using computational fluid dynamics data, and adopts a nonlinear multivariable extension of the Hammerstein-Wiener system. The formulation is in modal coordinates of the elastic structure, and yields a reduced-order model of the aeroelastic feedback loop that is parametrized by airspeed. Flutter suppression is thus cast as a robust stabilization problem over uncertain airspeed, for which a low-order H∞ controller is computed. The paper discusses in detail parameter sensitivity and observability of the model, the former to justify the chosen model structure, and the latter to provide a criterion for physical sensor placement. Wind tunnel experiments confirm the validity of the modeling approach and the effectiveness of the control design.
Genomic clocks and evolutionary timescales
NASA Technical Reports Server (NTRS)
Blair Hedges, S.; Kumar, Sudhir
2003-01-01
For decades, molecular clocks have helped to illuminate the evolutionary timescale of life, but now genomic data pose a challenge for time estimation methods. It is unclear how to integrate data from many genes, each potentially evolving under a different model of substitution and at a different rate. Current methods can be grouped by the way the data are handled (genes considered separately or combined into a 'supergene') and the way gene-specific rate models are applied (global versus local clock). There are advantages and disadvantages to each of these approaches, and the optimal method has not yet emerged. Fortunately, time estimates inferred using many genes or proteins have greater precision and appear to be robust to different approaches.
Maneshi, Mona; Vahdat, Shahabeddin; Gotman, Jean; Grova, Christophe
2016-01-01
Independent component analysis (ICA) has been widely used to study functional magnetic resonance imaging (fMRI) connectivity. However, the application of ICA in multi-group designs is not straightforward. We have recently developed a new method named “shared and specific independent component analysis” (SSICA) to perform between-group comparisons in the ICA framework. SSICA is sensitive to extract those components which represent a significant difference in functional connectivity between groups or conditions, i.e., components that could be considered “specific” for a group or condition. Here, we investigated the performance of SSICA on realistic simulations and task fMRI data, and compared the results with one of the state-of-the-art group ICA approaches to infer between-group differences. We examined SSICA robustness with respect to the number of allowable extracted specific components and between-group orthogonality assumptions. Furthermore, we proposed a modified formulation of the back-reconstruction method to generate group-level t-statistics maps based on SSICA results. We also evaluated the consistency and specificity of the extracted specific components by SSICA. The results on realistic simulated and real fMRI data showed that SSICA outperforms the regular group ICA approach in terms of reconstruction and classification performance. We demonstrated that SSICA is a powerful data-driven approach to detect patterns of differences in functional connectivity across groups/conditions, particularly in model-free designs such as resting-state fMRI. Our findings in task fMRI show that SSICA confirms results of the general linear model (GLM) analysis and, when combined with clustering analysis, complements GLM findings by providing additional information regarding the reliability and specificity of networks. PMID:27729843
AnaBench: a Web/CORBA-based workbench for biomolecular sequence analysis
Badidi, Elarbi; De Sousa, Cristina; Lang, B Franz; Burger, Gertraud
2003-01-01
Background Sequence data analyses such as gene identification, structure modeling or phylogenetic tree inference involve a variety of bioinformatics software tools. Due to the heterogeneity of bioinformatics tools in usage and data requirements, scientists spend much effort on technical issues including data format, storage and management of input and output, and memorization of numerous parameters and multi-step analysis procedures. Results In this paper, we present the design and implementation of AnaBench, an interactive, Web-based bioinformatics Analysis workBench allowing streamlined data analysis. Our philosophy was to minimize the technical effort not only for the scientist who uses this environment to analyze data, but also for the administrator who manages and maintains the workbench. With new bioinformatics tools published daily, AnaBench permits easy incorporation of additional tools. This flexibility is achieved by employing a three-tier distributed architecture and recent technologies including CORBA middleware, Java, JDBC, and JSP. A CORBA server permits transparent access to a workbench management database, which stores information about the users, their data, as well as the description of all bioinformatics applications that can be launched from the workbench. Conclusion AnaBench is an efficient and intuitive interactive bioinformatics environment, which offers scientists application-driven, data-driven and protocol-driven analysis approaches. The prototype of AnaBench, managed by a team at the Université de Montréal, is accessible on-line at: . Please contact the authors for details about setting up a local-network AnaBench site elsewhere. PMID:14678565
Semi-Supervised Multi-View Learning for Gene Network Reconstruction
Ceci, Michelangelo; Pio, Gianvito; Kuzmanovski, Vladimir; Džeroski, Sašo
2015-01-01
The task of gene regulatory network reconstruction from high-throughput data has received increasing attention in recent years. As a consequence, many inference methods for solving this task have been proposed in the literature. It has been recently observed, however, that no single inference method performs optimally across all datasets. It has also been shown that the integration of predictions from multiple inference methods is more robust and shows high performance across diverse datasets. Inspired by this research, in this paper, we propose a machine learning solution which learns to combine predictions from multiple inference methods. While this approach adds complexity to the inference process, we expect it would also carry substantial benefits. These would come from the automatic adaptation to patterns in the outputs of individual inference methods, so that regulatory interactions can be identified more reliably when these patterns occur. This article demonstrates the benefits (in terms of accuracy of the reconstructed networks) of the proposed method, which exploits an iterative, semi-supervised ensemble-based algorithm. The algorithm learns to combine the interactions predicted by many different inference methods in the multi-view learning setting. The empirical evaluation of the proposed algorithm on a prokaryotic model organism (E. coli) and a eukaryotic model organism (S. cerevisiae) clearly shows improved performance over state-of-the-art methods. The results indicate that gene regulatory network reconstruction for the real datasets is more difficult for S. cerevisiae than for E. coli. The software, all the datasets used in the experiments and all the results are available for download at the following link: http://figshare.com/articles/Semi_supervised_Multi_View_Learning_for_Gene_Network_Reconstruction/1604827. PMID:26641091
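For contrast with the learned ensemble, the simplest unsupervised combination rank-averages the edge scores of the individual methods; the score matrices below are hypothetical, and the paper's semi-supervised multi-view learner instead learns the combination from known interactions.

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical edge-score matrices (genes x genes) from three different
# network inference methods applied to the same expression data.
rng = np.random.default_rng(5)
preds = [rng.random((50, 50)) for _ in range(3)]

# Rank-average: convert each method's scores to ranks, then average, so
# methods with different score scales contribute equally.
ranks = [rankdata(p.ravel()).reshape(p.shape) for p in preds]
consensus = np.mean(ranks, axis=0)

# Report the ten top-scoring candidate regulatory interactions.
flat = np.argsort(consensus, axis=None)[::-1][:10]
top_edges = [divmod(int(i), consensus.shape[1]) for i in flat]
print(top_edges)
```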
Bhaskar, Anand; Javanmard, Adel; Courtade, Thomas A; Tse, David
2017-03-15
Genetic variation in human populations is influenced by geographic ancestry due to spatial locality in historical mating and migration patterns. Spatial population structure in genetic datasets has been traditionally analyzed using either model-free algorithms, such as principal components analysis (PCA) and multidimensional scaling, or using explicit spatial probabilistic models of allele frequency evolution. We develop a general probabilistic model and an associated inference algorithm that unify the model-based and data-driven approaches to visualizing and inferring population structure. Our spatial inference algorithm can also be effectively applied to the problem of population stratification in genome-wide association studies (GWAS), where hidden population structure can create fictitious associations when population ancestry is correlated with both the genotype and the trait. Our algorithm Geographic Ancestry Positioning (GAP) relates local genetic distances between samples to their spatial distances, and can be used for visually discerning population structure as well as accurately inferring the spatial origin of individuals on a two-dimensional continuum. On both simulated and several real datasets from diverse human populations, GAP exhibits substantially lower error in reconstructing spatial ancestry coordinates compared to PCA. We also develop an association test that uses the ancestry coordinates inferred by GAP to accurately account for ancestry-induced correlations in GWAS. Based on simulations and analysis of a dataset of 10 metabolic traits measured in a Northern Finland cohort, which is known to exhibit significant population structure, we find that our method has superior power to current approaches. Our software is available at https://github.com/anand-bhaskar/gap . abhaskar@stanford.edu or ajavanma@usc.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Data for Renewable Energy Planning, Policy, and Investment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cox, Sarah L
Reliable, robust, and validated data are critical for informed planning, policy development, and investment in the clean energy sector. The Renewable Energy (RE) Explorer was developed to support data-driven renewable energy analysis that can inform key renewable energy decisions globally. This document presents the types of geospatial and other data at the core of renewable energy analysis and decision making. Individual data sets used to inform decisions vary in spatial and temporal resolution, quality, and overall usefulness. From Data to Decisions, a complementary geospatial data and analysis decision guide, provides an in-depth view of these and other considerations to enable data-driven planning, policymaking, and investment. Data support a wide variety of renewable energy analyses and decisions, including technical and economic potential assessment, renewable energy zone analysis, grid integration, risk and resiliency identification, electrification, and distributed solar photovoltaic potential. This fact sheet provides information on the types of data that are important for renewable energy decision making using the RE Data Explorer or similar geospatial analysis tools.
Hiratani, Naoki; Fukai, Tomoki
2016-01-01
In the adult mammalian cortex, a small fraction of spines are created and eliminated every day, and the resultant synaptic connection structure is highly nonrandom, even in local circuits. However, it remains unknown whether a particular synaptic connection structure is functionally advantageous in local circuits, and why creation and elimination of synaptic connections is necessary in addition to rich synaptic weight plasticity. To answer these questions, we studied an inference task model through theoretical and numerical analyses. We demonstrate that a robustly beneficial network structure naturally emerges by combining Hebbian-type synaptic weight plasticity and wiring plasticity. Especially in a sparsely connected network, wiring plasticity achieves reliable computation by enabling efficient information transmission. Furthermore, the proposed rule reproduces experimentally observed correlations between spine dynamics and task performance. PMID:27303271
NASA Astrophysics Data System (ADS)
González, D. L., II; Angus, M. P.; Tetteh, I. K.; Bello, G. A.; Padmanabhan, K.; Pendse, S. V.; Srinivas, S.; Yu, J.; Semazzi, F.; Kumar, V.; Samatova, N. F.
2015-01-01
Decades of hypothesis-driven and/or first-principles research have been applied towards the discovery and explanation of the mechanisms that drive climate phenomena, such as western African Sahel summer rainfall variability. Although connections between various climate factors have been theorized, not all of the key relationships are fully understood. We propose a data-driven approach to identify candidate players in this climate system, which can help explain underlying mechanisms and even suggest new relationships, to facilitate building a more comprehensive and predictive model of the modulatory relationships influencing a climate phenomenon of interest. We applied coupled heterogeneous association rule mining (CHARM), Lasso multivariate regression, and dynamic Bayesian networks to find relationships within a complex system, and explored means with which to obtain a consensus result from the application of such varied methodologies. Using this fusion of approaches, we identified relationships among climate factors that modulate Sahel rainfall. These relationships fall into two categories: well-known associations from prior climate knowledge, such as the relationship with the El Niño-Southern Oscillation (ENSO), and putative links, such as the North Atlantic Oscillation, that invite further research.
Quantile regression in the presence of monotone missingness with sensitivity analysis
Liu, Minzhao; Daniels, Michael J.; Perri, Michael G.
2016-01-01
In this paper, we develop methods for longitudinal quantile regression when there is monotone missingness. In particular, we propose pattern mixture models with a constraint that provides a straightforward interpretation of the marginal quantile regression parameters. Our approach allows sensitivity analysis which is an essential component in inference for incomplete data. To facilitate computation of the likelihood, we propose a novel way to obtain analytic forms for the required integrals. We conduct simulations to examine the robustness of our approach to modeling assumptions and compare its performance to competing approaches. The model is applied to data from a recent clinical trial on weight management. PMID:26041008
Transcriptome sequences resolve deep relationships of the grape family.
Wen, Jun; Xiong, Zhiqiang; Nie, Ze-Long; Mao, Likai; Zhu, Yabing; Kan, Xian-Zhao; Ickert-Bond, Stefanie M; Gerrath, Jean; Zimmer, Elizabeth A; Fang, Xiao-Dong
2013-01-01
Previous phylogenetic studies of the grape family (Vitaceae) yielded poorly resolved deep relationships, thus impeding our understanding of the evolution of the family. Next-generation sequencing now offers access to protein coding sequences very easily, quickly and cost-effectively. To improve upon earlier work, we extracted 417 orthologous single-copy nuclear genes from the transcriptomes of 15 species of the Vitaceae, covering its phylogenetic diversity. The resulting transcriptome phylogeny provides robust support for the deep relationships, showing the phylogenetic utility of transcriptome data for plants over a time scale at least since the mid-Cretaceous. The pros and cons of transcriptome data for phylogenetic inference in plants are also evaluated.
ERIC Educational Resources Information Center
King, Gary; Gakidou, Emmanuela; Ravishankar, Nirmala; Moore, Ryan T.; Lakin, Jason; Vargas, Manett; Tellez-Rojo, Martha Maria; Avila, Juan Eugenio Hernandez; Avila, Mauricio Hernandez; Llamas, Hector Hernandez
2007-01-01
We develop an approach to conducting large-scale randomized public policy experiments intended to be more robust to the political interventions that have ruined some or all parts of many similar previous efforts. Our proposed design is insulated from selection bias in some circumstances even if we lose observations; our inferences can still be…
Estimation of flow properties using surface deformation and head data: A trajectory-based approach
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vasco, D.W.
2004-07-12
A trajectory-based algorithm provides an efficient and robust means to infer flow properties from surface deformation and head data. The algorithm is based upon the concept of an ''arrival time'' of a drawdown front, which is defined as the time corresponding to the maximum slope of the drawdown curve. The technique involves three steps: the inference of head changes as a function of position and time, the use of the estimated head changes to define arrival times, and the inversion of the arrival times for flow properties. Trajectories, computed from the output of a numerical simulator, are used to relate the drawdown arrival times to flow properties. The inversion algorithm is iterative, requiring one reservoir simulation for each iteration. The method is applied to data from a set of 14 tiltmeters located at the Raymond Quarry field site in California. Using the technique, I am able to image a high-conductivity channel which extends to the south of the pumping well. The presence of this permeable pathway is supported by an analysis of earlier cross-well transient pressure test data.
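On a sampled drawdown curve the arrival-time definition reduces to a few lines; the curve below is synthetic (in practice the head changes are first inferred from the tilt data).

```python
import numpy as np

def arrival_time(t, drawdown):
    """Arrival time of the drawdown front at an observation point: the
    time of maximum slope of the drawdown curve."""
    slope = np.gradient(drawdown, t)
    return t[np.argmax(slope)]

# Hypothetical drawdown curve: a smoothed step whose inflection marks the
# front passing the observation point (here near t = 5).
t = np.linspace(0.0, 20.0, 400)
drawdown = 1.0 / (1.0 + np.exp(-(t - 5.0)))
print(arrival_time(t, drawdown))   # ~5.0
```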
Tamura, Koichiro; Peterson, Daniel; Peterson, Nicholas; Stecher, Glen; Nei, Masatoshi; Kumar, Sudhir
2011-01-01
Comparative analysis of molecular sequence data is essential for reconstructing the evolutionary histories of species and inferring the nature and extent of selective forces shaping the evolution of genes and species. Here, we announce the release of Molecular Evolutionary Genetics Analysis version 5 (MEGA5), user-friendly software for mining online databases, building sequence alignments and phylogenetic trees, and using methods of evolutionary bioinformatics in basic biology, biomedicine, and evolution. The newest addition in MEGA5 is a collection of maximum likelihood (ML) analyses for inferring evolutionary trees, selecting best-fit substitution models (nucleotide or amino acid), inferring ancestral states and sequences (along with probabilities), and estimating evolutionary rates site-by-site. In computer simulation analyses, ML tree inference algorithms in MEGA5 compared favorably with other software packages in terms of computational efficiency and the accuracy of the estimates of phylogenetic trees, substitution parameters, and rate variation among sites. The MEGA user interface has now been enhanced to be activity-driven, making it easier to use for both beginners and experienced scientists. This version of MEGA is intended for the Windows platform, and it has been configured for effective use on Mac OS X and Linux desktops. It is available free of charge from http://www.megasoftware.net. PMID:21546353
Wimberley, Catriona J; Fischer, Kristina; Reilhac, Anthonin; Pichler, Bernd J; Gregoire, Marie Claude
2014-10-01
The partial saturation approach (PSA) is a simple, single-injection experimental protocol that estimates both B(avail) and appK(D) without the use of blood sampling. This makes it ideal for use in longitudinal studies of neurodegenerative diseases in the rodent. The aim of this study was to increase the range and applicability of the PSA by developing a data-driven strategy for determining reliable regional estimates of receptor density (B(avail)) and in vivo affinity (1/appK(D)), and to validate the strategy using a simulation model. The data-driven method uses a time window guided by the dynamic equilibrium state of the system, as opposed to a static time window. To test the method, simulations of partial saturation experiments were generated and validated against experimental data. The experimental conditions simulated included a range of receptor occupancy levels and three different B(avail) and appK(D) values to mimic disease states. The effect of using a reference region and typical PET noise on the stability and accuracy of the estimates was also investigated. The investigations showed that the parameter estimates in a simulated healthy mouse, using the data-driven method, were within 10-30% of the simulated input for the range of occupancy levels simulated. Throughout all experimental conditions simulated, the accuracy and robustness of the estimates using the data-driven method were much improved upon the typical method of using a static time window, especially at low receptor occupancy levels. Introducing a reference region caused a bias of approximately 10% over the range of occupancy levels. Based on extensive simulated experimental conditions, it was shown that the data-driven method provides accurate and precise estimates of B(avail) and appK(D) for a broader range of conditions compared to the original method. Copyright © 2014 Elsevier Inc. All rights reserved.
Fast and accurate inference of local ancestry in Latino populations
Baran, Yael; Pasaniuc, Bogdan; Sankararaman, Sriram; Torgerson, Dara G.; Gignoux, Christopher; Eng, Celeste; Rodriguez-Cintron, William; Chapela, Rocio; Ford, Jean G.; Avila, Pedro C.; Rodriguez-Santana, Jose; Burchard, Esteban Gonzàlez; Halperin, Eran
2012-01-01
Motivation: It is becoming increasingly evident that the analysis of genotype data from recently admixed populations is providing important insights into medical genetics and population history. Such analyses have been used to identify novel disease loci, to understand recombination rate variation and to detect recent selection events. The utility of such studies crucially depends on accurate and unbiased estimation of the ancestry at every genomic locus in recently admixed populations. Although various methods have been proposed and shown to be extremely accurate in two-way admixtures (e.g. African Americans), only a few approaches have been proposed and thoroughly benchmarked on multi-way admixtures (e.g. Latino populations of the Americas). Results: To address these challenges we introduce here methods for local ancestry inference which leverage the structure of linkage disequilibrium in the ancestral population (LAMP-LD), and incorporate the constraint of Mendelian segregation when inferring local ancestry in nuclear family trios (LAMP-HAP). Our algorithms uniquely combine hidden Markov models (HMMs) of haplotype diversity within a novel window-based framework to achieve superior accuracy as compared with published methods. Further, unlike previous methods, the structure of our HMM does not depend on the number of reference haplotypes but on a fixed constant, and it is thereby capable of utilizing large datasets while remaining highly efficient and robust to over-fitting. Through simulations and analysis of real data from 489 nuclear trio families from the mainland US, Puerto Rico and Mexico, we demonstrate that our methods achieve superior accuracy compared with published methods for local ancestry inference in Latinos. Availability: http://lamp.icsi.berkeley.edu/lamp/lampld/ Contact: bpasaniu@hsph.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22495753
Inferring sex-specific demographic history from SNP data
Gautier, Mathieu
2018-01-01
The relative female and male contributions to demography are of great importance to better understand the history and dynamics of populations. While earlier studies relied on uniparental markers to investigate sex-specific questions, the increasing amount of sequence data now enables us to take advantage of tens to hundreds of thousands of independent loci from autosomes and the X chromosome. Here, we develop a novel method to estimate effective sex ratios or ESR (defined as the female proportion of the effective population) from allele count data for each branch of a rooted tree topology that summarizes the history of the populations of interest. Our method relies on Kimura’s time-dependent diffusion approximation for genetic drift, and is based on a hierarchical Bayesian model to integrate over the allele frequencies along the branches. We show via simulations that parameters are inferred robustly, even under scenarios that violate some of the model assumptions. Analyzing bovine SNP data, we infer a strongly female-biased ESR in both dairy and beef cattle, as expected from the underlying breeding scheme. Conversely, we observe a strongly male-biased ESR in early domestication times, consistent with an easier taming and management of cows, and/or introgression from wild auroch males, that would both cause a relative increase in male effective population size. In humans, analyzing a subsample of non-African populations, we find a male-biased ESR in Oceanians that may reflect complex marriage patterns in Aboriginal Australians. Because our approach relies on allele count data, it may be applied on a wide range of species. PMID:29385127
2dFLenS and KiDS: determining source redshift distributions with cross-correlations
NASA Astrophysics Data System (ADS)
Johnson, Andrew; Blake, Chris; Amon, Alexandra; Erben, Thomas; Glazebrook, Karl; Harnois-Deraps, Joachim; Heymans, Catherine; Hildebrandt, Hendrik; Joudaki, Shahab; Klaes, Dominik; Kuijken, Konrad; Lidman, Chris; Marin, Felipe A.; McFarland, John; Morrison, Christopher B.; Parkinson, David; Poole, Gregory B.; Radovich, Mario; Wolf, Christian
2017-03-01
We develop a statistical estimator to infer the redshift probability distribution of a photometric sample of galaxies from its angular cross-correlation in redshift bins with an overlapping spectroscopic sample. This estimator is a minimum-variance weighted quadratic function of the data: a quadratic estimator. This extends and modifies the methodology presented by McQuinn & White. The derived source redshift distribution is degenerate with the source galaxy bias, which must be constrained via additional assumptions. We apply this estimator to constrain source galaxy redshift distributions in the Kilo-Degree imaging survey through cross-correlation with the spectroscopic 2-degree Field Lensing Survey, presenting results first as a binned step-wise distribution in the range z < 0.8, and then building a continuous distribution using a Gaussian process model. We demonstrate the robustness of our methodology using mock catalogues constructed from N-body simulations, and comparisons with other techniques for inferring the redshift distribution.
ERIC Educational Resources Information Center
Chen, Yu-Hua; Baker, Paul
2016-01-01
In this study, we investigated criterial discourse features in L2 writing through the use of recurrent word combinations, a.k.a. lexical bundles, taking a corpus-driven and expert-judged approach by examining L2 English data across various proficiency levels from L1 Chinese learners. Proficiency was determined by a robust rating procedure which is…
Ma, Peng-Fei; Zhang, Yu-Xiao; Zeng, Chun-Xia; Guo, Zhen-Hua; Li, De-Zhu
2014-11-01
The temperate woody bamboos constitute a distinct tribe Arundinarieae (Poaceae: Bambusoideae) with high species diversity. Estimating phylogenetic relationships among the 11 major lineages of Arundinarieae has been particularly difficult, owing to a possible rapid radiation and the extremely low rate of sequence divergence. Here, we explore the use of chloroplast genome sequencing for phylogenetic inference. We sampled 25 species (22 temperate bamboos and 3 outgroups) for the complete genome, representing eight major lineages of Arundinarieae, in an attempt to resolve backbone relationships. Phylogenetic analyses of coding versus noncoding sequences, and of different regions of the genome (large single-copy, small single-copy, and inverted repeat regions), yielded no well-supported contradicting topologies, but potential incongruence was found between the coding and noncoding sequences. The use of various data partitioning schemes in analyses of the complete sequences resulted in nearly identical topologies and node support values, although the partitioning schemes differed decisively in how well they fit the data. Our full genomic data set substantially increased resolution along the backbone and provided strong support for most relationships despite the very short internodes and long branches in the tree. The inferred relationships were also robust to potential confounding factors (e.g., long-branch attraction) and received support from independent indels in the genome. We then added taxa from the three Arundinarieae lineages that were not included in the full-genome data set, each of which was sampled for more than 50% of the genome sequence. The resulting trees not only corroborated the reconstructed deep-level relationships but also largely resolved the phylogenetic placements of these three additional lineages. Furthermore, adding 129 additional taxa sampled for only eight chloroplast loci to the combined data set yielded almost identical relationships, albeit with low support values. We believe that the inferred phylogeny is robust to taxon sampling. Having resolved the deep-level relationships of Arundinarieae, we illustrate how chloroplast phylogenomics can be used to elucidate difficult phylogenies at low taxonomic levels in intractable plant groups. © The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Helioseismic and neutrino data-driven reconstruction of solar properties
NASA Astrophysics Data System (ADS)
Song, Ningqiang; Gonzalez-Garcia, M. C.; Villante, Francesco L.; Vinyoles, Nuria; Serenelli, Aldo
2018-06-01
In this work, we use Bayesian inference to quantitatively reconstruct the solar properties most relevant to the solar composition problem using as inputs the information provided by helioseismic and solar neutrino data. In particular, we use a Gaussian process to model the functional shape of the opacity uncertainty to gain flexibility and become as free as possible from prejudice in this regard. With these tools we first readdress the statistical significance of the solar composition problem. Furthermore, starting from a composition unbiased set of standard solar models (SSMs) we are able to statistically select those with solar chemical composition and other solar inputs which better describe the helioseismic and neutrino observations. In particular, we are able to reconstruct the solar opacity profile in a data-driven fashion, independently of any reference opacity tables, obtaining a 4 per cent uncertainty at the base of the convective envelope and 0.8 per cent at the solar core. When systematic uncertainties are included, results are 7.5 per cent and 2 per cent, respectively. In addition, we find that the values of most of the other inputs of the SSMs required to better describe the helioseismic and neutrino data are in good agreement with those adopted as the standard priors, with the exception of the astrophysical factor S11 and the microscopic diffusion rates, for which data suggests a 1 per cent and 30 per cent reduction, respectively. As an output of the study we derive the corresponding data-driven predictions for the solar neutrino fluxes.
Bellenguez, Céline; Strange, Amy; Freeman, Colin; Donnelly, Peter; Spencer, Chris C A
2012-01-01
High-throughput genotyping arrays provide an efficient way to survey single nucleotide polymorphisms (SNPs) across the genome in large numbers of individuals. Downstream analysis of the data, for example in genome-wide association studies (GWAS), often involves statistical models of genotype frequencies across individuals. The complexities of the sample collection process and the potential for errors in the experimental assay can lead to biases and artefacts in an individual's inferred genotypes. Rather than attempting to model these complications, it has become standard practice to remove individuals whose genome-wide data differ from the sample at large. Here we describe a simple, but robust, statistical algorithm to identify samples with atypical summaries of genome-wide variation. Its use as a semi-automated quality control tool is demonstrated using several summary statistics, selected to identify different potential problems, and it is applied to two different genotyping platforms and sample collections. The algorithm is written in R and is freely available at www.well.ox.ac.uk/chris-spencer. Contact: chris.spencer@well.ox.ac.uk. Supplementary data are available at Bioinformatics online.
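The principle, flagging individuals whose genome-wide summary statistics are atypical for the cohort, can be sketched with a robust median/MAD rule; this is an illustration of the idea, not the authors' R implementation.

```python
import numpy as np

def flag_outliers(stat, n_mad=6.0):
    """Flag samples whose summary statistic (e.g., heterozygosity or call
    rate) deviates from the cohort median by > n_mad robust SDs."""
    med = np.median(stat)
    mad = 1.4826 * np.median(np.abs(stat - med))  # robust sigma estimate
    return np.abs(stat - med) > n_mad * mad

# Hypothetical per-sample heterozygosity with three aberrant samples
# (contamination or sample swaps).
rng = np.random.default_rng(6)
het = rng.normal(0.32, 0.005, 2000)
het[:3] = [0.25, 0.40, 0.45]
print(np.where(flag_outliers(het))[0])   # -> [0 1 2]
```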
Lasko, Thomas A; Denny, Joshua C; Levy, Mia A
2013-01-01
Inferring precise phenotypic patterns from population-scale clinical data is a core computational task in the development of precision, personalized medicine. The traditional approach uses supervised learning, in which an expert designates which patterns to look for (by specifying the learning task and the class labels), and where to look for them (by specifying the input variables). While appropriate for individual tasks, this approach scales poorly and misses the patterns that we don't think to look for. Unsupervised feature learning overcomes these limitations by identifying patterns (or features) that collectively form a compact and expressive representation of the source data, with no need for expert input or labeled examples. Its rising popularity is driven by new deep learning methods, which have produced high-profile successes on difficult standardized problems of object recognition in images. Here we introduce its use for phenotype discovery in clinical data. This use is challenging because the largest source of clinical data - Electronic Medical Records - typically contains noisy, sparse, and irregularly timed observations, rendering them poor substrates for deep learning methods. Our approach couples dirty clinical data to deep learning architecture via longitudinal probability densities inferred using Gaussian process regression. From episodic, longitudinal sequences of serum uric acid measurements in 4368 individuals we produced continuous phenotypic features that suggest multiple population subtypes, and that accurately distinguished (0.97 AUC) the uric-acid signatures of gout vs. acute leukemia despite not being optimized for the task. The unsupervised features were as accurate as gold-standard features engineered by an expert with complete knowledge of the domain, the classification task, and the class labels. Our findings demonstrate the potential for achieving computational phenotype discovery at population scale. We expect such data-driven phenotypes to expose unknown disease variants and subtypes and to provide rich targets for genetic association studies.
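The coupling step can be sketched with off-the-shelf GP regression, assuming an RBF-plus-noise kernel is adequate; the observation times and values below are hypothetical.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical episodic serum-uric-acid record for one patient:
# irregular measurement times (years) and values (mg/dL).
t_obs = np.array([[0.1], [0.4], [1.3], [2.0], [4.7], [5.1]])
ua = np.array([6.1, 6.8, 7.4, 7.1, 8.9, 8.4])

# GP regression turns the sparse, irregular record into a continuous
# longitudinal density that a deep architecture can consume on a grid.
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.2)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(t_obs, ua)

grid = np.linspace(0.0, 6.0, 120).reshape(-1, 1)
mean, std = gp.predict(grid, return_std=True)  # regular-grid inputs
```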
Data-driven process decomposition and robust online distributed modelling for large-scale processes
NASA Astrophysics Data System (ADS)
Shu, Zhang; Lijuan, Li; Lijuan, Yao; Shipin, Yang; Tao, Zou
2018-02-01
With increasing attention to networked control, system decomposition and distributed models are of significant importance in the implementation of model-based control strategies. In this paper, a data-driven system decomposition and online distributed subsystem modelling algorithm is proposed for large-scale chemical processes. The key controlled variables are first partitioned by the affinity propagation clustering algorithm into several clusters, each of which can be regarded as a subsystem. The inputs of each subsystem are then selected by offline canonical correlation analysis between all process variables and the subsystem's controlled variables. Process decomposition is thus realised after the screening of input and output variables. Once the system decomposition is finished, online subsystem modelling can be carried out by recursively renewing the samples block-wise. The proposed algorithm was applied to the Tennessee Eastman process and its validity was verified.
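A minimal sketch of the decomposition step on simulated data: affinity propagation groups the controlled variables into subsystems using a precomputed similarity such as correlation (in the paper, canonical correlation analysis then selects each subsystem's inputs).

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Hypothetical operating data: 12 controlled variables driven by three
# latent process modes, so variables 0-3, 4-7, and 8-11 co-vary.
rng = np.random.default_rng(7)
latent = rng.normal(size=(1000, 3))
mix = np.kron(np.eye(3), np.ones((1, 4)))            # (3, 12) loading map
cv_data = latent @ mix + 0.3 * rng.normal(size=(1000, 12))

# Cluster variables by the similarity of their trajectories; each
# cluster is treated as one subsystem.
similarity = np.corrcoef(cv_data.T)
ap = AffinityPropagation(affinity="precomputed", random_state=0)
labels = ap.fit_predict(similarity)

for k in np.unique(labels):
    print("subsystem", k, "->", np.where(labels == k)[0])
```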
Reverse engineering systems models of regulation: discovery, prediction and mechanisms.
Ashworth, Justin; Wurtmann, Elisabeth J; Baliga, Nitin S
2012-08-01
Biological systems can now be understood in comprehensive and quantitative detail using systems biology approaches. Putative genome-scale models can be built rapidly based upon biological inventories and strategic system-wide molecular measurements. Current models combine statistical associations, causative abstractions, and known molecular mechanisms to explain and predict quantitative and complex phenotypes. This top-down 'reverse engineering' approach generates useful organism-scale models despite noise and incompleteness in data and knowledge. Here we review and discuss the reverse engineering of biological systems using top-down data-driven approaches, in order to improve discovery, hypothesis generation, and the inference of biological properties. Copyright © 2011 Elsevier Ltd. All rights reserved.
Liu, Yixin; Zhou, Kai; Lei, Yu
2015-01-01
High temperature gas sensors have been highly demanded for combustion process optimization and toxic emissions control, but they usually suffer from poor selectivity. In order to solve this selectivity issue and identify unknown reducing gas species (CO, CH4, and C3H8) and concentrations, a high temperature resistive sensor array data set was built in this study based on 5 reported sensors. As each sensor showed specific responses towards the different types of reducing gas at given concentrations, calibration curves were fitted, providing a benchmark sensor array response database. A Bayesian inference framework was then utilized to process the sensor array data and build a sample selection program to simultaneously identify gas species and concentration, by formulating a proper likelihood between the measured sensor array response pattern of an unknown gas and each sampled sensor array response pattern in the benchmark database. This algorithm shows good robustness, accurately identifying gas species and predicting gas concentration with an error of less than 10% based on a limited amount of experimental data. These features indicate that the Bayesian probabilistic approach is a simple and efficient way to process sensor array data, which can significantly reduce the required computational overhead and training data.
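A toy version of the identification step with hypothetical calibration data: a grid posterior over gas species and concentration under a Gaussian likelihood around the calibrated response patterns (the paper formulates its own likelihood over the benchmark database).

```python
import numpy as np

gases = ["CO", "CH4", "C3H8"]
concs = np.array([50, 100, 200, 400])     # hypothetical ppm grid
rng = np.random.default_rng(8)
benchmark = rng.random((3, 4, 5))         # [gas, concentration, sensor]

def posterior(measured, sigma=0.05):
    """Grid posterior over (gas, concentration) with a flat prior and a
    Gaussian likelihood around the benchmark response patterns."""
    sq = ((benchmark - measured) ** 2).sum(axis=2)
    loglik = -sq / (2.0 * sigma ** 2)
    post = np.exp(loglik - loglik.max())
    return post / post.sum()

# Noisy measurement of "CH4 at 200 ppm" fed back through the posterior.
measured = benchmark[1, 2] + rng.normal(0.0, 0.02, 5)
post = posterior(measured)
g, c = np.unravel_index(post.argmax(), post.shape)
print(gases[g], concs[c], round(float(post[g, c]), 3))
```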
Foreground contribution to the inferred cosmological parameters from Planck
NASA Astrophysics Data System (ADS)
Vincent, Aaron C.; Wibig, Tadeusz; Wolfendale, Arnold W.
Previous analyses of cosmic microwave background (CMB) measurements [T. Wibig and A. W. Wolfendale, Mon. Not. R. Astron. Soc. 360 (2005) 236, arXiv:astro-ph/0409397; Mon. Not. R. Astron. Soc. 448 (2015) 1030, arXiv:1507.0677] have revealed contamination by areas of high cosmic ray activity in the Milky Way. Here, we update those studies by looking at the most recent Planck release of residual maps. We search for possible effects of foreground contamination in the reconstruction of the ΛCDM cosmological parameters. We focus on the Hubble parameter H0 and the optical depth to reionization τ, both of which exhibit discrepancies between CMB-inferred values and low-redshift measurements (“the delta H0 problem”). Using the publicly available “component separated” Planck temperature maps, we single out three distinct regions: the “loops”, “chimneys” and “low CR” regions, which disproportionately contributed to CR contamination of WMAP data. We find that two of the four maps are strongly affected by removal of anomalously high or low CR activity regions. However, the Commander method, used to produce the angular power spectrum at low (ℓ < 30) multipoles in cosmological analyses, appears robust under these changes. Finally, we use the inferred Hubble parameter H0 as a proxy to look for a general directional dependence of the CMB power spectrum, finding a small but robust dependence on Galactic longitude. Although there is some evidence for continuing CR contamination, it is insufficient to answer the delta H0 problem or the optical depth problem, though the dependence of the derived H0 on direction seems significant. The geometrical pattern, striations along constant longitudes, suggests CR contamination as distinct from a truly cosmological effect.
Statistical inference for noisy nonlinear ecological dynamic systems.
Wood, Simon N
2010-08-26
Chaotic ecological dynamic systems defy conventional statistical analysis. Systems with near-chaotic dynamics are little better. Such systems are almost invariably driven by endogenous dynamic processes plus demographic and environmental process noise, and are only observable with error. Their sensitivity to history means that minute changes in the driving noise realization, or the system parameters, will cause drastic changes in the system trajectory. This sensitivity is inherited and amplified by the joint probability density of the observable data and the process noise, rendering it useless as the basis for obtaining measures of statistical fit. Because the joint density is the basis for the fit measures used by all conventional statistical methods, this is a major theoretical shortcoming. The inability to make well-founded statistical inferences about biological dynamic models in the chaotic and near-chaotic regimes, other than on an ad hoc basis, leaves dynamic theory without the methods of quantitative validation that are essential tools in the rest of biological science. Here I show that this impasse can be resolved in a simple and general manner, using a method that requires only the ability to simulate the observed data on a system from the dynamic model about which inferences are required. The raw data series are reduced to phase-insensitive summary statistics, quantifying local dynamic structure and the distribution of observations. Simulation is used to obtain the mean and the covariance matrix of the statistics, given model parameters, allowing the construction of a 'synthetic likelihood' that assesses model fit. This likelihood can be explored using a straightforward Markov chain Monte Carlo sampler, but one further post-processing step returns pure likelihood-based inference. I apply the method to establish the dynamic nature of the fluctuations in Nicholson's classic blowfly experiments.
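A compact sketch of the synthetic likelihood recipe, with a toy statistic set and a noisy Ricker map standing in for the richer statistics and models of the paper.

```python
import numpy as np

def summarize(series):
    """Phase-insensitive summary statistics of one series (a minimal
    stand-in for the paper's richer statistic set)."""
    return np.array([series.mean(), series.std(),
                     np.corrcoef(series[:-1], series[1:])[0, 1]])

def synthetic_loglik(theta, s_obs, simulate, n_rep=200):
    """Fit a multivariate normal to summary statistics of n_rep
    simulations at theta; evaluate the observed statistics under it
    (up to an additive constant)."""
    S = np.array([summarize(simulate(theta)) for _ in range(n_rep)])
    mu, cov = S.mean(axis=0), np.cov(S.T)
    diff = s_obs - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (diff @ np.linalg.solve(cov, diff) + logdet)

def simulate(r, n=100, sigma=0.3, n0=1.0):
    """Noisy Ricker map: a chaotic ecological toy model."""
    rng, x, out = np.random.default_rng(), n0, []
    for _ in range(n):
        x = r * x * np.exp(-x + sigma * rng.normal())
        out.append(x)
    return np.array(out)

s_obs = summarize(simulate(8.0))                 # "observed" data
print(synthetic_loglik(8.0, s_obs, simulate))    # explore over theta via MCMC
```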
Detecting phenotype-driven transitions in regulatory network structure.
Padi, Megha; Quackenbush, John
2018-01-01
Complex traits and diseases like human height or cancer are often not caused by a single mutation or genetic variant, but instead arise from functional changes in the underlying molecular network. Biological networks are known to be highly modular and contain dense "communities" of genes that carry out cellular processes, but these structures change between tissues, during development, and in disease. While many methods exist for inferring networks and analyzing their topologies separately, there is a lack of robust methods for quantifying differences in network structure. Here, we describe ALPACA (ALtered Partitions Across Community Architectures), a method for comparing two genome-scale networks derived from different phenotypic states to identify condition-specific modules. In simulations, ALPACA leads to more nuanced, sensitive, and robust module discovery than currently available network comparison methods. As an application, we use ALPACA to compare transcriptional networks in three contexts: angiogenic and non-angiogenic subtypes of ovarian cancer, human fibroblasts expressing transforming viral oncogenes, and sexual dimorphism in human breast tissue. In each case, ALPACA identifies modules enriched for processes relevant to the phenotype. For example, modules specific to angiogenic ovarian tumors are enriched for genes associated with blood vessel development, and modules found in female breast tissue are enriched for genes involved in estrogen receptor and ERK signaling. The functional relevance of these new modules suggests that not only can ALPACA identify structural changes in complex networks, but also that these changes may be relevant for characterizing biological phenotypes.
Mixture-based gatekeeping procedures in adaptive clinical trials.
Kordzakhia, George; Dmitrienko, Alex; Ishida, Eiji
2018-01-01
Clinical trials with data-driven decision rules often pursue multiple clinical objectives such as the evaluation of several endpoints or several doses of an experimental treatment. These complex analysis strategies give rise to "multivariate" multiplicity problems with several components or sources of multiplicity. A general framework for defining gatekeeping procedures in clinical trials with adaptive multistage designs is proposed in this paper. The mixture method is applied to build a gatekeeping procedure at each stage and inferences at each decision point (interim or final analysis) are performed using the combination function approach. An advantage of utilizing the mixture method is that it enables powerful gatekeeping procedures applicable to a broad class of settings with complex logical relationships among the hypotheses of interest. Further, the combination function approach supports flexible data-driven decisions such as a decision to increase the sample size or remove a treatment arm. The paper concludes with a clinical trial example that illustrates the methodology by applying it to develop an adaptive two-stage design with a mixture-based gatekeeping procedure.
NASA Astrophysics Data System (ADS)
Sanchez, Christopher A.; Ruddell, Benjamin L.; Schiesser, Roy; Merwade, Venkatesh
2016-03-01
Previous research has suggested that the use of more authentic learning activities can produce more robust and durable knowledge gains. This is consistent with calls within civil engineering education, specifically hydrology, that suggest that curricula should more often include professional perspective and data analysis skills to better develop the "T-shaped" knowledge profile of a professional hydrologist (i.e., professional breadth combined with technical depth). It was expected that the inclusion of a data-driven simulation lab exercise that was contextualized within a real-world situation and more consistent with the job duties of a professional in the field, would provide enhanced learning and appreciation of job duties beyond more conventional paper-and-pencil exercises in a lower-division undergraduate course. Results indicate that while students learned in both conditions, learning was enhanced for the data-driven simulation group in nearly every content area. This pattern of results suggests that the use of data-driven modeling and visualization activities can have a significant positive impact on instruction. This increase in learning likely facilitates the development of student perspective and conceptual mastery, enabling students to make better choices about their studies, while also better preparing them for work as a professional in the field.
Noise Robust Speech Recognition Applied to Voice-Driven Wheelchair
NASA Astrophysics Data System (ADS)
Sasou, Akira; Kojima, Hiroaki
2009-12-01
Conventional voice-driven wheelchairs usually employ headset microphones that are capable of achieving sufficient recognition accuracy, even in the presence of surrounding noise. However, such interfaces require users to wear a sensor such as a headset microphone, which can be an impediment, especially for the hand disabled. Conversely, it is well known that speech recognition accuracy degrades drastically when the microphone is placed far from the user. In this paper, we develop a noise robust speech recognition system for a voice-driven wheelchair that achieves almost the same recognition accuracy as a headset microphone without requiring the user to wear any sensors. We verified the effectiveness of our system in experiments in different environments, confirming this level of accuracy in each case.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roux, Simon; Emerson, Joanne B.; Eloe-Fadrosh, Emiley A.; ...
2017-09-21
Background: Viral metagenomics (viromics) is increasingly used to obtain uncultivated viral genomes, evaluate community diversity, and assess ecological hypotheses. While viromic experimental methods are relatively mature and widely accepted by the research community, robust bioinformatics standards remain to be established. Here we used in silico mock viral communities to evaluate the viromic sequence-to-ecological-inference pipeline, including (i) read pre-processing and metagenome assembly, (ii) thresholds applied to estimate viral relative abundances based on read mapping to assembled contigs, and (iii) normalization methods applied to the matrix of viral relative abundances for alpha and beta diversity estimates. Results: Tools designed specifically for metagenomes, namely metaSPAdes, MEGAHIT, and IDBA-UD, were the most effective at assembling viromes. Read pre-processing, such as partitioning, had virtually no impact on assembly output, but may be useful when hardware is limited. Viral populations with 2-5× coverage typically assembled well, whereas lower coverage led to fragmented assembly. Strain heterogeneity within populations hampered assembly, especially when strains were closely related (average nucleotide identity, or ANI, ≥97%) and when the most abundant strain represented <50% of the population. Viral community composition assessments based on read recruitment were generally accurate when the following detection thresholds were applied: (i) contigs ≥10 kb to define populations, (ii) coverage defined from reads mapping at ≥90% identity, and (iii) ≥75% of contig length with ≥1× coverage. Finally, although data are limited to the most abundant viruses in a community, alpha and beta diversity patterns were robustly estimated (±10%) when comparing samples of similar sequencing depth, but more divergent (up to 80%) when sequencing depth was uneven across the dataset. In the latter cases, the use of normalization methods specifically developed for metagenomes provided the best estimates. Conclusions: These simulations provide benchmarks for selecting analysis cut-offs and establish that an optimized sample-to-ecological-inference viromics pipeline is robust for making ecological inferences from natural viral communities. Continued development to better access RNA, rare, and/or diverse viral populations, along with improved reference viral genome availability, will alleviate many of viromics' remaining limitations.
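The detection thresholds recommended above translate directly into a simple filter. Below is a hedged sketch (field names are illustrative, not from the paper's code) that keeps only contigs meeting the ≥10 kb length, ≥90%-identity read mapping, and ≥75% breadth-of-coverage criteria.

```python
# Illustrative filter implementing the abstract's detection thresholds.
def detect_populations(contigs):
    """contigs: dicts with length_bp, mean_cov_90id (coverage from reads
    mapped at >=90% identity) and frac_covered (breadth at >=1x)."""
    return [c for c in contigs
            if c["length_bp"] >= 10_000        # >=10 kb to define a population
            and c["frac_covered"] >= 0.75      # >=75% of length at >=1x
            and c["mean_cov_90id"] > 0]        # detected at all

contigs = [
    {"name": "vOTU_1", "length_bp": 35_000, "mean_cov_90id": 4.2, "frac_covered": 0.92},
    {"name": "vOTU_2", "length_bp": 8_000,  "mean_cov_90id": 9.1, "frac_covered": 0.99},
]
print([c["name"] for c in detect_populations(contigs)])  # ['vOTU_1']
```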
Johnson, Catherine L; Phillips, Roger J; Purucker, Michael E; Anderson, Brian J; Byrne, Paul K; Denevi, Brett W; Feinberg, Joshua M; Hauck, Steven A; Head, James W; Korth, Haje; James, Peter B; Mazarico, Erwan; Neumann, Gregory A; Philpott, Lydia C; Siegler, Matthew A; Tsyganenko, Nikolai A; Solomon, Sean C
2015-05-22
Magnetized rocks can record the history of the magnetic field of a planet, a key constraint for understanding its evolution. From orbital vector magnetic field measurements of Mercury taken by the MErcury Surface, Space ENvironment, GEochemistry, and Ranging (MESSENGER) spacecraft at altitudes below 150 kilometers, we have detected remanent magnetization in Mercury's crust. We infer a lower bound on the average age of magnetization of 3.7 to 3.9 billion years. Our findings indicate that a global magnetic field driven by dynamo processes in the fluid outer core operated early in Mercury's history. Ancient field strengths that range from those similar to Mercury's present dipole field to Earth-like values are consistent with the magnetic field observations and with the low iron content of Mercury's crust inferred from MESSENGER elemental composition data. Copyright © 2015, American Association for the Advancement of Science.
Day length unlikely to constrain climate-driven shifts in leaf-out times of northern woody plants
NASA Astrophysics Data System (ADS)
Zohner, Constantin M.; Benito, Blas M.; Svenning, Jens-Christian; Renner, Susanne S.
2016-12-01
The relative roles of temperature and day length in driving spring leaf unfolding are known for few species, limiting our ability to predict phenology under climate warming. Using experimental data, we assess the importance of photoperiod as a leaf-out regulator in 173 woody species from throughout the Northern Hemisphere, and we also infer the influence of winter duration, temperature seasonality, and inter-annual temperature variability. We combine results from climate- and light-controlled chambers with species' native climate niches inferred from georeferenced occurrences and range maps. Of the 173 species, only 35% relied on spring photoperiod as a leaf-out signal. Contrary to previous suggestions, these species come from lower latitudes, whereas species from high latitudes with long winters leafed out independently of photoperiod. The strong effect of species' geographic-climatic history on phenological strategies complicates the prediction of community-wide phenological change.
Experimental Observation of a Current-Driven Instability in a Neutral Electron-Positron Beam.
Warwick, J; Dzelzainis, T; Dieckmann, M E; Schumaker, W; Doria, D; Romagnani, L; Poder, K; Cole, J M; Alejo, A; Yeung, M; Krushelnick, K; Mangles, S P D; Najmudin, Z; Reville, B; Samarin, G M; Symes, D D; Thomas, A G R; Borghesi, M; Sarri, G
2017-11-03
We report on the first experimental observation of a current-driven instability developing in a quasineutral matter-antimatter beam. Strong magnetic fields (≥1 T) are measured, by means of a proton radiography technique, after the propagation of a neutral electron-positron beam through a background electron-ion plasma. The experimentally determined equipartition parameter of ε_B ≈ 10^{-3} is typical of values inferred from models of astrophysical gamma-ray bursts, in which the relativistic flows are also expected to be pair dominated. The data, supported by particle-in-cell simulations and simple analytical estimates, indicate that these magnetic fields persist in the background plasma for thousands of inverse plasma frequencies. The existence of such long-lived magnetic fields can be related to analog astrophysical systems, such as those prevalent in lepton-dominated jets.
Zavou, Christina; Kkoushi, Antria; Koutsou, Achilleas; Christodoulou, Chris
2017-11-01
The aim of the current work is twofold: firstly, to adapt an existing method for measuring the input synchrony of a neuron driven only by excitatory inputs so that it also accounts for inhibitory inputs; and secondly, to further adapt this measure so that it can be correctly applied to experimentally-recorded data. The existing method uses the normalized pre-spike slope (NPSS) of the membrane potential, obtained by observing the slope of depolarization of the membrane potential within a short window before threshold crossing, to identify the response-relevant input synchrony and through it to infer the operational mode of a neuron. The first adaptation of the NPSS makes its upper-bound calculation accommodate the higher slope values caused by the lower average and minimum membrane potential values that inhibitory inputs produce. Results indicate that when the input spike trains arrive randomly, the modified NPSS works as expected, inferring that the neuron is operating as a temporal integrator. When the input spike trains arrive in perfect synchrony, however, the modified NPSS works as expected only when the level of inhibition is much higher than the level of excitation. This suggests that the calculation of the upper bound of the NPSS should be a function of the ratio between excitatory and inhibitory inputs in order to correctly capture perfect synchrony at a neuron's input. In addition, we demonstrate a procedure for applying the NPSS to real neuron recordings. This procedure, which relies on empirical observations of the slope of depolarisation to estimate the bounds for the range of observed interspike interval lengths, is successfully applied to experimentally-recorded data, showing that through it both a real neuron's operational mode and the amount of input synchrony that caused its firing can be inferred. Copyright © 2017 Elsevier B.V. All rights reserved.
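As a rough illustration of the quantity involved, the sketch below computes an NPSS-style measure from a sampled membrane potential trace: the mean depolarization slope in a short pre-spike window, normalized by assumed lower and upper slope bounds. The bounds and toy trace are placeholders, not the paper's calibrated values.

```python
# Hedged NPSS-style computation; slope bounds are illustrative placeholders.
import numpy as np

def npss(v, spike_idx, dt, window_s=0.002, slope_min=0.0, slope_max=2000.0):
    """Mean pre-spike depolarization slope (mV/s), normalized to [0, 1]."""
    n = int(window_s / dt)                     # samples in the pre-spike window
    slopes = [(v[i] - v[i - n]) / (n * dt) for i in spike_idx if i >= n]
    s = float(np.mean(slopes))
    return float(np.clip((s - slope_min) / (slope_max - slope_min), 0.0, 1.0))

# Toy trace: two linear depolarizations from -70 mV to a -55 mV threshold.
dt = 1e-4
v = np.concatenate([np.linspace(-70, -55, 200), np.linspace(-70, -55, 200)])
print(npss(v, spike_idx=[199, 399], dt=dt))    # ~0.38 for this toy ramp
```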
Robustness of Reconstructed Ancestral Protein Functions to Statistical Uncertainty.
Eick, Geeta N; Bridgham, Jamie T; Anderson, Douglas P; Harms, Michael J; Thornton, Joseph W
2017-02-01
Hypotheses about the functions of ancient proteins and the effects of historical mutations on them are often tested using ancestral protein reconstruction (APR)-phylogenetic inference of ancestral sequences followed by synthesis and experimental characterization. Usually, some sequence sites are ambiguously reconstructed, with two or more statistically plausible states. The extent to which the inferred functions and mutational effects are robust to uncertainty about the ancestral sequence has not been studied systematically. To address this issue, we reconstructed ancestral proteins in three domain families that have different functions, architectures, and degrees of uncertainty; we then experimentally characterized the functional robustness of these proteins when uncertainty was incorporated using several approaches, including sampling amino acid states from the posterior distribution at each site and incorporating the alternative amino acid state at every ambiguous site in the sequence into a single "worst plausible case" protein. In every case, qualitative conclusions about the ancestral proteins' functions and the effects of key historical mutations were robust to sequence uncertainty, with similar functions observed even when scores of alternate amino acids were incorporated. There was some variation in quantitative descriptors of function among plausible sequences, suggesting that experimentally characterizing robustness is particularly important when quantitative estimates of ancient biochemical parameters are desired. The worst plausible case method appears to provide an efficient strategy for characterizing the functional robustness of ancestral proteins to large amounts of sequence uncertainty. Sampling from the posterior distribution sometimes produced artifactually nonfunctional proteins for sequences reconstructed with substantial ambiguity. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
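The "worst plausible case" construction lends itself to a compact sketch: at every ambiguously reconstructed site, substitute the most plausible alternative amino acid whenever its posterior probability clears an ambiguity cutoff. The cutoff and inputs below are illustrative.

```python
# Hedged sketch of a worst-plausible-case ancestral sequence.
def worst_plausible_case(ml_seq, posteriors, cutoff=0.2):
    """posteriors: one dict per site mapping amino acid -> posterior prob."""
    out = []
    for aa, post in zip(ml_seq, posteriors):
        alts = sorted(((p, a) for a, p in post.items() if a != aa), reverse=True)
        best_p, best_a = alts[0] if alts else (0.0, aa)
        out.append(best_a if best_p >= cutoff else aa)  # swap if plausible
    return "".join(out)

posteriors = [{"A": 0.95, "S": 0.05}, {"L": 0.55, "M": 0.45}]
print(worst_plausible_case("AL", posteriors))  # 'AM': only site 2 is ambiguous
```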
Buonaccorsi, G A; Rose, C J; O'Connor, J P B; Roberts, C; Watson, Y; Jackson, A; Jayson, G C; Parker, G J M
2010-01-01
Clinical trials of anti-angiogenic and vascular-disrupting agents often use biomarkers derived from DCE-MRI, typically reporting whole-tumor summary statistics and so overlooking spatial parameter variations caused by tissue heterogeneity. We present a data-driven segmentation method comprising tracer-kinetic model-driven registration for motion correction, conversion from MR signal intensity to contrast agent concentration for cross-visit normalization, iterative principal components analysis for imputation of missing data and dimensionality reduction, and statistical outlier detection using the minimum covariance determinant to obtain a robust Mahalanobis distance. After applying these techniques we cluster in the principal components space using k-means. We present results from a clinical trial of a VEGF inhibitor, using time-series data selected because they presented motion problems and outlying time series. We obtained spatially contiguous clusters that map to regions with distinct microvascular characteristics. This methodology has the potential to uncover localized effects in trials using DCE-MRI-based biomarkers.
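The outlier-robust clustering stage of such a pipeline can be condensed into a few scikit-learn calls: PCA for dimensionality reduction, the minimum covariance determinant for a robust Mahalanobis distance, and k-means in the principal-components space. This is a schematic stand-in, with random data and illustrative thresholds, not the trial pipeline itself.

```python
# Schematic: PCA -> robust Mahalanobis distance (MCD) -> k-means clustering.
import numpy as np
from scipy.stats import chi2
from sklearn.decomposition import PCA
from sklearn.covariance import MinCovDet
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))               # stand-in for voxel time series

Z = PCA(n_components=3).fit_transform(X)     # dimensionality reduction
d2 = MinCovDet(random_state=0).fit(Z).mahalanobis(Z)   # robust squared distances
inliers = d2 < chi2.ppf(0.975, df=3)         # flag statistical outliers

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Z[inliers])
print(np.bincount(labels))                   # cluster sizes
```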
A Novel Online Data-Driven Algorithm for Detecting UAV Navigation Sensor Faults.
Sun, Rui; Cheng, Qi; Wang, Guanyu; Ochieng, Washington Yotto
2017-09-29
The use of Unmanned Aerial Vehicles (UAVs) has increased significantly in recent years. On-board integrated navigation sensors are a key component of UAVs' flight control systems and are essential for flight safety. In order to ensure flight safety, timely and effective navigation sensor fault detection capability is required. In this paper, a novel data-driven Adaptive Neuro-Fuzzy Inference System (ANFIS)-based approach is presented for the detection of on-board navigation sensor faults in UAVs. In contrast to classic UAV sensor fault detection algorithms, which are based on predefined or modelled faults, the proposed algorithm combines an online data training mechanism with an ANFIS-based decision system. The main advantage of this algorithm is that it combines real-time, model-free residual analysis from Kalman Filter (KF) estimates with the ANFIS to build a reliable fault detection system. In addition, it allows fast and accurate detection of faults, which makes it suitable for real-time applications. Experimental results have demonstrated the effectiveness of the proposed fault detection method in terms of accuracy and misdetection rate.
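The residual-generation step can be pictured with a scalar Kalman filter whose innovations feed a decision rule; in the sketch below a fixed threshold stands in for the ANFIS decision system, and the fault, noise levels, and threshold are all illustrative.

```python
# Sketch: Kalman-filter innovations as fault-detection residuals.
import numpy as np

def kf_innovations(z, q=1e-3, r=0.1):
    """Scalar random-walk Kalman filter; returns the innovation sequence."""
    x, p, innov = 0.0, 1.0, []
    for zt in z:
        p += q                       # predict step (random-walk state)
        innov.append(zt - x)         # innovation = measurement residual
        k = p / (p + r)              # Kalman gain
        x += k * innov[-1]
        p *= (1 - k)
    return np.array(innov)

rng = np.random.default_rng(0)
z = rng.normal(0.0, 0.3, 300)
z[200:] += 2.0                                # injected sensor bias fault
fault = np.abs(kf_innovations(z)) > 1.2       # stand-in for the ANFIS decision
print(int(np.argmax(fault)))                  # first flagged sample, ~200
```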
Pataky, Todd C; Robinson, Mark A; Vanrenterghem, Jos
2018-01-03
Statistical power assessment is an important component of hypothesis-driven research but until relatively recently (mid-1990s) no methods were available for assessing power in experiments involving continuum data and in particular those involving one-dimensional (1D) time series. The purpose of this study was to describe how continuum-level power analyses can be used to plan hypothesis-driven biomechanics experiments involving 1D data. In particular, we demonstrate how theory- and pilot-driven 1D effect modeling can be used for sample-size calculations for both single- and multi-subject experiments. For theory-driven power analysis we use the minimum jerk hypothesis and single-subject experiments involving straight-line, planar reaching. For pilot-driven power analysis we use a previously published knee kinematics dataset. Results show that powers on the order of 0.8 can be achieved with relatively small sample sizes, five and ten for within-subject minimum jerk analysis and between-subject knee kinematics, respectively. However, the appropriate sample size depends on a priori justifications of biomechanical meaning and effect size. The main advantage of the proposed technique is that it encourages a priori justification regarding the clinical and/or scientific meaning of particular 1D effects, thereby robustly structuring subsequent experimental inquiry. In short, it shifts focus from a search for significance to a search for non-rejectable hypotheses. Copyright © 2017 Elsevier Ltd. All rights reserved.
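In the same spirit, a simulation-based sketch of continuum-level power: draw smooth unit-variance 1D noise around a hypothesized effect and count how often the maximum pointwise t-statistic clears a critical value. The effect shape, smoothness, and critical threshold below are illustrative, not the paper's calibrated values.

```python
# Hedged simulation of power for a 1D (continuum) effect.
import numpy as np
from scipy.ndimage import gaussian_filter1d

def power_1d(effect, sd, fwhm, n, t_crit, n_sim=2000, seed=0):
    rng = np.random.default_rng(seed)
    sigma = fwhm / 2.355                       # FWHM -> Gaussian sigma
    hits = 0
    for _ in range(n_sim):
        noise = gaussian_filter1d(rng.normal(size=(n, effect.size)), sigma, axis=1)
        noise /= noise.std(axis=1, keepdims=True)   # unit-variance smooth noise
        y = effect + sd * noise
        t = np.sqrt(n) * y.mean(axis=0) / y.std(axis=0, ddof=1)
        hits += t.max() > t_crit
    return hits / n_sim

effect = 0.8 * np.exp(-np.linspace(-3, 3, 101) ** 2)   # localized 1D effect
print(power_1d(effect, sd=1.0, fwhm=10, n=10, t_crit=3.2))
```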
Reid, Michael J C; Switzer, William M; Schillaci, Michael A; Ragonnet-Cronin, Manon; Joanisse, Isabelle; Caminiti, Kyna; Lowenberger, Carl A; Galdikas, Birute Mary F; Sandstrom, Paul A; Brooks, James I
2016-09-01
While human T-lymphotropic virus type 1 (HTLV-1) originates from ancient cross-species transmission of simian T-lymphotropic virus type 1 (STLV-1) from infected nonhuman primates, much debate exists on whether the first HTLV-1 occurred in Africa or in Asia during early human evolution and migration. This topic is complicated by a lack of representative Asian STLV-1 to infer PTLV-1 evolutionary histories. In this study we obtained new STLV-1 LTR and tax sequences from a wild-born Bornean orangutan (Pongo pygmaeus) and performed detailed phylogenetic analyses using both maximum likelihood and Bayesian inference of available Asian PTLV-1 and African STLV-1 sequences. Phylogenies, divergence dates and nucleotide substitution rates were co-inferred and compared using six different molecular clock calibrations in a Bayesian framework, including archaeological and/or nucleotide substitution rate calibrations. We then combined our molecular results with paleobiogeographical and ecological data to infer the most likely evolutionary history of PTLV-1. Based on the preferred models, our analyses robustly inferred an Asian source for PTLV-1, with cross-species transmission of STLV-1 likely from a macaque (Macaca sp.) to an orangutan about 37.9-48.9 kya, and to humans between 20.3 and 25.5 kya. An orangutan diversification of STLV-1 commenced approximately 6.4-7.3 kya. Our analyses also inferred that HTLV-1 was first introduced into Australia ~3.1-3.7 kya, corresponding to both genetic and archaeological changes occurring in Australia at that time. Finally, HTLV-1 appears in Melanesia at ~2.3-2.7 kya, corresponding to the migration of the Lapita peoples into the region. Our results also provide important reference calibrations for future PTLV evolutionary timescale inference. Longer sequences, or full genomes, from a greater representation of Asian primates, including gibbons, leaf monkeys, and Sumatran orangutans, are needed to fully elucidate these evolutionary dates and relationships using the model criteria suggested herein. Copyright © 2016 Elsevier B.V. All rights reserved.
Loewen, T N; Carriere, B; Reist, J D; Halden, N M; Anderson, W G
2016-12-01
Biomineral chemistry is frequently used to infer life history events and habitat use in fishes; however, significant gaps remain in our understanding of the underlying mechanisms. Here we have taken a multidisciplinary approach to review the current understanding of element incorporation into biomineralized structures in fishes. Biominerals are primarily composed of calcium-based derivatives such as calcium carbonate, found in otoliths, and calcium phosphates, found in scales, fins and bones. By focusing on non-essential life elements (strontium and barium) and essential life elements (calcium, zinc and magnesium), we attempt to connect several fields of study and synthesise how physiology may influence biomineralization and the subsequent inference of life history. Data provided in this review indicate that the presence of non-essential elements in the biominerals of fish is driven primarily by hypo- and hyper-calcemic environmental conditions. The uptake kinetics between environmental calcium and its competing mimics define what is ultimately incorporated in the biomineral structure. Conversely, circannual hormonally driven variations likely influence essential life elements like zinc that are known to associate with enzyme function. Environmental temperature and pH, as well as uptake kinetics for strontium and barium isotopes, demonstrate the role of mass fractionation in isotope selection for uptake into fish bony structures. In consideration of calcium mobilisation, the action of osteoclast-like cells on the calcium phosphates of scales, fins and bones likely plays a role in fractionation along with transport kinetics. Additional investigations into calcium mobilisation are warranted to reconcile differing views of strontium and barium isotope fractionation between calcium phosphate and calcium carbonate structures in fishes. Copyright © 2016 Elsevier Inc. All rights reserved.
Order restricted inference for oscillatory systems for detecting rhythmic signals
Larriba, Yolanda; Rueda, Cristina; Fernández, Miguel A.; Peddada, Shyamal D.
2016-01-01
Motivation: Many biological processes, such as the cell cycle, circadian clock, and menstrual cycle, are governed by oscillatory systems consisting of numerous components that exhibit rhythmic patterns over time. It is not always easy to identify such rhythmic components. For example, it is a challenging problem to identify circadian genes in a given tissue using time-course gene expression data. There is a great potential for misclassifying non-rhythmic genes as rhythmic and vice versa. This has been a problem of considerable interest in recent years. In this article we develop a constrained-inference-based methodology called Order Restricted Inference for Oscillatory Systems (ORIOS) to detect rhythmic signals. Instead of using mathematical functions (e.g. sinusoidal) to describe the shape of rhythmic signals, ORIOS uses mathematical inequalities. Consequently, it is robust and not limited by the biologist's choice of the mathematical model. We studied the performance of ORIOS using simulated as well as real data obtained from mouse liver, pituitary gland, and the NIH3T3 and U2OS cell lines. Our results suggest that, for a broad collection of patterns of gene expression, ORIOS has substantially higher power to detect true rhythmic genes in comparison to some popular methods, while also declaring substantially fewer non-rhythmic genes as rhythmic. Availability and Implementation: A user-friendly code implemented in the R language can be downloaded from http://www.niehs.nih.gov/research/atniehs/labs/bb/staff/peddada/index.cfm. Contact: peddada@niehs.nih.gov PMID:27596593
Robust evaluation of time series classification algorithms for structural health monitoring
NASA Astrophysics Data System (ADS)
Harvey, Dustin Y.; Worden, Keith; Todd, Michael D.
2014-03-01
Structural health monitoring (SHM) systems provide real-time damage and performance information for civil, aerospace, and mechanical infrastructure through analysis of structural response measurements. The supervised learning methodology for data-driven SHM involves computation of low-dimensional, damage-sensitive features from raw measurement data that are then used in conjunction with machine learning algorithms to detect, classify, and quantify damage states. However, these systems often suffer from performance degradation in real-world applications due to varying operational and environmental conditions. Probabilistic approaches to robust SHM system design suffer from incomplete knowledge of all conditions a system will experience over its lifetime. Info-gap decision theory enables nonprobabilistic evaluation of the robustness of competing models and systems in a variety of decision making applications. Previous work employed info-gap models to handle feature uncertainty when selecting various components of a supervised learning system, namely features from a pre-selected family and classifiers. In this work, the info-gap framework is extended to robust feature design and classifier selection for general time series classification through an efficient, interval arithmetic implementation of an info-gap data model. Experimental results are presented for a damage type classification problem on a ball bearing in a rotating machine. The info-gap framework in conjunction with an evolutionary feature design system allows for fully automated design of a time series classifier to meet performance requirements under maximum allowable uncertainty.
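The info-gap logic itself is compact: grow a horizon of uncertainty h around the nominal features and track the worst-case performance it permits, preferring designs that tolerate the largest h while still meeting requirements. The toy classifier and data below are stand-ins for the paper's time series features and classifiers.

```python
# Toy info-gap robustness curve with interval (box) uncertainty.
import numpy as np

def worst_case_accuracy(clf, X, y, h):
    """Accuracy when each scalar feature may shift by up to +/- h."""
    correct = 0
    for x, label in zip(X, y):
        corners = (x - h, x + h)               # interval endpoints suffice in 1-D
        correct += all(clf(c) == label for c in corners)
    return correct / len(y)

clf = lambda x: int(x > 0.5)                   # toy threshold classifier
X = np.array([0.1, 0.3, 0.8, 0.9]); y = np.array([0, 0, 1, 1])
for h in (0.0, 0.1, 0.2, 0.3, 0.4):
    print(f"h={h:.1f}  worst-case accuracy={worst_case_accuracy(clf, X, y, h):.2f}")
```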
Robust Gaussian Graphical Modeling via l1 Penalization
Sun, Hokeun; Li, Hongzhe
2012-01-01
Summary: Gaussian graphical models have been widely used as an effective method for studying the conditional independency structure among genes and for constructing genetic networks. However, gene expression data typically have heavier tails or more outlying observations than the standard Gaussian distribution. Such outliers in gene expression data can lead to wrong inference on the dependency structure among the genes. We propose an l1-penalized estimation procedure for the sparse Gaussian graphical models that is robustified against possible outliers. The likelihood function is weighted according to how much each observation deviates, where the deviation of the observation is measured based on its own likelihood. An efficient computational algorithm based on the coordinate gradient descent method is developed to obtain the minimizer of the negative penalized robustified likelihood, where the nonzero elements of the concentration matrix represent the graphical links among the genes. After the graphical structure is obtained, we re-estimate the positive definite concentration matrix using an iterative proportional fitting algorithm. Through simulations, we demonstrate that the proposed robust method performs much better than the graphical Lasso for the Gaussian graphical models in terms of both graph structure selection and estimation when outliers are present. We apply the robust estimation procedure to an analysis of yeast gene expression data and show that the resulting graph has better biological interpretation than that obtained from the graphical Lasso. PMID:23020775
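The weighting idea can be sketched around scikit-learn's graphical lasso: alternately fit an l1-penalized precision matrix to a weighted covariance and recompute observation weights from each point's own likelihood. This simplified reweighting loop is a stand-in for the paper's coordinate gradient descent algorithm.

```python
# Hedged sketch: likelihood-weighted graphical lasso (robustified fit).
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.covariance import graphical_lasso

def robust_glasso(X, alpha=0.1, n_iter=5):
    n, p = X.shape
    w = np.ones(n) / n
    for _ in range(n_iter):
        mu = w @ X
        Xc = X - mu
        S = (Xc * w[:, None]).T @ Xc                   # weighted covariance
        cov, prec = graphical_lasso(S, alpha=alpha)    # l1-penalized fit
        dens = multivariate_normal(mu, cov, allow_singular=True).pdf(X)
        w = dens / dens.sum()                          # down-weight outliers
    return prec

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5)); X[:5] += 10             # inject outliers
print(np.round(robust_glasso(X), 2))
```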
Anchored phylogenomics illuminates the skipper butterfly tree of life.
Toussaint, Emmanuel F A; Breinholt, Jesse W; Earl, Chandra; Warren, Andrew D; Brower, Andrew V Z; Yago, Masaya; Dexter, Kelly M; Espeland, Marianne; Pierce, Naomi E; Lohman, David J; Kawahara, Akito Y
2018-06-19
Butterflies (Papilionoidea) are perhaps the most charismatic insect lineage, yet phylogenetic relationships among them remain incompletely studied and controversial. This is especially true for skippers (Hesperiidae), one of the most species-rich and poorly studied butterfly families. To infer a robust phylogenomic hypothesis for Hesperiidae, we sequenced nearly 400 loci using Anchored Hybrid Enrichment and sampled all tribes and more than 120 genera of skippers. Molecular datasets were analyzed using maximum-likelihood, parsimony and coalescent multi-species phylogenetic methods. All analyses converged on a novel, robust phylogenetic hypothesis for skippers. Different optimality criteria and methodologies recovered almost identical phylogenetic trees with strong nodal support at nearly all nodes and all taxonomic levels. Our results support Coeliadinae as the sister group to the remaining skippers, the monotypic Euschemoninae as the sister group to all other subfamilies but Coeliadinae, and the monophyly of Eudaminae plus Pyrginae. Within Pyrginae, Celaenorrhinini and Tagiadini are sister groups, and the Neotropical firetips, Pyrrhopygini, are sister to all other tribes but Celaenorrhinini and Tagiadini. Achlyodini is recovered as the sister group to Carcharodini, and Erynnini as the sister group to Pyrgini. Within the grass skippers (Hesperiinae), there is strong support for the monophyly of Aeromachini plus the remaining Hesperiinae. The giant skippers (Agathymus and Megathymus), once classified as a subfamily, are recovered as monophyletic with strong support, but are deeply nested within Hesperiinae. Anchored Hybrid Enrichment sequencing resulted in a large amount of data that built the foundation for a new, robust evolutionary tree of skippers. The newly inferred phylogenetic tree resolves long-standing systematic issues and changes our understanding of the skipper tree of life. These results enhance understanding of the evolution of one of the most species-rich butterfly families.
Harnessing Diversity towards the Reconstructing of Large Scale Gene Regulatory Networks
Yamanaka, Ryota; Kitano, Hiroaki
2013-01-01
Elucidating gene regulatory networks (GRNs) from large-scale experimental data remains a central challenge in systems biology. Recently, numerous techniques, particularly consensus-driven approaches combining different algorithms, have become a potentially promising strategy to infer accurate GRNs. Here, we develop a novel consensus inference algorithm, TopkNet, that can integrate multiple algorithms to infer GRNs. Comprehensive performance benchmarking on a cloud computing framework demonstrated that (i) a simple strategy to combine many algorithms does not always lead to performance improvement compared to the cost of consensus and (ii) TopkNet, integrating only high-performance algorithms, provides significant performance improvement compared to the best individual algorithms and community prediction. These results suggest that a priori determination of high-performance algorithms is a key to reconstructing an unknown regulatory network. Similarity among gene-expression datasets can be useful for determining potential optimal algorithms for reconstruction of unknown regulatory networks, i.e., if expression data associated with a known regulatory network are similar to those associated with an unknown regulatory network, optimal algorithms determined for the known regulatory network can be repurposed to infer the unknown regulatory network. Based on this observation, we developed a quantitative measure of similarity among gene-expression datasets and demonstrated that, if similarity between the two expression datasets is high, TopkNet integrating algorithms that are optimal for the known dataset performs well on the unknown dataset. The consensus framework, TopkNet, together with the similarity measure proposed in this study provides a powerful strategy towards harnessing the wisdom of the crowds in reconstruction of unknown regulatory networks. PMID:24278007
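The aggregation step behind such a consensus can be sketched as rank averaging: each algorithm ranks every candidate edge, the ranks are averaged across the selected algorithms, and the top-k consensus edges are kept. Scores below are illustrative.

```python
# Sketch of top-k rank-averaging consensus over edge scores.
import numpy as np
from scipy.stats import rankdata

def consensus_topk(score_lists, k):
    """score_lists: one array of edge confidence scores per algorithm."""
    mean_rank = np.mean([rankdata(-s) for s in score_lists], axis=0)
    return np.argsort(mean_rank)[:k]          # indices of top-k consensus edges

alg1 = np.array([0.9, 0.2, 0.7, 0.1])        # hypothetical edge scores
alg2 = np.array([0.8, 0.3, 0.9, 0.2])
print(consensus_topk([alg1, alg2], k=2))      # edges 0 and 2
```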
Early and mid-Holocene age for the Tempanos moraines, Laguna San Rafael, Patagonian Chile
NASA Astrophysics Data System (ADS)
Harrison, Stephan; Glasser, Neil F.; Duller, Geoff A. T.; Jansson, Krister N.
2012-01-01
Data about the nature and timing of Holocene events from the Southern Hemisphere, especially in southern South America, are required to provide insight into the extent and nature of past climate change in a region where land-based records are restricted. Here we present the first use of single grain Optically Stimulated Luminescence (OSL) dating of a moraine sequence recording glacial advance along the western side of the Patagonian Icefields. Dates from the Tempanos moraines at Laguna San Rafael (LSR) show that the San Rafael Glacier (SRG) advanced to maximum Holocene positions during the period 9.3 to 9.7 ka and at 5.7 ka. Outwash lying beneath the moraine in its northern portion, dated to 7.7 ka, indicates that the glacier front was also advanced at this time. Since these advances span both the regional early Holocene warm-dry phase (11.5 ka to 7.8 ka) and the subsequent cooling and rise in precipitation in the mid-late Holocene (since 6.6 ka) we infer that the advances of the SRG are not simply climate-driven, but that the glacier has also probably responded strongly to non-climatic stimuli such as internal ice dynamics and the transition between calving and non-calving. Many westwards-flowing glaciers in Patagonia were probably calving during much of the Late Pleistocene and Holocene, so we conclude that establishing robust glacial chronologies where climatic and non-climatic factors cannot be distinguished is likely to remain a challenge.
Lopes, J S; Arenas, M; Posada, D; Beaumont, M A
2014-03-01
The estimation of parameters in molecular evolution may be biased when some processes are not considered. For example, the estimation of selection at the molecular level using codon-substitution models can have an upward bias when recombination is ignored. Here we address the joint estimation of recombination, molecular adaptation and substitution rates from coding sequences using approximate Bayesian computation (ABC). We describe the implementation of a regression-based strategy for choosing subsets of summary statistics for coding data, and show that this approach can accurately infer recombination allowing for intracodon recombination breakpoints, molecular adaptation and codon substitution rates. We demonstrate that our ABC approach can outperform other analytical methods under a variety of evolutionary scenarios. We also show that although the choice of the codon-substitution model is important, our inferences are robust to a moderate degree of model misspecification. In addition, we demonstrate that our approach can accurately choose the evolutionary model that best fits the data, providing an alternative for when the use of full-likelihood methods is impracticable. Finally, we applied our ABC method to co-estimate recombination, substitution and molecular adaptation rates from 24 published human immunodeficiency virus 1 coding data sets.
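Stripped to its core, the ABC logic is a rejection loop: draw parameters from the prior, simulate data, and keep draws whose summary statistics land close to the observed ones (with the paper adding a regression-based choice of the summaries themselves). In the sketch the coalescent/codon simulator is replaced by a trivial stand-in.

```python
# Bare-bones ABC rejection sampler; the simulator is a toy stand-in.
import numpy as np

def abc_rejection(observed, simulate, prior_draw, n_sims=10_000, eps=0.05):
    accepted = []
    for _ in range(n_sims):
        theta = prior_draw()
        if np.linalg.norm(simulate(theta) - observed) < eps:
            accepted.append(theta)
    return np.array(accepted)                 # approximate posterior sample

rng = np.random.default_rng(0)
simulate = lambda th: np.array([rng.normal(th, 0.5, 50).mean()])
posterior = abc_rejection(np.array([1.0]), simulate, lambda: rng.uniform(-3, 3))
print(posterior.size, posterior.mean())       # posterior concentrates near 1.0
```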
NASA Astrophysics Data System (ADS)
Chakraborty, Amitav; Roy, Sumit; Banerjee, Rahul
2018-03-01
This experimental work highlights the inherent capability of an adaptive neuro-fuzzy inference system (ANFIS) based model to act as a robust system identification tool (SIT) for prognosticating the performance and emission parameters of an existing diesel engine running in diesel-LPG dual-fuel mode. The developed model proved its adeptness by successfully harnessing the effects of the input parameters of load, injection duration and LPG energy share on the output parameters of BSFCEQ, BTE, NOX, SOOT, CO and HC. Successive evaluation of the ANFIS model revealed high levels of resemblance with the previously forecasted ANN results for the same input parameters, making it evident that, like the ANN, the ANFIS has the innate ability to act as a robust SIT. The ANFIS-predicted data matched the experimental data with high overall accuracy: the correlation coefficient (R) values ranged from 0.99207 to 0.999988, the mean absolute percentage error (MAPE) values were in the range 0.02-0.173%, and the root mean square errors (RMSE) were within acceptable margins. Hence the developed model is capable of emulating the actual engine parameters with commendable accuracy, which in turn would make it a robust prediction platform for future optimization studies.
On statistical inference in time series analysis of the evolution of road safety.
Commandeur, Jacques J F; Bijleveld, Frits D; Bergel-Hayat, Ruth; Antoniou, Constantinos; Yannis, George; Papadimitriou, Eleonora
2013-11-01
Data collected for building a road safety observatory usually include observations made sequentially through time. Examples of such data, called time series data, include the annual (or monthly) number of road traffic accidents, traffic fatalities or vehicle kilometers driven in a country, as well as the corresponding values of safety performance indicators (e.g., data on speeding, seat belt use, alcohol use, etc.). Some commonly used statistical techniques imply assumptions that are often violated by the special properties of time series data, namely serial dependency among the disturbances associated with the observations. The first objective of this paper is to demonstrate the impact of such violations on the applicability of standard methods of statistical inference, which leads to an under- or overestimation of the standard error and consequently may produce erroneous inferences. Moreover, having established the adverse consequences of ignoring serial dependency issues, the paper aims to describe rigorous statistical techniques used to overcome them. In particular, appropriate time series analysis techniques of varying complexity are employed to describe the development over time, relating the accident occurrences to explanatory factors such as exposure measures or safety performance indicators, and forecasting the development into the near future. Traditional regression models (whether linear, generalized linear or nonlinear) are shown not to naturally capture the inherent dependencies in time series data. Dedicated time series analysis techniques, such as the ARMA-type and DRAG approaches, are discussed next, followed by structural time series models, which are a subclass of state space methods. The paper concludes with general recommendations and practice guidelines for the use of time series models in road safety research. Copyright © 2012 Elsevier Ltd. All rights reserved.
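The serial-dependency point is easy to demonstrate numerically: fitting a trend by ordinary least squares to an AR(1)-contaminated series understates the standard error, while a heteroskedasticity-and-autocorrelation-consistent (HAC) correction widens it. The sketch uses statsmodels on simulated data, not road safety data.

```python
# OLS vs. HAC standard errors on a serially dependent series.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
e = np.zeros(n)
for t in range(1, n):                         # AR(1) disturbances, phi = 0.8
    e[t] = 0.8 * e[t - 1] + rng.normal()
y = 0.02 * np.arange(n) + e                   # trend + correlated noise
X = sm.add_constant(np.arange(n, dtype=float))

naive = sm.OLS(y, X).fit()
hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 10})
print(naive.bse[1], hac.bse[1])               # HAC slope SE is notably larger
```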
Lenton, T. M.; Livina, V. N.; Dakos, V.; Van Nes, E. H.; Scheffer, M.
2012-01-01
We address whether robust early warning signals can, in principle, be provided before a climate tipping point is reached, focusing on methods that seek to detect critical slowing down as a precursor of bifurcation. As a test bed, six previously analysed datasets are reconsidered, three palaeoclimate records approaching abrupt transitions at the end of the last ice age and three models of varying complexity forced through a collapse of the Atlantic thermohaline circulation. Approaches based on examining the lag-1 autocorrelation function or on detrended fluctuation analysis are applied together and compared. The effects of aggregating the data, detrending method, sliding window length and filtering bandwidth are examined. Robust indicators of critical slowing down are found prior to the abrupt warming event at the end of the Younger Dryas, but the indicators are less clear prior to the Bølling-Allerød warming, or glacial termination in Antarctica. Early warnings of thermohaline circulation collapse can be masked by inter-annual variability driven by atmospheric dynamics. However, rapidly decaying modes can be successfully filtered out by using a long bandwidth or by aggregating data. The two methods have complementary strengths and weaknesses and we recommend applying them together to improve the robustness of early warnings. PMID:22291229
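The core indicator is simple to compute: lag-1 autocorrelation in a sliding window over a locally detrended series, with a rising trend taken as a warning of critical slowing down. The sketch below applies it to a simulated series whose memory slowly grows; the window length is an illustrative choice.

```python
# Sliding-window lag-1 autocorrelation as an early warning indicator.
import numpy as np

def lag1_indicator(x, window):
    ar1 = []
    for i in range(len(x) - window):
        w = x[i:i + window] - np.mean(x[i:i + window])   # local detrend
        ar1.append(np.corrcoef(w[:-1], w[1:])[0, 1])
    return np.array(ar1)

rng = np.random.default_rng(0)
x = np.zeros(1000)
for t in range(1, 1000):                      # AR(1) whose memory slowly grows
    phi = 0.2 + 0.7 * t / 1000
    x[t] = phi * x[t - 1] + rng.normal()
ind = lag1_indicator(x, window=200)
print(ind[0], ind[-1])                        # indicator rises toward the end
```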
Exact Bayesian Inference for Phylogenetic Birth-Death Models.
Parag, K V; Pybus, O G
2018-04-26
Inferring the rates of change of a population from a reconstructed phylogeny of genetic sequences is a central problem in macro-evolutionary biology, epidemiology, and many other disciplines. A popular solution involves estimating the parameters of a birth-death process (BDP), which links the shape of the phylogeny to its birth and death rates. Modern BDP estimators rely on random Markov chain Monte Carlo (MCMC) sampling to infer these rates. Such methods, while powerful and scalable, cannot be guaranteed to converge, leading to results that may be hard to replicate or difficult to validate. We present a conceptually and computationally different parametric BDP inference approach using flexible and easy-to-implement Snyder filter (SF) algorithms. This method is deterministic, so its results are provable, guaranteed, and reproducible. We validate the SF on constant-rate BDPs and find that it solves BDP likelihoods known to produce robust estimates. We then examine more complex BDPs with time-varying rates. Our estimates compare well with a recently developed parametric MCMC inference method. Lastly, we perform model selection on an empirical Agamid species phylogeny, obtaining results consistent with the literature. The SF makes no approximations, beyond those required for parameter quantisation and numerical integration, and directly computes the posterior distribution of model parameters. It is a promising alternative inference algorithm that may serve either as a standalone Bayesian estimator or as a useful diagnostic reference for validating more involved MCMC strategies. The Snyder filter is implemented in Matlab and the time-varying BDP models are simulated in R. The source code and data are freely available at https://github.com/kpzoo/snyder-birth-death-code. kris.parag@zoo.ox.ac.uk. Supplementary material is available at Bioinformatics online.
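The filter's central idea, a deterministic posterior maintained over a quantised parameter grid rather than an MCMC sample, can be shown in miniature on a pure-birth (Yule) process, where the waiting time with k lineages is Exponential(k·λ). This toy grid posterior is only an illustration of the approach, not the SF algorithm itself.

```python
# Deterministic grid posterior for the birth rate of a Yule process.
import numpy as np

def grid_posterior(waits, lineages, rate_grid):
    """Exact posterior over rate_grid given waiting times w_k ~ Exp(k*rate)."""
    logp = np.zeros_like(rate_grid)
    for w, k in zip(waits, lineages):
        logp += np.log(k * rate_grid) - k * rate_grid * w
    p = np.exp(logp - logp.max())              # normalize in a stable way
    return p / p.sum()

rng = np.random.default_rng(0)
lineages = np.arange(1, 50)
waits = rng.exponential(1.0 / (2.0 * lineages))   # true birth rate = 2.0
rate_grid = np.linspace(0.01, 5.0, 500)
post = grid_posterior(waits, lineages, rate_grid)
print(rate_grid[post.argmax()])                   # posterior mode near 2.0
```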
Meng, Xiang-He; Shen, Hui; Chen, Xiang-Ding; Xiao, Hong-Mei; Deng, Hong-Wen
2018-03-01
Genome-wide association studies (GWAS) have successfully identified numerous genetic variants associated with diverse complex phenotypes and diseases, and provided tremendous opportunities for further analyses using summary association statistics. Recently, Pickrell et al. developed a robust method for causal inference using independent putative causal SNPs. However, this method may fail to infer the causal relationship between two phenotypes when only a limited number of independent putative causal SNPs are identified. Here, we extended Pickrell's method to make it more applicable to general situations. We extended the causal inference method by replacing the putative causal SNPs with the lead SNPs (the set of the most significant SNPs in each independent locus) and tested the performance of our extended method using both simulation and empirical data. Simulations suggested that when the same number of genetic variants is used, our extended method had a similar distribution of the test statistic under the null model, as well as comparable power under the causal model, compared with the original method of Pickrell et al. In practice, however, our extended method would generally be more powerful because the number of independent lead SNPs is often larger than the number of independent putative causal SNPs; including more SNPs, on the other hand, did not cause more false positives. By applying our extended method to summary statistics from GWAS for blood metabolites and femoral neck bone mineral density (FN-BMD), we successfully identified ten blood metabolites that may causally influence FN-BMD. We extended a causal inference method for inferring the putative causal relationship between two phenotypes using summary statistics from GWAS, and identified a number of potential causal metabolites for FN-BMD, which may provide novel insights into the pathophysiological mechanisms underlying osteoporosis.
A new framework for comprehensive, robust, and efficient global sensitivity analysis: 2. Application
NASA Astrophysics Data System (ADS)
Razavi, Saman; Gupta, Hoshin V.
2016-01-01
Based on the theoretical framework for sensitivity analysis called "Variogram Analysis of Response Surfaces" (VARS), developed in the companion paper, we develop and implement a practical "star-based" sampling strategy (called STAR-VARS) for the application of VARS to real-world problems. We also develop a bootstrap approach to provide confidence level estimates for the VARS sensitivity metrics and to evaluate the reliability of inferred factor rankings. The effectiveness, efficiency, and robustness of STAR-VARS are demonstrated via two real-data hydrological case studies (a 5-parameter conceptual rainfall-runoff model and a 45-parameter land surface scheme hydrology model), and a comparison with the "derivative-based" Morris and "variance-based" Sobol approaches is provided. Our results show that STAR-VARS provides reliable and stable assessments of "global" sensitivity across the full range of scales in the factor space, while being 1-2 orders of magnitude more efficient than the Morris or Sobol approaches.
Spatially explicit dynamic N-mixture models
Zhao, Qing; Royle, Andy; Boomer, G. Scott
2017-01-01
Knowledge of demographic parameters such as survival, reproduction, emigration, and immigration is essential to understand metapopulation dynamics. Traditionally the estimation of these demographic parameters requires intensive data from marked animals. The development of dynamic N-mixture models makes it possible to estimate demographic parameters from count data of unmarked animals, but the original dynamic N-mixture model does not distinguish emigration and immigration from survival and reproduction, limiting its ability to explain important metapopulation processes such as movement among local populations. In this study we developed a spatially explicit dynamic N-mixture model that estimates survival, reproduction, emigration, local population size, and detection probability from count data under the assumption that movement only occurs among adjacent habitat patches. Simulation studies showed that the inference of our model depends on detection probability, local population size, and the implementation of robust sampling design. Our model provides reliable estimates of survival, reproduction, and emigration when detection probability is high, regardless of local population size or the type of sampling design. When detection probability is low, however, our model only provides reliable estimates of survival, reproduction, and emigration when local population size is moderate to high and robust sampling design is used. A sensitivity analysis showed that our model is robust against the violation of the assumption that movement only occurs among adjacent habitat patches, suggesting wide applications of this model. Our model can be used to improve our understanding of metapopulation dynamics based on count data that are relatively easy to collect in many systems.
Xu, Jason; Guttorp, Peter; Kato-Maeda, Midori; Minin, Vladimir N
2015-12-01
Continuous-time birth-death-shift (BDS) processes are frequently used in stochastic modeling, with many applications in ecology and epidemiology. In particular, such processes can model evolutionary dynamics of transposable elements-important genetic markers in molecular epidemiology. Estimation of the effects of individual covariates on the birth, death, and shift rates of the process can be accomplished by analyzing patient data, but inferring these rates in a discretely and unevenly observed setting presents computational challenges. We propose a multi-type branching process approximation to BDS processes and develop a corresponding expectation maximization algorithm, where we use spectral techniques to reduce calculation of expected sufficient statistics to low-dimensional integration. These techniques yield an efficient and robust optimization routine for inferring the rates of the BDS process, and apply broadly to multi-type branching processes whose rates can depend on many covariates. After rigorously testing our methodology in simulation studies, we apply our method to study intrapatient time evolution of IS6110 transposable element, a genetic marker frequently used during estimation of epidemiological clusters of Mycobacterium tuberculosis infections. © 2015, The International Biometric Society.
Spatio-temporal conditional inference and hypothesis tests for neural ensemble spiking precision
Harrison, Matthew T.; Amarasingham, Asohan; Truccolo, Wilson
2014-01-01
The collective dynamics of neural ensembles create complex spike patterns with many spatial and temporal scales. Understanding the statistical structure of these patterns can help resolve fundamental questions about neural computation and neural dynamics. Spatio-temporal conditional inference (STCI) is introduced here as a semiparametric statistical framework for investigating the nature of precise spiking patterns from collections of neurons that is robust to arbitrarily complex and nonstationary coarse spiking dynamics. The main idea is to focus statistical modeling and inference, not on the full distribution of the data, but rather on families of conditional distributions of precise spiking given different types of coarse spiking. The framework is then used to develop families of hypothesis tests for probing the spatio-temporal precision of spiking patterns. Relationships among different conditional distributions are used to improve multiple hypothesis testing adjustments and to design novel Monte Carlo spike resampling algorithms. Of special note are algorithms that can locally jitter spike times while still preserving the instantaneous peri-stimulus time histogram (PSTH) or the instantaneous total spike count from a group of recorded neurons. The framework can also be used to test whether first-order maximum entropy models with possibly random and time-varying parameters can account for observed patterns of spiking. STCI provides a detailed example of the generic principle of conditional inference, which may be applicable in other areas of neurostatistical analysis. PMID:25380339
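One of the resampling ideas, local jitter that exactly preserves the pooled instantaneous PSTH, admits a compact sketch: within each jitter window, shuffle the trial labels attached to the pooled spike times, so spikes move locally within trials while the pooled histogram and each trial's per-window counts are unchanged. This is a simplified form of the algorithms described.

```python
# PSTH-preserving jitter surrogate via within-window trial-label shuffling.
import numpy as np

def psth_preserving_jitter(spikes, window, rng):
    """spikes: list of per-trial spike-time arrays; window: jitter width (s)."""
    out = [[] for _ in spikes]
    by_win = {}
    for i, trial in enumerate(spikes):
        for t in trial:
            by_win.setdefault(int(t // window), []).append((t, i))
    for group in by_win.values():
        labels = np.array([i for _, i in group])
        rng.shuffle(labels)                    # reassign trials within window
        for (t, _), i in zip(group, labels):
            out[i].append(t)
    return [np.sort(tr) for tr in out]

rng = np.random.default_rng(0)
trials = [np.sort(rng.uniform(0, 1, 20)) for _ in range(5)]
surrogate = psth_preserving_jitter(trials, window=0.05, rng=rng)
print(sorted(np.concatenate(trials)) == sorted(np.concatenate(surrogate)))  # True
```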
Kappenman, Emily S; Keil, Andreas
2017-01-01
In recent years, the psychological and behavioral sciences have increased efforts to strengthen methodological practices and publication standards, with the ultimate goal of enhancing the value and reproducibility of published reports. These issues are especially important in the multidisciplinary field of psychophysiology, which yields rich and complex data sets with a large number of observations. In addition, the technological tools and analysis methods available in the field of psychophysiology are continually evolving, widening the array of techniques and approaches available to researchers. This special issue presents articles detailing rigorous and systematic evaluations of tasks, measures, materials, analysis approaches, and statistical practices in a variety of subdisciplines of psychophysiology. These articles highlight challenges in conducting and interpreting psychophysiological research and provide data-driven, evidence-based recommendations for overcoming those challenges to produce robust, reproducible results in the field of psychophysiology. © 2016 Society for Psychophysiological Research.
Bayesian classification theory
NASA Technical Reports Server (NTRS)
Hanson, Robin; Stutz, John; Cheeseman, Peter
1991-01-01
The task of inferring a set of classes and class descriptions most likely to explain a given data set can be placed on a firm theoretical foundation using Bayesian statistics. Within this framework and using various mathematical and algorithmic approximations, the AutoClass system searches for the most probable classifications, automatically choosing the number of classes and the complexity of class descriptions. A simpler version of AutoClass has been applied to many large real data sets, has discovered new, independently verified phenomena, and has been released as a robust software package. Recent extensions allow attributes to be selectively correlated within particular classes, and allow classes to inherit or share model parameters through a class hierarchy. We summarize the mathematical foundations of AutoClass.
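AutoClass-style class discovery can be approximated in a few lines with a finite mixture model whose number of classes is chosen by a model-probability criterion; here scikit-learn's GaussianMixture with BIC stands in for AutoClass's full Bayesian search, on synthetic two-cluster data.

```python
# Miniature stand-in for Bayesian class discovery: mixture fit + BIC selection.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (80, 2))])

fits = [(GaussianMixture(k, random_state=0).fit(X), k) for k in range(1, 6)]
best, k = min(fits, key=lambda fk: fk[0].bic(X))   # most parsimonious fit
print("chosen number of classes:", k)              # expect 2
```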
Transcriptome Sequences Resolve Deep Relationships of the Grape Family
Wen, Jun; Xiong, Zhiqiang; Nie, Ze-Long; Mao, Likai; Zhu, Yabing; Kan, Xian-Zhao; Ickert-Bond, Stefanie M.; Gerrath, Jean; Zimmer, Elizabeth A.; Fang, Xiao-Dong
2013-01-01
Previous phylogenetic studies of the grape family (Vitaceae) yielded poorly resolved deep relationships, thus impeding our understanding of the evolution of the family. Next-generation sequencing now offers access to protein coding sequences very easily, quickly and cost-effectively. To improve upon earlier work, we extracted 417 orthologous single-copy nuclear genes from the transcriptomes of 15 species of the Vitaceae, covering its phylogenetic diversity. The resulting transcriptome phylogeny provides robust support for the deep relationships, showing the phylogenetic utility of transcriptome data for plants over a time scale at least since the mid-Cretaceous. The pros and cons of transcriptome data for phylogenetic inference in plants are also evaluated. PMID:24069307
Inference of Spatio-Temporal Functions Over Graphs via Multikernel Kriged Kalman Filtering
NASA Astrophysics Data System (ADS)
Ioannidis, Vassilis N.; Romero, Daniel; Giannakis, Georgios B.
2018-06-01
Inference of space-time varying signals on graphs emerges naturally in a plethora of network science related applications. A frequently encountered challenge pertains to reconstructing such dynamic processes, given their values over a subset of vertices and time instants. The present paper develops a graph-aware kernel-based kriged Kalman filter that accounts for the spatio-temporal variations, and offers efficient online reconstruction, even for dynamically evolving network topologies. The kernel-based learning framework bypasses the need for statistical information by capitalizing on the smoothness that graph signals exhibit with respect to the underlying graph. To address the challenge of selecting the appropriate kernel, the proposed filter is combined with a multi-kernel selection module. Such a data-driven method selects a kernel attuned to the signal dynamics on-the-fly within the linear span of a pre-selected dictionary. The novel multi-kernel learning algorithm exploits the eigenstructure of Laplacian kernel matrices to reduce computational complexity. Numerical tests with synthetic and real data demonstrate the superior reconstruction performance of the novel approach relative to state-of-the-art alternatives.
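One building block of such a filter, a dictionary of graph kernels derived from the Laplacian, is easy to construct explicitly; the sketch below builds diffusion kernels exp(-sL) on a toy graph, from which a data-driven combination would then be selected. Graph and bandwidths are illustrative.

```python
# Dictionary of graph diffusion kernels exp(-s * L) from a toy Laplacian.
import numpy as np
from scipy.linalg import expm

A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)      # 4-node ring graph
L = np.diag(A.sum(axis=1)) - A                 # combinatorial Laplacian

dictionary = [expm(-s * L) for s in (0.1, 1.0, 10.0)]   # kernel dictionary
print([round(float(K[0, 1]), 3) for K in dictionary])   # neighbor similarity
```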
The impact of temporal sampling resolution on parameter inference for biological transport models.
Harrison, Jonathan U; Baker, Ruth E
2018-06-25
Imaging data has become an essential tool to explore key biological questions at various scales, for example the motile behaviour of bacteria or the transport of mRNA, and it has the potential to transform our understanding of important transport mechanisms. Often these imaging studies require us to compare biological species or mutants, and to do this we need to quantitatively characterise their behaviour. Mathematical models offer a quantitative description of a system that enables us to perform this comparison, but to relate mechanistic mathematical models to imaging data, we need to estimate their parameters. In this work we study how collecting data at different temporal resolutions impacts our ability to infer parameters of biological transport models, performing exact inference for simple velocity jump process models in a Bayesian framework. The question of how best to choose the frequency with which data is collected is prominent in a host of studies because the majority of imaging technologies place constraints on the frequency with which images can be taken, and the discrete nature of observations can introduce errors into parameter estimates. In this work, we mitigate such errors by formulating the velocity jump process model within a hidden states framework. This allows us to obtain estimates of the reorientation rate and noise amplitude for noisy observations of a simple velocity jump process. We demonstrate the sensitivity of these estimates to temporal variations in the sampling resolution and extent of measurement noise. We use our methodology to provide experimental guidelines for researchers aiming to characterise motile behaviour that can be described by a velocity jump process. In particular, we consider how experimental constraints resulting in a trade-off between temporal sampling resolution and observation noise may affect parameter estimates. Finally, we demonstrate the robustness of our methodology to model misspecification, and then apply our inference framework to a dataset that was generated with the aim of understanding the localization of RNA-protein complexes.
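To make the sampling-resolution effect concrete, here is a small simulation, under assumed parameters, of a one-dimensional velocity jump process (unit speed, direction reversals at rate λ) observed at three frame intervals; a naive reversal-counting estimator degrades as the frames coarsen. The exact Bayesian hidden-state inference of the paper is not attempted.

```python
# Simulate a 1D velocity jump process (unit speed, direction flips at rate
# lam) on a fine time grid, then observe positions every dt and estimate the
# reorientation rate by counting apparent direction reversals. Coarse
# sampling misses even numbers of flips and biases the naive estimate low.
import numpy as np

rng = np.random.default_rng(1)
lam, T, h = 1.0, 2000.0, 0.001                 # rate, duration, fine step
n = int(T / h)
flips = rng.random(n) < lam * h                # flip indicators per fine step
direction = (-1.0) ** np.cumsum(flips)
x = np.concatenate(([0.0], np.cumsum(direction * h)))  # position trace

for dt in (0.05, 0.5, 2.0):
    stride = int(dt / h)
    xs = x[::stride]
    v_sign = np.sign(np.diff(xs))
    reversals = np.sum(v_sign[1:] != v_sign[:-1])
    print(f"dt = {dt:>4}: naive rate = {reversals / T:.3f}  (true rate {lam})")
```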
NASA Astrophysics Data System (ADS)
Strickland, David
2004-10-01
We propose to observe 3 edge-on Milky-Way-like normal spiral galaxies in order to constrain the presence, properties and physical origin of hot gas in their halos, a topic about which relatively little is currently known. These observations will complete our sample of 8 edge-on normal spirals for which we have a wide range of existing observational data, so that all galaxies will have deep XMM-Newton and/or Chandra observations. With this sample we can assess the relative contribution to the halo X-ray emission of normal spirals from SNII-driven galactic fountains, accretion of primordial gas, and SNIa-driven outflows. The observations will robustly detect NGC 891-like hot halos, broadly quantify their properties, and can be used to constrain the efficiency of mechanical energy feedback.
NASA Astrophysics Data System (ADS)
Rovny, Jared; Blum, Robert L.; Barrett, Sean E.
2018-05-01
The rich dynamics and phase structure of driven systems include the recently described phenomenon of the "discrete time crystal" (DTC), a robust phase which spontaneously breaks the discrete time translation symmetry of its driving Hamiltonian. Experiments in trapped ions and diamond nitrogen vacancy centers have recently shown evidence for this DTC order. Here, we show nuclear magnetic resonance (NMR) data of DTC behavior in a third, strikingly different, system: a highly ordered spatial crystal in three dimensions. We devise a DTC echo experiment to probe the coherence of the driven system. We examine potential decay mechanisms for the DTC oscillations, and demonstrate the important effect of the internal Hamiltonian during nonzero duration pulses.
Merelli, Ivan; Pérez-Sánchez, Horacio; Gesing, Sandra; D'Agostino, Daniele
2014-01-01
The explosion of data both in biomedical research and in healthcare systems demands urgent solutions. In particular, research in omics sciences is moving from a hypothesis-driven to a data-driven approach. Healthcare additionally requires tighter integration with biomedical data in order to promote personalized medicine and to provide better treatments. Efficient analysis and interpretation of Big Data opens new avenues to explore molecular biology, new questions to ask about physiological and pathological states, and new ways to answer these open issues. Such analyses lead to better understanding of diseases and development of better and personalized diagnostics and therapeutics. However, such progress is directly related to the availability of new solutions to deal with this huge amount of information. New paradigms are needed to store and access data, to annotate and integrate it, and finally to infer knowledge and make it available to researchers. Bioinformatics can be viewed as the “glue” for all these processes. A clear awareness of present high performance computing (HPC) solutions in bioinformatics, Big Data analysis paradigms for computational biology, and the issues that are still open in the biomedical and healthcare fields represent the starting point to win this challenge. PMID:25254202
Inferring Ice Thickness from a Glacier Dynamics Model and Multiple Surface Datasets.
NASA Astrophysics Data System (ADS)
Guan, Y.; Haran, M.; Pollard, D.
2017-12-01
The future behavior of the West Antarctic Ice Sheet (WAIS) may have a major impact on future climate. For instance, ice sheet melt may contribute significantly to global sea level rise. Understanding the current state of WAIS is therefore of great interest. WAIS is drained by fast-flowing glaciers which are major contributors to ice loss. Hence, understanding the stability and dynamics of glaciers is critical for predicting the future of the ice sheet. Glacier dynamics are driven by the interplay between the topography, temperature and basal conditions beneath the ice. A glacier dynamics model describes the interactions between these processes. We develop a hierarchical Bayesian model that integrates multiple ice sheet surface data sets with a glacier dynamics model. Our approach allows us to (1) infer important parameters describing the glacier dynamics, (2) learn about ice sheet thickness, and (3) account for errors in the observations and the model. Because we have relatively dense and accurate ice thickness data from the Thwaites Glacier in West Antarctica, we use these data to validate the proposed approach. The long-term goal of this work is to have a general model that may be used to study multiple glaciers in the Antarctic.
Strong Bayesian evidence for the normal neutrino hierarchy
NASA Astrophysics Data System (ADS)
Simpson, Fergus; Jimenez, Raul; Pena-Garay, Carlos; Verde, Licia
2017-06-01
The configuration of the three neutrino masses can take two forms, known as the normal and inverted hierarchies. We compute the Bayesian evidence associated with these two hierarchies. Previous studies found a mild preference for the normal hierarchy, and this was driven by the asymmetric manner in which cosmological data has confined the available parameter space. Here we identify the presence of a second asymmetry, which is imposed by data from neutrino oscillations. By combining constraints on the squared-mass splittings [1] with the limit on the sum of neutrino masses of $\Sigma m_\nu < 0.13$ eV [2], and using a minimally informative prior on the masses, we infer odds of 42:1 in favour of the normal hierarchy, which is classified as "strong" on the Jeffreys scale. We explore how these odds may evolve in light of higher precision cosmological data, and discuss the implications of this finding with regards to the nature of neutrinos. Finally, the individual masses are inferred to be $m_1 = 3.80^{+26.2}_{-3.73}$ meV, $m_2 = 8.8^{+18}_{-1.2}$ meV and $m_3 = 50.4^{+5.8}_{-1.2}$ meV (95% credible intervals).
Gaggero, D; Grasso, D; Marinelli, A; Taoso, M; Urbano, A
2017-07-21
We present a novel interpretation of the γ-ray diffuse emission measured by Fermi-LAT and H.E.S.S. in the Galactic center (GC) region and the Galactic ridge (GR). In the first part we perform a data-driven analysis based on PASS8 Fermi-LAT data: We extend down to a few GeV the spectra measured by H.E.S.S. and infer the primary cosmic-ray (CR) radial distribution between 0.1 and 3 TeV. In the second part we adopt a CR transport model based on a position-dependent diffusion coefficient. Such behavior reproduces the radial dependence of the CR spectral index recently inferred from the Fermi-LAT observations. We find that the bulk of the GR emission can be naturally explained by the interaction of the diffuse steady-state Galactic CR sea with the gas present in the central molecular zone. Although we confirm the presence of a residual radial-dependent emission associated with a central source, the relevance of the large-scale diffuse component prevents us from claiming solid evidence for GC PeVatrons.
Minică, Camelia C.; Genovese, Giulio; Hultman, Christina M.; Pool, René; Vink, Jacqueline M.; Neale, Michael C.; Dolan, Conor V.; Neale, Benjamin M.
2017-01-01
Sequence-based association studies are at a critical inflexion point with the increasing availability of exome-sequencing data. A popular test of association is the sequence kernel association test (SKAT). Weights are embedded within SKAT to reflect the hypothesized contribution of the variants to the trait variance. Because the true weights are generally unknown, and so are subject to misspecification, we examined the efficiency of a data-driven weighting scheme. We propose the use of a set of theoretically defensible weighting schemes, of which, we assume, the one that gives the largest test statistic is likely to capture best the allele frequency-functional effect relationship. We show that the use of alternative weights obviates the need to impose arbitrary frequency thresholds in sequence data association analyses. As both the score test and the likelihood ratio test (LRT) may be used in this context, and may differ in power, we characterize the behavior of both tests. We found that the two tests have equal power if the set of weights resembled the correct ones. However, if the weights are badly specified, the LRT shows superior power (due to its robustness to misspecification). With this data-driven weighting procedure the LRT detected significant signal in genes located in regions already confirmed as associated with schizophrenia – the PRRC2A (P=1.020E-06) and the VARS2 (P=2.383E-06) – in the Swedish schizophrenia case-control cohort of 11,040 individuals with exome-sequencing data. The score test is currently preferred for its computational efficiency and power. Indeed, assuming correct specification, in some circumstances the score test is the most powerful. However, LRT has the advantageous properties of being generally more robust and more powerful under weight misspecification. This is an important result given that, arguably, misspecified models are likely to be the rule rather than the exception in weighting-based approaches. PMID:28238293
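A toy version of the alternative-weights idea, with simulated genotypes and a simple burden-style score statistic standing in for the full SKAT variance-component test: evaluate several theoretically defensible Beta-density weighting schemes and keep the one giving the largest statistic. All names and numbers below are illustrative assumptions.

```python
# Data-driven choice among Beta-density variant-weighting schemes: compute a
# (burden-style) score statistic under each candidate weighting and keep the
# scheme with the largest statistic. A stand-in for the full SKAT test; the
# Madsen-Browning weights 1/sqrt(MAF(1-MAF)) correspond to Beta(0.5, 0.5).
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(2)
n, p = 2000, 25
maf = rng.uniform(0.001, 0.05, size=p)               # rare-variant frequencies
G = rng.binomial(2, maf, size=(n, p)).astype(float)  # genotype matrix
causal = rng.random(p) < 0.3
y = G[:, causal].sum(1) * 0.4 + rng.normal(size=n)   # toy quantitative trait

schemes = {"flat": (1, 1), "skat-default": (1, 25), "madsen-browning": (0.5, 0.5)}
resid = y - y.mean()                                  # null model: intercept only
for name, (a, b) in schemes.items():
    w = beta.pdf(maf, a, b)
    s = G @ w                                         # weighted burden score
    stat = (resid @ s) ** 2 / (s.var() * resid.var() * n)  # ~ n * corr^2
    print(f"{name:16s} statistic = {stat:8.2f}")
```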
NASA Astrophysics Data System (ADS)
Cabello, Violeta
2017-04-01
This communication will present the advancement of an innovative analytical framework for the analysis of the Water-Energy-Food-Climate Nexus termed Quantitative Story Telling (QST). The methodology is currently under development within the H2020 project MAGIC - Moving Towards Adaptive Governance in Complexity: Informing Nexus Security (www.magic-nexus.eu). The key innovation of QST is that it bridges qualitative and quantitative analytical tools into an iterative research process in which each step is built and validated in interaction with stakeholders. The qualitative analysis focusses on the identification of the narratives behind the development of relevant WEFC-Nexus policies and innovations. The quantitative engine is the Multi-Scale Analysis of Societal and Ecosystem Metabolism (MuSIASEM), a resource accounting toolkit capable of integrating multiple analytical dimensions at different scales through relational analysis. Although QST may be labelled a story-driven rather than a data-driven approach, I will argue that improving models per se may not lead to an improved understanding of WEF-Nexus problems unless we are capable of generating more robust narratives to frame them. The communication will cover an introduction to the MAGIC project, the basic concepts of QST and a case study focussed on agricultural production in a semi-arid region in Southern Spain. Data requirements for this case study and the limitations in finding, accessing or estimating them will be presented alongside a reflection on the relation between analytical scales and data availability.
Simultaneous inference of phylogenetic and transmission trees in infectious disease outbreaks.
Klinkenberg, Don; Backer, Jantien A; Didelot, Xavier; Colijn, Caroline; Wallinga, Jacco
2017-05-01
Whole-genome sequencing of pathogens from host samples becomes more and more routine during infectious disease outbreaks. These data provide information on possible transmission events which can be used for further epidemiologic analyses, such as identification of risk factors for infectivity and transmission. However, the relationship between transmission events and sequence data is obscured by uncertainty arising from four largely unobserved processes: transmission, case observation, within-host pathogen dynamics and mutation. To properly resolve transmission events, these processes need to be taken into account. Recent years have seen much progress in theory and method development, but existing applications make simplifying assumptions that often break up the dependency between the four processes, or are tailored to specific datasets with matching model assumptions and code. To obtain a method with wider applicability, we have developed a novel approach to reconstruct transmission trees with sequence data. Our approach combines elementary models for transmission, case observation, within-host pathogen dynamics, and mutation, under the assumption that the outbreak is over and all cases have been observed. We use Bayesian inference with MCMC for which we have designed novel proposal steps to efficiently traverse the posterior distribution, taking account of all unobserved processes at once. This allows for efficient sampling of transmission trees from the posterior distribution, and robust estimation of consensus transmission trees. We implemented the proposed method in a new R package phybreak. The method performs well in tests of both new and published simulated data. We apply the model to five datasets on densely sampled infectious disease outbreaks, covering a wide range of epidemiological settings. Using only sampling times and sequences as data, our analyses confirmed the original results or improved on them: the more realistic infection times place more confidence in the inferred transmission trees.
Disentangling niche competition from grazing mortality in phytoplankton dilution experiments
Weitz, Joshua S.
2017-01-01
The dilution method is the principal tool used to infer in situ microzooplankton grazing rates. However, grazing is the only mortality process considered in the theoretical model underlying the interpretation of dilution method experiments. Here we evaluate the robustness of mortality estimates inferred from dilution experiments when there is concurrent niche competition amongst phytoplankton. Using a combination of mathematical analysis and numerical simulations, we find that grazing rates may be overestimated—the degree of overestimation is related to the importance of niche competition relative to microzooplankton grazing. In response, we propose a conceptual method to disentangle the effects of niche competition and grazing by diluting out microzooplankton, but not phytoplankton. Our theoretical results suggest this revised “Z-dilution” method can robustly infer grazing mortality, regardless of the dominant phytoplankton mortality driver in our system. Further, we show it is possible to independently estimate both grazing mortality and niche competition if the classical and Z-dilution methods can be used in tandem. We discuss the significance of these results for quantifying phytoplankton mortality rates; and the feasibility of implementing the Z-dilution method in practice, whether in model systems or in complex communities with overlap in the size distributions of phytoplankton and microzooplankton. PMID:28505212
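For reference, the classical dilution-method calculation that the Z-dilution variant refines: the apparent growth rate at dilution fraction D satisfies k(D) = μ − gD, so the grazing rate g is recovered as the negative slope of a linear fit. The rates and noise level below are invented for illustration.

```python
# Classical dilution-method estimate: fit apparent growth rate k against
# dilution fraction D; intercept = intrinsic growth mu, slope = -grazing g.
# The paper's point is that niche competition can bias this estimate.
import numpy as np

mu_true, g_true = 0.9, 0.5                   # d^-1, assumed values
D = np.array([0.2, 0.4, 0.6, 0.8, 1.0])      # fraction whole seawater
rng = np.random.default_rng(3)
k = mu_true - g_true * D + rng.normal(0, 0.02, D.size)

slope, intercept = np.polyfit(D, k, 1)
print(f"inferred growth mu = {intercept:.2f} d^-1, grazing g = {-slope:.2f} d^-1")
```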
Gong, Bo; Schullcke, Benjamin; Krueger-Ziolek, Sabine; Mueller-Lisse, Ullrich; Moeller, Knut
2016-06-01
Electrical impedance tomography (EIT) reconstructs the conductivity distribution of a domain using electrical data on its boundary. This is an ill-posed inverse problem usually solved on a finite element mesh. For this article, a special regularization method incorporating structural information of the targeted domain is proposed and evaluated. Structural information was obtained either from computed tomography images or from preliminary EIT reconstructions by a modified k-means clustering. The proposed regularization method integrates this structural information into the reconstruction as a soft constraint preferring sparsity at the group level. A first evaluation with Monte Carlo simulations indicated that the proposed solver is more robust to noise and the resulting images show fewer artifacts. This finding is supported by real data analysis. The structure-based regularization has the potential to balance structural a priori information with data-driven reconstruction. It is robust to noise, reduces artifacts and produces images that reflect anatomy and are thus easier to interpret for physicians.
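A generic sketch of group-level sparsity on a toy linear inverse problem, assuming random groups in place of the paper's EIT forward model and k-means-derived structure: solve min_x ||Ax − b||² + λ Σ_g ||x_g||₂ by proximal gradient descent with block soft-thresholding.

```python
# Group-sparse regularized least squares via proximal gradient descent.
# The prox of the group-lasso penalty is block soft-thresholding.
import numpy as np

rng = np.random.default_rng(4)
m, n, gsize = 60, 90, 10
groups = [np.arange(i, i + gsize) for i in range(0, n, gsize)]
A = rng.normal(size=(m, n))
x_true = np.zeros(n)
x_true[groups[2]] = rng.normal(size=gsize)          # two active groups
x_true[groups[5]] = rng.normal(size=gsize)
b = A @ x_true + 0.05 * rng.normal(size=m)

lam, step = 2.0, 1.0 / np.linalg.norm(A, 2) ** 2
x = np.zeros(n)
for _ in range(500):
    z = x - step * A.T @ (A @ x - b)                # gradient step on the fit
    for g in groups:                                # block soft-threshold
        nz = np.linalg.norm(z[g])
        if nz > 0:
            z[g] *= max(0.0, 1.0 - step * lam / nz)
    x = z
active = [i for i, g in enumerate(groups) if np.linalg.norm(x[g]) > 1e-6]
print("recovered active groups:", active)
```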
Autonomous entropy-based intelligent experimental design
NASA Astrophysics Data System (ADS)
Malakar, Nabin Kumar
2011-07-01
The aim of this thesis is to explore the application of probability and information theory in experimental design, and to do so in a way that combines what we know about inference and inquiry in a comprehensive and consistent manner. Present day scientific frontiers involve data collection at an ever-increasing rate. This requires that we find a way to collect the most relevant data in an automated fashion. By following the logic of the scientific method, we couple an inference engine with an inquiry engine to automate the iterative process of scientific learning. The inference engine involves Bayesian machine learning techniques to estimate model parameters based upon both prior information and previously collected data, while the inquiry engine implements data-driven exploration. By choosing an experiment whose distribution of expected results has the maximum entropy, the inquiry engine selects the experiment that maximizes the expected information gain. The coupled inference and inquiry engines constitute an autonomous learning method for scientific exploration. We apply it to a robotic arm to demonstrate the efficacy of the method. Optimizing inquiry involves searching for an experiment that promises, on average, to be maximally informative. If the set of potential experiments is described by many parameters, the search involves a high-dimensional entropy space. In such cases, a brute force search method will be slow and computationally expensive. We develop an entropy-based search algorithm, called nested entropy sampling, to select the most informative experiment. This helps to reduce the number of computations necessary to find the optimal experiment. We also extended the method of maximizing entropy, and developed a method of maximizing joint entropy so that it could be used as a principle of collaboration between two robots. This is a major achievement of this thesis, as it allows information-based collaboration between two robotic units toward a shared goal in an automated fashion.
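The selection rule itself is compact; a sketch under the assumption that predictive outcome distributions for each candidate experiment are already in hand (here drawn at random): pick the experiment whose predicted-outcome entropy is largest.

```python
# Entropy-based experiment selection: among candidate experiments, pick the
# one whose predicted outcome distribution has maximum Shannon entropy
# (greatest expected information gain). Candidate distributions are invented.
import numpy as np
from scipy.stats import entropy

rng = np.random.default_rng(5)
# predictive outcome distributions for 6 candidate experiments x 8 outcomes
pred = rng.dirichlet(alpha=np.ones(8) * 0.5, size=6)

H = entropy(pred, axis=1)                   # entropy per experiment, in nats
best = int(np.argmax(H))
print("entropies:", np.round(H, 2), "-> run experiment", best)
```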
Bordier, Cecile; Puja, Francesco; Macaluso, Emiliano
2013-01-01
The investigation of brain activity using naturalistic, ecologically-valid stimuli is becoming an important challenge for neuroscience research. Several approaches have been proposed, primarily relying on data-driven methods (e.g. independent component analysis, ICA). However, data-driven methods often require some post-hoc interpretation of the imaging results to draw inferences about the underlying sensory, motor or cognitive functions. Here, we propose using a biologically-plausible computational model to extract (multi-)sensory stimulus statistics that can be used for standard hypothesis-driven analyses (general linear model, GLM). We ran two separate fMRI experiments, which both involved subjects watching an episode of a TV-series. In Exp 1, we manipulated the presentation by switching on-and-off color, motion and/or sound at variable intervals, whereas in Exp 2, the video was played in the original version, with all the consequent continuous changes of the different sensory features intact. Both for vision and audition, we extracted stimulus statistics corresponding to spatial and temporal discontinuities of low-level features, as well as a combined measure related to the overall stimulus saliency. Results showed that activity in occipital visual cortex and the superior temporal auditory cortex co-varied with changes of low-level features. Visual saliency was found to further boost activity in extra-striate visual cortex plus posterior parietal cortex, while auditory saliency was found to enhance activity in the superior temporal cortex. Data-driven ICA analyses of the same datasets also identified “sensory” networks comprising visual and auditory areas, but without providing specific information about the possible underlying processes, e.g., these processes could relate to modality, stimulus features and/or saliency. We conclude that the combination of computational modeling and GLM enables the tracking of the impact of bottom–up signals on brain activity during viewing of complex and dynamic multisensory stimuli, beyond the capability of purely data-driven approaches. PMID:23202431
Agile Text Mining for the 2014 i2b2/UTHealth Cardiac Risk Factors Challenge
Cormack, James; Nath, Chinmoy; Milward, David; Raja, Kalpana; Jonnalagadda, Siddhartha R
2016-01-01
This paper describes the use of an agile text mining platform (Linguamatics’ Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors in patient records as defined in the i2b2/UTHealth 2014 Challenge. The approach uses a data-driven rule-based methodology with the addition of a simple supervised classifier. We demonstrate that agile text mining allows for rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluation of this approach on the test data gave an F-Score of 91.7%, one percent behind the top performing system. PMID:26209007
Improving Photometric Redshifts for Hyper Suprime-Cam
NASA Astrophysics Data System (ADS)
Speagle, Josh S.; Leauthaud, Alexie; Eisenstein, Daniel; Bundy, Kevin; Capak, Peter L.; Leistedt, Boris; Masters, Daniel C.; Mortlock, Daniel; Peiris, Hiranya; HSC Photo-z Team; HSC Weak Lensing Team
2017-01-01
Accurate photometric redshift (photo-z) probability distribution functions (PDFs) are crucial science components for current and upcoming large-scale surveys. We outline how rigorous Bayesian inference and machine learning can be combined to quickly derive joint photo-z PDFs for individual galaxies and their parent populations. Using the first 170 deg^2 of data from the ongoing Hyper Suprime-Cam survey, we demonstrate our method is able to generate accurate predictions and reliable credible intervals over ~370k high-quality redshifts. We then use galaxy-galaxy lensing to empirically validate our predicted photo-z's over ~14M objects, finding a robust signal.
Observed flow compensation associated with the MOC at 26.5 degrees N in the Atlantic.
Kanzow, Torsten; Cunningham, Stuart A; Rayner, Darren; Hirschi, Joël J-M; Johns, William E; Baringer, Molly O; Bryden, Harry L; Beal, Lisa M; Meinen, Christopher S; Marotzke, Jochem
2007-08-17
The Atlantic meridional overturning circulation (MOC), which provides one-quarter of the global meridional heat transport, is composed of a number of separate flow components. How changes in the strength of each of those components may affect that of the others has been unclear because of a lack of adequate data. We continuously observed the MOC at 26.5 degrees N for 1 year using end-point measurements of density, bottom pressure, and ocean currents; cable measurements across the Straits of Florida; and wind stress. The different transport components largely compensate for each other, thus confirming the validity of our monitoring approach. The MOC varied over the period of observation by ±5.7 × 10⁶ cubic meters per second, with density-inferred and wind-driven transports contributing equally to it. We find evidence for depth-independent compensation for the wind-driven surface flow.
NASA Astrophysics Data System (ADS)
Casdagli, M. C.
1997-09-01
We show that recurrence plots (RPs) give detailed characterizations of time series generated by dynamical systems driven by slowly varying external forces. For deterministic systems we show that RPs of the time series can be used to reconstruct the RP of the driving force if it varies sufficiently slowly. If the driving force is one-dimensional, its functional form can then be inferred up to an invertible coordinate transformation. The same results hold for stochastic systems if the RP of the time series is suitably averaged and transformed. These results are used to investigate the nonlinear prediction of time series generated by dynamical systems driven by slowly varying external forces. We also consider the problem of detecting a small change in the driving force, and propose a surrogate data technique for assessing statistical significance. Numerically simulated time series and a time series of respiration rates recorded from a subject with sleep apnea are used as illustrative examples.
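A recurrence plot is simply a thresholded pairwise-distance matrix of embedded states, R_ij = 1 if ||x_i − x_j|| < ε. A minimal sketch for a slowly drifting oscillator, with the embedding delay, threshold, and forcing all chosen arbitrarily for illustration:

```python
# Minimal recurrence plot: R[i, j] = 1 where embedded states i and j lie
# within eps of each other. The slow frequency drift plays the role of a
# slowly varying external force.
import numpy as np

t = np.linspace(0, 60, 1500)
x = np.sin(2 * np.pi * t * (1 + 0.02 * t))          # slowly drifting frequency
delay = 10
X = np.column_stack([x[:-delay], x[delay:]])        # crude 2D delay embedding
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
R = (D < 0.1).astype(int)                           # the recurrence plot
print("recurrence matrix shape:", R.shape, " recurrence rate:", round(R.mean(), 3))
```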
Experimental Observation of a Current-Driven Instability in a Neutral Electron-Positron Beam
Warwick, J.; Dzelzainis, T.; Dieckmann, M. E.; ...
2017-11-03
Here, we report on the first experimental observation of a current-driven instability developing in a quasineutral matter-antimatter beam. Strong magnetic fields (≥ 1 T) are measured, by means of a proton radiography technique, after the propagation of a neutral electron-positron beam through a background electron-ion plasma. The experimentally determined equipartition parameter of ε_B ≈ 10⁻³ is typical of values inferred from models of astrophysical gamma-ray bursts, in which the relativistic flows are also expected to be pair dominated. The data, supported by particle-in-cell simulations and simple analytical estimates, indicate that these magnetic fields persist in the background plasma for thousands of inverse plasma frequencies. The existence of such long-lived magnetic fields can be related to analog astrophysical systems, such as those prevalent in lepton-dominated jets.
Campbell, Kieran R.
2016-01-01
Single cell gene expression profiling can be used to quantify transcriptional dynamics in temporal processes, such as cell differentiation, using computational methods to label each cell with a ‘pseudotime’ where true time series experimentation is too difficult to perform. However, owing to the high variability in gene expression between individual cells, there is an inherent uncertainty in the precise temporal ordering of the cells. Pre-existing methods for pseudotime estimation have predominantly given point estimates precluding a rigorous analysis of the implications of uncertainty. We use probabilistic modelling techniques to quantify pseudotime uncertainty and propagate this into downstream differential expression analysis. We demonstrate that reliance on a point estimate of pseudotime can lead to inflated false discovery rates and that probabilistic approaches provide greater robustness and measures of the temporal resolution that can be obtained from pseudotime inference. PMID:27870852
Improving Causal Inferences in Meta-analyses of Longitudinal Studies: Spanking as an Illustration.
Larzelere, Robert E; Gunnoe, Marjorie Lindner; Ferguson, Christopher J
2018-05-24
To evaluate and improve the validity of causal inferences from meta-analyses of longitudinal studies, two adjustments for Time-1 outcome scores and a temporally backwards test are demonstrated. Causal inferences would be supported by robust results across both adjustment methods, distinct from results run backwards. A systematic strategy for evaluating potential confounds is also introduced. The methods are illustrated by assessing the impact of spanking on subsequent externalizing problems (child age: 18 months to 11 years). Significant results indicated a small risk or a small benefit of spanking, depending on the adjustment method. These meta-analytic methods are applicable for research on alternatives to spanking and other developmental science topics. The underlying principles can also improve causal inferences in individual studies. © 2018 Society for Research in Child Development.
NASA Astrophysics Data System (ADS)
Zheng, Feifei; Maier, Holger R.; Wu, Wenyan; Dandy, Graeme C.; Gupta, Hoshin V.; Zhang, Tuqiao
2018-02-01
Hydrological models are used for a wide variety of engineering purposes, including streamflow forecasting and flood-risk estimation. To develop such models, it is common to allocate the available data to calibration and evaluation data subsets. Surprisingly, the issue of how this allocation can affect model evaluation performance has been largely ignored in the research literature. This paper discusses the evaluation performance bias that can arise from how available data are allocated to calibration and evaluation subsets. As a first step to assessing this issue in a statistically rigorous fashion, we present a comprehensive investigation of the influence of data allocation on the development of data-driven artificial neural network (ANN) models of streamflow. Four well-known formal data splitting methods are applied to 754 catchments from Australia and the U.S. to develop 902,483 ANN models. Results clearly show that the choice of the method used for data allocation has a significant impact on model performance, particularly for runoff data that are more highly skewed, highlighting the importance of considering the impact of data splitting when developing hydrological models. The statistical behavior of the data splitting methods investigated is discussed and guidance is offered on the selection of the most appropriate data splitting methods to achieve representative evaluation performance for streamflow data with different statistical properties. Although our results are obtained for data-driven models, they highlight the fact that this issue is likely to have a significant impact on all types of hydrological models, especially conceptual rainfall-runoff models.
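The core effect is easy to reproduce on synthetic skewed data: the same model scores very differently depending on whether the evaluation subset is drawn at random or concentrates the high flows. The rainfall-runoff relation and split sizes below are assumptions, not the paper's 754-catchment setup.

```python
# Evaluation performance depends on how data are allocated to calibration
# and evaluation subsets, especially for skewed runoff-like targets.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(6)
rain = rng.gamma(1.5, 10, size=3000)
flow = 0.6 * rain ** 1.2 + rng.normal(0, 3, size=3000)   # skewed "runoff"
X = rain.reshape(-1, 1)

def rmse(tr, ev):
    m = LinearRegression().fit(X[tr], flow[tr])
    return np.sqrt(np.mean((m.predict(X[ev]) - flow[ev]) ** 2))

idx = rng.permutation(3000)
print("random split RMSE:    %.2f" % rmse(idx[:2000], idx[2000:]))
order = np.argsort(flow)                                  # eval on high flows only
print("high-flow eval RMSE:  %.2f" % rmse(order[:2000], order[2000:]))
```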
Data-free and data-driven spectral perturbations for RANS UQ
NASA Astrophysics Data System (ADS)
Edeling, Wouter; Mishra, Aashwin; Iaccarino, Gianluca
2017-11-01
Despite recent developments in high-fidelity turbulent flow simulations, RANS modeling is still vastly used by industry, due to its inherent low cost. Since accuracy is a concern in RANS modeling, model-form UQ is an essential tool for assessing the impacts of this uncertainty on quantities of interest. Applying the spectral decomposition to the modeled Reynolds-Stress Tensor (RST) allows for the introduction of decoupled perturbations into the baseline intensity (kinetic energy), shape (eigenvalues), and orientation (eigenvectors). This constitutes a natural methodology to evaluate the model-form uncertainty associated with different aspects of RST modeling. In a predictive setting, one frequently encounters an absence of any relevant reference data. To make data-free predictions with quantified uncertainty, we employ physical bounds to define maximum spectral perturbations a priori. When propagated, these perturbations yield intervals of engineering utility. High-fidelity data opens up the possibility of inferring a distribution of uncertainty, by means of various data-driven machine-learning techniques. We will demonstrate our framework on a number of flow problems where RANS models are prone to failure. This research was partially supported by the Defense Advanced Research Projects Agency under the Enabling Quantification of Uncertainty in Physical Systems (EQUiPS) project (technical monitor: Dr Fariba Fahroo), and the DOE PSAAP-II program.
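A bare-bones sketch of the shape (eigenvalue) perturbation on a single made-up Reynolds-stress tensor: decompose the anisotropy tensor, nudge its eigenvalues toward the one-component limiting state by an assumed amount δ, and recompose. Intensity and eigenvector perturbations are omitted.

```python
# Spectral (eigenvalue) perturbation of a Reynolds-stress anisotropy tensor
# toward the one-component limiting state; delta is an assumed magnitude.
import numpy as np

k = 1.0                                              # turbulent kinetic energy
R = np.array([[0.9, 0.2, 0.0],                       # made-up Reynolds stresses
              [0.2, 0.7, 0.1],
              [0.0, 0.1, 0.4]])
a = R / (2 * k) - np.eye(3) / 3                      # trace-free anisotropy tensor
lam, V = np.linalg.eigh(a)                           # eigenvalues ascending

lam_1c = np.array([-1/3, -1/3, 2/3])                 # one-component limit (ordered)
delta = 0.3                                          # perturbation magnitude
lam_pert = (1 - delta) * lam + delta * lam_1c        # move shape toward the limit

a_pert = V @ np.diag(lam_pert) @ V.T
R_pert = 2 * k * (a_pert + np.eye(3) / 3)            # recompose stresses
print("perturbed Reynolds stresses:\n", np.round(R_pert, 3))
```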
Dark Energy Survey Year 1 Results: Multi-Probe Methodology and Simulated Likelihood Analyses
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krause, E.; et al.
We present the methodology for and detail the implementation of the Dark Energy Survey (DES) 3x2pt DES Year 1 (Y1) analysis, which combines configuration-space two-point statistics from three different cosmological probes: cosmic shear, galaxy-galaxy lensing, and galaxy clustering, using data from the first year of DES observations. We have developed two independent modeling pipelines and describe the code validation process. We derive expressions for analytical real-space multi-probe covariances, and describe their validation with numerical simulations. We stress-test the inference pipelines in simulated likelihood analyses that vary 6-7 cosmology parameters plus 20 nuisance parameters and precisely resemble the analysis to be presented in the DES 3x2pt analysis paper, using a variety of simulated input data vectors with varying assumptions. We find that any disagreement between pipelines leads to changes in assigned likelihood $$\Delta \chi^2 \le 0.045$$ with respect to the statistical error of the DES Y1 data vector. We also find that angular binning and survey mask do not impact our analytic covariance at a significant level. We determine lower bounds on scales used for analysis of galaxy clustering (8 Mpc$$~h^{-1}$$) and galaxy-galaxy lensing (12 Mpc$$~h^{-1}$$) such that the impact of modeling uncertainties in the non-linear regime is well below statistical errors, and show that our analysis choices are robust against a variety of systematics. These tests demonstrate that we have a robust analysis pipeline that yields unbiased cosmological parameter inferences for the flagship 3x2pt DES Y1 analysis. We emphasize that the level of independent code development and subsequent code comparison as demonstrated in this paper is necessary to produce credible constraints from increasingly complex multi-probe analyses of current data.
What Can Causal Networks Tell Us about Metabolic Pathways?
Blair, Rachael Hageman; Kliebenstein, Daniel J.; Churchill, Gary A.
2012-01-01
Graphical models describe the linear correlation structure of data and have been used to establish causal relationships among phenotypes in genetic mapping populations. Data are typically collected at a single point in time. Biological processes on the other hand are often non-linear and display time varying dynamics. The extent to which graphical models can recapitulate the architecture of an underlying biological processes is not well understood. We consider metabolic networks with known stoichiometry to address the fundamental question: “What can causal networks tell us about metabolic pathways?”. Using data from an Arabidopsis BaySha population and simulated data from dynamic models of pathway motifs, we assess our ability to reconstruct metabolic pathways using graphical models. Our results highlight the necessity of non-genetic residual biological variation for reliable inference. Recovery of the ordering within a pathway is possible, but should not be expected. Causal inference is sensitive to subtle patterns in the correlation structure that may be driven by a variety of factors, which may not emphasize the substrate-product relationship. We illustrate the effects of metabolic pathway architecture, epistasis and stochastic variation on correlation structure and graphical model-derived networks. We conclude that graphical models should be interpreted cautiously, especially if the implied causal relationships are to be used in the design of intervention strategies. PMID:22496633
Mutual Information in Frequency and Its Application to Measure Cross-Frequency Coupling in Epilepsy
NASA Astrophysics Data System (ADS)
Malladi, Rakesh; Johnson, Don H.; Kalamangalam, Giridhar P.; Tandon, Nitin; Aazhang, Behnaam
2018-06-01
We define a metric, mutual information in frequency (MI-in-frequency), to detect and quantify the statistical dependence between different frequency components in the data, referred to as cross-frequency coupling, and apply it to electrophysiological recordings from the brain to infer cross-frequency coupling. The current metrics used to quantify cross-frequency coupling in neuroscience cannot detect whether two frequency components in non-Gaussian brain recordings are statistically independent or not. Our MI-in-frequency metric, based on Shannon's mutual information between the Cramér representations of stochastic processes, overcomes this shortcoming and can detect statistical dependence in frequency between non-Gaussian signals. We then describe two data-driven estimators of MI-in-frequency: one based on kernel density estimation and the other based on the nearest neighbor algorithm, and validate their performance on simulated data. We then use MI-in-frequency to estimate mutual information between two data streams that are dependent across time, without making any parametric model assumptions. Finally, we use the MI-in-frequency metric to investigate the cross-frequency coupling in the seizure onset zone from electrocorticographic recordings during seizures. The inferred cross-frequency coupling characteristics are essential to optimize the spatial and spectral parameters of electrical stimulation based treatments of epilepsy.
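For the special case of jointly Gaussian signals, the dependence between same-frequency components of two processes reduces to a closed form in the magnitude-squared coherence, MI(f) = −½ ln(1 − C(f)); the paper's kernel and nearest-neighbor estimators for non-Gaussian data and genuine cross-frequency pairs are not reproduced here. The signals below are synthetic.

```python
# Gaussian special case of mutual information in frequency: per-frequency
# MI between two signals follows from their magnitude-squared coherence.
import numpy as np
from scipy.signal import coherence

rng = np.random.default_rng(7)
fs, n = 200.0, 20000
drive = np.sin(2 * np.pi * 10 * np.arange(n) / fs)   # shared 10 Hz component
x = drive + rng.normal(size=n)
y = 0.8 * drive + rng.normal(size=n)

f, C = coherence(x, y, fs=fs, nperseg=1024)          # magnitude-squared coherence
mi_f = -0.5 * np.log(np.clip(1 - C, 1e-12, None))    # nats per frequency bin
print("peak coupling at %.1f Hz, MI = %.2f nats" % (f[np.argmax(mi_f)], mi_f.max()))
```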
The role of data assimilation in maximizing the utility of geospace observations (Invited)
NASA Astrophysics Data System (ADS)
Matsuo, T.
2013-12-01
Data assimilation can facilitate maximizing the utility of existing geospace observations by offering an ultimate marriage of inductive (data-driven) and deductive (first-principles based) approaches to addressing critical questions in space weather. Assimilative approaches that incorporate dynamical models are, in particular, capable of making a diverse set of observations consistent with physical processes included in a first-principles model, and allowing unobserved physical states to be inferred from observations. These points will be demonstrated in the context of the application of an ensemble Kalman filter (EnKF) to a thermosphere and ionosphere general circulation model. An important attribute of this approach is that the feedback between plasma and neutral variables is self-consistently treated both in the forecast model as well as in the assimilation scheme. This takes advantage of the intimate coupling between the thermosphere and ionosphere described in general circulation models to enable the inference of unobserved thermospheric states from the relatively plentiful observations of the ionosphere. Given the ever-growing infrastructure for the global navigation satellite system, this is indeed a promising prospect for geospace data assimilation. In principle, similar approaches can be applied to any geospace observing systems to extract more geophysical information from a given set of observations than would otherwise be possible.
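The analysis step at the heart of an EnKF fits in a dozen lines; a generic sketch with a random toy state and a single observed component, in place of any thermosphere-ionosphere model:

```python
# One stochastic ensemble Kalman filter analysis step: update an ensemble of
# model states with an observation using ensemble-estimated covariances.
import numpy as np

rng = np.random.default_rng(8)
n_state, n_ens = 5, 50
ens = rng.normal(1.0, 0.5, size=(n_state, n_ens))   # prior ensemble (columns)
H = np.zeros((1, n_state)); H[0, 2] = 1.0           # observe state component 2
y_obs, r = 2.0, 0.1 ** 2                            # observation and its variance

Xp = ens - ens.mean(1, keepdims=True)
P = Xp @ Xp.T / (n_ens - 1)                         # ensemble covariance
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + r)        # Kalman gain
y_pert = y_obs + rng.normal(0, np.sqrt(r), n_ens)   # perturbed observations
ens_a = ens + K @ (y_pert - H @ ens)                # analysis ensemble
print("prior mean:    ", ens.mean(1).round(2))
print("posterior mean:", ens_a.mean(1).round(2))
```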
A sensory-driven controller for quadruped locomotion.
Ferreira, César; Santos, Cristina P
2017-02-01
Locomotion of quadruped robots has not yet achieved the harmony, flexibility, efficiency and robustness of its biological counterparts. Biological research showed that spinal reflexes are crucial for a successful locomotion in the most varied terrains. In this context, the development of bio-inspired controllers seems to be a good way to move toward efficient and robust robotic locomotion, by mimicking their biological counterparts. This contribution presents a sensory-driven controller designed for the simulated Oncilla quadruped robot. In the proposed reflex controller, movement is generated through the robot's interactions with the environment, and therefore, the controller is solely dependent on sensory information. The results show that the reflex controller is capable of producing stable quadruped locomotion with a regular stepping pattern. Furthermore, it is capable of dealing with slopes without changing the parameters and with small obstacles, overcoming them successfully. Finally, system robustness was verified by adding noise to sensors and actuators and also delays.
Receiver function deconvolution using transdimensional hierarchical Bayesian inference
NASA Astrophysics Data System (ADS)
Kolb, J. M.; Lekić, V.
2014-06-01
Teleseismic waves can convert from shear to compressional (Sp) or compressional to shear (Ps) across impedance contrasts in the subsurface. Deconvolving the parent waveforms (P for Ps or S for Sp) from the daughter waveforms (S for Ps or P for Sp) generates receiver functions which can be used to analyse velocity structure beneath the receiver. Though a variety of deconvolution techniques have been developed, they are all adversely affected by background and signal-generated noise. In order to take into account the unknown noise characteristics, we propose a method based on transdimensional hierarchical Bayesian inference in which both the noise magnitude and noise spectral character are parameters in calculating the likelihood probability distribution. We use a reversible-jump implementation of a Markov chain Monte Carlo algorithm to find an ensemble of receiver functions whose relative fits to the data have been calculated while simultaneously inferring the values of the noise parameters. Our noise parametrization is determined from pre-event noise so that it approximates observed noise characteristics. We test the algorithm on synthetic waveforms contaminated with noise generated from a covariance matrix obtained from observed noise. We show that the method retrieves easily interpretable receiver functions even in the presence of high noise levels. We also show that we can obtain useful estimates of noise amplitude and frequency content. Analysis of the ensemble solutions produced by our method can be used to quantify the uncertainties associated with individual receiver functions as well as with individual features within them, providing an objective way for deciding which features warrant geological interpretation. This method should make possible more robust inferences on subsurface structure using receiver function analysis, especially in areas of poor data coverage or under noisy station conditions.
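For contrast with the Bayesian treatment, here is the classical frequency-domain water-level deconvolution that such methods improve upon, applied to synthetic parent and daughter pulses; the pulse shapes, water level, and noise are all assumed.

```python
# Classical water-level deconvolution of a daughter waveform by a parent
# waveform: divide spectra, stabilizing the denominator at a fraction of
# its maximum power (the "water level"). Synthetic Gaussian pulses.
import numpy as np

fs, n = 20.0, 512
t = np.arange(n) / fs
parent = np.exp(-((t - 5) ** 2) * 8)                 # parent pulse (P)
daughter = 0.5 * np.exp(-((t - 8) ** 2) * 8)         # converted phase, 3 s later
rng = np.random.default_rng(9)
daughter = daughter + 0.01 * rng.normal(size=n)

P, D = np.fft.rfft(parent), np.fft.rfft(daughter)
denom = np.maximum(np.abs(P) ** 2, 0.05 * np.max(np.abs(P) ** 2))  # water level
rf = np.fft.irfft(D * np.conj(P) / denom, n)         # receiver function
print("receiver-function peak at t = %.2f s (expect ~3 s)" % t[np.argmax(rf)])
```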
Inferring influenza dynamics and control in households
Lau, Max S.Y.; Cowling, Benjamin J.; Cook, Alex R.; Riley, Steven
2015-01-01
Household-based interventions are the mainstay of public health policy against epidemic respiratory pathogens when vaccination is not available. Although the efficacy of these interventions has traditionally been measured by their ability to reduce the proportion of household contacts who exhibit symptoms [household secondary attack rate (hSAR)], this metric is difficult to interpret and makes only partial use of data collected by modern field studies. Here, we use Bayesian transmission model inference to analyze jointly both symptom reporting and viral shedding data from a three-armed study of influenza interventions. The reduction in hazard of infection in the increased hand hygiene intervention arm was 37.0% [8.3%, 57.8%], whereas the equivalent reduction in the other intervention arm was 27.2% [−0.46%, 52.3%] (increased hand hygiene and face masks). By imputing the presence and timing of unobserved infection, we estimated that only 61.7% [43.1%, 76.9%] of infections met the case criteria and were thus detected by the study design. An assessment of interventions using inferred infections produced more intuitively consistent attack rates when households were stratified by the speed of intervention, compared with the crude hSAR. Compared with adults, children were 2.29 [1.66, 3.23] times as infectious and 3.36 [2.31, 4.82] times as susceptible. The mean generation time was 3.39 d [3.06, 3.70]. Laboratory confirmation of infections by RT-PCR was only able to detect 79.6% [76.5%, 83.0%] of symptomatic infections, even at the peak of shedding. Our results highlight the potential use of robust inference with well-designed mechanistic transmission models to improve the design of intervention studies. PMID:26150502
DES Y1 Results: Validating Cosmological Parameter Estimation Using Simulated Dark Energy Surveys
DOE Office of Scientific and Technical Information (OSTI.GOV)
MacCrann, N.; et al.
We use mock galaxy survey simulations designed to resemble the Dark Energy Survey Year 1 (DES Y1) data to validate and inform cosmological parameter estimation. When similar analysis tools are applied to both simulations and real survey data, they provide powerful validation tests of the DES Y1 cosmological analyses presented in companion papers. We use two suites of galaxy simulations produced using different methods, which therefore provide independent tests of our cosmological parameter inference. The cosmological analysis we aim to validate is presented in DES Collaboration et al. (2017) and uses angular two-point correlation functions of galaxy number counts and weak lensing shear, as well as their cross-correlation, in multiple redshift bins. While our constraints depend on the specific set of simulated realisations available, for both suites of simulations we find that the input cosmology is consistent with the combined constraints from multiple simulated DES Y1 realizations in the $$\Omega_m-\sigma_8$$ plane. For one of the suites, we are able to show with high confidence that any biases in the inferred $$S_8=\sigma_8(\Omega_m/0.3)^{0.5}$$ and $$\Omega_m$$ are smaller than the DES Y1 $$1-\sigma$$ uncertainties. For the other suite, for which we have fewer realizations, we are unable to be this conclusive; we infer a roughly 70% probability that systematic biases in the recovered $$\Omega_m$$ and $$S_8$$ are sub-dominant to the DES Y1 uncertainty. As cosmological analyses of this kind become increasingly more precise, validation of parameter inference using survey simulations will be essential to demonstrate robustness.
Supervised learning for infection risk inference using pathology data.
Hernandez, Bernard; Herrero, Pau; Rawson, Timothy Miles; Moore, Luke S P; Evans, Benjamin; Toumazou, Christofer; Holmes, Alison H; Georgiou, Pantelis
2017-12-08
Antimicrobial Resistance is threatening our ability to treat common infectious diseases and overuse of antimicrobials to treat human infections in hospitals is accelerating this process. Clinical Decision Support Systems (CDSSs) have been proven to enhance quality of care by promoting change in prescription practices through antimicrobial selection advice. However, bypassing an initial assessment to determine the existence of an underlying disease that justifies the need of antimicrobial therapy might lead to indiscriminate and often unnecessary prescriptions. From pathology laboratory tests, six biochemical markers were selected and combined with microbiology outcomes from susceptibility tests to create a unique dataset with over one and a half million daily profiles to perform infection risk inference. Outliers were discarded using the inter-quartile range rule and several sampling techniques were studied to tackle the class imbalance problem. The first phase selects the most effective and robust model during training using ten-fold stratified cross-validation. The second phase evaluates the final model after isotonic calibration in scenarios with missing inputs and imbalanced class distributions. More than 50% of infected profiles have daily requested laboratory tests for the six biochemical markers with very promising infection inference results: area under the receiver operating characteristic curve (0.80-0.83), sensitivity (0.64-0.75) and specificity (0.92-0.97). Standardization consistently outperforms normalization and sensitivity is enhanced by using the SMOTE sampling technique. Furthermore, models operated without noticeable loss in performance if at least four biomarkers were available. The selected biomarkers comprise enough information to perform infection risk inference with a high degree of confidence even in the presence of incomplete and imbalanced data. Since they are commonly available in hospitals, Clinical Decision Support Systems could benefit from these findings to assist clinicians in deciding whether or not to initiate antimicrobial therapy to improve prescription practices.
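A compressed sketch of the two-phase design on synthetic stand-ins for the six biomarkers: ten-fold stratified cross-validation to assess a base model, then isotonic calibration of the final classifier. The feature generator and class imbalance are assumptions; SMOTE resampling (available in the separate imbalanced-learn package) is omitted.

```python
# Stratified cross-validation plus isotonic calibration for an imbalanced
# binary "infection" classifier; six synthetic features stand in for the
# real biochemical markers.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=4000, n_features=6, weights=[0.9, 0.1],
                           random_state=0)           # imbalanced toy cohort

base = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
auc = cross_val_score(base, X, y, cv=cv, scoring="roc_auc")
print("10-fold stratified AUROC: %.3f +/- %.3f" % (auc.mean(), auc.std()))

calibrated = CalibratedClassifierCV(base, method="isotonic", cv=5).fit(X, y)
print("calibrated P(infection), first 3 profiles:",
      calibrated.predict_proba(X[:3])[:, 1].round(3))
```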
A Case Study: Analyzing City Vitality with Four Pillars of Activity-Live, Work, Shop, and Play.
Griffin, Matt; Nordstrom, Blake W; Scholes, Jon; Joncas, Kate; Gordon, Patrick; Krivenko, Elliott; Haynes, Winston; Higdon, Roger; Stewart, Elizabeth; Kolker, Natali; Montague, Elizabeth; Kolker, Eugene
2016-03-01
This case study evaluates and tracks the vitality of a city (Seattle), based on a data-driven approach, using strategic, robust, and sustainable metrics. This case study was collaboratively conducted by the Downtown Seattle Association (DSA) and CDO Analytics teams. The DSA is a nonprofit organization focused on making the city of Seattle and its Downtown a healthy and vibrant place to Live, Work, Shop, and Play. DSA primarily operates through public policy advocacy, community and business development, and marketing. In 2010, the organization turned to CDO Analytics (cdoanalytics.org) to develop a process that can guide and strategically focus DSA efforts and resources for maximal benefit to the city of Seattle and its Downtown. CDO Analytics was asked to develop clear, easily understood, and robust metrics for a baseline evaluation of the health of the city, as well as for ongoing monitoring and comparisons of the vitality, sustainability, and growth. The DSA and CDO Analytics teams strategized on how to effectively assess and track the vitality of Seattle and its Downtown. The two teams filtered a variety of data sources, and evaluated the veracity of multiple diverse metrics. This iterative process resulted in the development of a small number of strategic, simple, reliable, and sustainable metrics across four pillars of activity: Live, Work, Shop, and Play. Data during the 5 years before 2010 were used for the development of the metrics and model and its training, and data during the 5 years from 2010 and on were used for testing and validation. This work enabled DSA to routinely track these strategic metrics, use them to monitor the vitality of Downtown Seattle, prioritize improvements, and identify new value-added programs. As a result, the four-pillar approach became an integral part of the data-driven decision-making and execution of the Seattle community's improvement activities. The approach described in this case study is actionable, robust, inexpensive, and easy to adopt and sustain. It can be applied to cities, districts, counties, regions, states, or countries, enabling cross-comparisons and improvements of vitality, sustainability, and growth.
Building social cognitive models of language change.
Hruschka, Daniel J; Christiansen, Morten H; Blythe, Richard A; Croft, William; Heggarty, Paul; Mufwene, Salikoko S; Pierrehumbert, Janet B; Poplack, Shana
2009-11-01
Studies of language change have begun to contribute to answering several pressing questions in cognitive sciences, including the origins of human language capacity, the social construction of cognition and the mechanisms underlying culture change in general. Here, we describe recent advances within a new emerging framework for the study of language change, one that models such change as an evolutionary process among competing linguistic variants. We argue that a crucial and unifying element of this framework is the use of probabilistic, data-driven models both to infer change and to compare competing claims about social and cognitive influences on language change.
NASA Astrophysics Data System (ADS)
Dahanayaka, Daminda; Wong, Andrew; Kaszuba, Philip; Moszkowicz, Leon; Slinkman, James; IBM SPV Lab Team
2014-03-01
Silicon-On-Insulator (SOI) technology has proved beneficial for RF cell phone applications, offering performance equivalent to GaAs technologies. However, there is an evident parasitic inversion layer under the Buried Oxide (BOX) at its interface with the high-resistivity Si substrate, inferred from capacitance-voltage measurements on MOSCAPs. The inversion layer has adverse effects on RF device performance. We present data which, for the first time, show the extent of the inversion layer in the underlying substrate. This knowledge has driven processing techniques to suppress the inversion layer.
Indicators of ecosystem function identify alternate states in the sagebrush steppe.
Kachergis, Emily; Rocca, Monique E; Fernandez-Gimenez, Maria E
2011-10-01
Models of ecosystem change that incorporate nonlinear dynamics and thresholds, such as state-and-transition models (STMs), are increasingly popular tools for land management decision-making. However, few models are based on systematic collection and documentation of ecological data, and of these, most rely solely on structural indicators (species composition) to identify states and transitions. As STMs are adopted as an assessment framework throughout the United States, finding effective and efficient ways to create data-driven models that integrate ecosystem function and structure is vital. This study aims to (1) evaluate the utility of functional indicators (indicators of rangeland health, IRH) as proxies for more difficult ecosystem function measurements and (2) create a data-driven STM for the sagebrush steppe of Colorado, USA, that incorporates both ecosystem structure and function. We sampled soils, plant communities, and IRH at 41 plots with similar clayey soils but different site histories to identify potential states and infer the effects of management practices and disturbances on transitions. We found that many IRH were correlated with quantitative measures of functional indicators, suggesting that the IRH can be used to approximate ecosystem function. In addition to a reference state that functions as expected for this soil type, we identified four biotically and functionally distinct potential states, consistent with the theoretical concept of alternate states. Three potential states were related to management practices (chemical and mechanical shrub treatments and seeding history) while one was related only to ecosystem processes (erosion). IRH and potential states were also related to environmental variation (slope, soil texture), suggesting that there are environmental factors within areas with similar soils that affect ecosystem dynamics and should be noted within STMs. Our approach generated an objective, data-driven model of ecosystem dynamics for rangeland management. Our findings suggest that the IRH approximate ecosystem processes and can distinguish between alternate states and communities and identify transitions when building data-driven STMs. Functional indicators are a simple, efficient way to create data-driven models that are consistent with alternate state theory. Managers can use them to improve current model-building methods and thus apply state-and-transition models more broadly for land management decision-making.
BONNSAI: correlated stellar observables in Bayesian methods
NASA Astrophysics Data System (ADS)
Schneider, F. R. N.; Castro, N.; Fossati, L.; Langer, N.; de Koter, A.
2017-02-01
In an era of large spectroscopic surveys of stars and big data, sophisticated statistical methods become more and more important in order to infer fundamental stellar parameters such as mass and age. Bayesian techniques are powerful methods because they can match all available observables simultaneously to stellar models while taking prior knowledge properly into account. However, in most cases it is assumed that observables are uncorrelated, which is generally not the case. Here, we include correlations in the Bayesian code Bonnsai by incorporating the covariance matrix in the likelihood function. We derive a parametrisation of the covariance matrix that, in addition to classical uncertainties, only requires the specification of a correlation parameter that describes how observables co-vary. Our correlation parameter depends purely on the method with which observables have been determined and can be analytically derived in some cases. This approach therefore has the advantage that correlations can be accounted for even if information about them is not available in specific cases but is known in general. Because the new likelihood model is a better approximation of the data, the reliability and robustness of the inferred parameters are improved. We find that neglecting correlations biases the most likely values of inferred stellar parameters and affects the precision with which these parameters can be determined. The importance of these biases depends on the strength of the correlations and the uncertainties. We apply our technique to massive OB stars, but emphasise that it is valid for any type of star. For effective temperatures and surface gravities determined from atmosphere modelling, we find that masses can be underestimated on average by 0.5σ and mass uncertainties overestimated by a factor of about 2 when neglecting correlations. At the same time, age precisions are underestimated over a wide range of stellar parameters. We conclude that accounting for correlations is essential in order to derive reliable stellar parameters including robust uncertainties and will be vital when entering an era of precision stellar astrophysics thanks to the Gaia satellite.
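As a hedged illustration (not necessarily the exact Bonnsai parametrisation), a Gaussian likelihood whose covariance is built from the classical uncertainties and a single correlation parameter ρ could take the form:

$$
L(\boldsymbol{d} \mid \boldsymbol{m}) = \frac{1}{\sqrt{(2\pi)^n |\Sigma|}} \exp\!\left(-\tfrac{1}{2}\, (\boldsymbol{d}-\boldsymbol{m})^{\top} \Sigma^{-1} (\boldsymbol{d}-\boldsymbol{m})\right), \qquad \Sigma_{ij} = \begin{cases} \sigma_i^2 & i = j \\ \rho\, \sigma_i \sigma_j & i \neq j \end{cases}
$$

where d are the observables, m the stellar-model predictions, and σᵢ the classical uncertainties. With ρ = 0 this reduces to the usual uncorrelated likelihood.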
A Hierarchical Framework for State-Space Matrix Inference and Clustering.
Zuo, Chandler; Chen, Kailei; Hewitt, Kyle J; Bresnick, Emery H; Keleş, Sündüz
2016-09-01
In recent years, a large number of genomic and epigenomic studies have focused on the integrative analysis of multiple experimental datasets measured over a large number of observational units. The objectives of such studies include not only inferring a hidden state of activity for each unit over individual experiments, but also detecting highly associated clusters of units based on their inferred states. Although there are a number of methods tailored for specific datasets, there is currently no state-of-the-art modeling framework for this general class of problems. In this paper, we develop the MBASIC (Matrix Based Analysis for State-space Inference and Clustering) framework. MBASIC consists of two parts: state-space mapping and state-space clustering. In state-space mapping, it maps observations onto a finite state space representing the activation states of units across conditions. In state-space clustering, MBASIC incorporates a finite mixture model to cluster the units based on their inferred state-space profiles across all conditions. Both the state-space mapping and clustering can be simultaneously estimated through an Expectation-Maximization algorithm. MBASIC flexibly adapts to a large number of parametric distributions for the observed data, as well as to heterogeneity in replicate experiments. It allows for imposing structural assumptions on each cluster, and enables model selection using information criteria. In our data-driven simulation studies, MBASIC showed high accuracy in recovering both the underlying state-space variables and the clustering structures. We applied MBASIC to two genome research problems using large numbers of datasets from the ENCODE project. The first application grouped genes based on transcription factor occupancy profiles of their promoter regions in two different cell types. The second application focused on identifying groups of loci that are similar to a GATA2 binding site that is functional at its endogenous locus, utilizing transcription factor occupancy data, and illustrated the applicability of MBASIC to a wide variety of problems. In both studies, MBASIC showed higher levels of raw data fidelity than analyzing these data with a two-step approach using ENCODE results on transcription factor occupancy data.
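As a rough, self-contained analogue of the clustering half of this framework (not the MBASIC estimator itself), the sketch below clusters units by binary activation-state profiles with a Bernoulli mixture fit by EM; the data shapes, cluster count, and initialization are placeholders.

```python
# Toy analogue of state-space clustering: units x conditions binary
# states, clustered with a Bernoulli mixture via EM. Not MBASIC itself.
import numpy as np

rng = np.random.default_rng(0)
states = rng.integers(0, 2, size=(500, 12))   # inferred unit-by-condition states

def bernoulli_mixture_em(X, K, iters=50):
    n, d = X.shape
    pi = np.full(K, 1.0 / K)                  # cluster weights
    mu = rng.uniform(0.25, 0.75, size=(K, d)) # per-cluster state probabilities
    for _ in range(iters):
        # E-step: responsibilities r[i, k] proportional to pi_k * P(x_i | mu_k)
        log_p = X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T + np.log(pi)
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights and state probabilities
        pi = r.mean(axis=0)
        mu = np.clip((r.T @ X) / r.sum(axis=0)[:, None], 1e-6, 1 - 1e-6)
    return r.argmax(axis=1)

clusters = bernoulli_mixture_em(states, K=3)
```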
Bayesian Model Selection in Geophysics: The evidence
NASA Astrophysics Data System (ADS)
Vrugt, J. A.
2016-12-01
Bayesian inference has found widespread application and use in science and engineering to reconcile Earth system models with data, including prediction in space (interpolation), prediction in time (forecasting), assimilation of observations and deterministic/stochastic model output, and inference of the model parameters. Per Bayes' theorem, the posterior probability, P(H|D), of a hypothesis, H, given the data, D, is equivalent to the product of its prior probability, P(H), and likelihood, L(H|D), divided by a normalization constant, P(D). In geophysics, the hypothesis, H, often constitutes a description (parameterization) of the subsurface for some entity of interest (e.g. porosity, moisture content). The normalization constant, P(D), is not required for inference of the subsurface structure, yet is of great value for model selection. Unfortunately, it is not particularly easy to estimate P(D) in practice. Here, I will introduce the various building blocks of a general purpose method which provides robust and unbiased estimates of the evidence, P(D). This method uses multi-dimensional numerical integration of the posterior (parameter) distribution. I will then illustrate this new estimator by application to three competing subsurface models (hypotheses) using GPR travel time data from the South Oyster Bacterial Transport Site in Virginia, USA. The three subsurface models differ in their treatment of the porosity distribution and use (a) horizontal layering with fixed layer thicknesses, (b) vertical layering with fixed layer thicknesses and (c) a multi-Gaussian field. The results of the new estimator are compared against the brute force Monte Carlo method and the Laplace-Metropolis method.
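Restating the quantities above as a worked equation (the parameter-space form of the evidence, with θ as notational shorthand not used in the abstract):

$$
P(H \mid D) \;=\; \frac{P(H)\, L(H \mid D)}{P(D)}, \qquad
P(D) \;=\; \int P(\theta)\, L(\theta \mid D)\, \mathrm{d}\theta ,
$$

so the evidence P(D) is the likelihood averaged over the prior, which is exactly the multi-dimensional integral the proposed estimator targets.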
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ruby, J. J.; Pak, A., E-mail: pak5@llnl.gov; Field, J. E.
2016-07-15
A technique for measuring residual motion during the stagnation phase of an indirectly driven inertial confinement experiment has been implemented. This method infers a velocity from spatially and temporally resolved images of the X-ray emission from two orthogonal lines of sight. This work investigates the accuracy of recovering spatially resolved velocities from the X-ray emission data. Detailed analytical and numerical modeling of the X-ray emission measurement shows that the accuracy of this method increases as the displacement that results from a residual velocity increases. For the typical experimental configuration, signal-to-noise ratios, and duration of X-ray emission, it is estimated that the fractional error in the inferred velocity rises above 50% as the velocity of emission falls below 24 μm/ns. By inputting measured parameters into this model, error estimates of the residual velocity as inferred from the X-ray emission measurements are now able to be generated for experimental data. Details of this analysis are presented for an implosion experiment conducted with an unintentional radiation flux asymmetry. The analysis shows a bright localized region of emission that moves through the larger emitting volume at a relatively higher velocity towards the location of the imposed flux deficit. This technique allows for the possibility of spatially resolving velocity flows within the so-called central hot spot of an implosion. This information would help to refine our interpretation of the thermal temperature inferred from the neutron time of flight detectors and the effect of localized hydrodynamic instabilities during the stagnation phase. Across several experiments, along a single line of sight, the average differences in magnitude and direction of the measured residual velocity as inferred from the X-ray and neutron time of flight detectors were found to be ∼13 μm/ns and ∼14°, respectively.
Learning partial differential equations via data discovery and sparse optimization
NASA Astrophysics Data System (ADS)
Schaeffer, Hayden
2017-01-01
We investigate the problem of learning an evolution equation directly from some given data. This work develops a learning algorithm to identify the terms in the underlying partial differential equations and to approximate the coefficients of the terms only using data. The algorithm uses sparse optimization in order to perform feature selection and parameter estimation. The features are data driven in the sense that they are constructed using nonlinear algebraic equations on the spatial derivatives of the data. Several numerical experiments show the proposed method's robustness to data noise and size, its ability to capture the true features of the data, and its capability of performing additional analytics. Examples include shock equations, pattern formation, fluid flow and turbulence, and oscillatory convection.
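A minimal sketch in the spirit of this approach, assuming a Burgers-type test equation and a sequentially thresholded least-squares solver; the feature library and threshold are illustrative choices, not the paper's exact algorithm:

```python
# Hedged sketch: sparse regression over a library of spatial-derivative
# features to identify the terms of an evolution equation u_t = f(...).
import numpy as np

def build_library(u, dx):
    """Candidate terms built from spatial derivatives of u (shape: nx)."""
    ux = np.gradient(u, dx)
    uxx = np.gradient(ux, dx)
    # Columns: constant, u, u_x, u_xx, u*u_x, u^2
    return np.column_stack([np.ones_like(u), u, ux, uxx, u * ux, u**2])

def stridge(Theta, ut, threshold=0.05, iters=10):
    """Sequentially thresholded least squares: a simple sparse solver."""
    xi = np.linalg.lstsq(Theta, ut, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        big = ~small
        if big.any():
            xi[big] = np.linalg.lstsq(Theta[:, big], ut, rcond=None)[0]
    return xi

# Tiny demo with synthetic data for u_t = -u*u_x + 0.1*u_xx (assumed test case):
x = np.linspace(0, 2 * np.pi, 256)
u = np.sin(x)
dx = x[1] - x[0]
ux = np.gradient(u, dx)
uxx = np.gradient(ux, dx)
ut = -u * ux + 0.1 * uxx                      # "measured" time derivative
xi = stridge(build_library(u, dx), ut)
print(xi)  # nonzero only in the u_xx and u*u_x columns (up to FD error)
```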
Inference in a Synchronization Game with Social Interactions *
de Paula, Áureo
2009-01-01
This paper studies inference in a continuous time game where an agent's decision to quit an activity depends on the participation of other players. In equilibrium, similar actions can be explained not only by direct influences but also by correlated factors. Our model can be seen as a simultaneous duration model with multiple decision makers and interdependent durations. We study the problem of determining the existence and uniqueness of equilibrium stopping strategies in this setting. This paper provides results and conditions for the detection of these endogenous effects. First, we show that the presence of such effects is a necessary and sufficient condition for simultaneous exits. This allows us to set up a nonparametric test for the presence of such influences which is robust to multiple equilibria. Second, we provide conditions under which parameters in the game are identified. Finally, we apply the model to data on desertion in the Union Army during the American Civil War and find evidence of endogenous influences. PMID:20046804
Robust, Adaptive Functional Regression in Functional Mixed Model Framework.
Zhu, Hongxiao; Brown, Philip J; Morris, Jeffrey S
2011-09-01
Functional data are increasingly encountered in scientific studies, and their high dimensionality and complexity lead to many analytical challenges. Various methods for functional data analysis have been developed, including functional response regression methods that involve regression of a functional response on univariate/multivariate predictors with nonparametrically represented functional coefficients. In existing methods, however, the functional regression can be sensitive to outlying curves and outlying regions of curves, so is not robust. In this paper, we introduce a new Bayesian method, robust functional mixed models (R-FMM), for performing robust functional regression within the general functional mixed model framework, which includes multiple continuous or categorical predictors and random effect functions accommodating potential between-function correlation induced by the experimental design. The underlying model involves a hierarchical scale mixture model for the fixed effects, random effect and residual error functions. These modeling assumptions across curves result in robust nonparametric estimators of the fixed and random effect functions which down-weight outlying curves and regions of curves, and produce statistics that can be used to flag global and local outliers. These assumptions also lead to distributions across wavelet coefficients that have outstanding sparsity and adaptive shrinkage properties, with great flexibility for the data to determine the sparsity and the heaviness of the tails. Together with the down-weighting of outliers, these within-curve properties lead to fixed and random effect function estimates that appear in our simulations to be remarkably adaptive in their ability to remove spurious features yet retain true features of the functions. We have developed general code to implement this fully Bayesian method that is automatic, requiring the user to only provide the functional data and design matrices. It is efficient enough to handle large data sets, and yields posterior samples of all model parameters that can be used to perform desired Bayesian estimation and inference. Although we present details for a specific implementation of the R-FMM using specific distributional choices in the hierarchical model, 1D functions, and wavelet transforms, the method can be applied more generally using other heavy-tailed distributions, higher dimensional functions (e.g. images), and using other invertible transformations as alternatives to wavelets.
Efficient Reverse-Engineering of a Developmental Gene Regulatory Network
Cicin-Sain, Damjan; Ashyraliyev, Maksat; Jaeger, Johannes
2012-01-01
Understanding the complex regulatory networks underlying development and evolution of multi-cellular organisms is a major problem in biology. Computational models can be used as tools to extract the regulatory structure and dynamics of such networks from gene expression data. This approach is called reverse engineering. It has been successfully applied to many gene networks in various biological systems. However, to reconstitute the structure and non-linear dynamics of a developmental gene network in its spatial context remains a considerable challenge. Here, we address this challenge using a case study: the gap gene network involved in segment determination during early development of Drosophila melanogaster. A major problem for reverse-engineering pattern-forming networks is the significant amount of time and effort required to acquire and quantify spatial gene expression data. We have developed a simplified data processing pipeline that considerably increases the throughput of the method, but results in data of reduced accuracy compared to those previously used for gap gene network inference. We demonstrate that we can infer the correct network structure using our reduced data set, and investigate minimal data requirements for successful reverse engineering. Our results show that timing and position of expression domain boundaries are the crucial features for determining regulatory network structure from data, while it is less important to precisely measure expression levels. Based on this, we define minimal data requirements for gap gene network inference. Our results demonstrate the feasibility of reverse-engineering with much reduced experimental effort. This enables more widespread use of the method in different developmental contexts and organisms. Such systematic application of data-driven models to real-world networks has enormous potential. Only the quantitative investigation of a large number of developmental gene regulatory networks will allow us to discover whether there are rules or regularities governing development and evolution of complex multi-cellular organisms. PMID:22807664
Robust and efficient estimation with weighted composite quantile regression
NASA Astrophysics Data System (ADS)
Jiang, Xuejun; Li, Jingzhi; Xia, Tian; Yan, Wanfeng
2016-09-01
In this paper we introduce a weighted composite quantile regression (CQR) estimation approach and study its application in nonlinear models such as exponential models and ARCH-type models. The weighted CQR is augmented by using a data-driven weighting scheme. With the error distribution unspecified, the proposed estimators share robustness from quantile regression and achieve nearly the same efficiency as the oracle maximum likelihood estimator (MLE) for a variety of error distributions including the normal, mixed-normal, Student's t, Cauchy distributions, etc. We also suggest an algorithm for the fast implementation of the proposed methodology. Simulations are carried out to compare the performance of different estimators, and the proposed approach is used to analyze the daily S&P 500 Composite index, which verifies the effectiveness and efficiency of our theoretical results.
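A minimal sketch of composite quantile regression for a linear model, assuming a small quantile grid, uniform weights in place of the paper's data-driven weighting scheme, and a generic optimizer:

```python
# Hedged sketch of composite quantile regression (CQR): one shared slope
# vector, one intercept per quantile, weighted sum of check losses.
import numpy as np
from scipy.optimize import minimize

def check_loss(r, tau):
    """Quantile ("check") loss rho_tau(r)."""
    return np.where(r >= 0, tau * r, (tau - 1) * r)

def cqr_fit(X, y, taus, weights):
    n, p = X.shape
    K = len(taus)

    def objective(params):
        beta, b = params[:p], params[p:]      # shared slope, per-quantile intercepts
        resid = y[:, None] - (X @ beta)[:, None] - b[None, :]
        losses = np.stack([check_loss(resid[:, k], t) for k, t in enumerate(taus)], axis=1)
        return np.sum(weights * losses.mean(axis=0))

    x0 = np.zeros(p + K)
    return minimize(objective, x0, method="Nelder-Mead").x[:p]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.5, -2.0]) + rng.standard_t(df=3, size=200)  # heavy-tailed errors
beta_hat = cqr_fit(X, y, taus=[0.25, 0.5, 0.75], weights=np.ones(3) / 3)
```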
Bayesian model selection: Evidence estimation based on DREAM simulation and bridge sampling
NASA Astrophysics Data System (ADS)
Volpi, Elena; Schoups, Gerrit; Firmani, Giovanni; Vrugt, Jasper A.
2017-04-01
Bayesian inference has found widespread application in Earth and Environmental Systems Modeling, providing an effective tool for prediction, data assimilation, parameter estimation, uncertainty analysis and hypothesis testing. Under multiple competing hypotheses, the Bayesian approach also provides an attractive alternative to traditional information criteria (e.g. AIC, BIC) for model selection. The key variable for Bayesian model selection is the evidence (or marginal likelihood), the normalizing constant in the denominator of Bayes' theorem; while it is fundamental for model selection, the evidence is not required for Bayesian parameter inference. It is computed for each hypothesis (model) by averaging the likelihood function over the prior parameter distribution, rather than maximizing it as information criteria do; the larger a model's evidence, the more support it receives among a collection of hypotheses, as the simulated values assign relatively high probability density to the observed data. Hence, the evidence naturally acts as an Occam's razor, preferring simpler and more constrained models over the over-fitted ones selected by information criteria that incorporate only the likelihood maximum. Since it is not particularly easy to estimate the evidence in practice, Bayesian model selection via the marginal likelihood has not yet found mainstream use. We illustrate here the properties of a new estimator of the Bayesian model evidence, which provides robust and unbiased estimates of the marginal likelihood; the method is coined Gaussian Mixture Importance Sampling (GMIS). GMIS uses multidimensional numerical integration of the posterior parameter distribution via bridge sampling (a generalization of importance sampling) of a mixture distribution fitted to samples of the posterior distribution derived from the DREAM algorithm (Vrugt et al., 2008; 2009). Some illustrative examples are presented to show the robustness and superiority of the GMIS estimator with respect to other commonly used approaches in the literature.
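A hedged sketch of the core idea, reduced to plain importance sampling from a Gaussian mixture fitted to posterior samples (omitting the bridge-sampling refinement); the toy target and all settings are assumptions:

```python
# Evidence estimation via a Gaussian mixture fitted to posterior samples:
# Z = E_q[ p_unnorm(theta) / q(theta) ] with theta ~ q.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def log_unnormalized_posterior(theta):
    """log prior + log likelihood for a toy 2-D Gaussian target."""
    return multivariate_normal(mean=[1.0, -1.0], cov=np.eye(2)).logpdf(theta)

# Posterior samples (in practice these would come from MCMC, e.g. DREAM).
rng = np.random.default_rng(1)
samples = rng.multivariate_normal([1.0, -1.0], np.eye(2), size=4000)

q = GaussianMixture(n_components=3, random_state=0).fit(samples)
draws, _ = q.sample(20000)
log_w = log_unnormalized_posterior(draws) - q.score_samples(draws)
log_Z = np.log(np.mean(np.exp(log_w - log_w.max()))) + log_w.max()
print(log_Z)  # near 0 here, since the toy target is already normalized
```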
Updated Magmatic Flux Rate Estimates for the Hawaii Plume
NASA Astrophysics Data System (ADS)
Wessel, P.
2013-12-01
Several studies have estimated the magmatic flux rate along the Hawaiian-Emperor Chain using a variety of methods and arriving at different results. These flux rate estimates have weaknesses because of incomplete data sets and different modeling assumptions, especially for the youngest portion of the chain (<3 Ma). While they generally agree on the 1st order features, there is less agreement on the magnitude and relative size of secondary flux variations. Some of these differences arise from the use of different methodologies, but the significance of this variability is difficult to assess due to a lack of confidence bounds on the estimates obtained with these disparate methods. All methods introduce some error, but to date there has been little or no quantification of error estimates for the inferred melt flux, making an assessment problematic. Here we re-evaluate the melt flux for the Hawaii plume with the latest gridded data sets (SRTM30+ and FAA 21.1) using several methods, including the optimal robust separator (ORS) and directional median filtering techniques (DiM). We also compute realistic confidence limits on the results. In particular, the DiM technique was specifically developed to aid in the estimation of surface loads that are superimposed on wider bathymetric swells and it provides error estimates on the optimal residuals. Confidence bounds are assigned separately for the estimated surface load (obtained from the ORS regional/residual separation techniques) and the inferred subsurface volume (from gravity-constrained isostasy and plate flexure optimizations). These new and robust estimates will allow us to assess which secondary features in the resulting melt flux curve are significant and should be incorporated when correlating melt flux variations with other geophysical and geochemical observations.
FusionAnalyser: a new graphical, event-driven tool for fusion rearrangements discovery
Piazza, Rocco; Pirola, Alessandra; Spinelli, Roberta; Valletta, Simona; Redaelli, Sara; Magistroni, Vera; Gambacorti-Passerini, Carlo
2012-01-01
Gene fusions are common driver events in leukaemias and solid tumours; here we present FusionAnalyser, a tool dedicated to the identification of driver fusion rearrangements in human cancer through the analysis of paired-end high-throughput transcriptome sequencing data. We initially tested FusionAnalyser using a set of in silico randomly generated sequencing data from 20 known human translocations occurring in cancer, and subsequently using transcriptome data from three chronic and three acute myeloid leukaemia samples. In all cases our tool was able to detect the presence of the correct driver fusion event(s) with high specificity. In one of the acute myeloid leukaemia samples, FusionAnalyser identified a novel, cryptic, in-frame ETS2–ERG fusion. A fully event-driven graphical interface and a flexible filtering system allow complex analyses to be run in the absence of any a priori programming or scripting knowledge. Therefore, we propose FusionAnalyser as an efficient and robust graphical tool for the identification of functional rearrangements in the context of high-throughput transcriptome sequencing data. PMID:22570408
Wong, Jessica J; McGregor, Marion; Mior, Silvano A; Loisel, Patrick
2014-01-01
The purpose of this study was to develop a model that evaluates the impact of policy changes on the number of workers' compensation lost-time back claims in Ontario, Canada, over a 30-year timeframe. The model was used to test the hypothesis that a theory- and policy-driven model would be sufficient in reproducing historical claims data in a robust manner and that policy changes would have a major impact on modeled data. The model was developed using system dynamics methods in the Vensim simulation program. The theoretical effects of policies for compensation benefit levels and experience rating fees were modeled. The model was built and validated using historical claims data from 1980 to 2009. Sensitivity analysis was used to evaluate the modeled data at extreme end points of variable input and timeframes. The degree of predictive value of the modeled data was measured by the coefficient of determination, root mean square error, and Theil's inequality coefficients. Correlation between modeled data and actual data was found to be meaningful (R(2) = 0.934), and the modeled data were stable at extreme end points. Among the effects explored, policy changes were found to be relatively minor drivers of back claims data, accounting for a 13% improvement in error. Simulation results suggested that unemployment, number of no-lost-time claims, number of injuries per worker, and recovery rate from back injuries outside of claims management to be sensitive drivers of back claims data. A robust systems-based model was developed and tested for use in future policy research in Ontario's workers' compensation. The study findings suggest that certain areas within and outside the workers' compensation system need to be considered when evaluating and changing policies around back claims. © 2014. Published by National University of Health Sciences All rights reserved.
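The three fit statistics named above are straightforward to compute; a minimal sketch with placeholder claim series:

```python
# Hedged sketch of the validation metrics named above (R^2, RMSE, and
# Theil's inequality coefficient U) for comparing modeled vs. actual
# claim counts; the sample arrays are placeholders, not study data.
import numpy as np

def r_squared(actual, modeled):
    ss_res = np.sum((actual - modeled) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(actual, modeled):
    return np.sqrt(np.mean((actual - modeled) ** 2))

def theil_u(actual, modeled):
    """Theil's U: 0 is a perfect fit, larger values are worse."""
    den = np.sqrt(np.mean(actual**2)) + np.sqrt(np.mean(modeled**2))
    return rmse(actual, modeled) / den

actual = np.array([52.0, 48.0, 45.0, 41.0, 38.0])   # e.g. claims per year (placeholder)
modeled = np.array([50.0, 49.0, 44.0, 42.0, 36.0])
print(r_squared(actual, modeled), rmse(actual, modeled), theil_u(actual, modeled))
```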
Polansky, Leo; Douglas-Hamilton, Iain; Wittemyer, George
2013-01-01
Adaptive movement behaviors allow individuals to respond to fluctuations in resource quality and distribution in order to maintain fitness. Classically, studies of the interaction between ecological conditions and movement behavior have focused on such metrics as travel distance, velocity, home range size or patch occupancy time as the salient metrics of behavior. Driven by the emergence of very regular high frequency data, more recently the importance of interpreting the autocorrelation structure of movement as a behavioral metric has become apparent. Studying movement of a free ranging African savannah elephant population, we evaluated how two movement metrics, diel displacement (DD) and movement predictability (MP - the degree of autocorrelated movement activity at diel time scales), changed in response to variation in resource availability as measured by the Normalized Difference Vegetation Index. We were able to capitalize on long term (multi-year) yet high resolution (hourly) global positioning system tracking datasets, the sample size of which allows robust analysis of complex models. We use optimal foraging theory predictions as a framework to interpret our results, in particular contrasting the behaviors across changes in social rank and resource availability to infer which movement behaviors at diel time scales may be optimal in this highly social species. Both DD and MP increased with increasing forage availability, irrespective of rank, reflecting increased energy expenditure and movement predictability during time periods of overall high resource availability. However, significant interactions between forage availability and social rank indicated a stronger response in DD, and a weaker response in MP, with increasing social status. Relative to high ranking individuals, low ranking individuals expended more energy and exhibited less behavioral movement autocorrelation during lower forage availability conditions, likely reflecting sub-optimal movement behavior. Beyond situations of contest competition, rank status appears to influence the extent to which individuals can modify their movement strategies across periods with differing forage availability. Large-scale spatiotemporal resource complexity not only impacts fine scale movement and optimal foraging strategies directly, but likely impacts rates of inter- and intra-specific interactions and competition resulting in socially based movement responses to ecological dynamics.
Top-down Estimates of Isoprene Emissions in Australia Inferred from OMI Satellite Data.
NASA Astrophysics Data System (ADS)
Greenslade, J.; Fisher, J. A.; Surl, L.; Palmer, P. I.
2017-12-01
Process-based models such as the Model of Emissions of Gases and Aerosols from Nature (MEGAN) predict Australia to be a global hotspot for biogenic isoprene emissions. Australia is also prone to increasingly frequent temperature extremes that can drive episodically high emissions. Estimates of biogenic isoprene emissions from Australia are poorly constrained, with the frequently used MEGAN model overestimating emissions by a factor of 4-6 in some areas. Evaluating MEGAN and other models in Australia is difficult due to sparse measurements of emissions and their ensuing chemical products. In this talk, we will describe efforts to better quantify Australian isoprene emissions using top-down estimates based on formaldehyde (HCHO) observations from the OMI satellite instrument, combined with modelled isoprene-to-HCHO yields obtained from the GEOS-Chem chemical transport model. The OMI-based estimates are evaluated using in situ observations from field campaigns conducted in southeast Australia. We also investigate the impact of the horizontal resolution used for the yield calculations on the inferred emissions, particularly in regions on the boundary between low- and high-NOx chemistry. The prevalence of fire smoke plumes roughly halves the available satellite dataset over Australia for much of the year; however, seasonal averages remain robust. Preliminary results show that the top-down isoprene emissions are lower than MEGAN estimates by up to 90% in summer. The overestimates are greatest along the eastern coast, including areas surrounding Australia's major population centres of Sydney, Melbourne, and Brisbane. The coarse horizontal resolution of the model significantly affects the emissions estimates, as many biogenic emitting regions lie along narrow coastal stretches. Our results confirm previous findings that the MEGAN biogenic emission model is poorly calibrated for the Australian environment and suggest that chemical transport models driven by MEGAN are likely to overpredict ozone and secondary organic aerosols from biogenic sources in Australia. Further measurements of biogenic gases are critical to improving biogenic emission estimates and follow-on chemical transport modelling in this region. We hope to quantify this overestimation and its flow-on effects in future work.
Ecology-driven stereotypes override race stereotypes.
Williams, Keelah E G; Sng, Oliver; Neuberg, Steven L
2016-01-12
Why do race stereotypes take the forms they do? Life history theory posits that features of the ecology shape individuals' behavior. Harsh and unpredictable ("desperate") ecologies induce fast strategy behaviors such as impulsivity, whereas resource-sufficient and predictable ("hopeful") ecologies induce slow strategy behaviors such as future focus. We suggest that individuals possess a lay understanding of ecology's influence on behavior, resulting in ecology-driven stereotypes. Importantly, because race is confounded with ecology in the United States, we propose that Americans' stereotypes about racial groups actually reflect stereotypes about these groups' presumed home ecologies. Study 1 demonstrates that individuals hold ecology stereotypes, stereotyping people from desperate ecologies as possessing faster life history strategies than people from hopeful ecologies. Studies 2-4 rule out alternative explanations for those findings. Study 5, which independently manipulates race and ecology information, demonstrates that when provided with information about a person's race (but not ecology), individuals' inferences about blacks track stereotypes of people from desperate ecologies, and individuals' inferences about whites track stereotypes of people from hopeful ecologies. However, when provided with information about both the race and ecology of others, individuals' inferences reflect the targets' ecology rather than their race: black and white targets from desperate ecologies are stereotyped as equally fast life history strategists, whereas black and white targets from hopeful ecologies are stereotyped as equally slow life history strategists. These findings suggest that the content of several predominant race stereotypes may not reflect race, per se, but rather inferences about how one's ecology influences behavior.
Autoclass: An automatic classification system
NASA Technical Reports Server (NTRS)
Stutz, John; Cheeseman, Peter; Hanson, Robin
1991-01-01
The task of inferring a set of classes and class descriptions most likely to explain a given data set can be placed on a firm theoretical foundation using Bayesian statistics. Within this framework, and using various mathematical and algorithmic approximations, the AutoClass System searches for the most probable classifications, automatically choosing the number of classes and complexity of class descriptions. A simpler version of AutoClass has been applied to many large real data sets, has discovered new independently-verified phenomena, and has been released as a robust software package. Recent extensions allow attributes to be selectively correlated within particular classes, and allow classes to inherit, or share, model parameters through a class hierarchy. The mathematical foundations of AutoClass are summarized.
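A hedged, modern analogue of this kind of Bayesian classification (not AutoClass itself): fit finite Gaussian mixtures and choose the number of classes by an approximate Bayesian score (BIC), here with scikit-learn and a stand-in data set:

```python
# Unsupervised class discovery in the spirit of AutoClass: a mixture
# model whose number of classes is selected by a model-selection score.
# BIC approximates the fully Bayesian search; iris data is a stand-in.
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture

X = load_iris().data

# Score mixtures with 1..8 classes; lower BIC = more probable model
# (up to the usual asymptotic approximations).
models = [GaussianMixture(k, n_init=5, random_state=0).fit(X) for k in range(1, 9)]
best = min(models, key=lambda m: m.bic(X))
print("chosen number of classes:", best.n_components)
print("class assignments:", best.predict(X)[:10])
```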
Creating Compositionally-Driven Debris Disk Dust Models
NASA Astrophysics Data System (ADS)
Zimmerman, Mara; Jang-Condell, Hannah; Schneider, Glenn; Chen, Christine; Stark, Chris
2018-06-01
Debris disks play a key role in exoplanet research; planetary formation and composition can be inferred from the nature of the circumstellar disk. To characterize the properties of the circumstellar dust, we create models of debris disks that constrain their composition. We apply Mie theory to calculate the dust absorption and emission within debris disks. We have data on nine targets from the Spitzer and Hubble Space Telescopes: the Spitzer data include mid-IR spectroscopy and photometry, and from HST we have spatially-resolved optical and near-IR images of the disks. Our goal is to compare these data to the models. By using a model that fits the photometric and mid-IR data simultaneously, we gain a deeper understanding of the structure and composition of the debris disk systems.
Trevino, Victor; Cassese, Alberto; Nagy, Zsuzsanna; Zhuang, Xiaodong; Herbert, John; Antczak, Philipp; Clarke, Kim; Davies, Nicholas; Rahman, Ayesha; Campbell, Moray J; Guindani, Michele; Bicknell, Roy; Vannucci, Marina; Falciani, Francesco
2016-04-01
The advent of functional genomics has enabled the genome-wide characterization of the molecular state of cells and tissues, virtually at every level of biological organization. The difficulty in organizing and mining this unprecedented amount of information has stimulated the development of computational methods designed to infer the underlying structure of regulatory networks from observational data. These important developments had a profound impact in biological sciences since they triggered the development of a novel data-driven investigative approach. In cancer research, this strategy has been particularly successful. It has contributed to the identification of novel biomarkers, to a better characterization of disease heterogeneity and to a more in depth understanding of cancer pathophysiology. However, so far these approaches have not explicitly addressed the challenge of identifying networks representing the interaction of different cell types in a complex tissue. Since these interactions represent an essential part of the biology of both diseased and healthy tissues, it is of paramount importance that this challenge is addressed. Here we report the definition of a network reverse engineering strategy designed to infer directional signals linking adjacent cell types within a complex tissue. The application of this inference strategy to prostate cancer genome-wide expression profiling data validated the approach and revealed that normal epithelial cells exert an anti-tumour activity on prostate carcinoma cells. Moreover, by using a Bayesian hierarchical model integrating genetics and gene expression data and combining this with survival analysis, we show that the expression of putative cell communication genes related to focal adhesion and secretion is affected by epistatic gene copy number variation and it is predictive of patient survival. Ultimately, this study represents a generalizable approach to the challenge of deciphering cell communication networks in a wide spectrum of biological systems.
A Multi-Method Approach for Proteomic Network Inference in 11 Human Cancers.
Şenbabaoğlu, Yasin; Sümer, Selçuk Onur; Sánchez-Vega, Francisco; Bemis, Debra; Ciriello, Giovanni; Schultz, Nikolaus; Sander, Chris
2016-02-01
Protein expression and post-translational modification levels are tightly regulated in neoplastic cells to maintain cellular processes known as 'cancer hallmarks'. The first Pan-Cancer initiative of The Cancer Genome Atlas (TCGA) Research Network has aggregated protein expression profiles for 3,467 patient samples from 11 tumor types using the antibody based reverse phase protein array (RPPA) technology. The resultant proteomic data can be utilized to computationally infer protein-protein interaction (PPI) networks and to study the commonalities and differences across tumor types. In this study, we compare the performance of 13 established network inference methods in their capacity to retrieve the curated Pathway Commons interactions from RPPA data. We observe that no single method has the best performance in all tumor types, but a group of six methods, including diverse techniques such as correlation, mutual information, and regression, consistently rank highly among the tested methods. We utilize the high performing methods to obtain a consensus network; and identify four robust and densely connected modules that reveal biological processes as well as suggest antibody-related technical biases. Mapping the consensus network interactions to Reactome gene lists confirms the pan-cancer importance of signal transduction pathways, innate and adaptive immune signaling, cell cycle, metabolism, and DNA repair; and also suggests several biological processes that may be specific to a subset of tumor types. Our results illustrate the utility of the RPPA platform as a tool to study proteomic networks in cancer.
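A minimal sketch of the consensus idea, assuming two correlation-based scorers stand in for the thirteen methods and that edges are aggregated by average rank:

```python
# Hedged sketch of consensus network inference: each method scores every
# protein pair, per-method scores are rank-aggregated, and the
# top-consensus edges are kept. The data matrix is a placeholder.
import numpy as np
from scipy.stats import rankdata, spearmanr

rng = np.random.default_rng(0)
data = rng.normal(size=(300, 20))          # samples x proteins (placeholder RPPA matrix)

pearson = np.abs(np.corrcoef(data, rowvar=False))
spearman = np.abs(spearmanr(data).correlation)
methods = [pearson, spearman]              # more inference methods would be added here

# Average the rank of each edge across methods (higher rank = stronger edge).
iu = np.triu_indices(data.shape[1], k=1)
ranks = np.mean([rankdata(m[iu]) for m in methods], axis=0)
top = np.argsort(ranks)[-50:]              # 50 highest-consensus edges
consensus_edges = list(zip(iu[0][top], iu[1][top]))
```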
Assessing dynamics, spatial scale, and uncertainty in task-related brain network analyses
Stephen, Emily P.; Lepage, Kyle Q.; Eden, Uri T.; Brunner, Peter; Schalk, Gerwin; Brumberg, Jonathan S.; Guenther, Frank H.; Kramer, Mark A.
2014-01-01
The brain is a complex network of interconnected elements, whose interactions evolve dynamically in time to cooperatively perform specific functions. A common technique to probe these interactions involves multi-sensor recordings of brain activity during a repeated task. Many techniques exist to characterize the resulting task-related activity, including establishing functional networks, which represent the statistical associations between brain areas. Although functional network inference is commonly employed to analyze neural time series data, techniques to assess the uncertainty—both in the functional network edges and the corresponding aggregate measures of network topology—are lacking. To address this, we describe a statistically principled approach for computing uncertainty in functional networks and aggregate network measures in task-related data. The approach is based on a resampling procedure that utilizes the trial structure common in experimental recordings. We show in simulations that this approach successfully identifies functional networks and associated measures of confidence emergent during a task in a variety of scenarios, including dynamically evolving networks. In addition, we describe a principled technique for establishing functional networks based on predetermined regions of interest using canonical correlation. Doing so provides additional robustness to the functional network inference. Finally, we illustrate the use of these methods on example invasive brain voltage recordings collected during an overt speech task. The general strategy described here—appropriate for static and dynamic network inference and different statistical measures of coupling—permits the evaluation of confidence in network measures in a variety of settings common to neuroscience. PMID:24678295
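A minimal sketch of the trial-resampling strategy, assuming correlation as the coupling statistic and synthetic trials in place of real recordings:

```python
# Hedged sketch: bootstrap over repeated trials to put confidence
# intervals on functional-network edges; the null level is an assumption.
import numpy as np

rng = np.random.default_rng(2)
n_trials, n_samples, n_sensors = 100, 250, 8
trials = rng.normal(size=(n_trials, n_samples, n_sensors))  # placeholder recordings

def edge_matrix(trial_subset):
    """Functional network: mean absolute correlation across trials."""
    cs = [np.abs(np.corrcoef(t, rowvar=False)) for t in trial_subset]
    return np.mean(cs, axis=0)

# Bootstrap: resample trials with replacement, recompute the network.
boot = np.array([
    edge_matrix(trials[rng.integers(0, n_trials, n_trials)])
    for _ in range(200)
])
lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)  # 95% CI per edge
significant = lo > 0.2  # edges whose CI excludes a chosen null level (assumption)
```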
DeepInfer: Open-Source Deep Learning Deployment Toolkit for Image-Guided Therapy
Mehrtash, Alireza; Pesteie, Mehran; Hetherington, Jorden; Behringer, Peter A.; Kapur, Tina; Wells, William M.; Rohling, Robert; Fedorov, Andriy; Abolmaesumi, Purang
2017-01-01
Deep learning models have outperformed some of the previous state-of-the-art approaches in medical image analysis. Instead of using hand-engineered features, deep models attempt to automatically extract hierarchical representations at multiple levels of abstraction from the data. Therefore, deep models are usually considered to be more flexible and robust solutions for image analysis problems compared to conventional computer vision models. They have demonstrated significant improvements in computer-aided diagnosis and automatic medical image analysis applied to such tasks as image segmentation, classification and registration. However, deploying deep learning models often has a steep learning curve and requires detailed knowledge of various software packages. Thus, many deep models have not been integrated into the clinical research workflows causing a gap between the state-of-the-art machine learning in medical applications and evaluation in clinical research procedures. In this paper, we propose “DeepInfer” – an open-source toolkit for developing and deploying deep learning models within the 3D Slicer medical image analysis platform. Utilizing a repository of task-specific models, DeepInfer allows clinical researchers and biomedical engineers to deploy a trained model selected from the public registry, and apply it to new data without the need for software development or configuration. As two practical use cases, we demonstrate the application of DeepInfer in prostate segmentation for targeted MRI-guided biopsy and identification of the target plane in 3D ultrasound for spinal injections. PMID:28615794
Multiple Illuminant Colour Estimation via Statistical Inference on Factor Graphs.
Mutimbu, Lawrence; Robles-Kelly, Antonio
2016-08-31
This paper presents a method to recover a spatially varying illuminant colour estimate from scenes lit by multiple light sources. Starting with the image formation process, we formulate the illuminant recovery problem in a statistically data-driven setting. To do this, we use a factor graph defined across the scale space of the input image. In the graph, we utilise a set of illuminant prototypes computed using a data-driven approach. As a result, our method delivers a pixelwise illuminant colour estimate without requiring libraries or user input. The use of a factor graph also allows the illuminant estimates to be recovered via a maximum a posteriori (MAP) inference process. Moreover, we compute the probability marginals by performing a Delaunay triangulation on our factor graph. We illustrate the utility of our method for pixelwise illuminant colour recovery on widely available datasets and compare it against a number of alternatives. We also show sample colour correction results on real-world images.
NASA Astrophysics Data System (ADS)
Barati Farimani, Amir; Gomes, Joseph; Pande, Vijay
2017-11-01
We have developed a new data-driven modeling paradigm for the rapid inference and solution of the constitutive equations of fluid mechanics using deep learning models. Using generative adversarial networks (GANs), we train models for the direct generation of solutions to steady-state heat conduction and incompressible fluid flow without knowledge of the underlying governing equations. Rather than using artificial neural networks to approximate the solution of the constitutive equations, GANs can directly generate the solutions to these equations conditional upon an arbitrary set of boundary conditions. Both models predict temperature, velocity and pressure fields with high test accuracy (>99.5%). Our framework for inferring and generating the solutions of partial differential equations can be applied to any physical phenomenon and can be used to learn directly from experiments where the underlying physical model is complex or unknown. We have also shown that our framework can be used to couple multiple physics simultaneously, making it amenable to multi-physics problems.
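For readers unfamiliar with conditional GANs, the sketch below shows the general shape of such a setup in PyTorch: a generator maps a boundary-condition map to a field, and a discriminator scores (boundary, field) pairs. The architecture, sizes, and single training step are illustrative assumptions, not the authors' network.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a boundary-condition map to an interior field estimate."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, bc):
        return self.net(bc)

class Discriminator(nn.Module):
    """Scores (boundary, field) pairs so that generation stays conditional."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, 1),  # assumes 32x32 input fields
        )

    def forward(self, bc, field):
        return self.net(torch.cat([bc, field], dim=1))

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

bc = torch.randn(8, 1, 32, 32)        # stand-in boundary-condition maps
solution = torch.randn(8, 1, 32, 32)  # stand-in numerical-solver outputs

# One adversarial step: D separates solver outputs from generated fields,
# then G is updated to fool D on the same boundary conditions.
fake = G(bc)
loss_d = bce(D(bc, solution), torch.ones(8, 1)) + \
         bce(D(bc, fake.detach()), torch.zeros(8, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

loss_g = bce(D(bc, fake), torch.ones(8, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

Note that the generator here is deterministic given the boundary conditions; adding a noise input would turn it into a sampler over solution fields.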
Linking Europa’s Plume Activity to Tides, Tectonics, and Liquid Water
NASA Astrophysics Data System (ADS)
Rhoden, Alyssa R.; Hurford, Terry; Roth, Lorenz; Retherford, Kurt
2014-11-01
Much of the geologic activity preserved on Europa’s icy surface has been attributed to tidal deformation, mainly due to Europa’s eccentric orbit. Although the surface is geologically young, evidence of ongoing tidally-driven processes has been lacking. However, a recent observation of water vapor near Europa’s south pole suggests that it may be geologically active. Non-detections in previous and follow-up observations indicate a temporal variation in plume visibility and suggest a relationship to Europa’s tidal cycle. Similarly, the Cassini spacecraft has observed plumes emanating from the south pole of Saturn’s moon, Enceladus, and variability in the intensity of eruptions has been linked to its tidal cycle. The inference that a similar mechanism controls plumes at both Europa and Enceladus motivates further analysis of Europa’s plume behavior and the relationship between plumes, tides, and liquid water on these two satellites. We determine the locations and orientations of hypothetical tidally-driven fractures that best match the temporal variability of the plumes observed at Europa. Specifically, we identify model faults that are in tension at the time in Europa’s orbit when a plume was detected and in compression at times when the plume was not detected. We find that tidal stress driven solely by eccentricity is incompatible with the observations unless additional mechanisms are controlling the eruption timing or restricting the longevity of the plumes. In contrast, the addition of obliquity tides, and corresponding precession of the spin pole, can generate a number of model faults that are consistent with the pattern of plume detections. The locations and orientations of the model faults are robust across a broad range of precession rates and spin pole directions. Analysis of the stress variations across model faults suggests that the plumes would be best observed earlier in Europa’s orbit. Our results indicate that Europa’s plumes, if confirmed, differ in many respects from the Enceladean plumes and that either active fractures or volatile sources are rare.
Conversion of Phase Information into a Spike-Count Code by Bursting Neurons
Samengo, Inés; Montemurro, Marcelo A.
2010-01-01
Single neurons in the cerebral cortex are immersed in a fluctuating electric field, the local field potential (LFP), which mainly originates from synchronous synaptic input into the local neural neighborhood. As shown by recent studies in visual and auditory cortices, the angular phase of the LFP at the time of spike generation adds significant extra information about the external world, beyond the one contained in the firing rate alone. However, no biologically plausible mechanism has yet been suggested that allows downstream neurons to infer the phase of the LFP at the soma of their pre-synaptic afferents. Therefore, so far there is no evidence that the nervous system can process phase information. Here we study a model of a bursting pyramidal neuron, driven by a time-dependent stimulus. We show that the number of spikes per burst varies systematically with the phase of the fluctuating input at the time of burst onset. The mapping between input phase and number of spikes per burst is a robust response feature for a broad range of stimulus statistics. Our results suggest that cortical bursting neurons could play a crucial role in translating LFP phase information into an easily decodable spike count code. PMID:20300632
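The phase variable at issue can be made concrete with a short simulation: extract the instantaneous LFP phase with a Hilbert transform and read out a phase-dependent spike count. The bursting rule below is a toy stand-in for the pyramidal neuron model, and all parameters are illustrative.

```python
import numpy as np
from scipy.signal import hilbert

rng = np.random.default_rng(0)
fs, f_lfp = 1000.0, 8.0                # sampling rate and LFP frequency (Hz)
t = np.arange(0, 10, 1 / fs)
lfp = np.sin(2 * np.pi * f_lfp * t) + 0.2 * rng.standard_normal(t.size)

# Instantaneous phase in [-pi, pi] from the analytic signal.
phase = np.angle(hilbert(lfp))

# Toy bursting rule: bursts near the input trough carry more spikes.
burst_onsets = np.arange(200, t.size, 400)
spikes_per_burst = 2 + np.round(1 - np.cos(phase[burst_onsets]))

for ph, n in zip(phase[burst_onsets][:5], spikes_per_burst[:5]):
    print(f"onset phase {ph:+.2f} rad -> {int(n)} spikes")
```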
Mediterranean sea water budget long-term trend inferred from salinity observations
NASA Astrophysics Data System (ADS)
Skliris, N.; Zika, J. D.; Herold, L.; Josey, S. A.; Marsh, R.
2018-01-01
Changes in the Mediterranean water cycle since 1950 are investigated using salinity and reanalysis-based air-sea freshwater flux datasets. Salinity observations indicate a strong basin-scale multi-decadal salinification, particularly in the intermediate and deep layers. Evaporation, precipitation and river runoff variations are all shown to contribute to a very strong increase in net evaporation of order 20-30%. While large temporal uncertainties and discrepancies are found between E-P multi-decadal trend patterns in the reanalysis datasets, a more robust and spatially coherent structure of multi-decadal change is obtained for the salinity field. The salinity change implies an increase in net evaporation of 8 to 12% over 1950-2010, which is considerably lower than that suggested by air-sea freshwater flux products, but still largely exceeds estimates of global water cycle amplification. A new method based on water mass transformation theory is used to link changes in net evaporation over the Mediterranean Sea with changes in the volumetric distribution of salinity. The water mass transformation distribution in salinity coordinates suggests that the Mediterranean basin salinification is driven by changes in the regional water cycle rather than changes in salt transports at the straits.
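For orientation, the link between a basin-mean salinity trend and the implied change in net evaporation can be written as a back-of-the-envelope scaling. This is a hedged sketch under simplifying assumptions (fixed basin volume, salt concentrated by freshwater loss alone); the symbols are ours, not the authors' water mass transformation framework.

```latex
% Freshwater loss A(E - P - R) concentrates a fixed salt content V\bar{S},
% so V \, d\bar{S}/dt \approx S_0 A (E - P - R), giving
\Delta(E - P - R) \;\approx\; \frac{h}{S_0}\,\frac{\Delta \bar{S}}{\Delta t},
\qquad h = V/A \ \text{(mean basin depth)}, \quad S_0 \ \text{(reference salinity)}.
```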
Robustly Aligning a Shape Model and Its Application to Car Alignment of Unknown Pose.
Li, Yan; Gu, Leon; Kanade, Takeo
2011-09-01
Precisely localizing in an image a set of feature points that form the shape of an object, such as a car or a face, is called alignment. Previous shape alignment methods attempted to fit a whole shape model to the observed data, based on the assumption of Gaussian observation noise and the associated regularization process. However, such an approach, though able to deal with Gaussian noise in feature detection, turns out not to be robust or precise because it is vulnerable to gross feature detection errors or outliers resulting from partial occlusions or spurious features from the background or neighboring objects. We address this problem by adopting a randomized hypothesis-and-test approach. First, a Bayesian inference algorithm is developed to generate a shape-and-pose hypothesis of the object from a partial shape or a subset of feature points. For alignment, a large number of hypotheses are generated by randomly sampling subsets of feature points and then evaluated to find the one that minimizes the shape prediction error. This method of randomized subset-based matching can effectively handle outliers and recover the correct object shape. We apply this approach on a challenging data set of over 5,000 different-posed car images, spanning a wide variety of car types, lighting, background scenes, and partial occlusions. Experimental results demonstrate favorable improvements over previous methods on both accuracy and robustness.
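The hypothesis-and-test loop is close in spirit to RANSAC, and a compact version is easy to state. The sketch below assumes a known mean shape and a 2-D similarity transform, and scores hypotheses with a median (outlier-tolerant) error; it is an illustrative stand-in for the paper's Bayesian shape-and-pose hypothesis step, not the authors' algorithm.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares 2-D similarity transform (scale, rotation, translation)."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    s, d = src - mu_s, dst - mu_d
    U, S, Vt = np.linalg.svd(s.T @ d)      # Kabsch/Umeyama solution
    if np.linalg.det(Vt.T @ U.T) < 0:      # guard against reflections
        Vt[-1] *= -1
    R = Vt.T @ U.T
    scale = S.sum() / (s ** 2).sum()
    t = mu_d - scale * (R @ mu_s)
    return scale, R, t

def align(mean_shape, detections, n_hyp=500, subset=4, seed=0):
    """Randomized subset-based hypothesis-and-test alignment."""
    rng = np.random.default_rng(seed)
    best, best_err = None, np.inf
    n = len(mean_shape)
    for _ in range(n_hyp):
        idx = rng.choice(n, size=subset, replace=False)
        scale, R, t = fit_similarity(mean_shape[idx], detections[idx])
        pred = scale * (mean_shape @ R.T) + t
        # Median error tolerates gross outliers among the detections.
        err = np.median(np.linalg.norm(pred - detections, axis=1))
        if err < best_err:
            best, best_err = pred, err
    return best, best_err
```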
Díaz-Rodríguez, Natalia; Cadahía, Olmo León; Cuéllar, Manuel Pegalajar; Lilius, Johan; Calvo-Flores, Miguel Delgado
2014-01-01
Human activity recognition is a key task in ambient intelligence applications to achieve proper ambient assisted living. There has been remarkable progress in this domain, but some challenges still remain to obtain robust methods. Our goal in this work is to provide a system that allows the modeling and recognition of a set of complex activities in real life scenarios involving interaction with the environment. The proposed framework is a hybrid model that comprises two main modules: a low level sub-activity recognizer, based on data-driven methods, and a high-level activity recognizer, implemented with a fuzzy ontology to include the semantic interpretation of actions performed by users. The fuzzy ontology is fed by the sub-activities recognized by the low level data-driven component and provides fuzzy ontological reasoning to recognize both the activities and their influence in the environment with semantics. An additional benefit of the approach is the ability to handle vagueness and uncertainty in the knowledge-based module, which substantially outperforms the treatment of incomplete and/or imprecise data with respect to classic crisp ontologies. We validate these advantages with the public CAD-120 dataset (Cornell Activity Dataset), achieving an accuracy of 90.1% and 91.07% for low-level and high-level activities, respectively. This entails an improvement over fully data-driven or ontology-based approaches. PMID:25268914
Marginal Structural Models with Counterfactual Effect Modifiers.
Zheng, Wenjing; Luo, Zhehui; van der Laan, Mark J
2018-06-08
In health and social sciences, research questions often involve systematic assessment of the modification of treatment causal effect by patient characteristics. In longitudinal settings, time-varying or post-intervention effect modifiers are also of interest. In this work, we investigate the robust and efficient estimation of the Counterfactual-History-Adjusted Marginal Structural Model (van der Laan MJ, Petersen M. Statistical learning of origin-specific statically optimal individualized treatment rules. Int J Biostat. 2007;3), which models the conditional intervention-specific mean outcome given a counterfactual modifier history in an ideal experiment. We establish the semiparametric efficiency theory for these models, and present a substitution-based, semiparametric efficient and doubly robust estimator using the targeted maximum likelihood estimation methodology (TMLE, e.g. van der Laan MJ, Rubin DB. Targeted maximum likelihood learning. Int J Biostat. 2006;2, van der Laan MJ, Rose S. Targeted learning: causal inference for observational and experimental data, 1st ed. Springer Series in Statistics. Springer, 2011). To facilitate implementation in applications where the effect modifier is high dimensional, our third contribution is a projected influence function (and the corresponding projected TMLE estimator), which retains most of the robustness of its efficient peer and can be easily implemented in applications where the use of the efficient influence function becomes taxing. We compare the projected TMLE estimator with an Inverse Probability of Treatment Weighted estimator (e.g. Robins JM. Marginal structural models. In: Proceedings of the American Statistical Association. Section on Bayesian Statistical Science, 1-10. 1997a, Hernan MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. 2000;11:561-570), and a non-targeted G-computation estimator (Robins JM. A new approach to causal inference in mortality studies with sustained exposure periods - application to control of the healthy worker survivor effect. Math Modell. 1986;7:1393-1512.). The comparative performance of these estimators is assessed in a simulation study. The use of the projected TMLE estimator is illustrated in a secondary data analysis for the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) trial where effect modifiers are subject to missing at random.
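Of the comparison estimators named above, the IPTW estimator is the most compact to illustrate. The sketch below implements a single-time-point version with a logistic propensity model; the simulated data, variable names, and the Hajek-style weighted means are illustrative assumptions, not the paper's longitudinal estimator.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
w = rng.standard_normal((n, 2))                   # baseline covariates
p_a = 1 / (1 + np.exp(-(w[:, 0] - 0.5 * w[:, 1])))
a = rng.binomial(1, p_a)                          # confounded treatment
y = 1.0 * a + w[:, 0] + rng.standard_normal(n)    # true effect = 1.0

# Fit the propensity score g(W) = P(A=1 | W) and weight by 1/g.
g = LogisticRegression().fit(w, a).predict_proba(w)[:, 1]
wt = a / g + (1 - a) / (1 - g)

ate = np.average(y[a == 1], weights=wt[a == 1]) - \
      np.average(y[a == 0], weights=wt[a == 0])
print(f"IPTW ATE estimate: {ate:.2f} (truth 1.0)")
```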
Sun, Xiaoqiang; Xian, Huifang; Tian, Shuo; Sun, Tingzhe; Qin, Yunfei; Zhang, Shoutao; Cui, Jun
2016-07-08
RIG-I is an essential receptor in the initiation of the type I interferon (IFN) signaling pathway upon viral infection. Although K63-linked ubiquitination plays an important role in RIG-I activation, the optimal modulation of conjugated and unanchored ubiquitination of RIG-I as well as its functional implications remains unclear. In this study, we determined that, in contrast to the RIG-I CARD domain, full-length RIG-I must undergo K63-linked ubiquitination at multiple sites to reach full activity. A systems biology approach was designed based on experiments using full-length RIG-I. Model selection for 7 candidate mechanisms of RIG-I ubiquitination inferred a hierarchical architecture of the RIG-I ubiquitination mode, which was then experimentally validated. Compared with other mechanisms, the selected hierarchical mechanism exhibited superior sensitivity and robustness in RIG-I-induced type I IFN activation. Furthermore, our model analysis and experimental data revealed that TRIM4 and TRIM25 exhibited dose-dependent synergism. These results demonstrated that the hierarchical mechanism of multi-site/type ubiquitination of RIG-I provides an efficient, robust and optimal synergistic regulatory module in antiviral immune responses. PMID:27387525
On Statistical Analysis of Neuroimages with Imperfect Registration
Kim, Won Hwa; Ravi, Sathya N.; Johnson, Sterling C.; Okonkwo, Ozioma C.; Singh, Vikas
2016-01-01
A variety of studies in neuroscience/neuroimaging seek to perform statistical inference on the acquired brain image scans for diagnosis as well as understanding the pathological manifestation of diseases. To do so, an important first step is to register (or co-register) all of the image data into a common coordinate system. This permits meaningful comparison of the intensities at each voxel across groups (e.g., diseased versus healthy) to evaluate the effects of the disease and/or use machine learning algorithms in a subsequent step. But errors in the underlying registration make this problematic: they either decrease the statistical power or make the follow-up inference tasks less effective or accurate. In this paper, we derive a novel algorithm which offers immunity to local errors in the underlying deformation field obtained from registration procedures. By deriving a deformation-invariant representation of the image, the downstream analysis can be made more robust, as if one had access to a (hypothetical) far superior registration procedure. Our algorithm is based on recent work on the scattering transform. Using this as a starting point, we show how results from harmonic analysis (especially, non-Euclidean wavelets) yield strategies for designing deformation and additive noise invariant representations of large 3-D brain image volumes. We present a set of results on synthetic and real brain images where we achieve robust statistical analysis even in the presence of substantial deformation errors; here, standard analysis procedures significantly under-perform and fail to identify the true signal. PMID:27042168
Agile text mining for the 2014 i2b2/UTHealth Cardiac risk factors challenge.
Cormack, James; Nath, Chinmoy; Milward, David; Raja, Kalpana; Jonnalagadda, Siddhartha R
2015-12-01
This paper describes the use of an agile text mining platform (Linguamatics' Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors in patient records as defined in the i2b2/UTHealth 2014 challenge. The approach uses a data-driven rule-based methodology with the addition of a simple supervised classifier. We demonstrate that agile text mining allows for rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluation of this approach on the test data gave an F-Score of 91.7%, one percent behind the top performing system. Copyright © 2015 Elsevier Inc. All rights reserved.
Fourtune, Lisa; Prunier, Jérôme G; Paz-Vinas, Ivan; Loot, Géraldine; Veyssière, Charlotte; Blanchet, Simon
2018-04-01
Identifying landscape features that affect functional connectivity among populations is a major challenge in fundamental and applied sciences. Landscape genetics combines landscape and genetic data to address this issue, with the main objective of disentangling direct and indirect relationships among an intricate set of variables. Causal modeling has strong potential to address the complex nature of landscape genetic data sets. However, this statistical approach was not initially developed to address the pairwise distance matrices commonly used in landscape genetics. Here, we aimed to extend the applicability of two causal modeling methods, namely maximum-likelihood path analysis and the directional separation test, by developing statistical approaches aimed at handling distance matrices and improving functional connectivity inference. Using simulations, we showed that these approaches greatly improved the robustness of the absolute (using a frequentist approach) and relative (using an information-theoretic approach) fits of the tested models. We used an empirical data set combining genetic information on a freshwater fish species (Gobio occitaniae) and detailed landscape descriptors to demonstrate the usefulness of causal modeling to identify functional connectivity in wild populations. Specifically, we demonstrated how direct and indirect relationships involving altitude, temperature, and oxygen concentration influenced within- and between-population genetic diversity of G. occitaniae.
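The core difficulty with pairwise distance matrices is that their cells are not independent, so naive permutation of cells is invalid; rows and columns must be permuted together, as in a Mantel-type test. The sketch below makes that concrete; it is illustrative only and not the authors' extension of path analysis or the d-separation test.

```python
import numpy as np

def mantel(gen_dist, land_dist, n_perm=9999, seed=0):
    """Correlation between two distance matrices with a permutation p-value."""
    rng = np.random.default_rng(seed)
    iu = np.triu_indices_from(gen_dist, k=1)
    obs = np.corrcoef(gen_dist[iu], land_dist[iu])[0, 1]
    n, count = gen_dist.shape[0], 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Permute populations, not cells: rows and columns move together,
        # preserving the non-independence built into a distance matrix.
        perm = gen_dist[np.ix_(p, p)]
        if abs(np.corrcoef(perm[iu], land_dist[iu])[0, 1]) >= abs(obs):
            count += 1
    return obs, (count + 1) / (n_perm + 1)
```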
Annual Rainfall Forecasting by Using Mamdani Fuzzy Inference System
NASA Astrophysics Data System (ADS)
Fallah-Ghalhary, G.-A.; Habibi Nokhandan, M.; Mousavi Baygi, M.
2009-04-01
Long-term rainfall prediction is very important to countries thriving on agro-based economies. In general, climate and rainfall are highly non-linear phenomena in nature, giving rise to what is known as the "butterfly effect". The number of parameters required to predict rainfall is enormous, even for a short period. Soft computing is an innovative approach to constructing computationally intelligent systems that are supposed to possess human-like expertise within a specific domain, adapt themselves and learn to do better in changing environments, and explain how they make decisions. Unlike conventional artificial intelligence techniques, the guiding principle of soft computing is to exploit tolerance for imprecision, uncertainty, robustness, and partial truth to achieve tractability and better rapport with reality. In this paper, 33 years of rainfall data from Khorasan province, in the northeastern part of Iran, situated at latitude-longitude (31°-38°N, 74°-80°E), are analyzed. This research attempted to train Fuzzy Inference System (FIS) based prediction models with these data. For performance evaluation, the model-predicted outputs were compared with the actual rainfall data. Simulation results reveal that soft computing techniques are promising and efficient. The test results of the FIS model showed an RMSE of 52 millimeters.
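A Mamdani FIS of the kind used here combines rule firing strengths (min/max), clips the output fuzzy sets, and defuzzifies by centroid. The sketch below is a minimal two-rule toy with triangular memberships; the variables, rule base, and universes are illustrative, not the study's trained system.

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def predict_rainfall(humidity, temperature):
    rain = np.linspace(0, 400, 801)               # output universe (mm)
    # Rule 1: IF humidity high AND temperature low THEN rainfall high.
    w1 = min(tri(humidity, 50, 80, 100), tri(temperature, 0, 10, 20))
    # Rule 2: IF humidity low OR temperature high THEN rainfall low.
    w2 = max(tri(humidity, 0, 20, 50), tri(temperature, 15, 30, 45))
    # Mamdani implication (min) and aggregation (max) over the output sets.
    agg = np.maximum(np.minimum(w1, tri(rain, 150, 300, 400)),
                     np.minimum(w2, tri(rain, 0, 50, 150)))
    # Centroid defuzzification.
    return np.sum(rain * agg) / np.sum(agg) if agg.sum() > 0 else np.nan

print(f"{predict_rainfall(humidity=70, temperature=12):.0f} mm")
```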
Density Imaging of Puy de Dôme Volcano by Joint Inversion of Muographic and Gravimetric Data
NASA Astrophysics Data System (ADS)
Barnoud, A.; Niess, V.; Le Ménédeu, E.; Cayol, V.; Carloganu, C.
2016-12-01
We aim to jointly invert high-density muographic and gravimetric data to robustly infer the density structure of volcanoes. We use the puy de Dôme volcano in France as a proof of principle, since high-quality data sets are available for both muography and gravimetry. Gravimetric inversion and muography are independent methods that provide an estimation of density distributions. On the one hand, gravimetry allows 3D density variations to be reconstructed by inversion. This process is well known to be ill-posed and intrinsically non-unique, and thus requires additional constraints (e.g., an a priori density model). On the other hand, muography provides a direct measurement of 2D mean densities (radiographic images) from the detection of high-energy atmospheric muons crossing the volcanic edifice. 3D density distributions can be computed from several radiographic images, but the number of images is generally limited by field constraints and by the limited number of available telescopes. Thus, muon tomography is also ill-posed in practice. In the case of the puy de Dôme volcano, the density structures inferred from gravimetric data (Portal et al. 2016) and from muographic data (Le Ménédeu et al. 2016) show a qualitative agreement but cannot be compared quantitatively. Because each method has different intrinsic resolutions due to the physics (Jourde et al., 2015), the joint inversion is expected to improve the robustness of the inversion. Such a joint inversion has already been applied in a volcanic context (Nishiyama et al., 2013). Volcano muography requires state-of-the-art, high-resolution and large-scale muon detectors (Ambrosino et al., 2015). Instrumental uncertainties and systematic errors may constitute an important limitation for muography and should not be overlooked. For instance, low-energy muons are detected together with ballistic high-energy muons, decreasing the measured value of the mean density close to the topography. Here, we jointly invert the gravimetric and muographic data to characterize the 3D density distribution of the puy de Dôme volcano. We attempt to precisely identify and estimate the different uncertainties and systematic errors so that they can be accounted for in the inversion scheme.
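A joint inversion of this kind is naturally phrased as one regularized least-squares problem over a common density model. The functional below is a hedged sketch: the forward operators, covariance weightings, and regularization term are generic placeholders, not the scheme actually adopted for the puy de Dôme data.

```latex
% rho: 3D density model; G, M: gravimetric and muographic forward operators;
% C_g, C_m: data covariances (including systematics); rho_0: prior model.
\min_{\rho}\;
  \left\| \mathbf{d}_{\mathrm{grav}} - G\rho \right\|^{2}_{C_g^{-1}}
+ \left\| \mathbf{d}_{\mathrm{muo}} - M\rho \right\|^{2}_{C_m^{-1}}
+ \lambda \left\| \rho - \rho_{0} \right\|^{2}
```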
Towards Inferring Protein Interactions: Challenges and Solutions
NASA Astrophysics Data System (ADS)
Zhang, Ya; Zha, Hongyuan; Chu, Chao-Hsien; Ji, Xiang
2006-12-01
Discovering interacting proteins has been an essential part of functional genomics. However, existing experimental techniques only uncover a small portion of any interactome. Furthermore, these data often have a very high false-positive rate. By conceptualizing the interactions at the domain level, we provide a more abstract representation of the interactome, which also facilitates the discovery of unobserved protein-protein interactions. Although several domain-based approaches have been proposed to predict protein-protein interactions, they usually assume that domain interactions are independent of each other for the convenience of computational modeling. A new framework to predict protein interactions is proposed in this paper, where no assumption is made about domain interactions. Protein interactions may be the result of multiple domain interactions that are dependent on each other. A conjunctive normal form representation is used to capture the relationships between protein interactions and domain interactions. The problem of interaction inference is then modeled as a constraint satisfiability problem and solved via linear programming. Experimental results on a combined yeast data set have demonstrated the robustness and the accuracy of the proposed algorithm. Moreover, we also map some predicted interacting domains to three-dimensional structures of protein complexes to show the validity of our predictions.
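The constraint-satisfiability formulation admits a direct linear-programming relaxation: each candidate domain interaction becomes a variable in [0, 1], and each observed protein interaction requires at least one of its candidate domain pairs to be active. The sketch below is a toy instance with made-up data, not the paper's full model (which would also handle non-interacting pairs and noise).

```python
import numpy as np
from scipy.optimize import linprog

n_domain_pairs = 5
# Candidate domain pairs that could explain each observed protein interaction.
interactions = [[0, 1], [1, 2], [3], [2, 4]]

c = np.ones(n_domain_pairs)          # parsimony: minimize total "on" mass
A_ub, b_ub = [], []
for cand in interactions:
    row = np.zeros(n_domain_pairs)
    row[cand] = -1.0                 # sum_k x_k >= 1 rewritten as -sum <= -1
    A_ub.append(row)
    b_ub.append(-1.0)

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(0, 1)] * n_domain_pairs, method="highs")
print(np.round(res.x, 2))            # high values flag likely domain interactions
```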
Probing the Small-scale Structure in Strongly Lensed Systems via Transdimensional Inference
NASA Astrophysics Data System (ADS)
Daylan, Tansu; Cyr-Racine, Francis-Yan; Diaz Rivero, Ana; Dvorkin, Cora; Finkbeiner, Douglas P.
2018-02-01
Strong lensing is a sensitive probe of the small-scale density fluctuations in the Universe. We implement a pipeline to model strongly lensed systems using probabilistic cataloging, which is a transdimensional, hierarchical, and Bayesian framework to sample from a metamodel (union of models with different dimensionality) consistent with observed photon count maps. Probabilistic cataloging allows one to robustly characterize modeling covariances within and across lens models with different numbers of subhalos. Unlike traditional cataloging of subhalos, it does not require model subhalos to improve the goodness of fit above the detection threshold. Instead, it allows the exploitation of all information contained in the photon count maps—for instance, when constraining the subhalo mass function. We further show that, by not including these small subhalos in the lens model, fixed-dimensional inference methods can significantly mismodel the data. Using a simulated Hubble Space Telescope data set, we show that the subhalo mass function can be probed even when many subhalos in the sample catalogs are individually below the detection threshold and would be absent in a traditional catalog. The implemented software, Probabilistic Cataloger (PCAT), is made publicly available at https://github.com/tdaylan/pcat.
Modeling Post-death Transmission of Ebola: Challenges for Inference and Opportunities for Control
NASA Astrophysics Data System (ADS)
Weitz, Joshua S.; Dushoff, Jonathan
2015-03-01
Multiple epidemiological models have been proposed to predict the spread of Ebola in West Africa. These models include consideration of counter-measures meant to slow and, eventually, stop the spread of the disease. Here, we examine one component of Ebola dynamics that is of ongoing concern: the transmission of Ebola from the dead to the living. We do so by applying the toolkit of mathematical epidemiology to analyze the consequences of post-death transmission. We show that underlying disease parameters cannot be inferred with confidence from early-stage incidence data (that is, they are not "identifiable") because different parameter combinations can produce virtually the same epidemic trajectory. Despite this identifiability problem, we find robustly that inferences that do not account for post-death transmission tend to underestimate the basic reproductive number; thus, given the observed rate of epidemic growth, larger amounts of post-death transmission imply larger reproductive numbers. From a control perspective, we explain how improvements in reducing post-death transmission of Ebola may reduce the overall epidemic spread and scope substantially. Increased attention to the proportion of post-death transmission has the potential to aid both in projecting the course of the epidemic and in evaluating a portfolio of control strategies.
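The mechanism is easy to make explicit with a compartmental model that adds a dead-but-infectious class D between death and burial. The sketch below, with illustrative parameter values only, also shows how the basic reproductive number splits into living and post-death contributions, which is why ignoring the latter underestimates R0 for a given growth rate.

```python
import numpy as np
from scipy.integrate import odeint

beta_I, beta_D = 0.3, 0.4     # transmission rates from living and dead (1/day)
sigma, gamma = 1 / 11, 1 / 6  # 1/incubation and 1/infectious period (1/day)
f, delta = 0.7, 1 / 3         # case fatality and 1/(time to burial) (1/day)

def deriv(y, t):
    S, E, I, D, R = y
    new_inf = (beta_I * I + beta_D * D) * S  # fractions of a closed population
    return [-new_inf,
            new_inf - sigma * E,
            sigma * E - gamma * I,
            f * gamma * I - delta * D,       # the dead transmit until burial
            (1 - f) * gamma * I]

y0 = [1 - 1e-4, 1e-4, 0.0, 0.0, 0.0]
t = np.linspace(0, 365, 366)
S, E, I, D, R = odeint(deriv, y0, t).T

# R0 splits into a living and a post-death contribution.
R0 = beta_I / gamma + f * beta_D / delta
print(f"R0 = {R0:.2f} "
      f"(living {beta_I / gamma:.2f}, post-death {f * beta_D / delta:.2f})")
print(f"final attack rate: {1 - S[-1]:.2f}")
```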