Sample records for algorithm inferring functional

  1. Image-Data Compression Using Edge-Optimizing Algorithm for WFA Inference.

    ERIC Educational Resources Information Center

    Culik, Karel II; Kari, Jarkko

    1994-01-01

    Presents an inference algorithm that produces a weighted finite automata (WFA), in particular, the grayness functions of graytone images. Image-data compression results based on the new inference algorithm produces a WFA with a relatively small number of edges. Image-data compression results alone and in combination with wavelets are discussed.…

  2. Gene-network inference by message passing

    NASA Astrophysics Data System (ADS)

    Braunstein, A.; Pagnani, A.; Weigt, M.; Zecchina, R.

    2008-01-01

    The inference of gene-regulatory processes from gene-expression data belongs to the major challenges of computational systems biology. Here we address the problem from a statistical-physics perspective and develop a message-passing algorithm which is able to infer sparse, directed and combinatorial regulatory mechanisms. Using the replica technique, the algorithmic performance can be characterized analytically for artificially generated data. The algorithm is applied to genome-wide expression data of baker's yeast under various environmental conditions. We find clear cases of combinatorial control, and enrichment in common functional annotations of regulated genes and their regulators.

  3. Analytic continuation of quantum Monte Carlo data by stochastic analytical inference.

    PubMed

    Fuchs, Sebastian; Pruschke, Thomas; Jarrell, Mark

    2010-05-01

    We present an algorithm for the analytic continuation of imaginary-time quantum Monte Carlo data which is strictly based on principles of Bayesian statistical inference. Within this framework we are able to obtain an explicit expression for the calculation of a weighted average over possible energy spectra, which can be evaluated by standard Monte Carlo simulations, yielding as by-product also the distribution function as function of the regularization parameter. Our algorithm thus avoids the usual ad hoc assumptions introduced in similar algorithms to fix the regularization parameter. We apply the algorithm to imaginary-time quantum Monte Carlo data and compare the resulting energy spectra with those from a standard maximum-entropy calculation.

  4. Efficient Exact Inference With Loss Augmented Objective in Structured Learning.

    PubMed

    Bauer, Alexander; Nakajima, Shinichi; Muller, Klaus-Robert

    2016-08-19

    Structural support vector machine (SVM) is an elegant approach for building complex and accurate models with structured outputs. However, its applicability relies on the availability of efficient inference algorithms--the state-of-the-art training algorithms repeatedly perform inference to compute a subgradient or to find the most violating configuration. In this paper, we propose an exact inference algorithm for maximizing nondecomposable objectives due to special type of a high-order potential having a decomposable internal structure. As an important application, our method covers the loss augmented inference, which enables the slack and margin scaling formulations of structural SVM with a variety of dissimilarity measures, e.g., Hamming loss, precision and recall, Fβ-loss, intersection over union, and many other functions that can be efficiently computed from the contingency table. We demonstrate the advantages of our approach in natural language parsing and sequence segmentation applications.

  5. Marginal Consistency: Upper-Bounding Partition Functions over Commutative Semirings.

    PubMed

    Werner, Tomás

    2015-07-01

    Many inference tasks in pattern recognition and artificial intelligence lead to partition functions in which addition and multiplication are abstract binary operations forming a commutative semiring. By generalizing max-sum diffusion (one of convergent message passing algorithms for approximate MAP inference in graphical models), we propose an iterative algorithm to upper bound such partition functions over commutative semirings. The iteration of the algorithm is remarkably simple: change any two factors of the partition function such that their product remains the same and their overlapping marginals become equal. In many commutative semirings, repeating this iteration for different pairs of factors converges to a fixed point when the overlapping marginals of every pair of factors coincide. We call this state marginal consistency. During that, an upper bound on the partition function monotonically decreases. This abstract algorithm unifies several existing algorithms, including max-sum diffusion and basic constraint propagation (or local consistency) algorithms in constraint programming. We further construct a hierarchy of marginal consistencies of increasingly higher levels and show than any such level can be enforced by adding identity factors of higher arity (order). Finally, we discuss instances of the framework for several semirings, including the distributive lattice and the max-sum and sum-product semirings.

  6. Transcriptional network inference from functional similarity and expression data: a global supervised approach.

    PubMed

    Ambroise, Jérôme; Robert, Annie; Macq, Benoit; Gala, Jean-Luc

    2012-01-06

    An important challenge in system biology is the inference of biological networks from postgenomic data. Among these biological networks, a gene transcriptional regulatory network focuses on interactions existing between transcription factors (TFs) and and their corresponding target genes. A large number of reverse engineering algorithms were proposed to infer such networks from gene expression profiles, but most current methods have relatively low predictive performances. In this paper, we introduce the novel TNIFSED method (Transcriptional Network Inference from Functional Similarity and Expression Data), that infers a transcriptional network from the integration of correlations and partial correlations of gene expression profiles and gene functional similarities through a supervised classifier. In the current work, TNIFSED was applied to predict the transcriptional network in Escherichia coli and in Saccharomyces cerevisiae, using datasets of 445 and 170 affymetrix arrays, respectively. Using the area under the curve of the receiver operating characteristics and the F-measure as indicators, we showed the predictive performance of TNIFSED to be better than unsupervised state-of-the-art methods. TNIFSED performed slightly worse than the supervised SIRENE algorithm for the target genes identification of the TF having a wide range of yet identified target genes but better for TF having only few identified target genes. Our results indicate that TNIFSED is complementary to the SIRENE algorithm, and particularly suitable to discover target genes of "orphan" TFs.

  7. Approximation Of Multi-Valued Inverse Functions Using Clustering And Sugeno Fuzzy Inference

    NASA Technical Reports Server (NTRS)

    Walden, Maria A.; Bikdash, Marwan; Homaifar, Abdollah

    1998-01-01

    Finding the inverse of a continuous function can be challenging and computationally expensive when the inverse function is multi-valued. Difficulties may be compounded when the function itself is difficult to evaluate. We show that we can use fuzzy-logic approximators such as Sugeno inference systems to compute the inverse on-line. To do so, a fuzzy clustering algorithm can be used in conjunction with a discriminating function to split the function data into branches for the different values of the forward function. These data sets are then fed into a recursive least-squares learning algorithm that finds the proper coefficients of the Sugeno approximators; each Sugeno approximator finds one value of the inverse function. Discussions about the accuracy of the approximation will be included.

  8. Methodology for the inference of gene function from phenotype data.

    PubMed

    Ascensao, Joao A; Dolan, Mary E; Hill, David P; Blake, Judith A

    2014-12-12

    Biomedical ontologies are increasingly instrumental in the advancement of biological research primarily through their use to efficiently consolidate large amounts of data into structured, accessible sets. However, ontology development and usage can be hampered by the segregation of knowledge by domain that occurs due to independent development and use of the ontologies. The ability to infer data associated with one ontology to data associated with another ontology would prove useful in expanding information content and scope. We here focus on relating two ontologies: the Gene Ontology (GO), which encodes canonical gene function, and the Mammalian Phenotype Ontology (MP), which describes non-canonical phenotypes, using statistical methods to suggest GO functional annotations from existing MP phenotype annotations. This work is in contrast to previous studies that have focused on inferring gene function from phenotype primarily through lexical or semantic similarity measures. We have designed and tested a set of algorithms that represents a novel methodology to define rules for predicting gene function by examining the emergent structure and relationships between the gene functions and phenotypes rather than inspecting the terms semantically. The algorithms inspect relationships among multiple phenotype terms to deduce if there are cases where they all arise from a single gene function. We apply this methodology to data about genes in the laboratory mouse that are formally represented in the Mouse Genome Informatics (MGI) resource. From the data, 7444 rule instances were generated from five generalized rules, resulting in 4818 unique GO functional predictions for 1796 genes. We show that our method is capable of inferring high-quality functional annotations from curated phenotype data. As well as creating inferred annotations, our method has the potential to allow for the elucidation of unforeseen, biologically significant associations between gene function and phenotypes that would be overlooked by a semantics-based approach. Future work will include the implementation of the described algorithms for a variety of other model organism databases, taking full advantage of the abundance of available high quality curated data.

  9. Inferring duplications, losses, transfers and incomplete lineage sorting with nonbinary species trees.

    PubMed

    Stolzer, Maureen; Lai, Han; Xu, Minli; Sathaye, Deepa; Vernot, Benjamin; Durand, Dannie

    2012-09-15

    Gene duplication (D), transfer (T), loss (L) and incomplete lineage sorting (I) are crucial to the evolution of gene families and the emergence of novel functions. The history of these events can be inferred via comparison of gene and species trees, a process called reconciliation, yet current reconciliation algorithms model only a subset of these evolutionary processes. We present an algorithm to reconcile a binary gene tree with a nonbinary species tree under a DTLI parsimony criterion. This is the first reconciliation algorithm to capture all four evolutionary processes driving tree incongruence and the first to reconcile non-binary species trees with a transfer model. Our algorithm infers all optimal solutions and reports complete, temporally feasible event histories, giving the gene and species lineages in which each event occurred. It is fixed-parameter tractable, with polytime complexity when the maximum species outdegree is fixed. Application of our algorithms to prokaryotic and eukaryotic data show that use of an incomplete event model has substantial impact on the events inferred and resulting biological conclusions. Our algorithms have been implemented in Notung, a freely available phylogenetic reconciliation software package, available at http://www.cs.cmu.edu/~durand/Notung. mstolzer@andrew.cmu.edu.

  10. A Prize-Collecting Steiner Tree Approach for Transduction Network Inference

    NASA Astrophysics Data System (ADS)

    Bailly-Bechet, Marc; Braunstein, Alfredo; Zecchina, Riccardo

    Into the cell, information from the environment is mainly propagated via signaling pathways which form a transduction network. Here we propose a new algorithm to infer transduction networks from heterogeneous data, using both the protein interaction network and expression datasets. We formulate the inference problem as an optimization task, and develop a message-passing, probabilistic and distributed formalism to solve it. We apply our algorithm to the pheromone response in the baker’s yeast S. cerevisiae. We are able to find the backbone of the known structure of the MAPK cascade of pheromone response, validating our algorithm. More importantly, we make biological predictions about some proteins whose role could be at the interface between pheromone response and other cellular functions.

  11. Learning control of inverted pendulum system by neural network driven fuzzy reasoning: The learning function of NN-driven fuzzy reasoning under changes of reasoning environment

    NASA Technical Reports Server (NTRS)

    Hayashi, Isao; Nomura, Hiroyoshi; Wakami, Noboru

    1991-01-01

    Whereas conventional fuzzy reasonings are associated with tuning problems, which are lack of membership functions and inference rule designs, a neural network driven fuzzy reasoning (NDF) capable of determining membership functions by neural network is formulated. In the antecedent parts of the neural network driven fuzzy reasoning, the optimum membership function is determined by a neural network, while in the consequent parts, an amount of control for each rule is determined by other plural neural networks. By introducing an algorithm of neural network driven fuzzy reasoning, inference rules for making a pendulum stand up from its lowest suspended point are determined for verifying the usefulness of the algorithm.

  12. Analysis of Bioactive Amino Acids from Fish Hydrolysates with a New Bioinformatic Intelligent System Approach.

    PubMed

    Elaziz, Mohamed Abd; Hemdan, Ahmed Monem; Hassanien, AboulElla; Oliva, Diego; Xiong, Shengwu

    2017-09-07

    The current economics of the fish protein industry demand rapid, accurate and expressive prediction algorithms at every step of protein production especially with the challenge of global climate change. This help to predict and analyze functional and nutritional quality then consequently control food allergies in hyper allergic patients. As, it is quite expensive and time-consuming to know these concentrations by the lab experimental tests, especially to conduct large-scale projects. Therefore, this paper introduced a new intelligent algorithm using adaptive neuro-fuzzy inference system based on whale optimization algorithm. This algorithm is used to predict the concentration levels of bioactive amino acids in fish protein hydrolysates at different times during the year. The whale optimization algorithm is used to determine the optimal parameters in adaptive neuro-fuzzy inference system. The results of proposed algorithm are compared with others and it is indicated the higher performance of the proposed algorithm.

  13. Robust functional regression model for marginal mean and subject-specific inferences.

    PubMed

    Cao, Chunzheng; Shi, Jian Qing; Lee, Youngjo

    2017-01-01

    We introduce flexible robust functional regression models, using various heavy-tailed processes, including a Student t-process. We propose efficient algorithms in estimating parameters for the marginal mean inferences and in predicting conditional means as well as interpolation and extrapolation for the subject-specific inferences. We develop bootstrap prediction intervals (PIs) for conditional mean curves. Numerical studies show that the proposed model provides a robust approach against data contamination or distribution misspecification, and the proposed PIs maintain the nominal confidence levels. A real data application is presented as an illustrative example.

  14. Inhomogeneous Poisson process rate function inference from dead-time limited observations.

    PubMed

    Verma, Gunjan; Drost, Robert J

    2017-05-01

    The estimation of an inhomogeneous Poisson process (IHPP) rate function from a set of process observations is an important problem arising in optical communications and a variety of other applications. However, because of practical limitations of detector technology, one is often only able to observe a corrupted version of the original process. In this paper, we consider how inference of the rate function is affected by dead time, a period of time after the detection of an event during which a sensor is insensitive to subsequent IHPP events. We propose a flexible nonparametric Bayesian approach to infer an IHPP rate function given dead-time limited process realizations. Simulation results illustrate the effectiveness of our inference approach and suggest its ability to extend the utility of existing sensor technology by permitting more accurate inference on signals whose observations are dead-time limited. We apply our inference algorithm to experimentally collected optical communications data, demonstrating the practical utility of our approach in the context of channel modeling and validation.

  15. A comparison of algorithms for inference and learning in probabilistic graphical models.

    PubMed

    Frey, Brendan J; Jojic, Nebojsa

    2005-09-01

    Research into methods for reasoning under uncertainty is currently one of the most exciting areas of artificial intelligence, largely because it has recently become possible to record, store, and process large amounts of data. While impressive achievements have been made in pattern classification problems such as handwritten character recognition, face detection, speaker identification, and prediction of gene function, it is even more exciting that researchers are on the verge of introducing systems that can perform large-scale combinatorial analyses of data, decomposing the data into interacting components. For example, computational methods for automatic scene analysis are now emerging in the computer vision community. These methods decompose an input image into its constituent objects, lighting conditions, motion patterns, etc. Two of the main challenges are finding effective representations and models in specific applications and finding efficient algorithms for inference and learning in these models. In this paper, we advocate the use of graph-based probability models and their associated inference and learning algorithms. We review exact techniques and various approximate, computationally efficient techniques, including iterated conditional modes, the expectation maximization (EM) algorithm, Gibbs sampling, the mean field method, variational techniques, structured variational techniques and the sum-product algorithm ("loopy" belief propagation). We describe how each technique can be applied in a vision model of multiple, occluding objects and contrast the behaviors and performances of the techniques using a unifying cost function, free energy.

  16. Probabilistic inference using linear Gaussian importance sampling for hybrid Bayesian networks

    NASA Astrophysics Data System (ADS)

    Sun, Wei; Chang, K. C.

    2005-05-01

    Probabilistic inference for Bayesian networks is in general NP-hard using either exact algorithms or approximate methods. However, for very complex networks, only the approximate methods such as stochastic sampling could be used to provide a solution given any time constraint. There are several simulation methods currently available. They include logic sampling (the first proposed stochastic method for Bayesian networks, the likelihood weighting algorithm) the most commonly used simulation method because of its simplicity and efficiency, the Markov blanket scoring method, and the importance sampling algorithm. In this paper, we first briefly review and compare these available simulation methods, then we propose an improved importance sampling algorithm called linear Gaussian importance sampling algorithm for general hybrid model (LGIS). LGIS is aimed for hybrid Bayesian networks consisting of both discrete and continuous random variables with arbitrary distributions. It uses linear function and Gaussian additive noise to approximate the true conditional probability distribution for continuous variable given both its parents and evidence in a Bayesian network. One of the most important features of the newly developed method is that it can adaptively learn the optimal important function from the previous samples. We test the inference performance of LGIS using a 16-node linear Gaussian model and a 6-node general hybrid model. The performance comparison with other well-known methods such as Junction tree (JT) and likelihood weighting (LW) shows that LGIS-GHM is very promising.

  17. An Improved Binary Differential Evolution Algorithm to Infer Tumor Phylogenetic Trees.

    PubMed

    Liang, Ying; Liao, Bo; Zhu, Wen

    2017-01-01

    Tumourigenesis is a mutation accumulation process, which is likely to start with a mutated founder cell. The evolutionary nature of tumor development makes phylogenetic models suitable for inferring tumor evolution through genetic variation data. Copy number variation (CNV) is the major genetic marker of the genome with more genes, disease loci, and functional elements involved. Fluorescence in situ hybridization (FISH) accurately measures multiple gene copy number of hundreds of single cells. We propose an improved binary differential evolution algorithm, BDEP, to infer tumor phylogenetic tree based on FISH platform. The topology analysis of tumor progression tree shows that the pathway of tumor subcell expansion varies greatly during different stages of tumor formation. And the classification experiment shows that tree-based features are better than data-based features in distinguishing tumor. The constructed phylogenetic trees have great performance in characterizing tumor development process, which outperforms other similar algorithms.

  18. Inferring consistent functional interaction patterns from natural stimulus FMRI data

    PubMed Central

    Sun, Jiehuan; Hu, Xintao; Huang, Xiu; Liu, Yang; Li, Kaiming; Li, Xiang; Han, Junwei; Guo, Lei

    2014-01-01

    There has been increasing interest in how the human brain responds to natural stimulus such as video watching in the neuroimaging field. Along this direction, this paper presents our effort in inferring consistent and reproducible functional interaction patterns under natural stimulus of video watching among known functional brain regions identified by task-based fMRI. Then, we applied and compared four statistical approaches, including Bayesian network modeling with searching algorithms: greedy equivalence search (GES), Peter and Clark (PC) analysis, independent multiple greedy equivalence search (IMaGES), and the commonly used Granger causality analysis (GCA), to infer consistent and reproducible functional interaction patterns among these brain regions. It is interesting that a number of reliable and consistent functional interaction patterns were identified by the GES, PC and IMaGES algorithms in different participating subjects when they watched multiple video shots of the same semantic category. These interaction patterns are meaningful given current neuroscience knowledge and are reasonably reproducible across different brains and video shots. In particular, these consistent functional interaction patterns are supported by structural connections derived from diffusion tensor imaging (DTI) data, suggesting the structural underpinnings of consistent functional interactions. Our work demonstrates that specific consistent patterns of functional interactions among relevant brain regions might reflect the brain's fundamental mechanisms of online processing and comprehension of video messages. PMID:22440644

  19. Theory of mind broad and narrow: reasoning about social exchange engages ToM areas, precautionary reasoning does not.

    PubMed

    Ermer, Elsa; Guerin, Scott A; Cosmides, Leda; Tooby, John; Miller, Michael B

    2006-01-01

    Baron-Cohen (1995) proposed that the theory of mind (ToM) inference system evolved to promote strategic social interaction. Social exchange--a form of co-operation for mutual benefit--involves strategic social interaction and requires ToM inferences about the contents of other individuals' mental states, especially their desires, goals, and intentions. There are behavioral and neuropsychological dissociations between reasoning about social exchange and reasoning about equivalent problems tapping other, more general content domains. It has therefore been proposed that social exchange behavior is regulated by social contract algorithms: a domain-specific inference system that is functionally specialized for reasoning about social exchange. We report an fMRI study using the Wason selection task that provides further support for this hypothesis. Precautionary rules share so many properties with social exchange rules--they are conditional, deontic, and involve subjective utilities--that most reasoning theories claim they are processed by the same neurocomputational machinery. Nevertheless, neuroimaging shows that reasoning about social exchange activates brain areas not activated by reasoning about precautionary rules, and vice versa. As predicted, neural correlates of ToM (anterior and posterior temporal cortex) were activated when subjects interpreted social exchange rules, but not precautionary rules (where ToM inferences are unnecessary). We argue that the interaction between ToM and social contract algorithms can be reciprocal: social contract algorithms requires ToM inferences, but their functional logic also allows ToM inferences to be made. By considering interactions between ToM in the narrower sense (belief-desire reasoning) and all the social inference systems that create the logic of human social interaction--ones that enable as well as use inferences about the content of mental states--a broader conception of ToM may emerge: a computational model embodying a Theory of Human Nature (ToHN).

  20. A prior-based integrative framework for functional transcriptional regulatory network inference

    PubMed Central

    Siahpirani, Alireza F.

    2017-01-01

    Abstract Transcriptional regulatory networks specify regulatory proteins controlling the context-specific expression levels of genes. Inference of genome-wide regulatory networks is central to understanding gene regulation, but remains an open challenge. Expression-based network inference is among the most popular methods to infer regulatory networks, however, networks inferred from such methods have low overlap with experimentally derived (e.g. ChIP-chip and transcription factor (TF) knockouts) networks. Currently we have a limited understanding of this discrepancy. To address this gap, we first develop a regulatory network inference algorithm, based on probabilistic graphical models, to integrate expression with auxiliary datasets supporting a regulatory edge. Second, we comprehensively analyze our and other state-of-the-art methods on different expression perturbation datasets. Networks inferred by integrating sequence-specific motifs with expression have substantially greater agreement with experimentally derived networks, while remaining more predictive of expression than motif-based networks. Our analysis suggests natural genetic variation as the most informative perturbation for network inference, and, identifies core TFs whose targets are predictable from expression. Multiple reasons make the identification of targets of other TFs difficult, including network architecture and insufficient variation of TF mRNA level. Finally, we demonstrate the utility of our inference algorithm to infer stress-specific regulatory networks and for regulator prioritization. PMID:27794550

  1. MetaPIGA v2.0: maximum likelihood large phylogeny estimation using the metapopulation genetic algorithm and other stochastic heuristics.

    PubMed

    Helaers, Raphaël; Milinkovitch, Michel C

    2010-07-15

    The development, in the last decade, of stochastic heuristics implemented in robust application softwares has made large phylogeny inference a key step in most comparative studies involving molecular sequences. Still, the choice of a phylogeny inference software is often dictated by a combination of parameters not related to the raw performance of the implemented algorithm(s) but rather by practical issues such as ergonomics and/or the availability of specific functionalities. Here, we present MetaPIGA v2.0, a robust implementation of several stochastic heuristics for large phylogeny inference (under maximum likelihood), including a Simulated Annealing algorithm, a classical Genetic Algorithm, and the Metapopulation Genetic Algorithm (metaGA) together with complex substitution models, discrete Gamma rate heterogeneity, and the possibility to partition data. MetaPIGA v2.0 also implements the Likelihood Ratio Test, the Akaike Information Criterion, and the Bayesian Information Criterion for automated selection of substitution models that best fit the data. Heuristics and substitution models are highly customizable through manual batch files and command line processing. However, MetaPIGA v2.0 also offers an extensive graphical user interface for parameters setting, generating and running batch files, following run progress, and manipulating result trees. MetaPIGA v2.0 uses standard formats for data sets and trees, is platform independent, runs in 32 and 64-bits systems, and takes advantage of multiprocessor and multicore computers. The metaGA resolves the major problem inherent to classical Genetic Algorithms by maintaining high inter-population variation even under strong intra-population selection. Implementation of the metaGA together with additional stochastic heuristics into a single software will allow rigorous optimization of each heuristic as well as a meaningful comparison of performances among these algorithms. MetaPIGA v2.0 gives access both to high customization for the phylogeneticist, as well as to an ergonomic interface and functionalities assisting the non-specialist for sound inference of large phylogenetic trees using nucleotide sequences. MetaPIGA v2.0 and its extensive user-manual are freely available to academics at http://www.metapiga.org.

  2. MetaPIGA v2.0: maximum likelihood large phylogeny estimation using the metapopulation genetic algorithm and other stochastic heuristics

    PubMed Central

    2010-01-01

    Background The development, in the last decade, of stochastic heuristics implemented in robust application softwares has made large phylogeny inference a key step in most comparative studies involving molecular sequences. Still, the choice of a phylogeny inference software is often dictated by a combination of parameters not related to the raw performance of the implemented algorithm(s) but rather by practical issues such as ergonomics and/or the availability of specific functionalities. Results Here, we present MetaPIGA v2.0, a robust implementation of several stochastic heuristics for large phylogeny inference (under maximum likelihood), including a Simulated Annealing algorithm, a classical Genetic Algorithm, and the Metapopulation Genetic Algorithm (metaGA) together with complex substitution models, discrete Gamma rate heterogeneity, and the possibility to partition data. MetaPIGA v2.0 also implements the Likelihood Ratio Test, the Akaike Information Criterion, and the Bayesian Information Criterion for automated selection of substitution models that best fit the data. Heuristics and substitution models are highly customizable through manual batch files and command line processing. However, MetaPIGA v2.0 also offers an extensive graphical user interface for parameters setting, generating and running batch files, following run progress, and manipulating result trees. MetaPIGA v2.0 uses standard formats for data sets and trees, is platform independent, runs in 32 and 64-bits systems, and takes advantage of multiprocessor and multicore computers. Conclusions The metaGA resolves the major problem inherent to classical Genetic Algorithms by maintaining high inter-population variation even under strong intra-population selection. Implementation of the metaGA together with additional stochastic heuristics into a single software will allow rigorous optimization of each heuristic as well as a meaningful comparison of performances among these algorithms. MetaPIGA v2.0 gives access both to high customization for the phylogeneticist, as well as to an ergonomic interface and functionalities assisting the non-specialist for sound inference of large phylogenetic trees using nucleotide sequences. MetaPIGA v2.0 and its extensive user-manual are freely available to academics at http://www.metapiga.org. PMID:20633263

  3. Inferring Boolean network states from partial information

    PubMed Central

    2013-01-01

    Networks of molecular interactions regulate key processes in living cells. Therefore, understanding their functionality is a high priority in advancing biological knowledge. Boolean networks are often used to describe cellular networks mathematically and are fitted to experimental datasets. The fitting often results in ambiguities since the interpretation of the measurements is not straightforward and since the data contain noise. In order to facilitate a more reliable mapping between datasets and Boolean networks, we develop an algorithm that infers network trajectories from a dataset distorted by noise. We analyze our algorithm theoretically and demonstrate its accuracy using simulation and microarray expression data. PMID:24006954

  4. Inferring Gene Regulatory Networks by Singular Value Decomposition and Gravitation Field Algorithm

    PubMed Central

    Zheng, Ming; Wu, Jia-nan; Huang, Yan-xin; Liu, Gui-xia; Zhou, You; Zhou, Chun-guang

    2012-01-01

    Reconstruction of gene regulatory networks (GRNs) is of utmost interest and has become a challenge computational problem in system biology. However, every existing inference algorithm from gene expression profiles has its own advantages and disadvantages. In particular, the effectiveness and efficiency of every previous algorithm is not high enough. In this work, we proposed a novel inference algorithm from gene expression data based on differential equation model. In this algorithm, two methods were included for inferring GRNs. Before reconstructing GRNs, singular value decomposition method was used to decompose gene expression data, determine the algorithm solution space, and get all candidate solutions of GRNs. In these generated family of candidate solutions, gravitation field algorithm was modified to infer GRNs, used to optimize the criteria of differential equation model, and search the best network structure result. The proposed algorithm is validated on both the simulated scale-free network and real benchmark gene regulatory network in networks database. Both the Bayesian method and the traditional differential equation model were also used to infer GRNs, and the results were used to compare with the proposed algorithm in our work. And genetic algorithm and simulated annealing were also used to evaluate gravitation field algorithm. The cross-validation results confirmed the effectiveness of our algorithm, which outperforms significantly other previous algorithms. PMID:23226565

  5. Inference of gene regulatory networks from time series by Tsallis entropy

    PubMed Central

    2011-01-01

    Background The inference of gene regulatory networks (GRNs) from large-scale expression profiles is one of the most challenging problems of Systems Biology nowadays. Many techniques and models have been proposed for this task. However, it is not generally possible to recover the original topology with great accuracy, mainly due to the short time series data in face of the high complexity of the networks and the intrinsic noise of the expression measurements. In order to improve the accuracy of GRNs inference methods based on entropy (mutual information), a new criterion function is here proposed. Results In this paper we introduce the use of generalized entropy proposed by Tsallis, for the inference of GRNs from time series expression profiles. The inference process is based on a feature selection approach and the conditional entropy is applied as criterion function. In order to assess the proposed methodology, the algorithm is applied to recover the network topology from temporal expressions generated by an artificial gene network (AGN) model as well as from the DREAM challenge. The adopted AGN is based on theoretical models of complex networks and its gene transference function is obtained from random drawing on the set of possible Boolean functions, thus creating its dynamics. On the other hand, DREAM time series data presents variation of network size and its topologies are based on real networks. The dynamics are generated by continuous differential equations with noise and perturbation. By adopting both data sources, it is possible to estimate the average quality of the inference with respect to different network topologies, transfer functions and network sizes. Conclusions A remarkable improvement of accuracy was observed in the experimental results by reducing the number of false connections in the inferred topology by the non-Shannon entropy. The obtained best free parameter of the Tsallis entropy was on average in the range 2.5 ≤ q ≤ 3.5 (hence, subextensive entropy), which opens new perspectives for GRNs inference methods based on information theory and for investigation of the nonextensivity of such networks. The inference algorithm and criterion function proposed here were implemented and included in the DimReduction software, which is freely available at http://sourceforge.net/projects/dimreduction and http://code.google.com/p/dimreduction/. PMID:21545720

  6. Inferring evolution of gene duplicates using probabilistic models and nonparametric belief propagation.

    PubMed

    Zeng, Jia; Hannenhalli, Sridhar

    2013-01-01

    Gene duplication, followed by functional evolution of duplicate genes, is a primary engine of evolutionary innovation. In turn, gene expression evolution is a critical component of overall functional evolution of paralogs. Inferring evolutionary history of gene expression among paralogs is therefore a problem of considerable interest. It also represents significant challenges. The standard approaches of evolutionary reconstruction assume that at an internal node of the duplication tree, the two duplicates evolve independently. However, because of various selection pressures functional evolution of the two paralogs may be coupled. The coupling of paralog evolution corresponds to three major fates of gene duplicates: subfunctionalization (SF), conserved function (CF) or neofunctionalization (NF). Quantitative analysis of these fates is of great interest and clearly influences evolutionary inference of expression. These two interrelated problems of inferring gene expression and evolutionary fates of gene duplicates have not been studied together previously and motivate the present study. Here we propose a novel probabilistic framework and algorithm to simultaneously infer (i) ancestral gene expression and (ii) the likely fate (SF, NF, CF) at each duplication event during the evolution of gene family. Using tissue-specific gene expression data, we develop a nonparametric belief propagation (NBP) algorithm to predict the ancestral expression level as a proxy for function, and describe a novel probabilistic model that relates the predicted and known expression levels to the possible evolutionary fates. We validate our model using simulation and then apply it to a genome-wide set of gene duplicates in human. Our results suggest that SF tends to be more frequent at the earlier stage of gene family expansion, while NF occurs more frequently later on.

  7. Community-based benchmarking improves spike rate inference from two-photon calcium imaging data.

    PubMed

    Berens, Philipp; Freeman, Jeremy; Deneux, Thomas; Chenkov, Nikolay; McColgan, Thomas; Speiser, Artur; Macke, Jakob H; Turaga, Srinivas C; Mineault, Patrick; Rupprecht, Peter; Gerhard, Stephan; Friedrich, Rainer W; Friedrich, Johannes; Paninski, Liam; Pachitariu, Marius; Harris, Kenneth D; Bolte, Ben; Machado, Timothy A; Ringach, Dario; Stone, Jasmine; Rogerson, Luke E; Sofroniew, Nicolas J; Reimer, Jacob; Froudarakis, Emmanouil; Euler, Thomas; Román Rosón, Miroslav; Theis, Lucas; Tolias, Andreas S; Bethge, Matthias

    2018-05-01

    In recent years, two-photon calcium imaging has become a standard tool to probe the function of neural circuits and to study computations in neuronal populations. However, the acquired signal is only an indirect measurement of neural activity due to the comparatively slow dynamics of fluorescent calcium indicators. Different algorithms for estimating spike rates from noisy calcium measurements have been proposed in the past, but it is an open question how far performance can be improved. Here, we report the results of the spikefinder challenge, launched to catalyze the development of new spike rate inference algorithms through crowd-sourcing. We present ten of the submitted algorithms which show improved performance compared to previously evaluated methods. Interestingly, the top-performing algorithms are based on a wide range of principles from deep neural networks to generative models, yet provide highly correlated estimates of the neural activity. The competition shows that benchmark challenges can drive algorithmic developments in neuroscience.

  8. An Energy-Efficient and Scalable Deep Learning/Inference Processor With Tetra-Parallel MIMD Architecture for Big Data Applications.

    PubMed

    Park, Seong-Wook; Park, Junyoung; Bong, Kyeongryeol; Shin, Dongjoo; Lee, Jinmook; Choi, Sungpill; Yoo, Hoi-Jun

    2015-12-01

    Deep Learning algorithm is widely used for various pattern recognition applications such as text recognition, object recognition and action recognition because of its best-in-class recognition accuracy compared to hand-crafted algorithm and shallow learning based algorithms. Long learning time caused by its complex structure, however, limits its usage only in high-cost servers or many-core GPU platforms so far. On the other hand, the demand on customized pattern recognition within personal devices will grow gradually as more deep learning applications will be developed. This paper presents a SoC implementation to enable deep learning applications to run with low cost platforms such as mobile or portable devices. Different from conventional works which have adopted massively-parallel architecture, this work adopts task-flexible architecture and exploits multiple parallelism to cover complex functions of convolutional deep belief network which is one of popular deep learning/inference algorithms. In this paper, we implement the most energy-efficient deep learning and inference processor for wearable system. The implemented 2.5 mm × 4.0 mm deep learning/inference processor is fabricated using 65 nm 8-metal CMOS technology for a battery-powered platform with real-time deep inference and deep learning operation. It consumes 185 mW average power, and 213.1 mW peak power at 200 MHz operating frequency and 1.2 V supply voltage. It achieves 411.3 GOPS peak performance and 1.93 TOPS/W energy efficiency, which is 2.07× higher than the state-of-the-art.

  9. Receiver function deconvolution using transdimensional hierarchical Bayesian inference

    NASA Astrophysics Data System (ADS)

    Kolb, J. M.; Lekić, V.

    2014-06-01

    Teleseismic waves can convert from shear to compressional (Sp) or compressional to shear (Ps) across impedance contrasts in the subsurface. Deconvolving the parent waveforms (P for Ps or S for Sp) from the daughter waveforms (S for Ps or P for Sp) generates receiver functions which can be used to analyse velocity structure beneath the receiver. Though a variety of deconvolution techniques have been developed, they are all adversely affected by background and signal-generated noise. In order to take into account the unknown noise characteristics, we propose a method based on transdimensional hierarchical Bayesian inference in which both the noise magnitude and noise spectral character are parameters in calculating the likelihood probability distribution. We use a reversible-jump implementation of a Markov chain Monte Carlo algorithm to find an ensemble of receiver functions whose relative fits to the data have been calculated while simultaneously inferring the values of the noise parameters. Our noise parametrization is determined from pre-event noise so that it approximates observed noise characteristics. We test the algorithm on synthetic waveforms contaminated with noise generated from a covariance matrix obtained from observed noise. We show that the method retrieves easily interpretable receiver functions even in the presence of high noise levels. We also show that we can obtain useful estimates of noise amplitude and frequency content. Analysis of the ensemble solutions produced by our method can be used to quantify the uncertainties associated with individual receiver functions as well as with individual features within them, providing an objective way for deciding which features warrant geological interpretation. This method should make possible more robust inferences on subsurface structure using receiver function analysis, especially in areas of poor data coverage or under noisy station conditions.

  10. al3c: high-performance software for parameter inference using Approximate Bayesian Computation.

    PubMed

    Stram, Alexander H; Marjoram, Paul; Chen, Gary K

    2015-11-01

    The development of Approximate Bayesian Computation (ABC) algorithms for parameter inference which are both computationally efficient and scalable in parallel computing environments is an important area of research. Monte Carlo rejection sampling, a fundamental component of ABC algorithms, is trivial to distribute over multiple processors but is inherently inefficient. While development of algorithms such as ABC Sequential Monte Carlo (ABC-SMC) help address the inherent inefficiencies of rejection sampling, such approaches are not as easily scaled on multiple processors. As a result, current Bayesian inference software offerings that use ABC-SMC lack the ability to scale in parallel computing environments. We present al3c, a C++ framework for implementing ABC-SMC in parallel. By requiring only that users define essential functions such as the simulation model and prior distribution function, al3c abstracts the user from both the complexities of parallel programming and the details of the ABC-SMC algorithm. By using the al3c framework, the user is able to scale the ABC-SMC algorithm in parallel computing environments for his or her specific application, with minimal programming overhead. al3c is offered as a static binary for Linux and OS-X computing environments. The user completes an XML configuration file and C++ plug-in template for the specific application, which are used by al3c to obtain the desired results. Users can download the static binaries, source code, reference documentation and examples (including those in this article) by visiting https://github.com/ahstram/al3c. astram@usc.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  11. NIFTY - Numerical Information Field Theory. A versatile PYTHON library for signal inference

    NASA Astrophysics Data System (ADS)

    Selig, M.; Bell, M. R.; Junklewitz, H.; Oppermann, N.; Reinecke, M.; Greiner, M.; Pachajoa, C.; Enßlin, T. A.

    2013-06-01

    NIFTy (Numerical Information Field Theory) is a software package designed to enable the development of signal inference algorithms that operate regardless of the underlying spatial grid and its resolution. Its object-oriented framework is written in Python, although it accesses libraries written in Cython, C++, and C for efficiency. NIFTy offers a toolkit that abstracts discretized representations of continuous spaces, fields in these spaces, and operators acting on fields into classes. Thereby, the correct normalization of operations on fields is taken care of automatically without concerning the user. This allows for an abstract formulation and programming of inference algorithms, including those derived within information field theory. Thus, NIFTy permits its user to rapidly prototype algorithms in 1D, and then apply the developed code in higher-dimensional settings of real world problems. The set of spaces on which NIFTy operates comprises point sets, n-dimensional regular grids, spherical spaces, their harmonic counterparts, and product spaces constructed as combinations of those. The functionality and diversity of the package is demonstrated by a Wiener filter code example that successfully runs without modification regardless of the space on which the inference problem is defined. NIFTy homepage http://www.mpa-garching.mpg.de/ift/nifty/; Excerpts of this paper are part of the NIFTy source code and documentation.

  12. Inference of Gene Regulatory Networks Using Bayesian Nonparametric Regression and Topology Information.

    PubMed

    Fan, Yue; Wang, Xiao; Peng, Qinke

    2017-01-01

    Gene regulatory networks (GRNs) play an important role in cellular systems and are important for understanding biological processes. Many algorithms have been developed to infer the GRNs. However, most algorithms only pay attention to the gene expression data but do not consider the topology information in their inference process, while incorporating this information can partially compensate for the lack of reliable expression data. Here we develop a Bayesian group lasso with spike and slab priors to perform gene selection and estimation for nonparametric models. B-spline basis functions are used to capture the nonlinear relationships flexibly and penalties are used to avoid overfitting. Further, we incorporate the topology information into the Bayesian method as a prior. We present the application of our method on DREAM3 and DREAM4 datasets and two real biological datasets. The results show that our method performs better than existing methods and the topology information prior can improve the result.

  13. Inference of neuronal network spike dynamics and topology from calcium imaging data

    PubMed Central

    Lütcke, Henry; Gerhard, Felipe; Zenke, Friedemann; Gerstner, Wulfram; Helmchen, Fritjof

    2013-01-01

    Two-photon calcium imaging enables functional analysis of neuronal circuits by inferring action potential (AP) occurrence (“spike trains”) from cellular fluorescence signals. It remains unclear how experimental parameters such as signal-to-noise ratio (SNR) and acquisition rate affect spike inference and whether additional information about network structure can be extracted. Here we present a simulation framework for quantitatively assessing how well spike dynamics and network topology can be inferred from noisy calcium imaging data. For simulated AP-evoked calcium transients in neocortical pyramidal cells, we analyzed the quality of spike inference as a function of SNR and data acquisition rate using a recently introduced peeling algorithm. Given experimentally attainable values of SNR and acquisition rate, neural spike trains could be reconstructed accurately and with up to millisecond precision. We then applied statistical neuronal network models to explore how remaining uncertainties in spike inference affect estimates of network connectivity and topological features of network organization. We define the experimental conditions suitable for inferring whether the network has a scale-free structure and determine how well hub neurons can be identified. Our findings provide a benchmark for future calcium imaging studies that aim to reliably infer neuronal network properties. PMID:24399936

  14. Functional equivalency inferred from "authoritative sources" in networks of homologous proteins.

    PubMed

    Natarajan, Shreedhar; Jakobsson, Eric

    2009-06-12

    A one-on-one mapping of protein functionality across different species is a critical component of comparative analysis. This paper presents a heuristic algorithm for discovering the Most Likely Functional Counterparts (MoLFunCs) of a protein, based on simple concepts from network theory. A key feature of our algorithm is utilization of the user's knowledge to assign high confidence to selected functional identification. We show use of the algorithm to retrieve functional equivalents for 7 membrane proteins, from an exploration of almost 40 genomes form multiple online resources. We verify the functional equivalency of our dataset through a series of tests that include sequence, structure and function comparisons. Comparison is made to the OMA methodology, which also identifies one-on-one mapping between proteins from different species. Based on that comparison, we believe that incorporation of user's knowledge as a key aspect of the technique adds value to purely statistical formal methods.

  15. Functional Equivalency Inferred from “Authoritative Sources” in Networks of Homologous Proteins

    PubMed Central

    Natarajan, Shreedhar; Jakobsson, Eric

    2009-01-01

    A one-on-one mapping of protein functionality across different species is a critical component of comparative analysis. This paper presents a heuristic algorithm for discovering the Most Likely Functional Counterparts (MoLFunCs) of a protein, based on simple concepts from network theory. A key feature of our algorithm is utilization of the user's knowledge to assign high confidence to selected functional identification. We show use of the algorithm to retrieve functional equivalents for 7 membrane proteins, from an exploration of almost 40 genomes form multiple online resources. We verify the functional equivalency of our dataset through a series of tests that include sequence, structure and function comparisons. Comparison is made to the OMA methodology, which also identifies one-on-one mapping between proteins from different species. Based on that comparison, we believe that incorporation of user's knowledge as a key aspect of the technique adds value to purely statistical formal methods. PMID:19521530

  16. A Self-Organizing State-Space-Model Approach for Parameter Estimation in Hodgkin-Huxley-Type Models of Single Neurons

    PubMed Central

    Vavoulis, Dimitrios V.; Straub, Volko A.; Aston, John A. D.; Feng, Jianfeng

    2012-01-01

    Traditional approaches to the problem of parameter estimation in biophysical models of neurons and neural networks usually adopt a global search algorithm (for example, an evolutionary algorithm), often in combination with a local search method (such as gradient descent) in order to minimize the value of a cost function, which measures the discrepancy between various features of the available experimental data and model output. In this study, we approach the problem of parameter estimation in conductance-based models of single neurons from a different perspective. By adopting a hidden-dynamical-systems formalism, we expressed parameter estimation as an inference problem in these systems, which can then be tackled using a range of well-established statistical inference methods. The particular method we used was Kitagawa's self-organizing state-space model, which was applied on a number of Hodgkin-Huxley-type models using simulated or actual electrophysiological data. We showed that the algorithm can be used to estimate a large number of parameters, including maximal conductances, reversal potentials, kinetics of ionic currents, measurement and intrinsic noise, based on low-dimensional experimental data and sufficiently informative priors in the form of pre-defined constraints imposed on model parameters. The algorithm remained operational even when very noisy experimental data were used. Importantly, by combining the self-organizing state-space model with an adaptive sampling algorithm akin to the Covariance Matrix Adaptation Evolution Strategy, we achieved a significant reduction in the variance of parameter estimates. The algorithm did not require the explicit formulation of a cost function and it was straightforward to apply on compartmental models and multiple data sets. Overall, the proposed methodology is particularly suitable for resolving high-dimensional inference problems based on noisy electrophysiological data and, therefore, a potentially useful tool in the construction of biophysical neuron models. PMID:22396632

  17. cosmoabc: Likelihood-free inference for cosmology

    NASA Astrophysics Data System (ADS)

    Ishida, Emille E. O.; Vitenti, Sandro D. P.; Penna-Lima, Mariana; Trindade, Arlindo M.; Cisewski, Jessi; M.; de Souza, Rafael; Cameron, Ewan; Busti, Vinicius C.

    2015-05-01

    Approximate Bayesian Computation (ABC) enables parameter inference for complex physical systems in cases where the true likelihood function is unknown, unavailable, or computationally too expensive. It relies on the forward simulation of mock data and comparison between observed and synthetic catalogs. cosmoabc is a Python Approximate Bayesian Computation (ABC) sampler featuring a Population Monte Carlo variation of the original ABC algorithm, which uses an adaptive importance sampling scheme. The code can be coupled to an external simulator to allow incorporation of arbitrary distance and prior functions. When coupled with the numcosmo library, it has been used to estimate posterior probability distributions over cosmological parameters based on measurements of galaxy clusters number counts without computing the likelihood function.

  18. Reveal, A General Reverse Engineering Algorithm for Inference of Genetic Network Architectures

    NASA Technical Reports Server (NTRS)

    Liang, Shoudan; Fuhrman, Stefanie; Somogyi, Roland

    1998-01-01

    Given the immanent gene expression mapping covering whole genomes during development, health and disease, we seek computational methods to maximize functional inference from such large data sets. Is it possible, in principle, to completely infer a complex regulatory network architecture from input/output patterns of its variables? We investigated this possibility using binary models of genetic networks. Trajectories, or state transition tables of Boolean nets, resemble time series of gene expression. By systematically analyzing the mutual information between input states and output states, one is able to infer the sets of input elements controlling each element or gene in the network. This process is unequivocal and exact for complete state transition tables. We implemented this REVerse Engineering ALgorithm (REVEAL) in a C program, and found the problem to be tractable within the conditions tested so far. For n = 50 (elements) and k = 3 (inputs per element), the analysis of incomplete state transition tables (100 state transition pairs out of a possible 10(exp 15)) reliably produced the original rule and wiring sets. While this study is limited to synchronous Boolean networks, the algorithm is generalizable to include multi-state models, essentially allowing direct application to realistic biological data sets. The ability to adequately solve the inverse problem may enable in-depth analysis of complex dynamic systems in biology and other fields.

  19. Stan : A Probabilistic Programming Language

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Carpenter, Bob; Gelman, Andrew; Hoffman, Matthew D.

    Stan is a probabilistic programming language for specifying statistical models. A Stan program imperatively defines a log probability function over parameters conditioned on specified data and constants. As of version 2.14.0, Stan provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm. Stan is also a platform for computing log densities and their gradients and Hessians, which can be used in alternative algorithms such as variational Bayes, expectationmore » propagation, and marginal inference using approximate integration. To this end, Stan is set up so that the densities, gradients, and Hessians, along with intermediate quantities of the algorithm such as acceptance probabilities, are easily accessible. Stan can also be called from the command line using the cmdstan package, through R using the rstan package, and through Python using the pystan package. All three interfaces support sampling and optimization-based inference with diagnostics and posterior analysis. rstan and pystan also provide access to log probabilities, gradients, Hessians, parameter transforms, and specialized plotting.« less

  20. Stan : A Probabilistic Programming Language

    DOE PAGES

    Carpenter, Bob; Gelman, Andrew; Hoffman, Matthew D.; ...

    2017-01-01

    Stan is a probabilistic programming language for specifying statistical models. A Stan program imperatively defines a log probability function over parameters conditioned on specified data and constants. As of version 2.14.0, Stan provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm. Stan is also a platform for computing log densities and their gradients and Hessians, which can be used in alternative algorithms such as variational Bayes, expectationmore » propagation, and marginal inference using approximate integration. To this end, Stan is set up so that the densities, gradients, and Hessians, along with intermediate quantities of the algorithm such as acceptance probabilities, are easily accessible. Stan can also be called from the command line using the cmdstan package, through R using the rstan package, and through Python using the pystan package. All three interfaces support sampling and optimization-based inference with diagnostics and posterior analysis. rstan and pystan also provide access to log probabilities, gradients, Hessians, parameter transforms, and specialized plotting.« less

  1. Win-Stay, Lose-Sample: a simple sequential algorithm for approximating Bayesian inference.

    PubMed

    Bonawitz, Elizabeth; Denison, Stephanie; Gopnik, Alison; Griffiths, Thomas L

    2014-11-01

    People can behave in a way that is consistent with Bayesian models of cognition, despite the fact that performing exact Bayesian inference is computationally challenging. What algorithms could people be using to make this possible? We show that a simple sequential algorithm "Win-Stay, Lose-Sample", inspired by the Win-Stay, Lose-Shift (WSLS) principle, can be used to approximate Bayesian inference. We investigate the behavior of adults and preschoolers on two causal learning tasks to test whether people might use a similar algorithm. These studies use a "mini-microgenetic method", investigating how people sequentially update their beliefs as they encounter new evidence. Experiment 1 investigates a deterministic causal learning scenario and Experiments 2 and 3 examine how people make inferences in a stochastic scenario. The behavior of adults and preschoolers in these experiments is consistent with our Bayesian version of the WSLS principle. This algorithm provides both a practical method for performing Bayesian inference and a new way to understand people's judgments. Copyright © 2014 Elsevier Inc. All rights reserved.

  2. A Coalitional Game for Distributed Inference in Sensor Networks With Dependent Observations

    NASA Astrophysics Data System (ADS)

    He, Hao; Varshney, Pramod K.

    2016-04-01

    We consider the problem of collaborative inference in a sensor network with heterogeneous and statistically dependent sensor observations. Each sensor aims to maximize its inference performance by forming a coalition with other sensors and sharing information within the coalition. It is proved that the inference performance is a nondecreasing function of the coalition size. However, in an energy constrained network, the energy consumption of inter-sensor communication also increases with increasing coalition size, which discourages the formation of the grand coalition (the set of all sensors). In this paper, the formation of non-overlapping coalitions with statistically dependent sensors is investigated under a specific communication constraint. We apply a game theoretical approach to fully explore and utilize the information contained in the spatial dependence among sensors to maximize individual sensor performance. Before formulating the distributed inference problem as a coalition formation game, we first quantify the gain and loss in forming a coalition by introducing the concepts of diversity gain and redundancy loss for both estimation and detection problems. These definitions, enabled by the statistical theory of copulas, allow us to characterize the influence of statistical dependence among sensor observations on inference performance. An iterative algorithm based on merge-and-split operations is proposed for the solution and the stability of the proposed algorithm is analyzed. Numerical results are provided to demonstrate the superiority of our proposed game theoretical approach.

  3. Approximate inference on planar graphs using loop calculus and belief progagation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chertkov, Michael; Gomez, Vicenc; Kappen, Hilbert

    We introduce novel results for approximate inference on planar graphical models using the loop calculus framework. The loop calculus (Chertkov and Chernyak, 2006b) allows to express the exact partition function Z of a graphical model as a finite sum of terms that can be evaluated once the belief propagation (BP) solution is known. In general, full summation over all correction terms is intractable. We develop an algorithm for the approach presented in Chertkov et al. (2008) which represents an efficient truncation scheme on planar graphs and a new representation of the series in terms of Pfaffians of matrices. We analyzemore » in detail both the loop series and the Pfaffian series for models with binary variables and pairwise interactions, and show that the first term of the Pfaffian series can provide very accurate approximations. The algorithm outperforms previous truncation schemes of the loop series and is competitive with other state-of-the-art methods for approximate inference.« less

  4. A review of predictive coding algorithms.

    PubMed

    Spratling, M W

    2017-03-01

    Predictive coding is a leading theory of how the brain performs probabilistic inference. However, there are a number of distinct algorithms which are described by the term "predictive coding". This article provides a concise review of these different predictive coding algorithms, highlighting their similarities and differences. Five algorithms are covered: linear predictive coding which has a long and influential history in the signal processing literature; the first neuroscience-related application of predictive coding to explaining the function of the retina; and three versions of predictive coding that have been proposed to model cortical function. While all these algorithms aim to fit a generative model to sensory data, they differ in the type of generative model they employ, in the process used to optimise the fit between the model and sensory data, and in the way that they are related to neurobiology. Copyright © 2016 Elsevier Inc. All rights reserved.

  5. Integrated pipeline for inferring the evolutionary history of a gene family embedded in the species tree: a case study on the STIMATE gene family.

    PubMed

    Song, Jia; Zheng, Sisi; Nguyen, Nhung; Wang, Youjun; Zhou, Yubin; Lin, Kui

    2017-10-03

    Because phylogenetic inference is an important basis for answering many evolutionary problems, a large number of algorithms have been developed. Some of these algorithms have been improved by integrating gene evolution models with the expectation of accommodating the hierarchy of evolutionary processes. To the best of our knowledge, however, there still is no single unifying model or algorithm that can take all evolutionary processes into account through a stepwise or simultaneous method. On the basis of three existing phylogenetic inference algorithms, we built an integrated pipeline for inferring the evolutionary history of a given gene family; this pipeline can model gene sequence evolution, gene duplication-loss, gene transfer and multispecies coalescent processes. As a case study, we applied this pipeline to the STIMATE (TMEM110) gene family, which has recently been reported to play an important role in store-operated Ca 2+ entry (SOCE) mediated by ORAI and STIM proteins. We inferred their phylogenetic trees in 69 sequenced chordate genomes. By integrating three tree reconstruction algorithms with diverse evolutionary models, a pipeline for inferring the evolutionary history of a gene family was developed, and its application was demonstrated.

  6. Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment.

    PubMed

    Lee, Wei-Po; Hsiao, Yu-Ting; Hwang, Wei-Che

    2014-01-16

    To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within relatively short time. This integrated approach is a promising way for inferring large networks.

  7. Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment

    PubMed Central

    2014-01-01

    Background To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. Results This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Conclusions Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within relatively short time. This integrated approach is a promising way for inferring large networks. PMID:24428926

  8. Statistical inference based on the nonparametric maximum likelihood estimator under double-truncation.

    PubMed

    Emura, Takeshi; Konno, Yoshihiko; Michimae, Hirofumi

    2015-07-01

    Doubly truncated data consist of samples whose observed values fall between the right- and left- truncation limits. With such samples, the distribution function of interest is estimated using the nonparametric maximum likelihood estimator (NPMLE) that is obtained through a self-consistency algorithm. Owing to the complicated asymptotic distribution of the NPMLE, the bootstrap method has been suggested for statistical inference. This paper proposes a closed-form estimator for the asymptotic covariance function of the NPMLE, which is computationally attractive alternative to bootstrapping. Furthermore, we develop various statistical inference procedures, such as confidence interval, goodness-of-fit tests, and confidence bands to demonstrate the usefulness of the proposed covariance estimator. Simulations are performed to compare the proposed method with both the bootstrap and jackknife methods. The methods are illustrated using the childhood cancer dataset.

  9. Performance analysis of a fault inferring nonlinear detection system algorithm with integrated avionics flight data

    NASA Technical Reports Server (NTRS)

    Caglayan, A. K.; Godiwala, P. M.; Morrell, F. R.

    1985-01-01

    This paper presents the performance analysis results of a fault inferring nonlinear detection system (FINDS) using integrated avionics sensor flight data for the NASA ATOPS B-737 aircraft in a Microwave Landing System (MLS) environment. First, an overview of the FINDS algorithm structure is given. Then, aircraft state estimate time histories and statistics for the flight data sensors are discussed. This is followed by an explanation of modifications made to the detection and decision functions in FINDS to improve false alarm and failure detection performance. Next, the failure detection and false alarm performance of the FINDS algorithm are analyzed by injecting bias failures into fourteen sensor outputs over six repetitive runs of the five minutes of flight data. Results indicate that the detection speed, failure level estimation, and false alarm performance show a marked improvement over the previously reported simulation runs. In agreement with earlier results, detection speed is faster for filter measurement sensors such as MLS than for filter input sensors such as flight control accelerometers. Finally, the progress in modifications of the FINDS algorithm design to accommodate flight computer constraints is discussed.

  10. Parametric inference for biological sequence analysis.

    PubMed

    Pachter, Lior; Sturmfels, Bernd

    2004-11-16

    One of the major successes in computational biology has been the unification, by using the graphical model formalism, of a multitude of algorithms for annotating and comparing biological sequences. Graphical models that have been applied to these problems include hidden Markov models for annotation, tree models for phylogenetics, and pair hidden Markov models for alignment. A single algorithm, the sum-product algorithm, solves many of the inference problems that are associated with different statistical models. This article introduces the polytope propagation algorithm for computing the Newton polytope of an observation from a graphical model. This algorithm is a geometric version of the sum-product algorithm and is used to analyze the parametric behavior of maximum a posteriori inference calculations for graphical models.

  11. Accurate HLA type inference using a weighted similarity graph.

    PubMed

    Xie, Minzhu; Li, Jing; Jiang, Tao

    2010-12-14

    The human leukocyte antigen system (HLA) contains many highly variable genes. HLA genes play an important role in the human immune system, and HLA gene matching is crucial for the success of human organ transplantations. Numerous studies have demonstrated that variation in HLA genes is associated with many autoimmune, inflammatory and infectious diseases. However, typing HLA genes by serology or PCR is time consuming and expensive, which limits large-scale studies involving HLA genes. Since it is much easier and cheaper to obtain single nucleotide polymorphism (SNP) genotype data, accurate computational algorithms to infer HLA gene types from SNP genotype data are in need. To infer HLA types from SNP genotypes, the first step is to infer SNP haplotypes from genotypes. However, for the same SNP genotype data set, the haplotype configurations inferred by different methods are usually inconsistent, and it is often difficult to decide which one is true. In this paper, we design an accurate HLA gene type inference algorithm by utilizing SNP genotype data from pedigrees, known HLA gene types of some individuals and the relationship between inferred SNP haplotypes and HLA gene types. Given a set of haplotypes inferred from the genotypes of a population consisting of many pedigrees, the algorithm first constructs a weighted similarity graph based on a new haplotype similarity measure and derives constraint edges from known HLA gene types. Based on the principle that different HLA gene alleles should have different background haplotypes, the algorithm searches for an optimal labeling of all the haplotypes with unknown HLA gene types such that the total weight among the same HLA gene types is maximized. To deal with ambiguous haplotype solutions, we use a genetic algorithm to select haplotype configurations that tend to maximize the same optimization criterion. Our experiments on a previously typed subset of the HapMap data show that the algorithm is highly accurate, achieving an accuracy of 96% for gene HLA-A, 95% for HLA-B, 97% for HLA-C, 84% for HLA-DRB1, 98% for HLA-DQA1 and 97% for HLA-DQB1 in a leave-one-out test. Our algorithm can infer HLA gene types from neighboring SNP genotype data accurately. Compared with a recent approach on the same input data, our algorithm achieved a higher accuracy. The code of our algorithm is available to the public for free upon request to the corresponding authors.

  12. Extreme deconvolution: Inferring complete distribution functions from noisy, heterogeneous and incomplete observations

    NASA Astrophysics Data System (ADS)

    Bovy Jo; Hogg, David W.; Roweis, Sam T.

    2011-06-01

    We generalize the well-known mixtures of Gaussians approach to density estimation and the accompanying Expectation-Maximization technique for finding the maximum likelihood parameters of the mixture to the case where each data point carries an individual d-dimensional uncertainty covariance and has unique missing data properties. This algorithm reconstructs the error-deconvolved or "underlying" distribution function common to all samples, even when the individual data points are samples from different distributions, obtained by convolving the underlying distribution with the heteroskedastic uncertainty distribution of the data point and projecting out the missing data directions. We show how this basic algorithm can be extended with conjugate priors on all of the model parameters and a "split-and-"erge- procedure designed to avoid local maxima of the likelihood. We demonstrate the full method by applying it to the problem of inferring the three-dimensional veloc! ity distribution of stars near the Sun from noisy two-dimensional, transverse velocity measurements from the Hipparcos satellite.

  13. Object-oriented feature-tracking algorithms for SAR images of the marginal ice zone

    NASA Technical Reports Server (NTRS)

    Daida, Jason; Samadani, Ramin; Vesecky, John F.

    1990-01-01

    An unsupervised method that chooses and applies the most appropriate tracking algorithm from among different sea-ice tracking algorithms is reported. In contrast to current unsupervised methods, this method chooses and applies an algorithm by partially examining a sequential image pair to draw inferences about what was examined. Based on these inferences the reported method subsequently chooses which algorithm to apply to specific areas of the image pair where that algorithm should work best.

  14. Gene function prediction with gene interaction networks: a context graph kernel approach.

    PubMed

    Li, Xin; Chen, Hsinchun; Li, Jiexun; Zhang, Zhu

    2010-01-01

    Predicting gene functions is a challenge for biologists in the postgenomic era. Interactions among genes and their products compose networks that can be used to infer gene functions. Most previous studies adopt a linkage assumption, i.e., they assume that gene interactions indicate functional similarities between connected genes. In this study, we propose to use a gene's context graph, i.e., the gene interaction network associated with the focal gene, to infer its functions. In a kernel-based machine-learning framework, we design a context graph kernel to capture the information in context graphs. Our experimental study on a testbed of p53-related genes demonstrates the advantage of using indirect gene interactions and shows the empirical superiority of the proposed approach over linkage-assumption-based methods, such as the algorithm to minimize inconsistent connected genes and diffusion kernels.

  15. In-depth analysis of protein inference algorithms using multiple search engines and well-defined metrics.

    PubMed

    Audain, Enrique; Uszkoreit, Julian; Sachsenberg, Timo; Pfeuffer, Julianus; Liang, Xiao; Hermjakob, Henning; Sanchez, Aniel; Eisenacher, Martin; Reinert, Knut; Tabb, David L; Kohlbacher, Oliver; Perez-Riverol, Yasset

    2017-01-06

    In mass spectrometry-based shotgun proteomics, protein identifications are usually the desired result. However, most of the analytical methods are based on the identification of reliable peptides and not the direct identification of intact proteins. Thus, assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is a critical step in proteomics research. Currently, different protein inference algorithms and tools are available for the proteomics community. Here, we evaluated five software tools for protein inference (PIA, ProteinProphet, Fido, ProteinLP, MSBayesPro) using three popular database search engines: Mascot, X!Tandem, and MS-GF+. All the algorithms were evaluated using a highly customizable KNIME workflow using four different public datasets with varying complexities (different sample preparation, species and analytical instruments). We defined a set of quality control metrics to evaluate the performance of each combination of search engines, protein inference algorithm, and parameters on each dataset. We show that the results for complex samples vary not only regarding the actual numbers of reported protein groups but also concerning the actual composition of groups. Furthermore, the robustness of reported proteins when using databases of differing complexities is strongly dependant on the applied inference algorithm. Finally, merging the identifications of multiple search engines does not necessarily increase the number of reported proteins, but does increase the number of peptides per protein and thus can generally be recommended. Protein inference is one of the major challenges in MS-based proteomics nowadays. Currently, there are a vast number of protein inference algorithms and implementations available for the proteomics community. Protein assembly impacts in the final results of the research, the quantitation values and the final claims in the research manuscript. Even though protein inference is a crucial step in proteomics data analysis, a comprehensive evaluation of the many different inference methods has never been performed. Previously Journal of proteomics has published multiple studies about other benchmark of bioinformatics algorithms (PMID: 26585461; PMID: 22728601) in proteomics studies making clear the importance of those studies for the proteomics community and the journal audience. This manuscript presents a new bioinformatics solution based on the KNIME/OpenMS platform that aims at providing a fair comparison of protein inference algorithms (https://github.com/KNIME-OMICS). Six different algorithms - ProteinProphet, MSBayesPro, ProteinLP, Fido and PIA- were evaluated using the highly customizable workflow on four public datasets with varying complexities. Five popular database search engines Mascot, X!Tandem, MS-GF+ and combinations thereof were evaluated for every protein inference tool. In total >186 proteins lists were analyzed and carefully compare using three metrics for quality assessments of the protein inference results: 1) the numbers of reported proteins, 2) peptides per protein, and the 3) number of uniquely reported proteins per inference method, to address the quality of each inference method. We also examined how many proteins were reported by choosing each combination of search engines, protein inference algorithms and parameters on each dataset. The results show that using 1) PIA or Fido seems to be a good choice when studying the results of the analyzed workflow, regarding not only the reported proteins and the high-quality identifications, but also the required runtime. 2) Merging the identifications of multiple search engines gives almost always more confident results and increases the number of peptides per protein group. 3) The usage of databases containing not only the canonical, but also known isoforms of proteins has a small impact on the number of reported proteins. The detection of specific isoforms could, concerning the question behind the study, compensate for slightly shorter reports using the parsimonious reports. 4) The current workflow can be easily extended to support new algorithms and search engine combinations. Copyright © 2016. Published by Elsevier B.V.

  16. An algebra-based method for inferring gene regulatory networks.

    PubMed

    Vera-Licona, Paola; Jarrah, Abdul; Garcia-Puente, Luis David; McGee, John; Laubenbacher, Reinhard

    2014-03-26

    The inference of gene regulatory networks (GRNs) from experimental observations is at the heart of systems biology. This includes the inference of both the network topology and its dynamics. While there are many algorithms available to infer the network topology from experimental data, less emphasis has been placed on methods that infer network dynamics. Furthermore, since the network inference problem is typically underdetermined, it is essential to have the option of incorporating into the inference process, prior knowledge about the network, along with an effective description of the search space of dynamic models. Finally, it is also important to have an understanding of how a given inference method is affected by experimental and other noise in the data used. This paper contains a novel inference algorithm using the algebraic framework of Boolean polynomial dynamical systems (BPDS), meeting all these requirements. The algorithm takes as input time series data, including those from network perturbations, such as knock-out mutant strains and RNAi experiments. It allows for the incorporation of prior biological knowledge while being robust to significant levels of noise in the data used for inference. It uses an evolutionary algorithm for local optimization with an encoding of the mathematical models as BPDS. The BPDS framework allows an effective representation of the search space for algebraic dynamic models that improves computational performance. The algorithm is validated with both simulated and experimental microarray expression profile data. Robustness to noise is tested using a published mathematical model of the segment polarity gene network in Drosophila melanogaster. Benchmarking of the algorithm is done by comparison with a spectrum of state-of-the-art network inference methods on data from the synthetic IRMA network to demonstrate that our method has good precision and recall for the network reconstruction task, while also predicting several of the dynamic patterns present in the network. Boolean polynomial dynamical systems provide a powerful modeling framework for the reverse engineering of gene regulatory networks, that enables a rich mathematical structure on the model search space. A C++ implementation of the method, distributed under LPGL license, is available, together with the source code, at http://www.paola-vera-licona.net/Software/EARevEng/REACT.html.

  17. With or without you: predictive coding and Bayesian inference in the brain

    PubMed Central

    Aitchison, Laurence; Lengyel, Máté

    2018-01-01

    Two theoretical ideas have emerged recently with the ambition to provide a unifying functional explanation of neural population coding and dynamics: predictive coding and Bayesian inference. Here, we describe the two theories and their combination into a single framework: Bayesian predictive coding. We clarify how the two theories can be distinguished, despite sharing core computational concepts and addressing an overlapping set of empirical phenomena. We argue that predictive coding is an algorithmic / representational motif that can serve several different computational goals of which Bayesian inference is but one. Conversely, while Bayesian inference can utilize predictive coding, it can also be realized by a variety of other representations. We critically evaluate the experimental evidence supporting Bayesian predictive coding and discuss how to test it more directly. PMID:28942084

  18. Performance of Geno-Fuzzy Model on rainfall-runoff predictions in claypan watersheds

    USDA-ARS?s Scientific Manuscript database

    Fuzzy logic provides a relatively simple approach to simulate complex hydrological systems while accounting for the uncertainty of environmental variables. The objective of this study was to develop a fuzzy inference system (FIS) with genetic algorithm (GA) optimization for membership functions (MF...

  19. Causal inference in biology networks with integrated belief propagation.

    PubMed

    Chang, Rui; Karr, Jonathan R; Schadt, Eric E

    2015-01-01

    Inferring causal relationships among molecular and higher order phenotypes is a critical step in elucidating the complexity of living systems. Here we propose a novel method for inferring causality that is no longer constrained by the conditional dependency arguments that limit the ability of statistical causal inference methods to resolve causal relationships within sets of graphical models that are Markov equivalent. Our method utilizes Bayesian belief propagation to infer the responses of perturbation events on molecular traits given a hypothesized graph structure. A distance measure between the inferred response distribution and the observed data is defined to assess the 'fitness' of the hypothesized causal relationships. To test our algorithm, we infer causal relationships within equivalence classes of gene networks in which the form of the functional interactions that are possible are assumed to be nonlinear, given synthetic microarray and RNA sequencing data. We also apply our method to infer causality in real metabolic network with v-structure and feedback loop. We show that our method can recapitulate the causal structure and recover the feedback loop only from steady-state data which conventional method cannot.

  20. Coalescent-based species tree inference from gene tree topologies under incomplete lineage sorting by maximum likelihood.

    PubMed

    Wu, Yufeng

    2012-03-01

    Incomplete lineage sorting can cause incongruence between the phylogenetic history of genes (the gene tree) and that of the species (the species tree), which can complicate the inference of phylogenies. In this article, I present a new coalescent-based algorithm for species tree inference with maximum likelihood. I first describe an improved method for computing the probability of a gene tree topology given a species tree, which is much faster than an existing algorithm by Degnan and Salter (2005). Based on this method, I develop a practical algorithm that takes a set of gene tree topologies and infers species trees with maximum likelihood. This algorithm searches for the best species tree by starting from initial species trees and performing heuristic search to obtain better trees with higher likelihood. This algorithm, called STELLS (which stands for Species Tree InfErence with Likelihood for Lineage Sorting), has been implemented in a program that is downloadable from the author's web page. The simulation results show that the STELLS algorithm is more accurate than an existing maximum likelihood method for many datasets, especially when there is noise in gene trees. I also show that the STELLS algorithm is efficient and can be applied to real biological datasets. © 2011 The Author. Evolution© 2011 The Society for the Study of Evolution.

  1. Functional Alignment of Metabolic Networks.

    PubMed

    Mazza, Arnon; Wagner, Allon; Ruppin, Eytan; Sharan, Roded

    2016-05-01

    Network alignment has become a standard tool in comparative biology, allowing the inference of protein function, interaction, and orthology. However, current alignment techniques are based on topological properties of networks and do not take into account their functional implications. Here we propose, for the first time, an algorithm to align two metabolic networks by taking advantage of their coupled metabolic models. These models allow us to assess the functional implications of genes or reactions, captured by the metabolic fluxes that are altered following their deletion from the network. Such implications may spread far beyond the region of the network where the gene or reaction lies. We apply our algorithm to align metabolic networks from various organisms, ranging from bacteria to humans, showing that our alignment can reveal functional orthology relations that are missed by conventional topological alignments.

  2. Boosting Bayesian parameter inference of stochastic differential equation models with methods from statistical physics

    NASA Astrophysics Data System (ADS)

    Albert, Carlo; Ulzega, Simone; Stoop, Ruedi

    2016-04-01

    Measured time-series of both precipitation and runoff are known to exhibit highly non-trivial statistical properties. For making reliable probabilistic predictions in hydrology, it is therefore desirable to have stochastic models with output distributions that share these properties. When parameters of such models have to be inferred from data, we also need to quantify the associated parametric uncertainty. For non-trivial stochastic models, however, this latter step is typically very demanding, both conceptually and numerically, and always never done in hydrology. Here, we demonstrate that methods developed in statistical physics make a large class of stochastic differential equation (SDE) models amenable to a full-fledged Bayesian parameter inference. For concreteness we demonstrate these methods by means of a simple yet non-trivial toy SDE model. We consider a natural catchment that can be described by a linear reservoir, at the scale of observation. All the neglected processes are assumed to happen at much shorter time-scales and are therefore modeled with a Gaussian white noise term, the standard deviation of which is assumed to scale linearly with the system state (water volume in the catchment). Even for constant input, the outputs of this simple non-linear SDE model show a wealth of desirable statistical properties, such as fat-tailed distributions and long-range correlations. Standard algorithms for Bayesian inference fail, for models of this kind, because their likelihood functions are extremely high-dimensional intractable integrals over all possible model realizations. The use of Kalman filters is illegitimate due to the non-linearity of the model. Particle filters could be used but become increasingly inefficient with growing number of data points. Hamiltonian Monte Carlo algorithms allow us to translate this inference problem to the problem of simulating the dynamics of a statistical mechanics system and give us access to most sophisticated methods that have been developed in the statistical physics community over the last few decades. We demonstrate that such methods, along with automated differentiation algorithms, allow us to perform a full-fledged Bayesian inference, for a large class of SDE models, in a highly efficient and largely automatized manner. Furthermore, our algorithm is highly parallelizable. For our toy model, discretized with a few hundred points, a full Bayesian inference can be performed in a matter of seconds on a standard PC.

  3. On the inherent competition between valid and spurious inductive inferences in Boolean data

    NASA Astrophysics Data System (ADS)

    Andrecut, M.

    Inductive inference is the process of extracting general rules from specific observations. This problem also arises in the analysis of biological networks, such as genetic regulatory networks, where the interactions are complex and the observations are incomplete. A typical task in these problems is to extract general interaction rules as combinations of Boolean covariates, that explain a measured response variable. The inductive inference process can be considered as an incompletely specified Boolean function synthesis problem. This incompleteness of the problem will also generate spurious inferences, which are a serious threat to valid inductive inference rules. Using random Boolean data as a null model, here we attempt to measure the competition between valid and spurious inductive inference rules from a given data set. We formulate two greedy search algorithms, which synthesize a given Boolean response variable in a sparse disjunct normal form, and respectively a sparse generalized algebraic normal form of the variables from the observation data, and we evaluate numerically their performance.

  4. Model selection and parameter estimation in structural dynamics using approximate Bayesian computation

    NASA Astrophysics Data System (ADS)

    Ben Abdessalem, Anis; Dervilis, Nikolaos; Wagg, David; Worden, Keith

    2018-01-01

    This paper will introduce the use of the approximate Bayesian computation (ABC) algorithm for model selection and parameter estimation in structural dynamics. ABC is a likelihood-free method typically used when the likelihood function is either intractable or cannot be approached in a closed form. To circumvent the evaluation of the likelihood function, simulation from a forward model is at the core of the ABC algorithm. The algorithm offers the possibility to use different metrics and summary statistics representative of the data to carry out Bayesian inference. The efficacy of the algorithm in structural dynamics is demonstrated through three different illustrative examples of nonlinear system identification: cubic and cubic-quintic models, the Bouc-Wen model and the Duffing oscillator. The obtained results suggest that ABC is a promising alternative to deal with model selection and parameter estimation issues, specifically for systems with complex behaviours.

  5. Algorithm of OMA for large-scale orthology inference

    PubMed Central

    Roth, Alexander CJ; Gonnet, Gaston H; Dessimoz, Christophe

    2008-01-01

    Background OMA is a project that aims to identify orthologs within publicly available, complete genomes. With 657 genomes analyzed to date, OMA is one of the largest projects of its kind. Results The algorithm of OMA improves upon standard bidirectional best-hit approach in several respects: it uses evolutionary distances instead of scores, considers distance inference uncertainty, includes many-to-many orthologous relations, and accounts for differential gene losses. Herein, we describe in detail the algorithm for inference of orthology and provide the rationale for parameter selection through multiple tests. Conclusion OMA contains several novel improvement ideas for orthology inference and provides a unique dataset of large-scale orthology assignments. PMID:19055798

  6. Functional Inference of Complex Anatomical Tendinous Networks at a Macroscopic Scale via Sparse Experimentation

    PubMed Central

    Saxena, Anupam; Lipson, Hod; Valero-Cuevas, Francisco J.

    2012-01-01

    In systems and computational biology, much effort is devoted to functional identification of systems and networks at the molecular-or cellular scale. However, similarly important networks exist at anatomical scales such as the tendon network of human fingers: the complex array of collagen fibers that transmits and distributes muscle forces to finger joints. This network is critical to the versatility of the human hand, and its function has been debated since at least the 16th century. Here, we experimentally infer the structure (both topology and parameter values) of this network through sparse interrogation with force inputs. A population of models representing this structure co-evolves in simulation with a population of informative future force inputs via the predator-prey estimation-exploration algorithm. Model fitness depends on their ability to explain experimental data, while the fitness of future force inputs depends on causing maximal functional discrepancy among current models. We validate our approach by inferring two known synthetic Latex networks, and one anatomical tendon network harvested from a cadaver's middle finger. We find that functionally similar but structurally diverse models can exist within a narrow range of the training set and cross-validation errors. For the Latex networks, models with low training set error [<4%] and resembling the known network have the smallest cross-validation errors [∼5%]. The low training set [<4%] and cross validation [<7.2%] errors for models for the cadaveric specimen demonstrate what, to our knowledge, is the first experimental inference of the functional structure of complex anatomical networks. This work expands current bioinformatics inference approaches by demonstrating that sparse, yet informative interrogation of biological specimens holds significant computational advantages in accurate and efficient inference over random testing, or assuming model topology and only inferring parameters values. These findings also hold clues to both our evolutionary history and the development of versatile machines. PMID:23144601

  7. Functional inference of complex anatomical tendinous networks at a macroscopic scale via sparse experimentation.

    PubMed

    Saxena, Anupam; Lipson, Hod; Valero-Cuevas, Francisco J

    2012-01-01

    In systems and computational biology, much effort is devoted to functional identification of systems and networks at the molecular-or cellular scale. However, similarly important networks exist at anatomical scales such as the tendon network of human fingers: the complex array of collagen fibers that transmits and distributes muscle forces to finger joints. This network is critical to the versatility of the human hand, and its function has been debated since at least the 16(th) century. Here, we experimentally infer the structure (both topology and parameter values) of this network through sparse interrogation with force inputs. A population of models representing this structure co-evolves in simulation with a population of informative future force inputs via the predator-prey estimation-exploration algorithm. Model fitness depends on their ability to explain experimental data, while the fitness of future force inputs depends on causing maximal functional discrepancy among current models. We validate our approach by inferring two known synthetic Latex networks, and one anatomical tendon network harvested from a cadaver's middle finger. We find that functionally similar but structurally diverse models can exist within a narrow range of the training set and cross-validation errors. For the Latex networks, models with low training set error [<4%] and resembling the known network have the smallest cross-validation errors [∼5%]. The low training set [<4%] and cross validation [<7.2%] errors for models for the cadaveric specimen demonstrate what, to our knowledge, is the first experimental inference of the functional structure of complex anatomical networks. This work expands current bioinformatics inference approaches by demonstrating that sparse, yet informative interrogation of biological specimens holds significant computational advantages in accurate and efficient inference over random testing, or assuming model topology and only inferring parameters values. These findings also hold clues to both our evolutionary history and the development of versatile machines.

  8. NIMEFI: gene regulatory network inference using multiple ensemble feature importance algorithms.

    PubMed

    Ruyssinck, Joeri; Huynh-Thu, Vân Anh; Geurts, Pierre; Dhaene, Tom; Demeester, Piet; Saeys, Yvan

    2014-01-01

    One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that this approach outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available.

  9. Applying a multiobjective metaheuristic inspired by honey bees to phylogenetic inference.

    PubMed

    Santander-Jiménez, Sergio; Vega-Rodríguez, Miguel A

    2013-10-01

    The development of increasingly popular multiobjective metaheuristics has allowed bioinformaticians to deal with optimization problems in computational biology where multiple objective functions must be taken into account. One of the most relevant research topics that can benefit from these techniques is phylogenetic inference. Throughout the years, different researchers have proposed their own view about the reconstruction of ancestral evolutionary relationships among species. As a result, biologists often report different phylogenetic trees from a same dataset when considering distinct optimality principles. In this work, we detail a multiobjective swarm intelligence approach based on the novel Artificial Bee Colony algorithm for inferring phylogenies. The aim of this paper is to propose a complementary view of phylogenetics according to the maximum parsimony and maximum likelihood criteria, in order to generate a set of phylogenetic trees that represent a compromise between these principles. Experimental results on a variety of nucleotide data sets and statistical studies highlight the relevance of the proposal with regard to other multiobjective algorithms and state-of-the-art biological methods. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  10. An algebra-based method for inferring gene regulatory networks

    PubMed Central

    2014-01-01

    Background The inference of gene regulatory networks (GRNs) from experimental observations is at the heart of systems biology. This includes the inference of both the network topology and its dynamics. While there are many algorithms available to infer the network topology from experimental data, less emphasis has been placed on methods that infer network dynamics. Furthermore, since the network inference problem is typically underdetermined, it is essential to have the option of incorporating into the inference process, prior knowledge about the network, along with an effective description of the search space of dynamic models. Finally, it is also important to have an understanding of how a given inference method is affected by experimental and other noise in the data used. Results This paper contains a novel inference algorithm using the algebraic framework of Boolean polynomial dynamical systems (BPDS), meeting all these requirements. The algorithm takes as input time series data, including those from network perturbations, such as knock-out mutant strains and RNAi experiments. It allows for the incorporation of prior biological knowledge while being robust to significant levels of noise in the data used for inference. It uses an evolutionary algorithm for local optimization with an encoding of the mathematical models as BPDS. The BPDS framework allows an effective representation of the search space for algebraic dynamic models that improves computational performance. The algorithm is validated with both simulated and experimental microarray expression profile data. Robustness to noise is tested using a published mathematical model of the segment polarity gene network in Drosophila melanogaster. Benchmarking of the algorithm is done by comparison with a spectrum of state-of-the-art network inference methods on data from the synthetic IRMA network to demonstrate that our method has good precision and recall for the network reconstruction task, while also predicting several of the dynamic patterns present in the network. Conclusions Boolean polynomial dynamical systems provide a powerful modeling framework for the reverse engineering of gene regulatory networks, that enables a rich mathematical structure on the model search space. A C++ implementation of the method, distributed under LPGL license, is available, together with the source code, at http://www.paola-vera-licona.net/Software/EARevEng/REACT.html. PMID:24669835

  11. Ensemble stacking mitigates biases in inference of synaptic connectivity.

    PubMed

    Chambers, Brendan; Levy, Maayan; Dechery, Joseph B; MacLean, Jason N

    2018-01-01

    A promising alternative to directly measuring the anatomical connections in a neuronal population is inferring the connections from the activity. We employ simulated spiking neuronal networks to compare and contrast commonly used inference methods that identify likely excitatory synaptic connections using statistical regularities in spike timing. We find that simple adjustments to standard algorithms improve inference accuracy: A signing procedure improves the power of unsigned mutual-information-based approaches and a correction that accounts for differences in mean and variance of background timing relationships, such as those expected to be induced by heterogeneous firing rates, increases the sensitivity of frequency-based methods. We also find that different inference methods reveal distinct subsets of the synaptic network and each method exhibits different biases in the accurate detection of reciprocity and local clustering. To correct for errors and biases specific to single inference algorithms, we combine methods into an ensemble. Ensemble predictions, generated as a linear combination of multiple inference algorithms, are more sensitive than the best individual measures alone, and are more faithful to ground-truth statistics of connectivity, mitigating biases specific to single inference methods. These weightings generalize across simulated datasets, emphasizing the potential for the broad utility of ensemble-based approaches.

  12. A novel gene network inference algorithm using predictive minimum description length approach.

    PubMed

    Chaitankar, Vijender; Ghosh, Preetam; Perkins, Edward J; Gong, Ping; Deng, Youping; Zhang, Chaoyang

    2010-05-28

    Reverse engineering of gene regulatory networks using information theory models has received much attention due to its simplicity, low computational cost, and capability of inferring large networks. One of the major problems with information theory models is to determine the threshold which defines the regulatory relationships between genes. The minimum description length (MDL) principle has been implemented to overcome this problem. The description length of the MDL principle is the sum of model length and data encoding length. A user-specified fine tuning parameter is used as control mechanism between model and data encoding, but it is difficult to find the optimal parameter. In this work, we proposed a new inference algorithm which incorporated mutual information (MI), conditional mutual information (CMI) and predictive minimum description length (PMDL) principle to infer gene regulatory networks from DNA microarray data. In this algorithm, the information theoretic quantities MI and CMI determine the regulatory relationships between genes and the PMDL principle method attempts to determine the best MI threshold without the need of a user-specified fine tuning parameter. The performance of the proposed algorithm was evaluated using both synthetic time series data sets and a biological time series data set for the yeast Saccharomyces cerevisiae. The benchmark quantities precision and recall were used as performance measures. The results show that the proposed algorithm produced less false edges and significantly improved the precision, as compared to the existing algorithm. For further analysis the performance of the algorithms was observed over different sizes of data. We have proposed a new algorithm that implements the PMDL principle for inferring gene regulatory networks from time series DNA microarray data that eliminates the need of a fine tuning parameter. The evaluation results obtained from both synthetic and actual biological data sets show that the PMDL principle is effective in determining the MI threshold and the developed algorithm improves precision of gene regulatory network inference. Based on the sensitivity analysis of all tested cases, an optimal CMI threshold value has been identified. Finally it was observed that the performance of the algorithms saturates at a certain threshold of data size.

  13. Algorithm Optimally Orders Forward-Chaining Inference Rules

    NASA Technical Reports Server (NTRS)

    James, Mark

    2008-01-01

    People typically develop knowledge bases in a somewhat ad hoc manner by incrementally adding rules with no specific organization. This often results in a very inefficient execution of those rules since they are so often order sensitive. This is relevant to tasks like Deep Space Network in that it allows the knowledge base to be incrementally developed and have it automatically ordered for efficiency. Although data flow analysis was first developed for use in compilers for producing optimal code sequences, its usefulness is now recognized in many software systems including knowledge-based systems. However, this approach for exhaustively computing data-flow information cannot directly be applied to inference systems because of the ubiquitous execution of the rules. An algorithm is presented that efficiently performs a complete producer/consumer analysis for each antecedent and consequence clause in a knowledge base to optimally order the rules to minimize inference cycles. An algorithm was developed that optimally orders a knowledge base composed of forwarding chaining inference rules such that independent inference cycle executions are minimized, thus, resulting in significantly faster execution. This algorithm was integrated into the JPL tool Spacecraft Health Inference Engine (SHINE) for verification and it resulted in a significant reduction in inference cycles for what was previously considered an ordered knowledge base. For a knowledge base that is completely unordered, then the improvement is much greater.

  14. Convergent cross-mapping and pairwise asymmetric inference.

    PubMed

    McCracken, James M; Weigel, Robert S

    2014-12-01

    Convergent cross-mapping (CCM) is a technique for computing specific kinds of correlations between sets of times series. It was introduced by Sugihara et al. [Science 338, 496 (2012).] and is reported to be "a necessary condition for causation" capable of distinguishing causality from standard correlation. We show that the relationships between CCM correlations proposed by Sugihara et al. do not, in general, agree with intuitive concepts of "driving" and as such should not be considered indicative of causality. It is shown that the fact that the CCM algorithm implies causality is a function of system parameters for simple linear and nonlinear systems. For example, in a circuit containing a single resistor and inductor, both voltage and current can be identified as the driver depending on the frequency of the source voltage. It is shown that the CCM algorithm, however, can be modified to identify relationships between pairs of time series that are consistent with intuition for the considered example systems for which CCM causality analysis provided nonintuitive driver identifications. This modification of the CCM algorithm is introduced as "pairwise asymmetric inference" (PAI) and examples of its use are presented.

  15. Determining the near-surface current profile from measurements of the wave dispersion relation

    NASA Astrophysics Data System (ADS)

    Smeltzer, Benjamin; Maxwell, Peter; Aesøy, Eirik; Ellingsen, Simen

    2017-11-01

    The current-induced Doppler shifts of waves can yield information about the background mean flow, providing an attractive method of inferring the current profile in the upper layer of the ocean. We present measurements of waves propagating on shear currents in a laboratory water channel, as well as theoretical investigations of inversion techniques for determining the vertical current structure. Spatial and temporal measurements of the free surface profile obtained using a synthetic Schlieren method are analyzed to determine the wave dispersion relation and Doppler shifts as a function of wavelength. The vertical current profile can then be inferred from the Doppler shifts using an inversion algorithm. Most existing algorithms rely on a priori assumptions of the shape of the current profile, and developing a method that uses less stringent assumptions is a focus of this study, allowing for measurement of more general current profiles. The accuracy of current inversion algorithms are evaluated by comparison to measurements of the mean flow profile from particle image velocimetry (PIV), and a discussion of the sensitivity to errors in the Doppler shifts is presented.

  16. The Manhattan Frame Model-Manhattan World Inference in the Space of Surface Normals.

    PubMed

    Straub, Julian; Freifeld, Oren; Rosman, Guy; Leonard, John J; Fisher, John W

    2018-01-01

    Objects and structures within man-made environments typically exhibit a high degree of organization in the form of orthogonal and parallel planes. Traditional approaches utilize these regularities via the restrictive, and rather local, Manhattan World (MW) assumption which posits that every plane is perpendicular to one of the axes of a single coordinate system. The aforementioned regularities are especially evident in the surface normal distribution of a scene where they manifest as orthogonally-coupled clusters. This motivates the introduction of the Manhattan-Frame (MF) model which captures the notion of an MW in the surface normals space, the unit sphere, and two probabilistic MF models over this space. First, for a single MF we propose novel real-time MAP inference algorithms, evaluate their performance and their use in drift-free rotation estimation. Second, to capture the complexity of real-world scenes at a global scale, we extend the MF model to a probabilistic mixture of Manhattan Frames (MMF). For MMF inference we propose a simple MAP inference algorithm and an adaptive Markov-Chain Monte-Carlo sampling algorithm with Metropolis-Hastings split/merge moves that let us infer the unknown number of mixture components. We demonstrate the versatility of the MMF model and inference algorithm across several scales of man-made environments.

  17. Context recognition for a hyperintensional inference machine

    NASA Astrophysics Data System (ADS)

    Duží, Marie; Fait, Michal; Menšík, Marek

    2017-07-01

    The goal of this paper is to introduce the algorithm of context recognition in the functional programming language TIL-Script, which is a necessary condition for the implementation of the TIL-Script inference machine. The TIL-Script language is an operationally isomorphic syntactic variant of Tichý's Transparent Intensional Logic (TIL). From the formal point of view, TIL is a hyperintensional, partial, typed λ-calculus with procedural semantics. Hyperintensional, because TIL λ-terms denote procedures (defined as TIL constructions) producing set-theoretic functions rather than the functions themselves; partial, because TIL is a logic of partial functions; and typed, because all the entities of TIL ontology, including constructions, receive a type within a ramified hierarchy of types. These features make it possible to distinguish three levels of abstraction at which TIL constructions operate. At the highest hyperintensional level the object to operate on is a construction (though a higher-order construction is needed to present this lower-order construction as an object of predication). At the middle intensional level the object to operate on is the function presented, or constructed, by a construction, while at the lowest extensional level the object to operate on is the value (if any) of the presented function. Thus a necessary condition for the development of an inference machine for the TIL-Script language is recognizing a context in which a construction occurs, namely extensional, intensional and hyperintensional context, in order to determine the type of an argument at which a given inference rule can be properly applied. As a result, our logic does not flout logical rules of extensional logic, which makes it possible to develop a hyperintensional inference machine for the TIL-Script language.

  18. Inferring time derivatives including cell growth rates using Gaussian processes

    NASA Astrophysics Data System (ADS)

    Swain, Peter S.; Stevenson, Keiran; Leary, Allen; Montano-Gutierrez, Luis F.; Clark, Ivan B. N.; Vogel, Jackie; Pilizota, Teuta

    2016-12-01

    Often the time derivative of a measured variable is of as much interest as the variable itself. For a growing population of biological cells, for example, the population's growth rate is typically more important than its size. Here we introduce a non-parametric method to infer first and second time derivatives as a function of time from time-series data. Our approach is based on Gaussian processes and applies to a wide range of data. In tests, the method is at least as accurate as others, but has several advantages: it estimates errors both in the inference and in any summary statistics, such as lag times, and allows interpolation with the corresponding error estimation. As illustrations, we infer growth rates of microbial cells, the rate of assembly of an amyloid fibril and both the speed and acceleration of two separating spindle pole bodies. Our algorithm should thus be broadly applicable.

  19. Statistical Signal Models and Algorithms for Image Analysis

    DTIC Science & Technology

    1984-10-25

    In this report, two-dimensional stochastic linear models are used in developing algorithms for image analysis such as classification, segmentation, and object detection in images characterized by textured backgrounds. These models generate two-dimensional random processes as outputs to which statistical inference procedures can naturally be applied. A common thread throughout our algorithms is the interpretation of the inference procedures in terms of linear prediction

  20. The architecture of adaptive neural network based on a fuzzy inference system for implementing intelligent control in photovoltaic systems

    NASA Astrophysics Data System (ADS)

    Gimazov, R.; Shidlovskiy, S.

    2018-05-01

    In this paper, we consider the architecture of the algorithm for extreme regulation in the photovoltaic system. An algorithm based on an adaptive neural network with fuzzy inference is proposed. The implementation of such an algorithm not only allows solving a number of problems in existing algorithms for extreme power regulation of photovoltaic systems, but also creates a reserve for the creation of a universal control system for a photovoltaic system.

  1. Harnessing Diversity towards the Reconstructing of Large Scale Gene Regulatory Networks

    PubMed Central

    Yamanaka, Ryota; Kitano, Hiroaki

    2013-01-01

    Elucidating gene regulatory network (GRN) from large scale experimental data remains a central challenge in systems biology. Recently, numerous techniques, particularly consensus driven approaches combining different algorithms, have become a potentially promising strategy to infer accurate GRNs. Here, we develop a novel consensus inference algorithm, TopkNet that can integrate multiple algorithms to infer GRNs. Comprehensive performance benchmarking on a cloud computing framework demonstrated that (i) a simple strategy to combine many algorithms does not always lead to performance improvement compared to the cost of consensus and (ii) TopkNet integrating only high-performance algorithms provide significant performance improvement compared to the best individual algorithms and community prediction. These results suggest that a priori determination of high-performance algorithms is a key to reconstruct an unknown regulatory network. Similarity among gene-expression datasets can be useful to determine potential optimal algorithms for reconstruction of unknown regulatory networks, i.e., if expression-data associated with known regulatory network is similar to that with unknown regulatory network, optimal algorithms determined for the known regulatory network can be repurposed to infer the unknown regulatory network. Based on this observation, we developed a quantitative measure of similarity among gene-expression datasets and demonstrated that, if similarity between the two expression datasets is high, TopkNet integrating algorithms that are optimal for known dataset perform well on the unknown dataset. The consensus framework, TopkNet, together with the similarity measure proposed in this study provides a powerful strategy towards harnessing the wisdom of the crowds in reconstruction of unknown regulatory networks. PMID:24278007

  2. Spiking neuron network Helmholtz machine.

    PubMed

    Sountsov, Pavel; Miller, Paul

    2015-01-01

    An increasing amount of behavioral and neurophysiological data suggests that the brain performs optimal (or near-optimal) probabilistic inference and learning during perception and other tasks. Although many machine learning algorithms exist that perform inference and learning in an optimal way, the complete description of how one of those algorithms (or a novel algorithm) can be implemented in the brain is currently incomplete. There have been many proposed solutions that address how neurons can perform optimal inference but the question of how synaptic plasticity can implement optimal learning is rarely addressed. This paper aims to unify the two fields of probabilistic inference and synaptic plasticity by using a neuronal network of realistic model spiking neurons to implement a well-studied computational model called the Helmholtz Machine. The Helmholtz Machine is amenable to neural implementation as the algorithm it uses to learn its parameters, called the wake-sleep algorithm, uses a local delta learning rule. Our spiking-neuron network implements both the delta rule and a small example of a Helmholtz machine. This neuronal network can learn an internal model of continuous-valued training data sets without supervision. The network can also perform inference on the learned internal models. We show how various biophysical features of the neural implementation constrain the parameters of the wake-sleep algorithm, such as the duration of the wake and sleep phases of learning and the minimal sample duration. We examine the deviations from optimal performance and tie them to the properties of the synaptic plasticity rule.

  3. Spiking neuron network Helmholtz machine

    PubMed Central

    Sountsov, Pavel; Miller, Paul

    2015-01-01

    An increasing amount of behavioral and neurophysiological data suggests that the brain performs optimal (or near-optimal) probabilistic inference and learning during perception and other tasks. Although many machine learning algorithms exist that perform inference and learning in an optimal way, the complete description of how one of those algorithms (or a novel algorithm) can be implemented in the brain is currently incomplete. There have been many proposed solutions that address how neurons can perform optimal inference but the question of how synaptic plasticity can implement optimal learning is rarely addressed. This paper aims to unify the two fields of probabilistic inference and synaptic plasticity by using a neuronal network of realistic model spiking neurons to implement a well-studied computational model called the Helmholtz Machine. The Helmholtz Machine is amenable to neural implementation as the algorithm it uses to learn its parameters, called the wake-sleep algorithm, uses a local delta learning rule. Our spiking-neuron network implements both the delta rule and a small example of a Helmholtz machine. This neuronal network can learn an internal model of continuous-valued training data sets without supervision. The network can also perform inference on the learned internal models. We show how various biophysical features of the neural implementation constrain the parameters of the wake-sleep algorithm, such as the duration of the wake and sleep phases of learning and the minimal sample duration. We examine the deviations from optimal performance and tie them to the properties of the synaptic plasticity rule. PMID:25954191

  4. A recurrent self-organizing neural fuzzy inference network.

    PubMed

    Juang, C F; Lin, C T

    1999-01-01

    A recurrent self-organizing neural fuzzy inference network (RSONFIN) is proposed in this paper. The RSONFIN is inherently a recurrent multilayered connectionist network for realizing the basic elements and functions of dynamic fuzzy inference, and may be considered to be constructed from a series of dynamic fuzzy rules. The temporal relations embedded in the network are built by adding some feedback connections representing the memory elements to a feedforward neural fuzzy network. Each weight as well as node in the RSONFIN has its own meaning and represents a special element in a fuzzy rule. There are no hidden nodes (i.e., no membership functions and fuzzy rules) initially in the RSONFIN. They are created on-line via concurrent structure identification (the construction of dynamic fuzzy if-then rules) and parameter identification (the tuning of the free parameters of membership functions). The structure learning together with the parameter learning forms a fast learning algorithm for building a small, yet powerful, dynamic neural fuzzy network. Two major characteristics of the RSONFIN can thus be seen: 1) the recurrent property of the RSONFIN makes it suitable for dealing with temporal problems and 2) no predetermination, like the number of hidden nodes, must be given, since the RSONFIN can find its optimal structure and parameters automatically and quickly. Moreover, to reduce the number of fuzzy rules generated, a flexible input partition method, the aligned clustering-based algorithm, is proposed. Various simulations on temporal problems are done and performance comparisons with some existing recurrent networks are also made. Efficiency of the RSONFIN is verified from these results.

  5. Backward renormalization-group inference of cortical dipole sources and neural connectivity efficacy

    NASA Astrophysics Data System (ADS)

    Amaral, Selene da Rocha; Baccalá, Luiz A.; Barbosa, Leonardo S.; Caticha, Nestor

    2017-06-01

    Proper neural connectivity inference has become essential for understanding cognitive processes associated with human brain function. Its efficacy is often hampered by the curse of dimensionality. In the electroencephalogram case, which is a noninvasive electrophysiological monitoring technique to record electrical activity of the brain, a possible way around this is to replace multichannel electrode information with dipole reconstructed data. We use a method based on maximum entropy and the renormalization group to infer the position of the sources, whose success hinges on transmitting information from low- to high-resolution representations of the cortex. The performance of this method compares favorably to other available source inference algorithms, which are ranked here in terms of their performance with respect to directed connectivity inference by using artificially generated dynamic data. We examine some representative scenarios comprising different numbers of dynamically connected dipoles over distinct cortical surface positions and under different sensor noise impairment levels. The overall conclusion is that inverse problem solutions do not affect the correct inference of the direction of the flow of information as long as the equivalent dipole sources are correctly found.

  6. Research prioritization through prediction of future impact on biomedical science: a position paper on inference-analytics.

    PubMed

    Ganapathiraju, Madhavi K; Orii, Naoki

    2013-08-30

    Advances in biotechnology have created "big-data" situations in molecular and cellular biology. Several sophisticated algorithms have been developed that process big data to generate hundreds of biomedical hypotheses (or predictions). The bottleneck to translating this large number of biological hypotheses is that each of them needs to be studied by experimentation for interpreting its functional significance. Even when the predictions are estimated to be very accurate, from a biologist's perspective, the choice of which of these predictions is to be studied further is made based on factors like availability of reagents and resources and the possibility of formulating some reasonable hypothesis about its biological relevance. When viewed from a global perspective, say from that of a federal funding agency, ideally the choice of which prediction should be studied would be made based on which of them can make the most translational impact. We propose that algorithms be developed to identify which of the computationally generated hypotheses have potential for high translational impact; this way, funding agencies and scientific community can invest resources and drive the research based on a global view of biomedical impact without being deterred by local view of feasibility. In short, data-analytic algorithms analyze big-data and generate hypotheses; in contrast, the proposed inference-analytic algorithms analyze these hypotheses and rank them by predicted biological impact. We demonstrate this through the development of an algorithm to predict biomedical impact of protein-protein interactions (PPIs) which is estimated by the number of future publications that cite the paper which originally reported the PPI. This position paper describes a new computational problem that is relevant in the era of big-data and discusses the challenges that exist in studying this problem, highlighting the need for the scientific community to engage in this line of research. The proposed class of algorithms, namely inference-analytic algorithms, is necessary to ensure that resources are invested in translating those computational outcomes that promise maximum biological impact. Application of this concept to predict biomedical impact of PPIs illustrates not only the concept, but also the challenges in designing these algorithms.

  7. A linear programming model for protein inference problem in shotgun proteomics.

    PubMed

    Huang, Ting; He, Zengyou

    2012-11-15

    Assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is an important issue in shotgun proteomics. The objective of protein inference is to find a subset of proteins that are truly present in the sample. Although many methods have been proposed for protein inference, several issues such as peptide degeneracy still remain unsolved. In this article, we present a linear programming model for protein inference. In this model, we use a transformation of the joint probability that each peptide/protein pair is present in the sample as the variable. Then, both the peptide probability and protein probability can be expressed as a formula in terms of the linear combination of these variables. Based on this simple fact, the protein inference problem is formulated as an optimization problem: minimize the number of proteins with non-zero probabilities under the constraint that the difference between the calculated peptide probability and the peptide probability generated from peptide identification algorithms should be less than some threshold. This model addresses the peptide degeneracy issue by forcing some joint probability variables involving degenerate peptides to be zero in a rigorous manner. The corresponding inference algorithm is named as ProteinLP. We test the performance of ProteinLP on six datasets. Experimental results show that our method is competitive with the state-of-the-art protein inference algorithms. The source code of our algorithm is available at: https://sourceforge.net/projects/prolp/. zyhe@dlut.edu.cn. Supplementary data are available at Bioinformatics Online.

  8. Inferring nonlinear gene regulatory networks from gene expression data based on distance correlation.

    PubMed

    Guo, Xiaobo; Zhang, Ye; Hu, Wenhao; Tan, Haizhu; Wang, Xueqin

    2014-01-01

    Nonlinear dependence is general in regulation mechanism of gene regulatory networks (GRNs). It is vital to properly measure or test nonlinear dependence from real data for reconstructing GRNs and understanding the complex regulatory mechanisms within the cellular system. A recently developed measurement called the distance correlation (DC) has been shown powerful and computationally effective in nonlinear dependence for many situations. In this work, we incorporate the DC into inferring GRNs from the gene expression data without any underling distribution assumptions. We propose three DC-based GRNs inference algorithms: CLR-DC, MRNET-DC and REL-DC, and then compare them with the mutual information (MI)-based algorithms by analyzing two simulated data: benchmark GRNs from the DREAM challenge and GRNs generated by SynTReN network generator, and an experimentally determined SOS DNA repair network in Escherichia coli. According to both the receiver operator characteristic (ROC) curve and the precision-recall (PR) curve, our proposed algorithms significantly outperform the MI-based algorithms in GRNs inference.

  9. Inferring Nonlinear Gene Regulatory Networks from Gene Expression Data Based on Distance Correlation

    PubMed Central

    Guo, Xiaobo; Zhang, Ye; Hu, Wenhao; Tan, Haizhu; Wang, Xueqin

    2014-01-01

    Nonlinear dependence is general in regulation mechanism of gene regulatory networks (GRNs). It is vital to properly measure or test nonlinear dependence from real data for reconstructing GRNs and understanding the complex regulatory mechanisms within the cellular system. A recently developed measurement called the distance correlation (DC) has been shown powerful and computationally effective in nonlinear dependence for many situations. In this work, we incorporate the DC into inferring GRNs from the gene expression data without any underling distribution assumptions. We propose three DC-based GRNs inference algorithms: CLR-DC, MRNET-DC and REL-DC, and then compare them with the mutual information (MI)-based algorithms by analyzing two simulated data: benchmark GRNs from the DREAM challenge and GRNs generated by SynTReN network generator, and an experimentally determined SOS DNA repair network in Escherichia coli. According to both the receiver operator characteristic (ROC) curve and the precision-recall (PR) curve, our proposed algorithms significantly outperform the MI-based algorithms in GRNs inference. PMID:24551058

  10. RESOLVE: A new algorithm for aperture synthesis imaging of extended emission in radio astronomy

    NASA Astrophysics Data System (ADS)

    Junklewitz, H.; Bell, M. R.; Selig, M.; Enßlin, T. A.

    2016-02-01

    We present resolve, a new algorithm for radio aperture synthesis imaging of extended and diffuse emission in total intensity. The algorithm is derived using Bayesian statistical inference techniques, estimating the surface brightness in the sky assuming a priori log-normal statistics. resolve estimates the measured sky brightness in total intensity, and the spatial correlation structure in the sky, which is used to guide the algorithm to an optimal reconstruction of extended and diffuse sources. During this process, the algorithm succeeds in deconvolving the effects of the radio interferometric point spread function. Additionally, resolve provides a map with an uncertainty estimate of the reconstructed surface brightness. Furthermore, with resolve we introduce a new, optimal visibility weighting scheme that can be viewed as an extension to robust weighting. In tests using simulated observations, the algorithm shows improved performance against two standard imaging approaches for extended sources, Multiscale-CLEAN and the Maximum Entropy Method.

  11. Inferring genetic interactions via a nonlinear model and an optimization algorithm.

    PubMed

    Chen, Chung-Ming; Lee, Chih; Chuang, Cheng-Long; Wang, Chia-Chang; Shieh, Grace S

    2010-02-26

    Biochemical pathways are gradually becoming recognized as central to complex human diseases and recently genetic/transcriptional interactions have been shown to be able to predict partial pathways. With the abundant information made available by microarray gene expression data (MGED), nonlinear modeling of these interactions is now feasible. Two of the latest advances in nonlinear modeling used sigmoid models to depict transcriptional interaction of a transcription factor (TF) for a target gene, but do not model cooperative or competitive interactions of several TFs for a target. An S-shape model and an optimization algorithm (GASA) were developed to infer genetic interactions/transcriptional regulation of several genes simultaneously using MGED. GASA consists of a genetic algorithm (GA) and a simulated annealing (SA) algorithm, which is enhanced by a steepest gradient descent algorithm to avoid being trapped in local minimum. Using simulated data with various degrees of noise, we studied how GASA with two model selection criteria and two search spaces performed. Furthermore, GASA was shown to outperform network component analysis, the time series network inference algorithm (TSNI), GA with regular GA (GAGA) and GA with regular SA. Two applications are demonstrated. First, GASA is applied to infer a subnetwork of human T-cell apoptosis. Several of the predicted interactions are supported by the literature. Second, GASA was applied to infer the transcriptional factors of 34 cell cycle regulated targets in S. cerevisiae, and GASA performed better than one of the latest advances in nonlinear modeling, GAGA and TSNI. Moreover, GASA is able to predict multiple transcription factors for certain targets, and these results coincide with experiments confirmed data in YEASTRACT. GASA is shown to infer both genetic interactions and transcriptional regulatory interactions well. In particular, GASA seems able to characterize the nonlinear mechanism of transcriptional regulatory interactions (TIs) in yeast, and may be applied to infer TIs in other organisms. The predicted genetic interactions of a subnetwork of human T-cell apoptosis coincide with existing partial pathways, suggesting the potential of GASA on inferring biochemical pathways.

  12. Theoretic derivation of directed acyclic subgraph algorithm and comparisons with message passing algorithm

    NASA Astrophysics Data System (ADS)

    Ha, Jeongmok; Jeong, Hong

    2016-07-01

    This study investigates the directed acyclic subgraph (DAS) algorithm, which is used to solve discrete labeling problems much more rapidly than other Markov-random-field-based inference methods but at a competitive accuracy. However, the mechanism by which the DAS algorithm simultaneously achieves competitive accuracy and fast execution speed, has not been elucidated by a theoretical derivation. We analyze the DAS algorithm by comparing it with a message passing algorithm. Graphical models, inference methods, and energy-minimization frameworks are compared between DAS and message passing algorithms. Moreover, the performances of DAS and other message passing methods [sum-product belief propagation (BP), max-product BP, and tree-reweighted message passing] are experimentally compared.

  13. Genetic Network Inference: From Co-Expression Clustering to Reverse Engineering

    NASA Technical Reports Server (NTRS)

    Dhaeseleer, Patrik; Liang, Shoudan; Somogyi, Roland

    2000-01-01

    Advances in molecular biological, analytical, and computational technologies are enabling us to systematically investigate the complex molecular processes underlying biological systems. In particular, using high-throughput gene expression assays, we are able to measure the output of the gene regulatory network. We aim here to review datamining and modeling approaches for conceptualizing and unraveling the functional relationships implicit in these datasets. Clustering of co-expression profiles allows us to infer shared regulatory inputs and functional pathways. We discuss various aspects of clustering, ranging from distance measures to clustering algorithms and multiple-duster memberships. More advanced analysis aims to infer causal connections between genes directly, i.e., who is regulating whom and how. We discuss several approaches to the problem of reverse engineering of genetic networks, from discrete Boolean networks, to continuous linear and non-linear models. We conclude that the combination of predictive modeling with systematic experimental verification will be required to gain a deeper insight into living organisms, therapeutic targeting, and bioengineering.

  14. NIMEFI: Gene Regulatory Network Inference using Multiple Ensemble Feature Importance Algorithms

    PubMed Central

    Ruyssinck, Joeri; Huynh-Thu, Vân Anh; Geurts, Pierre; Dhaene, Tom; Demeester, Piet; Saeys, Yvan

    2014-01-01

    One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that this approach outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available. PMID:24667482

  15. Fundamentals and Recent Developments in Approximate Bayesian Computation

    PubMed Central

    Lintusaari, Jarno; Gutmann, Michael U.; Dutta, Ritabrata; Kaski, Samuel; Corander, Jukka

    2017-01-01

    Abstract Bayesian inference plays an important role in phylogenetics, evolutionary biology, and in many other branches of science. It provides a principled framework for dealing with uncertainty and quantifying how it changes in the light of new evidence. For many complex models and inference problems, however, only approximate quantitative answers are obtainable. Approximate Bayesian computation (ABC) refers to a family of algorithms for approximate inference that makes a minimal set of assumptions by only requiring that sampling from a model is possible. We explain here the fundamentals of ABC, review the classical algorithms, and highlight recent developments. [ABC; approximate Bayesian computation; Bayesian inference; likelihood-free inference; phylogenetics; simulator-based models; stochastic simulation models; tree-based models.] PMID:28175922

  16. Inferring functional connectivity in MRI using Bayesian network structure learning with a modified PC algorithm

    PubMed Central

    Iyer, Swathi; Shafran, Izhak; Grayson, David; Gates, Kathleen; Nigg, Joel; Fair, Damien

    2013-01-01

    Resting state functional connectivity MRI (rs-fcMRI) is a popular technique used to gauge the functional relatedness between regions in the brain for typical and special populations. Most of the work to date determines this relationship by using Pearson's correlation on BOLD fMRI timeseries. However, it has been recognized that there are at least two key limitations to this method. First, it is not possible to resolve the direct and indirect connections/influences. Second, the direction of information flow between the regions cannot be differentiated. In the current paper, we follow-up on recent work by Smith et al (2011), and apply a Bayesian approach called the PC algorithm to both simulated data and empirical data to determine whether these two factors can be discerned with group average, as opposed to single subject, functional connectivity data. When applied on simulated individual subjects, the algorithm performs well determining indirect and direct connection but fails in determining directionality. However, when applied at group level, PC algorithm gives strong results for both indirect and direct connections and the direction of information flow. Applying the algorithm on empirical data, using a diffusion-weighted imaging (DWI) structural connectivity matrix as the baseline, the PC algorithm outperformed the direct correlations. We conclude that, under certain conditions, the PC algorithm leads to an improved estimate of brain network structure compared to the traditional connectivity analysis based on correlations. PMID:23501054

  17. A group LASSO-based method for robustly inferring gene regulatory networks from multiple time-course datasets.

    PubMed

    Liu, Li-Zhi; Wu, Fang-Xiang; Zhang, Wen-Jun

    2014-01-01

    As an abstract mapping of the gene regulations in the cell, gene regulatory network is important to both biological research study and practical applications. The reverse engineering of gene regulatory networks from microarray gene expression data is a challenging research problem in systems biology. With the development of biological technologies, multiple time-course gene expression datasets might be collected for a specific gene network under different circumstances. The inference of a gene regulatory network can be improved by integrating these multiple datasets. It is also known that gene expression data may be contaminated with large errors or outliers, which may affect the inference results. A novel method, Huber group LASSO, is proposed to infer the same underlying network topology from multiple time-course gene expression datasets as well as to take the robustness to large error or outliers into account. To solve the optimization problem involved in the proposed method, an efficient algorithm which combines the ideas of auxiliary function minimization and block descent is developed. A stability selection method is adapted to our method to find a network topology consisting of edges with scores. The proposed method is applied to both simulation datasets and real experimental datasets. It shows that Huber group LASSO outperforms the group LASSO in terms of both areas under receiver operating characteristic curves and areas under the precision-recall curves. The convergence analysis of the algorithm theoretically shows that the sequence generated from the algorithm converges to the optimal solution of the problem. The simulation and real data examples demonstrate the effectiveness of the Huber group LASSO in integrating multiple time-course gene expression datasets and improving the resistance to large errors or outliers.

  18. Obesity as a risk factor for developing functional limitation among older adults: A conditional inference tree analysis.

    PubMed

    Cheng, Feon W; Gao, Xiang; Bao, Le; Mitchell, Diane C; Wood, Craig; Sliwinski, Martin J; Smiciklas-Wright, Helen; Still, Christopher D; Rolston, David D K; Jensen, Gordon L

    2017-07-01

    To examine the risk factors of developing functional decline and make probabilistic predictions by using a tree-based method that allows higher order polynomials and interactions of the risk factors. The conditional inference tree analysis, a data mining approach, was used to construct a risk stratification algorithm for developing functional limitation based on BMI and other potential risk factors for disability in 1,951 older adults without functional limitations at baseline (baseline age 73.1 ± 4.2 y). We also analyzed the data with multivariate stepwise logistic regression and compared the two approaches (e.g., cross-validation). Over a mean of 9.2 ± 1.7 years of follow-up, 221 individuals developed functional limitation. Higher BMI, age, and comorbidity were consistently identified as significant risk factors for functional decline using both methods. Based on these factors, individuals were stratified into four risk groups via the conditional inference tree analysis. Compared to the low-risk group, all other groups had a significantly higher risk of developing functional limitation. The odds ratio comparing two extreme categories was 9.09 (95% confidence interval: 4.68, 17.6). Higher BMI, age, and comorbid disease were consistently identified as significant risk factors for functional decline among older individuals across all approaches and analyses. © 2017 The Obesity Society.

  19. Poisson-Based Inference for Perturbation Models in Adaptive Spelling Training

    ERIC Educational Resources Information Center

    Baschera, Gian-Marco; Gross, Markus

    2010-01-01

    We present an inference algorithm for perturbation models based on Poisson regression. The algorithm is designed to handle unclassified input with multiple errors described by independent mal-rules. This knowledge representation provides an intelligent tutoring system with local and global information about a student, such as error classification…

  20. 160-fold acceleration of the Smith-Waterman algorithm using a field programmable gate array (FPGA)

    PubMed Central

    Li, Isaac TS; Shum, Warren; Truong, Kevin

    2007-01-01

    Background To infer homology and subsequently gene function, the Smith-Waterman (SW) algorithm is used to find the optimal local alignment between two sequences. When searching sequence databases that may contain hundreds of millions of sequences, this algorithm becomes computationally expensive. Results In this paper, we focused on accelerating the Smith-Waterman algorithm by using FPGA-based hardware that implemented a module for computing the score of a single cell of the SW matrix. Then using a grid of this module, the entire SW matrix was computed at the speed of field propagation through the FPGA circuit. These modifications dramatically accelerated the algorithm's computation time by up to 160 folds compared to a pure software implementation running on the same FPGA with an Altera Nios II softprocessor. Conclusion This design of FPGA accelerated hardware offers a new promising direction to seeking computation improvement of genomic database searching. PMID:17555593

  1. 160-fold acceleration of the Smith-Waterman algorithm using a field programmable gate array (FPGA).

    PubMed

    Li, Isaac T S; Shum, Warren; Truong, Kevin

    2007-06-07

    To infer homology and subsequently gene function, the Smith-Waterman (SW) algorithm is used to find the optimal local alignment between two sequences. When searching sequence databases that may contain hundreds of millions of sequences, this algorithm becomes computationally expensive. In this paper, we focused on accelerating the Smith-Waterman algorithm by using FPGA-based hardware that implemented a module for computing the score of a single cell of the SW matrix. Then using a grid of this module, the entire SW matrix was computed at the speed of field propagation through the FPGA circuit. These modifications dramatically accelerated the algorithm's computation time by up to 160 folds compared to a pure software implementation running on the same FPGA with an Altera Nios II softprocessor. This design of FPGA accelerated hardware offers a new promising direction to seeking computation improvement of genomic database searching.

  2. Embracing equifinality with efficiency: Limits of Acceptability sampling using the DREAM(LOA) algorithm

    NASA Astrophysics Data System (ADS)

    Vrugt, Jasper A.; Beven, Keith J.

    2018-04-01

    This essay illustrates some recent developments to the DiffeRential Evolution Adaptive Metropolis (DREAM) MATLAB toolbox of Vrugt (2016) to delineate and sample the behavioural solution space of set-theoretic likelihood functions used within the GLUE (Limits of Acceptability) framework (Beven and Binley, 1992, 2014; Beven and Freer, 2001; Beven, 2006). This work builds on the DREAM(ABC) algorithm of Sadegh and Vrugt (2014) and enhances significantly the accuracy and CPU-efficiency of Bayesian inference with GLUE. In particular it is shown how lack of adequate sampling in the model space might lead to unjustified model rejection.

  3. Estimation of flow properties using surface deformation and head data: A trajectory-based approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vasco, D.W.

    2004-07-12

    A trajectory-based algorithm provides an efficient and robust means to infer flow properties from surface deformation and head data. The algorithm is based upon the concept of an ''arrival time'' of a drawdown front, which is defined as the time corresponding to the maximum slope of the drawdown curve. The technique involves three steps: the inference of head changes as a function of position and time, the use of the estimated head changes to define arrival times, and the inversion of the arrival times for flow properties. Trajectories, computed from the output of a numerical simulator, are used to relatemore » the drawdown arrival times to flow properties. The inversion algorithm is iterative, requiring one reservoir simulation for each iteration. The method is applied to data from a set of 14 tiltmeters, located at the Raymond Quarry field site in California. Using the technique, I am able to image a high-conductivity channel which extends to the south of the pumping well. The presence of th is permeable pathway is supported by an analysis of earlier cross-well transient pressure test data.« less

  4. Bayesian inference on EMRI signals using low frequency approximations

    NASA Astrophysics Data System (ADS)

    Ali, Asad; Christensen, Nelson; Meyer, Renate; Röver, Christian

    2012-07-01

    Extreme mass ratio inspirals (EMRIs) are thought to be one of the most exciting gravitational wave sources to be detected with LISA. Due to their complicated nature and weak amplitudes the detection and parameter estimation of such sources is a challenging task. In this paper we present a statistical methodology based on Bayesian inference in which the estimation of parameters is carried out by advanced Markov chain Monte Carlo (MCMC) algorithms such as parallel tempering MCMC. We analysed high and medium mass EMRI systems that fall well inside the low frequency range of LISA. In the context of the Mock LISA Data Challenges, our investigation and results are also the first instance in which a fully Markovian algorithm is applied for EMRI searches. Results show that our algorithm worked well in recovering EMRI signals from different (simulated) LISA data sets having single and multiple EMRI sources and holds great promise for posterior computation under more realistic conditions. The search and estimation methods presented in this paper are general in their nature, and can be applied in any other scenario such as AdLIGO, AdVIRGO and Einstein Telescope with their respective response functions.

  5. Cognitive Inference Device for Activity Supervision in the Elderly

    PubMed Central

    2014-01-01

    Human activity, life span, and quality of life are enhanced by innovations in science and technology. Aging individual needs to take advantage of these developments to lead a self-regulated life. However, maintaining a self-regulated life at old age involves a high degree of risk, and the elderly often fail at this goal. Thus, the objective of our study is to investigate the feasibility of implementing a cognitive inference device (CI-device) for effective activity supervision in the elderly. To frame the CI-device, we propose a device design framework along with an inference algorithm and implement the designs through an artificial neural model with different configurations, mapping the CI-device's functions to minimise the device's prediction error. An analysis and discussion are then provided to validate the feasibility of CI-device implementation for activity supervision in the elderly. PMID:25405211

  6. Evolution in Mind: Evolutionary Dynamics, Cognitive Processes, and Bayesian Inference.

    PubMed

    Suchow, Jordan W; Bourgin, David D; Griffiths, Thomas L

    2017-07-01

    Evolutionary theory describes the dynamics of population change in settings affected by reproduction, selection, mutation, and drift. In the context of human cognition, evolutionary theory is most often invoked to explain the origins of capacities such as language, metacognition, and spatial reasoning, framing them as functional adaptations to an ancestral environment. However, evolutionary theory is useful for understanding the mind in a second way: as a mathematical framework for describing evolving populations of thoughts, ideas, and memories within a single mind. In fact, deep correspondences exist between the mathematics of evolution and of learning, with perhaps the deepest being an equivalence between certain evolutionary dynamics and Bayesian inference. This equivalence permits reinterpretation of evolutionary processes as algorithms for Bayesian inference and has relevance for understanding diverse cognitive capacities, including memory and creativity. Copyright © 2017 Elsevier Ltd. All rights reserved.

  7. Personalized recommendation via unbalance full-connectivity inference

    NASA Astrophysics Data System (ADS)

    Ma, Wenping; Ren, Chen; Wu, Yue; Wang, Shanfeng; Feng, Xiang

    2017-10-01

    Recommender systems play an important role to help us to find useful information. They are widely used by most e-commerce web sites to push the potential items to individual user according to purchase history. Network-based recommendation algorithms are popular and effective in recommendation, which use two types of elements to represent users and items respectively. In this paper, based on consistence-based inference (CBI) algorithm, we propose a novel network-based algorithm, in which users and items are recognized with no difference. The proposed algorithm also uses information diffusion to find the relationship between users and items. Different from traditional network-based recommendation algorithms, information diffusion initializes from users and items, respectively. Experiments show that the proposed algorithm is effective compared with traditional network-based recommendation algorithms.

  8. Recurrent neural network-based modeling of gene regulatory network using elephant swarm water search algorithm.

    PubMed

    Mandal, Sudip; Saha, Goutam; Pal, Rajat Kumar

    2017-08-01

    Correct inference of genetic regulations inside a cell from the biological database like time series microarray data is one of the greatest challenges in post genomic era for biologists and researchers. Recurrent Neural Network (RNN) is one of the most popular and simple approach to model the dynamics as well as to infer correct dependencies among genes. Inspired by the behavior of social elephants, we propose a new metaheuristic namely Elephant Swarm Water Search Algorithm (ESWSA) to infer Gene Regulatory Network (GRN). This algorithm is mainly based on the water search strategy of intelligent and social elephants during drought, utilizing the different types of communication techniques. Initially, the algorithm is tested against benchmark small and medium scale artificial genetic networks without and with presence of different noise levels and the efficiency was observed in term of parametric error, minimum fitness value, execution time, accuracy of prediction of true regulation, etc. Next, the proposed algorithm is tested against the real time gene expression data of Escherichia Coli SOS Network and results were also compared with others state of the art optimization methods. The experimental results suggest that ESWSA is very efficient for GRN inference problem and performs better than other methods in many ways.

  9. Detection of Cheating by Decimation Algorithm

    NASA Astrophysics Data System (ADS)

    Yamanaka, Shogo; Ohzeki, Masayuki; Decelle, Aurélien

    2015-02-01

    We expand the item response theory to study the case of "cheating students" for a set of exams, trying to detect them by applying a greedy algorithm of inference. This extended model is closely related to the Boltzmann machine learning. In this paper we aim to infer the correct biases and interactions of our model by considering a relatively small number of sets of training data. Nevertheless, the greedy algorithm that we employed in the present study exhibits good performance with a few number of training data. The key point is the sparseness of the interactions in our problem in the context of the Boltzmann machine learning: the existence of cheating students is expected to be very rare (possibly even in real world). We compare a standard approach to infer the sparse interactions in the Boltzmann machine learning to our greedy algorithm and we find the latter to be superior in several aspects.

  10. Inference in the brain: Statistics flowing in redundant population codes

    PubMed Central

    Pitkow, Xaq; Angelaki, Dora E

    2017-01-01

    It is widely believed that the brain performs approximate probabilistic inference to estimate causal variables in the world from ambiguous sensory data. To understand these computations, we need to analyze how information is represented and transformed by the actions of nonlinear recurrent neural networks. We propose that these probabilistic computations function by a message-passing algorithm operating at the level of redundant neural populations. To explain this framework, we review its underlying concepts, including graphical models, sufficient statistics, and message-passing, and then describe how these concepts could be implemented by recurrently connected probabilistic population codes. The relevant information flow in these networks will be most interpretable at the population level, particularly for redundant neural codes. We therefore outline a general approach to identify the essential features of a neural message-passing algorithm. Finally, we argue that to reveal the most important aspects of these neural computations, we must study large-scale activity patterns during moderately complex, naturalistic behaviors. PMID:28595050

  11. Fast noise level estimation algorithm based on principal component analysis transform and nonlinear rectification

    NASA Astrophysics Data System (ADS)

    Xu, Shaoping; Zeng, Xiaoxia; Jiang, Yinnan; Tang, Yiling

    2018-01-01

    We proposed a noniterative principal component analysis (PCA)-based noise level estimation (NLE) algorithm that addresses the problem of estimating the noise level with a two-step scheme. First, we randomly extracted a number of raw patches from a given noisy image and took the smallest eigenvalue of the covariance matrix of the raw patches as the preliminary estimation of the noise level. Next, the final estimation was directly obtained with a nonlinear mapping (rectification) function that was trained on some representative noisy images corrupted with different known noise levels. Compared with the state-of-art NLE algorithms, the experiment results show that the proposed NLE algorithm can reliably infer the noise level and has robust performance over a wide range of image contents and noise levels, showing a good compromise between speed and accuracy in general.

  12. Extreme-Scale Bayesian Inference for Uncertainty Quantification of Complex Simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Biros, George

    Uncertainty quantification (UQ)—that is, quantifying uncertainties in complex mathematical models and their large-scale computational implementations—is widely viewed as one of the outstanding challenges facing the field of CS&E over the coming decade. The EUREKA project set to address the most difficult class of UQ problems: those for which both the underlying PDE model as well as the uncertain parameters are of extreme scale. In the project we worked on these extreme-scale challenges in the following four areas: 1. Scalable parallel algorithms for sampling and characterizing the posterior distribution that exploit the structure of the underlying PDEs and parameter-to-observable map. Thesemore » include structure-exploiting versions of the randomized maximum likelihood method, which aims to overcome the intractability of employing conventional MCMC methods for solving extreme-scale Bayesian inversion problems by appealing to and adapting ideas from large-scale PDE-constrained optimization, which have been very successful at exploring high-dimensional spaces. 2. Scalable parallel algorithms for construction of prior and likelihood functions based on learning methods and non-parametric density estimation. Constructing problem-specific priors remains a critical challenge in Bayesian inference, and more so in high dimensions. Another challenge is construction of likelihood functions that capture unmodeled couplings between observations and parameters. We will create parallel algorithms for non-parametric density estimation using high dimensional N-body methods and combine them with supervised learning techniques for the construction of priors and likelihood functions. 3. Bayesian inadequacy models, which augment physics models with stochastic models that represent their imperfections. The success of the Bayesian inference framework depends on the ability to represent the uncertainty due to imperfections of the mathematical model of the phenomena of interest. This is a central challenge in UQ, especially for large-scale models. We propose to develop the mathematical tools to address these challenges in the context of extreme-scale problems. 4. Parallel scalable algorithms for Bayesian optimal experimental design (OED). Bayesian inversion yields quantified uncertainties in the model parameters, which can be propagated forward through the model to yield uncertainty in outputs of interest. This opens the way for designing new experiments to reduce the uncertainties in the model parameters and model predictions. Such experimental design problems have been intractable for large-scale problems using conventional methods; we will create OED algorithms that exploit the structure of the PDE model and the parameter-to-output map to overcome these challenges. Parallel algorithms for these four problems were created, analyzed, prototyped, implemented, tuned, and scaled up for leading-edge supercomputers, including UT-Austin’s own 10 petaflops Stampede system, ANL’s Mira system, and ORNL’s Titan system. While our focus is on fundamental mathematical/computational methods and algorithms, we will assess our methods on model problems derived from several DOE mission applications, including multiscale mechanics and ice sheet dynamics.« less

  13. Reinforce: An Ensemble Approach for Inferring PPI Network from AP-MS Data.

    PubMed

    Tian, Bo; Duan, Qiong; Zhao, Can; Teng, Ben; He, Zengyou

    2017-05-17

    Affinity Purification-Mass Spectrometry (AP-MS) is one of the most important technologies for constructing protein-protein interaction (PPI) networks. In this paper, we propose an ensemble method, Reinforce, for inferring PPI network from AP-MS data set. The new algorithm named Reinforce is based on rank aggregation and false discovery rate control. Under the null hypothesis that the interaction scores from different scoring methods are randomly generated, Reinforce follows three steps to integrate multiple ranking results from different algorithms or different data sets. The experimental results show that Reinforce can get more stable and accurate inference results than existing algorithms. The source codes of Reinforce and data sets used in the experiments are available at: https://sourceforge.net/projects/reinforce/.

  14. Developing JSequitur to Study the Hierarchical Structure of Biological Sequences in a Grammatical Inference Framework of String Compression Algorithms.

    PubMed

    Galbadrakh, Bulgan; Lee, Kyung-Eun; Park, Hyun-Seok

    2012-12-01

    Grammatical inference methods are expected to find grammatical structures hidden in biological sequences. One hopes that studies of grammar serve as an appropriate tool for theory formation. Thus, we have developed JSequitur for automatically generating the grammatical structure of biological sequences in an inference framework of string compression algorithms. Our original motivation was to find any grammatical traits of several cancer genes that can be detected by string compression algorithms. Through this research, we could not find any meaningful unique traits of the cancer genes yet, but we could observe some interesting traits in regards to the relationship among gene length, similarity of sequences, the patterns of the generated grammar, and compression rate.

  15. GPU Computing in Bayesian Inference of Realized Stochastic Volatility Model

    NASA Astrophysics Data System (ADS)

    Takaishi, Tetsuya

    2015-01-01

    The realized stochastic volatility (RSV) model that utilizes the realized volatility as additional information has been proposed to infer volatility of financial time series. We consider the Bayesian inference of the RSV model by the Hybrid Monte Carlo (HMC) algorithm. The HMC algorithm can be parallelized and thus performed on the GPU for speedup. The GPU code is developed with CUDA Fortran. We compare the computational time in performing the HMC algorithm on GPU (GTX 760) and CPU (Intel i7-4770 3.4GHz) and find that the GPU can be up to 17 times faster than the CPU. We also code the program with OpenACC and find that appropriate coding can achieve the similar speedup with CUDA Fortran.

  16. Metis: A Pure Metropolis Markov Chain Monte Carlo Bayesian Inference Library

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bates, Cameron Russell; Mckigney, Edward Allen

    The use of Bayesian inference in data analysis has become the standard for large scienti c experiments [1, 2]. The Monte Carlo Codes Group(XCP-3) at Los Alamos has developed a simple set of algorithms currently implemented in C++ and Python to easily perform at-prior Markov Chain Monte Carlo Bayesian inference with pure Metropolis sampling. These implementations are designed to be user friendly and extensible for customization based on speci c application requirements. This document describes the algorithmic choices made and presents two use cases.

  17. Classification of complex information: inference of co-occurring affective states from their expressions in speech.

    PubMed

    Sobol-Shikler, Tal; Robinson, Peter

    2010-07-01

    We present a classification algorithm for inferring affective states (emotions, mental states, attitudes, and the like) from their nonverbal expressions in speech. It is based on the observations that affective states can occur simultaneously and different sets of vocal features, such as intonation and speech rate, distinguish between nonverbal expressions of different affective states. The input to the inference system was a large set of vocal features and metrics that were extracted from each utterance. The classification algorithm conducted independent pairwise comparisons between nine affective-state groups. The classifier used various subsets of metrics of the vocal features and various classification algorithms for different pairs of affective-state groups. Average classification accuracy of the 36 pairwise machines was 75 percent, using 10-fold cross validation. The comparison results were consolidated into a single ranked list of the nine affective-state groups. This list was the output of the system and represented the inferred combination of co-occurring affective states for the analyzed utterance. The inference accuracy of the combined machine was 83 percent. The system automatically characterized over 500 affective state concepts from the Mind Reading database. The inference of co-occurring affective states was validated by comparing the inferred combinations to the lexical definitions of the labels of the analyzed sentences. The distinguishing capabilities of the system were comparable to human performance.

  18. Predicting gene regulatory networks by combining spatial and temporal gene expression data in Arabidopsis root stem cells

    PubMed Central

    de Luis Balaguer, Maria Angels; Fisher, Adam P.; Clark, Natalie M.; Fernandez-Espinosa, Maria Guadalupe; Möller, Barbara K.; Weijers, Dolf; Williams, Cranos; Lorenzo, Oscar; Sozzani, Rosangela

    2017-01-01

    Identifying the transcription factors (TFs) and associated networks involved in stem cell regulation is essential for understanding the initiation and growth of plant tissues and organs. Although many TFs have been shown to have a role in the Arabidopsis root stem cells, a comprehensive view of the transcriptional signature of the stem cells is lacking. In this work, we used spatial and temporal transcriptomic data to predict interactions among the genes involved in stem cell regulation. To accomplish this, we transcriptionally profiled several stem cell populations and developed a gene regulatory network inference algorithm that combines clustering with dynamic Bayesian network inference. We leveraged the topology of our networks to infer potential major regulators. Specifically, through mathematical modeling and experimental validation, we identified PERIANTHIA (PAN) as an important molecular regulator of quiescent center function. The results presented in this work show that our combination of molecular biology, computational biology, and mathematical modeling is an efficient approach to identify candidate factors that function in the stem cells. PMID:28827319

  19. Improving stochastic estimates with inference methods: calculating matrix diagonals.

    PubMed

    Selig, Marco; Oppermann, Niels; Ensslin, Torsten A

    2012-02-01

    Estimating the diagonal entries of a matrix, that is not directly accessible but only available as a linear operator in the form of a computer routine, is a common necessity in many computational applications, especially in image reconstruction and statistical inference. Here, methods of statistical inference are used to improve the accuracy or the computational costs of matrix probing methods to estimate matrix diagonals. In particular, the generalized Wiener filter methodology, as developed within information field theory, is shown to significantly improve estimates based on only a few sampling probes, in cases in which some form of continuity of the solution can be assumed. The strength, length scale, and precise functional form of the exploited autocorrelation function of the matrix diagonal is determined from the probes themselves. The developed algorithm is successfully applied to mock and real world problems. These performance tests show that, in situations where a matrix diagonal has to be calculated from only a small number of computationally expensive probes, a speedup by a factor of 2 to 10 is possible with the proposed method. © 2012 American Physical Society

  20. Bayesian inference for OPC modeling

    NASA Astrophysics Data System (ADS)

    Burbine, Andrew; Sturtevant, John; Fryer, David; Smith, Bruce W.

    2016-03-01

    The use of optical proximity correction (OPC) demands increasingly accurate models of the photolithographic process. Model building and inference techniques in the data science community have seen great strides in the past two decades which make better use of available information. This paper aims to demonstrate the predictive power of Bayesian inference as a method for parameter selection in lithographic models by quantifying the uncertainty associated with model inputs and wafer data. Specifically, the method combines the model builder's prior information about each modelling assumption with the maximization of each observation's likelihood as a Student's t-distributed random variable. Through the use of a Markov chain Monte Carlo (MCMC) algorithm, a model's parameter space is explored to find the most credible parameter values. During parameter exploration, the parameters' posterior distributions are generated by applying Bayes' rule, using a likelihood function and the a priori knowledge supplied. The MCMC algorithm used, an affine invariant ensemble sampler (AIES), is implemented by initializing many walkers which semiindependently explore the space. The convergence of these walkers to global maxima of the likelihood volume determine the parameter values' highest density intervals (HDI) to reveal champion models. We show that this method of parameter selection provides insights into the data that traditional methods do not and outline continued experiments to vet the method.

  1. Assessment of Gamma-Ray Spectra Analysis Method Utilizing the Fireworks Algorithm for various Error Measures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Alamaniotis, Miltiadis; Tsoukalas, Lefteri H.

    2018-01-01

    Significant role in enhancing nuclear nonproliferation plays the analysis of obtained data and the inference of the presence or not of special nuclear materials in them. Among various types of measurements, gamma-ray spectra is the widest used type of data utilized for analysis in nonproliferation. In this chapter, a method that employs the fireworks algorithm (FWA) for analyzing gamma-ray spectra aiming at detecting gamma signatures is presented. In particular FWA is utilized to fit a set of known signatures to a measured spectrum by optimizing an objective function, with non-zero coefficients expressing the detected signatures. FWA is tested on amore » set of experimentally obtained measurements and various objective functions -MSE, RMSE, Theil-2, MAE, MAPE, MAP- with results exhibiting its potential in providing high accuracy and high precision of detected signatures. Furthermore, FWA is benchmarked against genetic algorithms, and multiple linear regression with results exhibiting its superiority over the rest tested algorithms with respect to precision for MAE, MAPE and MAP measures.« less

  2. Process Mining for Individualized Behavior Modeling Using Wireless Tracking in Nursing Homes

    PubMed Central

    Fernández-Llatas, Carlos; Benedi, José-Miguel; García-Gómez, Juan M.; Traver, Vicente

    2013-01-01

    The analysis of human behavior patterns is increasingly used for several research fields. The individualized modeling of behavior using classical techniques requires too much time and resources to be effective. A possible solution would be the use of pattern recognition techniques to automatically infer models to allow experts to understand individual behavior. However, traditional pattern recognition algorithms infer models that are not readily understood by human experts. This limits the capacity to benefit from the inferred models. Process mining technologies can infer models as workflows, specifically designed to be understood by experts, enabling them to detect specific behavior patterns in users. In this paper, the eMotiva process mining algorithms are presented. These algorithms filter, infer and visualize workflows. The workflows are inferred from the samples produced by an indoor location system that stores the location of a resident in a nursing home. The visualization tool is able to compare and highlight behavior patterns in order to facilitate expert understanding of human behavior. This tool was tested with nine real users that were monitored for a 25-week period. The results achieved suggest that the behavior of users is continuously evolving and changing and that this change can be measured, allowing for behavioral change detection. PMID:24225907

  3. Inferring the temperature dependence of population parameters: the effects of experimental design and inference algorithm

    PubMed Central

    Palamara, Gian Marco; Childs, Dylan Z; Clements, Christopher F; Petchey, Owen L; Plebani, Marco; Smith, Matthew J

    2014-01-01

    Understanding and quantifying the temperature dependence of population parameters, such as intrinsic growth rate and carrying capacity, is critical for predicting the ecological responses to environmental change. Many studies provide empirical estimates of such temperature dependencies, but a thorough investigation of the methods used to infer them has not been performed yet. We created artificial population time series using a stochastic logistic model parameterized with the Arrhenius equation, so that activation energy drives the temperature dependence of population parameters. We simulated different experimental designs and used different inference methods, varying the likelihood functions and other aspects of the parameter estimation methods. Finally, we applied the best performing inference methods to real data for the species Paramecium caudatum. The relative error of the estimates of activation energy varied between 5% and 30%. The fraction of habitat sampled played the most important role in determining the relative error; sampling at least 1% of the habitat kept it below 50%. We found that methods that simultaneously use all time series data (direct methods) and methods that estimate population parameters separately for each temperature (indirect methods) are complementary. Indirect methods provide a clearer insight into the shape of the functional form describing the temperature dependence of population parameters; direct methods enable a more accurate estimation of the parameters of such functional forms. Using both methods, we found that growth rate and carrying capacity of Paramecium caudatum scale with temperature according to different activation energies. Our study shows how careful choice of experimental design and inference methods can increase the accuracy of the inferred relationships between temperature and population parameters. The comparison of estimation methods provided here can increase the accuracy of model predictions, with important implications in understanding and predicting the effects of temperature on the dynamics of populations. PMID:25558365

  4. MultiNest: Efficient and Robust Bayesian Inference

    NASA Astrophysics Data System (ADS)

    Feroz, F.; Hobson, M. P.; Bridges, M.

    2011-09-01

    We present further development and the first public release of our multimodal nested sampling algorithm, called MultiNest. This Bayesian inference tool calculates the evidence, with an associated error estimate, and produces posterior samples from distributions that may contain multiple modes and pronounced (curving) degeneracies in high dimensions. The developments presented here lead to further substantial improvements in sampling efficiency and robustness, as compared to the original algorithm presented in Feroz & Hobson (2008), which itself significantly outperformed existing MCMC techniques in a wide range of astrophysical inference problems. The accuracy and economy of the MultiNest algorithm is demonstrated by application to two toy problems and to a cosmological inference problem focusing on the extension of the vanilla LambdaCDM model to include spatial curvature and a varying equation of state for dark energy. The MultiNest software is fully parallelized using MPI and includes an interface to CosmoMC. It will also be released as part of the SuperBayeS package, for the analysis of supersymmetric theories of particle physics, at this http URL.

  5. Bayesian inference of interaction properties of noisy dynamical systems with time-varying coupling: capabilities and limitations

    NASA Astrophysics Data System (ADS)

    Wilting, Jens; Lehnertz, Klaus

    2015-08-01

    We investigate a recently published analysis framework based on Bayesian inference for the time-resolved characterization of interaction properties of noisy, coupled dynamical systems. It promises wide applicability and a better time resolution than well-established methods. At the example of representative model systems, we show that the analysis framework has the same weaknesses as previous methods, particularly when investigating interacting, structurally different non-linear oscillators. We also inspect the tracking of time-varying interaction properties and propose a further modification of the algorithm, which improves the reliability of obtained results. We exemplarily investigate the suitability of this algorithm to infer strength and direction of interactions between various regions of the human brain during an epileptic seizure. Within the limitations of the applicability of this analysis tool, we show that the modified algorithm indeed allows a better time resolution through Bayesian inference when compared to previous methods based on least square fits.

  6. Recursive algorithms for phylogenetic tree counting.

    PubMed

    Gavryushkina, Alexandra; Welch, David; Drummond, Alexei J

    2013-10-28

    In Bayesian phylogenetic inference we are interested in distributions over a space of trees. The number of trees in a tree space is an important characteristic of the space and is useful for specifying prior distributions. When all samples come from the same time point and no prior information available on divergence times, the tree counting problem is easy. However, when fossil evidence is used in the inference to constrain the tree or data are sampled serially, new tree spaces arise and counting the number of trees is more difficult. We describe an algorithm that is polynomial in the number of sampled individuals for counting of resolutions of a constraint tree assuming that the number of constraints is fixed. We generalise this algorithm to counting resolutions of a fully ranked constraint tree. We describe a quadratic algorithm for counting the number of possible fully ranked trees on n sampled individuals. We introduce a new type of tree, called a fully ranked tree with sampled ancestors, and describe a cubic time algorithm for counting the number of such trees on n sampled individuals. These algorithms should be employed for Bayesian Markov chain Monte Carlo inference when fossil data are included or data are serially sampled.

  7. Estimating causal effects with a non-paranormal method for the design of efficient intervention experiments

    PubMed Central

    2014-01-01

    Background Knockdown or overexpression of genes is widely used to identify genes that play important roles in many aspects of cellular functions and phenotypes. Because next-generation sequencing generates high-throughput data that allow us to detect genes, it is important to identify genes that drive functional and phenotypic changes of cells. However, conventional methods rely heavily on the assumption of normality and they often give incorrect results when the assumption is not true. To relax the Gaussian assumption in causal inference, we introduce the non-paranormal method to test conditional independence in the PC-algorithm. Then, we present the non-paranormal intervention-calculus when the directed acyclic graph (DAG) is absent (NPN-IDA), which incorporates the cumulative nature of effects through a cascaded pathway via causal inference for ranking causal genes against a phenotype with the non-paranormal method for estimating DAGs. Results We demonstrate that causal inference with the non-paranormal method significantly improves the performance in estimating DAGs on synthetic data in comparison with the original PC-algorithm. Moreover, we show that NPN-IDA outperforms the conventional methods in exploring regulators of the flowering time in Arabidopsis thaliana and regulators that control the browning of white adipocytes in mice. Our results show that performance improvement in estimating DAGs contributes to an accurate estimation of causal effects. Conclusions Although the simplest alternative procedure was used, our proposed method enables us to design efficient intervention experiments and can be applied to a wide range of research purposes, including drug discovery, because of its generality. PMID:24980787

  8. Estimating causal effects with a non-paranormal method for the design of efficient intervention experiments.

    PubMed

    Teramoto, Reiji; Saito, Chiaki; Funahashi, Shin-ichi

    2014-06-30

    Knockdown or overexpression of genes is widely used to identify genes that play important roles in many aspects of cellular functions and phenotypes. Because next-generation sequencing generates high-throughput data that allow us to detect genes, it is important to identify genes that drive functional and phenotypic changes of cells. However, conventional methods rely heavily on the assumption of normality and they often give incorrect results when the assumption is not true. To relax the Gaussian assumption in causal inference, we introduce the non-paranormal method to test conditional independence in the PC-algorithm. Then, we present the non-paranormal intervention-calculus when the directed acyclic graph (DAG) is absent (NPN-IDA), which incorporates the cumulative nature of effects through a cascaded pathway via causal inference for ranking causal genes against a phenotype with the non-paranormal method for estimating DAGs. We demonstrate that causal inference with the non-paranormal method significantly improves the performance in estimating DAGs on synthetic data in comparison with the original PC-algorithm. Moreover, we show that NPN-IDA outperforms the conventional methods in exploring regulators of the flowering time in Arabidopsis thaliana and regulators that control the browning of white adipocytes in mice. Our results show that performance improvement in estimating DAGs contributes to an accurate estimation of causal effects. Although the simplest alternative procedure was used, our proposed method enables us to design efficient intervention experiments and can be applied to a wide range of research purposes, including drug discovery, because of its generality.

  9. ARACNe-based inference, using curated microarray data, of Arabidopsis thaliana root transcriptional regulatory networks

    PubMed Central

    2014-01-01

    Background Uncovering the complex transcriptional regulatory networks (TRNs) that underlie plant and animal development remains a challenge. However, a vast amount of data from public microarray experiments is available, which can be subject to inference algorithms in order to recover reliable TRN architectures. Results In this study we present a simple bioinformatics methodology that uses public, carefully curated microarray data and the mutual information algorithm ARACNe in order to obtain a database of transcriptional interactions. We used data from Arabidopsis thaliana root samples to show that the transcriptional regulatory networks derived from this database successfully recover previously identified root transcriptional modules and to propose new transcription factors for the SHORT ROOT/SCARECROW and PLETHORA pathways. We further show that these networks are a powerful tool to integrate and analyze high-throughput expression data, as exemplified by our analysis of a SHORT ROOT induction time-course microarray dataset, and are a reliable source for the prediction of novel root gene functions. In particular, we used our database to predict novel genes involved in root secondary cell-wall synthesis and identified the MADS-box TF XAL1/AGL12 as an unexpected participant in this process. Conclusions This study demonstrates that network inference using carefully curated microarray data yields reliable TRN architectures. In contrast to previous efforts to obtain root TRNs, that have focused on particular functional modules or tissues, our root transcriptional interactions provide an overview of the transcriptional pathways present in Arabidopsis thaliana roots and will likely yield a plethora of novel hypotheses to be tested experimentally. PMID:24739361

  10. Model-free inference of direct network interactions from nonlinear collective dynamics.

    PubMed

    Casadiego, Jose; Nitzan, Mor; Hallerberg, Sarah; Timme, Marc

    2017-12-19

    The topology of interactions in network dynamical systems fundamentally underlies their function. Accelerating technological progress creates massively available data about collective nonlinear dynamics in physical, biological, and technological systems. Detecting direct interaction patterns from those dynamics still constitutes a major open problem. In particular, current nonlinear dynamics approaches mostly require to know a priori a model of the (often high dimensional) system dynamics. Here we develop a model-independent framework for inferring direct interactions solely from recording the nonlinear collective dynamics generated. Introducing an explicit dependency matrix in combination with a block-orthogonal regression algorithm, the approach works reliably across many dynamical regimes, including transient dynamics toward steady states, periodic and non-periodic dynamics, and chaos. Together with its capabilities to reveal network (two point) as well as hypernetwork (e.g., three point) interactions, this framework may thus open up nonlinear dynamics options of inferring direct interaction patterns across systems where no model is known.

  11. Selection of optimum median-filter-based ambiguity removal algorithm parameters for NSCAT. [NASA scatterometer

    NASA Technical Reports Server (NTRS)

    Shaffer, Scott; Dunbar, R. Scott; Hsiao, S. Vincent; Long, David G.

    1989-01-01

    The NASA Scatterometer, NSCAT, is an active spaceborne radar designed to measure the normalized radar backscatter coefficient (sigma0) of the ocean surface. These measurements can, in turn, be used to infer the surface vector wind over the ocean using a geophysical model function. Several ambiguous wind vectors result because of the nature of the model function. A median-filter-based ambiguity removal algorithm will be used by the NSCAT ground data processor to select the best wind vector from the set of ambiguous wind vectors. This process is commonly known as dealiasing or ambiguity removal. The baseline NSCAT ambiguity removal algorithm and the method used to select the set of optimum parameter values are described. An extensive simulation of the NSCAT instrument and ground data processor provides a means of testing the resulting tuned algorithm. This simulation generates the ambiguous wind-field vectors expected from the instrument as it orbits over a set of realistic meoscale wind fields. The ambiguous wind field is then dealiased using the median-based ambiguity removal algorithm. Performance is measured by comparison of the unambiguous wind fields with the true wind fields. Results have shown that the median-filter-based ambiguity removal algorithm satisfies NSCAT mission requirements.

  12. Predictive minimum description length principle approach to inferring gene regulatory networks.

    PubMed

    Chaitankar, Vijender; Zhang, Chaoyang; Ghosh, Preetam; Gong, Ping; Perkins, Edward J; Deng, Youping

    2011-01-01

    Reverse engineering of gene regulatory networks using information theory models has received much attention due to its simplicity, low computational cost, and capability of inferring large networks. One of the major problems with information theory models is to determine the threshold that defines the regulatory relationships between genes. The minimum description length (MDL) principle has been implemented to overcome this problem. The description length of the MDL principle is the sum of model length and data encoding length. A user-specified fine tuning parameter is used as control mechanism between model and data encoding, but it is difficult to find the optimal parameter. In this work, we propose a new inference algorithm that incorporates mutual information (MI), conditional mutual information (CMI), and predictive minimum description length (PMDL) principle to infer gene regulatory networks from DNA microarray data. In this algorithm, the information theoretic quantities MI and CMI determine the regulatory relationships between genes and the PMDL principle method attempts to determine the best MI threshold without the need of a user-specified fine tuning parameter. The performance of the proposed algorithm is evaluated using both synthetic time series data sets and a biological time series data set (Saccharomyces cerevisiae). The results show that the proposed algorithm produced fewer false edges and significantly improved the precision when compared to existing MDL algorithm.

  13. Visual Inference Programming

    NASA Technical Reports Server (NTRS)

    Wheeler, Kevin; Timucin, Dogan; Rabbette, Maura; Curry, Charles; Allan, Mark; Lvov, Nikolay; Clanton, Sam; Pilewskie, Peter

    2002-01-01

    The goal of visual inference programming is to develop a software framework data analysis and to provide machine learning algorithms for inter-active data exploration and visualization. The topics include: 1) Intelligent Data Understanding (IDU) framework; 2) Challenge problems; 3) What's new here; 4) Framework features; 5) Wiring diagram; 6) Generated script; 7) Results of script; 8) Initial algorithms; 9) Independent Component Analysis for instrument diagnosis; 10) Output sensory mapping virtual joystick; 11) Output sensory mapping typing; 12) Closed-loop feedback mu-rhythm control; 13) Closed-loop training; 14) Data sources; and 15) Algorithms. This paper is in viewgraph form.

  14. Essential Autonomous Science Inference on Rovers (EASIR)

    NASA Technical Reports Server (NTRS)

    Roush, Ted L.; Shipman, Mark; Morris, Robert; Gazis, Paul; Pedersen, Liam

    2003-01-01

    Existing constraints on time, computational, and communication resources associated with Mars rover missions suggest on-board science evaluation of sensor data can contribute to decreasing human-directed operational planning, optimizing returned science data volumes, and recognition of unique or novel data. All of which act to increase the scientific return from a mission. Many different levels of science autonomy exist and each impacts the data collected and returned by, and activities of, rovers. Several computational algorithms, designed to recognize objects of interest to geologists and biologists, are discussed. The algorithms represent various functions that producing scientific opinions and several scenarios illustrate how the opinions can be used.

  15. Fully Decentralized Semi-supervised Learning via Privacy-preserving Matrix Completion.

    PubMed

    Fierimonte, Roberto; Scardapane, Simone; Uncini, Aurelio; Panella, Massimo

    2016-08-26

    Distributed learning refers to the problem of inferring a function when the training data are distributed among different nodes. While significant work has been done in the contexts of supervised and unsupervised learning, the intermediate case of Semi-supervised learning in the distributed setting has received less attention. In this paper, we propose an algorithm for this class of problems, by extending the framework of manifold regularization. The main component of the proposed algorithm consists of a fully distributed computation of the adjacency matrix of the training patterns. To this end, we propose a novel algorithm for low-rank distributed matrix completion, based on the framework of diffusion adaptation. Overall, the distributed Semi-supervised algorithm is efficient and scalable, and it can preserve privacy by the inclusion of flexible privacy-preserving mechanisms for similarity computation. The experimental results and comparison on a wide range of standard Semi-supervised benchmarks validate our proposal.

  16. Gauging Variational Inference

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chertkov, Michael; Ahn, Sungsoo; Shin, Jinwoo

    Computing partition function is the most important statistical inference task arising in applications of Graphical Models (GM). Since it is computationally intractable, approximate methods have been used to resolve the issue in practice, where meanfield (MF) and belief propagation (BP) are arguably the most popular and successful approaches of a variational type. In this paper, we propose two new variational schemes, coined Gauged-MF (G-MF) and Gauged-BP (G-BP), improving MF and BP, respectively. Both provide lower bounds for the partition function by utilizing the so-called gauge transformation which modifies factors of GM while keeping the partition function invariant. Moreover, we provemore » that both G-MF and G-BP are exact for GMs with a single loop of a special structure, even though the bare MF and BP perform badly in this case. Our extensive experiments, on complete GMs of relatively small size and on large GM (up-to 300 variables) confirm that the newly proposed algorithms outperform and generalize MF and BP.« less

  17. COSMOABC: Likelihood-free inference via Population Monte Carlo Approximate Bayesian Computation

    NASA Astrophysics Data System (ADS)

    Ishida, E. E. O.; Vitenti, S. D. P.; Penna-Lima, M.; Cisewski, J.; de Souza, R. S.; Trindade, A. M. M.; Cameron, E.; Busti, V. C.; COIN Collaboration

    2015-11-01

    Approximate Bayesian Computation (ABC) enables parameter inference for complex physical systems in cases where the true likelihood function is unknown, unavailable, or computationally too expensive. It relies on the forward simulation of mock data and comparison between observed and synthetic catalogues. Here we present COSMOABC, a Python ABC sampler featuring a Population Monte Carlo variation of the original ABC algorithm, which uses an adaptive importance sampling scheme. The code is very flexible and can be easily coupled to an external simulator, while allowing to incorporate arbitrary distance and prior functions. As an example of practical application, we coupled COSMOABC with the NUMCOSMO library and demonstrate how it can be used to estimate posterior probability distributions over cosmological parameters based on measurements of galaxy clusters number counts without computing the likelihood function. COSMOABC is published under the GPLv3 license on PyPI and GitHub and documentation is available at http://goo.gl/SmB8EX.

  18. Prediction system of hydroponic plant growth and development using algorithm Fuzzy Mamdani method

    NASA Astrophysics Data System (ADS)

    Sudana, I. Made; Purnawirawan, Okta; Arief, Ulfa Mediaty

    2017-03-01

    Hydroponics is a method of farming without soil. One of the Hydroponic plants is Watercress (Nasturtium Officinale). The development and growth process of hydroponic Watercress was influenced by levels of nutrients, acidity and temperature. The independent variables can be used as input variable system to predict the value level of plants growth and development. The prediction system is using Fuzzy Algorithm Mamdani method. This system was built to implement the function of Fuzzy Inference System (Fuzzy Inference System/FIS) as a part of the Fuzzy Logic Toolbox (FLT) by using MATLAB R2007b. FIS is a computing system that works on the principle of fuzzy reasoning which is similar to humans' reasoning. Basically FIS consists of four units which are fuzzification unit, fuzzy logic reasoning unit, base knowledge unit and defuzzification unit. In addition to know the effect of independent variables on the plants growth and development that can be visualized with the function diagram of FIS output surface that is shaped three-dimensional, and statistical tests based on the data from the prediction system using multiple linear regression method, which includes multiple linear regression analysis, T test, F test, the coefficient of determination and donations predictor that are calculated using SPSS (Statistical Product and Service Solutions) software applications.

  19. Stan: Statistical inference

    NASA Astrophysics Data System (ADS)

    Stan Development Team

    2018-01-01

    Stan facilitates statistical inference at the frontiers of applied statistics and provides both a modeling language for specifying complex statistical models and a library of statistical algorithms for computing inferences with those models. These components are exposed through interfaces in environments such as R, Python, and the command line.

  20. Statistics, Computation, and Modeling in Cosmology

    NASA Astrophysics Data System (ADS)

    Jewell, Jeff; Guiness, Joe; SAMSI 2016 Working Group in Cosmology

    2017-01-01

    Current and future ground and space based missions are designed to not only detect, but map out with increasing precision, details of the universe in its infancy to the present-day. As a result we are faced with the challenge of analyzing and interpreting observations from a wide variety of instruments to form a coherent view of the universe. Finding solutions to a broad range of challenging inference problems in cosmology is one of the goals of the “Statistics, Computation, and Modeling in Cosmology” workings groups, formed as part of the year long program on ‘Statistical, Mathematical, and Computational Methods for Astronomy’, hosted by the Statistical and Applied Mathematical Sciences Institute (SAMSI), a National Science Foundation funded institute. Two application areas have emerged for focused development in the cosmology working group involving advanced algorithmic implementations of exact Bayesian inference for the Cosmic Microwave Background, and statistical modeling of galaxy formation. The former includes study and development of advanced Markov Chain Monte Carlo algorithms designed to confront challenging inference problems including inference for spatial Gaussian random fields in the presence of sources of galactic emission (an example of a source separation problem). Extending these methods to future redshift survey data probing the nonlinear regime of large scale structure formation is also included in the working group activities. In addition, the working group is also focused on the study of ‘Galacticus’, a galaxy formation model applied to dark matter-only cosmological N-body simulations operating on time-dependent halo merger trees. The working group is interested in calibrating the Galacticus model to match statistics of galaxy survey observations; specifically stellar mass functions, luminosity functions, and color-color diagrams. The group will use subsampling approaches and fractional factorial designs to statistically and computationally efficiently explore the Galacticus parameter space. The group will also use the Galacticus simulations to study the relationship between the topological and physical structure of the halo merger trees and the properties of the resulting galaxies.

  1. A statistical approach for inferring the 3D structure of the genome.

    PubMed

    Varoquaux, Nelle; Ay, Ferhat; Noble, William Stafford; Vert, Jean-Philippe

    2014-06-15

    Recent technological advances allow the measurement, in a single Hi-C experiment, of the frequencies of physical contacts among pairs of genomic loci at a genome-wide scale. The next challenge is to infer, from the resulting DNA-DNA contact maps, accurate 3D models of how chromosomes fold and fit into the nucleus. Many existing inference methods rely on multidimensional scaling (MDS), in which the pairwise distances of the inferred model are optimized to resemble pairwise distances derived directly from the contact counts. These approaches, however, often optimize a heuristic objective function and require strong assumptions about the biophysics of DNA to transform interaction frequencies to spatial distance, and thereby may lead to incorrect structure reconstruction. We propose a novel approach to infer a consensus 3D structure of a genome from Hi-C data. The method incorporates a statistical model of the contact counts, assuming that the counts between two loci follow a Poisson distribution whose intensity decreases with the physical distances between the loci. The method can automatically adjust the transfer function relating the spatial distance to the Poisson intensity and infer a genome structure that best explains the observed data. We compare two variants of our Poisson method, with or without optimization of the transfer function, to four different MDS-based algorithms-two metric MDS methods using different stress functions, a non-metric version of MDS and ChromSDE, a recently described, advanced MDS method-on a wide range of simulated datasets. We demonstrate that the Poisson models reconstruct better structures than all MDS-based methods, particularly at low coverage and high resolution, and we highlight the importance of optimizing the transfer function. On publicly available Hi-C data from mouse embryonic stem cells, we show that the Poisson methods lead to more reproducible structures than MDS-based methods when we use data generated using different restriction enzymes, and when we reconstruct structures at different resolutions. A Python implementation of the proposed method is available at http://cbio.ensmp.fr/pastis. © The Author 2014. Published by Oxford University Press.

  2. Hierarchical Probabilistic Inference of Cosmic Shear

    NASA Astrophysics Data System (ADS)

    Schneider, Michael D.; Hogg, David W.; Marshall, Philip J.; Dawson, William A.; Meyers, Joshua; Bard, Deborah J.; Lang, Dustin

    2015-07-01

    Point estimators for the shearing of galaxy images induced by gravitational lensing involve a complex inverse problem in the presence of noise, pixelization, and model uncertainties. We present a probabilistic forward modeling approach to gravitational lensing inference that has the potential to mitigate the biased inferences in most common point estimators and is practical for upcoming lensing surveys. The first part of our statistical framework requires specification of a likelihood function for the pixel data in an imaging survey given parameterized models for the galaxies in the images. We derive the lensing shear posterior by marginalizing over all intrinsic galaxy properties that contribute to the pixel data (i.e., not limited to galaxy ellipticities) and learn the distributions for the intrinsic galaxy properties via hierarchical inference with a suitably flexible conditional probabilitiy distribution specification. We use importance sampling to separate the modeling of small imaging areas from the global shear inference, thereby rendering our algorithm computationally tractable for large surveys. With simple numerical examples we demonstrate the improvements in accuracy from our importance sampling approach, as well as the significance of the conditional distribution specification for the intrinsic galaxy properties when the data are generated from an unknown number of distinct galaxy populations with different morphological characteristics.

  3. Semantic Health Knowledge Graph: Semantic Integration of Heterogeneous Medical Knowledge and Services.

    PubMed

    Shi, Longxiang; Li, Shijian; Yang, Xiaoran; Qi, Jiaheng; Pan, Gang; Zhou, Binbin

    2017-01-01

    With the explosion of healthcare information, there has been a tremendous amount of heterogeneous textual medical knowledge (TMK), which plays an essential role in healthcare information systems. Existing works for integrating and utilizing the TMK mainly focus on straightforward connections establishment and pay less attention to make computers interpret and retrieve knowledge correctly and quickly. In this paper, we explore a novel model to organize and integrate the TMK into conceptual graphs. We then employ a framework to automatically retrieve knowledge in knowledge graphs with a high precision. In order to perform reasonable inference on knowledge graphs, we propose a contextual inference pruning algorithm to achieve efficient chain inference. Our algorithm achieves a better inference result with precision and recall of 92% and 96%, respectively, which can avoid most of the meaningless inferences. In addition, we implement two prototypes and provide services, and the results show our approach is practical and effective.

  4. Semantic Health Knowledge Graph: Semantic Integration of Heterogeneous Medical Knowledge and Services

    PubMed Central

    Yang, Xiaoran; Qi, Jiaheng; Pan, Gang; Zhou, Binbin

    2017-01-01

    With the explosion of healthcare information, there has been a tremendous amount of heterogeneous textual medical knowledge (TMK), which plays an essential role in healthcare information systems. Existing works for integrating and utilizing the TMK mainly focus on straightforward connections establishment and pay less attention to make computers interpret and retrieve knowledge correctly and quickly. In this paper, we explore a novel model to organize and integrate the TMK into conceptual graphs. We then employ a framework to automatically retrieve knowledge in knowledge graphs with a high precision. In order to perform reasonable inference on knowledge graphs, we propose a contextual inference pruning algorithm to achieve efficient chain inference. Our algorithm achieves a better inference result with precision and recall of 92% and 96%, respectively, which can avoid most of the meaningless inferences. In addition, we implement two prototypes and provide services, and the results show our approach is practical and effective. PMID:28299322

  5. An inference engine for embedded diagnostic systems

    NASA Technical Reports Server (NTRS)

    Fox, Barry R.; Brewster, Larry T.

    1987-01-01

    The implementation of an inference engine for embedded diagnostic systems is described. The system consists of two distinct parts. The first is an off-line compiler which accepts a propositional logical statement of the relationship between facts and conclusions and produces data structures required by the on-line inference engine. The second part consists of the inference engine and interface routines which accept assertions of fact and return the conclusions which necessarily follow. Given a set of assertions, it will generate exactly the conclusions which logically follow. At the same time, it will detect any inconsistencies which may propagate from an inconsistent set of assertions or a poorly formulated set of rules. The memory requirements are fixed and the worst case execution times are bounded at compile time. The data structures and inference algorithms are very simple and well understood. The data structures and algorithms are described in detail. The system has been implemented on Lisp, Pascal, and Modula-2.

  6. Inferring microbial interaction networks from metagenomic data using SgLV-EKF algorithm.

    PubMed

    Alshawaqfeh, Mustafa; Serpedin, Erchin; Younes, Ahmad Bani

    2017-03-27

    Inferring the microbial interaction networks (MINs) and modeling their dynamics are critical in understanding the mechanisms of the bacterial ecosystem and designing antibiotic and/or probiotic therapies. Recently, several approaches were proposed to infer MINs using the generalized Lotka-Volterra (gLV) model. Main drawbacks of these models include the fact that these models only consider the measurement noise without taking into consideration the uncertainties in the underlying dynamics. Furthermore, inferring the MIN is characterized by the limited number of observations and nonlinearity in the regulatory mechanisms. Therefore, novel estimation techniques are needed to address these challenges. This work proposes SgLV-EKF: a stochastic gLV model that adopts the extended Kalman filter (EKF) algorithm to model the MIN dynamics. In particular, SgLV-EKF employs a stochastic modeling of the MIN by adding a noise term to the dynamical model to compensate for modeling uncertainties. This stochastic modeling is more realistic than the conventional gLV model which assumes that the MIN dynamics are perfectly governed by the gLV equations. After specifying the stochastic model structure, we propose the EKF to estimate the MIN. SgLV-EKF was compared with two similarity-based algorithms, one algorithm from the integral-based family and two regression-based algorithms, in terms of the achieved performance on two synthetic data-sets and two real data-sets. The first data-set models the randomness in measurement data, whereas, the second data-set incorporates uncertainties in the underlying dynamics. The real data-sets are provided by a recent study pertaining to an antibiotic-mediated Clostridium difficile infection. The experimental results demonstrate that SgLV-EKF outperforms the alternative methods in terms of robustness to measurement noise, modeling errors, and tracking the dynamics of the MIN. Performance analysis demonstrates that the proposed SgLV-EKF algorithm represents a powerful and reliable tool to infer MINs and track their dynamics.

  7. Boolean network inference from time series data incorporating prior biological knowledge.

    PubMed

    Haider, Saad; Pal, Ranadip

    2012-01-01

    Numerous approaches exist for modeling of genetic regulatory networks (GRNs) but the low sampling rates often employed in biological studies prevents the inference of detailed models from experimental data. In this paper, we analyze the issues involved in estimating a model of a GRN from single cell line time series data with limited time points. We present an inference approach for a Boolean Network (BN) model of a GRN from limited transcriptomic or proteomic time series data based on prior biological knowledge of connectivity, constraints on attractor structure and robust design. We applied our inference approach to 6 time point transcriptomic data on Human Mammary Epithelial Cell line (HMEC) after application of Epidermal Growth Factor (EGF) and generated a BN with a plausible biological structure satisfying the data. We further defined and applied a similarity measure to compare synthetic BNs and BNs generated through the proposed approach constructed from transitions of various paths of the synthetic BNs. We have also compared the performance of our algorithm with two existing BN inference algorithms. Through theoretical analysis and simulations, we showed the rarity of arriving at a BN from limited time series data with plausible biological structure using random connectivity and absence of structure in data. The framework when applied to experimental data and data generated from synthetic BNs were able to estimate BNs with high similarity scores. Comparison with existing BN inference algorithms showed the better performance of our proposed algorithm for limited time series data. The proposed framework can also be applied to optimize the connectivity of a GRN from experimental data when the prior biological knowledge on regulators is limited or not unique.

  8. On an adaptive preconditioned Crank-Nicolson MCMC algorithm for infinite dimensional Bayesian inference

    NASA Astrophysics Data System (ADS)

    Hu, Zixi; Yao, Zhewei; Li, Jinglai

    2017-03-01

    Many scientific and engineering problems require to perform Bayesian inference for unknowns of infinite dimension. In such problems, many standard Markov Chain Monte Carlo (MCMC) algorithms become arbitrary slow under the mesh refinement, which is referred to as being dimension dependent. To this end, a family of dimensional independent MCMC algorithms, known as the preconditioned Crank-Nicolson (pCN) methods, were proposed to sample the infinite dimensional parameters. In this work we develop an adaptive version of the pCN algorithm, where the covariance operator of the proposal distribution is adjusted based on sampling history to improve the simulation efficiency. We show that the proposed algorithm satisfies an important ergodicity condition under some mild assumptions. Finally we provide numerical examples to demonstrate the performance of the proposed method.

  9. MapReduce Algorithms for Inferring Gene Regulatory Networks from Time-Series Microarray Data Using an Information-Theoretic Approach.

    PubMed

    Abduallah, Yasser; Turki, Turki; Byron, Kevin; Du, Zongxuan; Cervantes-Cervantes, Miguel; Wang, Jason T L

    2017-01-01

    Gene regulation is a series of processes that control gene expression and its extent. The connections among genes and their regulatory molecules, usually transcription factors, and a descriptive model of such connections are known as gene regulatory networks (GRNs). Elucidating GRNs is crucial to understand the inner workings of the cell and the complexity of gene interactions. To date, numerous algorithms have been developed to infer gene regulatory networks. However, as the number of identified genes increases and the complexity of their interactions is uncovered, networks and their regulatory mechanisms become cumbersome to test. Furthermore, prodding through experimental results requires an enormous amount of computation, resulting in slow data processing. Therefore, new approaches are needed to expeditiously analyze copious amounts of experimental data resulting from cellular GRNs. To meet this need, cloud computing is promising as reported in the literature. Here, we propose new MapReduce algorithms for inferring gene regulatory networks on a Hadoop cluster in a cloud environment. These algorithms employ an information-theoretic approach to infer GRNs using time-series microarray data. Experimental results show that our MapReduce program is much faster than an existing tool while achieving slightly better prediction accuracy than the existing tool.

  10. A flexible ontology for inference of emergent whole cell function from relationships between subcellular processes.

    PubMed

    Hansen, Jens; Meretzky, David; Woldesenbet, Simeneh; Stolovitzky, Gustavo; Iyengar, Ravi

    2017-12-18

    Whole cell responses arise from coordinated interactions between diverse human gene products functioning within various pathways underlying sub-cellular processes (SCP). Lower level SCPs interact to form higher level SCPs, often in a context specific manner to give rise to whole cell function. We sought to determine if capturing such relationships enables us to describe the emergence of whole cell functions from interacting SCPs. We developed the Molecular Biology of the Cell Ontology based on standard cell biology and biochemistry textbooks and review articles. Currently, our ontology contains 5,384 genes, 753 SCPs and 19,180 expertly curated gene-SCP associations. Our algorithm to populate the SCPs with genes enables extension of the ontology on demand and the adaption of the ontology to the continuously growing cell biological knowledge. Since whole cell responses most often arise from the coordinated activity of multiple SCPs, we developed a dynamic enrichment algorithm that flexibly predicts SCP-SCP relationships beyond the current taxonomy. This algorithm enables us to identify interactions between SCPs as a basis for higher order function in a context dependent manner, allowing us to provide a detailed description of how SCPs together can give rise to whole cell functions. We conclude that this ontology can, from omics data sets, enable the development of detailed SCP networks for predictive modeling of emergent whole cell functions.

  11. Inference of gene regulatory networks from genome-wide knockout fitness data

    PubMed Central

    Wang, Liming; Wang, Xiaodong; Arkin, Adam P.; Samoilov, Michael S.

    2013-01-01

    Motivation: Genome-wide fitness is an emerging type of high-throughput biological data generated for individual organisms by creating libraries of knockouts, subjecting them to broad ranges of environmental conditions, and measuring the resulting clone-specific fitnesses. Since fitness is an organism-scale measure of gene regulatory network behaviour, it may offer certain advantages when insights into such phenotypical and functional features are of primary interest over individual gene expression. Previous works have shown that genome-wide fitness data can be used to uncover novel gene regulatory interactions, when compared with results of more conventional gene expression analysis. Yet, to date, few algorithms have been proposed for systematically using genome-wide mutant fitness data for gene regulatory network inference. Results: In this article, we describe a model and propose an inference algorithm for using fitness data from knockout libraries to identify underlying gene regulatory networks. Unlike most prior methods, the presented approach captures not only structural, but also dynamical and non-linear nature of biomolecular systems involved. A state–space model with non-linear basis is used for dynamically describing gene regulatory networks. Network structure is then elucidated by estimating unknown model parameters. Unscented Kalman filter is used to cope with the non-linearities introduced in the model, which also enables the algorithm to run in on-line mode for practical use. Here, we demonstrate that the algorithm provides satisfying results for both synthetic data as well as empirical measurements of GAL network in yeast Saccharomyces cerevisiae and TyrR–LiuR network in bacteria Shewanella oneidensis. Availability: MATLAB code and datasets are available to download at http://www.duke.edu/∼lw174/Fitness.zip and http://genomics.lbl.gov/supplemental/fitness-bioinf/ Contact: wangx@ee.columbia.edu or mssamoilov@lbl.gov Supplementary information: Supplementary data are available at Bioinformatics online PMID:23271269

  12. Efficient fuzzy Bayesian inference algorithms for incorporating expert knowledge in parameter estimation

    NASA Astrophysics Data System (ADS)

    Rajabi, Mohammad Mahdi; Ataie-Ashtiani, Behzad

    2016-05-01

    Bayesian inference has traditionally been conceived as the proper framework for the formal incorporation of expert knowledge in parameter estimation of groundwater models. However, conventional Bayesian inference is incapable of taking into account the imprecision essentially embedded in expert provided information. In order to solve this problem, a number of extensions to conventional Bayesian inference have been introduced in recent years. One of these extensions is 'fuzzy Bayesian inference' which is the result of integrating fuzzy techniques into Bayesian statistics. Fuzzy Bayesian inference has a number of desirable features which makes it an attractive approach for incorporating expert knowledge in the parameter estimation process of groundwater models: (1) it is well adapted to the nature of expert provided information, (2) it allows to distinguishably model both uncertainty and imprecision, and (3) it presents a framework for fusing expert provided information regarding the various inputs of the Bayesian inference algorithm. However an important obstacle in employing fuzzy Bayesian inference in groundwater numerical modeling applications is the computational burden, as the required number of numerical model simulations often becomes extremely exhaustive and often computationally infeasible. In this paper, a novel approach of accelerating the fuzzy Bayesian inference algorithm is proposed which is based on using approximate posterior distributions derived from surrogate modeling, as a screening tool in the computations. The proposed approach is first applied to a synthetic test case of seawater intrusion (SWI) in a coastal aquifer. It is shown that for this synthetic test case, the proposed approach decreases the number of required numerical simulations by an order of magnitude. Then the proposed approach is applied to a real-world test case involving three-dimensional numerical modeling of SWI in Kish Island, located in the Persian Gulf. An expert elicitation methodology is developed and applied to the real-world test case in order to provide a road map for the use of fuzzy Bayesian inference in groundwater modeling applications.

  13. Convex Clustering: An Attractive Alternative to Hierarchical Clustering

    PubMed Central

    Chen, Gary K.; Chi, Eric C.; Ranola, John Michael O.; Lange, Kenneth

    2015-01-01

    The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/ PMID:25965340

  14. Convex clustering: an attractive alternative to hierarchical clustering.

    PubMed

    Chen, Gary K; Chi, Eric C; Ranola, John Michael O; Lange, Kenneth

    2015-05-01

    The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/.

  15. Decision tree modeling using R.

    PubMed

    Zhang, Zhongheng

    2016-08-01

    In machine learning field, decision tree learner is powerful and easy to interpret. It employs recursive binary partitioning algorithm that splits the sample in partitioning variable with the strongest association with the response variable. The process continues until some stopping criteria are met. In the example I focus on conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. While growing a single tree is subject to small changes in the training data, random forests procedure is introduced to address this problem. The sources of diversity for random forests come from the random sampling and restricted set of input variables to be selected. Finally, I introduce R functions to perform model based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building.

  16. Inverse Ising problem in continuous time: A latent variable approach

    NASA Astrophysics Data System (ADS)

    Donner, Christian; Opper, Manfred

    2017-12-01

    We consider the inverse Ising problem: the inference of network couplings from observed spin trajectories for a model with continuous time Glauber dynamics. By introducing two sets of auxiliary latent random variables we render the likelihood into a form which allows for simple iterative inference algorithms with analytical updates. The variables are (1) Poisson variables to linearize an exponential term which is typical for point process likelihoods and (2) Pólya-Gamma variables, which make the likelihood quadratic in the coupling parameters. Using the augmented likelihood, we derive an expectation-maximization (EM) algorithm to obtain the maximum likelihood estimate of network parameters. Using a third set of latent variables we extend the EM algorithm to sparse couplings via L1 regularization. Finally, we develop an efficient approximate Bayesian inference algorithm using a variational approach. We demonstrate the performance of our algorithms on data simulated from an Ising model. For data which are simulated from a more biologically plausible network with spiking neurons, we show that the Ising model captures well the low order statistics of the data and how the Ising couplings are related to the underlying synaptic structure of the simulated network.

  17. Probabilistic Algorithmic Knowledge

    DTIC Science & Technology

    2005-12-20

    standard possible-worlds sense. Although soundness is not required in the basic definition, it does seem to be useful in many applications. Our interest...think of as describing basic facts about the system, such as “the door is closed” or “agent A sent the message m to B”, more complicated formulas are...messages as long as the adversary knows the decryption key. (The function submsg basically implements the inference rules for ⊢DY .) A DY i (hasi(m

  18. Characterization of Polar Stratospheric Clouds With Spaceborne Lidar: CALIPSO and the 2006 Antarctic Season

    NASA Technical Reports Server (NTRS)

    Pitts, Michael C.; Thomason, L. W.; Poole, Lamont R.; Winker, David M.

    2007-01-01

    The role of polar stratospheric clouds in polar ozone loss has been well documented. The CALIPSO satellite mission offers a new opportunity to characterize PSCs on spatial and temporal scales previously unavailable. A PSC detection algorithm based on a single wavelength threshold approach has been developed for CALIPSO. The method appears to accurately detect PSCs of all opacities, including tenuous clouds, with a very low rate of false positives and few missed clouds. We applied the algorithm to CALIPSO data acquired during the 2006 Antarctic winter season from 13 June through 31 October. The spatial and temporal distribution of CALIPSO PSC observations is illustrated with weekly maps of PSC occurrence. The evolution of the 2006 PSC season is depicted by time series of daily PSC frequency as a function of altitude. Comparisons with virtual solar occultation data indicate that CALIPSO provides a different view of the PSC season than attained with previous solar occultation satellites. Measurement-based time series of PSC areal coverage and vertically-integrated PSC volume are computed from the CALIPSO data. The observed area covered with PSCs is significantly smaller than would be inferred from a temperature-based proxy such as TNAT but is similar in magnitude to that inferred from TSTS. The potential of CALIPSO measurements for investigating PSC microphysics is illustrated using combinations of lidar backscatter coefficient and volume depolarization to infer composition for two CALIPSO PSC scenes.

  19. CGBayesNets: Conditional Gaussian Bayesian Network Learning and Inference with Mixed Discrete and Continuous Data

    PubMed Central

    Weiss, Scott T.

    2014-01-01

    Bayesian Networks (BN) have been a popular predictive modeling formalism in bioinformatics, but their application in modern genomics has been slowed by an inability to cleanly handle domains with mixed discrete and continuous variables. Existing free BN software packages either discretize continuous variables, which can lead to information loss, or do not include inference routines, which makes prediction with the BN impossible. We present CGBayesNets, a BN package focused around prediction of a clinical phenotype from mixed discrete and continuous variables, which fills these gaps. CGBayesNets implements Bayesian likelihood and inference algorithms for the conditional Gaussian Bayesian network (CGBNs) formalism, one appropriate for predicting an outcome of interest from, e.g., multimodal genomic data. We provide four different network learning algorithms, each making a different tradeoff between computational cost and network likelihood. CGBayesNets provides a full suite of functions for model exploration and verification, including cross validation, bootstrapping, and AUC manipulation. We highlight several results obtained previously with CGBayesNets, including predictive models of wood properties from tree genomics, leukemia subtype classification from mixed genomic data, and robust prediction of intensive care unit mortality outcomes from metabolomic profiles. We also provide detailed example analysis on public metabolomic and gene expression datasets. CGBayesNets is implemented in MATLAB and available as MATLAB source code, under an Open Source license and anonymous download at http://www.cgbayesnets.com. PMID:24922310

  20. CGBayesNets: conditional Gaussian Bayesian network learning and inference with mixed discrete and continuous data.

    PubMed

    McGeachie, Michael J; Chang, Hsun-Hsien; Weiss, Scott T

    2014-06-01

    Bayesian Networks (BN) have been a popular predictive modeling formalism in bioinformatics, but their application in modern genomics has been slowed by an inability to cleanly handle domains with mixed discrete and continuous variables. Existing free BN software packages either discretize continuous variables, which can lead to information loss, or do not include inference routines, which makes prediction with the BN impossible. We present CGBayesNets, a BN package focused around prediction of a clinical phenotype from mixed discrete and continuous variables, which fills these gaps. CGBayesNets implements Bayesian likelihood and inference algorithms for the conditional Gaussian Bayesian network (CGBNs) formalism, one appropriate for predicting an outcome of interest from, e.g., multimodal genomic data. We provide four different network learning algorithms, each making a different tradeoff between computational cost and network likelihood. CGBayesNets provides a full suite of functions for model exploration and verification, including cross validation, bootstrapping, and AUC manipulation. We highlight several results obtained previously with CGBayesNets, including predictive models of wood properties from tree genomics, leukemia subtype classification from mixed genomic data, and robust prediction of intensive care unit mortality outcomes from metabolomic profiles. We also provide detailed example analysis on public metabolomic and gene expression datasets. CGBayesNets is implemented in MATLAB and available as MATLAB source code, under an Open Source license and anonymous download at http://www.cgbayesnets.com.

  1. Inference of scale-free networks from gene expression time series.

    PubMed

    Daisuke, Tominaga; Horton, Paul

    2006-04-01

    Quantitative time-series observation of gene expression is becoming possible, for example by cell array technology. However, there are no practical methods with which to infer network structures using only observed time-series data. As most computational models of biological networks for continuous time-series data have a high degree of freedom, it is almost impossible to infer the correct structures. On the other hand, it has been reported that some kinds of biological networks, such as gene networks and metabolic pathways, may have scale-free properties. We hypothesize that the architecture of inferred biological network models can be restricted to scale-free networks. We developed an inference algorithm for biological networks using only time-series data by introducing such a restriction. We adopt the S-system as the network model, and a distributed genetic algorithm to optimize models to fit its simulated results to observed time series data. We have tested our algorithm on a case study (simulated data). We compared optimization under no restriction, which allows for a fully connected network, and under the restriction that the total number of links must equal that expected from a scale free network. The restriction reduced both false positive and false negative estimation of the links and also the differences between model simulation and the given time-series data.

  2. Inference of multi-Gaussian property fields by probabilistic inversion of crosshole ground penetrating radar data using an improved dimensionality reduction

    NASA Astrophysics Data System (ADS)

    Hunziker, Jürg; Laloy, Eric; Linde, Niklas

    2016-04-01

    Deterministic inversion procedures can often explain field data, but they only deliver one final subsurface model that depends on the initial model and regularization constraints. This leads to poor insights about the uncertainties associated with the inferred model properties. In contrast, probabilistic inversions can provide an ensemble of model realizations that accurately span the range of possible models that honor the available calibration data and prior information allowing a quantitative description of model uncertainties. We reconsider the problem of inferring the dielectric permittivity (directly related to radar velocity) structure of the subsurface by inversion of first-arrival travel times from crosshole ground penetrating radar (GPR) measurements. We rely on the DREAM_(ZS) algorithm that is a state-of-the-art Markov chain Monte Carlo (MCMC) algorithm. Such algorithms need several orders of magnitude more forward simulations than deterministic algorithms and often become infeasible in high parameter dimensions. To enable high-resolution imaging with MCMC, we use a recently proposed dimensionality reduction approach that allows reproducing 2D multi-Gaussian fields with far fewer parameters than a classical grid discretization. We consider herein a dimensionality reduction from 5000 to 257 unknowns. The first 250 parameters correspond to a spectral representation of random and uncorrelated spatial fluctuations while the remaining seven geostatistical parameters are (1) the standard deviation of the data error, (2) the mean and (3) the variance of the relative electric permittivity, (4) the integral scale along the major axis of anisotropy, (5) the anisotropy angle, (6) the ratio of the integral scale along the minor axis of anisotropy to the integral scale along the major axis of anisotropy and (7) the shape parameter of the Matérn function. The latter essentially defines the type of covariance function (e.g., exponential, Whittle, Gaussian). We present an improved formulation of the dimensionality reduction, and numerically show how it reduces artifacts in the generated models and provides better posterior estimation of the subsurface geostatistical structure. We next show that the results of the method compare very favorably against previous deterministic and stochastic inversion results obtained at the South Oyster Bacterial Transport Site in Virginia, USA. The long-term goal of this work is to enable MCMC-based full waveform inversion of crosshole GPR data.

  3. Mapping the landscape of metabolic goals of a cell

    DOE PAGES

    Zhao, Qi; Stettner, Arion I.; Reznik, Ed; ...

    2016-05-23

    Here, genome-scale flux balance models of metabolism provide testable predictions of all metabolic rates in an organism, by assuming that the cell is optimizing a metabolic goal known as the objective function. We introduce an efficient inverse flux balance analysis (invFBA) approach, based on linear programming duality, to characterize the space of possible objective functions compatible with measured fluxes. After testing our algorithm on simulated E. coli data and time-dependent S. oneidensis fluxes inferred from gene expression, we apply our inverse approach to flux measurements in long-term evolved E. coli strains, revealing objective functions that provide insight into metabolic adaptationmore » trajectories.« less

  4. ARNetMiT R Package: association rules based gene co-expression networks of miRNA targets.

    PubMed

    Özgür Cingiz, M; Biricik, G; Diri, B

    2017-03-31

    miRNAs are key regulators that bind to target genes to suppress their gene expression level. The relations between miRNA-target genes enable users to derive co-expressed genes that may be involved in similar biological processes and functions in cells. We hypothesize that target genes of miRNAs are co-expressed, when they are regulated by multiple miRNAs. With the usage of these co-expressed genes, we can theoretically construct co-expression networks (GCNs) related to 152 diseases. In this study, we introduce ARNetMiT that utilize a hash based association rule algorithm in a novel way to infer the GCNs on miRNA-target genes data. We also present R package of ARNetMiT, which infers and visualizes GCNs of diseases that are selected by users. Our approach assumes miRNAs as transactions and target genes as their items. Support and confidence values are used to prune association rules on miRNA-target genes data to construct support based GCNs (sGCNs) along with support and confidence based GCNs (scGCNs). We use overlap analysis and the topological features for the performance analysis of GCNs. We also infer GCNs with popular GNI algorithms for comparison with the GCNs of ARNetMiT. Overlap analysis results show that ARNetMiT outperforms the compared GNI algorithms. We see that using high confidence values in scGCNs increase the ratio of the overlapped gene-gene interactions between the compared methods. According to the evaluation of the topological features of ARNetMiT based GCNs, the degrees of nodes have power-law distribution. The hub genes discovered by ARNetMiT based GCNs are consistent with the literature.

  5. The NIFTy way of Bayesian signal inference

    NASA Astrophysics Data System (ADS)

    Selig, Marco

    2014-12-01

    We introduce NIFTy, "Numerical Information Field Theory", a software package for the development of Bayesian signal inference algorithms that operate independently from any underlying spatial grid and its resolution. A large number of Bayesian and Maximum Entropy methods for 1D signal reconstruction, 2D imaging, as well as 3D tomography, appear formally similar, but one often finds individualized implementations that are neither flexible nor easily transferable. Signal inference in the framework of NIFTy can be done in an abstract way, such that algorithms, prototyped in 1D, can be applied to real world problems in higher-dimensional settings. NIFTy as a versatile library is applicable and already has been applied in 1D, 2D, 3D and spherical settings. A recent application is the D3PO algorithm targeting the non-trivial task of denoising, deconvolving, and decomposing photon observations in high energy astronomy.

  6. Using MOEA with Redistribution and Consensus Branches to Infer Phylogenies.

    PubMed

    Min, Xiaoping; Zhang, Mouzhao; Yuan, Sisi; Ge, Shengxiang; Liu, Xiangrong; Zeng, Xiangxiang; Xia, Ningshao

    2017-12-26

    In recent years, to infer phylogenies, which are NP-hard problems, more and more research has focused on using metaheuristics. Maximum Parsimony and Maximum Likelihood are two effective ways to conduct inference. Based on these methods, which can also be considered as the optimal criteria for phylogenies, various kinds of multi-objective metaheuristics have been used to reconstruct phylogenies. However, combining these two time-consuming methods results in those multi-objective metaheuristics being slower than a single objective. Therefore, we propose a novel, multi-objective optimization algorithm, MOEA-RC, to accelerate the processes of rebuilding phylogenies using structural information of elites in current populations. We compare MOEA-RC with two representative multi-objective algorithms, MOEA/D and NAGA-II, and a non-consensus version of MOEA-RC on three real-world datasets. The result is, within a given number of iterations, MOEA-RC achieves better solutions than the other algorithms.

  7. Unsupervised learning of natural languages

    PubMed Central

    Solan, Zach; Horn, David; Ruppin, Eytan; Edelman, Shimon

    2005-01-01

    We address the problem, fundamental to linguistics, bioinformatics, and certain other disciplines, of using corpora of raw symbolic sequential data to infer underlying rules that govern their production. Given a corpus of strings (such as text, transcribed speech, chromosome or protein sequence data, sheet music, etc.), our unsupervised algorithm recursively distills from it hierarchically structured patterns. The adios (automatic distillation of structure) algorithm relies on a statistical method for pattern extraction and on structured generalization, two processes that have been implicated in language acquisition. It has been evaluated on artificial context-free grammars with thousands of rules, on natural languages as diverse as English and Chinese, and on protein data correlating sequence with function. This unsupervised algorithm is capable of learning complex syntax, generating grammatical novel sentences, and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics. PMID:16087885

  8. Unsupervised learning of natural languages.

    PubMed

    Solan, Zach; Horn, David; Ruppin, Eytan; Edelman, Shimon

    2005-08-16

    We address the problem, fundamental to linguistics, bioinformatics, and certain other disciplines, of using corpora of raw symbolic sequential data to infer underlying rules that govern their production. Given a corpus of strings (such as text, transcribed speech, chromosome or protein sequence data, sheet music, etc.), our unsupervised algorithm recursively distills from it hierarchically structured patterns. The adios (automatic distillation of structure) algorithm relies on a statistical method for pattern extraction and on structured generalization, two processes that have been implicated in language acquisition. It has been evaluated on artificial context-free grammars with thousands of rules, on natural languages as diverse as English and Chinese, and on protein data correlating sequence with function. This unsupervised algorithm is capable of learning complex syntax, generating grammatical novel sentences, and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics.

  9. Algorithms for database-dependent search of MS/MS data.

    PubMed

    Matthiesen, Rune

    2013-01-01

    The frequent used bottom-up strategy for identification of proteins and their associated modifications generate nowadays typically thousands of MS/MS spectra that normally are matched automatically against a protein sequence database. Search engines that take as input MS/MS spectra and a protein sequence database are referred as database-dependent search engines. Many programs both commercial and freely available exist for database-dependent search of MS/MS spectra and most of the programs have excellent user documentation. The aim here is therefore to outline the algorithm strategy behind different search engines rather than providing software user manuals. The process of database-dependent search can be divided into search strategy, peptide scoring, protein scoring, and finally protein inference. Most efforts in the literature have been put in to comparing results from different software rather than discussing the underlining algorithms. Such practical comparisons can be cluttered by suboptimal implementation and the observed differences are frequently caused by software parameters settings which have not been set proper to allow even comparison. In other words an algorithmic idea can still be worth considering even if the software implementation has been demonstrated to be suboptimal. The aim in this chapter is therefore to split the algorithms for database-dependent searching of MS/MS data into the above steps so that the different algorithmic ideas become more transparent and comparable. Most search engines provide good implementations of the first three data analysis steps mentioned above, whereas the final step of protein inference are much less developed for most search engines and is in many cases performed by an external software. The final part of this chapter illustrates how protein inference is built into the VEMS search engine and discusses a stand-alone program SIR for protein inference that can import a Mascot search result.

  10. Hybrid optimization and Bayesian inference techniques for a non-smooth radiation detection problem

    DOE PAGES

    Stefanescu, Razvan; Schmidt, Kathleen; Hite, Jason; ...

    2016-12-12

    In this paper, we propose several algorithms to recover the location and intensity of a radiation source located in a simulated 250 × 180 m block of an urban center based on synthetic measurements. Radioactive decay and detection are Poisson random processes, so we employ likelihood functions based on this distribution. Owing to the domain geometry and the proposed response model, the negative logarithm of the likelihood is only piecewise continuous differentiable, and it has multiple local minima. To address these difficulties, we investigate three hybrid algorithms composed of mixed optimization techniques. For global optimization, we consider simulated annealing, particlemore » swarm, and genetic algorithm, which rely solely on objective function evaluations; that is, they do not evaluate the gradient in the objective function. By employing early stopping criteria for the global optimization methods, a pseudo-optimum point is obtained. This is subsequently utilized as the initial value by the deterministic implicit filtering method, which is able to find local extrema in non-smooth functions, to finish the search in a narrow domain. These new hybrid techniques, combining global optimization and implicit filtering address, difficulties associated with the non-smooth response, and their performances, are shown to significantly decrease the computational time over the global optimization methods. To quantify uncertainties associated with the source location and intensity, we employ the delayed rejection adaptive Metropolis and DiffeRential Evolution Adaptive Metropolis algorithms. Finally, marginal densities of the source properties are obtained, and the means of the chains compare accurately with the estimates produced by the hybrid algorithms.« less

  11. Cytoprophet: a Cytoscape plug-in for protein and domain interaction networks inference.

    PubMed

    Morcos, Faruck; Lamanna, Charles; Sikora, Marcin; Izaguirre, Jesús

    2008-10-01

    Cytoprophet is a software tool that allows prediction and visualization of protein and domain interaction networks. It is implemented as a plug-in of Cytoscape, an open source software framework for analysis and visualization of molecular networks. Cytoprophet implements three algorithms that predict new potential physical interactions using the domain composition of proteins and experimental assays. The algorithms for protein and domain interaction inference include maximum likelihood estimation (MLE) using expectation maximization (EM); the set cover approach maximum specificity set cover (MSSC) and the sum-product algorithm (SPA). After accepting an input set of proteins with Uniprot ID/Accession numbers and a selected prediction algorithm, Cytoprophet draws a network of potential interactions with probability scores and GO distances as edge attributes. A network of domain interactions between the domains of the initial protein list can also be generated. Cytoprophet was designed to take advantage of the visual capabilities of Cytoscape and be simple to use. An example of inference in a signaling network of myxobacterium Myxococcus xanthus is presented and available at Cytoprophet's website. http://cytoprophet.cse.nd.edu.

  12. Computational statistics using the Bayesian Inference Engine

    NASA Astrophysics Data System (ADS)

    Weinberg, Martin D.

    2013-09-01

    This paper introduces the Bayesian Inference Engine (BIE), a general parallel, optimized software package for parameter inference and model selection. This package is motivated by the analysis needs of modern astronomical surveys and the need to organize and reuse expensive derived data. The BIE is the first platform for computational statistics designed explicitly to enable Bayesian update and model comparison for astronomical problems. Bayesian update is based on the representation of high-dimensional posterior distributions using metric-ball-tree based kernel density estimation. Among its algorithmic offerings, the BIE emphasizes hybrid tempered Markov chain Monte Carlo schemes that robustly sample multimodal posterior distributions in high-dimensional parameter spaces. Moreover, the BIE implements a full persistence or serialization system that stores the full byte-level image of the running inference and previously characterized posterior distributions for later use. Two new algorithms to compute the marginal likelihood from the posterior distribution, developed for and implemented in the BIE, enable model comparison for complex models and data sets. Finally, the BIE was designed to be a collaborative platform for applying Bayesian methodology to astronomy. It includes an extensible object-oriented and easily extended framework that implements every aspect of the Bayesian inference. By providing a variety of statistical algorithms for all phases of the inference problem, a scientist may explore a variety of approaches with a single model and data implementation. Additional technical details and download details are available from http://www.astro.umass.edu/bie. The BIE is distributed under the GNU General Public License.

  13. An inference method from multi-layered structure of biomedical data.

    PubMed

    Kim, Myungjun; Nam, Yonghyun; Shin, Hyunjung

    2017-05-18

    Biological system is a multi-layered structure of omics with genome, epigenome, transcriptome, metabolome, proteome, etc., and can be further stretched to clinical/medical layers such as diseasome, drugs, and symptoms. One advantage of omics is that we can figure out an unknown component or its trait by inferring from known omics components. The component can be inferred by the ones in the same level of omics or the ones in different levels. To implement the inference process, an algorithm that can be applied to the multi-layered complex system is required. In this study, we develop a semi-supervised learning algorithm that can be applied to the multi-layered complex system. In order to verify the validity of the inference, it was applied to the prediction problem of disease co-occurrence with a two-layered network composed of symptom-layer and disease-layer. The symptom-disease layered network obtained a fairly high value of AUC, 0.74, which is regarded as noticeable improvement when comparing 0.59 AUC of single-layered disease network. If further stretched to whole layered structure of omics, the proposed method is expected to produce more promising results. This research has novelty in that it is a new integrative algorithm that incorporates the vertical structure of omics data, on contrary to other existing methods that integrate the data in parallel fashion. The results can provide enhanced guideline for disease co-occurrence prediction, thereby serve as a valuable tool for inference process of multi-layered biological system.

  14. Statistical Inference and Reverse Engineering of Gene Regulatory Networks from Observational Expression Data

    PubMed Central

    Emmert-Streib, Frank; Glazko, Galina V.; Altay, Gökmen; de Matos Simoes, Ricardo

    2012-01-01

    In this paper, we present a systematic and conceptual overview of methods for inferring gene regulatory networks from observational gene expression data. Further, we discuss two classic approaches to infer causal structures and compare them with contemporary methods by providing a conceptual categorization thereof. We complement the above by surveying global and local evaluation measures for assessing the performance of inference algorithms. PMID:22408642

  15. GENIUS: web server to predict local gene networks and key genes for biological functions.

    PubMed

    Puelma, Tomas; Araus, Viviana; Canales, Javier; Vidal, Elena A; Cabello, Juan M; Soto, Alvaro; Gutiérrez, Rodrigo A

    2017-03-01

    GENIUS is a user-friendly web server that uses a novel machine learning algorithm to infer functional gene networks focused on specific genes and experimental conditions that are relevant to biological functions of interest. These functions may have different levels of complexity, from specific biological processes to complex traits that involve several interacting processes. GENIUS also enriches the network with new genes related to the biological function of interest, with accuracies comparable to highly discriminative Support Vector Machine methods. GENIUS currently supports eight model organisms and is freely available for public use at http://networks.bio.puc.cl/genius . genius.psbl@gmail.com. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  16. Control Algorithms For Liquid-Cooled Garments

    NASA Technical Reports Server (NTRS)

    Drew, B.; Harner, K.; Hodgson, E.; Homa, J.; Jennings, D.; Yanosy, J.

    1988-01-01

    Three algorithms developed for control of cooling in protective garments. Metabolic rate inferred from temperatures of cooling liquid outlet and inlet, suitably filtered to account for thermal lag of human body. Temperature at inlet adjusted to value giving maximum comfort at inferred metabolic rate. Applicable to space suits, used for automatic control of cooling in suits worn by workers in radioactive, polluted, or otherwise hazardous environments. More effective than manual control, subject to frequent, overcompensated adjustments as level of activity varies.

  17. Quantitative assessment of protein activity in orphan tissues and single cells using the metaVIPER algorithm. | Office of Cancer Genomics

    Cancer.gov

    We and others have shown that transition and maintenance of biological states is controlled by master regulator proteins, which can be inferred by interrogating tissue-specific regulatory models (interactomes) with transcriptional signatures, using the VIPER algorithm. Yet, some tissues may lack molecular profiles necessary for interactome inference (orphan tissues), or, as for single cells isolated from heterogeneous samples, their tissue context may be undetermined.

  18. Markov chain Monte Carlo techniques applied to parton distribution functions determination: Proof of concept

    NASA Astrophysics Data System (ADS)

    Gbedo, Yémalin Gabin; Mangin-Brinet, Mariane

    2017-07-01

    We present a new procedure to determine parton distribution functions (PDFs), based on Markov chain Monte Carlo (MCMC) methods. The aim of this paper is to show that we can replace the standard χ2 minimization by procedures grounded on statistical methods, and on Bayesian inference in particular, thus offering additional insight into the rich field of PDFs determination. After a basic introduction to these techniques, we introduce the algorithm we have chosen to implement—namely Hybrid (or Hamiltonian) Monte Carlo. This algorithm, initially developed for Lattice QCD, turns out to be very interesting when applied to PDFs determination by global analyses; we show that it allows us to circumvent the difficulties due to the high dimensionality of the problem, in particular concerning the acceptance. A first feasibility study is performed and presented, which indicates that Markov chain Monte Carlo can successfully be applied to the extraction of PDFs and of their uncertainties.

  19. Semisupervised learning using Bayesian interpretation: application to LS-SVM.

    PubMed

    Adankon, Mathias M; Cheriet, Mohamed; Biem, Alain

    2011-04-01

    Bayesian reasoning provides an ideal basis for representing and manipulating uncertain knowledge, with the result that many interesting algorithms in machine learning are based on Bayesian inference. In this paper, we use the Bayesian approach with one and two levels of inference to model the semisupervised learning problem and give its application to the successful kernel classifier support vector machine (SVM) and its variant least-squares SVM (LS-SVM). Taking advantage of Bayesian interpretation of LS-SVM, we develop a semisupervised learning algorithm for Bayesian LS-SVM using our approach based on two levels of inference. Experimental results on both artificial and real pattern recognition problems show the utility of our method.

  20. Performance Optimization Control of ECH using Fuzzy Inference Application

    NASA Astrophysics Data System (ADS)

    Dubey, Abhay Kumar

    Electro-chemical honing (ECH) is a hybrid electrolytic precision micro-finishing technology that, by combining physico-chemical actions of electro-chemical machining and conventional honing processes, provides the controlled functional surfaces-generation and fast material removal capabilities in a single operation. Process multi-performance optimization has become vital for utilizing full potential of manufacturing processes to meet the challenging requirements being placed on the surface quality, size, tolerances and production rate of engineering components in this globally competitive scenario. This paper presents an strategy that integrates the Taguchi matrix experimental design, analysis of variances and fuzzy inference system (FIS) to formulate a robust practical multi-performance optimization methodology for complex manufacturing processes like ECH, which involve several control variables. Two methodologies one using a genetic algorithm tuning of FIS (GA-tuned FIS) and another using an adaptive network based fuzzy inference system (ANFIS) have been evaluated for a multi-performance optimization case study of ECH. The actual experimental results confirm their potential for a wide range of machining conditions employed in ECH.

  1. Quantifying Biomass from Point Clouds by Connecting Representations of Ecosystem Structure

    NASA Astrophysics Data System (ADS)

    Hendryx, S. M.; Barron-Gafford, G.

    2017-12-01

    Quantifying terrestrial ecosystem biomass is an essential part of monitoring carbon stocks and fluxes within the global carbon cycle and optimizing natural resource management. Point cloud data such as from lidar and structure from motion can be effective for quantifying biomass over large areas, but significant challenges remain in developing effective models that allow for such predictions. Inference models that estimate biomass from point clouds are established in many environments, yet, are often scale-dependent, needing to be fitted and applied at the same spatial scale and grid size at which they were developed. Furthermore, training such models typically requires large in situ datasets that are often prohibitively costly or time-consuming to obtain. We present here a scale- and sensor-invariant framework for efficiently estimating biomass from point clouds. Central to this framework, we present a new algorithm, assignPointsToExistingClusters, that has been developed for finding matches between in situ data and clusters in remotely-sensed point clouds. The algorithm can be used for assessing canopy segmentation accuracy and for training and validating machine learning models for predicting biophysical variables. We demonstrate the algorithm's efficacy by using it to train a random forest model of above ground biomass in a shrubland environment in Southern Arizona. We show that by learning a nonlinear function to estimate biomass from segmented canopy features we can reduce error, especially in the presence of inaccurate clusterings, when compared to a traditional, deterministic technique to estimate biomass from remotely measured canopies. Our random forest on cluster features model extends established methods of training random forest regressions to predict biomass of subplots but requires significantly less training data and is scale invariant. The random forest on cluster features model reduced mean absolute error, when evaluated on all test data in leave one out cross validation, by 40.6% from deterministic mesquite allometry and 35.9% from the inferred ecosystem-state allometric function. Our framework should allow for the inference of biomass more efficiently than common subplot methods and more accurately than individual tree segmentation methods in densely vegetated environments.

  2. Identification of family-specific residue packing motifs and their use for structure-based protein function prediction: I. Method development.

    PubMed

    Bandyopadhyay, Deepak; Huan, Jun; Prins, Jan; Snoeyink, Jack; Wang, Wei; Tropsha, Alexander

    2009-11-01

    Protein function prediction is one of the central problems in computational biology. We present a novel automated protein structure-based function prediction method using libraries of local residue packing patterns that are common to most proteins in a known functional family. Critical to this approach is the representation of a protein structure as a graph where residue vertices (residue name used as a vertex label) are connected by geometrical proximity edges. The approach employs two steps. First, it uses a fast subgraph mining algorithm to find all occurrences of family-specific labeled subgraphs for all well characterized protein structural and functional families. Second, it queries a new structure for occurrences of a set of motifs characteristic of a known family, using a graph index to speed up Ullman's subgraph isomorphism algorithm. The confidence of function inference from structure depends on the number of family-specific motifs found in the query structure compared with their distribution in a large non-redundant database of proteins. This method can assign a new structure to a specific functional family in cases where sequence alignments, sequence patterns, structural superposition and active site templates fail to provide accurate annotation.

  3. Real-time prediction and gating of respiratory motion in 3D space using extended Kalman filters and Gaussian process regression network

    NASA Astrophysics Data System (ADS)

    Bukhari, W.; Hong, S.-M.

    2016-03-01

    The prediction as well as the gating of respiratory motion have received much attention over the last two decades for reducing the targeting error of the radiation treatment beam due to respiratory motion. In this article, we present a real-time algorithm for predicting respiratory motion in 3D space and realizing a gating function without pre-specifying a particular phase of the patient’s breathing cycle. The algorithm, named EKF-GPRN+ , first employs an extended Kalman filter (EKF) independently along each coordinate to predict the respiratory motion and then uses a Gaussian process regression network (GPRN) to correct the prediction error of the EKF in 3D space. The GPRN is a nonparametric Bayesian algorithm for modeling input-dependent correlations between the output variables in multi-output regression. Inference in GPRN is intractable and we employ variational inference with mean field approximation to compute an approximate predictive mean and predictive covariance matrix. The approximate predictive mean is used to correct the prediction error of the EKF. The trace of the approximate predictive covariance matrix is utilized to capture the uncertainty in EKF-GPRN+ prediction error and systematically identify breathing points with a higher probability of large prediction error in advance. This identification enables us to pause the treatment beam over such instances. EKF-GPRN+ implements a gating function by using simple calculations based on the trace of the predictive covariance matrix. Extensive numerical experiments are performed based on a large database of 304 respiratory motion traces to evaluate EKF-GPRN+ . The experimental results show that the EKF-GPRN+ algorithm reduces the patient-wise prediction error to 38%, 40% and 40% in root-mean-square, compared to no prediction, at lookahead lengths of 192 ms, 384 ms and 576 ms, respectively. The EKF-GPRN+ algorithm can further reduce the prediction error by employing the gating function, albeit at the cost of reduced duty cycle. The error reduction allows the clinical target volume to planning target volume (CTV-PTV) margin to be reduced, leading to decreased normal-tissue toxicity and possible dose escalation. The CTV-PTV margin is also evaluated to quantify clinical benefits of EKF-GPRN+ prediction.

  4. The probabilistic convolution tree: efficient exact Bayesian inference for faster LC-MS/MS protein inference.

    PubMed

    Serang, Oliver

    2014-01-01

    Exact Bayesian inference can sometimes be performed efficiently for special cases where a function has commutative and associative symmetry of its inputs (called "causal independence"). For this reason, it is desirable to exploit such symmetry on big data sets. Here we present a method to exploit a general form of this symmetry on probabilistic adder nodes by transforming those probabilistic adder nodes into a probabilistic convolution tree with which dynamic programming computes exact probabilities. A substantial speedup is demonstrated using an illustration example that can arise when identifying splice forms with bottom-up mass spectrometry-based proteomics. On this example, even state-of-the-art exact inference algorithms require a runtime more than exponential in the number of splice forms considered. By using the probabilistic convolution tree, we reduce the runtime to O(k log(k)2) and the space to O(k log(k)) where k is the number of variables joined by an additive or cardinal operator. This approach, which can also be used with junction tree inference, is applicable to graphs with arbitrary dependency on counting variables or cardinalities and can be used on diverse problems and fields like forward error correcting codes, elemental decomposition, and spectral demixing. The approach also trivially generalizes to multiple dimensions.

  5. The Probabilistic Convolution Tree: Efficient Exact Bayesian Inference for Faster LC-MS/MS Protein Inference

    PubMed Central

    Serang, Oliver

    2014-01-01

    Exact Bayesian inference can sometimes be performed efficiently for special cases where a function has commutative and associative symmetry of its inputs (called “causal independence”). For this reason, it is desirable to exploit such symmetry on big data sets. Here we present a method to exploit a general form of this symmetry on probabilistic adder nodes by transforming those probabilistic adder nodes into a probabilistic convolution tree with which dynamic programming computes exact probabilities. A substantial speedup is demonstrated using an illustration example that can arise when identifying splice forms with bottom-up mass spectrometry-based proteomics. On this example, even state-of-the-art exact inference algorithms require a runtime more than exponential in the number of splice forms considered. By using the probabilistic convolution tree, we reduce the runtime to and the space to where is the number of variables joined by an additive or cardinal operator. This approach, which can also be used with junction tree inference, is applicable to graphs with arbitrary dependency on counting variables or cardinalities and can be used on diverse problems and fields like forward error correcting codes, elemental decomposition, and spectral demixing. The approach also trivially generalizes to multiple dimensions. PMID:24626234

  6. HIERARCHICAL PROBABILISTIC INFERENCE OF COSMIC SHEAR

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schneider, Michael D.; Dawson, William A.; Hogg, David W.

    2015-07-01

    Point estimators for the shearing of galaxy images induced by gravitational lensing involve a complex inverse problem in the presence of noise, pixelization, and model uncertainties. We present a probabilistic forward modeling approach to gravitational lensing inference that has the potential to mitigate the biased inferences in most common point estimators and is practical for upcoming lensing surveys. The first part of our statistical framework requires specification of a likelihood function for the pixel data in an imaging survey given parameterized models for the galaxies in the images. We derive the lensing shear posterior by marginalizing over all intrinsic galaxymore » properties that contribute to the pixel data (i.e., not limited to galaxy ellipticities) and learn the distributions for the intrinsic galaxy properties via hierarchical inference with a suitably flexible conditional probabilitiy distribution specification. We use importance sampling to separate the modeling of small imaging areas from the global shear inference, thereby rendering our algorithm computationally tractable for large surveys. With simple numerical examples we demonstrate the improvements in accuracy from our importance sampling approach, as well as the significance of the conditional distribution specification for the intrinsic galaxy properties when the data are generated from an unknown number of distinct galaxy populations with different morphological characteristics.« less

  7. Inference of the sparse kinetic Ising model using the decimation method

    NASA Astrophysics Data System (ADS)

    Decelle, Aurélien; Zhang, Pan

    2015-05-01

    In this paper we study the inference of the kinetic Ising model on sparse graphs by the decimation method. The decimation method, which was first proposed in Decelle and Ricci-Tersenghi [Phys. Rev. Lett. 112, 070603 (2014), 10.1103/PhysRevLett.112.070603] for the static inverse Ising problem, tries to recover the topology of the inferred system by setting the weakest couplings to zero iteratively. During the decimation process the likelihood function is maximized over the remaining couplings. Unlike the ℓ1-optimization-based methods, the decimation method does not use the Laplace distribution as a heuristic choice of prior to select a sparse solution. In our case, the whole process can be done auto-matically without fixing any parameters by hand. We show that in the dynamical inference problem, where the task is to reconstruct the couplings of an Ising model given the data, the decimation process can be applied naturally into a maximum-likelihood optimization algorithm, as opposed to the static case where pseudolikelihood method needs to be adopted. We also use extensive numerical studies to validate the accuracy of our methods in dynamical inference problems. Our results illustrate that, on various topologies and with different distribution of couplings, the decimation method outperforms the widely used ℓ1-optimization-based methods.

  8. On the optimization of electromagnetic geophysical data: Application of the PSO algorithm

    NASA Astrophysics Data System (ADS)

    Godio, A.; Santilano, A.

    2018-01-01

    Particle Swarm optimization (PSO) algorithm resolves constrained multi-parameter problems and is suitable for simultaneous optimization of linear and nonlinear problems, with the assumption that forward modeling is based on good understanding of ill-posed problem for geophysical inversion. We apply PSO for solving the geophysical inverse problem to infer an Earth model, i.e. the electrical resistivity at depth, consistent with the observed geophysical data. The method doesn't require an initial model and can be easily constrained, according to external information for each single sounding. The optimization process to estimate the model parameters from the electromagnetic soundings focuses on the discussion of the objective function to be minimized. We discuss the possibility to introduce in the objective function vertical and lateral constraints, with an Occam-like regularization. A sensitivity analysis allowed us to check the performance of the algorithm. The reliability of the approach is tested on synthetic, real Audio-Magnetotelluric (AMT) and Long Period MT data. The method appears able to solve complex problems and allows us to estimate the a posteriori distribution of the model parameters.

  9. PANDA: Protein function prediction using domain architecture and affinity propagation.

    PubMed

    Wang, Zheng; Zhao, Chenguang; Wang, Yiheng; Sun, Zheng; Wang, Nan

    2018-02-22

    We developed PANDA (Propagation of Affinity and Domain Architecture) to predict protein functions in the format of Gene Ontology (GO) terms. PANDA at first executes profile-profile alignment algorithm to search against PfamA, KOG, COG, and SwissProt databases, and then launches PSI-BLAST against UniProt for homologue search. PANDA integrates a domain architecture inference algorithm based on the Bayesian statistics that calculates the probability of having a GO term. All the candidate GO terms are pooled and filtered based on Z-score. After that, the remaining GO terms are clustered using an affinity propagation algorithm based on the GO directed acyclic graph, followed by a second round of filtering on the clusters of GO terms. We benchmarked the performance of all the baseline predictors PANDA integrates and also for every pooling and filtering step of PANDA. It can be found that PANDA achieves better performances in terms of area under the curve for precision and recall compared to the baseline predictors. PANDA can be accessed from http://dna.cs.miami.edu/PANDA/ .

  10. Marginally specified priors for non-parametric Bayesian estimation

    PubMed Central

    Kessler, David C.; Hoff, Peter D.; Dunson, David B.

    2014-01-01

    Summary Prior specification for non-parametric Bayesian inference involves the difficult task of quantifying prior knowledge about a parameter of high, often infinite, dimension. A statistician is unlikely to have informed opinions about all aspects of such a parameter but will have real information about functionals of the parameter, such as the population mean or variance. The paper proposes a new framework for non-parametric Bayes inference in which the prior distribution for a possibly infinite dimensional parameter is decomposed into two parts: an informative prior on a finite set of functionals, and a non-parametric conditional prior for the parameter given the functionals. Such priors can be easily constructed from standard non-parametric prior distributions in common use and inherit the large support of the standard priors on which they are based. Additionally, posterior approximations under these informative priors can generally be made via minor adjustments to existing Markov chain approximation algorithms for standard non-parametric prior distributions. We illustrate the use of such priors in the context of multivariate density estimation using Dirichlet process mixture models, and in the modelling of high dimensional sparse contingency tables. PMID:25663813

  11. Assessment of Gamma-Ray-Spectra Analysis Method Utilizing the Fireworks Algorithm for Various Error Measures

    DOE PAGES

    Alamaniotis, Miltiadis; Tsoukalas, Lefteri H.

    2018-01-01

    The analysis of measured data plays a significant role in enhancing nuclear nonproliferation mainly by inferring the presence of patterns associated with special nuclear materials. Among various types of measurements, gamma-ray spectra is the widest utilized type of data in nonproliferation applications. In this paper, a method that employs the fireworks algorithm (FWA) for analyzing gamma-ray spectra aiming at detecting gamma signatures is presented. In particular, FWA is utilized to fit a set of known signatures to a measured spectrum by optimizing an objective function, where non-zero coefficients express the detected signatures. FWA is tested on a set of experimentallymore » obtained measurements optimizing various objective functions—MSE, RMSE, Theil-2, MAE, MAPE, MAP—with results exhibiting its potential in providing highly accurate and precise signature detection. Finally and furthermore, FWA is benchmarked against genetic algorithms and multiple linear regression, showing its superiority over those algorithms regarding precision with respect to MAE, MAPE, and MAP measures.« less

  12. Semi-Supervised Multi-View Learning for Gene Network Reconstruction

    PubMed Central

    Ceci, Michelangelo; Pio, Gianvito; Kuzmanovski, Vladimir; Džeroski, Sašo

    2015-01-01

    The task of gene regulatory network reconstruction from high-throughput data is receiving increasing attention in recent years. As a consequence, many inference methods for solving this task have been proposed in the literature. It has been recently observed, however, that no single inference method performs optimally across all datasets. It has also been shown that the integration of predictions from multiple inference methods is more robust and shows high performance across diverse datasets. Inspired by this research, in this paper, we propose a machine learning solution which learns to combine predictions from multiple inference methods. While this approach adds additional complexity to the inference process, we expect it would also carry substantial benefits. These would come from the automatic adaptation to patterns on the outputs of individual inference methods, so that it is possible to identify regulatory interactions more reliably when these patterns occur. This article demonstrates the benefits (in terms of accuracy of the reconstructed networks) of the proposed method, which exploits an iterative, semi-supervised ensemble-based algorithm. The algorithm learns to combine the interactions predicted by many different inference methods in the multi-view learning setting. The empirical evaluation of the proposed algorithm on a prokaryotic model organism (E. coli) and on a eukaryotic model organism (S. cerevisiae) clearly shows improved performance over the state of the art methods. The results indicate that gene regulatory network reconstruction for the real datasets is more difficult for S. cerevisiae than for E. coli. The software, all the datasets used in the experiments and all the results are available for download at the following link: http://figshare.com/articles/Semi_supervised_Multi_View_Learning_for_Gene_Network_Reconstruction/1604827. PMID:26641091

  13. Entropy-Based Search Algorithm for Experimental Design

    NASA Astrophysics Data System (ADS)

    Malakar, N. K.; Knuth, K. H.

    2011-03-01

    The scientific method relies on the iterated processes of inference and inquiry. The inference phase consists of selecting the most probable models based on the available data; whereas the inquiry phase consists of using what is known about the models to select the most relevant experiment. Optimizing inquiry involves searching the parameterized space of experiments to select the experiment that promises, on average, to be maximally informative. In the case where it is important to learn about each of the model parameters, the relevance of an experiment is quantified by Shannon entropy of the distribution of experimental outcomes predicted by a probable set of models. If the set of potential experiments is described by many parameters, we must search this high-dimensional entropy space. Brute force search methods will be slow and computationally expensive. We present an entropy-based search algorithm, called nested entropy sampling, to select the most informative experiment for efficient experimental design. This algorithm is inspired by Skilling's nested sampling algorithm used in inference and borrows the concept of a rising threshold while a set of experiment samples are maintained. We demonstrate that this algorithm not only selects highly relevant experiments, but also is more efficient than brute force search. Such entropic search techniques promise to greatly benefit autonomous experimental design.

  14. MULTINEST: an efficient and robust Bayesian inference tool for cosmology and particle physics

    NASA Astrophysics Data System (ADS)

    Feroz, F.; Hobson, M. P.; Bridges, M.

    2009-10-01

    We present further development and the first public release of our multimodal nested sampling algorithm, called MULTINEST. This Bayesian inference tool calculates the evidence, with an associated error estimate, and produces posterior samples from distributions that may contain multiple modes and pronounced (curving) degeneracies in high dimensions. The developments presented here lead to further substantial improvements in sampling efficiency and robustness, as compared to the original algorithm presented in Feroz & Hobson, which itself significantly outperformed existing Markov chain Monte Carlo techniques in a wide range of astrophysical inference problems. The accuracy and economy of the MULTINEST algorithm are demonstrated by application to two toy problems and to a cosmological inference problem focusing on the extension of the vanilla Λ cold dark matter model to include spatial curvature and a varying equation of state for dark energy. The MULTINEST software, which is fully parallelized using MPI and includes an interface to COSMOMC, is available at http://www.mrao.cam.ac.uk/software/multinest/. It will also be released as part of the SUPERBAYES package, for the analysis of supersymmetric theories of particle physics, at http://www.superbayes.org.

  15. Modeling gene regulatory networks: A network simplification algorithm

    NASA Astrophysics Data System (ADS)

    Ferreira, Luiz Henrique O.; de Castro, Maria Clicia S.; da Silva, Fabricio A. B.

    2016-12-01

    Boolean networks have been used for some time to model Gene Regulatory Networks (GRNs), which describe cell functions. Those models can help biologists to make predictions, prognosis and even specialized treatment when some disturb on the GRN lead to a sick condition. However, the amount of information related to a GRN can be huge, making the task of inferring its boolean network representation quite a challenge. The method shown here takes into account information about the interactome to build a network, where each node represents a protein, and uses the entropy of each node as a key to reduce the size of the network, allowing the further inferring process to focus only on the main protein hubs, the ones with most potential to interfere in overall network behavior.

  16. Knowledge requirements for automated inference of medical textbook markup.

    PubMed Central

    Berrios, D. C.; Kehler, A.; Fagan, L. M.

    1999-01-01

    Indexing medical text in journals or textbooks requires a tremendous amount of resources. We tested two algorithms for automatically indexing nouns, noun-modifiers, and noun phrases, and inferring selected binary relations between UMLS concepts in a textbook of infectious disease. Sixty-six percent of nouns and noun-modifiers and 81% of noun phrases were correctly matched to UMLS concepts. Semantic relations were identified with 100% specificity and 94% sensitivity. For some medical sub-domains, these algorithms could permit expeditious generation of more complex indexing. PMID:10566445

  17. Spectral decompositions of multiple time series: a Bayesian non-parametric approach.

    PubMed

    Macaro, Christian; Prado, Raquel

    2014-01-01

    We consider spectral decompositions of multiple time series that arise in studies where the interest lies in assessing the influence of two or more factors. We write the spectral density of each time series as a sum of the spectral densities associated to the different levels of the factors. We then use Whittle's approximation to the likelihood function and follow a Bayesian non-parametric approach to obtain posterior inference on the spectral densities based on Bernstein-Dirichlet prior distributions. The prior is strategically important as it carries identifiability conditions for the models and allows us to quantify our degree of confidence in such conditions. A Markov chain Monte Carlo (MCMC) algorithm for posterior inference within this class of frequency-domain models is presented.We illustrate the approach by analyzing simulated and real data via spectral one-way and two-way models. In particular, we present an analysis of functional magnetic resonance imaging (fMRI) brain responses measured in individuals who participated in a designed experiment to study pain perception in humans.

  18. Hamiltonian Monte Carlo acceleration using surrogate functions with random bases.

    PubMed

    Zhang, Cheng; Shahbaba, Babak; Zhao, Hongkai

    2017-11-01

    For big data analysis, high computational cost for Bayesian methods often limits their applications in practice. In recent years, there have been many attempts to improve computational efficiency of Bayesian inference. Here we propose an efficient and scalable computational technique for a state-of-the-art Markov chain Monte Carlo methods, namely, Hamiltonian Monte Carlo. The key idea is to explore and exploit the structure and regularity in parameter space for the underlying probabilistic model to construct an effective approximation of its geometric properties. To this end, we build a surrogate function to approximate the target distribution using properly chosen random bases and an efficient optimization process. The resulting method provides a flexible, scalable, and efficient sampling algorithm, which converges to the correct target distribution. We show that by choosing the basis functions and optimization process differently, our method can be related to other approaches for the construction of surrogate functions such as generalized additive models or Gaussian process models. Experiments based on simulated and real data show that our approach leads to substantially more efficient sampling algorithms compared to existing state-of-the-art methods.

  19. Generating Customized Verifiers for Automatically Generated Code

    NASA Technical Reports Server (NTRS)

    Denney, Ewen; Fischer, Bernd

    2008-01-01

    Program verification using Hoare-style techniques requires many logical annotations. We have previously developed a generic annotation inference algorithm that weaves in all annotations required to certify safety properties for automatically generated code. It uses patterns to capture generator- and property-specific code idioms and property-specific meta-program fragments to construct the annotations. The algorithm is customized by specifying the code patterns and integrating them with the meta-program fragments for annotation construction. However, this is difficult since it involves tedious and error-prone low-level term manipulations. Here, we describe an annotation schema compiler that largely automates this customization task using generative techniques. It takes a collection of high-level declarative annotation schemas tailored towards a specific code generator and safety property, and generates all customized analysis functions and glue code required for interfacing with the generic algorithm core, thus effectively creating a customized annotation inference algorithm. The compiler raises the level of abstraction and simplifies schema development and maintenance. It also takes care of some more routine aspects of formulating patterns and schemas, in particular handling of irrelevant program fragments and irrelevant variance in the program structure, which reduces the size, complexity, and number of different patterns and annotation schemas that are required. The improvements described here make it easier and faster to customize the system to a new safety property or a new generator, and we demonstrate this by customizing it to certify frame safety of space flight navigation code that was automatically generated from Simulink models by MathWorks' Real-Time Workshop.

  20. Novel probabilistic models of spatial genetic ancestry with applications to stratification correction in genome-wide association studies.

    PubMed

    Bhaskar, Anand; Javanmard, Adel; Courtade, Thomas A; Tse, David

    2017-03-15

    Genetic variation in human populations is influenced by geographic ancestry due to spatial locality in historical mating and migration patterns. Spatial population structure in genetic datasets has been traditionally analyzed using either model-free algorithms, such as principal components analysis (PCA) and multidimensional scaling, or using explicit spatial probabilistic models of allele frequency evolution. We develop a general probabilistic model and an associated inference algorithm that unify the model-based and data-driven approaches to visualizing and inferring population structure. Our spatial inference algorithm can also be effectively applied to the problem of population stratification in genome-wide association studies (GWAS), where hidden population structure can create fictitious associations when population ancestry is correlated with both the genotype and the trait. Our algorithm Geographic Ancestry Positioning (GAP) relates local genetic distances between samples to their spatial distances, and can be used for visually discerning population structure as well as accurately inferring the spatial origin of individuals on a two-dimensional continuum. On both simulated and several real datasets from diverse human populations, GAP exhibits substantially lower error in reconstructing spatial ancestry coordinates compared to PCA. We also develop an association test that uses the ancestry coordinates inferred by GAP to accurately account for ancestry-induced correlations in GWAS. Based on simulations and analysis of a dataset of 10 metabolic traits measured in a Northern Finland cohort, which is known to exhibit significant population structure, we find that our method has superior power to current approaches. Our software is available at https://github.com/anand-bhaskar/gap . abhaskar@stanford.edu or ajavanma@usc.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  1. Successful Reconstruction of a Physiological Circuit with Known Connectivity from Spiking Activity Alone

    PubMed Central

    Gerhard, Felipe; Kispersky, Tilman; Gutierrez, Gabrielle J.; Marder, Eve; Kramer, Mark; Eden, Uri

    2013-01-01

    Identifying the structure and dynamics of synaptic interactions between neurons is the first step to understanding neural network dynamics. The presence of synaptic connections is traditionally inferred through the use of targeted stimulation and paired recordings or by post-hoc histology. More recently, causal network inference algorithms have been proposed to deduce connectivity directly from electrophysiological signals, such as extracellularly recorded spiking activity. Usually, these algorithms have not been validated on a neurophysiological data set for which the actual circuitry is known. Recent work has shown that traditional network inference algorithms based on linear models typically fail to identify the correct coupling of a small central pattern generating circuit in the stomatogastric ganglion of the crab Cancer borealis. In this work, we show that point process models of observed spike trains can guide inference of relative connectivity estimates that match the known physiological connectivity of the central pattern generator up to a choice of threshold. We elucidate the necessary steps to derive faithful connectivity estimates from a model that incorporates the spike train nature of the data. We then apply the model to measure changes in the effective connectivity pattern in response to two pharmacological interventions, which affect both intrinsic neural dynamics and synaptic transmission. Our results provide the first successful application of a network inference algorithm to a circuit for which the actual physiological synapses between neurons are known. The point process methodology presented here generalizes well to larger networks and can describe the statistics of neural populations. In general we show that advanced statistical models allow for the characterization of effective network structure, deciphering underlying network dynamics and estimating information-processing capabilities. PMID:23874181

  2. Edge-directed inference for microaneurysms detection in digital fundus images

    NASA Astrophysics Data System (ADS)

    Huang, Ke; Yan, Michelle; Aviyente, Selin

    2007-03-01

    Microaneurysms (MAs) detection is a critical step in diabetic retinopathy screening, since MAs are the earliest visible warning of potential future problems. A variety of algorithms have been proposed for MAs detection in mass screening. Different methods have been proposed for MAs detection. The core technology for most of existing methods is based on a directional mathematical morphological operation called "Top-Hat" filter that requires multiple filtering operations at each pixel. Background structure, uneven illumination and noise often cause confusion between MAs and some non-MA structures and limits the applicability of the filter. In this paper, a novel detection framework based on edge directed inference is proposed for MAs detection. The candidate MA regions are first delineated from the edge map of a fundus image. Features measuring shape, brightness and contrast are extracted for each candidate MA region to better exclude false detection from true MAs. Algorithmic analysis and empirical evaluation reveal that the proposed edge directed inference outperforms the "Top-Hat" based algorithm in both detection accuracy and computational speed.

  3. An experimental phylogeny to benchmark ancestral sequence reconstruction

    PubMed Central

    Randall, Ryan N.; Radford, Caelan E.; Roof, Kelsey A.; Natarajan, Divya K.; Gaucher, Eric A.

    2016-01-01

    Ancestral sequence reconstruction (ASR) is a still-burgeoning method that has revealed many key mechanisms of molecular evolution. One criticism of the approach is an inability to validate its algorithms within a biological context as opposed to a computer simulation. Here we build an experimental phylogeny using the gene of a single red fluorescent protein to address this criticism. The evolved phylogeny consists of 19 operational taxonomic units (leaves) and 17 ancestral bifurcations (nodes) that display a wide variety of fluorescent phenotypes. The 19 leaves then serve as ‘modern' sequences that we subject to ASR analyses using various algorithms and to benchmark against the known ancestral genotypes and ancestral phenotypes. We confirm computer simulations that show all algorithms infer ancient sequences with high accuracy, yet we also reveal wide variation in the phenotypes encoded by incorrectly inferred sequences. Specifically, Bayesian methods incorporating rate variation significantly outperform the maximum parsimony criterion in phenotypic accuracy. Subsampling of extant sequences had minor effect on the inference of ancestral sequences. PMID:27628687

  4. Static Analysis Using Abstract Interpretation

    NASA Technical Reports Server (NTRS)

    Arthaud, Maxime

    2017-01-01

    Short presentation about static analysis and most particularly abstract interpretation. It starts with a brief explanation on why static analysis is used at NASA. Then, it describes the IKOS (Inference Kernel for Open Static Analyzers) tool chain. Results on NASA projects are shown. Several well known algorithms from the static analysis literature are then explained (such as pointer analyses, memory analyses, weak relational abstract domains, function summarization, etc.). It ends with interesting problems we encountered (such as C++ analysis with exception handling, or the detection of integer overflow).

  5. iNJclust: Iterative Neighbor-Joining Tree Clustering Framework for Inferring Population Structure.

    PubMed

    Limpiti, Tulaya; Amornbunchornvej, Chainarong; Intarapanich, Apichart; Assawamakin, Anunchai; Tongsima, Sissades

    2014-01-01

    Understanding genetic differences among populations is one of the most important issues in population genetics. Genetic variations, e.g., single nucleotide polymorphisms, are used to characterize commonality and difference of individuals from various populations. This paper presents an efficient graph-based clustering framework which operates iteratively on the Neighbor-Joining (NJ) tree called the iNJclust algorithm. The framework uses well-known genetic measurements, namely the allele-sharing distance, the neighbor-joining tree, and the fixation index. The behavior of the fixation index is utilized in the algorithm's stopping criterion. The algorithm provides an estimated number of populations, individual assignments, and relationships between populations as outputs. The clustering result is reported in the form of a binary tree, whose terminal nodes represent the final inferred populations and the tree structure preserves the genetic relationships among them. The clustering performance and the robustness of the proposed algorithm are tested extensively using simulated and real data sets from bovine, sheep, and human populations. The result indicates that the number of populations within each data set is reasonably estimated, the individual assignment is robust, and the structure of the inferred population tree corresponds to the intrinsic relationships among populations within the data.

  6. The SAMI Galaxy Survey: the intrinsic shape of kinematically selected galaxies

    NASA Astrophysics Data System (ADS)

    Foster, C.; van de Sande, J.; D'Eugenio, F.; Cortese, L.; McDermid, R. M.; Bland-Hawthorn, J.; Brough, S.; Bryant, J.; Croom, S. M.; Goodwin, M.; Konstantopoulos, I. S.; Lawrence, J.; López-Sánchez, Á. R.; Medling, A. M.; Owers, M. S.; Richards, S. N.; Scott, N.; Taranu, D. S.; Tonini, C.; Zafar, T.

    2017-11-01

    Using the stellar kinematic maps and ancillary imaging data from the Sydney AAO Multi Integral field (SAMI) Galaxy Survey, the intrinsic shape of kinematically selected samples of galaxies is inferred. We implement an efficient and optimized algorithm to fit the intrinsic shape of galaxies using an established method to simultaneously invert the distributions of apparent ellipticities and kinematic misalignments. The algorithm output compares favourably with previous studies of the intrinsic shape of galaxies based on imaging alone and our re-analysis of the ATLAS3D data. Our results indicate that most galaxies are oblate axisymmetric. We show empirically that the intrinsic shape of galaxies varies as a function of their rotational support as measured by the 'spin' parameter proxy λ _{R_e}. In particular, low-spin systems have a higher occurrence of triaxiality, while high-spin systems are more intrinsically flattened and axisymmetric. The intrinsic shape of galaxies is linked to their formation and merger histories. Galaxies with high-spin values have intrinsic shapes consistent with dissipational minor mergers, while the intrinsic shape of low-spin systems is consistent with dissipationless multimerger assembly histories. This range in assembly histories inferred from intrinsic shapes is broadly consistent with expectations from cosmological simulations.

  7. Identifying Seizure Onset Zone From the Causal Connectivity Inferred Using Directed Information

    NASA Astrophysics Data System (ADS)

    Malladi, Rakesh; Kalamangalam, Giridhar; Tandon, Nitin; Aazhang, Behnaam

    2016-10-01

    In this paper, we developed a model-based and a data-driven estimator for directed information (DI) to infer the causal connectivity graph between electrocorticographic (ECoG) signals recorded from brain and to identify the seizure onset zone (SOZ) in epileptic patients. Directed information, an information theoretic quantity, is a general metric to infer causal connectivity between time-series and is not restricted to a particular class of models unlike the popular metrics based on Granger causality or transfer entropy. The proposed estimators are shown to be almost surely convergent. Causal connectivity between ECoG electrodes in five epileptic patients is inferred using the proposed DI estimators, after validating their performance on simulated data. We then proposed a model-based and a data-driven SOZ identification algorithm to identify SOZ from the causal connectivity inferred using model-based and data-driven DI estimators respectively. The data-driven SOZ identification outperforms the model-based SOZ identification algorithm when benchmarked against visual analysis by neurologist, the current clinical gold standard. The causal connectivity analysis presented here is the first step towards developing novel non-surgical treatments for epilepsy.

  8. Implicit Value Updating Explains Transitive Inference Performance: The Betasort Model

    PubMed Central

    Jensen, Greg; Muñoz, Fabian; Alkan, Yelda; Ferrera, Vincent P.; Terrace, Herbert S.

    2015-01-01

    Transitive inference (the ability to infer that B > D given that B > C and C > D) is a widespread characteristic of serial learning, observed in dozens of species. Despite these robust behavioral effects, reinforcement learning models reliant on reward prediction error or associative strength routinely fail to perform these inferences. We propose an algorithm called betasort, inspired by cognitive processes, which performs transitive inference at low computational cost. This is accomplished by (1) representing stimulus positions along a unit span using beta distributions, (2) treating positive and negative feedback asymmetrically, and (3) updating the position of every stimulus during every trial, whether that stimulus was visible or not. Performance was compared for rhesus macaques, humans, and the betasort algorithm, as well as Q-learning, an established reward-prediction error (RPE) model. Of these, only Q-learning failed to respond above chance during critical test trials. Betasort’s success (when compared to RPE models) and its computational efficiency (when compared to full Markov decision process implementations) suggests that the study of reinforcement learning in organisms will be best served by a feature-driven approach to comparing formal models. PMID:26407227

  9. Implicit Value Updating Explains Transitive Inference Performance: The Betasort Model.

    PubMed

    Jensen, Greg; Muñoz, Fabian; Alkan, Yelda; Ferrera, Vincent P; Terrace, Herbert S

    2015-01-01

    Transitive inference (the ability to infer that B > D given that B > C and C > D) is a widespread characteristic of serial learning, observed in dozens of species. Despite these robust behavioral effects, reinforcement learning models reliant on reward prediction error or associative strength routinely fail to perform these inferences. We propose an algorithm called betasort, inspired by cognitive processes, which performs transitive inference at low computational cost. This is accomplished by (1) representing stimulus positions along a unit span using beta distributions, (2) treating positive and negative feedback asymmetrically, and (3) updating the position of every stimulus during every trial, whether that stimulus was visible or not. Performance was compared for rhesus macaques, humans, and the betasort algorithm, as well as Q-learning, an established reward-prediction error (RPE) model. Of these, only Q-learning failed to respond above chance during critical test trials. Betasort's success (when compared to RPE models) and its computational efficiency (when compared to full Markov decision process implementations) suggests that the study of reinforcement learning in organisms will be best served by a feature-driven approach to comparing formal models.

  10. Inferring deep-brain activity from cortical activity using functional near-infrared spectroscopy

    PubMed Central

    Liu, Ning; Cui, Xu; Bryant, Daniel M.; Glover, Gary H.; Reiss, Allan L.

    2015-01-01

    Functional near-infrared spectroscopy (fNIRS) is an increasingly popular technology for studying brain function because it is non-invasive, non-irradiating and relatively inexpensive. Further, fNIRS potentially allows measurement of hemodynamic activity with high temporal resolution (milliseconds) and in naturalistic settings. However, in comparison with other imaging modalities, namely fMRI, fNIRS has a significant drawback: limited sensitivity to hemodynamic changes in deep-brain regions. To overcome this limitation, we developed a computational method to infer deep-brain activity using fNIRS measurements of cortical activity. Using simultaneous fNIRS and fMRI, we measured brain activity in 17 participants as they completed three cognitive tasks. A support vector regression (SVR) learning algorithm was used to predict activity in twelve deep-brain regions using information from surface fNIRS measurements. We compared these predictions against actual fMRI-measured activity using Pearson’s correlation to quantify prediction performance. To provide a benchmark for comparison, we also used fMRI measurements of cortical activity to infer deep-brain activity. When using fMRI-measured activity from the entire cortex, we were able to predict deep-brain activity in the fusiform cortex with an average correlation coefficient of 0.80 and in all deep-brain regions with an average correlation coefficient of 0.67. The top 15% of predictions using fNIRS signal achieved an accuracy of 0.7. To our knowledge, this study is the first to investigate the feasibility of using cortical activity to infer deep-brain activity. This new method has the potential to extend fNIRS applications in cognitive and clinical neuroscience research. PMID:25798327

  11. Sensor Fusion to Infer Locations of Standing and Reaching Within the Home in Incomplete Spinal Cord Injury.

    PubMed

    Lonini, Luca; Reissman, Timothy; Ochoa, Jose M; Mummidisetty, Chaithanya K; Kording, Konrad; Jayaraman, Arun

    2017-10-01

    The objective of rehabilitation after spinal cord injury is to enable successful function in everyday life and independence at home. Clinical tests can assess whether patients are able to execute functional movements but are limited in assessing such information at home. A prototype system is developed that detects stand-to-reach activities, a movement with important functional implications, at multiple locations within a mock kitchen. Ten individuals with incomplete spinal cord injuries performed a sequence of standing and reaching tasks. The system monitored their movements by combining two sources of information: a triaxial accelerometer, placed on the subject's thigh, detected sitting or standing, and a network of radio frequency tags, wirelessly connected to a wrist-worn device, detected reaching at three locations. A threshold-based algorithm detected execution of the combined tasks and accuracy was measured by the number of correctly identified events. The system was shown to have an average accuracy of 98% for inferring when individuals performed stand-to-reach activities at each tag location within the same room. The combination of accelerometry and tags yielded accurate assessments of functional stand-to-reach activities within a home environment. Optimization of this technology could simplify patient compliance and allow clinicians to assess functional home activities.

  12. Financial model calibration using consistency hints.

    PubMed

    Abu-Mostafa, Y S

    2001-01-01

    We introduce a technique for forcing the calibration of a financial model to produce valid parameters. The technique is based on learning from hints. It converts simple curve fitting into genuine calibration, where broad conclusions can be inferred from parameter values. The technique augments the error function of curve fitting with consistency hint error functions based on the Kullback-Leibler distance. We introduce an efficient EM-type optimization algorithm tailored to this technique. We also introduce other consistency hints, and balance their weights using canonical errors. We calibrate the correlated multifactor Vasicek model of interest rates, and apply it successfully to Japanese Yen swaps market and US dollar yield market.

  13. An algorithm for computing the gene tree probability under the multispecies coalescent and its application in the inference of population tree

    PubMed Central

    2016-01-01

    Motivation: Gene tree represents the evolutionary history of gene lineages that originate from multiple related populations. Under the multispecies coalescent model, lineages may coalesce outside the species (population) boundary. Given a species tree (with branch lengths), the gene tree probability is the probability of observing a specific gene tree topology under the multispecies coalescent model. There are two existing algorithms for computing the exact gene tree probability. The first algorithm is due to Degnan and Salter, where they enumerate all the so-called coalescent histories for the given species tree and the gene tree topology. Their algorithm runs in exponential time in the number of gene lineages in general. The second algorithm is the STELLS algorithm (2012), which is usually faster but also runs in exponential time in almost all the cases. Results: In this article, we present a new algorithm, called CompactCH, for computing the exact gene tree probability. This new algorithm is based on the notion of compact coalescent histories: multiple coalescent histories are represented by a single compact coalescent history. The key advantage of our new algorithm is that it runs in polynomial time in the number of gene lineages if the number of populations is fixed to be a constant. The new algorithm is more efficient than the STELLS algorithm both in theory and in practice when the number of populations is small and there are multiple gene lineages from each population. As an application, we show that CompactCH can be applied in the inference of population tree (i.e. the population divergence history) from population haplotypes. Simulation results show that the CompactCH algorithm enables efficient and accurate inference of population trees with much more haplotypes than a previous approach. Availability: The CompactCH algorithm is implemented in the STELLS software package, which is available for download at http://www.engr.uconn.edu/ywu/STELLS.html. Contact: ywu@engr.uconn.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307621

  14. A fast least-squares algorithm for population inference

    PubMed Central

    2013-01-01

    Background Population inference is an important problem in genetics used to remove population stratification in genome-wide association studies and to detect migration patterns or shared ancestry. An individual’s genotype can be modeled as a probabilistic function of ancestral population memberships, Q, and the allele frequencies in those populations, P. The parameters, P and Q, of this binomial likelihood model can be inferred using slow sampling methods such as Markov Chain Monte Carlo methods or faster gradient based approaches such as sequential quadratic programming. This paper proposes a least-squares simplification of the binomial likelihood model motivated by a Euclidean interpretation of the genotype feature space. This results in a faster algorithm that easily incorporates the degree of admixture within the sample of individuals and improves estimates without requiring trial-and-error tuning. Results We show that the expected value of the least-squares solution across all possible genotype datasets is equal to the true solution when part of the problem has been solved, and that the variance of the solution approaches zero as its size increases. The Least-squares algorithm performs nearly as well as Admixture for these theoretical scenarios. We compare least-squares, Admixture, and FRAPPE for a variety of problem sizes and difficulties. For particularly hard problems with a large number of populations, small number of samples, or greater degree of admixture, least-squares performs better than the other methods. On simulated mixtures of real population allele frequencies from the HapMap project, Admixture estimates sparsely mixed individuals better than Least-squares. The least-squares approach, however, performs within 1.5% of the Admixture error. On individual genotypes from the HapMap project, Admixture and least-squares perform qualitatively similarly and within 1.2% of each other. Significantly, the least-squares approach nearly always converges 1.5- to 6-times faster. Conclusions The computational advantage of the least-squares approach along with its good estimation performance warrants further research, especially for very large datasets. As problem sizes increase, the difference in estimation performance between all algorithms decreases. In addition, when prior information is known, the least-squares approach easily incorporates the expected degree of admixture to improve the estimate. PMID:23343408

  15. A fast least-squares algorithm for population inference.

    PubMed

    Parry, R Mitchell; Wang, May D

    2013-01-23

    Population inference is an important problem in genetics used to remove population stratification in genome-wide association studies and to detect migration patterns or shared ancestry. An individual's genotype can be modeled as a probabilistic function of ancestral population memberships, Q, and the allele frequencies in those populations, P. The parameters, P and Q, of this binomial likelihood model can be inferred using slow sampling methods such as Markov Chain Monte Carlo methods or faster gradient based approaches such as sequential quadratic programming. This paper proposes a least-squares simplification of the binomial likelihood model motivated by a Euclidean interpretation of the genotype feature space. This results in a faster algorithm that easily incorporates the degree of admixture within the sample of individuals and improves estimates without requiring trial-and-error tuning. We show that the expected value of the least-squares solution across all possible genotype datasets is equal to the true solution when part of the problem has been solved, and that the variance of the solution approaches zero as its size increases. The Least-squares algorithm performs nearly as well as Admixture for these theoretical scenarios. We compare least-squares, Admixture, and FRAPPE for a variety of problem sizes and difficulties. For particularly hard problems with a large number of populations, small number of samples, or greater degree of admixture, least-squares performs better than the other methods. On simulated mixtures of real population allele frequencies from the HapMap project, Admixture estimates sparsely mixed individuals better than Least-squares. The least-squares approach, however, performs within 1.5% of the Admixture error. On individual genotypes from the HapMap project, Admixture and least-squares perform qualitatively similarly and within 1.2% of each other. Significantly, the least-squares approach nearly always converges 1.5- to 6-times faster. The computational advantage of the least-squares approach along with its good estimation performance warrants further research, especially for very large datasets. As problem sizes increase, the difference in estimation performance between all algorithms decreases. In addition, when prior information is known, the least-squares approach easily incorporates the expected degree of admixture to improve the estimate.

  16. Object/rule integration in CLIPS. [C Language Integrated Production System

    NASA Technical Reports Server (NTRS)

    Donnell, Brian L.

    1993-01-01

    This paper gives a brief overview of the C Language Integrated Production System (CLIPS) with a focus on the object-oriented features. The advantages of an object data representation over the traditional working memory element (WME), i.e., facts, are discussed, and the implementation of the Rete inference algorithm in CLIPS is presented in detail. A few methods for achieving pattern-matching on objects with the current inference engine are given, and finally, the paper examines the modifications necessary to the Rete algorithm to allow direct object pattern-matching.

  17. Annotation of gene function in citrus using gene expression information and co-expression networks

    PubMed Central

    2014-01-01

    Background The genus Citrus encompasses major cultivated plants such as sweet orange, mandarin, lemon and grapefruit, among the world’s most economically important fruit crops. With increasing volumes of transcriptomics data available for these species, Gene Co-expression Network (GCN) analysis is a viable option for predicting gene function at a genome-wide scale. GCN analysis is based on a “guilt-by-association” principle whereby genes encoding proteins involved in similar and/or related biological processes may exhibit similar expression patterns across diverse sets of experimental conditions. While bioinformatics resources such as GCN analysis are widely available for efficient gene function prediction in model plant species including Arabidopsis, soybean and rice, in citrus these tools are not yet developed. Results We have constructed a comprehensive GCN for citrus inferred from 297 publicly available Affymetrix Genechip Citrus Genome microarray datasets, providing gene co-expression relationships at a genome-wide scale (33,000 transcripts). The comprehensive citrus GCN consists of a global GCN (condition-independent) and four condition-dependent GCNs that survey the sweet orange species only, all citrus fruit tissues, all citrus leaf tissues, or stress-exposed plants. All of these GCNs are clustered using genome-wide, gene-centric (guide) and graph clustering algorithms for flexibility of gene function prediction. For each putative cluster, gene ontology (GO) enrichment and gene expression specificity analyses were performed to enhance gene function, expression and regulation pattern prediction. The guide-gene approach was used to infer novel roles of genes involved in disease susceptibility and vitamin C metabolism, and graph-clustering approaches were used to investigate isoprenoid/phenylpropanoid metabolism in citrus peel, and citric acid catabolism via the GABA shunt in citrus fruit. Conclusions Integration of citrus gene co-expression networks, functional enrichment analysis and gene expression information provide opportunities to infer gene function in citrus. We present a publicly accessible tool, Network Inference for Citrus Co-Expression (NICCE, http://citrus.adelaide.edu.au/nicce/home.aspx), for the gene co-expression analysis in citrus. PMID:25023870

  18. RNA-TVcurve: a Web server for RNA secondary structure comparison based on a multi-scale similarity of its triple vector curve representation.

    PubMed

    Li, Ying; Shi, Xiaohu; Liang, Yanchun; Xie, Juan; Zhang, Yu; Ma, Qin

    2017-01-21

    RNAs have been found to carry diverse functionalities in nature. Inferring the similarity between two given RNAs is a fundamental step to understand and interpret their functional relationship. The majority of functional RNAs show conserved secondary structures, rather than sequence conservation. Those algorithms relying on sequence-based features usually have limitations in their prediction performance. Hence, integrating RNA structure features is very critical for RNA analysis. Existing algorithms mainly fall into two categories: alignment-based and alignment-free. The alignment-free algorithms of RNA comparison usually have lower time complexity than alignment-based algorithms. An alignment-free RNA comparison algorithm was proposed, in which novel numerical representations RNA-TVcurve (triple vector curve representation) of RNA sequence and corresponding secondary structure features are provided. Then a multi-scale similarity score of two given RNAs was designed based on wavelet decomposition of their numerical representation. In support of RNA mutation and phylogenetic analysis, a web server (RNA-TVcurve) was designed based on this alignment-free RNA comparison algorithm. It provides three functional modules: 1) visualization of numerical representation of RNA secondary structure; 2) detection of single-point mutation based on secondary structure; and 3) comparison of pairwise and multiple RNA secondary structures. The inputs of the web server require RNA primary sequences, while corresponding secondary structures are optional. For the primary sequences alone, the web server can compute the secondary structures using free energy minimization algorithm in terms of RNAfold tool from Vienna RNA package. RNA-TVcurve is the first integrated web server, based on an alignment-free method, to deliver a suite of RNA analysis functions, including visualization, mutation analysis and multiple RNAs structure comparison. The comparison results with two popular RNA comparison tools, RNApdist and RNAdistance, showcased that RNA-TVcurve can efficiently capture subtle relationships among RNAs for mutation detection and non-coding RNA classification. All the relevant results were shown in an intuitive graphical manner, and can be freely downloaded from this server. RNA-TVcurve, along with test examples and detailed documents, are available at: http://ml.jlu.edu.cn/tvcurve/ .

  19. Bringing the cross-correlation method up to date

    NASA Technical Reports Server (NTRS)

    Statler, Thomas

    1995-01-01

    The cross-correlation (XC) method of Tonry & Davis (1979, AJ, 84, 1511) is generalized to arbitrary parametrized line profiles. In the new algorithm the correlation function itself, rather than the observed galaxy spectrum, is fitted by the model line profile: this removes much of the complication in the error analysis caused by template mismatch. Like the Fourier correlation quotient (FCQ) method of Bender (1990, A&A, 229, 441), the inferred line profiles are, up to a normalization constant, independent of template mismatch as long as there are no blended lines. The standard reduced chi(exp 2) is a good measure of the fit of the inferred velocity distribution, largely decoupled from the fit of the spectral template. The updated XC method performs as well as other recently developed methods, with the added virtue of conceptual simplicity.

  20. Implementation of the Iterative Proportion Fitting Algorithm for Geostatistical Facies Modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li Yupeng, E-mail: yupeng@ualberta.ca; Deutsch, Clayton V.

    2012-06-15

    In geostatistics, most stochastic algorithm for simulation of categorical variables such as facies or rock types require a conditional probability distribution. The multivariate probability distribution of all the grouped locations including the unsampled location permits calculation of the conditional probability directly based on its definition. In this article, the iterative proportion fitting (IPF) algorithm is implemented to infer this multivariate probability. Using the IPF algorithm, the multivariate probability is obtained by iterative modification to an initial estimated multivariate probability using lower order bivariate probabilities as constraints. The imposed bivariate marginal probabilities are inferred from profiles along drill holes or wells.more » In the IPF process, a sparse matrix is used to calculate the marginal probabilities from the multivariate probability, which makes the iterative fitting more tractable and practical. This algorithm can be extended to higher order marginal probability constraints as used in multiple point statistics. The theoretical framework is developed and illustrated with estimation and simulation example.« less

  1. User's guide to the Fault Inferring Nonlinear Detection System (FINDS) computer program

    NASA Technical Reports Server (NTRS)

    Caglayan, A. K.; Godiwala, P. M.; Satz, H. S.

    1988-01-01

    Described are the operation and internal structure of the computer program FINDS (Fault Inferring Nonlinear Detection System). The FINDS algorithm is designed to provide reliable estimates for aircraft position, velocity, attitude, and horizontal winds to be used for guidance and control laws in the presence of possible failures in the avionics sensors. The FINDS algorithm was developed with the use of a digital simulation of a commercial transport aircraft and tested with flight recorded data. The algorithm was then modified to meet the size constraints and real-time execution requirements on a flight computer. For the real-time operation, a multi-rate implementation of the FINDS algorithm has been partitioned to execute on a dual parallel processor configuration: one based on the translational dynamics and the other on the rotational kinematics. The report presents an overview of the FINDS algorithm, the implemented equations, the flow charts for the key subprograms, the input and output files, program variable indexing convention, subprogram descriptions, and the common block descriptions used in the program.

  2. Comparative analysis on the selection of number of clusters in community detection

    NASA Astrophysics Data System (ADS)

    Kawamoto, Tatsuro; Kabashima, Yoshiyuki

    2018-02-01

    We conduct a comparative analysis on various estimates of the number of clusters in community detection. An exhaustive comparison requires testing of all possible combinations of frameworks, algorithms, and assessment criteria. In this paper we focus on the framework based on a stochastic block model, and investigate the performance of greedy algorithms, statistical inference, and spectral methods. For the assessment criteria, we consider modularity, map equation, Bethe free energy, prediction errors, and isolated eigenvalues. From the analysis, the tendency of overfit and underfit that the assessment criteria and algorithms have becomes apparent. In addition, we propose that the alluvial diagram is a suitable tool to visualize statistical inference results and can be useful to determine the number of clusters.

  3. Learning Extended Finite State Machines

    NASA Technical Reports Server (NTRS)

    Cassel, Sofia; Howar, Falk; Jonsson, Bengt; Steffen, Bernhard

    2014-01-01

    We present an active learning algorithm for inferring extended finite state machines (EFSM)s, combining data flow and control behavior. Key to our learning technique is a novel learning model based on so-called tree queries. The learning algorithm uses the tree queries to infer symbolic data constraints on parameters, e.g., sequence numbers, time stamps, identifiers, or even simple arithmetic. We describe sufficient conditions for the properties that the symbolic constraints provided by a tree query in general must have to be usable in our learning model. We have evaluated our algorithm in a black-box scenario, where tree queries are realized through (black-box) testing. Our case studies include connection establishment in TCP and a priority queue from the Java Class Library.

  4. Estimation of tool wear length in finish milling using a fuzzy inference algorithm

    NASA Astrophysics Data System (ADS)

    Ko, Tae Jo; Cho, Dong Woo

    1993-10-01

    The geometric accuracy and surface roughness are mainly affected by the flank wear at the minor cutting edge in finish machining. A fuzzy estimator obtained by a fuzzy inference algorithm with a max-min composition rule to evaluate the minor flank wear length in finish milling is introduced. The features sensitive to minor flank wear are extracted from the dispersion analysis of a time series AR model of the feed directional acceleration of the spindle housing. Linguistic rules for fuzzy estimation are constructed using these features, and then fuzzy inferences are carried out with test data sets under various cutting conditions. The proposed system turns out to be effective for estimating minor flank wear length, and its mean error is less than 12%.

  5. Penalized likelihood and multi-objective spatial scans for the detection and inference of irregular clusters

    PubMed Central

    2010-01-01

    Background Irregularly shaped spatial clusters are difficult to delineate. A cluster found by an algorithm often spreads through large portions of the map, impacting its geographical meaning. Penalized likelihood methods for Kulldorff's spatial scan statistics have been used to control the excessive freedom of the shape of clusters. Penalty functions based on cluster geometry and non-connectivity have been proposed recently. Another approach involves the use of a multi-objective algorithm to maximize two objectives: the spatial scan statistics and the geometric penalty function. Results & Discussion We present a novel scan statistic algorithm employing a function based on the graph topology to penalize the presence of under-populated disconnection nodes in candidate clusters, the disconnection nodes cohesion function. A disconnection node is defined as a region within a cluster, such that its removal disconnects the cluster. By applying this function, the most geographically meaningful clusters are sifted through the immense set of possible irregularly shaped candidate cluster solutions. To evaluate the statistical significance of solutions for multi-objective scans, a statistical approach based on the concept of attainment function is used. In this paper we compared different penalized likelihoods employing the geometric and non-connectivity regularity functions and the novel disconnection nodes cohesion function. We also build multi-objective scans using those three functions and compare them with the previous penalized likelihood scans. An application is presented using comprehensive state-wide data for Chagas' disease in puerperal women in Minas Gerais state, Brazil. Conclusions We show that, compared to the other single-objective algorithms, multi-objective scans present better performance, regarding power, sensitivity and positive predicted value. The multi-objective non-connectivity scan is faster and better suited for the detection of moderately irregularly shaped clusters. The multi-objective cohesion scan is most effective for the detection of highly irregularly shaped clusters. PMID:21034451

  6. Parallel mutual information estimation for inferring gene regulatory networks on GPUs

    PubMed Central

    2011-01-01

    Background Mutual information is a measure of similarity between two variables. It has been widely used in various application domains including computational biology, machine learning, statistics, image processing, and financial computing. Previously used simple histogram based mutual information estimators lack the precision in quality compared to kernel based methods. The recently introduced B-spline function based mutual information estimation method is competitive to the kernel based methods in terms of quality but at a lower computational complexity. Results We present a new approach to accelerate the B-spline function based mutual information estimation algorithm with commodity graphics hardware. To derive an efficient mapping onto this type of architecture, we have used the Compute Unified Device Architecture (CUDA) programming model to design and implement a new parallel algorithm. Our implementation, called CUDA-MI, can achieve speedups of up to 82 using double precision on a single GPU compared to a multi-threaded implementation on a quad-core CPU for large microarray datasets. We have used the results obtained by CUDA-MI to infer gene regulatory networks (GRNs) from microarray data. The comparisons to existing methods including ARACNE and TINGe show that CUDA-MI produces GRNs of higher quality in less time. Conclusions CUDA-MI is publicly available open-source software, written in CUDA and C++ programming languages. It obtains significant speedup over sequential multi-threaded implementation by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs. PMID:21672264

  7. Fast Inference with Min-Sum Matrix Product.

    PubMed

    Felzenszwalb, Pedro F; McAuley, Julian J

    2011-12-01

    The MAP inference problem in many graphical models can be solved efficiently using a fast algorithm for computing min-sum products of n × n matrices. The class of models in question includes cyclic and skip-chain models that arise in many applications. Although the worst-case complexity of the min-sum product operation is not known to be much better than O(n(3)), an O(n(2.5)) expected time algorithm was recently given, subject to some constraints on the input matrices. In this paper, we give an algorithm that runs in O(n(2) log n) expected time, assuming that the entries in the input matrices are independent samples from a uniform distribution. We also show that two variants of our algorithm are quite fast for inputs that arise in several applications. This leads to significant performance gains over previous methods in applications within computer vision and natural language processing.

  8. Probabilistic cosmological mass mapping from weak lensing shear

    DOE PAGES

    Schneider, M. D.; Ng, K. Y.; Dawson, W. A.; ...

    2017-04-10

    Here, we infer gravitational lensing shear and convergence fields from galaxy ellipticity catalogs under a spatial process prior for the lensing potential. We demonstrate the performance of our algorithm with simulated Gaussian-distributed cosmological lensing shear maps and a reconstruction of the mass distribution of the merging galaxy cluster Abell 781 using galaxy ellipticities measured with the Deep Lens Survey. Given interim posterior samples of lensing shear or convergence fields on the sky, we describe an algorithm to infer cosmological parameters via lens field marginalization. In the most general formulation of our algorithm we make no assumptions about weak shear ormore » Gaussian-distributed shape noise or shears. Because we require solutions and matrix determinants of a linear system of dimension that scales with the number of galaxies, we expect our algorithm to require parallel high-performance computing resources for application to ongoing wide field lensing surveys.« less

  9. Probabilistic Cosmological Mass Mapping from Weak Lensing Shear

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schneider, M. D.; Dawson, W. A.; Ng, K. Y.

    2017-04-10

    We infer gravitational lensing shear and convergence fields from galaxy ellipticity catalogs under a spatial process prior for the lensing potential. We demonstrate the performance of our algorithm with simulated Gaussian-distributed cosmological lensing shear maps and a reconstruction of the mass distribution of the merging galaxy cluster Abell 781 using galaxy ellipticities measured with the Deep Lens Survey. Given interim posterior samples of lensing shear or convergence fields on the sky, we describe an algorithm to infer cosmological parameters via lens field marginalization. In the most general formulation of our algorithm we make no assumptions about weak shear or Gaussian-distributedmore » shape noise or shears. Because we require solutions and matrix determinants of a linear system of dimension that scales with the number of galaxies, we expect our algorithm to require parallel high-performance computing resources for application to ongoing wide field lensing surveys.« less

  10. Using Stan for Item Response Theory Models

    ERIC Educational Resources Information Center

    Ames, Allison J.; Au, Chi Hang

    2018-01-01

    Stan is a flexible probabilistic programming language providing full Bayesian inference through Hamiltonian Monte Carlo algorithms. The benefits of Hamiltonian Monte Carlo include improved efficiency and faster inference, when compared to other MCMC software implementations. Users can interface with Stan through a variety of computing…

  11. Causal Inference and Explaining Away in a Spiking Network

    PubMed Central

    Moreno-Bote, Rubén; Drugowitsch, Jan

    2015-01-01

    While the brain uses spiking neurons for communication, theoretical research on brain computations has mostly focused on non-spiking networks. The nature of spike-based algorithms that achieve complex computations, such as object probabilistic inference, is largely unknown. Here we demonstrate that a family of high-dimensional quadratic optimization problems with non-negativity constraints can be solved exactly and efficiently by a network of spiking neurons. The network naturally imposes the non-negativity of causal contributions that is fundamental to causal inference, and uses simple operations, such as linear synapses with realistic time constants, and neural spike generation and reset non-linearities. The network infers the set of most likely causes from an observation using explaining away, which is dynamically implemented by spike-based, tuned inhibition. The algorithm performs remarkably well even when the network intrinsically generates variable spike trains, the timing of spikes is scrambled by external sources of noise, or the network is mistuned. This type of network might underlie tasks such as odor identification and classification. PMID:26621426

  12. Causal Inference and Explaining Away in a Spiking Network.

    PubMed

    Moreno-Bote, Rubén; Drugowitsch, Jan

    2015-12-01

    While the brain uses spiking neurons for communication, theoretical research on brain computations has mostly focused on non-spiking networks. The nature of spike-based algorithms that achieve complex computations, such as object probabilistic inference, is largely unknown. Here we demonstrate that a family of high-dimensional quadratic optimization problems with non-negativity constraints can be solved exactly and efficiently by a network of spiking neurons. The network naturally imposes the non-negativity of causal contributions that is fundamental to causal inference, and uses simple operations, such as linear synapses with realistic time constants, and neural spike generation and reset non-linearities. The network infers the set of most likely causes from an observation using explaining away, which is dynamically implemented by spike-based, tuned inhibition. The algorithm performs remarkably well even when the network intrinsically generates variable spike trains, the timing of spikes is scrambled by external sources of noise, or the network is mistuned. This type of network might underlie tasks such as odor identification and classification.

  13. DESCARTES' RULE OF SIGNS AND THE IDENTIFIABILITY OF POPULATION DEMOGRAPHIC MODELS FROM GENOMIC VARIATION DATA.

    PubMed

    Bhaskar, Anand; Song, Yun S

    2014-01-01

    The sample frequency spectrum (SFS) is a widely-used summary statistic of genomic variation in a sample of homologous DNA sequences. It provides a highly efficient dimensional reduction of large-scale population genomic data and its mathematical dependence on the underlying population demography is well understood, thus enabling the development of efficient inference algorithms. However, it has been recently shown that very different population demographies can actually generate the same SFS for arbitrarily large sample sizes. Although in principle this nonidentifiability issue poses a thorny challenge to statistical inference, the population size functions involved in the counterexamples are arguably not so biologically realistic. Here, we revisit this problem and examine the identifiability of demographic models under the restriction that the population sizes are piecewise-defined where each piece belongs to some family of biologically-motivated functions. Under this assumption, we prove that the expected SFS of a sample uniquely determines the underlying demographic model, provided that the sample is sufficiently large. We obtain a general bound on the sample size sufficient for identifiability; the bound depends on the number of pieces in the demographic model and also on the type of population size function in each piece. In the cases of piecewise-constant, piecewise-exponential and piecewise-generalized-exponential models, which are often assumed in population genomic inferences, we provide explicit formulas for the bounds as simple functions of the number of pieces. Lastly, we obtain analogous results for the "folded" SFS, which is often used when there is ambiguity as to which allelic type is ancestral. Our results are proved using a generalization of Descartes' rule of signs for polynomials to the Laplace transform of piecewise continuous functions.

  14. DESCARTES’ RULE OF SIGNS AND THE IDENTIFIABILITY OF POPULATION DEMOGRAPHIC MODELS FROM GENOMIC VARIATION DATA1

    PubMed Central

    Bhaskar, Anand; Song, Yun S.

    2016-01-01

    The sample frequency spectrum (SFS) is a widely-used summary statistic of genomic variation in a sample of homologous DNA sequences. It provides a highly efficient dimensional reduction of large-scale population genomic data and its mathematical dependence on the underlying population demography is well understood, thus enabling the development of efficient inference algorithms. However, it has been recently shown that very different population demographies can actually generate the same SFS for arbitrarily large sample sizes. Although in principle this nonidentifiability issue poses a thorny challenge to statistical inference, the population size functions involved in the counterexamples are arguably not so biologically realistic. Here, we revisit this problem and examine the identifiability of demographic models under the restriction that the population sizes are piecewise-defined where each piece belongs to some family of biologically-motivated functions. Under this assumption, we prove that the expected SFS of a sample uniquely determines the underlying demographic model, provided that the sample is sufficiently large. We obtain a general bound on the sample size sufficient for identifiability; the bound depends on the number of pieces in the demographic model and also on the type of population size function in each piece. In the cases of piecewise-constant, piecewise-exponential and piecewise-generalized-exponential models, which are often assumed in population genomic inferences, we provide explicit formulas for the bounds as simple functions of the number of pieces. Lastly, we obtain analogous results for the “folded” SFS, which is often used when there is ambiguity as to which allelic type is ancestral. Our results are proved using a generalization of Descartes’ rule of signs for polynomials to the Laplace transform of piecewise continuous functions. PMID:28018011

  15. A Multifactorial, Criteria-based Progressive Algorithm for Hamstring Injury Treatment.

    PubMed

    Mendiguchia, Jurdan; Martinez-Ruiz, Enrique; Edouard, Pascal; Morin, Jean-Benoît; Martinez-Martinez, Francisco; Idoate, Fernando; Mendez-Villanueva, Alberto

    2017-07-01

    Given the prevalence of hamstring injuries in football, a rehabilitation program that effectively promotes muscle tissue repair and functional recovery is paramount to minimize reinjury risk and optimize player performance and availability. This study aimed to assess the concurrent effectiveness of administering an individualized and multifactorial criteria-based algorithm (rehabilitation algorithm [RA]) on hamstring injury rehabilitation in comparison with using a general rehabilitation protocol (RP). Implementing a double-blind randomized controlled trial approach, two equal groups of 24 football players (48 total) completed either an RA group or a validated RP group 5 d after an acute hamstring injury. Within 6 months after return to sport, six hamstring reinjuries occurred in RP versus one injury in RA (relative risk = 6, 90% confidence interval = 1-35; clinical inference: very likely beneficial effect). The average duration of return to sport was possibly quicker (effect size = 0.34 ± 0.42) in RP (23.2 ± 11.7 d) compared with RA (25.5 ± 7.8 d) (-13.8%, 90% confidence interval = -34.0% to 3.4%; clinical inference: possibly small effect). At the time to return to sport, RA players showed substantially better 10-m time, maximal sprinting speed, and greater mechanical variables related to speed (i.e., maximum theoretical speed and maximal horizontal power) than the RP. Although return to sport was slower, male football players who underwent an individualized, multifactorial, criteria-based algorithm with a performance- and primary risk factor-oriented training program from the early stages of the process markedly decreased the risk of reinjury compared with a general protocol where long-length strength training exercises were prioritized.

  16. Gene Regulatory Network Inferences Using a Maximum-Relevance and Maximum-Significance Strategy

    PubMed Central

    Liu, Wei; Zhu, Wen; Liao, Bo; Chen, Xiangtao

    2016-01-01

    Recovering gene regulatory networks from expression data is a challenging problem in systems biology that provides valuable information on the regulatory mechanisms of cells. A number of algorithms based on computational models are currently used to recover network topology. However, most of these algorithms have limitations. For example, many models tend to be complicated because of the “large p, small n” problem. In this paper, we propose a novel regulatory network inference method called the maximum-relevance and maximum-significance network (MRMSn) method, which converts the problem of recovering networks into a problem of how to select the regulator genes for each gene. To solve the latter problem, we present an algorithm that is based on information theory and selects the regulator genes for a specific gene by maximizing the relevance and significance. A first-order incremental search algorithm is used to search for regulator genes. Eventually, a strict constraint is adopted to adjust all of the regulatory relationships according to the obtained regulator genes and thus obtain the complete network structure. We performed our method on five different datasets and compared our method to five state-of-the-art methods for network inference based on information theory. The results confirm the effectiveness of our method. PMID:27829000

  17. How to estimate the 3D power spectrum of the Lyman-α forest

    NASA Astrophysics Data System (ADS)

    Font-Ribera, Andreu; McDonald, Patrick; Slosar, Anže

    2018-01-01

    We derive and numerically implement an algorithm for estimating the 3D power spectrum of the Lyman-α (Lyα) forest flux fluctuations. The algorithm exploits the unique geometry of Lyα forest data to efficiently measure the cross-spectrum between lines of sight as a function of parallel wavenumber, transverse separation and redshift. We start by approximating the global covariance matrix as block-diagonal, where only pixels from the same spectrum are correlated. We then compute the eigenvectors of the derivative of the signal covariance with respect to cross-spectrum parameters, and project the inverse-covariance-weighted spectra onto them. This acts much like a radial Fourier transform over redshift windows. The resulting cross-spectrum inference is then converted into our final product, an approximation of the likelihood for the 3D power spectrum expressed as second order Taylor expansion around a fiducial model. We demonstrate the accuracy and scalability of the algorithm and comment on possible extensions. Our algorithm will allow efficient analysis of the upcoming Dark Energy Spectroscopic Instrument dataset.

  18. How to estimate the 3D power spectrum of the Lyman-α forest

    DOE PAGES

    Font-Ribera, Andreu; McDonald, Patrick; Slosar, Anže

    2018-01-02

    Here, we derive and numerically implement an algorithm for estimating the 3D power spectrum of the Lyman-α (Lyα) forest flux fluctuations. The algorithm exploits the unique geometry of Lyα forest data to efficiently measure the cross-spectrum between lines of sight as a function of parallel wavenumber, transverse separation and redshift. We start by approximating the global covariance matrix as block-diagonal, where only pixels from the same spectrum are correlated. We then compute the eigenvectors of the derivative of the signal covariance with respect to cross-spectrum parameters, and project the inverse-covariance-weighted spectra onto them. This acts much like a radial Fouriermore » transform over redshift windows. The resulting cross-spectrum inference is then converted into our final product, an approximation of the likelihood for the 3D power spectrum expressed as second order Taylor expansion around a fiducial model. We demonstrate the accuracy and scalability of the algorithm and comment on possible extensions. Our algorithm will allow efficient analysis of the upcoming Dark Energy Spectroscopic Instrument dataset.« less

  19. A General Exponential Framework for Dimensionality Reduction.

    PubMed

    Wang, Su-Jing; Yan, Shuicheng; Yang, Jian; Zhou, Chun-Guang; Fu, Xiaolan

    2014-02-01

    As a general framework, Laplacian embedding, based on a pairwise similarity matrix, infers low dimensional representations from high dimensional data. However, it generally suffers from three issues: 1) algorithmic performance is sensitive to the size of neighbors; 2) the algorithm encounters the well known small sample size (SSS) problem; and 3) the algorithm de-emphasizes small distance pairs. To address these issues, here we propose exponential embedding using matrix exponential and provide a general framework for dimensionality reduction. In the framework, the matrix exponential can be roughly interpreted by the random walk over the feature similarity matrix, and thus is more robust. The positive definite property of matrix exponential deals with the SSS problem. The behavior of the decay function of exponential embedding is more significant in emphasizing small distance pairs. Under this framework, we apply matrix exponential to extend many popular Laplacian embedding algorithms, e.g., locality preserving projections, unsupervised discriminant projections, and marginal fisher analysis. Experiments conducted on the synthesized data, UCI, and the Georgia Tech face database show that the proposed new framework can well address the issues mentioned above.

  20. How to estimate the 3D power spectrum of the Lyman-α forest

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Font-Ribera, Andreu; McDonald, Patrick; Slosar, Anže

    Here, we derive and numerically implement an algorithm for estimating the 3D power spectrum of the Lyman-α (Lyα) forest flux fluctuations. The algorithm exploits the unique geometry of Lyα forest data to efficiently measure the cross-spectrum between lines of sight as a function of parallel wavenumber, transverse separation and redshift. We start by approximating the global covariance matrix as block-diagonal, where only pixels from the same spectrum are correlated. We then compute the eigenvectors of the derivative of the signal covariance with respect to cross-spectrum parameters, and project the inverse-covariance-weighted spectra onto them. This acts much like a radial Fouriermore » transform over redshift windows. The resulting cross-spectrum inference is then converted into our final product, an approximation of the likelihood for the 3D power spectrum expressed as second order Taylor expansion around a fiducial model. We demonstrate the accuracy and scalability of the algorithm and comment on possible extensions. Our algorithm will allow efficient analysis of the upcoming Dark Energy Spectroscopic Instrument dataset.« less

  1. Parsimonious Charge Deconvolution for Native Mass Spectrometry

    PubMed Central

    2018-01-01

    Charge deconvolution infers the mass from mass over charge (m/z) measurements in electrospray ionization mass spectra. When applied over a wide input m/z or broad target mass range, charge-deconvolution algorithms can produce artifacts, such as false masses at one-half or one-third of the correct mass. Indeed, a maximum entropy term in the objective function of MaxEnt, the most commonly used charge deconvolution algorithm, favors a deconvolved spectrum with many peaks over one with fewer peaks. Here we describe a new “parsimonious” charge deconvolution algorithm that produces fewer artifacts. The algorithm is especially well-suited to high-resolution native mass spectrometry of intact glycoproteins and protein complexes. Deconvolution of native mass spectra poses special challenges due to salt and small molecule adducts, multimers, wide mass ranges, and fewer and lower charge states. We demonstrate the performance of the new deconvolution algorithm on a range of samples. On the heavily glycosylated plasma properdin glycoprotein, the new algorithm could deconvolve monomer and dimer simultaneously and, when focused on the m/z range of the monomer, gave accurate and interpretable masses for glycoforms that had previously been analyzed manually using m/z peaks rather than deconvolved masses. On therapeutic antibodies, the new algorithm facilitated the analysis of extensions, truncations, and Fab glycosylation. The algorithm facilitates the use of native mass spectrometry for the qualitative and quantitative analysis of protein and protein assemblies. PMID:29376659

  2. Bayesian approach for peak detection in two-dimensional chromatography.

    PubMed

    Vivó-Truyols, Gabriel

    2012-03-20

    A new method for peak detection in two-dimensional chromatography is presented. In a first step, the method starts with a conventional one-dimensional peak detection algorithm to detect modulated peaks. In a second step, a sophisticated algorithm is constructed to decide which of the individual one-dimensional peaks have been originated from the same compound and should then be arranged in a two-dimensional peak. The merging algorithm is based on Bayesian inference. The user sets prior information about certain parameters (e.g., second-dimension retention time variability, first-dimension band broadening, chromatographic noise). On the basis of these priors, the algorithm calculates the probability of myriads of peak arrangements (i.e., ways of merging one-dimensional peaks), finding which of them holds the highest value. Uncertainty in each parameter can be accounted by adapting conveniently its probability distribution function, which in turn may change the final decision of the most probable peak arrangement. It has been demonstrated that the Bayesian approach presented in this paper follows the chromatographers' intuition. The algorithm has been applied and tested with LC × LC and GC × GC data and takes around 1 min to process chromatograms with several thousands of peaks.

  3. An integrative approach to inferring biologically meaningful gene modules.

    PubMed

    Cho, Ji-Hoon; Wang, Kai; Galas, David J

    2011-07-26

    The ability to construct biologically meaningful gene networks and modules is critical for contemporary systems biology. Though recent studies have demonstrated the power of using gene modules to shed light on the functioning of complex biological systems, most modules in these networks have shown little association with meaningful biological function. We have devised a method which directly incorporates gene ontology (GO) annotation in construction of gene modules in order to gain better functional association. We have devised a method, Semantic Similarity-Integrated approach for Modularization (SSIM) that integrates various gene-gene pairwise similarity values, including information obtained from gene expression, protein-protein interactions and GO annotations, in the construction of modules using affinity propagation clustering. We demonstrated the performance of the proposed method using data from two complex biological responses: 1. the osmotic shock response in Saccharomyces cerevisiae, and 2. the prion-induced pathogenic mouse model. In comparison with two previously reported algorithms, modules identified by SSIM showed significantly stronger association with biological functions. The incorporation of semantic similarity based on GO annotation with gene expression and protein-protein interaction data can greatly enhance the functional relevance of inferred gene modules. In addition, the SSIM approach can also reveal the hierarchical structure of gene modules to gain a broader functional view of the biological system. Hence, the proposed method can facilitate comprehensive and in-depth analysis of high throughput experimental data at the gene network level.

  4. Network Inference via the Time-Varying Graphical Lasso

    PubMed Central

    Hallac, David; Park, Youngsuk; Boyd, Stephen; Leskovec, Jure

    2018-01-01

    Many important problems can be modeled as a system of interconnected entities, where each entity is recording time-dependent observations or measurements. In order to spot trends, detect anomalies, and interpret the temporal dynamics of such data, it is essential to understand the relationships between the different entities and how these relationships evolve over time. In this paper, we introduce the time-varying graphical lasso (TVGL), a method of inferring time-varying networks from raw time series data. We cast the problem in terms of estimating a sparse time-varying inverse covariance matrix, which reveals a dynamic network of interdependencies between the entities. Since dynamic network inference is a computationally expensive task, we derive a scalable message-passing algorithm based on the Alternating Direction Method of Multipliers (ADMM) to solve this problem in an efficient way. We also discuss several extensions, including a streaming algorithm to update the model and incorporate new observations in real time. Finally, we evaluate our TVGL algorithm on both real and synthetic datasets, obtaining interpretable results and outperforming state-of-the-art baselines in terms of both accuracy and scalability. PMID:29770256

  5. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Na, Man Gyun; Oh, Seungrohk

    A neuro-fuzzy inference system combined with the wavelet denoising, principal component analysis (PCA), and sequential probability ratio test (SPRT) methods has been developed to monitor the relevant sensor using the information of other sensors. The parameters of the neuro-fuzzy inference system that estimates the relevant sensor signal are optimized by a genetic algorithm and a least-squares algorithm. The wavelet denoising technique was applied to remove noise components in input signals into the neuro-fuzzy system. By reducing the dimension of an input space into the neuro-fuzzy system without losing a significant amount of information, the PCA was used to reduce themore » time necessary to train the neuro-fuzzy system, simplify the structure of the neuro-fuzzy inference system, and also, make easy the selection of the input signals into the neuro-fuzzy system. By using the residual signals between the estimated signals and the measured signals, the SPRT is applied to detect whether the sensors are degraded or not. The proposed sensor-monitoring algorithm was verified through applications to the pressurizer water level, the pressurizer pressure, and the hot-leg temperature sensors in pressurized water reactors.« less

  6. Fast algorithms for computing phylogenetic divergence time.

    PubMed

    Crosby, Ralph W; Williams, Tiffani L

    2017-12-06

    The inference of species divergence time is a key step in most phylogenetic studies. Methods have been available for the last ten years to perform the inference, but the performance of the methods does not yet scale well to studies with hundreds of taxa and thousands of DNA base pairs. For example a study of 349 primate taxa was estimated to require over 9 months of processing time. In this work, we present a new algorithm, AncestralAge, that significantly improves the performance of the divergence time process. As part of AncestralAge, we demonstrate a new method for the computation of phylogenetic likelihood and our experiments show a 90% improvement in likelihood computation time on the aforementioned dataset of 349 primates taxa with over 60,000 DNA base pairs. Additionally, we show that our new method for the computation of the Bayesian prior on node ages reduces the running time for this computation on the 349 taxa dataset by 99%. Through the use of these new algorithms we open up the ability to perform divergence time inference on large phylogenetic studies.

  7. Joint inversion of teleseismic receiver functions and magnetotelluric data using a genetic algorithm: Are seismic velocities and electrical conductivities compatible?

    NASA Astrophysics Data System (ADS)

    Moorkamp, M.; Jones, A. G.; Eaton, D. W.

    2007-08-01

    Joint inversion of different kinds of geophysical data has the potential to improve model resolution, under the assumption that the different observations are sensitive to the same subsurface features. Here, we examine the compatibility of P-wave teleseismic receiver functions and long-period magnetotelluric (MT) observations, using joint inversion, to infer one-dimensional lithospheric structure. We apply a genetic algorithm to invert teleseismic and MT data from the Slave craton; a region where previous independent analyses of these data have indicated correlated layering of the lithosphere. Examination of model resolution and parameter trade-off suggests that the main features of this area, the Moho, Central Slave Mantle Conductor and the Lithosphere-Asthenosphere boundary, are sensed to varying degrees by both methods. Thus, joint inversion of these two complementary data sets can be used to construct improved models of the lithosphere. Further studies will be needed to assess whether the approach can be applied globally.

  8. Model-based Bayesian inference for ROC data analysis

    NASA Astrophysics Data System (ADS)

    Lei, Tianhu; Bae, K. Ty

    2013-03-01

    This paper presents a study of model-based Bayesian inference to Receiver Operating Characteristics (ROC) data. The model is a simple version of general non-linear regression model. Different from Dorfman model, it uses a probit link function with a covariate variable having zero-one two values to express binormal distributions in a single formula. Model also includes a scale parameter. Bayesian inference is implemented by Markov Chain Monte Carlo (MCMC) method carried out by Bayesian analysis Using Gibbs Sampling (BUGS). Contrast to the classical statistical theory, Bayesian approach considers model parameters as random variables characterized by prior distributions. With substantial amount of simulated samples generated by sampling algorithm, posterior distributions of parameters as well as parameters themselves can be accurately estimated. MCMC-based BUGS adopts Adaptive Rejection Sampling (ARS) protocol which requires the probability density function (pdf) which samples are drawing from be log concave with respect to the targeted parameters. Our study corrects a common misconception and proves that pdf of this regression model is log concave with respect to its scale parameter. Therefore, ARS's requirement is satisfied and a Gaussian prior which is conjugate and possesses many analytic and computational advantages is assigned to the scale parameter. A cohort of 20 simulated data sets and 20 simulations from each data set are used in our study. Output analysis and convergence diagnostics for MCMC method are assessed by CODA package. Models and methods by using continuous Gaussian prior and discrete categorical prior are compared. Intensive simulations and performance measures are given to illustrate our practice in the framework of model-based Bayesian inference using MCMC method.

  9. Permutation inference for the general linear model

    PubMed Central

    Winkler, Anderson M.; Ridgway, Gerard R.; Webster, Matthew A.; Smith, Stephen M.; Nichols, Thomas E.

    2014-01-01

    Permutation methods can provide exact control of false positives and allow the use of non-standard statistics, making only weak assumptions about the data. With the availability of fast and inexpensive computing, their main limitation would be some lack of flexibility to work with arbitrary experimental designs. In this paper we report on results on approximate permutation methods that are more flexible with respect to the experimental design and nuisance variables, and conduct detailed simulations to identify the best method for settings that are typical for imaging research scenarios. We present a generic framework for permutation inference for complex general linear models (glms) when the errors are exchangeable and/or have a symmetric distribution, and show that, even in the presence of nuisance effects, these permutation inferences are powerful while providing excellent control of false positives in a wide range of common and relevant imaging research scenarios. We also demonstrate how the inference on glm parameters, originally intended for independent data, can be used in certain special but useful cases in which independence is violated. Detailed examples of common neuroimaging applications are provided, as well as a complete algorithm – the “randomise” algorithm – for permutation inference with the glm. PMID:24530839

  10. Bayesian microsaccade detection

    PubMed Central

    Mihali, Andra; van Opheusden, Bas; Ma, Wei Ji

    2017-01-01

    Microsaccades are high-velocity fixational eye movements, with special roles in perception and cognition. The default microsaccade detection method is to determine when the smoothed eye velocity exceeds a threshold. We have developed a new method, Bayesian microsaccade detection (BMD), which performs inference based on a simple statistical model of eye positions. In this model, a hidden state variable changes between drift and microsaccade states at random times. The eye position is a biased random walk with different velocity distributions for each state. BMD generates samples from the posterior probability distribution over the eye state time series given the eye position time series. Applied to simulated data, BMD recovers the “true” microsaccades with fewer errors than alternative algorithms, especially at high noise. Applied to EyeLink eye tracker data, BMD detects almost all the microsaccades detected by the default method, but also apparent microsaccades embedded in high noise—although these can also be interpreted as false positives. Next we apply the algorithms to data collected with a Dual Purkinje Image eye tracker, whose higher precision justifies defining the inferred microsaccades as ground truth. When we add artificial measurement noise, the inferences of all algorithms degrade; however, at noise levels comparable to EyeLink data, BMD recovers the “true” microsaccades with 54% fewer errors than the default algorithm. Though unsuitable for online detection, BMD has other advantages: It returns probabilities rather than binary judgments, and it can be straightforwardly adapted as the generative model is refined. We make our algorithm available as a software package. PMID:28114483

  11. CaSPIAN: A Causal Compressive Sensing Algorithm for Discovering Directed Interactions in Gene Networks

    PubMed Central

    Emad, Amin; Milenkovic, Olgica

    2014-01-01

    We introduce a novel algorithm for inference of causal gene interactions, termed CaSPIAN (Causal Subspace Pursuit for Inference and Analysis of Networks), which is based on coupling compressive sensing and Granger causality techniques. The core of the approach is to discover sparse linear dependencies between shifted time series of gene expressions using a sequential list-version of the subspace pursuit reconstruction algorithm and to estimate the direction of gene interactions via Granger-type elimination. The method is conceptually simple and computationally efficient, and it allows for dealing with noisy measurements. Its performance as a stand-alone platform without biological side-information was tested on simulated networks, on the synthetic IRMA network in Saccharomyces cerevisiae, and on data pertaining to the human HeLa cell network and the SOS network in E. coli. The results produced by CaSPIAN are compared to the results of several related algorithms, demonstrating significant improvements in inference accuracy of documented interactions. These findings highlight the importance of Granger causality techniques for reducing the number of false-positives, as well as the influence of noise and sampling period on the accuracy of the estimates. In addition, the performance of the method was tested in conjunction with biological side information of the form of sparse “scaffold networks”, to which new edges were added using available RNA-seq or microarray data. These biological priors aid in increasing the sensitivity and precision of the algorithm in the small sample regime. PMID:24622336

  12. Neuronal couplings between retinal ganglion cells inferred by efficient inverse statistical physics methods

    PubMed Central

    Cocco, Simona; Leibler, Stanislas; Monasson, Rémi

    2009-01-01

    Complexity of neural systems often makes impracticable explicit measurements of all interactions between their constituents. Inverse statistical physics approaches, which infer effective couplings between neurons from their spiking activity, have been so far hindered by their computational complexity. Here, we present 2 complementary, computationally efficient inverse algorithms based on the Ising and “leaky integrate-and-fire” models. We apply those algorithms to reanalyze multielectrode recordings in the salamander retina in darkness and under random visual stimulus. We find strong positive couplings between nearby ganglion cells common to both stimuli, whereas long-range couplings appear under random stimulus only. The uncertainty on the inferred couplings due to limitations in the recordings (duration, small area covered on the retina) is discussed. Our methods will allow real-time evaluation of couplings for large assemblies of neurons. PMID:19666487

  13. Mass Conservation and Inference of Metabolic Networks from High-Throughput Mass Spectrometry Data

    PubMed Central

    Bandaru, Pradeep; Bansal, Mukesh

    2011-01-01

    Abstract We present a step towards the metabolome-wide computational inference of cellular metabolic reaction networks from metabolic profiling data, such as mass spectrometry. The reconstruction is based on identification of irreducible statistical interactions among the metabolite activities using the ARACNE reverse-engineering algorithm and on constraining possible metabolic transformations to satisfy the conservation of mass. The resulting algorithms are validated on synthetic data from an abridged computational model of Escherichia coli metabolism. Precision rates upwards of 50% are routinely observed for identification of full metabolic reactions, and recalls upwards of 20% are also seen. PMID:21314454

  14. Carbon monoxide mixing ratio inference from gas filter radiometer data

    NASA Technical Reports Server (NTRS)

    Wallio, H. A.; Reichle, H. G., Jr.; Casas, J. C.; Saylor, M. S.; Gormsen, B. B.

    1983-01-01

    A new algorithm has been developed which permits, for the first time, real time data reduction of nadir measurements taken with a gas filter correlation radiometer to determine tropospheric carbon monoxide concentrations. The algorithm significantly reduces the complexity of the equations to be solved while providing accuracy comparable to line-by-line calculations. The method is based on a regression analysis technique using a truncated power series representation of the primary instrument output signals to infer directly a weighted average of trace gas concentration. The results produced by a microcomputer-based implementation of this technique are compared with those produced by the more rigorous line-by-line methods. This algorithm has been used in the reduction of Measurement of Air Pollution from Satellites, Shuttle, and aircraft data.

  15. A 3D Cloud-Construction Algorithm for the EarthCARE Satellite Mission

    NASA Technical Reports Server (NTRS)

    Barker, H. W.; Jerg, M. P.; Wehr, T.; Kato, S.; Donovan, D. P.; Hogan, R. J.

    2011-01-01

    This article presents and assesses an algorithm that constructs 3D distributions of cloud from passive satellite imagery and collocated 2D nadir profiles of cloud properties inferred synergistically from lidar, cloud radar and imager data.

  16. Improved personalized recommendation based on a similarity network

    NASA Astrophysics Data System (ADS)

    Wang, Ximeng; Liu, Yun; Xiong, Fei

    2016-08-01

    A recommender system helps individual users find the preferred items rapidly and has attracted extensive attention in recent years. Many successful recommendation algorithms are designed on bipartite networks, such as network-based inference or heat conduction. However, most of these algorithms define the resource-allocation methods for an average allocation. That is not reasonable because average allocation cannot indicate the user choice preference and the influence between users which leads to a series of non-personalized recommendation results. We propose a personalized recommendation approach that combines the similarity function and bipartite network to generate a similarity network that improves the resource-allocation process. Our model introduces user influence into the recommender system and states that the user influence can make the resource-allocation process more reasonable. We use four different metrics to evaluate our algorithms for three benchmark data sets. Experimental results show that the improved recommendation on a similarity network can obtain better accuracy and diversity than some competing approaches.

  17. Construction of regulatory networks using expression time-series data of a genotyped population.

    PubMed

    Yeung, Ka Yee; Dombek, Kenneth M; Lo, Kenneth; Mittler, John E; Zhu, Jun; Schadt, Eric E; Bumgarner, Roger E; Raftery, Adrian E

    2011-11-29

    The inference of regulatory and biochemical networks from large-scale genomics data is a basic problem in molecular biology. The goal is to generate testable hypotheses of gene-to-gene influences and subsequently to design bench experiments to confirm these network predictions. Coexpression of genes in large-scale gene-expression data implies coregulation and potential gene-gene interactions, but provide little information about the direction of influences. Here, we use both time-series data and genetics data to infer directionality of edges in regulatory networks: time-series data contain information about the chronological order of regulatory events and genetics data allow us to map DNA variations to variations at the RNA level. We generate microarray data measuring time-dependent gene-expression levels in 95 genotyped yeast segregants subjected to a drug perturbation. We develop a Bayesian model averaging regression algorithm that incorporates external information from diverse data types to infer regulatory networks from the time-series and genetics data. Our algorithm is capable of generating feedback loops. We show that our inferred network recovers existing and novel regulatory relationships. Following network construction, we generate independent microarray data on selected deletion mutants to prospectively test network predictions. We demonstrate the potential of our network to discover de novo transcription-factor binding sites. Applying our construction method to previously published data demonstrates that our method is competitive with leading network construction algorithms in the literature.

  18. Visual recognition and inference using dynamic overcomplete sparse learning.

    PubMed

    Murray, Joseph F; Kreutz-Delgado, Kenneth

    2007-09-01

    We present a hierarchical architecture and learning algorithm for visual recognition and other visual inference tasks such as imagination, reconstruction of occluded images, and expectation-driven segmentation. Using properties of biological vision for guidance, we posit a stochastic generative world model and from it develop a simplified world model (SWM) based on a tractable variational approximation that is designed to enforce sparse coding. Recent developments in computational methods for learning overcomplete representations (Lewicki & Sejnowski, 2000; Teh, Welling, Osindero, & Hinton, 2003) suggest that overcompleteness can be useful for visual tasks, and we use an overcomplete dictionary learning algorithm (Kreutz-Delgado, et al., 2003) as a preprocessing stage to produce accurate, sparse codings of images. Inference is performed by constructing a dynamic multilayer network with feedforward, feedback, and lateral connections, which is trained to approximate the SWM. Learning is done with a variant of the back-propagation-through-time algorithm, which encourages convergence to desired states within a fixed number of iterations. Vision tasks require large networks, and to make learning efficient, we take advantage of the sparsity of each layer to update only a small subset of elements in a large weight matrix at each iteration. Experiments on a set of rotated objects demonstrate various types of visual inference and show that increasing the degree of overcompleteness improves recognition performance in difficult scenes with occluded objects in clutter.

  19. MultiSETTER: web server for multiple RNA structure comparison.

    PubMed

    Čech, Petr; Hoksza, David; Svozil, Daniel

    2015-08-12

    Understanding the architecture and function of RNA molecules requires methods for comparing and analyzing their tertiary and quaternary structures. While structural superposition of short RNAs is achievable in a reasonable time, large structures represent much bigger challenge. Therefore, we have developed a fast and accurate algorithm for RNA pairwise structure superposition called SETTER and implemented it in the SETTER web server. However, though biological relationships can be inferred by a pairwise structure alignment, key features preserved by evolution can be identified only from a multiple structure alignment. Thus, we extended the SETTER algorithm to the alignment of multiple RNA structures and developed the MultiSETTER algorithm. In this paper, we present the updated version of the SETTER web server that implements a user friendly interface to the MultiSETTER algorithm. The server accepts RNA structures either as the list of PDB IDs or as user-defined PDB files. After the superposition is computed, structures are visualized in 3D and several reports and statistics are generated. To the best of our knowledge, the MultiSETTER web server is the first publicly available tool for a multiple RNA structure alignment. The MultiSETTER server offers the visual inspection of an alignment in 3D space which may reveal structural and functional relationships not captured by other multiple alignment methods based either on a sequence or on secondary structure motifs.

  20. Gene network biological validity based on gene-gene interaction relevance.

    PubMed

    Gómez-Vela, Francisco; Díaz-Díaz, Norberto

    2014-01-01

    In recent years, gene networks have become one of the most useful tools for modeling biological processes. Many inference gene network algorithms have been developed as techniques for extracting knowledge from gene expression data. Ensuring the reliability of the inferred gene relationships is a crucial task in any study in order to prove that the algorithms used are precise. Usually, this validation process can be carried out using prior biological knowledge. The metabolic pathways stored in KEGG are one of the most widely used knowledgeable sources for analyzing relationships between genes. This paper introduces a new methodology, GeneNetVal, to assess the biological validity of gene networks based on the relevance of the gene-gene interactions stored in KEGG metabolic pathways. Hence, a complete KEGG pathway conversion into a gene association network and a new matching distance based on gene-gene interaction relevance are proposed. The performance of GeneNetVal was established with three different experiments. Firstly, our proposal is tested in a comparative ROC analysis. Secondly, a randomness study is presented to show the behavior of GeneNetVal when the noise is increased in the input network. Finally, the ability of GeneNetVal to detect biological functionality of the network is shown.

  1. Discovering time-lagged rules from microarray data using gene profile classifiers

    PubMed Central

    2011-01-01

    Background Gene regulatory networks have an essential role in every process of life. In this regard, the amount of genome-wide time series data is becoming increasingly available, providing the opportunity to discover the time-delayed gene regulatory networks that govern the majority of these molecular processes. Results This paper aims at reconstructing gene regulatory networks from multiple genome-wide microarray time series datasets. In this sense, a new model-free algorithm called GRNCOP2 (Gene Regulatory Network inference by Combinatorial OPtimization 2), which is a significant evolution of the GRNCOP algorithm, was developed using combinatorial optimization of gene profile classifiers. The method is capable of inferring potential time-delay relationships with any span of time between genes from various time series datasets given as input. The proposed algorithm was applied to time series data composed of twenty yeast genes that are highly relevant for the cell-cycle study, and the results were compared against several related approaches. The outcomes have shown that GRNCOP2 outperforms the contrasted methods in terms of the proposed metrics, and that the results are consistent with previous biological knowledge. Additionally, a genome-wide study on multiple publicly available time series data was performed. In this case, the experimentation has exhibited the soundness and scalability of the new method which inferred highly-related statistically-significant gene associations. Conclusions A novel method for inferring time-delayed gene regulatory networks from genome-wide time series datasets is proposed in this paper. The method was carefully validated with several publicly available data sets. The results have demonstrated that the algorithm constitutes a usable model-free approach capable of predicting meaningful relationships between genes, revealing the time-trends of gene regulation. PMID:21524308

  2. Single nucleotide polymorphism coverage and inference of N-acetyltransferase-2 acetylator phenotypes in wordwide population groups.

    PubMed

    Suarez-Kurtz, Guilherme; Fuchshuber-Moraes, Mateus; Struchiner, Claudio J; Parra, Esteban J

    2016-08-01

    Several algorithms have been proposed to reduce the genotyping effort and cost, while retaining the accuracy of N-acetyltransferase-2 (NAT2) phenotype prediction. Data from the 1000 Genomes (1KG) project and an admixed cohort of Black Brazilians were used to assess the accuracy of NAT2 phenotype prediction using algorithms based on paired single nucleotide polymorphisms (SNPs) (rs1041983 and rs1801280) or a tag SNP (rs1495741). NAT2 haplotypes comprising SNPs rs1801279, rs1041983, rs1801280, rs1799929, rs1799930, rs1208 and rs1799931 were assigned according to the arylamine N-acetyltransferases database. Contingency tables were used to visualize the agreement between the NAT2 acetylator phenotypes on the basis of these haplotypes versus phenotypes inferred by the prediction algorithms. The paired and tag SNP algorithms provided more than 96% agreement with the 7-SNP derived phenotypes in Europeans, East Asians, South Asians and Admixed Americans, but discordance of phenotype prediction occurred in 30.2 and 24.8% 1KG Africans and in 14.4 and 18.6% Black Brazilians, respectively. Paired SNP panel misclassification occurs in carriers of NATs haplotypes *13A (282T alone), *12B (282T and 803G), *6B (590A alone) and *14A (191A alone), whereas haplotype *14, defined by the 191A allele, is the major culprit of misclassification by the tag allele. Both the paired SNP and the tag SNP algorithms may be used, with economy of scale, to infer NAT2 acetylator phenotypes, including the ultra-slow phenotype, in European, East Asian, South Asian and American populations represented in the 1KG cohort. Both algorithms, however, perform poorly in populations of predominant African descent, including admixed African-Americans, African Caribbeans and Black Brazilians.

  3. Denoising, deconvolving, and decomposing photon observations. Derivation of the D3PO algorithm

    NASA Astrophysics Data System (ADS)

    Selig, Marco; Enßlin, Torsten A.

    2015-02-01

    The analysis of astronomical images is a non-trivial task. The D3PO algorithm addresses the inference problem of denoising, deconvolving, and decomposing photon observations. Its primary goal is the simultaneous but individual reconstruction of the diffuse and point-like photon flux given a single photon count image, where the fluxes are superimposed. In order to discriminate between these morphologically different signal components, a probabilistic algorithm is derived in the language of information field theory based on a hierarchical Bayesian parameter model. The signal inference exploits prior information on the spatial correlation structure of the diffuse component and the brightness distribution of the spatially uncorrelated point-like sources. A maximum a posteriori solution and a solution minimizing the Gibbs free energy of the inference problem using variational Bayesian methods are discussed. Since the derivation of the solution is not dependent on the underlying position space, the implementation of the D3PO algorithm uses the nifty package to ensure applicability to various spatial grids and at any resolution. The fidelity of the algorithm is validated by the analysis of simulated data, including a realistic high energy photon count image showing a 32 × 32 arcmin2 observation with a spatial resolution of 0.1 arcmin. In all tests the D3PO algorithm successfully denoised, deconvolved, and decomposed the data into a diffuse and a point-like signal estimate for the respective photon flux components. A copy of the code is available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (ftp://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/574/A74

  4. Graph rigidity, cyclic belief propagation, and point pattern matching.

    PubMed

    McAuley, Julian J; Caetano, Tibério S; Barbosa, Marconi S

    2008-11-01

    A recent paper [1] proposed a provably optimal polynomial time method for performing near-isometric point pattern matching by means of exact probabilistic inference in a chordal graphical model. Its fundamental result is that the chordal graph in question is shown to be globally rigid, implying that exact inference provides the same matching solution as exact inference in a complete graphical model. This implies that the algorithm is optimal when there is no noise in the point patterns. In this paper, we present a new graph that is also globally rigid but has an advantage over the graph proposed in [1]: Its maximal clique size is smaller, rendering inference significantly more efficient. However, this graph is not chordal, and thus, standard Junction Tree algorithms cannot be directly applied. Nevertheless, we show that loopy belief propagation in such a graph converges to the optimal solution. This allows us to retain the optimality guarantee in the noiseless case, while substantially reducing both memory requirements and processing time. Our experimental results show that the accuracy of the proposed solution is indistinguishable from that in [1] when there is noise in the point patterns.

  5. Shear wave prediction using committee fuzzy model constrained by lithofacies, Zagros basin, SW Iran

    NASA Astrophysics Data System (ADS)

    Shiroodi, Sadjad Kazem; Ghafoori, Mohammad; Ansari, Hamid Reza; Lashkaripour, Golamreza; Ghanadian, Mostafa

    2017-02-01

    The main purpose of this study is to introduce the geological controlling factors in improving an intelligence-based model to estimate shear wave velocity from seismic attributes. The proposed method includes three main steps in the framework of geological events in a complex sedimentary succession located in the Persian Gulf. First, the best attributes were selected from extracted seismic data. Second, these attributes were transformed into shear wave velocity using fuzzy inference systems (FIS) such as Sugeno's fuzzy inference (SFIS), adaptive neuro-fuzzy inference (ANFIS) and optimized fuzzy inference (OFIS). Finally, a committee fuzzy machine (CFM) based on bat-inspired algorithm (BA) optimization was applied to combine previous predictions into an enhanced solution. In order to show the geological effect on improving the prediction, the main classes of predominate lithofacies in the reservoir of interest including shale, sand, and carbonate were selected and then the proposed algorithm was performed with and without lithofacies constraint. The results showed a good agreement between real and predicted shear wave velocity in the lithofacies-based model compared to the model without lithofacies especially in sand and carbonate.

  6. Evaluation of a new neutron energy spectrum unfolding code based on an Adaptive Neuro-Fuzzy Inference System (ANFIS).

    PubMed

    Hosseini, Seyed Abolfazl; Esmaili Paeen Afrakoti, Iman

    2018-01-17

    The purpose of the present study was to reconstruct the energy spectrum of a poly-energetic neutron source using an algorithm developed based on an Adaptive Neuro-Fuzzy Inference System (ANFIS). ANFIS is a kind of artificial neural network based on the Takagi-Sugeno fuzzy inference system. The ANFIS algorithm uses the advantages of both fuzzy inference systems and artificial neural networks to improve the effectiveness of algorithms in various applications such as modeling, control and classification. The neutron pulse height distributions used as input data in the training procedure for the ANFIS algorithm were obtained from the simulations performed by MCNPX-ESUT computational code (MCNPX-Energy engineering of Sharif University of Technology). Taking into account the normalization condition of each energy spectrum, 4300 neutron energy spectra were generated randomly. (The value in each bin was generated randomly, and finally a normalization of each generated energy spectrum was performed). The randomly generated neutron energy spectra were considered as output data of the developed ANFIS computational code in the training step. To calculate the neutron energy spectrum using conventional methods, an inverse problem with an approximately singular response matrix (with the determinant of the matrix close to zero) should be solved. The solution of the inverse problem using the conventional methods unfold neutron energy spectrum with low accuracy. Application of the iterative algorithms in the solution of such a problem, or utilizing the intelligent algorithms (in which there is no need to solve the problem), is usually preferred for unfolding of the energy spectrum. Therefore, the main reason for development of intelligent algorithms like ANFIS for unfolding of neutron energy spectra is to avoid solving the inverse problem. In the present study, the unfolded neutron energy spectra of 252Cf and 241Am-9Be neutron sources using the developed computational code were found to have excellent agreement with the reference data. Also, the unfolded energy spectra of the neutron sources as obtained using ANFIS were more accurate than the results reported from calculations performed using artificial neural networks in previously published papers. © The Author(s) 2018. Published by Oxford University Press on behalf of The Japan Radiation Research Society and Japanese Society for Radiation Oncology.

  7. Inference of boundaries in causal sets

    NASA Astrophysics Data System (ADS)

    Cunningham, William J.

    2018-05-01

    We investigate the extrinsic geometry of causal sets in (1+1) -dimensional Minkowski spacetime. The properties of boundaries in an embedding space can be used not only to measure observables, but also to supplement the discrete action in the partition function via discretized Gibbons–Hawking–York boundary terms. We define several ways to represent a causal set using overlapping subsets, which then allows us to distinguish between null and non-null bounding hypersurfaces in an embedding space. We discuss algorithms to differentiate between different types of regions, consider when these distinctions are possible, and then apply the algorithms to several spacetime regions. Numerical results indicate the volumes of timelike boundaries can be measured to within 0.5% accuracy for flat boundaries and within 10% accuracy for highly curved boundaries for medium-sized causal sets with N  =  214 spacetime elements.

  8. Exact solutions for species tree inference from discordant gene trees.

    PubMed

    Chang, Wen-Chieh; Górecki, Paweł; Eulenstein, Oliver

    2013-10-01

    Phylogenetic analysis has to overcome the grant challenge of inferring accurate species trees from evolutionary histories of gene families (gene trees) that are discordant with the species tree along whose branches they have evolved. Two well studied approaches to cope with this challenge are to solve either biologically informed gene tree parsimony (GTP) problems under gene duplication, gene loss, and deep coalescence, or the classic RF supertree problem that does not rely on any biological model. Despite the potential of these problems to infer credible species trees, they are NP-hard. Therefore, these problems are addressed by heuristics that typically lack any provable accuracy and precision. We describe fast dynamic programming algorithms that solve the GTP problems and the RF supertree problem exactly, and demonstrate that our algorithms can solve instances with data sets consisting of as many as 22 taxa. Extensions of our algorithms can also report the number of all optimal species trees, as well as the trees themselves. To better asses the quality of the resulting species trees that best fit the given gene trees, we also compute the worst case species trees, their numbers, and optimization score for each of the computational problems. Finally, we demonstrate the performance of our exact algorithms using empirical and simulated data sets, and analyze the quality of heuristic solutions for the studied problems by contrasting them with our exact solutions.

  9. Localization of the lumbar discs using machine learning and exact probabilistic inference.

    PubMed

    Oktay, Ayse Betul; Akgul, Yusuf Sinan

    2011-01-01

    We propose a novel fully automatic approach to localize the lumbar intervertebral discs in MR images with PHOG based SVM and a probabilistic graphical model. At the local level, our method assigns a score to each pixel in target image that indicates whether it is a disc center or not. At the global level, we define a chain-like graphical model that represents the lumbar intervertebral discs and we use an exact inference algorithm to localize the discs. Our main contributions are the employment of the SVM with the PHOG based descriptor which is robust against variations of the discs and a graphical model that reflects the linear nature of the vertebral column. Our inference algorithm runs in polynomial time and produces globally optimal results. The developed system is validated on a real spine MRI dataset and the final localization results are favorable compared to the results reported in the literature.

  10. Identifying and exploiting trait-relevant tissues with multiple functional annotations in genome-wide association studies

    PubMed Central

    Zhang, Shujun

    2018-01-01

    Genome-wide association studies (GWASs) have identified many disease associated loci, the majority of which have unknown biological functions. Understanding the mechanism underlying trait associations requires identifying trait-relevant tissues and investigating associations in a trait-specific fashion. Here, we extend the widely used linear mixed model to incorporate multiple SNP functional annotations from omics studies with GWAS summary statistics to facilitate the identification of trait-relevant tissues, with which to further construct powerful association tests. Specifically, we rely on a generalized estimating equation based algorithm for parameter inference, a mixture modeling framework for trait-tissue relevance classification, and a weighted sequence kernel association test constructed based on the identified trait-relevant tissues for powerful association analysis. We refer to our analytic procedure as the Scalable Multiple Annotation integration for trait-Relevant Tissue identification and usage (SMART). With extensive simulations, we show how our method can make use of multiple complementary annotations to improve the accuracy for identifying trait-relevant tissues. In addition, our procedure allows us to make use of the inferred trait-relevant tissues, for the first time, to construct more powerful SNP set tests. We apply our method for an in-depth analysis of 43 traits from 28 GWASs using tissue-specific annotations in 105 tissues derived from ENCODE and Roadmap. Our results reveal new trait-tissue relevance, pinpoint important annotations that are informative of trait-tissue relationship, and illustrate how we can use the inferred trait-relevant tissues to construct more powerful association tests in the Wellcome trust case control consortium study. PMID:29377896

  11. Gaussian process based independent analysis for temporal source separation in fMRI.

    PubMed

    Hald, Ditte Høvenhoff; Henao, Ricardo; Winther, Ole

    2017-05-15

    Functional Magnetic Resonance Imaging (fMRI) gives us a unique insight into the processes of the brain, and opens up for analyzing the functional activation patterns of the underlying sources. Task-inferred supervised learning with restrictive assumptions in the regression set-up, restricts the exploratory nature of the analysis. Fully unsupervised independent component analysis (ICA) algorithms, on the other hand, can struggle to detect clear classifiable components on single-subject data. We attribute this shortcoming to inadequate modeling of the fMRI source signals by failing to incorporate its temporal nature. fMRI source signals, biological stimuli and non-stimuli-related artifacts are all smooth over a time-scale compatible with the sampling time (TR). We therefore propose Gaussian process ICA (GPICA), which facilitates temporal dependency by the use of Gaussian process source priors. On two fMRI data sets with different sampling frequency, we show that the GPICA-inferred temporal components and associated spatial maps allow for a more definite interpretation than standard temporal ICA methods. The temporal structures of the sources are controlled by the covariance of the Gaussian process, specified by a kernel function with an interpretable and controllable temporal length scale parameter. We propose a hierarchical model specification, considering both instantaneous and convolutive mixing, and we infer source spatial maps, temporal patterns and temporal length scale parameters by Markov Chain Monte Carlo. A companion implementation made as a plug-in for SPM can be downloaded from https://github.com/dittehald/GPICA. Copyright © 2017 Elsevier Inc. All rights reserved.

  12. Constructing fMRI connectivity networks: a whole brain functional parcellation method for node definition.

    PubMed

    Maggioni, Eleonora; Tana, Maria Gabriella; Arrigoni, Filippo; Zucca, Claudio; Bianchi, Anna Maria

    2014-05-15

    Functional Magnetic Resonance Imaging (fMRI) is used for exploring brain functionality, and recently it was applied for mapping the brain connection patterns. To give a meaningful neurobiological interpretation to the connectivity network, it is fundamental to properly define the network framework. In particular, the choice of the network nodes may affect the final connectivity results and the consequent interpretation. We introduce a novel method for the intra subject topological characterization of the nodes of fMRI brain networks, based on a whole brain parcellation scheme. The proposed whole brain parcellation algorithm divides the brain into clusters that are homogeneous from the anatomical and functional point of view, each of which constitutes a node. The functional parcellation described is based on the Tononi's cluster index, which measures instantaneous correlation in terms of intrinsic and extrinsic statistical dependencies. The method performance and reliability were first tested on simulated data, then on a real fMRI dataset acquired on healthy subjects during visual stimulation. Finally, the proposed algorithm was applied to epileptic patients' fMRI data recorded during seizures, to verify its usefulness as preparatory step for effective connectivity analysis. For each patient, the nodes of the network involved in ictal activity were defined according to the proposed parcellation scheme and Granger Causality Analysis (GCA) was applied to infer effective connectivity. We showed that the algorithm 1) performed well on simulated data, 2) was able to produce reliable inter subjects results and 3) led to a detailed definition of the effective connectivity pattern. Copyright © 2014 Elsevier B.V. All rights reserved.

  13. Functional Additive Mixed Models

    PubMed Central

    Scheipl, Fabian; Staicu, Ana-Maria; Greven, Sonja

    2014-01-01

    We propose an extensive framework for additive regression models for correlated functional responses, allowing for multiple partially nested or crossed functional random effects with flexible correlation structures for, e.g., spatial, temporal, or longitudinal functional data. Additionally, our framework includes linear and nonlinear effects of functional and scalar covariates that may vary smoothly over the index of the functional response. It accommodates densely or sparsely observed functional responses and predictors which may be observed with additional error and includes both spline-based and functional principal component-based terms. Estimation and inference in this framework is based on standard additive mixed models, allowing us to take advantage of established methods and robust, flexible algorithms. We provide easy-to-use open source software in the pffr() function for the R-package refund. Simulations show that the proposed method recovers relevant effects reliably, handles small sample sizes well and also scales to larger data sets. Applications with spatially and longitudinally observed functional data demonstrate the flexibility in modeling and interpretability of results of our approach. PMID:26347592

  14. Functional Additive Mixed Models.

    PubMed

    Scheipl, Fabian; Staicu, Ana-Maria; Greven, Sonja

    2015-04-01

    We propose an extensive framework for additive regression models for correlated functional responses, allowing for multiple partially nested or crossed functional random effects with flexible correlation structures for, e.g., spatial, temporal, or longitudinal functional data. Additionally, our framework includes linear and nonlinear effects of functional and scalar covariates that may vary smoothly over the index of the functional response. It accommodates densely or sparsely observed functional responses and predictors which may be observed with additional error and includes both spline-based and functional principal component-based terms. Estimation and inference in this framework is based on standard additive mixed models, allowing us to take advantage of established methods and robust, flexible algorithms. We provide easy-to-use open source software in the pffr() function for the R-package refund. Simulations show that the proposed method recovers relevant effects reliably, handles small sample sizes well and also scales to larger data sets. Applications with spatially and longitudinally observed functional data demonstrate the flexibility in modeling and interpretability of results of our approach.

  15. Inferring phenomenological models of Markov processes from data

    NASA Astrophysics Data System (ADS)

    Rivera, Catalina; Nemenman, Ilya

    Microscopically accurate modeling of stochastic dynamics of biochemical networks is hard due to the extremely high dimensionality of the state space of such networks. Here we propose an algorithm for inference of phenomenological, coarse-grained models of Markov processes describing the network dynamics directly from data, without the intermediate step of microscopically accurate modeling. The approach relies on the linear nature of the Chemical Master Equation and uses Bayesian Model Selection for identification of parsimonious models that fit the data. When applied to synthetic data from the Kinetic Proofreading process (KPR), a common mechanism used by cells for increasing specificity of molecular assembly, the algorithm successfully uncovers the known coarse-grained description of the process. This phenomenological description has been notice previously, but this time it is derived in an automated manner by the algorithm. James S. McDonnell Foundation Grant No. 220020321.

  16. The clustering of galaxies in the SDSS-III Baryon Oscillation Spectroscopic Survey: effect of smoothing of density field on reconstruction and anisotropic BAO analysis

    NASA Astrophysics Data System (ADS)

    Vargas-Magaña, Mariana; Ho, Shirley; Fromenteau, Sebastien.; Cuesta, Antonio. J.

    2017-05-01

    The reconstruction algorithm introduced by Eisenstein et al., which is widely used in clustering analysis, is based on the inference of the first-order Lagrangian displacement field from the Gaussian smoothed galaxy density field in redshift space. The smoothing scale applied to the density field affects the inferred displacement field that is used to move the galaxies, and partially erases the non-linear evolution of the density field. In this article, we explore this crucial step in the reconstruction algorithm. We study the performance of the reconstruction technique using two metrics: first, we study the performance using the anisotropic clustering, extending previous studies focused on isotropic clustering; secondly, we study its effect on the displacement field. We find that smoothing has a strong effect in the quadrupole of the correlation function and affects the accuracy and precision with which we can measure DA(z) and H(z). We find that the optimal smoothing scale to use in the reconstruction algorithm applied to Baryonic Oscillations Spectroscopic Survey-Constant (stellar) MASS (CMASS) is between 5 and 10 h-1 Mpc. Varying from the `usual' 15-5 h-1 Mpc shows ˜0.3 per cent variations in DA(z) and ˜0.4 per cent H(z) and uncertainties are also reduced by 40 per cent and 30 per cent, respectively. We also find that the accuracy of velocity field reconstruction depends strongly on the smoothing scale used for the density field. We measure the bias and uncertainties associated with different choices of smoothing length.

  17. From Sensory Signals to Modality-Independent Conceptual Representations: A Probabilistic Language of Thought Approach

    PubMed Central

    Erdogan, Goker; Yildirim, Ilker; Jacobs, Robert A.

    2015-01-01

    People learn modality-independent, conceptual representations from modality-specific sensory signals. Here, we hypothesize that any system that accomplishes this feat will include three components: a representational language for characterizing modality-independent representations, a set of sensory-specific forward models for mapping from modality-independent representations to sensory signals, and an inference algorithm for inverting forward models—that is, an algorithm for using sensory signals to infer modality-independent representations. To evaluate this hypothesis, we instantiate it in the form of a computational model that learns object shape representations from visual and/or haptic signals. The model uses a probabilistic grammar to characterize modality-independent representations of object shape, uses a computer graphics toolkit and a human hand simulator to map from object representations to visual and haptic features, respectively, and uses a Bayesian inference algorithm to infer modality-independent object representations from visual and/or haptic signals. Simulation results show that the model infers identical object representations when an object is viewed, grasped, or both. That is, the model’s percepts are modality invariant. We also report the results of an experiment in which different subjects rated the similarity of pairs of objects in different sensory conditions, and show that the model provides a very accurate account of subjects’ ratings. Conceptually, this research significantly contributes to our understanding of modality invariance, an important type of perceptual constancy, by demonstrating how modality-independent representations can be acquired and used. Methodologically, it provides an important contribution to cognitive modeling, particularly an emerging probabilistic language-of-thought approach, by showing how symbolic and statistical approaches can be combined in order to understand aspects of human perception. PMID:26554704

  18. Revision of an automated microseismic location algorithm for DAS - 3C geophone hybrid array

    NASA Astrophysics Data System (ADS)

    Mizuno, T.; LeCalvez, J.; Raymer, D.

    2017-12-01

    Application of distributed acoustic sensing (DAS) has been studied in several areas in seismology. One of the areas is microseismic reservoir monitoring (e.g., Molteni et al., 2017, First Break). Considering the present limitations of DAS, which include relatively low signal-to-noise ratio (SNR) and no 3C polarization measurements, a DAS - 3C geophone hybrid array is a practical option when using a single monitoring well. Considering the large volume of data from distributed sensing, microseismic event detection and location using a source scanning type algorithm is a reasonable choice, especially for real-time monitoring. The algorithm must handle both strain rate along the borehole axis for DAS and particle velocity for 3C geophones. Only a small quantity of large SNR events will be detected throughout a large aperture encompassing the hybrid array; therefore, the aperture is to be optimized dynamically to eliminate noisy channels for a majority of events. For such hybrid array, coalescence microseismic mapping (CMM) (Drew et al., 2005, SPE) was revised. CMM forms a likelihood function of location of event and its origin time. At each receiver, a time function of event arrival likelihood is inferred using an SNR function, and it is migrated to time and space to determine hypocenter and origin time likelihood. This algorithm was revised to dynamically optimize such a hybrid array by identifying receivers where a microseismic signal is possibly detected and using only those receivers to compute the likelihood function. Currently, peak SNR is used to select receivers. To prevent false results due to small aperture, a minimum aperture threshold is employed. The algorithm refines location likelihood using 3C geophone polarization. We tested this algorithm using a ray-based synthetic dataset. Leaney (2014, PhD thesis, UBC) is used to compute particle velocity at receivers. Strain rate along the borehole axis is computed from particle velocity as DAS microseismic synthetic data. The likelihood function formed by both DAS and geophone behaves as expected with the aperture dynamically selected depending on the SNR of the event. We conclude that this algorithm can be successfully applied for such hybrid arrays to monitor microseismic activity. A study using a recently acquired dataset is planned.

  19. Molecular adaptation within the coat protein-encoding gene of Tunisian almond isolates of Prunus necrotic ringspot virus.

    PubMed

    Boulila, Moncef; Ben Tiba, Sawssen; Jilani, Saoussen

    2013-04-01

    The sequence alignments of five Tunisian isolates of Prunus necrotic ringspot virus (PNRSV) were searched for evidence of recombination and diversifying selection. Since failing to account for recombination can elevate the false positive error rate in positive selection inference, a genetic algorithm (GARD) was used first and led to the detection of potential recombination events in the coat protein-encoding gene of that virus. The Recco algorithm confirmed these results by identifying, additionally, the potential recombinants. For neutrality testing and evaluation of nucleotide polymorphism in PNRSV CP gene, Tajima's D, and Fu and Li's D and F statistical tests were used. About selection inference, eight algorithms (SLAC, FEL, IFEL, REL, FUBAR, MEME, PARRIS, and GA branch) incorporated in HyPhy package were utilized to assess the selection pressure exerted on the expression of PNRSV capsid. Inferred phylogenies pointed out, in addition to the three classical groups (PE-5, PV-32, and PV-96), the delineation of a fourth cluster having the new proposed designation SW6, and a fifth clade comprising four Tunisian PNRSV isolates which underwent recombination and selective pressure and to which the name Tunisian outgroup was allocated.

  20. Integration of Steady-State and Temporal Gene Expression Data for the Inference of Gene Regulatory Networks

    PubMed Central

    Wang, Yi Kan; Hurley, Daniel G.; Schnell, Santiago; Print, Cristin G.; Crampin, Edmund J.

    2013-01-01

    We develop a new regression algorithm, cMIKANA, for inference of gene regulatory networks from combinations of steady-state and time-series gene expression data. Using simulated gene expression datasets to assess the accuracy of reconstructing gene regulatory networks, we show that steady-state and time-series data sets can successfully be combined to identify gene regulatory interactions using the new algorithm. Inferring gene networks from combined data sets was found to be advantageous when using noisy measurements collected with either lower sampling rates or a limited number of experimental replicates. We illustrate our method by applying it to a microarray gene expression dataset from human umbilical vein endothelial cells (HUVECs) which combines time series data from treatment with growth factor TNF and steady state data from siRNA knockdown treatments. Our results suggest that the combination of steady-state and time-series datasets may provide better prediction of RNA-to-RNA interactions, and may also reveal biological features that cannot be identified from dynamic or steady state information alone. Finally, we consider the experimental design of genomics experiments for gene regulatory network inference and show that network inference can be improved by incorporating steady-state measurements with time-series data. PMID:23967277

  1. No interpretation without representation: the role of domain-specific representations and inferences in the Wason selection task.

    PubMed

    Fiddick, L; Cosmides, L; Tooby, J

    2000-10-16

    The Wason selection task is a tool used to study reasoning about conditional rules. Performance on this task changes systematically when one varies its content, and these content effects have been used to argue that the human cognitive architecture contains a number of domain-specific representation and inference systems, such as social contract algorithms and hazard management systems. Recently, however, Sperber, Cara & Girotto (Sperber, D., Cara, F., & Girotto, V. (1995). Relevance theory explains the selection task. Cognition, 57, 31-95) have proposed that relevance theory can explain performance on the selection task - including all content effects - without invoking inference systems that are content-specialized. Herein, we show that relevance theory alone cannot explain a variety of content effects - effects that were predicted in advance and are parsimoniously explained by theories that invoke domain-specific algorithms for representing and making inferences about (i) social contracts and (ii) reducing risk in hazardous situations. Moreover, although Sperber et al. (1995) were able to use relevance theory to produce some new content effects in other domains, they conducted no experiments involving social exchanges or precautions, and so were unable to determine which - content-specialized algorithms or relevance effects - dominate reasoning when the two conflict. When experiments, reported herein, are constructed so that the different theories predict divergent outcomes, the results support the predictions of social contract theory and hazard management theory, indicating that these inference systems override content-general relevance factors. The fact that social contract and hazard management algorithms provide better explanations for performance in their respective domains does not mean that the content-general logical procedures posited by relevance theory do not exist, or that relevance effects never occur. It does mean, however, that one needs a principled way of explaining which effects will dominate when a set of inputs activate more than one reasoning system. We propose the principle of pre-emptive specificity - that the human cognitive architecture should be designed so that more specialized inference systems pre-empt more general ones whenever the stimuli centrally fit the input conditions of the more specialized system. This principle follows from evolutionary and computational considerations that are common to both relevance theory and the ecological rationality approach.

  2. BiomeNet: A Bayesian Model for Inference of Metabolic Divergence among Microbial Communities

    PubMed Central

    Chipman, Hugh; Gu, Hong; Bielawski, Joseph P.

    2014-01-01

    Metagenomics yields enormous numbers of microbial sequences that can be assigned a metabolic function. Using such data to infer community-level metabolic divergence is hindered by the lack of a suitable statistical framework. Here, we describe a novel hierarchical Bayesian model, called BiomeNet (Bayesian inference of metabolic networks), for inferring differential prevalence of metabolic subnetworks among microbial communities. To infer the structure of community-level metabolic interactions, BiomeNet applies a mixed-membership modelling framework to enzyme abundance information. The basic idea is that the mixture components of the model (metabolic reactions, subnetworks, and networks) are shared across all groups (microbiome samples), but the mixture proportions vary from group to group. Through this framework, the model can capture nested structures within the data. BiomeNet is unique in modeling each metagenome sample as a mixture of complex metabolic systems (metabosystems). The metabosystems are composed of mixtures of tightly connected metabolic subnetworks. BiomeNet differs from other unsupervised methods by allowing researchers to discriminate groups of samples through the metabolic patterns it discovers in the data, and by providing a framework for interpreting them. We describe a collapsed Gibbs sampler for inference of the mixture weights under BiomeNet, and we use simulation to validate the inference algorithm. Application of BiomeNet to human gut metagenomes revealed a metabosystem with greater prevalence among inflammatory bowel disease (IBD) patients. Based on the discriminatory subnetworks for this metabosystem, we inferred that the community is likely to be closely associated with the human gut epithelium, resistant to dietary interventions, and interfere with human uptake of an antioxidant connected to IBD. Because this metabosystem has a greater capacity to exploit host-associated glycans, we speculate that IBD-associated communities might arise from opportunist growth of bacteria that can circumvent the host's nutrient-based mechanism for bacterial partner selection. PMID:25412107

  3. An integrative approach to inferring biologically meaningful gene modules

    PubMed Central

    2011-01-01

    Background The ability to construct biologically meaningful gene networks and modules is critical for contemporary systems biology. Though recent studies have demonstrated the power of using gene modules to shed light on the functioning of complex biological systems, most modules in these networks have shown little association with meaningful biological function. We have devised a method which directly incorporates gene ontology (GO) annotation in construction of gene modules in order to gain better functional association. Results We have devised a method, Semantic Similarity-Integrated approach for Modularization (SSIM) that integrates various gene-gene pairwise similarity values, including information obtained from gene expression, protein-protein interactions and GO annotations, in the construction of modules using affinity propagation clustering. We demonstrated the performance of the proposed method using data from two complex biological responses: 1. the osmotic shock response in Saccharomyces cerevisiae, and 2. the prion-induced pathogenic mouse model. In comparison with two previously reported algorithms, modules identified by SSIM showed significantly stronger association with biological functions. Conclusions The incorporation of semantic similarity based on GO annotation with gene expression and protein-protein interaction data can greatly enhance the functional relevance of inferred gene modules. In addition, the SSIM approach can also reveal the hierarchical structure of gene modules to gain a broader functional view of the biological system. Hence, the proposed method can facilitate comprehensive and in-depth analysis of high throughput experimental data at the gene network level. PMID:21791051

  4. Explicit-Duration Hidden Markov Model Inference of UP-DOWN States from Continuous Signals

    PubMed Central

    McFarland, James M.; Hahn, Thomas T. G.; Mehta, Mayank R.

    2011-01-01

    Neocortical neurons show UP-DOWN state (UDS) oscillations under a variety of conditions. These UDS have been extensively studied because of the insight they can yield into the functioning of cortical networks, and their proposed role in putative memory formation. A key element in these studies is determining the precise duration and timing of the UDS. These states are typically determined from the membrane potential of one or a small number of cells, which is often not sufficient to reliably estimate the state of an ensemble of neocortical neurons. The local field potential (LFP) provides an attractive method for determining the state of a patch of cortex with high spatio-temporal resolution; however current methods for inferring UDS from LFP signals lack the robustness and flexibility to be applicable when UDS properties may vary substantially within and across experiments. Here we present an explicit-duration hidden Markov model (EDHMM) framework that is sufficiently general to allow statistically principled inference of UDS from different types of signals (membrane potential, LFP, EEG), combinations of signals (e.g., multichannel LFP recordings) and signal features over long recordings where substantial non-stationarities are present. Using cortical LFPs recorded from urethane-anesthetized mice, we demonstrate that the proposed method allows robust inference of UDS. To illustrate the flexibility of the algorithm we show that it performs well on EEG recordings as well. We then validate these results using simultaneous recordings of the LFP and membrane potential (MP) of nearby cortical neurons, showing that our method offers significant improvements over standard methods. These results could be useful for determining functional connectivity of different brain regions, as well as understanding network dynamics. PMID:21738730

  5. Unsupervised Network Analysis of the Plastic Supraoptic Nucleus Transcriptome Predicts Caprin2 Regulatory Interactions.

    PubMed

    Loh, Su-Yi; Jahans-Price, Thomas; Greenwood, Michael P; Greenwood, Mingkwan; Hoe, See-Ziau; Konopacka, Agnieszka; Campbell, Colin; Murphy, David; Hindmarch, Charles C T

    2017-01-01

    The supraoptic nucleus (SON) is a group of neurons in the hypothalamus responsible for the synthesis and secretion of the peptide hormones vasopressin and oxytocin. Following physiological cues, such as dehydration, salt-loading and lactation, the SON undergoes a function related plasticity that we have previously described in the rat at the transcriptome level. Using the unsupervised graphical lasso (Glasso) algorithm, we reconstructed a putative network from 500 plastic SON genes in which genes are the nodes and the edges are the inferred interactions. The most active nodal gene identified within the network was Caprin2 . Caprin2 encodes an RNA-binding protein that we have previously shown to be vital for the functioning of osmoregulatory neuroendocrine neurons in the SON of the rat hypothalamus. To test the validity of the Glasso network, we either overexpressed or knocked down Caprin2 transcripts in differentiated rat pheochromocytoma PC12 cells and showed that these manipulations had significant opposite effects on the levels of putative target mRNAs. These studies suggest that the predicative power of the Glasso algorithm within an in vivo system is accurate, and identifies biological targets that may be important to the functional plasticity of the SON.

  6. Maximum Likelihood Estimations and EM Algorithms with Length-biased Data

    PubMed Central

    Qin, Jing; Ning, Jing; Liu, Hao; Shen, Yu

    2012-01-01

    SUMMARY Length-biased sampling has been well recognized in economics, industrial reliability, etiology applications, epidemiological, genetic and cancer screening studies. Length-biased right-censored data have a unique data structure different from traditional survival data. The nonparametric and semiparametric estimations and inference methods for traditional survival data are not directly applicable for length-biased right-censored data. We propose new expectation-maximization algorithms for estimations based on full likelihoods involving infinite dimensional parameters under three settings for length-biased data: estimating nonparametric distribution function, estimating nonparametric hazard function under an increasing failure rate constraint, and jointly estimating baseline hazards function and the covariate coefficients under the Cox proportional hazards model. Extensive empirical simulation studies show that the maximum likelihood estimators perform well with moderate sample sizes and lead to more efficient estimators compared to the estimating equation approaches. The proposed estimates are also more robust to various right-censoring mechanisms. We prove the strong consistency properties of the estimators, and establish the asymptotic normality of the semi-parametric maximum likelihood estimators under the Cox model using modern empirical processes theory. We apply the proposed methods to a prevalent cohort medical study. Supplemental materials are available online. PMID:22323840

  7. Classification of Microarray Data Using Kernel Fuzzy Inference System

    PubMed Central

    Kumar Rath, Santanu

    2014-01-01

    The DNA microarray classification technique has gained more popularity in both research and practice. In real data analysis, such as microarray data, the dataset contains a huge number of insignificant and irrelevant features that tend to lose useful information. Classes with high relevance and feature sets with high significance are generally referred for the selected features, which determine the samples classification into their respective classes. In this paper, kernel fuzzy inference system (K-FIS) algorithm is applied to classify the microarray data (leukemia) using t-test as a feature selection method. Kernel functions are used to map original data points into a higher-dimensional (possibly infinite-dimensional) feature space defined by a (usually nonlinear) function ϕ through a mathematical process called the kernel trick. This paper also presents a comparative study for classification using K-FIS along with support vector machine (SVM) for different set of features (genes). Performance parameters available in the literature such as precision, recall, specificity, F-measure, ROC curve, and accuracy are considered to analyze the efficiency of the classification model. From the proposed approach, it is apparent that K-FIS model obtains similar results when compared with SVM model. This is an indication that the proposed approach relies on kernel function. PMID:27433543

  8. A reconsideration of negative ratings for network-based recommendation

    NASA Astrophysics Data System (ADS)

    Hu, Liang; Ren, Liang; Lin, Wenbin

    2018-01-01

    Recommendation algorithms based on bipartite networks have become increasingly popular, thanks to their accuracy and flexibility. Currently, many of these methods ignore users' negative ratings. In this work, we propose a method to exploit negative ratings for the network-based inference algorithm. We find that negative ratings play a positive role regardless of sparsity of data sets. Furthermore, we improve the efficiency of our method and compare it with the state-of-the-art algorithms. Experimental results show that the present method outperforms the existing algorithms.

  9. A critical assessment of Mus musculus gene function prediction using integrated genomic evidence

    PubMed Central

    Peña-Castillo, Lourdes; Tasan, Murat; Myers, Chad L; Lee, Hyunju; Joshi, Trupti; Zhang, Chao; Guan, Yuanfang; Leone, Michele; Pagnani, Andrea; Kim, Wan Kyu; Krumpelman, Chase; Tian, Weidong; Obozinski, Guillaume; Qi, Yanjun; Mostafavi, Sara; Lin, Guan Ning; Berriz, Gabriel F; Gibbons, Francis D; Lanckriet, Gert; Qiu, Jian; Grant, Charles; Barutcuoglu, Zafer; Hill, David P; Warde-Farley, David; Grouios, Chris; Ray, Debajyoti; Blake, Judith A; Deng, Minghua; Jordan, Michael I; Noble, William S; Morris, Quaid; Klein-Seetharaman, Judith; Bar-Joseph, Ziv; Chen, Ting; Sun, Fengzhu; Troyanskaya, Olga G; Marcotte, Edward M; Xu, Dong; Hughes, Timothy R; Roth, Frederick P

    2008-01-01

    Background: Several years after sequencing the human genome and the mouse genome, much remains to be discovered about the functions of most human and mouse genes. Computational prediction of gene function promises to help focus limited experimental resources on the most likely hypotheses. Several algorithms using diverse genomic data have been applied to this task in model organisms; however, the performance of such approaches in mammals has not yet been evaluated. Results: In this study, a standardized collection of mouse functional genomic data was assembled; nine bioinformatics teams used this data set to independently train classifiers and generate predictions of function, as defined by Gene Ontology (GO) terms, for 21,603 mouse genes; and the best performing submissions were combined in a single set of predictions. We identified strengths and weaknesses of current functional genomic data sets and compared the performance of function prediction algorithms. This analysis inferred functions for 76% of mouse genes, including 5,000 currently uncharacterized genes. At a recall rate of 20%, a unified set of predictions averaged 41% precision, with 26% of GO terms achieving a precision better than 90%. Conclusion: We performed a systematic evaluation of diverse, independently developed computational approaches for predicting gene function from heterogeneous data sources in mammals. The results show that currently available data for mammals allows predictions with both breadth and accuracy. Importantly, many highly novel predictions emerge for the 38% of mouse genes that remain uncharacterized. PMID:18613946

  10. Statistical analysis of field data for aircraft warranties

    NASA Astrophysics Data System (ADS)

    Lakey, Mary J.

    Air Force and Navy maintenance data collection systems were researched to determine their scientific applicability to the warranty process. New and unique algorithms were developed to extract failure distributions which were then used to characterize how selected families of equipment typically fails. Families of similar equipment were identified in terms of function, technology and failure patterns. Statistical analyses and applications such as goodness-of-fit test, maximum likelihood estimation and derivation of confidence intervals for the probability density function parameters were applied to characterize the distributions and their failure patterns. Statistical and reliability theory, with relevance to equipment design and operational failures were also determining factors in characterizing the failure patterns of the equipment families. Inferences about the families with relevance to warranty needs were then made.

  11. An expert system environment for the Generic VHSIC Spaceborne Computer (GVSC)

    NASA Astrophysics Data System (ADS)

    Cockerham, Ann; Labhart, Jay; Rowe, Michael; Skinner, James

    The authors describe a Phase II Phillips Laboratory Small Business Innovative Research (SBIR) program being performed to implement a flexible and general-purpose inference environment for embedded space and avionics applications. This inference environment is being developed in Ada and takes special advantage of the target architecture, the GVSC. The GVSC implements the MIL-STD-1750A ISA and contains enhancements to allow access of up to 8 MBytes of memory. The inference environment makes use of the Merit Enhanced Traversal Engine (METE) algorithm, which employs the latest inference and knowledge representation strategies to optimize both run-time speed and memory utilization.

  12. Coupled optics reconstruction from TBT data using MAD-X

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Alexahin, Y.; Gianfelice-Wendt, E.; /Fermilab

    2007-06-01

    Turn-by-turn BPM data provide immediate information on the coupled optics functions at BPM locations. In the case of small deviations from the known (design) uncoupled optics some cognizance of the sources of perturbation, BPM calibration errors and tilts can also be inferred without detailed lattice modeling. In practical situations, however, fitting the lattice model with the help of some optics code would lead to more reliable results. We present an algorithm for coupled optics reconstruction from TBT data on the basis of MAD-X and give examples of its application for the Fermilab Tevatron accelerator.

  13. Density Estimation with Mercer Kernels

    NASA Technical Reports Server (NTRS)

    Macready, William G.

    2003-01-01

    We present a new method for density estimation based on Mercer kernels. The density estimate can be understood as the density induced on a data manifold by a mixture of Gaussians fit in a feature space. As is usual, the feature space and data manifold are defined with any suitable positive-definite kernel function. We modify the standard EM algorithm for mixtures of Gaussians to infer the parameters of the density. One benefit of the approach is it's conceptual simplicity, and uniform applicability over many different types of data. Preliminary results are presented for a number of simple problems.

  14. Universal Darwinism As a Process of Bayesian Inference.

    PubMed

    Campbell, John O

    2016-01-01

    Many of the mathematical frameworks describing natural selection are equivalent to Bayes' Theorem, also known as Bayesian updating. By definition, a process of Bayesian Inference is one which involves a Bayesian update, so we may conclude that these frameworks describe natural selection as a process of Bayesian inference. Thus, natural selection serves as a counter example to a widely-held interpretation that restricts Bayesian Inference to human mental processes (including the endeavors of statisticians). As Bayesian inference can always be cast in terms of (variational) free energy minimization, natural selection can be viewed as comprising two components: a generative model of an "experiment" in the external world environment, and the results of that "experiment" or the "surprise" entailed by predicted and actual outcomes of the "experiment." Minimization of free energy implies that the implicit measure of "surprise" experienced serves to update the generative model in a Bayesian manner. This description closely accords with the mechanisms of generalized Darwinian process proposed both by Dawkins, in terms of replicators and vehicles, and Campbell, in terms of inferential systems. Bayesian inference is an algorithm for the accumulation of evidence-based knowledge. This algorithm is now seen to operate over a wide range of evolutionary processes, including natural selection, the evolution of mental models and cultural evolutionary processes, notably including science itself. The variational principle of free energy minimization may thus serve as a unifying mathematical framework for universal Darwinism, the study of evolutionary processes operating throughout nature.

  15. Universal Darwinism As a Process of Bayesian Inference

    PubMed Central

    Campbell, John O.

    2016-01-01

    Many of the mathematical frameworks describing natural selection are equivalent to Bayes' Theorem, also known as Bayesian updating. By definition, a process of Bayesian Inference is one which involves a Bayesian update, so we may conclude that these frameworks describe natural selection as a process of Bayesian inference. Thus, natural selection serves as a counter example to a widely-held interpretation that restricts Bayesian Inference to human mental processes (including the endeavors of statisticians). As Bayesian inference can always be cast in terms of (variational) free energy minimization, natural selection can be viewed as comprising two components: a generative model of an “experiment” in the external world environment, and the results of that “experiment” or the “surprise” entailed by predicted and actual outcomes of the “experiment.” Minimization of free energy implies that the implicit measure of “surprise” experienced serves to update the generative model in a Bayesian manner. This description closely accords with the mechanisms of generalized Darwinian process proposed both by Dawkins, in terms of replicators and vehicles, and Campbell, in terms of inferential systems. Bayesian inference is an algorithm for the accumulation of evidence-based knowledge. This algorithm is now seen to operate over a wide range of evolutionary processes, including natural selection, the evolution of mental models and cultural evolutionary processes, notably including science itself. The variational principle of free energy minimization may thus serve as a unifying mathematical framework for universal Darwinism, the study of evolutionary processes operating throughout nature. PMID:27375438

  16. Fast half-sibling population reconstruction: theory and algorithms.

    PubMed

    Dexter, Daniel; Brown, Daniel G

    2013-07-12

    Kinship inference is the task of identifying genealogically related individuals. Kinship information is important for determining mating structures, notably in endangered populations. Although many solutions exist for reconstructing full sibling relationships, few exist for half-siblings. We consider the problem of determining whether a proposed half-sibling population reconstruction is valid under Mendelian inheritance assumptions. We show that this problem is NP-complete and provide a 0/1 integer program that identifies the minimum number of individuals that must be removed from a population in order for the reconstruction to become valid. We also present SibJoin, a heuristic-based clustering approach based on Mendelian genetics, which is strikingly fast. The software is available at http://github.com/ddexter/SibJoin.git+. Our SibJoin algorithm is reasonably accurate and thousands of times faster than existing algorithms. The heuristic is used to infer a half-sibling structure for a population which was, until recently, too large to evaluate.

  17. F-MAP: A Bayesian approach to infer the gene regulatory network using external hints

    PubMed Central

    Shahdoust, Maryam; Mahjub, Hossein; Sadeghi, Mehdi

    2017-01-01

    The Common topological features of related species gene regulatory networks suggest reconstruction of the network of one species by using the further information from gene expressions profile of related species. We present an algorithm to reconstruct the gene regulatory network named; F-MAP, which applies the knowledge about gene interactions from related species. Our algorithm sets a Bayesian framework to estimate the precision matrix of one species microarray gene expressions dataset to infer the Gaussian Graphical model of the network. The conjugate Wishart prior is used and the information from related species is applied to estimate the hyperparameters of the prior distribution by using the factor analysis. Applying the proposed algorithm on six related species of drosophila shows that the precision of reconstructed networks is improved considerably compared to the precision of networks constructed by other Bayesian approaches. PMID:28938012

  18. Mixed-initiative control of intelligent systems

    NASA Technical Reports Server (NTRS)

    Borchardt, G. C.

    1987-01-01

    Mixed-initiative user interfaces provide a means by which a human operator and an intelligent system may collectively share the task of deciding what to do next. Such interfaces are important to the effective utilization of real-time expert systems as assistants in the execution of critical tasks. Presented here is the Incremental Inference algorithm, a symbolic reasoning mechanism based on propositional logic and suited to the construction of mixed-initiative interfaces. The algorithm is similar in some respects to the Truth Maintenance System, but replaces the notion of 'justifications' with a notion of recency, allowing newer values to override older values yet permitting various interested parties to refresh these values as they become older and thus more vulnerable to change. A simple example is given of the use of the Incremental Inference algorithm plus an overview of the integration of this mechanism within the SPECTRUM expert system for geological interpretation of imaging spectrometer data.

  19. Activation Product Inverse Calculations with NDI

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gray, Mark Girard

    NDI based forward calculations of activation product concentrations can be systematically used to infer structural element concentrations from measured activation product concentrations with an iterative algorithm. The algorithm converges exactly for the basic production-depletion chain with explicit activation product production and approximately, in the least-squares sense, for the full production-depletion chain with explicit activation product production and nosub production-depletion chain. The algorithm is suitable for automation.

  20. Algorithm, applications and evaluation for protein comparison by Ramanujan Fourier transform.

    PubMed

    Zhao, Jian; Wang, Jiasong; Hua, Wei; Ouyang, Pingkai

    2015-12-01

    The amino acid sequence of a protein determines its chemical properties, chain conformation and biological functions. Protein sequence comparison is of great importance to identify similarities of protein structures and infer their functions. Many properties of a protein correspond to the low-frequency signals within the sequence. Low frequency modes in protein sequences are linked to the secondary structures, membrane protein types, and sub-cellular localizations of the proteins. In this paper, we present Ramanujan Fourier transform (RFT) with a fast algorithm to analyze the low-frequency signals of protein sequences. The RFT method is applied to similarity analysis of protein sequences with the Resonant Recognition Model (RRM). The results show that the proposed fast RFT method on protein comparison is more efficient than commonly used discrete Fourier transform (DFT). RFT can detect common frequencies as significant feature for specific protein families, and the RFT spectrum heat-map of protein sequences demonstrates the information conservation in the sequence comparison. The proposed method offers a new tool for pattern recognition, feature extraction and structural analysis on protein sequences. Copyright © 2015 Elsevier Ltd. All rights reserved.

  1. Hidden state prediction: a modification of classic ancestral state reconstruction algorithms helps unravel complex symbioses.

    PubMed

    Zaneveld, Jesse R R; Thurber, Rebecca L V

    2014-01-01

    Complex symbioses between animal or plant hosts and their associated microbiotas can involve thousands of species and millions of genes. Because of the number of interacting partners, it is often impractical to study all organisms or genes in these host-microbe symbioses individually. Yet new phylogenetic predictive methods can use the wealth of accumulated data on diverse model organisms to make inferences into the properties of less well-studied species and gene families. Predictive functional profiling methods use evolutionary models based on the properties of studied relatives to put bounds on the likely characteristics of an organism or gene that has not yet been studied in detail. These techniques have been applied to predict diverse features of host-associated microbial communities ranging from the enzymatic function of uncharacterized genes to the gene content of uncultured microorganisms. We consider these phylogenetically informed predictive techniques from disparate fields as examples of a general class of algorithms for Hidden State Prediction (HSP), and argue that HSP methods have broad value in predicting organismal traits in a variety of contexts, including the study of complex host-microbe symbioses.

  2. Sequential fitting-and-separating reflectance components for analytical bidirectional reflectance distribution function estimation.

    PubMed

    Lee, Yu; Yu, Chanki; Lee, Sang Wook

    2018-01-10

    We present a sequential fitting-and-separating algorithm for surface reflectance components that separates individual dominant reflectance components and simultaneously estimates the corresponding bidirectional reflectance distribution function (BRDF) parameters from the separated reflectance values. We tackle the estimation of a Lafortune BRDF model, which combines a nonLambertian diffuse reflection and multiple specular reflectance components with a different specular lobe. Our proposed method infers the appropriate number of BRDF lobes and their parameters by separating and estimating each of the reflectance components using an interval analysis-based branch-and-bound method in conjunction with iterative K-ordered scale estimation. The focus of this paper is the estimation of the Lafortune BRDF model. Nevertheless, our proposed method can be applied to other analytical BRDF models such as the Cook-Torrance and Ward models. Experiments were carried out to validate the proposed method using isotropic materials from the Mitsubishi Electric Research Laboratories-Massachusetts Institute of Technology (MERL-MIT) BRDF database, and the results show that our method is superior to a conventional minimization algorithm.

  3. ITrace: An implicit trust inference method for trust-aware collaborative filtering

    NASA Astrophysics Data System (ADS)

    He, Xu; Liu, Bin; Chen, Kejia

    2018-04-01

    The growth of Internet commerce has stimulated the use of collaborative filtering (CF) algorithms as recommender systems. A CF algorithm recommends items of interest to the target user by leveraging the votes given by other similar users. In a standard CF framework, it is assumed that the credibility of every voting user is exactly the same with respect to the target user. This assumption is not satisfied and thus may lead to misleading recommendations in many practical applications. A natural countermeasure is to design a trust-aware CF (TaCF) algorithm, which can take account of the difference in the credibilities of the voting users when performing CF. To this end, this paper presents a trust inference approach, which can predict the implicit trust of the target user on every voting user from a sparse explicit trust matrix. Then an improved CF algorithm termed iTrace is proposed, which takes advantage of both the explicit and the predicted implicit trust to provide recommendations with the CF framework. An empirical evaluation on a public dataset demonstrates that the proposed algorithm provides a significant improvement in recommendation quality in terms of mean absolute error.

  4. StralSV: assessment of sequence variability within similar 3D structures and application to polio RNA-dependent RNA polymerase

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zemla, A; Lang, D; Kostova, T

    2010-11-29

    Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory - still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could overcome these difficulties and facilitatemore » the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV, a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus and demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique or that shared structural similarity with structures that are distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position.« less

  5. StralSV: assessment of sequence variability within similar 3D structures and application to polio RNA-dependent RNA polymerase.

    PubMed

    Zemla, Adam T; Lang, Dorothy M; Kostova, Tanya; Andino, Raul; Ecale Zhou, Carol L

    2011-06-02

    Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory--still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could help overcome these difficulties by facilitating the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV (structure-alignment sequence variability), a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus, and we demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique, or that share structural similarity with proteins that would be considered distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local structural alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position. StralSV is provided as a web service at http://proteinmodel.org/AS2TS/STRALSV/.

  6. A Robust Deconvolution Method based on Transdimensional Hierarchical Bayesian Inference

    NASA Astrophysics Data System (ADS)

    Kolb, J.; Lekic, V.

    2012-12-01

    Analysis of P-S and S-P conversions allows us to map receiver side crustal and lithospheric structure. This analysis often involves deconvolution of the parent wave field from the scattered wave field as a means of suppressing source-side complexity. A variety of deconvolution techniques exist including damped spectral division, Wiener filtering, iterative time-domain deconvolution, and the multitaper method. All of these techniques require estimates of noise characteristics as input parameters. We present a deconvolution method based on transdimensional Hierarchical Bayesian inference in which both noise magnitude and noise correlation are used as parameters in calculating the likelihood probability distribution. Because the noise for P-S and S-P conversion analysis in terms of receiver functions is a combination of both background noise - which is relatively easy to characterize - and signal-generated noise - which is much more difficult to quantify - we treat measurement errors as an known quantity, characterized by a probability density function whose mean and variance are model parameters. This transdimensional Hierarchical Bayesian approach has been successfully used previously in the inversion of receiver functions in terms of shear and compressional wave speeds of an unknown number of layers [1]. In our method we used a Markov chain Monte Carlo (MCMC) algorithm to find the receiver function that best fits the data while accurately assessing the noise parameters. In order to parameterize the receiver function we model the receiver function as an unknown number of Gaussians of unknown amplitude and width. The algorithm takes multiple steps before calculating the acceptance probability of a new model, in order to avoid getting trapped in local misfit minima. Using both observed and synthetic data, we show that the MCMC deconvolution method can accurately obtain a receiver function as well as an estimate of the noise parameters given the parent and daughter components. Furthermore, we demonstrate that this new approach is far less susceptible to generating spurious features even at high noise levels. Finally, the method yields not only the most-likely receiver function, but also quantifies its full uncertainty. [1] Bodin, T., M. Sambridge, H. Tkalčić, P. Arroucau, K. Gallagher, and N. Rawlinson (2012), Transdimensional inversion of receiver functions and surface wave dispersion, J. Geophys. Res., 117, B02301

  7. Hybrid grammar-based approach to nonlinear dynamical system identification from biological time series

    NASA Astrophysics Data System (ADS)

    McKinney, B. A.; Crowe, J. E., Jr.; Voss, H. U.; Crooke, P. S.; Barney, N.; Moore, J. H.

    2006-02-01

    We introduce a grammar-based hybrid approach to reverse engineering nonlinear ordinary differential equation models from observed time series. This hybrid approach combines a genetic algorithm to search the space of model architectures with a Kalman filter to estimate the model parameters. Domain-specific knowledge is used in a context-free grammar to restrict the search space for the functional form of the target model. We find that the hybrid approach outperforms a pure evolutionary algorithm method, and we observe features in the evolution of the dynamical models that correspond with the emergence of favorable model components. We apply the hybrid method to both artificially generated time series and experimentally observed protein levels from subjects who received the smallpox vaccine. From the observed data, we infer a cytokine protein interaction network for an individual’s response to the smallpox vaccine.

  8. An extension to the Chahine method of inverting the radiative transfer equation. [application to ozone distribution in atmosphere

    NASA Technical Reports Server (NTRS)

    Twomey, S.; Herman, B.; Rabinoff, R.

    1977-01-01

    An extension of the Chahine relaxation method (1970) for inverting the radiative transfer equation is presented. This method is superior to the original method in that it takes into account in a realistic manner the shape of the kernel function, and its extension to nonlinear systems is much more straightforward. A comparison of the new method with a matrix method due to Twomey (1965), in a problem involving inference of vertical distribution of ozone from spectroscopic measurements in the near ultraviolet, indicates that in this situation this method is stable with errors in the input data up to 4%, whereas the matrix method breaks down at these levels. The problem of non-uniqueness of the solution, which is a property of the system of equations rather than of any particular algorithm for solving them, remains, although it takes on slightly different forms for the two algorithms.

  9. A new learning algorithm for a fully connected neuro-fuzzy inference system.

    PubMed

    Chen, C L Philip; Wang, Jing; Wang, Chi-Hsu; Chen, Long

    2014-10-01

    A traditional neuro-fuzzy system is transformed into an equivalent fully connected three layer neural network (NN), namely, the fully connected neuro-fuzzy inference systems (F-CONFIS). The F-CONFIS differs from traditional NNs by its dependent and repeated weights between input and hidden layers and can be considered as the variation of a kind of multilayer NN. Therefore, an efficient learning algorithm for the F-CONFIS to cope these repeated weights is derived. Furthermore, a dynamic learning rate is proposed for neuro-fuzzy systems via F-CONFIS where both premise (hidden) and consequent portions are considered. Several simulation results indicate that the proposed approach achieves much better accuracy and fast convergence.

  10. Remote Sensing of Cloud Properties using Ground-based Measurements of Zenith Radiance

    NASA Technical Reports Server (NTRS)

    Chiu, J. Christine; Marshak, Alexander; Knyazikhin, Yuri; Wiscombe, Warren J.; Barker, Howard W.; Barnard, James C.; Luo, Yi

    2006-01-01

    An extensive verification of cloud property retrievals has been conducted for two algorithms using zenith radiances measured by the Atmospheric Radiation Measurement (ARM) Program ground-based passive two-channel (673 and 870 nm) Narrow Field-Of-View Radiometer. The underlying principle of these algorithms is that clouds have nearly identical optical properties at these wavelengths, but corresponding spectral surface reflectances (for vegetated surfaces) differ significantly. The first algorithm, the RED vs. NIR, works for a fully three-dimensional cloud situation. It retrieves not only cloud optical depth, but also an effective radiative cloud fraction. Importantly, due to one-second time resolution of radiance measurements, we are able, for the first time, to capture detailed changes in cloud structure at the natural time scale of cloud evolution. The cloud optical depths tau retrieved by this algorithm are comparable to those inferred from both downward fluxes in overcast situations and microwave brightness temperatures for broken clouds. Moreover, it can retrieve tau for thin patchy clouds, where flux and microwave observations fail to detect them. The second algorithm, referred to as COUPLED, couples zenith radiances with simultaneous fluxes to infer 2. In general, the COUPLED and RED vs. NIR algorithms retrieve consistent values of tau. However, the COUPLED algorithm is more sensitive to the accuracies of measured radiance, flux, and surface reflectance than the RED vs. NIR algorithm. This is especially true for thick overcast clouds where it may substantially overestimate z.

  11. Perception as Evidence Accumulation and Bayesian Inference: Insights from Masked Priming

    ERIC Educational Resources Information Center

    Norris, Dennis; Kinoshita, Sachiko

    2008-01-01

    The authors argue that perception is Bayesian inference based on accumulation of noisy evidence and that, in masked priming, the perceptual system is tricked into treating the prime and the target as a single object. Of the 2 algorithms considered for formalizing how the evidence sampled from a prime and target is combined, only 1 was shown to be…

  12. Fetal ECG extraction via Type-2 adaptive neuro-fuzzy inference systems.

    PubMed

    Ahmadieh, Hajar; Asl, Babak Mohammadzadeh

    2017-04-01

    We proposed a noninvasive method for separating the fetal ECG (FECG) from maternal ECG (MECG) by using Type-2 adaptive neuro-fuzzy inference systems. The method can extract FECG components from abdominal signal by using one abdominal channel, including maternal and fetal cardiac signals and other environmental noise signals, and one chest channel. The proposed algorithm detects the nonlinear dynamics of the mother's body. So, the components of the MECG are estimated from the abdominal signal. By subtracting estimated mother cardiac signal from abdominal signal, fetal cardiac signal can be extracted. This algorithm was applied on synthetic ECG signals generated based on the models developed by McSharry et al. and Behar et al. and also on DaISy real database. In environments with high uncertainty, our method performs better than the Type-1 fuzzy method. Specifically, in evaluation of the algorithm with the synthetic data based on McSharry model, for input signals with SNR of -5dB, the SNR of the extracted FECG was improved by 38.38% in comparison with the Type-1 fuzzy method. Also, the results show that increasing the uncertainty or decreasing the input SNR leads to increasing the percentage of the improvement in SNR of the extracted FECG. For instance, when the SNR of the input signal decreases to -30dB, our proposed algorithm improves the SNR of the extracted FECG by 71.06% with respect to the Type-1 fuzzy method. The same results were obtained on synthetic data based on Behar model. Our results on real database reflect the success of the proposed method to separate the maternal and fetal heart signals even if their waves overlap in time. Moreover, the proposed algorithm was applied to the simulated fetal ECG with ectopic beats and achieved good results in separating FECG from MECG. The results show the superiority of the proposed Type-2 neuro-fuzzy inference method over the Type-1 neuro-fuzzy inference and the polynomial networks methods, which is due to its capability to capture the nonlinearities of the model better. Copyright © 2017 Elsevier B.V. All rights reserved.

  13. Missing data imputation and haplotype phase inference for genome-wide association studies

    PubMed Central

    Browning, Sharon R.

    2009-01-01

    Imputation of missing data and the use of haplotype-based association tests can improve the power of genome-wide association studies (GWAS). In this article, I review methods for haplotype inference and missing data imputation, and discuss their application to GWAS. I discuss common features of the best algorithms for haplotype phase inference and missing data imputation in large-scale data sets, as well as some important differences between classes of methods, and highlight the methods that provide the highest accuracy and fastest computational performance. PMID:18850115

  14. Finding undetected protein associations in cell signaling by belief propagation.

    PubMed

    Bailly-Bechet, M; Borgs, C; Braunstein, A; Chayes, J; Dagkessamanskaia, A; François, J-M; Zecchina, R

    2011-01-11

    External information propagates in the cell mainly through signaling cascades and transcriptional activation, allowing it to react to a wide spectrum of environmental changes. High-throughput experiments identify numerous molecular components of such cascades that may, however, interact through unknown partners. Some of them may be detected using data coming from the integration of a protein-protein interaction network and mRNA expression profiles. This inference problem can be mapped onto the problem of finding appropriate optimal connected subgraphs of a network defined by these datasets. The optimization procedure turns out to be computationally intractable in general. Here we present a new distributed algorithm for this task, inspired from statistical physics, and apply this scheme to alpha factor and drug perturbations data in yeast. We identify the role of the COS8 protein, a member of a gene family of previously unknown function, and validate the results by genetic experiments. The algorithm we present is specially suited for very large datasets, can run in parallel, and can be adapted to other problems in systems biology. On renowned benchmarks it outperforms other algorithms in the field.

  15. Computational Intelligence Based Data Fusion Algorithm for Dynamic sEMG and Skeletal Muscle Force Modelling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chandrasekhar Potluri,; Madhavi Anugolu; Marco P. Schoen

    2013-08-01

    In this work, an array of three surface Electrography (sEMG) sensors are used to acquired muscle extension and contraction signals for 18 healthy test subjects. The skeletal muscle force is estimated using the acquired sEMG signals and a Non-linear Wiener Hammerstein model, relating the two signals in a dynamic fashion. The model is obtained from using System Identification (SI) algorithm. The obtained force models for each sensor are fused using a proposed fuzzy logic concept with the intent to improve the force estimation accuracy and resilience to sensor failure or misalignment. For the fuzzy logic inference system, the sEMG entropy,more » the relative error, and the correlation of the force signals are considered for defining the membership functions. The proposed fusion algorithm yields an average of 92.49% correlation between the actual force and the overall estimated force output. In addition, the proposed fusionbased approach is implemented on a test platform. Experiments indicate an improvement in finger/hand force estimation.« less

  16. Detection of nicotine content impact in tobacco manufacturing using computational intelligence.

    PubMed

    Begic Fazlic, Lejla; Avdagic, Zikrija

    2011-01-01

    A study is presented for the detection of nicotine impact in different cigarette type, using recorded data and Computational Intelligence techniques. Recorded puffs are processed using Continuous Wavelet Transform and used to extract time-frequency features for normal and abnormal puffs conditions. The wavelet energy distributions are used as inputs to classifiers based on Adaptive Neuro-Fuzzy Inference Systems (ANFIS) and Genetic Algorithms (GAs). The number and the parameters of Membership Functions are used in ANFIS along with the features from wavelet energy distributionare selected using GAs, maximising the diagnosis success. GA with ANFIS (GANFIS) are trained with a subset of data with known nicotine conditions. The trained GANFIS are tested using the other set of data (testing data). A classical method by High-Performance Liquid Chromatography is also introduced to solve this problem, respectively. The results as well as the performances of these two approaches are compared. A combination of these two algorithms is also suggested to improve the efficiency of this solution procedure. Computational results show that this combined algorithm is promising.

  17. An Optimal Algorithm towards Successive Location Privacy in Sensor Networks with Dynamic Programming

    NASA Astrophysics Data System (ADS)

    Zhao, Baokang; Wang, Dan; Shao, Zili; Cao, Jiannong; Chan, Keith C. C.; Su, Jinshu

    In wireless sensor networks, preserving location privacy under successive inference attacks is extremely critical. Although this problem is NP-complete in general cases, we propose a dynamic programming based algorithm and prove it is optimal in special cases where the correlation only exists between p immediate adjacent observations.

  18. Genetic algorithm optimized rainfall-runoff fuzzy inference system for row crop watersheds with claypan soils

    USDA-ARS?s Scientific Manuscript database

    The fuzzy logic algorithm has the ability to describe knowledge in a descriptive human-like manner in the form of simple rules using linguistic variables, and provides a new way of modeling uncertain or naturally fuzzy hydrological processes like non-linear rainfall-runoff relationships. Fuzzy infe...

  19. Effect of defuzzification method of fuzzy modeling

    NASA Astrophysics Data System (ADS)

    Lapohos, Tibor; Buchal, Ralph O.

    1994-10-01

    Imprecision can arise in fuzzy relational modeling as a result of fuzzification, inference and defuzzification. These three sources of imprecision are difficult to separate. We have determined through numerical studies that an important source of imprecision is the defuzzification stage. This imprecision adversely affects the quality of the model output. The most widely used defuzzification algorithm is known by the name of `center of area' (COA) or `center of gravity' (COG). In this paper, we show that this algorithm not only maps the near limit values of the variables improperly but also introduces errors for middle domain values of the same variables. Furthermore, the behavior of this algorithm is a function of the shape of the reference sets. We compare the COA method to the weighted average of cluster centers (WACC) procedure in which the transformation is carried out based on the values of the cluster centers belonging to each of the reference membership functions instead of using the functions themselves. We show that this procedure is more effective and computationally much faster than the COA. The method is tested for a family of reference sets satisfying certain constraints, that is, for any support value the sum of reference membership function values equals one and the peak values of the two marginal membership functions project to the boundaries of the universe of discourse. For all the member sets of this family of reference sets the defuzzification errors do not get bigger as the linguistic variables tend to their extreme values. In addition, the more reference sets that are defined for a certain linguistic variable, the less the average defuzzification error becomes. In case of triangle shaped reference sets there is no defuzzification error at all. Finally, an alternative solution is provided that improves the performance of the COA method.

  20. BCM: toolkit for Bayesian analysis of Computational Models using samplers.

    PubMed

    Thijssen, Bram; Dijkstra, Tjeerd M H; Heskes, Tom; Wessels, Lodewyk F A

    2016-10-21

    Computational models in biology are characterized by a large degree of uncertainty. This uncertainty can be analyzed with Bayesian statistics, however, the sampling algorithms that are frequently used for calculating Bayesian statistical estimates are computationally demanding, and each algorithm has unique advantages and disadvantages. It is typically unclear, before starting an analysis, which algorithm will perform well on a given computational model. We present BCM, a toolkit for the Bayesian analysis of Computational Models using samplers. It provides efficient, multithreaded implementations of eleven algorithms for sampling from posterior probability distributions and for calculating marginal likelihoods. BCM includes tools to simplify the process of model specification and scripts for visualizing the results. The flexible architecture allows it to be used on diverse types of biological computational models. In an example inference task using a model of the cell cycle based on ordinary differential equations, BCM is significantly more efficient than existing software packages, allowing more challenging inference problems to be solved. BCM represents an efficient one-stop-shop for computational modelers wishing to use sampler-based Bayesian statistics.

  1. Spectral Anonymization of Data

    PubMed Central

    Lasko, Thomas A.; Vinterbo, Staal A.

    2011-01-01

    The goal of data anonymization is to allow the release of scientifically useful data in a form that protects the privacy of its subjects. This requires more than simply removing personal identifiers from the data, because an attacker can still use auxiliary information to infer sensitive individual information. Additional perturbation is necessary to prevent these inferences, and the challenge is to perturb the data in a way that preserves its analytic utility. No existing anonymization algorithm provides both perfect privacy protection and perfect analytic utility. We make the new observation that anonymization algorithms are not required to operate in the original vector-space basis of the data, and many algorithms can be improved by operating in a judiciously chosen alternate basis. A spectral basis derived from the data’s eigenvectors is one that can provide substantial improvement. We introduce the term spectral anonymization to refer to an algorithm that uses a spectral basis for anonymization, and we give two illustrative examples. We also propose new measures of privacy protection that are more general and more informative than existing measures, and a principled reference standard with which to define adequate privacy protection. PMID:21373375

  2. Effective Online Bayesian Phylogenetics via Sequential Monte Carlo with Guided Proposals

    PubMed Central

    Fourment, Mathieu; Claywell, Brian C; Dinh, Vu; McCoy, Connor; Matsen IV, Frederick A; Darling, Aaron E

    2018-01-01

    Abstract Modern infectious disease outbreak surveillance produces continuous streams of sequence data which require phylogenetic analysis as data arrives. Current software packages for Bayesian phylogenetic inference are unable to quickly incorporate new sequences as they become available, making them less useful for dynamically unfolding evolutionary stories. This limitation can be addressed by applying a class of Bayesian statistical inference algorithms called sequential Monte Carlo (SMC) to conduct online inference, wherein new data can be continuously incorporated to update the estimate of the posterior probability distribution. In this article, we describe and evaluate several different online phylogenetic sequential Monte Carlo (OPSMC) algorithms. We show that proposing new phylogenies with a density similar to the Bayesian prior suffers from poor performance, and we develop “guided” proposals that better match the proposal density to the posterior. Furthermore, we show that the simplest guided proposals can exhibit pathological behavior in some situations, leading to poor results, and that the situation can be resolved by heating the proposal density. The results demonstrate that relative to the widely used MCMC-based algorithm implemented in MrBayes, the total time required to compute a series of phylogenetic posteriors as sequences arrive can be significantly reduced by the use of OPSMC, without incurring a significant loss in accuracy. PMID:29186587

  3. Fast function-on-scalar regression with penalized basis expansions.

    PubMed

    Reiss, Philip T; Huang, Lei; Mennes, Maarten

    2010-01-01

    Regression models for functional responses and scalar predictors are often fitted by means of basis functions, with quadratic roughness penalties applied to avoid overfitting. The fitting approach described by Ramsay and Silverman in the 1990 s amounts to a penalized ordinary least squares (P-OLS) estimator of the coefficient functions. We recast this estimator as a generalized ridge regression estimator, and present a penalized generalized least squares (P-GLS) alternative. We describe algorithms by which both estimators can be implemented, with automatic selection of optimal smoothing parameters, in a more computationally efficient manner than has heretofore been available. We discuss pointwise confidence intervals for the coefficient functions, simultaneous inference by permutation tests, and model selection, including a novel notion of pointwise model selection. P-OLS and P-GLS are compared in a simulation study. Our methods are illustrated with an analysis of age effects in a functional magnetic resonance imaging data set, as well as a reanalysis of a now-classic Canadian weather data set. An R package implementing the methods is publicly available.

  4. Scaling up spike-and-slab models for unsupervised feature learning.

    PubMed

    Goodfellow, Ian J; Courville, Aaron; Bengio, Yoshua

    2013-08-01

    We describe the use of two spike-and-slab models for modeling real-valued data, with an emphasis on their applications to object recognition. The first model, which we call spike-and-slab sparse coding (S3C), is a preexisting model for which we introduce a faster approximate inference algorithm. We introduce a deep variant of S3C, which we call the partially directed deep Boltzmann machine (PD-DBM) and extend our S3C inference algorithm for use on this model. We describe learning procedures for each. We demonstrate that our inference procedure for S3C enables scaling the model to unprecedented large problem sizes, and demonstrate that using S3C as a feature extractor results in very good object recognition performance, particularly when the number of labeled examples is low. We show that the PD-DBM generates better samples than its shallow counterpart, and that unlike DBMs or DBNs, the PD-DBM may be trained successfully without greedy layerwise training.

  5. Data analysis using scale-space filtering and Bayesian probabilistic reasoning

    NASA Technical Reports Server (NTRS)

    Kulkarni, Deepak; Kutulakos, Kiriakos; Robinson, Peter

    1991-01-01

    This paper describes a program for analysis of output curves from Differential Thermal Analyzer (DTA). The program first extracts probabilistic qualitative features from a DTA curve of a soil sample, and then uses Bayesian probabilistic reasoning to infer the mineral in the soil. The qualifier module employs a simple and efficient extension of scale-space filtering suitable for handling DTA data. We have observed that points can vanish from contours in the scale-space image when filtering operations are not highly accurate. To handle the problem of vanishing points, perceptual organizations heuristics are used to group the points into lines. Next, these lines are grouped into contours by using additional heuristics. Probabilities are associated with these contours using domain-specific correlations. A Bayes tree classifier processes probabilistic features to infer the presence of different minerals in the soil. Experiments show that the algorithm that uses domain-specific correlation to infer qualitative features outperforms a domain-independent algorithm that does not.

  6. Model-Free Reconstruction of Excitatory Neuronal Connectivity from Calcium Imaging Signals

    PubMed Central

    Stetter, Olav; Battaglia, Demian; Soriano, Jordi; Geisel, Theo

    2012-01-01

    A systematic assessment of global neural network connectivity through direct electrophysiological assays has remained technically infeasible, even in simpler systems like dissociated neuronal cultures. We introduce an improved algorithmic approach based on Transfer Entropy to reconstruct structural connectivity from network activity monitored through calcium imaging. We focus in this study on the inference of excitatory synaptic links. Based on information theory, our method requires no prior assumptions on the statistics of neuronal firing and neuronal connections. The performance of our algorithm is benchmarked on surrogate time series of calcium fluorescence generated by the simulated dynamics of a network with known ground-truth topology. We find that the functional network topology revealed by Transfer Entropy depends qualitatively on the time-dependent dynamic state of the network (bursting or non-bursting). Thus by conditioning with respect to the global mean activity, we improve the performance of our method. This allows us to focus the analysis to specific dynamical regimes of the network in which the inferred functional connectivity is shaped by monosynaptic excitatory connections, rather than by collective synchrony. Our method can discriminate between actual causal influences between neurons and spurious non-causal correlations due to light scattering artifacts, which inherently affect the quality of fluorescence imaging. Compared to other reconstruction strategies such as cross-correlation or Granger Causality methods, our method based on improved Transfer Entropy is remarkably more accurate. In particular, it provides a good estimation of the excitatory network clustering coefficient, allowing for discrimination between weakly and strongly clustered topologies. Finally, we demonstrate the applicability of our method to analyses of real recordings of in vitro disinhibited cortical cultures where we suggest that excitatory connections are characterized by an elevated level of clustering compared to a random graph (although not extreme) and can be markedly non-local. PMID:22927808

  7. Enriching regulatory networks by bootstrap learning using optimised GO-based gene similarity and gene links mined from PubMed abstracts

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Taylor, Ronald C.; Sanfilippo, Antonio P.; McDermott, Jason E.

    2011-02-18

    Transcriptional regulatory networks are being determined using “reverse engineering” methods that infer connections based on correlations in gene state. Corroboration of such networks through independent means such as evidence from the biomedical literature is desirable. Here, we explore a novel approach, a bootstrapping version of our previous Cross-Ontological Analytic method (XOA) that can be used for semi-automated annotation and verification of inferred regulatory connections, as well as for discovery of additional functional relationships between the genes. First, we use our annotation and network expansion method on a biological network learned entirely from the literature. We show how new relevant linksmore » between genes can be iteratively derived using a gene similarity measure based on the Gene Ontology that is optimized on the input network at each iteration. Second, we apply our method to annotation, verification, and expansion of a set of regulatory connections found by the Context Likelihood of Relatedness algorithm.« less

  8. Additive hazards regression and partial likelihood estimation for ecological monitoring data across space.

    PubMed

    Lin, Feng-Chang; Zhu, Jun

    2012-01-01

    We develop continuous-time models for the analysis of environmental or ecological monitoring data such that subjects are observed at multiple monitoring time points across space. Of particular interest are additive hazards regression models where the baseline hazard function can take on flexible forms. We consider time-varying covariates and take into account spatial dependence via autoregression in space and time. We develop statistical inference for the regression coefficients via partial likelihood. Asymptotic properties, including consistency and asymptotic normality, are established for parameter estimates under suitable regularity conditions. Feasible algorithms utilizing existing statistical software packages are developed for computation. We also consider a simpler additive hazards model with homogeneous baseline hazard and develop hypothesis testing for homogeneity. A simulation study demonstrates that the statistical inference using partial likelihood has sound finite-sample properties and offers a viable alternative to maximum likelihood estimation. For illustration, we analyze data from an ecological study that monitors bark beetle colonization of red pines in a plantation of Wisconsin.

  9. Wisdom of crowds for robust gene network inference

    PubMed Central

    Marbach, Daniel; Costello, James C.; Küffner, Robert; Vega, Nicci; Prill, Robert J.; Camacho, Diogo M.; Allison, Kyle R.; Kellis, Manolis; Collins, James J.; Stolovitzky, Gustavo

    2012-01-01

    Reconstructing gene regulatory networks from high-throughput data is a long-standing problem. Through the DREAM project (Dialogue on Reverse Engineering Assessment and Methods), we performed a comprehensive blind assessment of over thirty network inference methods on Escherichia coli, Staphylococcus aureus, Saccharomyces cerevisiae, and in silico microarray data. We characterize performance, data requirements, and inherent biases of different inference approaches offering guidelines for both algorithm application and development. We observe that no single inference method performs optimally across all datasets. In contrast, integration of predictions from multiple inference methods shows robust and high performance across diverse datasets. Thereby, we construct high-confidence networks for E. coli and S. aureus, each comprising ~1700 transcriptional interactions at an estimated precision of 50%. We experimentally test 53 novel interactions in E. coli, of which 23 were supported (43%). Our results establish community-based methods as a powerful and robust tool for the inference of transcriptional gene regulatory networks. PMID:22796662

  10. Development of algorithms for building inventory compilation through remote sensing and statistical inferencing

    NASA Astrophysics Data System (ADS)

    Sarabandi, Pooya

    Building inventories are one of the core components of disaster vulnerability and loss estimations models, and as such, play a key role in providing decision support for risk assessment, disaster management and emergency response efforts. In may parts of the world inclusive building inventories, suitable for the use in catastrophe models cannot be found. Furthermore, there are serious shortcomings in the existing building inventories that include incomplete or out-dated information on critical attributes as well as missing or erroneous values for attributes. In this dissertation a set of methodologies for updating spatial and geometric information of buildings from single and multiple high-resolution optical satellite images are presented. Basic concepts, terminologies and fundamentals of 3-D terrain modeling from satellite images are first introduced. Different sensor projection models are then presented and sources of optical noise such as lens distortions are discussed. An algorithm for extracting height and creating 3-D building models from a single high-resolution satellite image is formulated. The proposed algorithm is a semi-automated supervised method capable of extracting attributes such as longitude, latitude, height, square footage, perimeter, irregularity index and etc. The associated errors due to the interactive nature of the algorithm are quantified and solutions for minimizing the human-induced errors are proposed. The height extraction algorithm is validated against independent survey data and results are presented. The validation results show that an average height modeling accuracy of 1.5% can be achieved using this algorithm. Furthermore, concept of cross-sensor data fusion for the purpose of 3-D scene reconstruction using quasi-stereo images is developed in this dissertation. The developed algorithm utilizes two or more single satellite images acquired from different sensors and provides the means to construct 3-D building models in a more economical way. A terrain-dependent-search algorithm is formulated to facilitate the search for correspondences in a quasi-stereo pair of images. The calculated heights for sample buildings using cross-sensor data fusion algorithm show an average coefficient of variation 1.03%. In order to infer structural-type and occupancy-type, i.e. engineering attributes, of buildings from spatial and geometric attributes of 3-D models, a statistical data analysis framework is formulated. Applications of "Classification Trees" and "Multinomial Logistic Models" in modeling the marginal probabilities of class-membership of engineering attributes are investigated. Adaptive statistical models to incorporate different spatial and geometric attributes of buildings---while inferring the engineering attributes---are developed in this dissertation. The inferred engineering attributes in conjunction with the spatial and geometric attributes derived from the imagery can be used to augment regional building inventories and therefore enhance the result of catastrophe models. In the last part of the dissertation, a set of empirically-derived motion-damage relationships based on the correlation of observed building performance with measured ground-motion parameters from 1994 Northridge and 1999 Chi-Chi Taiwan earthquakes are developed. Fragility functions in the form of cumulative lognormal distributions and damage probability matrices for several classes of buildings (wood, steel and concrete), as well as number of ground-motion intensity measures are developed and compared to currently-used motion-damage relationships.

  11. Capturing User Reading Behaviors for Personalized Document Summarization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xu, Songhua; Jiang, Hao; Lau, Francis

    2011-01-01

    We propose a new personalized document summarization method that observes a user's personal reading preferences. These preferences are inferred from the user's reading behaviors, including facial expressions, gaze positions, and reading durations that were captured during the user's past reading activities. We compare the performance of our algorithm with that of a few peer algorithms and software packages. The results of our comparative study show that our algorithm can produce more superior personalized document summaries than all the other methods in that the summaries generated by our algorithm can better satisfy a user's personal preferences.

  12. Optimized face recognition algorithm using radial basis function neural networks and its practical applications.

    PubMed

    Yoo, Sung-Hoon; Oh, Sung-Kwun; Pedrycz, Witold

    2015-09-01

    In this study, we propose a hybrid method of face recognition by using face region information extracted from the detected face region. In the preprocessing part, we develop a hybrid approach based on the Active Shape Model (ASM) and the Principal Component Analysis (PCA) algorithm. At this step, we use a CCD (Charge Coupled Device) camera to acquire a facial image by using AdaBoost and then Histogram Equalization (HE) is employed to improve the quality of the image. ASM extracts the face contour and image shape to produce a personal profile. Then we use a PCA method to reduce dimensionality of face images. In the recognition part, we consider the improved Radial Basis Function Neural Networks (RBF NNs) to identify a unique pattern associated with each person. The proposed RBF NN architecture consists of three functional modules realizing the condition phase, the conclusion phase, and the inference phase completed with the help of fuzzy rules coming in the standard 'if-then' format. In the formation of the condition part of the fuzzy rules, the input space is partitioned with the use of Fuzzy C-Means (FCM) clustering. In the conclusion part of the fuzzy rules, the connections (weights) of the RBF NNs are represented by four kinds of polynomials such as constant, linear, quadratic, and reduced quadratic. The values of the coefficients are determined by running a gradient descent method. The output of the RBF NNs model is obtained by running a fuzzy inference method. The essential design parameters of the network (including learning rate, momentum coefficient and fuzzification coefficient used by the FCM) are optimized by means of Differential Evolution (DE). The proposed P-RBF NNs (Polynomial based RBF NNs) are applied to facial recognition and its performance is quantified from the viewpoint of the output performance and recognition rate. Copyright © 2015 Elsevier Ltd. All rights reserved.

  13. MCAM: multiple clustering analysis methodology for deriving hypotheses and insights from high-throughput proteomic datasets.

    PubMed

    Naegle, Kristen M; Welsch, Roy E; Yaffe, Michael B; White, Forest M; Lauffenburger, Douglas A

    2011-07-01

    Advances in proteomic technologies continue to substantially accelerate capability for generating experimental data on protein levels, states, and activities in biological samples. For example, studies on receptor tyrosine kinase signaling networks can now capture the phosphorylation state of hundreds to thousands of proteins across multiple conditions. However, little is known about the function of many of these protein modifications, or the enzymes responsible for modifying them. To address this challenge, we have developed an approach that enhances the power of clustering techniques to infer functional and regulatory meaning of protein states in cell signaling networks. We have created a new computational framework for applying clustering to biological data in order to overcome the typical dependence on specific a priori assumptions and expert knowledge concerning the technical aspects of clustering. Multiple clustering analysis methodology ('MCAM') employs an array of diverse data transformations, distance metrics, set sizes, and clustering algorithms, in a combinatorial fashion, to create a suite of clustering sets. These sets are then evaluated based on their ability to produce biological insights through statistical enrichment of metadata relating to knowledge concerning protein functions, kinase substrates, and sequence motifs. We applied MCAM to a set of dynamic phosphorylation measurements of the ERRB network to explore the relationships between algorithmic parameters and the biological meaning that could be inferred and report on interesting biological predictions. Further, we applied MCAM to multiple phosphoproteomic datasets for the ERBB network, which allowed us to compare independent and incomplete overlapping measurements of phosphorylation sites in the network. We report specific and global differences of the ERBB network stimulated with different ligands and with changes in HER2 expression. Overall, we offer MCAM as a broadly-applicable approach for analysis of proteomic data which may help increase the current understanding of molecular networks in a variety of biological problems. © 2011 Naegle et al.

  14. A grammar inference approach for predicting kinase specific phosphorylation sites.

    PubMed

    Datta, Sutapa; Mukhopadhyay, Subhasis

    2015-01-01

    Kinase mediated phosphorylation site detection is the key mechanism of post translational mechanism that plays an important role in regulating various cellular processes and phenotypes. Many diseases, like cancer are related with the signaling defects which are associated with protein phosphorylation. Characterizing the protein kinases and their substrates enhances our ability to understand the mechanism of protein phosphorylation and extends our knowledge of signaling network; thereby helping us to treat such diseases. Experimental methods for predicting phosphorylation sites are labour intensive and expensive. Also, manifold increase of protein sequences in the databanks over the years necessitates the improvement of high speed and accurate computational methods for predicting phosphorylation sites in protein sequences. Till date, a number of computational methods have been proposed by various researchers in predicting phosphorylation sites, but there remains much scope of improvement. In this communication, we present a simple and novel method based on Grammatical Inference (GI) approach to automate the prediction of kinase specific phosphorylation sites. In this regard, we have used a popular GI algorithm Alergia to infer Deterministic Stochastic Finite State Automata (DSFA) which equally represents the regular grammar corresponding to the phosphorylation sites. Extensive experiments on several datasets generated by us reveal that, our inferred grammar successfully predicts phosphorylation sites in a kinase specific manner. It performs significantly better when compared with the other existing phosphorylation site prediction methods. We have also compared our inferred DSFA with two other GI inference algorithms. The DSFA generated by our method performs superior which indicates that our method is robust and has a potential for predicting the phosphorylation sites in a kinase specific manner.

  15. INFERENCE BUILDING BLOCKS

    DTIC Science & Technology

    2018-02-15

    address the problem that probabilistic inference algorithms are diÿcult and tedious to implement, by expressing them in terms of a small number of...building blocks, which are automatic transformations on probabilistic programs. On one hand, our curation of these building blocks reflects the way human...reasoning with low-level computational optimization, so the speed and accuracy of the generated solvers are competitive with state-of-the-art systems. 15

  16. Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL): adapting the Partial Phylogenetic Profiling algorithm to scan sequences for signatures that predict protein function

    PubMed Central

    2010-01-01

    Background Comparative genomics methods such as phylogenetic profiling can mine powerful inferences from inherently noisy biological data sets. We introduce Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL), a method that applies the Partial Phylogenetic Profiling (PPP) approach locally within a protein sequence to discover short sequence signatures associated with functional sites. The approach is based on the basic scoring mechanism employed by PPP, namely the use of binomial distribution statistics to optimize sequence similarity cutoffs during searches of partitioned training sets. Results Here we illustrate and validate the ability of the SIMBAL method to find functionally relevant short sequence signatures by application to two well-characterized protein families. In the first example, we partitioned a family of ABC permeases using a metabolic background property (urea utilization). Thus, the TRUE set for this family comprised members whose genome of origin encoded a urea utilization system. By moving a sliding window across the sequence of a permease, and searching each subsequence in turn against the full set of partitioned proteins, the method found which local sequence signatures best correlated with the urea utilization trait. Mapping of SIMBAL "hot spots" onto crystal structures of homologous permeases reveals that the significant sites are gating determinants on the cytosolic face rather than, say, docking sites for the substrate-binding protein on the extracellular face. In the second example, we partitioned a protein methyltransferase family using gene proximity as a criterion. In this case, the TRUE set comprised those methyltransferases encoded near the gene for the substrate RF-1. SIMBAL identifies sequence regions that map onto the substrate-binding interface while ignoring regions involved in the methyltransferase reaction mechanism in general. Neither method for training set construction requires any prior experimental characterization. Conclusions SIMBAL shows that, in functionally divergent protein families, selected short sequences often significantly outperform their full-length parent sequence for making functional predictions by sequence similarity, suggesting avenues for improved functional classifiers. When combined with structural data, SIMBAL affords the ability to localize and model functional sites. PMID:20102603

  17. A Comparison of Phasing Algorithms for Trios and Unrelated Individuals

    PubMed Central

    Marchini, Jonathan; Cutler, David; Patterson, Nick; Stephens, Matthew; Eskin, Eleazar; Halperin, Eran; Lin, Shin; Qin, Zhaohui S.; Munro, Heather M.; Abecasis, Gonçalo R.; Donnelly, Peter

    2006-01-01

    Knowledge of haplotype phase is valuable for many analysis methods in the study of disease, population, and evolutionary genetics. Considerable research effort has been devoted to the development of statistical and computational methods that infer haplotype phase from genotype data. Although a substantial number of such methods have been developed, they have focused principally on inference from unrelated individuals, and comparisons between methods have been rather limited. Here, we describe the extension of five leading algorithms for phase inference for handling father-mother-child trios. We performed a comprehensive assessment of the methods applied to both trios and to unrelated individuals, with a focus on genomic-scale problems, using both simulated data and data from the HapMap project. The most accurate algorithm was PHASE (v2.1). For this method, the percentages of genotypes whose phase was incorrectly inferred were 0.12%, 0.05%, and 0.16% for trios from simulated data, HapMap Centre d'Etude du Polymorphisme Humain (CEPH) trios, and HapMap Yoruban trios, respectively, and 5.2% and 5.9% for unrelated individuals in simulated data and the HapMap CEPH data, respectively. The other methods considered in this work had comparable but slightly worse error rates. The error rates for trios are similar to the levels of genotyping error and missing data expected. We thus conclude that all the methods considered will provide highly accurate estimates of haplotypes when applied to trio data sets. Running times differ substantially between methods. Although it is one of the slowest methods, PHASE (v2.1) was used to infer haplotypes for the 1 million–SNP HapMap data set. Finally, we evaluated methods of estimating the value of r2 between a pair of SNPs and concluded that all methods estimated r2 well when the estimated value was ⩾0.8. PMID:16465620

  18. Estimating crustal heterogeneity from double-difference tomography

    USGS Publications Warehouse

    Got, J.-L.; Monteiller, V.; Virieux, J.; Okubo, P.

    2006-01-01

    Seismic velocity parameters in limited, but heterogeneous volumes can be inferred using a double-difference tomographic algorithm, but to obtain meaningful results accuracy must be maintained at every step of the computation. MONTEILLER et al. (2005) have devised a double-difference tomographic algorithm that takes full advantage of the accuracy of cross-spectral time-delays of large correlated event sets. This algorithm performs an accurate computation of theoretical travel-time delays in heterogeneous media and applies a suitable inversion scheme based on optimization theory. When applied to Kilauea Volcano, in Hawaii, the double-difference tomography approach shows significant and coherent changes to the velocity model in the well-resolved volumes beneath the Kilauea caldera and the upper east rift. In this paper, we first compare the results obtained using MONTEILLER et al.'s algorithm with those obtained using the classic travel-time tomographic approach. Then, we evaluated the effect of using data series of different accuracies, such as handpicked arrival-time differences ("picking differences"), on the results produced by double-difference tomographic algorithms. We show that picking differences have a non-Gaussian probability density function (pdf). Using a hyperbolic secant pdf instead of a Gaussian pdf allows improvement of the double-difference tomographic result when using picking difference data. We completed our study by investigating the use of spatially discontinuous time-delay data. ?? Birkha??user Verlag, Basel, 2006.

  19. The exact analysis of contingency tables in medical research.

    PubMed

    Mehta, C R

    1994-01-01

    A unified view of exact nonparametric inference, with special emphasis on data in the form of contingency tables, is presented. While the concept of exact tests has been in existence since the early work of RA Fisher, the computational complexity involved in actually executing such tests precluded their use until fairly recently. Modern algorithmic advances, combined with the easy availability of inexpensive computing power, has renewed interest in exact methods of inference, especially because they remain valid in the face of small, sparse, imbalanced, or heavily tied data. After defining exact p-values in terms of the permutation principle, we reference algorithms for computing them. Several data sets are then analysed by both exact and asymptotic methods. We end with a discussion of the available software.

  20. Search for Patterns of Functional Specificity in the Brain: A Nonparametric Hierarchical Bayesian Model for Group fMRI Data

    PubMed Central

    Sridharan, Ramesh; Vul, Edward; Hsieh, Po-Jang; Kanwisher, Nancy; Golland, Polina

    2012-01-01

    Functional MRI studies have uncovered a number of brain areas that demonstrate highly specific functional patterns. In the case of visual object recognition, small, focal regions have been characterized with selectivity for visual categories such as human faces. In this paper, we develop an algorithm that automatically learns patterns of functional specificity from fMRI data in a group of subjects. The method does not require spatial alignment of functional images from different subjects. The algorithm is based on a generative model that comprises two main layers. At the lower level, we express the functional brain response to each stimulus as a binary activation variable. At the next level, we define a prior over sets of activation variables in all subjects. We use a Hierarchical Dirichlet Process as the prior in order to learn the patterns of functional specificity shared across the group, which we call functional systems, and estimate the number of these systems. Inference based on our model enables automatic discovery and characterization of dominant and consistent functional systems. We apply the method to data from a visual fMRI study comprised of 69 distinct stimulus images. The discovered system activation profiles correspond to selectivity for a number of image categories such as faces, bodies, and scenes. Among systems found by our method, we identify new areas that are deactivated by face stimuli. In empirical comparisons with perviously proposed exploratory methods, our results appear superior in capturing the structure in the space of visual categories of stimuli. PMID:21884803

  1. Boosting Bayesian parameter inference of nonlinear stochastic differential equation models by Hamiltonian scale separation.

    PubMed

    Albert, Carlo; Ulzega, Simone; Stoop, Ruedi

    2016-04-01

    Parameter inference is a fundamental problem in data-driven modeling. Given observed data that is believed to be a realization of some parameterized model, the aim is to find parameter values that are able to explain the observed data. In many situations, the dominant sources of uncertainty must be included into the model for making reliable predictions. This naturally leads to stochastic models. Stochastic models render parameter inference much harder, as the aim then is to find a distribution of likely parameter values. In Bayesian statistics, which is a consistent framework for data-driven learning, this so-called posterior distribution can be used to make probabilistic predictions. We propose a novel, exact, and very efficient approach for generating posterior parameter distributions for stochastic differential equation models calibrated to measured time series. The algorithm is inspired by reinterpreting the posterior distribution as a statistical mechanics partition function of an object akin to a polymer, where the measurements are mapped on heavier beads compared to those of the simulated data. To arrive at distribution samples, we employ a Hamiltonian Monte Carlo approach combined with a multiple time-scale integration. A separation of time scales naturally arises if either the number of measurement points or the number of simulation points becomes large. Furthermore, at least for one-dimensional problems, we can decouple the harmonic modes between measurement points and solve the fastest part of their dynamics analytically. Our approach is applicable to a wide range of inference problems and is highly parallelizable.

  2. Recursive regularization for inferring gene networks from time-course gene expression profiles

    PubMed Central

    Shimamura, Teppei; Imoto, Seiya; Yamaguchi, Rui; Fujita, André; Nagasaki, Masao; Miyano, Satoru

    2009-01-01

    Background Inferring gene networks from time-course microarray experiments with vector autoregressive (VAR) model is the process of identifying functional associations between genes through multivariate time series. This problem can be cast as a variable selection problem in Statistics. One of the promising methods for variable selection is the elastic net proposed by Zou and Hastie (2005). However, VAR modeling with the elastic net succeeds in increasing the number of true positives while it also results in increasing the number of false positives. Results By incorporating relative importance of the VAR coefficients into the elastic net, we propose a new class of regularization, called recursive elastic net, to increase the capability of the elastic net and estimate gene networks based on the VAR model. The recursive elastic net can reduce the number of false positives gradually by updating the importance. Numerical simulations and comparisons demonstrate that the proposed method succeeds in reducing the number of false positives drastically while keeping the high number of true positives in the network inference and achieves two or more times higher true discovery rate (the proportion of true positives among the selected edges) than the competing methods even when the number of time points is small. We also compared our method with various reverse-engineering algorithms on experimental data of MCF-7 breast cancer cells stimulated with two ErbB ligands, EGF and HRG. Conclusion The recursive elastic net is a powerful tool for inferring gene networks from time-course gene expression profiles. PMID:19386091

  3. Supervised variational model with statistical inference and its application in medical image segmentation.

    PubMed

    Li, Changyang; Wang, Xiuying; Eberl, Stefan; Fulham, Michael; Yin, Yong; Dagan Feng, David

    2015-01-01

    Automated and general medical image segmentation can be challenging because the foreground and the background may have complicated and overlapping density distributions in medical imaging. Conventional region-based level set algorithms often assume piecewise constant or piecewise smooth for segments, which are implausible for general medical image segmentation. Furthermore, low contrast and noise make identification of the boundaries between foreground and background difficult for edge-based level set algorithms. Thus, to address these problems, we suggest a supervised variational level set segmentation model to harness the statistical region energy functional with a weighted probability approximation. Our approach models the region density distributions by using the mixture-of-mixtures Gaussian model to better approximate real intensity distributions and distinguish statistical intensity differences between foreground and background. The region-based statistical model in our algorithm can intuitively provide better performance on noisy images. We constructed a weighted probability map on graphs to incorporate spatial indications from user input with a contextual constraint based on the minimization of contextual graphs energy functional. We measured the performance of our approach on ten noisy synthetic images and 58 medical datasets with heterogeneous intensities and ill-defined boundaries and compared our technique to the Chan-Vese region-based level set model, the geodesic active contour model with distance regularization, and the random walker model. Our method consistently achieved the highest Dice similarity coefficient when compared to the other methods.

  4. Methods to assess an exercise intervention trial based on 3-level functional data.

    PubMed

    Li, Haocheng; Kozey Keadle, Sarah; Staudenmayer, John; Assaad, Houssein; Huang, Jianhua Z; Carroll, Raymond J

    2015-10-01

    Motivated by data recording the effects of an exercise intervention on subjects' physical activity over time, we develop a model to assess the effects of a treatment when the data are functional with 3 levels (subjects, weeks and days in our application) and possibly incomplete. We develop a model with 3-level mean structure effects, all stratified by treatment and subject random effects, including a general subject effect and nested effects for the 3 levels. The mean and random structures are specified as smooth curves measured at various time points. The association structure of the 3-level data is induced through the random curves, which are summarized using a few important principal components. We use penalized splines to model the mean curves and the principal component curves, and cast the proposed model into a mixed effects model framework for model fitting, prediction and inference. We develop an algorithm to fit the model iteratively with the Expectation/Conditional Maximization Either (ECME) version of the EM algorithm and eigenvalue decompositions. Selection of the number of principal components and handling incomplete data issues are incorporated into the algorithm. The performance of the Wald-type hypothesis test is also discussed. The method is applied to the physical activity data and evaluated empirically by a simulation study. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  5. Making predictions in a changing world-inference, uncertainty, and learning.

    PubMed

    O'Reilly, Jill X

    2013-01-01

    To function effectively, brains need to make predictions about their environment based on past experience, i.e., they need to learn about their environment. The algorithms by which learning occurs are of interest to neuroscientists, both in their own right (because they exist in the brain) and as a tool to model participants' incomplete knowledge of task parameters and hence, to better understand their behavior. This review focusses on a particular challenge for learning algorithms-how to match the rate at which they learn to the rate of change in the environment, so that they use as much observed data as possible whilst disregarding irrelevant, old observations. To do this algorithms must evaluate whether the environment is changing. We discuss the concepts of likelihood, priors and transition functions, and how these relate to change detection. We review expected and estimation uncertainty, and how these relate to change detection and learning rate. Finally, we consider the neural correlates of uncertainty and learning. We argue that the neural correlates of uncertainty bear a resemblance to neural systems that are active when agents actively explore their environments, suggesting that the mechanisms by which the rate of learning is set may be subject to top down control (in circumstances when agents actively seek new information) as well as bottom up control (by observations that imply change in the environment).

  6. Analysis of estimation algorithms for CDTI and CAS applications

    NASA Technical Reports Server (NTRS)

    Goka, T.

    1985-01-01

    Estimation algorithms for Cockpit Display of Traffic Information (CDTI) and Collision Avoidance System (CAS) applications were analyzed and/or developed. The algorithms are based on actual or projected operational and performance characteristics of an Enhanced TCAS II traffic sensor developed by Bendix and the Federal Aviation Administration. Three algorithm areas are examined and discussed. These are horizontal x and y, range and altitude estimation algorithms. Raw estimation errors are quantified using Monte Carlo simulations developed for each application; the raw errors are then used to infer impacts on the CDTI and CAS applications. Applications of smoothing algorithms to CDTI problems are also discussed briefly. Technical conclusions are summarized based on the analysis of simulation results.

  7. Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles

    PubMed Central

    Thaden, Joshua T; Mogno, Ilaria; Wierzbowski, Jamey; Cottarel, Guillaume; Kasif, Simon; Collins, James J; Gardner, Timothy S

    2007-01-01

    Machine learning approaches offer the potential to systematically identify transcriptional regulatory interactions from a compendium of microarray expression profiles. However, experimental validation of the performance of these methods at the genome scale has remained elusive. Here we assess the global performance of four existing classes of inference algorithms using 445 Escherichia coli Affymetrix arrays and 3,216 known E. coli regulatory interactions from RegulonDB. We also developed and applied the context likelihood of relatedness (CLR) algorithm, a novel extension of the relevance networks class of algorithms. CLR demonstrates an average precision gain of 36% relative to the next-best performing algorithm. At a 60% true positive rate, CLR identifies 1,079 regulatory interactions, of which 338 were in the previously known network and 741 were novel predictions. We tested the predicted interactions for three transcription factors with chromatin immunoprecipitation, confirming 21 novel interactions and verifying our RegulonDB-based performance estimates. CLR also identified a regulatory link providing central metabolic control of iron transport, which we confirmed with real-time quantitative PCR. The compendium of expression data compiled in this study, coupled with RegulonDB, provides a valuable model system for further improvement of network inference algorithms using experimental data. PMID:17214507

  8. Rational approximations to rational models: alternative algorithms for category learning.

    PubMed

    Sanborn, Adam N; Griffiths, Thomas L; Navarro, Daniel J

    2010-10-01

    Rational models of cognition typically consider the abstract computational problems posed by the environment, assuming that people are capable of optimally solving those problems. This differs from more traditional formal models of cognition, which focus on the psychological processes responsible for behavior. A basic challenge for rational models is thus explaining how optimal solutions can be approximated by psychological processes. We outline a general strategy for answering this question, namely to explore the psychological plausibility of approximation algorithms developed in computer science and statistics. In particular, we argue that Monte Carlo methods provide a source of rational process models that connect optimal solutions to psychological processes. We support this argument through a detailed example, applying this approach to Anderson's (1990, 1991) rational model of categorization (RMC), which involves a particularly challenging computational problem. Drawing on a connection between the RMC and ideas from nonparametric Bayesian statistics, we propose 2 alternative algorithms for approximate inference in this model. The algorithms we consider include Gibbs sampling, a procedure appropriate when all stimuli are presented simultaneously, and particle filters, which sequentially approximate the posterior distribution with a small number of samples that are updated as new data become available. Applying these algorithms to several existing datasets shows that a particle filter with a single particle provides a good description of human inferences.

  9. Expectation propagation for large scale Bayesian inference of non-linear molecular networks from perturbation data.

    PubMed

    Narimani, Zahra; Beigy, Hamid; Ahmad, Ashar; Masoudi-Nejad, Ali; Fröhlich, Holger

    2017-01-01

    Inferring the structure of molecular networks from time series protein or gene expression data provides valuable information about the complex biological processes of the cell. Causal network structure inference has been approached using different methods in the past. Most causal network inference techniques, such as Dynamic Bayesian Networks and ordinary differential equations, are limited by their computational complexity and thus make large scale inference infeasible. This is specifically true if a Bayesian framework is applied in order to deal with the unavoidable uncertainty about the correct model. We devise a novel Bayesian network reverse engineering approach using ordinary differential equations with the ability to include non-linearity. Besides modeling arbitrary, possibly combinatorial and time dependent perturbations with unknown targets, one of our main contributions is the use of Expectation Propagation, an algorithm for approximate Bayesian inference over large scale network structures in short computation time. We further explore the possibility of integrating prior knowledge into network inference. We evaluate the proposed model on DREAM4 and DREAM8 data and find it competitive against several state-of-the-art existing network inference methods.

  10. Reliability analysis using an exponential power model with bathtub-shaped failure rate function: a Bayes study.

    PubMed

    Shehla, Romana; Khan, Athar Ali

    2016-01-01

    Models with bathtub-shaped hazard function have been widely accepted in the field of reliability and medicine and are particularly useful in reliability related decision making and cost analysis. In this paper, the exponential power model capable of assuming increasing as well as bathtub-shape, is studied. This article makes a Bayesian study of the same model and simultaneously shows how posterior simulations based on Markov chain Monte Carlo algorithms can be straightforward and routine in R. The study is carried out for complete as well as censored data, under the assumption of weakly-informative priors for the parameters. In addition to this, inference interest focuses on the posterior distribution of non-linear functions of the parameters. Also, the model has been extended to include continuous explanatory variables and R-codes are well illustrated. Two real data sets are considered for illustrative purposes.

  11. The sensing and perception subsystem of the NASA research telerobot

    NASA Technical Reports Server (NTRS)

    Wilcox, B.; Gennery, D. B.; Bon, B.; Litwin, T.

    1987-01-01

    A useful space telerobot for on-orbit assembly, maintenance, and repair tasks must have a sensing and perception subsystem which can provide the locations, orientations, and velocities of all relevant objects in the work environment. This function must be accomplished with sufficient speed and accuracy to permit effective grappling and manipulation. Appropriate symbolic names must be attached to each object for use by higher-level planning algorithms. Sensor data and inferences must be presented to the remote human operator in a way that is both comprehensible in ensuring safe autonomous operation and useful for direct teleoperation. Research at JPL toward these objectives is described.

  12. Comparative analysis of gut microbiota associated with body mass index in a large Korean cohort.

    PubMed

    Yun, Yeojun; Kim, Han-Na; Kim, Song E; Heo, Seong Gu; Chang, Yoosoo; Ryu, Seungho; Shin, Hocheol; Kim, Hyung-Lae

    2017-07-04

    Gut microbiota plays an important role in the harvesting, storage, and expenditure of energy obtained from one's diet. Our cross-sectional study aimed to identify differences in gut microbiota according to body mass index (BMI) in a Korean population. 16S rRNA gene sequence data from 1463 subjects were categorized by BMI into normal, overweight, and obese groups. Fecal microbiotas were compared to determine differences in diversity and functional inference analysis related with BMI. The correlation between genus-level microbiota and BMI was tested using zero-inflated Gaussian mixture models, with or without covariate adjustment of nutrient intake. We confirmed differences between 16Sr RNA gene sequencing data of each BMI group, with decreasing diversity in the obese compared with the normal group. According to analysis of inferred metagenomic functional content using PICRUSt algorithm, a highly significant discrepancy in metabolism and immune functions (P < 0.0001) was predicted in the obese group. Differential taxonomic components in each BMI group were greatly affected by nutrient adjustment, whereas signature bacteria were not influenced by nutrients in the obese compared with the overweight group. We found highly significant statistical differences between normal, overweight and obese groups using a large sample size with or without diet confounding factors. Our informative dataset sheds light on the epidemiological study on population microbiome.

  13. Ontological Problem-Solving Framework for Assigning Sensor Systems and Algorithms to High-Level Missions

    PubMed Central

    Qualls, Joseph; Russomanno, David J.

    2011-01-01

    The lack of knowledge models to represent sensor systems, algorithms, and missions makes opportunistically discovering a synthesis of systems and algorithms that can satisfy high-level mission specifications impractical. A novel ontological problem-solving framework has been designed that leverages knowledge models describing sensors, algorithms, and high-level missions to facilitate automated inference of assigning systems to subtasks that may satisfy a given mission specification. To demonstrate the efficacy of the ontological problem-solving architecture, a family of persistence surveillance sensor systems and algorithms has been instantiated in a prototype environment to demonstrate the assignment of systems to subtasks of high-level missions. PMID:22164081

  14. Bayesian inference based on stationary Fokker-Planck sampling.

    PubMed

    Berrones, Arturo

    2010-06-01

    A novel formalism for bayesian learning in the context of complex inference models is proposed. The method is based on the use of the stationary Fokker-Planck (SFP) approach to sample from the posterior density. Stationary Fokker-Planck sampling generalizes the Gibbs sampler algorithm for arbitrary and unknown conditional densities. By the SFP procedure, approximate analytical expressions for the conditionals and marginals of the posterior can be constructed. At each stage of SFP, the approximate conditionals are used to define a Gibbs sampling process, which is convergent to the full joint posterior. By the analytical marginals efficient learning methods in the context of artificial neural networks are outlined. Offline and incremental bayesian inference and maximum likelihood estimation from the posterior are performed in classification and regression examples. A comparison of SFP with other Monte Carlo strategies in the general problem of sampling from arbitrary densities is also presented. It is shown that SFP is able to jump large low-probability regions without the need of a careful tuning of any step-size parameter. In fact, the SFP method requires only a small set of meaningful parameters that can be selected following clear, problem-independent guidelines. The computation cost of SFP, measured in terms of loss function evaluations, grows linearly with the given model's dimension.

  15. Intelligent control for modeling of real-time reservoir operation, part II: artificial neural network with operating rule curves

    NASA Astrophysics Data System (ADS)

    Chang, Ya-Ting; Chang, Li-Chiu; Chang, Fi-John

    2005-04-01

    To bridge the gap between academic research and actual operation, we propose an intelligent control system for reservoir operation. The methodology includes two major processes, the knowledge acquired and implemented, and the inference system. In this study, a genetic algorithm (GA) and a fuzzy rule base (FRB) are used to extract knowledge based on the historical inflow data with a design objective function and on the operating rule curves respectively. The adaptive network-based fuzzy inference system (ANFIS) is then used to implement the knowledge, to create the fuzzy inference system, and then to estimate the optimal reservoir operation. To investigate its applicability and practicability, the Shihmen reservoir, Taiwan, is used as a case study. For the purpose of comparison, a simulation of the currently used M-5 operating rule curve is also performed. The results demonstrate that (1) the GA is an efficient way to search the optimal input-output patterns, (2) the FRB can extract the knowledge from the operating rule curves, and (3) the ANFIS models built on different types of knowledge can produce much better performance than the traditional M-5 curves in real-time reservoir operation. Moreover, we show that the model can be more intelligent for reservoir operation if more information (or knowledge) is involved.

  16. Discriminative Relational Topic Models.

    PubMed

    Chen, Ning; Zhu, Jun; Xia, Fei; Zhang, Bo

    2015-05-01

    Relational topic models (RTMs) provide a probabilistic generative process to describe both the link structure and document contents for document networks, and they have shown promise on predicting network structures and discovering latent topic representations. However, existing RTMs have limitations in both the restricted model expressiveness and incapability of dealing with imbalanced network data. To expand the scope and improve the inference accuracy of RTMs, this paper presents three extensions: 1) unlike the common link likelihood with a diagonal weight matrix that allows the-same-topic interactions only, we generalize it to use a full weight matrix that captures all pairwise topic interactions and is applicable to asymmetric networks; 2) instead of doing standard Bayesian inference, we perform regularized Bayesian inference (RegBayes) with a regularization parameter to deal with the imbalanced link structure issue in real networks and improve the discriminative ability of learned latent representations; and 3) instead of doing variational approximation with strict mean-field assumptions, we present collapsed Gibbs sampling algorithms for the generalized relational topic models by exploring data augmentation without making restricting assumptions. Under the generic RegBayes framework, we carefully investigate two popular discriminative loss functions, namely, the logistic log-loss and the max-margin hinge loss. Experimental results on several real network datasets demonstrate the significance of these extensions on improving prediction performance.

  17. Using field inversion to quantify functional errors in turbulence closures

    NASA Astrophysics Data System (ADS)

    Singh, Anand Pratap; Duraisamy, Karthik

    2016-04-01

    A data-informed approach is presented with the objective of quantifying errors and uncertainties in the functional forms of turbulence closure models. The approach creates modeling information from higher-fidelity simulations and experimental data. Specifically, a Bayesian formalism is adopted to infer discrepancies in the source terms of transport equations. A key enabling idea is the transformation of the functional inversion procedure (which is inherently infinite-dimensional) into a finite-dimensional problem in which the distribution of the unknown function is estimated at discrete mesh locations in the computational domain. This allows for the use of an efficient adjoint-driven inversion procedure. The output of the inversion is a full-field of discrepancy that provides hitherto inaccessible modeling information. The utility of the approach is demonstrated by applying it to a number of problems including channel flow, shock-boundary layer interactions, and flows with curvature and separation. In all these cases, the posterior model correlates well with the data. Furthermore, it is shown that even if limited data (such as surface pressures) are used, the accuracy of the inferred solution is improved over the entire computational domain. The results suggest that, by directly addressing the connection between physical data and model discrepancies, the field inversion approach materially enhances the value of computational and experimental data for model improvement. The resulting information can be used by the modeler as a guiding tool to design more accurate model forms, or serve as input to machine learning algorithms to directly replace deficient modeling terms.

  18. Automatic inference of multicellular regulatory networks using informative priors.

    PubMed

    Sun, Xiaoyun; Hong, Pengyu

    2009-01-01

    To fully understand the mechanisms governing animal development, computational models and algorithms are needed to enable quantitative studies of the underlying regulatory networks. We developed a mathematical model based on dynamic Bayesian networks to model multicellular regulatory networks that govern cell differentiation processes. A machine-learning method was developed to automatically infer such a model from heterogeneous data. We show that the model inference procedure can be greatly improved by incorporating interaction data across species. The proposed approach was applied to C. elegans vulval induction to reconstruct a model capable of simulating C. elegans vulval induction under 73 different genetic conditions.

  19. Unsupervised Network Analysis of the Plastic Supraoptic Nucleus Transcriptome Predicts Caprin2 Regulatory Interactions

    PubMed Central

    Jahans-Price, Thomas; Greenwood, Michael P.; Greenwood, Mingkwan; Hoe, See-Ziau; Konopacka, Agnieszka

    2017-01-01

    Abstract The supraoptic nucleus (SON) is a group of neurons in the hypothalamus responsible for the synthesis and secretion of the peptide hormones vasopressin and oxytocin. Following physiological cues, such as dehydration, salt-loading and lactation, the SON undergoes a function related plasticity that we have previously described in the rat at the transcriptome level. Using the unsupervised graphical lasso (Glasso) algorithm, we reconstructed a putative network from 500 plastic SON genes in which genes are the nodes and the edges are the inferred interactions. The most active nodal gene identified within the network was Caprin2. Caprin2 encodes an RNA-binding protein that we have previously shown to be vital for the functioning of osmoregulatory neuroendocrine neurons in the SON of the rat hypothalamus. To test the validity of the Glasso network, we either overexpressed or knocked down Caprin2 transcripts in differentiated rat pheochromocytoma PC12 cells and showed that these manipulations had significant opposite effects on the levels of putative target mRNAs. These studies suggest that the predicative power of the Glasso algorithm within an in vivo system is accurate, and identifies biological targets that may be important to the functional plasticity of the SON. PMID:29279858

  20. Enlightening discriminative network functional modules behind Principal Component Analysis separation in differential-omic science studies

    PubMed Central

    Ciucci, Sara; Ge, Yan; Durán, Claudio; Palladini, Alessandra; Jiménez-Jiménez, Víctor; Martínez-Sánchez, Luisa María; Wang, Yuting; Sales, Susanne; Shevchenko, Andrej; Poser, Steven W.; Herbig, Maik; Otto, Oliver; Androutsellis-Theotokis, Andreas; Guck, Jochen; Gerl, Mathias J.; Cannistraci, Carlo Vittorio

    2017-01-01

    Omic science is rapidly growing and one of the most employed techniques to explore differential patterns in omic datasets is principal component analysis (PCA). However, a method to enlighten the network of omic features that mostly contribute to the sample separation obtained by PCA is missing. An alternative is to build correlation networks between univariately-selected significant omic features, but this neglects the multivariate unsupervised feature compression responsible for the PCA sample segregation. Biologists and medical researchers often prefer effective methods that offer an immediate interpretation to complicated algorithms that in principle promise an improvement but in practice are difficult to be applied and interpreted. Here we present PC-corr: a simple algorithm that associates to any PCA segregation a discriminative network of features. Such network can be inspected in search of functional modules useful in the definition of combinatorial and multiscale biomarkers from multifaceted omic data in systems and precision biomedicine. We offer proofs of PC-corr efficacy on lipidomic, metagenomic, developmental genomic, population genetic, cancer promoteromic and cancer stem-cell mechanomic data. Finally, PC-corr is a general functional network inference approach that can be easily adopted for big data exploration in computer science and analysis of complex systems in physics. PMID:28287094

  1. PlaNet: Combined Sequence and Expression Comparisons across Plant Networks Derived from Seven Species[W][OA

    PubMed Central

    Mutwil, Marek; Klie, Sebastian; Tohge, Takayuki; Giorgi, Federico M.; Wilkins, Olivia; Campbell, Malcolm M.; Fernie, Alisdair R.; Usadel, Björn; Nikoloski, Zoran; Persson, Staffan

    2011-01-01

    The model organism Arabidopsis thaliana is readily used in basic research due to resource availability and relative speed of data acquisition. A major goal is to transfer acquired knowledge from Arabidopsis to crop species. However, the identification of functional equivalents of well-characterized Arabidopsis genes in other plants is a nontrivial task. It is well documented that transcriptionally coordinated genes tend to be functionally related and that such relationships may be conserved across different species and even kingdoms. To exploit such relationships, we constructed whole-genome coexpression networks for Arabidopsis and six important plant crop species. The interactive networks, clustered using the HCCA algorithm, are provided under the banner PlaNet (http://aranet.mpimp-golm.mpg.de). We implemented a comparative network algorithm that estimates similarities between network structures. Thus, the platform can be used to swiftly infer similar coexpressed network vicinities within and across species and can predict the identity of functional homologs. We exemplify this using the PSA-D and chalcone synthase-related gene networks. Finally, we assessed how ontology terms are transcriptionally connected in the seven species and provide the corresponding MapMan term coexpression networks. The data support the contention that this platform will considerably improve transfer of knowledge generated in Arabidopsis to valuable crop species. PMID:21441431

  2. Using a genetic algorithm to optimize a water-monitoring network for accuracy and cost effectiveness

    NASA Astrophysics Data System (ADS)

    Julich, R. J.

    2004-05-01

    The purpose of this project is to determine the optimal spatial distribution of water-monitoring wells to maximize important data collection and to minimize the cost of managing the network. We have employed a genetic algorithm (GA) towards this goal. The GA uses a simple fitness measure with two parts: the first part awards a maximal score to those combinations of hydraulic head observations whose net uncertainty is closest to the value representing all observations present, thereby maximizing accuracy; the second part applies a penalty function to minimize the number of observations, thereby minimizing the overall cost of the monitoring network. We used the linear statistical inference equation to calculate standard deviations on predictions from a numerical model generated for the 501-observation Death Valley Regional Flow System as the basis for our uncertainty calculations. We have organized the results to address the following three questions: 1) what is the optimal design strategy for a genetic algorithm to optimize this problem domain; 2) what is the consistency of solutions over several optimization runs; and 3) how do these results compare to what is known about the conceptual hydrogeology? Our results indicate the genetic algorithms are a more efficient and robust method for solving this class of optimization problems than have been traditional optimization approaches.

  3. Personalized recommendation via an improved NBI algorithm and user influence model in a Microblog network

    NASA Astrophysics Data System (ADS)

    Lian, Jie; Liu, Yun; Zhang, Zhen-jiang; Gui, Chang-ni

    2013-10-01

    Bipartite network based recommendations have attracted extensive attentions in recent years. Differing from traditional object-oriented recommendations, the recommendation in a Microblog network has two crucial differences. One is high authority users or one’s special friends usually play a very active role in tweet-oriented recommendation. The other is that the object in a Microblog network corresponds to a set of tweets on same topic instead of an actual and single entity, e.g. goods or movies in traditional networks. Thus repeat recommendations of the tweets in one’s collected topics are indispensable. Therefore, this paper improves network based inference (NBI) algorithm by original link matrix and link weight on resource allocation processes. This paper finally proposes the Microblog recommendation model based on the factors of improved network based inference and user influence model. Adjusting the weights of these two factors could generate the best recommendation results in algorithm accuracy and recommendation personalization.

  4. Data Analysis with Graphical Models: Software Tools

    NASA Technical Reports Server (NTRS)

    Buntine, Wray L.

    1994-01-01

    Probabilistic graphical models (directed and undirected Markov fields, and combined in chain graphs) are used widely in expert systems, image processing and other areas as a framework for representing and reasoning with probabilities. They come with corresponding algorithms for performing probabilistic inference. This paper discusses an extension to these models by Spiegelhalter and Gilks, plates, used to graphically model the notion of a sample. This offers a graphical specification language for representing data analysis problems. When combined with general methods for statistical inference, this also offers a unifying framework for prototyping and/or generating data analysis algorithms from graphical specifications. This paper outlines the framework and then presents some basic tools for the task: a graphical version of the Pitman-Koopman Theorem for the exponential family, problem decomposition, and the calculation of exact Bayes factors. Other tools already developed, such as automatic differentiation, Gibbs sampling, and use of the EM algorithm, make this a broad basis for the generation of data analysis software.

  5. Reproducibility of graph metrics of human brain structural networks.

    PubMed

    Duda, Jeffrey T; Cook, Philip A; Gee, James C

    2014-01-01

    Recent interest in human brain connectivity has led to the application of graph theoretical analysis to human brain structural networks, in particular white matter connectivity inferred from diffusion imaging and fiber tractography. While these methods have been used to study a variety of patient populations, there has been less examination of the reproducibility of these methods. A number of tractography algorithms exist and many of these are known to be sensitive to user-selected parameters. The methods used to derive a connectivity matrix from fiber tractography output may also influence the resulting graph metrics. Here we examine how these algorithm and parameter choices influence the reproducibility of proposed graph metrics on a publicly available test-retest dataset consisting of 21 healthy adults. The dice coefficient is used to examine topological similarity of constant density subgraphs both within and between subjects. Seven graph metrics are examined here: mean clustering coefficient, characteristic path length, largest connected component size, assortativity, global efficiency, local efficiency, and rich club coefficient. The reproducibility of these network summary measures is examined using the intraclass correlation coefficient (ICC). Graph curves are created by treating the graph metrics as functions of a parameter such as graph density. Functional data analysis techniques are used to examine differences in graph measures that result from the choice of fiber tracking algorithm. The graph metrics consistently showed good levels of reproducibility as measured with ICC, with the exception of some instability at low graph density levels. The global and local efficiency measures were the most robust to the choice of fiber tracking algorithm.

  6. Bayesian Inference for Functional Dynamics Exploring in fMRI Data.

    PubMed

    Guo, Xuan; Liu, Bing; Chen, Le; Chen, Guantao; Pan, Yi; Zhang, Jing

    2016-01-01

    This paper aims to review state-of-the-art Bayesian-inference-based methods applied to functional magnetic resonance imaging (fMRI) data. Particularly, we focus on one specific long-standing challenge in the computational modeling of fMRI datasets: how to effectively explore typical functional interactions from fMRI time series and the corresponding boundaries of temporal segments. Bayesian inference is a method of statistical inference which has been shown to be a powerful tool to encode dependence relationships among the variables with uncertainty. Here we provide an introduction to a group of Bayesian-inference-based methods for fMRI data analysis, which were designed to detect magnitude or functional connectivity change points and to infer their functional interaction patterns based on corresponding temporal boundaries. We also provide a comparison of three popular Bayesian models, that is, Bayesian Magnitude Change Point Model (BMCPM), Bayesian Connectivity Change Point Model (BCCPM), and Dynamic Bayesian Variable Partition Model (DBVPM), and give a summary of their applications. We envision that more delicate Bayesian inference models will be emerging and play increasingly important roles in modeling brain functions in the years to come.

  7. Open-Universe Theory for Bayesian Inference, Decision, and Sensing (OUTBIDS)

    DTIC Science & Technology

    2014-01-01

    using a novel dynamic programming algorithm [6]. The second allows for tensor data, in which observations at a given time step exhibit...unlimited. 5 We developed a dynamical tensor model that gives far better estimation and system- identification results than the standard vectorization...inference. Third, unlike prior work that learns different pieces of the model independently, use matching between 3D models and 2D views and/or voting

  8. A new method for automated high-dimensional lesion segmentation evaluated in vascular injury and applied to the human occipital lobe.

    PubMed

    Mah, Yee-Haur; Jager, Rolf; Kennard, Christopher; Husain, Masud; Nachev, Parashkev

    2014-07-01

    Making robust inferences about the functional neuroanatomy of the brain is critically dependent on experimental techniques that examine the consequences of focal loss of brain function. Unfortunately, the use of the most comprehensive such technique-lesion-function mapping-is complicated by the need for time-consuming and subjective manual delineation of the lesions, greatly limiting the practicability of the approach. Here we exploit a recently-described general measure of statistical anomaly, zeta, to devise a fully-automated, high-dimensional algorithm for identifying the parameters of lesions within a brain image given a reference set of normal brain images. We proceed to evaluate such an algorithm in the context of diffusion-weighted imaging of the commonest type of lesion used in neuroanatomical research: ischaemic damage. Summary performance metrics exceed those previously published for diffusion-weighted imaging and approach the current gold standard-manual segmentation-sufficiently closely for fully-automated lesion-mapping studies to become a possibility. We apply the new method to 435 unselected images of patients with ischaemic stroke to derive a probabilistic map of the pattern of damage in lesions involving the occipital lobe, demonstrating the variation of anatomical resolvability of occipital areas so as to guide future lesion-function studies of the region. Copyright © 2012 Elsevier Ltd. All rights reserved.

  9. Bayesian inference of the number of factors in gene-expression analysis: application to human virus challenge studies.

    PubMed

    Chen, Bo; Chen, Minhua; Paisley, John; Zaas, Aimee; Woods, Christopher; Ginsburg, Geoffrey S; Hero, Alfred; Lucas, Joseph; Dunson, David; Carin, Lawrence

    2010-11-09

    Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis. Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV), Rhino virus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB) approval from Duke University. Comparisons are made between several alternative means of per-forming nonparametric factor analysis on these data, with comparisons as well to sparse-PCA and Penalized Matrix Decomposition (PMD), closely related non-Bayesian approaches. Applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data.

  10. Equilibrium Propagation: Bridging the Gap between Energy-Based Models and Backpropagation

    PubMed Central

    Scellier, Benjamin; Bengio, Yoshua

    2017-01-01

    We introduce Equilibrium Propagation, a learning framework for energy-based models. It involves only one kind of neural computation, performed in both the first phase (when the prediction is made) and the second phase of training (after the target or prediction error is revealed). Although this algorithm computes the gradient of an objective function just like Backpropagation, it does not need a special computation or circuit for the second phase, where errors are implicitly propagated. Equilibrium Propagation shares similarities with Contrastive Hebbian Learning and Contrastive Divergence while solving the theoretical issues of both algorithms: our algorithm computes the gradient of a well-defined objective function. Because the objective function is defined in terms of local perturbations, the second phase of Equilibrium Propagation corresponds to only nudging the prediction (fixed point or stationary distribution) toward a configuration that reduces prediction error. In the case of a recurrent multi-layer supervised network, the output units are slightly nudged toward their target in the second phase, and the perturbation introduced at the output layer propagates backward in the hidden layers. We show that the signal “back-propagated” during this second phase corresponds to the propagation of error derivatives and encodes the gradient of the objective function, when the synaptic update corresponds to a standard form of spike-timing dependent plasticity. This work makes it more plausible that a mechanism similar to Backpropagation could be implemented by brains, since leaky integrator neural computation performs both inference and error back-propagation in our model. The only local difference between the two phases is whether synaptic changes are allowed or not. We also show experimentally that multi-layer recurrently connected networks with 1, 2, and 3 hidden layers can be trained by Equilibrium Propagation on the permutation-invariant MNIST task. PMID:28522969

  11. Scalable Probabilistic Inference for Global Seismic Monitoring

    NASA Astrophysics Data System (ADS)

    Arora, N. S.; Dear, T.; Russell, S.

    2011-12-01

    We describe a probabilistic generative model for seismic events, their transmission through the earth, and their detection (or mis-detection) at seismic stations. We also describe an inference algorithm that constructs the most probable event bulletin explaining the observed set of detections. The model and inference are called NET-VISA (network processing vertically integrated seismic analysis) and is designed to replace the current automated network processing at the IDC, the SEL3 bulletin. Our results (attached table) demonstrate that NET-VISA significantly outperforms SEL3 by reducing the missed events from 30.3% down to 12.5%. The difference is even more dramatic for smaller magnitude events. NET-VISA has no difficulty in locating nuclear explosions as well. The attached figure demonstrates the location predicted by NET-VISA versus other bulletins for the second DPRK event. Further evaluation on dense regional networks demonstrates that NET-VISA finds many events missed in the LEB bulletin, which is produced by the human analysts. Large aftershock sequences, as produced by the 2004 December Sumatra earthquake and the 2011 March Tohoku earthquake, can pose a significant load for automated processing, often delaying the IDC bulletins by weeks or months. Indeed these sequences can overload the serial NET-VISA inference as well. We describe an enhancement to NET-VISA to make it multi-threaded, and hence take full advantage of the processing power of multi-core and -cpu machines. Our experiments show that the new inference algorithm is able to achieve 80% efficiency in parallel speedup.

  12. New insights into galaxy structure from GALPHAT- I. Motivation, methodology and benchmarks for Sérsic models

    NASA Astrophysics Data System (ADS)

    Yoon, Ilsang; Weinberg, Martin D.; Katz, Neal

    2011-06-01

    We introduce a new galaxy image decomposition tool, GALPHAT (GALaxy PHotometric ATtributes), which is a front-end application of the Bayesian Inference Engine (BIE), a parallel Markov chain Monte Carlo package, to provide full posterior probability distributions and reliable confidence intervals for all model parameters. The BIE relies on GALPHAT to compute the likelihood function. GALPHAT generates scale-free cumulative image tables for the desired model family with precise error control. Interpolation of this table yields accurate pixellated images with any centre, scale and inclination angle. GALPHAT then rotates the image by position angle using a Fourier shift theorem, yielding high-speed, accurate likelihood computation. We benchmark this approach using an ensemble of simulated Sérsic model galaxies over a wide range of observational conditions: the signal-to-noise ratio S/N, the ratio of galaxy size to the point spread function (PSF) and the image size, and errors in the assumed PSF; and a range of structural parameters: the half-light radius re and the Sérsic index n. We characterize the strength of parameter covariance in the Sérsic model, which increases with S/N and n, and the results strongly motivate the need for the full posterior probability distribution in galaxy morphology analyses and later inferences. The test results for simulated galaxies successfully demonstrate that, with a careful choice of Markov chain Monte Carlo algorithms and fast model image generation, GALPHAT is a powerful analysis tool for reliably inferring morphological parameters from a large ensemble of galaxies over a wide range of different observational conditions.

  13. Task-evoked brain functional magnetic susceptibility mapping by independent component analysis (χICA).

    PubMed

    Chen, Zikuan; Calhoun, Vince D

    2016-03-01

    Conventionally, independent component analysis (ICA) is performed on an fMRI magnitude dataset to analyze brain functional mapping (AICA). By solving the inverse problem of fMRI, we can reconstruct the brain magnetic susceptibility (χ) functional states. Upon the reconstructed χ dataspace, we propose an ICA-based brain functional χ mapping method (χICA) to extract task-evoked brain functional map. A complex division algorithm is applied to a timeseries of fMRI phase images to extract temporal phase changes (relative to an OFF-state snapshot). A computed inverse MRI (CIMRI) model is used to reconstruct a 4D brain χ response dataset. χICA is implemented by applying a spatial InfoMax ICA algorithm to the reconstructed 4D χ dataspace. With finger-tapping experiments on a 7T system, the χICA-extracted χ-depicted functional map is similar to the SPM-inferred functional χ map by a spatial correlation of 0.67 ± 0.05. In comparison, the AICA-extracted magnitude-depicted map is correlated with the SPM magnitude map by 0.81 ± 0.05. The understanding of the inferiority of χICA to AICA for task-evoked functional map is an ongoing research topic. For task-evoked brain functional mapping, we compare the data-driven ICA method with the task-correlated SPM method. In particular, we compare χICA with AICA for extracting task-correlated timecourses and functional maps. χICA can extract a χ-depicted task-evoked brain functional map from a reconstructed χ dataspace without the knowledge about brain hemodynamic responses. The χICA-extracted brain functional χ map reveals a bidirectional BOLD response pattern that is unavailable (or different) from AICA. Copyright © 2016 Elsevier B.V. All rights reserved.

  14. Anisotropic Lithospheric layering in the North American craton, revealed by Bayesian inversion of short and long period data

    NASA Astrophysics Data System (ADS)

    Roy, Corinna; Calo, Marco; Bodin, Thomas; Romanowicz, Barbara

    2016-04-01

    Competing hypotheses for the formation and evolution of continents are highly under debate, including the theory of underplating by hot plumes or accretion by shallow subduction in continental or arc settings. In order to support these hypotheses, documenting structural layering in the cratonic lithosphere becomes especially important. Recent studies of seismic-wave receiver function data have detected a structural boundary under continental cratons at 100-140 km depths, which is too shallow to be consistent with the lithosphere-asthenosphere boundary, as inferred from seismic tomography and other geophysical studies. This leads to the conclusion that 1) the cratonic lithosphere may be thinner than expected, contradicting tomographic and other geophysical or geochemical inferences, or 2) that the receiver function studies detect a mid-lithospheric discontinuity rather than the LAB. On the other hand, several recent studies documented significant changes in the direction of azimuthal anisotropy with depth that suggest layering in the anisotropic structure of the stable part of the North American continent. In particular, Yuan and Romanowicz (2010) combined long period surface wave and overtone data with core refracted shear wave (SKS) splitting measurements in a joint tomographic inversion. A question that arises is whether the anisotropic layering observed coincides with that obtained from receiver function studies. To address this question, we use a trans-dimensional Markov-chain Monte Carlo (MCMC) algorithm to generate probabilistic 1D radially and azimuthal anisotropic shear wave velocity profiles for selected stations in North America. In the algorithm we jointly invert short period (Ps Receiver Functions, surface wave dispersion for Love and Rayleigh waves) and long period data (SKS waveforms). By including three different data types, which sample different volumes of the Earth and have different sensitivities to 
structure, we overcome the problem of incompatible interpretations of models provided by only one data set. The resulting 1D profiles include both isotropic and anisotropic discontinuities in the upper mantle (above 350 km depth). The huge advantage of our procedure is the avoidance of any intermediate processing steps such as numerical deconvolution or the calculation of splitting parameters, which can be very sensitive to noise. Additionally, the number of layers, as well as the data noise and the presence of anisotropy are treated as unknowns in the transdimensional Monte Carlo Markov chain algorithm. We recently demonstrated the power of this approach in the case of two stations located in different tectonic settings (Bodin et al., 2015, submitted). Here we extend this approach to a broader range of settings within the north American continent.

  15. Inferring gene ontologies from pairwise similarity data

    PubMed Central

    Kramer, Michael; Dutkowski, Janusz; Yu, Michael; Bafna, Vineet; Ideker, Trey

    2014-01-01

    Motivation: While the manually curated Gene Ontology (GO) is widely used, inferring a GO directly from -omics data is a compelling new problem. Recognizing that ontologies are a directed acyclic graph (DAG) of terms and hierarchical relations, algorithms are needed that: analyze a full matrix of gene–gene pairwise similarities from -omics data;infer true hierarchical structure in these data rather than enforcing hierarchy as a computational artifact; andrespect biological pleiotropy, by which a term in the hierarchy can relate to multiple higher level terms. Methods addressing these requirements are just beginning to emerge—none has been evaluated for GO inference. Methods: We consider two algorithms [Clique Extracted Ontology (CliXO), LocalFitness] that uniquely satisfy these requirements, compared with methods including standard clustering. CliXO is a new approach that finds maximal cliques in a network induced by progressive thresholding of a similarity matrix. We evaluate each method’s ability to reconstruct the GO biological process ontology from a similarity matrix based on (a) semantic similarities for GO itself or (b) three -omics datasets for yeast. Results: For task (a) using semantic similarity, CliXO accurately reconstructs GO (>99% precision, recall) and outperforms other approaches (<20% precision, <20% recall). For task (b) using -omics data, CliXO outperforms other methods using two -omics datasets and achieves ∼30% precision and recall using YeastNet v3, similar to an earlier approach (Network Extracted Ontology) and better than LocalFitness or standard clustering (20–25% precision, recall). Conclusion: This study provides algorithmic foundation for building gene ontologies by capturing hierarchical and pleiotropic structure embedded in biomolecular data. Contact: tideker@ucsd.edu PMID:24932003

  16. Inferring microRNA regulation of mRNA with partially ordered samples of paired expression data and exogenous prediction algorithms.

    PubMed

    Godsey, Brian; Heiser, Diane; Civin, Curt

    2012-01-01

    MicroRNAs (miRs) are known to play an important role in mRNA regulation, often by binding to complementary sequences in "target" mRNAs. Recently, several methods have been developed by which existing sequence-based target predictions can be combined with miR and mRNA expression data to infer true miR-mRNA targeting relationships. It has been shown that the combination of these two approaches gives more reliable results than either by itself. While a few such algorithms give excellent results, none fully addresses expression data sets with a natural ordering of the samples. If the samples in an experiment can be ordered or partially ordered by their expected similarity to one another, such as for time-series or studies of development processes, stages, or types, (e.g. cell type, disease, growth, aging), there are unique opportunities to infer miR-mRNA interactions that may be specific to the underlying processes, and existing methods do not exploit this. We propose an algorithm which specifically addresses [partially] ordered expression data and takes advantage of sample similarities based on the ordering structure. This is done within a Bayesian framework which specifies posterior distributions and therefore statistical significance for each model parameter and latent variable. We apply our model to a previously published expression data set of paired miR and mRNA arrays in five partially ordered conditions, with biological replicates, related to multiple myeloma, and we show how considering potential orderings can improve the inference of miR-mRNA interactions, as measured by existing knowledge about the involved transcripts.

  17. Numerical Differentiation Methods for Computing Error Covariance Matrices in Item Response Theory Modeling: An Evaluation and a New Proposal

    ERIC Educational Resources Information Center

    Tian, Wei; Cai, Li; Thissen, David; Xin, Tao

    2013-01-01

    In item response theory (IRT) modeling, the item parameter error covariance matrix plays a critical role in statistical inference procedures. When item parameters are estimated using the EM algorithm, the parameter error covariance matrix is not an automatic by-product of item calibration. Cai proposed the use of Supplemented EM algorithm for…

  18. Space-Time Joint Interference Cancellation Using Fuzzy-Inference-Based Adaptive Filtering Techniques in Frequency-Selective Multipath Channels

    NASA Astrophysics Data System (ADS)

    Hu, Chia-Chang; Lin, Hsuan-Yu; Chen, Yu-Fan; Wen, Jyh-Horng

    2006-12-01

    An adaptive minimum mean-square error (MMSE) array receiver based on the fuzzy-logic recursive least-squares (RLS) algorithm is developed for asynchronous DS-CDMA interference suppression in the presence of frequency-selective multipath fading. This receiver employs a fuzzy-logic control mechanism to perform the nonlinear mapping of the squared error and squared error variation, denoted by ([InlineEquation not available: see fulltext.],[InlineEquation not available: see fulltext.]), into a forgetting factor[InlineEquation not available: see fulltext.]. For the real-time applicability, a computationally efficient version of the proposed receiver is derived based on the least-mean-square (LMS) algorithm using the fuzzy-inference-controlled step-size[InlineEquation not available: see fulltext.]. This receiver is capable of providing both fast convergence/tracking capability as well as small steady-state misadjustment as compared with conventional LMS- and RLS-based MMSE DS-CDMA receivers. Simulations show that the fuzzy-logic LMS and RLS algorithms outperform, respectively, other variable step-size LMS (VSS-LMS) and variable forgetting factor RLS (VFF-RLS) algorithms at least 3 dB and 1.5 dB in bit-error-rate (BER) for multipath fading channels.

  19. On the Accuracy of Language Trees

    PubMed Central

    Pompei, Simone; Loreto, Vittorio; Tria, Francesca

    2011-01-01

    Historical linguistics aims at inferring the most likely language phylogenetic tree starting from information concerning the evolutionary relatedness of languages. The available information are typically lists of homologous (lexical, phonological, syntactic) features or characters for many different languages: a set of parallel corpora whose compilation represents a paramount achievement in linguistics. From this perspective the reconstruction of language trees is an example of inverse problems: starting from present, incomplete and often noisy, information, one aims at inferring the most likely past evolutionary history. A fundamental issue in inverse problems is the evaluation of the inference made. A standard way of dealing with this question is to generate data with artificial models in order to have full access to the evolutionary process one is going to infer. This procedure presents an intrinsic limitation: when dealing with real data sets, one typically does not know which model of evolution is the most suitable for them. A possible way out is to compare algorithmic inference with expert classifications. This is the point of view we take here by conducting a thorough survey of the accuracy of reconstruction methods as compared with the Ethnologue expert classifications. We focus in particular on state-of-the-art distance-based methods for phylogeny reconstruction using worldwide linguistic databases. In order to assess the accuracy of the inferred trees we introduce and characterize two generalizations of standard definitions of distances between trees. Based on these scores we quantify the relative performances of the distance-based algorithms considered. Further we quantify how the completeness and the coverage of the available databases affect the accuracy of the reconstruction. Finally we draw some conclusions about where the accuracy of the reconstructions in historical linguistics stands and about the leading directions to improve it. PMID:21674034

  20. Bayesian Parameter Inference and Model Selection by Population Annealing in Systems Biology

    PubMed Central

    Murakami, Yohei

    2014-01-01

    Parameter inference and model selection are very important for mathematical modeling in systems biology. Bayesian statistics can be used to conduct both parameter inference and model selection. Especially, the framework named approximate Bayesian computation is often used for parameter inference and model selection in systems biology. However, Monte Carlo methods needs to be used to compute Bayesian posterior distributions. In addition, the posterior distributions of parameters are sometimes almost uniform or very similar to their prior distributions. In such cases, it is difficult to choose one specific value of parameter with high credibility as the representative value of the distribution. To overcome the problems, we introduced one of the population Monte Carlo algorithms, population annealing. Although population annealing is usually used in statistical mechanics, we showed that population annealing can be used to compute Bayesian posterior distributions in the approximate Bayesian computation framework. To deal with un-identifiability of the representative values of parameters, we proposed to run the simulations with the parameter ensemble sampled from the posterior distribution, named “posterior parameter ensemble”. We showed that population annealing is an efficient and convenient algorithm to generate posterior parameter ensemble. We also showed that the simulations with the posterior parameter ensemble can, not only reproduce the data used for parameter inference, but also capture and predict the data which was not used for parameter inference. Lastly, we introduced the marginal likelihood in the approximate Bayesian computation framework for Bayesian model selection. We showed that population annealing enables us to compute the marginal likelihood in the approximate Bayesian computation framework and conduct model selection depending on the Bayes factor. PMID:25089832

  1. Exact Bayesian Inference for Phylogenetic Birth-Death Models.

    PubMed

    Parag, K V; Pybus, O G

    2018-04-26

    Inferring the rates of change of a population from a reconstructed phylogeny of genetic sequences is a central problem in macro-evolutionary biology, epidemiology, and many other disciplines. A popular solution involves estimating the parameters of a birth-death process (BDP), which links the shape of the phylogeny to its birth and death rates. Modern BDP estimators rely on random Markov chain Monte Carlo (MCMC) sampling to infer these rates. Such methods, while powerful and scalable, cannot be guaranteed to converge, leading to results that may be hard to replicate or difficult to validate. We present a conceptually and computationally different parametric BDP inference approach using flexible and easy to implement Snyder filter (SF) algorithms. This method is deterministic so its results are provable, guaranteed, and reproducible. We validate the SF on constant rate BDPs and find that it solves BDP likelihoods known to produce robust estimates. We then examine more complex BDPs with time-varying rates. Our estimates compare well with a recently developed parametric MCMC inference method. Lastly, we performmodel selection on an empirical Agamid species phylogeny, obtaining results consistent with the literature. The SF makes no approximations, beyond those required for parameter quantisation and numerical integration, and directly computes the posterior distribution of model parameters. It is a promising alternative inference algorithm that may serve either as a standalone Bayesian estimator or as a useful diagnostic reference for validating more involved MCMC strategies. The Snyder filter is implemented in Matlab and the time-varying BDP models are simulated in R. The source code and data are freely available at https://github.com/kpzoo/snyder-birth-death-code. kris.parag@zoo.ox.ac.uk. Supplementary material is available at Bioinformatics online.

  2. Bayesian networks in neuroscience: a survey.

    PubMed

    Bielza, Concha; Larrañaga, Pedro

    2014-01-01

    Bayesian networks are a type of probabilistic graphical models lie at the intersection between statistics and machine learning. They have been shown to be powerful tools to encode dependence relationships among the variables of a domain under uncertainty. Thanks to their generality, Bayesian networks can accommodate continuous and discrete variables, as well as temporal processes. In this paper we review Bayesian networks and how they can be learned automatically from data by means of structure learning algorithms. Also, we examine how a user can take advantage of these networks for reasoning by exact or approximate inference algorithms that propagate the given evidence through the graphical structure. Despite their applicability in many fields, they have been little used in neuroscience, where they have focused on specific problems, like functional connectivity analysis from neuroimaging data. Here we survey key research in neuroscience where Bayesian networks have been used with different aims: discover associations between variables, perform probabilistic reasoning over the model, and classify new observations with and without supervision. The networks are learned from data of any kind-morphological, electrophysiological, -omics and neuroimaging-, thereby broadening the scope-molecular, cellular, structural, functional, cognitive and medical- of the brain aspects to be studied.

  3. Hidden state prediction: a modification of classic ancestral state reconstruction algorithms helps unravel complex symbioses

    PubMed Central

    Zaneveld, Jesse R. R.; Thurber, Rebecca L. V.

    2014-01-01

    Complex symbioses between animal or plant hosts and their associated microbiotas can involve thousands of species and millions of genes. Because of the number of interacting partners, it is often impractical to study all organisms or genes in these host-microbe symbioses individually. Yet new phylogenetic predictive methods can use the wealth of accumulated data on diverse model organisms to make inferences into the properties of less well-studied species and gene families. Predictive functional profiling methods use evolutionary models based on the properties of studied relatives to put bounds on the likely characteristics of an organism or gene that has not yet been studied in detail. These techniques have been applied to predict diverse features of host-associated microbial communities ranging from the enzymatic function of uncharacterized genes to the gene content of uncultured microorganisms. We consider these phylogenetically informed predictive techniques from disparate fields as examples of a general class of algorithms for Hidden State Prediction (HSP), and argue that HSP methods have broad value in predicting organismal traits in a variety of contexts, including the study of complex host-microbe symbioses. PMID:25202302

  4. Bayesian networks in neuroscience: a survey

    PubMed Central

    Bielza, Concha; Larrañaga, Pedro

    2014-01-01

    Bayesian networks are a type of probabilistic graphical models lie at the intersection between statistics and machine learning. They have been shown to be powerful tools to encode dependence relationships among the variables of a domain under uncertainty. Thanks to their generality, Bayesian networks can accommodate continuous and discrete variables, as well as temporal processes. In this paper we review Bayesian networks and how they can be learned automatically from data by means of structure learning algorithms. Also, we examine how a user can take advantage of these networks for reasoning by exact or approximate inference algorithms that propagate the given evidence through the graphical structure. Despite their applicability in many fields, they have been little used in neuroscience, where they have focused on specific problems, like functional connectivity analysis from neuroimaging data. Here we survey key research in neuroscience where Bayesian networks have been used with different aims: discover associations between variables, perform probabilistic reasoning over the model, and classify new observations with and without supervision. The networks are learned from data of any kind–morphological, electrophysiological, -omics and neuroimaging–, thereby broadening the scope–molecular, cellular, structural, functional, cognitive and medical– of the brain aspects to be studied. PMID:25360109

  5. IKAP: A heuristic framework for inference of kinase activities from Phosphoproteomics data.

    PubMed

    Mischnik, Marcel; Sacco, Francesca; Cox, Jürgen; Schneider, Hans-Christoph; Schäfer, Matthias; Hendlich, Manfred; Crowther, Daniel; Mann, Matthias; Klabunde, Thomas

    2016-02-01

    Phosphoproteomics measurements are widely applied in cellular biology to detect changes in signalling dynamics. However, due to the inherent complexity of phosphorylation patterns and the lack of knowledge on how phosphorylations are related to functions, it is often not possible to directly deduce protein activities from those measurements. Here, we present a heuristic machine learning algorithm that infers the activities of kinases from Phosphoproteomics data using kinase-target information from the PhosphoSitePlus database. By comparing the estimated kinase activity profiles to the measured phosphosite profiles, it is furthermore possible to derive the kinases that are most likely to phosphorylate the respective phosphosite. We apply our approach to published datasets of the human cell cycle generated from HeLaS3 cells, and insulin signalling dynamics in mouse hepatocytes. In the first case, we estimate the activities of 118 at six cell cycle stages and derive 94 new kinase-phosphosite links that can be validated through either database or motif information. In the second case, the activities of 143 kinases at eight time points are estimated and 49 new kinase-target links are derived. The algorithm is implemented in Matlab and be downloaded from github. It makes use of the Optimization and Statistics toolboxes. https://github.com/marcel-mischnik/IKAP.git. marcel.mischnik@gmail.com Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  6. Limited utility of residue masking for positive-selection inference.

    PubMed

    Spielman, Stephanie J; Dawson, Eric T; Wilke, Claus O

    2014-09-01

    Errors in multiple sequence alignments (MSAs) can reduce accuracy in positive-selection inference. Therefore, it has been suggested to filter MSAs before conducting further analyses. One widely used filter, Guidance, allows users to remove MSA positions aligned with low confidence. However, Guidance's utility in positive-selection inference has been disputed in the literature. We have conducted an extensive simulation-based study to characterize fully how Guidance impacts positive-selection inference, specifically for protein-coding sequences of realistic divergence levels. We also investigated whether novel scoring algorithms, which phylogenetically corrected confidence scores, and a new gap-penalization score-normalization scheme improved Guidance's performance. We found that no filter, including original Guidance, consistently benefitted positive-selection inferences. Moreover, all improvements detected were exceedingly minimal, and in certain circumstances, Guidance-based filters worsened inferences. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  7. BagReg: Protein inference through machine learning.

    PubMed

    Zhao, Can; Liu, Dao; Teng, Ben; He, Zengyou

    2015-08-01

    Protein inference from the identified peptides is of primary importance in the shotgun proteomics. The target of protein inference is to identify whether each candidate protein is truly present in the sample. To date, many computational methods have been proposed to solve this problem. However, there is still no method that can fully utilize the information hidden in the input data. In this article, we propose a learning-based method named BagReg for protein inference. The method firstly artificially extracts five features from the input data, and then chooses each feature as the class feature to separately build models to predict the presence probabilities of proteins. Finally, the weak results from five prediction models are aggregated to obtain the final result. We test our method on six public available data sets. The experimental results show that our method is superior to the state-of-the-art protein inference algorithms. Copyright © 2015 Elsevier Ltd. All rights reserved.

  8. Inferring Stop-Locations from WiFi.

    PubMed

    Wind, David Kofoed; Sapiezynski, Piotr; Furman, Magdalena Anna; Lehmann, Sune

    2016-01-01

    Human mobility patterns are inherently complex. In terms of understanding these patterns, the process of converting raw data into series of stop-locations and transitions is an important first step which greatly reduces the volume of data, thus simplifying the subsequent analyses. Previous research into the mobility of individuals has focused on inferring 'stop locations' (places of stationarity) from GPS or CDR data, or on detection of state (static/active). In this paper we bridge the gap between the two approaches: we introduce methods for detecting both mobility state and stop-locations. In addition, our methods are based exclusively on WiFi data. We study two months of WiFi data collected every two minutes by a smartphone, and infer stop-locations in the form of labelled time-intervals. For this purpose, we investigate two algorithms, both of which scale to large datasets: a greedy approach to select the most important routers and one which uses a density-based clustering algorithm to detect router fingerprints. We validate our results using participants' GPS data as well as ground truth data collected during a two month period.

  9. Inferring Stop-Locations from WiFi

    PubMed Central

    Wind, David Kofoed; Sapiezynski, Piotr; Furman, Magdalena Anna; Lehmann, Sune

    2016-01-01

    Human mobility patterns are inherently complex. In terms of understanding these patterns, the process of converting raw data into series of stop-locations and transitions is an important first step which greatly reduces the volume of data, thus simplifying the subsequent analyses. Previous research into the mobility of individuals has focused on inferring ‘stop locations’ (places of stationarity) from GPS or CDR data, or on detection of state (static/active). In this paper we bridge the gap between the two approaches: we introduce methods for detecting both mobility state and stop-locations. In addition, our methods are based exclusively on WiFi data. We study two months of WiFi data collected every two minutes by a smartphone, and infer stop-locations in the form of labelled time-intervals. For this purpose, we investigate two algorithms, both of which scale to large datasets: a greedy approach to select the most important routers and one which uses a density-based clustering algorithm to detect router fingerprints. We validate our results using participants’ GPS data as well as ground truth data collected during a two month period. PMID:26901663

  10. Co-Inheritance Analysis within the Domains of Life Substantially Improves Network Inference by Phylogenetic Profiling

    PubMed Central

    Shin, Junha; Lee, Insuk

    2015-01-01

    Phylogenetic profiling, a network inference method based on gene inheritance profiles, has been widely used to construct functional gene networks in microbes. However, its utility for network inference in higher eukaryotes has been limited. An improved algorithm with an in-depth understanding of pathway evolution may overcome this limitation. In this study, we investigated the effects of taxonomic structures on co-inheritance analysis using 2,144 reference species in four query species: Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana, and Homo sapiens. We observed three clusters of reference species based on a principal component analysis of the phylogenetic profiles, which correspond to the three domains of life—Archaea, Bacteria, and Eukaryota—suggesting that pathways inherit primarily within specific domains or lower-ranked taxonomic groups during speciation. Hence, the co-inheritance pattern within a taxonomic group may be eroded by confounding inheritance patterns from irrelevant taxonomic groups. We demonstrated that co-inheritance analysis within domains substantially improved network inference not only in microbe species but also in the higher eukaryotes, including humans. Although we observed two sub-domain clusters of reference species within Eukaryota, co-inheritance analysis within these sub-domain taxonomic groups only marginally improved network inference. Therefore, we conclude that co-inheritance analysis within domains is the optimal approach to network inference with the given reference species. The construction of a series of human gene networks with increasing sample sizes of the reference species for each domain revealed that the size of the high-accuracy networks increased as additional reference species genomes were included, suggesting that within-domain co-inheritance analysis will continue to expand human gene networks as genomes of additional species are sequenced. Taken together, we propose that co-inheritance analysis within the domains of life will greatly potentiate the use of the expected onslaught of sequenced genomes in the study of molecular pathways in higher eukaryotes. PMID:26394049

  11. Scaling Patterns of Natural Urban Places as a Rule for Enhancing Their Urban Functionality Using Trajectory Data

    NASA Astrophysics Data System (ADS)

    Jia, T.; Yu, X.

    2018-04-01

    With the availability of massive trajectory data, it is highly valuable to reveal their activity information for many domains such as understanding the functionality of urban regions. This article utilizes the scaling patterns of human activities to enhance functional distribution of natural urban places. Specifically, we proposed a temporal city clustering algorithm to aggregate the stopping locations into natural urban places, which are reported to follow remarkable power law distributions of sizes and obey a universal law of economy of scale on human interactions with urban infrastructure. Besides, we proposed a novel Bayesian inference model with damping factor to estimate the most likely POI type associated with a stopping location. Our results suggest that hot natural urban places could be effectively identified from their scaling patterns and their functionality can be very well enhanced. For instance, natural urban places containing airport or railway station can be highly stressed by accumulating the massive types of human activities.

  12. Mapping higher-order relations between brain structure and function with embedded vector representations of connectomes.

    PubMed

    Rosenthal, Gideon; Váša, František; Griffa, Alessandra; Hagmann, Patric; Amico, Enrico; Goñi, Joaquín; Avidan, Galia; Sporns, Olaf

    2018-06-05

    Connectomics generates comprehensive maps of brain networks, represented as nodes and their pairwise connections. The functional roles of nodes are defined by their direct and indirect connectivity with the rest of the network. However, the network context is not directly accessible at the level of individual nodes. Similar problems in language processing have been addressed with algorithms such as word2vec that create embeddings of words and their relations in a meaningful low-dimensional vector space. Here we apply this approach to create embedded vector representations of brain networks or connectome embeddings (CE). CE can characterize correspondence relations among brain regions, and can be used to infer links that are lacking from the original structural diffusion imaging, e.g., inter-hemispheric homotopic connections. Moreover, we construct predictive deep models of functional and structural connectivity, and simulate network-wide lesion effects using the face processing system as our application domain. We suggest that CE offers a novel approach to revealing relations between connectome structure and function.

  13. Modulation transfer function estimation of optical lens system by adaptive neuro-fuzzy methodology

    NASA Astrophysics Data System (ADS)

    Petković, Dalibor; Shamshirband, Shahaboddin; Pavlović, Nenad T.; Anuar, Nor Badrul; Kiah, Miss Laiha Mat

    2014-07-01

    The quantitative assessment of image quality is an important consideration in any type of imaging system. The modulation transfer function (MTF) is a graphical description of the sharpness and contrast of an imaging system or of its individual components. The MTF is also known and spatial frequency response. The MTF curve has different meanings according to the corresponding frequency. The MTF of an optical system specifies the contrast transmitted by the system as a function of image size, and is determined by the inherent optical properties of the system. In this study, the adaptive neuro-fuzzy (ANFIS) estimator is designed and adapted to estimate MTF value of the actual optical system. Neural network in ANFIS adjusts parameters of membership function in the fuzzy logic of the fuzzy inference system. The back propagation learning algorithm is used for training this network. This intelligent estimator is implemented using Matlab/Simulink and the performances are investigated. The simulation results presented in this paper show the effectiveness of the developed method.

  14. Bayesian inference and decision theory - A framework for decision making in natural resource management

    USGS Publications Warehouse

    Dorazio, R.M.; Johnson, F.A.

    2003-01-01

    Bayesian inference and decision theory may be used in the solution of relatively complex problems of natural resource management, owing to recent advances in statistical theory and computing. In particular, Markov chain Monte Carlo algorithms provide a computational framework for fitting models of adequate complexity and for evaluating the expected consequences of alternative management actions. We illustrate these features using an example based on management of waterfowl habitat.

  15. Machine Learning for Information Extraction in Informal Domains

    DTIC Science & Technology

    1998-11-01

    Bayes algorithm (BayeslDF), a hybrid of BayeslDF and the grammatical inference algo- rithm Alergia (BayesGI), and a relational learner (SRV). It...State-Merging Methods 59 4.1.3 Alergia 61 4.2 Inferring Transducers 62 4.3 Experiments 66 4.4 Discussion 72 Relational Learning for...65 4.5 Precision/recall results for Alergia and BayesG I on the speaker field, with the alphabet transducer produced using m-estimates, at various

  16. Mining User Dwell Time for Personalized Web Search Re-Ranking

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xu, Songhua; Jiang, Hao; Lau, Francis

    We propose a personalized re-ranking algorithm through mining user dwell times derived from a user's previously online reading or browsing activities. We acquire document level user dwell times via a customized web browser, from which we then infer conceptword level user dwell times in order to understand a user's personal interest. According to the estimated concept word level user dwell times, our algorithm can estimate a user's potential dwell time over a new document, based on which personalized webpage re-ranking can be carried out. We compare the rankings produced by our algorithm with rankings generated by popular commercial search enginesmore » and a recently proposed personalized ranking algorithm. The results clearly show the superiority of our method. In this paper, we propose a new personalized webpage ranking algorithmthrough mining dwell times of a user. We introduce a quantitative model to derive concept word level user dwell times from the observed document level user dwell times. Once we have inferred a user's interest over the set of concept words the user has encountered in previous readings, we can then predict the user's potential dwell time over a new document. Such predicted user dwell time allows us to carry out personalized webpage re-ranking. To explore the effectiveness of our algorithm, we measured the performance of our algorithm under two conditions - one with a relatively limited amount of user dwell time data and the other with a doubled amount. Both evaluation cases put our algorithm for generating personalized webpage rankings to satisfy a user's personal preference ahead of those by Google, Yahoo!, and Bing, as well as a recent personalized webpage ranking algorithm.« less

  17. Predicting drug-target interactions by dual-network integrated logistic matrix factorization

    NASA Astrophysics Data System (ADS)

    Hao, Ming; Bryant, Stephen H.; Wang, Yanli

    2017-01-01

    In this work, we propose a dual-network integrated logistic matrix factorization (DNILMF) algorithm to predict potential drug-target interactions (DTI). The prediction procedure consists of four steps: (1) inferring new drug/target profiles and constructing profile kernel matrix; (2) diffusing drug profile kernel matrix with drug structure kernel matrix; (3) diffusing target profile kernel matrix with target sequence kernel matrix; and (4) building DNILMF model and smoothing new drug/target predictions based on their neighbors. We compare our algorithm with the state-of-the-art method based on the benchmark dataset. Results indicate that the DNILMF algorithm outperforms the previously reported approaches in terms of AUPR (area under precision-recall curve) and AUC (area under curve of receiver operating characteristic) based on the 5 trials of 10-fold cross-validation. We conclude that the performance improvement depends on not only the proposed objective function, but also the used nonlinear diffusion technique which is important but under studied in the DTI prediction field. In addition, we also compile a new DTI dataset for increasing the diversity of currently available benchmark datasets. The top prediction results for the new dataset are confirmed by experimental studies or supported by other computational research.

  18. Analysing the Costs of Integrated Care: A Case on Model Selection for Chronic Care Purposes

    PubMed Central

    Sánchez-Pérez, Inma; Ibern, Pere; Coderch, Jordi; Inoriza, José María

    2016-01-01

    Background: The objective of this study is to investigate whether the algorithm proposed by Manning and Mullahy, a consolidated health economics procedure, can also be used to estimate individual costs for different groups of healthcare services in the context of integrated care. Methods: A cross-sectional study focused on the population of the Baix Empordà (Catalonia-Spain) for the year 2012 (N = 92,498 individuals). A set of individual cost models as a function of sex, age and morbidity burden were adjusted and individual healthcare costs were calculated using a retrospective full-costing system. The individual morbidity burden was inferred using the Clinical Risk Groups (CRG) patient classification system. Results: Depending on the characteristics of the data, and according to the algorithm criteria, the choice of model was a linear model on the log of costs or a generalized linear model with a log link. We checked for goodness of fit, accuracy, linear structure and heteroscedasticity for the models obtained. Conclusion: The proposed algorithm identified a set of suitable cost models for the distinct groups of services integrated care entails. The individual morbidity burden was found to be indispensable when allocating appropriate resources to targeted individuals. PMID:28316542

  19. Inferring anatomical therapeutic chemical (ATC) class of drugs using shortest path and random walk with restart algorithms.

    PubMed

    Chen, Lei; Liu, Tao; Zhao, Xian

    2018-06-01

    The anatomical therapeutic chemical (ATC) classification system is a widely accepted drug classification scheme. This system comprises five levels and includes several classes in each level. Drugs are classified into classes according to their therapeutic effects and characteristics. The first level includes 14 main classes. In this study, we proposed two network-based models to infer novel potential chemicals deemed to belong in the first level of ATC classification. To build these models, two large chemical networks were constructed using the chemical-chemical interaction information retrieved from the Search Tool for Interactions of Chemicals (STITCH). Two classic network algorithms, shortest path (SP) and random walk with restart (RWR) algorithms, were executed on the corresponding network to mine novel chemicals for each ATC class using the validated drugs in a class as seed nodes. Then, the obtained chemicals yielded by these two algorithms were further evaluated by a permutation test and an association test. The former can exclude chemicals produced by the structure of the network, i.e., false positive discoveries. By contrast, the latter identifies the most important chemicals that have strong associations with the ATC class. Comparisons indicated that the two models can provide quite dissimilar results, suggesting that the results yielded by one model can be essential supplements for those obtained by the other model. In addition, several representative inferred chemicals were analyzed to confirm the reliability of the results generated by the two models. This article is part of a Special Issue entitled: Accelerating Precision Medicine through Genetic and Genomic Big Data Analysis edited by Yudong Cai & Tao Huang. Copyright © 2017 Elsevier B.V. All rights reserved.

  20. Estimation of white matter fiber parameters from compressed multiresolution diffusion MRI using sparse Bayesian learning.

    PubMed

    Pisharady, Pramod Kumar; Sotiropoulos, Stamatios N; Duarte-Carvajalino, Julio M; Sapiro, Guillermo; Lenglet, Christophe

    2018-02-15

    We present a sparse Bayesian unmixing algorithm BusineX: Bayesian Unmixing for Sparse Inference-based Estimation of Fiber Crossings (X), for estimation of white matter fiber parameters from compressed (under-sampled) diffusion MRI (dMRI) data. BusineX combines compressive sensing with linear unmixing and introduces sparsity to the previously proposed multiresolution data fusion algorithm RubiX, resulting in a method for improved reconstruction, especially from data with lower number of diffusion gradients. We formulate the estimation of fiber parameters as a sparse signal recovery problem and propose a linear unmixing framework with sparse Bayesian learning for the recovery of sparse signals, the fiber orientations and volume fractions. The data is modeled using a parametric spherical deconvolution approach and represented using a dictionary created with the exponential decay components along different possible diffusion directions. Volume fractions of fibers along these directions define the dictionary weights. The proposed sparse inference, which is based on the dictionary representation, considers the sparsity of fiber populations and exploits the spatial redundancy in data representation, thereby facilitating inference from under-sampled q-space. The algorithm improves parameter estimation from dMRI through data-dependent local learning of hyperparameters, at each voxel and for each possible fiber orientation, that moderate the strength of priors governing the parameter variances. Experimental results on synthetic and in-vivo data show improved accuracy with a lower uncertainty in fiber parameter estimates. BusineX resolves a higher number of second and third fiber crossings. For under-sampled data, the algorithm is also shown to produce more reliable estimates. Copyright © 2017 Elsevier Inc. All rights reserved.

  1. Classification-based reasoning

    NASA Technical Reports Server (NTRS)

    Gomez, Fernando; Segami, Carlos

    1991-01-01

    A representation formalism for N-ary relations, quantification, and definition of concepts is described. Three types of conditions are associated with the concepts: (1) necessary and sufficient properties, (2) contingent properties, and (3) necessary properties. Also explained is how complex chains of inferences can be accomplished by representing existentially quantified sentences, and concepts denoted by restrictive relative clauses as classification hierarchies. The representation structures that make possible the inferences are explained first, followed by the reasoning algorithms that draw the inferences from the knowledge structures. All the ideas explained have been implemented and are part of the information retrieval component of a program called Snowy. An appendix contains a brief session with the program.

  2. A Scalable Approach to Probabilistic Latent Space Inference of Large-Scale Networks

    PubMed Central

    Yin, Junming; Ho, Qirong; Xing, Eric P.

    2014-01-01

    We propose a scalable approach for making inference about latent spaces of large networks. With a succinct representation of networks as a bag of triangular motifs, a parsimonious statistical model, and an efficient stochastic variational inference algorithm, we are able to analyze real networks with over a million vertices and hundreds of latent roles on a single machine in a matter of hours, a setting that is out of reach for many existing methods. When compared to the state-of-the-art probabilistic approaches, our method is several orders of magnitude faster, with competitive or improved accuracy for latent space recovery and link prediction. PMID:25400487

  3. Implementing and analyzing the multi-threaded LP-inference

    NASA Astrophysics Data System (ADS)

    Bolotova, S. Yu; Trofimenko, E. V.; Leschinskaya, M. V.

    2018-03-01

    The logical production equations provide new possibilities for the backward inference optimization in intelligent production-type systems. The strategy of a relevant backward inference is aimed at minimization of a number of queries to external information source (either to a database or an interactive user). The idea of the method is based on the computing of initial preimages set and searching for the true preimage. The execution of each stage can be organized independently and in parallel and the actual work at a given stage can also be distributed between parallel computers. This paper is devoted to the parallel algorithms of the relevant inference based on the advanced scheme of the parallel computations “pipeline” which allows to increase the degree of parallelism. The author also provides some details of the LP-structures implementation.

  4. Evaluation of a fault tolerant system for an integrated avionics sensor configuration with TSRV flight data

    NASA Technical Reports Server (NTRS)

    Caglayan, A. K.; Godiwala, P. M.

    1985-01-01

    The performance analysis results of a fault inferring nonlinear detection system (FINDS) using sensor flight data for the NASA ATOPS B-737 aircraft in a Microwave Landing System (MLS) environment is presented. First, a statistical analysis of the flight recorded sensor data was made in order to determine the characteristics of sensor inaccuracies. Next, modifications were made to the detection and decision functions in the FINDS algorithm in order to improve false alarm and failure detection performance under real modelling errors present in the flight data. Finally, the failure detection and false alarm performance of the FINDS algorithm were analyzed by injecting bias failures into fourteen sensor outputs over six repetitive runs of the five minute flight data. In general, the detection speed, failure level estimation, and false alarm performance showed a marked improvement over the previously reported simulation runs. In agreement with earlier results, detection speed was faster for filter measurement sensors soon as MLS than for filter input sensors such as flight control accelerometers.

  5. Evolving RBF neural networks for adaptive soft-sensor design.

    PubMed

    Alexandridis, Alex

    2013-12-01

    This work presents an adaptive framework for building soft-sensors based on radial basis function (RBF) neural network models. The adaptive fuzzy means algorithm is utilized in order to evolve an RBF network, which approximates the unknown system based on input-output data from it. The methodology gradually builds the RBF network model, based on two separate levels of adaptation: On the first level, the structure of the hidden layer is modified by adding or deleting RBF centers, while on the second level, the synaptic weights are adjusted with the recursive least squares with exponential forgetting algorithm. The proposed approach is tested on two different systems, namely a simulated nonlinear DC Motor and a real industrial reactor. The results show that the produced soft-sensors can be successfully applied to model the two nonlinear systems. A comparison with two different adaptive modeling techniques, namely a dynamic evolving neural-fuzzy inference system (DENFIS) and neural networks trained with online backpropagation, highlights the advantages of the proposed methodology.

  6. The ConSurf-DB: pre-calculated evolutionary conservation profiles of protein structures.

    PubMed

    Goldenberg, Ofir; Erez, Elana; Nimrod, Guy; Ben-Tal, Nir

    2009-01-01

    ConSurf-DB is a repository for evolutionary conservation analysis of the proteins of known structures in the Protein Data Bank (PDB). Sequence homologues of each of the PDB entries were collected and aligned using standard methods. The evolutionary conservation of each amino acid position in the alignment was calculated using the Rate4Site algorithm, implemented in the ConSurf web server. The algorithm takes into account the phylogenetic relations between the aligned proteins and the stochastic nature of the evolutionary process explicitly. Rate4Site assigns a conservation level for each position in the multiple sequence alignment using an empirical Bayesian inference. Visual inspection of the conservation patterns on the 3D structure often enables the identification of key residues that comprise the functionally important regions of the protein. The repository is updated with the latest PDB entries on a monthly basis and will be rebuilt annually. ConSurf-DB is available online at http://consurfdb.tau.ac.il/

  7. The ConSurf-DB: pre-calculated evolutionary conservation profiles of protein structures

    PubMed Central

    Goldenberg, Ofir; Erez, Elana; Nimrod, Guy; Ben-Tal, Nir

    2009-01-01

    ConSurf-DB is a repository for evolutionary conservation analysis of the proteins of known structures in the Protein Data Bank (PDB). Sequence homologues of each of the PDB entries were collected and aligned using standard methods. The evolutionary conservation of each amino acid position in the alignment was calculated using the Rate4Site algorithm, implemented in the ConSurf web server. The algorithm takes into account the phylogenetic relations between the aligned proteins and the stochastic nature of the evolutionary process explicitly. Rate4Site assigns a conservation level for each position in the multiple sequence alignment using an empirical Bayesian inference. Visual inspection of the conservation patterns on the 3D structure often enables the identification of key residues that comprise the functionally important regions of the protein. The repository is updated with the latest PDB entries on a monthly basis and will be rebuilt annually. ConSurf-DB is available online at http://consurfdb.tau.ac.il/ PMID:18971256

  8. Social Circles Detection from Ego Network and Profile Information

    DTIC Science & Technology

    2014-12-19

    response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing... algorithm used to infer k-clique communities is expo- nential, which makes this technique unfeasible when treating egonets with a large number of users...atic when considering RBMs. This inconvenient was positively solved implementing a sparsity treatment with the RBM algorithm . (ii) The ground truth was

  9. Order priors for Bayesian network discovery with an application to malware phylogeny

    DOE PAGES

    Oyen, Diane; Anderson, Blake; Sentz, Kari; ...

    2017-09-15

    Here, Bayesian networks have been used extensively to model and discover dependency relationships among sets of random variables. We learn Bayesian network structure with a combination of human knowledge about the partial ordering of variables and statistical inference of conditional dependencies from observed data. Our approach leverages complementary information from human knowledge and inference from observed data to produce networks that reflect human beliefs about the system as well as to fit the observed data. Applying prior beliefs about partial orderings of variables is an approach distinctly different from existing methods that incorporate prior beliefs about direct dependencies (or edges)more » in a Bayesian network. We provide an efficient implementation of the partial-order prior in a Bayesian structure discovery learning algorithm, as well as an edge prior, showing that both priors meet the local modularity requirement necessary for an efficient Bayesian discovery algorithm. In benchmark studies, the partial-order prior improves the accuracy of Bayesian network structure learning as well as the edge prior, even though order priors are more general. Our primary motivation is in characterizing the evolution of families of malware to aid cyber security analysts. For the problem of malware phylogeny discovery, we find that our algorithm, compared to existing malware phylogeny algorithms, more accurately discovers true dependencies that are missed by other algorithms.« less

  10. Order priors for Bayesian network discovery with an application to malware phylogeny

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Oyen, Diane; Anderson, Blake; Sentz, Kari

    Here, Bayesian networks have been used extensively to model and discover dependency relationships among sets of random variables. We learn Bayesian network structure with a combination of human knowledge about the partial ordering of variables and statistical inference of conditional dependencies from observed data. Our approach leverages complementary information from human knowledge and inference from observed data to produce networks that reflect human beliefs about the system as well as to fit the observed data. Applying prior beliefs about partial orderings of variables is an approach distinctly different from existing methods that incorporate prior beliefs about direct dependencies (or edges)more » in a Bayesian network. We provide an efficient implementation of the partial-order prior in a Bayesian structure discovery learning algorithm, as well as an edge prior, showing that both priors meet the local modularity requirement necessary for an efficient Bayesian discovery algorithm. In benchmark studies, the partial-order prior improves the accuracy of Bayesian network structure learning as well as the edge prior, even though order priors are more general. Our primary motivation is in characterizing the evolution of families of malware to aid cyber security analysts. For the problem of malware phylogeny discovery, we find that our algorithm, compared to existing malware phylogeny algorithms, more accurately discovers true dependencies that are missed by other algorithms.« less

  11. The gene normalization task in BioCreative III

    PubMed Central

    2011-01-01

    Background We report the Gene Normalization (GN) challenge in BioCreative III where participating teams were asked to return a ranked list of identifiers of the genes detected in full-text articles. For training, 32 fully and 500 partially annotated articles were prepared. A total of 507 articles were selected as the test set. Due to the high annotation cost, it was not feasible to obtain gold-standard human annotations for all test articles. Instead, we developed an Expectation Maximization (EM) algorithm approach for choosing a small number of test articles for manual annotation that were most capable of differentiating team performance. Moreover, the same algorithm was subsequently used for inferring ground truth based solely on team submissions. We report team performance on both gold standard and inferred ground truth using a newly proposed metric called Threshold Average Precision (TAP-k). Results We received a total of 37 runs from 14 different teams for the task. When evaluated using the gold-standard annotations of the 50 articles, the highest TAP-k scores were 0.3297 (k=5), 0.3538 (k=10), and 0.3535 (k=20), respectively. Higher TAP-k scores of 0.4916 (k=5, 10, 20) were observed when evaluated using the inferred ground truth over the full test set. When combining team results using machine learning, the best composite system achieved TAP-k scores of 0.3707 (k=5), 0.4311 (k=10), and 0.4477 (k=20) on the gold standard, representing improvements of 12.4%, 21.8%, and 26.6% over the best team results, respectively. Conclusions By using full text and being species non-specific, the GN task in BioCreative III has moved closer to a real literature curation task than similar tasks in the past and presents additional challenges for the text mining community, as revealed in the overall team results. By evaluating teams using the gold standard, we show that the EM algorithm allows team submissions to be differentiated while keeping the manual annotation effort feasible. Using the inferred ground truth we show measures of comparative performance between teams. Finally, by comparing team rankings on gold standard vs. inferred ground truth, we further demonstrate that the inferred ground truth is as effective as the gold standard for detecting good team performance. PMID:22151901

  12. The gene normalization task in BioCreative III.

    PubMed

    Lu, Zhiyong; Kao, Hung-Yu; Wei, Chih-Hsuan; Huang, Minlie; Liu, Jingchen; Kuo, Cheng-Ju; Hsu, Chun-Nan; Tsai, Richard Tzong-Han; Dai, Hong-Jie; Okazaki, Naoaki; Cho, Han-Cheol; Gerner, Martin; Solt, Illes; Agarwal, Shashank; Liu, Feifan; Vishnyakova, Dina; Ruch, Patrick; Romacker, Martin; Rinaldi, Fabio; Bhattacharya, Sanmitra; Srinivasan, Padmini; Liu, Hongfang; Torii, Manabu; Matos, Sergio; Campos, David; Verspoor, Karin; Livingston, Kevin M; Wilbur, W John

    2011-10-03

    We report the Gene Normalization (GN) challenge in BioCreative III where participating teams were asked to return a ranked list of identifiers of the genes detected in full-text articles. For training, 32 fully and 500 partially annotated articles were prepared. A total of 507 articles were selected as the test set. Due to the high annotation cost, it was not feasible to obtain gold-standard human annotations for all test articles. Instead, we developed an Expectation Maximization (EM) algorithm approach for choosing a small number of test articles for manual annotation that were most capable of differentiating team performance. Moreover, the same algorithm was subsequently used for inferring ground truth based solely on team submissions. We report team performance on both gold standard and inferred ground truth using a newly proposed metric called Threshold Average Precision (TAP-k). We received a total of 37 runs from 14 different teams for the task. When evaluated using the gold-standard annotations of the 50 articles, the highest TAP-k scores were 0.3297 (k=5), 0.3538 (k=10), and 0.3535 (k=20), respectively. Higher TAP-k scores of 0.4916 (k=5, 10, 20) were observed when evaluated using the inferred ground truth over the full test set. When combining team results using machine learning, the best composite system achieved TAP-k scores of 0.3707 (k=5), 0.4311 (k=10), and 0.4477 (k=20) on the gold standard, representing improvements of 12.4%, 21.8%, and 26.6% over the best team results, respectively. By using full text and being species non-specific, the GN task in BioCreative III has moved closer to a real literature curation task than similar tasks in the past and presents additional challenges for the text mining community, as revealed in the overall team results. By evaluating teams using the gold standard, we show that the EM algorithm allows team submissions to be differentiated while keeping the manual annotation effort feasible. Using the inferred ground truth we show measures of comparative performance between teams. Finally, by comparing team rankings on gold standard vs. inferred ground truth, we further demonstrate that the inferred ground truth is as effective as the gold standard for detecting good team performance.

  13. Trial latencies estimation of event-related potentials in EEG by means of genetic algorithms

    NASA Astrophysics Data System (ADS)

    Da Pelo, P.; De Tommaso, M.; Monaco, A.; Stramaglia, S.; Bellotti, R.; Tangaro, S.

    2018-04-01

    Objective. Event-related potentials (ERPs) are usually obtained by averaging thus neglecting the trial-to-trial latency variability in cognitive electroencephalography (EEG) responses. As a consequence the shape and the peak amplitude of the averaged ERP are smeared and reduced, respectively, when the single-trial latencies show a relevant variability. To date, the majority of the methodologies for single-trial latencies inference are iterative schemes providing suboptimal solutions, the most commonly used being the Woody’s algorithm. Approach. In this study, a global approach is developed by introducing a fitness function whose global maximum corresponds to the set of latencies which renders the trial signals most aligned as possible. A suitable genetic algorithm has been implemented to solve the optimization problem, characterized by new genetic operators tailored to the present problem. Main results. The results, on simulated trials, showed that the proposed algorithm performs better than Woody’s algorithm in all conditions, at the cost of an increased computational complexity (justified by the improved quality of the solution). Application of the proposed approach on real data trials, resulted in an increased correlation between latencies and reaction times w.r.t. the output from RIDE method. Significance. The above mentioned results on simulated and real data indicate that the proposed method, providing a better estimate of single-trial latencies, will open the way to more accurate study of neural responses as well as to the issue of relating the variability of latencies to the proper cognitive and behavioural correlates.

  14. Inferring Regulatory Networks by Combining Perturbation Screens and Steady State Gene Expression Profiles

    PubMed Central

    Michailidis, George

    2014-01-01

    Reconstructing transcriptional regulatory networks is an important task in functional genomics. Data obtained from experiments that perturb genes by knockouts or RNA interference contain useful information for addressing this reconstruction problem. However, such data can be limited in size and/or are expensive to acquire. On the other hand, observational data of the organism in steady state (e.g., wild-type) are more readily available, but their informational content is inadequate for the task at hand. We develop a computational approach to appropriately utilize both data sources for estimating a regulatory network. The proposed approach is based on a three-step algorithm to estimate the underlying directed but cyclic network, that uses as input both perturbation screens and steady state gene expression data. In the first step, the algorithm determines causal orderings of the genes that are consistent with the perturbation data, by combining an exhaustive search method with a fast heuristic that in turn couples a Monte Carlo technique with a fast search algorithm. In the second step, for each obtained causal ordering, a regulatory network is estimated using a penalized likelihood based method, while in the third step a consensus network is constructed from the highest scored ones. Extensive computational experiments show that the algorithm performs well in reconstructing the underlying network and clearly outperforms competing approaches that rely only on a single data source. Further, it is established that the algorithm produces a consistent estimate of the regulatory network. PMID:24586224

  15. DOE Office of Scientific and Technical Information (OSTI.GOV)

    La Russa, D

    Purpose: The purpose of this project is to develop a robust method of parameter estimation for a Poisson-based TCP model using Bayesian inference. Methods: Bayesian inference was performed using the PyMC3 probabilistic programming framework written in Python. A Poisson-based TCP regression model that accounts for clonogen proliferation was fit to observed rates of local relapse as a function of equivalent dose in 2 Gy fractions for a population of 623 stage-I non-small-cell lung cancer patients. The Slice Markov Chain Monte Carlo sampling algorithm was used to sample the posterior distributions, and was initiated using the maximum of the posterior distributionsmore » found by optimization. The calculation of TCP with each sample step required integration over the free parameter α, which was performed using an adaptive 24-point Gauss-Legendre quadrature. Convergence was verified via inspection of the trace plot and posterior distribution for each of the fit parameters, as well as with comparisons of the most probable parameter values with their respective maximum likelihood estimates. Results: Posterior distributions for α, the standard deviation of α (σ), the average tumour cell-doubling time (Td), and the repopulation delay time (Tk), were generated assuming α/β = 10 Gy, and a fixed clonogen density of 10{sup 7} cm−{sup 3}. Posterior predictive plots generated from samples from these posterior distributions are in excellent agreement with the observed rates of local relapse used in the Bayesian inference. The most probable values of the model parameters also agree well with maximum likelihood estimates. Conclusion: A robust method of performing Bayesian inference of TCP data using a complex TCP model has been established.« less

  16. Bridging the Gap between Human Judgment and Automated Reasoning in Predictive Analytics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sanfilippo, Antonio P.; Riensche, Roderick M.; Unwin, Stephen D.

    2010-06-07

    Events occur daily that impact the health, security and sustainable growth of our society. If we are to address the challenges that emerge from these events, anticipatory reasoning has to become an everyday activity. Strong advances have been made in using integrated modeling for analysis and decision making. However, a wider impact of predictive analytics is currently hindered by the lack of systematic methods for integrating predictive inferences from computer models with human judgment. In this paper, we present a predictive analytics approach that supports anticipatory analysis and decision-making through a concerted reasoning effort that interleaves human judgment and automatedmore » inferences. We describe a systematic methodology for integrating modeling algorithms within a serious gaming environment in which role-playing by human agents provides updates to model nodes and the ensuing model outcomes in turn influence the behavior of the human players. The approach ensures a strong functional partnership between human players and computer models while maintaining a high degree of independence and greatly facilitating the connection between model and game structures.« less

  17. Integrating Genetic and Functional Genomic Data to Elucidate Common Disease Tra

    NASA Astrophysics Data System (ADS)

    Schadt, Eric

    2005-03-01

    The reconstruction of genetic networks in mammalian systems is one of the primary goals in biological research, especially as such reconstructions relate to elucidating not only common, polygenic human diseases, but living systems more generally. Here I present a statistical procedure for inferring causal relationships between gene expression traits and more classic clinical traits, including complex disease traits. This procedure has been generalized to the gene network reconstruction problem, where naturally occurring genetic variations in segregating mouse populations are used as a source of perturbations to elucidate tissue-specific gene networks. Differences in the extent of genetic control between genders and among four different tissues are highlighted. I also demonstrate that the networks derived from expression data in segregating mouse populations using the novel network reconstruction algorithm are able to capture causal associations between genes that result in increased predictive power, compared to more classically reconstructed networks derived from the same data. This approach to causal inference in large segregating mouse populations over multiple tissues not only elucidates fundamental aspects of transcriptional control, it also allows for the objective identification of key drivers of common human diseases.

  18. Inference of Spatio-Temporal Functions Over Graphs via Multikernel Kriged Kalman Filtering

    NASA Astrophysics Data System (ADS)

    Ioannidis, Vassilis N.; Romero, Daniel; Giannakis, Georgios B.

    2018-06-01

    Inference of space-time varying signals on graphs emerges naturally in a plethora of network science related applications. A frequently encountered challenge pertains to reconstructing such dynamic processes, given their values over a subset of vertices and time instants. The present paper develops a graph-aware kernel-based kriged Kalman filter that accounts for the spatio-temporal variations, and offers efficient online reconstruction, even for dynamically evolving network topologies. The kernel-based learning framework bypasses the need for statistical information by capitalizing on the smoothness that graph signals exhibit with respect to the underlying graph. To address the challenge of selecting the appropriate kernel, the proposed filter is combined with a multi-kernel selection module. Such a data-driven method selects a kernel attuned to the signal dynamics on-the-fly within the linear span of a pre-selected dictionary. The novel multi-kernel learning algorithm exploits the eigenstructure of Laplacian kernel matrices to reduce computational complexity. Numerical tests with synthetic and real data demonstrate the superior reconstruction performance of the novel approach relative to state-of-the-art alternatives.

  19. Spatio-temporal conditional inference and hypothesis tests for neural ensemble spiking precision

    PubMed Central

    Harrison, Matthew T.; Amarasingham, Asohan; Truccolo, Wilson

    2014-01-01

    The collective dynamics of neural ensembles create complex spike patterns with many spatial and temporal scales. Understanding the statistical structure of these patterns can help resolve fundamental questions about neural computation and neural dynamics. Spatio-temporal conditional inference (STCI) is introduced here as a semiparametric statistical framework for investigating the nature of precise spiking patterns from collections of neurons that is robust to arbitrarily complex and nonstationary coarse spiking dynamics. The main idea is to focus statistical modeling and inference, not on the full distribution of the data, but rather on families of conditional distributions of precise spiking given different types of coarse spiking. The framework is then used to develop families of hypothesis tests for probing the spatio-temporal precision of spiking patterns. Relationships among different conditional distributions are used to improve multiple hypothesis testing adjustments and to design novel Monte Carlo spike resampling algorithms. Of special note are algorithms that can locally jitter spike times while still preserving the instantaneous peri-stimulus time histogram (PSTH) or the instantaneous total spike count from a group of recorded neurons. The framework can also be used to test whether first-order maximum entropy models with possibly random and time-varying parameters can account for observed patterns of spiking. STCI provides a detailed example of the generic principle of conditional inference, which may be applicable in other areas of neurostatistical analysis. PMID:25380339

  20. A note on probabilistic models over strings: the linear algebra approach.

    PubMed

    Bouchard-Côté, Alexandre

    2013-12-01

    Probabilistic models over strings have played a key role in developing methods that take into consideration indels as phylogenetically informative events. There is an extensive literature on using automata and transducers on phylogenies to do inference on these probabilistic models, in which an important theoretical question is the complexity of computing the normalization of a class of string-valued graphical models. This question has been investigated using tools from combinatorics, dynamic programming, and graph theory, and has practical applications in Bayesian phylogenetics. In this work, we revisit this theoretical question from a different point of view, based on linear algebra. The main contribution is a set of results based on this linear algebra view that facilitate the analysis and design of inference algorithms on string-valued graphical models. As an illustration, we use this method to give a new elementary proof of a known result on the complexity of inference on the "TKF91" model, a well-known probabilistic model over strings. Compared to previous work, our proving method is easier to extend to other models, since it relies on a novel weak condition, triangular transducers, which is easy to establish in practice. The linear algebra view provides a concise way of describing transducer algorithms and their compositions, opens the possibility of transferring fast linear algebra libraries (for example, based on GPUs), as well as low rank matrix approximation methods, to string-valued inference problems.

  1. Topics in Bayesian Hierarchical Modeling and its Monte Carlo Computations

    NASA Astrophysics Data System (ADS)

    Tak, Hyung Suk

    The first chapter addresses a Beta-Binomial-Logit model that is a Beta-Binomial conjugate hierarchical model with covariate information incorporated via a logistic regression. Various researchers in the literature have unknowingly used improper posterior distributions or have given incorrect statements about posterior propriety because checking posterior propriety can be challenging due to the complicated functional form of a Beta-Binomial-Logit model. We derive data-dependent necessary and sufficient conditions for posterior propriety within a class of hyper-prior distributions that encompass those used in previous studies. Frequency coverage properties of several hyper-prior distributions are also investigated to see when and whether Bayesian interval estimates of random effects meet their nominal confidence levels. The second chapter deals with a time delay estimation problem in astrophysics. When the gravitational field of an intervening galaxy between a quasar and the Earth is strong enough to split light into two or more images, the time delay is defined as the difference between their travel times. The time delay can be used to constrain cosmological parameters and can be inferred from the time series of brightness data of each image. To estimate the time delay, we construct a Gaussian hierarchical model based on a state-space representation for irregularly observed time series generated by a latent continuous-time Ornstein-Uhlenbeck process. Our Bayesian approach jointly infers model parameters via a Gibbs sampler. We also introduce a profile likelihood of the time delay as an approximation of its marginal posterior distribution. The last chapter specifies a repelling-attracting Metropolis algorithm, a new Markov chain Monte Carlo method to explore multi-modal distributions in a simple and fast manner. This algorithm is essentially a Metropolis-Hastings algorithm with a proposal that consists of a downhill move in density that aims to make local modes repelling, followed by an uphill move in density that aims to make local modes attracting. The downhill move is achieved via a reciprocal Metropolis ratio so that the algorithm prefers downward movement. The uphill move does the opposite using the standard Metropolis ratio which prefers upward movement. This down-up movement in density increases the probability of a proposed move to a different mode.

  2. Positive selection and functional divergence of farnesyl pyrophosphate synthase genes in plants.

    PubMed

    Qian, Jieying; Liu, Yong; Chao, Naixia; Ma, Chengtong; Chen, Qicong; Sun, Jian; Wu, Yaosheng

    2017-02-04

    Farnesyl pyrophosphate synthase (FPS) belongs to the short-chain prenyltransferase family, and it performs a conserved and essential role in the terpenoid biosynthesis pathway. However, its classification, evolutionary history, and the forces driving the evolution of FPS genes in plants remain poorly understood. Phylogeny and positive selection analysis was used to identify the evolutionary forces that led to the functional divergence of FPS in plants, and recombinant detection was undertaken using the Genetic Algorithm for Recombination Detection (GARD) method. The dataset included 68 FPS variation pattern sequences (2 gymnosperms, 10 monocotyledons, 54 dicotyledons, and 2 outgroups). This study revealed that the FPS gene was under positive selection in plants. No recombinant within the FPS gene was found. Therefore, it was inferred that the positive selection of FPS had not been influenced by a recombinant episode. The positively selected sites were mainly located in the catalytic center and functional areas, which indicated that the 98S and 234D were important positively selected sites for plant FPS in the terpenoid biosynthesis pathway. They were located in the FPS conserved domain of the catalytic site. We inferred that the diversification of FPS genes was associated with functional divergence and could be driven by positive selection. It was clear that protein sequence evolution via positive selection was able to drive adaptive diversification in plant FPS proteins. This study provides information on the classification and positive selection of plant FPS genes, and the results could be useful for further research on the regulation of triterpenoid biosynthesis.

  3. Evaluating methods of inferring gene regulatory networks highlights their lack of performance for single cell gene expression data.

    PubMed

    Chen, Shuonan; Mar, Jessica C

    2018-06-19

    A fundamental fact in biology states that genes do not operate in isolation, and yet, methods that infer regulatory networks for single cell gene expression data have been slow to emerge. With single cell sequencing methods now becoming accessible, general network inference algorithms that were initially developed for data collected from bulk samples may not be suitable for single cells. Meanwhile, although methods that are specific for single cell data are now emerging, whether they have improved performance over general methods is unknown. In this study, we evaluate the applicability of five general methods and three single cell methods for inferring gene regulatory networks from both experimental single cell gene expression data and in silico simulated data. Standard evaluation metrics using ROC curves and Precision-Recall curves against reference sets sourced from the literature demonstrated that most of the methods performed poorly when they were applied to either experimental single cell data, or simulated single cell data, which demonstrates their lack of performance for this task. Using default settings, network methods were applied to the same datasets. Comparisons of the learned networks highlighted the uniqueness of some predicted edges for each method. The fact that different methods infer networks that vary substantially reflects the underlying mathematical rationale and assumptions that distinguish network methods from each other. This study provides a comprehensive evaluation of network modeling algorithms applied to experimental single cell gene expression data and in silico simulated datasets where the network structure is known. Comparisons demonstrate that most of these assessed network methods are not able to predict network structures from single cell expression data accurately, even if they are specifically developed for single cell methods. Also, single cell methods, which usually depend on more elaborative algorithms, in general have less similarity to each other in the sets of edges detected. The results from this study emphasize the importance for developing more accurate optimized network modeling methods that are compatible for single cell data. Newly-developed single cell methods may uniquely capture particular features of potential gene-gene relationships, and caution should be taken when we interpret these results.

  4. A Novel Adjustment Method for Shearer Traction Speed through Integration of T-S Cloud Inference Network and Improved PSO

    PubMed Central

    Si, Lei; Wang, Zhongbin; Yang, Yinwei

    2014-01-01

    In order to efficiently and accurately adjust the shearer traction speed, a novel approach based on Takagi-Sugeno (T-S) cloud inference network (CIN) and improved particle swarm optimization (IPSO) is proposed. The T-S CIN is built through the combination of cloud model and T-S fuzzy neural network. Moreover, the IPSO algorithm employs parameter automation adjustment strategy and velocity resetting to significantly improve the performance of basic PSO algorithm in global search and fine-tuning of the solutions, and the flowchart of proposed approach is designed. Furthermore, some simulation examples are carried out and comparison results indicate that the proposed method is feasible, efficient, and is outperforming others. Finally, an industrial application example of coal mining face is demonstrated to specify the effect of proposed system. PMID:25506358

  5. Penalized regression procedures for variable selection in the potential outcomes framework

    PubMed Central

    Ghosh, Debashis; Zhu, Yeying; Coffman, Donna L.

    2015-01-01

    A recent topic of much interest in causal inference is model selection. In this article, we describe a framework in which to consider penalized regression approaches to variable selection for causal effects. The framework leads to a simple ‘impute, then select’ class of procedures that is agnostic to the type of imputation algorithm as well as penalized regression used. It also clarifies how model selection involves a multivariate regression model for causal inference problems, and that these methods can be applied for identifying subgroups in which treatment effects are homogeneous. Analogies and links with the literature on machine learning methods, missing data and imputation are drawn. A difference LASSO algorithm is defined, along with its multiple imputation analogues. The procedures are illustrated using a well-known right heart catheterization dataset. PMID:25628185

  6. Learning Effective Connectivity Network Structure from fMRI Data Based on Artificial Immune Algorithm

    PubMed Central

    Ji, Junzhong; Liu, Jinduo; Liang, Peipeng; Zhang, Aidong

    2016-01-01

    Many approaches have been designed to extract brain effective connectivity from functional magnetic resonance imaging (fMRI) data. However, few of them can effectively identify the connectivity network structure due to different defects. In this paper, a new algorithm is developed to infer the effective connectivity between different brain regions by combining artificial immune algorithm (AIA) with the Bayes net method, named as AIAEC. In the proposed algorithm, a brain effective connectivity network is mapped onto an antibody, and four immune operators are employed to perform the optimization process of antibodies, including clonal selection operator, crossover operator, mutation operator and suppression operator, and finally gets an antibody with the highest K2 score as the solution. AIAEC is then tested on Smith’s simulated datasets, and the effect of the different factors on AIAEC is evaluated, including the node number, session length, as well as the other potential confounding factors of the blood oxygen level dependent (BOLD) signal. It was revealed that, as contrast to other existing methods, AIAEC got the best performance on the majority of the datasets. It was also found that AIAEC could attain a relative better solution under the influence of many factors, although AIAEC was differently affected by the aforementioned factors. AIAEC is thus demonstrated to be an effective method for detecting the brain effective connectivity. PMID:27045295

  7. General Metropolis-Hastings jump diffusions for automatic target recognition in infrared scenes

    NASA Astrophysics Data System (ADS)

    Lanterman, Aaron D.; Miller, Michael I.; Snyder, Donald L.

    1997-04-01

    To locate and recognize ground-based targets in forward- looking IR (FLIR) images, 3D faceted models with associated pose parameters are formulated to accommodate the variability found in FLIR imagery. Taking a Bayesian approach, scenes are simulated from the emissive characteristics of the CAD models and compared with the collected data by a likelihood function based on sensor statistics. This likelihood is combined with a prior distribution defined over the set of possible scenes to form a posterior distribution. To accommodate scenes with variable numbers of targets, the posterior distribution is defined over parameter vectors of varying dimension. An inference algorithm based on Metropolis-Hastings jump- diffusion processes empirically samples from the posterior distribution, generating configurations of templates and transformations that match the collected sensor data with high probability. The jumps accommodate the addition and deletion of targets and the estimation of target identities; diffusions refine the hypotheses by drifting along the gradient of the posterior distribution with respect to the orientation and position parameters. Previous results on jumps strategies analogous to the Metropolis acceptance/rejection algorithm, with proposals drawn from the prior and accepted based on the likelihood, are extended to encompass general Metropolis-Hastings proposal densities. In particular, the algorithm proposes moves by drawing from the posterior distribution over computationally tractible subsets of the parameter space. The algorithm is illustrated by an implementation on a Silicon Graphics Onyx/Reality Engine.

  8. Orthology and paralogy constraints: satisfiability and consistency.

    PubMed

    Lafond, Manuel; El-Mabrouk, Nadia

    2014-01-01

    A variety of methods based on sequence similarity, reconciliation, synteny or functional characteristics, can be used to infer orthology and paralogy relations between genes of a given gene family  G. But is a given set  C of orthology/paralogy constraints possible, i.e., can they simultaneously co-exist in an evolutionary history for  G? While previous studies have focused on full sets of constraints, here we consider the general case where  C does not necessarily involve a constraint for each pair of genes. The problem is subdivided in two parts: (1) Is  C satisfiable, i.e. can we find an event-labeled gene tree G inducing  C? (2) Is there such a G which is consistent, i.e., such that all displayed triplet phylogenies are included in a species tree? Previous results on the Graph sandwich problem can be used to answer to (1), and we provide polynomial-time algorithms for satisfiability and consistency with a given species tree. We also describe a new polynomial-time algorithm for the case of consistency with an unknown species tree and full knowledge of pairwise orthology/paralogy relationships, as well as a branch-and-bound algorithm in the case when unknown relations are present. We show that our algorithms can be used in combination with ProteinOrtho, a sequence similarity-based orthology detection tool, to extract a set of robust orthology/paralogy relationships.

  9. Orthology and paralogy constraints: satisfiability and consistency

    PubMed Central

    2014-01-01

    Background A variety of methods based on sequence similarity, reconciliation, synteny or functional characteristics, can be used to infer orthology and paralogy relations between genes of a given gene family  G. But is a given set  C of orthology/paralogy constraints possible, i.e., can they simultaneously co-exist in an evolutionary history for  G? While previous studies have focused on full sets of constraints, here we consider the general case where  C does not necessarily involve a constraint for each pair of genes. The problem is subdivided in two parts: (1) Is  C satisfiable, i.e. can we find an event-labeled gene tree G inducing  C? (2) Is there such a G which is consistent, i.e., such that all displayed triplet phylogenies are included in a species tree? Results Previous results on the Graph sandwich problem can be used to answer to (1), and we provide polynomial-time algorithms for satisfiability and consistency with a given species tree. We also describe a new polynomial-time algorithm for the case of consistency with an unknown species tree and full knowledge of pairwise orthology/paralogy relationships, as well as a branch-and-bound algorithm in the case when unknown relations are present. We show that our algorithms can be used in combination with ProteinOrtho, a sequence similarity-based orthology detection tool, to extract a set of robust orthology/paralogy relationships. PMID:25572629

  10. msCentipede: Modeling Heterogeneity across Genomic Sites and Replicates Improves Accuracy in the Inference of Transcription Factor Binding

    PubMed Central

    Gilad, Yoav; Pritchard, Jonathan K.; Stephens, Matthew

    2015-01-01

    Understanding global gene regulation depends critically on accurate annotation of regulatory elements that are functional in a given cell type. CENTIPEDE, a powerful, probabilistic framework for identifying transcription factor binding sites from tissue-specific DNase I cleavage patterns and genomic sequence content, leverages the hypersensitivity of factor-bound chromatin and the information in the DNase I spatial cleavage profile characteristic of each DNA binding protein to accurately infer functional factor binding sites. However, the model for the spatial profile in this framework fails to account for the substantial variation in the DNase I cleavage profiles across different binding sites. Neither does it account for variation in the profiles at the same binding site across multiple replicate DNase I experiments, which are increasingly available. In this work, we introduce new methods, based on multi-scale models for inhomogeneous Poisson processes, to account for such variation in DNase I cleavage patterns both within and across binding sites. These models account for the spatial structure in the heterogeneity in DNase I cleavage patterns for each factor. Using DNase-seq measurements assayed in a lymphoblastoid cell line, we demonstrate the improved performance of this model for several transcription factors by comparing against the Chip-seq peaks for those factors. Finally, we explore the effects of DNase I sequence bias on inference of factor binding using a simple extension to our framework that allows for a more flexible background model. The proposed model can also be easily applied to paired-end ATAC-seq and DNase-seq data. msCentipede, a Python implementation of our algorithm, is available at http://rajanil.github.io/msCentipede. PMID:26406244

  11. msCentipede: Modeling Heterogeneity across Genomic Sites and Replicates Improves Accuracy in the Inference of Transcription Factor Binding.

    PubMed

    Raj, Anil; Shim, Heejung; Gilad, Yoav; Pritchard, Jonathan K; Stephens, Matthew

    2015-01-01

    Understanding global gene regulation depends critically on accurate annotation of regulatory elements that are functional in a given cell type. CENTIPEDE, a powerful, probabilistic framework for identifying transcription factor binding sites from tissue-specific DNase I cleavage patterns and genomic sequence content, leverages the hypersensitivity of factor-bound chromatin and the information in the DNase I spatial cleavage profile characteristic of each DNA binding protein to accurately infer functional factor binding sites. However, the model for the spatial profile in this framework fails to account for the substantial variation in the DNase I cleavage profiles across different binding sites. Neither does it account for variation in the profiles at the same binding site across multiple replicate DNase I experiments, which are increasingly available. In this work, we introduce new methods, based on multi-scale models for inhomogeneous Poisson processes, to account for such variation in DNase I cleavage patterns both within and across binding sites. These models account for the spatial structure in the heterogeneity in DNase I cleavage patterns for each factor. Using DNase-seq measurements assayed in a lymphoblastoid cell line, we demonstrate the improved performance of this model for several transcription factors by comparing against the Chip-seq peaks for those factors. Finally, we explore the effects of DNase I sequence bias on inference of factor binding using a simple extension to our framework that allows for a more flexible background model. The proposed model can also be easily applied to paired-end ATAC-seq and DNase-seq data. msCentipede, a Python implementation of our algorithm, is available at http://rajanil.github.io/msCentipede.

  12. The VLF Scattering Pattern of Lightning-Induced Ionospheric Disturbances

    NASA Astrophysics Data System (ADS)

    Cohen, M.; Golkowski, M.

    2016-12-01

    Very Low Frequency (VLF) transmitter remote sensing is a well-employed technique to diagnose the impact of lightning on the D-region ionosphere, from the EMP, quasi-static charge, and radiation belt electron precipitation. When lightning disturbs the ionosphere, propagation of VLF (3-30 kHz) narrow-frequency signals through that region are subsequently scattered, which can be detected as transient changes in amplitude and phase at distant receivers. In principle it is possible to then infer the ionospheric disturbance but in practice this is difficult to do reliably. One of the challenges of this process is that VLF perturbations are like snowflakes - no two events are the same. The transmitter-receiver geometry, lightning properties, and ionospheric condition before the event, all impact the VLF scattering. This makes it very difficult, based on case studies which observe only one or two slivers at a time, to infer the scattering pattern of VLF events, and therefore, to infer what happened to the ionosphere. Our aim is to get around that by looking at a huge database of lightning-induced ionospheric disturbances, taken over several years of recordings. We utilize an automatic extraction algorithm to find, identify, and characterize VLF perturbations on a massive scale. From there, we can investigate how the VLF perturbations change as a function of the parameters of the event. If it turns out that there is exists a "canonical" lightning-induced disturbance as a function of geometry and lightning parameters, it will go a long way toward identifying the causative mechanisms and being able to accurately simulate and reproduce any lightning-induced ionospheric disturbance. We present results of our efforts to do just that.

  13. What kind of computation is intelligence. A framework for integrating different kinds of expertise

    NASA Technical Reports Server (NTRS)

    Chandrasekaran, B.

    1989-01-01

    The view that the deliberative aspect of intelligent behavior is a distinct type of algorithm; in particular, a goal-seeking exploratory process using qualitative representations of knowledge and inference is elaborated. There are other kinds of algorithms that also embody expertise in domains. The different types of expertise and how they can and should be integrated to give full account of expert behavior are discussed.

  14. Algorithmic detectability threshold of the stochastic block model

    NASA Astrophysics Data System (ADS)

    Kawamoto, Tatsuro

    2018-03-01

    The assumption that the values of model parameters are known or correctly learned, i.e., the Nishimori condition, is one of the requirements for the detectability analysis of the stochastic block model in statistical inference. In practice, however, there is no example demonstrating that we can know the model parameters beforehand, and there is no guarantee that the model parameters can be learned accurately. In this study, we consider the expectation-maximization (EM) algorithm with belief propagation (BP) and derive its algorithmic detectability threshold. Our analysis is not restricted to the community structure but includes general modular structures. Because the algorithm cannot always learn the planted model parameters correctly, the algorithmic detectability threshold is qualitatively different from the one with the Nishimori condition.

  15. Fuzzy-Logic Based Distributed Energy-Efficient Clustering Algorithm for Wireless Sensor Networks.

    PubMed

    Zhang, Ying; Wang, Jun; Han, Dezhi; Wu, Huafeng; Zhou, Rundong

    2017-07-03

    Due to the high-energy efficiency and scalability, the clustering routing algorithm has been widely used in wireless sensor networks (WSNs). In order to gather information more efficiently, each sensor node transmits data to its Cluster Head (CH) to which it belongs, by multi-hop communication. However, the multi-hop communication in the cluster brings the problem of excessive energy consumption of the relay nodes which are closer to the CH. These nodes' energy will be consumed more quickly than the farther nodes, which brings the negative influence on load balance for the whole networks. Therefore, we propose an energy-efficient distributed clustering algorithm based on fuzzy approach with non-uniform distribution (EEDCF). During CHs' election, we take nodes' energies, nodes' degree and neighbor nodes' residual energies into consideration as the input parameters. In addition, we take advantage of Takagi, Sugeno and Kang (TSK) fuzzy model instead of traditional method as our inference system to guarantee the quantitative analysis more reasonable. In our scheme, each sensor node calculates the probability of being as CH with the help of fuzzy inference system in a distributed way. The experimental results indicate EEDCF algorithm is better than some current representative methods in aspects of data transmission, energy consumption and lifetime of networks.

  16. Reverse engineering a gene network using an asynchronous parallel evolution strategy

    PubMed Central

    2010-01-01

    Background The use of reverse engineering methods to infer gene regulatory networks by fitting mathematical models to gene expression data is becoming increasingly popular and successful. However, increasing model complexity means that more powerful global optimisation techniques are required for model fitting. The parallel Lam Simulated Annealing (pLSA) algorithm has been used in such approaches, but recent research has shown that island Evolutionary Strategies can produce faster, more reliable results. However, no parallel island Evolutionary Strategy (piES) has yet been demonstrated to be effective for this task. Results Here, we present synchronous and asynchronous versions of the piES algorithm, and apply them to a real reverse engineering problem: inferring parameters in the gap gene network. We find that the asynchronous piES exhibits very little communication overhead, and shows significant speed-up for up to 50 nodes: the piES running on 50 nodes is nearly 10 times faster than the best serial algorithm. We compare the asynchronous piES to pLSA on the same test problem, measuring the time required to reach particular levels of residual error, and show that it shows much faster convergence than pLSA across all optimisation conditions tested. Conclusions Our results demonstrate that the piES is consistently faster and more reliable than the pLSA algorithm on this problem, and scales better with increasing numbers of nodes. In addition, the piES is especially well suited to further improvements and adaptations: Firstly, the algorithm's fast initial descent speed and high reliability make it a good candidate for being used as part of a global/local search hybrid algorithm. Secondly, it has the potential to be used as part of a hierarchical evolutionary algorithm, which takes advantage of modern multi-core computing architectures. PMID:20196855

  17. Modeling Belt-Servomechanism by Chebyshev Functional Recurrent Neuro-Fuzzy Network

    NASA Astrophysics Data System (ADS)

    Huang, Yuan-Ruey; Kang, Yuan; Chu, Ming-Hui; Chang, Yeon-Pun

    A novel Chebyshev functional recurrent neuro-fuzzy (CFRNF) network is developed from a combination of the Takagi-Sugeno-Kang (TSK) fuzzy model and the Chebyshev recurrent neural network (CRNN). The CFRNF network can emulate the nonlinear dynamics of a servomechanism system. The system nonlinearity is addressed by enhancing the input dimensions of the consequent parts in the fuzzy rules due to functional expansion of a Chebyshev polynomial. The back propagation algorithm is used to adjust the parameters of the antecedent membership functions as well as those of consequent functions. To verify the performance of the proposed CFRNF, the experiment of the belt servomechanism is presented in this paper. Both of identification methods of adaptive neural fuzzy inference system (ANFIS) and recurrent neural network (RNN) are also studied for modeling of the belt servomechanism. The analysis and comparison results indicate that CFRNF makes identification of complex nonlinear dynamic systems easier. It is verified that the accuracy and convergence of the CFRNF are superior to those of ANFIS and RNN by the identification results of a belt servomechanism.

  18. Nonparametric Hierarchical Bayesian Model for Functional Brain Parcellation

    PubMed Central

    Lashkari, Danial; Sridharan, Ramesh; Vul, Edward; Hsieh, Po-Jang; Kanwisher, Nancy; Golland, Polina

    2011-01-01

    We develop a method for unsupervised analysis of functional brain images that learns group-level patterns of functional response. Our algorithm is based on a generative model that comprises two main layers. At the lower level, we express the functional brain response to each stimulus as a binary activation variable. At the next level, we define a prior over the sets of activation variables in all subjects. We use a Hierarchical Dirichlet Process as the prior in order to simultaneously learn the patterns of response that are shared across the group, and to estimate the number of these patterns supported by data. Inference based on this model enables automatic discovery and characterization of salient and consistent patterns in functional signals. We apply our method to data from a study that explores the response of the visual cortex to a collection of images. The discovered profiles of activation correspond to selectivity to a number of image categories such as faces, bodies, and scenes. More generally, our results appear superior to the results of alternative data-driven methods in capturing the category structure in the space of stimuli. PMID:21841977

  19. MontePython 3: Parameter inference code for cosmology

    NASA Astrophysics Data System (ADS)

    Brinckmann, Thejs; Lesgourgues, Julien; Audren, Benjamin; Benabed, Karim; Prunet, Simon

    2018-05-01

    MontePython 3 provides numerous ways to explore parameter space using Monte Carlo Markov Chain (MCMC) sampling, including Metropolis-Hastings, Nested Sampling, Cosmo Hammer, and a Fisher sampling method. This improved version of the Monte Python (ascl:1307.002) parameter inference code for cosmology offers new ingredients that improve the performance of Metropolis-Hastings sampling, speeding up convergence and offering significant time improvement in difficult runs. Additional likelihoods and plotting options are available, as are post-processing algorithms such as Importance Sampling and Adding Derived Parameter.

  20. Role of Utility and Inference in the Evolution of Functional Information

    PubMed Central

    Sharov, Alexei A.

    2009-01-01

    Functional information means an encoded network of functions in living organisms from molecular signaling pathways to an organism’s behavior. It is represented by two components: code and an interpretation system, which together form a self-sustaining semantic closure. Semantic closure allows some freedom between components because small variations of the code are still interpretable. The interpretation system consists of inference rules that control the correspondence between the code and the function (phenotype) and determines the shape of the fitness landscape. The utility factor operates at multiple time scales: short-term selection drives evolution towards higher survival and reproduction rate within a given fitness landscape, and long-term selection favors those fitness landscapes that support adaptability and lead to evolutionary expansion of certain lineages. Inference rules make short-term selection possible by shaping the fitness landscape and defining possible directions of evolution, but they are under control of the long-term selection of lineages. Communication normally occurs within a set of agents with compatible interpretation systems, which I call communication system. Functional information cannot be directly transferred between communication systems with incompatible inference rules. Each biological species is a genetic communication system that carries unique functional information together with inference rules that determine evolutionary directions and constraints. This view of the relation between utility and inference can resolve the conflict between realism/positivism and pragmatism. Realism overemphasizes the role of inference in evolution of human knowledge because it assumes that logic is embedded in reality. Pragmatism substitutes usefulness for truth and therefore ignores the advantage of inference. The proposed concept of evolutionary pragmatism rejects the idea that logic is embedded in reality; instead, inference rules are constructed within each communication system to represent reality and they evolve towards higher adaptability on a long time scale. PMID:20160960

  1. Subjective randomness as statistical inference.

    PubMed

    Griffiths, Thomas L; Daniels, Dylan; Austerweil, Joseph L; Tenenbaum, Joshua B

    2018-06-01

    Some events seem more random than others. For example, when tossing a coin, a sequence of eight heads in a row does not seem very random. Where do these intuitions about randomness come from? We argue that subjective randomness can be understood as the result of a statistical inference assessing the evidence that an event provides for having been produced by a random generating process. We show how this account provides a link to previous work relating randomness to algorithmic complexity, in which random events are those that cannot be described by short computer programs. Algorithmic complexity is both incomputable and too general to capture the regularities that people can recognize, but viewing randomness as statistical inference provides two paths to addressing these problems: considering regularities generated by simpler computing machines, and restricting the set of probability distributions that characterize regularity. Building on previous work exploring these different routes to a more restricted notion of randomness, we define strong quantitative models of human randomness judgments that apply not just to binary sequences - which have been the focus of much of the previous work on subjective randomness - but also to binary matrices and spatial clustering. Copyright © 2018 Elsevier Inc. All rights reserved.

  2. Ozone levels in the Empty Quarter of Saudi Arabia--application of adaptive neuro-fuzzy model.

    PubMed

    Rahman, Syed Masiur; Khondaker, A N; Khan, Rouf Ahmad

    2013-05-01

    In arid regions, primary pollutants may contribute to the increase of ozone levels and cause negative effects on biotic health. This study investigates the use of adaptive neuro-fuzzy inference system (ANFIS) for ozone prediction. The initial fuzzy inference system is developed by using fuzzy C-means (FCM) and subtractive clustering (SC) algorithms, which determines the important rules, increases generalization capability of the fuzzy inference system, reduces computational needs, and ensures speedy model development. The study area is located in the Empty Quarter of Saudi Arabia, which is considered as a source of huge potential for oil and gas field development. The developed clustering algorithm-based ANFIS model used meteorological data and derived meteorological data, along with NO and NO₂ concentrations and their transformations, as inputs. The root mean square error and Willmott's index of agreement of the FCM- and SC-based ANFIS models are 3.5 ppbv and 0.99, and 8.9 ppbv and 0.95, respectively. Based on the analysis of the performance measures and regression error characteristic curves, it is concluded that the FCM-based ANFIS model outperforms the SC-based ANFIS model.

  3. First order augmentation to tensor voting for boundary inference and multiscale analysis in 3D.

    PubMed

    Tong, Wai-Shun; Tang, Chi-Keung; Mordohai, Philippos; Medioni, Gérard

    2004-05-01

    Most computer vision applications require the reliable detection of boundaries. In the presence of outliers, missing data, orientation discontinuities, and occlusion, this problem is particularly challenging. We propose to address it by complementing the tensor voting framework, which was limited to second order properties, with first order representation and voting. First order voting fields and a mechanism to vote for 3D surface and volume boundaries and curve endpoints in 3D are defined. Boundary inference is also useful for a second difficult problem in grouping, namely, automatic scale selection. We propose an algorithm that automatically infers the smallest scale that can preserve the finest details. Our algorithm then proceeds with progressively larger scales to ensure continuity where it has not been achieved. Therefore, the proposed approach does not oversmooth features or delay the handling of boundaries and discontinuities until model misfit occurs. The interaction of smooth features, boundaries, and outliers is accommodated by the unified representation, making possible the perceptual organization of data in curves, surfaces, volumes, and their boundaries simultaneously. We present results on a variety of data sets to show the efficacy of the improved formalism.

  4. PREMER: a Tool to Infer Biological Networks.

    PubMed

    Villaverde, Alejandro F; Becker, Kolja; Banga, Julio R

    2017-10-04

    Inferring the structure of unknown cellular networks is a main challenge in computational biology. Data-driven approaches based on information theory can determine the existence of interactions among network nodes automatically. However, the elucidation of certain features - such as distinguishing between direct and indirect interactions or determining the direction of a causal link - requires estimating information-theoretic quantities in a multidimensional space. This can be a computationally demanding task, which acts as a bottleneck for the application of elaborate algorithms to large-scale network inference problems. The computational cost of such calculations can be alleviated by the use of compiled programs and parallelization. To this end we have developed PREMER (Parallel Reverse Engineering with Mutual information & Entropy Reduction), a software toolbox that can run in parallel and sequential environments. It uses information theoretic criteria to recover network topology and determine the strength and causality of interactions, and allows incorporating prior knowledge, imputing missing data, and correcting outliers. PREMER is a free, open source software tool that does not require any commercial software. Its core algorithms are programmed in FORTRAN 90 and implement OpenMP directives. It has user interfaces in Python and MATLAB/Octave, and runs on Windows, Linux and OSX (https://sites.google.com/site/premertoolbox/).

  5. Optimizing Monitoring Designs under Alternative Objectives

    DOE PAGES

    Gastelum, Jason A.; USA, Richland Washington; Porter, Ellen A.; ...

    2014-12-31

    This paper describes an approach to identify monitoring designs that optimize detection of CO2 leakage from a carbon capture and sequestration (CCS) reservoir and compares the results generated under two alternative objective functions. The first objective function minimizes the expected time to first detection of CO2 leakage, the second more conservative objective function minimizes the maximum time to leakage detection across the set of realizations. The approach applies a simulated annealing algorithm that searches the solution space by iteratively mutating the incumbent monitoring design. The approach takes into account uncertainty by evaluating the performance of potential monitoring designs across amore » set of simulated leakage realizations. The approach relies on a flexible two-tiered signature to infer that CO2 leakage has occurred. This research is part of the National Risk Assessment Partnership, a U.S. Department of Energy (DOE) project tasked with conducting risk and uncertainty analysis in the areas of reservoir performance, natural leakage pathways, wellbore integrity, groundwater protection, monitoring, and systems level modeling.« less

  6. An immune clock of human pregnancy

    PubMed Central

    Aghaeepour, Nima; Ganio, Edward A.; Mcilwain, David; Tsai, Amy S.; Tingle, Martha; Van Gassen, Sofie; Gaudilliere, Dyani K.; Baca, Quentin; McNeil, Leslie; Okada, Robin; Ghaemi, Mohammad S.; Furman, David; Wong, Ronald J.; Winn, Virginia D.; Druzin, Maurice L.; El-Sayed, Yaser Y.; Quaintance, Cecele; Gibbs, Ronald; Darmstadt, Gary L.; Shaw, Gary M.; Stevenson, David K.; Tibshirani, Robert; Nolan, Garry P.; Lewis, David B.; Angst, Martin S.; Gaudilliere, Brice

    2017-01-01

    The maintenance of pregnancy relies on finely tuned immune adaptations. We demonstrate that these adaptations are precisely timed, reflecting an immune clock of pregnancy in women delivering at term. Using mass cytometry, the abundance and functional responses of all major immune cell subsets were quantified in serial blood samples collected throughout pregnancy. Cell signaling–based Elastic Net, a regularized regression method adapted from the elastic net algorithm, was developed to infer and prospectively validate a predictive model of interrelated immune events that accurately captures the chronology of pregnancy. Model components highlighted existing knowledge and revealed previously unreported biology, including a critical role for the interleukin-2–dependent STAT5ab signaling pathway in modulating T cell function during pregnancy. These findings unravel the precise timing of immunological events occurring during a term pregnancy and provide the analytical framework to identify immunological deviations implicated in pregnancy-related pathologies. PMID:28864494

  7. Pathway analysis from lists of microRNAs: common pitfalls and alternative strategy

    PubMed Central

    Godard, Patrice; van Eyll, Jonathan

    2015-01-01

    MicroRNAs (miRNAs) are involved in the regulation of gene expression at a post-transcriptional level. As such, monitoring miRNA expression has been increasingly used to assess their role in regulatory mechanisms of biological processes. In large scale studies, once miRNAs of interest have been identified, the target genes they regulate are often inferred using algorithms or databases. A pathway analysis is then often performed in order to generate hypotheses about the relevant biological functions controlled by the miRNA signature. Here we show that the method widely used in scientific literature to identify these pathways is biased and leads to inaccurate results. In addition to describing the bias and its origin we present an alternative strategy to identify potential biological functions specifically impacted by a miRNA signature. More generally, our study exemplifies the crucial need of relevant negative controls when developing, and using, bioinformatics methods. PMID:25800743

  8. Fuzzy automata and pattern matching

    NASA Technical Reports Server (NTRS)

    Setzer, C. B.; Warsi, N. A.

    1986-01-01

    A wide-ranging search for articles and books concerned with fuzzy automata and syntactic pattern recognition is presented. A number of survey articles on image processing and feature detection were included. Hough's algorithm is presented to illustrate the way in which knowledge about an image can be used to interpret the details of the image. It was found that in hand generated pictures, the algorithm worked well on following the straight lines, but had great difficulty turning corners. An algorithm was developed which produces a minimal finite automaton recognizing a given finite set of strings. One difficulty of the construction is that, in some cases, this minimal automaton is not unique for a given set of strings and a given maximum length. This algorithm compares favorably with other inference algorithms. More importantly, the algorithm produces an automaton with a rigorously described relationship to the original set of strings that does not depend on the algorithm itself.

  9. A Multi-Class Proportional Myocontrol Algorithm for Upper Limb Prosthesis Control: Validation in Real-Life Scenarios on Amputees.

    PubMed

    Amsuess, Sebastian; Goebel, Peter; Graimann, Bernhard; Farina, Dario

    2015-09-01

    Functional replacement of upper limbs by means of dexterous prosthetic devices remains a technological challenge. While the mechanical design of prosthetic hands has advanced rapidly, the human-machine interfacing and the control strategies needed for the activation of multiple degrees of freedom are not reliable enough for restoring hand function successfully. Machine learning methods capable of inferring the user intent from EMG signals generated by the activation of the remnant muscles are regarded as a promising solution to this problem. However, the lack of robustness of the current methods impedes their routine clinical application. In this study, we propose a novel algorithm for controlling multiple degrees of freedom sequentially, inherently proportionally and with high robustness, allowing a good level of prosthetic hand function. The control algorithm is based on the spatial linear combinations of amplitude-related EMG signal features. The weighting coefficients in this combination are derived from the optimization criterion of the common spatial patterns filters which allow for maximal discriminability between movements. An important component of the study is the validation of the method which was performed on both able-bodied and amputee subjects who used physical prostheses with customized sockets and performed three standardized functional tests mimicking daily-life activities of varying difficulty. Moreover, the new method was compared in the same conditions with one clinical/industrial and one academic state-of-the-art method. The novel algorithm outperformed significantly the state-of-the-art techniques in both subject groups for tests that required the activation of more than one degree of freedom. Because of the evaluation in real time control on both able-bodied subjects and final users (amputees) wearing physical prostheses, the results obtained allow for the direct extrapolation of the benefits of the proposed method for the end users. In conclusion, the method proposed and validated in real-life use scenarios, allows the practical usability of multifunctional hand prostheses in an intuitive way, with significant advantages with respect to previous systems.

  10. Inference of time-delayed gene regulatory networks based on dynamic Bayesian network hybrid learning method

    PubMed Central

    Yu, Bin; Xu, Jia-Meng; Li, Shan; Chen, Cheng; Chen, Rui-Xin; Wang, Lei; Zhang, Yan; Wang, Ming-Hui

    2017-01-01

    Gene regulatory networks (GRNs) research reveals complex life phenomena from the perspective of gene interaction, which is an important research field in systems biology. Traditional Bayesian networks have a high computational complexity, and the network structure scoring model has a single feature. Information-based approaches cannot identify the direction of regulation. In order to make up for the shortcomings of the above methods, this paper presents a novel hybrid learning method (DBNCS) based on dynamic Bayesian network (DBN) to construct the multiple time-delayed GRNs for the first time, combining the comprehensive score (CS) with the DBN model. DBNCS algorithm first uses CMI2NI (conditional mutual inclusive information-based network inference) algorithm for network structure profiles learning, namely the construction of search space. Then the redundant regulations are removed by using the recursive optimization algorithm (RO), thereby reduce the false positive rate. Secondly, the network structure profiles are decomposed into a set of cliques without loss, which can significantly reduce the computational complexity. Finally, DBN model is used to identify the direction of gene regulation within the cliques and search for the optimal network structure. The performance of DBNCS algorithm is evaluated by the benchmark GRN datasets from DREAM challenge as well as the SOS DNA repair network in Escherichia coli, and compared with other state-of-the-art methods. The experimental results show the rationality of the algorithm design and the outstanding performance of the GRNs. PMID:29113310

  11. Inference of time-delayed gene regulatory networks based on dynamic Bayesian network hybrid learning method.

    PubMed

    Yu, Bin; Xu, Jia-Meng; Li, Shan; Chen, Cheng; Chen, Rui-Xin; Wang, Lei; Zhang, Yan; Wang, Ming-Hui

    2017-10-06

    Gene regulatory networks (GRNs) research reveals complex life phenomena from the perspective of gene interaction, which is an important research field in systems biology. Traditional Bayesian networks have a high computational complexity, and the network structure scoring model has a single feature. Information-based approaches cannot identify the direction of regulation. In order to make up for the shortcomings of the above methods, this paper presents a novel hybrid learning method (DBNCS) based on dynamic Bayesian network (DBN) to construct the multiple time-delayed GRNs for the first time, combining the comprehensive score (CS) with the DBN model. DBNCS algorithm first uses CMI2NI (conditional mutual inclusive information-based network inference) algorithm for network structure profiles learning, namely the construction of search space. Then the redundant regulations are removed by using the recursive optimization algorithm (RO), thereby reduce the false positive rate. Secondly, the network structure profiles are decomposed into a set of cliques without loss, which can significantly reduce the computational complexity. Finally, DBN model is used to identify the direction of gene regulation within the cliques and search for the optimal network structure. The performance of DBNCS algorithm is evaluated by the benchmark GRN datasets from DREAM challenge as well as the SOS DNA repair network in Escherichia coli , and compared with other state-of-the-art methods. The experimental results show the rationality of the algorithm design and the outstanding performance of the GRNs.

  12. Hierarchical Dirichlet process model for gene expression clustering

    PubMed Central

    2013-01-01

    Clustering is an important data processing tool for interpreting microarray data and genomic network inference. In this article, we propose a clustering algorithm based on the hierarchical Dirichlet processes (HDP). The HDP clustering introduces a hierarchical structure in the statistical model which captures the hierarchical features prevalent in biological data such as the gene express data. We develop a Gibbs sampling algorithm based on the Chinese restaurant metaphor for the HDP clustering. We apply the proposed HDP algorithm to both regulatory network segmentation and gene expression clustering. The HDP algorithm is shown to outperform several popular clustering algorithms by revealing the underlying hierarchical structure of the data. For the yeast cell cycle data, we compare the HDP result to the standard result and show that the HDP algorithm provides more information and reduces the unnecessary clustering fragments. PMID:23587447

  13. Gene regulatory network inference from multifactorial perturbation data using both regression and correlation analyses.

    PubMed

    Xiong, Jie; Zhou, Tong

    2012-01-01

    An important problem in systems biology is to reconstruct gene regulatory networks (GRNs) from experimental data and other a priori information. The DREAM project offers some types of experimental data, such as knockout data, knockdown data, time series data, etc. Among them, multifactorial perturbation data are easier and less expensive to obtain than other types of experimental data and are thus more common in practice. In this article, a new algorithm is presented for the inference of GRNs using the DREAM4 multifactorial perturbation data. The GRN inference problem among [Formula: see text] genes is decomposed into [Formula: see text] different regression problems. In each of the regression problems, the expression level of a target gene is predicted solely from the expression level of a potential regulation gene. For different potential regulation genes, different weights for a specific target gene are constructed by using the sum of squared residuals and the Pearson correlation coefficient. Then these weights are normalized to reflect effort differences of regulating distinct genes. By appropriately choosing the parameters of the power law, we constructe a 0-1 integer programming problem. By solving this problem, direct regulation genes for an arbitrary gene can be estimated. And, the normalized weight of a gene is modified, on the basis of the estimation results about the existence of direct regulations to it. These normalized and modified weights are used in queuing the possibility of the existence of a corresponding direct regulation. Computation results with the DREAM4 In Silico Size 100 Multifactorial subchallenge show that estimation performances of the suggested algorithm can even outperform the best team. Using the real data provided by the DREAM5 Network Inference Challenge, estimation performances can be ranked third. Furthermore, the high precision of the obtained most reliable predictions shows the suggested algorithm may be helpful in guiding biological experiment designs.

  14. SHIPS: Spectral Hierarchical Clustering for the Inference of Population Structure in Genetic Studies

    PubMed Central

    Bouaziz, Matthieu; Paccard, Caroline; Guedj, Mickael; Ambroise, Christophe

    2012-01-01

    Inferring the structure of populations has many applications for genetic research. In addition to providing information for evolutionary studies, it can be used to account for the bias induced by population stratification in association studies. To this end, many algorithms have been proposed to cluster individuals into genetically homogeneous sub-populations. The parametric algorithms, such as Structure, are very popular but their underlying complexity and their high computational cost led to the development of faster parametric alternatives such as Admixture. Alternatives to these methods are the non-parametric approaches. Among this category, AWclust has proven efficient but fails to properly identify population structure for complex datasets. We present in this article a new clustering algorithm called Spectral Hierarchical clustering for the Inference of Population Structure (SHIPS), based on a divisive hierarchical clustering strategy, allowing a progressive investigation of population structure. This method takes genetic data as input to cluster individuals into homogeneous sub-populations and with the use of the gap statistic estimates the optimal number of such sub-populations. SHIPS was applied to a set of simulated discrete and admixed datasets and to real SNP datasets, that are data from the HapMap and Pan-Asian SNP consortium. The programs Structure, Admixture, AWclust and PCAclust were also investigated in a comparison study. SHIPS and the parametric approach Structure were the most accurate when applied to simulated datasets both in terms of individual assignments and estimation of the correct number of clusters. The analysis of the results on the real datasets highlighted that the clusterings of SHIPS were the more consistent with the population labels or those produced by the Admixture program. The performances of SHIPS when applied to SNP data, along with its relatively low computational cost and its ease of use make this method a promising solution to infer fine-scale genetic patterns. PMID:23077494

  15. Application of AI techniques to infer vegetation characteristics from directional reflectance(s)

    NASA Technical Reports Server (NTRS)

    Kimes, D. S.; Smith, J. A.; Harrison, P. A.; Harrison, P. R.

    1994-01-01

    Traditionally, the remote sensing community has relied totally on spectral knowledge to extract vegetation characteristics. However, there are other knowledge bases (KB's) that can be used to significantly improve the accuracy and robustness of inference techniques. Using AI (artificial intelligence) techniques a KB system (VEG) was developed that integrates input spectral measurements with diverse KB's. These KB's consist of data sets of directional reflectance measurements, knowledge from literature, and knowledge from experts which are combined into an intelligent and efficient system for making vegetation inferences. VEG accepts spectral data of an unknown target as input, determines the best techniques for inferring the desired vegetation characteristic(s), applies the techniques to the target data, and provides a rigorous estimate of the accuracy of the inference. VEG was developed to: infer spectral hemispherical reflectance from any combination of nadir and/or off-nadir view angles; infer percent ground cover from any combination of nadir and/or off-nadir view angles; infer unknown view angle(s) from known view angle(s) (known as view angle extension); and discriminate between user defined vegetation classes using spectral and directional reflectance relationships developed from an automated learning algorithm. The errors for these techniques were generally very good ranging between 2 to 15% (proportional root mean square). The system is designed to aid scientists in developing, testing, and applying new inference techniques using directional reflectance data.

  16. SoyNet: a database of co-functional networks for soybean Glycine max.

    PubMed

    Kim, Eiru; Hwang, Sohyun; Lee, Insuk

    2017-01-04

    Soybean (Glycine max) is a legume crop with substantial economic value, providing a source of oil and protein for humans and livestock. More than 50% of edible oils consumed globally are derived from this crop. Soybean plants are also important for soil fertility, as they fix atmospheric nitrogen by symbiosis with microorganisms. The latest soybean genome annotation (version 2.0) lists 56 044 coding genes, yet their functional contributions to crop traits remain mostly unknown. Co-functional networks have proven useful for identifying genes that are involved in a particular pathway or phenotype with various network algorithms. Here, we present SoyNet (available at www.inetbio.org/soynet), a database of co-functional networks for G. max and a companion web server for network-based functional predictions. SoyNet maps 1 940 284 co-functional links between 40 812 soybean genes (72.8% of the coding genome), which were inferred from 21 distinct types of genomics data including 734 microarrays and 290 RNA-seq samples from soybean. SoyNet provides a new route to functional investigation of the soybean genome, elucidating genes and pathways of agricultural importance. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  17. Structural interpretation in composite systems using powder X-ray diffraction: applications of error propagation to the pair distribution function.

    PubMed

    Moore, Michael D; Shi, Zhenqi; Wildfong, Peter L D

    2010-12-01

    To develop a method for drawing statistical inferences from differences between multiple experimental pair distribution function (PDF) transforms of powder X-ray diffraction (PXRD) data. The appropriate treatment of initial PXRD error estimates using traditional error propagation algorithms was tested using Monte Carlo simulations on amorphous ketoconazole. An amorphous felodipine:polyvinyl pyrrolidone:vinyl acetate (PVPva) physical mixture was prepared to define an error threshold. Co-solidified products of felodipine:PVPva and terfenadine:PVPva were prepared using a melt-quench method and subsequently analyzed using PXRD and PDF. Differential scanning calorimetry (DSC) was used as an additional characterization method. The appropriate manipulation of initial PXRD error estimates through the PDF transform were confirmed using the Monte Carlo simulations for amorphous ketoconazole. The felodipine:PVPva physical mixture PDF analysis determined ±3σ to be an appropriate error threshold. Using the PDF and error propagation principles, the felodipine:PVPva co-solidified product was determined to be completely miscible, and the terfenadine:PVPva co-solidified product, although having appearances of an amorphous molecular solid dispersion by DSC, was determined to be phase-separated. Statistically based inferences were successfully drawn from PDF transforms of PXRD patterns obtained from composite systems. The principles applied herein may be universally adapted to many different systems and provide a fundamentally sound basis for drawing structural conclusions from PDF studies.

  18. Machine Learning-Assisted Network Inference Approach to Identify a New Class of Genes that Coordinate the Functionality of Cancer Networks.

    PubMed

    Ghanat Bari, Mehrab; Ung, Choong Yong; Zhang, Cheng; Zhu, Shizhen; Li, Hu

    2017-08-01

    Emerging evidence indicates the existence of a new class of cancer genes that act as "signal linkers" coordinating oncogenic signals between mutated and differentially expressed genes. While frequently mutated oncogenes and differentially expressed genes, which we term Class I cancer genes, are readily detected by most analytical tools, the new class of cancer-related genes, i.e., Class II, escape detection because they are neither mutated nor differentially expressed. Given this hypothesis, we developed a Machine Learning-Assisted Network Inference (MALANI) algorithm, which assesses all genes regardless of expression or mutational status in the context of cancer etiology. We used 8807 expression arrays, corresponding to 9 cancer types, to build more than 2 × 10 8 Support Vector Machine (SVM) models for reconstructing a cancer network. We found that ~3% of ~19,000 not differentially expressed genes are Class II cancer gene candidates. Some Class II genes that we found, such as SLC19A1 and ATAD3B, have been recently reported to associate with cancer outcomes. To our knowledge, this is the first study that utilizes both machine learning and network biology approaches to uncover Class II cancer genes in coordinating functionality in cancer networks and will illuminate our understanding of how genes are modulated in a tissue-specific network contribute to tumorigenesis and therapy development.

  19. Pathway-based personalized analysis of cancer

    PubMed Central

    Drier, Yotam; Sheffer, Michal; Domany, Eytan

    2013-01-01

    We introduce Pathifier, an algorithm that infers pathway deregulation scores for each tumor sample on the basis of expression data. This score is determined, in a context-specific manner, for every particular dataset and type of cancer that is being investigated. The algorithm transforms gene-level information into pathway-level information, generating a compact and biologically relevant representation of each sample. We demonstrate the algorithm’s performance on three colorectal cancer datasets and two glioblastoma multiforme datasets and show that our multipathway-based representation is reproducible, preserves much of the original information, and allows inference of complex biologically significant information. We discovered several pathways that were significantly associated with survival of glioblastoma patients and two whose scores are predictive of survival in colorectal cancer: CXCR3-mediated signaling and oxidative phosphorylation. We also identified a subclass of proneural and neural glioblastoma with significantly better survival, and an EGF receptor-deregulated subclass of colon cancers. PMID:23547110

  20. Online Updating of Statistical Inference in the Big Data Setting.

    PubMed

    Schifano, Elizabeth D; Wu, Jing; Wang, Chun; Yan, Jun; Chen, Ming-Hui

    2016-01-01

    We present statistical methods for big data arising from online analytical processing, where large amounts of data arrive in streams and require fast analysis without storage/access to the historical data. In particular, we develop iterative estimating algorithms and statistical inferences for linear models and estimating equations that update as new data arrive. These algorithms are computationally efficient, minimally storage-intensive, and allow for possible rank deficiencies in the subset design matrices due to rare-event covariates. Within the linear model setting, the proposed online-updating framework leads to predictive residual tests that can be used to assess the goodness-of-fit of the hypothesized model. We also propose a new online-updating estimator under the estimating equation setting. Theoretical properties of the goodness-of-fit tests and proposed estimators are examined in detail. In simulation studies and real data applications, our estimator compares favorably with competing approaches under the estimating equation setting.

  1. Subgrouping Automata: automatic sequence subgrouping using phylogenetic tree-based optimum subgrouping algorithm.

    PubMed

    Seo, Joo-Hyun; Park, Jihyang; Kim, Eun-Mi; Kim, Juhan; Joo, Keehyoung; Lee, Jooyoung; Kim, Byung-Gee

    2014-02-01

    Sequence subgrouping for a given sequence set can enable various informative tasks such as the functional discrimination of sequence subsets and the functional inference of unknown sequences. Because an identity threshold for sequence subgrouping may vary according to the given sequence set, it is highly desirable to construct a robust subgrouping algorithm which automatically identifies an optimal identity threshold and generates subgroups for a given sequence set. To meet this end, an automatic sequence subgrouping method, named 'Subgrouping Automata' was constructed. Firstly, tree analysis module analyzes the structure of tree and calculates the all possible subgroups in each node. Sequence similarity analysis module calculates average sequence similarity for all subgroups in each node. Representative sequence generation module finds a representative sequence using profile analysis and self-scoring for each subgroup. For all nodes, average sequence similarities are calculated and 'Subgrouping Automata' searches a node showing statistically maximum sequence similarity increase using Student's t-value. A node showing the maximum t-value, which gives the most significant differences in average sequence similarity between two adjacent nodes, is determined as an optimum subgrouping node in the phylogenetic tree. Further analysis showed that the optimum subgrouping node from SA prevents under-subgrouping and over-subgrouping. Copyright © 2013. Published by Elsevier Ltd.

  2. Comprehensive discovery of subsample gene expression components by information explanation: therapeutic implications in cancer.

    PubMed

    Pepke, Shirley; Ver Steeg, Greg

    2017-03-15

    De novo inference of clinically relevant gene function relationships from tumor RNA-seq remains a challenging task. Current methods typically either partition patient samples into a few subtypes or rely upon analysis of pairwise gene correlations that will miss some groups in noisy data. Leveraging higher dimensional information can be expected to increase the power to discern targetable pathways, but this is commonly thought to be an intractable computational problem. In this work we adapt a recently developed machine learning algorithm for sensitive detection of complex gene relationships. The algorithm, CorEx, efficiently optimizes over multivariate mutual information and can be iteratively applied to generate a hierarchy of relatively independent latent factors. The learned latent factors are used to stratify patients for survival analysis with respect to both single factors and combinations. These analyses are performed and interpreted in the context of biological function annotations and protein network interactions that might be utilized to match patients to multiple therapies. Analysis of ovarian tumor RNA-seq samples demonstrates the algorithm's power to infer well over one hundred biologically interpretable gene cohorts, several times more than standard methods such as hierarchical clustering and k-means. The CorEx factor hierarchy is also informative, with related but distinct gene clusters grouped by upper nodes. Some latent factors correlate with patient survival, including one for a pathway connected with the epithelial-mesenchymal transition in breast cancer that is regulated by a microRNA that modulates epigenetics. Further, combinations of factors lead to a synergistic survival advantage in some cases. In contrast to studies that attempt to partition patients into a small number of subtypes (typically 4 or fewer) for treatment purposes, our approach utilizes subgroup information for combinatoric transcriptional phenotyping. Considering only the 66 gene expression groups that are found to both have significant Gene Ontology enrichment and are small enough to indicate specific drug targets implies a computational phenotype for ovarian cancer that allows for 3 66 possible patient profiles, enabling truly personalized treatment. The findings here demonstrate a new technique that sheds light on the complexity of gene expression dependencies in tumors and could eventually enable the use of patient RNA-seq profiles for selection of personalized and effective cancer treatments.

  3. Improving Classification of Cancer and Mining Biomarkers from Gene Expression Profiles Using Hybrid Optimization Algorithms and Fuzzy Support Vector Machine

    PubMed Central

    Moteghaed, Niloofar Yousefi; Maghooli, Keivan; Garshasbi, Masoud

    2018-01-01

    Background: Gene expression data are characteristically high dimensional with a small sample size in contrast to the feature size and variability inherent in biological processes that contribute to difficulties in analysis. Selection of highly discriminative features decreases the computational cost and complexity of the classifier and improves its reliability for prediction of a new class of samples. Methods: The present study used hybrid particle swarm optimization and genetic algorithms for gene selection and a fuzzy support vector machine (SVM) as the classifier. Fuzzy logic is used to infer the importance of each sample in the training phase and decrease the outlier sensitivity of the system to increase the ability to generalize the classifier. A decision-tree algorithm was applied to the most frequent genes to develop a set of rules for each type of cancer. This improved the abilities of the algorithm by finding the best parameters for the classifier during the training phase without the need for trial-and-error by the user. The proposed approach was tested on four benchmark gene expression profiles. Results: Good results have been demonstrated for the proposed algorithm. The classification accuracy for leukemia data is 100%, for colon cancer is 96.67% and for breast cancer is 98%. The results show that the best kernel used in training the SVM classifier is the radial basis function. Conclusions: The experimental results show that the proposed algorithm can decrease the dimensionality of the dataset, determine the most informative gene subset, and improve classification accuracy using the optimal parameters of the classifier with no user interface. PMID:29535919

  4. A junction-tree based learning algorithm to optimize network wide traffic control: A coordinated multi-agent framework

    DOE PAGES

    Zhu, Feng; Aziz, H. M. Abdul; Qian, Xinwu; ...

    2015-01-31

    Our study develops a novel reinforcement learning algorithm for the challenging coordinated signal control problem. Traffic signals are modeled as intelligent agents interacting with the stochastic traffic environment. The model is built on the framework of coordinated reinforcement learning. The Junction Tree Algorithm (JTA) based reinforcement learning is proposed to obtain an exact inference of the best joint actions for all the coordinated intersections. Moreover, the algorithm is implemented and tested with a network containing 18 signalized intersections in VISSIM. Finally, our results show that the JTA based algorithm outperforms independent learning (Q-learning), real-time adaptive learning, and fixed timing plansmore » in terms of average delay, number of stops, and vehicular emissions at the network level.« less

  5. Data fusion in cyber security: first order entity extraction from common cyber data

    NASA Astrophysics Data System (ADS)

    Giacobe, Nicklaus A.

    2012-06-01

    The Joint Directors of Labs Data Fusion Process Model (JDL Model) provides a framework for how to handle sensor data to develop higher levels of inference in a complex environment. Beginning from a call to leverage data fusion techniques in intrusion detection, there have been a number of advances in the use of data fusion algorithms in this subdomain of cyber security. While it is tempting to jump directly to situation-level or threat-level refinement (levels 2 and 3) for more exciting inferences, a proper fusion process starts with lower levels of fusion in order to provide a basis for the higher fusion levels. The process begins with first order entity extraction, or the identification of important entities represented in the sensor data stream. Current cyber security operational tools and their associated data are explored for potential exploitation, identifying the first order entities that exist in the data and the properties of these entities that are described by the data. Cyber events that are represented in the data stream are added to the first order entities as their properties. This work explores typical cyber security data and the inferences that can be made at the lower fusion levels (0 and 1) with simple metrics. Depending on the types of events that are expected by the analyst, these relatively simple metrics can provide insight on their own, or could be used in fusion algorithms as a basis for higher levels of inference.

  6. Inferring action structure and causal relationships in continuous sequences of human action.

    PubMed

    Buchsbaum, Daphna; Griffiths, Thomas L; Plunkett, Dillon; Gopnik, Alison; Baldwin, Dare

    2015-02-01

    In the real world, causal variables do not come pre-identified or occur in isolation, but instead are embedded within a continuous temporal stream of events. A challenge faced by both human learners and machine learning algorithms is identifying subsequences that correspond to the appropriate variables for causal inference. A specific instance of this problem is action segmentation: dividing a sequence of observed behavior into meaningful actions, and determining which of those actions lead to effects in the world. Here we present a Bayesian analysis of how statistical and causal cues to segmentation should optimally be combined, as well as four experiments investigating human action segmentation and causal inference. We find that both people and our model are sensitive to statistical regularities and causal structure in continuous action, and are able to combine these sources of information in order to correctly infer both causal relationships and segmentation boundaries. Copyright © 2014. Published by Elsevier Inc.

  7. GRN2SBML: automated encoding and annotation of inferred gene regulatory networks complying with SBML.

    PubMed

    Vlaic, Sebastian; Hoffmann, Bianca; Kupfer, Peter; Weber, Michael; Dräger, Andreas

    2013-09-01

    GRN2SBML automatically encodes gene regulatory networks derived from several inference tools in systems biology markup language. Providing a graphical user interface, the networks can be annotated via the simple object access protocol (SOAP)-based application programming interface of BioMart Central Portal and minimum information required in the annotation of models registry. Additionally, we provide an R-package, which processes the output of supported inference algorithms and automatically passes all required parameters to GRN2SBML. Therefore, GRN2SBML closes a gap in the processing pipeline between the inference of gene regulatory networks and their subsequent analysis, visualization and storage. GRN2SBML is freely available under the GNU Public License version 3 and can be downloaded from http://www.hki-jena.de/index.php/0/2/490. General information on GRN2SBML, examples and tutorials are available at the tool's web page.

  8. Bayesian parameter inference for stochastic biochemical network models using particle Markov chain Monte Carlo

    PubMed Central

    Golightly, Andrew; Wilkinson, Darren J.

    2011-01-01

    Computational systems biology is concerned with the development of detailed mechanistic models of biological processes. Such models are often stochastic and analytically intractable, containing uncertain parameters that must be estimated from time course data. In this article, we consider the task of inferring the parameters of a stochastic kinetic model defined as a Markov (jump) process. Inference for the parameters of complex nonlinear multivariate stochastic process models is a challenging problem, but we find here that algorithms based on particle Markov chain Monte Carlo turn out to be a very effective computationally intensive approach to the problem. Approximations to the inferential model based on stochastic differential equations (SDEs) are considered, as well as improvements to the inference scheme that exploit the SDE structure. We apply the methodology to a Lotka–Volterra system and a prokaryotic auto-regulatory network. PMID:23226583

  9. Bayesian inference of the number of factors in gene-expression analysis: application to human virus challenge studies

    PubMed Central

    2010-01-01

    Background Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis. Results Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV), Rhino virus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB) approval from Duke University. Comparisons are made between several alternative means of per-forming nonparametric factor analysis on these data, with comparisons as well to sparse-PCA and Penalized Matrix Decomposition (PMD), closely related non-Bayesian approaches. Conclusions Applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data. PMID:21062443

  10. IZI: INFERRING THE GAS PHASE METALLICITY (Z) AND IONIZATION PARAMETER (q) OF IONIZED NEBULAE USING BAYESIAN STATISTICS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Blanc, Guillermo A.; Kewley, Lisa; Vogt, Frédéric P. A.

    2015-01-10

    We present a new method for inferring the metallicity (Z) and ionization parameter (q) of H II regions and star-forming galaxies using strong nebular emission lines (SELs). We use Bayesian inference to derive the joint and marginalized posterior probability density functions for Z and q given a set of observed line fluxes and an input photoionization model. Our approach allows the use of arbitrary sets of SELs and the inclusion of flux upper limits. The method provides a self-consistent way of determining the physical conditions of ionized nebulae that is not tied to the arbitrary choice of a particular SELmore » diagnostic and uses all the available information. Unlike theoretically calibrated SEL diagnostics, the method is flexible and not tied to a particular photoionization model. We describe our algorithm, validate it against other methods, and present a tool that implements it called IZI. Using a sample of nearby extragalactic H II regions, we assess the performance of commonly used SEL abundance diagnostics. We also use a sample of 22 local H II regions having both direct and recombination line (RL) oxygen abundance measurements in the literature to study discrepancies in the abundance scale between different methods. We find that oxygen abundances derived through Bayesian inference using currently available photoionization models in the literature can be in good (∼30%) agreement with RL abundances, although some models perform significantly better than others. We also confirm that abundances measured using the direct method are typically ∼0.2 dex lower than both RL and photoionization-model-based abundances.« less

  11. Bayesian inference of nonlinear unsteady aerodynamics from aeroelastic limit cycle oscillations

    NASA Astrophysics Data System (ADS)

    Sandhu, Rimple; Poirel, Dominique; Pettit, Chris; Khalil, Mohammad; Sarkar, Abhijit

    2016-07-01

    A Bayesian model selection and parameter estimation algorithm is applied to investigate the influence of nonlinear and unsteady aerodynamic loads on the limit cycle oscillation (LCO) of a pitching airfoil in the transitional Reynolds number regime. At small angles of attack, laminar boundary layer trailing edge separation causes negative aerodynamic damping leading to the LCO. The fluid-structure interaction of the rigid, but elastically mounted, airfoil and nonlinear unsteady aerodynamics is represented by two coupled nonlinear stochastic ordinary differential equations containing uncertain parameters and model approximation errors. Several plausible aerodynamic models with increasing complexity are proposed to describe the aeroelastic system leading to LCO. The likelihood in the posterior parameter probability density function (pdf) is available semi-analytically using the extended Kalman filter for the state estimation of the coupled nonlinear structural and unsteady aerodynamic model. The posterior parameter pdf is sampled using a parallel and adaptive Markov Chain Monte Carlo (MCMC) algorithm. The posterior probability of each model is estimated using the Chib-Jeliazkov method that directly uses the posterior MCMC samples for evidence (marginal likelihood) computation. The Bayesian algorithm is validated through a numerical study and then applied to model the nonlinear unsteady aerodynamic loads using wind-tunnel test data at various Reynolds numbers.

  12. Longitudinal data analysis with non-ignorable missing data.

    PubMed

    Tseng, Chi-hong; Elashoff, Robert; Li, Ning; Li, Gang

    2016-02-01

    A common problem in the longitudinal data analysis is the missing data problem. Two types of missing patterns are generally considered in statistical literature: monotone and non-monotone missing data. Nonmonotone missing data occur when study participants intermittently miss scheduled visits, while monotone missing data can be from discontinued participation, loss to follow-up, and mortality. Although many novel statistical approaches have been developed to handle missing data in recent years, few methods are available to provide inferences to handle both types of missing data simultaneously. In this article, a latent random effects model is proposed to analyze longitudinal outcomes with both monotone and non-monotone missingness in the context of missing not at random. Another significant contribution of this article is to propose a new computational algorithm for latent random effects models. To reduce the computational burden of high-dimensional integration problem in latent random effects models, we develop a new computational algorithm that uses a new adaptive quadrature approach in conjunction with the Taylor series approximation for the likelihood function to simplify the E-step computation in the expectation-maximization algorithm. Simulation study is performed and the data from the scleroderma lung study are used to demonstrate the effectiveness of this method. © The Author(s) 2012.

  13. Bayesian inference of nonlinear unsteady aerodynamics from aeroelastic limit cycle oscillations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sandhu, Rimple; Poirel, Dominique; Pettit, Chris

    2016-07-01

    A Bayesian model selection and parameter estimation algorithm is applied to investigate the influence of nonlinear and unsteady aerodynamic loads on the limit cycle oscillation (LCO) of a pitching airfoil in the transitional Reynolds number regime. At small angles of attack, laminar boundary layer trailing edge separation causes negative aerodynamic damping leading to the LCO. The fluid–structure interaction of the rigid, but elastically mounted, airfoil and nonlinear unsteady aerodynamics is represented by two coupled nonlinear stochastic ordinary differential equations containing uncertain parameters and model approximation errors. Several plausible aerodynamic models with increasing complexity are proposed to describe the aeroelastic systemmore » leading to LCO. The likelihood in the posterior parameter probability density function (pdf) is available semi-analytically using the extended Kalman filter for the state estimation of the coupled nonlinear structural and unsteady aerodynamic model. The posterior parameter pdf is sampled using a parallel and adaptive Markov Chain Monte Carlo (MCMC) algorithm. The posterior probability of each model is estimated using the Chib–Jeliazkov method that directly uses the posterior MCMC samples for evidence (marginal likelihood) computation. The Bayesian algorithm is validated through a numerical study and then applied to model the nonlinear unsteady aerodynamic loads using wind-tunnel test data at various Reynolds numbers.« less

  14. Neighbourhood-consensus message passing and its potentials in image processing applications

    NASA Astrophysics Data System (ADS)

    Ružic, Tijana; Pižurica, Aleksandra; Philips, Wilfried

    2011-03-01

    In this paper, a novel algorithm for inference in Markov Random Fields (MRFs) is presented. Its goal is to find approximate maximum a posteriori estimates in a simple manner by combining neighbourhood influence of iterated conditional modes (ICM) and message passing of loopy belief propagation (LBP). We call the proposed method neighbourhood-consensus message passing because a single joint message is sent from the specified neighbourhood to the central node. The message, as a function of beliefs, represents the agreement of all nodes within the neighbourhood regarding the labels of the central node. This way we are able to overcome the disadvantages of reference algorithms, ICM and LBP. On one hand, more information is propagated in comparison with ICM, while on the other hand, the huge amount of pairwise interactions is avoided in comparison with LBP by working with neighbourhoods. The idea is related to the previously developed iterated conditional expectations algorithm. Here we revisit it and redefine it in a message passing framework in a more general form. The results on three different benchmarks demonstrate that the proposed technique can perform well both for binary and multi-label MRFs without any limitations on the model definition. Furthermore, it manifests improved performance over related techniques either in terms of quality and/or speed.

  15. Machine Learning in the Big Data Era: Are We There Yet?

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sukumar, Sreenivas Rangan

    In this paper, we discuss the machine learning challenges of the Big Data era. We observe that recent innovations in being able to collect, access, organize, integrate, and query massive amounts of data from a wide variety of data sources have brought statistical machine learning under more scrutiny and evaluation for gleaning insights from the data than ever before. In that context, we pose and debate the question - Are machine learning algorithms scaling with the ability to store and compute? If yes, how? If not, why not? We survey recent developments in the state-of-the-art to discuss emerging and outstandingmore » challenges in the design and implementation of machine learning algorithms at scale. We leverage experience from real-world Big Data knowledge discovery projects across domains of national security and healthcare to suggest our efforts be focused along the following axes: (i) the data science challenge - designing scalable and flexible computational architectures for machine learning (beyond just data-retrieval); (ii) the science of data challenge the ability to understand characteristics of data before applying machine learning algorithms and tools; and (iii) the scalable predictive functions challenge the ability to construct, learn and infer with increasing sample size, dimensionality, and categories of labels. We conclude with a discussion of opportunities and directions for future research.« less

  16. The effect of algorithms on copy number variant detection.

    PubMed

    Tsuang, Debby W; Millard, Steven P; Ely, Benjamin; Chi, Peter; Wang, Kenneth; Raskind, Wendy H; Kim, Sulgi; Brkanac, Zoran; Yu, Chang-En

    2010-12-30

    The detection of copy number variants (CNVs) and the results of CNV-disease association studies rely on how CNVs are defined, and because array-based technologies can only infer CNVs, CNV-calling algorithms can produce vastly different findings. Several authors have noted the large-scale variability between CNV-detection methods, as well as the substantial false positive and false negative rates associated with those methods. In this study, we use variations of four common algorithms for CNV detection (PennCNV, QuantiSNP, HMMSeg, and cnvPartition) and two definitions of overlap (any overlap and an overlap of at least 40% of the smaller CNV) to illustrate the effects of varying algorithms and definitions of overlap on CNV discovery. We used a 56 K Illumina genotyping array enriched for CNV regions to generate hybridization intensities and allele frequencies for 48 Caucasian schizophrenia cases and 48 age-, ethnicity-, and gender-matched control subjects. No algorithm found a difference in CNV burden between the two groups. However, the total number of CNVs called ranged from 102 to 3,765 across algorithms. The mean CNV size ranged from 46 kb to 787 kb, and the average number of CNVs per subject ranged from 1 to 39. The number of novel CNVs not previously reported in normal subjects ranged from 0 to 212. Motivated by the availability of multiple publicly available genome-wide SNP arrays, investigators are conducting numerous analyses to identify putative additional CNVs in complex genetic disorders. However, the number of CNVs identified in array-based studies, and whether these CNVs are novel or valid, will depend on the algorithm(s) used. Thus, given the variety of methods used, there will be many false positives and false negatives. Both guidelines for the identification of CNVs inferred from high-density arrays and the establishment of a gold standard for validation of CNVs are needed.

  17. Applying dynamic Bayesian networks to perturbed gene expression data.

    PubMed

    Dojer, Norbert; Gambin, Anna; Mizera, Andrzej; Wilczyński, Bartek; Tiuryn, Jerzy

    2006-05-08

    A central goal of molecular biology is to understand the regulatory mechanisms of gene transcription and protein synthesis. Because of their solid basis in statistics, allowing to deal with the stochastic aspects of gene expressions and noisy measurements in a natural way, Bayesian networks appear attractive in the field of inferring gene interactions structure from microarray experiments data. However, the basic formalism has some disadvantages, e.g. it is sometimes hard to distinguish between the origin and the target of an interaction. Two kinds of microarray experiments yield data particularly rich in information regarding the direction of interactions: time series and perturbation experiments. In order to correctly handle them, the basic formalism must be modified. For example, dynamic Bayesian networks (DBN) apply to time series microarray data. To our knowledge the DBN technique has not been applied in the context of perturbation experiments. We extend the framework of dynamic Bayesian networks in order to incorporate perturbations. Moreover, an exact algorithm for inferring an optimal network is proposed and a discretization method specialized for time series data from perturbation experiments is introduced. We apply our procedure to realistic simulations data. The results are compared with those obtained by standard DBN learning techniques. Moreover, the advantages of using exact learning algorithm instead of heuristic methods are analyzed. We show that the quality of inferred networks dramatically improves when using data from perturbation experiments. We also conclude that the exact algorithm should be used when it is possible, i.e. when considered set of genes is small enough.

  18. SMURC: High-Dimension Small-Sample Multivariate Regression With Covariance Estimation.

    PubMed

    Bayar, Belhassen; Bouaynaya, Nidhal; Shterenberg, Roman

    2017-03-01

    We consider a high-dimension low sample-size multivariate regression problem that accounts for correlation of the response variables. The system is underdetermined as there are more parameters than samples. We show that the maximum likelihood approach with covariance estimation is senseless because the likelihood diverges. We subsequently propose a normalization of the likelihood function that guarantees convergence. We call this method small-sample multivariate regression with covariance (SMURC) estimation. We derive an optimization problem and its convex approximation to compute SMURC. Simulation results show that the proposed algorithm outperforms the regularized likelihood estimator with known covariance matrix and the sparse conditional Gaussian graphical model. We also apply SMURC to the inference of the wing-muscle gene network of the Drosophila melanogaster (fruit fly).

  19. A Bayesian Scoring Technique for Mining Predictive and Non-Spurious Rules

    PubMed Central

    Batal, Iyad; Cooper, Gregory; Hauskrecht, Milos

    2015-01-01

    Rule mining is an important class of data mining methods for discovering interesting patterns in data. The success of a rule mining method heavily depends on the evaluation function that is used to assess the quality of the rules. In this work, we propose a new rule evaluation score - the Predictive and Non-Spurious Rules (PNSR) score. This score relies on Bayesian inference to evaluate the quality of the rules and considers the structure of the rules to filter out spurious rules. We present an efficient algorithm for finding rules with high PNSR scores. The experiments demonstrate that our method is able to cover and explain the data with a much smaller rule set than existing methods. PMID:25938136

  20. A Bayesian Scoring Technique for Mining Predictive and Non-Spurious Rules.

    PubMed

    Batal, Iyad; Cooper, Gregory; Hauskrecht, Milos

    Rule mining is an important class of data mining methods for discovering interesting patterns in data. The success of a rule mining method heavily depends on the evaluation function that is used to assess the quality of the rules. In this work, we propose a new rule evaluation score - the Predictive and Non-Spurious Rules (PNSR) score. This score relies on Bayesian inference to evaluate the quality of the rules and considers the structure of the rules to filter out spurious rules. We present an efficient algorithm for finding rules with high PNSR scores. The experiments demonstrate that our method is able to cover and explain the data with a much smaller rule set than existing methods.

  1. NetBenchmark: a bioconductor package for reproducible benchmarks of gene regulatory network inference.

    PubMed

    Bellot, Pau; Olsen, Catharina; Salembier, Philippe; Oliveras-Vergés, Albert; Meyer, Patrick E

    2015-09-29

    In the last decade, a great number of methods for reconstructing gene regulatory networks from expression data have been proposed. However, very few tools and datasets allow to evaluate accurately and reproducibly those methods. Hence, we propose here a new tool, able to perform a systematic, yet fully reproducible, evaluation of transcriptional network inference methods. Our open-source and freely available Bioconductor package aggregates a large set of tools to assess the robustness of network inference algorithms against different simulators, topologies, sample sizes and noise intensities. The benchmarking framework that uses various datasets highlights the specialization of some methods toward network types and data. As a result, it is possible to identify the techniques that have broad overall performances.

  2. Hardware Acceleration of Adaptive Neural Algorithms.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    James, Conrad D.

    As tradit ional numerical computing has faced challenges, researchers have turned towards alternative computing approaches to reduce power - per - computation metrics and improve algorithm performance. Here, we describe an approach towards non - conventional computing that strengthens the connection between machine learning and neuroscience concepts. The Hardware Acceleration of Adaptive Neural Algorithms (HAANA) project ha s develop ed neural machine learning algorithms and hardware for applications in image processing and cybersecurity. While machine learning methods are effective at extracting relevant features from many types of data, the effectiveness of these algorithms degrades when subjected to real - worldmore » conditions. Our team has generated novel neural - inspired approa ches to improve the resiliency and adaptability of machine learning algorithms. In addition, we have also designed and fabricated hardware architectures and microelectronic devices specifically tuned towards the training and inference operations of neural - inspired algorithms. Finally, our multi - scale simulation framework allows us to assess the impact of microelectronic device properties on algorithm performance.« less

  3. EIT image regularization by a new Multi-Objective Simulated Annealing algorithm.

    PubMed

    Castro Martins, Thiago; Sales Guerra Tsuzuki, Marcos

    2015-01-01

    Multi-Objective Optimization can be used to produce regularized Electrical Impedance Tomography (EIT) images where the weight of the regularization term is not known a priori. This paper proposes a novel Multi-Objective Optimization algorithm based on Simulated Annealing tailored for EIT image reconstruction. Images are reconstructed from experimental data and compared with images from other Multi and Single Objective optimization methods. A significant performance enhancement from traditional techniques can be inferred from the results.

  4. Critical bounds on noise and SNR for robust estimation of real-time brain activity from functional near infra-red spectroscopy.

    PubMed

    Aqil, Muhammad; Jeong, Myung Yung

    2018-04-24

    The robust characterization of real-time brain activity carries potential for many applications. However, the contamination of measured signals by various instrumental, environmental, and physiological sources of noise introduces a substantial amount of signal variance and, consequently, challenges real-time estimation of contributions from underlying neuronal sources. Functional near infra-red spectroscopy (fNIRS) is an emerging imaging modality whose real-time potential is yet to be fully explored. The objectives of the current study are to (i) validate a time-dependent linear model of hemodynamic responses in fNIRS, and (ii) test the robustness of this approach against measurement noise (instrumental and physiological) and mis-specification of the hemodynamic response basis functions (amplitude, latency, and duration). We propose a linear hemodynamic model with time-varying parameters, which are estimated (adapted and tracked) using a dynamic recursive least square algorithm. Owing to the linear nature of the activation model, the problem of achieving robust convergence to an accurate estimation of the model parameters is recast as a problem of parameter error stability around the origin. We show that robust convergence of the proposed method is guaranteed in the presence of an acceptable degree of model misspecification and we derive an upper bound on noise under which reliable parameters can still be inferred. We also derived a lower bound on signal-to-noise-ratio over which the reliable parameters can still be inferred from a channel/voxel. Whilst here applied to fNIRS, the proposed methodology is applicable to other hemodynamic-based imaging technologies such as functional magnetic resonance imaging. Copyright © 2018 Elsevier Inc. All rights reserved.

  5. Calibration with confidence: a principled method for panel assessment.

    PubMed

    MacKay, R S; Kenna, R; Low, R J; Parker, S

    2017-02-01

    Frequently, a set of objects has to be evaluated by a panel of assessors, but not every object is assessed by every assessor. A problem facing such panels is how to take into account different standards among panel members and varying levels of confidence in their scores. Here, a mathematically based algorithm is developed to calibrate the scores of such assessors, addressing both of these issues. The algorithm is based on the connectivity of the graph of assessors and objects evaluated, incorporating declared confidences as weights on its edges. If the graph is sufficiently well connected, relative standards can be inferred by comparing how assessors rate objects they assess in common, weighted by the levels of confidence of each assessment. By removing these biases, 'true' values are inferred for all the objects. Reliability estimates for the resulting values are obtained. The algorithm is tested in two case studies: one by computer simulation and another based on realistic evaluation data. The process is compared to the simple averaging procedure in widespread use, and to Fisher's additive incomplete block analysis. It is anticipated that the algorithm will prove useful in a wide variety of situations such as evaluation of the quality of research submitted to national assessment exercises; appraisal of grant proposals submitted to funding panels; ranking of job applicants; and judgement of performances on degree courses wherein candidates can choose from lists of options.

  6. Calibration with confidence: a principled method for panel assessment

    PubMed Central

    MacKay, R. S.; Low, R. J.; Parker, S.

    2017-01-01

    Frequently, a set of objects has to be evaluated by a panel of assessors, but not every object is assessed by every assessor. A problem facing such panels is how to take into account different standards among panel members and varying levels of confidence in their scores. Here, a mathematically based algorithm is developed to calibrate the scores of such assessors, addressing both of these issues. The algorithm is based on the connectivity of the graph of assessors and objects evaluated, incorporating declared confidences as weights on its edges. If the graph is sufficiently well connected, relative standards can be inferred by comparing how assessors rate objects they assess in common, weighted by the levels of confidence of each assessment. By removing these biases, ‘true’ values are inferred for all the objects. Reliability estimates for the resulting values are obtained. The algorithm is tested in two case studies: one by computer simulation and another based on realistic evaluation data. The process is compared to the simple averaging procedure in widespread use, and to Fisher's additive incomplete block analysis. It is anticipated that the algorithm will prove useful in a wide variety of situations such as evaluation of the quality of research submitted to national assessment exercises; appraisal of grant proposals submitted to funding panels; ranking of job applicants; and judgement of performances on degree courses wherein candidates can choose from lists of options. PMID:28386432

  7. A Parallel and Incremental Approach for Data-Intensive Learning of Bayesian Networks.

    PubMed

    Yue, Kun; Fang, Qiyu; Wang, Xiaoling; Li, Jin; Liu, Weiyi

    2015-12-01

    Bayesian network (BN) has been adopted as the underlying model for representing and inferring uncertain knowledge. As the basis of realistic applications centered on probabilistic inferences, learning a BN from data is a critical subject of machine learning, artificial intelligence, and big data paradigms. Currently, it is necessary to extend the classical methods for learning BNs with respect to data-intensive computing or in cloud environments. In this paper, we propose a parallel and incremental approach for data-intensive learning of BNs from massive, distributed, and dynamically changing data by extending the classical scoring and search algorithm and using MapReduce. First, we adopt the minimum description length as the scoring metric and give the two-pass MapReduce-based algorithms for computing the required marginal probabilities and scoring the candidate graphical model from sample data. Then, we give the corresponding strategy for extending the classical hill-climbing algorithm to obtain the optimal structure, as well as that for storing a BN by pairs. Further, in view of the dynamic characteristics of the changing data, we give the concept of influence degree to measure the coincidence of the current BN with new data, and then propose the corresponding two-pass MapReduce-based algorithms for BNs incremental learning. Experimental results show the efficiency, scalability, and effectiveness of our methods.

  8. Robust Online Hamiltonian Learning

    NASA Astrophysics Data System (ADS)

    Granade, Christopher; Ferrie, Christopher; Wiebe, Nathan; Cory, David

    2013-05-01

    In this talk, we introduce a machine-learning algorithm for the problem of inferring the dynamical parameters of a quantum system, and discuss this algorithm in the example of estimating the precession frequency of a single qubit in a static field. Our algorithm is designed with practicality in mind by including parameters that control trade-offs between the requirements on computational and experimental resources. The algorithm can be implemented online, during experimental data collection, or can be used as a tool for post-processing. Most importantly, our algorithm is capable of learning Hamiltonian parameters even when the parameters change from experiment-to-experiment, and also when additional noise processes are present and unknown. Finally, we discuss the performance of the our algorithm by appeal to the Cramer-Rao bound. This work was financially supported by the Canadian government through NSERC and CERC and by the United States government through DARPA. NW would like to acknowledge funding from USARO-DTO.

  9. Sparse Bayesian Inference of White Matter Fiber Orientations from Compressed Multi-resolution Diffusion MRI

    PubMed Central

    Pisharady, Pramod Kumar; Duarte-Carvajalino, Julio M; Sotiropoulos, Stamatios N; Sapiro, Guillermo; Lenglet, Christophe

    2017-01-01

    The RubiX [1] algorithm combines high SNR characteristics of low resolution data with high spacial specificity of high resolution data, to extract microstructural tissue parameters from diffusion MRI. In this paper we focus on estimating crossing fiber orientations and introduce sparsity to the RubiX algorithm, making it suitable for reconstruction from compressed (under-sampled) data. We propose a sparse Bayesian algorithm for estimation of fiber orientations and volume fractions from compressed diffusion MRI. The data at high resolution is modeled using a parametric spherical deconvolution approach and represented using a dictionary created with the exponential decay components along different possible directions. Volume fractions of fibers along these orientations define the dictionary weights. The data at low resolution is modeled using a spatial partial volume representation. The proposed dictionary representation and sparsity priors consider the dependence between fiber orientations and the spatial redundancy in data representation. Our method exploits the sparsity of fiber orientations, therefore facilitating inference from under-sampled data. Experimental results show improved accuracy and decreased uncertainty in fiber orientation estimates. For under-sampled data, the proposed method is also shown to produce more robust estimates of fiber orientations. PMID:28845484

  10. Intelligent Diagnostic Assistant for Complicated Skin Diseases through C5's Algorithm.

    PubMed

    Jeddi, Fatemeh Rangraz; Arabfard, Masoud; Kermany, Zahra Arab

    2017-09-01

    Intelligent Diagnostic Assistant can be used for complicated diagnosis of skin diseases, which are among the most common causes of disability. The aim of this study was to design and implement a computerized intelligent diagnostic assistant for complicated skin diseases through C5's Algorithm. An applied-developmental study was done in 2015. Knowledge base was developed based on interviews with dermatologists through questionnaires and checklists. Knowledge representation was obtained from the train data in the database using Excel Microsoft Office. Clementine Software and C5's Algorithms were applied to draw the decision tree. Analysis of test accuracy was performed based on rules extracted using inference chains. The rules extracted from the decision tree were entered into the CLIPS programming environment and the intelligent diagnostic assistant was designed then. The rules were defined using forward chaining inference technique and were entered into Clips programming environment as RULE. The accuracy and error rates obtained in the training phase from the decision tree were 99.56% and 0.44%, respectively. The accuracy of the decision tree was 98% and the error was 2% in the test phase. Intelligent diagnostic assistant can be used as a reliable system with high accuracy, sensitivity, specificity, and agreement.

  11. Sparse Bayesian Inference of White Matter Fiber Orientations from Compressed Multi-resolution Diffusion MRI.

    PubMed

    Pisharady, Pramod Kumar; Duarte-Carvajalino, Julio M; Sotiropoulos, Stamatios N; Sapiro, Guillermo; Lenglet, Christophe

    2015-10-01

    The RubiX [1] algorithm combines high SNR characteristics of low resolution data with high spacial specificity of high resolution data, to extract microstructural tissue parameters from diffusion MRI. In this paper we focus on estimating crossing fiber orientations and introduce sparsity to the RubiX algorithm, making it suitable for reconstruction from compressed (under-sampled) data. We propose a sparse Bayesian algorithm for estimation of fiber orientations and volume fractions from compressed diffusion MRI. The data at high resolution is modeled using a parametric spherical deconvolution approach and represented using a dictionary created with the exponential decay components along different possible directions. Volume fractions of fibers along these orientations define the dictionary weights. The data at low resolution is modeled using a spatial partial volume representation. The proposed dictionary representation and sparsity priors consider the dependence between fiber orientations and the spatial redundancy in data representation. Our method exploits the sparsity of fiber orientations, therefore facilitating inference from under-sampled data. Experimental results show improved accuracy and decreased uncertainty in fiber orientation estimates. For under-sampled data, the proposed method is also shown to produce more robust estimates of fiber orientations.

  12. Applications of machine-learning algorithms for infrared colour selection of Galactic Wolf-Rayet stars

    NASA Astrophysics Data System (ADS)

    Morello, Giuseppe; Morris, P. W.; Van Dyk, S. D.; Marston, A. P.; Mauerhan, J. C.

    2018-01-01

    We have investigated and applied machine-learning algorithms for infrared colour selection of Galactic Wolf-Rayet (WR) candidates. Objects taken from the Spitzer Galactic Legacy Infrared Midplane Survey Extraordinaire (GLIMPSE) catalogue of the infrared objects in the Galactic plane can be classified into different stellar populations based on the colours inferred from their broad-band photometric magnitudes [J, H and Ks from 2 Micron All Sky Survey (2MASS), and the four Spitzer/IRAC bands]. The algorithms tested in this pilot study are variants of the k-nearest neighbours approach, which is ideal for exploratory studies of classification problems where interrelations between variables and classes are complicated. The aims of this study are (1) to provide an automated tool to select reliable WR candidates and potentially other classes of objects, (2) to measure the efficiency of infrared colour selection at performing these tasks and (3) to lay the groundwork for statistically inferring the total number of WR stars in our Galaxy. We report the performance results obtained over a set of known objects and selected candidates for which we have carried out follow-up spectroscopic observations, and confirm the discovery of four new WR stars.

  13. Bayesian pedigree inference with small numbers of single nucleotide polymorphisms via a factor-graph representation.

    PubMed

    Anderson, Eric C; Ng, Thomas C

    2016-02-01

    We develop a computational framework for addressing pedigree inference problems using small numbers (80-400) of single nucleotide polymorphisms (SNPs). Our approach relaxes the assumptions, which are commonly made, that sampling is complete with respect to the pedigree and that there is no genotyping error. It relies on representing the inferred pedigree as a factor graph and invoking the Sum-Product algorithm to compute and store quantities that allow the joint probability of the data to be rapidly computed under a large class of rearrangements of the pedigree structure. This allows efficient MCMC sampling over the space of pedigrees, and, hence, Bayesian inference of pedigree structure. In this paper we restrict ourselves to inference of pedigrees without loops using SNPs assumed to be unlinked. We present the methodology in general for multigenerational inference, and we illustrate the method by applying it to the inference of full sibling groups in a large sample (n=1157) of Chinook salmon typed at 95 SNPs. The results show that our method provides a better point estimate and estimate of uncertainty than the currently best-available maximum-likelihood sibling reconstruction method. Extensions of this work to more complex scenarios are briefly discussed. Published by Elsevier Inc.

  14. Efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data.

    PubMed

    Bhaskar, Anand; Wang, Y X Rachel; Song, Yun S

    2015-02-01

    With the recent increase in study sample sizes in human genetics, there has been growing interest in inferring historical population demography from genomic variation data. Here, we present an efficient inference method that can scale up to very large samples, with tens or hundreds of thousands of individuals. Specifically, by utilizing analytic results on the expected frequency spectrum under the coalescent and by leveraging the technique of automatic differentiation, which allows us to compute gradients exactly, we develop a very efficient algorithm to infer piecewise-exponential models of the historical effective population size from the distribution of sample allele frequencies. Our method is orders of magnitude faster than previous demographic inference methods based on the frequency spectrum. In addition to inferring demography, our method can also accurately estimate locus-specific mutation rates. We perform extensive validation of our method on simulated data and show that it can accurately infer multiple recent epochs of rapid exponential growth, a signal that is difficult to pick up with small sample sizes. Lastly, we use our method to analyze data from recent sequencing studies, including a large-sample exome-sequencing data set of tens of thousands of individuals assayed at a few hundred genic regions. © 2015 Bhaskar et al.; Published by Cold Spring Harbor Laboratory Press.

  15. Variable neighborhood search for reverse engineering of gene regulatory networks.

    PubMed

    Nicholson, Charles; Goodwin, Leslie; Clark, Corey

    2017-01-01

    A new search heuristic, Divided Neighborhood Exploration Search, designed to be used with inference algorithms such as Bayesian networks to improve on the reverse engineering of gene regulatory networks is presented. The approach systematically moves through the search space to find topologies representative of gene regulatory networks that are more likely to explain microarray data. In empirical testing it is demonstrated that the novel method is superior to the widely employed greedy search techniques in both the quality of the inferred networks and computational time. Copyright © 2016 Elsevier Inc. All rights reserved.

  16. Data free inference with processed data products

    DOE PAGES

    Chowdhary, K.; Najm, H. N.

    2014-07-12

    Here, we consider the context of probabilistic inference of model parameters given error bars or confidence intervals on model output values, when the data is unavailable. We introduce a class of algorithms in a Bayesian framework, relying on maximum entropy arguments and approximate Bayesian computation methods, to generate consistent data with the given summary statistics. Once we obtain consistent data sets, we pool the respective posteriors, to arrive at a single, averaged density on the parameters. This approach allows us to perform accurate forward uncertainty propagation consistent with the reported statistics.

  17. Static analysis of class invariants in Java programs

    NASA Astrophysics Data System (ADS)

    Bonilla-Quintero, Lidia Dionisia

    2011-12-01

    This paper presents a technique for the automatic inference of class invariants from Java bytecode. Class invariants are very important for both compiler optimization and as an aid to programmers in their efforts to reduce the number of software defects. We present the original DC-invariant analysis from Adam Webber, talk about its shortcomings and suggest several different ways to improve it. To apply the DC-invariant analysis to identify DC-invariant assertions, all that one needs is a monotonic method analysis function and a suitable assertion domain. The DC-invariant algorithm is very general; however, the method analysis can be highly tuned to the problem in hand. For example, one could choose shape analysis as the method analysis function and use the DC-invariant analysis to simply extend it to an analysis that would yield class-wide invariants describing the shapes of linked data structures. We have a prototype implementation: a system we refer to as "the analyzer" that infers DC-invariant unary and binary relations and provides them to the user in a human readable format. The analyzer uses those relations to identify unnecessary array bounds checks in Java programs and perform null-reference analysis. It uses Adam Webber's relational constraint technique for the class-invariant binary relations. Early results with the analyzer were very imprecise in the presence of "dirty-called" methods. A dirty-called method is one that is called, either directly or transitively, from any constructor of the class, or from any method of the class at a point at which a disciplined field has been altered. This result was unexpected and forced an extensive search for improved techniques. An important contribution of this paper is the suggestion of several ways to improve the results by changing the way dirty-called methods are handled. The new techniques expand the set of class invariants that can be inferred over Webber's original results. The technique that produces better results uses in-line analysis. Final results are promising: we can infer sound class invariants for full-scale, not just toy applications.

  18. TOM: a web-based integrated approach for identification of candidate disease genes.

    PubMed

    Rossi, Simona; Masotti, Daniele; Nardini, Christine; Bonora, Elena; Romeo, Giovanni; Macii, Enrico; Benini, Luca; Volinia, Stefano

    2006-07-01

    The massive production of biological data by means of highly parallel devices like microarrays for gene expression has paved the way to new possible approaches in molecular genetics. Among them the possibility of inferring biological answers by querying large amounts of expression data. Based on this principle, we present here TOM, a web-based resource for the efficient extraction of candidate genes for hereditary diseases. The service requires the previous knowledge of at least another gene responsible for the disease and the linkage area, or else of two disease associated genetic intervals. The algorithm uses the information stored in public resources, including mapping, expression and functional databases. Given the queries, TOM will select and list one or more candidate genes. This approach allows the geneticist to bypass the costly and time consuming tracing of genetic markers through entire families and might improve the chance of identifying disease genes, particularly for rare diseases. We present here the tool and the results obtained on known benchmark and on hereditary predisposition to familial thyroid cancer. Our algorithm is available at http://www-micrel.deis.unibo.it/~tom/.

  19. Tracking children's mental states while solving algebra equations.

    PubMed

    Anderson, John R; Betts, Shawn; Ferris, Jennifer L; Fincham, Jon M

    2012-11-01

    Behavioral and function magnetic resonance imagery (fMRI) data were combined to infer the mental states of students as they interacted with an intelligent tutoring system. Sixteen children interacted with a computer tutor for solving linear equations over a six-day period (days 0-5), with days 1 and 5 occurring in an fMRI scanner. Hidden Markov model algorithms combined a model of student behavior with multi-voxel imaging pattern data to predict the mental states of students. We separately assessed the algorithms' ability to predict which step in a problem-solving sequence was performed and whether the step was performed correctly. For day 1, the data patterns of other students were used to predict the mental states of a target student. These predictions were improved on day 5 by adding information about the target student's behavioral and imaging data from day 1. Successful tracking of mental states depended on using the combination of a behavioral model and multi-voxel pattern analysis, illustrating the effectiveness of an integrated approach to tracking the cognition of individuals in real time as they perform complex tasks. Copyright © 2011 Wiley Periodicals, Inc.

  20. IdentiPy: An Extensible Search Engine for Protein Identification in Shotgun Proteomics.

    PubMed

    Levitsky, Lev I; Ivanov, Mark V; Lobas, Anna A; Bubis, Julia A; Tarasova, Irina A; Solovyeva, Elizaveta M; Pridatchenko, Marina L; Gorshkov, Mikhail V

    2018-06-18

    We present an open-source, extensible search engine for shotgun proteomics. Implemented in Python programming language, IdentiPy shows competitive processing speed and sensitivity compared with the state-of-the-art search engines. It is equipped with a user-friendly web interface, IdentiPy Server, enabling the use of a single server installation accessed from multiple workstations. Using a simplified version of X!Tandem scoring algorithm and its novel "autotune" feature, IdentiPy outperforms the popular alternatives on high-resolution data sets. Autotune adjusts the search parameters for the particular data set, resulting in improved search efficiency and simplifying the user experience. IdentiPy with the autotune feature shows higher sensitivity compared with the evaluated search engines. IdentiPy Server has built-in postprocessing and protein inference procedures and provides graphic visualization of the statistical properties of the data set and the search results. It is open-source and can be freely extended to use third-party scoring functions or processing algorithms and allows customization of the search workflow for specialized applications.

Top